Running a Pod with a segmentation dataset
By the end of this notebook, you will have learned how to run a Pod that uses a segmentation dataset and how to start up a Pod using a DatasourceContainerConfig with a CSVSource.
Prerequisites
!pip install bitfount
Setting up
Let's import the relevant pieces from the API Reference:
import logging
from pathlib import Path
import sys

from PIL import Image
import nest_asyncio
import numpy as np
import pandas as pd

from bitfount import CSVSource, DatasourceContainerConfig, Pod, setup_loggers
from bitfount.runners.config_schemas import PodConfig, PodDataConfig, PodDetailsConfig
from bitfount.utils import ExampleSegmentationData

if ".." not in sys.path:
    sys.path.insert(0, "..")
nest_asyncio.apply()  # Needed because Jupyter also has an asyncio loop
Let's set up the loggers, which allow you to monitor the progress of your executed commands and alert you if something goes wrong.
loggers = setup_loggers([logging.getLogger("bitfount")])
Setting up the Pod
We now specify the config for the Pod to run. For this tutorial we will generate synthetic images and masks and save them to a local directory.
# Set the directory where we save the images
seg_dir = "segmentation"
# Check if the folder exists and create it if not
path = Path(seg_dir + "/")
path.mkdir(parents=True, exist_ok=True)
# Set the number of images to generate
count = 25
# Set the height and width of the images
height = 100
width = 100
# Get the example segmentation dataset
segmentation_data = ExampleSegmentationData()
# Generate the images
input_images, target_masks = segmentation_data.generate_data(height, width, count=count)
# Change channel-order and make 3 channels
input_images_rgb = [x.astype(np.uint8) for x in input_images]
# Map each channel (i.e. class) to each color
target_masks_rgb = [
    segmentation_data.masks_to_colorimg(x.astype(np.uint8)) for x in target_masks
]
img_names_list = []
masks_names_list = []
# Save images
for i in range(count):
    im2 = Image.fromarray((input_images_rgb[i]).astype(np.uint8))
    im2.save(f"{seg_dir}/img_{i}.png")
    img_names_list.append(f"img_{i}.png")
# Save masks
for i in range(count):
    im2 = Image.fromarray((target_masks_rgb[i]).astype(np.uint8))
    im2.save(f"{seg_dir}/masks_{i}.png")
    masks_names_list.append(f"masks_{i}.png")
# Create dataframe with image and masks locations
segmentation_df = pd.DataFrame(
    {
        "img": [str(seg_dir) + "/" + img_name for img_name in img_names_list],
        "masks": [str(seg_dir) + "/" + mask_name for mask_name in masks_names_list],
    },
    columns=["img", "masks"],
)
csv_path = "segmentation_data.csv"
segmentation_df.to_csv(csv_path, index=False)
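If you want to double-check that the files were written as expected, you can run a quick, optional sanity check like the sketch below. It reuses the pandas and PIL imports from above and only relies on the csv_path and column names we just defined.
# Optional sanity check: reload the CSV we just wrote and open the first
# image/mask pair to confirm the files exist and have the expected size.
check_df = pd.read_csv(csv_path)
print(check_df.head())
first_img = Image.open(check_df["img"].iloc[0])
first_mask = Image.open(check_df["masks"].iloc[0])
print(first_img.size, first_mask.size)  # both should be (100, 100)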
Segmentation datasets are slightly different from image datasets (for which you saw an example in the "Training on Images" tutorial), as they support both training on images and producing image outputs. For a segmentation dataset, the DataSource needs references to the images you want to train with as well as to the images that serve as the targets for the machine learning task you are performing. Therefore, we must inform the Pod that the contents of these columns hold references to both the training images and the target masks. We achieve this by specifying both columns as "image" through the force_stypes parameter in the PodDataConfig.
segmentation_data_config = PodDataConfig(
    force_stypes={"segmentation-data-demo-dataset": {"image": ["img", "masks"]}},
)
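Note that the key of the force_stypes dictionary must match the dataset name we give the DatasourceContainerConfig below ("segmentation-data-demo-dataset"), and the listed column names must match the CSV. If you want, a minimal optional check that the column names line up (reusing the pandas import and csv_path from above):
# Optional: confirm the columns named in force_stypes exist in the CSV.
expected_image_cols = {"img", "masks"}
actual_cols = set(pd.read_csv(csv_path).columns)
assert expected_image_cols <= actual_cols, f"Missing columns: {expected_image_cols - actual_cols}"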
# Configure a pod using the generated, synthetic images and masks.
datasource = CSVSource(csv_path)
datasource_details = PodDetailsConfig(
    display_name="Segmentation Demo Pod",
    description="This Pod contains generated, synthetic data for a segmentation task.",
)
pod = Pod(
    name="segmentation-data-demo",
    datasources=[
        DatasourceContainerConfig(
            name="segmentation-data-demo-dataset",
            datasource=datasource,
            datasource_details=datasource_details,
            data_config=segmentation_data_config,
        )
    ],
)
Running the Pod
That's all of the setup. Let's run the Pod. You'll notice that the notebook cell doesn't complete. That's because the Pod is set to run until it is interrupted!
pod.start()
You should now be able to see your Pod registered on the Datasets page in the Bitfount Hub. To use the Pod, open up "Training a Custom Segmentation Model" in a separate tab, and we'll train a segmentation model on this Pod.
Contact our support team at support@bitfount.com if you have any questions.