Task Components
As we've seen, a Bitfount task is the brain of a project. It specifies what runs on any dataset linked to the project, in what order, and over what view of the data. Tasks are written in YAML and, at a high level, are made up of three key components:
- Protocol: orchestrates the run and lifecycle
- Algorithm(s): the units of work to execute (can be a list)
- Data structure: how to select, assign and transform input data
A minimal skeleton might look something like this:
    task:
      protocol:
        name: bitfount.ResultsOnly
        arguments: { ... }
      algorithm:
        - name: bitfount.ModelInference
          arguments: { ... }
      data_structure:
        select:
          include:
            - image_path
Protocols
- What they are: the task's entry point that orchestrates the algorithms and handles communication between different parties within the task. A given protocol will only be compatible with a certain set of algorithms.
- How to specify: each entry takes a `name` and `arguments`. Use the prefix `bitfount.` followed by the protocol name. A full list of protocols can be found here. The `arguments` may be optional and are used to configure the protocol. Search for the protocol in the API documentation to see its available arguments.
- Examples:
  - `bitfount.InferenceAndCSVReport`: runs model inference and writes a CSV report from the results
    task:
      protocol:
        name: bitfount.InferenceAndCSVReport
        arguments: { ... }
Algorithms
- What they are: the concrete steps executed by the protocol. You can supply a single algorithm or a list; a list runs its algorithms in order. How the output of one algorithm feeds into the next is built into the protocol that runs them, so a given algorithm is only compatible with a certain set of protocols.
- How to specify: each entry takes a `name` and optional `arguments`. Use the prefix `bitfount.` followed by the algorithm name. A full list of algorithms can be found here. The `arguments` may be optional and are used to configure the algorithm. Search for the algorithm in the API documentation to see its available arguments. Algorithms that require a model to be passed in accept a separate `model` block; see Referencing a model for more information.
- Common patterns:
  - Model inference (e.g., `bitfount.ModelInference`, `bitfount.HuggingFaceImageClassificationInference`)
  - Post-processing (e.g., calculations, matching)
  - Reporting (e.g., `bitfount.CSVReportAlgorithm`)
    task:
      algorithm:
        - name: bitfount.ModelInference
          arguments: { ... }
          model: { ... } # see "Referencing a model"
        - name: bitfount.CSVReportAlgorithm
          arguments: { ... }
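Since a single algorithm can be supplied instead of a list, a one-step task could in principle drop the list dash. The sketch below assumes the single-algorithm form is written as a plain mapping under `algorithm`; that shape is an assumption rather than something confirmed on this page, so check the API documentation before relying on it. The `{ ... }` placeholders are left elided, as in the examples above.

```yaml
task:
  # Assumed single-algorithm form: a mapping rather than a one-item list.
  algorithm:
    name: bitfount.ModelInference
    arguments: { ... }
    model: { ... } # see "Referencing a model"
```

The one-item list form (`- name: ...`) expresses the same thing and makes it easier to append a reporting step later.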
Data Structures
A data structure defines what the data should look like before it is passed to the algorithms in the task.
Tip: More information about the data structure arguments can be found here.
Note: The data structure is currently only used to define the input data for tasks that use a model.
- table_config: optional configuration to select a specific table from the datasource if the datasource has multiple tables.
- select: choose columns to include/exclude from the data; `include_prefix` can be helpful for datasets that have multiple image columns.
- assign: map column names to semantic roles (e.g., `image_prefix`, `target`).
- transform: define dataset/batch/image transforms to apply to the data (e.g., Albumentations pipelines, grayscale handling). Important for tasks that use a model. More information about the transform arguments can be found here.
- data_split: optional configuration for defining how to split data into train/validation/test sets.
- compatible_datasources: list of dataset types that are compatible with this data structure configuration.
- schema_requirements: specify dataset schema requirements level (`"empty"`, `"partial"`, or `"full"`), or a dictionary mapping requirement levels to specific dataset types. Defaults to `"partial"`.
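The dictionary form of `schema_requirements` might look something like the sketch below. It assumes the keys are the requirement levels and the values are lists of dataset type names; that exact shape is an assumption, and the pairing of levels to dataset types here is purely illustrative. The dataset type names are borrowed from the data structure example on this page.

```yaml
task:
  data_structure:
    # Assumed shape: requirement level -> list of dataset types.
    schema_requirements:
      partial:
        - DICOMOphthalmologySource
      full:
        - HeidelbergSource
```

If every compatible dataset type needs the same level, the plain string form (for example `schema_requirements: partial`) is shorter and says the same thing.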
    task:
      data_structure:
        compatible_datasources:
          - DICOMOphthalmologySource
          - HeidelbergSource
        schema_requirements: partial
        data_split:
          args:
            shuffle: false
            test_percentage: 100
            validation_percentage: 0
          data_splitter: percentage
        assign:
          image_prefix: Pixel Data
        select:
          include:
            - Columns
            - Rows
          include_prefix: Pixel Data
        transform:
          image:
            - albumentations:
                step: test
                output: true
                transformations:
                  - ToTensorV2
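Putting the three components together, a complete task might be assembled as follows. This is a sketch that only combines names already shown on this page. It assumes that `bitfount.InferenceAndCSVReport` accepts `bitfount.ModelInference` followed by `bitfount.CSVReportAlgorithm`, which should be checked against the API documentation, since a protocol is only compatible with certain algorithms. Arguments stay elided as `{ ... }`.

```yaml
task:
  # Protocol: orchestrates the run.
  protocol:
    name: bitfount.InferenceAndCSVReport
    arguments: { ... }
  # Algorithms: executed in list order (inference, then reporting).
  algorithm:
    - name: bitfount.ModelInference
      arguments: { ... }
      model: { ... } # see "Referencing a model"
    - name: bitfount.CSVReportAlgorithm
      arguments: { ... }
  # Data structure: the view of the data handed to the model.
  data_structure:
    select:
      include_prefix: Pixel Data
    assign:
      image_prefix: Pixel Data
    transform:
      image:
        - albumentations:
            step: test
            output: true
            transformations:
              - ToTensorV2
```

Keeping the three blocks as siblings under `task` mirrors the skeleton at the top of this page, so each block can be swapped out independently.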