algorithms

Algorithms for remote processing of data.

Federated algorithm plugins can also be imported from this package.

Classes

BaseAlgorithmFactory

class BaseAlgorithmFactory(**kwargs: Any):

Base algorithm factory from which all other algorithms must inherit.

Attributes

  • class_name: The name of the algorithm class.
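
Every class on this page follows the same factory shape. A single object builds the two halves of an algorithm through its modeller() and worker() methods, and the create() method documented on several classes below picks one of them by role. The sketch below shows that pattern using only the standard library. SketchAlgorithmFactory, EchoAlgorithm and the Role enum are hypothetical stand-ins, not Bitfount's classes, and the real create() may do more than this plain dispatch.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, Union


class Role(Enum):
    # Hypothetical stand-in for the Role type used in create()'s signature.
    MODELLER = "modeller"
    WORKER = "worker"


class SketchAlgorithmFactory(ABC):
    """Hypothetical factory: one object that can build both sides of an algorithm."""

    @property
    def class_name(self) -> str:
        return type(self).__name__

    @abstractmethod
    def modeller(self, **kwargs: Any) -> Any: ...

    @abstractmethod
    def worker(self, **kwargs: Any) -> Any: ...

    def create(self, role: Union[str, Role], **kwargs: Any) -> Any:
        # Accept either a Role member or its string value ("modeller"/"worker").
        role = Role(role) if isinstance(role, str) else role
        if role is Role.MODELLER:
            return self.modeller(**kwargs)
        return self.worker(**kwargs)


class EchoAlgorithm(SketchAlgorithmFactory):
    """Toy subclass whose two sides just return labels."""

    def modeller(self, **kwargs: Any) -> str:
        return "modeller side"

    def worker(self, **kwargs: Any) -> str:
        return "worker side"
```

Because the base class is abstract, a subclass must implement both sides before it can be instantiated.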

Ancestors

  • abc.ABC
  • bitfount.federated.roles._RolesMixIn
  • bitfount.types._BaseSerializableObjectMixIn

CSVReportAlgorithm

class CSVReportAlgorithm(
    save_path: Optional[Union[str, os.PathLike]] = None,
    original_cols: Optional[list[str]] = None,
    filter: Optional[list[ColumnFilter]] = None,
    **kwargs: Any,
):

Algorithm for generating the CSV results reports.

Arguments

  • save_path: The folder path where the csv report should be saved. The CSV report will have the same name as the taskID.
  • original_cols: The tabular columns from the datasource to include in the report. If not specified it will include all tabular columns from the datasource.
  • filter: A list of ColumnFilter instances used to filter the data. Defaults to None. If supplied, columns are added to the output CSV indicating the records that match each specified criterion. If more than one ColumnFilter is given, an additional column is added to the output CSV indicating the datapoints that match all given criteria (as well as the individual matches).
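
The filter behaviour described above can be sketched with plain dictionaries standing in for table rows: one match column per filter, plus an all-criteria column when several filters are supplied. SimpleFilter, its operator strings, the generated column labels and the "Matches all criteria" header are all hypothetical. The real ColumnFilter class and the exact column names in the output CSV may differ.

```python
import operator
from dataclasses import dataclass
from typing import Any

# Hypothetical operator table; the real ColumnFilter may support other operators.
_OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
        ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass
class SimpleFilter:
    """Hypothetical stand-in for ColumnFilter: a column, an operator and a value."""
    column: str
    op: str
    value: Any

    @property
    def label(self) -> str:
        return f"{self.column} {self.op} {self.value}"


def add_filter_columns(
    rows: list[dict[str, Any]], filters: list[SimpleFilter]
) -> list[dict[str, Any]]:
    """Return copies of rows with one boolean column per filter, plus an
    all-criteria column when more than one filter is given."""
    out = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        matches = []
        for f in filters:
            matched = _OPS[f.op](row[f.column], f.value)
            new_row[f.label] = matched
            matches.append(matched)
        if len(filters) > 1:
            new_row["Matches all criteria"] = all(matches)
        out.append(new_row)
    return out
```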

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]

Methods


modeller

def modeller(self, **kwargs: Any) -> NoResultsModellerAlgorithm:

Modeller-side of the algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.csv_report_algorithm._WorkerSide:

Worker-side of the algorithm.

CSVReportGeneratorOphthalmologyAlgorithm

class CSVReportGeneratorOphthalmologyAlgorithm(
    save_path: Optional[Union[str, os.PathLike]] = None,
    trial_name: Optional[str] = None,
    rename_columns: Optional[Mapping[str, str]] = None,
    original_cols: Optional[list[str]] = None,
    filter: Optional[list[ColumnFilter]] = None,
    match_patient_visit: Optional[MatchPatientVisit] = None,
    matched_csv_path: Optional[Union[str, os.PathLike]] = None,
    produce_matched_only: bool = True,
    csv_extensions: Optional[list[str]] = None,
    produce_trial_notes_csv: bool = False,
    sorting_columns: Optional[dict[str, DFSortType]] = None,
    **kwargs: Any,
):

Algorithm for generating the CSV results reports.

Arguments

  • save_path: The folder path where the csv report should be saved.
  • trial_name: The name of the trial for the csv report. If provided, the CSV will be saved as "trial_name"-prescreening-patients-"date".csv. Defaults to None.
  • original_cols: The tabular columns from the datasource to include in the report. If not specified it will include all tabular columns from the datasource.
  • rename_columns: A dictionary of columns to rename. Defaults to None.
  • filter: A list of ColumnFilter instances used to filter the data. Defaults to None. If supplied, columns are added to the output CSV indicating the records that match each specified criterion. If more than one ColumnFilter is given, an additional column is added to the output CSV indicating the datapoints that match all given criteria (as well as the individual matches).
  • match_patient_visit: Used for matching the same patient visit.
  • matched_csv_path: Path to save the matched patients CSV to, if requested. If produce_matched_only is True, defaults to save_path (i.e. overwrites the non-matched CSV). Otherwise, a separate file is created with a name based on the save_path argument.
  • produce_matched_only: If True, only the matched CSV will be generated at the end of the run. If False, both the non-matched and matched CSV will be generated.
  • produce_trial_notes_csv: If True, a CSV file containing the trial notes will be generated at the end of the run. Defaults to False.
  • csv_extensions: List of named CSV extension functions that will be applied to the output CSV just before saving to file.
  • sorting_columns: A dictionary of columns to sort the output CSV by. The keys are column names and the values are either 'asc' or 'desc'. Defaults to None.
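
Two of the behaviours described above are sketched here under stated assumptions. The first is the trial-name-based filename; its ISO date format is an assumption. The second is sorting by several columns, each with its own 'asc'/'desc' direction. sort_rows and trial_report_filename are hypothetical helpers that operate on lists of dicts. They are not part of the library.

```python
from datetime import date
from typing import Any, Literal

SortType = Literal["asc", "desc"]  # mirrors the 'asc'/'desc' values described above


def sort_rows(
    rows: list[dict[str, Any]], sorting_columns: dict[str, SortType]
) -> list[dict[str, Any]]:
    """Multi-key sort where each column has its own direction.

    Sorting by the least significant key first and the most significant
    key last yields the combined ordering, because list.sort is stable.
    """
    result = list(rows)
    for column, direction in reversed(list(sorting_columns.items())):
        result.sort(key=lambda r: r[column], reverse=(direction == "desc"))
    return result


def trial_report_filename(trial_name: str, on: date) -> str:
    # Assumed rendering of "trial_name"-prescreening-patients-"date".csv with an ISO date.
    return f"{trial_name}-prescreening-patients-{on.isoformat()}.csv"
```

Python's sort keeps equal elements in their original order even with reverse=True. The per-column passes depend on that guarantee.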

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]

Methods


modeller

def modeller(self, **kwargs: Any) -> NoResultsModellerAlgorithm:

Modeller-side of the algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.ophthalmology.csv_report_generation_ophth_algorithm._WorkerSide:

Worker-side of the algorithm.

ETDRSAlgorithm

class ETDRSAlgorithm(
    laterality: str,
    slo_photo_location_prefixes: Optional[SLOSegmentationLocationPrefix] = None,
    slo_image_metadata_columns: Optional[SLOImageMetadataColumns] = None,
    oct_image_metadata_columns: Optional[OCTImageMetadataColumns] = None,
    threshold: float = 0.7,
    calculate_on_oct: bool = False,
    slo_mm_width: float = 8.8,
    slo_mm_height: float = 8.8,
    **kwargs: Any,
):

Algorithm for computing ETDRS subfields.

Arguments

  • laterality: The name of the column that contains the laterality of the scans.
  • oct_image_metadata_columns: The column names for the OCT image metadata. Should include the width and depth of the image in mm. Defaults to None.
  • slo_photo_location_prefixes: The column-name prefixes for the locations of the OCT segmentation on the SLO. Should include the start and end locations of the first image on both the x and y axes, as well as the start location of the last image on both axes. Defaults to None.
  • slo_image_metadata_columns: The column names for the SLO image metadata. Should include the width and height in mm. Defaults to None.
  • threshold: The threshold for the segmentation. Defaults to 0.7.
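
The standard ETDRS grid consists of three concentric circles centred on the fovea, with diameters of 1, 3 and 6 mm. The inner and outer rings are each split into superior, inferior, nasal and temporal quadrants, which gives nine subfields in total. The sketch below classifies a point given in mm relative to the fovea. It is not the algorithm's implementation. The "R"/"L" laterality values, the axis orientation and the subfield names are all assumptions; the one fixed rule it encodes is that nasal and temporal swap sides between the two eyes.

```python
import math
from typing import Optional

# ETDRS grid radii in mm (circle diameters of 1, 3 and 6 mm).
CENTRAL_RADIUS_MM = 0.5
INNER_RADIUS_MM = 1.5
OUTER_RADIUS_MM = 3.0


def etdrs_subfield(x_mm: float, y_mm: float, laterality: str) -> Optional[str]:
    """Classify a point relative to the fovea into one of the nine ETDRS subfields.

    Assumed conventions: +y is superior, and +x points to the viewer's right,
    which is nasal for a right eye ("R") and temporal for a left eye ("L").
    Returns None for points outside the 6 mm grid.
    """
    r = math.hypot(x_mm, y_mm)
    if r <= CENTRAL_RADIUS_MM:
        return "central"
    if r <= INNER_RADIUS_MM:
        ring = "inner"
    elif r <= OUTER_RADIUS_MM:
        ring = "outer"
    else:
        return None
    # Quadrants are separated by the 45-degree diagonals.
    if abs(y_mm) >= abs(x_mm):
        quadrant = "superior" if y_mm > 0 else "inferior"
    else:
        nasal_is_positive_x = laterality.upper() == "R"
        quadrant = "nasal" if (x_mm > 0) == nasal_is_positive_x else "temporal"
    return f"{ring}_{quadrant}"
```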

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]

Methods


modeller

def modeller(self, **kwargs: Any) -> NoResultsModellerAlgorithm:

Modeller-side of the algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.ophthalmology.etdrs_calculation_algorithm._WorkerSide:

Worker-side of the algorithm.

FederatedModelTraining

class FederatedModelTraining(
    *,
    model: _DistributedModelTypeOrReference,
    modeller_checkpointing: bool = True,
    checkpoint_filename: Optional[str] = None,
    pretrained_file: Optional[Union[str, os.PathLike]] = None,
    project_id: Optional[str] = None,
):

Algorithm for training a model remotely and returning its updated parameters.

This algorithm is designed to be compatible with the FederatedAveraging protocol.
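
In its classic form, federated averaging combines the parameters returned by each worker into a weighted mean, usually weighted by each worker's number of local training samples. The sketch below shows that aggregation step on plain lists. This page does not say whether Bitfount's FederatedAveraging protocol weights updates by sample count, or how it represents parameters. Treat federated_average as a hypothetical illustration of the technique.

```python
from typing import Mapping


def federated_average(
    updates: list[Mapping[str, list[float]]], num_samples: list[int]
) -> dict[str, list[float]]:
    """Sample-weighted average of per-worker parameters (FedAvg-style).

    Each update maps a parameter name to a flat list of values; all workers
    are assumed to send the same parameter names and lengths.
    """
    if not updates or len(updates) != len(num_samples):
        raise ValueError("need one sample count per update")
    total = sum(num_samples)
    averaged: dict[str, list[float]] = {}
    for name in updates[0]:
        length = len(updates[0][name])
        averaged[name] = [
            sum(u[name][i] * n for u, n in zip(updates, num_samples)) / total
            for i in range(length)
        ]
    return averaged
```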

Arguments

  • model: The model to train on remote data.
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Attributes

  • checkpoint_filename: The filename for the last checkpoint. Defaults to the task id and the last iteration number, i.e., {taskid}-iteration-{iteration_number}.pt.
  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
  • model: The model to train on remote data.
  • modeller_checkpointing: Whether to save the last checkpoint on the modeller side. Defaults to True.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Ancestors

  • bitfount.federated.algorithms.model_algorithms.base._BaseModelAlgorithmFactory
  • BaseAlgorithmFactory
  • abc.ABC
  • bitfount.federated.roles._RolesMixIn
  • bitfount.types._BaseSerializableObjectMixIn

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.model_algorithms.federated_training._ModellerSide:

Returns the modeller side of the FederatedModelTraining algorithm.

worker

def worker(self, hub: BitfountHub, **kwargs: Any) -> bitfount.federated.algorithms.model_algorithms.federated_training._WorkerSide:

Returns the worker side of the FederatedModelTraining algorithm.

Arguments

  • hub: BitfountHub object to use for communication with the hub.
  • **kwargs: Additional keyword arguments to pass to the worker side.

Returns: The worker side of the FederatedModelTraining algorithm.

FoveaCoordinatesAlgorithm

class FoveaCoordinatesAlgorithm(
    bscan_width_col: str = 'size_width',
    location_prefixes: Optional[SLOSegmentationLocationPrefix] = None,
    **kwargs: Any,
):

Computes the Fovea coordinates from the Fovea detection model predictions.

Arguments

  • bscan_width_col: The column name that contains the bscan width. Defaults to "size_width".
  • location_prefixes: A dataclass that contains the prefixes for the start and end of the images along both the X and Y axes.
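
One way to relate a fovea prediction on a B-scan to the SLO image is linear interpolation between the recorded scan positions. First step from the first B-scan towards the start of the last one according to the B-scan index. Then move along the scan line in proportion to the predicted pixel column relative to the B-scan width. This sketch assumes parallel, evenly spaced scans. ScanGeometry and fovea_on_slo are hypothetical, and the real algorithm may compute the coordinates differently.

```python
from dataclasses import dataclass


@dataclass
class ScanGeometry:
    """Hypothetical SLO positions of the OCT scan lines (SLO image units)."""
    first_start: tuple[float, float]  # (x, y) where the first B-scan starts
    first_end: tuple[float, float]    # (x, y) where the first B-scan ends
    last_start: tuple[float, float]   # (x, y) where the last B-scan starts
    num_bscans: int


def fovea_on_slo(
    geom: ScanGeometry, bscan_index: int, column: int, bscan_width: int
) -> tuple[float, float]:
    """Map a fovea prediction (B-scan index, pixel column) to SLO coordinates.

    Assumes parallel, evenly spaced B-scans and a linear mapping from pixel
    column to position along each scan line.
    """
    fx, fy = geom.first_start
    # Offset between consecutive B-scans.
    steps = max(geom.num_bscans - 1, 1)
    dx = (geom.last_start[0] - fx) / steps
    dy = (geom.last_start[1] - fy) / steps
    # Fraction of the way along the scan line (column 0 = start, width-1 = end).
    t = column / max(bscan_width - 1, 1)
    ex, ey = geom.first_end
    x = fx + bscan_index * dx + t * (ex - fx)
    y = fy + bscan_index * dy + t * (ey - fy)
    return (x, y)
```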

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]

Methods


modeller

def modeller(self, **kwargs: Any) -> NoResultsModellerAlgorithm:

Modeller-side of the algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.ophthalmology.fovea_coordinates_algorithm._WorkerSide:

Worker-side of the algorithm.

GATrialCalculationAlgorithmJade

class GATrialCalculationAlgorithmJade(
    ga_area_include_segmentations: Optional[list[str]] = None,
    ga_area_exclude_segmentations: Optional[list[str]] = None,
    **kwargs: Any,
):

Algorithm for calculating the GA Area and associated metrics.

Arguments

  • ga_area_include_segmentations: List of segmentation labels to be used for calculating the GA area. The logical AND of the masks for these labels will be used to calculate the GA area. If not provided, the default inclusion labels for the GA area will be used.
  • ga_area_exclude_segmentations: List of segmentation labels to be excluded from calculating the GA area. If any of these segmentations are present in the axial segmentation masks, that axis will be excluded from the GA area calculation. If not provided, the default exclusion labels for the GA area will be used.

Raises

  • ValueError: If an invalid segmentation label is provided.
  • ValueError: If a segmentation label is provided in both the include and exclude lists.
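
The include/exclude rules and the two ValueError conditions above can be illustrated with one boolean mask per label. In this sketch, each mask entry stands for one axial (A-scan) position. The label names, the per-position area and the mask layout are hypothetical. The real algorithm defines its own label set and its own default include/exclude lists.

```python
# Hypothetical label vocabulary; the real algorithm defines its own labels.
VALID_LABELS = {"atrophy", "hypertransmission", "fluid"}


def validate_labels(include: list[str], exclude: list[str]) -> None:
    """Raise ValueError for unknown labels or labels in both lists."""
    unknown = (set(include) | set(exclude)) - VALID_LABELS
    if unknown:
        raise ValueError(f"Invalid segmentation label(s): {sorted(unknown)}")
    overlap = set(include) & set(exclude)
    if overlap:
        raise ValueError(f"Label(s) in both include and exclude lists: {sorted(overlap)}")


def ga_area(
    masks: dict[str, list[bool]],
    include: list[str],
    exclude: list[str],
    column_area_mm2: float,
) -> float:
    """Area of positions where all include masks are set (logical AND) and no
    exclude mask is set."""
    validate_labels(include, exclude)
    n_positions = len(next(iter(masks.values())))
    count = 0
    for i in range(n_positions):
        if all(masks[label][i] for label in include) and not any(
            masks[label][i] for label in exclude
        ):
            count += 1
    return count * column_area_mm2
```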

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]

Methods


modeller

def modeller(self, **kwargs: Any) -> NoResultsModellerAlgorithm:

Modeller-side of the algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.ophthalmology.ga_trial_calculation_algorithm_jade._WorkerSide:

Worker-side of the algorithm.

GATrialPDFGeneratorAlgorithmAmethyst

class GATrialPDFGeneratorAlgorithmAmethyst(
    *,
    report_metadata: ReportMetadata,
    filter: Optional[list[ColumnFilter]] = None,
    save_path: Optional[Union[str, os.PathLike]] = None,
    filename_prefix: Optional[str] = None,
    pdf_filename_columns: Optional[list[str]] = None,
    trial_name: Optional[str] = None,
    **kwargs: Any,
):

Algorithm for generating the PDF results report for the GA Algorithm.

Arguments

  • report_metadata: A ReportMetadata for the pdf report metadata fields.
  • filter: A list of ColumnFilter objects to filter the data by.
  • save_path: The folder path where the pdf report should be saved.
  • filename_prefix: The prefix for the pdf filename. Defaults to None.
  • pdf_filename_columns: The columns from the datasource that should be used for the pdf filename. If not provided, the file will be saved as "Patient_index_i.pdf", where i is the index in the filtered datasource. Defaults to None.
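
This is a hedged sketch of the PDF naming rule. When pdf_filename_columns is given, the name is built from that row's values. Otherwise it falls back to "Patient_index_i.pdf". The "_" separator and the way filename_prefix is prepended are assumptions, and pdf_filename is a hypothetical helper.

```python
from typing import Any, Optional


def pdf_filename(
    row: dict[str, Any],
    index: int,
    pdf_filename_columns: Optional[list[str]] = None,
    filename_prefix: Optional[str] = None,
) -> str:
    """Build a PDF filename for one patient row (hypothetical naming scheme)."""
    if pdf_filename_columns:
        # Assumed: column values joined with underscores.
        stem = "_".join(str(row[c]) for c in pdf_filename_columns)
    else:
        # Documented fallback: index within the filtered datasource.
        stem = f"Patient_index_{index}"
    if filename_prefix:
        stem = f"{filename_prefix}_{stem}"  # assumed prefix placement
    return f"{stem}.pdf"
```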

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]

Methods


modeller

def modeller(self, **kwargs: Any) -> NoResultsModellerAlgorithm:

Modeller-side of the algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.ophthalmology.ga_trial_pdf_algorithm_amethyst._WorkerSide:

Worker-side of the algorithm.

GATrialPDFGeneratorAlgorithmJade

class GATrialPDFGeneratorAlgorithmJade(
    *,
    report_metadata: ReportMetadata,
    filter: Optional[list[ColumnFilter]] = None,
    save_path: Optional[Union[str, os.PathLike]] = None,
    filename_prefix: Optional[str] = None,
    pdf_filename_columns: Optional[list[str]] = None,
    total_ga_area_lower_bound: float = 2.5,
    total_ga_area_upper_bound: float = 17.5,
    trial_name: Optional[str] = None,
    **kwargs: Any,
):

Algorithm for generating the PDF results report for the GA Algorithm.

Arguments

  • report_metadata: A ReportMetadata for the pdf report metadata fields.
  • filter: A list of ColumnFilter objects to filter the data by.
  • save_path: The folder path where the pdf report should be saved.
  • filename_prefix: The prefix for the pdf filename. Defaults to None.
  • pdf_filename_columns: The columns from the datasource that should be used for the pdf filename. If not provided, the file will be saved as "Patient_index_i.pdf", where i is the index in the filtered datasource. Defaults to None.
  • total_ga_area_lower_bound: The lower bound for the total GA area. Defaults to 2.5.
  • total_ga_area_upper_bound: The upper bound for the total GA area. Defaults to 17.5.
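
The two bounds can be read as an eligibility window on the total GA area. Below is a minimal sketch that assumes the bounds are inclusive and that each row carries a hypothetical total_ga_area value. How the report actually applies the bounds is not described on this page.

```python
from typing import Any


def eligible_by_total_ga(
    rows: list[dict[str, Any]],
    lower: float = 2.5,
    upper: float = 17.5,
    column: str = "total_ga_area",  # hypothetical column name
) -> list[dict[str, Any]]:
    """Keep rows whose total GA area lies within [lower, upper] (inclusive, assumed)."""
    return [
        row for row in rows
        if row.get(column) is not None and lower <= row[column] <= upper
    ]
```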

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]

Methods


modeller

def modeller(self, **kwargs: Any) -> NoResultsModellerAlgorithm:

Modeller-side of the algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.ophthalmology.ga_trial_pdf_algorithm_jade._WorkerSide:

Worker-side of the algorithm.

HuggingFaceImageClassificationInference

class HuggingFaceImageClassificationInference(
    model_id: str,
    image_column_name: str,
    seed: int = 42,
    apply_softmax_to_predictions: bool = True,
    batch_size: int = 1,
    top_k: int = 5,
):

Inference for pre-trained Hugging Face image classification models.

Arguments

  • batch_size: The batch size for inference. Defaults to 1.
  • image_column_name: The image column on which the inference should be done.
  • model_id: The model id to use for image classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts resnet models.
  • seed: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
  • top_k: The number of top labels that will be returned by the pipeline. If the provided number is higher than the number of labels available in the model configuration, it will default to the number of labels. Defaults to 5.
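
The top_k clamping rule can be sketched in three steps. Optionally convert logits to probabilities with a softmax. Rank the labels. Then return at most as many labels as the model has. apply_softmax_to_predictions appears in the signature but is not documented above, and reading it as this on/off switch is an assumption. top_k_labels is a hypothetical helper, not the pipeline's code.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(
    logits: list[float], labels: list[str], top_k: int = 5, apply_softmax: bool = True
) -> list[tuple[str, float]]:
    """Return up to top_k (label, score) pairs, clamping top_k to the label count."""
    scores = softmax(logits) if apply_softmax else list(logits)
    k = min(top_k, len(labels))
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```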

Attributes

  • batch_size: The batch size for inference. Defaults to 1.
  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
  • image_column_name: The image column on which the inference should be done.
  • model_id: The model id to use for image classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts resnet models.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • seed: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
  • top_k: The number of top labels that will be returned by the pipeline. If the provided number is higher than the number of labels available in the model configuration, it will default to the number of labels. Defaults to 5.

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.base._HFModellerSide:

Returns the modeller side of the HuggingFaceImageClassificationInference algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_image_classification._WorkerSide:

Returns the worker side of the HuggingFaceImageClassificationInference algorithm.

HuggingFaceImageSegmentationInference

class HuggingFaceImageSegmentationInference(
    model_id: str,
    image_column_name: str,
    alpha: float = 0.3,
    batch_size: int = 1,
    dataframe_output: bool = False,
    mask_threshold: float = 0.5,
    overlap_mask_area_threshold: float = 0.5,
    save_path: Union[str, os.PathLike] = PosixPath('.'),
    seed: int = 42,
    subtask: Optional[_Subtask] = None,
    threshold: float = 0.9,
):

Inference for pre-trained Hugging Face image segmentation models.

Perform segmentation (detect masks & classes) in the image(s) passed as inputs.

Arguments

  • alpha: The alpha (opacity) of the mask overlay. Defaults to 0.3.
  • batch_size: The batch size for inference. Defaults to 1.
  • dataframe_output: Whether to output the prediction results in a dataframe format. Defaults to False.
  • image_column_name: The image column on which the inference should be done.
  • mask_threshold: Threshold to use when turning the predicted masks into binary values. Defaults to 0.5.
  • model_id: The model id to use for image segmentation inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts resnet models.
  • overlap_mask_area_threshold: Mask overlap threshold to eliminate small, disconnected segments. Defaults to 0.5.
  • save_path: The folder path where the images with masks drawn on them should be saved. Defaults to the current working directory.
  • seed: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
  • subtask: The segmentation task to perform: "semantic", "instance" or "panoptic", depending on the model's capabilities. If not set, the pipeline will attempt to resolve the subtask in the following order: panoptic, instance, semantic.
  • threshold: Probability threshold to filter out predicted masks. Defaults to 0.9.
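
mask_threshold and alpha do two separate jobs. The first turns per-pixel probabilities into a binary mask. The second controls how strongly the mask colour is blended into the saved image. Below is a minimal single-channel sketch that assumes a >= comparison for the threshold and standard alpha blending. binarize and overlay are hypothetical helpers.

```python
def binarize(prob_mask: list[list[float]], mask_threshold: float = 0.5) -> list[list[bool]]:
    # Pixels at or above the threshold are treated as part of the mask (assumed >=).
    return [[p >= mask_threshold for p in row] for row in prob_mask]


def overlay(
    image: list[list[float]], mask: list[list[bool]], color: float, alpha: float = 0.3
) -> list[list[float]]:
    """Alpha-blend a single-channel colour into the image wherever the mask is set."""
    return [
        [(1 - alpha) * px + alpha * color if m else px for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
```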

Attributes

  • alpha: The alpha (opacity) of the mask overlay. Defaults to 0.3.
  • batch_size: The batch size for inference. Defaults to 1.
  • class_name: The name of the algorithm class.
  • dataframe_output: Whether to output the prediction results in a dataframe format. Defaults to False.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
  • image_column_name: The image column on which the inference should be done.
  • mask_threshold: Threshold to use when turning the predicted masks into binary values. Defaults to 0.5.
  • model_id: The model id to use for image segmentation inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts resnet models.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • overlap_mask_area_threshold: Mask overlap threshold to eliminate small, disconnected segments. Defaults to 0.5.
  • save_path: The folder path where the images with masks drawn on them should be saved. Defaults to the current working directory.
  • seed: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
  • subtask: The segmentation task to perform: "semantic", "instance" or "panoptic", depending on the model's capabilities. If not set, the pipeline will attempt to resolve the subtask in the following order: panoptic, instance, semantic.
  • threshold: Probability threshold to filter out predicted masks. Defaults to 0.9.

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.base._HFModellerSide:

Returns the modeller side of the HuggingFaceImageSegmentationInference algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_image_segmentation._WorkerSide:

Returns the worker side of the HuggingFaceImageSegmentationInference algorithm.

HuggingFacePerplexityEvaluation

class HuggingFacePerplexityEvaluation(
    model_id: str,
    text_column_name: str,
    stride: int = 512,
    seed: int = 42,
):

Hugging Face Perplexity Algorithm.

Arguments

  • model_id: The model id to use for evaluating its perplexity. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts models with a causal language modeling head.
  • seed: Sets the seed of the algorithm. For reproducible behaviour it defaults to 42.
  • stride: Sets the stride of the algorithm. Defaults to 512.
  • text_column_name: The single column to query against. Should contain text for generation.
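
Perplexity is the exponential of the mean per-token negative log-likelihood. For texts longer than the model's context, a common approach is described in the Hugging Face guide on the perplexity of fixed-length models. It slides a window forward by stride tokens and scores only the tokens not already scored. The sketch below shows that window bookkeeping. Here max_length stands for the model's maximum input length, which is an assumption; it is not a parameter of this class. This page does not confirm how Bitfount applies stride internally.

```python
import math


def strided_windows(seq_len: int, max_length: int, stride: int) -> list[tuple[int, int, int]]:
    """Return (begin, end, target_len) for each window of a sliding-window evaluation.

    Each window sees up to max_length tokens of context, but only its last
    target_len tokens are scored. With 0 < stride <= max_length, every
    token is scored exactly once.
    """
    if not 0 < stride <= max_length:
        raise ValueError("stride must be in the range (0, max_length]")
    windows = []
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        windows.append((begin, end, end - prev_end))
        prev_end = end
        if end == seq_len:
            break
    return windows


def perplexity(token_nlls: list[float]) -> float:
    # exp of the mean negative log-likelihood per scored token.
    return math.exp(sum(token_nlls) / len(token_nlls))
```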

Attributes

  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
  • model_id: The model id to use for evaluation. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts models with a causal language modeling head.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • seed: Sets the seed of the algorithm. For reproducible behaviour it defaults to 42.
  • stride: Sets the stride of the algorithm. Defaults to 512.
  • text_column_name: The single column to query against. Should contain text for generation.

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.base._HFModellerSide:

Returns the modeller side of the HuggingFacePerplexityEvaluation algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_perplexity._WorkerSide:

Returns the worker side of the HuggingFacePerplexityEvaluation algorithm.

HuggingFaceTextClassificationInference

class HuggingFaceTextClassificationInference(
    model_id: str,
    target_column_name: str,
    batch_size: int = 1,
    function_to_apply: Optional[_FunctionToApply] = None,
    seed: int = 42,
    top_k: int = 1,
):

Inference for pre-trained Hugging Face text classification models.

Arguments

  • batch_size: The batch size for inference. Defaults to 1.
  • function_to_apply: The function to apply to the model outputs to retrieve the scores. Defaults to None, in which case the function is chosen according to the number of labels: if the model has a single label, the sigmoid function is applied to the output; if the model has several labels, the softmax function is applied. Possible explicit values are:
      • "sigmoid": Applies the sigmoid function to the output.
      • "softmax": Applies the softmax function to the output.
      • "none": Does not apply any function to the output.
  • model_id: The model id to use for text classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co.
  • seed: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
  • target_column_name: The target column on which the inference should be done.
  • top_k: The number of top labels that will be returned by the pipeline. Defaults to 1.
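
The function_to_apply rules above can be written as a small dispatch. This is a standard-library sketch of the documented rules, not the Hugging Face pipeline's code, and apply_function is a hypothetical name.

```python
import math
from typing import Optional


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def _softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # shift for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def apply_function(logits: list[float], function_to_apply: Optional[str] = None) -> list[float]:
    """Turn raw logits into scores following the function_to_apply rules."""
    if function_to_apply is None:
        # Default: sigmoid for a single-label output, softmax for several labels.
        function_to_apply = "sigmoid" if len(logits) == 1 else "softmax"
    if function_to_apply == "sigmoid":
        return [_sigmoid(x) for x in logits]
    if function_to_apply == "softmax":
        return _softmax(logits)
    if function_to_apply == "none":
        return list(logits)
    raise ValueError(f"Unknown function_to_apply: {function_to_apply!r}")
```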

Attributes

  • batch_size: The batch size for inference. Defaults to 1.
  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
  • function_to_apply: The function to apply to the model outputs to retrieve the scores. Defaults to None, in which case the function is chosen according to the number of labels: if the model has a single label, the sigmoid function is applied to the output; if the model has several labels, the softmax function is applied. Possible explicit values are:
      • "sigmoid": Applies the sigmoid function to the output.
      • "softmax": Applies the softmax function to the output.
      • "none": Does not apply any function to the output.
  • model_id: The model id to use for text classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • seed: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
  • target_column_name: The target column on which the inference should be done.
  • top_k: The number of top labels that will be returned by the pipeline. Defaults to 1.

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.base._HFModellerSide:

Returns the modeller side of the HuggingFaceTextClassificationInference algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_text_classification._WorkerSide:

Returns the worker side of the HuggingFaceTextClassificationInference algorithm.

HuggingFaceTextGenerationInference

class HuggingFaceTextGenerationInference(
    model_id: str,
    text_column_name: Optional[str] = None,
    prompt_format: Optional[str] = None,
    max_length: int = 50,
    num_return_sequences: int = 1,
    seed: int = 42,
    min_new_tokens: int = 1,
    repetition_penalty: float = 1.0,
    num_beams: int = 1,
    early_stopping: bool = True,
    pad_token_id: Optional[int] = None,
    eos_token_id: Optional[int] = None,
    device: Optional[str] = None,
    torch_dtype: "Literal['bfloat16', 'float16', 'float32', 'float64']" = 'float32',
):

Hugging Face Text Generation Algorithm.

Arguments

  • device: The device to use for the model. Defaults to None. On the worker side, will be set to the environment variable BITFOUNT_DEFAULT_TORCH_DEVICE if specified, otherwise "cpu".
  • early_stopping: Whether to stop the generation as soon as there are num_beams complete candidates. Defaults to True.
  • eos_token_id: The id of the token to use as the last token for each sequence. If None (default), it will default to the eos_token_id of the tokenizer.
  • max_length: The maximum length of the sequence to be generated. Defaults to 50.
  • min_new_tokens: The minimum number of new tokens to add to the prompt. Defaults to 1.
  • model_id: The model id to use for text generation. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts models with a causal language modeling head.
  • num_beams: Number of beams for beam search. 1 means no beam search. Defaults to 1.
  • num_return_sequences: The number of sequence candidates to return for each input. Defaults to 1.
  • pad_token_id: The id of the token to use as padding token. If None (default), it will default to the pad_token_id of the tokenizer.
  • prompt_format: The format of the prompt as a string with a single {context} placeholder, which is where the pod's input will be inserted. For example: "You are a Language Model. This is the context: {context}. Please summarize it." This only applies if text_column_name is provided; it is not used for dynamic prompting. Defaults to None.
  • repetition_penalty: The parameter for repetition penalty. 1.0 means no penalty. Defaults to 1.0.
  • seed: Sets the seed of the algorithm. For reproducible behaviour it defaults to 42.
  • text_column_name: The single column to query against. Should contain text for generation. If not provided, the algorithm must be used with a protocol which dynamically provides the text to be used for prompting.
  • torch_dtype: The torch dtype to use for the model. Defaults to "float32".

Attributes

  • class_name: The name of the algorithm class.
  • device: The device to use for the model. Defaults to None. On the worker side, will be set to the environment variable BITFOUNT_DEFAULT_TORCH_DEVICE if specified, otherwise "cpu".
  • early_stopping: Whether to stop the generation as soon as there are num_beams complete candidates. Defaults to True.
  • eos_token_id: The id of the token to use as the last token for each sequence. If None (default), it will default to the eos_token_id of the tokenizer.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
  • max_length: The maximum length of the sequence to be generated. Defaults to 50.
  • min_new_tokens: The minimum number of new tokens to add to the prompt. Defaults to 1.
  • model_id: The model id to use for text generation. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts models with a causal language modeling head.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • num_beams: Number of beams for beam search. 1 means no beam search. Defaults to 1.
  • num_return_sequences: The number of sequence candidates to return for each input. Defaults to 1.
  • pad_token_id: The id of the token to use as padding token. If None (default), it will default to the pad_token_id of the tokenizer.
  • prompt_format: The format of the prompt as a string with a single {context} placeholder, which is where the pod's input will be inserted. For example: "You are a Language Model. This is the context: {context}. Please summarize it." This only applies if text_column_name is provided; it is not used for dynamic prompting. Defaults to None.
  • repetition_penalty: The parameter for repetition penalty. 1.0 means no penalty. Defaults to 1.0.
  • seed: Sets the seed of the algorithm. For reproducible behaviour it defaults to 42.
  • text_column_name: The single column to query against. Should contain text for generation. If not provided, the algorithm must be used with a protocol which dynamically provides the text to be used for prompting.
  • torch_dtype: The torch dtype to use for the model. Defaults to "float32".

Raises

  • ValueError: If prompt_format is provided without text_column_name.
  • ValueError: If prompt_format does not contain a single {context} placeholder.
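
The standard library's string.Formatter can express the two ValueError conditions above. It can also show how the pod's text would be substituted into {context}. validate_prompt_format and build_prompt are hypothetical helpers. The real validation may differ in detail, for example in how it treats escaped braces.

```python
import string
from typing import Optional


def validate_prompt_format(prompt_format: Optional[str], text_column_name: Optional[str]) -> None:
    """Mirror the two documented ValueError conditions."""
    if prompt_format is None:
        return
    if text_column_name is None:
        raise ValueError("prompt_format requires text_column_name")
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    fields = [name for _, name, _, _ in string.Formatter().parse(prompt_format) if name is not None]
    if fields != ["context"]:
        raise ValueError("prompt_format must contain exactly one {context} placeholder")


def build_prompt(prompt_format: Optional[str], text: str) -> str:
    # Assumed: without a format string, the column text is used as the prompt unchanged.
    return prompt_format.format(context=text) if prompt_format else text
```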

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.base._HFModellerSide:

Returns the modeller side of the HuggingFaceTextGenerationInference algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_text_generation._WorkerSide:

Returns the worker side of the HuggingFaceTextGenerationInference algorithm.

ModelEvaluation

class ModelEvaluation(    *,    model: _DistributedModelTypeOrReference,    pretrained_file: Optional[Union[str, os.PathLike]] = None,    project_id: Optional[str] = None,):

Algorithm for evaluating a model and returning metrics.

note

The metrics cannot currently be specified by the user.

Arguments

  • model: The model to evaluate on remote data.
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Attributes

  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • model: The model to evaluate on remote data.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Ancestors

  • bitfount.federated.algorithms.model_algorithms.base._BaseModelAlgorithmFactory
  • BaseAlgorithmFactory
  • abc.ABC
  • bitfount.federated.roles._RolesMixIn
  • bitfount.types._BaseSerializableObjectMixIn

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(    self, **kwargs: Any,) -> bitfount.federated.algorithms.model_algorithms.evaluate._ModellerSide:

Returns the modeller side of the ModelEvaluation algorithm.

worker

def worker(    self, hub: BitfountHub, **kwargs: Any,) -> bitfount.federated.algorithms.model_algorithms.evaluate._WorkerSide:

Returns the worker side of the ModelEvaluation algorithm.

Arguments

  • hub: BitfountHub object to use for communication with the hub.
  • **kwargs: Additional keyword arguments.

Returns

The worker side of the ModelEvaluation algorithm.

ModelInference

class ModelInference(    *,    model: _DistributedModelTypeOrReference,    class_outputs: Optional[list[str]] = None,    pretrained_file: Optional[Union[str, os.PathLike]] = None,    project_id: Optional[str] = None,):

Algorithm for running inference on a model and returning the predictions.

danger

This algorithm could potentially return data unfiltered, so it should only be used when the other party is trusted.

Arguments

  • class_outputs: A list of strings corresponding to prediction outputs. If provided, the model will return a dataframe of results whose columns are the elements of class_outputs. Defaults to None.
  • model: The model to infer on remote data.
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.
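The effect of class_outputs can be pictured as keying each prediction vector by class name. This is a minimal sketch, not the algorithm's implementation: `predictions_to_rows` is a hypothetical helper, it assumes names map to output positions in order, and it uses a list of dicts (one per row) in place of a pandas DataFrame so that it runs with the standard library alone.

```python
from typing import Sequence


def predictions_to_rows(
    predictions: Sequence[Sequence[float]], class_outputs: list[str]
) -> list[dict[str, float]]:
    """Hypothetical helper: key each prediction vector by the class_outputs names."""
    rows = []
    for pred in predictions:
        # One output value per class name, matched by position.
        if len(pred) != len(class_outputs):
            raise ValueError("Each prediction needs exactly one value per class output.")
        rows.append(dict(zip(class_outputs, pred)))
    return rows


rows = predictions_to_rows([[0.1, 0.9], [0.7, 0.3]], ["benign", "malignant"])
# If pandas is installed, pandas.DataFrame(rows) gives a table with these columns.
```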

Attributes

  • class_name: The name of the algorithm class.
  • class_outputs: A list of strings corresponding to prediction outputs. If provided, the model will return a dataframe of results whose columns are the elements of class_outputs. Defaults to None.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • model: The model to infer on remote data.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Ancestors

  • bitfount.federated.algorithms.model_algorithms.base._BaseModelAlgorithmFactory
  • BaseAlgorithmFactory
  • abc.ABC
  • bitfount.federated.roles._RolesMixIn
  • bitfount.types._BaseSerializableObjectMixIn

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(    self, **kwargs: Any,) -> bitfount.federated.algorithms.model_algorithms.inference._ModellerSide:

Returns the modeller side of the ModelInference algorithm.

worker

def worker(    self, hub: BitfountHub, **kwargs: Any,) -> bitfount.federated.algorithms.model_algorithms.inference._WorkerSide:

Returns the worker side of the ModelInference algorithm.

Arguments

  • hub: BitfountHub object to use for communication with the hub.
  • **kwargs: Additional keyword arguments to pass to the worker side.

Returns

The worker side of the ModelInference algorithm.

ModelTrainingAndEvaluation

class ModelTrainingAndEvaluation(    *,    model: _DistributedModelTypeOrReference,    pretrained_file: Optional[Union[str, os.PathLike]] = None,    project_id: Optional[str] = None,):

Algorithm for training a model, evaluating it and returning metrics.

note

The metrics cannot currently be specified by the user.

Arguments

  • model: The model to train and evaluate on remote data.
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Attributes

  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • model: The model to train and evaluate on remote data.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Ancestors

  • bitfount.federated.algorithms.model_algorithms.base._BaseModelAlgorithmFactory
  • BaseAlgorithmFactory
  • abc.ABC
  • bitfount.federated.roles._RolesMixIn
  • bitfount.types._BaseSerializableObjectMixIn

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(    self, **kwargs: Any,) -> bitfount.federated.algorithms.model_algorithms.train_and_evaluate._ModellerSide:

Returns the modeller side of the ModelTrainingAndEvaluation algorithm.

worker

def worker(    self, hub: BitfountHub, **kwargs: Any,) -> bitfount.federated.algorithms.model_algorithms.train_and_evaluate._WorkerSide:

Returns the worker side of the ModelTrainingAndEvaluation algorithm.

Arguments

  • hub: BitfountHub object to use for communication with the hub.
  • **kwargs: Additional keyword arguments to pass to the worker side.

Returns

The worker side of the ModelTrainingAndEvaluation algorithm.

PrivateSqlQuery

class PrivateSqlQuery(    *,    query: str,    epsilon: float,    delta: float,    column_ranges: ColumnRangesType,    table: Optional[str] = None,    db_schema: Optional[str] = None,):

Simple algorithm for running a SQL query on a table, with privacy.

note

The values provided for the privacy budget (i.e. epsilon and delta) will be applied individually to each column included in the SQL query. If the total epsilon and delta exceed the maximum allowed by the pod, the provided values will be reduced to the largest values that keep the query within the allowed privacy budget.
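Under the simplest accounting assumption, basic sequential composition, spending (epsilon, delta) on each of k columns costs k times that in total, so the reduction described in the note amounts to capping each per-column value at the pod's maximum divided by k. The sketch below only illustrates that arithmetic: `clamp_per_column_budget` is a hypothetical helper, and the pod's actual privacy accounting may differ.

```python
def clamp_per_column_budget(
    epsilon: float,
    delta: float,
    n_columns: int,
    max_total_epsilon: float,
    max_total_delta: float,
) -> tuple[float, float]:
    """Hypothetical: cap per-column (epsilon, delta) so n_columns * value fits the pod limits.

    Assumes totals add up linearly across columns (basic sequential composition).
    """
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    eps = min(epsilon, max_total_epsilon / n_columns)
    dlt = min(delta, max_total_delta / n_columns)
    return eps, dlt


# Three columns at epsilon=1.0 would total 3.0; with a pod maximum of 1.5, each column is capped.
print(clamp_per_column_budget(1.0, 1e-5, 3, max_total_epsilon=1.5, max_total_delta=1e-4))
```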

Arguments

  • column_ranges: A dictionary of column names and their ranges.
  • db_schema: The name of the schema for a database connection. If not provided, it will be set to the default schema name for the database.
  • delta: The target delta to use for the privacy budget.
  • epsilon: The maximum epsilon to use for the privacy budget.
  • query: The SQL query to execute.
  • table: The target table name. For single table pod datasources, this will default to the pod name.
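Differentially private queries generally need declared column ranges because the range bounds how much any single record can move the result, and the noise is scaled to that bound. The sketch below is a textbook Laplace-mechanism SUM, shown only to illustrate that role. It is not how PrivateSqlQuery is implemented; the function names are hypothetical, the range is passed as a plain (lower, upper) pair rather than in the ColumnRangesType format, and delta is not used because the plain Laplace mechanism is pure epsilon-DP.

```python
import random
from typing import Sequence


def clamp(value: float, lower: float, upper: float) -> float:
    """Force a value into [lower, upper], as a declared column range would."""
    return max(lower, min(upper, value))


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_sum(
    values: Sequence[float],
    lower: float,
    upper: float,
    epsilon: float,
    rng: random.Random,
) -> float:
    """Generic Laplace-mechanism SUM over one column with a known (lower, upper) range."""
    clamped = [clamp(v, lower, upper) for v in values]
    # Adding or removing one record changes the clamped sum by at most this much.
    sensitivity = max(abs(lower), abs(upper))
    scale = sensitivity / epsilon
    noise = laplace_noise(scale, rng) if scale > 0 else 0.0
    return sum(clamped) + noise


rng = random.Random(0)
# The outlier 500 is clamped to 100, so the true clamped sum is 130 before noise.
print(private_sum([10, 20, 500], lower=0, upper=100, epsilon=1.0, rng=rng))
```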

Attributes

  • class_name: The name of the algorithm class.
  • column_ranges: A dictionary of column names and their ranges.
  • db_schema: The name of the schema for a database connection. If not provided, it will be set to the default schema name for the database.
  • delta: The target delta to use for the privacy budget.
  • epsilon: The maximum epsilon to use for the privacy budget.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • query: The SQL query to execute.
  • table: The target table name. For single table pod datasources, this will default to the pod name.

Raises

  • DatabaseSchemaNotFoundError: If a non-existent db_schema name is provided.
  • PrivateSqlError: If there is an error executing the private SQL query (e.g. DP misconfiguration or bad query specified).
  • ValueError: If a pod identifier is not supplied, or if a join is attempted.

Ancestors

  • BaseAlgorithmFactory
  • bitfount.federated.mixins._ModellessAlgorithmMixIn
  • abc.ABC
  • bitfount.federated.roles._RolesMixIn
  • bitfount.types._BaseSerializableObjectMixIn

Variables

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

execute

def execute(    self,    pod_identifiers: list[str],    username: Optional[str] = None,    bitfounthub: Optional[BitfountHub] = None,    ms_config: Optional[MessageServiceConfig] = None,    message_service: Optional[_MessageService] = None,    pod_public_key_paths: Optional[Mapping[str, Path]] = None,    identity_verification_method: IdentityVerificationMethod = IdentityVerificationMethod.OIDC_DEVICE_CODE,    private_key_or_file: Optional[Union[RSAPrivateKey, Path]] = None,    idp_url: Optional[str] = None,    require_all_pods: bool = False,    aggregator: Optional[_BaseAggregatorFactory] = None,    project_id: Optional[str] = None,) -> list[pd.DataFrame]:

Execute a ResultsOnly-compatible algorithm.

Syntactic sugar that allows the modeller to call .execute(...) on ResultsOnly-compatible algorithms.

modeller

def modeller(    self, **kwargs: Any,) -> ResultsOnlyModellerAlgorithm:

Returns the modeller side of the PrivateSqlQuery algorithm.

Arguments

  • **kwargs: Additional keyword arguments to pass to the modeller side.

worker

def worker(    self, **kwargs: Any,) -> bitfount.federated.algorithms.private_sql_query._WorkerSide:

Returns the worker side of the PrivateSqlQuery algorithm.

Arguments

  • **kwargs: Additional keyword arguments to pass to the worker side. These must include hub, which provides a BitfountHub instance.

SqlQuery

class SqlQuery(*, query: str, table: Optional[str] = None):

Simple algorithm for running a SQL query on a table.

info

The default table for single-table datasources is the pod identifier without the username, enclosed in backticks (``). Please ensure your SQL query operates on that table. Put the table name inside backticks in the query statement so that it is parsed correctly, e.g. SELECT MAX(G) AS MAX_OF_G FROM `df`. This is the standard quoting mechanism used by MySQL (and also supported by SQLite).
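The backtick-quoted query from this note can be tried locally with Python's built-in sqlite3 module. This only demonstrates the identifier quoting: the table name `df` and the data are illustrative, and the pod's own query engine is not involved.

```python
import sqlite3

# In-memory table named `df`, standing in for a single-table datasource (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `df` (G INTEGER)")
conn.executemany("INSERT INTO `df` (G) VALUES (?)", [(3,), (7,), (5,)])

# SQLite accepts MySQL-style backtick quoting for identifiers.
(max_of_g,) = conn.execute("SELECT MAX(G) AS MAX_OF_G FROM `df`").fetchone()
print(max_of_g)  # 7
conn.close()
```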

info

If you are using a multi-table datasource, ensure that your SQL query syntax matches the syntax required by the Pod database backend.

Arguments

  • query: The SQL query to execute.
  • table: The target table name. For single table pod datasources, this will default to the pod name.

Attributes

  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • query: The SQL query to execute.
  • table: The target table name. For single table pod datasources, this will default to the pod name.

Ancestors

  • BaseAlgorithmFactory
  • bitfount.federated.mixins._ModellessAlgorithmMixIn
  • abc.ABC
  • bitfount.federated.roles._RolesMixIn
  • bitfount.types._BaseSerializableObjectMixIn
  • bitfount.federated.types._DataLessAlgorithm

Variables

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

execute

def execute(    self,    pod_identifiers: list[str],    username: Optional[str] = None,    bitfounthub: Optional[BitfountHub] = None,    ms_config: Optional[MessageServiceConfig] = None,    message_service: Optional[_MessageService] = None,    pod_public_key_paths: Optional[Mapping[str, Path]] = None,    identity_verification_method: IdentityVerificationMethod = IdentityVerificationMethod.OIDC_DEVICE_CODE,    private_key_or_file: Optional[Union[RSAPrivateKey, Path]] = None,    idp_url: Optional[str] = None,    require_all_pods: bool = False,    aggregator: Optional[_BaseAggregatorFactory] = None,    project_id: Optional[str] = None,) -> list[pd.DataFrame]:

Execute a ResultsOnly-compatible algorithm.

Syntactic sugar that allows the modeller to call .execute(...) on ResultsOnly-compatible algorithms.

modeller

def modeller(    self, **kwargs: Any,) -> ResultsOnlyModellerAlgorithm:

Returns the modeller side of the SqlQuery algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.sql_query._WorkerSide:

Returns the worker side of the SqlQuery algorithm.

TIMMFineTuning

class TIMMFineTuning(    model_id: str,    schema: Optional[BitfountSchema] = None,    datastructure: Optional[DataStructure] = None,    image_column_name: Optional[str] = None,    target_column_name: Optional[str] = None,    labels: Optional[list[str]] = None,    args: Optional[TIMMTrainingConfig] = None,    batch_transformations: Optional[Union[list[Union[str, _JSONDict]], dict[_TimmBatchTransformationStep, list[Union[str, _JSONDict]]]]] = None,    return_weights: bool = False,    save_path: Optional[Union[str, os.PathLike]] = None,):

HuggingFace TIMM Fine Tuning Algorithm.

Arguments

  • **kwargs: Additional keyword arguments passed to the worker side.
  • args: The training configuration.
  • batch_transformations: The batch transformations to apply to the batches. Either a list of strings or dictionaries, which will be applied to both training and validation, or a dictionary with keys "train" and "validation", each mapped to a list of strings or dictionaries specifying the transformations for that step. These are only applied if datastructure is not passed. Defaults to applying DEFAULT_IMAGE_TRANSFORMATIONS to both training and validation.
  • datastructure: The datastructure relating to the dataset to be trained on. Defaults to None.
  • image_column_name: The column name of the image column used in training. Defaults to None.
  • labels: The labels of the target column. Defaults to None.
  • model_id: The Hugging Face model ID.
  • return_weights: Whether to return the weights of the model.
  • save_path: The path to save the model to.
  • schema: The schema of the dataset to be trained on. Defaults to None.
  • target_column_name: The column name of the target column. Defaults to None.
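The two accepted shapes of batch_transformations can be normalised into one explicit per-step form, which makes the "applied to both training and validation" rule concrete. This is a hypothetical sketch: `normalise_batch_transformations` is not a bitfount function, the transformation names ("Normalize", "Resize", "HorizontalFlip") are placeholders rather than names the library is known to accept, and treating a missing step key as "no transformations" is an assumption.

```python
from typing import Union

JSONDict = dict[str, object]
Steps = list[Union[str, JSONDict]]


def normalise_batch_transformations(
    spec: Union[Steps, dict[str, Steps]],
) -> dict[str, Steps]:
    """Hypothetical helper: expand either accepted shape into explicit per-step lists."""
    if isinstance(spec, list):
        # A bare list is shared by the training and validation steps.
        return {"train": list(spec), "validation": list(spec)}
    unknown = set(spec) - {"train", "validation"}
    if unknown:
        raise ValueError(f"Unexpected step keys: {sorted(unknown)}")
    # Assumption: a step missing from the dict gets no transformations.
    return {step: list(spec.get(step, [])) for step in ("train", "validation")}


# Placeholder transformation names and parameters, for illustration only.
shared: Steps = ["Normalize", {"Resize": {"height": 224, "width": 224}}]
per_step = {"train": ["HorizontalFlip", *shared], "validation": shared}
print(normalise_batch_transformations(per_step))
```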

Attributes

  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})

Ancestors

Variables

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(    self, **kwargs: Any,) -> bitfount.federated.algorithms.hugging_face_algorithms.base._HFModellerSide:

Returns the modeller side of the TIMMFineTuning algorithm.

worker

def worker(    self, **kwargs: Any,) -> bitfount.federated.algorithms.hugging_face_algorithms.timm_fine_tuning._WorkerSide:

Returns the worker side of the TIMMFineTuning algorithm.

TIMMInference

class TIMMInference(    model_id: str,    image_column_name: str,    num_classes: Optional[int] = None,    batch_transformations: Optional[list[dict[str, _JSONDict]]] = None,    batch_size: int = 1,    checkpoint_path: Optional[Union[os.PathLike, str]] = None,    class_outputs: Optional[list[str]] = None,):

HuggingFace TIMM Inference Algorithm.

Arguments

  • checkpoint_path: The path to a checkpoint file local to the Pod. Defaults to None.
  • class_outputs: A list of explicit class outputs to use as labels. Defaults to None.
  • image_column_name: The column name of the image paths.
  • model_id: The model id to use from the Hugging Face Hub.
  • num_classes: The number of classes in the model. Defaults to None.
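One way to picture how class_outputs and num_classes relate is to turn a single output vector into a named label. The sketch below assumes the model emits one logit per class and that labels map to output indices in order; it does not claim that TIMMInference applies a softmax or returns results in this form, and `label_prediction` is a hypothetical helper.

```python
import math
from typing import Optional, Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_prediction(
    logits: Sequence[float], class_outputs: Optional[list[str]] = None
) -> tuple[str, float]:
    """Hypothetical: return the most probable label and its probability."""
    probs = softmax(logits)
    # Without class_outputs, fall back to the class index as the label.
    labels = class_outputs if class_outputs is not None else [str(i) for i in range(len(probs))]
    if len(labels) != len(probs):
        raise ValueError("class_outputs must have one label per model output (num_classes).")
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


print(label_prediction([0.2, 2.0, -1.0], ["drusen", "cnv", "normal"]))
```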

Attributes

  • checkpoint_path: The path to a checkpoint file local to the Pod. Defaults to None.
  • class_name: The name of the algorithm class.
  • class_outputs: A list of explicit class outputs to use as labels. Defaults to None.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • image_column_name: The column name of the image paths.
  • model_id: The model id to use from the Hugging Face Hub.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • num_classes: The number of classes in the model. Defaults to None.

Ancestors

Variables

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(    self, **kwargs: Any,) -> bitfount.federated.algorithms.hugging_face_algorithms.base._HFModellerSide:

Returns the modeller side of the TIMMInference algorithm.

worker

def worker(    self, **kwargs: Any,) -> bitfount.federated.algorithms.hugging_face_algorithms.timm_inference._WorkerSide:

Returns the worker side of the TIMMInference algorithm.

TrialInclusionCriteriaMatchAlgorithmAmethyst

class TrialInclusionCriteriaMatchAlgorithmAmethyst(    cnv_threshold: float = 0.5,    largest_ga_lesion_lower_bound: float = 1.26,    total_ga_area_lower_bound: float = 2.5,    total_ga_area_upper_bound: float = 17.5,    **kwargs: Any,):

Algorithm for establishing the number of patients that match clinical criteria.
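The constructor exposes four thresholds whose exact semantics are not documented in this reference. The sketch below shows one plausible reading, purely as an illustration: it assumes a scan qualifies when its CNV probability is below cnv_threshold, its largest GA lesion is at least largest_ga_lesion_lower_bound, and its total GA area lies within the inclusive bounds. The `EyeScanSummary` fields, the units (presumably mm²), inclusive bounds, and the one-scan-per-patient counting are all assumptions, not the algorithm's actual logic.

```python
from dataclasses import dataclass


@dataclass
class EyeScanSummary:
    # Hypothetical per-scan summary; field names and units are assumptions.
    cnv_probability: float
    largest_ga_lesion_area: float
    total_ga_area: float


def matches_criteria(
    scan: EyeScanSummary,
    cnv_threshold: float = 0.5,
    largest_ga_lesion_lower_bound: float = 1.26,
    total_ga_area_lower_bound: float = 2.5,
    total_ga_area_upper_bound: float = 17.5,
) -> bool:
    """Illustrative reading of the constructor defaults, not the algorithm's implementation."""
    no_cnv = scan.cnv_probability < cnv_threshold
    lesion_ok = scan.largest_ga_lesion_area >= largest_ga_lesion_lower_bound
    area_ok = total_ga_area_lower_bound <= scan.total_ga_area <= total_ga_area_upper_bound
    return no_cnv and lesion_ok and area_ok


scans = [
    EyeScanSummary(0.1, 2.0, 5.0),   # meets every threshold
    EyeScanSummary(0.9, 2.0, 5.0),   # CNV probability too high
    EyeScanSummary(0.1, 1.0, 5.0),   # largest lesion too small
    EyeScanSummary(0.1, 2.0, 20.0),  # total area above the upper bound
]
# Assuming one scan per patient, the match count is the number of qualifying scans.
print(sum(matches_criteria(s) for s in scans))
```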

Ancestors

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]

Methods


modeller

def modeller(    self, **kwargs: Any,) -> NoResultsModellerAlgorithm:

Modeller-side of the algorithm.

worker

def worker(    self,    **kwargs: Any,) -> bitfount.federated.algorithms.ophthalmology.ga_trial_inclusion_criteria_match_algorithm_amethyst._WorkerSide:

Worker-side of the algorithm.

TrialInclusionCriteriaMatchAlgorithmJade

class TrialInclusionCriteriaMatchAlgorithmJade(    renamed_columns: Optional[Mapping[str, str]] = None, **kwargs: Any,):

Algorithm for establishing the number of patients that match clinical criteria.

Ancestors

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]

Methods


modeller

def modeller(    self, **kwargs: Any,) -> NoResultsModellerAlgorithm:

Modeller-side of the algorithm.

worker

def worker(    self,    **kwargs: Any,) -> bitfount.federated.algorithms.ophthalmology.ga_trial_inclusion_criteria_match_algorithm_jade._WorkerSide:

Worker-side of the algorithm.

_BaseModelAlgorithmFactory

class _BaseModelAlgorithmFactory(    *,    model: _DistributedModelTypeOrReference,    pretrained_file: Optional[Union[str, os.PathLike]] = None,    project_id: Optional[str] = None,):

Base factory for algorithms involving an underlying model.

Arguments

  • model: The model for the federated algorithm.
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Attributes

  • model: The model for the federated algorithm.
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Ancestors

Variables

_SimpleCSVAlgorithm

class _SimpleCSVAlgorithm(    save_path: Optional[Union[str, os.PathLike]] = None, **kwargs: Any,):

Algorithm that allows simple outputting of dataframes to CSV.

Allows the data to be saved to CSV from either the worker or the modeller (or both).

Create a new _SimpleCSVAlgorithm.

Arguments

  • save_path: Optional. The path the worker saves the CSV to.
  • **kwargs: Passed to the parent class.

Ancestors

Variables

Methods


modeller

def modeller(    self, **kwargs: Any,) -> NoResultsModellerAlgorithm:

Modeller-side of the algorithm.

worker

def worker(    self, **kwargs: Any,) -> bitfount.federated.algorithms.ophthalmology.simple_csv_algorithm._WorkerSide:

Worker-side of the algorithm.