
ehr_ner_protocol

Protocol for combining EHR patient query with NER inference and CSV reporting.

This protocol:

  1. Queries EHR for patient conditions and procedures
  2. Extracts condition/procedure text as separate entries (one per condition/procedure)
  3. Runs NER inference on each text entry separately
  4. Aggregates entities per patient and generates a CSV report with NER results linked to patient IDs
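The four steps above can be sketched in plain Python. This is an illustrative outline only, not the protocol's implementation: the sample records, the `toy_ner` keyword lookup (a stand-in for a real NER model), and the CSV column names are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for step 1 (EHR query results):
# patient ID -> list of condition/procedure descriptions.
EHR_RECORDS = {
    "patient-001": ["Type 2 diabetes mellitus", "Cataract surgery"],
    "patient-002": ["Hypertension"],
}


def extract_entries(records):
    """Step 2: one (patient_id, text) entry per condition/procedure."""
    return [(pid, text) for pid, texts in records.items() for text in texts]


def toy_ner(text):
    """Step 3 stand-in: a keyword lookup, NOT a real NER model."""
    vocabulary = {
        "diabetes": "DISEASE",
        "hypertension": "DISEASE",
        "cataract": "DISEASE",
        "surgery": "PROCEDURE",
    }
    return [
        {"word": token, "label": vocabulary[token.lower()]}
        for token in text.split()
        if token.lower() in vocabulary
    ]


def aggregate(entries):
    """Step 4a: run NER on each entry separately, then group per patient."""
    per_patient = defaultdict(list)
    for pid, text in entries:
        per_patient[pid].extend(toy_ner(text))
    return dict(per_patient)


def to_csv(per_patient):
    """Step 4b: write one CSV row per entity, keyed by patient ID."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["patient_id", "entity", "label"])  # assumed column names
    for pid, entities in per_patient.items():
        for entity in entities:
            writer.writerow([pid, entity["word"], entity["label"]])
    return buffer.getvalue()


entries = extract_entries(EHR_RECORDS)
report = to_csv(aggregate(entries))
print(report)
```

Keeping each condition or procedure as its own entry means the NER step never sees text from two unrelated records joined together, while the patient ID carried alongside each entry is what lets the results be regrouped for the report.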

Classes

EHRNERProtocol

class EHRNERProtocol(*, algorithm: Sequence[BaseCompatibleAlgoFactory], **kwargs: Any):

Protocol for running EHR patient query, NER inference, and CSV reporting.

This protocol:

  1. Queries EHR for patient conditions and procedures
  2. Extracts text from medical codes as separate entries (one per condition/procedure)
  3. Runs NER inference on each text entry separately
  4. Aggregates entities per patient and generates a CSV report with NER results linked to patient IDs

Arguments

  • algorithm: A sequence of three algorithms: 1. EHRPatientQueryAlgorithm 2. HuggingFaceNERInference 3. CSVReportAlgorithm
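Because the three algorithms are expected in a fixed order, it can help to check the sequence before building the protocol. The sketch below uses empty placeholder classes that only share their names with the real algorithm factories; `check_algorithm_order` is a hypothetical helper, and the protocol's own validation (if any) may behave differently.

```python
from typing import Sequence


# Placeholder classes standing in for the three real algorithm factories.
# They carry no behaviour; only their names and order mirror the docs.
class EHRPatientQueryAlgorithm:
    pass


class HuggingFaceNERInference:
    pass


class CSVReportAlgorithm:
    pass


EXPECTED_ORDER = (
    EHRPatientQueryAlgorithm,
    HuggingFaceNERInference,
    CSVReportAlgorithm,
)


def check_algorithm_order(algorithms: Sequence[object]) -> None:
    """Hypothetical helper: raise if the sequence is not query, NER, report."""
    if len(algorithms) != len(EXPECTED_ORDER):
        raise ValueError(
            f"Expected {len(EXPECTED_ORDER)} algorithms, got {len(algorithms)}"
        )
    pairs = zip(algorithms, EXPECTED_ORDER)
    for position, (algo, expected) in enumerate(pairs, start=1):
        if not isinstance(algo, expected):
            raise TypeError(
                f"Algorithm {position} should be {expected.__name__}, "
                f"got {type(algo).__name__}"
            )


# A correctly ordered sequence passes silently.
check_algorithm_order(
    [EHRPatientQueryAlgorithm(), HuggingFaceNERInference(), CSVReportAlgorithm()]
)
```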

Methods


dump

def dump(self) -> SerializedProtocol:

Inherited from:

BaseProtocolFactory.dump:

Returns the JSON-serializable representation of the protocol.

modeller

def modeller(
    self,
    *,
    mailbox: _ModellerMailbox,
    context: ProtocolContext,
    **kwargs: Any,
) -> bitfount.federated.protocols.ehr.ehr_ner_protocol._ModellerSide:

Returns the Modeller side of the protocol.

run

def run(
    self,
    pod_identifiers: Collection[str],
    session: Optional[BitfountSession] = None,
    username: Optional[str] = None,
    hub: Optional[BitfountHub] = None,
    ms_config: Optional[MessageServiceConfig] = None,
    message_service: Optional[_MessageService] = None,
    pod_public_key_paths: Optional[Mapping[str, Path]] = None,
    identity_verification_method: IdentityVerificationMethod = IdentityVerificationMethod.OIDC_DEVICE_CODE,
    private_key_or_file: Optional[Union[RSAPrivateKey, Path]] = None,
    idp_url: Optional[str] = None,
    require_all_pods: bool = False,
    run_on_new_data_only: bool = False,
    project_id: Optional[str] = None,
    batched_execution: Optional[bool] = None,
    test_run: bool = False,
    force_rerun_failed_files: bool = True,
) -> Optional[Any]:

Inherited from:

BaseProtocolFactory.run:

Sets up a local Modeller instance and runs the protocol.

Arguments

  • pod_identifiers: The BitfountHub pod identifiers to run against.
  • session: Optional. Session to use for authenticated requests. Created if needed.
  • username: Username to run as. Defaults to logged in user.
  • hub: BitfountHub instance. Default: hub.bitfount.com.
  • ms_config: Message service config. Default: messaging.bitfount.com.
  • message_service: Message service instance. If not provided, one is created from ms_config (which defaults to messaging.bitfount.com).
  • pod_public_key_paths: Public keys of pods to be checked against.
  • identity_verification_method: The identity verification method to use.
  • private_key_or_file: Private key (to be removed).
  • idp_url: The IDP URL.
  • require_all_pods: If True, raise PodResponseError if at least one of the specified pod identifiers rejects or fails to respond to a task request. Defaults to False.
  • run_on_new_data_only: Whether to run the task on new datapoints only. Defaults to False.
  • project_id: The project ID to run the task under.
  • batched_execution: Whether to run the task in batched mode. Defaults to False.
  • test_run: If True, runs the task in test mode, on a limited number of datapoints. Defaults to False.
  • force_rerun_failed_files: If True, forces a rerun on files that the task previously failed on. If False, the task will skip files that have previously failed. Note: This option can only be enabled if both enable_batch_resilience and individual_file_retry_enabled are True. Defaults to True.

Returns

Results of the protocol.

Raises

  • PodResponseError: If require_all_pods is True and at least one of the specified pod identifiers rejects or fails to respond to a task request.
  • ValueError: If attempting to train on multiple pods, and the DataStructure table name is given as a string.
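The `require_all_pods` behaviour described above can be illustrated with a small standalone sketch. `PodResponseError` here is a local stand-in for the library's exception, `pods_to_run_on` is a hypothetical helper, and continuing with only the accepting pods when `require_all_pods` is False is an assumption rather than documented behaviour.

```python
from typing import Collection, Set


class PodResponseError(Exception):
    """Local stand-in for the library's exception of the same name."""


def pods_to_run_on(
    requested: Collection[str],
    accepted: Collection[str],
    require_all_pods: bool = False,
) -> Set[str]:
    """Hypothetical helper: decide which pods a task proceeds with."""
    missing = set(requested) - set(accepted)
    if require_all_pods and missing:
        # At least one requested pod rejected or did not respond.
        raise PodResponseError(f"No acceptance from: {sorted(missing)}")
    # Assumption: otherwise the task continues with the pods that accepted.
    return set(requested) & set(accepted)


requested_pods = ["alice/ehr-pod", "bob/ehr-pod"]  # illustrative identifiers
usable = pods_to_run_on(requested_pods, accepted=["alice/ehr-pod"])
print(usable)
```

With `require_all_pods=True`, the same call would raise `PodResponseError` because `bob/ehr-pod` did not accept the task.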

worker

def worker(
    self,
    *,
    mailbox: _WorkerMailbox,
    hub: BitfountHub,
    context: ProtocolContext,
    **kwargs: Any,
) -> _WorkerSide:

Returns worker side of the EHR NER protocol.

Arguments

  • mailbox: Worker mailbox instance to allow communication to the modeller.
  • hub: BitfountHub object to use for communication with the hub.
  • context: Run-time protocol context details for running.
  • **kwargs: Additional keyword arguments.