dataframe_generation_extensions

Additional functionality for DataFrame processing.

Provides functions that can be used for additional column generation.

Module

Functions

generate_bitfount_patient_id

def generate_bitfount_patient_id(df: pd.DataFrame, name_col: str = "Patient's Name", dob_col: str = "Patient's Birth Date") -> pandas.core.frame.DataFrame:

Adds a BitfountPatientID column to the provided DataFrame.

This mutates the input DataFrame in place by adding the new column.

The generated IDs are the hash of the concatenated string of a Bitfount-specific key, full name, and date of birth.
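The derivation described above can be sketched with the standard library alone. This is only an illustration, not the library's implementation: the hash algorithm (SHA-256), the concatenation order without separators, the placeholder key `ASSUMED_KEY`, and the helper names `sketch_patient_id` and `add_sketch_ids` are all assumptions. Plain dictionaries stand in for DataFrame rows so the example runs without pandas.

```python
import hashlib
from typing import Dict, List

# Hypothetical placeholder: the real Bitfount-specific key is not documented here.
ASSUMED_KEY = "example-key"


def sketch_patient_id(full_name: str, date_of_birth: str, key: str = ASSUMED_KEY) -> str:
    """Hash the concatenation of key, full name and date of birth.

    SHA-256 and the absence of separators are assumptions for illustration.
    """
    combined = key + full_name + date_of_birth
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


def add_sketch_ids(
    rows: List[Dict[str, str]],
    name_col: str = "Patient's Name",
    dob_col: str = "Patient's Birth Date",
) -> List[Dict[str, str]]:
    """Add a BitfountPatientID entry to each row, mutating the rows in place."""
    for row in rows:
        row["BitfountPatientID"] = sketch_patient_id(row[name_col], row[dob_col])
    return rows


rows = [
    {"Patient's Name": "Jane Doe", "Patient's Birth Date": "1980-01-31"},
    {"Patient's Name": "Jane Doe", "Patient's Birth Date": "1980-02-01"},
]
add_sketch_ids(rows)
```

As with the documented function, the input is modified directly and also returned. The same name and date of birth always produce the same ID, while a change to either input produces a different one.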

id_safe_string

def id_safe_string(s: str) -> str:

Converts a string to a normalised version safe for use in IDs.

In particular, converts accented/diacritic characters to their closest ASCII representation, ensures lowercase, and replaces any non-word characters with underscores.

This allows us to map potentially different spellings (e.g. Francois John-Smith vs François John Smith) to the same string (francois_john_smith).
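One possible way to implement this kind of normalisation is shown below. It is a sketch under assumptions, not the library's code: the helper name `sketch_id_safe_string` is hypothetical, and the choices of NFKD decomposition and of collapsing each run of non-word characters into a single underscore are guesses at the behaviour described.

```python
import re
import unicodedata


def sketch_id_safe_string(s: str) -> str:
    """Normalise a string for use in IDs (illustrative only)."""
    # Decompose accented characters into a base letter plus combining marks,
    # then drop anything that has no ASCII representation.
    ascii_only = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Lowercase, then replace each run of non-word characters with an underscore.
    return re.sub(r"\W+", "_", ascii_only.lower())


a = sketch_id_safe_string("Francois John-Smith")
b = sketch_id_safe_string("François John Smith")
print(a, b)  # both print as francois_john_smith
```

Both spellings from the example above map to the same string, which is what makes the result usable as a stable component of an ID.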

safe_format_date

def safe_format_date(value: Any) -> Any:

Safely format a date string.

Arguments

  • value: The input value, which can be a date string, integer, or NaN.

Returns

Formatted date string, or the original value as a string if formatting fails.
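The fallback behaviour described here (format the value if it parses as a date, otherwise return it as a string) can be sketched as follows. The accepted input formats, the ISO `YYYY-MM-DD` output format, and the helper name `sketch_safe_format_date` are assumptions made for illustration; the documentation does not state which formats the real function handles.

```python
from datetime import datetime
from typing import Any

# Assumed input formats, tried in order; the real function may accept others.
_CANDIDATE_FORMATS = ("%Y%m%d", "%Y-%m-%d", "%d/%m/%Y")


def sketch_safe_format_date(value: Any) -> Any:
    """Return an ISO-formatted date string, or str(value) if parsing fails."""
    text = str(value).strip()
    for fmt in _CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # Try the next candidate format.
    # Nothing matched (for example NaN or free text): fall back to the string form.
    return text


from_int = sketch_safe_format_date(19800131)
from_nan = sketch_safe_format_date(float("nan"))
from_text = sketch_safe_format_date("not a date")
```

Because every failure path ends in a plain string, a function of this shape can be applied across a whole column without raising on integers, missing values, or malformed entries.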

Classes

DataFrameExtensionError

class DataFrameExtensionError(*args, **kwargs):

Indicates an error whilst trying to apply an extension function.

Ancestors