dataframe_generation_extensions
Additional functionality for DataFrame processing.
Provides functions that can be used for additional column generation.
Module
Functions
generate_bitfount_patient_id
def generate_bitfount_patient_id(df: pd.DataFrame, name_col: str = "Patient's Name", dob_col: str = "Patient's Birth Date") -> pandas.core.frame.DataFrame:
Adds a BitfountPatientID column to the provided DataFrame.
The input DataFrame is mutated in place to add the new column.
The generated IDs are the hash of the concatenated string of a Bitfount-specific key, full name, and date of birth.
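The hashing scheme described above can be illustrated with a minimal sketch. This is not the library's implementation: the key value (`HYPOTHETICAL_KEY`), the choice of SHA-256, the plain concatenation order, and the function name `add_patient_id_sketch` are all assumptions made for illustration, and any normalisation of the name or date before hashing is omitted.

```python
import hashlib

import pandas as pd

# Assumption: stand-in for the Bitfount-specific key, which is not documented here.
HYPOTHETICAL_KEY = "example-key"


def add_patient_id_sketch(
    df: pd.DataFrame,
    name_col: str = "Patient's Name",
    dob_col: str = "Patient's Birth Date",
) -> pd.DataFrame:
    """Add a BitfountPatientID column by hashing key + name + date of birth."""

    def _hash(name: object, dob: object) -> str:
        # Assumption: simple concatenation and SHA-256; the real scheme may differ.
        combined = f"{HYPOTHETICAL_KEY}{name}{dob}"
        return hashlib.sha256(combined.encode("utf-8")).hexdigest()

    # Assigning a column mutates the input DataFrame in place.
    df["BitfountPatientID"] = [
        _hash(name, dob) for name, dob in zip(df[name_col], df[dob_col])
    ]
    return df


patients = pd.DataFrame(
    {
        "Patient's Name": ["Jane Doe", "Jane Doe", "John Roe"],
        "Patient's Birth Date": ["1980-01-01", "1980-01-01", "1975-06-30"],
    }
)
result = add_patient_id_sketch(patients)
```

Because the hash is deterministic, rows with the same name and date of birth receive the same ID, which is what allows records for one patient to be linked.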
id_safe_string
def id_safe_string(s: str) -> str:
Converts a string to a normalised version safe for use in IDs.
In particular, converts accented/diacritic characters to their closest ASCII representation, ensures lowercase, and replaces any non-word characters with underscores.
This allows us to map potentially different spellings (e.g. Francois John-Smith vs François John Smith) to the same string (francois_john_smith).
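The normalisation steps above can be sketched with the standard library. The function name `id_safe_string_sketch` is hypothetical, and two details are assumptions rather than documented behaviour: diacritics are removed via NFKD decomposition followed by an ASCII encode, and a run of consecutive non-word characters collapses to a single underscore.

```python
import re
import unicodedata


def id_safe_string_sketch(s: str) -> str:
    """Return a lowercase, ASCII-only, underscore-separated version of s."""
    # Decompose accented characters (e.g. "ç" becomes "c" plus a combining mark),
    # then drop anything that cannot be encoded as ASCII.
    decomposed = unicodedata.normalize("NFKD", s)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Assumption: runs of non-word characters become one underscore.
    return re.sub(r"\W+", "_", ascii_only.lower())


first = id_safe_string_sketch("Francois John-Smith")
second = id_safe_string_sketch("François John Smith")
```

Both spellings from the example above reduce to `francois_john_smith` under this sketch.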
safe_format_date
def safe_format_date(value: Any) -> Any:
Safely format a date string.
Arguments
value: The input value, which can be a date string, integer, or NaN.
Returns
Formatted date string, or the original value as a string if formatting fails.
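The fallback behaviour described here (format if possible, otherwise return the value as a string) might look like the following sketch. The accepted input formats in `CANDIDATE_FORMATS`, the ISO output format, and the function name `safe_format_date_sketch` are assumptions; the formats the real function recognises and produces are not documented here.

```python
from datetime import datetime
from typing import Any

# Assumption: illustrative input formats only.
CANDIDATE_FORMATS = ("%Y%m%d", "%Y-%m-%d", "%d/%m/%Y")


def safe_format_date_sketch(value: Any, out_format: str = "%Y-%m-%d") -> Any:
    """Try each candidate format; fall back to str(value) if none match."""
    text = str(value).strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime(out_format)
        except ValueError:
            # This format did not match; try the next one.
            continue
    # Formatting failed for every candidate, so return the value as a string.
    return text


from_int = safe_format_date_sketch(19800101)
from_iso = safe_format_date_sketch("1980-01-01")
from_nan = safe_format_date_sketch(float("nan"))
from_junk = safe_format_date_sketch("not a date")
```

Converting the input to a string first means integers such as `19800101` are parsed as dates rather than as timestamps, and NaN or unparseable input passes through as text instead of raising.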
Classes
DataFrameExtensionError
class DataFrameExtensionError(*args, **kwargs):
Indicates an error whilst trying to apply an extension function.
Ancestors
- BitfountError
- builtins.Exception
- builtins.BaseException