utils
Utility functions concerning data.
Module
Functions
check_datastructure_schema_compatibility
def check_datastructure_schema_compatibility(datastructure: DataStructure, schema: BitfountSchema, data_identifier: Optional[str] = None) -> tuple[DataStructureSchemaCompatibility, list[str]]:
Compare a datastructure from a task and a data schema for compatibility.
Currently, this checks that requested columns exist in the target schema.
Query-based datastructures are not supported.
Arguments
datastructure
: The datastructure for the task.
schema
: The overall schema for the pod in question.
data_identifier
: If the datastructure specifies multiple pods then the data identifier is needed to identify which part of the datastructure refers to the pod in question.
Returns
A tuple of the compatibility level (a DataStructureSchemaCompatibility value) and a list of strings describing the compatibility warnings or issues found.
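The column check and the shape of the return value can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: `CompatLevel` and `check_columns` are hypothetical stand-ins, plain column names replace the real `DataStructure` and `BitfountSchema` objects, and mapping a missing column to INCOMPATIBLE is an assumption made for illustration.

```python
from enum import Enum, auto


class CompatLevel(Enum):
    """Hypothetical stand-in for DataStructureSchemaCompatibility."""

    COMPATIBLE = auto()
    WARNING = auto()
    INCOMPATIBLE = auto()
    ERROR = auto()


def check_columns(
    requested: list[str], schema_columns: set[str]
) -> tuple[CompatLevel, list[str]]:
    """Return a compatibility level and a list of human-readable issues.

    Assumption: any requested column missing from the schema makes the
    pair incompatible. The real function may grade this differently.
    """
    missing = [col for col in requested if col not in schema_columns]
    if missing:
        issues = [f"Requested column '{col}' is not in the schema." for col in missing]
        return CompatLevel.INCOMPATIBLE, issues
    return CompatLevel.COMPATIBLE, []


# Unpack the tuple in the same way a caller of the real function would.
level, issues = check_columns(["age", "height"], {"age", "weight"})
print(level.name, issues)
```

A caller would typically branch on the returned level and surface the issue strings to the user when the level is anything other than COMPATIBLE.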
partition
def partition(iterable: Iterable[_I], partition_size: int = 1) -> Iterable[Sequence[_I]]:
Takes an iterable and yields partitions of size partition_size.
The final partition may be smaller than partition_size, depending on the length of the iterable.
The partitions will be yielded as tuples of elements from the original iterable, unless the original iterable is a list, in which case the partitions are also yielded as lists.
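The behaviour described above can be sketched as follows. This is one possible way to write it, not the library's source: `partition_sketch` is a hypothetical name, and the `ValueError` for sizes below 1 is an assumption that the documentation does not state.

```python
from itertools import islice
from typing import Iterable, Iterator, Sequence, TypeVar

_I = TypeVar("_I")


def partition_sketch(
    iterable: Iterable[_I], partition_size: int = 1
) -> Iterator[Sequence[_I]]:
    """Yield consecutive chunks of at most partition_size elements."""
    if partition_size < 1:
        # Assumption: reject sizes that could never produce a chunk.
        raise ValueError("partition_size must be at least 1")

    # Lists are yielded back as lists; everything else as tuples.
    as_list = isinstance(iterable, list)
    iterator = iter(iterable)
    while True:
        chunk = tuple(islice(iterator, partition_size))
        if not chunk:
            return  # Input exhausted.
        yield list(chunk) if as_list else chunk


print(list(partition_sketch(range(5), 2)))   # [(0, 1), (2, 3), (4,)]
print(list(partition_sketch([1, 2, 3], 2)))  # [[1, 2], [3]]
```

Because the sketch consumes the input lazily through `islice`, it also works on generators and other iterables whose length is unknown in advance.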
Classes
DataStructureSchemaCompatibility
class DataStructureSchemaCompatibility(value, names=None, *, module=None, qualname=None, type=None, start=1):
The level of compatibility between a datastructure and a pod/table schema.
Denotes 4 different levels of compatibility:
- COMPATIBLE: Compatible to our knowledge.
- WARNING: Might be compatible but there might still be runtime incompatibility issues.
- INCOMPATIBLE: Clearly incompatible.
- ERROR: An error occurred whilst trying to check compatibility.