utils

Utility functions concerning data.

Module

Functions

check_datastructure_schema_compatibility

def check_datastructure_schema_compatibility(datastructure: DataStructure, schema: BitfountSchema, data_identifier: Optional[str] = None) -> tuple[DataStructureSchemaCompatibility, list[str]]:

Compare a datastructure from a task and a data schema for compatibility.

Currently, this checks that requested columns exist in the target schema.

Query-based datastructures are not supported.

Arguments

  • datastructure: The datastructure for the task.
  • schema: The overall schema for the pod in question.
  • data_identifier: If the datastructure specifies multiple pods then the data identifier is needed to identify which part of the datastructure refers to the pod in question.

Returns

A tuple of the compatibility level (a DataStructureSchemaCompatibility value) and a list of strings describing every compatibility warning or issue found.
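The column-existence check described above can be illustrated with a minimal, self-contained sketch. It does not use the real DataStructure or BitfountSchema classes and is not the library's implementation: `CompatibilitySketch`, `check_columns_sketch`, and the rule that any missing column maps to INCOMPATIBLE are all assumptions made for illustration.

```python
from __future__ import annotations

from enum import Enum, auto
from typing import Iterable


class CompatibilitySketch(Enum):
    """Hypothetical stand-in for DataStructureSchemaCompatibility."""

    COMPATIBLE = auto()
    WARNING = auto()
    INCOMPATIBLE = auto()
    ERROR = auto()


def check_columns_sketch(
    requested_columns: Iterable[str],
    schema_columns: Iterable[str],
) -> tuple[CompatibilitySketch, list[str]]:
    """Return a compatibility level plus one message per missing column.

    Assumption: a requested column that is absent from the schema makes
    the pair INCOMPATIBLE. The real function may grade this differently.
    """
    available = set(schema_columns)
    # Sort so that the messages come back in a deterministic order.
    missing = sorted(set(requested_columns) - available)
    if missing:
        issues = [
            f"Requested column '{name}' is not in the schema."
            for name in missing
        ]
        return CompatibilitySketch.INCOMPATIBLE, issues
    return CompatibilitySketch.COMPATIBLE, []


level, issues = check_columns_sketch(["age", "height"], ["age", "weight"])
if level is not CompatibilitySketch.COMPATIBLE:
    for issue in issues:
        print(f"{level.name}: {issue}")
```

The return shape mirrors the documented one, a level followed by a list of messages, so a caller can branch on the level and log the messages separately.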

partition

def partition(iterable: Iterable[_I], partition_size: int = 1) -> Iterable[Sequence[_I]]:

Takes an iterable and yields partitions of size partition_size.

The final partition may contain fewer than partition_size elements when the length of the iterable is not a multiple of partition_size.

The partitions will be yielded as tuples of elements from the original iterable, unless the original iterable is a list, in which case the partitions are also yielded as lists.
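The behaviour described above can be reproduced with a short sketch built on `itertools.islice`. This is one possible implementation written for illustration, not the library's source; the name `partition_sketch` and the `ValueError` raised for sizes below 1 are assumptions.

```python
from itertools import islice
from typing import Iterable, Iterator, Sequence, TypeVar

_I = TypeVar("_I")


def partition_sketch(
    iterable: Iterable[_I], partition_size: int = 1
) -> Iterator[Sequence[_I]]:
    """Yield consecutive chunks of at most partition_size elements."""
    if partition_size < 1:
        # Assumption: the documented function does not say how it
        # handles non-positive sizes, so this sketch rejects them.
        raise ValueError("partition_size must be at least 1")
    # Remember whether the input was a list before wrapping it.
    yield_lists = isinstance(iterable, list)
    iterator = iter(iterable)
    while True:
        chunk = tuple(islice(iterator, partition_size))
        if not chunk:
            return  # The iterable is exhausted.
        yield list(chunk) if yield_lists else chunk


list_parts = list(partition_sketch([1, 2, 3, 4, 5], 2))
range_parts = list(partition_sketch(range(5), 2))
```

Here `list_parts` holds lists because the input was a list, while `range_parts` holds tuples; in both cases the last partition has a single element because five is not a multiple of two.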

Classes

DataStructureSchemaCompatibility

class DataStructureSchemaCompatibility(value, names=None, *, module=None, qualname=None, type=None, start=1):

The level of compatibility between a datastructure and a pod/table schema.

Denotes 4 different levels of compatibility:

  • COMPATIBLE: Compatible to our knowledge.
  • WARNING: Might be compatible but there might still be runtime incompatibility issues.
  • INCOMPATIBLE: Clearly incompatible.
  • ERROR: An error occurred whilst trying to check compatibility.

Ancestors

Variables

  • static COMPATIBLE
  • static ERROR
  • static INCOMPATIBLE
  • static WARNING