processor
Classes for dealing with Transformation Processing.
Classes
TransformationProcessor
class TransformationProcessor(transformations: list[Transformation], schema: Optional[TableSchema] = None, col_refs: Optional[set[str]] = None):
Processes Transformations on a given dataframe.
The transformation processor does not add any newly created columns to the schema. This must be done separately after the transformations have been processed.
Arguments
transformations
: The list of transformations to apply.
schema
: The schema of the data to be transformed.
col_refs
: The set of columns referenced in those transformations.
Attributes
transformations
: The list of transformations to apply.
schema
: The schema of the data to be transformed.
col_refs
: The set of columns referenced in those transformations.
Methods
batch_transform
def batch_transform(self, data: np.ndarray, step: DataSplit) ‑> numpy.ndarray:
Performs batch transformations.
Arguments
data
: The data to be transformed at batch time as a numpy array.
step
: The step at which the data should be transformed.
Returns np.ndarray: The transformed data as a numpy array.
Raises
InvalidBatchTransformationError
: If one of the specified transformations does not inherit from BatchTimeOperation.
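The inheritance check behind InvalidBatchTransformationError can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins defined only for this example (they are not the library's own classes), and check_batch_transformations is an assumed helper name, not part of the documented API.

```python
class BatchTimeOperation:
    """Hypothetical stand-in for the base class of batch-time transformations."""


class InvalidBatchTransformationError(Exception):
    """Hypothetical stand-in for the error named in the Raises section."""


class Augment(BatchTimeOperation):
    """Hypothetical transformation that is valid at batch time."""


class Rename:
    """Hypothetical transformation that is NOT a batch-time operation."""


def check_batch_transformations(transformations: list) -> list:
    """Return the transformations unchanged if all are batch-time operations."""
    for transformation in transformations:
        # Reject anything that does not inherit from BatchTimeOperation.
        if not isinstance(transformation, BatchTimeOperation):
            raise InvalidBatchTransformationError(
                f"{type(transformation).__name__} does not inherit "
                "from BatchTimeOperation"
            )
    return transformations
```

Under these assumptions, a list containing only Augment instances passes the check, while a list containing a Rename instance raises the error before any data is transformed.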
transform
def transform(self, data: pd.DataFrame) ‑> pandas.core.frame.DataFrame:
Performs self.transformations on data sequentially.
Arguments to an operation are resolved in three steps. First, an argument that has a name attribute is treated as a reference to another transformed column. Otherwise, a regular expression checks whether it references a non-transformed column. If the regular expression finds no match, the argument is taken as is (e.g. a string or an integer). Once all transformations are complete, any columns that should not be part of the final output are removed.
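The resolution order described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the processor's actual implementation: the "col:" reference syntax, the TransformedRef class and the resolve_argument helper are all hypothetical, a plain dict stands in for the dataframe, and KeyError stands in for MissingColumnReferenceError.

```python
import re
from dataclasses import dataclass
from typing import Any

# Hypothetical reference syntax, assumed for illustration only.
COLUMN_REF = re.compile(r"^col:(?P<column>\w+)$")


@dataclass
class TransformedRef:
    """Hypothetical stand-in for a transformation exposing a ``name`` attribute."""

    name: str


def resolve_argument(arg: Any, columns: dict[str, list]) -> Any:
    """Resolve one operation argument using the three-step order."""
    # Step 1: a ``name`` attribute marks a reference to a transformed column.
    name = getattr(arg, "name", None)
    if name is not None:
        if name not in columns:
            raise KeyError(f"No transformed column named {name!r}")
        return columns[name]

    # Step 2: a string matching the pattern references a non-transformed column.
    if isinstance(arg, str):
        match = COLUMN_REF.match(arg)
        if match:
            column = match.group("column")
            if column not in columns:
                raise KeyError(f"No column named {column!r}")
            return columns[column]

    # Step 3: no reference found, so the argument is used as a literal.
    return arg
```

With columns = {"age": [30, 40], "age_scaled": [0.3, 0.4]}, the call resolve_argument(TransformedRef("age_scaled"), columns) yields the transformed column, resolve_argument("col:age", columns) yields the original column, and resolve_argument(10, columns) returns the literal 10 unchanged.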
Arguments
data
: The pandas dataframe to be transformed.
Raises
MissingColumnReferenceError
: If there is a reference to a non-existing column.
TypeError
: If there are clashes between column names or if unable to apply transformation.