dataset_operations
Dataset-related transformations.
This module contains the base class and the concrete classes for dataset transformations, that is, transformations that may act over the entire dataset.
Classes
CleanDataTransformation
class CleanDataTransformation(*, name: str = None, output: bool = True, cols: Union[str, list[str]] = 'all'):
Dataset transformation that will "clean" the specified columns.
For continuous columns, all infinities and NaNs are replaced with 0. For categorical columns, all NaNs are replaced explicitly with the string "nan".
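The cleaning rules above can be illustrated with a minimal pure-Python sketch. It assumes, for illustration only, that a column is a plain list of values and that the caller says whether it is continuous; `clean_column` is a hypothetical helper, not part of this module.

```python
import math
from typing import Union

# Assumption for illustration: a column is a plain list of values and the
# caller states whether it is continuous or categorical.
Value = Union[float, str]


def clean_column(values: list[Value], continuous: bool) -> list[Value]:
    """Clean one column following the documented rules (sketch only)."""
    cleaned: list[Value] = []
    for v in values:
        if isinstance(v, float) and continuous and not math.isfinite(v):
            cleaned.append(0.0)  # NaN, inf and -inf all become 0
        elif isinstance(v, float) and not continuous and math.isnan(v):
            cleaned.append("nan")  # missing categories become the literal "nan"
        else:
            cleaned.append(v)
    return cleaned


print(clean_column([1.5, float("nan"), float("inf")], continuous=True))  # [1.5, 0.0, 0.0]
print(clean_column(["red", float("nan")], continuous=False))  # ['red', 'nan']
```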
Arguments
cols
: The columns to act on as a list of strings. Defaults to "all", which acts on all columns in the dataset.
name
: The name of the transformation. If not provided, a unique name will be generated from the class name.
output
: Whether or not this transformation should be included in the final output. This must be True for all dataset transformations. Defaults to True.
Raises
TransformationRegistryError
: If the transformation name is already in use.
TransformationRegistryError
: If the transformation name hasn't been provided and the transformation is not registered.
ValueError
: If output is False.
Method generated by attrs for class CleanDataTransformation.
Ancestors
Static methods
schema
def schema() -> marshmallow.schema.Schema:
Inherited from: DatasetTransformation.schema
Gets an instance of the Schema associated with this Transformation.
Raises
TypeError
: If the transformation doesn't have a TransformationSchema as the schema.
DatasetTransformation
class DatasetTransformation(*, name: str = None, output: bool = True, cols: Union[str, list[str]] = 'all'):
Base transformation for all dataset transformation classes.
Users can pass "all" as cols to have the transformation act on every relevant column, as defined in the schema.
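How a cols argument might be expanded into concrete column names can be sketched as follows. The schema dict and `resolve_cols` are hypothetical; the rule that a string other than "all" selects columns by type is an assumption based on NormalizeDataTransformation defaulting to "float".

```python
from typing import Union

# Hypothetical schema: maps each column name to a type label. The real
# library reads this information from its own schema object.
Schema = dict[str, str]


def resolve_cols(cols: Union[str, list[str]], schema: Schema) -> list[str]:
    """Expand a `cols` argument into an explicit list of column names."""
    if cols == "all":
        return list(schema)  # every column in the schema
    if isinstance(cols, str):
        # Assumed: any other string selects columns by type, e.g. "float"
        return [name for name, kind in schema.items() if kind == cols]
    return list(cols)  # an explicit list is used as given


schema = {"age": "float", "city": "str", "income": "float"}
print(resolve_cols("all", schema))    # ['age', 'city', 'income']
print(resolve_cols("float", schema))  # ['age', 'income']
```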
Arguments
cols
: The columns to act on as a list of strings. Defaults to "all", which acts on all columns in the dataset.
name
: The name of the transformation. If not provided, a unique name will be generated from the class name.
output
: Whether or not this transformation should be included in the final output. This must be True for all dataset transformations. Defaults to True.
Raises
TransformationRegistryError
: If the transformation name is already in use.
TransformationRegistryError
: If the transformation name hasn't been provided and the transformation is not registered.
ValueError
: If output is False.
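The rule that output must be True for dataset transformations can be sketched with a stdlib dataclass. The real class is generated by attrs and also handles name generation and registry checks, which this sketch omits; `DatasetTransformationSketch` is hypothetical, and unlike the real constructor its arguments are not keyword-only.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class DatasetTransformationSketch:
    """Hypothetical stand-in with the documented constructor arguments."""

    name: Optional[str] = None
    output: bool = True
    cols: Union[str, list[str]] = "all"

    def __post_init__(self) -> None:
        # Dataset transformations must always appear in the final output.
        if not self.output:
            raise ValueError("output must be True for dataset transformations")


try:
    DatasetTransformationSketch(name="clean", output=False)
except ValueError as err:
    print(err)  # output must be True for dataset transformations
```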
Method generated by attrs for class DatasetTransformation.
Ancestors
Subclasses
Static methods
schema
def schema() -> marshmallow.schema.Schema:
Inherited from:
Gets an instance of the Schema associated with this Transformation.
Raises
TypeError
: If the transformation doesn't have a TransformationSchema as the schema.
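To make the TypeError condition concrete, here is a minimal pure-Python sketch. TransformationSchema, TransformationSketch, the Schema class attribute and BrokenTransformation are hypothetical stand-ins; the real method returns a marshmallow Schema, and its lookup logic is not described in this documentation.

```python
class TransformationSchema:
    """Hypothetical stand-in for the library's TransformationSchema base."""


class TransformationSketch:
    # Hypothetical class attribute naming the schema class to instantiate.
    Schema: type = TransformationSchema

    @classmethod
    def schema(cls) -> TransformationSchema:
        """Return an instance of this transformation's schema."""
        schema_cls = cls.Schema
        is_valid = isinstance(schema_cls, type) and issubclass(schema_cls, TransformationSchema)
        if not is_valid:
            raise TypeError(f"{cls.__name__} does not use a TransformationSchema")
        return schema_cls()


class BrokenTransformation(TransformationSketch):
    Schema = dict  # not a TransformationSchema subclass


print(type(TransformationSketch.schema()).__name__)  # TransformationSchema
try:
    BrokenTransformation.schema()
except TypeError as err:
    print(err)  # BrokenTransformation does not use a TransformationSchema
```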
NormalizeDataTransformation
class NormalizeDataTransformation(*, name: str = None, output: bool = True, cols: Union[str, list[str]] = 'float'):
Dataset transformation that will normalise the specified continuous columns.
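This documentation does not say which normalisation is applied. The sketch below assumes z-score standardisation (subtract the mean, divide by the population standard deviation) purely for illustration; `normalize_column` is hypothetical, and both the choice of method and the handling of constant columns are assumptions.

```python
import statistics


def normalize_column(values: list[float]) -> list[float]:
    """Z-score a column: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    if std == 0:
        # A constant column has no spread; map every value to 0 (assumed).
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


result = normalize_column([2.0, 4.0, 6.0])
print([round(v, 4) for v in result])  # [-1.2247, 0.0, 1.2247]
```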
Arguments
cols
: The columns to act on as a list of strings. Defaults to "float", which applies the transformation only to columns of type float.
name
: The name of the transformation. If not provided, a unique name will be generated from the class name.
output
: Whether or not this transformation should be included in the final output. This must be True for all dataset transformations. Defaults to True.
Raises
TransformationRegistryError
: If the transformation name is already in use.
TransformationRegistryError
: If the transformation name hasn't been provided and the transformation is not registered.
ValueError
: If output is False.
Method generated by attrs for class NormalizeDataTransformation.
Ancestors
Variables
cols : Union[str, list[str]] (static)
Static methods
schema
def schema() -> marshmallow.schema.Schema:
Inherited from: DatasetTransformation.schema
Gets an instance of the Schema associated with this Transformation.
Raises
TypeError
: If the transformation doesn't have a TransformationSchema as the schema.
ScalarAdditionDataTransformation
class ScalarAdditionDataTransformation(*, name: str = None, output: bool = True, cols: Union[str, list[str]] = 'all', scalar: Union[int, float, Mapping[str, Union[int, float]]] = 0):
Dataset transformation that adds a scalar to the specified columns.
The transformation is applied to the dataset in place and only affects continuous columns.
Arguments
cols
: The columns to act on as a list of strings. Defaults to "all", which acts on all columns in the dataset.
name
: The name of the transformation. If not provided, a unique name will be generated from the class name.
output
: Whether or not this transformation should be included in the final output. This must be True for all dataset transformations. Defaults to True.
scalar
: The scalar to add. It can be provided as a number, in which case it is added to all numerical columns, or as a dictionary mapping column names to the scalars to add to them. Defaults to 0.
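The two accepted forms of scalar can be illustrated with a minimal pure-Python sketch. It assumes the dataset is a dict of column lists and that the caller supplies the continuous column names; `add_scalar` is hypothetical, TypeError stands in for the library's TransformationApplicationError, and leaving unmapped columns unchanged is an assumption.

```python
from collections.abc import Mapping
from typing import Union

Number = Union[int, float]


def add_scalar(
    data: dict[str, list],
    continuous_cols: list[str],
    scalar: Union[Number, Mapping[str, Number]] = 0,
) -> None:
    """Add `scalar` to the continuous columns of `data`, modifying `data` in place."""
    if isinstance(scalar, Mapping):
        # Per-column offsets; unmapped or non-continuous columns are skipped (assumed).
        offsets = {c: s for c, s in scalar.items() if c in continuous_cols}
    elif isinstance(scalar, (int, float)) and not isinstance(scalar, bool):
        offsets = dict.fromkeys(continuous_cols, scalar)  # same offset for every column
    else:
        # Stand-in for the library's TransformationApplicationError.
        raise TypeError(f"scalar must be a number or a mapping, got {scalar!r}")
    for col, offset in offsets.items():
        data[col] = [v + offset for v in data[col]]


data = {"temp_c": [20.0, 25.0], "rain_mm": [0.0, 3.5], "station": ["A", "B"]}
add_scalar(data, ["temp_c", "rain_mm"], {"temp_c": 10})
print(data["temp_c"], data["rain_mm"])  # [30.0, 35.0] [0.0, 3.5]
```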
Raises
TransformationApplicationError
: If the scalar variable is not correctly instantiated.
TransformationRegistryError
: If the transformation name is already in use.
TransformationRegistryError
: If the transformation name hasn't been provided and the transformation is not registered.
ValueError
: If output is False.
Method generated by attrs for class ScalarAdditionDataTransformation.
Ancestors
Variables
scalar : Union[int, float, collections.abc.Mapping[str, Union[int, float]]] (static)
Static methods
schema
def schema() -> marshmallow.schema.Schema:
Inherited from: DatasetTransformation.schema
Gets an instance of the Schema associated with this Transformation.
Raises
TypeError
: If the transformation doesn't have a TransformationSchema as the schema.
ScalarMultiplicationDataTransformation
class ScalarMultiplicationDataTransformation(*, name: str = None, output: bool = True, cols: Union[str, list[str]] = 'all', scalar: Union[int, float, Mapping[str, Union[int, float]]] = 1):
Dataset transformation that multiplies the specified columns by a scalar.
The transformation is applied to the dataset in place and only affects continuous columns.
Arguments
cols
: The columns to act on as a list of strings. Defaults to "all", which acts on all columns in the dataset.
name
: The name of the transformation. If not provided, a unique name will be generated from the class name.
output
: Whether or not this transformation should be included in the final output. This must be True for all dataset transformations. Defaults to True.
scalar
: The scalar to multiply by. It can be provided as a number, in which case all numerical columns are multiplied by it, or as a dictionary mapping column names to the scalars to multiply them by. Defaults to 1.
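A minimal pure-Python sketch of the multiplication semantics, using a per-column mapping. The dict-of-lists dataset and `multiply_by_scalar` are hypothetical, TypeError stands in for the library's TransformationApplicationError, and leaving unmapped columns unchanged is an assumption.

```python
from collections.abc import Mapping
from typing import Union

Number = Union[int, float]


def multiply_by_scalar(
    data: dict[str, list],
    continuous_cols: list[str],
    scalar: Union[Number, Mapping[str, Number]] = 1,
) -> None:
    """Multiply the continuous columns of `data` by `scalar`, modifying `data` in place."""
    if isinstance(scalar, Mapping):
        # Per-column factors; unmapped or non-continuous columns are skipped (assumed).
        factors = {c: f for c, f in scalar.items() if c in continuous_cols}
    elif isinstance(scalar, (int, float)) and not isinstance(scalar, bool):
        factors = dict.fromkeys(continuous_cols, scalar)  # same factor for every column
    else:
        # Stand-in for the library's TransformationApplicationError.
        raise TypeError(f"scalar must be a number or a mapping, got {scalar!r}")
    for col, factor in factors.items():
        data[col] = [v * factor for v in data[col]]


data = {"height_m": [1.5, 2.0], "weight_kg": [70.0, 80.0]}
multiply_by_scalar(data, ["height_m", "weight_kg"], {"height_m": 100})
print(data)  # {'height_m': [150.0, 200.0], 'weight_kg': [70.0, 80.0]}
```

With the default scalar of 1, every value is left unchanged.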
Raises
TransformationApplicationError
: If the scalar variable is not correctly instantiated.
TransformationRegistryError
: If the transformation name is already in use.
TransformationRegistryError
: If the transformation name hasn't been provided and the transformation is not registered.
ValueError
: If output is False.
Method generated by attrs for class ScalarMultiplicationDataTransformation.
Ancestors
Variables
scalar : Union[int, float, collections.abc.Mapping[str, Union[int, float]]] (static)
Static methods
schema
def schema() -> marshmallow.schema.Schema:
Inherited from: DatasetTransformation.schema
Gets an instance of the Schema associated with this Transformation.
Raises
TypeError
: If the transformation doesn't have a TransformationSchema as the schema.