base
Base interface for data persistence implementations.
Classes
BulkResult
class BulkResult(file_name_column: str, cached: Optional[pd.DataFrame], misses: list[Path]):
Container for the results of a bulk_get call.
Variables
- static cached : Optional[pd.DataFrame]
- static file_name_column : str
- static misses : list[Path]
data
- Ordered DataFrame with cached data, excluding the file names.
hits
- Ordered Series of file name hits, possibly including duplicates.
Methods
get_cached_by_filename
def get_cached_by_filename(self, file_name: str) -> Optional[pd.DataFrame]:
Returns a DataFrame with the cached data for a single file.
May contain multiple rows (e.g. for e2e files that contain several images).
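To make the relationship between cached, data, hits and get_cached_by_filename concrete, here is a minimal sketch. BulkResultSketch is a hypothetical stand-in written for illustration, not the library's implementation, and the lookup by equality on the file name column is an assumption.

```python
from pathlib import Path
from typing import Optional

import pandas as pd


class BulkResultSketch:
    """Hypothetical stand-in for BulkResult; not the library's own code."""

    def __init__(
        self,
        file_name_column: str,
        cached: Optional[pd.DataFrame],
        misses: list[Path],
    ) -> None:
        self.file_name_column = file_name_column
        self.cached = cached
        self.misses = misses

    @property
    def hits(self) -> pd.Series:
        # File names of the cached rows, in order, duplicates included.
        if self.cached is None:
            return pd.Series(dtype=object)
        return self.cached[self.file_name_column]

    @property
    def data(self) -> pd.DataFrame:
        # Cached rows without the file name column.
        if self.cached is None:
            return pd.DataFrame()
        return self.cached.drop(columns=[self.file_name_column])

    def get_cached_by_filename(self, file_name: str) -> Optional[pd.DataFrame]:
        # Assumption: a file with no cached rows yields None.
        if self.cached is None:
            return None
        rows = self.cached[self.cached[self.file_name_column] == file_name]
        return rows if not rows.empty else None


# Example data: one e2e file contributes two rows, one PNG contributes one.
cached = pd.DataFrame(
    {
        "_original_filename": ["scan.e2e", "scan.e2e", "eye.png"],
        "width": [512, 512, 256],
    }
)
result = BulkResultSketch("_original_filename", cached, [Path("missing.png")])
e2e_rows = result.get_cached_by_filename("scan.e2e")
```

In this sketch a file that produced several images simply appears as several rows sharing the same file name, which is why hits can contain duplicates.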
DataPersister
class DataPersister(file_name_column: str, lock: Optional[_Lock] = None, bulk_partition_size: Optional[int] = None):
Abstract interface for data persistence/caching implementations.
Static methods
prep_data_for_caching
def prep_data_for_caching(data: pd.DataFrame, image_cols: Optional[Collection[str]] = None) -> pd.DataFrame:
Prepares data for caching.
This removes or replaces content that should not be cached or that is pointless to cache, such as image data, or file paths that only matter while the files are actually in use.
Does not mutate input dataframe.
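As an illustration of the non-mutating behaviour, the sketch below drops image columns and returns a new dataframe. prep_data_for_caching_sketch is a hypothetical helper; the real method may also replace values or handle file path columns, which this sketch does not attempt.

```python
from typing import Collection, Optional

import pandas as pd


def prep_data_for_caching_sketch(
    data: pd.DataFrame,
    image_cols: Optional[Collection[str]] = None,
) -> pd.DataFrame:
    """Hypothetical sketch: drop image columns, leaving the input untouched."""
    # Only drop columns that are actually present.
    to_drop = [col for col in (image_cols or []) if col in data.columns]
    # DataFrame.drop returns a new object, so `data` is not modified.
    return data.drop(columns=to_drop)


raw = pd.DataFrame({"pixels": [b"\x00\x01"], "width": [2]})
prepared = prep_data_for_caching_sketch(raw, image_cols=["pixels"])
```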
Methods
bulk_get
def bulk_get(self, files: list[Union[str, Path]]) -> BulkResult:
Get the persisted data for several files.
Returns only misses if no data has been persisted, if the persisted data is out of date, or if an error was otherwise encountered.
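The split into hits and misses can be sketched with a plain dictionary as the backing store. bulk_get_sketch and the dict store are assumptions made for illustration; a real implementation would also have to handle stale entries and errors, and would return a BulkResult rather than a tuple.

```python
from pathlib import Path
from typing import Optional, Union

import pandas as pd


def bulk_get_sketch(
    store: dict[str, pd.DataFrame],
    files: list[Union[str, Path]],
    file_name_column: str,
) -> tuple[Optional[pd.DataFrame], list[Path]]:
    """Hypothetical sketch: look up each file, collecting hits and misses."""
    frames: list[pd.DataFrame] = []
    misses: list[Path] = []
    for file in files:
        key = str(file)
        if key in store:
            # Tag the cached rows with the file they belong to.
            frames.append(store[key].assign(**{file_name_column: key}))
        else:
            misses.append(Path(file))
    cached = pd.concat(frames, ignore_index=True) if frames else None
    return cached, misses


store = {"a.png": pd.DataFrame({"width": [256]})}
cached, misses = bulk_get_sketch(store, ["a.png", Path("b.png")], "_original_filename")
nothing_cached, all_misses = bulk_get_sketch({}, ["a.png"], "_original_filename")
```

When nothing is found, the cached part is None and every requested file is reported as a miss, mirroring the "returns only misses" case described above.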
bulk_set
def bulk_set(self, data: pd.DataFrame, original_file_col: str = '_original_filename') -> None:
Set multiple cache entries from a dataframe.
The dataframe must indicate the original file that each row is associated with. This is the _original_filename column by default.
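One way to picture bulk_set is as a group-by on the original file column followed by one write per file. The sketch below assumes a dictionary store and a hypothetical bulk_set_sketch function; whether the real method drops the file column before storing, or raises when the column is missing, is not stated in the source.

```python
import pandas as pd


def bulk_set_sketch(
    store: dict[str, pd.DataFrame],
    data: pd.DataFrame,
    original_file_col: str = "_original_filename",
) -> None:
    """Hypothetical sketch: store the rows of each file under its file name."""
    if original_file_col not in data.columns:
        # Assumption: a missing file column is treated as an error.
        raise ValueError(f"Column {original_file_col!r} not found in data")
    for file_name, rows in data.groupby(original_file_col, sort=False):
        # Keep only the rows for this file and drop the bookkeeping column.
        store[str(file_name)] = rows.drop(columns=[original_file_col]).reset_index(drop=True)


store: dict[str, pd.DataFrame] = {}
batch = pd.DataFrame(
    {
        "_original_filename": ["scan.e2e", "scan.e2e", "eye.png"],
        "width": [512, 512, 256],
    }
)
bulk_set_sketch(store, batch)
```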
get
def get(self, file: Union[str, Path]) -> Optional[pd.DataFrame]:
Get the persisted data for a given file.
Returns None if no data has been persisted, if the persisted data is out of date, or if an error was otherwise encountered.
set
def set(self, file: Union[str, Path], data: pd.DataFrame) -> None:
Set the persisted data for a given file.
If data is already set for the file, it will be overwritten.
The data should contain only the data related to that file.
unset
def unset(self, file: Union[str, Path]) -> None:
Deletes the persisted data for a given file.
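The get/set/unset contract can be illustrated with a small in-memory cache. InMemoryPersisterSketch is a hypothetical class that does not subclass the real DataPersister; it ignores locking, staleness checks and bulk partitioning, and silently ignoring an unset on a file that was never cached is an assumption.

```python
from pathlib import Path
from typing import Optional, Union

import pandas as pd


class InMemoryPersisterSketch:
    """Hypothetical dict-backed cache mirroring the get/set/unset methods."""

    def __init__(self) -> None:
        self._store: dict[str, pd.DataFrame] = {}

    def get(self, file: Union[str, Path]) -> Optional[pd.DataFrame]:
        # A miss is reported as None rather than by raising.
        cached = self._store.get(str(file))
        return None if cached is None else cached.copy()

    def set(self, file: Union[str, Path], data: pd.DataFrame) -> None:
        # Any existing entry for the file is overwritten.
        self._store[str(file)] = data.copy()

    def unset(self, file: Union[str, Path]) -> None:
        # Assumption: removing a file that was never cached is a no-op.
        self._store.pop(str(file), None)


persister = InMemoryPersisterSketch()
persister.set(Path("eye.png"), pd.DataFrame({"width": [256]}))
persister.set("eye.png", pd.DataFrame({"width": [512]}))  # overwrites the first entry
after_set = persister.get("eye.png")
persister.unset("eye.png")
after_unset = persister.get("eye.png")
```

Normalising both str and Path inputs to a string key keeps the two spellings of the same file pointing at one cache entry.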