sqlite
A data persistence implementation backed by an SQLite database.
Classes
CacheInfoTableBase
class CacheInfoTableBase():
Cache information entry ORM.
Represents the table in the database that corresponds to cache validity
information. In particular, stores the primary key of the cache, file,
which is the canonical path of the file in question, and the time the cache
was last updated for that file.
This is a mix-in designed to be used with the EntityName pattern: https://github.com/sqlalchemy/sqlalchemy/wiki/EntityName
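The docstring above does not say how a path is made canonical. As a minimal sketch, assuming that "canonical" means an absolute path with symlinks and ".." segments resolved, a key for the file column could be derived as follows. The helper name canonical_path is hypothetical and not part of this module.

```python
from pathlib import Path
from typing import Union


def canonical_path(file: Union[str, Path]) -> str:
    """Return an absolute, symlink-resolved string form of ``file``.

    Hypothetical helper: the persister's actual canonicalisation may differ.
    """
    return str(Path(file).expanduser().resolve())


# Different spellings of the same location map to the same key.
a = canonical_path("data/../data/scan.csv")
b = canonical_path(Path("data") / "scan.csv")
```

Using one normalised string per file means that a str and a Path argument referring to the same file would hit the same cache row.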
Variables
- static cache_updated_at : sqlalchemy.orm.base.Mapped[datetime.datetime]
- static data
- static file : sqlalchemy.orm.base.Mapped[str]
DataTableBase
class DataTableBase():
Cached data entry ORM.
The specific structure of this table will depend on the data being stored in
it (hence why deferred reflection is used); the table is initialised at the
first set() call and its schema determined at that point.
Some things are consistent though; the data must have:
- an integer primary key column (data_cache_id)
- a column of text called _source_canonical_path (which stores a canonical
filepath) that has a foreign key constraint on the cache info table.
This is a mix-in designed to be used with the EntityName pattern: https://github.com/sqlalchemy/sqlalchemy/wiki/EntityName
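To make the two fixed columns concrete, here is a rough sketch of an equivalent schema written directly against the standard-library sqlite3 module rather than through the ORM. The table names cache_info and data_cache and the value column are assumptions for illustration only; the real tables are created by the persister and their data columns depend on what is cached.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Cache info table: one row per cached file (table name is hypothetical).
conn.execute(
    """
    CREATE TABLE cache_info (
        file TEXT PRIMARY KEY,
        cache_updated_at TIMESTAMP NOT NULL
    )
    """
)

# Data table: the two required columns plus an illustrative data column.
conn.execute(
    """
    CREATE TABLE data_cache (
        data_cache_id INTEGER PRIMARY KEY,
        _source_canonical_path TEXT NOT NULL REFERENCES cache_info (file),
        value REAL
    )
    """
)

insert_row = "INSERT INTO data_cache (_source_canonical_path, value) VALUES (?, ?)"
conn.execute(
    "INSERT INTO cache_info VALUES (?, ?)", ("/data/a.csv", "2024-01-01 00:00:00")
)
conn.execute(insert_row, ("/data/a.csv", 1.5))

# A data row whose path has no cache info entry violates the foreign key.
try:
    conn.execute(insert_row, ("/data/missing.csv", 2.0))
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

row_count = conn.execute("SELECT COUNT(*) FROM data_cache").fetchone()[0]
```

The foreign key ties every cached row back to a cache info entry, so data rows cannot exist for a file that has no validity record.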
SQLiteDataPersister
class SQLiteDataPersister(sqlite_path: Path, *args: Any, **kwargs: Any):
A data caching implementation that uses an SQLite database.
Ancestors
Variables
db_prepped : bool
- Whether the database has been fully initialised.
Static methods
prep_data_for_caching
def prep_data_for_caching(data: pd.DataFrame, image_cols: Optional[Collection[str]] = None) -> pd.DataFrame:
Inherited from: DataPersister.prep_data_for_caching
Prepares data ready for caching.
This involves removing/replacing things that aren't supposed to be cached or that it makes no sense to cache, such as image data or file paths that won't be relevant except for when the files are actually being used.
Does not mutate input dataframe.
Methods
bulk_get
def bulk_get(self, files: list[Union[str, Path]]) -> BulkResult:
Inherited from:
Get the persisted data for several files.
Returns only misses if no data has been persisted, if it is out of date, or an error was otherwise encountered.
bulk_set
def bulk_set(self, data: pd.DataFrame, original_file_col: str = '_original_filename') -> None:
Inherited from:
Bulk set a bunch of cache entries from a dataframe.
The dataframe must indicate the original file that each row is associated
with. This is the _original_filename column by default.
get
def get(self, file: Union[str, Path]) -> Optional[pd.DataFrame]:
Inherited from:
Get the persisted data for a given file.
Returns None if no data has been persisted, if it is out of date, or an error was otherwise encountered.
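The documentation does not specify how "out of date" is decided. One plausible rule, sketched here purely as an assumption, is to compare the stored cache_updated_at timestamp with the file's last-modified time on disk. The function is_cache_fresh is hypothetical and not part of this module.

```python
import os
import tempfile
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_cache_fresh(file_path: str, cache_updated_at: Optional[datetime]) -> bool:
    """Assumed rule: fresh if the cache was updated at or after the file's mtime."""
    if cache_updated_at is None:
        # No cache info row exists for this file.
        return False
    try:
        modified_at = datetime.fromtimestamp(
            os.path.getmtime(file_path), tz=timezone.utc
        )
    except OSError:
        # File is missing or unreadable: treat as a miss rather than raising.
        return False
    return cache_updated_at >= modified_at


# Demonstrate with a temporary file that has just been written.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x")
    path = tmp.name

now = datetime.now(timezone.utc)
fresh = is_cache_fresh(path, now + timedelta(hours=1))
stale = is_cache_fresh(path, now - timedelta(hours=1))
never_cached = is_cache_fresh(path, None)
os.remove(path)
```

Returning False on any error mirrors the documented behaviour of get(), which reports a miss instead of propagating the failure.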
set
def set(self, file: Union[str, Path], data: pd.DataFrame) -> None:
Inherited from:
Set the persisted data for a given file.
If existing data is already set, it will be overwritten.
The data should only be the data that is related to that file.
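The overwrite behaviour can be pictured as "delete the rows for this file, then insert the new ones" inside a single transaction. The sketch below uses plain sqlite3 with a hypothetical data_cache table and value column; it is not the persister's actual implementation, which works on dataframes through the ORM.

```python
import sqlite3
from typing import Iterable


def replace_rows(conn: sqlite3.Connection, file: str, values: Iterable[float]) -> None:
    """Replace all rows stored for ``file`` with ``values`` (hypothetical helper)."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "DELETE FROM data_cache WHERE _source_canonical_path = ?", (file,)
        )
        conn.executemany(
            "INSERT INTO data_cache (_source_canonical_path, value) VALUES (?, ?)",
            [(file, value) for value in values],
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE data_cache (
        data_cache_id INTEGER PRIMARY KEY,
        _source_canonical_path TEXT NOT NULL,
        value REAL
    )
    """
)

replace_rows(conn, "/data/a.csv", [1.0, 2.0, 3.0])
replace_rows(conn, "/data/b.csv", [9.0])
replace_rows(conn, "/data/a.csv", [4.0])  # overwrites the three earlier rows

rows_a = [
    row[0]
    for row in conn.execute(
        "SELECT value FROM data_cache WHERE _source_canonical_path = ? "
        "ORDER BY data_cache_id",
        ("/data/a.csv",),
    )
]
total_rows = conn.execute("SELECT COUNT(*) FROM data_cache").fetchone()[0]
```

Scoping the delete to one path is why the data passed to set() should contain only rows related to that file: rows for other files are left untouched.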
unset
def unset(self, file: Union[str, Path]) -> None:
Deletes the persisted data for the given file.