dataloaders
HuggingFace compatible dataloaders.
Classes
HuggingFaceBitfountDataLoader
class HuggingFaceBitfountDataLoader( dataset: Union[_HuggingFaceDataset, _IterableHuggingFaceDataset], batch_size: int = 1, shuffle: bool = False,):
Wraps a PyTorch DataLoader with bitfount functions.
Arguments
batch_size
: The batch size for the dataloader. Defaults to 1.dataset
: An pytorch compatible dataset.shuffle
: A boolean value indicating whether the values in the dataset should be shuffled. Defaults to False.
Attributes
batch_size
: The batch size for the dataloader. Defaults to 1.shuffle
: A boolean value indicating whether the values in the dataset should be shuffled. Defaults to False.
Ancestors
- bitfount.data.huggingface.dataloaders._BaseHuggingFaceBitfountDataLoader
- BitfountDataLoader
Methods
expect_key_in_iter
def expect_key_in_iter(self) ‑> bool:
Will there be a data key entry in the output from iteration?
HuggingFaceIterableBitfountDataLoader
class HuggingFaceIterableBitfountDataLoader( dataset: _IterableBitfountDataset, batch_size: int = 1, shuffle: bool = False, secure_rng: bool = False,):
Wraps a PyTorch DataLoader with bitfount functions.
Arguments
batch_size
: The batch size for the dataloader. Defaults to None.dataset
: An HuggingFace compatible dataset.
Ancestors
Variables
- static
dataset : bitfount.data.huggingface.datasets._IterableHuggingFaceDataset
-
buffer_size : int
- Number of elements to buffer.The size of the buffer is the greater of the batch size and default buffer size unless the dataset is smaller than the default buffer in which case the dataset size is used. PyTorch already ensures that the batch size is not greater than the dataset size under the hood.
Static methods
convert_input_target
def convert_input_target( batch: _DataBatchAllowingText,) ‑> list[typing.Union[torch.Tensor, collections.abc.Sequence[torch.Tensor]]]:
Convert the input and target to match the hugging face expected inputs_.
convert_input_target_key
def convert_input_target_key( batch: _DataBatchAllowingTextWithKey,) ‑> list[typing.Union[torch.Tensor, collections.abc.Sequence[torch.Tensor], collections.abc.Sequence[str]]]:
Convert the input and target to match the hugging face expected inputs_.
Ensures that the data key is also passed through to the output.
Methods
expect_key_in_iter
def expect_key_in_iter(self) ‑> bool:
Will there be a data key entry in the output from iteration?