dataloaders

HuggingFace compatible dataloaders.

Classes

HuggingFaceBitfountDataLoader

class HuggingFaceBitfountDataLoader(
    dataset: Union[_HuggingFaceDataset, _IterableHuggingFaceDataset],
    batch_size: int = 1,
    shuffle: bool = False,
):

Wraps a PyTorch DataLoader with bitfount functions.

Arguments

  • batch_size: The batch size for the dataloader. Defaults to 1.
  • dataset: A PyTorch-compatible dataset.
  • shuffle: A boolean value indicating whether the values in the dataset should be shuffled. Defaults to False.

Attributes

  • batch_size: The batch size for the dataloader. Defaults to 1.
  • shuffle: A boolean value indicating whether the values in the dataset should be shuffled. Defaults to False.

Ancestors

  • bitfount.data.huggingface.dataloaders._BaseHuggingFaceBitfountDataLoader
  • BitfountDataLoader

Methods


expect_key_in_iter

def expect_key_in_iter(self) -> bool:

Returns whether the output from iteration will contain a data key entry.
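A consumer of the dataloader might branch on this flag before reading keys out of a batch. The sketch below is illustrative only: `StandInLoader` and `collect_keys` are hypothetical names, not bitfount classes, and the assumption that the key sequence is the last element of each batch is made purely for the example.

```python
from typing import Any, Iterator, List


class StandInLoader:
    """Hypothetical stand-in for a dataloader; not a bitfount class."""

    def __init__(self, batches: List[List[Any]], with_key: bool) -> None:
        self._batches = batches
        self._with_key = with_key

    def expect_key_in_iter(self) -> bool:
        # Reports whether each batch carries a data key entry.
        return self._with_key

    def __iter__(self) -> Iterator[List[Any]]:
        return iter(self._batches)


def collect_keys(loader: StandInLoader) -> List[str]:
    """Gather data keys only when the loader says batches carry them."""
    keys: List[str] = []
    for batch in loader:
        if loader.expect_key_in_iter():
            # Assumed layout: the key sequence is the last batch element.
            keys.extend(batch[-1])
    return keys
```

Checking the flag once per batch keeps the consuming code valid for both keyed and unkeyed loaders without inspecting the batch length.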

HuggingFaceIterableBitfountDataLoader

class HuggingFaceIterableBitfountDataLoader(
    dataset: _IterableBitfountDataset,
    batch_size: int = 1,
    shuffle: bool = False,
    secure_rng: bool = False,
):

Wraps a PyTorch DataLoader with bitfount functions.

Arguments

  • batch_size: The batch size for the dataloader. Defaults to 1.
  • dataset: A HuggingFace-compatible dataset.

Variables

  • static dataset : bitfount.data.huggingface.datasets._IterableHuggingFaceDataset
  • buffer_size : int - Number of elements to buffer.

    The buffer size is the greater of the batch size and the default buffer size, unless the dataset is smaller than the default buffer, in which case the dataset size is used. PyTorch already ensures under the hood that the batch size is not greater than the dataset size.
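The buffer sizing rule above can be sketched as a small function. This illustrates the stated rule only and is not bitfount's implementation: the name `sketch_buffer_size`, its parameters, and the default of 1000 are assumptions made for this example.

```python
def sketch_buffer_size(
    batch_size: int,
    dataset_len: int,
    default_buffer_size: int = 1000,  # hypothetical default, not bitfount's value
) -> int:
    """Illustrate the buffer sizing rule described above."""
    # A dataset smaller than the default buffer caps the buffer at its own size.
    if dataset_len < default_buffer_size:
        return dataset_len
    # Otherwise use whichever is larger: the batch size or the default buffer.
    return max(batch_size, default_buffer_size)
```

Under these assumptions, a batch size of 32 on a 5000-row dataset gives a buffer of 1000, a batch size of 2048 on the same dataset gives 2048, and any batch size on a 500-row dataset gives 500.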

Static methods


convert_input_target

def convert_input_target(
    batch: _DataBatchAllowingText,
) -> list[typing.Union[torch.Tensor, collections.abc.Sequence[torch.Tensor]]]:

Convert the input and target to match the inputs expected by Hugging Face.

convert_input_target_key

def convert_input_target_key(
    batch: _DataBatchAllowingTextWithKey,
) -> list[typing.Union[torch.Tensor, collections.abc.Sequence[torch.Tensor], collections.abc.Sequence[str]]]:

Convert the input and target to match the inputs expected by Hugging Face.

Ensures that the data key is also passed through to the output.
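As a rough picture of what passing the data key through can look like, the sketch below transposes per-sample tuples into a list of inputs, targets, and keys. The per-sample `(input, target, key)` layout, the `Sample` alias, and the function name `sketch_convert_with_key` are assumptions for illustration; the real `_DataBatchAllowingTextWithKey` structure is not documented here, and plain Python lists stand in for tensors.

```python
from typing import Any, List, Sequence, Tuple

# Hypothetical batch shape: one (input, target, key) tuple per sample.
Sample = Tuple[Any, Any, str]


def sketch_convert_with_key(batch: Sequence[Sample]) -> List[Any]:
    """Transpose per-sample tuples into [inputs, targets, keys]."""
    inputs = [sample[0] for sample in batch]
    targets = [sample[1] for sample in batch]
    # The keys travel alongside the data so each row can be traced to its source.
    keys = [sample[2] for sample in batch]
    return [inputs, targets, keys]
```

The returned list has one more element than the keyless variant would, which is why callers need to know in advance whether a key entry is present.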

Methods


expect_key_in_iter

def expect_key_in_iter(self) -> bool:

Returns whether the output from iteration will contain a data key entry.