DataLoader
DataLoader(dataset: Dataset, batch_size: int = 1, shuffle: bool | None = None, sampler: Sampler | None = None, batch_sampler: Sampler | None = None, num_workers: int = 0, collate_fn: Callable[..., object] | None = None, drop_last: bool = False, timeout: float = 0.0, worker_init_fn: Callable[..., object] | None = None, multiprocessing_context: object = None, generator: object = None, prefetch_factor: int | None = None, persistent_workers: bool = False)
Combine a dataset with a sampler to provide iteration over mini-batches.
Wraps a Dataset to provide batching, optional shuffling,
parallel data loading via worker processes, and customisable
collation. Iteration yields one collated batch per step until the
underlying sampler is exhausted.
Parameters
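dataset : Dataset
    Source of samples: map-style (Dataset) or iterable-style
    (IterableDataset).
batch_size : int = 1
    Number of samples per batch. Not used when batch_sampler is
    provided.
shuffle : bool = None
    If True, the default sampler is RandomSampler; otherwise
    SequentialSampler. Mutually exclusive with sampler.
sampler : Sampler = None
    Custom sampler yielding dataset indices. Mutually exclusive with
    shuffle.
batch_sampler : Sampler = None
    Sampler yielding whole batches of indices. Mutually exclusive with
    batch_size / shuffle / sampler / drop_last.
num_workers : int = 0
    Number of worker processes. 0 runs single-process in the main
    thread; > 0 spawns a worker pool.
collate_fn : callable = None
    Function that merges a list of samples into a batch (defaults to
    default_collate).
drop_last : bool = False
    If True, drop the trailing batch when the dataset length is not
    divisible by batch_size.
timeout : float = 0.0
    Maximum time, in seconds, to wait for a batch before raising
    RuntimeError. 0 blocks indefinitely.
worker_init_fn : callable = None
    If given, called as worker_init_fn(worker_id) at the start of each
    worker process; useful for per-worker RNG seeding.
prefetch_factor : int = None
    Number of batches loaded in advance (defaults to 2 when
    num_workers > 0). Higher values trade memory for throughput.
persistent_workers : bool = False
    If True, keep the worker pool alive between epochs. Requires
    num_workers > 0.
pin_memory : bool = False
generator : optional = None
    Random number generator used by RandomSampler.
Notes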
Worker processes communicate via multiprocessing.Queue: each
worker owns one index queue, all workers share a single result
queue, and the main process reorders results back into sampler
order before yielding. Sequence numbers ensure deterministic
delivery regardless of completion order across workers.
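The reordering step can be illustrated with a small reorder buffer. This is a minimal sketch, not the loader's actual code: the hypothetical `in_order` generator takes `(sequence_number, batch)` pairs in whatever order workers finish and yields batches in sampler order, parking early arrivals in a dict until the next expected sequence number shows up. A plain iterable stands in for the shared result queue.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def in_order(results: Iterable[tuple[int, T]]) -> Iterator[T]:
    """Yield payloads in ascending sequence-number order, starting at 0."""
    pending: dict[int, T] = {}  # results that arrived ahead of their turn
    next_seq = 0
    for seq, batch in results:
        pending[seq] = batch
        # Flush every consecutive result that is now ready.
        while next_seq in pending:
            yield pending.pop(next_seq)
            next_seq += 1
    if pending:
        # Some earlier sequence number never arrived.
        raise RuntimeError(f"missing sequence number {next_seq}")
```

For arrivals `[(2, "c"), (0, "a"), (3, "d"), (1, "b")]`, the generator yields `"a"` as soon as sequence 0 arrives, then `"b"`, `"c"`, `"d"` together once sequence 1 fills the gap. Memory held by the buffer grows with how far ahead the fastest worker runs, which is one reason to bound in-flight work per worker.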
Examples
>>> dl = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)
>>> for batch in dl:
...     ...
Methods
__init__
__init__(dataset: Dataset, batch_size: int = 1, shuffle: bool | None = None, sampler: Sampler | None = None, batch_sampler: Sampler | None = None, num_workers: int = 0, collate_fn: Callable[..., object] | None = None, drop_last: bool = False, timeout: float = 0.0, worker_init_fn: Callable[..., object] | None = None, multiprocessing_context: object = None, generator: object = None, prefetch_factor: int | None = None, persistent_workers: bool = False) -> None
Configure a DataLoader; see the class docstring for parameter
semantics.
Raises
ValueError
    If incompatible arguments are combined (see Notes).
Notes
The arguments sampler / batch_sampler / shuffle / batch_size /
drop_last interact: passing batch_sampler precludes the
other four, and passing sampler precludes shuffle. When no
sampler is supplied, a SequentialSampler (shuffle=False)
or RandomSampler (shuffle=True) is constructed
automatically. persistent_workers requires num_workers > 0.
__iter__
__iter__() -> Iterator[Tensor | tuple[Tensor, ...]]
Yield collated mini-batches for one full pass over the dataset.
Dispatches to the single-process iterator when num_workers == 0 and to
the multi-process iterator otherwise. When persistent_workers is
enabled, the multi-process worker pool survives between epochs;
otherwise workers are spawned and joined on each call.
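The dispatch and pool lifetime described above can be sketched with a toy loader. This is an illustration under loud assumptions, not the library's code: `TinyLoader` is hypothetical, it takes precomputed index batches plus a `fetch` function that loads and collates one batch, and a `concurrent.futures.ThreadPoolExecutor` stands in for the worker processes (the real loader uses multiprocessing queues).

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterator, Sequence


class TinyLoader:
    """Hypothetical toy loader showing __iter__ dispatch and pool reuse.

    Threads stand in for worker processes in this sketch.
    """

    def __init__(
        self,
        batches: Sequence[list[int]],
        fetch: Callable[[list[int]], object],
        num_workers: int = 0,
        persistent_workers: bool = False,
    ) -> None:
        if persistent_workers and num_workers <= 0:
            raise ValueError("persistent_workers requires num_workers > 0")
        self.batches = batches  # index batches for one epoch
        self.fetch = fetch  # loads and collates one index batch
        self.num_workers = num_workers
        self.persistent_workers = persistent_workers
        self._pool: Executor | None = None
        self.pools_created = 0  # bookkeeping for the demo only

    def _get_pool(self) -> Executor:
        # Create the pool lazily; reuse it if it is still alive.
        if self._pool is None:
            self._pool = ThreadPoolExecutor(max_workers=self.num_workers)
            self.pools_created += 1
        return self._pool

    def __iter__(self) -> Iterator[object]:
        if self.num_workers == 0:
            # Single-process path: fetch each batch in the calling thread.
            for idx in self.batches:
                yield self.fetch(idx)
            return
        pool = self._get_pool()
        try:
            # Executor.map returns results in submission (sampler) order.
            yield from pool.map(self.fetch, self.batches)
        finally:
            if not self.persistent_workers:
                # Per-epoch pool: shut it down; the next epoch starts fresh.
                pool.shutdown(wait=True)
                self._pool = None

    def close(self) -> None:
        # Release a persistent pool once training is finished.
        if self._pool is not None:
            self._pool.shutdown(wait=True)
            self._pool = None
```

Iterating twice over a loader with `num_workers=2` creates two pools; with `persistent_workers=True` it creates one and reuses it. Note that `Executor.map` submits every batch up front, whereas a real loader bounds in-flight work (compare prefetch_factor).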
__len__
__len__() -> int
Return the number of batches per epoch (len(batch_sampler)).
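For N samples batched by a default batch sampler with batch size B, the length follows from drop_last: floor(N / B) batches when the trailing partial batch is dropped, ceil(N / B) otherwise. A short sketch with a hypothetical helper `num_batches`, assuming the default batching over a dataset of known length (a custom batch_sampler defines its own length):

```python
def num_batches(n_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Batches per epoch for default batching of n_samples items."""
    if drop_last:
        return n_samples // batch_size  # floor: partial batch dropped
    return (n_samples + batch_size - 1) // batch_size  # ceil: partial batch kept
```

For example, 10 samples with batch_size=3 give 4 batches (sizes 3, 3, 3, 1), or 3 batches with drop_last=True; when N is divisible by B, both settings agree.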