data

WorkerInfo

WorkerInfo(id: int, num_workers: int, seed: int, dataset: Dataset)

Per-worker context published inside a DataLoader worker process.

Returned by get_worker_info when called from within a worker's __getitem__ / __iter__ / worker_init_fn. Lets user code shard work, seed RNGs differently per worker, or open per-worker file handles.

Parameters

id : int

The worker's integer index in [0, num_workers).

num_workers : int

Total number of worker processes for this DataLoader.

seed : int

The per-worker random seed (typically base_seed + id). The loader seeds Python random and (when available) numpy with this value before invoking worker_init_fn.
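The seeding step described above can be sketched as follows. This is a minimal illustration, not the loader's actual code: the helper name seed_worker and the base_seed argument are hypothetical, and the derivation base_seed + id is the "typical" case mentioned in the text.

```python
import random


def seed_worker(base_seed: int, worker_id: int) -> int:
    """Hypothetical helper: derive a per-worker seed and seed the RNGs with it."""
    seed = base_seed + worker_id  # assumed derivation ("typically base_seed + id")
    random.seed(seed)
    try:
        import numpy as np  # optional dependency, seeded only when available
    except ImportError:
        pass
    else:
        # numpy's legacy global seed must fit in an unsigned 32-bit integer
        np.random.seed(seed % 2**32)
    return seed
```

Because each worker receives a different seed, two workers that call random.random() first thing will draw different values, while re-running the same worker with the same base seed reproduces its stream.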

dataset : Dataset

The dataset copy owned by this worker. Because workers are started with spawn, each one receives its own deep-copied instance, so mutations made in one worker are not visible to the others.
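The isolation between worker copies can be illustrated in a single process. In this sketch copy.deepcopy stands in for the copy each spawned worker would receive, and CounterDataset is a hypothetical dataset with mutable state; a real loader would create the copies in separate processes.

```python
import copy


class CounterDataset:
    """Hypothetical dataset that counts how many items it has served."""

    def __init__(self):
        self.seen = 0

    def __getitem__(self, index):
        self.seen += 1
        return index


parent = CounterDataset()
# Stand-ins for the per-worker copies (assumption: one deep copy per worker).
worker_copies = [copy.deepcopy(parent) for _ in range(2)]

worker_copies[0][5]  # mutates only worker 0's copy
```

After this runs, only worker_copies[0].seen has changed; the other copy and the original still report zero. Any state that must be shared across workers therefore needs an explicit channel rather than attributes on the dataset.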

Notes

Use get_worker_info rather than constructing this dataclass directly. An instance you build yourself is not registered in the per-thread storage that get_worker_info reads, so it has no effect on the loader.

Examples

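>>> shard_size = 1000  # items per worker (example value)
>>> def worker_init_fn(worker_id):
...     info = get_worker_info()
...     # Shard an IterableDataset across workers
...     # (assumes the dataset exposes a writable `start` attribute):
...     info.dataset.start = info.id * shard_size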
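An alternative to contiguous shards is a strided split computed from id and num_workers. The sketch below is self-contained and does not import the library: the WorkerInfo class is a local stand-in mirroring the documented fields, shard_range is a hypothetical helper, and treating None as "not inside a worker" is an assumption about what get_worker_info returns in the main process.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass
class WorkerInfo:  # local stand-in mirroring the documented fields
    id: int
    num_workers: int
    seed: int
    dataset: Any


def shard_range(start: int, stop: int, info: Optional[WorkerInfo]) -> Iterator[int]:
    """Yield the indices in [start, stop) assigned to this worker."""
    if info is None:
        # Assumed: no worker context, so a single consumer takes everything.
        return iter(range(start, stop))
    # Strided split: worker k takes start + k, start + k + num_workers, ...
    return iter(range(start + info.id, stop, info.num_workers))
```

Inside an IterableDataset's __iter__, one would pass the result of get_worker_info() as info. The strided form needs no shard_size and covers every index exactly once even when the range does not divide evenly among workers.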

Methods (1)

__init__(id: int, num_workers: int, seed: int, dataset: Dataset) → None

Configure a DataLoader; see the class docstring for parameter semantics.

Raises

ValueError

On any of the mutual-exclusion / range violations described in the Notes below.

Notes

sampler / batch_sampler / shuffle / batch_size / drop_last interact: passing batch_sampler precludes the other four; passing sampler precludes shuffle. When no sampler is supplied, a SequentialSampler (shuffle=False) or RandomSampler (shuffle=True) is constructed automatically. persistent_workers requires num_workers > 0.
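The interaction rules in this note can be expressed as a small validator. This is a sketch under stated assumptions, not the library's constructor: check_loader_args is a hypothetical function, None/False are used here to mean "argument not passed" (the real defaults may differ), and the returned string merely names which sampler would be chosen.

```python
from typing import Any, Optional


def check_loader_args(
    batch_size: Optional[int] = None,
    shuffle: bool = False,
    sampler: Any = None,
    batch_sampler: Any = None,
    drop_last: bool = False,
    num_workers: int = 0,
    persistent_workers: bool = False,
) -> str:
    """Hypothetical validator: raise ValueError on conflicts, else name the sampler."""
    if batch_sampler is not None and (
        batch_size is not None or shuffle or sampler is not None or drop_last
    ):
        raise ValueError(
            "batch_sampler is mutually exclusive with "
            "batch_size, shuffle, sampler and drop_last"
        )
    if sampler is not None and shuffle:
        raise ValueError("sampler is mutually exclusive with shuffle")
    if persistent_workers and num_workers <= 0:
        raise ValueError("persistent_workers requires num_workers > 0")

    if batch_sampler is not None:
        return "batch_sampler"
    if sampler is not None:
        return "sampler"
    # No sampler supplied: one is chosen automatically from `shuffle`.
    return "RandomSampler" if shuffle else "SequentialSampler"
```

For example, check_loader_args(shuffle=True) selects "RandomSampler", while check_loader_args(sampler=object(), shuffle=True) raises ValueError.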