class

DistributedSampler

extends Sampler
DistributedSampler(dataset: Dataset, num_replicas: int = 1, rank: int = 0, shuffle: bool = True, seed: int = 0, drop_last: bool = False)

Subset-and-shuffle sampler for distributed training.

Lucid is single-process and single-machine, so there is no real distributed backend to coordinate with. DistributedSampler is nonetheless part of the standard DataLoader surface, and user code routinely instantiates it even in single-rank contexts (e.g. num_replicas=1, rank=0). This implementation supports exactly that: it partitions the dataset into num_replicas slabs and yields the indices belonging to rank. With the default num_replicas=1 it degenerates to a plain sequential sampler (shuffle=False) or a seeded random sampler (shuffle=True).

A multi-process backend would require a process group and collective communication; the API surface is kept compatible in case one is added later.

Notes

The index range range(len(dataset)) is partitioned into num_replicas interleaved slabs: rank r receives every num_replicas-th index starting at r. Per-epoch shuffles are driven by random.Random(seed + epoch). Because every replica builds the same permutation from the same seed, each one takes a different slab of it and the overall ordering stays deterministic. Call set_epoch once per epoch to rotate the shuffle; forgetting to do so produces the same ordering on every pass.
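The interleaving described above can be sketched in plain Python. This is a minimal illustration rather than Lucid's implementation: slab_for_rank is a hypothetical helper, and padding and truncation are sidestepped by choosing a dataset size that divides evenly.

```python
import random


def slab_for_rank(n, num_replicas, rank, seed=0, epoch=0):
    """Hypothetical helper: shuffle range(n), then take every num_replicas-th index."""
    indices = list(range(n))
    # Every rank seeds the RNG identically, so all ranks build the same permutation.
    random.Random(seed + epoch).shuffle(indices)
    # Strided slice: rank r takes positions r, r + num_replicas, r + 2 * num_replicas, ...
    return indices[rank::num_replicas]


slabs = [slab_for_rank(12, num_replicas=4, rank=r, seed=0, epoch=3) for r in range(4)]
# The four slabs are pairwise disjoint and together cover every index exactly once.
covered = sorted(i for slab in slabs for i in slab)
```

Because the permutation is shared, disjointness falls out of the strided slice alone; no communication between replicas is needed.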

Examples

>>> sampler = DistributedSampler(my_dataset, num_replicas=4, rank=2)
>>> for epoch in range(num_epochs):
...     sampler.set_epoch(epoch)
...     for idx in sampler:
...         x = my_dataset[idx]

Methods (4)

dunder

__init__

None
__init__(dataset: Dataset, num_replicas: int = 1, rank: int = 0, shuffle: bool = True, seed: int = 0, drop_last: bool = False)

Configure the distributed sampler.

Parameters

dataset: Dataset
Dataset whose __len__ defines the index range.
num_replicas: int = 1
Number of participating replicas. The default of 1 is the degenerate single-process case.
rank: int = 0
Replica id in [0, num_replicas).
shuffle: bool = True
If True, indices are shuffled by random.Random(seed + epoch) before slabbing.
seed: int = 0
Base seed for the shuffling RNG. Combined with the epoch from set_epoch for deterministic per-epoch shuffles.
drop_last: bool = False
If True, drop the trailing remainder so every replica sees the same number of samples without padding. If False, wrap-pad the index list, reusing indices from its start, until it divides evenly among the replicas.

Raises

ValueError
If num_replicas < 1 or rank is out of range.
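
The two error conditions can be expressed as a short check. This is a sketch only: check_replica_args is a hypothetical name and the error messages are invented for illustration; only the conditions themselves come from the description above.

```python
def check_replica_args(num_replicas, rank):
    """Hypothetical validation mirroring the documented ValueError conditions."""
    if num_replicas < 1:
        raise ValueError(f"num_replicas must be >= 1, got {num_replicas}")
    # rank must fall in the half-open interval [0, num_replicas).
    if not 0 <= rank < num_replicas:
        raise ValueError(f"rank must be in [0, {num_replicas}), got {rank}")


check_replica_args(num_replicas=4, rank=2)  # valid arguments: no exception

try:
    check_replica_args(num_replicas=4, rank=4)  # rank == num_replicas is out of range
except ValueError as exc:
    message = str(exc)
```
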
fn

set_epoch

None
set_epoch(epoch: int)

Set the epoch number, which is added to seed to form the shuffling RNG seed.
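
The effect of the epoch on the ordering can be checked with the standard library alone. The sketch below assumes the permutation is produced by random.Random(seed + epoch).shuffle, as the Notes describe; epoch_order is a hypothetical helper, not part of Lucid.

```python
import random


def epoch_order(n, seed, epoch):
    """Hypothetical helper: the permutation used for a given seed and epoch."""
    indices = list(range(n))
    random.Random(seed + epoch).shuffle(indices)
    return indices


# Same seed and epoch: the ordering is reproducible.
same_epoch_matches = epoch_order(100, seed=0, epoch=0) == epoch_order(100, seed=0, epoch=0)
# Advancing the epoch changes the RNG seed, so the ordering changes as well.
next_epoch_differs = epoch_order(100, seed=0, epoch=0) != epoch_order(100, seed=0, epoch=1)
# Only the sum matters: (seed=1, epoch=0) and (seed=0, epoch=1) give the same ordering.
sum_collides = epoch_order(100, seed=1, epoch=0) == epoch_order(100, seed=0, epoch=1)
```

One consequence of summing the two values is that runs with adjacent seeds replay each other's orderings one epoch apart, which is worth keeping in mind when picking seeds for repeated experiments.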

dunder

__iter__

Iterator[int]
__iter__()

Yield this replica's slab of indices for the current epoch.

If shuffle is True, the indices are first shuffled with seed self.seed + self.epoch. The list is then brought to total_size, a multiple of num_replicas: truncated when drop_last=True, wrap-padded when drop_last=False. Finally, the replica picks every num_replicas-th index starting at rank.
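
The three steps (shuffle, resize, stride) can be sketched as a standalone function. This is an illustration under stated assumptions, not Lucid's source: iter_indices is a hypothetical stand-in, total_size is assumed to be the nearest multiple of num_replicas (rounded down when dropping, up when padding), and padding is assumed to reuse indices from the front of the list.

```python
import random


def iter_indices(n, num_replicas=1, rank=0, shuffle=True, seed=0, epoch=0, drop_last=False):
    """Hypothetical stand-in for __iter__, returning a list instead of an iterator."""
    indices = list(range(n))
    if shuffle:
        random.Random(seed + epoch).shuffle(indices)

    if drop_last:
        # Truncate to the largest multiple of num_replicas that fits.
        total_size = (n // num_replicas) * num_replicas
        indices = indices[:total_size]
    else:
        # Wrap-pad: reuse indices from the front until the length divides evenly.
        total_size = -(-n // num_replicas) * num_replicas  # ceiling division
        while indices and len(indices) < total_size:
            indices += indices[: total_size - len(indices)]

    # Strided pick: every num_replicas-th index starting at rank.
    return indices[rank::num_replicas]


padded = iter_indices(10, num_replicas=4, rank=2, shuffle=False)
truncated = iter_indices(10, num_replicas=4, rank=2, shuffle=False, drop_last=True)
```

With 10 indices and 4 replicas, rank 2 receives [2, 6, 0] when padding (index 0 is reused) and [2, 6] when truncating.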

dunder

__len__

int
__len__()

Return the per-replica sample count, which is the same on every rank.
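
The count can be worked out from the dataset length alone. The sketch below assumes that padding rounds the length up to a multiple of num_replicas and that drop_last rounds it down; samples_per_replica is a hypothetical helper, not a Lucid function.

```python
import math


def samples_per_replica(n, num_replicas, drop_last=False):
    """Hypothetical helper: per-replica sample count for a dataset of length n."""
    if drop_last:
        return n // num_replicas        # the remainder is discarded
    return math.ceil(n / num_replicas)  # the remainder is padded up to a full round


counts = {
    "pad": samples_per_replica(10, 4),                   # 12 padded indices / 4 replicas
    "drop": samples_per_replica(10, 4, drop_last=True),  # 8 kept indices / 4 replicas
    "even": samples_per_replica(12, 4),                  # divides evenly, no adjustment
}
```
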