DistributedSampler
Sampler
DistributedSampler(dataset: Dataset, num_replicas: int = 1, rank: int = 0, shuffle: bool = True, seed: int = 0, drop_last: bool = False)
Subset-and-shuffle sampler for distributed training.
Lucid is single-process, single-machine — there is no real distributed
backend to coordinate with — but DistributedSampler is part of the
standard DataLoader surface and user code routinely instantiates it
even in single-rank contexts (e.g. num_replicas=1, rank=0). This
implementation supports exactly that: it partitions the dataset into
num_replicas slabs and yields the indices belonging to rank.
With the default num_replicas=1 it degenerates to a plain
sequential / random sampler that respects shuffle and seed.
A multi-process backend would require a process group + collective communication; the surface stays compatible should that land later.
Notes
The index list range(len(dataset)), shuffled first when shuffle=True,
is partitioned into num_replicas interleaved slabs: rank r receives
every num_replicas-th index starting at r. Per-epoch shuffles are
driven by random.Random(seed + epoch). Because every replica derives
the same permutation from the same seed and epoch, each replica sees a
different slab while the ordering remains globally deterministic.
Call set_epoch once per epoch to rotate the shuffle —
forgetting to do so produces identical orderings each pass.
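The slabbing described above can be sketched with the standard library alone. This is a minimal illustration, not Lucid's actual implementation: replica_indices is a hypothetical helper, and it assumes the dataset length divides evenly by num_replicas so that no padding or truncation is needed.

```python
import random


def replica_indices(n, num_replicas, rank, seed=0, epoch=0, shuffle=True):
    """Hypothetical helper: return the indices one replica would visit."""
    indices = list(range(n))
    if shuffle:
        # Every replica seeds its RNG identically, so all replicas
        # compute the same permutation for a given epoch.
        random.Random(seed + epoch).shuffle(indices)
    # Interleaved slab: every num_replicas-th index starting at rank.
    return indices[rank::num_replicas]


# Four replicas over twelve samples; together the slabs cover every index once.
slabs = [replica_indices(12, 4, r, seed=0, epoch=0) for r in range(4)]
```

Calling the helper again with the same seed and epoch returns the same slab, which mirrors what happens when set_epoch is never called between passes.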
Examples
>>> sampler = DistributedSampler(my_dataset, num_replicas=4, rank=2)
>>> for epoch in range(num_epochs):
...     sampler.set_epoch(epoch)
...     for idx in sampler:
...         x = my_dataset[idx]
Methods (4)
__init__
__init__(dataset: Dataset, num_replicas: int = 1, rank: int = 0, shuffle: bool = True, seed: int = 0, drop_last: bool = False) → None
Configure the distributed sampler.
Parameters
dataset: Dataset
    Dataset whose __len__ defines the index range.
num_replicas: int = 1
    Number of replicas (default 1, the degenerate single-process case).
rank: int = 0
    Rank of this replica, in [0, num_replicas).
shuffle: bool = True
    If True, indices are shuffled by random.Random(seed + epoch) before slabbing.
seed: int = 0
    Base seed, combined with set_epoch for deterministic per-epoch shuffles.
drop_last: bool = False
    If True, drop the trailing remainder so every replica sees the same
    number of samples without padding. If False, wrap-pad the index list.
Raises
ValueError
    If num_replicas < 1 or rank is out of range.
set_epoch
set_epoch(epoch: int) → None
Set the epoch number, which affects the shuffling RNG seed.
__iter__
__iter__() → Iterator[int]
Yield this replica's slab of indices for the current epoch.
Indices are optionally shuffled with seed self.seed + self.epoch,
then either truncated to total_size (drop_last=True) or
wrap-padded (drop_last=False). The replica picks every
num_replicas-th index starting at rank.
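The truncate-or-pad step can be illustrated as follows. This is a sketch under stated assumptions rather than the library's code: pad_or_truncate is a hypothetical helper, and total_size is assumed to be the largest multiple of num_replicas not exceeding the dataset length when drop_last=True, and the smallest multiple not below it when drop_last=False.

```python
import math


def pad_or_truncate(indices, num_replicas, drop_last):
    """Hypothetical helper: make len(indices) a multiple of num_replicas."""
    n = len(indices)
    if drop_last:
        # Assumed rule: round down, discarding the trailing remainder.
        total_size = (n // num_replicas) * num_replicas
        return indices[:total_size]
    # Assumed rule: round up, reusing indices from the front as padding.
    total_size = math.ceil(n / num_replicas) * num_replicas
    padded = list(indices)
    while len(padded) < total_size:
        padded.extend(indices[: total_size - len(padded)])
    return padded


indices = list(range(10))  # unshuffled for readability
truncated = pad_or_truncate(indices, num_replicas=4, drop_last=True)
padded = pad_or_truncate(indices, num_replicas=4, drop_last=False)

# Rank 2 then takes every 4th index starting at position 2.
rank2_truncated = truncated[2::4]  # [2, 6]
rank2_padded = padded[2::4]        # [2, 6, 0]
```

With wrap-padding, a few indices near the front of the list are visited twice across the replicas; with drop_last=True, the trailing remainder is never visited in that epoch.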
__len__
__len__() → int
Return the per-replica sample count, which is the same on every rank.
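The per-replica count follows from the drop_last choice alone, which is why it does not depend on rank. The helper below, per_replica_len, is hypothetical and assumes the usual convention: floor division when drop_last=True, ceiling division otherwise.

```python
import math


def per_replica_len(n, num_replicas, drop_last=False):
    """Hypothetical helper: samples each replica yields per epoch."""
    if drop_last:
        # Remainder is dropped, so each replica gets the floor.
        return n // num_replicas
    # Index list is wrap-padded, so each replica gets the ceiling.
    return math.ceil(n / num_replicas)


# Ten samples over four replicas under both settings.
counts = {flag: per_replica_len(10, 4, drop_last=flag) for flag in (True, False)}
```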