class DistributedSampler extends Sampler

DistributedSampler(dataset: Dataset, num_replicas: int = 1, rank: int = 0, shuffle: bool = True, seed: int = 0, drop_last: bool = False)

Subset-and-shuffle sampler for distributed training.

Lucid is single-process and single-machine, so there is no real distributed backend to coordinate with. DistributedSampler is nevertheless part of the standard DataLoader surface, and user code routinely instantiates it even in single-rank contexts (e.g. num_replicas=1, rank=0). This implementation supports exactly that: it partitions the dataset into num_replicas slabs and yields the indices belonging to rank. With the default num_replicas=1 it degenerates to a plain sequential sampler (shuffle=False) or a seeded random sampler (shuffle=True).

A multi-process backend would require a process group and collective communication; the API surface is kept compatible in case one is added later.

Notes

The index range range(len(dataset)) is partitioned into num_replicas interleaved slabs: rank r receives every num_replicas-th index starting at r. Per-epoch shuffles are driven by random.Random(seed + epoch). Because every replica derives the same permutation from the same seed and epoch, each replica takes a different slab of one shared ordering, and the result is reproducible. Call set_epoch once per epoch, before iterating, to change the shuffle; otherwise every pass uses the same ordering.
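The interleaving described above can be illustrated with a minimal sketch in plain Python. The helper name replica_indices is hypothetical and is not part of Lucid; the sketch assumes the dataset length divides evenly by num_replicas, so no padding or truncation is involved.

```python
import random


def replica_indices(n, num_replicas, rank, seed=0, epoch=0, shuffle=True):
    """Hypothetical helper: the indices one rank would see (no padding handled)."""
    indices = list(range(n))
    if shuffle:
        # Every rank builds the RNG from the same seed + epoch,
        # so every rank computes the same permutation.
        random.Random(seed + epoch).shuffle(indices)
    # Interleaved slab: every num_replicas-th index, starting at rank.
    return indices[rank::num_replicas]


# Four replicas over eight samples: each rank gets two indices,
# and together the slabs cover every index exactly once.
slabs = [replica_indices(8, 4, r, seed=0, epoch=0) for r in range(4)]
```

With shuffle=False the slab for rank 2 is simply [2, 6], which makes the stride pattern easy to see.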

Examples

>>> sampler = DistributedSampler(my_dataset, num_replicas=4, rank=2)
>>> for epoch in range(num_epochs):
...     sampler.set_epoch(epoch)
...     for idx in sampler:
...         x = my_dataset[idx]

Methods (4)

__init__

→None

__init__(dataset: Dataset, num_replicas: int = 1, rank: int = 0, shuffle: bool = True, seed: int = 0, drop_last: bool = False)

Configure the distributed sampler.

Parameters

dataset: Dataset

Dataset whose __len__ defines the index range.

num_replicas: int = 1

Number of participating replicas (default 1 — degenerate single-process case).

rank: int = 0

Replica id in [0, num_replicas).

shuffle: bool = True

If True, indices are shuffled by random.Random(seed + epoch) before slabbing.

seed: int = 0

Base seed for the shuffling RNG. Combined with set_epoch for deterministic per-epoch shuffles.

drop_last: bool = False

If True, drop the trailing remainder so that every replica sees the same number of samples without padding. If False, wrap-pad the index list (repeat indices from its start) until its length divides evenly across replicas.

Raises

ValueError

If num_replicas < 1 or rank is out of range.
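As a rough illustration of the two documented failure cases, the check could look like the sketch below. The function name check_replica_args and the error messages are hypothetical; only the conditions themselves come from the description above.

```python
def check_replica_args(num_replicas: int, rank: int) -> None:
    """Hypothetical validation mirroring the documented ValueError cases."""
    if num_replicas < 1:
        raise ValueError(f"num_replicas must be >= 1, got {num_replicas}")
    # rank must lie in the half-open interval [0, num_replicas).
    if not 0 <= rank < num_replicas:
        raise ValueError(f"rank must be in [0, {num_replicas}), got {rank}")
```

Under this reading, rank=4 with num_replicas=4 is rejected, because valid ranks run from 0 to 3.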

set_epoch

→None

set_epoch(epoch: int)

Set the epoch number, which is added to seed to form the shuffling RNG seed.
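The effect of the epoch on the shuffle can be seen in a small stand-in that keeps only the seed and epoch logic. The class EpochShuffle is hypothetical and exists only for illustration; the one detail taken from this page is that the RNG is built as random.Random(seed + epoch).

```python
import random


class EpochShuffle:
    """Hypothetical stand-in for the seed/epoch part of the sampler."""

    def __init__(self, n, seed=0):
        self.n = n
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        self.epoch = epoch

    def order(self):
        indices = list(range(self.n))
        # A fresh RNG per call: the order depends only on seed + epoch.
        random.Random(self.seed + self.epoch).shuffle(indices)
        return indices


s = EpochShuffle(100)
first = s.order()
repeat = s.order()      # same epoch, so the same order as `first`
s.set_epoch(1)
second = s.order()      # new epoch, so a new permutation
```

One consequence of adding the two values is that seed=0 at epoch 1 and seed=1 at epoch 0 produce the same ordering, so runs meant to be independent should use seeds spaced further apart than the number of epochs.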

__iter__

→Iterator[int]

__iter__()

Yield this replica's slab of indices for the current epoch.

Indices are optionally shuffled with seed self.seed + self.epoch, then either truncated to total_size (drop_last=True) or wrap-padded (drop_last=False). The replica picks every num_replicas-th index starting at rank.
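The three steps above (shuffle, pad or truncate, stride) can be sketched as a standalone function. This is an illustration, not Lucid's source: the name iter_replica is hypothetical, and the definition of total_size used here (the dataset length rounded down to a multiple of num_replicas when drop_last=True, rounded up otherwise) is an assumption borrowed from the common PyTorch-style convention.

```python
import math
import random
from typing import Iterator


def iter_replica(n: int, num_replicas: int, rank: int, *, shuffle: bool = True,
                 seed: int = 0, epoch: int = 0,
                 drop_last: bool = False) -> Iterator[int]:
    """Hypothetical sketch of the shuffle / pad-or-truncate / stride pipeline."""
    indices = list(range(n))
    if shuffle:
        random.Random(seed + epoch).shuffle(indices)

    if drop_last:
        # Assumed: round down to a multiple of num_replicas and truncate.
        total_size = (n // num_replicas) * num_replicas
        indices = indices[:total_size]
    else:
        # Assumed: round up to a multiple of num_replicas and wrap-pad
        # by reusing indices from the front of the list.
        total_size = math.ceil(n / num_replicas) * num_replicas
        padding = total_size - len(indices)
        if padding > 0 and indices:
            reps = math.ceil(padding / len(indices))
            indices += (indices * reps)[:padding]

    # Each replica picks every num_replicas-th index starting at rank.
    return iter(indices[rank::num_replicas])


# Ten samples over four replicas, unshuffled, to make the padding visible.
padded = [list(iter_replica(10, 4, r, shuffle=False)) for r in range(4)]
truncated = [list(iter_replica(10, 4, r, shuffle=False, drop_last=True))
             for r in range(4)]
```

In the padded case ranks 2 and 3 end with the wrapped indices 0 and 1, so those two samples are seen twice per epoch across the replicas; in the truncated case indices 8 and 9 are never visited.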

__len__

→int

__len__()

Return the per-replica sample count, which is the same on every rank.
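A small sketch of the arithmetic shows why the count does not depend on rank. The function per_replica_len is hypothetical, and the rounding rule (round down when drop_last=True, round up otherwise) is an assumption consistent with the truncate and wrap-pad behavior described for __iter__.

```python
import math


def per_replica_len(n: int, num_replicas: int, drop_last: bool = False) -> int:
    """Hypothetical per-replica count; note that rank is not an input."""
    if drop_last:
        # Remainder dropped: each replica gets the floor of the even share.
        return n // num_replicas
    # Remainder padded: each replica gets the ceiling of the even share.
    return math.ceil(n / num_replicas)


with_padding = per_replica_len(10, 4)                     # 3
without_padding = per_replica_len(10, 4, drop_last=True)  # 2
```

When the dataset length is already a multiple of num_replicas, both settings give the same count.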