random_split

random_split(dataset: Dataset, lengths: list[int] | list[float], generator: object = None) → list of Subset

Randomly split a dataset into non-overlapping Subset views.

Shuffles range(len(dataset)) and slices the result into chunks of the requested lengths, wrapping each slice in a Subset. The returned Subset objects do not copy the underlying samples; each one holds the parent dataset by reference.
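The shuffle-and-slice behaviour described above can be sketched in plain Python. This is an illustration only, not the library's implementation: IndexView and split_by_permutation are hypothetical names standing in for Subset and random_split, and only integer lengths are handled.

```python
import random


class IndexView:
    """Hypothetical stand-in for Subset: a parent reference plus a list of indices."""

    def __init__(self, parent, indices):
        self.parent = parent      # held by reference; samples are not copied
        self.indices = indices

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        # Look the sample up in the parent on every access.
        return self.parent[self.indices[i]]


def split_by_permutation(dataset, lengths, seed=None):
    """Shuffle range(len(dataset)) once, then slice it into consecutive chunks."""
    if sum(lengths) != len(dataset):
        raise ValueError("lengths must sum to len(dataset)")
    indices = list(range(len(dataset)))
    # Assumption: a private RNG when a seed is given, the global one otherwise.
    rng = random.Random(seed) if seed is not None else random
    rng.shuffle(indices)

    views, start = [], 0
    for n in lengths:
        views.append(IndexView(dataset, indices[start:start + n]))
        start += n
    return views


data = [x * x for x in range(10)]
train, val = split_by_permutation(data, [7, 3], seed=0)
```

Because every view indexes into the same parent object, the split costs only one list of integers per view, and the views never share an index.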

Parameters

dataset : Dataset

Source dataset to split.

lengths : list of int or list of float

Either absolute split sizes that sum to len(dataset), or fractions in [0, 1] that sum to 1.0 (within 1e-6). In the fractional case, rounding error is absorbed by the final split, so the resulting sizes always sum to len(dataset).

generator : optional, default None

Seed-like object forwarded to random.Random for reproducibility. If None, the global random state is used.
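The way fractional lengths can be turned into integer sizes, with the final split absorbing the rounding error, is sketched below. The helper name fractions_to_sizes is hypothetical, and rounding the earlier splits down is an assumption; the documentation only states that the last split takes up the difference.

```python
import math


def fractions_to_sizes(n, fractions):
    """Convert fractional lengths into integer sizes that sum to n."""
    # Assumption: every split except the last is rounded down.
    sizes = [math.floor(n * f) for f in fractions[:-1]]
    # The final split takes whatever remains, absorbing the rounding error.
    sizes.append(n - sum(sizes))
    return sizes


sizes = fractions_to_sizes(103, [0.8, 0.1, 0.1])
```

With 103 samples the first two splits round down to 82 and 10, and the last split receives the remaining 11, so the total is still 103.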

Returns

list of Subset

One Subset per requested split, in the same order as lengths.

Raises

ValueError

If fractional lengths do not sum to 1.0 (within 1e-6) or integer lengths do not sum to len(dataset).
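The two failure conditions can be illustrated with a small standalone check. The function check_lengths is a hypothetical helper, and treating the list as fractional whenever it contains a float is an assumption about how the two cases are told apart.

```python
def check_lengths(n, lengths, tol=1e-6):
    """Raise ValueError when lengths is inconsistent with a dataset of size n."""
    # Assumption: any float in the list selects the fractional interpretation.
    if any(isinstance(x, float) for x in lengths):
        if abs(sum(lengths) - 1.0) > tol:
            raise ValueError("fractional lengths must sum to 1.0")
    elif sum(lengths) != n:
        raise ValueError("integer lengths must sum to len(dataset)")


# Valid inputs pass silently.
check_lengths(100, [80, 10, 10])
check_lengths(100, [0.8, 0.1, 0.1])

# Invalid inputs raise; collect the messages to show both branches.
errors = []
for bad in ([0.5, 0.4], [80, 10]):
    try:
        check_lengths(100, bad)
    except ValueError as exc:
        errors.append(str(exc))
```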

Notes

The split is permutation-based: range(len(dataset)) is shuffled once and then sliced into the requested chunks. Reproducibility is obtained by seeding the global RNG via lucid.manual_seed, or by passing an explicit generator seed; the same generator state always yields the same partition.
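The reproducibility claim can be checked with the standard library alone. This sketch assumes, following the generator description, that the seed ends up in random.Random; the helper permutation is a hypothetical name.

```python
import random


def permutation(n, seed):
    """Return a shuffled copy of range(n) driven by a private random.Random(seed)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return indices


first = permutation(10, seed=42)
second = permutation(10, seed=42)

# Same seed gives the same permutation, and therefore the same slices.
same_partition = (first[:8], first[8:]) == (second[:8], second[8:])
```

Using a private random.Random instance keeps the split independent of any other code that draws from the global RNG between calls.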

Examples

>>> # X and y are assumed to hold 100 samples each
>>> full = TensorDataset(X, y)
>>> train, val, test = random_split(full, [0.8, 0.1, 0.1])
>>> len(train), len(val), len(test)
(80, 10, 10)