MNIST

class lucid.datasets.MNIST(root: str | Path, train: bool | None = True, download: bool | None = False, transform: Module | Compose | None = None, target_transform: Module | Compose | None = None, test_size: float = 0.2, to_tensor: bool = True)

The MNIST class provides access to the MNIST dataset of handwritten digits, a widely used benchmark in machine learning.

This dataset contains 70,000 grayscale images of handwritten digits (28x28 pixels) and their corresponding labels (0-9).

Class Signature

class lucid.datasets.MNIST(
    root: str | Path,
    train: bool = True,
    download: bool = False,
    transform: nn.Module | Compose | None = None,
    target_transform: nn.Module | Compose | None = None,
    test_size: float = 0.2,
    to_tensor: bool = True,
)

Parameters

  • root (str | Path): The directory where the dataset will be stored.

  • train (bool, optional): If True, loads the training split, which is the portion of the dataset not reserved by test_size (80% by default). If False, loads the test split (the test_size portion, 20% by default). Defaults to True.

  • download (bool, optional): If True, downloads the dataset from OpenML if it is not available in the specified root directory. Defaults to False.

  • transform (nn.Module | Compose | None, optional): A function or transform pipeline to apply to the images. Defaults to None.

  • target_transform (nn.Module | Compose | None, optional): A function or transform pipeline to apply to the labels. Defaults to None.

  • test_size (float, optional): The proportion of the dataset to be used as the test set. Defaults to 0.2.

  • to_tensor (bool, optional): If True, converts the data into lucid.Tensor objects. Defaults to True.
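To make the interaction between train and test_size concrete, the sketch below computes the split sizes implied by a given test_size, using the 70,000-sample total stated above. The helper name split_sizes is hypothetical and not part of lucid. Rounding to the nearest integer is an assumption, so the library's actual counts could differ by a sample.

```python
def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a fractional test split.

    Hypothetical helper for illustration only; not part of lucid.
    Rounding to the nearest integer is an assumption about the split.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    n_test = round(n_samples * test_size)
    return n_samples - n_test, n_test


# Default split of the 70,000 MNIST samples
n_train, n_test = split_sizes(70_000, 0.2)
print(n_train, n_test)  # 56000 14000
```

Under these assumptions, train=True with the default test_size would expose about 56,000 samples and train=False about 14,000.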

Attributes

  • data (lucid.Tensor): A tensor containing the images in the dataset.

  • targets (lucid.Tensor): A tensor containing the labels corresponding to the images.

Methods

  • __getitem__(index: int) -> Tuple[Tensor, Tensor]: Returns a tuple containing the image and label at the specified index.

  • __len__() -> int: Returns the total number of samples in the dataset.
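The indexing protocol described by these two methods can be sketched with a small stand-in class. ToyDataset below is hypothetical and uses plain Python lists rather than lucid.Tensor. It only illustrates how __getitem__, __len__, transform, and target_transform typically fit together, and it assumes the transforms are applied at access time, which may not match lucid's internals.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


class ToyDataset:
    """Hypothetical stand-in showing the dataset protocol; not lucid code."""

    def __init__(
        self,
        data: Sequence[Any],
        targets: Sequence[Any],
        transform: Optional[Callable[[Any], Any]] = None,
        target_transform: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        if len(data) != len(targets):
            raise ValueError("data and targets must have the same length")
        self.data = data
        self.targets = targets
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self) -> int:
        # Number of (image, label) pairs available
        return len(self.data)

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
        image, label = self.data[index], self.targets[index]
        # Transforms are applied lazily, one sample at a time (an assumption)
        if self.transform is not None:
            image = self.transform(image)
        if self.target_transform is not None:
            label = self.target_transform(label)
        return image, label


# Two fake 2x2 "images" with pixel values in [0, 255]
ds = ToyDataset(
    data=[[[0, 255], [255, 0]], [[128, 128], [0, 0]]],
    targets=[3, 7],
    transform=lambda img: [[p / 255 for p in row] for row in img],
)

print(len(ds))  # 2
image, label = ds[0]
print(image, label)  # [[0.0, 1.0], [1.0, 0.0]] 3
```

Because the class implements both __len__ and __getitem__, it can be indexed, iterated with a range loop, or wrapped by a batching utility in the same way as the MNIST object.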

Examples

Loading and Accessing Data

from lucid.datasets import MNIST

# Load the training set, downloading it if necessary
mnist_train = MNIST(root="./data", train=True, download=True)

# Get the first image and label
image, label = mnist_train[0]

print(f"Image Shape: {image.shape}, Label: {label}")

Applying Transformations

from lucid.datasets import MNIST
from lucid.transforms import Normalize

# The mean and std values are elided here; supply statistics appropriate to your data
transform = Normalize(mean=[...], std=[...])

mnist_train = MNIST(root="./data", train=True, download=True, transform=transform)
image, label = mnist_train[0]

print(f"Normalized Image: {image}")

Note

  • The dataset is stored in .npy format for efficient loading.

  • When download=True and the dataset is not already present in root, it is fetched from OpenML.
