class CrossEntropyLoss

extends Module

CrossEntropyLoss(weight: Tensor | None = None, ignore_index: int = -100, reduction: Reduction = 'mean', label_smoothing: float = 0.0)


Cross-entropy loss for multi-class classification.

This criterion combines log-softmax and negative log-likelihood in a single, numerically stable operation. For a batch of N samples, where sample n has raw logit vector $x_n$ over C classes and target class index $y_n$, the per-sample loss is:

\mathcal{L}(x_n, y_n) = -x_{n,y_n} + \log \sum_{c=1}^{C} \exp(x_{n,c})

The log-sum-exp trick is applied internally so that large logit values do not cause overflow or underflow.
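The max-shift form of log-sum-exp, $\log \sum_c \exp(x_c) = m + \log \sum_c \exp(x_c - m)$ with $m = \max_c x_c$, can be sketched in plain Python as below. This is an illustrative reference, not Lucid's internal implementation; the helper names logsumexp and cross_entropy_row are hypothetical.

```python
import math

def logsumexp(row):
    """Numerically stable log(sum(exp(row))) using the max-shift trick."""
    m = max(row)
    # After subtracting the max every exponent is <= 0, so exp() cannot overflow,
    # and the largest term is exp(0) = 1, so the sum cannot underflow to 0.
    return m + math.log(sum(math.exp(v - m) for v in row))

def cross_entropy_row(logits, target):
    """Per-sample loss: -x[target] + logsumexp(x)."""
    return logsumexp(logits) - logits[target]

big = [1000.0, 999.0, 998.0]
# A naive math.log(sum(math.exp(v) for v in big)) raises OverflowError here,
# while the shifted version returns a finite loss.
print(cross_entropy_row(big, 0))  # ≈ 0.4076
```

Because the loss depends only on differences between logits, shifting every logit by the same constant leaves it unchanged, which is exactly why subtracting the max is safe.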

Label smoothing: when label_smoothing $= \varepsilon > 0$, the hard target is softened to a mixture of the one-hot label and the uniform distribution:

\tilde{y}_{n,c} = (1-\varepsilon)\,\mathbf{1}[c = y_n] + \frac{\varepsilon}{C}

and the per-sample loss becomes:

\mathcal{L}_\varepsilon(x_n, y_n) = (1-\varepsilon)\,\mathcal{L}(x_n, y_n) + \frac{\varepsilon}{C} \sum_{c=1}^{C} \mathcal{L}(x_n, c)
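Since cross-entropy is linear in the target distribution, the closed form above should equal the cross-entropy computed directly against the smoothed target $\tilde{y}_n$. A small plain-Python check of that identity (illustrative only; the helper names are hypothetical, and per-class weights are deliberately left out):

```python
import math

def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def smoothed_target(y, C, eps):
    # (1 - eps) * one_hot(y) + eps / C
    return [(1.0 - eps) * (c == y) + eps / C for c in range(C)]

def ce_soft(row, q):
    # Cross-entropy against an arbitrary target distribution q: -sum_c q_c * log p_c
    logp = log_softmax(row)
    return -sum(qc * lp for qc, lp in zip(q, logp))

def ce_smoothed_closed_form(row, y, eps):
    # (1 - eps) * L(x, y) + eps / C * sum_c L(x, c), where L(x, c) = -log p_c
    logp = log_softmax(row)
    C = len(row)
    return (1.0 - eps) * -logp[y] + eps / C * sum(-lp for lp in logp)

x = [1.0, 2.0, 0.5]
a = ce_soft(x, smoothed_target(1, 3, 0.1))
b = ce_smoothed_closed_form(x, 1, 0.1)
print(abs(a - b) < 1e-12)  # True
```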

Parameters

weight : Tensor of shape (C,), default None

Manual rescaling weight assigned to each class. Useful for imbalanced datasets. Must be a 1-D float tensor of length C.

ignore_index : int, default -100

Target value to ignore: positions whose target equals ignore_index contribute neither to the loss nor to the gradient.

reduction : str, default 'mean'

Reduction applied to the per-sample losses: 'none' returns them unreduced, 'mean' (default) averages them, and 'sum' adds them.

label_smoothing : float, default 0.0

Smoothing parameter $\varepsilon \in [0, 1)$. The default 0.0 disables smoothing.
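How weight, ignore_index and reduction interact can be sketched as a plain-Python reference for inputs of shape (N, C) without label smoothing. One point is an assumption, not something this page states: for 'mean' the sketch divides by the summed weights of the non-ignored targets, which is the PyTorch convention; Lucid's exact normalisation may differ. The function name cross_entropy is hypothetical.

```python
import math

def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def cross_entropy(x, y, weight=None, ignore_index=-100, reduction="mean"):
    """Hypothetical reference for (N, C) logits and (N,) targets, no smoothing."""
    C = len(x[0])
    w = weight if weight is not None else [1.0] * C
    losses, norm = [], 0.0
    for row, t in zip(x, y):
        if t == ignore_index:
            losses.append(0.0)  # ignored position: no loss, no gradient
            continue
        losses.append(-w[t] * log_softmax(row)[t])  # per-class rescaling
        norm += w[t]
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    # ASSUMPTION: weighted mean over non-ignored targets (PyTorch convention).
    # A batch where every target is ignored is not handled here.
    return sum(losses) / norm

x = [[0.1, 0.9, 0.0], [2.0, 0.5, 0.1]]
print(cross_entropy(x, [1, -100]))  # only the first sample contributes
```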

Attributes

weight : Tensor or None

Per-class weight tensor, or None if not provided.

ignore_index : int

Target index excluded from loss and gradient computation.

reduction : str

The reduction mode.

label_smoothing : float

The smoothing coefficient $\varepsilon$.

Notes

Input x: $(N, C)$ or $(N, C, d_1, d_2, \ldots)$, raw unnormalised logits, with the class dimension at dim 1.
Target y: $(N,)$ or $(N, d_1, d_2, \ldots)$, integer class indices in $[0, C)$, or ignore_index for positions to skip.
Output: a scalar when reduction is 'mean' or 'sum'; $(N,)$ or $(N, d_1, \ldots)$ for 'none'.
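For the higher-dimensional case, each spatial position is an independent sample whose C logits are read along dim 1. A plain-Python sketch for shape (N, C, D) with reduction='none' (illustrative only; the helper names are hypothetical and nested lists stand in for tensors):

```python
import math

def per_sample(logits, t):
    # -x[t] + logsumexp(x), shifted by the max for stability
    m = max(logits)
    return m + math.log(sum(math.exp(v - m) for v in logits)) - logits[t]

def per_position_losses(x, y):
    """x: nested list of shape (N, C, D); y: nested list of shape (N, D)."""
    N, C, D = len(x), len(x[0]), len(x[0][0])
    out = []
    for n in range(N):
        row = []
        for d in range(D):
            # Gather the C logits at position d along the class dimension (dim 1).
            logits = [x[n][c][d] for c in range(C)]
            row.append(per_sample(logits, y[n][d]))
        out.append(row)
    return out  # shape (N, D), the same shape as the target

# N=1, C=3, D=2: two positions, each with its own target class.
x = [[[0.1, 2.0], [0.9, 0.5], [0.0, 0.1]]]
y = [[1, 0]]
print(per_position_losses(x, y))
```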

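Pass raw logits, not softmax probabilities. The criterion applies log-softmax internally, so probabilities would be normalised a second time and yield a distorted loss; computing log-softmax via log-sum-exp also avoids the overflow, and the underflow to $\log 0$, that a separate softmax followed by a log can hit.
Mathematically equivalent to applying LogSoftmax over dim=1 followed by NLLLoss, but computed in a single fused step.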
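Both points can be illustrated in plain Python: the fused log-softmax plus NLL step stays finite where a separate softmax underflows to zero, and feeding probabilities instead of logits changes the loss. This is a sketch, not Lucid's code; log_softmax, softmax and nll are hypothetical helpers.

```python
import math

def log_softmax(row):
    # Fused form: x_c - logsumexp(x), shifted by the max for stability.
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def nll(logp, t):
    # NLL step: negative log-probability of the target class.
    return -logp[t]

# 1) Fused log-softmax stays finite even for a very unlikely target class.
x, t = [0.0, -1000.0], 1
fused = nll(log_softmax(x), t)  # 1000.0
p = softmax(x)                  # p[1] underflows to exactly 0.0, so a separate
                                # math.log(p[1]) would raise ValueError

# 2) Passing probabilities instead of logits normalises twice and changes the loss.
row = [2.0, 0.5, 0.1]
correct = nll(log_softmax(row), 0)
double = nll(log_softmax(softmax(row)), 0)
print(fused, correct, double)
```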

Examples

Three-class classification with a batch of two samples:
>>> import lucid
>>> import lucid.nn as nn
>>> criterion = nn.CrossEntropyLoss()
>>> x = lucid.tensor([[0.1, 0.9, 0.0], [2.0, 0.5, 0.1]])
>>> y = lucid.tensor([1, 0])
>>> loss = criterion(x, y)  # scalar
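The scalar from the first example can be cross-checked by hand using the per-sample formula and the default 'mean' reduction with no weights. The plain-Python sketch below is independent of Lucid:

```python
import math

x = [[0.1, 0.9, 0.0], [2.0, 0.5, 0.1]]
y = [1, 0]

def per_sample(row, t):
    # -x[t] + logsumexp(x), shifted by the max for stability
    m = max(row)
    return m + math.log(sum(math.exp(v - m) for v in row)) - row[t]

losses = [per_sample(r, t) for r, t in zip(x, y)]
mean_loss = sum(losses) / len(losses)
print(round(mean_loss, 4))  # 0.4676
```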
With label smoothing and per-class weights:
>>> import lucid
>>> import lucid.nn as nn
>>> criterion = nn.CrossEntropyLoss(
...     weight=lucid.tensor([1.0, 2.0, 1.0]),
...     label_smoothing=0.1,
... )
>>> x = lucid.tensor([[1.0, 2.0, 0.5], [0.2, 0.8, 1.5]])
>>> y = lucid.tensor([1, 2])
>>> loss = criterion(x, y)


Constructors

__init__ → None

__init__(weight: Tensor | None = None, ignore_index: int = -100, reduction: Reduction = 'mean', label_smoothing: float = 0.0)


Initialise the CrossEntropyLoss module. See the class docstring for parameter semantics.

Instance methods

extra_repr → str

extra_repr()


Return a string representation of the module's configuration.

forward → Tensor

forward(x: Tensor, target: Tensor)


Compute the loss between predictions and targets.

Parameters

x : Tensor

Raw logits of shape $(N, C)$ or $(N, C, d_1, \ldots)$.

target : Tensor

Integer class indices of shape $(N,)$ or $(N, d_1, \ldots)$.

Returns

Tensor

Scalar loss when reduction is 'mean' or 'sum'; the unreduced loss tensor when reduction is 'none'.
