class

CrossEntropyLoss

extends Module
CrossEntropyLoss(weight: Tensor | None = None, ignore_index: int = -100, reduction: str = 'mean', label_smoothing: float = 0.0)

Cross-entropy loss for multi-class classification.

This criterion combines a log-softmax and a negative log-likelihood step in a single numerically stable operation. For a batch of N samples, where sample n has raw logit vector x_n and target class index y_n drawn from C classes, the per-sample loss is:

$$\mathcal{L}(x, y) = -x_{n,y_n} + \log \sum_{c=1}^{C} \exp(x_{n,c})$$

The log-sum-exp trick is applied internally so that large logit values do not cause overflow or underflow.
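
The following is a minimal pure-Python sketch of the log-sum-exp trick described above, not lucid's actual implementation. The helper names `logsumexp` and `cross_entropy` are hypothetical and chosen only for illustration. Subtracting the maximum logit before exponentiating keeps every exponent at or below zero.

```python
import math

def logsumexp(logits):
    """Stable log(sum(exp(v))) for a list of floats (hypothetical helper)."""
    m = max(logits)
    # Shift by the maximum so every exponent is <= 0 and exp() cannot overflow.
    return m + math.log(sum(math.exp(v - m) for v in logits))

def cross_entropy(logits, target):
    """Per-sample loss: -x[y] + logsumexp(x)."""
    return -logits[target] + logsumexp(logits)

# Mean loss over a batch of two samples with three classes.
batch = [[0.1, 0.9, 0.0], [2.0, 0.5, 0.1]]
targets = [1, 0]
losses = [cross_entropy(x, y) for x, y in zip(batch, targets)]
mean_loss = sum(losses) / len(losses)

# Large logits stay finite here, whereas math.exp(1000.0) on its own
# raises OverflowError.
big = cross_entropy([1000.0, 999.0, 0.0], 0)
```

The shift does not change the result, because the subtracted maximum is added back outside the logarithm.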

Label smoothing: when label_smoothing is set to $\varepsilon > 0$, the hard target is softened to a mixture of the one-hot label and the uniform distribution:

$$\tilde{y}_{n,c} = (1-\varepsilon)\,\mathbf{1}[c = y_n] + \frac{\varepsilon}{C}$$

which replaces the loss with:

$$\mathcal{L}_\varepsilon = (1-\varepsilon)\,\mathcal{L}(x,y) + \frac{\varepsilon}{C} \sum_{c=1}^{C} \mathcal{L}(x,c)$$
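
As a sanity check on the two formulas above, the sketch below computes the smoothed loss both ways for a single sample: as cross-entropy against the softened target, and as the mixture of the hard-label loss with the average loss over all classes. It is plain Python for illustration only; the function names are hypothetical and do not come from lucid.

```python
import math

def log_softmax(logits):
    """Stable log-softmax of a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]

def smoothed_loss_soft_target(logits, target, eps):
    """Cross-entropy against the softened target distribution."""
    C = len(logits)
    logp = log_softmax(logits)
    soft = [(1.0 - eps) * (1.0 if c == target else 0.0) + eps / C
            for c in range(C)]
    return -sum(t * lp for t, lp in zip(soft, logp))

def smoothed_loss_mixture(logits, target, eps):
    """(1 - eps) * L(x, y) + (eps / C) * sum over c of L(x, c)."""
    C = len(logits)
    logp = log_softmax(logits)
    hard = -logp[target]
    uniform = sum(-lp for lp in logp) / C
    return (1.0 - eps) * hard + eps * uniform

x, y, eps = [1.0, 2.0, 0.5], 1, 0.1
a = smoothed_loss_soft_target(x, y, eps)
b = smoothed_loss_mixture(x, y, eps)  # agrees with a up to rounding
```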

Parameters

weight: Tensor of shape (C,) = None
Manual rescaling weight assigned to each class. Useful for imbalanced datasets. Must be a 1-D float tensor of length C.
ignore_index: int = -100
Target value that is ignored: samples with this target contribute neither to the loss nor to the gradient. Default -100.
reduction: str = 'mean'
Reduction applied to the per-sample losses: 'none' returns them unreduced, 'mean' (default) averages them, 'sum' adds them.
label_smoothing: float = 0.0
Smoothing parameter $\varepsilon \in [0, 1)$. Default 0.0 (no smoothing).
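
The sketch below illustrates how weight, ignore_index and reduction could interact. It is a hypothetical pure-Python reference, not lucid's code. In particular, the source does not say how 'mean' is normalised when weights are given; the sketch assumes the convention used by PyTorch's CrossEntropyLoss, which divides by the total weight of the non-ignored targets. The function name `reduce_losses` is made up for this example.

```python
import math

def reduce_losses(batch, targets, weight=None, ignore_index=-100,
                  reduction="mean"):
    """Hypothetical reference: weighted per-sample losses, then a reduction."""
    per_sample, weights = [], []
    for logits, y in zip(batch, targets):
        if y == ignore_index:
            # Ignored targets contribute zero loss and zero weight.
            per_sample.append(0.0)
            weights.append(0.0)
            continue
        m = max(logits)
        lse = m + math.log(sum(math.exp(v - m) for v in logits))
        w = 1.0 if weight is None else weight[y]
        per_sample.append(w * (lse - logits[y]))
        weights.append(w)
    if reduction == "none":
        return per_sample
    if reduction == "sum":
        return sum(per_sample)
    if reduction == "mean":
        # ASSUMPTION: normalise by the total weight of non-ignored targets.
        return sum(per_sample) / sum(weights)
    raise ValueError(f"unknown reduction: {reduction!r}")

batch = [[1.0, 2.0, 0.5], [0.2, 0.8, 1.5], [0.0, 0.0, 0.0]]
targets = [1, 2, -100]  # the third sample is ignored
w = [1.0, 2.0, 1.0]

unreduced = reduce_losses(batch, targets, weight=w, reduction="none")
total = reduce_losses(batch, targets, weight=w, reduction="sum")
mean = reduce_losses(batch, targets, weight=w, reduction="mean")
```

Under this assumption the mean here divides by 3.0 (weights 2.0 and 1.0 for the two kept samples), not by the batch size.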

Attributes

weight: Tensor or None
Per-class weight tensor, or None if not provided.
ignore_index: int
Target index excluded from loss and gradient computation.
reduction: str
The reduction mode.
label_smoothing: float
The smoothing coefficient $\varepsilon$.

Notes

  • Input x : $(N, C)$ or $(N, C, d_1, d_2, \ldots)$, raw unnormalised logits.
  • Target y : $(N,)$ or $(N, d_1, d_2, \ldots)$, integer class indices in $[0, C)$.
  • Output : scalar when reduction is 'mean' or 'sum'; $(N,)$ or $(N, d_1, \ldots)$ for 'none'.
  • Pass raw logits rather than softmax probabilities: the criterion applies log-softmax internally, and its log-sum-exp implementation avoids the overflow and underflow that a separate softmax followed by a logarithm can produce.
  • With label_smoothing set to 0, equivalent to NLLLoss(LogSoftmax(x, dim=1), y) but computed in a single pass.
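
To make the shape conventions in the notes above concrete, here is a small pure-Python sketch for an input of shape (N, C, d1) and a target of shape (N, d1), using nested lists in place of tensors. The function name `cross_entropy_spatial` is hypothetical. It shows only the 'none' reduction and is not how lucid computes the loss internally.

```python
import math

def cross_entropy_spatial(x, y):
    """Unreduced loss for x of shape (N, C, d1) and y of shape (N, d1).

    Nested lists stand in for tensors; returns a nested list of shape (N, d1).
    """
    out = []
    for x_n, y_n in zip(x, y):
        C, d1 = len(x_n), len(x_n[0])
        row = []
        for i in range(d1):
            # Gather the C logits at position i: the class axis is dimension 1.
            logits = [x_n[c][i] for c in range(C)]
            m = max(logits)
            lse = m + math.log(sum(math.exp(v - m) for v in logits))
            row.append(lse - logits[y_n[i]])
        out.append(row)
    return out

# N = 1 sample, C = 2 classes, d1 = 3 positions.
x = [[[2.0, 0.0, 1.0],
      [0.0, 2.0, 1.0]]]
y = [[0, 1, 0]]
loss = cross_entropy_spatial(x, y)  # nested list of shape (1, 3)
```

Each output entry is the loss at one position, so the output shape matches the target shape.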

Examples

Three-class classification with a batch of two samples:
>>> import lucid
>>> import lucid.nn as nn
>>> criterion = nn.CrossEntropyLoss()
>>> x = lucid.tensor([[0.1, 0.9, 0.0], [2.0, 0.5, 0.1]])
>>> y = lucid.tensor([1, 0])
>>> loss = criterion(x, y)  # scalar
With label smoothing and per-class weights:
>>> import lucid
>>> import lucid.nn as nn
>>> criterion = nn.CrossEntropyLoss(
...     weight=lucid.tensor([1.0, 2.0, 1.0]),
...     label_smoothing=0.1,
... )
>>> x = lucid.tensor([[1.0, 2.0, 0.5], [0.2, 0.8, 1.5]])
>>> y = lucid.tensor([1, 2])
>>> loss = criterion(x, y)

Methods (3)

dunder

__init__

None
__init__(weight: Tensor | None = None, ignore_index: int = -100, reduction: str = 'mean', label_smoothing: float = 0.0)

Initialise the CrossEntropyLoss module. See the class docstring for parameter semantics.

fn

forward

Tensor
forward(x: Tensor, target: Tensor)

Compute the loss between predictions and targets.

Parameters

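x: Tensor
Raw unnormalised logits of shape $(N, C)$ or $(N, C, d_1, d_2, \ldots)$.
target: Tensor
Integer class indices of shape $(N,)$ or $(N, d_1, d_2, \ldots)$, with values in $[0, C)$ or equal to ignore_index.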

Returns

Tensor

Scalar loss when reduction is 'mean' or 'sum'; an unreduced tensor with the same shape as target when reduction is 'none'.

fn

extra_repr

str
extra_repr()

Return a string representation of the layer's configuration.