fn

cross_entropy

cross_entropy(x: Tensor, target: Tensor, weight: Tensor | None = None, ignore_index: int = -100, reduction: str = 'mean', label_smoothing: float = 0.0) -> Tensor

Cross-entropy loss for multi-class classification.

The canonical training objective for categorical classifiers. Combines lucid.nn.functional.log_softmax and nll_loss into a single, numerically stable expression: working directly on raw logits avoids the overflow and underflow that occur when softmax probabilities are computed explicitly and then passed to log(). Supports per-class weight rescaling, ignore_index masking, and label-smoothing regularisation.
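The stability argument can be illustrated with a minimal pure-Python sketch. It is not lucid's implementation; the helper names `log_softmax` and `naive_log_softmax` are hypothetical and only show why subtracting the maximum logit before exponentiating matters.

```python
import math

def log_softmax(logits):
    """Stable log-softmax: shift by the max logit, then use log-sum-exp."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]

def naive_log_softmax(logits):
    """Unstable variant: exponentiate first, take the log afterwards."""
    exps = [math.exp(v) for v in logits]  # math.exp overflows for large logits
    total = sum(exps)
    return [math.log(e / total) for e in exps]

big = [1000.0, 0.0, -1000.0]

# The shifted version only ever exponentiates non-positive numbers.
stable = log_softmax(big)

# The naive version fails on the very first exponential.
try:
    naive_log_softmax(big)
    naive_failed = False
except OverflowError:
    naive_failed = True
```

With these inputs the stable version returns finite log-probabilities, while the naive version raises `OverflowError` in `math.exp`.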

Parameters

x: Tensor
Raw logits of shape $(N, C)$ or $(N, C, d_1, \dots, d_k)$.
target: Tensor
Either integer class indices of shape $(N,)$ or $(N, d_1, \dots, d_k)$, or per-class probabilities with the same shape as x.
weight: Tensor or None = None
Per-class weight vector of shape $(C,)$; useful for class-imbalanced training.
ignore_index: int = -100
Class index whose samples are skipped entirely (default -100). Common for masked / padded targets in sequence models.
reduction: str = 'mean'
"mean" (default), "sum", or "none". Under "mean", the divisor is the sum of effective sample weights (after weight and ignore_index), not the raw element count.
label_smoothing: float = 0.0
Interpolation factor $\alpha \in [0, 1)$ between hard one-hot targets and a uniform distribution (Szegedy et al. 2016). Acts as a regulariser by discouraging over-confident predictions.
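How weight, ignore_index and reduction interact can be sketched in plain Python. This is an illustrative reference for integer targets on nested lists, not lucid's implementation: the name `cross_entropy_ref` is hypothetical, and returning 0.0 for ignored positions under "none" is an assumption borrowed from common framework behaviour.

```python
import math

def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def cross_entropy_ref(logits, targets, weight=None, ignore_index=-100,
                      reduction="mean"):
    """Hypothetical reference: integer targets, logits as a list of rows."""
    num_classes = len(logits[0])
    w = weight if weight is not None else [1.0] * num_classes
    losses, eff_weights = [], []
    for row, t in zip(logits, targets):
        if t == ignore_index:
            # Ignored samples add nothing to the loss or to the divisor.
            losses.append(0.0)
            eff_weights.append(0.0)
            continue
        losses.append(-w[t] * log_softmax(row)[t])
        eff_weights.append(w[t])
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    # "mean": divide by the summed effective weights, not by len(targets).
    return sum(losses) / sum(eff_weights)

logits = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
targets = [0, 2, -100]           # the last sample is ignored
class_w = [1.0, 2.0, 0.5]

per_sample = cross_entropy_ref(logits, targets, class_w, reduction="none")
total = cross_entropy_ref(logits, targets, class_w, reduction="sum")
mean = cross_entropy_ref(logits, targets, class_w, reduction="mean")
```

Here the "mean" divisor is 1.0 + 0.5 = 1.5 (the weights of the two kept samples), so the result is twice what dividing by the raw count of 3 would give.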

Returns

Tensor

Scalar ("mean"/"sum") or per-sample tensor ("none").

Notes

Per-sample loss:

$$L_i = -\sum_c w_c \, y_{i,c} \, \log \mathrm{softmax}(x_i)_c$$

where $y_{i,c}$ is the (smoothed) target distribution. With label_smoothing = $\alpha$,

$$y_{i,c} = (1-\alpha)\,\mathbb{1}[c = t_i] + \alpha / C.$$
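The smoothed target and the resulting per-sample loss can be worked through on a single row. The sketch below is plain Python under the formulas above with unit class weights; `smooth_target` and `sample_loss` are hypothetical helper names, not part of lucid.

```python
import math

def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def smooth_target(t, num_classes, alpha):
    """y_c = (1 - alpha) * 1[c == t] + alpha / C."""
    return [(1.0 - alpha) * (1.0 if c == t else 0.0) + alpha / num_classes
            for c in range(num_classes)]

def sample_loss(row, y):
    """L = -sum_c y_c * log softmax(row)_c, with unit class weights."""
    return -sum(yc * lp for yc, lp in zip(y, log_softmax(row)))

row = [2.0, 0.5, 0.1]
hard = smooth_target(0, 3, 0.0)      # plain one-hot target
smooth = smooth_target(0, 3, 0.1)    # 0.9 on the true class plus 0.1 / 3 everywhere

hard_loss = sample_loss(row, hard)
smooth_loss = sample_loss(row, smooth)
```

The smoothed target still sums to 1, and because some mass now sits on the less likely classes, the smoothed loss for this confident, correct prediction is larger than the hard-target loss.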

The gradient with respect to the logits has the well-known form $\mathrm{softmax}(x_i) - y_i$ (up to weighting). This is one reason cross-entropy is preferred over MSE for classification: the gradient is proportional to the prediction error, so it does not vanish when the softmax saturates on a wrong prediction.
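The gradient formula can be checked numerically. The following sketch compares softmax(x) - y against a central finite difference of the loss in plain Python, assuming unit class weights and a target distribution that sums to 1; it does not use lucid's autograd.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def loss(row, y):
    """-sum_c y_c * log softmax(row)_c via a stable log-sum-exp."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return -sum(yc * (v - lse) for yc, v in zip(y, row))

x = [2.0, 0.5, 0.1]
y = [1.0, 0.0, 0.0]

# Closed-form gradient.
analytic = [p - yc for p, yc in zip(softmax(x), y)]

# Central finite differences, one logit at a time.
eps = 1e-6
numeric = []
for k in range(len(x)):
    up = list(x)
    up[k] += eps
    down = list(x)
    down[k] -= eps
    numeric.append((loss(up, y) - loss(down, y)) / (2 * eps))

max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
```

The two gradients agree to within finite-difference error, and the analytic components sum to zero because both softmax(x) and y sum to 1.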

Examples

>>> import lucid
>>> from lucid.nn.functional import cross_entropy
>>> logits = lucid.tensor([[2.0, 0.5, 0.1], [0.0, 1.5, 0.2]])
>>> target = lucid.tensor([0, 1])
>>> cross_entropy(logits, target)
Tensor(0.3596...)
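As a cross-check, the same quantity can be computed by hand without lucid. This pure-Python sketch assumes the default arguments (no class weights, no ignored targets, "mean" reduction) and uses the same logits and targets as the example.

```python
import math

logits = [[2.0, 0.5, 0.1], [0.0, 1.5, 0.2]]
targets = [0, 1]

per_sample = []
for row, t in zip(logits, targets):
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    per_sample.append(lse - row[t])  # equals -log softmax(row)[t]

# Unweighted mean over the two samples.
mean_loss = sum(per_sample) / len(per_sample)
```

The per-sample losses come out at roughly 0.3168 and 0.4026, giving a mean of about 0.3597.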