cross_entropy

cross_entropy(x: Tensor, target: Tensor, weight: Tensor | None = None, ignore_index: int = -100, reduction: Reduction = 'mean', label_smoothing: float = 0.0) -> Tensor

Cross-entropy loss for multi-class classification.

The canonical training objective for categorical classifiers. Combines lucid.nn.functional.log_softmax and nll_loss into a single, numerically stable expression. Working directly on raw logits avoids the overflow and underflow that arise when softmax probabilities are computed first and then passed to log(): a probability that underflows to 0 turns into log(0) = -inf. Supports per-class weight rescaling, ignore_index masking, and label-smoothing regularisation.
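A minimal pure-Python sketch of the stability issue. It is not Lucid's kernel, and the helpers log_softmax_naive and log_softmax_stable are hypothetical names. Computing softmax first lets a tiny probability underflow to 0.0, so its log becomes -inf, whereas the shifted log-sum-exp form stays finite.

```python
import math
from typing import List, Sequence

def log_softmax_naive(row: Sequence[float]) -> List[float]:
    # softmax first, then log: a probability that underflows to 0.0 has log = -inf
    exps = [math.exp(v) for v in row]
    total = sum(exps)
    # math.log(0.0) raises ValueError, so emit -inf explicitly to mimic IEEE log(0)
    return [math.log(e / total) if e > 0.0 else float("-inf") for e in exps]

def log_softmax_stable(row: Sequence[float]) -> List[float]:
    # log-sum-exp form: x_c - (m + log sum_j exp(x_j - m)) with m = max(x)
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

logits = [0.0, -800.0]
print(log_softmax_naive(logits))   # [0.0, -inf]
print(log_softmax_stable(logits))  # [0.0, -800.0]
```

With target class 1, the naive route yields an infinite loss, while the stable route yields the finite value 800.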

Parameters

x : Tensor

Raw logits of shape $(N, C)$ or $(N, C, d_1, \dots, d_k)$.

target : Tensor

Either integer class indices of shape $(N,)$ or $(N, d_1, \dots, d_k)$, or per-class probabilities with the same shape as x.

weight : Tensor or None, default None

Per-class weight vector of shape $(C,)$, useful for class-imbalanced training.

ignore_index : int, default -100

Target class index whose positions are skipped entirely: they contribute no loss and are excluded from the "mean" divisor. Common for masked or padded targets in sequence models.

reduction : str, default 'mean'

One of "mean", "sum", or "none". Under "mean", the summed loss is divided by the sum of the effective sample weights (after applying weight and removing ignore_index positions), not by the raw element count.

label_smoothing : float, default 0.0

Interpolation factor $\alpha \in [0, 1)$ between hard one-hot targets and a uniform distribution over the $C$ classes (Szegedy et al. 2016). Acts as a regulariser by discouraging over-confident predictions.
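To make the weight, ignore_index, and "mean" divisor rules concrete, here is a hypothetical pure-Python reference for class-index targets of shape $(N,)$ only, without label smoothing. It follows the parameter descriptions but is an illustrative sketch, not Lucid's implementation; cross_entropy_ref is a made-up name.

```python
import math
from typing import List, Optional, Sequence, Union

def log_softmax(row: Sequence[float]) -> List[float]:
    # shifted log-sum-exp, stable for any finite logits
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def cross_entropy_ref(
    logits: Sequence[Sequence[float]],
    target: Sequence[int],
    weight: Optional[Sequence[float]] = None,
    ignore_index: int = -100,
    reduction: str = "mean",
) -> Union[float, List[float]]:
    """Hypothetical reference: class-index targets of shape (N,), no label smoothing."""
    num_classes = len(logits[0])
    w = list(weight) if weight is not None else [1.0] * num_classes
    losses: List[float] = []
    divisor = 0.0
    for row, t in zip(logits, target):
        if t == ignore_index:
            losses.append(0.0)  # masked position: zero loss, no contribution to the divisor
            continue
        losses.append(-w[t] * log_softmax(row)[t])
        divisor += w[t]  # effective sample weight of this position
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        # divide by the total effective weight, not len(target);
        # this sketch raises ZeroDivisionError if every position is ignored
        return sum(losses) / divisor
    raise ValueError(f"unknown reduction: {reduction!r}")

logits = [[2.0, 0.5, 0.1], [0.0, 1.5, 0.2], [0.3, 0.3, 3.0]]
target = [0, 1, -100]        # third position is padding
weight = [1.0, 3.0, 1.0]     # up-weight class 1
print(cross_entropy_ref(logits, target, weight))  # ≈ 0.3811
```

The weighted mean here is about 0.3811 (total weight 1 + 3 = 4), whereas dividing the same summed loss by the count of 2 non-ignored positions would give about 0.7622.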

Returns

Tensor

A scalar for "mean" or "sum"; for "none", a per-sample tensor of shape $(N,)$ or $(N, d_1, \dots, d_k)$.

Notes

Per-sample loss:

L_i = -\sum_c w_c \, y_{i,c} \, \log \mathrm{softmax}(x_i)_c

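where $y_{i,c}$ is the (smoothed) target distribution and $w_c$ is the weight of class $c$ ($w_c = 1$ when weight is None). For a class-index target $t_i$ with label_smoothing $= \alpha$,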

y_{i,c} = (1-\alpha)\,\mathbb{1}[c = t_i] + \alpha / C.
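A short pure-Python sketch of these two formulas for a single sample, unweighted by default. The helper names are hypothetical and this is not Lucid's implementation; a probability target can be handled by passing its probability vector directly as y.

```python
import math
from typing import List, Optional, Sequence

def log_softmax(row: Sequence[float]) -> List[float]:
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def smoothed_target(t: int, num_classes: int, alpha: float) -> List[float]:
    # y_c = (1 - alpha) * 1[c = t] + alpha / C
    return [(1.0 - alpha) * (1.0 if c == t else 0.0) + alpha / num_classes
            for c in range(num_classes)]

def per_sample_loss(row: Sequence[float], y: Sequence[float],
                    weight: Optional[Sequence[float]] = None) -> float:
    # L_i = -sum_c w_c * y_c * log softmax(x_i)_c
    w = weight if weight is not None else [1.0] * len(row)
    return -sum(wc * yc * lp for wc, yc, lp in zip(w, y, log_softmax(row)))

row = [2.0, 0.5, 0.1]
y = smoothed_target(0, 3, alpha=0.1)  # ≈ [0.9333, 0.0333, 0.0333]
print(per_sample_loss(row, y))        # ≈ 0.4301
```

Expanding the unweighted sum with $p_{i,c} = \mathrm{softmax}(x_i)_c$ shows that smoothing blends the ordinary negative log-likelihood with the mean negative log-probability over all classes: $L_i = (1-\alpha)(-\log p_{i,t_i}) + \alpha \cdot \tfrac{1}{C}\sum_c (-\log p_{i,c})$.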

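For a single sample with no class weights and a target distribution that sums to 1, the gradient of $L_i$ with respect to the logits has the clean form $\mathrm{softmax}(x_i) - y_i$; for a one-hot target with weight $w_{t_i}$ it is simply scaled by $w_{t_i}$. This is the main reason cross-entropy is preferred over MSE on softmax outputs for classification: the gradient is proportional to the prediction error instead of being multiplied by the softmax Jacobian, so it does not vanish when the output saturates.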
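A quick numerical check of this identity, written as a hypothetical pure-Python sketch independent of Lucid: it compares $\mathrm{softmax}(x_i) - y_i$ against central finite differences of the unweighted per-sample loss, using a label-smoothed target so that $y_i$ still sums to 1.

```python
import math
from typing import List, Sequence

def log_softmax(row: Sequence[float]) -> List[float]:
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def loss(row: Sequence[float], y: Sequence[float]) -> float:
    # unweighted L_i = -sum_c y_c * log softmax(x_i)_c
    return -sum(yc * lp for yc, lp in zip(y, log_softmax(row)))

def analytic_grad(row: Sequence[float], y: Sequence[float]) -> List[float]:
    # softmax(x_i) - y_i, valid when sum(y) == 1 and there are no class weights
    return [math.exp(lp) - yc for lp, yc in zip(log_softmax(row), y)]

def numeric_grad(row: Sequence[float], y: Sequence[float], eps: float = 1e-6) -> List[float]:
    # central finite differences, perturbing one logit at a time
    grad = []
    for c in range(len(row)):
        plus, minus = list(row), list(row)
        plus[c] += eps
        minus[c] -= eps
        grad.append((loss(plus, y) - loss(minus, y)) / (2 * eps))
    return grad

row = [2.0, 0.5, 0.1]
y = [0.9 + 0.1 / 3, 0.1 / 3, 0.1 / 3]  # label-smoothed one-hot (alpha = 0.1)
print(analytic_grad(row, y))
print(max(abs(a - n) for a, n in zip(analytic_grad(row, y), numeric_grad(row, y))))
```

The printed discrepancy should be tiny, limited by floating-point round-off in the finite differences, and the gradient entries sum to zero because both $\mathrm{softmax}(x_i)$ and $y_i$ sum to 1.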

Examples

>>> import lucid
>>> from lucid.nn.functional import cross_entropy
>>> logits = lucid.tensor([[2.0, 0.5, 0.1], [0.0, 1.5, 0.2]])
>>> target = lucid.tensor([0, 1])
>>> cross_entropy(logits, target)
Tensor(0.3596...)
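The value in this example can be checked by hand with the per-sample formula from the Notes. With no class weights and no smoothing, $-\log \mathrm{softmax}(x_i)_{t_i}$ reduces to a log-sum-exp minus the target logit, and the "mean" divisor is 2 because both effective weights are 1:

-\log \mathrm{softmax}(x_i)_{t_i} = \log \textstyle\sum_c e^{x_{i,c}} - x_{i,t_i}

L_1 = \log\left(e^{2.0} + e^{0.5} + e^{0.1}\right) - 2.0 \approx 2.3168 - 2.0 = 0.3168

L_2 = \log\left(e^{0.0} + e^{1.5} + e^{0.2}\right) - 1.5 \approx 1.9026 - 1.5 = 0.4026

\tfrac{1}{2}(L_1 + L_2) \approx \tfrac{1}{2}(0.3168 + 0.4026) \approx 0.3597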
