softmax

softmax(x: Tensor, dim: int | None = None) -> Tensor

Apply the softmax function along a dimension.

Converts a vector of real-valued logits into a probability distribution: the outputs are non-negative and sum to one along dim. It is the standard choice for multi-class classification heads and attention weights.

Parameters

x: Tensor
Input tensor of any shape (the "logits").
dim: int | None = None
Dimension along which softmax is computed. When None (the default), the last axis (-1) is used.

Returns

Tensor

Tensor of the same shape as x. Each slice along dim lies on the probability simplex (entries non-negative, summing to 1).

Notes

Mathematical definition (per-vector along dim):

$$\text{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$$

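A naïve implementation overflows for large positive x; the engine uses the standard log-sum-exp shift $x_i \mapsto x_i - \max_j x_j$ to evaluate it in finite precision. The shift leaves the result unchanged because the common factor $e^{-\max_j x_j}$ cancels between numerator and denominator.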
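
The effect of the shift can be illustrated with a minimal NumPy sketch. This is not the engine's implementation: softmax_naive and softmax_stable are hypothetical helper names, and NumPy arrays stand in for lucid tensors.

```python
import numpy as np

def softmax_naive(x, axis=-1):
    # Direct transcription of the definition; np.exp overflows for large inputs.
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def softmax_stable(x, axis=-1):
    # Subtract the per-slice maximum so the largest exponent is exp(0) = 1.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

# The second row is the first row plus 999, so both rows share the same softmax.
logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])

# Silence the overflow / invalid-value warnings the naive version triggers.
with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(logits)   # second row: exp(1000) = inf, inf / inf = nan

stable = softmax_stable(logits)     # both rows finite, each summing to 1
```

In float64 the naive version returns nan for the second row, while the shifted version returns the same distribution for both rows, since softmax is invariant to adding a constant along the reduced axis.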

For loss computation, prefer log_softmax followed by nll_loss (or cross_entropy end-to-end), since composing log(softmax(...)) loses precision. With $p = \text{softmax}(x)$, the Jacobian has the closed form $\partial p_i / \partial x_j = p_i (\delta_{ij} - p_j)$. When softmax feeds a cross-entropy loss with a one-hot target $y$, the chain rule collapses this to a gradient of $p - y$ with respect to the logits.
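
The closed-form Jacobian and its simplification under cross-entropy can be checked numerically. The following is a sketch in NumPy under stated assumptions: log_softmax, softmax_jacobian, and cross_entropy here are hypothetical one-dimensional stand-ins written for illustration, not the lucid functions of the same names.

```python
import numpy as np

def log_softmax(x):
    # log p_i = (x_i - max) - log(sum_j exp(x_j - max)), without forming p first.
    shifted = x - x.max()
    return shifted - np.log(np.exp(shifted).sum())

def softmax(x):
    return np.exp(log_softmax(x))

def softmax_jacobian(x):
    # J[i, j] = p_i * (delta_ij - p_j), i.e. diag(p) - outer(p, p).
    p = softmax(x)
    return np.diag(p) - np.outer(p, p)

def cross_entropy(x, target):
    # Negative log-probability of the target class.
    return -log_softmax(x)[target]

x = np.array([1.0, 2.0, 3.0])
target = 2
p = softmax(x)
J = softmax_jacobian(x)

# Chain rule: dL/dp_i is -1/p_target at the target index and 0 elsewhere.
dL_dp = np.zeros_like(x)
dL_dp[target] = -1.0 / p[target]
grad_chain = J.T @ dL_dp

# Closed form for softmax followed by cross-entropy: p - one_hot(target).
one_hot = np.eye(len(x))[target]
grad_closed = p - one_hot

# Central finite differences as an independent check.
eps = 1e-6
grad_fd = np.array([
    (cross_entropy(x + eps * e, target) - cross_entropy(x - eps * e, target)) / (2 * eps)
    for e in np.eye(len(x))
])
```

All three gradients agree up to floating-point and finite-difference error, and each column of J sums to zero because the outputs are constrained to sum to one.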

Examples

>>> import lucid
>>> from lucid.nn.functional import softmax
>>> logits = lucid.tensor([[1.0, 2.0, 3.0]])
>>> p = softmax(logits, dim=1)
>>> p
Tensor([[0.0900, 0.2447, 0.6652]])
>>> p.sum(dim=1)
Tensor([1.0000])