softmax
softmax(x: Tensor, dim: int | None = None) → Tensor
Apply the softmax function along a dimension.
Converts a vector of real-valued logits into a probability
distribution: the outputs are non-negative and sum to one along
dim. It is the central tool for multi-class classification heads and
attention weights.
Parameters
x (Tensor)
dim (int = None): defaults to -1 (the last axis).
Returns
Tensor: same-shape tensor whose entries along dim lie on the probability
simplex (each non-negative, summing to 1).
Notes
Mathematical definition (per-vector along dim):
$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$$
A naïve implementation overflows for large positive x; the engine
uses the standard log-sum-exp shift
$$\mathrm{softmax}(x)_i = \frac{e^{x_i - m}}{\sum_j e^{x_j - m}}, \qquad m = \max_k x_k,$$
to evaluate it in finite precision.
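The effect of the shift can be seen in a minimal plain-Python sketch. It uses only the standard library and is not Lucid's implementation; the helper names naive_softmax and stable_softmax are illustrative assumptions.

```python
import math

def naive_softmax(xs):
    # Direct definition: exp(x_i) / sum_j exp(x_j).
    # math.exp raises OverflowError once x exceeds roughly 709.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(xs):
    # Subtract the maximum before exponentiating. The result is
    # mathematically unchanged, but every exponent is now <= 0,
    # so exp() cannot overflow.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1000.0, 1001.0, 1002.0]

try:
    naive_softmax(logits)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

probs = stable_softmax(logits)
print("naive overflowed:", naive_overflowed)
print("stable result:", [round(p, 4) for p in probs])
```

Because softmax is invariant to adding a constant to every logit, the shifted inputs give the same distribution as [1.0, 2.0, 3.0], approximately [0.0900, 0.2447, 0.6652].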
For loss computation, prefer log_softmax followed by
nll_loss (or cross_entropy end-to-end), since the
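composition of log(softmax(...)) loses precision. The gradient of the
cross-entropy loss with respect to the logits has the convenient closed form
$$\frac{\partial \mathcal{L}}{\partial x_i} = p_i - y_i, \qquad p = \mathrm{softmax}(x),$$
where y is the one-hot target: the Jacobian factorises out cleanly during
backprop through softmax-CE.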
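Both points in this note can be checked numerically with a small plain-Python sketch. It assumes the closed form in question is the softmax cross-entropy gradient p - y, with y the one-hot target. The helpers below (log_softmax, cross_entropy, analytic_grad, numeric_grad) are illustrative stand-ins written with the standard library, not Lucid's functions of the same names.

```python
import math

def log_softmax(xs):
    # log p_i = x_i - logsumexp(x), with the max shifted out for stability.
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]

def cross_entropy(xs, target):
    # Negative log-likelihood of the target class.
    return -log_softmax(xs)[target]

def analytic_grad(xs, target):
    # Assumed closed form: dL/dx_i = p_i - y_i, with y one-hot at `target`.
    p = [math.exp(lp) for lp in log_softmax(xs)]
    return [p_i - (1.0 if i == target else 0.0) for i, p_i in enumerate(p)]

def numeric_grad(xs, target, eps=1e-6):
    # Central finite differences, used only as an independent check.
    grads = []
    for i in range(len(xs)):
        up, down = list(xs), list(xs)
        up[i] += eps
        down[i] -= eps
        diff = cross_entropy(up, target) - cross_entropy(down, target)
        grads.append(diff / (2 * eps))
    return grads

# Precision: softmax underflows to exactly 0.0 for the second class,
# so taking its log fails, while log_softmax stays finite.
extreme = [0.0, -800.0]
p_small = math.exp(extreme[1] - max(extreme))  # underflows to 0.0
try:
    math.log(p_small)
    naive_log_failed = False
except ValueError:
    naive_log_failed = True
stable_logp = log_softmax(extreme)[1]

# Gradient: compare p - y against finite differences.
logits, target = [1.0, 2.0, 3.0], 2
g_analytic = analytic_grad(logits, target)
g_numeric = numeric_grad(logits, target)
max_gap = max(abs(a - n) for a, n in zip(g_analytic, g_numeric))
print("log(softmax) failed:", naive_log_failed, "| log_softmax:", stable_logp)
print("max |analytic - numeric|:", max_gap)
```

The gradient entries sum to zero because p sums to one and y has a single one, which is a quick sanity check on any softmax-CE backward pass.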
Examples
>>> import lucid
>>> from lucid.nn.functional import softmax
>>> logits = lucid.tensor([[1.0, 2.0, 3.0]])
>>> p = softmax(logits, dim=1)
>>> p
Tensor([[0.0900, 0.2447, 0.6652]])
>>> p.sum(dim=1)
Tensor([1.0000])