class Softmax extends Module

Softmax(dim: int | None = None)

Implementing kernel: C++ SoftmaxBackward class

Softmax activation function.

Applies to each slice along dim:

$$\text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$$

Normalises the input to a proper probability distribution: all outputs are non-negative and sum to 1 along the specified dimension. Used as the final layer of multi-class classifiers and in attention mechanisms.
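As a rough illustration of the formula above, the following pure-Python sketch evaluates softmax on a single 1-D slice and checks the two stated properties (non-negative outputs that sum to 1). The helper name `softmax_naive` is hypothetical and is not part of lucid; it applies the formula literally, with no numerical safeguards.

```python
import math

def softmax_naive(xs):
    """Softmax of one 1-D slice, computed directly from the definition."""
    exps = [math.exp(x) for x in xs]   # e^{x_i} for every element
    total = sum(exps)                  # sum_j e^{x_j}
    return [e / total for e in exps]

probs = softmax_naive([1.0, 2.0, 3.0])
print([round(p, 4) for p in probs])    # [0.09, 0.2447, 0.6652]
print(sum(probs))                      # approximately 1.0
```

The values agree with the `nn.Softmax(dim=-1)` example in the Examples section, which shows the same slice at four decimal places.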

Parameters

dim: int or None = None

The dimension along which softmax is computed. Must be specified explicitly for most use cases; None is retained for compatibility but raises a warning at runtime. Default: None.
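To make the role of dim concrete, here is a small sketch on a 2-D nested list, assuming the usual convention that dim=0 runs down columns and dim=1 (or -1) runs across each row. The helpers `softmax_1d` and `softmax_2d` are hypothetical illustrations written in plain Python; they do not show how lucid dispatches on dim internally.

```python
import math

def softmax_1d(xs):
    """Softmax of a single slice (shifted by the max for safety)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_2d(rows, dim):
    """Softmax over a list of equal-length rows along the given dimension."""
    if dim in (1, -1):
        # Each row is one slice.
        return [softmax_1d(r) for r in rows]
    if dim in (0, -2):
        # Each column is one slice: transpose, normalise, transpose back.
        cols = [softmax_1d(list(c)) for c in zip(*rows)]
        return [list(r) for r in zip(*cols)]
    raise ValueError("dim must be one of -2, -1, 0, 1 for a 2-D input")

x = [[1.0, 2.0],
     [3.0, 4.0]]
by_row = softmax_2d(x, dim=-1)   # every row sums to 1
by_col = softmax_2d(x, dim=0)    # every column sums to 1
```

The output has the same shape as the input in both cases; only the direction in which the values sum to 1 changes.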

Notes

Input: $(*)$ — any shape.
Output: $(*)$ — same shape as input; values along dim sum to 1.

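For numerical stability the implementation subtracts the maximum value along dim before exponentiation (the same shift used in the log-sum-exp trick). Because the common factor $e^{-\max_j x_j}$ cancels between numerator and denominator, this prevents overflow without changing the result.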
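The effect of the max subtraction can be sketched in plain Python as follows. Both helper names are hypothetical, and this is only an illustration of the technique on one slice, not the library's C++ kernel.

```python
import math

def softmax_naive(xs):
    """Direct formula; math.exp raises OverflowError for large inputs."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(xs):
    """Shift by the slice maximum so every exponent is <= 0."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]   # largest term is exp(0) == 1
    total = sum(exps)
    return [e / total for e in exps]

small = [1.0, 2.0, 3.0]
large = [1000.0, 1001.0, 1002.0]           # same differences, shifted by 999

try:
    softmax_naive(large)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True                # exp(1000) exceeds float range

stable_small = softmax_stable(small)
stable_large = softmax_stable(large)       # finite, and equal to stable_small
```

Since softmax depends only on the differences between inputs, `stable_large` matches `stable_small`, while the naive version fails on the same data.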

Examples

>>> import lucid
>>> import lucid.nn as nn
>>> m = nn.Softmax(dim=-1)
>>> x = lucid.tensor([[1.0, 2.0, 3.0]])
>>> m(x)
tensor([[0.0900, 0.2447, 0.6652]])
>>> # Attention weight normalisation over sequence length
>>> scores = lucid.randn(4, 8, 64)   # (batch, heads, seq_len)
>>> weights = nn.Softmax(dim=-1)(scores)
>>> weights.shape
(4, 8, 64)

Used by 1

lucid.nn.modules

Constructors

__init__

→None

__init__(dim: int | None = None)

Initialise the Softmax module. See the class docstring for parameter semantics.

Instance methods

extra_repr

→str

extra_repr()

Return a string representation of the layer's configuration.

forward

→Tensor

forward(x: Tensor)

Apply softmax along dim. Each output value depends on every element in its slice, so the operation is not element-wise.

Parameters

x: Tensor

Input tensor of arbitrary shape.

Returns

Tensor

Output tensor of the same shape as input.
