class Softmax extends Module
Softmax(dim: int | None = None)
Softmax activation function.
Applies the following to each 1-D slice x taken along dim:
$$\mathrm{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$
Normalises the input to a proper probability distribution: all outputs are non-negative and sum to 1 along the specified dimension. Used as the final layer of multi-class classifiers and in attention mechanisms.
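As a minimal sketch of the definition (plain Python with the standard math module, not the lucid implementation; the helper name softmax_1d is hypothetical), each output is exp(x_i) divided by the sum of exp(x_j) over the slice. This reproduces the values shown in the Examples section:

```python
import math


def softmax_1d(values: list[float]) -> list[float]:
    """Hypothetical helper: direct transcription of the softmax definition for one 1-D slice."""
    exps = [math.exp(v) for v in values]  # exponentiate every entry
    total = sum(exps)                     # shared normalising constant
    return [e / total for e in exps]      # each output lies in (0, 1]


probs = softmax_1d([1.0, 2.0, 3.0])
print([round(p, 4) for p in probs])  # [0.09, 0.2447, 0.6652]
print(round(sum(probs), 6))          # 1.0
```

This naive form is fine for small inputs; very large inputs overflow math.exp, which is why a max-shifted variant is preferred in practice.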
Parameters
dim (int or None): The dimension along which softmax is computed. Must be specified explicitly for most use cases; None is retained for compatibility but emits a warning at runtime. Default: None.
Notes
- Input: any shape.
- Output: same shape as the input; values along dim sum to 1.
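To illustrate how the choice of dimension changes which values sum to 1, here is a small sketch on a 2-D nested list in plain Python (the helper softmax_2d is hypothetical and only handles 2-D input; it is not part of lucid). Normalising along the last dimension makes each row sum to 1, while dimension 0 makes each column sum to 1; in both cases the shape is unchanged:

```python
import math


def softmax_2d(rows: list[list[float]], dim: int) -> list[list[float]]:
    """Hypothetical helper: softmax over a 2-D nested list along dim (0, 1 or -1)."""
    if dim not in (0, 1, -1):
        raise ValueError("dim must be 0, 1 or -1 for a 2-D input")
    if dim == 0:
        # Transpose so columns become rows, normalise each row, transpose back.
        cols = [list(c) for c in zip(*rows)]
        normalised = softmax_2d(cols, dim=1)
        return [list(r) for r in zip(*normalised)]
    out = []
    for row in rows:
        exps = [math.exp(v) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


x = [[1.0, 2.0, 3.0],
     [1.0, 1.0, 1.0]]
by_row = softmax_2d(x, dim=-1)
by_col = softmax_2d(x, dim=0)
print(len(by_col), len(by_col[0]))                # 2 3  (shape preserved)
print([round(sum(r), 6) for r in by_row])        # [1.0, 1.0]
print([round(sum(c), 6) for c in zip(*by_col)])  # [1.0, 1.0, 1.0]
```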
For numerical stability the implementation subtracts the maximum value
along dim before exponentiation (log-sum-exp trick), preventing
overflow without changing the result.
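The trick works because softmax is invariant to adding a constant c to every entry of a slice: exp(x_i - c) / sum_j exp(x_j - c) equals exp(x_i) / sum_j exp(x_j). A minimal plain-Python sketch (not lucid's actual code; the function names are hypothetical) contrasts the naive and max-shifted forms:

```python
import math


def naive_softmax(values: list[float]) -> list[float]:
    exps = [math.exp(v) for v in values]  # math.exp raises OverflowError above ~709.78
    total = sum(exps)
    return [e / total for e in exps]


def stable_softmax(values: list[float]) -> list[float]:
    m = max(values)                           # shift so the largest exponent is exp(0) = 1
    exps = [math.exp(v - m) for v in values]  # every term now lies in (0, 1]
    total = sum(exps)                         # total >= 1, so the division is safe
    return [e / total for e in exps]


big = [1000.0, 1001.0, 1002.0]
try:
    naive_softmax(big)
except OverflowError:
    print("naive softmax overflowed")

# Shifting by the maximum leaves the result unchanged:
print([round(p, 4) for p in stable_softmax(big)])              # [0.09, 0.2447, 0.6652]
print([round(p, 4) for p in stable_softmax([1.0, 2.0, 3.0])])  # [0.09, 0.2447, 0.6652]
```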
Examples
>>> import lucid
>>> import lucid.nn as nn
>>> m = nn.Softmax(dim=-1)
>>> x = lucid.tensor([[1.0, 2.0, 3.0]])
>>> m(x)
tensor([[0.0900, 0.2447, 0.6652]])
>>> # Attention weight normalisation over sequence length
>>> scores = lucid.randn(4, 8, 64) # (batch, heads, seq_len)
>>> weights = nn.Softmax(dim=-1)(scores)
>>> weights.shape
(4, 8, 64)
Methods (3)
__init__(dim: int | None = None) → None
Initialise the Softmax module. See the class docstring for parameter semantics.
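forward(x: Tensor) → Tensor
Apply softmax along dim. Unlike element-wise activations, each output depends on every value in its slice along dim.
Parameters
x (Tensor): Input tensor of arbitrary shape.
Returns
Tensor: Output tensor of the same shape as x, with values along dim summing to 1.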
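Because every output in a slice shares the same denominator, softmax couples all entries of that slice. A small plain-Python sketch (independent of lucid; the helper name is hypothetical) shows that changing a single input changes every output in the slice, which an element-wise function would never do:

```python
import math


def softmax(values: list[float]) -> list[float]:
    """Hypothetical helper: max-shifted softmax over one 1-D slice."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


before = softmax([1.0, 2.0, 3.0])
after = softmax([1.0, 2.0, 5.0])  # only the last input changed

# The first two outputs change too, because the shared denominator changed.
print(all(b != a for b, a in zip(before, after)))  # True
```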
extra_repr() → str
Return a string representation of the layer's configuration.