class

Softmax

extends Module
Softmax(dim: int | None = None)

Softmax activation function.

Applies to each slice along dim:

\text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}

Normalises the input to a proper probability distribution: all outputs are non-negative and sum to 1 along the specified dimension. Used as the final layer of multi-class classifiers and in attention mechanisms.

Parameters

dim: int or None = None
The dimension along which softmax is computed. Must be specified explicitly for most use cases; None is retained for compatibility but raises a warning at runtime. Default: None.
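The effect of dim is easiest to see on a small 2-D input. The sketch below is plain Python on nested lists, not Lucid code; the helper names softmax_1d and softmax_2d are made up for illustration, and it uses the textbook formula directly without any stability handling.

```python
import math

def softmax_1d(values):
    """Textbook softmax of a flat list of floats (no stability handling)."""
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_2d(matrix, dim):
    """Softmax over a list of rows; dim follows the usual axis convention."""
    if dim in (1, -1):
        # Normalise across each row independently.
        return [softmax_1d(row) for row in matrix]
    if dim in (0, -2):
        # Normalise down each column, then transpose back to row layout.
        columns = [softmax_1d(list(col)) for col in zip(*matrix)]
        return [list(row) for row in zip(*columns)]
    raise ValueError(f"dim out of range for a 2-D input: {dim}")

x = [[1.0, 2.0, 3.0],
     [1.0, 1.0, 1.0]]

rows = softmax_2d(x, dim=-1)  # each row sums to 1
cols = softmax_2d(x, dim=0)   # each column sums to 1

print([round(sum(r), 6) for r in rows])
print([round(sum(c), 6) for c in zip(*cols)])
```

The same input gives different outputs for the two choices of dim, which is why passing it explicitly is the safer habit.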

Notes

  • Input: (*), any shape.
  • Output: (*), same shape as input; values along dim sum to 1.

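For numerical stability the implementation subtracts the maximum value along dim before exponentiation (the same shift used in the log-sum-exp trick). This prevents overflow without changing the result, because the common factor introduced by the shift cancels between numerator and denominator.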
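A minimal sketch of why the max shift matters, written in plain Python rather than Lucid. The function names softmax_naive and softmax_stable are illustrative only and say nothing about how Lucid structures its own implementation.

```python
import math

def softmax_naive(values):
    """Direct formula; math.exp raises OverflowError for large inputs."""
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(values):
    """Subtract the maximum first, so the largest exponent is exactly 0."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

small = [1.0, 2.0, 3.0]
large = [1000.0, 1001.0, 1002.0]

# On small inputs the two versions agree up to rounding.
print(softmax_naive(small))
print(softmax_stable(small))

# On large inputs only the shifted version survives.
try:
    softmax_naive(large)
except OverflowError as err:
    print("naive version failed:", err)
print(softmax_stable(large))
```

Because large is just small shifted by a constant, the stable version returns the same probabilities for both inputs.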

Examples

>>> import lucid
>>> import lucid.nn as nn
>>> m = nn.Softmax(dim=-1)
>>> x = lucid.tensor([[1.0, 2.0, 3.0]])
>>> m(x)
tensor([[0.0900, 0.2447, 0.6652]])
>>> # Attention weight normalisation over sequence length
>>> scores = lucid.randn(4, 8, 64)   # (batch, heads, seq_len)
>>> weights = nn.Softmax(dim=-1)(scores)
>>> weights.shape
(4, 8, 64)

Methods (3)

dunder

__init__

None
__init__(dim: int | None = None)

Initialise the Softmax module. See the class docstring for parameter semantics.

fn

forward

Tensor
forward(x: Tensor)

Apply softmax to the input along dim. Unlike element-wise activations, each output value depends on every value in the same slice along dim.

Parameters

x: Tensor
Input tensor of arbitrary shape.

Returns

Tensor

Output tensor of the same shape as the input, with values along dim that are non-negative and sum to 1.

fn

extra_repr

str
extra_repr()

Return a string representation of the layer's configuration.
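How this string is consumed is not stated here. In PyTorch-style frameworks the base module's __repr__ typically embeds it inside the class name; the sketch below assumes that convention. SketchModule and SketchSoftmax are hypothetical stand-ins, not Lucid classes, and the exact string Lucid returns may differ.

```python
class SketchModule:
    """Hypothetical stand-in for a module base class (not Lucid's Module)."""

    def extra_repr(self) -> str:
        # Subclasses override this to describe their configuration.
        return ""

    def __repr__(self) -> str:
        # Assumed convention: ClassName(<extra_repr output>)
        return f"{type(self).__name__}({self.extra_repr()})"


class SketchSoftmax(SketchModule):
    """Hypothetical softmax layer that only stores its dim setting."""

    def __init__(self, dim=None):
        self.dim = dim

    def extra_repr(self) -> str:
        return f"dim={self.dim}"


layer = SketchSoftmax(dim=-1)
print(repr(layer))
```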