class LayerNorm

extends Module

LayerNorm(normalized_shape: int | list[int] | tuple[int, ...], eps: float = 1e-05, elementwise_affine: bool = True, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)


Implementing kernel: C++ class LayerNormBackward

Layer normalization over the trailing dimensions of the input.

Normalises each sample independently by computing mean and variance over the axes defined by normalized_shape:

$$y = \frac{x - \mu}{\sqrt{\sigma^2 + \varepsilon}} \cdot \gamma + \beta$$

where $\mu$ and $\sigma^2$ are computed over the last len(normalized_shape) dimensions of the input tensor.
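The formula can be checked by hand on a single vector. The following is a minimal pure-Python sketch under stated assumptions, not the library's kernel: the function name `layer_norm_1d` is hypothetical, and `gamma` and `beta` default to ones and zeros when omitted.

```python
import math

def layer_norm_1d(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one vector with the formula above (illustrative helper only)."""
    n = len(x)
    gamma = gamma if gamma is not None else [1.0] * n  # identity scale
    beta = beta if beta is not None else [0.0] * n     # zero shift
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n            # divide by n, not n - 1
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mu) * inv_std * g + b for v, g, b in zip(x, gamma, beta)]

y = layer_norm_1d([1.0, 2.0, 3.0, 4.0])
```

With the default `gamma` and `beta`, the result has a mean of approximately 0 and a variance slightly below 1, because `eps` enlarges the denominator.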

Unlike batch normalization, layer normalization computes its statistics per sample and does not depend on the batch dimension, making it well-suited to variable-length sequences, transformer architectures, and settings where the batch size may be as small as 1.
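The batch independence can be illustrated with a small pure-Python sketch. The helpers `normalize_row` and `layer_norm_batch` are hypothetical names introduced here, and the affine step is omitted for brevity.

```python
import math

def normalize_row(row, eps=1e-5):
    """Normalize a single sample using only its own mean and variance."""
    mu = sum(row) / len(row)
    var = sum((v - mu) ** 2 for v in row) / len(row)
    return [(v - mu) / math.sqrt(var + eps) for v in row]

def layer_norm_batch(batch, eps=1e-5):
    """Apply the per-sample normalization to every row of a batch."""
    return [normalize_row(row, eps) for row in batch]

sample = [0.5, -1.0, 2.0]
alone = layer_norm_batch([sample])[0]
in_batch = layer_norm_batch([[10.0, 20.0, 30.0], sample, [-3.0, 0.0, 3.0]])[1]
same = alone == in_batch  # the other rows never enter the computation
```

Because each row is processed with its own statistics, the output for `sample` is the same whether it is alone or surrounded by other samples.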

Parameters

normalized_shape: int or list[int] or tuple[int, ...]

Shape of the trailing dimensions to normalize over. If an integer d is given it is treated as (d,), normalizing only the last axis. For a (N, T, C) input with normalized_shape=(C,) the mean and variance are computed independently for each (n, t) position.
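To make the shape rule concrete, the sketch below splits an input shape into its leading and normalized parts and counts how many independent (mean, variance) pairs result. `split_shape` is a hypothetical helper written for illustration. Raising `ValueError` on a mismatch is an assumption of this sketch, not documented library behaviour.

```python
import math

def split_shape(input_shape, normalized_shape):
    """Return (leading dims, normalized dims) for a given input shape."""
    if isinstance(normalized_shape, int):
        normalized_shape = (normalized_shape,)  # an integer d is treated as (d,)
    normalized_shape = tuple(normalized_shape)
    k = len(normalized_shape)
    if k == 0 or tuple(input_shape[-k:]) != normalized_shape:
        # Assumption of this sketch: a mismatch is reported as ValueError.
        raise ValueError("trailing dimensions do not match normalized_shape")
    return tuple(input_shape[:-k]), normalized_shape

leading, normed = split_shape((32, 64, 512), 512)
groups = math.prod(leading)     # independent (mean, variance) pairs
group_size = math.prod(normed)  # elements averaged within each pair
```

For a (32, 64, 512) input with normalized_shape=512, this gives 2048 groups of 512 elements each, one per (n, t) position.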

eps: float = 1e-05

Small constant added to the variance inside the square root for numerical stability. Default: 1e-5.

elementwise_affine: bool = True

If True, learns a per-element scale $\gamma$ and (optionally) shift $\beta$ of shape normalized_shape. If False, no affine parameters are created and the output is purely normalized. Default: True.

bias: bool = True

Only meaningful when elementwise_affine=True. If False, the module learns only a scale $\gamma$ with no additive shift. Default: True.

device: DeviceLike = None

Device on which to allocate the learnable parameters. Default: None (uses the default device).

dtype: DTypeLike = None

Data type of the learnable parameters. Default: None (inherits from the input).

Attributes

weight: Parameter or None

Learnable per-element scale $\gamma$ of shape normalized_shape. None when elementwise_affine=False.

bias: Parameter or None

Learnable per-element shift $\beta$ of shape normalized_shape. None when elementwise_affine=False or bias=False.

Notes

Input: $(*, \text{normalized\_shape})$ — any leading batch dimensions followed by the normalized trailing dimensions.
Output: same shape as the input.

The mean and variance are computed with correction=0 (biased estimator), consistent with the standard layer-norm convention.
Weights are initialised to 1 and biases to 0 so that the transformation is an identity at the start of training.
With elementwise_affine=True and bias=False, the module matches the "scale-only" layer norm used in some modern architectures.
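The difference between the biased estimator (correction=0) and the unbiased one (correction=1) can be seen with Python's standard `statistics` module. This is an illustrative sketch that is independent of the library itself.

```python
import statistics

x = [1.0, 2.0, 3.0, 4.0]

# Population variance divides by n, which corresponds to correction=0.
biased = statistics.pvariance(x)

# Sample variance divides by n - 1, which corresponds to correction=1.
unbiased = statistics.variance(x)
```

Here `biased` is 1.25 while `unbiased` is 5/3. Layer normalization uses the former, so the two conventions diverge most when the normalized dimensions are small.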

Examples

Normalize the last dimension of a sequence model's hidden states:
>>> import lucid
>>> import lucid.nn as nn
>>> ln = nn.LayerNorm(512)
>>> x = lucid.randn(32, 64, 512)   # (batch, seq_len, hidden_dim)
>>> out = ln(x)
>>> out.shape
(32, 64, 512)
Normalize over multiple trailing dimensions (e.g. height and width):
>>> ln2d = nn.LayerNorm((28, 28))
>>> x2d = lucid.randn(8, 1, 28, 28)
>>> out2d = ln2d(x2d)
>>> out2d.shape
(8, 1, 28, 28)


Constructors

__init__

→None

__init__(normalized_shape: int | list[int] | tuple[int, ...], eps: float = 1e-05, elementwise_affine: bool = True, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)


Initialise the LayerNorm module. See the class docstring for parameter semantics.

Instance methods

extra_repr

→str

extra_repr()


Return a string representation of the layer's configuration.

forward

→Tensor

forward(x: Tensor)


Apply normalisation to the input tensor.

Parameters

x: Tensor

Input tensor whose shape is documented in the class docstring.

Returns

Tensor

Normalised tensor of the same shape as input.
