fn
layer_norm
layer_norm(x: Tensor, normalized_shape: list[int] | tuple[int, ...], weight: Tensor | None = None, bias: Tensor | None = None, eps: float = 1e-05) → Tensor
Layer normalization (Ba, Kiros & Hinton, 2016).
Normalises each sample independently across the trailing
dimensions given by normalized_shape. Unlike batch_norm, no
batch statistics are involved, which makes LayerNorm the default
choice for transformers and other models where batches may be
small or sequence lengths variable.
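The contrast with batch statistics can be illustrated with a small NumPy sketch. It does not call lucid; the array shape and variable names are assumptions chosen for illustration. It only shows which axis each kind of statistic is reduced over, and that per-sample statistics do not change when the batch shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # assumed layout: (batch, features)

# Layer-norm style: one mean per sample, reduced over the feature axis.
ln_mean = x.mean(axis=-1, keepdims=True)   # shape (4, 1)

# Batch-norm style: one mean per feature, reduced over the batch axis.
bn_mean = x.mean(axis=0, keepdims=True)    # shape (1, 8)

# Keep only the first two samples and recompute both statistics.
ln_mean_small = x[:2].mean(axis=-1, keepdims=True)
bn_mean_small = x[:2].mean(axis=0, keepdims=True)

# Per-sample means are unaffected by the other samples in the batch,
# while per-feature batch means depend on which samples are present.
print(np.allclose(ln_mean[:2], ln_mean_small))
print(np.allclose(bn_mean, bn_mean_small))
```

The first comparison holds for any batch, because each row is reduced on its own. The second generally fails, because the batch mean depends on batch composition.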
Parameters
x : Tensor
    Input whose trailing dims match normalized_shape.
normalized_shape : list of int or tuple of int
    Trailing dims to normalise over. E.g. (d,) for a token-wise normalisation in a transformer with hidden size d.
weight : Tensor = None
    Per-element scale of shape normalized_shape. Defaults to ones.
bias : Tensor = None
    Per-element shift of shape normalized_shape. Defaults to zeros.
eps : float = 1e-05
    Numerical safety term added inside the square root.
Returns
Tensor
    Same shape as x.
Notes
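Math (reduction taken over the last $k$ axes, $k = \mathrm{len}(\text{normalized\_shape})$):
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, \qquad y_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}\,\gamma_i + \beta_i$$
Here $N$ is the number of elements in the normalised axes of one sample, $\gamma$ is weight, $\beta$ is bias, and $\epsilon$ is eps.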
Because the reduction is per-sample, behaviour is identical at train and eval time — no running statistics needed. This is what makes LayerNorm so prevalent in sequence models (RNNs, transformers).
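A minimal NumPy sketch of the computation described in these notes follows. It is not lucid's implementation: `layer_norm_ref` is a hypothetical helper, and it assumes the standard formulation with a biased (population) variance and eps added inside the square root.

```python
import numpy as np

def layer_norm_ref(x, normalized_shape, weight=None, bias=None, eps=1e-5):
    """Reference sketch of layer normalisation in NumPy (hypothetical helper)."""
    k = len(normalized_shape)
    # The trailing dims of x must match normalized_shape.
    assert tuple(x.shape[-k:]) == tuple(normalized_shape)
    axes = tuple(range(x.ndim - k, x.ndim))

    # Per-sample statistics over the last k axes.
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)  # biased variance (ddof=0), assumed

    y = (x - mean) / np.sqrt(var + eps)    # eps sits inside the square root

    # Omitted weight and bias behave like ones and zeros respectively.
    if weight is not None:
        y = y * weight
    if bias is not None:
        y = y + bias
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 10, 64))
y = layer_norm_ref(x, (64,))

# Each token vector now has roughly zero mean and unit variance.
print(y.shape)
print(np.allclose(y.mean(axis=-1), 0.0, atol=1e-6))
print(np.allclose(y.var(axis=-1), 1.0, atol=1e-3))
```

Because nothing in the function depends on the batch axis or on stored state, calling it on a single sample gives the same values as the corresponding slice of a batched call, which is the train/eval equivalence described above.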
Examples
>>> import lucid
>>> from lucid.nn.functional import layer_norm
>>> x = lucid.randn(2, 10, 64)
>>> y = layer_norm(x, normalized_shape=(64,))
>>> y.shape
(2, 10, 64)