fn rms_norm(x: Tensor, normalized_shape: list[int] | tuple[int, ...], weight: Tensor | None = None, eps: float = 1e-08) → Tensor
Root-mean-square layer normalization (Zhang & Sennrich, 2019).
A simplified variant of layer_norm that drops mean
centering and the additive bias. Cheaper to compute and shown to
perform competitively in large language models (LLaMA, T5,
PaLM, ...), where its lower latency and memory footprint matter at
scale.
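The sketch below shows what "drops mean-centering and the additive bias" means. It runs both normalizations on a single vector in plain Python. It is only an illustration, not lucid's implementation. The helper names `layer_norm_vec` and `rms_norm_vec` are hypothetical, and the LayerNorm variant leaves out its learnable scale and bias.

```python
import math
from typing import Optional

def layer_norm_vec(x: list[float], eps: float = 1e-8) -> list[float]:
    """LayerNorm without affine parameters: subtract the mean, divide by the std."""
    d = len(x)
    mean = sum(x) / d                                   # extra reduction RMSNorm skips
    var = sum((v - mean) ** 2 for v in x) / d
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x]            # extra subtraction RMSNorm skips

def rms_norm_vec(x: list[float], weight: Optional[list[float]] = None,
                 eps: float = 1e-8) -> list[float]:
    """RMSNorm: divide by the root mean square, then apply an optional per-element scale."""
    d = len(x)
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / d + eps)
    w = weight if weight is not None else [1.0] * d    # scale defaults to ones; no bias term
    return [v * inv_rms * g for v, g in zip(x, w)]

x = [1.0, 2.0, 3.0, 4.0]
print(layer_norm_vec(x))   # zero mean, unit variance
print(rms_norm_vec(x))     # unit mean square, mean not removed
```

Both functions make one pass to build their statistics. LayerNorm needs the mean before it can form the variance, and RMSNorm skips that step entirely.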
Parameters
x : Tensor
    Input whose trailing dimensions match normalized_shape.
normalized_shape : list of int or tuple of int
    Trailing dims to normalise over. Typically (d,), where d is the model hidden size.
weight : Tensor = None
    Per-element scale applied after normalization. Defaults to ones.
eps : float = 1e-08
    Numerical safety term added inside the square root.
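If normalized_shape covers more than one trailing dimension, the mean of squares is taken jointly over all of those dimensions. The NumPy sketch below follows that reading of the parameters: weight broadcasts over the normalized dims, and eps is added before the square root. `rms_norm_np` is a hypothetical reference helper, not part of lucid.

```python
import numpy as np

def rms_norm_np(x, normalized_shape, weight=None, eps=1e-8):
    """Reference RMSNorm over the trailing dims named by normalized_shape (hypothetical helper)."""
    normalized_shape = tuple(normalized_shape)
    if x.shape[-len(normalized_shape):] != normalized_shape:
        raise ValueError(f"trailing dims of {x.shape} do not match {normalized_shape}")
    axes = tuple(range(-len(normalized_shape), 0))          # e.g. (-1,) or (-2, -1)
    ms = np.mean(np.square(x), axis=axes, keepdims=True)   # mean of squares per slice
    y = x / np.sqrt(ms + eps)                               # eps sits inside the sqrt
    if weight is not None:
        y = y * weight                                      # broadcasts over leading dims
    return y

x = np.random.default_rng(0).standard_normal((2, 4, 8))
y_last = rms_norm_np(x, (8,))     # normalize each length-8 row independently
y_both = rms_norm_np(x, (4, 8))   # normalize each 4x8 slab as one group
```

The usual (d,) case reduces over the last axis only. Passing (4, 8) instead yields one RMS value per leading index.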
Returns
Tensor
    Same shape as x.
Notes
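Math:
$$\mathrm{RMS}(x) = \sqrt{\frac{1}{d}\sum_{i=1}^{d} x_i^2 + \epsilon}, \qquad y_i = \frac{x_i}{\mathrm{RMS}(x)}\,\gamma_i$$
Here the sum runs over the d elements covered by normalized_shape, ε is eps, and γ is weight.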
Mean-centering costs LayerNorm one extra reduction and a subtraction. RMSNorm drops that step and gives up a small amount of expressivity in exchange: its output is invariant to rescaling the input but not to shifting it. In practice the loss is negligible at scale.
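The invariance claim can be checked numerically. The sketch below defines a bare RMSNorm and a bare LayerNorm (no scale or bias; the helper names `rms`, `ln` and `close` are hypothetical). It then compares each one's output on an input, on the same input shifted by a constant, and on the input multiplied by a positive factor.

```python
import math

def rms(x, eps=1e-8):
    """Bare RMSNorm: divide by the root mean square."""
    r = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / r for v in x]

def ln(x, eps=1e-8):
    """Bare LayerNorm: subtract the mean, divide by the std."""
    m = sum(x) / len(x)
    s = math.sqrt(sum((v - m) ** 2 for v in x) / len(x) + eps)
    return [(v - m) / s for v in x]

def close(a, b, tol=1e-6):
    """Element-wise approximate equality."""
    return all(abs(p - q) < tol for p, q in zip(a, b))

x = [0.5, -1.0, 2.0, 1.5]
shifted = [v + 3.0 for v in x]   # add a constant to every element
scaled = [v * 10.0 for v in x]   # multiply every element by a positive factor

print("LayerNorm shift-invariant:", close(ln(x), ln(shifted)))    # True
print("RMSNorm shift-invariant:  ", close(rms(x), rms(shifted)))  # False
print("RMSNorm scale-invariant:  ", close(rms(x), rms(scaled)))   # True
```

Shifting changes the mean square, so RMSNorm's output moves. Rescaling cancels between the numerator and the RMS, apart from the tiny eps term.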
Examples
>>> import lucid
>>> from lucid.nn.functional import rms_norm
>>> x = lucid.randn(2, 128, 512)
>>> y = rms_norm(x, normalized_shape=(512,))
>>> y.shape
(2, 128, 512)