
rms_norm

rms_norm(x: Tensor, normalized_shape: list[int] | tuple[int, ...], weight: Tensor | None = None, eps: float = 1e-08) -> Tensor

Root-mean-square layer normalization (Zhang & Sennrich, 2019).

A simplified variant of layer_norm that drops mean centering and the additive bias. It is cheaper to compute and has been shown to perform competitively in large language models (LLaMA, T5, PaLM, ...), where its lower latency and smaller memory footprint matter at scale.
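To make the relationship to layer_norm concrete, here is a minimal pure-Python sketch for a single 1-D vector. `layer_norm_1d` and `rms_norm_1d` are hypothetical helpers written for illustration, not part of lucid, and both use the same eps. With β = 0 the two coincide on zero-mean input, because the variance then equals the mean of squares.

```python
import math
from typing import Optional


def layer_norm_1d(x: list[float], gamma: Optional[list[float]] = None,
                  beta: Optional[list[float]] = None, eps: float = 1e-8) -> list[float]:
    # LayerNorm: subtract the mean, divide by the standard deviation, then scale and shift.
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    gamma = [1.0] * n if gamma is None else gamma
    beta = [0.0] * n if beta is None else beta
    denom = math.sqrt(var + eps)
    return [g * (v - mu) / denom + b for g, v, b in zip(gamma, x, beta)]


def rms_norm_1d(x: list[float], gamma: Optional[list[float]] = None,
                eps: float = 1e-8) -> list[float]:
    # RMSNorm: no mean subtraction and no beta; divide by the root mean square.
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    gamma = [1.0] * n if gamma is None else gamma
    denom = math.sqrt(mean_sq + eps)
    return [g * v / denom for g, v in zip(gamma, x)]


# Zero-mean input: the variance equals the mean of squares, so both agree.
centered = [-3.0, -1.0, 1.0, 3.0]
print(layer_norm_1d(centered))
print(rms_norm_1d(centered))

# Shifted input: LayerNorm re-centers the values, RMSNorm keeps the offset.
shifted = [1.0, 2.0, 3.0, 4.0]
print(layer_norm_1d(shifted))
print(rms_norm_1d(shifted))
```

On the shifted input the LayerNorm output sums to roughly zero, while every RMSNorm output stays positive.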

Parameters

x : Tensor
Input tensor whose trailing dimensions match normalized_shape.
normalized_shape : list of int or tuple of int
Trailing dimensions to normalize over. Typically (d,), where d is the model hidden size.
weight : Tensor, default None
Per-element scale γ. Defaults to ones (no rescaling).
eps : float, default 1e-08
Small constant added to the mean of squares inside the square root, for numerical stability.
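The effect of normalized_shape can be sketched on a row-major flat buffer, where the trailing dimensions form contiguous groups that each share one RMS. `rms_norm_flat` is a hypothetical pure-Python helper, not lucid's implementation; it assumes data is stored in C order and that weight has prod(normalized_shape) elements.

```python
import math
from typing import Optional, Sequence


def rms_norm_flat(data: list[float], shape: Sequence[int],
                  normalized_shape: Sequence[int],
                  weight: Optional[list[float]] = None,
                  eps: float = 1e-8) -> list[float]:
    # Hypothetical helper: `data` stores a tensor of `shape` in row-major (C) order,
    # so the trailing `normalized_shape` dims form contiguous groups.
    k = len(normalized_shape)
    if k == 0 or k > len(shape) or tuple(shape[len(shape) - k:]) != tuple(normalized_shape):
        raise ValueError("normalized_shape must match the trailing dims of shape")
    if len(data) != math.prod(shape):
        raise ValueError("data length does not match shape")
    group = math.prod(normalized_shape)  # number of elements sharing one RMS
    if weight is None:
        weight = [1.0] * group           # default gamma of ones: no rescaling
    if len(weight) != group:
        raise ValueError("weight must have prod(normalized_shape) elements")
    out: list[float] = []
    for start in range(0, len(data), group):
        chunk = data[start:start + group]
        # eps sits inside the square root, so an all-zero group yields zeros
        # instead of dividing by zero.
        rms = math.sqrt(sum(v * v for v in chunk) / group + eps)
        out.extend(w * v / rms for w, v in zip(weight, chunk))
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]                  # a 2 x 3 tensor, flattened
per_row = rms_norm_flat(x, (2, 3), (3,))             # one RMS per row
whole = rms_norm_flat(x, (2, 3), (2, 3))             # one RMS for all six values
zeros = rms_norm_flat([0.0, 0.0, 0.0], (3,), (3,))   # finite thanks to eps
```

With normalized_shape=(3,) each row gets its own RMS, so each row of the output has a mean square of about 1; with (2, 3) only the full set of six values does.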

Returns

Tensor

Normalized tensor with the same shape as x.

Notes

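Math, where S is the set of elements spanned by the trailing normalized_shape dimensions (one RMS is computed per leading index) and γ multiplies element-wise: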

$$
\begin{aligned}
\text{RMS}(x) &= \sqrt{\frac{1}{|S|}\sum_{j \in S} x_j^2 + \epsilon} \\
y &= \gamma \cdot \frac{x}{\text{RMS}(x)}
\end{aligned}
$$
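A worked instance of the formula, with values chosen purely for illustration: a two-element vector, the default γ = 1 and ε = 1e-08.

```latex
\begin{aligned}
x &= (3,\ 4), \qquad |S| = 2 \\
\text{RMS}(x) &= \sqrt{\tfrac{1}{2}\left(3^2 + 4^2\right) + 10^{-8}} = \sqrt{12.5 + 10^{-8}} \approx 3.5355 \\
y &= \frac{x}{\text{RMS}(x)} \approx (0.8485,\ 1.1314)
\end{aligned}
```

The result has mean square (0.8485² + 1.1314²)/2 ≈ 1, which is the property RMSNorm enforces (up to ε) before γ is applied.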

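Mean centering costs LayerNorm an extra reduction (computing the mean) and a subtraction per element. RMSNorm skips both, at the price of some expressivity: it keeps LayerNorm's invariance to rescaling the input but loses invariance to shifting it by a constant. In practice this loss is reported to be negligible in large models.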
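The invariance trade-off can be checked numerically. The sketch below uses a hypothetical pure-Python `rms_norm_1d` (γ = 1, not lucid's implementation) and compares the output for a rescaled input and for a shifted input; the input values are arbitrary.

```python
import math


def rms_norm_1d(x: list[float], eps: float = 1e-8) -> list[float]:
    # Hypothetical helper: RMSNorm of one vector with gamma = 1.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


x = [0.5, -1.0, 2.0, 1.5]
base = rms_norm_1d(x)

# Rescaling by a > 0: the output is unchanged except for the tiny effect of eps
# (exactly unchanged when eps = 0).
scaled = rms_norm_1d([10.0 * v for v in x])

# Shifting by a constant: no re-centering happens, so the output changes.
shifted = rms_norm_1d([v + 3.0 for v in x])

print(max(abs(p - q) for p, q in zip(base, scaled)))   # on the order of 1e-9
print(max(abs(p - q) for p, q in zip(base, shifted)))  # clearly non-zero
```

For LayerNorm both differences would be negligible, since subtracting the mean cancels the shift.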

Examples

>>> import lucid
>>> from lucid.nn.functional import rms_norm
>>> x = lucid.randn(2, 128, 512)
>>> y = rms_norm(x, normalized_shape=(512,))
>>> y.shape
(2, 128, 512)