rms_norm

rms_norm(x: Tensor, normalized_shape: list[int] | tuple[int, ...], weight: Tensor | None = None, eps: float = 1e-08) → Tensor


Implementing kernel

C++ rms_norm_op (free fn)

Root-mean-square layer normalization (Zhang & Sennrich, 2019).

A simplified variant of layer_norm that drops mean centering and the additive bias. It is cheaper to compute and has been shown to perform competitively in large language models (LLaMA, T5, PaLM, ...), where its lower latency and memory footprint matter at scale.

Parameters

x: Tensor

Input whose trailing dimensions match normalized_shape.

normalized_shape: list of int or tuple of int

Trailing dims to normalise over. Typically (d,) where d is the model hidden size.

weight: Tensor = None

Per-element scale \gamma. Defaults to ones.

eps: float = 1e-08

Small constant added inside the square root for numerical stability.

Returns

Tensor

Same shape as x.

Notes

Math:

\begin{aligned} \text{RMS}(x) &= \sqrt{\frac{1}{|S|}\sum_{j \in S} x_j^2 + \epsilon} \\ y &= \gamma \cdot \frac{x}{\text{RMS}(x)} \end{aligned}

Here S is the set of positions spanned by normalized_shape, \epsilon is eps, and \gamma is weight.
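The formula can be checked against a small reference written in plain NumPy. This is only a sketch of the math, not the library's C++ kernel; the name rms_norm_reference is hypothetical, and the sketch assumes that weight, when given, broadcasts against the trailing dimensions of x.

```python
import numpy as np

def rms_norm_reference(x, normalized_shape, weight=None, eps=1e-8):
    """NumPy sketch of the RMSNorm formula (hypothetical helper, not the library kernel)."""
    normalized_shape = tuple(normalized_shape)
    n = len(normalized_shape)
    if n == 0 or x.shape[-n:] != normalized_shape:
        raise ValueError("trailing dims of x must equal normalized_shape")
    # Reduce over the trailing n axes, i.e. the index set S in the formula.
    axes = tuple(range(x.ndim - n, x.ndim))
    mean_sq = np.mean(np.square(x), axis=axes, keepdims=True)
    rms = np.sqrt(mean_sq + eps)  # eps sits inside the square root
    y = x / rms
    if weight is not None:
        y = y * weight  # per-element scale gamma (assumed to broadcast)
    return y

x = np.random.default_rng(0).standard_normal((2, 4, 8))
y = rms_norm_reference(x, (8,))
```

Without a weight, each normalised slice of y has a mean square of approximately 1, which is a quick sanity check on any implementation.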

Mean centering costs LayerNorm one extra reduction and a subtraction. RMSNorm skips that work and in exchange gives up shift invariance: adding a constant to the input changes its output, whereas LayerNorm's output is unchanged. In practice the resulting loss in expressivity is negligible at scale.
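The difference in shift behaviour can be seen numerically. The following is a minimal NumPy sketch with hypothetical helper names (layer_norm_ref, rms_norm_ref), normalising over the last axis only and omitting the scale and bias terms; it is not the library's implementation.

```python
import numpy as np

def layer_norm_ref(x, eps=1e-8):
    # Centre on the mean, then divide by the standard deviation.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def rms_norm_ref(x, eps=1e-8):
    # No centring: divide by the root mean square only.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

x = np.random.default_rng(0).standard_normal((4, 16))
shift = 3.0

# LayerNorm removes a constant offset; RMSNorm does not.
ln_same = np.allclose(layer_norm_ref(x), layer_norm_ref(x + shift))
rms_same = np.allclose(rms_norm_ref(x), rms_norm_ref(x + shift))
print(ln_same, rms_same)
```

Here ln_same comes out true and rms_same false, matching the trade-off described above.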

Examples

>>> import lucid
>>> from lucid.nn.functional import rms_norm
>>> x = lucid.randn(2, 128, 512)
>>> y = rms_norm(x, normalized_shape=(512,))
>>> y.shape
(2, 128, 512)
