apply_rotary_emb

→tuple[Tensor, Tensor]

apply_rotary_emb(q: Tensor, k: Tensor, cos: Tensor, sin: Tensor, position_ids: Tensor | None = None, interleaved: bool = False)


Apply Rotary Position Embedding (RoPE) to query and key tensors.

Encodes absolute position by rotating each pair of features through an angle that grows linearly with the token index. Unlike additive sinusoidal encodings, RoPE acts multiplicatively inside attention: after rotation, the dot product $q_m \cdot k_n$ depends on the positions only through the relative offset $m - n$, so attention scores are unchanged when both positions shift by the same amount, and no learned position parameters are needed. This is the encoding used by LLaMA, GPT-NeoX, PaLM, and most modern open-weights LLMs.

Parameters

q: Tensor

Query tensor of shape (*, seq_len, d_head). Any leading broadcast-compatible dims (batch, head, ...) are allowed.

k: Tensor

Key tensor of shape (*, seq_len, d_head).

cos: Tensor

Precomputed cosine table of shape (max_pos, d_head). Typically constructed once by lucid.nn.RotaryEmbedding.

sin: Tensor

Precomputed sine table of shape (max_pos, d_head).

position_ids: Tensor | None = None

Integer positions of shape (seq_len,) to gather from cos / sin. When None (default), the first seq_len rows of the tables are used, which is appropriate when the tokens occupy positions 0 to seq_len - 1. Pass custom IDs for sequence packing, KV caching, or sliding-window attention.

interleaved: bool = False

Selects the feature-pairing convention. When False (default), the half-split pairing $(x_i, x_{i+d/2})$ of the LLaMA / GPT-NeoX family is used and the cos / sin tables are expected in the half-split layout (cos = [c_0, ..., c_{d/2-1}, c_0, ..., c_{d/2-1}]). When True, the original RoPE pairing $(x_{2i}, x_{2i+1})$ of Su et al. (2021) is used and the tables are expected to repeat each frequency twice (cos = [c_0, c_0, c_1, c_1, ...]). RoFormer reference checkpoints require interleaved=True.
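The two table layouts and the effect of position_ids can be illustrated with a small NumPy sketch. This is not Lucid's implementation: `build_tables` is a hypothetical helper, and the base of 10000 is the value from the original RoPE formulation, assumed here for illustration.

```python
import numpy as np

def build_tables(max_pos, d_head, base=10000.0, interleaved=False):
    """Hypothetical helper: cos/sin tables of shape (max_pos, d_head)."""
    inv_freq = base ** (-np.arange(0, d_head, 2) / d_head)  # (d_head/2,)
    angles = np.outer(np.arange(max_pos), inv_freq)          # (max_pos, d_head/2)
    if interleaved:
        # Each frequency repeated twice: [a_0, a_0, a_1, a_1, ...]
        angles = np.repeat(angles, 2, axis=-1)
    else:
        # Half-split: [a_0, ..., a_{d/2-1}, a_0, ..., a_{d/2-1}]
        angles = np.concatenate([angles, angles], axis=-1)
    return np.cos(angles), np.sin(angles)

cos, sin = build_tables(max_pos=64, d_head=8)
cos_il, sin_il = build_tables(max_pos=64, d_head=8, interleaved=True)

# position_ids=None corresponds to taking the first seq_len rows.
seq_len = 4
default_rows = cos[:seq_len]

# Decoding one new token when 10 tokens are already cached:
# the new token sits at position 10, so only that row is gathered.
position_ids = np.array([10])
cached_rows = cos[position_ids]                              # shape (1, 8)
```

With a KV cache, the keys already stored were rotated at their own positions, so only the new query and key need rows gathered at the current position.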

Returns

tuple[Tensor, Tensor]

(q_rotated, k_rotated) — same shapes as the inputs.

Notes

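For each feature pair $(x_i, x_{i+d/2})$ at position $p$, the rotation is

$$\begin{pmatrix} x'_i \\ x'_{i+d/2} \end{pmatrix} = \begin{pmatrix} \cos\theta_{p,i} & -\sin\theta_{p,i} \\ \sin\theta_{p,i} & \cos\theta_{p,i} \end{pmatrix} \begin{pmatrix} x_i \\ x_{i+d/2} \end{pmatrix}$$

where $\theta_{p,i} = p \, \omega_i$ is the angle of pair $i$ at position $p$. In the original RoPE formulation the frequencies are $\omega_i = 10000^{-2i/d}$ for $i = 0, \dots, d/2 - 1$.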
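As a reference for the rotation above, here is a minimal NumPy sketch of the half-split form. It is not Lucid's code: `rotate_half` and `rope_half_split` are illustrative names, and the frequency base of 10000 is an assumption taken from the original RoPE formulation.

```python
import numpy as np

def rotate_half(x):
    """Map [x1, x2] -> [-x2, x1], where x1, x2 are the halves of the last axis."""
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([-x2, x1], axis=-1)

def rope_half_split(x, cos, sin):
    """Rotate x of shape (..., seq_len, d_head) using half-split tables."""
    seq_len = x.shape[-2]
    # First half:  x_i * cos - x_{i+d/2} * sin
    # Second half: x_{i+d/2} * cos + x_i * sin
    return x * cos[:seq_len] + rotate_half(x) * sin[:seq_len]

d_head, max_pos = 8, 32
inv_freq = 10000.0 ** (-np.arange(0, d_head, 2) / d_head)   # assumed base
angles = np.outer(np.arange(max_pos), inv_freq)
angles = np.concatenate([angles, angles], axis=-1)           # half-split layout
cos, sin = np.cos(angles), np.sin(angles)

x = np.random.default_rng(0).standard_normal((2, 4, 16, d_head))
x_rot = rope_half_split(x, cos, sin)
```

Because every pair undergoes a pure rotation, the per-token norm is preserved and the token at position 0 (angle zero) is left unchanged.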

By default Lucid uses the "half-rotation" pairing $(x_i, x_{i+d/2})$, the LLaMA / GPT-NeoX layout, rather than the original paper's interleaved pairing $(x_{2i}, x_{2i+1})$. The two are equivalent up to a permutation of dimensions, and the half-rotation form works on two contiguous halves of the feature axis instead of strided elements, which is friendlier to contiguous kernels. Set interleaved=True to opt into the original paired layout required by RoFormer reference checkpoints.
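The permutation equivalence can be checked numerically. The sketch below is a NumPy illustration under assumed conventions (base 10000, helper names invented here), not a description of Lucid's internals: permuting the features from interleaved to half-split order, rotating, and comparing against the permuted interleaved result gives the same values.

```python
import numpy as np

def rope_half_split(x, angles):
    """Half-split pairing (x_i, x_{i+d/2}); angles has shape (seq_len, d/2)."""
    c = np.concatenate([np.cos(angles)] * 2, axis=-1)
    s = np.concatenate([np.sin(angles)] * 2, axis=-1)
    x1, x2 = np.split(x, 2, axis=-1)
    return x * c + np.concatenate([-x2, x1], axis=-1) * s

def rope_interleaved(x, angles):
    """Interleaved pairing (x_{2i}, x_{2i+1}); angles has shape (seq_len, d/2)."""
    c = np.repeat(np.cos(angles), 2, axis=-1)
    s = np.repeat(np.sin(angles), 2, axis=-1)
    rot = np.empty_like(x)
    rot[..., 0::2] = -x[..., 1::2]
    rot[..., 1::2] = x[..., 0::2]
    return x * c + rot * s

d, seq_len = 8, 5
angles = np.outer(np.arange(seq_len), 10000.0 ** (-np.arange(0, d, 2) / d))
x = np.random.default_rng(0).standard_normal((seq_len, d))

# perm sends interleaved order [0, 1, 2, 3, ...] to half-split order [0, 2, ..., 1, 3, ...].
perm = np.concatenate([np.arange(0, d, 2), np.arange(1, d, 2)])
lhs = rope_half_split(x[..., perm], angles)
rhs = rope_interleaved(x, angles)[..., perm]
equivalent = np.allclose(lhs, rhs)
```

The practical consequence is that weights trained under one convention cannot be used with the other unless the feature dimensions of the query and key projections are permuted accordingly.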

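Inside attention, writing $R(p)$ for the block rotation applied at position $p$, the rotated dot product satisfies $q_m'^\top k_n' = q_m^\top R(m)^\top R(n) \, k_n = q_m^\top R(n-m) \, k_n$, so it depends only on the relative offset between the two positions. This is the property RoPE is designed to produce.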
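A quick numerical check of the relative-offset property, written as a NumPy sketch. The helper `rope_at` and the base of 10000 are assumptions for illustration only.

```python
import numpy as np

def rope_at(x, pos, base=10000.0):
    """Rotate a single d-dimensional vector as if it sat at position pos (half-split pairing)."""
    d = x.shape[-1]
    ang = pos * base ** (-np.arange(0, d, 2) / d)
    ang = np.concatenate([ang, ang])
    x1, x2 = np.split(x, 2)
    return x * np.cos(ang) + np.concatenate([-x2, x1]) * np.sin(ang)

rng = np.random.default_rng(0)
q, k = rng.standard_normal(16), rng.standard_normal(16)

score_a = rope_at(q, 3) @ rope_at(k, 7)     # positions (3, 7): offset 4
score_b = rope_at(q, 20) @ rope_at(k, 24)   # positions (20, 24): same offset
score_c = rope_at(q, 3) @ rope_at(k, 8)     # positions (3, 8): offset 5
```

Here `score_a` and `score_b` agree up to floating-point error because both pairs of positions are four apart, while `score_c` generally differs.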

Examples

>>> import lucid
>>> from lucid.nn.functional import apply_rotary_emb
>>> q = lucid.randn(2, 8, 16, 64)          # (B, H, S, d_head)
>>> k = lucid.randn(2, 8, 16, 64)
>>> cos = lucid.randn(64, 64)              # placeholder values, shape (max_pos=64, d_head=64)
>>> sin = lucid.randn(64, 64)              # real tables typically come from lucid.nn.RotaryEmbedding
>>> q_rot, k_rot = apply_rotary_emb(q, k, cos, sin)
>>> q_rot.shape, k_rot.shape
((2, 8, 16, 64), (2, 8, 16, 64))

Used by 1

lucid.nn.functional
