apply_rotary_emb

apply_rotary_emb(q: Tensor, k: Tensor, cos: Tensor, sin: Tensor, position_ids: Tensor | None = None) -> tuple[Tensor, Tensor]

Apply Rotary Position Embedding (RoPE) to query and key tensors.

Encodes absolute position by rotating each pair of features through an angle that grows linearly with the token index. Unlike additive sinusoidal encodings, RoPE acts multiplicatively inside attention: for a rotated query at position $m$ and a rotated key at position $n$, the dot product $q_m \cdot k_n$ depends on position only through the relative offset $m - n$, so attention scores are unchanged when the whole sequence is shifted, at no extra cost. This is the encoding used by LLaMA, GPT-NeoX, PaLM, and most modern open-weights LLMs.

Parameters

q : Tensor
Query tensor of shape (*, seq_len, d_head). Any leading broadcast-compatible dims (batch, head, ...) are allowed.
k : Tensor
Key tensor of shape (*, seq_len, d_head).
cos : Tensor
Precomputed cosine table of shape (max_pos, d_head). Typically constructed once by lucid.nn.RotaryEmbedding.
sin : Tensor
Precomputed sine table of shape (max_pos, d_head).
position_ids : Tensor | None = None
Integer positions of shape (seq_len,) to gather from cos / sin. When None (default), the first seq_len rows of the tables are used — appropriate for left-to-right tokenisation. Pass custom IDs for sequence packing, KV caching, or sliding-window attention.
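
As a rough illustration of the table layout and the `position_ids` gather described above, here is a minimal NumPy sketch. It is not Lucid's implementation: `build_rope_tables` is a hypothetical helper, and the frequency schedule with `base=10000` is an assumption taken from the usual RoPE convention rather than from this page.

```python
import numpy as np

def build_rope_tables(max_pos: int, d_head: int, base: float = 10000.0):
    """Hypothetical helper: cos/sin tables of shape (max_pos, d_head)."""
    # One frequency per feature pair (assumed schedule, not confirmed for Lucid).
    inv_freq = base ** (-np.arange(0, d_head, 2) / d_head)   # (d_head/2,)
    angles = np.outer(np.arange(max_pos), inv_freq)          # (max_pos, d_head/2)
    # Duplicate across both halves so columns i and i + d_head/2 share an angle.
    angles = np.concatenate([angles, angles], axis=-1)       # (max_pos, d_head)
    return np.cos(angles), np.sin(angles)

cos, sin = build_rope_tables(max_pos=64, d_head=8)

seq_len = 4
# position_ids=None corresponds to taking the first seq_len rows.
cos_default = cos[:seq_len]                                  # (4, d_head)

# KV-cache decoding step: one new token whose absolute position is 10.
position_ids = np.array([10])
cos_step = cos[position_ids]                                 # (1, d_head)

# Sequence packing: two length-2 documents in one row, positions restart at 0.
packed_ids = np.array([0, 1, 0, 1])
cos_packed = cos[packed_ids]                                 # (4, d_head)
```

The same integer indexing applies to `sin`; the gathered rows then broadcast against the `seq_len` and `d_head` axes of `q` and `k`.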

Returns

tuple[Tensor, Tensor]

(q_rotated, k_rotated) — same shapes as the inputs.

Notes

Mathematical form for each feature pair $(x_i, x_{i+d/2})$ at position $p$:

$$
\begin{pmatrix} x'_i \\ x'_{i+d/2} \end{pmatrix} = \begin{pmatrix} \cos\theta_{p,i} & -\sin\theta_{p,i} \\ \sin\theta_{p,i} & \cos\theta_{p,i} \end{pmatrix} \begin{pmatrix} x_i \\ x_{i+d/2} \end{pmatrix}
$$

Lucid uses the "half-rotation" pairing $(x_i, x_{i+d/2})$ (the HuggingFace / LLaMA layout) rather than the original paper's interleaved pairing $(x_{2i}, x_{2i+1})$. The two are equivalent up to a permutation of dimensions, and the half-rotation form is friendlier to contiguous matmul kernels.
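
To make the two pairings concrete, the following NumPy sketch applies both layouts and checks that they agree once the feature dimension is permuted. The function names (`rotate_half`, `rope_half`, `rope_interleaved`) and the angle schedule are illustrative assumptions, not Lucid's internals.

```python
import numpy as np

def rotate_half(x):
    """Map (x1, x2) -> (-x2, x1), where x1 and x2 are the halves of the last axis."""
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([-x2, x1], axis=-1)

def rope_half(x, angles):
    """Half-rotation pairing (x_i, x_{i+d/2}); angles has shape (seq_len, d/2)."""
    cos = np.concatenate([np.cos(angles)] * 2, axis=-1)
    sin = np.concatenate([np.sin(angles)] * 2, axis=-1)
    return x * cos + rotate_half(x) * sin

def rope_interleaved(x, angles):
    """Interleaved pairing (x_{2i}, x_{2i+1}) with the same per-pair angles."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * np.cos(angles) - x_odd * np.sin(angles)
    out[..., 1::2] = x_even * np.sin(angles) + x_odd * np.cos(angles)
    return out

rng = np.random.default_rng(0)
seq_len, d = 5, 8
x = rng.standard_normal((seq_len, d))
# Assumed angle schedule: position times a per-pair frequency.
angles = np.outer(np.arange(seq_len), 10000.0 ** (-np.arange(0, d, 2) / d))

# Permutation from interleaved layout to half layout: even indices, then odd.
perm = np.concatenate([np.arange(0, d, 2), np.arange(1, d, 2)])

lhs = rope_half(x[..., perm], angles)            # permute, then half-rotate
rhs = rope_interleaved(x, angles)[..., perm]     # interleaved-rotate, then permute
equivalent = np.allclose(lhs, rhs)
```

In practice this means weights trained under one layout can only be used with the other after permuting the head dimension of the query and key projections accordingly.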

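Inside attention the rotated dot product satisfies $q_m'^\top k_n' = q_m^\top R(m)^\top R(n)\, k_n = q_m^\top R(n-m)\, k_n$, where $R(p)$ is the block-diagonal matrix built from the $2 \times 2$ rotations above at position $p$. The score therefore depends only on the relative offset between the two positions, which is the property RoPE is designed to produce.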
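
The relative-offset property can be checked numerically. The sketch below is a standalone NumPy illustration under an assumed frequency schedule (`base=10000`); the helper `rope` is hypothetical and independent of Lucid's API.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate a single vector x (even length d) as if it sat at position pos."""
    d = x.shape[-1]
    angles = pos * base ** (-np.arange(0, d, 2) / d)   # (d/2,)
    cos = np.concatenate([np.cos(angles)] * 2)
    sin = np.concatenate([np.sin(angles)] * 2)
    x1, x2 = np.split(x, 2)
    # Half-rotation pairing: x * cos + rotate_half(x) * sin.
    return x * cos + np.concatenate([-x2, x1]) * sin

rng = np.random.default_rng(0)
q = rng.standard_normal(16)
k = rng.standard_normal(16)

score_a = rope(q, 3) @ rope(k, 7)        # positions (3, 7), offset 4
score_b = rope(q, 103) @ rope(k, 107)    # positions (103, 107), same offset
score_c = rope(q, 3) @ rope(k, 8)        # a different offset; generally a different score

same_offset_matches = bool(np.isclose(score_a, score_b))
```

Shifting both positions by the same amount leaves the score unchanged up to floating-point error, while changing the offset generally changes it.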

Examples

>>> import lucid
>>> from lucid.nn.functional import apply_rotary_emb
>>> q = lucid.randn(2, 8, 16, 64)          # (B, H, S, d_head)
>>> k = lucid.randn(2, 8, 16, 64)
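>>> # Random stand-ins used only to demonstrate shapes; real tables hold cos/sin of the rotation angles.
>>> cos = lucid.randn(64, 64)              # (max_pos, d_head), precomputed up to max_pos=64
>>> sin = lucid.randn(64, 64)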
>>> q_rot, k_rot = apply_rotary_emb(q, k, cos, sin)
>>> q_rot.shape, k_rot.shape
((2, 8, 16, 64), (2, 8, 16, 64))