apply_rotary_emb
apply_rotary_emb(q: Tensor, k: Tensor, cos: Tensor, sin: Tensor, position_ids: Tensor | None = None) → tuple[Tensor, Tensor]
Apply Rotary Position Embedding (RoPE) to query and key tensors.
Encodes absolute position by rotating each pair of features through an angle that grows linearly with the token index. Unlike additive sinusoidal encodings, RoPE acts multiplicatively inside attention: the query-key dot product depends explicitly on the relative offset $m - n$ between query position $m$ and key position $n$, so attention scores are unchanged when both positions shift by the same amount. This is the encoding used by LLaMA, GPT-NeoX, PaLM, and most modern open-weights LLMs.
Parameters
q : Tensor
    (*, seq_len, d_head). Any leading broadcast-compatible dims (batch, head, ...) are allowed.
k : Tensor
    (*, seq_len, d_head).
cos : Tensor
    (max_pos, d_head). Typically constructed once by lucid.nn.RotaryEmbedding.
sin : Tensor
    (max_pos, d_head).
position_ids : Tensor, default None
    (seq_len,) indices to gather from cos / sin. When None (default), the first seq_len rows of the tables are used, which is appropriate for left-to-right tokenisation. Pass custom IDs for sequence packing, KV caching, or sliding-window attention.
Returns
tuple[Tensor, Tensor]
    (q_rotated, k_rotated), with the same shapes as the inputs.
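The row selection described for position_ids can be sketched in plain NumPy. This is only an illustration of the gather semantics stated above, not Lucid's implementation; `gather_tables` is a hypothetical helper, and the tables are stand-ins whose entries equal their row index so the selected rows are easy to see.

```python
import numpy as np

def gather_tables(cos, sin, seq_len, position_ids=None):
    """Pick the (seq_len, d_head) rows of the tables for a sequence (hypothetical helper)."""
    if position_ids is None:
        # Default: tokens sit at positions 0 .. seq_len - 1, so take the first rows.
        return cos[:seq_len], sin[:seq_len]
    position_ids = np.asarray(position_ids)
    if position_ids.shape != (seq_len,):
        raise ValueError("position_ids must have shape (seq_len,)")
    # Integer-array indexing gathers one table row per token.
    return cos[position_ids], sin[position_ids]

max_pos, d_head = 8, 4
# Stand-in tables: every entry of row p equals p (not real cos / sin values).
cos = np.arange(max_pos, dtype=np.float64)[:, None] * np.ones(d_head)
sin = -cos

# Default behaviour: rows 0, 1, 2.
default_cos, _ = gather_tables(cos, sin, seq_len=3)

# Sequence packing: two sequences of lengths 2 and 1, positions restart at 0.
packed_cos, _ = gather_tables(cos, sin, seq_len=3, position_ids=[0, 1, 0])

# KV-cache decode step: a single new token at absolute position 5.
decode_cos, _ = gather_tables(cos, sin, seq_len=1, position_ids=[5])
```

With packing or caching, the default of taking the first seq_len rows would assign the wrong positions, which is why custom IDs are needed in those cases.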
Notes
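Mathematical form for each feature pair $(x_i, x_{i+d/2})$ at position $m$, where $d$ is d_head and $\theta_i$ is the rotation frequency of pair $i$ encoded in the cos / sin tables:

\[
\begin{aligned}
x'_i &= x_i \cos(m\theta_i) - x_{i+d/2} \sin(m\theta_i), \\
x'_{i+d/2} &= x_{i+d/2} \cos(m\theta_i) + x_i \sin(m\theta_i).
\end{aligned}
\]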
Lucid uses the "half-rotation" pairing $(x_i, x_{i+d/2})$ with $d$ = d_head, which is the HuggingFace / LLaMA layout, rather than the original paper's interleaved pairing $(x_{2i}, x_{2i+1})$. The two are equivalent up to a permutation of dimensions, and the half-rotation form, which operates on two contiguous halves of the feature axis, is friendlier to contiguous matmul kernels.
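The equivalence of the two pairings can be checked numerically. The following is a minimal NumPy sketch, not Lucid's code: the function names are hypothetical, and the frequency schedule $\theta_i = \text{base}^{-2i/d}$ with base 10000 is an assumption taken from common RoPE implementations rather than from this page.

```python
import numpy as np

def rope_tables(max_pos, d_head, base=10000.0):
    """Build (max_pos, d_head) cos / sin tables in half-rotation layout (base is assumed)."""
    half = d_head // 2
    theta = base ** (-np.arange(half) / half)            # equals base^(-2i / d_head)
    angles = np.arange(max_pos)[:, None] * theta[None]   # (max_pos, half)
    angles = np.concatenate([angles, angles], axis=-1)   # same angle for both halves
    return np.cos(angles), np.sin(angles)

def rotate_half(x):
    """Map (x1, x2) to (-x2, x1), where x1 and x2 are the two halves of the last axis."""
    half = x.shape[-1] // 2
    return np.concatenate([-x[..., half:], x[..., :half]], axis=-1)

def rope_half(x, cos, sin):
    """Half-rotation pairing: feature i is rotated together with feature i + d/2."""
    seq_len = x.shape[-2]
    return x * cos[:seq_len] + rotate_half(x) * sin[:seq_len]

def rope_interleaved(x, cos, sin):
    """Interleaved pairing: feature 2i is rotated together with feature 2i + 1."""
    seq_len, d = x.shape[-2], x.shape[-1]
    c, s = cos[:seq_len, : d // 2], sin[:seq_len, : d // 2]
    even, odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = even * c - odd * s
    out[..., 1::2] = odd * c + even * s
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 5, 8))                    # (B, H, S, d_head)
cos, sin = rope_tables(max_pos=16, d_head=8)

# Permutation that moves even features to the first half and odd ones to the second.
perm = np.concatenate([np.arange(0, 8, 2), np.arange(1, 8, 2)])
same = np.allclose(rope_half(x[..., perm], cos, sin),
                   rope_interleaved(x, cos, sin)[..., perm])
```

Here `same` compares rotating a permuted input in the half layout against permuting the output of the interleaved rotation; agreement of the two is what "equivalent up to a permutation of dimensions" means in practice.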
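Inside attention the rotated dot product satisfies $\langle R_m q, R_n k \rangle = \langle R_{m-n} q, k \rangle$, where $R_p$ denotes the rotation applied at position $p$, so it depends only on the relative offset $m - n$. This is the property RoPE is designed to produce.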
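The relative-offset property can be verified with a small numerical experiment. This is a NumPy sketch under assumptions: `rope_at` is a hypothetical helper that rotates one vector in the half-rotation layout, and the base of 10000 for the frequencies is assumed rather than taken from Lucid.

```python
import numpy as np

def rope_at(x, pos, base=10000.0):
    """Rotate a single d_head vector as if it sat at integer position pos (base is assumed)."""
    half = x.shape[-1] // 2
    theta = base ** (-np.arange(half) / half)
    ang = pos * theta
    cos = np.concatenate([np.cos(ang), np.cos(ang)])
    sin = np.concatenate([np.sin(ang), np.sin(ang)])
    rotated = np.concatenate([-x[half:], x[:half]])   # half-rotation: (x1, x2) -> (-x2, x1)
    return x * cos + rotated * sin

rng = np.random.default_rng(0)
q = rng.standard_normal(8)
k = rng.standard_normal(8)

score_a = rope_at(q, 7) @ rope_at(k, 3)        # query at 7, key at 3: offset 4
score_b = rope_at(q, 104) @ rope_at(k, 100)    # both shifted by 97: offset still 4
score_c = rope_at(q, 4) @ k                    # rotate q by the offset only, leave k as is
score_d = rope_at(q, 7) @ rope_at(k, 4)        # offset 3, a different relative position
```

The first three scores should agree up to floating-point error, while `score_d` generally differs because its offset is different.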
Examples
>>> import lucid
>>> from lucid.nn.functional import apply_rotary_emb
>>> q = lucid.randn(2, 8, 16, 64) # (B, H, S, d_head)
>>> k = lucid.randn(2, 8, 16, 64)
>>> cos = lucid.randn(64, 64) # (max_pos, d_head); random stand-in for a precomputed table
>>> sin = lucid.randn(64, 64) # real tables are typically built by lucid.nn.RotaryEmbedding
>>> q_rot, k_rot = apply_rotary_emb(q, k, cos, sin)
>>> q_rot.shape, k_rot.shape
((2, 8, 16, 64), (2, 8, 16, 64))