class RotaryEmbedding extends Module
RotaryEmbedding(head_dim: int, max_position_embeddings: int, base: float = 10000.0)

Precomputed cos / sin tables for rotary positional embedding.

Owns no learnable parameters: it is a thin wrapper around two registered buffers, so the tables follow .to(device=...) with the rest of the model.

Args:
    head_dim: Per-head feature dim d_head (must be even).
    max_position_embeddings: Largest sequence length the model will see.
    base: Frequency base in the formula θ_i = base ** (-2 i / d_head), for i = 0, ..., d_head / 2 - 1. Defaults to 10000.0 per the RoFormer / LLaMA / GPT-NeoX convention; some models (e.g. CodeLlama at long context) use 1_000_000.
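To make the frequency formula concrete, the following NumPy sketch builds tables of the documented shape from head_dim, max_position_embeddings and base. The helper name build_rope_tables is hypothetical, and duplicating the half-width angles to fill head_dim columns is an assumption about the layout; this is not Lucid's actual implementation.

```python
import numpy as np

def build_rope_tables(head_dim, max_position_embeddings, base=10000.0):
    """Hypothetical helper: return (cos, sin), each (max_position_embeddings, head_dim)."""
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    # theta_i = base ** (-2 i / d_head) for i = 0 .. d_head / 2 - 1
    i = np.arange(head_dim // 2, dtype=np.float64)
    theta = base ** (-2.0 * i / head_dim)
    # Angle for position p and frequency i is p * theta_i.
    positions = np.arange(max_position_embeddings, dtype=np.float64)
    angles = np.outer(positions, theta)  # (max_pos, head_dim // 2)
    # Assumption: repeat the angles so the tables have head_dim columns.
    angles = np.concatenate([angles, angles], axis=-1)
    return np.cos(angles), np.sin(angles)

cos, sin = build_rope_tables(head_dim=64, max_position_embeddings=2048)
print(cos.shape, sin.shape)
```

A larger base lowers every frequency except θ_0, which stretches the wavelengths of the slowest components and is why long-context models tend to raise it.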

Forward: forward() returns the precomputed (cos, sin) pair. Callers pass them into lucid.nn.functional.apply_rotary_emb along with the query / key tensors.

Notes

The cos_cached / sin_cached tables are built once at construction and registered as non-persistent buffers. They follow .to(device=...) automatically but are not saved in state_dict (RoPE has no learnable state, so the tables are regenerated at load time).

The module form is the right choice for any transformer that reuses the same max_position_embeddings across calls. The functional lucid.nn.functional.apply_rotary_pos_emb consumes the cached pair and applies the rotation in place to q and k; see that function for the rotation math.
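As a rough illustration of how a cached (cos, sin) pair can be consumed, the NumPy sketch below uses the common rotate-half formulation. The helpers rotate_half and apply_rope are hypothetical names, the half-split layout is an assumption, and the sketch returns new arrays rather than modifying q and k in place; it is not Lucid's implementation.

```python
import numpy as np

def rotate_half(x):
    """Map the two halves (x1, x2) of the last axis to (-x2, x1)."""
    half = x.shape[-1] // 2
    return np.concatenate([-x[..., half:], x[..., :half]], axis=-1)

def apply_rope(x, cos, sin):
    """Rotate x of shape (seq_len, head_dim); returns a new array."""
    return x * cos + rotate_half(x) * sin

# Stand-in tables with the documented (max_position_embeddings, head_dim) shape.
head_dim, max_pos = 8, 16
theta = 10000.0 ** (-2.0 * np.arange(head_dim // 2) / head_dim)
angles = np.outer(np.arange(max_pos), theta)
angles = np.concatenate([angles, angles], axis=-1)
cos, sin = np.cos(angles), np.sin(angles)

rng = np.random.default_rng(0)
q_vec, k_vec = rng.normal(size=head_dim), rng.normal(size=head_dim)
# Place the same query / key vector at every position, then rotate.
q = apply_rope(np.tile(q_vec, (max_pos, 1)), cos, sin)
k = apply_rope(np.tile(k_vec, (max_pos, 1)), cos, sin)

# Attention scores between every query position and every key position.
scores = q @ k.T
# Pairs with the same position offset should give the same score.
print(np.isclose(scores[3, 1], scores[9, 7]))
```

The final check reflects the property that motivates RoPE: after rotation, the q·k score depends only on the relative offset between the two positions, and the rotation leaves vector norms unchanged.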

Examples

>>> import lucid.nn as nn
>>> rope = nn.RotaryEmbedding(head_dim=64, max_position_embeddings=2048)
>>> cos, sin = rope()
>>> cos.shape, sin.shape
((2048, 64), (2048, 64))

Methods (2)

__init__(head_dim: int, max_position_embeddings: int, base: float = 10000.0) -> None

forward() -> tuple[Tensor, Tensor]

Return (cos, sin) lookup tables.

Shapes: each (max_position_embeddings, head_dim). Callers index into them with the position ids of the current minibatch.
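Indexing the tables with position ids might look like the sketch below. It uses NumPy arrays as stand-ins for the tensors returned by forward() and assumes integer-array indexing behaves as in NumPy; the position_ids values are made up for illustration.

```python
import numpy as np

# Stand-in tables with the documented (max_position_embeddings, head_dim) shape.
head_dim, max_pos = 8, 32
theta = 10000.0 ** (-2.0 * np.arange(head_dim // 2) / head_dim)
angles = np.outer(np.arange(max_pos), theta)
angles = np.concatenate([angles, angles], axis=-1)
cos, sin = np.cos(angles), np.sin(angles)

# Hypothetical minibatch: two sequences of length 4; the second one
# resumes at position 10, as it would when decoding with a cache.
position_ids = np.array([[0, 1, 2, 3],
                         [10, 11, 12, 13]])

# Integer-array indexing gathers one table row per position id.
cos_batch = cos[position_ids]  # (batch, seq_len, head_dim)
sin_batch = sin[position_ids]
print(cos_batch.shape, sin_batch.shape)
```

Because the rows are gathered by position id rather than sliced from the start, sequences in the same batch can sit at different absolute positions, provided every id is below max_position_embeddings.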