RotaryEmbedding
Module
RotaryEmbedding(head_dim: int, max_position_embeddings: int, base: float = 10000.0)
Precomputed cos / sin tables for rotary positional embedding.
Owns no learnable parameters: a thin wrapper around two registered
buffers, so the tables move with .to(device=...). The buffers are
non-persistent, so they are rebuilt at construction rather than
serialised with the rest of the model state.
Args:
head_dim: Per-head feature dim d_head (must be even).
max_position_embeddings: Largest sequence length the model will see.
base: Frequency base in the formula
θ_i = base ** (-2 i / d_head), i = 0, …, d_head/2 - 1. Defaults to 10000.0
per the RoFormer / LLaMA / GPT-NeoX convention; some
models (e.g. CodeLlama at long context) use 1_000_000.
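The frequency formula above can be sketched in plain Python, without any tensor library. This is an illustration only, not lucid's implementation: the helper name rope_tables is hypothetical, and the way the d_head/2 frequencies are repeated to fill d_head columns (concatenated halves) is an assumption; other implementations interleave them instead.

```python
import math

def rope_tables(head_dim, max_position_embeddings, base=10000.0):
    """Hypothetical helper: (cos, sin) as nested lists, shape (max_pos, head_dim)."""
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    half = head_dim // 2
    # theta_i = base ** (-2 i / d_head) for i = 0 .. d_head/2 - 1
    theta = [base ** (-2.0 * i / head_dim) for i in range(half)]
    # Assumed layout: the d_head/2 frequencies are repeated to fill d_head columns.
    theta_full = theta + theta
    # The angle for position p and frequency theta_i is p * theta_i.
    cos = [[math.cos(p * t) for t in theta_full]
           for p in range(max_position_embeddings)]
    sin = [[math.sin(p * t) for t in theta_full]
           for p in range(max_position_embeddings)]
    return cos, sin

cos, sin = rope_tables(head_dim=64, max_position_embeddings=2048)
```

With a larger base such as 1_000_000, every θ_i with i > 0 becomes smaller, so the angles grow more slowly with position.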
Forward:
forward() returns the precomputed (cos, sin) pair. Callers
pass them into lucid.nn.functional.apply_rotary_emb
along with the query / key tensors.
Notes
The cos_cached / sin_cached tables are built once at
construction and registered as non-persistent buffers. They follow
.to(device=...) automatically but are not saved in state_dict
(RoPE has no learnable state — regenerate at load time). The module
form is the right choice for any transformer that reuses the same
max_position_embeddings across calls; the functional
lucid.nn.functional.apply_rotary_pos_emb consumes the cached
pair and applies the rotation in place to q and k. See that
function for the rotation math.
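For orientation, one common form of the rotation (the "rotate-half" convention, which pairs with tables whose two halves repeat the same angles) can be sketched for a single head vector in plain Python. Whether lucid's functional uses this pairing or an interleaved one is not stated here, so the layout and the helper names rotate_half and apply_rope are assumptions, and this sketch returns a new list rather than modifying its input.

```python
import math

def rotate_half(x):
    """Map (x1, x2) -> (-x2, x1), where x1 and x2 are the two halves of x."""
    half = len(x) // 2
    return [-v for v in x[half:]] + list(x[:half])

def apply_rope(x, cos_row, sin_row):
    """Rotate one head vector x using the table rows for its position."""
    rx = rotate_half(x)
    return [a * c + b * s for a, b, c, s in zip(x, rx, cos_row, sin_row)]

# Toy case: head_dim = 4, position 3, base 10000.
head_dim, pos, base = 4, 3, 10000.0
theta = [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
angles = [pos * t for t in theta] * 2   # repeated to fill head_dim columns
cos_row = [math.cos(a) for a in angles]
sin_row = [math.sin(a) for a in angles]

q = [1.0, 2.0, 3.0, 4.0]
q_rot = apply_rope(q, cos_row, sin_row)
```

Under this convention each pair (x_i, x_{i + d_head/2}) is rotated by the angle pos * θ_i, so the norm of the vector is unchanged.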
Examples
>>> import lucid.nn as nn
>>> rope = nn.RotaryEmbedding(head_dim=64, max_position_embeddings=2048)
>>> cos, sin = rope()
>>> cos.shape, sin.shape
((2048, 64), (2048, 64))
Methods (2)
__init__
__init__(head_dim: int, max_position_embeddings: int, base: float = 10000.0) → None
forward
forward() → tuple[Tensor, Tensor]
Return (cos, sin) lookup tables.
Shapes: each (max_position_embeddings, head_dim). Callers index
into them with the position ids of the current minibatch.
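Indexing the tables by position ids amounts to a row gather. A minimal plain-Python sketch with a tiny stand-in table follows; the table values, the position_ids name, and the offsets are illustrative assumptions, and with real tensors this would typically be a single indexing call.

```python
import math

# Tiny stand-in for the cached cos table: 8 positions, head_dim = 4.
max_pos, head_dim, base = 8, 4, 10000.0
theta = [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)] * 2
cos_table = [[math.cos(p * t) for t in theta] for p in range(max_pos)]

# Position ids for a minibatch of 2 sequences of length 3; the second
# sequence is assumed to start at offset 5 (e.g. continuing from a cache).
position_ids = [[0, 1, 2], [5, 6, 7]]

# Gather rows: the result has shape (batch, seq_len, head_dim).
cos_batch = [[cos_table[p] for p in seq] for seq in position_ids]
```

Every id must be smaller than max_position_embeddings, since the tables hold only that many rows.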