class RotaryEmbedding

extends Module

RotaryEmbedding(head_dim: int, max_position_embeddings: int, base: float = 10000.0)


Precomputed cos / sin tables for rotary positional embedding.

Owns no learnable parameters. It is a thin wrapper around two registered buffers, so the tables move with .to(device=...) along with the rest of the model. The buffers are non-persistent, so they are not written to state_dict. forward() returns the precomputed (cos, sin) pair; callers pass them into lucid.nn.functional.apply_rotary_emb along with the query / key tensors.

Parameters

head_dim: int

Per-head feature dim $d_\text{head}$ (must be even).

max_position_embeddings: int

Largest sequence length the model will see.

base: float = 10000.0

Frequency base $\theta_0$ in the formula $\theta_i = \text{base}^{-2 i / d_\text{head}}$. Default 10000.0 per the RoFormer / LLaMA / GPT-NeoX convention; some long-context models (e.g. CodeLlama) use 1_000_000.
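The frequency formula can be made concrete with a small standalone sketch. This is not the library's implementation: build_rope_tables is a hypothetical helper written in plain Python, and it assumes the half-width angle vector is concatenated with itself to fill all head_dim columns, a layout this page does not confirm.

```python
import math


def build_rope_tables(head_dim, max_position_embeddings, base=10000.0):
    """Return (cos, sin) as nested lists of shape (max_position_embeddings, head_dim)."""
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")

    half = head_dim // 2
    # theta_i = base^(-2i / head_dim) for i = 0 .. head_dim/2 - 1
    inv_freq = [base ** (-2.0 * i / head_dim) for i in range(half)]

    cos, sin = [], []
    for pos in range(max_position_embeddings):
        angles = [pos * theta for theta in inv_freq]
        # Assumption: duplicate the half-width angles so each row has head_dim
        # entries. The column layout the library actually uses is not stated here.
        angles = angles + angles
        cos.append([math.cos(a) for a in angles])
        sin.append([math.sin(a) for a in angles])
    return cos, sin


cos, sin = build_rope_tables(head_dim=8, max_position_embeddings=16)
print(len(cos), len(cos[0]))  # 16 8
```

A larger base makes the angles grow more slowly with position, which is one reason some long-context models raise it.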

Notes

The cos_cached / sin_cached tables are built once at construction and registered as non-persistent buffers. They follow .to(device=...) automatically but are not saved in state_dict: RoPE has no learnable state, so the tables are rebuilt when the module is constructed at load time. The module form is the right choice for any transformer that reuses the same max_position_embeddings across calls; the functional lucid.nn.functional.apply_rotary_emb consumes the cached pair and applies the rotation in place to q and k. See that function for the rotation math.

Examples

>>> import lucid.nn as nn
>>> rope = nn.RotaryEmbedding(head_dim=64, max_position_embeddings=2048)
>>> cos, sin = rope()
>>> cos.shape, sin.shape
((2048, 64), (2048, 64))

Used by 1

lucid.nn.modules

Constructors

__init__ → None

__init__(head_dim: int, max_position_embeddings: int, base: float = 10000.0)


Instance methods

forward → tuple[Tensor, Tensor]

forward()


Return (cos, sin) lookup tables.

Shapes: each (max_position_embeddings, head_dim). Callers index into them with the position ids of the current minibatch.
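Indexing the tables with position ids can be illustrated in plain Python. The sketch below is an assumption-laden illustration rather than the library's code path: rotate_half and apply_rope_at are hypothetical helpers, the tables are rebuilt inline with a concatenated-halves layout, and the rotation follows the common rotate-half convention, which may differ from what lucid.nn.functional.apply_rotary_emb does. It also returns new lists instead of rotating in place.

```python
import math


def rotate_half(x):
    """Map (x1, x2) -> (-x2, x1), where x1 and x2 are the two halves of x."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def apply_rope_at(x, cos_row, sin_row):
    """Rotate one head vector using the table rows selected for its position."""
    r = rotate_half(x)
    return [xi * c + ri * s for xi, ri, c, s in zip(x, r, cos_row, sin_row)]


# Tiny stand-in tables (assumed layout: half-width angles concatenated twice).
head_dim, max_pos, base = 4, 8, 10000.0
inv_freq = [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
cos = [[math.cos(p * t) for t in inv_freq + inv_freq] for p in range(max_pos)]
sin = [[math.sin(p * t) for t in inv_freq + inv_freq] for p in range(max_pos)]

# Position ids of the current tokens, e.g. three tokens following a 5-token prefix.
position_ids = [5, 6, 7]
cos_sel = [cos[p] for p in position_ids]  # shape (3, head_dim)
sin_sel = [sin[p] for p in position_ids]

# One query vector per token; each is rotated by the angles for its own position.
q = [[1.0, 2.0, 3.0, 4.0] for _ in position_ids]
q_rot = [apply_rope_at(x, c, s) for x, c, s in zip(q, cos_sel, sin_sel)]
```

Because each selected row encodes a pure rotation, the rotated vectors keep the norm of the originals, and position 0 leaves a vector unchanged.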
