nn.functional.rotary_pos_embedding¶
- lucid.nn.functional.rotary_pos_embedding(input_: Tensor, position_ids: Tensor | None = None, interleaved: bool = True) → Tensor¶
The rotary_pos_embedding function applies Rotary Position Embedding (RoPE) to the last dimension of an input tensor. It rotates each even/odd channel pair using position-dependent angles and preserves the original tensor shape.
Function Signature¶
def rotary_pos_embedding(
input_: Tensor,
position_ids: Tensor | None = None,
interleaved: bool = True,
) -> Tensor
Parameters¶
input_ (Tensor): Input tensor of shape (…, seq_len, embed_dim). The last dimension embed_dim must be even.
position_ids (Tensor | None, optional): Optional 1-D position tensor of shape (seq_len,). If None, positions are generated as 0, 1, …, seq_len - 1.
interleaved (bool, optional): If True, applies adjacent-pair rotation layout (0,1), (2,3), …. If False, applies half-split layout between first and second half channels.
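To make the two layouts concrete, here is a small sketch of which channel indices form each rotated pair when the embedding dimension is 8 (illustrative only; these index tuples are not part of Lucid's API):

```python
# Hypothetical illustration of the two pairing layouts for embed_dim = 8
# (channel indices only, not Lucid's internal code):
D = 8
interleaved_pairs = [(2 * i, 2 * i + 1) for i in range(D // 2)]  # interleaved=True
half_split_pairs = [(i, i + D // 2) for i in range(D // 2)]      # interleaved=False
print(interleaved_pairs)  # [(0, 1), (2, 3), (4, 5), (6, 7)]
print(half_split_pairs)   # [(0, 4), (1, 5), (2, 6), (3, 7)]
```

Both layouts rotate the same number of pairs; they differ only in which channels are grouped together.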
Returns¶
Tensor: Tensor with the same shape as input_, after rotary position embedding is applied.
RoPE Formulation¶
RoPE applies a position-dependent rotation to each query/key channel pair. For position index \(m\) and pair index \(i\), the rotation angle is

\[
\phi_{m,i} = m\,\theta_i, \qquad \theta_i = 10000^{-2i/D},
\]

where \(D\) is the embedding dimension.
2D Rotation Block¶
For a single 2D pair \((u, v)\), RoPE uses:

\[
\mathbf{R}(\phi_{m,i}) =
\begin{pmatrix}
\cos\phi_{m,i} & -\sin\phi_{m,i} \\
\sin\phi_{m,i} & \cos\phi_{m,i}
\end{pmatrix},
\]

so the rotated pair is \(\mathbf{R}(\phi_{m,i})\,[u, v]^\top\).
Generalized \(D\)-Dimensional Rotation¶
For even \(D\), split channels into \(D/2\) independent 2D subspaces. The full rotary matrix is block diagonal:

\[
\mathbf{R}_m = \operatorname{diag}\bigl(\mathbf{R}(\phi_{m,0}),\; \mathbf{R}(\phi_{m,1}),\; \ldots,\; \mathbf{R}(\phi_{m,D/2-1})\bigr),
\]

equivalently, each channel pair is rotated independently:

\[
\begin{pmatrix} x'_{2i} \\ x'_{2i+1} \end{pmatrix}
=
\begin{pmatrix}
\cos\phi_{m,i} & -\sin\phi_{m,i} \\
\sin\phi_{m,i} & \cos\phi_{m,i}
\end{pmatrix}
\begin{pmatrix} x_{2i} \\ x_{2i+1} \end{pmatrix}.
\]
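The block-diagonal application amounts to rotating each channel pair independently. A minimal NumPy sketch of this (a hypothetical reference implementation assuming the interleaved layout and the standard \(10000^{-2i/D}\) frequencies, not Lucid's actual code):

```python
import numpy as np

def rope_reference(x, theta_base=10000.0):
    """Apply RoPE to x of shape (seq_len, D), D even, interleaved layout."""
    seq_len, D = x.shape
    assert D % 2 == 0, "embed_dim must be even"
    theta = theta_base ** (-2.0 * np.arange(D // 2) / D)   # (D/2,) frequencies
    phi = np.arange(seq_len)[:, None] * theta              # (seq_len, D/2) angles
    cos, sin = np.cos(phi), np.sin(phi)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]                 # the (2i, 2i+1) pairs
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin              # rotated first component
    out[:, 1::2] = x_even * sin + x_odd * cos              # rotated second component
    return out
```

Position \(m = 0\) receives angle 0, so the first row passes through unchanged, and because each 2D rotation is orthogonal, per-token norms are preserved.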
How It Is Applied in Attention¶
RoPE is applied after the linear projections and before attention score computation:

\[
\mathbf{q}_m = \mathbf{R}_m \mathbf{W}_q \mathbf{x}_m, \qquad
\mathbf{k}_n = \mathbf{R}_n \mathbf{W}_k \mathbf{x}_n.
\]

Then:

\[
\mathbf{q}_m^\top \mathbf{k}_n
= (\mathbf{W}_q \mathbf{x}_m)^\top \mathbf{R}_m^\top \mathbf{R}_n (\mathbf{W}_k \mathbf{x}_n),
\]

with:

\[
\mathbf{R}_m^\top \mathbf{R}_n = \mathbf{R}_{n-m}.
\]

This is the key point: the relative position \((n-m)\) appears directly in the attention dot product, consistent with RoFormer Eq. (16).
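The relative-position property can be checked numerically. The sketch below uses a hypothetical `rope` helper and `score` function (assumed names, interleaved layout; not part of Lucid's API) and shows that the dot product depends only on the offset \(n - m\):

```python
import numpy as np

def rope(x, theta_base=10000.0):
    # Minimal interleaved RoPE on x of shape (seq_len, D), D even.
    seq_len, D = x.shape
    theta = theta_base ** (-2.0 * np.arange(D // 2) / D)
    phi = np.arange(seq_len)[:, None] * theta
    cos, sin = np.cos(phi), np.sin(phi)
    out = np.empty_like(x)
    out[:, 0::2] = x[:, 0::2] * cos - x[:, 1::2] * sin
    out[:, 1::2] = x[:, 0::2] * sin + x[:, 1::2] * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=64), rng.normal(size=64)

def score(m, n):
    # Dot product of q placed at position m with k placed at position n.
    seq = max(m, n) + 1
    Q = np.zeros((seq, 64)); Q[m] = q
    K = np.zeros((seq, 64)); K[n] = k
    return rope(Q)[m] @ rope(K)[n]

# Same relative offset (n - m = 3) gives the same score:
print(np.isclose(score(0, 3), score(5, 8)))  # True
```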
Computationally Efficient Equivalent Form¶
Instead of explicit block-matrix multiplication, the implementation uses:

\[
\mathrm{RoPE}(\mathbf{x})_m = \mathbf{x} \odot \cos\boldsymbol{\phi}_m + \mathrm{rotate\_half}(\mathbf{x}) \odot \sin\boldsymbol{\phi}_m,
\]

where \(\mathrm{rotate\_half}(x)\) swaps each pair as \((x_{2i}, x_{2i+1}) \mapsto (-x_{2i+1}, x_{2i})\), and \(\cos\boldsymbol{\phi}_m\), \(\sin\boldsymbol{\phi}_m\) repeat each angle over its channel pair. This is equivalent to the efficient realization described in RoFormer Eq. (34).
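A compact NumPy sketch of this elementwise form (assumed helper names and interleaved layout; a reference sketch, not Lucid's internals):

```python
import numpy as np

def rotate_half(x):
    # (x_{2i}, x_{2i+1}) -> (-x_{2i+1}, x_{2i}) along the last axis
    out = np.empty_like(x)
    out[..., 0::2] = -x[..., 1::2]
    out[..., 1::2] = x[..., 0::2]
    return out

def rope_fast(x, theta_base=10000.0):
    # x: (seq_len, D), D even; angles repeated so cos/sin align per channel pair
    seq_len, D = x.shape
    theta = theta_base ** (-2.0 * np.arange(D // 2) / D)
    phi = np.arange(seq_len)[:, None] * theta          # (seq_len, D/2)
    cos = np.repeat(np.cos(phi), 2, axis=-1)           # (seq_len, D)
    sin = np.repeat(np.sin(phi), 2, axis=-1)
    return x * cos + rotate_half(x) * sin
```

Expanding one pair confirms the equivalence: channel \(2i\) becomes \(x_{2i}\cos\phi - x_{2i+1}\sin\phi\) and channel \(2i+1\) becomes \(x_{2i}\sin\phi + x_{2i+1}\cos\phi\), exactly the 2D rotation.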
Implementation Notes¶
The function expects the sequence axis at -2 and embedding axis at -1.
Raises ValueError when embed_dim is odd.
RoPE is parameter-free and deterministic.
Examples¶
>>> import lucid
>>> import lucid.nn.functional as F
>>> x = lucid.random.randn(2, 8, 64) # (batch, seq_len, embed_dim)
>>> y = F.rotary_pos_embedding(x)
>>> print(y.shape)
(2, 8, 64)
>>> q = lucid.random.randn(2, 12, 8, 64) # (batch, heads, seq_len, head_dim)
>>> q_rope = F.rotary_pos_embedding(q)
>>> print(q_rope.shape)
(2, 12, 8, 64)