sinusoidal_embedding

sinusoidal_embedding(num_positions: int, embedding_dim: int, base: float = 10000.0, device: str = 'cpu') -> Tensor

Build the 1-D sinusoidal positional encoding table from "Attention Is All You Need".

Returns a fixed (non-learnable) lookup table that injects absolute position information into token embeddings without adding parameters. Successive frequencies form a geometric progression, with wavelengths ranging from $2\pi$ up to nearly $2\pi \cdot \text{base}$, so the components of the encoding vary at very different rates and together identify each position uniquely, even at large sequence lengths.
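The geometric progression of wavelengths can be checked numerically. The following is a minimal sketch in plain Python, assuming the angle for pair $i$ is $p / \text{base}^{2i/d}$ as in the Notes section; the helper name `wavelengths` is hypothetical and is not part of lucid.

```python
import math

def wavelengths(embedding_dim: int, base: float = 10000.0) -> list[float]:
    """Wavelength, in positions, of each sin/cos pair.

    Hypothetical helper: assumes angle = p / base**(2i/d), so pair i
    completes one full cycle every 2*pi * base**(2i/d) positions.
    """
    half = embedding_dim // 2
    return [2 * math.pi * base ** (2 * i / embedding_dim) for i in range(half)]

lams = wavelengths(64)
ratios = [b / a for a, b in zip(lams, lams[1:])]

print(lams[0])    # shortest wavelength, equal to 2*pi
print(lams[-1])   # longest wavelength, 2*pi * base**((d-2)/d), just under 2*pi*base
print(ratios[0])  # every consecutive ratio equals base**(2/d)
```

The constant ratio between neighbouring wavelengths is what makes the progression geometric; the longest wavelength approaches $2\pi \cdot \text{base}$ as `embedding_dim` grows.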

Parameters

num_positions : int
Number of distinct positions, $p \in [0, \text{num\_positions})$.
embedding_dim : int
Per-position embedding size $d$. Must be even: half the entries hold sin values and half hold cos values.
base : float = 10000.0
Frequency base $\theta_0$. Vaswani et al. use 10_000; larger values stretch the longest wavelengths, which extends the range of positions the encoding can tell apart but makes the low-frequency components change more slowly between adjacent positions.
device : str = 'cpu'
Target device ("cpu" or "metal") for the resulting buffer.

Returns

Tensor

(num_positions, embedding_dim) float tensor.

Raises

ValueError
If embedding_dim is not even.

Notes

From Section 3.5 ("Positional Encoding") of Vaswani et al. (2017):

$$
\begin{aligned}
\mathrm{PE}_{p,\,2i} &= \sin\!\left(p / \text{base}^{2i / d}\right) \\
\mathrm{PE}_{p,\,2i+1} &= \cos\!\left(p / \text{base}^{2i / d}\right)
\end{aligned}
$$
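As an illustration of the formula, here is a dependency-free reference sketch that fills the table entry by entry. It assumes the interleaved layout written above (sin at even columns, cos at odd columns); the function name `sinusoidal_table` is hypothetical, and the actual lucid implementation may arrange or compute the values differently.

```python
import math

def sinusoidal_table(num_positions: int, embedding_dim: int,
                     base: float = 10000.0) -> list[list[float]]:
    """Reference sketch of the sinusoidal table as nested lists.

    Hypothetical helper, not the lucid API. Assumes column 2i holds
    sin and column 2i+1 holds cos of the same angle.
    """
    if embedding_dim % 2 != 0:
        raise ValueError("embedding_dim must be even")
    table = []
    for p in range(num_positions):
        row = [0.0] * embedding_dim
        for i in range(embedding_dim // 2):
            # ** binds tighter than /, so this is p / (base ** (2i/d)).
            angle = p / base ** (2 * i / embedding_dim)
            row[2 * i] = math.sin(angle)
            row[2 * i + 1] = math.cos(angle)
        table.append(row)
    return table

pe = sinusoidal_table(num_positions=4, embedding_dim=8)
print(len(pe), len(pe[0]))  # rows = positions, columns = embedding_dim
print(pe[0])                # position 0: sin(0) = 0 and cos(0) = 1 alternate
```

A vectorised version would compute all angles at once as an outer product of positions and inverse frequencies; the explicit loops here are only meant to mirror the indices $p$ and $i$ in the equation.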

The table is pure (deterministic in its arguments), so callers can safely share the result across model instances when dimensions match. For training-time hot paths, prefer the module form in lucid.nn, which caches the table as a buffer.
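Because the result depends only on the arguments, it can be memoised. The sketch below shows one way a caller might share a table, using `functools.lru_cache` around a plain-Python builder; `cached_table` is a hypothetical stand-in and does not reflect how the lucid.nn module caches its buffer.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def cached_table(num_positions: int, embedding_dim: int,
                 base: float = 10000.0) -> tuple[tuple[float, ...], ...]:
    """Hypothetical memoised builder (not the lucid API).

    Returns nested tuples so the shared result cannot be mutated
    by one caller behind another caller's back.
    """
    if embedding_dim % 2 != 0:
        raise ValueError("embedding_dim must be even")
    rows = []
    for p in range(num_positions):
        row = []
        for i in range(embedding_dim // 2):
            angle = p / base ** (2 * i / embedding_dim)
            row.extend((math.sin(angle), math.cos(angle)))
        rows.append(tuple(row))
    return tuple(rows)

a = cached_table(16, 8)
b = cached_table(16, 8)
print(a is b)  # True: the second call returns the cached object
```

Sharing is only safe while nobody writes to the table in place, which is why this sketch returns an immutable structure.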

Examples

>>> import lucid
>>> from lucid.nn.functional import sinusoidal_embedding
>>> pe = sinusoidal_embedding(num_positions=128, embedding_dim=64)
>>> pe.shape
(128, 64)