sinusoidal_embedding

sinusoidal_embedding(num_positions: int, embedding_dim: int, base: float = 10000.0, device: str = 'cpu') -> Tensor

Build the 1-D sinusoidal positional encoding table from "Attention Is All You Need".

Returns a fixed (non-learnable) lookup table that, added to token embeddings, supplies absolute position information without introducing parameters. Successive frequencies form a geometric progression whose wavelengths run from $2\pi$ up to nearly $2\pi \cdot \text{base}$, so the components vary at very different rates and together give every position a distinct encoding, even for long sequences.
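The wavelength claim can be checked numerically. The following is a minimal pure-Python sketch, not part of lucid; `wavelengths` is a hypothetical helper. It assumes the frequency schedule from the Notes equation, where sin/cos pair $i$ has wavelength $2\pi \cdot \text{base}^{2i/d}$.

```python
import math


def wavelengths(embedding_dim: int, base: float = 10000.0) -> list[float]:
    """Wavelength (in positions) of each sin/cos pair (hypothetical helper).

    Pair i uses angular frequency 1 / base**(2i/d), so its wavelength is
    2*pi * base**(2i/d).
    """
    return [2 * math.pi * base ** (2 * i / embedding_dim)
            for i in range(embedding_dim // 2)]


ws = wavelengths(64)
print(ws[0])   # 2*pi: the fastest-varying pair
print(ws[-1])  # just below 2*pi*base, since the top exponent is (d-2)/d
# Consecutive wavelengths share one ratio, base**(2/d): a geometric progression.
ratios = [b / a for a, b in zip(ws, ws[1:])]
print(min(ratios), max(ratios))
```

The largest wavelength stops one step short of $2\pi \cdot \text{base}$ because $i$ only reaches $d/2 - 1$.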

Parameters

num_positions : int

Number of distinct positions $p \in [0, \text{num\_positions})$.

embedding_dim : int

Per-position embedding size $d$. Must be even: half the entries hold sine values and half hold cosine values.

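base : float = 10000.0

Frequency base $\theta_0$ (the $\text{base}$ in the Notes equation). Vaswani et al. use 10000. Larger values stretch the longest wavelengths, giving a longer effective context window at the cost of coarser discrimination between adjacent positions.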

device : str = 'cpu'

Target device ("cpu" or "metal") for the resulting buffer.
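One way to make the `base` trade-off concrete, sketched here in pure Python rather than lucid: each sin/cos pair rotates by a fixed angle $w_i = \text{base}^{-2i/d}$ per step, so the distance between the encodings of positions $p$ and $p+1$ is the same for every $p$ and shrinks as `base` grows. `adjacent_step_distance` is a hypothetical helper, and the formula assumes the Notes equation.

```python
import math


def adjacent_step_distance(embedding_dim: int, base: float) -> float:
    """Euclidean distance between the encodings of positions p and p + 1.

    For a pair with angular frequency w and angle a = p * w,
    (sin(a + w) - sin a)**2 + (cos(a + w) - cos a)**2 = 4 * sin(w / 2)**2,
    which does not depend on a, so the result is the same for every p.
    """
    total = 0.0
    for i in range(embedding_dim // 2):
        w = base ** (-2 * i / embedding_dim)  # angle advanced per position
        total += 4 * math.sin(w / 2) ** 2
    return math.sqrt(total)


for b in (100.0, 10000.0, 1000000.0):
    print(b, adjacent_step_distance(64, b))  # decreases as b grows
```

The $i = 0$ pair always advances one radian per step regardless of `base`, so the distance never drops below $2\sin(1/2)$. Only the slower pairs lose resolution.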

Returns

Tensor

Float tensor of shape (num_positions, embedding_dim), where row $p$ is the encoding of position $p$.

Raises

ValueError

If embedding_dim is not even.

Notes

Section 3.5 of Vaswani et al. (2017) defines the encoding as:

$$
\begin{aligned} \mathrm{PE}_{p,\,2i} &= \sin\!\left(p / \text{base}^{2i / d}\right) \\ \mathrm{PE}_{p,\,2i+1} &= \cos\!\left(p / \text{base}^{2i / d}\right) \end{aligned}
$$
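To make the equation concrete, here is a short pure-Python sketch of the same table. It is an illustration, not lucid's implementation. `sinusoidal_table` is a hypothetical name. The sketch returns nested lists instead of a lucid Tensor and ignores `device`. It assumes the interleaved column layout the equation describes: sine in even columns, cosine in odd columns.

```python
import math


def sinusoidal_table(num_positions: int, embedding_dim: int,
                     base: float = 10000.0) -> list[list[float]]:
    """Pure-Python stand-in for the positional encoding table (hypothetical).

    Column 2i holds sin(p / base**(2i/d)); column 2i + 1 holds the matching cos.
    """
    if embedding_dim % 2 != 0:
        raise ValueError(f"embedding_dim must be even, got {embedding_dim}")
    table = []
    for p in range(num_positions):
        row = []
        for i in range(embedding_dim // 2):
            angle = p / base ** (2 * i / embedding_dim)
            row.append(math.sin(angle))  # even column 2i
            row.append(math.cos(angle))  # odd column 2i + 1
        table.append(row)
    return table


pe = sinusoidal_table(128, 64)
print(len(pe), len(pe[0]))  # 128 64
print(pe[0][:4])            # [0.0, 1.0, 0.0, 1.0]: sin(0) and cos(0) alternate
```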

sinusoidal_embedding is pure (its output depends only on its arguments), so callers can safely share one result across model instances whose dimensions match. On training-time hot paths, prefer the module form in lucid.nn, which caches the table as a buffer.
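Because the output depends only on the arguments, sharing can be as simple as memoizing the constructor. The following is a minimal sketch using `functools.lru_cache` over a pure-Python stand-in; `cached_table` is a hypothetical helper. It does not model the buffer caching of the lucid.nn module form.

```python
import functools
import math


@functools.lru_cache(maxsize=None)
def cached_table(num_positions: int, embedding_dim: int,
                 base: float = 10000.0) -> tuple[tuple[float, ...], ...]:
    """Build the table once per argument tuple (hypothetical helper).

    Nested tuples are returned so that callers sharing one cached result
    cannot mutate it underneath each other.
    """
    if embedding_dim % 2 != 0:
        raise ValueError(f"embedding_dim must be even, got {embedding_dim}")
    rows = []
    for p in range(num_positions):
        row = []
        for i in range(embedding_dim // 2):
            angle = p / base ** (2 * i / embedding_dim)
            row.extend((math.sin(angle), math.cos(angle)))
        rows.append(tuple(row))
    return tuple(rows)


encoder_pe = cached_table(512, 64)
decoder_pe = cached_table(512, 64)     # same arguments: served from the cache
print(encoder_pe is decoder_pe)        # True
print(cached_table.cache_info().hits)  # 1
```

Returning an immutable structure matters here. With a mutable table, an in-place edit by one model would silently change every other model's copy.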

Examples

>>> import lucid
>>> from lucid.nn.functional import sinusoidal_embedding
>>> pe = sinusoidal_embedding(num_positions=128, embedding_dim=64)
>>> pe.shape
(128, 64)
