sinusoidal_embedding_2d
sinusoidal_embedding_2d(height: int, width: int, embedding_dim: int, base: float = 10000.0, device: str = 'cpu') → Tensor
Build the 2-D sinusoidal positional encoding from DETR (Carion et al., 2020).
Extends the 1-D sinusoidal encoding to spatial feature maps by concatenating two independent encodings — one for the column index and one for the row index — each occupying half of the embedding dimension. This gives a position-unique vector for every grid cell without learnable parameters, and is the encoding used by DETR (§A.4), DiT, and other 2-D image transformers.
Parameters
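height : int
    Grid height H (number of rows).
width : int
    Grid width W (number of columns).
embedding_dim : int
    Total embedding dimension. Each axis receives half of it, and that half is split into sin / cos values, so the half itself must be even.
base : float = 10000.0
    Base of the sinusoid frequencies. Default 10_000.
device : str = 'cpu'
    Target device ("cpu" or "metal").
Returns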
Tensor
    (height * width, embedding_dim) float tensor, ordered row-major (outer loop r ∈ [0, H), inner loop c ∈ [0, W)).
Raises
ValueError
    If embedding_dim is not divisible by 4.
Notes
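Layout per position (r, c): the concatenation of two axis tables, one indexed by the column c and one by the row r, where each axis table is the standard 1-D encoding at dimension embedding_dim / 2. The result is a flattened sequence of length height * width; add it to the flattened image features before the first transformer block.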
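The layout described above can be sketched in plain NumPy. This is an illustrative reconstruction, not the library's implementation: the helper names `axis_table` and `sketch_embedding_2d` are hypothetical, and both the ordering of the two halves (column table first, then row table) and the interleaving of sin / cos values within each table are assumptions made for the sketch.

```python
import numpy as np


def axis_table(length, dim, base=10000.0):
    """1-D sinusoidal table of shape (length, dim); dim must be even.

    Assumption: sin values occupy even channels, cos values odd channels.
    """
    pos = np.arange(length, dtype=np.float64)[:, None]               # (length, 1)
    freq = base ** (-np.arange(0, dim, 2, dtype=np.float64) / dim)   # (dim / 2,)
    angles = pos * freq                                              # (length, dim / 2)
    table = np.empty((length, dim), dtype=np.float64)
    table[:, 0::2] = np.sin(angles)
    table[:, 1::2] = np.cos(angles)
    return table


def sketch_embedding_2d(height, width, embedding_dim, base=10000.0):
    """Hypothetical NumPy stand-in returning a (height * width, embedding_dim) table."""
    if embedding_dim % 4 != 0:
        raise ValueError("embedding_dim must be divisible by 4")
    half = embedding_dim // 2
    col = axis_table(width, half, base)    # (W, half), indexed by column c
    row = axis_table(height, half, base)   # (H, half), indexed by row r
    # Broadcast each axis table over the full (H, W) grid.
    col_grid = np.broadcast_to(col[None, :, :], (height, width, half))
    row_grid = np.broadcast_to(row[:, None, :], (height, width, half))
    # Assumption: column half first, row half second.
    grid = np.concatenate([col_grid, row_grid], axis=-1)             # (H, W, D)
    # Row-major flattening: position (r, c) lands at index r * width + c.
    return grid.reshape(height * width, embedding_dim)


pe = sketch_embedding_2d(height=16, width=16, embedding_dim=128)
```

Under these assumptions, the vector for cell (r, c) sits at row `r * width + c` of the result, with the column encoding in the first `embedding_dim / 2` channels and the row encoding in the rest.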
Examples
>>> import lucid
>>> from lucid.nn.functional import sinusoidal_embedding_2d
>>> pe = sinusoidal_embedding_2d(height=16, width=16, embedding_dim=128)
>>> pe.shape
(256, 128)
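Because the table is ordered row-major, the image features have to be flattened in the same order before the two are added. The following NumPy sketch shows one way to do that; it uses a random array as a stand-in for the `(height * width, embedding_dim)` table, and the `(batch, channels, height, width)` feature layout is an assumption rather than something this page specifies.

```python
import numpy as np

batch, channels, height, width = 2, 128, 16, 16
rng = np.random.default_rng(0)

# Assumed feature layout: (B, C, H, W), as produced by a typical conv backbone.
features = rng.standard_normal((batch, channels, height, width))
# Stand-in for the positional table of shape (H * W, D), with D == C.
pe = rng.standard_normal((height * width, channels))

# (B, C, H, W) -> (B, H, W, C) -> (B, H * W, C): token index is r * width + c.
tokens = features.transpose(0, 2, 3, 1).reshape(batch, height * width, channels)

# Broadcast the same positional table over the batch dimension.
tokens_with_pos = tokens + pe[None, :, :]
```

Moving the channel axis last before reshaping is what keeps token `r * width + c` aligned with the positional vector for cell (r, c).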