class

Embedding

extends Module
Embedding(num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, max_norm: float | None = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, device: DeviceLike = None, dtype: DTypeLike = None)

Learnable dense lookup table that maps integer token indices to vectors.

An embedding table is a matrix $W \in \mathbb{R}^{V \times D}$, where $V$ = num_embeddings (vocabulary size) and $D$ = embedding_dim. The forward pass is a simple index operation:

$$y = W[\text{idx}]$$

Each row $W[i]$ is the dense representation (embedding vector) of token $i$. Because indexing is not differentiable with respect to the integer indices themselves, gradients flow only into the rows of $W$ that were selected during the forward pass.
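A minimal NumPy sketch of the lookup and its gradient, assuming a plain array stands in for the weight Parameter. The names `W`, `idx`, and `grad_W` are illustrative, and this is not a description of Lucid's internal implementation. It shows that the backward pass amounts to a scatter-add into the selected rows, so rows that were never looked up receive a zero gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 6, 4                      # vocabulary size, embedding dimension
W = rng.standard_normal((V, D))  # stand-in for the weight matrix

idx = np.array([[1, 5, 1], [0, 2, 5]])  # integer indices, shape (2, 3)
y = W[idx]                               # forward: row lookup, shape (2, 3, 4)

# Backward: given an upstream gradient dL/dy, accumulate it into the rows
# of W that were selected. np.add.at accumulates correctly when an index
# appears more than once.
grad_y = np.ones_like(y)
grad_W = np.zeros_like(W)
np.add.at(grad_W, idx, grad_y)

print(y.shape)       # (2, 3, 4)
print(grad_W[:, 0])  # selection count per row; rows 3 and 4 stay at zero
```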

Padding index. When padding_idx is set, the corresponding row is initialised to zero and its gradient is masked to zero during backpropagation. This is the standard way to mark a special <PAD> token whose embedding should never influence the model.
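The padding behaviour can be sketched in NumPy as follows. This assumes a plain array in place of the weight Parameter and a hand-written SGD step; the variable names and the learning rate are illustrative only. The point is that a row that starts at zero and always receives a zero gradient stays at zero.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, padding_idx = 5, 3, 0
W = rng.standard_normal((V, D))
W[padding_idx] = 0.0                 # zero the padding row after initialisation

idx = np.array([0, 3, 0, 4])         # index 0 is the <PAD> token
grad_W = np.zeros_like(W)
np.add.at(grad_W, idx, np.ones((idx.size, D)))  # unmasked gradient
grad_W[padding_idx] = 0.0            # mask the padding row's gradient

W -= 0.1 * grad_W                    # one plain SGD step (illustrative)
print(W[padding_idx])                # the padding row is still all zeros
```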

Max-norm renormalisation. When max_norm is set, every row whose $L_p$-norm exceeds max_norm is rescaled in place at each forward call before the lookup:

$$W[i] \leftarrow W[i] \cdot \frac{\texttt{max\_norm}}{\|W[i]\|_p} \quad \text{if } \|W[i]\|_p > \texttt{max\_norm}$$
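The renormalisation rule can be sketched in NumPy as below. `renorm_rows` is a hypothetical helper written for illustration, not a Lucid function, and the small floor on the norm is an assumption added here to avoid dividing by zero for all-zero rows.

```python
import numpy as np

def renorm_rows(W, max_norm, norm_type=2.0):
    """Rescale, in place, every row of W whose L_p norm exceeds max_norm."""
    norms = np.linalg.norm(W, ord=norm_type, axis=1)
    # min(1, max_norm / norm) leaves short rows untouched and shrinks long
    # rows so that their norm equals max_norm. The floor guards zero rows.
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    W *= scale[:, None]
    return W

W = np.array([[3.0, 4.0],    # L2 norm 5.0 -> rescaled to norm 1.0
              [0.3, 0.4],    # L2 norm 0.5 -> unchanged
              [0.0, 0.0]])   # zero row    -> unchanged
renorm_rows(W, max_norm=1.0)
print(W)
```

Because the update is applied in place, the stored weights themselves change, not only the values returned by the lookup.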

Parameters

num_embeddings : int
Size of the embedding dictionary (vocabulary size $V$).
embedding_dim : int
Dimensionality of each embedding vector ($D$).
padding_idx : int or None = None
If provided, the embedding for this index is fixed at zero and receives no gradient updates. Negative values are normalised to padding_idx + num_embeddings. Default: None.
max_norm : float or None = None
If provided, rows with $L_p$-norm exceeding this value are renormalised in place at every forward call. Default: None.
norm_type : float = 2.0
The $p$ in the $L_p$-norm used by max_norm. Default: 2.0.
scale_grad_by_freq : bool = False
Not yet implemented. Raises NotImplementedError if True. Default: False.
sparse : bool = False
Accepted for API compatibility; sparse gradient emission is not yet supported. Default: False.
device : DeviceLike = None
Device for the weight tensor.
dtype : DTypeLike = None
Data type for the weight tensor.
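As a small illustration of the negative padding_idx rule described above, the following sketch uses a hypothetical helper, `normalise_padding_idx`, that is not part of Lucid. It only mirrors the stated arithmetic and performs no range checking.

```python
def normalise_padding_idx(padding_idx, num_embeddings):
    """Map a negative padding_idx to its non-negative equivalent."""
    if padding_idx is None:
        return None
    if padding_idx < 0:
        # Stated rule: padding_idx + num_embeddings
        return padding_idx + num_embeddings
    return padding_idx

print(normalise_padding_idx(-1, 50))  # 49, i.e. the last row of the table
print(normalise_padding_idx(0, 50))   # 0
```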

Attributes

weight : Parameter, shape ``(num_embeddings, embedding_dim)``
The embedding matrix $W$. Rows are initialised from $\mathcal{N}(0, 1)$. If padding_idx is set, that row is zeroed immediately after initialisation.

Notes

  • Input x: (*) — integer tensor of arbitrary shape with values in [0, num_embeddings).
  • Output: (*, embedding_dim) — the input shape with an extra trailing dimension of size embedding_dim.

The weight matrix is initialised from a standard normal distribution $\mathcal{N}(0, 1)$, so each embedding component has unit variance and each row has an $L_2$ norm of roughly $\sqrt{D}$. For downstream layers that are sensitive to input scale (e.g. transformers), consider dividing the weights by $\sqrt{D}$ after construction.
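The effect of the suggested rescaling can be checked numerically. The sketch below uses a NumPy array as a stand-in for the weight matrix; the sizes and seed are arbitrary choices for illustration. How the weights are modified on an actual Lucid Parameter is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 1000, 64
W = rng.standard_normal((V, D))          # N(0, 1) initialisation

mean_norm = np.linalg.norm(W, axis=1).mean()
print(mean_norm)                         # close to sqrt(64) = 8

W_scaled = W / np.sqrt(D)                # rescale after construction
scaled_norm = np.linalg.norm(W_scaled, axis=1).mean()
print(scaled_norm)                       # close to 1
```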

Embedding is commonly used in natural-language processing (token embeddings, position encodings), recommendation systems (item / user embeddings), and any domain where discrete categorical inputs must be projected into a continuous representation space.

See Also

EmbeddingBag : Efficient embedding lookup with per-bag reduction.

Examples

Simple token embedding for a vocabulary of 100 with 16-dim vectors:

>>> import lucid, lucid.nn as nn
>>> emb = nn.Embedding(num_embeddings=100, embedding_dim=16)
>>> idx = lucid.tensor([[1, 5, 3], [0, 2, 7]], dtype=lucid.int64)
>>> y = emb(idx)
>>> y.shape    # (2, 3, 16)
(2, 3, 16)

Using ``padding_idx`` to mark a ``<PAD>`` token (index 0):

>>> emb_pad = nn.Embedding(50, 8, padding_idx=0)
>>> # Row 0 is always zero and never updated
>>> import lucid.linalg
>>> float(lucid.linalg.norm(emb_pad.weight[0])) == 0.0
True
>>> idx2 = lucid.tensor([0, 3, 0, 7], dtype=lucid.int64)
>>> emb_pad(idx2).shape
(4, 8)

Methods (3)

dunder

__init__

None
__init__(num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, max_norm: float | None = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, device: DeviceLike = None, dtype: DTypeLike = None)

Initialise the Embedding module. See the class docstring for parameter semantics.

fn

forward

Tensor
forward(x: Tensor)

Look up embeddings for the given indices.

Parameters

x : Tensor
Tensor of integer indices of arbitrary shape, with values in [0, num_embeddings).

Returns

Tensor

Tensor of embedding vectors of shape (*x.shape, embedding_dim).

fn

extra_repr

str
extra_repr()

Return a string representation of the layer's configuration.