Embedding
Module
Embedding(num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, max_norm: float | None = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, device: DeviceLike = None, dtype: DTypeLike = None)
Learnable dense lookup table that maps integer token indices to vectors.
An embedding table is a matrix $W \in \mathbb{R}^{V \times D}$, where $V$ = num_embeddings (vocabulary size) and $D$ = embedding_dim. The forward pass is a simple index operation:
$$y_i = W[x_i]$$
Each row $W[i]$ is the dense representation (embedding vector) of token $i$. Because indexing is not differentiable with respect to the integer indices themselves, gradients flow only into the rows of $W$ that were selected during the forward pass.
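The lookup and its gradient can be sketched in plain Python. This is a dependency-free illustration under simplifying assumptions (a flat list of indices, the table stored as a list of lists), not Lucid's implementation; `embedding_forward` and `embedding_backward` are hypothetical helper names.

```python
def embedding_forward(weight, indices):
    """Return a copy of one table row per index (flat index list)."""
    return [list(weight[i]) for i in indices]


def embedding_backward(weight, indices, grad_output):
    """Accumulate upstream gradients into the rows that were looked up."""
    dim = len(weight[0])
    grad_weight = [[0.0] * dim for _ in weight]
    for i, g in zip(indices, grad_output):
        for d in range(dim):
            grad_weight[i][d] += g[d]  # a repeated index adds up
    return grad_weight


# Toy table: 4 tokens, 2 dimensions (values chosen arbitrarily).
W = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
idx = [2, 0, 2]

y = embedding_forward(W, idx)
grad_W = embedding_backward(W, idx, [[1.0, 1.0]] * len(idx))

print(y)       # rows 2, 0, 2 of the table
print(grad_W)  # rows 1 and 3 were never selected, so they stay zero
```

Row 2 appears twice in `idx`, so its gradient is the sum of both upstream contributions, while the unselected rows receive no gradient at all.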
Padding index. When padding_idx is set, the corresponding
row is initialised to zero and its gradient is masked to zero during
backpropagation. This is the standard way to mark a special
<PAD> token whose embedding should never influence the model.
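The effect of gradient masking on the padding row can be illustrated with a small plain-Python sketch. It assumes a plain SGD step and a list-of-lists table; `mask_padding_grad` is a hypothetical helper, not part of Lucid.

```python
def mask_padding_grad(grad_weight, padding_idx):
    """Return a copy of grad_weight with the padding row set to zero."""
    masked = [list(row) for row in grad_weight]
    if padding_idx is not None:
        masked[padding_idx] = [0.0] * len(masked[padding_idx])
    return masked


# Row 0 plays the role of <PAD>: it starts at zero.
W = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
grad_W = [[3.0, -1.0], [0.5, 0.5], [0.0, 2.0]]  # arbitrary gradient values

masked = mask_padding_grad(grad_W, padding_idx=0)

# One SGD step with the masked gradient (learning rate is illustrative).
lr = 0.1
W_new = [[w - lr * g for w, g in zip(w_row, g_row)]
         for w_row, g_row in zip(W, masked)]

print(W_new[0])  # the padding row is unchanged by the update
```

Because the padding row starts at zero and its gradient is always zero, it stays at zero for the whole of training under this update rule.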
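Max-norm renormalisation. When max_norm is set, every row $W[i]$ of the table whose $p$-norm (with $p$ = norm_type) exceeds max_norm is rescaled in place at each forward call before the lookup:
$$W[i] \leftarrow W[i] \cdot \frac{\text{max\_norm}}{\lVert W[i] \rVert_p}$$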
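A minimal plain-Python sketch of this renormalisation step follows. It assumes the rescaling factor is max_norm divided by the row's p-norm and ignores any numerical-stability term a real implementation might add; `renorm_rows` is a hypothetical helper, not Lucid's API.

```python
def renorm_rows(weight, max_norm, norm_type=2.0):
    """Rescale, in place, every row whose p-norm exceeds max_norm."""
    for i, row in enumerate(weight):
        norm = sum(abs(v) ** norm_type for v in row) ** (1.0 / norm_type)
        if norm > max_norm:
            scale = max_norm / norm
            weight[i] = [v * scale for v in row]
    return weight


# Row 0 has L2 norm 5.0 and is shrunk; row 1 has norm 0.5 and is kept.
W = [[3.0, 4.0], [0.3, 0.4]]
renorm_rows(W, max_norm=1.0)

row0_norm = sum(v * v for v in W[0]) ** 0.5
print(W, row0_norm)
```

After the call, row 0 points in the same direction as before but has norm max_norm, and rows that were already within the limit are left untouched.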
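Parameters
num_embeddings : int
    Size of the vocabulary, i.e. the number of rows in the table.
embedding_dim : int
    Size of each embedding vector.
padding_idx : int or None = None
    Index of the padding row, which is initialised to zero and receives no gradient. […] padding_idx + num_embeddings. Default: None.
max_norm : float or None = None
    If set, rows whose norm exceeds this value are rescaled at each forward call. Default: None.
norm_type : float = 2.0
    Order p of the norm used for max_norm. Default: 2.0.
scale_grad_by_freq : bool = False
    Raises NotImplementedError if True. Default: False.
sparse : bool = False
    […] Default: False.
device : DeviceLike = None
dtype : DTypeLike = None
Attributes
weight : Parameter, shape ``(num_embeddings, embedding_dim)``
    The embedding table. If padding_idx is set, that row is zeroed immediately after initialisation.
Notes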
- Input x: (*), an integer tensor of arbitrary shape with values in [0, num_embeddings).
- Output: (*, embedding_dim), the input shape with an extra trailing dimension of size embedding_dim.
The weight matrix is initialised from a standard normal distribution $\mathcal{N}(0, 1)$, which gives each embedding a unit-order magnitude. For downstream layers that are sensitive to input scale (e.g. transformers), consider dividing the weights by $\sqrt{D}$, where $D$ is embedding_dim, after construction.
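The effect of such a rescaling can be checked numerically with the standard library alone. The sketch below assumes the scale factor is the square root of embedding_dim and uses hypothetical sizes (200 rows, 64 dimensions); it does not touch Lucid's weight tensor.

```python
import math
import random

random.seed(0)

V, D = 200, 64  # hypothetical vocabulary size and embedding_dim

# Rows drawn from a standard normal, mimicking the described initialisation.
table = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(V)]


def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def mean_row_norm(rows):
    return sum(l2_norm(r) for r in rows) / len(rows)


# Divide every entry by sqrt(D).
scaled = [[x / math.sqrt(D) for x in row] for row in table]

raw_mean = mean_row_norm(table)       # close to sqrt(D)
scaled_mean = mean_row_norm(scaled)   # close to 1
print(raw_mean, scaled_mean)
```

With unit-variance entries, the length of a row grows roughly like the square root of its dimension, so dividing by that factor brings the typical row norm back to about one regardless of embedding_dim.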
Embedding is commonly used in natural-language processing (token embeddings, position encodings), recommendation systems (item / user embeddings), and any domain where discrete categorical inputs must be projected into a continuous representation space.
See Also
EmbeddingBag : Efficient embedding lookup with per-bag reduction.
Examples
Simple token embedding for a vocabulary of 100 with 16-dim vectors:
>>> import lucid, lucid.nn as nn
>>> emb = nn.Embedding(num_embeddings=100, embedding_dim=16)
>>> idx = lucid.tensor([[1, 5, 3], [0, 2, 7]], dtype=lucid.int64)
>>> y = emb(idx)
>>> y.shape # (2, 3, 16)
(2, 3, 16)
Using ``padding_idx`` to mark a ``<PAD>`` token (index 0):
>>> emb_pad = nn.Embedding(50, 8, padding_idx=0)
>>> # Row 0 is always zero and never updated
>>> import lucid.linalg
>>> float(lucid.linalg.norm(emb_pad.weight[0])) == 0.0
True
>>> idx2 = lucid.tensor([0, 3, 0, 7], dtype=lucid.int64)
>>> emb_pad(idx2).shape
(4, 8)
Methods (3)
__init__
__init__(num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, max_norm: float | None = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, device: DeviceLike = None, dtype: DTypeLike = None) → None
Initialise the Embedding module. See the class docstring for parameter semantics.
forward
forward(x: Tensor) → Tensor
Look up embeddings for the given indices.
Parameters
x : Tensor
    Integer tensor of arbitrary shape with values in [0, num_embeddings).
Returns
Tensor
    Tensor of embedding vectors of shape (*x.shape, embedding_dim).
extra_repr
extra_repr() → str
Return a string representation of the layer's configuration.