class Embedding extends Module

Embedding(num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, max_norm: float | None = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, device: DeviceLike = None, dtype: DTypeLike = None)

Implementing kernel: C++ class EmbeddingBackward

Learnable dense lookup table that maps integer token indices to vectors.

An embedding table is a matrix $W \in \mathbb{R}^{V \times D}$ where $V$ = num_embeddings (vocabulary size) and $D$ = embedding_dim. The forward pass is a simple index operation:

$$y = W[\text{idx}]$$

Each row $W[i]$ is the dense representation (embedding vector) of token $i$. Because indexing is not differentiable with respect to the integer indices themselves, gradients flow only into the rows of $W$ that were selected during the forward pass.
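The lookup and its gradient can be sketched in NumPy. This is an illustration of the semantics described above, not Lucid's kernel: the helper names `embedding_forward` and `embedding_backward` are hypothetical, and the scatter-add via `np.add.at` is an assumed way to accumulate gradients for indices that appear more than once.

```python
import numpy as np

def embedding_forward(W, idx):
    """Row lookup: the output shape is idx.shape + (D,)."""
    return W[idx]

def embedding_backward(W_shape, idx, grad_out):
    """Scatter-add upstream gradients into the rows that were looked up."""
    D = W_shape[1]
    grad_W = np.zeros(W_shape, dtype=grad_out.dtype)
    # np.add.at is unbuffered, so repeated indices accumulate correctly.
    np.add.at(grad_W, idx.reshape(-1), grad_out.reshape(-1, D))
    return grad_W

V, D = 5, 3
W = np.arange(V * D, dtype=np.float64).reshape(V, D)
idx = np.array([[1, 3], [1, 0]])

y = embedding_forward(W, idx)
grad_W = embedding_backward(W.shape, idx, np.ones_like(y))
```

Here index 1 occurs twice, so its row of `grad_W` receives twice the upstream gradient, while rows 2 and 4 were never selected and stay at zero.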

Padding index. When padding_idx is set, the corresponding row is initialised to zero and its gradient is masked to zero during backpropagation. This is the standard way to mark a special <PAD> token whose embedding should never influence the model.
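A minimal NumPy sketch of the padding behaviour, assuming the gradient is accumulated by scatter-add and the padding row is then zeroed. The helper names `normalise_padding_idx` and `masked_weight_grad` are hypothetical; the negative-index rule follows the `padding_idx` parameter description.

```python
import numpy as np

def normalise_padding_idx(padding_idx, num_embeddings):
    """Map a negative padding_idx to padding_idx + num_embeddings."""
    if padding_idx is not None and padding_idx < 0:
        padding_idx += num_embeddings
    return padding_idx

def masked_weight_grad(W_shape, idx, grad_out, padding_idx=None):
    """Accumulate row gradients, then zero the padding row's gradient."""
    grad_W = np.zeros(W_shape, dtype=grad_out.dtype)
    np.add.at(grad_W, idx.reshape(-1), grad_out.reshape(-1, W_shape[1]))
    if padding_idx is not None:
        grad_W[padding_idx] = 0.0  # the <PAD> row never receives updates
    return grad_W

idx = np.array([0, 3, 0])
grad_out = np.ones((3, 2))
grad_W = masked_weight_grad((4, 2), idx, grad_out, padding_idx=0)
last_row = normalise_padding_idx(-1, 50)
```

Because the padding row starts at zero and its gradient is always zero, an optimiser step that depends only on the gradient leaves it unchanged.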

Max-norm renormalisation. When max_norm is set, every row whose $L_p$-norm exceeds max_norm is rescaled in place at each forward call before the lookup:

$$W[i] \leftarrow W[i] \cdot \frac{\texttt{max\_norm}}{\|W[i]\|_p} \quad \text{if } \|W[i]\|_p > \texttt{max\_norm}$$
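The renormalisation rule can be written as a short NumPy sketch. The function name `renorm_rows_` is hypothetical and the code only mirrors the formula above; it does not claim to match Lucid's implementation details (for example, any epsilon the kernel may add to the denominator).

```python
import numpy as np

def renorm_rows_(W, max_norm, norm_type=2.0):
    """Rescale, in place, every row of W whose L_p norm exceeds max_norm."""
    norms = np.linalg.norm(W, ord=norm_type, axis=1)
    over = norms > max_norm
    # Rows at or below the threshold are left untouched.
    W[over] *= (max_norm / norms[over])[:, None]
    return W

W = np.array([[3.0, 4.0],    # L2 norm 5.0, will be rescaled
              [0.3, 0.4]])   # L2 norm 0.5, left as-is
renorm_rows_(W, max_norm=1.0)
row_norms = np.linalg.norm(W, axis=1)
```

After the call, the first row has norm `max_norm` and keeps its direction, and the second row is unchanged.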

Parameters

num_embeddings: int

Size of the embedding dictionary (vocabulary size $V$).

embedding_dim: int

Dimensionality of each embedding vector ($D$).

padding_idx: int or None = None

If provided, the embedding for this index is fixed at zero and receives no gradient updates. Negative values are normalised to padding_idx + num_embeddings. Default: None.

max_norm: float or None = None

If provided, rows with $L_p$-norm exceeding this value are renormalised in place at every forward call. Default: None.

norm_type: float = 2.0

The $p$ in the $L_p$-norm used by max_norm. Default: 2.0.

scale_grad_by_freq: bool = False

Not yet implemented. Raises NotImplementedError if True. Default: False.

sparse: bool = False

Accepted for API compatibility; sparse gradient emission is not yet supported. Default: False.

device: DeviceLike = None

Device for the weight tensor.

dtype: DTypeLike = None

Data type for the weight tensor.

Attributes

weight: Parameter, shape (num_embeddings, embedding_dim)

The embedding matrix $W$. Rows are initialised from $\mathcal{N}(0, 1)$. If padding_idx is set, that row is zeroed immediately after initialisation.

Notes

Input x: (*) — integer tensor of arbitrary shape with values in [0, num_embeddings).
Output: (*, embedding_dim) — the input shape with an extra trailing dimension of size embedding_dim.

The weight matrix is initialised from a standard normal distribution $\mathcal{N}(0, 1)$, which gives each embedding component a unit-order magnitude, so each row has an $L_2$-norm of roughly $\sqrt{D}$. For downstream layers that are sensitive to input scale (e.g. transformers), consider dividing the weights by $\sqrt{D}$ after construction.
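The effect of the $\sqrt{D}$ scaling can be checked numerically. The sketch below uses NumPy in place of a Lucid weight tensor (an assumption made so the example is self-contained); the sizes `V` and `D` and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 1000, 64

# Stand-in for a freshly initialised embedding matrix with N(0, 1) entries.
W = rng.standard_normal((V, D))
mean_norm = np.linalg.norm(W, axis=1).mean()          # close to sqrt(D) = 8

# Dividing by sqrt(D) brings the typical row norm down to about 1.
W_scaled = W / np.sqrt(D)
mean_norm_scaled = np.linalg.norm(W_scaled, axis=1).mean()
```

Whether this rescaling helps depends on the downstream architecture, so it is best treated as a starting point rather than a rule.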

Embedding is commonly used in natural-language processing (token embeddings, position encodings), recommendation systems (item / user embeddings), and any domain where discrete categorical inputs must be projected into a continuous representation space.

Examples

Simple token embedding for a vocabulary of 100 with 16-dim vectors:
>>> import lucid, lucid.nn as nn
>>> emb = nn.Embedding(num_embeddings=100, embedding_dim=16)
>>> idx = lucid.tensor([[1, 5, 3], [0, 2, 7]], dtype=lucid.int64)
>>> y = emb(idx)
>>> y.shape
(2, 3, 16)

Using padding_idx to mark a <PAD> token (index 0):
>>> emb_pad = nn.Embedding(50, 8, padding_idx=0)
>>> # Row 0 is always zero and never updated
>>> import lucid.linalg
>>> float(lucid.linalg.norm(emb_pad.weight[0])) == 0.0
True
>>> idx2 = lucid.tensor([0, 3, 0, 7], dtype=lucid.int64)
>>> emb_pad(idx2).shape
(4, 8)

Used by 1

lucid.nn.modules

Constructors

__init__(num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, max_norm: float | None = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, device: DeviceLike = None, dtype: DTypeLike = None) → None

Initialise the Embedding module. See the class docstring for parameter semantics.

Instance methods

extra_repr() → str

Return a string representation of the layer's configuration.

forward(x: Tensor) → Tensor

Look up embeddings for the given indices.

Parameters

x: Tensor

Tensor of integer indices.

Returns

Tensor

Tensor of embedding vectors of shape (*x.shape, embedding_dim).
