embedding_bag
embedding_bag(x: Tensor, weight: Tensor, offsets: Tensor | None = None, max_norm: float | None = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, mode: str = 'mean', sparse: bool = False, per_sample_weights: Tensor | None = None, include_last_offset: bool = False, padding_idx: int | None = None) → Tensor
Aggregate embeddings into per-bag pooled vectors.
Conceptually equivalent to looking up each index with
embedding and then reducing across the bag axis, but fused
into a single op that avoids materialising the per-token embedding
matrix. This is essential for very large vocabularies in
recommendation and NLP models.
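Given a flat index sequence x partitioned into bags by
offsets, the i-th output row is
\[ \mathrm{out}_i = \operatorname{reduce}_{j = \mathrm{offsets}[i]}^{\mathrm{offsets}[i+1] - 1} \mathrm{weight}[x_j] \]
where reduce is one of sum, mean, or max. For the last bag, the upper
limit is the final index of x unless include_last_offset is True.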
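The definition above can be sketched as an unfused reference in NumPy. This is an illustration only, not lucid's implementation: the helper name embedding_bag_reference is hypothetical, NumPy arrays stand in for Tensors, and every bag is assumed to be non-empty.

```python
import numpy as np

def embedding_bag_reference(x, weight, offsets, mode="mean"):
    """Unfused reference (hypothetical helper): look up, then reduce per bag."""
    x = np.asarray(x)
    offsets = np.asarray(offsets)
    # Bag i covers x[offsets[i]:offsets[i + 1]]; the last bag runs to the end of x.
    ends = np.append(offsets[1:], len(x))
    reduce = {"sum": np.sum, "mean": np.mean, "max": np.max}[mode]
    rows = []
    for start, end in zip(offsets, ends):
        looked_up = weight[x[start:end]]        # (bag_len, embedding_dim)
        rows.append(reduce(looked_up, axis=0))  # (embedding_dim,)
    return np.stack(rows)                       # (num_bags, embedding_dim)

# Row r of this toy table is [4r, 4r + 1, 4r + 2, 4r + 3].
weight = np.arange(40, dtype=np.float64).reshape(10, 4)
ids = [1, 2, 4, 5, 4, 3, 2, 9]
out = embedding_bag_reference(ids, weight, offsets=[0, 4], mode="mean")
print(out.shape)  # (2, 4)
```

Unlike the fused op, this sketch builds each bag's looked-up rows explicitly, which makes it useful as a correctness check rather than as a replacement.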
Parameters
x (Tensor): Either a 1-D index tensor (use with offsets) or a 2-D (num_bags, seq_len) tensor of indices where each row is a bag of equal length.
weight (Tensor): Embedding table of shape (num_embeddings, embedding_dim).
offsets (Tensor, default None): Required when x is 1-D. Integer tensor whose i-th element is the starting index of bag i within x.
max_norm (float, default None): Renormalise embedding rows whose norm exceeds max_norm before lookup.
norm_type (float, default 2.0): Exponent of the norm used for max_norm.
scale_grad_by_freq (bool, default False): Scale gradients of each embedding row by inverse mini-batch frequency.
mode (str, default 'mean'): Bag reduction: "sum", "mean", or "max".
sparse (bool, default False): Request a sparse gradient (accepted for compatibility).
per_sample_weights (Tensor, default None): Optional per-element weights applied before reduction. Same shape as x (only valid for mode="sum" in most reference implementations).
include_last_offset (bool, default False): If True, offsets has length num_bags + 1 and its last entry is the total number of indices in x.
padding_idx (int, default None): Embedding row to mask out (its lookup result contributes zero).
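To make the interaction of per_sample_weights, include_last_offset, and padding_idx concrete, here is a minimal NumPy sketch for mode="sum" only. The helper weighted_bag_sum is hypothetical, and the semantics are assumed from the parameter descriptions above (weights scale each looked-up row, and padding rows contribute zero), not taken from lucid's source.

```python
import numpy as np

def weighted_bag_sum(x, weight, offsets, per_sample_weights=None,
                     include_last_offset=False, padding_idx=None):
    """Hypothetical sum-mode sketch of the optional parameters."""
    x = np.asarray(x)
    offsets = np.asarray(offsets)
    if include_last_offset:
        # offsets has num_bags + 1 entries; the last one equals len(x).
        starts, ends = offsets[:-1], offsets[1:]
    else:
        starts, ends = offsets, np.append(offsets[1:], len(x))
    looked_up = weight[x].astype(np.float64)  # (num_indices, embedding_dim)
    if per_sample_weights is not None:
        # One scalar weight per index, broadcast across the embedding dimension.
        looked_up = looked_up * np.asarray(per_sample_weights)[:, None]
    if padding_idx is not None:
        # Assumed semantics: padding entries contribute zero to the sum.
        looked_up[x == padding_idx] = 0.0
    return np.stack([looked_up[s:e].sum(axis=0) for s, e in zip(starts, ends)])

table = np.array([[9.0, 9.0], [1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
bags = weighted_bag_sum(
    x=[1, 2, 0, 3, 3],
    weight=table,
    offsets=[0, 3, 5],                          # two bags: [1, 2, 0] and [3, 3]
    per_sample_weights=[1.0, 2.0, 1.0, 0.5, 0.5],
    include_last_offset=True,
    padding_idx=0,
)
print(bags)
```

Here the first bag evaluates to 1*[1, 10] + 2*[2, 20] + 0 = [5, 50], and the second to 0.5*[3, 30] + 0.5*[3, 30] = [3, 30]. The sketch materialises the looked-up rows for clarity, which is exactly what the fused op avoids.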
Returns
Tensor: Pooled output of shape (num_bags, embedding_dim).
Notes
Compared with embedding followed by a manual reduction, embedding_bag
avoids materialising the full per-token embedding matrix and fuses the
reduction into a single scatter-add (or scatter-max) pass.
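The scatter formulation can be illustrated with NumPy's unbuffered ufunc.at operations. This is a sketch under assumptions, not the library's kernel: bag_scatter is a hypothetical name, bags are assumed non-empty for "mean" and "max", and NumPy still builds weight[x] as a temporary here, whereas a fused kernel would accumulate row by row.

```python
import numpy as np

def bag_scatter(x, weight, offsets, mode="sum"):
    """Hypothetical scatter-style pooling: one pass over all indices."""
    x = np.asarray(x)
    offsets = np.asarray(offsets)
    num_bags = len(offsets)
    # Bag lengths from consecutive offsets; the last bag ends at len(x).
    lengths = np.diff(np.append(offsets, len(x)))
    # bag_ids[j] is the bag that index position j belongs to.
    bag_ids = np.repeat(np.arange(num_bags), lengths)
    if mode == "max":
        out = np.full((num_bags, weight.shape[1]), -np.inf)
        np.maximum.at(out, bag_ids, weight[x])  # scatter-max
        return out
    out = np.zeros((num_bags, weight.shape[1]))
    np.add.at(out, bag_ids, weight[x])          # scatter-add
    if mode == "mean":
        out /= lengths[:, None]
    return out

# Row r of this toy table is [4r, 4r + 1, 4r + 2, 4r + 3].
weight = np.arange(40, dtype=np.float64).reshape(10, 4)
ids = [1, 2, 4, 5, 4, 3, 2, 9]
sum_out = bag_scatter(ids, weight, [0, 4], mode="sum")
max_out = bag_scatter(ids, weight, [0, 4], mode="max")
print(sum_out.shape, max_out.shape)  # (2, 4) (2, 4)
```

np.add.at and np.maximum.at are used instead of plain fancy-index assignment because they accumulate correctly when the same bag id appears more than once.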
Examples
>>> import lucid
>>> from lucid.nn.functional import embedding_bag
>>> w = lucid.randn(10, 4)
>>> ids = lucid.tensor([1, 2, 4, 5, 4, 3, 2, 9], dtype=lucid.int64)
>>> off = lucid.tensor([0, 4], dtype=lucid.int64)
>>> out = embedding_bag(ids, w, offsets=off, mode="mean")
>>> out.shape
(2, 4)