class

TransformerEncoder

extends Module
TransformerEncoder(encoder_layer: TransformerEncoderLayer, num_layers: int, norm: Module | None = None)

A stack of $N$ identically configured TransformerEncoderLayer modules.

The encoder maps a source sequence of continuous representations $(x_1, \ldots, x_S)$ into another sequence of the same shape. Each layer refines the representation by attending to all positions simultaneously (self-attention has no ordering constraint), allowing every position to gather context from the entire sequence in a single forward pass.

Formally, letting $\text{Layer}_i$ denote the $i$-th encoder layer:

$$
h^{(0)} = \text{src}
$$

$$
h^{(i)} = \text{Layer}_i\!\left(h^{(i-1)},\; \text{mask},\; \text{src\_key\_padding\_mask}\right), \quad i = 1, \ldots, N
$$

$$
\text{output} = \begin{cases} \text{LayerNorm}\!\left(h^{(N)}\right) & \text{if norm is set} \\ h^{(N)} & \text{otherwise} \end{cases}
$$
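The recurrence above can be sketched in plain Python. This is a minimal illustration, not lucid's implementation: `encode_stack` and `add_one` are hypothetical names, and each layer is assumed to be any callable that accepts `(h, mask, src_key_padding_mask)`.

```python
from typing import Callable, Optional, Sequence


def encode_stack(src, layers: Sequence[Callable], mask=None,
                 src_key_padding_mask=None,
                 norm: Optional[Callable] = None):
    """Apply the layers in order, then the optional final norm."""
    h = src  # h^(0) = src
    for layer in layers:
        # h^(i) = Layer_i(h^(i-1), mask, src_key_padding_mask)
        h = layer(h, mask, src_key_padding_mask)
    return norm(h) if norm is not None else h


# Toy layer (hypothetical): adds 1 to every element and ignores both masks,
# so the effect of stacking three layers is easy to see.
def add_one(h, mask, src_key_padding_mask):
    return [x + 1 for x in h]


out = encode_stack([0, 10], [add_one] * 3)
out_normed = encode_stack([0, 10], [add_one] * 3,
                          norm=lambda h: [x * 2 for x in h])
```

Both masks are handed to every layer unchanged, and the final module runs exactly once, after the last layer.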

Parameters

encoder_layer : TransformerEncoderLayer
A single configured encoder layer whose hyperparameters (d_model, nhead, dim_feedforward, dropout, activation, batch_first, norm_first) are used to construct num_layers independent copies. The provided instance itself is not reused as one of the copies; a fresh layer is always instantiated per index.
num_layers : int
Number of encoder layers $N$ to stack.
norm : Module or None = None
Optional final normalisation module applied to the output of the last layer. Commonly a LayerNorm of size d_model. Default: None.
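The copy semantics described for encoder_layer can be illustrated with a toy sketch. `ToyLayer` and `build_stack` are hypothetical names, not part of lucid; the sketch assumes only that each stacked layer is constructed anew from the template's hyperparameters.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyLayer:
    # Hyperparameters read from the template (a subset, for illustration).
    d_model: int
    nhead: int
    # Stand-in for learnable parameters, initialised separately per instance.
    weight: list = field(default_factory=list)

    def __post_init__(self):
        if not self.weight:
            self.weight = [random.random() for _ in range(self.d_model)]


def build_stack(template: ToyLayer, num_layers: int) -> list:
    """Build num_layers fresh layers that share the template's hyperparameters."""
    return [ToyLayer(d_model=template.d_model, nhead=template.nhead)
            for _ in range(num_layers)]


template = ToyLayer(d_model=4, nhead=2)
layers = build_stack(template, num_layers=3)
```

In this sketch the template contributes configuration only: none of the stacked layers is the template object, and no two layers share a parameter list.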

Attributes

layers : list[TransformerEncoderLayer]
The $N$ encoder layer instances, also registered as sub-modules "0", "1", …, "N-1" for proper parameter tracking and serialisation.
norm : Module or None
The optional post-stack normalisation module.
num_layers : int
The number of stacked layers.

Notes

  • Input src: $(S, N, E)$ when batch_first=False, or $(N, S, E)$ when batch_first=True.
  • Output: same shape as src.

where $S$ is the source sequence length, $N$ is the batch size (in these shapes only; in the formulas above $N$ is the number of layers), and $E$ = d_model.
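As a quick check on the two layouts, the expected src shape can be written as a small helper. `expected_src_shape` is a hypothetical name used only for illustration; it is not a lucid function.

```python
def expected_src_shape(seq_len: int, batch: int, d_model: int,
                       batch_first: bool = False) -> tuple:
    """Return the src (and therefore output) shape for the given layout."""
    if batch_first:
        return (batch, seq_len, d_model)  # (N, S, E)
    return (seq_len, batch, d_model)      # (S, N, E)


seq_major = expected_src_shape(30, 4, 512)
batch_major = expected_src_shape(10, 2, 128, batch_first=True)
```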

The mask and src_key_padding_mask arguments are propagated unchanged to every layer in the stack. A boolean mask follows the convention that True indicates positions to ignore (mask out), matching the additive -inf semantics used internally.
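To make the boolean convention concrete, the sketch below converts a boolean mask to an additive one and applies it before a softmax, in plain Python. The helper names are hypothetical and this is not lucid's internal code; it only illustrates why True positions end up with zero attention weight.

```python
import math


def to_additive(bool_mask):
    """Map True (ignore) to -inf and False (keep) to 0.0."""
    return [float("-inf") if m else 0.0 for m in bool_mask]


def masked_softmax(scores, bool_mask):
    """Softmax over scores after adding the additive mask.

    Assumes at least one position is left unmasked.
    """
    shifted = [s + a for s, a in zip(scores, to_additive(bool_mask))]
    peak = max(shifted)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in shifted]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


# The third position has the largest score but is masked out.
weights = masked_softmax([1.0, 1.0, 5.0], [False, False, True])
```

The masked position receives weight 0.0 regardless of its raw score, and the remaining weights still sum to 1.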

When norm is provided it is registered as a sub-module named "norm", so its parameters appear in state_dict() and are updated by the optimiser.

Examples

**Six-layer encoder with final LayerNorm** (standard transformer):
>>> import lucid
>>> import lucid.nn as nn
>>> layer = nn.TransformerEncoderLayer(d_model=512, nhead=8,
...                                    dim_feedforward=2048)
>>> norm = nn.LayerNorm(512)
>>> encoder = nn.TransformerEncoder(layer, num_layers=6, norm=norm)
>>> src = lucid.randn(30, 4, 512)       # (seq, batch, d_model)
>>> memory = encoder(src)
>>> memory.shape
(30, 4, 512)

**Encoder with source padding mask** (variable-length sequences):
>>> encoder = nn.TransformerEncoder(
...     nn.TransformerEncoderLayer(d_model=128, nhead=4,
...                                batch_first=True),
...     num_layers=3,
... )
>>> src = lucid.randn(2, 10, 128)
>>> # True = padding position to be ignored
>>> pad_mask = lucid.tensor([[False] * 7 + [True] * 3,
...                          [False] * 10])
>>> out = encoder(src, src_key_padding_mask=pad_mask)
>>> out.shape
(2, 10, 128)
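
A padding mask like the one in the second example can be derived from per-sequence lengths. The helper below is a hypothetical sketch in plain Python, not a lucid function; its nested-list result has the same layout as the literal passed to `lucid.tensor` in that example.

```python
def make_padding_mask(lengths, max_len):
    """Return a (batch, max_len) nested list; True marks padding positions."""
    return [[pos >= length for pos in range(max_len)] for length in lengths]


# Two sequences padded to length 10: the first has 7 real tokens, the second 10.
pad_rows = make_padding_mask([7, 10], max_len=10)
```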

Methods (3)

dunder

__init__

None
__init__(encoder_layer: TransformerEncoderLayer, num_layers: int, norm: Module | None = None)

Initialise the TransformerEncoder module. See the class docstring for parameter semantics.

fn

forward

Tensor
forward(src: Tensor, mask: Tensor | None = None, src_key_padding_mask: Tensor | None = None)

Pass src through each encoder layer in order, then apply norm if it is set.

Parameters

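src : Tensor
Input sequence of shape $(S, N, E)$, or $(N, S, E)$ when batch_first=True.
mask : Tensor or None = None
Optional attention mask, propagated unchanged to every layer. In a boolean mask, True marks positions to ignore.
src_key_padding_mask : Tensor or None = None
Optional padding mask, propagated unchanged to every layer. True marks padding positions to ignore.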

Returns

Tensor

Output tensor with the same shape as src.

fn

extra_repr

str
extra_repr()

Return a string representation of the module's configuration.