nn.TransformerEncoder
class lucid.nn.TransformerEncoder(encoder_layer: TransformerEncoderLayer | nn.Module, num_layers: int, norm: nn.Module | None = None)
Overview
The TransformerEncoder module stacks multiple TransformerEncoderLayer instances to form a complete Transformer encoder. The input is passed sequentially through each encoder layer, and an optional normalization module is applied to the final output.
Class Signature
class lucid.nn.TransformerEncoder(
    encoder_layer: TransformerEncoderLayer | nn.Module,
    num_layers: int,
    norm: nn.Module | None = None,
)
Parameters
encoder_layer (TransformerEncoderLayer | nn.Module): A single encoder layer instance that is replicated num_layers times to build the stack (sketched below).
num_layers (int): The number of encoder layers in the stack.
norm (nn.Module | None, optional): An optional layer normalization module applied to the final output. Default is None.
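Because a single layer instance defines the architecture of every layer in the stack, the encoder replicates it internally. A minimal sketch of that replication, assuming deep-copy semantics as in other frameworks (lucid's internals may differ):
import copy

# Hypothetical illustration: the prototype encoder_layer passed to the
# constructor is deep-copied num_layers times, so each layer in the
# stack holds its own parameters.
layers = [copy.deepcopy(encoder_layer) for _ in range(num_layers)]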
Forward Method
def forward(
    src: Tensor,
    src_mask: Tensor | None = None,
    src_key_padding: Tensor | None = None,
    is_causal: bool = False,
) -> Tensor
Computes the forward pass of the Transformer encoder.
Inputs:
src (Tensor): The input tensor of shape \((N, L, d_{model})\), where \(N\) is the batch size, \(L\) is the sequence length, and \(d_{model}\) is the embedding dimension.
src_mask (Tensor | None, optional): A mask of shape \((L, L)\) applied to attention weights. Default is None.
src_key_padding (Tensor | None, optional): A mask of shape \((N, L)\), where non-zero values mark positions that attention should ignore (see the sketch below). Default is None.
is_causal (bool, optional, default=False): If True, enforces a lower-triangular mask to prevent positions from attending to future positions.
Output:
Tensor: The output tensor of shape \((N, L, d_{model})\).
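As an illustration, a padding mask marks trailing pad tokens so attention skips them. A minimal sketch, assuming lucid.Tensor accepts a nested list as other frameworks do (substitute lucid's actual tensor constructor if it differs):
import lucid
import lucid.nn as nn

src = lucid.random.randn(2, 5, 512)  # (batch, seq_len, d_model)

# Hypothetical padding layout: non-zero entries mark padded positions
# that attention should ignore.
pad_mask = lucid.Tensor([[0, 0, 0, 1, 1],
                         [0, 0, 0, 0, 1]])

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, num_heads=8), num_layers=2
)
out = encoder(src, src_key_padding=pad_mask, is_causal=True)
print(out.shape)  # (2, 5, 512)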
Mathematical Details
The Transformer encoder processes input through a sequence of encoder layers as follows:
Iterative Encoding
Each input tensor \(X\) is passed through num_layers encoder layers:
\[X_0 = X, \qquad X_{i+1} = \operatorname{EncoderLayer}(X_i), \quad \forall i \in [0, \text{num\_layers} - 1]\]
Optional Normalization
If norm is provided, it is applied to the final output:
\[Y = \operatorname{LayerNorm}(X_{\text{num\_layers}})\]
Otherwise, the output of the final encoder layer is returned.
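Restated as code, the two steps above are a simple loop followed by the optional norm. This is a schematic of the math, not lucid's actual implementation; in particular, the src_mask keyword on the per-layer call is an assumption:
def encode(layers, norm, src, src_mask=None):
    x = src                                    # X_0 = X
    for layer in layers:                       # X_{i+1} = EncoderLayer(X_i)
        x = layer(x, src_mask=src_mask)
    return norm(x) if norm is not None else x  # Y = LayerNorm(X_N) if norm given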
Usage Example
import lucid
import lucid.nn as nn
# Create an encoder layer
encoder_layer = nn.TransformerEncoderLayer(d_model=512, num_heads=8)
# Stack multiple encoder layers into a Transformer encoder
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
# Create random input tensor
src = lucid.random.randn(16, 10, 512) # (batch, seq_len, embed_dim)
# Compute encoder output
output = transformer_encoder(src)
print(output.shape) # Expected output: (16, 10, 512)
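The same encoder can also be built with a final normalization module and run causally. A sketch assuming lucid provides nn.LayerNorm taking the normalized dimension (adjust if the actual signature differs):
final_norm = nn.LayerNorm(512)  # assumed signature: embedding dimension
causal_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6, norm=final_norm)

# is_causal=True applies a lower-triangular mask, so position i attends
# only to positions j <= i.
causal_output = causal_encoder(src, is_causal=True)
print(causal_output.shape)  # Expected output: (16, 10, 512)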