class

GLU

extends Module

GLU(dim: int = -1)

Gated Linear Unit activation function.

Splits the input tensor into two equal halves along dim, then applies an element-wise gate:

$$\text{GLU}(x) = x_1 \otimes \sigma(x_2)$$

where $x_1$ and $x_2$ are the two halves of $x$ along dim, $\otimes$ is element-wise multiplication, and $\sigma$ is the logistic sigmoid. The sigmoid gate controls how much information from the first half flows through, enabling the network to learn a soft feature-selection mechanism.
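
To make the formula concrete, the following is a minimal pure-Python sketch of the same computation on a flat list. It does not use lucid and is not the library's implementation; the names `sigmoid` and `glu_1d` are hypothetical helpers introduced only for illustration.

```python
import math
from typing import List


def sigmoid(v: float) -> float:
    """Logistic sigmoid of a scalar."""
    return 1.0 / (1.0 + math.exp(-v))


def glu_1d(x: List[float]) -> List[float]:
    """Reference GLU on a flat list (hypothetical helper, not a lucid API).

    The first half of x is multiplied element-wise by the sigmoid
    of the second half.
    """
    if len(x) % 2 != 0:
        raise ValueError("GLU needs an even number of elements to split")
    n = len(x) // 2
    x1, x2 = x[:n], x[n:]
    return [a * sigmoid(b) for a, b in zip(x1, x2)]


out = glu_1d([1.0, 2.0, 3.0, 4.0])  # halves are [1, 2] and [3, 4]
print([round(v, 4) for v in out])   # [0.9526, 1.964]
```

Here $x_1 = [1, 2]$ and $x_2 = [3, 4]$, so the result is $[1 \cdot \sigma(3),\; 2 \cdot \sigma(4)]$, which agrees with the first entry under Examples.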

Parameters

dim: int = -1

Dimension along which the input is split. The size along this dimension must be even. Default: -1.

Notes

Input: $(\ldots,\; 2N,\; \ldots)$ — size along dim must be even.
Output: $(\ldots,\; N,\; \ldots)$ — output is half the size of the input along dim.
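
The shape rule above can be sketched as a small pure-Python function. `glu_output_shape` is a hypothetical helper, not part of lucid, and raising `ValueError` for an odd size is an assumption; the source does not say which exception the library raises.

```python
def glu_output_shape(shape, dim=-1):
    """Shape a GLU split along `dim` would produce (hypothetical helper)."""
    ndim = len(shape)
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} is out of range for {ndim} dimensions")
    axis = dim % ndim  # normalise negative dims, e.g. -1 -> ndim - 1
    if shape[axis] % 2 != 0:
        # Assumed error type; the library's actual behaviour is not documented here.
        raise ValueError(f"size along dim {dim} must be even, got {shape[axis]}")
    # Only the split axis changes: 2N becomes N.
    return tuple(s // 2 if i == axis else s for i, s in enumerate(shape))


print(glu_output_shape((4, 64, 512)))         # (4, 64, 256)
print(glu_output_shape((4, 64, 512), dim=1))  # (4, 32, 512)
```

All other dimensions pass through unchanged, so a layer feeding a GLU needs to produce twice the desired width along dim.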

GLU was introduced for language modelling with convolutional sequence models and is also used in transformer feed-forward blocks as an alternative to ReLU/GELU projection layers.

Examples

>>> import lucid
>>> import lucid.nn as nn
>>> m = nn.GLU(dim=-1)
>>> x = lucid.tensor([[1.0, 2.0, 3.0, 4.0]])   # split into [1,2] and [3,4]
>>> m(x)
tensor([[0.9526, 1.9640]])
>>> # Feed-forward block with GLU gating
>>> ff = nn.Sequential(nn.Linear(256, 512), nn.GLU(dim=-1))
>>> x = lucid.randn(4, 64, 256)
>>> out = ff(x)
>>> out.shape
(4, 64, 256)

Methods (3)

__init__

→None

__init__(dim: int = -1)

Initialise the GLU module. See the class docstring for parameter semantics.

forward

→Tensor

forward(x: Tensor)

Split the input into two equal halves along dim and multiply the first half element-wise by the sigmoid of the second half.

Parameters

x: Tensor

Input tensor of arbitrary shape whose size along dim is even.

Returns

Tensor

Output tensor with the same shape as the input, except that the size along dim is halved.

extra_repr

→str

extra_repr()

Return a string representation of the layer's configuration.