class GLU

extends Module
GLU(dim: int = -1)

Gated Linear Unit activation function.

Splits the input tensor into two equal halves along dim, then applies an element-wise gate:

$$\text{GLU}(x) = x_1 \otimes \sigma(x_2)$$

where $x_1$ and $x_2$ are the two halves of $x$ along dim, $\otimes$ is element-wise multiplication, and $\sigma$ is the logistic sigmoid. The sigmoid gate controls how much information from the first half flows through, enabling the network to learn a soft feature-selection mechanism.
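
The definition can be checked by hand with a minimal pure-Python sketch for a single 1-D vector. The helper names `sigmoid` and `glu_1d` are hypothetical and are not part of lucid; they only mirror the formula above.

```python
import math
from typing import List, Sequence


def sigmoid(v: float) -> float:
    """Logistic sigmoid, 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))


def glu_1d(x: Sequence[float]) -> List[float]:
    """Reference GLU for a flat sequence (hypothetical helper, not lucid API)."""
    if len(x) % 2 != 0:
        raise ValueError("GLU needs an even number of elements to split in half")
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]  # first half: values, second half: gates
    # Each value is scaled by the sigmoid of its matching gate element.
    return [a * sigmoid(b) for a, b in zip(x1, x2)]


print([round(v, 4) for v in glu_1d([1.0, 2.0, 3.0, 4.0])])
```

For the input `[1.0, 2.0, 3.0, 4.0]` the two results are 1·σ(3) and 2·σ(4), which agree with the values shown in the Examples section.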

Parameters

dim: int = -1
Dimension along which the input is split. The size along this dimension must be even. Default: -1.

Notes

  • Input: $(\ldots,\; 2N,\; \ldots)$, where the size along dim must be even.
  • Output: $(\ldots,\; N,\; \ldots)$, where the size along dim is half that of the input.
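
The shape rule above can be sketched as a small helper. `glu_output_shape` is a hypothetical name used only for illustration. How lucid itself reports an odd-sized dimension is not specified here, so the `ValueError` below is an assumption.

```python
from typing import Tuple


def glu_output_shape(shape: Tuple[int, ...], dim: int = -1) -> Tuple[int, ...]:
    """Shape a GLU would produce for an input of `shape` (hypothetical helper)."""
    ndim = len(shape)
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} is out of range for a {ndim}-D shape")
    axis = dim % ndim  # normalise negative dims, e.g. -1 -> ndim - 1
    if shape[axis] % 2 != 0:
        # Assumption: an odd size along the split dimension is an error.
        raise ValueError(f"size {shape[axis]} along dim {dim} is not even")
    # Only the split dimension changes; every other dimension is preserved.
    return shape[:axis] + (shape[axis] // 2,) + shape[axis + 1:]


print(glu_output_shape((4, 64, 512)))         # (4, 64, 256)
print(glu_output_shape((4, 64, 512), dim=1))  # (4, 32, 512)
```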

GLU was introduced for language modelling with convolutional sequence models and is also used in transformer feed-forward blocks as an alternative to ReLU/GELU projection layers.

Examples

>>> import lucid
>>> import lucid.nn as nn
>>> m = nn.GLU(dim=-1)
>>> x = lucid.tensor([[1.0, 2.0, 3.0, 4.0]])   # split into [1,2] and [3,4]
>>> m(x)
tensor([[0.9526, 1.9640]])
>>> # Feed-forward block with GLU gating
>>> ff = nn.Sequential(nn.Linear(256, 512), nn.GLU(dim=-1))
>>> x = lucid.randn(4, 64, 256)
>>> out = ff(x)
>>> out.shape
(4, 64, 256)

Methods (3)

dunder

__init__

None
__init__(dim: int = -1)

Initialise the GLU module. See the class docstring for parameter semantics.

fn

forward

Tensor
forward(x: Tensor)

Split the input in half along dim and gate the first half with the sigmoid of the second half.

Parameters

x: Tensor
Input tensor whose size along dim is even.

Returns

Tensor

Output tensor with the same shape as x, except that the size along dim is halved.

fn

extra_repr

str
extra_repr()

Return a string representation of the layer's configuration.
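
The exact contents of the string are not stated here. As a minimal sketch, assuming the layer follows the common convention of reporting its constructor arguments, a stand-in class might look like the following. `GLUSketch` is hypothetical and does not reproduce lucid's `Module` machinery.

```python
class GLUSketch:
    """Stand-in for illustration only; not lucid's GLU."""

    def __init__(self, dim: int = -1) -> None:
        self.dim = dim

    def extra_repr(self) -> str:
        # Assumption: the configuration string lists the split dimension.
        return f"dim={self.dim}"

    def __repr__(self) -> str:
        # Embed the configuration string in the printed form of the layer.
        return f"{type(self).__name__}({self.extra_repr()})"


print(repr(GLUSketch(dim=1)))  # GLUSketch(dim=1)
```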