fn

glu

Tensor
glu(x: Tensor, dim: int = -1)
source

Gated Linear Unit (Dauphin et al. 2017).

Splits the input in half along dim and multiplies the first half element-wise by the sigmoid of the second. Acts as a learnable gate that lets the network modulate information flow per channel; widely used in sequence models and Conformer-style speech architectures.

Parameters

xTensor
Input tensor. The size along dim must be even — it is split into two halves aa and bb of equal size.
dimint= -1
Dimension along which to split. Default -1.

Returns

Tensor

Output tensor whose shape matches x except along dim, where the size is halved.

Notes

GLU(x)=aσ(b),x=[a;b] split along dim\text{GLU}(x) = a \odot \sigma(b), \qquad x = [a; b] \text{ split along } \text{dim}

The sigmoid gate σ(b)\sigma(b) selects how much of aa to pass through. Because the gate is multiplicative the derivative is well-behaved (no saturation in aa's direction), which helps train deeper stacks than a plain feed-forward layer would allow.

Examples

>>> import lucid
>>> from lucid.nn.functional import glu
>>> x = lucid.tensor([[1.0, 2.0, 0.0, 1.0]])  # split → a=[1,2], b=[0,1]
>>> glu(x, dim=-1)
Tensor([[0.5000, 1.4621]])