glu

→Tensor

glu(x: Tensor, dim: int = -1)

source edit

Gated Linear Unit (Dauphin et al. 2017).

Splits the input in half along dim and multiplies the first half element-wise by the sigmoid of the second. Acts as a learnable gate that lets the network modulate information flow per channel; widely used in sequence models and Conformer-style speech architectures.

Parameters

xTensor

Input tensor. The size along dim must be even — it is split into two halves

a

and

b

of equal size.

dimint= -1

Dimension along which to split. Default -1.

Returns

Tensor

Output tensor whose shape matches x except along dim, where the size is halved.

Notes

\text{GLU}(x) = a \odot \sigma(b), \qquad x = [a; b] \text{ split along } \text{dim}

The sigmoid gate $\sigma(b)$ selects how much of $a$ to pass through. Because the gate is multiplicative the derivative is well-behaved (no saturation in $a$ 's direction), which helps train deeper stacks than a plain feed-forward layer would allow.

Examples

>>> import lucid
>>> from lucid.nn.functional import glu
>>> x = lucid.tensor([[1.0, 2.0, 0.0, 1.0]])  # split → a=[1,2], b=[0,1]
>>> glu(x, dim=-1)
Tensor([[0.5000, 1.4621]])

Used by 2

glu