fn
glu
glu(x: Tensor, dim: int = -1) → Tensor
Gated Linear Unit (Dauphin et al. 2017).
Splits the input in half along dim and multiplies the first half
element-wise by the sigmoid of the second. Acts as a learnable gate
that lets the network modulate information flow per channel; widely
used in sequence models and Conformer-style speech architectures.
Parameters
x (Tensor): Input tensor. The size along dim must be even, because it is split into two halves a and b of equal size.
dim (int, default -1): Dimension along which to split.
Returns
Tensor: Output tensor whose shape matches x except along dim, where the size is halved.
Notes
The sigmoid gate σ(b) selects how much of a to pass through. Because the gate is multiplicative, the output is linear in a: the derivative with respect to a is just σ(b), so it is well-behaved (no saturation in a's direction), which helps train deeper stacks than a plain feed-forward layer would allow.
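To make the gradient claim concrete, here is a short derivation. It assumes the two halves are named a and b (as in the example's comment), writes σ for the sigmoid and ⊙ for the element-wise product, and calls the output y; y is a label introduced only for this sketch.

```latex
\begin{aligned}
y &= a \odot \sigma(b), \\
\frac{\partial y}{\partial a} &= \sigma(b), \\
\frac{\partial y}{\partial b} &= a \odot \sigma(b) \odot \bigl(1 - \sigma(b)\bigr).
\end{aligned}
```

The derivative with respect to a does not depend on a, so that path stays linear however large a becomes. Only the gate path through b carries the usual sigmoid factor σ(b)(1 − σ(b)), which shrinks when b is far from zero.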
Examples
>>> import lucid
>>> from lucid.nn.functional import glu
>>> x = lucid.tensor([[1.0, 2.0, 0.0, 1.0]]) # split → a=[1,2], b=[0,1]
>>> glu(x, dim=-1)
Tensor([[0.5000, 1.4621]])
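For cross-checking, the same computation can be sketched in plain NumPy. This is an illustrative reference, not lucid's implementation: the name glu_reference is hypothetical, and raising ValueError for an odd size is an assumption of the sketch rather than documented lucid behavior.

```python
import numpy as np


def glu_reference(x: np.ndarray, dim: int = -1) -> np.ndarray:
    """Hypothetical reference GLU: a * sigmoid(b), where a and b are the
    first and second halves of x along `dim`."""
    size = x.shape[dim]
    if size % 2 != 0:
        # Assumption of this sketch: reject odd sizes with a ValueError.
        raise ValueError(f"size along dim={dim} must be even, got {size}")
    a, b = np.split(x, 2, axis=dim)       # two equal halves along dim
    gate = 1.0 / (1.0 + np.exp(-b))       # element-wise sigmoid of b
    return a * gate


# Same input as the example above: a = [1, 2], b = [0, 1].
x = np.array([[1.0, 2.0, 0.0, 1.0]])
y = glu_reference(x, dim=-1)              # approximately [[0.5, 1.4621]]

# Splitting along a non-last dimension halves only that dimension.
z = glu_reference(np.zeros((2, 6, 3)), dim=1)
```

Here y agrees with the output shown above to four decimal places, and z has shape (2, 3, 3), which illustrates the shape rule in the Returns section.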