GLU
Module GLU(dim: int = -1)
Gated Linear Unit activation function.
Splits the input tensor into two equal halves along dim, then
applies an element-wise gate:

$$\mathrm{GLU}(x) = a \otimes \sigma(b)$$

where $a$ and $b$ are the first and second halves of $x$ along
dim, $\otimes$ is element-wise multiplication, and $\sigma$
is the logistic sigmoid. The sigmoid gate controls how
much information from the first half flows through, enabling the network
to learn a soft feature-selection mechanism.
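The gating rule can be sketched in plain Python for a one-dimensional input: the first half of the values is multiplied element-wise by the sigmoid of the second half. This is an illustrative reference only, not Lucid's implementation; the helper name `glu_reference` is hypothetical.

```python
import math


def glu_reference(values):
    """Reference GLU for a flat list (hypothetical helper, not part of Lucid)."""
    if len(values) % 2 != 0:
        raise ValueError("GLU needs an even number of elements along the split dimension")
    half = len(values) // 2
    first, gate = values[:half], values[half:]
    # Multiply each element of the first half by the logistic sigmoid
    # of the matching element in the second half.
    return [f * (1.0 / (1.0 + math.exp(-g))) for f, g in zip(first, gate)]


print([round(v, 4) for v in glu_reference([1.0, 2.0, 3.0, 4.0])])  # [0.9526, 1.964]
```

The printed values agree with the `nn.GLU` example in the Examples section, where `[1, 2]` is gated by `sigmoid([3, 4])`.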
Parameters
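dim: int = -1
Dimension along which the input is split. Defaults to -1.
Notes
- Input: a tensor of any shape whose size along dim is even.
- Output: the same shape as the input, except that the size along dim is halved.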
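The shape rule can be written as a small standalone check. The sketch below assumes shapes are plain tuples and that negative dim values count from the end, as with dim=-1; the helper name `glu_output_shape` is hypothetical and not part of Lucid.

```python
def glu_output_shape(shape, dim=-1):
    """Return the shape GLU would produce for an input of the given shape."""
    shape = tuple(shape)
    ndim = len(shape)
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} is out of range for a {ndim}-dimensional input")
    axis = dim % ndim  # normalise a negative dim to a non-negative axis index
    if shape[axis] % 2 != 0:
        raise ValueError(f"size along dim must be even, got {shape[axis]}")
    # Only the split dimension changes: it is halved.
    return shape[:axis] + (shape[axis] // 2,) + shape[axis + 1:]


print(glu_output_shape((4, 64, 512)))         # (4, 64, 256)
print(glu_output_shape((4, 64, 512), dim=1))  # (4, 32, 512)
```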
GLU was introduced for language modelling with convolutional sequence models and is also used in transformer feed-forward blocks as an alternative to ReLU/GELU projection layers.
Examples
>>> import lucid
>>> import lucid.nn as nn
>>> m = nn.GLU(dim=-1)
>>> x = lucid.tensor([[1.0, 2.0, 3.0, 4.0]]) # split into [1,2] and [3,4]
>>> m(x)
tensor([[0.9526, 1.9640]])
>>> # Feed-forward block with GLU gating
>>> ff = nn.Sequential(nn.Linear(256, 512), nn.GLU(dim=-1))
>>> x = lucid.randn(4, 64, 256)
>>> out = ff(x)
>>> out.shape
(4, 64, 256)
Methods (3)
__init__
__init__(dim: int = -1) → None
Initialise the GLU module. See the class docstring for parameter semantics.
forward
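forward(x: Tensor) → Tensor
Apply the gated linear unit: split x into two halves along dim and multiply the first half element-wise by the sigmoid of the second.
Parameters
x: Tensor
Input tensor whose size along dim is even.
Returns
Tensor
Output tensor with the same shape as x, except that the size along dim is halved.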
extra_repr
extra_repr() → str
Return a string representation of the layer's configuration.