class

GELU

extends Module
GELU(approximate: str = 'none')

Gaussian Error Linear Unit activation function.

Applies element-wise:

$$\text{GELU}(x) = x \cdot \Phi(x)$$

where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution. Intuitively, GELU weights each input by the probability that a standard Gaussian random variable is smaller than it: inputs far into the positive tail pass through nearly unchanged, while those deep in the negative tail are suppressed.
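As a point of reference, the exact formula can be sketched in plain Python by writing $\Phi$ in terms of the error function, $\Phi(x) = \tfrac{1}{2}\left[1 + \operatorname{erf}(x / \sqrt{2})\right]$. The helper names `std_normal_cdf` and `gelu_exact` are illustrative only and are not part of lucid; the sketch works on single floats, not tensors.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Phi(x), the standard normal CDF, expressed through erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_exact(x: float) -> float:
    """Reference GELU(x) = x * Phi(x) for a single float (hypothetical helper)."""
    return x * std_normal_cdf(x)


# Evaluate on a few scalar inputs.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"{x:+.1f} -> {gelu_exact(x):+.4f}")
```

Large positive inputs come back almost unchanged and large negative inputs are pushed toward zero, matching the intuition above.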

When approximate="tanh" the following closed-form approximation is used instead:

$$\text{GELU}(x) \approx x \cdot \frac{1}{2} \left[ 1 + \tanh\!\left( \sqrt{\tfrac{2}{\pi}} \left(x + 0.044715\, x^3\right) \right) \right]$$
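To get a feel for how close the approximation is, the sketch below evaluates both formulas on a grid of scalars and reports the largest absolute gap. It uses only the standard library; `gelu_exact` and `gelu_tanh` are hypothetical helper names, not lucid functions, and the grid range of [-6, 6] is an arbitrary choice.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU via erf: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation with the 0.044715 cubic coefficient."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Scan [-6, 6] in steps of 0.01 and record the largest absolute gap.
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in grid)
print(f"max |exact - tanh| on [-6, 6]: {max_gap:.2e}")
```

On this grid the gap should stay below 1e-3, which is why the two modes are usually treated as interchangeable in practice. Note that weights trained with one mode are not guaranteed to behave identically under the other.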

Parameters

approximate: str = 'none'
Approximation method. "none" uses the exact erf-based formula; "tanh" uses the faster tanh approximation. Default: "none".

Notes

  • Input: (*), any shape.
  • Output: (*), same shape as input.

GELU is a common default activation in transformer architectures (BERT and GPT both use it). Unlike ReLU, it is smooth everywhere and its gradient is not identically zero for negative inputs, which is generally credited with improving training stability in attention-based models.
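To make the gradient comparison with ReLU concrete, the sketch below evaluates the analytic derivative of the exact formula, $\frac{d}{dx}\,x\Phi(x) = \Phi(x) + x\,\varphi(x)$, where $\varphi$ is the standard normal density. The helpers `gelu_grad` and `relu_grad` are hypothetical names written for this illustration and do not reflect how lucid computes gradients.

```python
import math


def gelu_grad(x: float) -> float:
    """Derivative of x * Phi(x): Phi(x) + x * phi(x)."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf


def relu_grad(x: float) -> float:
    """ReLU (sub)gradient, taking 0 at x == 0 by convention."""
    return 1.0 if x > 0 else 0.0


# Compare the two gradients on both sides of zero.
for x in (-3.0, -1.0, -0.1, 0.0, 0.1, 1.0, 3.0):
    print(f"x={x:+.1f}  gelu'={gelu_grad(x):+.4f}  relu'={relu_grad(x):.1f}")
```

For negative inputs the ReLU gradient is exactly zero, while the GELU gradient is small but generally non-zero (it is slightly negative around x = -1 and crosses zero only at the function's single minimum), so units that land on the negative side still receive a learning signal.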

Examples

>>> import lucid
>>> import lucid.nn as nn
>>> m = nn.GELU()
>>> x = lucid.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
>>> m(x)
tensor([-0.0455, -0.1587,  0.    ,  0.8413,  1.9545])
>>> # Fast tanh approximation — nearly identical for most inputs
>>> m_approx = nn.GELU(approximate="tanh")
>>> x = lucid.randn(4, 512)
>>> out = m_approx(x)
>>> out.shape
(4, 512)

Methods (3)

dunder

__init__

None
__init__(approximate: str = 'none')

Initialise the GELU module. See the class docstring for parameter semantics.

fn

forward

Tensor
forward(x: Tensor)

Apply the activation function element-wise.

Parameters

x: Tensor
Input tensor of arbitrary shape.

Returns

Tensor

Output tensor of the same shape as input.

fn

extra_repr

str
extra_repr()

Return a string representation of the layer's configuration.