class GELU extends Module

GELU(approximate: GeluApproximate = 'none')


Gaussian Error Linear Unit activation function.

Applies element-wise:

$$\text{GELU}(x) = x \cdot \Phi(x)$$

where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution. Intuitively, GELU weights each input by the probability that a standard Gaussian random variable is smaller than it — inputs far into the positive tail pass through nearly unchanged, while those deep in the negative tail are suppressed.
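
As a minimal sketch of the definition above, the exact form can be evaluated in plain Python using the identity $\Phi(x) = \tfrac{1}{2}\left[1 + \operatorname{erf}(x / \sqrt{2})\right]$. The helper names `phi_cdf` and `gelu_exact` are illustrative assumptions and are not part of lucid.

```python
import math


def phi_cdf(x: float) -> float:
    """Standard normal CDF via the error function (hypothetical helper)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_exact(x: float) -> float:
    """Exact GELU, x * Phi(x). Illustrative only, not lucid's implementation."""
    return x * phi_cdf(x)


# Large positive inputs pass through almost unchanged;
# large negative inputs are pushed toward zero.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"{x:+.1f} -> {gelu_exact(x):+.4f}")
```

Because $\Phi(x) + \Phi(-x) = 1$, the values satisfy $\text{GELU}(x) - \text{GELU}(-x) = x$, which is a quick sanity check for any implementation.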

When approximate="tanh", the following closed-form approximation is used instead:

$$\text{GELU}(x) \approx x \cdot \frac{1}{2} \left[ 1 + \tanh\!\left( \sqrt{\tfrac{2}{\pi}} \left(x + 0.044715\, x^3\right) \right) \right]$$
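
To get a feel for how close the two forms are, the sketch below evaluates both formulas in plain Python and reports the largest absolute gap on a grid. The function names and the grid over $[-6, 6]$ are assumptions made for illustration; this is not lucid's implementation.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU using the erf form of the normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation, following the formula given above."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Evaluate on a uniform grid with step 0.01 (an arbitrary illustrative choice).
xs = [i / 100.0 for i in range(-600, 601)]
max_err = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)
print(f"max |exact - tanh| on [-6, 6]: {max_err:.2e}")
```

On this grid the gap stays below $10^{-3}$, which is why the two variants are usually interchangeable at inference time, although outputs will not match bit for bit.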

Parameters

approximate: str = 'none'

Approximation method. "none" uses the exact erf-based formula; "tanh" uses the faster tanh approximation. Default: "none".

Notes

Input: $(*)$ — any shape.
Output: $(*)$ — same shape as input.

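GELU is widely used in transformer architectures such as BERT and GPT. Unlike ReLU, it is smooth everywhere, and its gradient is non-zero for almost all inputs, including negative ones: it vanishes only at an isolated point near $x \approx -0.75$ and decays toward zero in the far negative tail. These properties are often credited with improving training stability relative to ReLU.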
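
The gradient behaviour described above can be probed numerically. The sketch below uses the analytic derivative $\frac{d}{dx}\text{GELU}(x) = \Phi(x) + x\,\varphi(x)$, where $\varphi$ is the standard normal density, and compares it with ReLU's gradient at a few negative inputs. All helper names are hypothetical and not part of lucid.

```python
import math


def normal_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_grad(x: float) -> float:
    """Derivative of x * Phi(x) by the product rule."""
    return normal_cdf(x) + x * normal_pdf(x)


def relu_grad(x: float) -> float:
    """ReLU gradient: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


for x in (-2.0, -1.0, -0.5, 0.0, 1.0):
    print(f"x={x:+.1f}  gelu'={gelu_grad(x):+.4f}  relu'={relu_grad(x):.1f}")
```

ReLU's gradient is exactly zero for every negative input, whereas GELU's gradient stays small but non-zero there and changes sign between $x = -1$ and $x = -0.5$.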

Examples

>>> import lucid
>>> import lucid.nn as nn
>>> m = nn.GELU()
>>> x = lucid.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
>>> m(x)
tensor([-0.0455, -0.1587,  0.    ,  0.8413,  1.9545])
>>> # Fast tanh approximation — nearly identical for most inputs
>>> m_approx = nn.GELU(approximate="tanh")
>>> x = lucid.randn(4, 512)
>>> out = m_approx(x)
>>> out.shape
(4, 512)

Used by 1

lucid.nn.modules

Constructors

__init__

→None

__init__(approximate: GeluApproximate = 'none')


Initialise the GELU module. See the class docstring for parameter semantics.

Instance methods

extra_repr

→str

extra_repr()


Return a string representation of the layer's configuration.

forward

→Tensor

forward(x: Tensor)


Apply the activation function element-wise.

Parameters

x: Tensor

Input tensor of arbitrary shape.

Returns

Tensor

Output tensor of the same shape as input.
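
To make the interface concrete, here is a pure-Python stand-in that mirrors the documented constructor, `forward`, and `extra_repr` on a flat list of floats. The class name `GELUSketch`, the `ValueError` on unknown modes, and the exact `extra_repr` string are all assumptions made for illustration; lucid's actual behaviour may differ.

```python
import math
from typing import List


class GELUSketch:
    """Pure-Python stand-in for the documented interface (not lucid's code)."""

    def __init__(self, approximate: str = "none") -> None:
        # Assumption: unknown modes are rejected up front.
        if approximate not in ("none", "tanh"):
            raise ValueError(f"unsupported approximate={approximate!r}")
        self.approximate = approximate

    def forward(self, x: List[float]) -> List[float]:
        # Apply the selected formula to each element independently,
        # so the output has the same length as the input.
        if self.approximate == "tanh":
            c = math.sqrt(2.0 / math.pi)
            return [0.5 * v * (1.0 + math.tanh(c * (v + 0.044715 * v ** 3)))
                    for v in x]
        return [0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x]

    def extra_repr(self) -> str:
        # Assumed format; the string returned by lucid may differ.
        return f"approximate={self.approximate!r}"


m = GELUSketch(approximate="tanh")
out = m.forward([-1.0, 0.0, 1.0])
print(len(out), m.extra_repr())
```

Storing the mode as a plain attribute and branching in `forward` keeps the module stateless apart from its configuration, which matches the parameter-free nature of the activation.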
