fn

gelu

Tensor
gelu(x: Tensor, approximate: str = 'none')

Gaussian Error Linear Unit activation.

Smooth, non-monotonic activation that has largely replaced ReLU in transformer architectures. Approximates ReLU, $x \cdot \mathbb{1}_{x>0}$, but is differentiable everywhere and lets a small negative signal pass through, which improves gradient flow in deep networks.

Parameters

x: Tensor
Input tensor of any shape; activation is element-wise.
approximate: str = 'none'
Either "none" (default, exact formula via erf) or "tanh" (faster tanh-based approximation from Hendrycks & Gimpel 2016, used in BERT).

Returns

Tensor

Activated tensor with the same shape as x.

Notes

Exact form:

$$\text{GELU}(x) = x \, \Phi(x) = \frac{x}{2}\left[1 + \text{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right]$$

where $\Phi$ is the standard normal CDF. The "tanh" approximation is

$$\text{GELU}(x) \approx \frac{x}{2}\left[1 + \tanh\!\left(\sqrt{\tfrac{2}{\pi}}\,(x + 0.044715\, x^3)\right)\right].$$
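The two formulas can be checked against each other with the standard library alone. The sketch below is a scalar reference, not the library's implementation; the helper names `gelu_exact` and `gelu_tanh` are hypothetical and exist only for this illustration.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi written in terms of erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation with the 0.044715 cubic coefficient."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Print both variants side by side for a few sample inputs.
for x in (-1.0, 0.0, 1.0, 2.0):
    print(f"x={x:5.1f}  exact={gelu_exact(x):.4f}  tanh={gelu_tanh(x):.4f}")
```

At these sample points the two variants differ by well under 0.005, so the choice between them is mostly about speed and matching the numerics of a pretrained checkpoint.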

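Unlike ReLU, whose gradient is exactly zero for every negative input, GELU has a non-zero gradient almost everywhere (it vanishes only at the function's minimum near x ≈ -0.75), which is useful for training very deep transformer stacks. Introduced by Hendrycks & Gimpel (2016) and adopted widely after BERT.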
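Differentiating the exact form gives $\Phi(x) + x\,\varphi(x)$, where $\varphi$ is the standard normal density. The sketch below evaluates that derivative next to ReLU's at a few inputs. It is a scalar illustration with hypothetical helper names (`gelu_grad`, `relu_grad`), not how the library computes gradients.

```python
import math


def gelu_grad(x: float) -> float:
    """Derivative of exact GELU: Phi(x) + x * phi(x)."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf


def relu_grad(x: float) -> float:
    """Derivative of ReLU (taking the value 0 at x = 0)."""
    return 1.0 if x > 0 else 0.0


# Compare the two gradients on negative, zero, and positive inputs.
for x in (-3.0, -1.0, -0.5, 0.0, 1.0):
    print(f"x={x:5.1f}  gelu'={gelu_grad(x):+.4f}  relu'={relu_grad(x):+.4f}")
```

For the negative inputs sampled here ReLU's gradient is zero while GELU's is not. GELU's gradient is negative at -1 and positive at -0.5, so it crosses zero once in between, at the function's minimum.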

Examples

>>> import lucid
>>> from lucid.nn.functional import gelu
>>> x = lucid.tensor([-1.0, 0.0, 1.0, 2.0])
>>> gelu(x)
Tensor([-0.1587,  0.0000,  0.8413,  1.9545])