fn
gelu
gelu(x: Tensor, approximate: str = 'none') → Tensor
Gaussian Error Linear Unit activation.
Smooth, non-monotonic activation that has largely replaced ReLU in transformer architectures. It approximates ReLU but is differentiable everywhere and lets a small negative signal pass through, which can improve gradient flow in deep networks.
Parameters
x (Tensor): Input tensor of any shape; the activation is applied element-wise.
approximate (str, default 'none'): Either "none" (default, exact formula via erf) or "tanh" (faster tanh-based approximation used in BERT; Hendrycks & Gimpel, 2016).
Returns
Tensor: Activated tensor with the same shape as x.
Notes
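Exact form:
$$\mathrm{GELU}(x) = x \, \Phi(x) = \frac{x}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right]$$
where $\Phi$ is the standard normal CDF. The "tanh"
approximation is
$$\mathrm{GELU}(x) \approx \frac{x}{2}\left[1 + \tanh\!\left(\sqrt{\frac{2}{\pi}}\,\left(x + 0.044715\,x^{3}\right)\right)\right]$$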
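The exact (erf-based) and tanh forms of GELU can be sketched in plain Python to see how closely they agree. This is a minimal scalar sketch using only the standard library; the names `gelu_exact` and `gelu_tanh` are hypothetical helpers for illustration and are not part of lucid, whose internal implementation is not shown here.

```python
import math

# Constants of the tanh form (Hendrycks & Gimpel, 2016).
SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)
CUBIC_COEFF = 0.044715


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), with the normal CDF written via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh-based approximation of GELU."""
    inner = SQRT_2_OVER_PI * (x + CUBIC_COEFF * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Largest absolute gap between the two forms on a grid over [-5, 5].
grid = [i / 100.0 for i in range(-500, 501)]
max_gap = max(abs(gelu_exact(v) - gelu_tanh(v)) for v in grid)
print(f"max |exact - tanh| on [-5, 5]: {max_gap:.2e}")
```

The two forms are not identical, so results computed with approximate="tanh" should be expected to differ slightly from the default.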
Unlike ReLU, GELU keeps a non-zero gradient for negative inputs (it vanishes only at the function's single minimum near x ≈ -0.75), which is useful for training very deep transformer stacks. Introduced by Hendrycks & Gimpel (2016) and adopted widely after BERT.
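To make the gradient comparison with ReLU concrete, the derivative of the exact form, Φ(x) + x·φ(x) with φ the standard normal density, can be evaluated directly. The sketch below is plain standard-library Python; `gelu_grad` and `relu_grad` are hypothetical names used for illustration only, and lucid's own autograd is not involved.

```python
import math


def gelu_grad(x: float) -> float:
    """Derivative of exact GELU: Phi(x) + x * phi(x)."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf


def relu_grad(x: float) -> float:
    """Derivative of ReLU (taken as 0 at x = 0 by convention here)."""
    return 1.0 if x > 0 else 0.0


# Compare the two derivatives at a few negative and positive inputs.
for x in (-2.0, -1.0, -0.5, 0.5, 1.0):
    print(f"x={x:+.1f}  relu'={relu_grad(x):.4f}  gelu'={gelu_grad(x):+.4f}")
```

For negative inputs ReLU's derivative is exactly zero, while GELU's is small but non-zero. It changes sign between x = -1 and x = -0.5, which is where the non-monotonic dip sits.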
Examples
>>> import lucid
>>> from lucid.nn.functional import gelu
>>> x = lucid.tensor([-1.0, 0.0, 1.0, 2.0])
>>> gelu(x)
Tensor([-0.1587, 0.0000, 0.8413, 1.9545])