nn.functional.gelu¶
The gelu function applies the Gaussian Error Linear Unit (GELU) activation function element-wise to the input tensor. GELU is a smooth, non-linear activation that weights each input by the standard normal cumulative distribution function, blending the stochastic-gating idea behind dropout with a deterministic activation, and it performs well in many modern architectures.
Function Signature¶
def gelu(input_: Tensor) -> Tensor
Parameters¶
- input_ (Tensor):
The input tensor of any shape.
Returns¶
- Tensor:
A new Tensor where each element is the result of applying the GELU function to the corresponding element in input_. If input_ requires gradients, the resulting tensor will also require gradients.
Forward Calculation¶
The forward calculation for the gelu operation is:

\[
\mathbf{out}_i = \mathbf{input\_}_i \cdot \Phi(\mathbf{input\_}_i)
\]

Where \(\Phi\) is the cumulative distribution function of the standard normal distribution.

An approximate formulation commonly used is:

\[
\mathbf{out}_i \approx 0.5 \cdot \mathbf{input\_}_i \left( 1 + \tanh\!\left[ \sqrt{\tfrac{2}{\pi}} \left( \mathbf{input\_}_i + 0.044715 \cdot \mathbf{input\_}_i^{3} \right) \right] \right)
\]
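As a standalone numerical sketch (using NumPy and SciPy rather than the lucid API, with illustrative variable names), both formulations can be evaluated and compared directly:

>>> import numpy as np
>>> from scipy.special import erf
>>> x = np.array([-1.0, 0.0, 1.0])
>>> exact = x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))  # x * Phi(x), approx. [-0.1587, 0.0, 0.8413]
>>> approx = 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))
>>> # approx is roughly [-0.1588, 0.0, 0.8412], matching the exact form to about 1e-3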
Backward Gradient Calculation¶
For the tensor input_ involved in the gelu operation, the gradient of the output (out) with respect to input_ is computed as follows:

Gradient with respect to \(\mathbf{input\_}\):

\[
\frac{\partial \mathbf{out}_i}{\partial \mathbf{input\_}_i} = \Phi(\mathbf{input\_}_i) + \mathbf{input\_}_i \cdot \phi(\mathbf{input\_}_i)
\]

Where \(\phi\) is the probability density function of the standard normal distribution.
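A quick sanity check of this formula (again a standalone NumPy/SciPy sketch, not the lucid implementation) compares the analytic gradient against a central finite difference of the forward definition:

>>> import numpy as np
>>> from scipy.special import erf
>>> def gelu(x):
...     return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))  # forward: x * Phi(x)
...
>>> x = np.array([-1.0, 0.0, 1.0])
>>> cdf = 0.5 * (1.0 + erf(x / np.sqrt(2.0)))            # Phi(x)
>>> pdf = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)   # phi(x)
>>> analytic = cdf + x * pdf                             # approx. [-0.0833, 0.5, 1.0833]
>>> eps = 1e-6
>>> numeric = (gelu(x + eps) - gelu(x - eps)) / (2.0 * eps)
>>> np.allclose(analytic, numeric, atol=1e-6)
True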
Examples¶
Using gelu on a tensor:
>>> import lucid
>>> import lucid.nn.functional as F
>>> input_ = lucid.Tensor([-1.0, 0.0, 1.0], requires_grad=True)
>>> out = F.gelu(input_)
>>> print(out)
Tensor([-0.1588, 0.0, 0.8413], grad=None)
Backpropagation computes gradients for input_:
>>> out.backward()
>>> print(input_.grad)
[-0.0833, 0.5, 1.0833]
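These values follow directly from the backward formula above: at \(\mathbf{input\_}_i = 1.0\), \(\Phi(1) + 1 \cdot \phi(1) \approx 0.8413 + 0.2420 = 1.0833\), while at \(\mathbf{input\_}_i = -1.0\), \(\Phi(-1) + (-1) \cdot \phi(-1) \approx 0.1587 - 0.2420 = -0.0833\).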