fused_linear_gelu

→Tensor

fused_linear_gelu(x: Tensor, weight: Tensor, bias: Tensor, approximate: GeluApproximate = 'tanh')

source edit

Fused linear + GELU forward kernel.

Computes $\text{GELU}(xW^\top + b)$ in a single CPU pass. Used extensively in transformer feed-forward blocks where the linear-then- GELU pattern dominates inference cost; fusing eliminates the intermediate activation buffer and improves cache behaviour.

Parameters

xTensor

Input, shape (..., in_features).

weightTensor

Weight matrix, shape (out_features, in_features).

biasTensor

Bias vector, shape (out_features,).

approximatestr= 'tanh'

GELU approximation. "tanh" (default) is what the fused kernel implements; "none" falls back to the unfused exact-erf path.

Returns

Tensor

gelu(x @ weight.T + bias), shape (..., out_features).

Notes

y = \text{GELU}\!\big(x W^\top + b\big)

Implementation: a single BLAS SGEMM followed by vForce vtanh over the GEMM output to evaluate the tanh-form GELU $\tfrac{x}{2}\big[1 + \tanh(\sqrt{2/\pi}\,(x + 0.044715 x^3))\big]$ . Engaged only in inference mode (no grad-tracking inputs) and only when approximate="tanh"; otherwise the call falls back to the standard linear + gelu graph so gradients remain exact.

Examples

>>> import lucid
>>> from lucid.nn.functional import fused_linear_gelu
>>> x = lucid.randn(8, 16)
>>> w = lucid.randn(32, 16)
>>> b = lucid.zeros(32)
>>> fused_linear_gelu(x, w, b).shape
(8, 32)

Used by 2