fn

fused_linear_gelu

Tensor
fused_linear_gelu(x: Tensor, weight: Tensor, bias: Tensor, approximate: str = 'tanh')
source

Fused linear + GELU forward kernel.

Computes GELU(xW+b)\text{GELU}(xW^\top + b) in a single CPU pass. Used extensively in transformer feed-forward blocks where the linear-then- GELU pattern dominates inference cost; fusing eliminates the intermediate activation buffer and improves cache behaviour.

Parameters

xTensor
Input, shape (..., in_features).
weightTensor
Weight matrix, shape (out_features, in_features).
biasTensor
Bias vector, shape (out_features,).
approximatestr= 'tanh'
GELU approximation. "tanh" (default) is what the fused kernel implements; "none" falls back to the unfused exact-erf path.

Returns

Tensor

gelu(x @ weight.T + bias), shape (..., out_features).

Notes

y=GELU ⁣(xW+b)y = \text{GELU}\!\big(x W^\top + b\big)

Implementation: a single BLAS SGEMM followed by vForce vtanh over the GEMM output to evaluate the tanh-form GELU x2[1+tanh(2/π(x+0.044715x3))]\tfrac{x}{2}\big[1 + \tanh(\sqrt{2/\pi}\,(x + 0.044715 x^3))\big]. Engaged only in inference mode (no grad-tracking inputs) and only when approximate="tanh"; otherwise the call falls back to the standard linear + gelu graph so gradients remain exact.

Examples

>>> import lucid
>>> from lucid.nn.functional import fused_linear_gelu
>>> x = lucid.randn(8, 16)
>>> w = lucid.randn(32, 16)
>>> b = lucid.zeros(32)
>>> fused_linear_gelu(x, w, b).shape
(8, 32)