fn
fused_linear_gelu
→Tensorfused_linear_gelu(x: Tensor, weight: Tensor, bias: Tensor, approximate: str = 'tanh')Fused linear + GELU forward kernel.
Computes in a single CPU pass. Used extensively in transformer feed-forward blocks where the linear-then- GELU pattern dominates inference cost; fusing eliminates the intermediate activation buffer and improves cache behaviour.
Parameters
xTensorInput, shape
(..., in_features).weightTensorWeight matrix, shape
(out_features, in_features).biasTensorBias vector, shape
(out_features,).approximatestr= 'tanh'GELU approximation.
"tanh" (default) is what the fused
kernel implements; "none" falls back to the unfused exact-erf
path.Returns
Tensorgelu(x @ weight.T + bias), shape (..., out_features).
Notes
Implementation: a single BLAS SGEMM followed by vForce vtanh
over the GEMM output to evaluate the tanh-form GELU
.
Engaged only in inference mode (no grad-tracking inputs) and only
when approximate="tanh"; otherwise the call falls back to the
standard linear + gelu graph so gradients remain
exact.
Examples
>>> import lucid
>>> from lucid.nn.functional import fused_linear_gelu
>>> x = lucid.randn(8, 16)
>>> w = lucid.randn(32, 16)
>>> b = lucid.zeros(32)
>>> fused_linear_gelu(x, w, b).shape
(8, 32)