fused_linear_relu(x: Tensor, weight: Tensor, bias: Tensor) → Tensor
Fused linear + ReLU forward kernel.
Computes the result on the CPU without materialising the pre-activation tensor as a separate allocation, which reduces memory traffic on large MLP blocks. Falls back to the unfused two-op path (linear followed by relu) whenever grad mode is enabled or any input has requires_grad=True, so that autograd remains correct.
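The fused strategy can be illustrated with a small pure-Python sketch. It assumes the bias is used to seed the output buffer before the matrix product is accumulated into it (one common way to fold a bias into a GEMM; this page does not say how lucid adds it), and the names fused_linear_relu_sketch and unfused_reference are hypothetical, not part of lucid. The unfused reference builds a separate pre-activation list, which is the intermediate the fused path avoids.

```python
# Hypothetical sketch, not lucid's kernel. Inputs are plain nested lists:
# x is M rows of in_features, weight is out_features rows of in_features.


def fused_linear_relu_sketch(x, weight, bias):
    """relu(x @ weight.T + bias) computed into a single output buffer."""
    out_features = len(weight)
    # Assumption: seed the output buffer with the bias, as a GEMM that
    # accumulates into C (beta = 1) would.
    out = [list(bias) for _ in x]
    # GEMM step: accumulate x @ weight.T directly into the buffer.
    for i, row in enumerate(x):
        for j in range(out_features):
            out[i][j] += sum(a * w for a, w in zip(row, weight[j]))
    # ReLU step: clamp negatives in place; no second buffer is allocated.
    for r in out:
        for j, v in enumerate(r):
            if v < 0.0:
                r[j] = 0.0
    return out


def unfused_reference(x, weight, bias):
    """Two-op path: a separate pre-activation list, then ReLU."""
    pre = [
        [sum(a * w for a, w in zip(row, weight[j])) + bias[j] for j in range(len(weight))]
        for row in x
    ]
    return [[max(v, 0.0) for v in r] for r in pre]
```

Both functions return the same values; the difference is only in which intermediates are allocated.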
Parameters
x : Tensor
    Input, shape (..., in_features).
weight : Tensor
    Weight matrix, shape (out_features, in_features).
bias : Tensor
    Bias vector, shape (out_features,).
Returns
Tensor
    relu(x @ weight.T + bias), shape (..., out_features).
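The shape contract above can be checked with a small helper. Because SGEMM operates on 2-D matrices, this sketch assumes the leading dimensions of x are collapsed into a single GEMM row count M and restored afterwards; that layout choice and the name fused_output_shape are hypothetical, not documented lucid behaviour.

```python
from math import prod


def fused_output_shape(x_shape, weight_shape, bias_shape):
    """Hypothetical helper: check shapes, return (output_shape, gemm_shape)."""
    *batch, in_features = x_shape
    out_features, w_in = weight_shape
    if w_in != in_features:
        raise ValueError(f"x has {in_features} features but weight expects {w_in}")
    if tuple(bias_shape) != (out_features,):
        raise ValueError(f"bias shape {tuple(bias_shape)} != ({out_features},)")
    # Assumption: leading dims collapse into the GEMM row count M
    # (prod of an empty list is 1, so a 1-D x gives M = 1).
    m = prod(batch)
    return tuple(batch) + (out_features,), (m, out_features)
```

For the shapes used in the example below, fused_output_shape((8, 16), (32, 16), (32,)) gives ((8, 32), (8, 32)).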
Notes
Implementation: a single BLAS SGEMM followed by vDSP vrelu over
the GEMM output buffer. Selected by the Phase-19 FusionPass only when
grad mode is disabled and none of x, weight, or bias has
requires_grad=True. Otherwise (for example during training) the call
routes through the standard linear + relu graph, so gradients come
from autograd's existing backward rules for each op.
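The selection rule in the Notes reduces to a simple predicate. The sketch below restates it in plain Python; StubTensor and use_fused_path are hypothetical names used only for illustration, not the FusionPass's actual interface.

```python
from dataclasses import dataclass


@dataclass
class StubTensor:
    """Hypothetical stand-in: only the requires_grad flag matters here."""
    requires_grad: bool = False


def use_fused_path(grad_enabled: bool, *inputs: StubTensor) -> bool:
    """Fuse only when grad mode is off and no input requires gradient;
    in every other case the linear + relu graph is used instead."""
    return not grad_enabled and not any(t.requires_grad for t in inputs)
```

A single input with requires_grad=True is enough to disable fusion, even with grad mode off.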
Examples
>>> import lucid
>>> from lucid.nn.functional import fused_linear_relu
>>> x = lucid.randn(8, 16)
>>> w = lucid.randn(32, 16)
>>> b = lucid.zeros(32)
>>> fused_linear_relu(x, w, b).shape
(8, 32)