fused_linear_relu

fused_linear_relu(x: Tensor, weight: Tensor, bias: Tensor) → Tensor

Fused linear + ReLU forward kernel.

Computes $\text{ReLU}(xW^\top + b)$ in a single CPU pass. The pre-activation is never materialised as a separate tensor, which halves memory traffic on large MLP blocks. Falls back to the unfused two-op path (linear, then relu) when grad mode is enabled or any input has requires_grad=True, so that autograd remains correct.

Parameters

x : Tensor

Input, shape (..., in_features).

weight : Tensor

Weight matrix, shape (out_features, in_features).

bias : Tensor

Bias vector, shape (out_features,).

Returns

Tensor

relu(x @ weight.T + bias), shape (..., out_features).

Notes

$$y = \max\!\big(0,\, x W^\top + b\big)$$
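The formula can be checked by hand with a small unfused reference. The sketch below is plain Python rather than the library kernel; the name `linear_relu_reference` and the sample values are assumptions made for illustration, and it handles only a 2-D input of shape (batch, in_features).

```python
def linear_relu_reference(x, weight, bias):
    """Unfused reference for max(0, x @ W.T + b) on nested Python lists.

    x:      list of rows, each of length in_features
    weight: list of out_features rows, each of length in_features
    bias:   list of length out_features
    """
    return [
        [
            # Dot product of one input row with one weight row, plus bias,
            # clamped at zero.
            max(0.0, sum(xi * wi for xi, wi in zip(row, w_row)) + b)
            for w_row, b in zip(weight, bias)
        ]
        for row in x
    ]


x = [[1.0, -2.0]]                                  # shape (1, 2)
weight = [[1.0, 1.0], [0.5, -1.0], [2.0, 0.0]]     # shape (3, 2)
bias = [0.5, 0.0, -1.0]                            # shape (3,)

y = linear_relu_reference(x, weight, bias)
print(y)  # [[0.0, 2.5, 1.0]]
```

The three pre-activations are -0.5, 2.5 and 1.0, so only the first is clamped to zero.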

Implementation: a single BLAS SGEMM followed by vDSP vrelu over the GEMM output buffer. Selected by the Phase-19 FusionPass when grad-mode is disabled and none of x / weight / bias has requires_grad=True. In training mode the call routes through the standard linear + relu graph so autograd's backward derivations are exact.
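The selection rule described above can be sketched as a simple predicate. This is a minimal illustration, not the FusionPass itself: `TensorInfo`, `can_use_fused_kernel` and `relu_inplace` are hypothetical names, and treating the ReLU step as an in-place clamp over the GEMM output buffer is an assumption based on the description.

```python
from dataclasses import dataclass


@dataclass
class TensorInfo:
    """Hypothetical stand-in for a tensor's autograd metadata."""
    requires_grad: bool = False


def can_use_fused_kernel(grad_enabled, *tensors):
    """Fused path is eligible only if grad mode is off and no input requires grad."""
    return (not grad_enabled) and not any(t.requires_grad for t in tensors)


def relu_inplace(buffer):
    """Clamp negatives to zero in place (assumed behaviour of the ReLU pass)."""
    for i, value in enumerate(buffer):
        if value < 0.0:
            buffer[i] = 0.0


x, weight, bias = TensorInfo(), TensorInfo(requires_grad=True), TensorInfo()
print(can_use_fused_kernel(False, x, weight, bias))        # False: weight requires grad
print(can_use_fused_kernel(False, x, TensorInfo(), bias))  # True

gemm_out = [-0.5, 2.5, 1.0]  # pretend this buffer holds x @ W.T + b
relu_inplace(gemm_out)
print(gemm_out)  # [0.0, 2.5, 1.0]
```

Both conditions must hold for the fused path: enabling grad mode alone is enough to route the call through the unfused linear + relu graph, even when no input has requires_grad=True.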

Examples

>>> import lucid
>>> from lucid.nn.functional import fused_linear_relu
>>> x = lucid.randn(8, 16)
>>> w = lucid.randn(32, 16)
>>> b = lucid.zeros(32)
>>> fused_linear_relu(x, w, b).shape
(8, 32)
