FusedLinear
Module
FusedLinear(in_features: int, out_features: int, activation: str = 'relu', bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)
Linear layer with a kernel-fused non-linear activation.
Computes
$$y = \sigma(x W^\top + b)$$
where $W$ is the weight, $b$ is the bias, and $\sigma$ is one of the supported pointwise activations.
Inference mode dispatches to a single BLAS + Accelerate pass that avoids allocating the intermediate pre-activation tensor. Training mode falls back to unfused, differentiable ops so that the autograd engine can compute correct gradients through both the linear projection and the activation.
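To make the computation and the fused/unfused distinction concrete, here is a minimal pure-Python sketch. It is not Lucid's actual kernel: the helper names `relu`, `gelu_tanh`, `linear_act_unfused`, and `linear_act_fused` are hypothetical, and the "fused" variant only imitates the idea of applying the activation as each output element is produced, so that no pre-activation tensor is stored.

```python
import math

def relu(v):
    # Rectified linear unit: max(0, v)
    return max(0.0, v)

def gelu_tanh(v):
    # GELU, tanh approximation
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * v * (1.0 + math.tanh(c * (v + 0.044715 * v ** 3)))

ACTIVATIONS = {"relu": relu, "gelu": gelu_tanh}  # hypothetical lookup table

def linear_act_unfused(x, weight, bias, activation="relu"):
    """x: (batch, in_features) rows; weight: (out_features, in_features)."""
    act = ACTIVATIONS[activation]
    # Pass 1: materialise the full pre-activation z = x @ W^T + b.
    pre = [[sum(xi * wi for xi, wi in zip(row, w_row)) + b
            for w_row, b in zip(weight, bias)]
           for row in x]
    # Pass 2: apply the activation to the stored intermediate.
    return [[act(z) for z in z_row] for z_row in pre]

def linear_act_fused(x, weight, bias, activation="relu"):
    act = ACTIVATIONS[activation]
    # Single pass: each dot product is activated immediately,
    # so the pre-activation values are never stored as a tensor.
    return [[act(sum(xi * wi for xi, wi in zip(row, w_row)) + b)
             for w_row, b in zip(weight, bias)]
            for row in x]

x = [[1.0, -2.0], [0.5, 3.0]]                    # (batch=2, in_features=2)
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # (out_features=3, in_features=2)
bias = [0.0, 0.5, -1.0]

y_unfused = linear_act_unfused(x, weight, bias, "relu")
y_fused = linear_act_fused(x, weight, bias, "relu")
```

Both functions return the same values; only the handling of the intermediate differs, which is why the unfused form suits training (the stored pre-activation is available for backpropagation) and the fused form suits inference.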
Parameters
in_features: int
out_features: int
activation: str = 'relu'
    'relu' (default): rectified linear unit, $\max(0, x)$. Fastest; preferred for intermediate hidden layers.
    'gelu': Gaussian error linear unit (tanh approximation), $0.5\,x\,\bigl(1 + \tanh\bigl(\sqrt{2/\pi}\,(x + 0.044715\,x^3)\bigr)\bigr)$. Preferred in Transformer MLP blocks.
bias: bool = True
    If True (default), a learnable bias is added before the activation. When bias=False the fused kernel is unavailable and the layer falls back to standard unfused ops even during inference.
device: DeviceLike = None
dtype: DTypeLike = None
Attributes
weight: Parameter
    Shape (out_features, in_features). Initialized with Kaiming uniform (same scheme as Linear).
bias: Parameter or None
    Shape (out_features,). None when bias=False.
activation: str
    Name of the activation ('relu' or 'gelu').
in_features: int
out_features: int
Notes
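- Input: $(*, \text{in\_features})$, where $*$ is any number of leading dimensions.
- Output: $(*, \text{out\_features})$ after activation.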
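The shape rule can be sketched as a small helper. This assumes the layer follows the usual linear-layer convention of acting on the last dimension only, which is consistent with the examples on this page; `fused_linear_output_shape` is a hypothetical name, not part of Lucid.

```python
def fused_linear_output_shape(input_shape, in_features, out_features):
    """Return the output shape for an input of shape (*, in_features).

    Hypothetical helper: only the last dimension changes,
    all leading dimensions are passed through untouched.
    """
    if len(input_shape) == 0 or input_shape[-1] != in_features:
        raise ValueError(
            f"expected last dimension {in_features}, got shape {tuple(input_shape)}"
        )
    return tuple(input_shape[:-1]) + (out_features,)

shape_2d = fused_linear_output_shape((4, 64), 64, 256)
shape_3d = fused_linear_output_shape((2, 16, 768), 768, 3072)
```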
The kernel fusion benefit is most pronounced for inference-only
workloads (e.g. model serving with lucid.no_grad()). During
training, the unfused fallback ensures that every intermediate value
needed for backpropagation is materialised correctly.
For bias=False the fused path is always skipped; prefer using
bias=True to take advantage of fusion.
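The dispatch rule described in these notes can be summarised as a truth table. The sketch below is an assumption-laden illustration, not Lucid's implementation: `uses_fused_kernel` is a hypothetical name, "inference" is taken to mean that gradient tracking is disabled (as under lucid.no_grad()), and any further conditions the real dispatcher may check, such as device or dtype, are not modelled.

```python
def uses_fused_kernel(grad_enabled: bool, has_bias: bool) -> bool:
    """Hypothetical dispatch rule: fused only for inference with a bias."""
    return (not grad_enabled) and has_bias

# Enumerate all four combinations of the two conditions.
for grad_enabled in (False, True):
    for has_bias in (True, False):
        path = "fused" if uses_fused_kernel(grad_enabled, has_bias) else "unfused"
        print(f"grad_enabled={grad_enabled!s:5} bias={has_bias!s:5} -> {path}")
```

Only one of the four combinations takes the fused path, which is why bias=True is recommended for inference workloads.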
Examples
Inference with ReLU activation:
>>> import lucid
>>> import lucid.nn as nn
>>> m = nn.FusedLinear(64, 256, activation='relu')
>>> x = lucid.randn(4, 64)
>>> with lucid.no_grad():
...     y = m(x)  # single-pass fused kernel on CPU
>>> y.shape
(4, 256)
GELU activation for a Transformer MLP block:
>>> mlp = nn.FusedLinear(768, 3072, activation='gelu')
>>> x = lucid.randn(2, 16, 768) # (batch, seq_len, d_model)
>>> with lucid.no_grad():
...     out = mlp(x)
>>> out.shape
(2, 16, 3072)
Methods (3)
__init__
__init__(in_features: int, out_features: int, activation: str = 'relu', bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None) → None
Initialise the FusedLinear module. See the class docstring for parameter semantics.
forward
forward(x: Tensor) → Tensor
Apply the linear transformation and the configured activation to the input tensor.
Parameters
x: Tensor
Returns
Tensor
    Output tensor of shape $(*, \text{out\_features})$.
extra_repr
extra_repr() → str
Return a string representation of the layer's configuration.