Linear
Module
Linear(in_features: int, out_features: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)
Apply a learnable affine transformation to incoming data.
Computes the linear map
$$y = x W^{\top} + b$$
where $W$ is the weight matrix of shape (out_features, in_features) and $b$ is the optional bias vector of shape (out_features,).
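
The map above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not Lucid's implementation: `linear_forward` is a hypothetical helper, and the weight is assumed to be stored as (out_features, in_features), as the Attributes section describes.

```python
import numpy as np

def linear_forward(x, weight, bias=None):
    """Hypothetical helper: y = x @ W^T + b, with W of shape (out_features, in_features)."""
    y = x @ weight.T      # contracts the last axis of x against in_features
    if bias is not None:
        y = y + bias      # bias of shape (out_features,) broadcasts over leading axes
    return y

rng = np.random.default_rng(0)
weight = rng.standard_normal((10, 20))   # (out_features, in_features)
bias = rng.standard_normal(10)           # (out_features,)

y2d = linear_forward(rng.standard_normal((4, 20)), weight, bias)
y3d = linear_forward(rng.standard_normal((2, 32, 20)), weight, bias)
print(y2d.shape)  # (4, 10)
print(y3d.shape)  # (2, 32, 10)
```

Only the last axis is transformed; any leading axes pass through unchanged.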
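Parameters
in_features : int
    Size of the last axis of the input.
out_features : int
    Size of the last axis of the output.
bias : bool = True
    If True (default), a learnable bias is added to the output. Set to False when a subsequent normalization layer already absorbs the bias (e.g. BatchNorm1d).
device : DeviceLike = None
    Device on which the parameters are allocated ('cpu' or 'metal'). Defaults to the global default device.
dtype : DTypeLike = None
    Data type of the parameters (float32 by default).
Attributes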
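weight : Parameter
    Learnable weight of shape (out_features, in_features). Initialized with Kaiming uniform sampling:
    $$W \sim \mathcal{U}\left(-\sqrt{\frac{6}{(1 + a^2)\,\text{fan\_in}}},\ \sqrt{\frac{6}{(1 + a^2)\,\text{fan\_in}}}\right)$$
    where $\text{fan\_in} = \text{in\_features}$ and $a$ is the negative-slope parameter, left at its default value. This keeps gradient variance roughly constant across layers at initialization, which is critical for training stability in deep networks.
bias : Parameter or None
    Learnable bias of shape (out_features,). Initialized with uniform sampling over $\left[-1/\sqrt{\text{fan\_in}},\ 1/\sqrt{\text{fan\_in}}\right]$. None when bias=False.
Notes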
- Input: $(*, \text{in\_features})$, any number of leading batch dimensions followed by in_features.
- Output: $(*, \text{out\_features})$, same leading dimensions, last axis replaced by out_features.
Linear is the most common building block in feed-forward sub-layers
(e.g. the MLP inside a Transformer block uses two Linear layers with a
non-linearity in between). When many layers are composed in sequence, the
Kaiming initialization helps keep both the forward activations and the
backward gradients from exploding or vanishing at the start of training.
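
The effect of the initialization can be checked numerically. The NumPy sketch below is not Lucid's code: `kaiming_uniform_bound` is a hypothetical helper, and the negative slope `a = sqrt(5)` is an assumption borrowed from the PyTorch convention, since this page does not state the value. Under that assumption the bound reduces to `1 / sqrt(fan_in)`, and the output variance for unit-variance inputs stays near 1/3 regardless of layer width.

```python
import math
import numpy as np

def kaiming_uniform_bound(fan_in, a=0.0):
    """Hypothetical helper: bound b of U(-b, b) for Kaiming uniform with negative slope a."""
    gain = math.sqrt(2.0 / (1.0 + a * a))
    return gain * math.sqrt(3.0 / fan_in)

rng = np.random.default_rng(0)
in_features, out_features = 512, 256
a = math.sqrt(5.0)  # ASSUMPTION: value taken from the PyTorch convention, not from this page

bound = kaiming_uniform_bound(in_features, a)
weight = rng.uniform(-bound, bound, size=(out_features, in_features))

# Feed unit-variance inputs through the freshly initialized weight.
x = rng.standard_normal((1024, in_features))
y = x @ weight.T

print(bound, 1.0 / math.sqrt(in_features))  # the two values agree
print(y.var())                              # close to 1/3, independent of in_features
```

The variance of U(-b, b) is b²/3, so each output sums `fan_in` terms of variance `1 / (3 * fan_in)`, which is why the result does not grow with the layer width.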
Examples
Basic usage with a 2-D input:
>>> import lucid
>>> import lucid.nn as nn
>>> m = nn.Linear(20, 10)
>>> x = lucid.randn(4, 20) # batch of 4, 20 features each
>>> y = m(x)
>>> y.shape
(4, 10)
Higher-dimensional inputs (batch + sequence):
>>> m = nn.Linear(512, 256)
>>> x = lucid.randn(2, 32, 512) # (batch, seq_len, d_model)
>>> m(x).shape
(2, 32, 256)
Disable bias for use before a normalization layer:
>>> m_no_bias = nn.Linear(128, 64, bias=False)
>>> m_no_bias.bias is None
True
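
Why the bias becomes redundant before a normalization layer can be shown with a small NumPy sketch. The `batch_norm` function here is a simplified, hypothetical stand-in (per-feature normalization over the batch axis, no learnable scale or shift), not Lucid's BatchNorm1d.

```python
import numpy as np

def batch_norm(y, eps=1e-5):
    """Simplified stand-in: normalize each feature over the batch axis."""
    mean = y.mean(axis=0, keepdims=True)
    var = y.var(axis=0, keepdims=True)
    return (y - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 128))
weight = rng.standard_normal((64, 128))  # (out_features, in_features)
bias = rng.standard_normal(64)

with_bias = batch_norm(x @ weight.T + bias)
without_bias = batch_norm(x @ weight.T)
print(np.allclose(with_bias, without_bias))  # True
```

A per-feature constant shifts the batch mean by the same amount and leaves the variance unchanged, so subtracting the mean removes it.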
>>> lucid.randn(8, 128).shape == (8, 128)
True
Methods (4)
__init__
__init__(in_features: int, out_features: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None) → None
Initialize the Linear module. See the class docstring for parameter semantics.
reset_parameters
reset_parameters() → None
Initialize weight with Kaiming uniform sampling and bias with uniform sampling whose bound is derived from fan_in.
forward
forward(x: Tensor) → Tensor
Apply the linear transformation to the input tensor.
Parameters
x : Tensor
    Input tensor of shape $(*, \text{in\_features})$.
Returns
Tensor
    Output tensor of shape $(*, \text{out\_features})$.
extra_repr
extra_repr() → str
Return a string representation of the layer's configuration.