class Linear extends Module

Linear(in_features: int, out_features: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Implementing kernel: C++ LinearBackward class

Apply a learnable affine transformation to incoming data.

Computes the linear map

\mathbf{y} = \mathbf{x} \mathbf{W}^{\top} + \mathbf{b}

where $\mathbf{W} \in \mathbb{R}^{d_{\text{out}} \times d_{\text{in}}}$ is the weight matrix and $\mathbf{b} \in \mathbb{R}^{d_{\text{out}}}$ is the optional bias vector.
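To make the formula concrete, here is a minimal pure-Python sketch of the map for a single sample. The helper name linear_forward and the plain-list representation are assumptions made for illustration; they are not part of the library, which operates on tensors.

```python
def linear_forward(x, weight, bias=None):
    """Compute y = x W^T + b for one sample x (a list of d_in floats).

    weight is a list of d_out rows with d_in entries each, matching the
    (d_out, d_in) layout of W in the formula above.
    """
    # Each output component is the dot product of one weight row with x.
    y = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weight]
    if bias is not None:
        y = [y_i + b_i for y_i, b_i in zip(y, bias)]
    return y


W = [[1.0, 0.0, 2.0],
     [0.0, -1.0, 1.0]]   # d_out = 2, d_in = 3
b = [0.5, -0.5]
x = [1.0, 2.0, 3.0]

print(linear_forward(x, W, b))  # [7.5, 0.5]
print(linear_forward(x, W))     # [7.0, 1.0] (no bias)
```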

Parameters

in_features: int

Dimensionality of each input sample ($d_{\text{in}}$).

out_features: int

Dimensionality of each output sample ($d_{\text{out}}$).

bias: bool = True

If True (default), a learnable bias $\mathbf{b}$ is added to the output. Set to False when a subsequent normalization layer already absorbs the bias (e.g. BatchNorm1d).
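The reason a normalization layer can absorb the bias is that subtracting the per-feature batch mean cancels any constant offset. The sketch below illustrates this with plain lists. It assumes BatchNorm1d follows the standard formulation (mean subtraction, then scaling and its own learnable shift); the helper center is hypothetical and models only the mean-subtraction step.

```python
def center(batch):
    """Subtract the per-feature batch mean from every row of a 2-D list."""
    n = len(batch)
    d = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in batch]


# Pre-bias outputs x W^T for a batch of 2 samples with 2 features each.
pre_bias = [[1.0, 4.0],
            [3.0, 8.0]]
bias = [10.0, -2.0]

# Adding the bias shifts every row by the same per-feature constant.
with_bias = [[v + b for v, b in zip(row, bias)] for row in pre_bias]

# After centering, the shift is gone: both inputs give the same result.
print(center(pre_bias))   # [[-1.0, -2.0], [1.0, 2.0]]
print(center(with_bias))  # [[-1.0, -2.0], [1.0, 2.0]]
```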

device: DeviceLike = None

Device on which the initial parameters are allocated ('cpu' or 'metal'). Defaults to the global default device.

dtype: DTypeLike = None

Floating-point dtype for the initial parameters. Defaults to the global default dtype (float32).

Attributes

weight: Parameter

Learnable weight matrix of shape (out_features, in_features). Initialized with Kaiming uniform sampling:

\mathbf{W}_{ij} \sim \mathcal{U}\!\left( -\sqrt{\tfrac{6}{(1 + a^2)\,d_{\text{in}}}},\; \sqrt{\tfrac{6}{(1 + a^2)\,d_{\text{in}}}} \right)

where $a = \sqrt{5}$ is the default negative-slope parameter. This initialization is intended to keep gradient variance roughly constant across layers at the start of training, which matters for training stability in deep networks.
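As a quick arithmetic check on the bound: with $a = \sqrt{5}$ the factor $1 + a^2$ equals 6, so the expression reduces to $1/\sqrt{d_{\text{in}}}$. The sketch below verifies this numerically; kaiming_uniform_bound is a hypothetical helper written for this illustration, not a library function.

```python
import math


def kaiming_uniform_bound(fan_in, a=math.sqrt(5)):
    """Half-width of the uniform interval given by the weight formula above."""
    return math.sqrt(6.0 / ((1.0 + a * a) * fan_in))


for fan_in in (20, 128, 512):
    bound = kaiming_uniform_bound(fan_in)
    # With a = sqrt(5) the bound matches 1 / sqrt(fan_in) up to rounding.
    assert math.isclose(bound, 1.0 / math.sqrt(fan_in))
    print(fan_in, bound)
```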

bias: Parameter or None

Learnable bias vector of shape (out_features,). Initialized with uniform sampling over $\left[-\tfrac{1}{\sqrt{d_{\text{in}}}},\; \tfrac{1}{\sqrt{d_{\text{in}}}}\right]$. None when bias=False.

Notes

Input: $(\ast, d_{\text{in}})$ — any number of leading batch dimensions followed by in_features.
Output: $(\ast, d_{\text{out}})$ — same leading dimensions, last axis replaced by out_features.
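The shape rule above can be illustrated without any tensor library: the map is applied to every innermost feature vector, and all leading dimensions pass through unchanged. This is a sketch on nested Python lists; apply_last_axis and shape are hypothetical helpers, and the actual layer works on tensors rather than lists.

```python
def apply_last_axis(x, weight, bias=None):
    """Apply y = x W^T + b along the last axis of an arbitrarily nested list."""
    if not isinstance(x[0], list):
        # Reached a 1-D feature vector: apply the affine map to it.
        y = [sum(w * v for w, v in zip(row, x)) for row in weight]
        return y if bias is None else [yi + bi for yi, bi in zip(y, bias)]
    # Otherwise recurse into each sub-list, preserving the leading dimension.
    return [apply_last_axis(sub, weight, bias) for sub in x]


def shape(x):
    """Return the shape of a regularly nested list as a tuple."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


weight = [[0.5] * 4 for _ in range(3)]                 # (out=3, in=4)
x = [[[1.0] * 4 for _ in range(5)] for _ in range(2)]  # (2, 5, 4)
y = apply_last_axis(x, weight)

print(shape(x))  # (2, 5, 4)
print(shape(y))  # (2, 5, 3)
```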

Linear is the most common building block in feed-forward sub-layers (e.g. the MLP inside a Transformer block uses two Linear layers with a non-linearity in between). When composing many layers in sequence, the Kaiming initialization helps keep both the forward activations and the backward gradients from exploding or vanishing at the start of training.
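The two-layer pattern mentioned above can be sketched as follows. ReLU is chosen here only as an example non-linearity, and the helper names (linear, relu, mlp) and weights are made up for illustration; a real model would compose nn.Linear modules instead.

```python
def linear(x, weight, bias):
    """y = x W^T + b for a single feature vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in x]


def mlp(x, w1, b1, w2, b2):
    """Two linear maps with a non-linearity in between."""
    return linear(relu(linear(x, w1, b1)), w2, b2)


w1 = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]  # expand: 2 -> 3 features
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 1.0, 1.0], [0.0, 2.0, -1.0]]     # project back: 3 -> 2 features
b2 = [0.0, 1.0]

out = mlp([2.0, 1.0], w1, b1, w2, b2)
print(out)  # [2.5, -0.5]
```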

Examples

Basic usage with a 2-D input:

>>> import lucid
>>> import lucid.nn as nn
>>> m = nn.Linear(20, 10)
>>> x = lucid.randn(4, 20)   # batch of 4, 20 features each
>>> y = m(x)
>>> y.shape
(4, 10)

Higher-dimensional inputs (batch + sequence):

>>> m = nn.Linear(512, 256)
>>> x = lucid.randn(2, 32, 512)   # (batch, seq_len, d_model)
>>> m(x).shape
(2, 32, 256)

Disable bias for use before a normalization layer:

>>> m_no_bias = nn.Linear(128, 64, bias=False)
>>> m_no_bias.bias is None
True
>>> lucid.randn(8, 128).shape == (8, 128)
True

Constructors

__init__

→None

__init__(in_features: int, out_features: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Initialize the Linear module. See the class docstring for parameter semantics.

Instance methods

extra_repr

→str

extra_repr()

Return a string representation of the layer's configuration.

forward

→Tensor

forward(x: Tensor)

Apply the linear transformation to the input tensor.

Parameters

x: Tensor

Input tensor of shape $(*, \text{in\_features})$.

Returns

Tensor

Output tensor of shape $(*, \text{out\_features})$.

reset_parameters

→None

reset_parameters()

Initialize weight with Kaiming uniform sampling and bias with uniform sampling bounded by $1/\sqrt{\text{fan\_in}}$, where fan_in equals in_features.
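The following is a rough sketch of what such a reset could look like, using only the standard library and the bounds stated in the Attributes section. LinearParams is a hypothetical stand-in class holding plain lists; it is not the library's implementation, and the seeded random.Random generator is an assumption made so the example is reproducible.

```python
import math
import random


class LinearParams:
    """Hypothetical stand-in for a Linear layer's parameters (plain lists)."""

    def __init__(self, in_features, out_features, bias=True, seed=0):
        self.in_features = in_features
        self.out_features = out_features
        self.use_bias = bias
        self._rng = random.Random(seed)  # assumption: local seeded generator
        self.reset_parameters()

    def reset_parameters(self):
        # Weight: uniform in [-w_bound, w_bound] with a = sqrt(5).
        a = math.sqrt(5)
        w_bound = math.sqrt(6.0 / ((1.0 + a * a) * self.in_features))
        self.weight = [
            [self._rng.uniform(-w_bound, w_bound) for _ in range(self.in_features)]
            for _ in range(self.out_features)
        ]
        # Bias: uniform in [-1/sqrt(fan_in), 1/sqrt(fan_in)], or None if disabled.
        if self.use_bias:
            b_bound = 1.0 / math.sqrt(self.in_features)
            self.bias = [
                self._rng.uniform(-b_bound, b_bound) for _ in range(self.out_features)
            ]
        else:
            self.bias = None


params = LinearParams(20, 10)
print(len(params.weight), len(params.weight[0]))  # 10 20
print(LinearParams(128, 64, bias=False).bias)     # None
```

Calling reset_parameters() again draws fresh values from the same generator, which mirrors how a reset re-samples the parameters in place.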