class LazyLinear extends Module

LazyLinear(out_features: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Linear layer whose input dimension is inferred on the first forward call.

LazyLinear defers weight allocation until it receives its first input tensor. At that point it reads x.shape[-1] to determine in_features, allocates and initializes weight and bias, and then performs the standard affine transformation:

\mathbf{y} = \mathbf{x} \mathbf{W}^{\top} + \mathbf{b}

All subsequent calls behave identically to Linear.
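The deferred allocation described above can be illustrated without Lucid at all. The following is a minimal pure-Python sketch, not Lucid's implementation: `TinyLazyLinear` is a hypothetical name, tensors are replaced by nested lists, and the `1 / sqrt(in_features)` initialization bound is an assumption made only for illustration.

```python
import random


class TinyLazyLinear:
    """Toy stand-in for a lazily initialized linear layer (not Lucid's code)."""

    def __init__(self, out_features, bias=True, seed=0):
        self.out_features = out_features
        self.in_features = None   # unknown until the first call
        self.weight = None        # allocated on the first call
        self.bias = None
        self._use_bias = bias
        self._rng = random.Random(seed)

    def _materialize(self, in_features):
        # Assumed bound for illustration only: U(-1/sqrt(fan_in), 1/sqrt(fan_in)).
        bound = 1.0 / in_features ** 0.5
        self.in_features = in_features
        self.weight = [
            [self._rng.uniform(-bound, bound) for _ in range(in_features)]
            for _ in range(self.out_features)
        ]
        if self._use_bias:
            self.bias = [
                self._rng.uniform(-bound, bound) for _ in range(self.out_features)
            ]

    def __call__(self, x):
        # x is a list of rows; the length of a row plays the role of x.shape[-1].
        if self.weight is None:
            self._materialize(len(x[0]))
        out = []
        for row in x:
            # y = x W^T + b, computed one output feature at a time.
            y = [sum(w * v for w, v in zip(w_row, row)) for w_row in self.weight]
            if self.bias is not None:
                y = [a + b for a, b in zip(y, self.bias)]
            out.append(y)
        return out


layer = TinyLazyLinear(out_features=3)
was_unset = layer.weight is None          # True before the first call
y = layer([[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]])
```

After the call, `layer.in_features` is 4 and `layer.weight` has 3 rows of 4 entries; later calls reuse the same parameters and skip `_materialize`.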

Parameters

out_features: int

Dimensionality of each output sample ($d_{\text{out}}$).

bias: bool = True

If True (default), a learnable bias is added to the output.

device: DeviceLike = None

Device on which the parameters will be allocated once materialized.

dtype: DTypeLike = None

Data type of the materialized parameters.

Attributes

weight: Parameter or None

None before the first forward call. After materialization, a Parameter of shape (out_features, in_features) initialized with Kaiming uniform.

bias: Parameter or None

None before the first forward call (and permanently None when bias=False). After materialization, a Parameter of shape (out_features,) initialized with uniform fan-in bounds.

in_features: int or None

None until the layer is materialized. Afterwards, stores the inferred input dimensionality.

out_features: int

The output dimensionality supplied at construction time.
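The weight and bias entries above mention Kaiming uniform and fan-in bounds without giving numbers. The sketch below shows how such bounds are commonly computed. It assumes the PyTorch-style convention of a negative slope `a = sqrt(5)`, which this page does not confirm for Lucid; `kaiming_uniform_bound` and `init_linear` are hypothetical helper names.

```python
import math
import random


def kaiming_uniform_bound(fan_in, a=math.sqrt(5)):
    """Half-width b of U(-b, b) for Kaiming uniform with leaky-ReLU slope a."""
    gain = math.sqrt(2.0 / (1.0 + a * a))
    return gain * math.sqrt(3.0 / fan_in)


def init_linear(out_features, in_features, seed=0):
    """Return (weight, bias) as nested lists with fan-in based bounds."""
    rng = random.Random(seed)
    w_bound = kaiming_uniform_bound(in_features)
    b_bound = 1.0 / math.sqrt(in_features)  # assumed fan-in bound for the bias
    weight = [
        [rng.uniform(-w_bound, w_bound) for _ in range(in_features)]
        for _ in range(out_features)
    ]
    bias = [rng.uniform(-b_bound, b_bound) for _ in range(out_features)]
    return weight, bias


weight, bias = init_linear(out_features=64, in_features=128)
```

Under the `a = sqrt(5)` assumption the weight bound simplifies to `1 / sqrt(fan_in)`, the same as the bias bound. Both depend on `in_features`, which is why initialization has to wait until the input width is known.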

Notes

When to prefer LazyLinear over Linear:

The input width is only known at runtime (e.g. it depends on a preceding convolutional feature extractor whose spatial size varies with the input image resolution).
You want to prototype model architectures without tracking every intermediate feature dimension by hand.
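To see why the input width can depend on image resolution, consider the flattened output of a single convolution. The sketch below uses the standard output-size formula for a convolution with dilation 1; the helper names and the example layer settings (16 channels, 3x3 kernel, stride 2, padding 1) are illustrative assumptions, not taken from Lucid.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_width(channels, height, width, kernel, stride=1, padding=0):
    """Number of features after flattening a (channels, H_out, W_out) map."""
    h_out = conv_out_size(height, kernel, stride, padding)
    w_out = conv_out_size(width, kernel, stride, padding)
    return channels * h_out * w_out


# Same layer settings, two image resolutions.
w_small = flattened_width(16, 32, 32, kernel=3, stride=2, padding=1)  # 4096
w_large = flattened_width(16, 64, 64, kernel=3, stride=2, padding=1)  # 16384
```

A Linear would need 4096 or 16384 hard-coded as in_features depending on the resolution chosen; LazyLinear reads the value from the first batch instead.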

State-dict loading: if load_state_dict is called while the layer is still uninitialized, the implementation reads the saved weight shape, materializes the parameters to the correct size, and then proceeds with the standard copy. This means a serialized LazyLinear checkpoint can be restored even without a forward pass.
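The loading sequence described above (read the saved shape, materialize, then copy) can be sketched in plain Python. This is an illustration under stated assumptions, not Lucid's code: `LazyParams` is a hypothetical container, the state dict is a plain `dict` with `"weight"` and `"bias"` keys, and the shape checks are added for the example.

```python
class LazyParams:
    """Hypothetical container mimicking an uninitialized lazy layer."""

    def __init__(self, out_features):
        self.out_features = out_features
        self.in_features = None
        self.weight = None
        self.bias = None

    def load_state_dict(self, state):
        saved_w = state["weight"]
        rows, cols = len(saved_w), len(saved_w[0])
        if rows != self.out_features:
            raise ValueError(f"expected {self.out_features} rows, got {rows}")
        if self.weight is None:
            # Step 1: materialize to the shape found in the checkpoint.
            self.in_features = cols
            self.weight = [[0.0] * cols for _ in range(rows)]
            if "bias" in state:
                self.bias = [0.0] * rows
        elif cols != self.in_features:
            raise ValueError(f"expected {self.in_features} columns, got {cols}")
        # Step 2: standard element-wise copy into the allocated storage.
        for dst, src in zip(self.weight, saved_w):
            dst[:] = src
        if self.bias is not None and "bias" in state:
            self.bias[:] = state["bias"]


ckpt = {"weight": [[0.5] * 512 for _ in range(64)], "bias": [0.1] * 64}
lazy = LazyParams(64)
lazy.load_state_dict(ckpt)
```

After loading, `lazy.in_features` is 512 even though no input was ever passed through the layer.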

Important: once materialized, the layer behaves exactly like a Linear with the same in_features. There is no runtime overhead after the first call.

Examples

Infer input size from actual data:

>>> import lucid
>>> import lucid.nn as nn
>>> m = nn.LazyLinear(64)
>>> m.weight is None
True
>>> x = lucid.randn(4, 128)
>>> y = m(x)              # triggers materialization
>>> m.in_features
128
>>> y.shape
(4, 64)

Works with arbitrary leading batch dimensions:

>>> m2 = nn.LazyLinear(32)
>>> x2 = lucid.randn(2, 10, 256)
>>> m2(x2).shape
(2, 10, 32)

Restore from a checkpoint without running a forward pass first:

>>> import lucid
>>> import lucid.nn as nn
>>> # A regular Linear stands in for a trained, materialized
>>> # LazyLinear with in_features=512; both save the same keys.
>>> src = nn.Linear(512, 64)
>>> ckpt = src.state_dict()
>>> lazy = nn.LazyLinear(64)
>>> lazy.weight is None
True
>>> lazy.load_state_dict(ckpt)  # materializes to (64, 512) from ckpt shape
>>> lazy.in_features
512

Constructors

__init__

→None

__init__(out_features: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Initialise the LazyLinear module. See the class docstring for parameter semantics.

Instance methods

extra_repr

→str

extra_repr()

Return a string representation of the layer's configuration.

forward

→Tensor

forward(x: Tensor)

Apply the linear transformation to the input tensor.

Parameters

x: Tensor

Input tensor of shape $(*, \text{in\_features})$.

Returns

Tensor

Output tensor of shape $(*, \text{out\_features})$.
