class LazyLinear

extends Module
LazyLinear(out_features: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Linear layer whose input dimension is inferred on the first forward call.

LazyLinear defers weight allocation until it receives its first input tensor. At that point it reads x.shape[-1] to determine in_features, allocates and initializes weight (and bias, when bias=True), and then performs the standard affine transformation:

$$\mathbf{y} = \mathbf{x} \mathbf{W}^{\top} + \mathbf{b}$$

All subsequent calls behave identically to Linear.
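
The deferred allocation described above can be sketched in plain Python. This is a minimal illustration rather than lucid's implementation: the class name LazyLinearSketch, the nested-list storage, and the uniform bound of 1/sqrt(in_features) are all assumptions made for the example.

```python
import math
import random


class LazyLinearSketch:
    """Toy stand-in for LazyLinear built on nested lists (hypothetical)."""

    def __init__(self, out_features, bias=True):
        self.out_features = out_features
        self.in_features = None  # unknown until the first forward call
        self.use_bias = bias
        self.weight = None
        self.bias = None

    def _materialize(self, in_features):
        # Assumed init: uniform in [-1/sqrt(fan_in), 1/sqrt(fan_in)].
        bound = 1.0 / math.sqrt(in_features)
        self.in_features = in_features
        self.weight = [
            [random.uniform(-bound, bound) for _ in range(in_features)]
            for _ in range(self.out_features)
        ]
        if self.use_bias:
            self.bias = [
                random.uniform(-bound, bound) for _ in range(self.out_features)
            ]

    def forward(self, x):
        # x is a 2-D batch: a list of rows, each of length in_features.
        if self.weight is None:
            self._materialize(len(x[0]))  # analogue of x.shape[-1]
        out = []
        for row in x:
            # y = x W^T: one dot product per weight row.
            y = [sum(xi * wi for xi, wi in zip(row, w_row)) for w_row in self.weight]
            if self.bias is not None:
                y = [yi + bi for yi, bi in zip(y, self.bias)]
            out.append(y)
        return out


layer = LazyLinearSketch(out_features=3)
batch = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
result = layer.forward(batch)  # first call allocates a 3 x 4 weight
```

After the first call the sketch takes the same code path as an ordinary linear layer, because the `weight is None` check no longer triggers.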

Parameters

out_features : int
Dimensionality of each output sample ($d_{\text{out}}$).
bias : bool = True
If True (default), a learnable bias is added to the output.
device : DeviceLike = None
Device on which the parameters will be allocated once materialized.
dtype : DTypeLike = None
Dtype for the materialized parameters.

Attributes

weight : Parameter or None
None before the first forward call. After materialization, a Parameter of shape (out_features, in_features) initialized with Kaiming uniform.
bias : Parameter or None
None before the first forward call (and permanently None when bias=False). After materialization, a Parameter of shape (out_features,) initialized with uniform fan-in bounds.
in_features : int or None
None until the layer is materialized. Afterwards stores the inferred input dimensionality.
out_features : int
The output dimensionality supplied at construction time.
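
The exact constants behind "Kaiming uniform" and "uniform fan-in bounds" are not stated here. As a rough guide, the sketch below evaluates the general Kaiming uniform bound, gain * sqrt(3 / fan_in) with gain = sqrt(2 / (1 + a^2)), and a fan-in bias bound of 1 / sqrt(fan_in). The choice a = sqrt(5) follows PyTorch's Linear default. Whether lucid uses the same value is an assumption, and the helper names are hypothetical.

```python
import math


def kaiming_uniform_bound(fan_in, a=0.0):
    """Half-width of the Kaiming uniform range for negative slope `a`."""
    gain = math.sqrt(2.0 / (1.0 + a * a))
    return gain * math.sqrt(3.0 / fan_in)


def bias_bound(fan_in):
    """Assumed fan-in bound for the bias: 1 / sqrt(fan_in)."""
    return 1.0 / math.sqrt(fan_in)


fan_in = 128  # in_features inferred at materialization time
w_bound_a0 = kaiming_uniform_bound(fan_in)                      # a = 0
w_bound_sqrt5 = kaiming_uniform_bound(fan_in, math.sqrt(5.0))   # a = sqrt(5)
b_bound = bias_bound(fan_in)
```

With a = sqrt(5) the weight bound simplifies to 1 / sqrt(fan_in), the same value as the assumed bias bound, so both ranges can only be computed once in_features is known.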

Notes

When to prefer LazyLinear over Linear:

  • The input width is only known at runtime (e.g. it depends on a preceding convolutional feature extractor whose spatial size varies with the input image resolution).
  • You want to prototype model architectures without tracking every intermediate feature dimension by hand.
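
To make the first case concrete, the arithmetic below shows how the flattened feature width changes with image resolution. It is a sketch under stated assumptions: the extractor (two 3x3 convolutions with stride 2, no padding, 16 output channels) is hypothetical, and the size formula is the standard one for a convolution without dilation.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution without dilation."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_width(image_size, channels=16):
    # Hypothetical extractor: two 3x3 convs, stride 2, no padding.
    s = conv_out_size(image_size, kernel=3, stride=2)
    s = conv_out_size(s, kernel=3, stride=2)
    return channels * s * s  # features seen by the following linear layer


widths = {res: flattened_width(res) for res in (32, 64)}
print(widths)  # {32: 784, 64: 3600}
```

A Linear layer would need 784 or 3600 hard-coded as in_features, whereas a LazyLinear picks up whichever width it sees on its first forward call and is fixed to that width afterwards.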

State-dict loading: if load_state_dict is called while the layer is still uninitialized, the implementation reads the saved weight shape, materializes the parameters to the correct size, and then proceeds with the standard copy. A serialized LazyLinear checkpoint can therefore be restored without first running a forward pass.
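
The loading sequence described above (read the saved shape, materialize, then copy) might look roughly like the following. This is an illustrative sketch and not lucid's code: LazyParamsSketch is a hypothetical class, parameters are nested lists, and the "weight" and "bias" key names are assumed to mirror the attribute names.

```python
class LazyParamsSketch:
    """Hypothetical holder for lazily sized parameters (nested lists)."""

    def __init__(self, out_features):
        self.out_features = out_features
        self.in_features = None
        self.weight = None
        self.bias = None

    def load_state_dict(self, state):
        saved_weight = state["weight"]  # assumed key name
        rows, cols = len(saved_weight), len(saved_weight[0])
        if rows != self.out_features:
            raise ValueError("out_features mismatch")
        if self.weight is None:
            # Still uninitialized: take in_features from the saved shape.
            self.in_features = cols
        elif cols != self.in_features:
            raise ValueError("in_features mismatch")
        # Standard copy step: copy values without aliasing the checkpoint.
        self.weight = [list(row) for row in saved_weight]
        if "bias" in state:
            self.bias = list(state["bias"])


ckpt = {"weight": [[0.1] * 5 for _ in range(2)], "bias": [0.0, 1.0]}
lazy = LazyParamsSketch(out_features=2)
lazy.load_state_dict(ckpt)  # in_features becomes 5 without any forward call
```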

Important: once materialized, the layer behaves exactly like a Linear with the same in_features. There is no runtime overhead after the first call.

Examples

Infer input size from actual data:
>>> import lucid
>>> import lucid.nn as nn
>>> m = nn.LazyLinear(64)
>>> m.weight is None
True
>>> x = lucid.randn(4, 128)
>>> y = m(x)              # triggers materialization
>>> m.in_features
128
>>> y.shape
(4, 64)
Works with arbitrary leading batch dimensions:
>>> m2 = nn.LazyLinear(32)
>>> x2 = lucid.randn(2, 10, 256)
>>> m2(x2).shape
(2, 10, 32)
Restore from a checkpoint without running a forward pass first:
>>> import lucid
>>> import lucid.nn as nn
>>> # A Linear(512, 64) stands in for a trained LazyLinear that had in_features=512.
>>> src = nn.Linear(512, 64)
>>> ckpt = src.state_dict()
>>> lazy = nn.LazyLinear(64)
>>> lazy.weight is None
True
>>> lazy.load_state_dict(ckpt)  # materializes to (64, 512) from ckpt shape
>>> lazy.in_features
512

Methods (3)

__init__

__init__(out_features: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None) -> None

Initialise the LazyLinear module. See the class docstring for parameter semantics.

forward

forward(x: Tensor) -> Tensor

Apply the linear transformation to the input tensor.

Parameters

x : Tensor
Input tensor of shape $(*, \text{in\_features})$, where $*$ stands for any number of leading dimensions.

Returns

Tensor

Output tensor of shape $(*, \text{out\_features})$.

extra_repr

extra_repr() -> str

Return a string representation of the layer's configuration.