LazyLinear
Module
LazyLinear(out_features: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)
Linear layer whose input dimension is inferred on the first forward call.
LazyLinear defers weight allocation until it receives its first input
tensor. At that point it reads x.shape[-1] to determine
in_features, allocates and initializes weight and bias, and
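then performs the standard affine transformation
$$y = x W^{\top} + b$$
where W is the weight of shape (out_features, in_features) and b is the bias (omitted when bias=False).
All subsequent calls behave identically to Linear.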
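The deferred-allocation behaviour described above can be sketched in plain Python. This is an illustrative toy, not lucid's implementation: the class name LazyLinearSketch, the seed argument, and the uniform bound 1/sqrt(in_features) are assumptions made for the example, and inputs are restricted to 2-D nested lists.

```python
import math
import random


class LazyLinearSketch:
    """Toy stand-in for a lazily initialized linear layer (hypothetical, not lucid's code)."""

    def __init__(self, out_features, bias=True, seed=0):
        self.out_features = out_features
        self.use_bias = bias
        self.in_features = None  # unknown until the first call
        self.weight = None
        self.bias = None
        self._rng = random.Random(seed)

    def _materialize(self, in_features):
        # Assumed init: uniform in [-1/sqrt(fan_in), 1/sqrt(fan_in)].
        bound = 1.0 / math.sqrt(in_features)
        self.in_features = in_features
        self.weight = [
            [self._rng.uniform(-bound, bound) for _ in range(in_features)]
            for _ in range(self.out_features)
        ]
        if self.use_bias:
            self.bias = [self._rng.uniform(-bound, bound) for _ in range(self.out_features)]

    def __call__(self, x):
        # x is a list of rows; the length of a row plays the role of x.shape[-1].
        if self.weight is None:
            self._materialize(len(x[0]))
        out = []
        for row in x:
            # y = x W^T: one dot product per output feature.
            y = [sum(w * v for w, v in zip(w_row, row)) for w_row in self.weight]
            if self.bias is not None:
                y = [yi + bi for yi, bi in zip(y, self.bias)]
            out.append(y)
        return out


layer = LazyLinearSketch(out_features=3)
assert layer.weight is None            # nothing allocated yet
batch = [[0.5] * 8 for _ in range(4)]  # shape (4, 8)
result = layer(batch)                  # first call triggers materialization
print(layer.in_features, len(result), len(result[0]))  # prints: 8 4 3
```

After the first call the sketch takes the same code path on every input, which mirrors the statement that later calls behave like a plain Linear.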
Parameters
out_features : int
bias : bool = True
    If True (default), a learnable bias is added to the output.
device : DeviceLike = None
dtype : DTypeLike = None
Attributes
weight : Parameter or None
    None before the first forward call. After materialization,
    a Parameter of shape (out_features, in_features) initialized
    with Kaiming uniform.
bias : Parameter or None
    None before the first forward call (and permanently None when
    bias=False). After materialization, a Parameter of shape
    (out_features,) initialized with uniform fan-in bounds.
in_features : int or None
    None until the layer is materialized. Afterwards stores the
    inferred input dimensionality.
out_features : int
Notes
When to prefer LazyLinear over Linear:
- The input width is only known at runtime (e.g. it depends on a preceding convolutional feature extractor whose spatial size varies with the input image resolution).
- You want to prototype model architectures without tracking every intermediate feature dimension by hand.
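To make the first case concrete, the following sketch computes the flattened feature width that a single convolution would hand to a linear layer at two image resolutions. It uses the standard convolution output-size formula (dilation 1). The helper names and the layer settings (16 channels, 3x3 kernel, stride 2, padding 1) are hypothetical values chosen for illustration.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Standard output-size formula for a convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_width(height, width, channels, kernel, stride, padding):
    """Number of features after flattening a (channels, H_out, W_out) map."""
    h = conv_out_size(height, kernel, stride, padding)
    w = conv_out_size(width, kernel, stride, padding)
    return channels * h * w


widths = {}
for res in (32, 64):
    widths[res] = flattened_width(res, res, channels=16, kernel=3, stride=2, padding=1)
    print(res, widths[res])
# prints:
# 32 4096
# 64 16384
```

With a regular Linear, this number has to be computed and passed as in_features by hand. A lazy layer reads it from the first input instead, although once materialized it is fixed like any other Linear.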
State-dict loading — If load_state_dict is called while the
layer is still uninitialized, the implementation reads the saved weight
shape, materializes the parameters to the correct size, and then
proceeds with the standard copy. This means a serialized
LazyLinear checkpoint can be restored even without a forward
pass.
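The loading rule above can be sketched as follows. This is a toy model under stated assumptions, not lucid's code: the class name LazyLinearState is hypothetical, parameters are nested Python lists, and the state dict is assumed to use the keys "weight" and "bias".

```python
class LazyLinearState:
    """Toy model of shape-driven materialization on load (hypothetical)."""

    def __init__(self, out_features, bias=True):
        self.out_features = out_features
        self.use_bias = bias
        self.in_features = None
        self.weight = None
        self.bias = None

    def load_state_dict(self, state):
        saved_w = state["weight"]  # nested list of shape (out, in)
        rows, cols = len(saved_w), len(saved_w[0])
        if rows != self.out_features:
            raise ValueError(f"expected {self.out_features} rows, got {rows}")
        if self.weight is None:
            # Still uninitialized: size the parameters from the checkpoint.
            self.in_features = cols
            self.weight = [[0.0] * cols for _ in range(rows)]
            if self.use_bias:
                self.bias = [0.0] * rows
        elif cols != self.in_features:
            raise ValueError(f"expected {self.in_features} columns, got {cols}")
        # Standard copy step, identical for lazy and already materialized layers.
        self.weight = [list(r) for r in saved_w]
        if self.use_bias:
            self.bias = list(state["bias"])


ckpt = {"weight": [[0.1] * 5 for _ in range(2)], "bias": [0.0, 0.0]}
lazy = LazyLinearState(out_features=2)
lazy.load_state_dict(ckpt)  # no forward pass needed
print(lazy.in_features)     # prints: 5
```

The key point of the sketch is the ordering: the saved weight shape is inspected first, parameters are allocated to match, and only then does the ordinary copy run.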
Important: once materialized, the layer behaves exactly like a
Linear with the same in_features. There is no runtime
overhead after the first call.
Examples
Infer input size from actual data:
>>> import lucid
>>> import lucid.nn as nn
>>> m = nn.LazyLinear(64)
>>> m.weight is None
True
>>> x = lucid.randn(4, 128)
>>> y = m(x) # triggers materialization
>>> m.in_features
128
>>> y.shape
(4, 64)
Works with arbitrary leading batch dimensions:
>>> m2 = nn.LazyLinear(32)
>>> x2 = lucid.randn(2, 10, 256)
>>> m2(x2).shape
(2, 10, 32)
Restore from a checkpoint without running a forward pass first:
>>> import lucid
>>> import lucid.nn as nn
>>> # Suppose we saved a trained LazyLinear that had in_features=512.
>>> src = nn.Linear(512, 64)
>>> ckpt = src.state_dict()
>>> lazy = nn.LazyLinear(64)
>>> lazy.weight is None
True
>>> lazy.load_state_dict(ckpt) # materializes to (64, 512) from ckpt shape
>>> lazy.in_features
512
Methods (3)
__init__
__init__(out_features: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None) → None
Initialise the LazyLinear module. See the class docstring for parameter semantics.
forward
forward(x: Tensor) → Tensor
Apply the linear transformation to the input tensor.
Parameters
x : Tensor
Returns
Tensor
    Output tensor of shape (..., out_features), where ... stands for any leading dimensions of the input.
extra_repr
extra_repr() → str
Return a string representation of the layer's configuration.