Module
Module()
Base class for all neural network modules.
Every custom model should subclass this and implement forward.
Submodules assigned as attributes are tracked automatically.
Notes
Attribute routing: Setting an attribute follows this priority order:
- If the value is a lucid.nn.Parameter → stored in _parameters.
- If the value is a Module → stored in _modules.
- Otherwise → plain Python attribute.
To register a non-parameter tensor (e.g. a running mean), call
register_buffer explicitly.
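The routing order above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Lucid's implementation: ToyParameter and ToyModule are hypothetical stand-ins for lucid.nn.Parameter and Module, and the _buffers registry name is assumed.

```python
class ToyParameter:
    """Hypothetical stand-in for lucid.nn.Parameter."""
    def __init__(self, data):
        self.data = data


class ToyModule:
    """Hypothetical stand-in for Module; only the routing logic is shown."""
    def __init__(self):
        # Bypass our own __setattr__ while creating the registries.
        object.__setattr__(self, "_parameters", {})
        object.__setattr__(self, "_modules", {})
        object.__setattr__(self, "_buffers", {})

    def __setattr__(self, name, value):
        if isinstance(value, ToyParameter):    # 1. parameters
            self._parameters[name] = value
        elif isinstance(value, ToyModule):     # 2. submodules
            self._modules[name] = value
        else:                                  # 3. plain Python attribute
            object.__setattr__(self, name, value)

    def __getattr__(self, name):
        # Called only when normal lookup fails, so consult the registries.
        for registry in ("_parameters", "_modules", "_buffers"):
            store = self.__dict__.get(registry, {})
            if name in store:
                return store[name]
        raise AttributeError(name)

    def register_buffer(self, name, tensor):
        # Buffers are never routed automatically; they must be registered.
        self._buffers[name] = tensor


m = ToyModule()
m.weight = ToyParameter([1.0, 2.0])
m.child = ToyModule()
m.running_mean = [0.0, 0.0]                   # plain attribute, not a buffer
m.register_buffer("running_var", [1.0, 1.0])  # explicit buffer registration
```

In this sketch a bare list assigned to running_mean ends up as an ordinary attribute, which is why the note above asks for an explicit register_buffer call.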
Examples
>>> import lucid
>>> import lucid.nn as nn
>>> class MLP(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.fc1 = nn.Linear(10, 20)
...         self.fc2 = nn.Linear(20, 1)
...
...     def forward(self, x):
...         return self.fc2(lucid.relu(self.fc1(x)))
...
>>> model = MLP()
>>> model(lucid.randn(4, 10)).shape
(4, 1)
Methods (47)
__init__
__init__() → None
Initialise the instance. See the class docstring for parameter semantics.
__call__
__call__(args: Tensor = (), kwargs: object = {}) → _ModuleOutput
Forward to the underlying callable (see class docstring).
forward
forward(args: Tensor = (), kwargs: object = {}) → _ModuleOutput
Override in subclasses to define the computation.
parameters
parameters(recurse: bool = True) → Iterator[Parameter]
Yield all Parameters in this module (and children if recurse=True).
named_parameters
named_parameters(prefix: str = '', recurse: bool = True, remove_duplicate: bool = True) → Iterator[tuple[str, Parameter]]
Yield (name, Parameter) pairs.
Parameters
remove_duplicate: bool = True
buffers
buffers(recurse: bool = True) → Iterator[Tensor]
Yield all buffer tensors.
named_buffers
named_buffers(prefix: str = '', recurse: bool = True, remove_duplicate: bool = True) → Iterator[tuple[str, Tensor]]
Yield (name, buffer) pairs.
modules
modules() → Iterator[Module]
Yield this module and all submodules (depth-first).
named_modules
named_modules(memo: set[int] | None = None, prefix: str = '', remove_duplicate: bool = True) → Iterator[tuple[str, Module]]
Yield (name, module) pairs.
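A depth-first traversal with the memo, prefix, and remove_duplicate arguments might look like the sketch below. It assumes that memo holds id() values of modules already yielded (consistent with the set[int] annotation, but not confirmed by the source) and uses a hypothetical ToyModule with an explicit child registry.

```python
class ToyModule:
    """Hypothetical stand-in for Module with an explicit child registry."""
    def __init__(self, **children):
        self._modules = dict(children)

    def named_modules(self, memo=None, prefix="", remove_duplicate=True):
        if memo is None:
            memo = set()
        if remove_duplicate:
            if id(self) in memo:
                return              # already yielded under another name
            memo.add(id(self))
        yield prefix, self
        for name, child in self._modules.items():
            # Join path components with dots; the root has an empty name.
            child_prefix = f"{prefix}.{name}" if prefix else name
            yield from child.named_modules(memo, child_prefix, remove_duplicate)


shared = ToyModule()
root = ToyModule(a=shared, b=ToyModule(inner=shared))
names = [name for name, _ in root.named_modules()]
all_names = [name for name, _ in root.named_modules(remove_duplicate=False)]
```

Here the shared module is reachable as both a and b.inner, so the default traversal reports it once and the remove_duplicate=False traversal reports it twice.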
children
children() → Iterator[Module]
Yield direct child modules.
named_children
named_children() → Iterator[tuple[str, Module]]
Yield (name, child_module) pairs.
get_submodule
get_submodule(target: str) → Module
Return submodule at dotted path, e.g. 'encoder.layer.0'.
get_parameter
get_parameter(target: str) → Parameter
Return parameter at dotted path, e.g. 'fc.weight'.
get_buffer
get_buffer(target: str) → Tensor
Return buffer at dotted path, e.g. 'bn.running_mean'.
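Dotted-path lookup can be sketched as a walk over the child registry, one path component at a time. The ToyModule class, its registries, and the choice of AttributeError for a bad path are assumptions made for illustration; the source does not say which exception Lucid raises.

```python
class ToyModule:
    """Hypothetical stand-in for Module with explicit registries."""
    def __init__(self, modules=None, parameters=None):
        self._modules = dict(modules or {})
        self._parameters = dict(parameters or {})

    def get_submodule(self, target):
        module = self
        if target == "":
            return module           # the empty path refers to the module itself
        for part in target.split("."):
            if part not in module._modules:
                raise AttributeError(f"no submodule {part!r} in path {target!r}")
            module = module._modules[part]
        return module

    def get_parameter(self, target):
        # Everything before the last dot is a module path; the rest is the name.
        module_path, _, name = target.rpartition(".")
        module = self.get_submodule(module_path)
        if name not in module._parameters:
            raise AttributeError(f"no parameter named {name!r}")
        return module._parameters[name]


layer0 = ToyModule(parameters={"weight": "W0"})
encoder = ToyModule(modules={"layer": ToyModule(modules={"0": layer0})})
model = ToyModule(modules={"encoder": encoder})
found = model.get_submodule("encoder.layer.0")
weight = model.get_parameter("encoder.layer.0.weight")
```

A get_buffer lookup would follow the same pattern as get_parameter, reading from a buffer registry instead.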
register_parameter
register_parameter(name: str, param: Parameter | None) → None
Register a Parameter under the given name.
register_buffer
register_buffer(name: str, tensor: Tensor | None, persistent: bool = True) → None
Register a buffer tensor. Non-persistent buffers are excluded from state_dict.
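The effect of persistent=False can be illustrated with a toy registry. This is a sketch only: ToyModule and its _non_persistent set are hypothetical, and plain lists stand in for tensors.

```python
class ToyModule:
    """Hypothetical stand-in for Module; only buffer bookkeeping is shown."""
    def __init__(self):
        self._buffers = {}
        self._non_persistent = set()

    def register_buffer(self, name, tensor, persistent=True):
        self._buffers[name] = tensor
        if persistent:
            self._non_persistent.discard(name)
        else:
            self._non_persistent.add(name)

    def state_dict(self):
        # Non-persistent buffers stay on the module but are not serialised.
        return {name: value for name, value in self._buffers.items()
                if name not in self._non_persistent}


bn = ToyModule()
bn.register_buffer("running_mean", [0.0, 0.0])
bn.register_buffer("scratch", [9.9], persistent=False)
saved = bn.state_dict()
```

Both buffers remain available on the module; only running_mean appears in the saved dictionary.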
add_module
add_module(name: str, module: Module | None) → None
Add a child module.
register_module
register_module(name: str, module: Module | None) → None
Alias for add_module.
train
train(mode: bool = True) → Self
Set this module and all children to training mode.
eval
eval() → Self
Set this module and all children to evaluation mode.
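Mode switching is naturally recursive. The sketch below assumes a boolean training attribute and treats eval() as train(False); both are assumptions borrowed from common framework conventions rather than facts stated in the source.

```python
class ToyModule:
    """Hypothetical stand-in for Module; only the mode flag is modelled."""
    def __init__(self, **children):
        self.training = True            # assumed attribute name
        self._modules = dict(children)

    def train(self, mode=True):
        self.training = mode
        for child in self._modules.values():
            child.train(mode)           # propagate to every descendant
        return self                     # returning self allows chaining

    def eval(self):
        return self.train(False)


leaf = ToyModule()
model = ToyModule(encoder=ToyModule(layer=leaf))
returned = model.eval()
```

Returning self matches the Self return type in the signatures above and lets callers write model = ToyModule().eval() in one expression.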
to
to(args: object = (), kwargs: object = {}) → Self
Move/cast all parameters and buffers, preserving Parameter object identity.
Floating-point dtype casts (.float(), .double(),
.half(), .bfloat16()) skip integer buffers — e.g.
BatchNorm.num_batches_tracked stays int64 — matching the
reference framework so checkpoint round-trips don't quietly
widen / narrow the counter type. Device moves still apply to
every tensor.
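The casting rule described above (floating-point casts skip integer buffers, device moves touch everything) can be sketched with a record that tracks only dtype and device. ToyTensor, the dtype strings, and the "metal" device string are hypothetical placeholders, not Lucid types.

```python
from dataclasses import dataclass, replace

FLOAT_DTYPES = {"float16", "bfloat16", "float32", "float64"}


@dataclass(frozen=True)
class ToyTensor:
    """Hypothetical tensor record: only dtype and device are tracked."""
    dtype: str
    device: str = "cpu"


def cast_floating(tensors, dtype):
    """Cast to a floating dtype, leaving integer tensors untouched."""
    if dtype not in FLOAT_DTYPES:
        raise ValueError(f"{dtype!r} is not a floating-point dtype")
    return {
        name: replace(t, dtype=dtype) if t.dtype in FLOAT_DTYPES else t
        for name, t in tensors.items()
    }


def move(tensors, device):
    """Device moves apply to every tensor, integer or not."""
    return {name: replace(t, device=device) for name, t in tensors.items()}


buffers = {
    "running_mean": ToyTensor("float32"),
    "num_batches_tracked": ToyTensor("int64"),
}
halved = cast_floating(buffers, "float16")
on_gpu = move(halved, "metal")
```

After both steps the counter is still int64 but lives on the new device, which is the behaviour the description attributes to to.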
metal
metal() → Self
Move all parameters and buffers to the Apple Metal GPU.
cpu
cpu() → Self
Move all parameters and buffers to CPU.
half
half() → Self
Cast parameters and floating-point buffers to float16; integer buffers are left unchanged.
float
float() → Self
Cast parameters and floating-point buffers to float32; integer buffers are left unchanged.
double
double() → Self
Cast parameters and floating-point buffers to float64; integer buffers are left unchanged.
bfloat16
bfloat16() → Self
Cast parameters and floating-point buffers to bfloat16; integer buffers are left unchanged.
type
type(dst_type: object) → Self
Cast all parameters and buffers to dst_type.
dst_type may be a lucid.dtype, a Python type (float,
int), or a string ("float32", "float16", etc.).
Delegates to to, which handles the conversion.
apply
apply(fn: Callable[[Module], None]) → Self
Apply fn recursively to every submodule (including self).
zero_grad
zero_grad(set_to_none: bool = True) → None
Zero gradients of all parameters.
requires_grad_
requires_grad_(requires_grad: bool = True) → Self
Set requires_grad for all parameters.
share_memory
share_memory() → Self
No-op on Apple Silicon (unified memory is always shared).
compile
compile(args: object = (), kwargs: object = {}) → Self
No-op compatibility stub.
External codepaths often call model.compile() to opt into JIT
acceleration; Lucid has no such layer, so this returns self
unchanged rather than crashing the caller. Any positional or
keyword arguments are accepted and ignored.
to_empty
to_empty(device: object = None, recurse: bool = True) → Self
Move parameters and buffers to device without copying data.
The reference framework uses to_empty to materialise a model
constructed on the meta device. Lucid has no meta device, so
to_empty falls back to the standard to, for parity with external
code. recurse is already honoured by to.
get_extra_state
get_extra_state() → object
Return extra state to include in state_dict. Override in subclasses.
set_extra_state
set_extra_state(state: object) → None
Restore extra state loaded from state_dict. Override in subclasses.
state_dict
state_dict(destination: dict[str, Tensor] | None = None, prefix: str = '', keep_vars: bool = False) → dict[str, Tensor]
Return a dict mapping parameter/buffer names to tensors.
The returned OrderedDict carries a _metadata attribute:
{module_path: {"version": <int>}} for every module that defines
a _version class attribute. lucid.save preserves this attribute
across disk round-trips.
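The _metadata layout described above can be sketched as follows. This is a simplified, hypothetical ToyModule that collects only parameters; the way prefixes are turned into module paths (dropping the trailing dot) is an assumption.

```python
from collections import OrderedDict


class ToyModule:
    """Hypothetical stand-in for Module; floats stand in for tensors."""
    _version = None                     # subclasses may set an int

    def __init__(self, params=None, **children):
        self._parameters = dict(params or {})
        self._modules = dict(children)

    def state_dict(self, destination=None, prefix=""):
        if destination is None:
            destination = OrderedDict()
            destination._metadata = OrderedDict()
        version = getattr(type(self), "_version", None)
        if version is not None:
            # Module path is the prefix without its trailing dot.
            destination._metadata[prefix[:-1]] = {"version": version}
        for name, value in self._parameters.items():
            destination[prefix + name] = value
        for name, child in self._modules.items():
            child.state_dict(destination, prefix + name + ".")
        return destination


class ToyNorm(ToyModule):
    _version = 2


model = ToyModule({"bias": 0.5}, norm=ToyNorm({"weight": 1.0}))
sd = model.state_dict()
```

Only ToyNorm defines _version in this sketch, so it is the only module path recorded in sd._metadata.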
load_state_dict
load_state_dict(state_dict: dict[str, Tensor], strict: bool = True, assign: bool = False) → object
Load parameters from a state_dict.
Calls each module's _load_from_state_dict recursively.
Returns IncompatibleKeys(missing_keys, unexpected_keys) on success.
Raises RuntimeError if strict=True and any keys are missing
or unexpected, or if any error_msgs accumulated during loading.
Parameters
state_dict: dict
strict: bool = True
    If True (default), require an exact key match and raise on any missing or unexpected keys.
assign: bool = False
    If True, replace each parameter/buffer object with the loaded tensor directly (allows shape/dtype changes). If False (default), copy data into the existing parameter, preserving its dtype and device.
register_load_state_dict_pre_hook
register_load_state_dict_pre_hook(hook: Callable[..., object]) → RemovableHandle
Register a pre-hook called when this module loads state_dict.
Hook signature: hook(module, state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs) -> None
The hook may mutate state_dict, missing_keys, unexpected_keys, and error_msgs.
register_load_state_dict_post_hook
register_load_state_dict_post_hook(hook: Callable[..., object]) → RemovableHandle
Register a post-hook called after this module loads state_dict.
Hook signature: hook(module, incompatible_keys) -> None.
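The interaction between the load hooks and strict key matching can be sketched on a single flat module. Everything here is hypothetical: ToyModule, the namedtuple used for IncompatibleKeys, the point at which post-hooks run relative to the strict check, and the "w" to "weight" rename used as a migration example.

```python
from collections import namedtuple

IncompatibleKeys = namedtuple("IncompatibleKeys", ["missing_keys", "unexpected_keys"])


class ToyModule:
    """Hypothetical stand-in for Module; floats stand in for tensors."""
    def __init__(self, **params):
        self._parameters = dict(params)
        self._pre_hooks = []
        self._post_hooks = []

    def load_state_dict(self, state_dict, strict=True):
        state_dict = dict(state_dict)       # do not mutate the caller's dict
        missing, unexpected, errors = [], [], []
        for hook in self._pre_hooks:
            hook(self, state_dict, "", {}, strict, missing, unexpected, errors)
        for name in self._parameters:
            if name in state_dict:
                self._parameters[name] = state_dict[name]
            else:
                missing.append(name)
        unexpected.extend(k for k in state_dict if k not in self._parameters)
        result = IncompatibleKeys(missing, unexpected)
        for hook in self._post_hooks:
            hook(self, result)
        if errors or (strict and (missing or unexpected)):
            raise RuntimeError(
                f"missing={missing}, unexpected={unexpected}, errors={errors}")
        return result


def rename_legacy_key(module, state_dict, prefix, local_metadata, strict,
                      missing_keys, unexpected_keys, error_msgs):
    # Hypothetical migration: older checkpoints stored "w" instead of "weight".
    if prefix + "w" in state_dict:
        state_dict[prefix + "weight"] = state_dict.pop(prefix + "w")


layer = ToyModule(weight=0.0, bias=0.0)
layer._pre_hooks.append(rename_legacy_key)
result = layer.load_state_dict({"w": 1.5, "bias": 0.25})
loose = ToyModule(weight=0.0).load_state_dict({"extra": 1}, strict=False)
```

Because the pre-hook rewrites the key before matching, the strict load succeeds with no missing or unexpected keys. The strict=False call reports the mismatches instead of raising.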
register_forward_pre_hook
register_forward_pre_hook(hook: _ForwardPreHook, prepend: bool = False, with_kwargs: bool = False) → RemovableHandle
Register a hook called before forward().
register_forward_hook
register_forward_hook(hook: _ForwardHook, prepend: bool = False, with_kwargs: bool = False, always_call: bool = False) → RemovableHandle
Register a hook called after forward().
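A minimal sketch of forward hooks and removable handles follows. The hook call conventions are assumptions, since the source does not spell them out: a pre-hook is called as hook(module, args) and may return replacement arguments, a forward hook is called as hook(module, args, output) and may return a replacement output, and prepend=True places the hook first. The RemovableHandle and ToyModule classes here are hypothetical stand-ins.

```python
class RemovableHandle:
    """Hypothetical handle: remove() unregisters the hook it was returned for."""
    def __init__(self, hooks, key):
        self._hooks, self._key = hooks, key

    def remove(self):
        self._hooks.pop(self._key, None)


class ToyModule:
    def __init__(self):
        self._forward_pre_hooks = {}
        self._forward_hooks = {}
        self._next_id = 0

    def _register(self, hooks, hook, prepend):
        key = self._next_id
        self._next_id += 1
        if prepend:
            # Rebuild the dict so the new hook runs first.
            items = [(key, hook)] + list(hooks.items())
            hooks.clear()
            hooks.update(items)
        else:
            hooks[key] = hook
        return RemovableHandle(hooks, key)

    def register_forward_pre_hook(self, hook, prepend=False):
        return self._register(self._forward_pre_hooks, hook, prepend)

    def register_forward_hook(self, hook, prepend=False):
        return self._register(self._forward_hooks, hook, prepend)

    def forward(self, x):
        return x * 2

    def __call__(self, *args):
        for hook in list(self._forward_pre_hooks.values()):
            new_args = hook(self, args)
            if new_args is not None:
                args = new_args if isinstance(new_args, tuple) else (new_args,)
        output = self.forward(*args)
        for hook in list(self._forward_hooks.values()):
            new_output = hook(self, args, output)
            if new_output is not None:
                output = new_output
        return output


calls = []
m = ToyModule()
handle = m.register_forward_hook(lambda mod, args, out: calls.append(("post", out)))
m.register_forward_pre_hook(lambda mod, args: (args[0] + 1,))
first = m(3)       # pre-hook turns 3 into 4, forward doubles it
handle.remove()
second = m(3)      # the forward hook no longer runs
```

Keeping the handle is what makes temporary instrumentation practical: the hook can be detached without knowing where it sits in the registry.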
register_full_backward_pre_hook
register_full_backward_pre_hook(hook: _BackwardHook, prepend: bool = False) → RemovableHandle
Register a hook to be called before backward hooks.
register_full_backward_hook
register_full_backward_hook(hook: _BackwardHook, prepend: bool = False) → RemovableHandle
Register a backward hook. Returns a RemovableHandle.
register_backward_hook
register_backward_hook(hook: _BackwardHook) → RemovableHandle
Deprecated alias for register_full_backward_hook.
extra_repr
extra_repr() → str
Override to add extra repr info (e.g. Linear shows in_features, etc.).
__repr__
__repr__() → str
Return a developer-facing string representation of the instance.
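The relationship between extra_repr and __repr__ can be sketched as below. The exact layout Lucid prints is not given in the source; the nested, indented format and the ToyModule and ToyLinear classes are assumptions chosen for illustration.

```python
class ToyModule:
    """Hypothetical stand-in for Module; only repr composition is shown."""
    def __init__(self, **children):
        self._modules = dict(children)

    def extra_repr(self):
        return ""                       # subclasses override this

    def __repr__(self):
        # Indent nested child reprs by two spaces per level.
        lines = [f"({name}): {child!r}".replace("\n", "\n  ")
                 for name, child in self._modules.items()]
        header = f"{type(self).__name__}({self.extra_repr()}"
        if not lines:
            return header + ")"
        return header + "\n  " + "\n  ".join(lines) + "\n)"


class ToyLinear(ToyModule):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features

    def extra_repr(self):
        return f"in_features={self.in_features}, out_features={self.out_features}"


text = repr(ToyModule(fc=ToyLinear(10, 20)))
```

With this split, a subclass only describes its own configuration in extra_repr and inherits the nesting logic from the base class.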