Adam
Adam(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 0.001, betas: tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0, amsgrad: bool = False)
Adaptive Moment Estimation optimizer (Kingma & Ba, 2015).
Adam combines ideas from two earlier adaptive methods: AdaGrad's per-parameter learning rates, derived from the accumulated history of squared gradients, and RMSProp's exponential moving average of squared gradients. It maintains exponential moving averages of both the gradient (first moment) and the squared gradient (second moment), and applies bias correction to compensate for their zero initialisation. Its default hyperparameters work well across a wide range of architectures with little tuning, which has made it a common default choice for training deep networks.
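The effect of the bias correction is easy to see numerically. The following plain-Python sketch (the helper name `ema_with_correction` is hypothetical, not part of lucid) feeds a constant gradient of 1.0 into a zero-initialised exponential moving average and compares the raw estimate with the estimate divided by $1 - \beta^t$.

```python
def ema_with_correction(grads, beta):
    """Track a zero-initialised EMA of `grads`, with and without bias correction."""
    m = 0.0  # Adam's moment buffers also start at zero
    raw, corrected = [], []
    for t, g in enumerate(grads, start=1):
        m = beta * m + (1 - beta) * g
        raw.append(m)
        corrected.append(m / (1 - beta ** t))  # divide by 1 - beta^t
    return raw, corrected


# A constant gradient of 1.0: the true mean is 1.0 at every step.
raw, corrected = ema_with_correction([1.0, 1.0, 1.0], beta=0.9)
print([round(x, 3) for x in raw])        # [0.1, 0.19, 0.271]
print([round(x, 3) for x in corrected])  # [1.0, 1.0, 1.0]

# With beta = 0.999 (the default for the second moment) the raw estimate
# after one step is only 0.001, a thousandfold underestimate.
raw2, corrected2 = ema_with_correction([1.0], beta=0.999)
print(round(raw2[0], 6), round(corrected2[0], 6))  # 0.001 1.0
```

The slower the decay (the closer $\beta$ is to 1), the longer the uncorrected estimate stays biased toward zero, which is why the correction matters most for the second moment.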
Parameters
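params : iterable of Parameter or iterable of dict
    Parameters to optimise, or dicts defining parameter groups.
lr : float, default 0.001
    Learning rate.
betas : tuple of float, default (0.9, 0.999)
    Decay rates for the running averages of the gradient and of its square.
eps : float, default 1e-08
    Term added to the denominator for numerical stability.
weight_decay : float, default 0
    L2 penalty coefficient. Use AdamW for properly decoupled weight decay.
amsgrad : bool, default False
    Whether to use the AMSGrad variant, which keeps the running maximum of the second-moment estimate.
Notes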
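The pointer to AdamW under weight_decay has a concrete consequence. Assuming weight_decay is applied as a coupled L2 penalty (added to the gradient before the moment updates, which is what that pointer suggests), the decay term is divided by the square root of the second-moment estimate along with the rest of the gradient, so weights with large gradients are barely decayed. The plain-Python sketch isolates this on the first step ($t = 1$), where the bias-corrected moments reduce to $\hat m_1 = g$ and $\hat v_1 = g^2$. `adam_first_step` and its `decoupled` flag are hypothetical illustrations, not lucid API; the decoupled branch follows the AdamW formulation of shrinking the weight by `lr * weight_decay` directly.

```python
import math


def adam_first_step(theta, grad, lr=1e-3, eps=1e-8, weight_decay=0.0, decoupled=False):
    """One Adam step on a scalar from zero-initialised moments (t = 1).

    At t = 1 the bias-corrected moments are m_hat = g and v_hat = g**2,
    so the Adam update is lr * g / (sqrt(g**2) + eps).
    """
    if decoupled:
        # AdamW-style: shrink the weight directly, outside the adaptive scaling.
        theta = theta * (1 - lr * weight_decay)
    else:
        # Assumed coupled L2 penalty: the decay term joins the gradient and is
        # then normalised by sqrt(v_hat) together with it.
        grad = grad + weight_decay * theta
    return theta - lr * grad / (math.sqrt(grad ** 2) + eps)


theta, grad, wd = 1.0, 10.0, 0.01
no_decay = adam_first_step(theta, grad)
coupled = adam_first_step(theta, grad, weight_decay=wd)
decoupled = adam_first_step(theta, grad, weight_decay=wd, decoupled=True)

# Coupled decay barely changes the result (difference far below 1e-12).
print(no_decay - coupled)
# Decoupled decay removes about lr * wd * theta = 1e-5 from the weight.
print(no_decay - decoupled)
```

The same rescaling of the L2 term by the adaptive denominator applies at every later step, not only at $t = 1$; the first step is used here only because its moments have a closed form.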
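The update rule for parameter $\theta$ with gradient $g_t$ at step $t$, learning rate $\alpha$ (lr), decay rates $\beta_1, \beta_2$ (betas) and stability term $\epsilon$ (eps) is:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \alpha \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$

where $m_t$ is the running first moment (an exponential moving average of gradients), $v_t$ is the running uncentered second moment (an exponential moving average of squared gradients), both initialised to zero, and the hat quantities apply bias correction so that $\mathbb{E}[\hat{m}_t] \approx \mathbb{E}[g_t]$ even at small $t$. The effective per-parameter learning rate is $\alpha / (\sqrt{\hat{v}_t} + \epsilon)$, and the ratio $\hat{m}_t / \sqrt{\hat{v}_t}$ behaves like a signal-to-noise ratio: steps are small when a parameter's gradients are noisy (high variance relative to their mean) and close to $\alpha$ in magnitude when they are consistent.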
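The update rule can be checked against a small reference implementation. The sketch below is plain Python operating on a single scalar parameter, not lucid's engine-level kernel; `adam_step` and `minimise` are hypothetical names. The AMSGrad branch follows the common formulation that keeps the running maximum of $v_t$ before bias correction; whether lucid does exactly this is an assumption.

```python
import math


def adam_step(theta, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, amsgrad=False):
    """Apply one Adam update to a scalar parameter and return its new value.

    `state` carries the step count and moment buffers between calls;
    pass an empty dict on the first call.
    """
    beta1, beta2 = betas
    if not state:
        # Moments start at zero, which is what bias correction compensates for.
        state.update(step=0, m=0.0, v=0.0, v_max=0.0)
    state["step"] += 1
    t = state["step"]
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad       # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2  # second moment
    v = state["v"]
    if amsgrad:
        # AMSGrad (assumed formulation): use the largest second moment so far.
        state["v_max"] = max(state["v_max"], state["v"])
        v = state["v_max"]
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimise(amsgrad, steps=500):
    """Minimise f(theta) = (theta - 3)^2 starting from theta = 0."""
    theta, state = 0.0, {}
    for _ in range(steps):
        grad = 2 * (theta - 3)  # df/dtheta
        theta = adam_step(theta, grad, state, lr=0.1, amsgrad=amsgrad)
    return theta


print(minimise(amsgrad=False))  # close to 3.0
print(minimise(amsgrad=True))   # close to 3.0
```

On the first step the bias-corrected ratio $\hat m_1 / \sqrt{\hat v_1}$ equals $\mathrm{sign}(g_1)$ up to $\epsilon$, so the parameter moves by almost exactly lr regardless of the gradient's scale; this is the scale invariance that the effective-learning-rate description refers to.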
Examples
>>> import lucid.optim as optim
>>> optimizer = optim.Adam(model.parameters(), lr=1e-3)
>>> for x, y in dataloader:
...     optimizer.zero_grad()
...     loss = loss_fn(model(x), y)
...     loss.backward()
...     optimizer.step()
Methods
__init__
__init__(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 0.001, betas: tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0, amsgrad: bool = False) → None
Initialise the Adam optimizer. See the class docstring for parameter semantics.
step
step(closure: _OptimizerClosure = None) → Tensor or None
Perform a single Adam optimisation step.
Calls the engine-level Adam update for each parameter group, which applies the bias-corrected first- and second-moment update rule.
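The per-group loop and the closure contract can be sketched in plain Python. Everything here is hypothetical: `Param` stands in for a parameter (a scalar `data` value and a `grad`), `SketchAdam` is not lucid's class, and the group-dict keys `"params"` and `"lr"` follow the widespread PyTorch convention rather than anything this page documents. Gradients are written by hand inside the closure instead of coming from autograd.

```python
import math
from typing import Callable, Optional


class Param:
    """Hypothetical stand-in for a parameter: a scalar value and its gradient."""

    def __init__(self, value: float) -> None:
        self.data = value
        self.grad: Optional[float] = None


class SketchAdam:
    """Hypothetical per-group Adam loop; not lucid's engine-level implementation."""

    def __init__(self, groups, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        defaults = {"lr": lr, "betas": betas, "eps": eps}
        # Each group dict may override the defaults, e.g. with its own "lr".
        self.param_groups = [{**defaults, **g} for g in groups]
        self.state = {}  # per-parameter moment buffers, keyed by id()

    def step(self, closure: Optional[Callable[[], float]] = None) -> Optional[float]:
        # Evaluate the closure first so the returned loss is the pre-update loss.
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue  # parameters without a gradient are skipped
                st = self.state.setdefault(id(p), {"step": 0, "m": 0.0, "v": 0.0})
                st["step"] += 1
                t = st["step"]
                st["m"] = beta1 * st["m"] + (1 - beta1) * p.grad
                st["v"] = beta2 * st["v"] + (1 - beta2) * p.grad ** 2
                m_hat = st["m"] / (1 - beta1 ** t)
                v_hat = st["v"] / (1 - beta2 ** t)
                p.data -= group["lr"] * m_hat / (math.sqrt(v_hat) + group["eps"])
        return loss


a, b = Param(0.0), Param(0.0)
# Two groups: `a` uses the default lr=0.1, `b` overrides it with lr=0.01.
opt = SketchAdam([{"params": [a]}, {"params": [b], "lr": 0.01}], lr=0.1)


def closure() -> float:
    """Loss (a - 3)^2 + (b + 1)^2, with gradients filled in by hand."""
    a.grad = 2 * (a.data - 3)
    b.grad = 2 * (b.data + 1)
    return (a.data - 3) ** 2 + (b.data + 1) ** 2


first_loss = opt.step(closure)
print(first_loss)      # 10.0
print(a.data, b.data)  # roughly 0.1 and -0.01: one step of about lr each
```

Because the closure runs before any parameter is updated, `step` returns the loss at the pre-update point (10.0 here), and each group's own learning rate sets the size of its first step.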
Parameters
closure : callable, default None
    Optional callable that re-evaluates the model and returns the loss.
Returns
Tensor or None
    The loss returned by closure, or None if no closure was provided.
Examples
>>> optimizer.zero_grad()
>>> loss = loss_fn(model(inputs), targets)
>>> loss.backward()
>>> optimizer.step()