class NAdam extends Optimizer
NAdam(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 0.002, betas: tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0)
Nesterov-accelerated Adaptive Moment Estimation optimizer.
NAdam incorporates Nesterov momentum into Adam by replacing the bias-corrected first-moment estimate in the numerator of the update with a one-step lookahead estimate, which also mixes in the current gradient directly. The update rule is:
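The precise variant lucid implements is not stated here. A minimal sketch, assuming the common constant-beta1 form of NAdam (no momentum-decay schedule) and assuming `weight_decay` is applied as coupled L2 (adding `weight_decay * param` to the gradient), applies one update to a scalar parameter. `nadam_step` is a hypothetical helper, not part of lucid.

```python
import math


def nadam_step(param, grad, m, v, t, lr=0.002, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.0):
    """Apply one NAdam update to a scalar parameter (hypothetical sketch).

    `t` is the 1-based step count. Returns (new_param, new_m, new_v).
    """
    # Assumed coupled L2 regularisation: fold the penalty into the gradient.
    g = grad + weight_decay * param

    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g

    # Nesterov lookahead in the numerator: bias-corrected momentum for the
    # next step plus a bias-corrected share of the current gradient.
    m_bar = (beta1 * m / (1 - beta1 ** (t + 1))
             + (1 - beta1) * g / (1 - beta1 ** t))

    # Bias-corrected second moment in the denominator, exactly as in Adam.
    v_hat = v / (1 - beta2 ** t)

    new_param = param - lr * m_bar / (math.sqrt(v_hat) + eps)
    return new_param, m, v
```

Only `m_bar` differs from Adam, whose numerator would be `m / (1 - beta1 ** t)`; the moment updates and the denominator are unchanged.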
Parameters
params : iterable of Parameter or iterable of dict
    Parameters to optimise, or a list of parameter-group dicts.
lr : float = 0.002
    Learning rate (default: 2e-3).
betas : tuple of float = (0.9, 0.999)
    Decay rates for the running averages of the first and second moments of the gradient (default: (0.9, 0.999)).
eps : float = 1e-08
    Term added to the denominator for numerical stability (default: 1e-8).
weight_decay : float = 0
    L2 regularisation coefficient (default: 0).
Attributes
param_groups : list of dict
    Parameter groups with keys "params", "lr", "beta1", "beta2", "eps", and "weight_decay".
defaults : dict
    Default hyperparameter values.
Notes
NAdam can converge faster than vanilla Adam in practice because the Nesterov lookahead feeds the current gradient directly into the momentum term, so the update responds sooner when the gradient changes direction. It has been reported to work well on recurrent networks and on tasks with noisy gradients.
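To make the lookahead effect concrete, the sketch below compares the Adam and NAdam numerators after a run of positive gradients followed by one negative gradient. It assumes the constant-beta1 form of NAdam without a momentum-decay schedule; the helper names are hypothetical.

```python
def adam_numerator(m, t, beta1=0.9):
    # Adam: bias-corrected first moment.
    return m / (1 - beta1 ** t)


def nadam_numerator(m, g, t, beta1=0.9):
    # NAdam (constant-beta1 form): lookahead momentum plus current gradient.
    return (beta1 * m / (1 - beta1 ** (t + 1))
            + (1 - beta1) * g / (1 - beta1 ** t))


def numerators_after_flip(steps_before_flip=10, beta1=0.9):
    """Feed g = +1 for several steps, then a single g = -1."""
    grads = [1.0] * steps_before_flip + [-1.0]
    m = 0.0
    for g in grads:
        m = beta1 * m + (1 - beta1) * g
    t, g_last = len(grads), grads[-1]
    return adam_numerator(m, t, beta1), nadam_numerator(m, g_last, t, beta1)


adam_num, nadam_num = numerators_after_flip()
print(f"Adam numerator:  {adam_num:.4f}")
print(f"NAdam numerator: {nadam_num:.4f}")
```

Both numerators stay positive after one reversed gradient, but NAdam's is noticeably smaller because the new gradient enters with weight `(1 - beta1) / (1 - beta1 ** t)` instead of only through the averaged momentum.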
Examples
>>> import lucid.optim as optim
>>> # Assumes `model` exposes .parameters() and `loss` is a scalar Tensor computed from its output.
>>> optimizer = optim.NAdam(model.parameters(), lr=2e-3)
>>> optimizer.zero_grad()
>>> loss.backward()
>>> optimizer.step()
Methods
__init__(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 0.002, betas: tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0) → None
Initialise the NAdam optimizer. See the class docstring for parameter semantics.
step(closure: _OptimizerClosure = None) → Tensor | None
Perform a single NAdam optimisation step.
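The `closure` argument and the `Tensor | None` return type resemble the PyTorch optimizer convention, in which the closure clears gradients, re-evaluates the loss, recomputes gradients, and returns the loss, which `step` then passes back. Assuming lucid follows the same convention (not confirmed here), the pure-Python toy below shows that pattern. `ScalarParam` and `ToyNAdam` are hypothetical classes, not lucid's implementation; they reuse the constant-beta1 NAdam form and assumed coupled L2 weight decay.

```python
import math


class ScalarParam:
    """Hypothetical stand-in for a parameter: a float value plus its gradient."""

    def __init__(self, value):
        self.value = float(value)
        self.grad = 0.0


class ToyNAdam:
    """Hypothetical scalar NAdam optimiser illustrating step(closure)."""

    def __init__(self, param, lr=0.002, betas=(0.9, 0.999), eps=1e-8,
                 weight_decay=0.0):
        self.param = param
        self.lr = lr
        self.beta1, self.beta2 = betas
        self.eps = eps
        self.weight_decay = weight_decay
        self.m = 0.0  # first-moment running average
        self.v = 0.0  # second-moment running average
        self.t = 0    # number of steps taken

    def zero_grad(self):
        self.param.grad = 0.0

    def step(self, closure=None):
        # If given, the closure re-evaluates the loss and refreshes param.grad.
        loss = closure() if closure is not None else None
        p, b1, b2 = self.param, self.beta1, self.beta2
        self.t += 1
        g = p.grad + self.weight_decay * p.value  # coupled L2 (assumed)
        self.m = b1 * self.m + (1 - b1) * g
        self.v = b2 * self.v + (1 - b2) * g * g
        m_bar = (b1 * self.m / (1 - b1 ** (self.t + 1))
                 + (1 - b1) * g / (1 - b1 ** self.t))
        v_hat = self.v / (1 - b2 ** self.t)
        p.value -= self.lr * m_bar / (math.sqrt(v_hat) + self.eps)
        return loss


# Usage: minimise f(x) = (x - 3)^2 starting from x = 0.
x = ScalarParam(0.0)
opt = ToyNAdam(x, lr=0.05)


def closure():
    opt.zero_grad()
    x.grad = 2.0 * (x.value - 3.0)  # df/dx at the current x
    return (x.value - 3.0) ** 2      # loss before this step's update


first_loss = opt.step(closure)
for _ in range(499):
    last_loss = opt.step(closure)
print(f"x = {x.value:.4f}, last loss = {last_loss:.2e}")
```

Calling `step()` without a closure uses whatever gradient is already stored and returns `None`, which matches the `Tensor | None` return annotation.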