class Adamax extends Optimizer
Adamax(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 0.002, betas: tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0)
Adamax optimizer: a variant of Adam based on the infinity norm.
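Adamax generalises Adam by using the $L^\infty$ norm instead of the $L^2$ norm for the second-moment estimate. The update rule replaces $v_t$ with $u_t$, the element-wise maximum of past absolute gradients scaled by $\beta_2$. In the standard formulation:
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$$
$$u_t = \max(\beta_2\, u_{t-1},\ |g_t|)$$
$$\theta_t = \theta_{t-1} - \frac{\text{lr}}{1 - \beta_1^t} \cdot \frac{m_t}{u_t + \epsilon}$$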
Because the first-moment estimate is divided by a running maximum of gradient magnitudes, the effective step size is naturally bounded.
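The update described above can be sketched in plain Python. This is a minimal illustration of the textbook Adamax rule, not the library's implementation: the function name `adamax_step`, the list-based state, the first-moment bias correction, and the placement of `eps` in the denominator are assumptions made for the example.

```python
def adamax_step(params, grads, m, u, t, lr=2e-3, betas=(0.9, 0.999),
                eps=1e-8, weight_decay=0.0):
    """One Adamax update on plain Python lists (hypothetical helper).

    Returns the new (params, m, u); `t` is the step count, starting at 1.
    """
    beta1, beta2 = betas
    bias_correction = 1.0 - beta1 ** t
    new_params, new_m, new_u = [], [], []
    for p, g, m_i, u_i in zip(params, grads, m, u):
        g = g + weight_decay * p                 # L2 regularisation folded into the gradient
        m_i = beta1 * m_i + (1.0 - beta1) * g    # first-moment estimate
        u_i = max(beta2 * u_i, abs(g))           # decayed running maximum of |g|
        p = p - (lr / bias_correction) * m_i / (u_i + eps)
        new_params.append(p)
        new_m.append(m_i)
        new_u.append(u_i)
    return new_params, new_m, new_u


# Two parameters whose gradients differ in scale by almost 10x.
params, m, u = adamax_step([1.0, -2.0], [0.5, -4.0], [0.0, 0.0], [0.0, 0.0], t=1)
print(params)
```

With zero initial state, the first step moves each element by about `lr` regardless of the gradient's scale, which illustrates the bounded step size.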
Parameters
params : iterable of Parameter or iterable of dict
    Parameters to optimise, or a list of parameter-group dicts.
lr : float = 0.002
    Learning rate (default: 2e-3).
betas : tuple of float = (0.9, 0.999)
    Coefficients for the first-moment estimate and the infinity-norm decay (default: (0.9, 0.999)).
eps : float = 1e-08
    Term added to the denominator for numerical stability (default: 1e-8).
weight_decay : float = 0
    L2 regularisation coefficient (default: 0).
Attributes
param_groups : list of dict
    Parameter groups with keys "params", "lr", "beta1", "beta2", "eps", and "weight_decay".
defaults : dict
    Default hyperparameter values.
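To make the relationship between `params`, `defaults`, and `param_groups` concrete, here is a rough sketch of how per-group overrides could be merged with the constructor defaults. The helper `build_param_groups` is hypothetical and the merging logic is an assumption about typical optimizer behaviour, not a description of the library's internals; strings stand in for `Parameter` objects.

```python
def build_param_groups(params, lr=2e-3, betas=(0.9, 0.999), eps=1e-8,
                       weight_decay=0.0):
    """Normalise `params` into group dicts, filling gaps from the defaults."""
    defaults = {
        "lr": lr,
        "beta1": betas[0],
        "beta2": betas[1],
        "eps": eps,
        "weight_decay": weight_decay,
    }
    params = list(params)
    # A flat iterable of parameters becomes a single group.
    if not params or not isinstance(params[0], dict):
        params = [{"params": params}]
    groups = []
    for group in params:
        merged = {**defaults, **group}            # per-group values win over defaults
        merged["params"] = list(merged["params"])
        groups.append(merged)
    return groups, defaults


groups, defaults = build_param_groups(
    [{"params": ["weight"], "lr": 1e-3}, {"params": ["bias"]}]
)
for group in groups:
    print(group["params"], group["lr"], group["weight_decay"])
```

In this sketch the first group keeps its own learning rate while the second falls back to the default.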
Notes
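Adamax can be more stable than Adam on problems where gradients are sparse or have large outliers. The running maximum in the denominator tracks a large gradient immediately, so the size of the update is less sensitive to individual large gradient magnitudes than it is with Adam's slowly updated L2-based average.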
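The effect of an outlier can be checked numerically. The sketch below tracks a single scalar gradient stream and compares the size of the textbook Adam and Adamax updates, expressed in units of the learning rate. The function name `normalised_steps` and the gradient sequence are made up for illustration, and both rules are written in their standard bias-corrected forms rather than taken from the library.

```python
def normalised_steps(grads, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return per-step update sizes (adam, adamax) in units of the learning rate."""
    m = v = u = 0.0
    adam, adamax = [], []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g   # Adam: L2-based second moment
        u = max(beta2 * u, abs(g))            # Adamax: decayed running maximum
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        adam.append(abs(m_hat) / (v_hat ** 0.5 + eps))
        adamax.append(abs(m_hat) / (u + eps))
    return adam, adamax


# Small steady gradients followed by one large outlier.
grads = [0.01] * 200 + [10.0]
adam, adamax = normalised_steps(grads)
print(f"Adam step at the outlier:   {adam[-1]:.3f} x lr")
print(f"Adamax step at the outlier: {adamax[-1]:.3f} x lr")
```

On this sequence the Adam update at the outlier exceeds the learning rate, while the Adamax update stays well below it.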
Examples
>>> import lucid.optim as optim
>>> optimizer = optim.Adamax(model.parameters(), lr=2e-3)
>>> optimizer.zero_grad()
>>> loss.backward()
>>> optimizer.step()
Methods (2)
dunder __init__ → None
__init__(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 0.002, betas: tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0)
Initialise the Adamax optimizer. See the class docstring for parameter semantics.
fn step → Tensor | None
step(closure: _OptimizerClosure = None)
Perform a single Adamax step.
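The `Tensor | None` return type suggests the common convention in which an optional closure re-evaluates the loss and `step` returns that value, or `None` when no closure is given. The toy class below sketches only that contract, under that assumption. `ToyOptimizer` is a hypothetical stand-in that takes a plain gradient step on one scalar; it is not the library class and does not implement Adamax.

```python
class ToyOptimizer:
    """Hypothetical stand-in that shows only the closure contract."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.x = 3.0      # a single scalar parameter
        self.grad = 0.0

    def step(self, closure=None):
        # The closure, if given, recomputes the loss and refreshes the gradient.
        loss = closure() if closure is not None else None
        self.x -= self.lr * self.grad   # plain gradient step, for brevity
        return loss


opt = ToyOptimizer()


def closure():
    opt.grad = 2.0 * opt.x   # gradient of the loss x**2
    return opt.x ** 2


loss = opt.step(closure)
print(loss, opt.x)
```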