class Adadelta extends Optimizer
Adadelta(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 1.0, rho: float = 0.9, eps: float = 1e-06, weight_decay: float = 0)
Adadelta optimizer: an adaptive learning rate method with no global LR.
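Adadelta addresses Adagrad's aggressive, monotonically decreasing learning rate by replacing Adagrad's ever-growing sum of squared gradients with an exponential moving average, which acts as a soft fixed-size window over recent gradients. Each step is further scaled by a matching running average of past squared updates, so no global learning rate is required in the canonical form:
$$E[g^2]_t = \rho\, E[g^2]_{t-1} + (1 - \rho)\, g_t^2$$
$$\Delta\theta_t = -\frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}}\, g_t$$
$$E[\Delta\theta^2]_t = \rho\, E[\Delta\theta^2]_{t-1} + (1 - \rho)\, \Delta\theta_t^2$$
$$\theta_t = \theta_{t-1} + \eta\, \Delta\theta_t$$
where $g_t$ is the gradient of the loss with respect to the parameters $\theta$ at step $t$, $\eta$ is lr (1.0 by default, which recovers the original formulation), $\epsilon$ is eps, and $\rho$ controls the decay window size.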
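The update rule can be sketched in plain Python for a single scalar parameter. This is a minimal illustration, not lucid's implementation: the names `AdadeltaState` and `adadelta_update` are hypothetical, and folding `weight_decay` into the gradient follows the common L2 convention rather than anything this page confirms.

```python
import math
from dataclasses import dataclass


@dataclass
class AdadeltaState:
    """Running averages kept for one scalar parameter (hypothetical helper)."""
    sq_grad_avg: float = 0.0   # E[g^2]: running average of squared gradients
    sq_delta_avg: float = 0.0  # E[dtheta^2]: running average of squared updates


def adadelta_update(theta, grad, state, lr=1.0, rho=0.9, eps=1e-6, weight_decay=0.0):
    """Apply one Adadelta step to `theta` and return the new value (sketch)."""
    if weight_decay != 0:
        # Assumed L2 convention: add weight_decay * theta to the gradient.
        grad = grad + weight_decay * theta
    # E[g^2]_t = rho * E[g^2]_{t-1} + (1 - rho) * g_t^2
    state.sq_grad_avg = rho * state.sq_grad_avg + (1 - rho) * grad * grad
    # dtheta_t = -sqrt(E[dtheta^2]_{t-1} + eps) / sqrt(E[g^2]_t + eps) * g_t
    delta = -math.sqrt(state.sq_delta_avg + eps) / math.sqrt(state.sq_grad_avg + eps) * grad
    # E[dtheta^2]_t = rho * E[dtheta^2]_{t-1} + (1 - rho) * dtheta_t^2
    state.sq_delta_avg = rho * state.sq_delta_avg + (1 - rho) * delta * delta
    # theta_t = theta_{t-1} + lr * dtheta_t
    return theta + lr * delta


# Usage: 200 steps on f(theta) = theta**2, whose gradient is 2 * theta.
state = AdadeltaState()
theta = 3.0
for _ in range(200):
    theta = adadelta_update(theta, 2.0 * theta, state)
```

Because $E[\Delta\theta^2]$ starts at zero, the first steps have magnitude on the order of $\sqrt{\epsilon}$ and only grow as the update history accumulates, so early progress is deliberately cautious.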
Parameters
params : iterable of Parameter or iterable of dict
    Parameters to optimise, or a list of parameter-group dicts.
lr : float = 1.0
    Scaling factor applied to the update.
rho : float = 0.9
    Coefficient for the running averages of squared gradients and squared updates.
eps : float = 1e-6
    Term added to the denominator for numerical stability.
weight_decay : float = 0
    L2 regularisation coefficient.
Attributes
param_groups : list of dict
    Parameter groups with keys "params", "lr", "rho", "eps", and "weight_decay".
defaults : dict
    Default hyperparameter values.
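The `param_groups` layout can be illustrated with a small helper that normalises the `params` argument into group dicts carrying every documented key. The helper name `build_param_groups`, the string placeholders standing in for parameters, and the merge rule (per-group values override `defaults`) are assumptions modelled on the common PyTorch convention; this page does not specify how lucid merges them.

```python
# Defaults mirroring the documented constructor signature.
DEFAULTS = {"lr": 1.0, "rho": 0.9, "eps": 1e-6, "weight_decay": 0}


def build_param_groups(params, defaults=DEFAULTS):
    """Normalise `params` into a list of group dicts (hypothetical sketch)."""
    params = list(params)
    if params and isinstance(params[0], dict):
        groups = params                  # already a list of group dicts
    else:
        groups = [{"params": params}]    # a flat iterable becomes one group
    result = []
    for group in groups:
        merged = dict(defaults)          # start from a copy of the defaults...
        merged.update(group)             # ...then apply per-group overrides
        merged["params"] = list(merged["params"])
        result.append(merged)
    return result


# Usage: strings stand in for real Parameter objects.
groups = build_param_groups([
    {"params": ["w1", "w2"]},
    {"params": ["b1"], "rho": 0.95},
])
```

Under this assumed rule, the second group keeps every default except its own `rho`, which is how one set of parameters could be given a longer averaging window than the rest.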
Notes
Because Adadelta adapts each parameter's step size from running averages of recent squared gradients and squared updates, it is relatively robust to hyperparameter choices, and lr can usually be left at its default of 1.0 rather than tuned by hand.
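The "window" here is a soft one: an exponential moving average with decay rho gives the input from k steps ago a weight of (1 - rho) * rho**k, so the effective window is roughly 1 / (1 - rho) steps. A short calculation, using hypothetical helper names, shows what the default rho = 0.9 implies:

```python
def ema_weights(rho, n):
    """Weights an EMA with decay `rho` places on its n most recent inputs."""
    return [(1 - rho) * rho**k for k in range(n)]


def effective_window(rho):
    """Rule-of-thumb window length of an EMA with decay `rho`."""
    return 1.0 / (1.0 - rho)


rho = 0.9
window = effective_window(rho)                        # about 10 steps
recent_share = sum(ema_weights(rho, round(window)))   # equals 1 - rho**10, about 0.65
```

With the default, roughly two thirds of each running average comes from the last 10 steps; raising rho to 0.99 stretches the window to about 100 steps, making the adaptive step size react more slowly to changes in gradient scale.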
Examples
>>> import lucid.optim as optim
>>> # `model` and `loss` are assumed to be defined already
>>> optimizer = optim.Adadelta(model.parameters(), rho=0.9, eps=1e-6)
>>> optimizer.zero_grad()
>>> loss.backward()
>>> optimizer.step()
Methods
__init__(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 1.0, rho: float = 0.9, eps: float = 1e-06, weight_decay: float = 0) → None
Initialise the Adadelta optimizer. See the class docstring for parameter semantics.
step(closure: _OptimizerClosure = None) → Tensor | None
Perform a single Adadelta step.
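The `Tensor | None` return type of `step` suggests the closure convention familiar from PyTorch: an optional callable that re-evaluates the loss and refreshes gradients, whose result `step` hands back, with `None` returned when no closure is given. Assuming lucid follows that convention, which this page does not state, the protocol can be sketched with a toy scalar optimizer; `ToyAdadelta` and its attributes are hypothetical.

```python
import math
from typing import Callable, Optional


class ToyAdadelta:
    """Single-scalar Adadelta with a PyTorch-style closure protocol (sketch)."""

    def __init__(self, x, lr=1.0, rho=0.9, eps=1e-6):
        self.x = x
        self.grad = 0.0
        self.lr, self.rho, self.eps = lr, rho, eps
        self.sq_grad_avg = 0.0   # E[g^2]
        self.sq_delta_avg = 0.0  # E[dx^2]

    def step(self, closure: Optional[Callable[[], float]] = None) -> Optional[float]:
        loss = None
        if closure is not None:
            # The closure recomputes the loss and refreshes self.grad.
            loss = closure()
        g = self.grad
        self.sq_grad_avg = self.rho * self.sq_grad_avg + (1 - self.rho) * g * g
        delta = (-math.sqrt(self.sq_delta_avg + self.eps)
                 / math.sqrt(self.sq_grad_avg + self.eps) * g)
        self.sq_delta_avg = self.rho * self.sq_delta_avg + (1 - self.rho) * delta * delta
        self.x += self.lr * delta
        return loss  # None when no closure was supplied


opt = ToyAdadelta(x=3.0)


def closure():
    # f(x) = x**2 and its gradient 2x, evaluated at the current x.
    opt.grad = 2.0 * opt.x
    return opt.x ** 2


first_loss = opt.step(closure)  # loss measured before the update is applied
```

Returning the closure's loss lets a training loop log the pre-update loss without a second forward pass; calling `opt.step()` with no argument reuses the stored gradient and returns `None`.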