class Adadelta extends Optimizer

Adadelta(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 1.0, rho: float = 0.9, eps: float = 1e-06, weight_decay: float = 0)

Adadelta optimizer: an adaptive learning rate method that needs no hand-tuned global learning rate.

Adadelta addresses Adagrad's aggressive, monotonically decreasing learning rate by replacing the ever-growing sum of squared gradients with an exponentially decaying average, which approximates a fixed-size window of past gradients. The canonical form needs no global learning rate; the update below adds an optional scaling factor $\eta$:

$$
\begin{aligned}
E[g^2]_t &= \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2 \\
\Delta\theta_t &= -\frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} \, g_t \\
E[\Delta\theta^2]_t &= \rho \, E[\Delta\theta^2]_{t-1} + (1 - \rho) \, \Delta\theta_t^2 \\
\theta_t &= \theta_{t-1} + \eta \, \Delta\theta_t
\end{aligned}
$$

where $\eta$ is lr and $\rho$ controls the effective length of the averaging window. The default lr of 1.0 recovers the original formulation, which has no global learning rate.
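The four equations can be traced line by line in a minimal NumPy sketch. This is an illustration of the update rule only, not the library's implementation: the function name `adadelta_step`, the accumulator names `sq_avg` and `delta_avg`, and the zero initialisation of both accumulators are assumptions made for the example.

```python
import numpy as np

def adadelta_step(theta, grad, sq_avg, delta_avg, lr=1.0, rho=0.9, eps=1e-6):
    """One Adadelta update (hypothetical helper); returns (theta, sq_avg, delta_avg)."""
    # E[g^2]_t = rho * E[g^2]_{t-1} + (1 - rho) * g_t^2
    sq_avg = rho * sq_avg + (1.0 - rho) * grad**2
    # delta_t = -sqrt(E[dtheta^2]_{t-1} + eps) / sqrt(E[g^2]_t + eps) * g_t
    delta = -np.sqrt(delta_avg + eps) / np.sqrt(sq_avg + eps) * grad
    # E[dtheta^2]_t = rho * E[dtheta^2]_{t-1} + (1 - rho) * delta_t^2
    delta_avg = rho * delta_avg + (1.0 - rho) * delta**2
    # theta_t = theta_{t-1} + lr * delta_t
    theta = theta + lr * delta
    return theta, sq_avg, delta_avg

# Toy problem: f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta = np.array([1.0, -2.0])
sq_avg = np.zeros_like(theta)     # assumed initial E[g^2]_0 = 0
delta_avg = np.zeros_like(theta)  # assumed initial E[dtheta^2]_0 = 0
for _ in range(100):
    theta, sq_avg, delta_avg = adadelta_step(theta, theta, sq_avg, delta_avg)
```

Note that the numerator uses the accumulator from the previous step, so with zeroed state the first update is driven entirely by $\epsilon$; early steps are therefore small and grow as the update average builds up.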

Parameters

params : iterable of Parameter or iterable of dict

Parameters to optimise, or a list of parameter-group dicts.

lr : float = 1.0

Scaling factor $\eta$ applied to the update (default: 1.0).

rho : float = 0.9

Coefficient $\rho$ for the running averages of squared gradients and squared updates (default: 0.9).

eps : float = 1e-06

Term $\epsilon$ added to the denominator for numerical stability (default: 1e-6).

weight_decay : float = 0

L2 regularisation coefficient (default: 0).
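As a rough sketch of what an L2 coefficient usually means for the gradient: adding the penalty $(\lambda / 2) \lVert\theta\rVert^2$ to the loss adds $\lambda\theta$ to the gradient before the Adadelta update is computed. Whether this optimizer uses exactly that convention is an assumption here, and `apply_weight_decay` is a hypothetical helper, not part of the library.

```python
import numpy as np

def apply_weight_decay(grad, theta, weight_decay=0.0):
    """Gradient of loss + (weight_decay / 2) * ||theta||^2 (assumed convention)."""
    return grad + weight_decay * theta

grad = np.array([1.0, -1.0])
theta = np.array([2.0, 4.0])
decayed = apply_weight_decay(grad, theta, weight_decay=0.1)
```

With `weight_decay=0` the gradient passes through unchanged, which matches the documented default.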

Attributes

param_groups : list of dict

Parameter groups with keys "params", "lr", "rho", "eps", and "weight_decay".

defaults : dict

Default hyperparameter values.
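The relationship between the two attributes can be pictured with plain dictionaries. The sketch below assumes that a parameter group inherits any hyperparameter it does not set from the defaults, which is the common convention for optimizers with parameter groups but is not confirmed by this page; `build_param_groups` and the string placeholders standing in for parameters are hypothetical.

```python
# Default hyperparameters, matching the constructor signature.
defaults = {"lr": 1.0, "rho": 0.9, "eps": 1e-6, "weight_decay": 0}

def build_param_groups(groups, defaults):
    """Fill in any hyperparameter a group omits from the defaults (assumed behaviour)."""
    return [{**defaults, **group} for group in groups]

groups = build_param_groups(
    [
        {"params": ["backbone.weight"]},  # inherits every default
        {"params": ["head.weight"], "rho": 0.95, "weight_decay": 1e-4},  # overrides two
    ],
    defaults,
)
```

Each resulting dict carries the keys "params", "lr", "rho", "eps", and "weight_decay", so per-group settings can differ while sharing one optimizer.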

Notes

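Because Adadelta scales each update by the ratio of two running averages (recent update magnitudes over recent gradient magnitudes), the step size adapts per parameter and is largely insensitive to the overall scale of the gradients. In practice the default lr of 1.0 is usually a reasonable starting point, although rho and eps can still affect convergence.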
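One way to see where this robustness comes from is to compute the very first update for gradients of very different sizes. The sketch below assumes both accumulators start at zero and uses a hypothetical helper `first_update`; under those assumptions the first step is close to $-\sqrt{\epsilon / (1 - \rho)}$ times the sign of the gradient, regardless of the gradient's magnitude.

```python
import numpy as np

def first_update(grad, rho=0.9, eps=1e-6):
    """First Adadelta update, assuming both accumulators start at zero."""
    sq_avg = (1.0 - rho) * grad**2  # E[g^2]_1 with E[g^2]_0 = 0
    # Numerator is sqrt(0 + eps) because E[dtheta^2]_0 = 0.
    return -np.sqrt(eps) / np.sqrt(sq_avg + eps) * grad

small = first_update(np.array([0.5]))
large = first_update(np.array([500.0]))
print(small, large)  # both are approximately -0.00316
```

A gradient a thousand times larger produces nearly the same step, which is why rescaling the loss has little effect on the trajectory.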

Examples

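>>> import lucid.optim as optim
>>> # `model` and `loss` are assumed to be defined elsewhere.
>>> optimizer = optim.Adadelta(model.parameters(), rho=0.9, eps=1e-6)
>>> optimizer.zero_grad()  # clear gradients from the previous step
>>> loss.backward()        # populate gradients
>>> optimizer.step()       # apply one Adadelta update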

Methods (2)

__init__ → None

__init__(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 1.0, rho: float = 0.9, eps: float = 1e-06, weight_decay: float = 0)

Initialise the Adadelta optimizer. See the class docstring for parameter semantics.

step → Tensor | None

step(closure: _OptimizerClosure = None)

Perform a single Adadelta step.