class Adadelta extends Optimizer
Adadelta(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 1.0, rho: float = 0.9, eps: float = 1e-06, weight_decay: float = 0)

Adadelta optimizer: an adaptive learning-rate method that needs no hand-tuned global learning rate.

Adadelta addresses Adagrad's aggressive, monotonically decreasing learning rate by replacing the ever-growing sum of past squared gradients with an exponential moving average, which approximates a fixed-size window. No global learning rate is required in the canonical form:

$$
\begin{aligned}
E[g^2]_t &= \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2 \\
\Delta\theta_t &= -\frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} \, g_t \\
E[\Delta\theta^2]_t &= \rho \, E[\Delta\theta^2]_{t-1} + (1 - \rho) \, \Delta\theta_t^2 \\
\theta_t &= \theta_{t-1} + \eta \, \Delta\theta_t
\end{aligned}
$$

where g_t is the gradient at step t, η is lr (1.0 by default, which recovers the original formulation), and ρ controls the effective length of the decay window.
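The four update equations can be sketched for a single scalar parameter in plain Python. This is an illustrative sketch only, not the library's implementation: the function name `adadelta_step` and the state names `sq_avg` and `acc_delta` are hypothetical, and both running averages are assumed to start at zero.

```python
import math


def adadelta_step(theta, grad, sq_avg, acc_delta, lr=1.0, rho=0.9, eps=1e-6):
    """Apply one Adadelta update to a scalar parameter (illustrative sketch)."""
    # E[g^2]_t: running average of squared gradients
    sq_avg = rho * sq_avg + (1.0 - rho) * grad * grad
    # Delta theta_t: RMS of past updates over RMS of gradients, times -g_t
    delta = -math.sqrt(acc_delta + eps) / math.sqrt(sq_avg + eps) * grad
    # E[Delta theta^2]_t: running average of squared updates
    acc_delta = rho * acc_delta + (1.0 - rho) * delta * delta
    # theta_t = theta_{t-1} + eta * Delta theta_t
    theta = theta + lr * delta
    return theta, sq_avg, acc_delta


# Toy problem: minimise f(theta) = 0.5 * theta**2, whose gradient is theta.
theta, sq_avg, acc_delta = 1.0, 0.0, 0.0
for _ in range(100):
    grad = theta
    theta, sq_avg, acc_delta = adadelta_step(theta, grad, sq_avg, acc_delta)
print(theta)
```

The numerator uses the running average from the previous step, so with zero-initialised state the first update is driven entirely by ϵ; this is why early Adadelta steps are small.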

Parameters

params: iterable of Parameter or iterable of dict
Parameters to optimise, or a list of parameter-group dicts.
lr: float = 1.0
Scaling factor η applied to the update (default: 1.0).
rho: float = 0.9
Coefficient ρ for the running averages of squared gradients and squared updates (default: 0.9).
eps: float = 1e-06
Term ϵ added inside both square roots for numerical stability (default: 1e-6).
weight_decay: float = 0
L2 regularisation coefficient (default: 0).
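As a rough illustration of what an L2 coefficient means for the update, the penalty (weight_decay / 2) · θ² contributes weight_decay · θ to the gradient. The sketch below assumes the penalty is folded into the gradient before the running averages are updated, which is the common convention but is not confirmed by this page; the helper name `apply_weight_decay` is hypothetical.

```python
def apply_weight_decay(grad, theta, weight_decay=0.0):
    """Return the gradient with the L2 penalty term added (assumed convention)."""
    # d/dtheta [(weight_decay / 2) * theta**2] = weight_decay * theta
    return grad + weight_decay * theta


# With weight_decay=0 (the default) the gradient is unchanged.
g = apply_weight_decay(grad=0.5, theta=2.0, weight_decay=0.01)
print(g)
```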

Attributes

param_groups: list of dict
Parameter groups with keys "params", "lr", "rho", "eps", and "weight_decay".
defaults: dict
Default hyperparameter values.

Notes

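Because Adadelta scales each update by the ratio of two running RMS estimates (recent updates over recent gradients), the step size adapts per parameter and is largely insensitive to the overall gradient scale. The default lr=1.0 usually works without manual tuning, although rho and eps can still affect convergence.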
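One way to see this robustness is to feed the update rule a constant gradient at two very different magnitudes and compare the resulting steps. The sketch below is a self-contained illustration with a hypothetical helper, `first_updates`, and zero-initialised state; it does not call the library.

```python
import math


def first_updates(scale, steps=5, rho=0.9, eps=1e-6):
    """Return the first few Adadelta updates for a constant gradient `scale`."""
    sq_avg, acc_delta, updates = 0.0, 0.0, []
    for _ in range(steps):
        grad = scale
        sq_avg = rho * sq_avg + (1.0 - rho) * grad * grad
        delta = -math.sqrt(acc_delta + eps) / math.sqrt(sq_avg + eps) * grad
        acc_delta = rho * acc_delta + (1.0 - rho) * delta * delta
        updates.append(delta)
    return updates


small = first_updates(scale=1.0)
large = first_updates(scale=1000.0)
for s, l in zip(small, large):
    # The two columns nearly coincide despite a 1000x difference in gradient size.
    print(f"{s:.6f}  {l:.6f}")
```

The gradient magnitude cancels between the numerator `grad` and the denominator `sqrt(sq_avg + eps)` whenever the squared gradient dominates ϵ, so rescaling the loss changes the steps very little.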

Examples

>>> import lucid.optim as optim
>>> # `model` and `loss` are assumed to be defined elsewhere.
>>> optimizer = optim.Adadelta(model.parameters(), rho=0.9, eps=1e-6)
>>> optimizer.zero_grad()
>>> loss.backward()
>>> optimizer.step()

Methods (2)

__init__

__init__(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 1.0, rho: float = 0.9, eps: float = 1e-06, weight_decay: float = 0) -> None

Initialise the Adadelta optimizer. See the class docstring for parameter semantics.

step

step(closure: _OptimizerClosure = None) -> Tensor | None

Perform a single Adadelta step.