class

ASGD

extends Optimizer
ASGD(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 0.01, lambd: float = 0.0001, alpha: float = 0.75, t0: float = 1000000.0, weight_decay: float = 0)

Averaged Stochastic Gradient Descent optimizer.

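ASGD performs standard SGD updates but also maintains a running (Polyak–Ruppert) average of the iterates, which serves as the final parameter estimate. The average smooths out the fluctuations that gradient noise causes in the individual iterates. This is why averaging helps most late in training, when the iterates oscillate around a minimum rather than still moving towards it.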

The SGD update with L2 regularisation is:

$$\theta_t = \theta_{t-1} - \eta_t \bigl(g_t + \lambda \, \theta_{t-1}\bigr)$$

where $g_t$ is the stochastic gradient at step $t$, and the effective learning rate decays as:

$$\eta_t = \frac{\eta_0}{(1 + \lambda \, \eta_0 \, t)^\alpha}$$
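To give a feel for how slowly this schedule decays, the sketch below evaluates the formula directly in plain Python. The helper `effective_lr` is hypothetical and is not part of lucid. It only restates $\eta_t$ from the equation above.

```python
def effective_lr(eta0: float, lambd: float, alpha: float, t: int) -> float:
    """Decayed learning rate: eta_t = eta0 / (1 + lambd * eta0 * t) ** alpha."""
    return eta0 / (1.0 + lambd * eta0 * t) ** alpha


# Evaluate the schedule at the documented defaults (lr=1e-2, lambd=1e-4, alpha=0.75).
for t in (0, 10_000, 1_000_000):
    print(t, effective_lr(1e-2, 1e-4, 0.75, t))
```

With the defaults, $\lambda \eta_0 t = 1$ at $t = 10^6$. The rate has therefore only fallen by a factor of $2^{0.75} \approx 1.68$ (to about 0.0059) after one million steps.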

The Polyak–Ruppert average is then:

$$\bar{\theta}_t = \frac{1}{t - t_0} \sum_{k=t_0+1}^{t} \theta_k \quad \text{for } t > t_0$$
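The average does not require storing every past iterate. Because $\bar\theta$ is a mean, it can be updated in place: after the $n$-th averaged iterate, $\bar\theta \leftarrow \bar\theta + (\theta_t - \bar\theta)/n$. The pure-Python sketch below shows this update for a parameter stored as a list of floats. The class name `RunningAverage` is hypothetical and is not part of lucid's API.

```python
from typing import List, Optional


class RunningAverage:
    """Incremental mean of the iterates that have entered the Polyak-Ruppert average."""

    def __init__(self) -> None:
        self.n = 0                                 # number of iterates averaged so far
        self.value: Optional[List[float]] = None   # current average, None before the first iterate

    def update(self, theta: List[float]) -> None:
        self.n += 1
        if self.value is None:
            # First averaged iterate: the average is simply a copy of it.
            self.value = list(theta)
            return
        mu = 1.0 / self.n
        # bar_theta <- bar_theta + (theta - bar_theta) / n, element-wise.
        self.value = [a + mu * (x - a) for a, x in zip(self.value, theta)]


avg = RunningAverage()
for theta in ([1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]):
    avg.update(theta)
print(avg.n, avg.value)
```

Each update costs one pass over the parameters and a single extra buffer, which is why averaging optimizers usually keep the mean this way rather than summing a history.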

Parameters

params : iterable of Parameter or iterable of dict
Parameters to optimise, or a list of parameter-group dicts.
lr : float = 0.01
Initial learning rate $\eta_0$ (default: 1e-2).
lambd : float = 0.0001
Decay term $\lambda$ (default: 1e-4).
alpha : float = 0.75
Power for LR decay $\alpha$ (default: 0.75).
t0 : float = 1000000.0
Step at which averaging begins, $t_0$ (default: 1e6).
weight_decay : float = 0
L2 regularisation coefficient (default: 0).

Attributes

param_groups : list of dict
Parameter groups with keys "params", "lr", "lambd", "alpha", "t0", and "weight_decay".
defaults : dict
Default hyperparameter values.

Notes

With a well-tuned learning rate, ASGD can match or exceed the convergence rate of plain SGD, and the averaging step smooths the final parameter estimate. The default t0=1e6 means averaging begins only after one million steps, which many training runs never reach. Lower t0 (for example to 1e5, as in the example below) to start averaging earlier.
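The effect of `t0` can be illustrated with a toy scalar ASGD loop in plain Python, which makes no lucid calls. The function `asgd_toy`, the quadratic objective, and the alternating ±0.5 "noise" are hypothetical choices made for illustration. The loop follows the update, learning-rate, and averaging formulas above, omits `weight_decay`, and assumes that an iterate enters the average at the first step after `t0`.

```python
from typing import Optional, Tuple


def asgd_toy(steps: int, t0: float, lr: float = 0.1, lambd: float = 1e-4,
             alpha: float = 0.75, target: float = 3.0) -> Tuple[float, Optional[float], int]:
    """Scalar ASGD on f(theta) = 0.5 * (theta - target)**2 with deterministic noise.

    Returns (last iterate, averaged iterate or None, number of averaged iterates).
    """
    theta = 0.0
    avg: Optional[float] = None
    n_avg = 0
    for t in range(1, steps + 1):
        # Deterministic stand-in for gradient noise: -0.5 on odd steps, +0.5 on even steps.
        noise = 0.5 if t % 2 == 0 else -0.5
        g = (theta - target) + noise
        # Decayed learning rate: eta_t = eta0 / (1 + lambd * eta0 * t) ** alpha.
        eta = lr / (1.0 + lambd * lr * t) ** alpha
        # SGD step including the lambd * theta shrinkage term.
        theta = theta - eta * (g + lambd * theta)
        # Assumed convention for this sketch: iterates are averaged once t > t0.
        if t > t0:
            n_avg += 1
            avg = theta if avg is None else avg + (theta - avg) / n_avg
    return theta, avg, n_avg


print(asgd_toy(200, t0=1e6))  # t0 is never reached in 200 steps
print(asgd_toy(200, t0=100))  # the last 100 iterates are averaged
```

In this sketch, a run of 200 steps with `t0=1e6` never starts averaging. With `t0=100`, the last iterate keeps oscillating around the minimiser (by roughly 0.03 here), while those oscillations largely cancel in the average.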

Examples

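>>> import lucid.optim as optim
>>> # Assumes `model` (exposing `parameters()`) and a scalar `loss` computed from it are already defined.
>>> optimizer = optim.ASGD(model.parameters(), lr=1e-2, t0=1e5)
>>> optimizer.zero_grad()
>>> loss.backward()
>>> optimizer.step()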

Methods (2)

__init__

__init__(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 0.01, lambd: float = 0.0001, alpha: float = 0.75, t0: float = 1000000.0, weight_decay: float = 0) -> None

Initialise the ASGD optimizer. See the class docstring for parameter semantics.

step

step(closure: _OptimizerClosure = None) -> Tensor | None

Perform a single ASGD step.