class CosineAnnealingWarmRestarts(_LRScheduler)

CosineAnnealingWarmRestarts(optimizer: Optimizer, T_0: int, T_mult: int = 1, eta_min: float = 0.0, last_epoch: int = -1, verbose: bool = False)

Cosine annealing with periodic warm restarts (SGDR).

Implements the Stochastic Gradient Descent with Warm Restarts (SGDR) schedule. Within each restart cycle of length $T_i$ epochs, the learning rate decays along a cosine curve from the base learning rate down to eta_min, then restarts at the base learning rate for the next cycle:

$$\eta_t = \eta_{\min} + \frac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\left(\frac{\pi\, T_{\text{cur}}}{T_i}\right)\right)$$

where $\eta_{\max}$ is the base (initial) learning rate of the parameter group, $T_{\text{cur}}$ is the number of steps taken since the last restart, and $T_i$ is the length of the current cycle. After each full cycle, the cycle length is multiplied by T_mult:

$$T_{i+1} = T_i \cdot T_{\text{mult}}$$
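To make the cycle bookkeeping concrete, here is a minimal standalone sketch in plain Python (no Lucid dependency) that evaluates the two formulas above at an integer epoch by walking through the completed cycles. The helper name sgdr_lr is hypothetical and this is not Lucid's implementation; it assumes base_lr plays the role of $\eta_{\max}$ and that epochs are counted from 0.

```python
import math


def sgdr_lr(epoch: int, base_lr: float, T_0: int, T_mult: int = 1,
            eta_min: float = 0.0) -> float:
    """Hypothetical helper: SGDR learning rate at a 0-based integer epoch."""
    if T_0 <= 0 or T_mult < 1:
        # Guard against cycles that never end (or never grow past zero).
        raise ValueError("T_0 must be positive and T_mult must be >= 1")
    T_i, T_cur = T_0, epoch
    # Subtract each completed cycle, growing the cycle length by T_mult.
    while T_cur >= T_i:
        T_cur -= T_i
        T_i *= T_mult
    # Cosine decay from base_lr to eta_min within the current cycle.
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * T_cur / T_i))


print([round(sgdr_lr(e, 0.1, T_0=10, T_mult=2), 4) for e in (0, 5, 10, 20, 30)])
# [0.1, 0.05, 0.1, 0.05, 0.1]
```

With T_0=10 and T_mult=2 the rate returns to base_lr at epochs 10, 30 and 70, and sits at the midpoint (base_lr + eta_min) / 2 halfway through each cycle.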

Parameters

optimizer : Optimizer
    Wrapped optimizer.
T_0 : int
    Length (in epochs) of the first restart cycle.
T_mult : int = 1
    Factor by which the cycle length is multiplied after each restart (default: 1, i.e. all cycles have the same length).
eta_min : float = 0.0
    Minimum learning rate at the bottom of each cosine curve (default: 0.0).
last_epoch : int = -1
    The index of the last epoch (default: -1).
verbose : bool = False
    If True, print the updated learning rate after each step (default: False).

Attributes

T_0 : int
    Initial cycle length.
T_mult : int
    Cycle-length multiplier applied after each restart.
eta_min : float
    Lower bound on the learning rate.

Notes

With T_mult=1 every cycle has the same length T_0. With T_mult=2 the cycle length doubles after each restart (T_0, 2*T_0, 4*T_0, …), so restarts fall at epochs T_0, 3*T_0, 7*T_0, … Longer later cycles give the model more epochs at low learning rates to refine a good basin found in earlier cycles.
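The restart points implied by this note can be listed with a short generator. restart_epochs is a hypothetical helper written for illustration; it simply accumulates the cycle lengths T_0, T_0*T_mult, T_0*T_mult**2, and so on, following the cycle-length rule in the class description.

```python
from itertools import islice


def restart_epochs(T_0: int, T_mult: int = 1):
    """Hypothetical helper: yield the epoch at which each cycle ends (and the next begins)."""
    epoch, T_i = 0, T_0
    while True:
        epoch += T_i      # the current cycle ends after T_i more epochs
        yield epoch
        T_i *= T_mult     # the next cycle is T_mult times longer


print(list(islice(restart_epochs(10, 2), 4)))  # [10, 30, 70, 150]
print(list(islice(restart_epochs(10, 1), 4)))  # [10, 20, 30, 40]
```

Under the formula above, the values used in the Examples section (T_0=10, T_mult=2) end cycles at epochs 10, 30, 70 and 150, so an 80-epoch run stops early in the fourth cycle while the learning rate is still close to its base value. Choosing the total number of epochs to coincide with a cycle boundary (here 70 or 150) ends training near eta_min.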

Examples

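>>> import lucid.optim as optim
>>> # Assumes `model` is an existing model and `train` runs one epoch of
>>> # forward/backward passes; both are placeholders.
>>> optimizer = optim.SGD(model.parameters(), lr=0.1)
>>> scheduler = optim.CosineAnnealingWarmRestarts(
...     optimizer, T_0=10, T_mult=2, eta_min=1e-5
... )
>>> for epoch in range(80):
...     train(...)          # forward/backward passes (placeholder)
...     optimizer.step()    # update the parameters first
...     scheduler.step()    # then advance the schedule by one epoch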

Methods (3)

__init__(optimizer: Optimizer, T_0: int, T_mult: int = 1, eta_min: float = 0.0, last_epoch: int = -1, verbose: bool = False) -> None

Initialise the CosineAnnealingWarmRestarts scheduler. See the class docstring for parameter semantics.

step() -> None

Advance the scheduler by one step and update the optimizer learning rates.

Notes

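Call this after the optimizer's .step(). Each call advances $T_{\text{cur}}$ by one, so T_0 is measured in whatever unit you step at: call it once per epoch for T_0 in epochs, or once per iteration for T_0 in iterations.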

get_lr() -> list[float]

Compute the learning rate for each parameter group at the current step.

Returns

list[float]

One learning rate per param group, derived from the schedule formula documented in the class docstring.
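The interaction between step() and get_lr() can be sketched with a toy, dependency-free class. ToySGDR is hypothetical: it assumes parameter groups are plain dicts holding an "lr" key, ignores last_epoch and verbose, and is not Lucid's scheduler. Each call to step() advances T_cur by one, restarts the cycle when T_cur reaches T_i, and writes one learning rate per group, each decaying from its own base rate toward the shared eta_min.

```python
import math


class ToySGDR:
    """Hypothetical stand-in mimicking SGDR step()/get_lr() bookkeeping."""

    def __init__(self, param_groups, T_0, T_mult=1, eta_min=0.0):
        self.param_groups = param_groups                   # list of dicts with an "lr" key
        self.base_lrs = [g["lr"] for g in param_groups]    # eta_max for each group
        self.T_i, self.T_mult, self.eta_min = T_0, T_mult, eta_min
        self.T_cur = 0                                     # steps since the last restart

    def get_lr(self):
        # One learning rate per group; all groups share T_cur, T_i and eta_min.
        c = math.cos(math.pi * self.T_cur / self.T_i)
        return [self.eta_min + 0.5 * (base - self.eta_min) * (1 + c)
                for base in self.base_lrs]

    def step(self):
        self.T_cur += 1
        if self.T_cur >= self.T_i:      # cycle finished: restart and grow the cycle
            self.T_cur -= self.T_i
            self.T_i *= self.T_mult
        for group, lr in zip(self.param_groups, self.get_lr()):
            group["lr"] = lr


groups = [{"lr": 0.1}, {"lr": 0.01}]    # two groups with different base rates
sched = ToySGDR(groups, T_0=4, T_mult=2)
for epoch in range(6):
    # ... forward, backward and optimizer.step() would go here ...
    sched.step()
    print(epoch, [round(g["lr"], 4) for g in groups])
```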