class CosineAnnealingWarmRestarts

extends_LRScheduler

CosineAnnealingWarmRestarts(optimizer: Optimizer, T_0: int, T_mult: int = 1, eta_min: float = 0.0, last_epoch: int = -1, verbose: bool = False)

Cosine annealing with periodic warm restarts (SGDR).

Implements the Stochastic Gradient Descent with Warm Restarts (SGDR) schedule. Within each restart cycle of length $T_i$ epochs, the learning rate follows a cosine curve from the base LR down to eta_min, then jumps back to the base LR at the restart:

$$\eta_t = \eta_{\min} + \frac{1}{2}(\eta_{\max} - \eta_{\min}) \left(1 + \cos\!\left( \frac{\pi\, T_{\text{cur}}}{T_i} \right)\right)$$

where $\eta_{\max}$ is the base LR, $\eta_{\min}$ is eta_min, $T_{\text{cur}}$ is the step count within the current cycle, and $T_i$ is the current cycle length. After each full cycle the cycle length is multiplied by T_mult:

$$T_{i+1} = T_i \cdot T_{\text{mult}}$$
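The cosine formula can be checked numerically with a few lines of plain Python. This is a minimal sketch that evaluates the expression directly; the function name `cosine_lr` and its argument names are illustrative assumptions and are not part of the library.

```python
import math


def cosine_lr(t_cur: float, t_i: int, eta_max: float, eta_min: float = 0.0) -> float:
    """Evaluate the cosine formula at position t_cur inside a cycle of length t_i.

    Hypothetical helper for illustration only; not part of the library.
    """
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * t_cur / t_i))


# Start of a cycle: the learning rate equals eta_max (the base LR).
start = cosine_lr(0, 10, eta_max=0.1, eta_min=1e-5)
# Halfway through the cycle: the midpoint between eta_max and eta_min.
middle = cosine_lr(5, 10, eta_max=0.1, eta_min=1e-5)
# t_cur == t_i: the curve bottoms out at eta_min.
end = cosine_lr(10, 10, eta_max=0.1, eta_min=1e-5)
```

The three values trace the shape described above: the curve starts at the base LR, passes through the midpoint, and reaches eta_min at the end of the cycle.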

Parameters

optimizer: Optimizer

Wrapped optimizer.

T_0: int

Length (in epochs) of the first restart cycle.

T_mult: int = 1

Factor by which the cycle length is multiplied after each restart (default: 1, i.e. all cycles have the same length).

eta_min: float = 0.0

Minimum learning rate at the bottom of each cosine curve (default: 0.0).

last_epoch: int = -1

The index of the last epoch (default: -1).

verbose: bool = False

Print the updated LR after each step if True (default: False).

Attributes

T_0: int

Initial cycle length.

T_mult: int

Cycle-length multiplier applied after each restart.

eta_min: float

Lower bound on the learning rate.

Notes

With T_mult=1, every cycle has the same length T_0. With T_mult=2, cycle lengths double after each restart: T_0, 2*T_0, 4*T_0, … Lengthening later cycles is useful because it gives the model more epochs between restarts to refine a good basin found in earlier cycles.
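To see where restarts land for a given configuration, the cycle lengths can be accumulated by hand. The sketch below assumes restarts happen exactly at cycle boundaries; `restart_epochs` is a hypothetical helper introduced here for illustration, not a library function.

```python
def restart_epochs(T_0: int, T_mult: int, total_epochs: int) -> list[int]:
    """Return the epochs in [0, total_epochs) at which a new cycle begins.

    The first cycle starts at epoch 0 and is not reported as a restart.
    Hypothetical helper for illustration only.
    """
    restarts = []
    t_i = T_0          # length of the current cycle
    boundary = T_0     # epoch at which the current cycle ends
    while boundary < total_epochs:
        restarts.append(boundary)
        t_i *= T_mult  # T_{i+1} = T_i * T_mult
        boundary += t_i
    return restarts


# Equal-length cycles: a restart every 10 epochs.
fixed = restart_epochs(T_0=10, T_mult=1, total_epochs=40)
# Doubling cycles of 10, 20 and 40 epochs.
doubling = restart_epochs(T_0=10, T_mult=2, total_epochs=80)
```

Under these assumptions, `fixed` contains epochs 10, 20 and 30, while `doubling` contains epochs 10, 30 and 70.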

Examples

>>> import lucid.optim as optim
>>> optimizer = optim.SGD(model.parameters(), lr=0.1)
>>> # Cycles of 10, 20, 40, ... epochs, each annealing from 0.1 down to 1e-5.
>>> scheduler = optim.CosineAnnealingWarmRestarts(
...     optimizer, T_0=10, T_mult=2, eta_min=1e-5
... )
>>> for epoch in range(80):
...     train(...)
...     optimizer.step()
...     scheduler.step()  # advance the schedule after the optimizer update

Methods (3)

__init__

→None

__init__(optimizer: Optimizer, T_0: int, T_mult: int = 1, eta_min: float = 0.0, last_epoch: int = -1, verbose: bool = False)

Initialise the CosineAnnealingWarmRestarts scheduler. See the class docstring for parameter semantics.

step

→None

step()

Advance the scheduler by one step and update the optimizer learning rates.

Notes

Call this after the optimizer's .step(). Because T_0 is expressed in epochs, this scheduler is normally stepped once at the end of each epoch; stepping it once per iteration instead makes T_0 count iterations.
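The bookkeeping behind each call can be pictured as a counter that tracks the position within the current cycle and lengthens the cycle at each restart. This is a sketch under the assumption that the state is updated as described in the class docstring; `RestartCounter` is a hypothetical class and does not reflect the library's actual implementation.

```python
class RestartCounter:
    """Hypothetical helper tracking T_cur and T_i; not part of the library."""

    def __init__(self, T_0: int, T_mult: int = 1) -> None:
        self.T_mult = T_mult
        self.t_i = T_0   # length of the current cycle
        self.t_cur = 0   # steps taken inside the current cycle

    def step(self) -> None:
        """Advance by one step, restarting when the cycle is complete."""
        self.t_cur += 1
        if self.t_cur >= self.t_i:
            # Cycle finished: reset the position and lengthen the next cycle.
            self.t_cur = 0
            self.t_i *= self.T_mult


counter = RestartCounter(T_0=10, T_mult=2)
for _ in range(10):
    counter.step()
state_after_10 = (counter.t_cur, counter.t_i)  # first restart has just happened

for _ in range(5):
    counter.step()
state_after_15 = (counter.t_cur, counter.t_i)  # five steps into the second cycle
```

After ten steps the counter sits at the start of a 20-step cycle, and five steps later it is a quarter of the way through that cycle.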

get_lr

→list[float]

get_lr()

Compute the learning rate for each parameter group at the current step.

Returns

list[float]

One learning rate per param group, derived from the schedule formula documented in the class docstring.
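When an optimizer has several parameter groups with different base learning rates, the same cosine factor is applied to each of them. The sketch below assumes that each group's base LR plays the role of $\eta_{\max}$ for that group; `group_lrs` and its arguments are hypothetical names used only for illustration.

```python
import math


def group_lrs(base_lrs: list[float], t_cur: int, t_i: int, eta_min: float) -> list[float]:
    """Apply the schedule formula to every base LR (assumed to act as eta_max).

    Hypothetical helper for illustration only; not part of the library.
    """
    # The cosine factor depends only on the position in the cycle,
    # so it is shared by all parameter groups.
    factor = 0.5 * (1.0 + math.cos(math.pi * t_cur / t_i))
    return [eta_min + (base_lr - eta_min) * factor for base_lr in base_lrs]


# Two parameter groups, halfway through a 10-epoch cycle.
lrs = group_lrs([0.1, 0.01], t_cur=5, t_i=10, eta_min=0.0)
```

Halfway through a cycle with eta_min=0.0, each group's learning rate is half of its base value, so the ratio between groups is preserved.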