CosineAnnealingWarmRestarts
_LRScheduler
CosineAnnealingWarmRestarts(optimizer: Optimizer, T_0: int, T_mult: int = 1, eta_min: float = 0.0, last_epoch: int = -1, verbose: bool = False)
Cosine annealing with periodic warm restarts (SGDR).
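Implements the Stochastic Gradient Descent with Warm Restarts (SGDR)
schedule. Within each restart cycle of length $T_i$ epochs the
learning rate follows a cosine curve from the base LR down to
eta_min, then restarts:

$$\eta_t = \eta_{\min} + \frac{1}{2}\left(\eta_{\text{base}} - \eta_{\min}\right)\left(1 + \cos\left(\frac{T_{\text{cur}}}{T_i}\,\pi\right)\right)$$

where $T_{\text{cur}}$ is the step count within the current
cycle and $T_i$ is the current cycle length. After each full
cycle the cycle length is multiplied by T_mult:

$$T_{i+1} = T_i \cdot T_{\text{mult}}$$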
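The cosine curve described above can be sketched in plain Python. This is a minimal illustration of the standard SGDR formula, not the library's implementation; the function name `sgdr_lr` and its arguments (`base_lr`, `t_cur` for the step count within the current cycle, `t_i` for the current cycle length) are hypothetical names chosen for this sketch.

```python
import math


def sgdr_lr(base_lr: float, eta_min: float, t_cur: int, t_i: int) -> float:
    """Cosine-annealed learning rate at step t_cur of a cycle of length t_i."""
    cosine = math.cos(math.pi * t_cur / t_i)
    # Starts at base_lr when t_cur == 0 and decays toward eta_min as t_cur -> t_i.
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


# Learning rates over one 10-step cycle, using the values from the Examples section.
schedule = [sgdr_lr(0.1, 1e-5, t, 10) for t in range(10)]
```

At `t_cur = 0` the sketch returns the base LR, and halfway through the cycle it returns the midpoint between the base LR and `eta_min`.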
Parameters
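optimizer: Optimizer
    Optimizer whose learning rates are scheduled.
T_0: int
    Length of the first cycle, in epochs.
T_mult: int = 1
    Factor applied to the cycle length after each restart (default: 1, i.e. all cycles have the same length).
eta_min: float = 0.0
    Minimum learning rate that each cycle anneals toward (default: 0.0).
last_epoch: int = -1
    (default: -1).
verbose: bool = False
    (default: False).
Attributes
T_0: int
T_mult: int
eta_min: float
Notes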
With T_mult=1 every cycle has the same length T_0. With
T_mult=2 cycle lengths double after each restart: T_0,
2*T_0, 4*T_0, … Longer later cycles are useful because the
model can refine a good basin found in earlier cycles.
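The cycle-length rule in these notes can be traced with a short sketch. It assumes a restart happens as soon as the within-cycle step count reaches the current cycle length, which is the usual SGDR convention but is not confirmed by this page; `cycle_position` is a hypothetical helper, not part of the library.

```python
def cycle_position(step: int, t_0: int, t_mult: int = 1) -> tuple[int, int]:
    """Return (t_cur, t_i): the step within the current cycle and that cycle's length."""
    t_cur, t_i = step, t_0
    # Consume whole cycles; each completed cycle is followed by one t_mult times longer.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return t_cur, t_i


# Steps at which a new cycle begins during an 80-step run with T_0=10, T_mult=2.
restarts = [s for s in range(1, 80) if cycle_position(s, 10, 2)[0] == 0]
```

Under that assumption, `restarts` is `[10, 30, 70]`: the cycles have lengths 10, 20 and 40, and the fourth cycle (length 80) has only just begun when the run ends.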
Examples
>>> import lucid.optim as optim
>>> # `model` and `train` are placeholders for a user-defined module and training routine.
>>> optimizer = optim.SGD(model.parameters(), lr=0.1)
>>> scheduler = optim.CosineAnnealingWarmRestarts(
...     optimizer, T_0=10, T_mult=2, eta_min=1e-5
... )
>>> for epoch in range(80):
...     train(...)
...     optimizer.step()
...     scheduler.step()
Methods (3)
__init__
__init__(optimizer: Optimizer, T_0: int, T_mult: int = 1, eta_min: float = 0.0, last_epoch: int = -1, verbose: bool = False) → None
Initialise the CosineAnnealingWarmRestarts scheduler. See the class docstring for parameter semantics.
step
step() → None
Advance the scheduler by one step and update the optimizer learning rates.
Notes
Should be called after the optimizer's .step() at the end of each
epoch (or each iteration, depending on the schedule).
get_lr
get_lr() → list[float]
Compute the learning rate for each parameter group at the current step.
Returns
list[float]
    One learning rate per param group, derived from the schedule formula documented in the class docstring.