class

CosineAnnealingLR

extends LRScheduler
CosineAnnealingLR(optimizer: Optimizer, T_max: int, eta_min: float = 0, last_epoch: int = -1, verbose: bool = False)

Anneal the learning rate following a cosine curve over T_max epochs.

The learning rate at epoch $t$ is:

$$\eta_t = \eta_{\min} + \frac{1}{2}\,(\eta_{\max} - \eta_{\min})\left(1 + \cos\left(\frac{\pi\, t}{T_{\max}}\right)\right)$$

where $\eta_{\max}$ is the initial learning rate captured from the optimizer (base_lr) and $\eta_{\min}$ is the floor set by eta_min.
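As a minimal sketch, the closed-form expression above can be evaluated in plain Python without lucid. The helper name `cosine_annealed_lr` is hypothetical and not part of the library; it only evaluates the documented formula, and the values used mirror those in the Examples section.

```python
import math


def cosine_annealed_lr(t: int, base_lr: float, eta_min: float, t_max: int) -> float:
    """Evaluate the documented closed form (hypothetical helper, not lucid API)."""
    # The cosine term runs from +1 at t = 0 down to -1 at t = t_max.
    cosine = math.cos(math.pi * t / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


# Learning rate at the start, the midpoint, and the end of the half-period.
start = cosine_annealed_lr(0, base_lr=0.1, eta_min=1e-5, t_max=100)
middle = cosine_annealed_lr(50, base_lr=0.1, eta_min=1e-5, t_max=100)
end = cosine_annealed_lr(100, base_lr=0.1, eta_min=1e-5, t_max=100)
```

Under these assumptions the schedule starts at base_lr, passes through the average of base_lr and eta_min at epoch T_max / 2, and reaches eta_min at epoch T_max.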

Parameters

optimizer: Optimizer
Wrapped optimizer.
T_max: int
Half-period of the cosine cycle in epochs. After T_max epochs the learning rate reaches eta_min.
eta_min: float = 0
Minimum learning rate (default: 0).
last_epoch: int = -1
The index of the last epoch (default: -1).
verbose: bool = False
Print the updated LR after each step if True (default: False).

Attributes

T_max: int
Decay period in epochs.
eta_min: float
Lower bound on the learning rate.

Notes

Cosine annealing produces a smooth schedule that decreases monotonically from the initial learning rate to eta_min over T_max epochs. The decay is slow at the start, fastest around the midpoint, and slow again near the minimum. It is widely used for training deep networks and pairs naturally with warm restarts (see CosineAnnealingWarmRestarts).
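To inspect how quickly the schedule falls at different points, the following sketch computes the per-epoch drop in learning rate from the closed-form formula in the class description. The helper `lr_at` and its default values are assumptions chosen for illustration, not lucid API.

```python
import math


def lr_at(t: int, base_lr: float = 0.1, eta_min: float = 1e-5, t_max: int = 100) -> float:
    """Closed-form cosine schedule (hypothetical helper, not lucid API)."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * t / t_max))


# drops[t] is how much the learning rate falls between epoch t and epoch t + 1.
drops = [lr_at(t) - lr_at(t + 1) for t in range(100)]

first_drop = drops[0]   # first step of the schedule
mid_drop = drops[49]    # step just before the midpoint
last_drop = drops[99]   # final step before reaching eta_min
```

Every entry in `drops` is positive, so the learning rate never increases within the first T_max epochs. The drop near the midpoint is the largest, while the first and last drops are equally small, which reflects the symmetry of the cosine curve.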

Examples

>>> import lucid.optim as optim
>>> optimizer = optim.SGD(model.parameters(), lr=0.1)
>>> scheduler = optim.CosineAnnealingLR(optimizer, T_max=100, eta_min=1e-5)
>>> for epoch in range(100):
...     train(...)
...     optimizer.step()
...     scheduler.step()

Methods (2)

dunder

__init__

None
__init__(optimizer: Optimizer, T_max: int, eta_min: float = 0, last_epoch: int = -1, verbose: bool = False)

Initialise the CosineAnnealingLR scheduler. See the class docstring for parameter semantics.

fn

get_lr

list[float]
get_lr()

Compute the learning rate for each parameter group at the current step.

Returns

list[float]

One learning rate per param group, derived from the schedule formula documented in the class docstring.