class

OneCycleLR

extends LRScheduler
OneCycleLR(optimizer: Optimizer, max_lr: float, total_steps: int, pct_start: float = 0.3, anneal_strategy: str = 'cos', div_factor: float = 25.0, final_div_factor: float = 10000.0, last_epoch: int = -1, verbose: bool = False)

1Cycle learning rate policy.

Implements the 1cycle policy: the learning rate first rises from an initial value to max_lr over a warmup phase, then anneals down to a minimum value over the remaining steps.

Two further learning rates are derived from max_lr, the initial rate and the minimum rate:

$$
\begin{aligned}
\eta_{\text{init}} &= \frac{\eta_{\max}}{\text{div\_factor}} \\
\eta_{\min} &= \frac{\eta_{\text{init}}}{\text{final\_div\_factor}}
\end{aligned}
$$
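As a quick check of the arithmetic, the two derived rates can be computed directly. This is a minimal sketch: `derived_lrs` is a hypothetical helper written for illustration and is not part of the library.

```python
def derived_lrs(max_lr: float, div_factor: float = 25.0,
                final_div_factor: float = 1e4) -> tuple[float, float]:
    """Return (initial_lr, min_lr) from the two formulas above.

    Hypothetical helper for illustration only; not a library function.
    """
    initial_lr = max_lr / div_factor          # eta_init = eta_max / div_factor
    min_lr = initial_lr / final_div_factor    # eta_min = eta_init / final_div_factor
    return initial_lr, min_lr


initial_lr, min_lr = derived_lrs(max_lr=0.1)
print(initial_lr, min_lr)
```

With the default divisors and max_lr=0.1, the initial rate is about 0.004 and the minimum rate about 4e-7, so the minimum sits far below the starting rate.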

The schedule has two phases:

  1. Warmup (first pct_start * total_steps steps): anneal from $\eta_{\text{init}}$ up to $\eta_{\max}$.
  2. Cooldown (remaining steps): anneal from $\eta_{\max}$ down to $\eta_{\min}$.

Each phase uses either cosine or linear annealing depending on anneal_strategy.
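The two phases can be sketched in plain Python. This is an illustrative reconstruction from the description above, not the library's code: the function names `anneal` and `one_cycle_lr` are hypothetical, and the exact step indexing and phase-boundary handling of the real scheduler are assumptions here (the sketch takes `step` in the range 0 to total_steps inclusive).

```python
import math


def anneal(start: float, end: float, pct: float, strategy: str = "cos") -> float:
    """Interpolate from start (pct=0) to end (pct=1) with the chosen strategy."""
    if strategy == "cos":
        # Half-cosine curve: equals start at pct=0 and end at pct=1.
        return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))
    if strategy == "linear":
        return start + (end - start) * pct
    raise ValueError(f"unknown anneal_strategy: {strategy!r}")


def one_cycle_lr(step: int, max_lr: float, total_steps: int,
                 pct_start: float = 0.3, anneal_strategy: str = "cos",
                 div_factor: float = 25.0, final_div_factor: float = 1e4) -> float:
    """Learning rate at `step` under the two-phase schedule (sketch only)."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    # Assumption: warmup covers exactly pct_start * total_steps steps.
    warmup_steps = pct_start * total_steps

    if warmup_steps > 0 and step <= warmup_steps:
        # Phase 1: warmup from initial_lr up to max_lr.
        return anneal(initial_lr, max_lr, step / warmup_steps, anneal_strategy)

    # Phase 2: cooldown from max_lr down to min_lr.
    pct = (step - warmup_steps) / (total_steps - warmup_steps)
    return anneal(max_lr, min_lr, pct, anneal_strategy)


# Print the rate at a few points of a 100-step run.
for s in (0, 15, 30, 65, 100):
    print(s, one_cycle_lr(s, max_lr=0.1, total_steps=100))
```

In this sketch the rate starts at the initial value, peaks at max_lr at the end of warmup (step 30 of 100 with the default pct_start), and ends at the minimum value on the final step.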

Parameters

optimizer: Optimizer
Wrapped optimizer.
max_lr: float
Peak learning rate reached at the end of the warmup phase.
total_steps: int
Total number of steps (batches) in the training run.
pct_start: float = 0.3
Fraction of total_steps devoted to the warmup phase (default: 0.3).
anneal_strategy: str = 'cos'
Annealing function: "cos" for cosine annealing or "linear" for linear annealing (default: "cos").
div_factor: float = 25.0
Determines the initial LR as max_lr / div_factor (default: 25.0).
final_div_factor: float = 10000.0
Determines the minimum LR as initial_lr / final_div_factor (default: 1e4).
last_epoch: int = -1
The index of the last epoch (default: -1).
verbose: bool = False
Print the updated LR after each step if True (default: False).

Attributes

max_lr: float
Peak learning rate.
total_steps: int
Total training steps.
pct_start: float
Warmup fraction.
anneal_strategy: str
"cos" or "linear".
div_factor: float
Initial LR divisor.
final_div_factor: float
Minimum LR divisor relative to initial LR.

Notes

The scheduler's step() should be called once per training step (per batch), not once per epoch, so total_steps must count every batch in the run. The 1cycle policy is designed for super-convergence and often allows training with much larger learning rates than standard schedules.

Examples

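>>> import lucid.optim as optim
>>> # model, dataloader, epochs and train_step are assumed to be defined elsewhere.
>>> optimizer = optim.SGD(model.parameters(), lr=0.1)
>>> scheduler = optim.OneCycleLR(
...     optimizer, max_lr=0.1, total_steps=len(dataloader) * epochs
... )
>>> for epoch in range(epochs):  # iterate over all epochs so total_steps is reached
...     for batch in dataloader:
...         train_step(batch)
...         optimizer.step()
...         scheduler.step()  # once per batch, not once per epoch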

Methods (2)

dunder

__init__

None
__init__(optimizer: Optimizer, max_lr: float, total_steps: int, pct_start: float = 0.3, anneal_strategy: str = 'cos', div_factor: float = 25.0, final_div_factor: float = 10000.0, last_epoch: int = -1, verbose: bool = False)

Initialise the OneCycleLR. See the class docstring for parameter semantics.

fn

get_lr

list[float]
get_lr()

Compute the learning rate for each parameter group at the current step.

Returns

list[float]

One learning rate per param group, derived from the schedule formula documented in the class docstring.