OneCycleLR
_LRScheduler
OneCycleLR(optimizer: Optimizer, max_lr: float, total_steps: int, pct_start: float = 0.3, anneal_strategy: str = 'cos', div_factor: float = 25.0, final_div_factor: float = 10000.0, last_epoch: int = -1, verbose: bool = False)
1Cycle learning rate policy.
Implements the 1cycle policy: the learning rate first rises from an
initial value to max_lr over a warmup phase, then anneals down to
a minimum value over the remaining steps.
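Three special learning rates are derived from max_lr:
- the initial learning rate, initial_lr = max_lr / div_factor
- the peak learning rate, max_lr itself
- the minimum learning rate, initial_lr / final_div_factor
The schedule has two phases:
- Warmup (first pct_start * total_steps steps): anneal from initial_lr up to max_lr.
- Cooldown (remaining steps): anneal from max_lr down to the minimum learning rate.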
Each phase uses either cosine or linear annealing depending on
anneal_strategy.
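The two-phase schedule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names `_anneal` and `one_cycle_lr`, the 0-based step index, the phase boundary `int(pct_start * total_steps)`, and the interpolation formulas are all hypothetical choices made to match the description.

```python
import math


def _anneal(start, end, pct, strategy):
    """Interpolate from `start` (pct=0) to `end` (pct=1)."""
    if strategy == "cos":
        return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))
    if strategy == "linear":
        return start + (end - start) * pct
    raise ValueError(f"unknown anneal_strategy: {strategy!r}")


def one_cycle_lr(step, max_lr, total_steps, pct_start=0.3,
                 anneal_strategy="cos", div_factor=25.0,
                 final_div_factor=1e4):
    """Hypothetical helper: learning rate at a 0-based step index."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    # Assumed phase boundary; the library may round differently.
    warmup_steps = max(1, int(pct_start * total_steps))
    if step < warmup_steps:
        # Warmup: rise from initial_lr to max_lr.
        return _anneal(initial_lr, max_lr, step / warmup_steps, anneal_strategy)
    # Cooldown: fall from max_lr to min_lr, reaching it on the last step.
    cooldown_steps = max(1, total_steps - 1 - warmup_steps)
    pct = min(1.0, (step - warmup_steps) / cooldown_steps)
    return _anneal(max_lr, min_lr, pct, anneal_strategy)


# Usage: print a few points of a 100-step schedule.
for step in (0, 15, 30, 65, 99):
    print(step, one_cycle_lr(step, max_lr=0.1, total_steps=100))
```

With the defaults, the printed values should start near max_lr / 25, peak at max_lr around the end of warmup, and finish several orders of magnitude below the starting value.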
Parameters
optimizer : Optimizer
max_lr : float
total_steps : int
pct_start : float = 0.3
    Fraction of total_steps devoted to the warmup phase (default: 0.3).
anneal_strategy : str = 'cos'
    "cos" for cosine annealing or "linear" for linear annealing (default: "cos").
div_factor : float = 25.0
    Sets the initial learning rate to max_lr / div_factor (default: 25.0).
final_div_factor : float = 10000.0
    Sets the minimum learning rate to initial_lr / final_div_factor (default: 1e4).
last_epoch : int = -1
    Default: -1.
verbose : bool = False
    Default: False.
Attributes
max_lr : float
total_steps : int
pct_start : float
anneal_strategy : str
    "cos" or "linear".
div_factor : float
final_div_factor : float
Notes
Call scheduler.step() once per training step (per batch), not once per epoch, so that the schedule advances through all total_steps steps. The 1cycle policy is designed for super-convergence and often allows training with much larger learning rates than standard schedules.
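To illustrate why the stepping frequency matters, the arithmetic below compares how far through the schedule a run gets when the scheduler is stepped per batch versus per epoch. The batch and epoch counts are made-up values, and the warmup boundary `int(pct_start * total_steps)` is an assumption about how the phase length is computed.

```python
batches_per_epoch = 500  # hypothetical len(dataloader)
epochs = 10              # hypothetical number of epochs
pct_start = 0.3

total_steps = batches_per_epoch * epochs
warmup_steps = int(pct_start * total_steps)  # assumed boundary

steps_if_per_batch = batches_per_epoch * epochs  # scheduler.step() in the inner loop
steps_if_per_epoch = epochs                      # scheduler.step() in the outer loop

print(f"total_steps={total_steps}, warmup_steps={warmup_steps}")
print(f"per-batch coverage: {steps_if_per_batch / total_steps:.1%}")
print(f"per-epoch coverage: {steps_if_per_epoch / total_steps:.1%}")
# Stepping per epoch never leaves the warmup phase in this setup.
print("per-epoch run stays in warmup:", steps_if_per_epoch < warmup_steps)
```

In this setup, stepping once per epoch advances the schedule only 10 steps out of 5000, so the learning rate would still be near its initial value when training ends.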
Examples
>>> import lucid.optim as optim
>>> optimizer = optim.SGD(model.parameters(), lr=0.1)
>>> scheduler = optim.OneCycleLR(
...     optimizer, max_lr=0.1, total_steps=len(dataloader) * epochs
... )
>>> for epoch in range(epochs):
...     for batch in dataloader:
...         train_step(batch)
...         optimizer.step()
...         scheduler.step()  # once per batch, not once per epoch
Methods (2)
__init__
__init__(optimizer: Optimizer, max_lr: float, total_steps: int, pct_start: float = 0.3, anneal_strategy: str = 'cos', div_factor: float = 25.0, final_div_factor: float = 10000.0, last_epoch: int = -1, verbose: bool = False) → None
Initialise the OneCycleLR scheduler. See the class docstring for parameter semantics.
get_lr
get_lr() → list[float]
Compute the learning rate for each parameter group at the current step.
Returns
list[float]
    One learning rate per parameter group, derived from the schedule documented in the class docstring.