lr_scheduler.NoamScheduler¶
- class lucid.optim.lr_scheduler.NoamScheduler(optimizer: Optimizer, model_size: int, warmup_steps: int, factor: float = 1.0, last_epoch: int = -1, verbose: bool = False)¶
The NoamScheduler implements the warmup + inverse square-root decay strategy popularized by the Transformer. It gradually increases the learning rate during the warmup window and then decays it proportionally to \(1 / \sqrt{t}\).
Class Signature¶
class NoamScheduler(
    optimizer: Optimizer,
    model_size: int,
    warmup_steps: int,
    factor: float = 1.0,
    last_epoch: int = -1,
    verbose: bool = False,
)
Parameters¶
optimizer (Optimizer): The optimizer whose learning rate is controlled.
model_size (int): The model dimensionality, typically the Transformer hidden size \(d_{\text{model}}\). The schedule is scaled by \(\text{model\_size}^{-0.5}\).
warmup_steps (int): Number of steps over which to linearly ramp up the learning rate.
factor (float, optional): Global scaling factor for the learning rate curve. Default: 1.0.
last_epoch (int, optional): Index of the last epoch when resuming training. Default: -1.
verbose (bool, optional): If True, logs learning rate changes every step. Default: False.
Mathematical Formula¶
The learning rate at step \(t\) is:

\[\eta_t = \text{factor} \cdot \text{model\_size}^{-0.5} \cdot \min\!\left(t^{-0.5},\; t \cdot \text{warmup\_steps}^{-1.5}\right)\]

Where:
- \(t\) is the current step (1-indexed).
- \(\eta_t\) is the scaled learning rate factor applied to each parameter group.
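For reference, the curve can be evaluated directly from the formula above. The snippet below is an illustrative standalone helper, not part of the lucid API; the name noam_lr is chosen for this sketch only.

def noam_lr(step: int, model_size: int, warmup_steps: int, factor: float = 1.0) -> float:
    """Noam learning-rate value at a given 1-indexed step (illustrative sketch)."""
    step = max(step, 1)  # guard against a zero step
    return factor * model_size ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# The peak occurs at step == warmup_steps; with model_size=512, warmup_steps=4000:
print(noam_lr(4000, model_size=512, warmup_steps=4000))  # ≈ 7.0e-4

During warmup the min() selects the linear term \(t \cdot \text{warmup\_steps}^{-1.5}\); afterwards it selects \(t^{-0.5}\), giving the inverse square-root decay.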
Methods¶
get_lr() -> list[float]: Returns the scaled learning rates for each optimizer parameter group.
step(epoch: Optional[int] = None) -> None: Advances the scheduler, updating optimizer learning rates.
Usage Example¶
import lucid.optim as optim
from lucid.optim.lr_scheduler import NoamScheduler

# 'model' is assumed to be defined elsewhere.
# lr=1.0 so the Noam schedule determines the effective learning rate directly.
optimizer = optim.Adam(model.parameters(), lr=1.0)
scheduler = NoamScheduler(
    optimizer,
    model_size=512,
    warmup_steps=4000,
    factor=2.0,
)

for step in range(1, 10001):
    optimizer.step()
    scheduler.step()
    if step % 1000 == 0:
        print(f"Step {step}, Learning Rate: {scheduler.last_lr}")
Note
Noam scheduling is effective for Transformer-style architectures where large model dimensions benefit from warmup to stabilize early training.