SGD
Optimizer
SGD(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float, momentum: float = 0, dampening: float = 0, weight_decay: float = 0, nesterov: bool = False)
Stochastic Gradient Descent optimizer with optional momentum and weight decay.
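Implements the classic SGD update rule. Let $\theta_t$ denote the parameters at step $t$, $g_t = \nabla_\theta L(\theta_t)$ the gradient of the loss, and $\eta$ the learning rate (lr). Without momentum the update is:
$$\theta_{t+1} = \theta_t - \eta\, g_t$$
With momentum (Polyak momentum), a velocity buffer $v_t$ is maintained and the update becomes:
$$v_{t+1} = \mu\, v_t + (1 - \tau)\, g_t, \qquad \theta_{t+1} = \theta_t - \eta\, v_{t+1}$$
where $\mu$ is the momentum factor (momentum) and $\tau$ is the dampening coefficient (dampening). With Nesterov momentum the gradient is effectively evaluated at the lookahead position. Written in terms of the gradient at the current parameters (dampening must be 0), the update is:
$$v_{t+1} = \mu\, v_t + g_t, \qquad \theta_{t+1} = \theta_t - \eta\,(g_t + \mu\, v_{t+1})$$
L2 weight decay with coefficient $\lambda$ (weight_decay) adds $\lambda\,\theta_t$ to the gradient before the momentum step:
$$g_t \leftarrow g_t + \lambda\,\theta_t$$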
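The update rules above can be sketched in pure Python as follows. SGDSketch is a hypothetical class, not lucid's implementation: it works on a flat list of floats, takes gradients explicitly, and assumes the velocity buffer starts at zero. It applies weight decay first, then momentum with dampening, then the optional Nesterov correction, and rejects nesterov=True unless momentum > 0 and dampening == 0.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SGDSketch:
    """Hypothetical pure-Python SGD on a flat list of float parameters."""

    lr: float
    momentum: float = 0.0
    dampening: float = 0.0
    weight_decay: float = 0.0
    nesterov: bool = False
    # Velocity buffer v_t; created lazily and assumed to start at zero.
    velocity: Optional[List[float]] = field(default=None, init=False)

    def __post_init__(self) -> None:
        # Constraint stated for nesterov=True in the parameter list.
        if self.nesterov and (self.momentum <= 0 or self.dampening != 0):
            raise ValueError("nesterov=True requires momentum > 0 and dampening == 0")

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        """Return the updated parameters after one SGD step."""
        if self.velocity is None:
            self.velocity = [0.0] * len(params)
        updated = []
        for i, (theta, g) in enumerate(zip(params, grads)):
            # L2 weight decay: g_t <- g_t + lambda * theta_t
            g = g + self.weight_decay * theta
            if self.momentum != 0:
                # v_{t+1} = mu * v_t + (1 - tau) * g_t
                v = self.momentum * self.velocity[i] + (1 - self.dampening) * g
                self.velocity[i] = v
                # Nesterov steps along g_t + mu * v_{t+1}; classical along v_{t+1}.
                direction = g + self.momentum * v if self.nesterov else v
            else:
                direction = g
            # theta_{t+1} = theta_t - eta * direction
            updated.append(theta - self.lr * direction)
        return updated


# Usage: two classical-momentum steps on a single parameter with gradient 1.0.
opt = SGDSketch(lr=0.1, momentum=0.9)
params = [1.0]
for _ in range(2):
    params = opt.step(params, grads=[1.0])
print(params)
```

Starting the buffer at zero keeps the sketch term-by-term faithful to the momentum equation. Some libraries (PyTorch, for example) instead initialise the buffer to the first gradient, which only makes a difference when dampening is non-zero.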
Parameters
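params : iterable of Parameter or iterable of dict
    Parameters to optimise, or dicts that define parameter groups.
lr : float
    Learning rate.
momentum : float = 0
    Momentum factor. Set to a value such as 0.9 to enable momentum.
dampening : float = 0
    Dampening applied to the gradient in the momentum update. Has no effect when momentum=0.
weight_decay : float = 0
    L2 penalty coefficient.
nesterov : bool = False
    If True, use Nesterov momentum. Requires momentum > 0 and dampening == 0.
Attributes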
param_groups : list of dict
    Parameter groups. Each dict contains the keys "params", "lr",
    "momentum", "dampening", "weight_decay", and "nesterov".
defaults : dict
Notes
SGD with momentum is a widely used default for training image classifiers. Nesterov momentum often converges faster than classical momentum because its update includes a correction based on where the parameters will be after the momentum step.
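The Nesterov claim can be checked on a toy problem. The sketch below, with assumed hyperparameters (lr=0.1, momentum=0.9, no dampening or weight decay), minimises f(x) = x²/2 from x = 1 with classical and with Nesterov momentum and reports the largest |x| over the last 10 of 100 steps. It is one illustrative case, not a general proof; run_momentum and tail_peak are hypothetical helper names.

```python
from typing import List


def run_momentum(nesterov: bool, steps: int = 100, lr: float = 0.1,
                 momentum: float = 0.9) -> List[float]:
    """Minimise f(x) = x**2 / 2 from x = 1; return the iterates x_0..x_steps."""
    x, v = 1.0, 0.0
    trajectory = [x]
    for _ in range(steps):
        g = x  # gradient of x**2 / 2
        v = momentum * v + g  # velocity update, no dampening
        # Nesterov steps along g + momentum * v; classical momentum along v.
        x -= lr * (g + momentum * v if nesterov else v)
        trajectory.append(x)
    return trajectory


def tail_peak(trajectory: List[float], n: int = 10) -> float:
    """Largest |x| over the last n iterates, to smooth out oscillation."""
    return max(abs(x) for x in trajectory[-n:])


classical = run_momentum(nesterov=False)
nesterov = run_momentum(nesterov=True)
print("classical:", tail_peak(classical))
print("nesterov: ", tail_peak(nesterov))
```

On this quadratic the Nesterov iterate ends much closer to the minimum: the error contracts by a factor of 0.9 per step, versus about 0.95 (the square root of 0.9) for classical momentum.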
Examples
>>> import lucid.optim as optim
>>> optimizer = optim.SGD(
... model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4
... )
>>> optimizer.zero_grad()
>>> loss.backward()
>>> optimizer.step()
Methods (2)
__init__(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float, momentum: float = 0, dampening: float = 0, weight_decay: float = 0, nesterov: bool = False) → None
Initialise the SGD optimizer. See the class docstring for parameter semantics.
step(closure: _OptimizerClosure = None) → Tensor | None
Perform a single SGD step.
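The closure parameter and the Tensor | None return type suggest the PyTorch convention, in which step calls the closure to recompute the loss and gradients and returns that loss (or None when no closure is given). The pure-Python sketch below illustrates that convention; whether lucid's step behaves exactly this way is an assumption, and ScalarParam, TinySGD, and closure are hypothetical names.

```python
from typing import Callable, List, Optional


class ScalarParam:
    """Hypothetical stand-in for a Parameter: one float value and its gradient."""

    def __init__(self, value: float) -> None:
        self.value = value
        self.grad: Optional[float] = None


class TinySGD:
    """Hypothetical plain SGD (no momentum) with a PyTorch-style closure argument."""

    def __init__(self, params: List[ScalarParam], lr: float) -> None:
        self.params = params
        self.lr = lr

    def step(self, closure: Optional[Callable[[], float]] = None) -> Optional[float]:
        loss = None
        if closure is not None:
            # The closure recomputes the loss and refreshes each parameter's grad.
            loss = closure()
        for p in self.params:
            if p.grad is not None:
                p.value -= self.lr * p.grad
        # Loss at the parameters before this update, or None without a closure.
        return loss


w = ScalarParam(3.0)
opt = TinySGD([w], lr=0.1)


def closure() -> float:
    # Loss L(w) = (w - 1)**2 with its analytic gradient 2 * (w - 1).
    w.grad = 2.0 * (w.value - 1.0)
    return (w.value - 1.0) ** 2


for _ in range(50):
    loss = opt.step(closure)
print(w.value, loss)
```

Passing the loss computation as a closure lets optimizers that need several function evaluations per step call it repeatedly; plain SGD needs only one call.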