LBFGS
Optimizer
LBFGS(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 1.0, max_iter: int = 20, max_eval: int = 25, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, history_size: int = 100, line_search_fn: str | None = 'strong_wolfe')
Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) optimizer.
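L-BFGS is a quasi-Newton method that approximates the inverse Hessian using a limited history of gradient and parameter difference vectors. At each step it computes a search direction via the two-loop recursion:
$$d_k = -H_k \nabla f(x_k)$$
where $H_k$ is the L-BFGS inverse-Hessian approximation built from the last history_size curvature pairs $(s_i, y_i)$:
$$s_i = x_{i+1} - x_i, \qquad y_i = \nabla f(x_{i+1}) - \nabla f(x_i)$$
The diagonal scaling of the initial matrix $H_k^0$ is initialised as:
$$H_k^0 = \gamma_k I, \qquad \gamma_k = \frac{s_{k-1}^\top y_{k-1}}{y_{k-1}^\top y_{k-1}}$$
A back-tracking Armijo line search finds a step size $\alpha_k$ that satisfies the sufficient-decrease condition:
$$f(x_k + \alpha_k d_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)^\top d_k$$
with a constant $c_1 \in (0, 1)$.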
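The two-loop recursion can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not lucid's implementation: the names `dot` and `two_loop_direction`, the list-based vectors, and the fallback scaling of 1.0 for an empty history are all hypothetical choices made for the example.

```python
def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def two_loop_direction(grad, history):
    """Return d = -H g for a list of (s, y) curvature pairs, oldest first.

    Assumes every pair satisfies y.s > 0 (positive curvature).
    """
    q = list(grad)
    alphas = []
    # First loop: walk the history from newest to oldest.
    for s, y in reversed(history):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        alphas.append(alpha)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]

    # Initial diagonal scaling gamma = s.y / y.y from the newest pair
    # (assumption: use 1.0 when no history is available yet).
    if history:
        s, y = history[-1]
        gamma = dot(s, y) / dot(y, y)
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]

    # Second loop: walk the history from oldest to newest.
    for (s, y), alpha in zip(history, reversed(alphas)):
        rho = 1.0 / dot(y, s)
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]

    # r now holds H g; the search direction is its negation.
    return [-ri for ri in r]
```

With an empty history the sketch reduces to steepest descent. With a single pair it satisfies the secant condition, so applying it to `y` returns `-s`.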
Parameters
params : iterable of Parameter or iterable of dict
lr : float = 1.0
    (default: 1.0).
max_iter : int = 20
    Maximum number of iterations per step call (default: 20).
max_eval : int = 25
    Maximum number of function evaluations per step call (default: 25).
tolerance_grad : float = 1e-07
    (default: 1e-7).
tolerance_change : float = 1e-09
    (default: 1e-9).
history_size : int = 100
    Number of curvature pairs kept in the history (default: 100).
line_search_fn : str or None = 'strong_wolfe'
    "strong_wolfe" (back-tracking Armijo) and None (fixed step) are recognised (default: "strong_wolfe").
Attributes
param_groups : list of dict
defaults : dict
Notes
Unlike first-order optimizers, L-BFGS requires a closure argument
in step that clears gradients, computes the loss, and calls
loss.backward(). Without a closure the method raises
ValueError.
L-BFGS is best suited for full-batch or large-batch training where the curvature information is reliable. It is not recommended for stochastic mini-batch training because noisy gradients corrupt the Hessian approximation.
Examples
>>> import lucid.optim as optim
>>> # model, criterion, x and y are assumed to be defined elsewhere
>>> optimizer = optim.LBFGS(model.parameters(), lr=1.0, max_iter=20)
>>> def closure():
...     optimizer.zero_grad()
...     loss = criterion(model(x), y)
...     loss.backward()
...     return loss
>>> optimizer.step(closure)
Methods (3)
__init__
__init__(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 1.0, max_iter: int = 20, max_eval: int = 25, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, history_size: int = 100, line_search_fn: str | None = 'strong_wolfe') → None
Initialise the LBFGS optimizer. See the class docstring for parameter semantics.
zero_grad
zero_grad(set_to_none: bool = True) → None
Set gradients of all parameters to None.
L-BFGS always sets gradients to None regardless of the
set_to_none argument, because the closure passed to
step is responsible for zeroing and recomputing gradients
on each function evaluation.
Parameters
set_to_none : bool = True
    Ignored by L-BFGS, which always sets gradients to None (default: True).
Examples
>>> def closure():
...     optimizer.zero_grad()
...     loss = criterion(model(x), y)
...     loss.backward()
...     return loss
>>> optimizer.step(closure)
step
step(closure: _OptimizerClosure = None) → Tensor
Perform a single L-BFGS optimisation step.
Computes the L-BFGS search direction using the two-loop recursion, performs a back-tracking Armijo line search to find an acceptable step size, updates all parameters, and then appends the newest pair $(s_k, y_k)$ to the curvature history.
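The history update at the end of a step can be sketched with a bounded queue. This is an illustration only: the helper name `update_history`, the `eps` threshold, and the rule of skipping pairs without positive curvature are assumptions (a common safeguard in L-BFGS implementations), not behaviour confirmed for this class.

```python
from collections import deque


def update_history(history, x_old, x_new, g_old, g_new, eps=1e-10):
    """Append the pair (s, y) to history if it has positive curvature.

    history is a deque whose maxlen plays the role of history_size, so
    the oldest pair is dropped automatically once the limit is reached.
    Returns True if the pair was stored, False if it was skipped.
    """
    s = [b - a for a, b in zip(x_old, x_new)]   # parameter difference
    y = [b - a for a, b in zip(g_old, g_new)]   # gradient difference
    ys = sum(yi * si for yi, si in zip(y, s))
    if ys > eps:  # assumed safeguard: keep only pairs with y.s > 0
        history.append((s, y))
        return True
    return False


# Example: a history limited to two pairs.
history = deque(maxlen=2)
update_history(history, [0.0], [1.0], [0.0], [2.0])
```

Using `deque(maxlen=...)` keeps memory bounded without explicit index bookkeeping, which is the property the "limited-memory" name refers to.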
Parameters
closure : callable = None
    A callable that:
    - Calls optimizer.zero_grad() to clear stale gradients.
    - Runs the forward pass and computes the scalar loss.
    - Calls loss.backward() to populate gradients.
    - Returns the loss tensor.
    Passing None raises ValueError.
Returns
Tensor
    The loss value at the final parameter position after the line search.
Raises
ValueError
    If closure is None.
Notes
The closure may be called multiple times per step call
(up to max_eval times) during the line search. Ensure that
any side effects (e.g. batch norm running stats) are handled
appropriately if this matters for your use-case.
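To see why the closure can run several times in one step, here is a minimal back-tracking Armijo sketch that counts function evaluations. The function name `backtracking_armijo`, the constant `c1=1e-4`, the shrink factor of 0.5, and the way evaluations are counted are assumptions chosen for illustration; the actual values used by this optimizer are not stated here.

```python
def backtracking_armijo(f, x, grad, direction, alpha=1.0, c1=1e-4,
                        shrink=0.5, max_eval=25):
    """Shrink alpha until the sufficient-decrease condition holds.

    f plays the role of the closure: every call is one function
    evaluation. Returns (alpha, evals). If the evaluation budget runs
    out, the last tried alpha is returned even if it does not satisfy
    the condition.
    """
    fx = f(x)
    evals = 1
    # Directional derivative g.d; negative for a descent direction.
    slope = sum(g * d for g, d in zip(grad, direction))
    while True:
        trial = [xi + alpha * di for xi, di in zip(x, direction)]
        f_trial = f(trial)
        evals += 1
        if f_trial <= fx + c1 * alpha * slope or evals >= max_eval:
            return alpha, evals
        alpha *= shrink


# f(x) = x^2 at x = 1 with the steepest-descent direction -grad.
alpha, evals = backtracking_armijo(lambda v: v[0] ** 2, [1.0], [2.0], [-2.0])
```

In this toy case the full step overshoots to x = -1, so one halving is needed and the function is evaluated three times in total.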
Examples
>>> def closure():
...     optimizer.zero_grad()
...     output = model(x)
...     loss = criterion(output, y)
...     loss.backward()
...     return loss
>>> optimizer.step(closure)