class

LBFGS

extends Optimizer
LBFGS(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 1.0, max_iter: int = 20, max_eval: int = 25, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, history_size: int = 100, line_search_fn: str | None = 'strong_wolfe')

Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) optimizer.

L-BFGS is a quasi-Newton method that approximates the inverse Hessian using a limited history of gradient and parameter difference vectors. At each step it computes a search direction $d_t$ via the two-loop recursion:

$$d_t = -H_t^{-1} \nabla L(\theta_t)$$

where $H_t^{-1}$ is the L-BFGS approximation to the inverse Hessian, built from the last history_size curvature pairs $\{(s_k, y_k)\}_{k=t-m}^{t-1}$ (with $m$ equal to history_size):

$$\begin{aligned} s_k &= \theta_{k+1} - \theta_k \\ y_k &= \nabla L(\theta_{k+1}) - \nabla L(\theta_k) \end{aligned}$$

The diagonal scaling of $H_t^{-1}$ is initialised as:

$$H_{\text{diag}} = \frac{s_{t-1}^\top y_{t-1}}{y_{t-1}^\top y_{t-1}}$$
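The two-loop recursion described above can be sketched in plain NumPy. This is an illustrative sketch, not this library's implementation: the function name `two_loop_direction` and the use of a `deque` to hold the curvature pairs are assumptions made for the example.

```python
from collections import deque

import numpy as np


def two_loop_direction(grad, pairs):
    """Return d = -H^{-1} grad using stored (s, y) pairs, oldest first.

    Hypothetical helper for illustration only.
    """
    q = np.asarray(grad, dtype=float).copy()
    coeffs = []
    # First loop: walk the history from the newest pair to the oldest.
    for s, y in reversed(pairs):
        rho = 1.0 / np.dot(y, s)
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        coeffs.append((alpha, rho))
    # Initial diagonal scaling H_diag taken from the most recent pair.
    if pairs:
        s_last, y_last = pairs[-1]
        q *= np.dot(s_last, y_last) / np.dot(y_last, y_last)
    # Second loop: walk the history from the oldest pair to the newest.
    for (s, y), (alpha, rho) in zip(pairs, reversed(coeffs)):
        beta = rho * np.dot(y, q)
        q += (alpha - beta) * s
    return -q


# Quadratic loss L(theta) = 0.5 * theta^T A theta, so y_k = A s_k exactly.
A = np.array([[2.0, 0.0], [0.0, 5.0]])
pairs = deque(maxlen=2)  # maxlen plays the role of history_size
for s in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    pairs.append((s, A @ s))

d = two_loop_direction(np.array([2.0, 5.0]), pairs)
```

On this small quadratic the two stored pairs span the whole space, so the resulting direction coincides with the exact Newton direction $-A^{-1} \nabla L$. A bounded `deque` discards the oldest pair automatically once the history is full.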

A back-tracking Armijo line search finds a step size $\alpha$ that satisfies the sufficient-decrease condition:

$$L(\theta_t + \alpha d_t) \le L(\theta_t) + c_1 \alpha \, \nabla L(\theta_t)^\top d_t$$

with $c_1 = 10^{-4}$.
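A back-tracking search for the condition above might look like the following. This is a minimal sketch under stated assumptions: the function name `armijo_backtracking`, the halving factor `shrink=0.5`, and the way `max_eval` caps the trial evaluations are illustrative choices and are not taken from this library.

```python
import numpy as np


def armijo_backtracking(loss_fn, theta, grad, direction,
                        lr=1.0, c1=1e-4, shrink=0.5, max_eval=25):
    """Shrink alpha from lr until the sufficient-decrease test passes.

    Hypothetical helper; shrink=0.5 is an assumed back-tracking factor.
    """
    loss0 = loss_fn(theta)
    # Directional derivative; negative when direction is a descent direction.
    slope = float(np.dot(grad, direction))
    alpha = lr
    for _ in range(max_eval):
        trial = loss_fn(theta + alpha * direction)
        if trial <= loss0 + c1 * alpha * slope:
            return alpha
        alpha *= shrink
    return alpha  # evaluation budget exhausted; return the last trial size


def loss_fn(t):
    # L(theta) = ||theta||^2, with gradient 2 * theta.
    return float(np.dot(t, t))


theta = np.array([1.0])
grad = 2.0 * theta
alpha = armijo_backtracking(loss_fn, theta, grad, -grad)
```

Here the full step `alpha = 1.0` overshoots the minimum and fails the test, so the search halves the step once before accepting it.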

Parameters

params: iterable of Parameter or iterable of dict
Parameters to optimise.
lr: float = 1.0
Initial step size for the line search (default: 1.0).
max_iter: int = 20
Maximum number of L-BFGS iterations per step call (default: 20).
max_eval: int = 25
Maximum number of closure evaluations per step call (default: 25).
tolerance_grad: float = 1e-07
Gradient-norm convergence threshold; optimisation stops when $\|\nabla L\|_2 \le \text{tolerance\_grad}$ (default: 1e-7).
tolerance_change: float = 1e-09
Parameter-change convergence threshold (default: 1e-9).
history_size: int = 100
Number of $(s, y)$ curvature pairs retained in memory (default: 100).
line_search_fn: str or None = 'strong_wolfe'
Line search strategy. Currently "strong_wolfe" (back-tracking Armijo) and None (fixed step) are recognised (default: "strong_wolfe").

Attributes

param_groups: list of dict
Single parameter group containing all parameters.
defaults: dict
Default hyperparameter values.

Notes

Unlike first-order optimizers, L-BFGS requires a closure argument in step that clears gradients, computes the loss, and calls loss.backward(). Without a closure the method raises ValueError.

L-BFGS is best suited for full-batch or large-batch training where the curvature information is reliable. It is not recommended for stochastic mini-batch training because noisy gradients corrupt the Hessian approximation.

Examples

>>> import lucid.optim as optim
>>> optimizer = optim.LBFGS(model.parameters(), lr=1.0, max_iter=20)
>>> def closure():
...     optimizer.zero_grad()
...     loss = criterion(model(x), y)
...     loss.backward()
...     return loss
>>> optimizer.step(closure)

Methods (3)

dunder

__init__

None
__init__(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 1.0, max_iter: int = 20, max_eval: int = 25, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, history_size: int = 100, line_search_fn: str | None = 'strong_wolfe')

Initialise the LBFGS optimizer. See the class docstring for parameter semantics.

fn

zero_grad

None
zero_grad(set_to_none: bool = True)

Set gradients of all parameters to None.

L-BFGS always sets gradients to None regardless of the set_to_none argument, because the closure passed to step is responsible for zeroing and recomputing gradients on each function evaluation.

Parameters

set_to_none: bool = True
Ignored; kept for API compatibility with Optimizer (default: True).

Examples

>>> def closure():
...     optimizer.zero_grad()
...     loss = criterion(model(x), y)
...     loss.backward()
...     return loss
>>> optimizer.step(closure)
fn

step

Tensor
step(closure: _OptimizerClosure = None)

Perform a single L-BFGS optimisation step.

Computes the L-BFGS search direction using the two-loop recursion, performs a back-tracking Armijo line search to find an acceptable step size, updates all parameters, and then updates the curvature history $(s, y)$.

Parameters

closure: callable = None
A zero-argument callable that:
  1. Calls optimizer.zero_grad() to clear stale gradients.
  2. Runs the forward pass and computes the scalar loss.
  3. Calls loss.backward() to populate gradients.
  4. Returns the loss tensor.
This argument is required — passing None raises ValueError.

Returns

Tensor

The loss value at the final parameter position after the line search.

Raises

ValueError
If closure is None.

Notes

The closure may be called multiple times per step call (up to max_eval times) during the line search. Ensure that any side effects (e.g. batch norm running stats) are handled appropriately if this matters for your use-case.
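One way to see how often a closure is evaluated is to wrap it in a counter. The sketch below is pure Python and does not touch the optimizer; the `counting` helper is hypothetical, and the lambda is only a stand-in for a real closure.

```python
def counting(closure):
    """Wrap a closure so each call is tallied on the wrapper (hypothetical helper)."""
    def wrapped():
        wrapped.calls += 1
        return closure()

    wrapped.calls = 0
    return wrapped


# Stand-in closure; a real one would zero gradients, compute the loss,
# call loss.backward(), and return the loss.
counted = counting(lambda: 0.0)
for _ in range(3):
    counted()
print(counted.calls)
```

Passing a wrapped closure to step and reading `calls` afterwards would show how many evaluations that step used, which helps when judging the impact of side effects inside the closure.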

Examples

>>> def closure():
...     optimizer.zero_grad()
...     output = model(x)
...     loss = criterion(output, y)
...     loss.backward()
...     return loss
>>> optimizer.step(closure)