class LBFGS extends Optimizer

LBFGS(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 1.0, max_iter: int = 20, max_eval: int = 25, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, history_size: int = 100, line_search_fn: str | None = 'strong_wolfe')

Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) optimizer.

L-BFGS is a quasi-Newton method that approximates the inverse Hessian using a limited history of gradient and parameter difference vectors. At each step it computes a search direction $d_t$ via the two-loop recursion:

d_t = -H_t^{-1} \nabla L(\theta_t)

where $H_t^{-1}$ is the L-BFGS approximation to the inverse Hessian, built from the last $m$ = history_size curvature pairs $\{(s_k, y_k)\}_{k=t-m}^{t-1}$:

\begin{aligned} s_k &= \theta_{k+1} - \theta_k \\ y_k &= \nabla L(\theta_{k+1}) - \nabla L(\theta_k) \end{aligned}

The diagonal scaling of $H_t^{-1}$ is initialised as:

H_{\text{diag}} = \frac{s_{t-1}^\top y_{t-1}}{y_{t-1}^\top y_{t-1}}
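The two-loop recursion and the $H_{\text{diag}}$ scaling above can be sketched in plain Python. This is a minimal illustration on list-based vectors, not the library's implementation: the helper names `dot` and `two_loop_direction` are hypothetical, and the sketch assumes every stored pair satisfies $s_k^\top y_k > 0$.

```python
from collections import deque


def dot(a, b):
    """Inner product of two equal-length vectors stored as lists."""
    return sum(x * y for x, y in zip(a, b))


def two_loop_direction(grad, history):
    """Return d = -H^{-1} grad, with (s, y) pairs ordered oldest to newest.

    Hypothetical helper for illustration only.
    """
    q = list(grad)
    coeffs = []
    # First loop: walk the history from the newest pair to the oldest.
    for s, y in reversed(history):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        coeffs.append((rho, alpha))
    # Initial scaling H_diag = s^T y / y^T y from the most recent pair.
    if history:
        s_last, y_last = history[-1]
        h_diag = dot(s_last, y_last) / dot(y_last, y_last)
    else:
        h_diag = 1.0  # no curvature information yet: plain negative gradient
    r = [h_diag * qi for qi in q]
    # Second loop: walk the history from the oldest pair to the newest.
    for (s, y), (rho, alpha) in zip(history, reversed(coeffs)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]


# Quadratic L(theta) = 0.5 * theta^T A theta with A = diag(2, 10).
# The pairs below satisfy y = A s, and the gradient is taken at theta = (2, 2).
history = deque([([1.0, 0.0], [2.0, 0.0]), ([0.0, 1.0], [0.0, 10.0])], maxlen=100)
d = two_loop_direction([4.0, 20.0], history)
```

For this quadratic the two stored pairs span both coordinate axes, so `d` matches the Newton direction $-A^{-1} \nabla L = (-2, -2)$. With an empty history the sketch falls back to the negative gradient.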

A back-tracking Armijo line search finds a step size $\alpha$ that satisfies the sufficient-decrease condition:

L(\theta_t + \alpha d_t) \le L(\theta_t) + c_1 \alpha \, \nabla L(\theta_t)^\top d_t

with $c_1 = 10^{-4}$ .
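The sufficient-decrease test can be sketched as a short back-tracking loop. This is an illustration under stated assumptions rather than the library's code: the function name `backtracking_armijo`, the halving factor `shrink=0.5`, and the choice to return `None` when the evaluation budget runs out are all hypothetical.

```python
def backtracking_armijo(loss_fn, theta, direction, grad,
                        alpha=1.0, c1=1e-4, shrink=0.5, max_eval=25):
    """Shrink alpha until the Armijo condition holds (hypothetical helper).

    Returns the accepted step size, or None if max_eval trials all fail.
    """
    f0 = loss_fn(theta)
    # Directional derivative grad^T d; negative for a descent direction.
    slope = sum(g * d for g, d in zip(grad, direction))
    for _ in range(max_eval):
        trial = [t + alpha * d for t, d in zip(theta, direction)]
        if loss_fn(trial) <= f0 + c1 * alpha * slope:
            return alpha
        alpha *= shrink  # assumed shrink factor; the source does not state one
    return None


def loss(theta):
    """L(theta) = theta^2 in one dimension."""
    return theta[0] ** 2


# Start at theta = 1 with the steepest-descent direction d = -grad = -2.
alpha = backtracking_armijo(loss, [1.0], [-2.0], [2.0])
```

Here the full step `alpha = 1.0` overshoots to $\theta = -1$ and leaves the loss unchanged, so it is rejected; the halved step lands on the minimiser and is accepted, giving `alpha == 0.5`.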

Parameters

params : iterable of Parameter or iterable of dict

Parameters to optimise.

lr : float = 1.0

Initial step size for the line search (default: 1.0).

max_iter : int = 20

Maximum number of L-BFGS iterations per step call (default: 20).

max_eval : int = 25

Maximum number of closure evaluations per step call (default: 25).

tolerance_grad : float = 1e-07

Gradient-norm convergence threshold; optimisation stops when $\|\nabla L\|_2 \le \text{tolerance\_grad}$ (default: 1e-7).

tolerance_change : float = 1e-09

Parameter-change convergence threshold (default: 1e-9).

history_size : int = 100

Number of $(s, y)$ curvature pairs retained in memory (default: 100).

line_search_fn : str or None = 'strong_wolfe'

Line search strategy. Currently "strong_wolfe" (back-tracking Armijo) and None (fixed step) are recognised (default: "strong_wolfe").

Attributes

param_groups : list of dict

Single parameter group containing all parameters.

defaults : dict

Default hyperparameter values.

Notes

Unlike first-order optimizers, L-BFGS requires a closure argument in step that clears gradients, computes the loss, and calls loss.backward(). Without a closure the method raises ValueError.

L-BFGS is best suited for full-batch or large-batch training where the curvature information is reliable. It is not recommended for stochastic mini-batch training because noisy gradients corrupt the Hessian approximation.
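A small numeric sketch of why gradient noise is harmful: quasi-Newton updates rely on the curvature condition $s^\top y > 0$ to keep the inverse-Hessian approximation positive definite, and noise in the gradient difference can flip its sign. The noise value below is a made-up number chosen for illustration, and whether the library checks this condition is not stated in the source.

```python
def curvature(s, y):
    """Return s^T y for list-based vectors (hypothetical helper)."""
    return sum(si * yi for si, yi in zip(s, y))


# L(theta) = theta^2 has gradient 2 * theta.
# Take one small step from theta = 1.00 to theta = 0.99.
s = [0.99 - 1.00]
y_exact = [2 * 0.99 - 2 * 1.00]

# Hypothetical mini-batch noise added to the gradient difference.
noise = 0.05
y_noisy = [y_exact[0] + noise]

exact_ok = curvature(s, y_exact) > 0   # full-batch pair has positive curvature
noisy_ok = curvature(s, y_noisy) > 0   # the noisy pair has the wrong sign
```

With exact gradients the pair reports positive curvature, as expected for a convex loss. The noisy pair reports negative curvature even though the loss is convex, which is the kind of corrupted history the note above warns about.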

Examples

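>>> import lucid.optim as optim
>>> # Assumes model, criterion, x, and y are already defined.
>>> optimizer = optim.LBFGS(model.parameters(), lr=1.0, max_iter=20)
>>> def closure():
...     optimizer.zero_grad()          # clear stale gradients
...     loss = criterion(model(x), y)  # forward pass
...     loss.backward()                # populate gradients
...     return loss
>>> loss = optimizer.step(closure)     # the closure may run several times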

Methods (3)

__init__

→None

__init__(params: Iterable[Parameter] | Iterable[dict[str, object]], lr: float = 1.0, max_iter: int = 20, max_eval: int = 25, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, history_size: int = 100, line_search_fn: str | None = 'strong_wolfe')

Initialise the LBFGS optimizer. See the class docstring for parameter semantics.

zero_grad

→None

zero_grad(set_to_none: bool = True)

Set gradients of all parameters to None.

L-BFGS always sets gradients to None regardless of the set_to_none argument, because the closure passed to step is responsible for zeroing and recomputing gradients on each function evaluation.

Parameters

set_to_none : bool = True

Ignored; kept for API compatibility with Optimizer (default: True).

Examples

>>> def closure():
...     optimizer.zero_grad()
...     loss = criterion(model(x), y)
...     loss.backward()
...     return loss
>>> optimizer.step(closure)

step

→Tensor

step(closure: _OptimizerClosure = None)

Perform a single L-BFGS optimisation step.

Computes the L-BFGS search direction using the two-loop recursion, performs a back-tracking Armijo line search to find an acceptable step size, updates all parameters, and then updates the curvature history $(s, y)$ .
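The curvature-history update at the end of a step can be sketched with a bounded deque, which drops the oldest pair once history_size pairs are stored. This is an illustrative assumption about the data structure, not a description of the library's internals; `record_pair` and `grad` are hypothetical helpers.

```python
from collections import deque


def record_pair(history, theta_old, theta_new, grad_old, grad_new):
    """Append s = theta_new - theta_old and y = grad_new - grad_old."""
    s = [new - old for new, old in zip(theta_new, theta_old)]
    y = [new - old for new, old in zip(grad_new, grad_old)]
    history.append((s, y))  # a full deque discards its oldest entry


def grad(theta):
    """Gradient of L(theta) = theta^2."""
    return [2.0 * t for t in theta]


history_size = 3                      # small value so the eviction is visible
history = deque(maxlen=history_size)

theta = [8.0]
for _ in range(5):                    # five updates, each halving theta
    theta_next = [0.5 * t for t in theta]
    record_pair(history, theta, theta_next, grad(theta), grad(theta_next))
    theta = theta_next
```

After five updates only the three most recent pairs remain, so memory stays proportional to history_size times the number of parameters.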

Parameters

closure : callable = None

A zero-argument callable that:

1. Calls optimizer.zero_grad() to clear stale gradients.
2. Runs the forward pass and computes the scalar loss.
3. Calls loss.backward() to populate gradients.
4. Returns the loss tensor.

This argument is required; passing None raises ValueError.

Returns

Tensor

The loss value at the final parameter position after the line search.

Raises

ValueError

If closure is None.

Notes

The closure may be called multiple times per step call (up to max_eval times) during the line search. If the closure has side effects (e.g. updating batch-norm running statistics), make sure they remain correct when it runs more than once per step.
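To make the repeated evaluation concrete, here is a toy stand-in for a closure-driven step that counts how often the closure runs. `ToyParam` and `toy_step` are hypothetical and use steepest descent with back-tracking instead of the L-BFGS direction; they only mimic the calling pattern described above.

```python
class ToyParam:
    """Hypothetical scalar parameter holding a value and a gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = None


def toy_step(param, closure, lr=1.0, c1=1e-4, max_eval=25):
    """Hypothetical step: steepest descent with back-tracking line search."""
    loss = closure()                  # evaluation 1: loss and gradient at start
    evals = 1
    start, grad = param.value, param.grad
    direction = -grad
    alpha = lr
    while evals < max_eval:
        param.value = start + alpha * direction
        trial = closure()             # each trial step re-runs the closure
        evals += 1
        if trial <= loss + c1 * alpha * grad * direction:
            return trial, evals
        alpha *= 0.5
    param.value = start               # budget exhausted: restore the parameter
    return loss, evals


p = ToyParam(1.0)
calls = {"n": 0}                      # side effect we want to observe


def closure():
    calls["n"] += 1
    p.grad = None                     # plays the role of optimizer.zero_grad()
    loss = p.value ** 2               # forward pass
    p.grad = 2.0 * p.value            # plays the role of loss.backward()
    return loss


final_loss, evals = toy_step(p, closure)
```

A single `toy_step` call runs the closure three times here: once at the starting point, once for the rejected full step, and once for the accepted half step. Any counter or running statistic updated inside the closure advances by the same amount.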

Examples

>>> def closure():
...     optimizer.zero_grad()
...     output = model(x)
...     loss = criterion(output, y)
...     loss.backward()
...     return loss
>>> optimizer.step(closure)