fn

gradgradcheck

bool
gradgradcheck(func: Callable[..., Tensor | tuple[Tensor, ...]], inputs: Sequence[Tensor], grad_outputs: Sequence[Tensor] | None = None, eps: float = 1e-06, atol: float = 1e-05, rtol: float = 0.001, raise_exception: bool = True)

Verify second-order gradients via finite differences.

Bugs in custom lucid.autograd.Function.backward implementations often show up only at the second-derivative level: the first-order gradient is correct, but the gradient of the gradient is not. gradgradcheck constructs such a test by wrapping func in a scalar-valued helper

$$\tilde f(x) = \sum_i (\nabla f(x))_i,$$

differentiates it analytically with create_graph=True, and then runs gradcheck on $\tilde f$ so its gradient is compared against the central finite-difference estimate

$$\frac{\tilde f(x + \varepsilon e_k) - \tilde f(x - \varepsilon e_k)}{2 \varepsilon} \approx \frac{\partial \tilde f(x)}{\partial x_k} = \sum_i \frac{\partial^2 f(x)}{\partial x_i \, \partial x_k}.$$

Disagreement signals a bug in the analytic backward formula that ordinary gradcheck would miss.
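The procedure above can be sketched without lucid at all. The following is a minimal, illustrative version in plain Python, assuming a hand-written gradient in place of autograd; every name in it (`f_tilde`, `central_difference`, `second_order_agrees`, and so on) is hypothetical and not part of the library, and the tolerance rule is the common allclose-style comparison rather than a confirmed detail of gradgradcheck.

```python
# Illustrative sketch only: plain Python, no lucid. All names are hypothetical.

def grad_f(x):
    """Analytic first-order gradient of f(x) = sum_j x_j**3, i.e. 3 * x_j**2."""
    return [3.0 * v ** 2 for v in x]

def f_tilde(x):
    """Scalar helper: the sum of the entries of the analytic gradient."""
    return sum(grad_f(x))

def grad_f_tilde(x):
    """Correct analytic gradient of f_tilde: 6 * x_k."""
    return [6.0 * v for v in x]

def grad_f_tilde_buggy(x):
    """Deliberately wrong second-order formula (factor of 3 instead of 6)."""
    return [3.0 * v for v in x]

def central_difference(fn, x, eps=1e-6):
    """Estimate d fn / d x_k for every coordinate k with central differences."""
    estimates = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += eps
        x_minus[k] -= eps
        estimates.append((fn(x_plus) - fn(x_minus)) / (2.0 * eps))
    return estimates

def second_order_agrees(analytic_fn, x, eps=1e-6, atol=1e-5, rtol=1e-3):
    """Compare an analytic gradient of f_tilde with its numeric estimate."""
    numeric = central_difference(f_tilde, x, eps)
    analytic = analytic_fn(x)
    # Assumed comparison rule: |analytic - numeric| <= atol + rtol * |numeric|.
    return all(abs(a - n) <= atol + rtol * abs(n)
               for a, n in zip(analytic, numeric))

x0 = [0.5, -1.25, 2.0]
correct_passes = second_order_agrees(grad_f_tilde, x0)
buggy_passes = second_order_agrees(grad_f_tilde_buggy, x0)
```

Here the first-order gradient `grad_f` is the same in both runs, so a first-order check could not tell them apart; only the comparison against the finite-difference estimate of `f_tilde` exposes the wrong second-order formula.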

Parameters

func : callable
    Function mapping Tensor inputs to a Tensor (or tuple of Tensor). Must be twice differentiable.
inputs : sequence of Tensor
    Input tensors at which to verify the gradient. Floating dtype required.
grad_outputs : sequence of Tensor or None, default None
    Reserved for custom upstream gradients in the inner backward pass. Currently ignored; ones_like upstream gradients are always used.
eps : float, default 1e-06
    Finite-difference step size used by the underlying gradcheck.
atol : float, default 1e-05
    Absolute tolerance for the comparison.
rtol : float, default 0.001
    Relative tolerance for the comparison.
raise_exception : bool, default True
    If True, raise AssertionError on mismatch; if False, return False silently.
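How atol, rtol, and raise_exception interact can be illustrated with a small stand-alone helper. This is only a sketch: it assumes the widely used allclose-style rule `|analytic - numeric| <= atol + rtol * |numeric|`, which this page does not confirm for gradgradcheck, and the helper names `within_tolerance` and `report_mismatch` are hypothetical.

```python
# Illustrative sketch only; the comparison rule is an assumption, not lucid's code.

def within_tolerance(analytic, numeric, atol=1e-5, rtol=1e-3):
    """Assumed allclose-style test: atol sets a floor, rtol scales with |numeric|."""
    return abs(analytic - numeric) <= atol + rtol * abs(numeric)

def report_mismatch(analytic, numeric, raise_exception=True,
                    atol=1e-5, rtol=1e-3):
    """Mimic the documented contract: raise on mismatch, or return False silently."""
    if within_tolerance(analytic, numeric, atol=atol, rtol=rtol):
        return True
    if raise_exception:
        raise AssertionError(
            f"analytic={analytic!r} and numeric={numeric!r} differ beyond "
            f"atol={atol!r}, rtol={rtol!r}"
        )
    return False
```

Under this assumed rule, atol matters most for entries near zero, where the relative term vanishes, while rtol governs entries of large magnitude.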

Returns

bool

True iff all second-order gradients agree with the finite-difference reference within the supplied tolerances.

Notes

The bound on the truncation error of central differences is

$$\left| \frac{\tilde f(x + \varepsilon) - \tilde f(x - \varepsilon)}{2 \varepsilon} - \tilde f'(x) \right| = O(\varepsilon^2),$$

so shrinking eps reduces the truncation error until floating-point round-off error, which grows roughly like 1/eps, dominates.
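The trade-off between truncation and round-off error is easy to observe numerically. The sketch below uses only the standard library and a one-dimensional test function (sin, whose derivative cos is known exactly); it is independent of lucid, and the helper name `central_difference` is hypothetical.

```python
import math

def central_difference(fn, x, eps):
    """Central finite-difference estimate of fn'(x)."""
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)

x0 = 1.0
exact = math.cos(x0)  # exact derivative of sin at x0

# Absolute error of the estimate for progressively smaller step sizes.
errors = {}
for eps in (1e-1, 1e-3, 1e-5, 1e-7, 1e-9, 1e-11, 1e-13, 1e-15):
    errors[eps] = abs(central_difference(math.sin, x0, eps) - exact)
    print(f"eps={eps:.0e}  abs error={errors[eps]:.3e}")
```

The printed errors first fall as eps shrinks and then rise again once cancellation in the numerator dominates, which is why a very small eps is not automatically a better choice in double precision.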

Examples

>>> import lucid
>>> from lucid.autograd import gradgradcheck
>>> x = lucid.randn(3, requires_grad=True, dtype=lucid.float64)
>>> def f(x):
...     return (x ** 3).sum()
>>> gradgradcheck(f, [x])
True