gradgradcheck

→bool

gradgradcheck(func: Callable[..., Tensor | tuple[Tensor, ...]], inputs: Sequence[Tensor], grad_outputs: Sequence[Tensor] | None = None, eps: float = 1e-06, atol: float = 1e-05, rtol: float = 0.001, raise_exception: bool = True)

Verify second-order gradients via finite differences.

Some bugs in custom lucid.autograd.Function.backward implementations only show up at the second-derivative level: the first-order gradient is consistent, but the gradient of the gradient is not. gradgradcheck tests for this case by wrapping func in a scalar-valued helper

\tilde f(x) = \sum_i (\nabla f(x))_i,

where $\nabla f$ is computed analytically with create_graph=True so that the result remains differentiable. It then runs gradcheck on $\tilde f$, so the analytic gradient of $\tilde f$ is compared against the central finite-difference estimate

\frac{\tilde f(x + \varepsilon e_k) - \tilde f(x - \varepsilon e_k)}{2 \varepsilon} \approx \frac{\partial \tilde f(x)}{\partial x_k} = \sum_i \frac{\partial^2 f(x)}{\partial x_i \, \partial x_k}.

Disagreement signals a bug in the analytic backward formula that ordinary gradcheck would miss.
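The comparison described above can be sketched in plain Python without any lucid calls. This is a minimal illustration, not the library's implementation: the function f(x0, x1) = x0**2 * x1, its hand-written gradient, and all helper names are assumptions chosen for the example. It shows that the quantity being checked for coordinate k is the sum over i of the mixed second derivatives (a row sum of the Hessian of f), which differs from the diagonal entry whenever the Hessian has off-diagonal terms.

```python
# Plain-Python sketch; every name below is hypothetical and no lucid API is used.

def grad_f(x):
    """Hand-written first-order gradient of f(x0, x1) = x0**2 * x1."""
    x0, x1 = x
    return [2.0 * x0 * x1, x0 ** 2]

def f_tilde(x):
    """Scalar helper: the sum of the first-order gradient entries."""
    return sum(grad_f(x))

def analytic_grad_f_tilde(x):
    """Hand-derived gradient of f_tilde, i.e. the row sums of the Hessian of f."""
    x0, x1 = x
    return [2.0 * x1 + 2.0 * x0, 2.0 * x0]

def central_difference(fn, x, eps=1e-6):
    """Estimate d fn / d x_k for each k by perturbing one coordinate at a time."""
    estimates = []
    for k in range(len(x)):
        plus = list(x)
        plus[k] += eps
        minus = list(x)
        minus[k] -= eps
        estimates.append((fn(plus) - fn(minus)) / (2.0 * eps))
    return estimates

point = [1.5, -0.5]
numeric = central_difference(f_tilde, point)
analytic = analytic_grad_f_tilde(point)

# Largest disagreement between the analytic and finite-difference gradients.
max_abs_error = max(abs(n - a) for n, a in zip(numeric, analytic))
print(analytic, numeric, max_abs_error)
```

At this point the Hessian diagonal is (-1, 0), while the checked row sums are (2, 3), so a backward formula that only gets the diagonal right would fail the comparison.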

Parameters

func: callable

Function mapping Tensor inputs to a Tensor (or tuple of Tensor). Must be twice differentiable.

inputs: sequence of Tensor

Input tensors at which to verify the second-order gradients. Floating dtype required.

grad_outputs: sequence of Tensor or None = None

Reserved for custom upstream gradients in the inner backward pass. Currently ignored — ones_like upstream gradients are always used.

eps: float = 1e-06

Finite-difference step size used by the underlying gradcheck. Defaults to 1e-6.

atol: float = 1e-05

Absolute tolerance for the comparison. Defaults to 1e-5.

rtol: float = 0.001

Relative tolerance for the comparison. Defaults to 1e-3.

raise_exception: bool = True

If True (default) raise AssertionError on mismatch; if False return False silently.

Returns

bool

True iff all second-order gradients agree with the finite-difference reference within the supplied tolerances.
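This page does not state how atol and rtol are combined. As an illustration only, the sketch below assumes the convention used by NumPy's allclose, where an entry passes if |analytic - numeric| <= atol + rtol * |numeric|. The helper name within_tolerance is hypothetical, and the actual comparison inside gradcheck may differ.

```python
# Hypothetical helper; assumes an allclose-style rule, which the source does not confirm.

def within_tolerance(analytic, numeric, atol=1e-5, rtol=1e-3):
    """Return True if every pair satisfies |a - n| <= atol + rtol * |n|."""
    return all(
        abs(a - n) <= atol + rtol * abs(n)
        for a, n in zip(analytic, numeric)
    )

# A disagreement of about 1e-7 sits well inside the default tolerances.
close_result = within_tolerance([2.0, 3.0], [2.0000001, 3.0000001])

# A disagreement of 0.1 exceeds atol + rtol * |n| (about 3.1e-3 here).
far_result = within_tolerance([2.0, 3.0], [2.0, 3.1])

print(close_result, far_result)
```

Under this assumed rule, atol governs entries near zero and rtol governs entries of large magnitude.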

Notes

The bound on the truncation error of central differences is

\left| \frac{\tilde f(x + \varepsilon) - \tilde f(x - \varepsilon)} {2 \varepsilon} - \tilde f'(x) \right| = O(\varepsilon^2),

so tightening eps improves accuracy until round-off error dominates.
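The trade-off in this note can be seen with a one-dimensional example in plain Python. This is a sketch under stated assumptions: it uses math.sin as a stand-in for $\tilde f$ and does not call lucid. For moderate eps the error falls roughly as eps squared; for very small eps the subtraction of nearly equal values amplifies floating-point round-off, which grows roughly like machine epsilon divided by eps.

```python
import math

def central_difference(fn, x, eps):
    """Central finite-difference estimate of fn'(x)."""
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)

x0 = 1.0
exact = math.cos(x0)  # exact derivative of sin at x0

# Absolute error of the estimate for a coarse, a moderate, and a very small step.
errors = {}
for eps in (1e-2, 1e-5, 1e-13):
    errors[eps] = abs(central_difference(math.sin, x0, eps) - exact)
    print(f"eps={eps:.0e}  abs error={errors[eps]:.3e}")
```

In float64 the moderate step gives a far smaller error than the coarse one, while the smallest step is typically worse again because round-off dominates.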

Examples

>>> import lucid
>>> from lucid.autograd import gradgradcheck
>>> x = lucid.randn(3, requires_grad=True, dtype=lucid.float64)
>>> def f(x):
...     return (x ** 3).sum()
>>> gradgradcheck(f, [x])
True
