clip_grad_norm_

clip_grad_norm_(parameters: Iterable[Parameter], max_norm: float, norm_type: float = 2.0, error_if_nonfinite: bool = False) -> Tensor

Clip the global gradient norm of parameters in place.

Rescales every gradient so that the total $\ell_p$ norm, computed across all parameters jointly as if they were one long concatenated vector, is at most max_norm. A staple of stable Transformer / RNN training: it prevents the occasional huge gradient from derailing optimisation.

Parameters

parameters: iterable of Parameter
Parameters whose .grad should be clipped. Entries whose .grad is None are silently skipped.
max_norm: float
Maximum allowed norm of the combined gradient vector. The scaling factor never exceeds 1, so gradients whose total norm is already below max_norm are left untouched.
norm_type: float = 2.0
Order $p$ of the norm. Default 2.0 (Euclidean). Pass math.inf for the max-norm (element-wise absolute maximum across all gradients).
error_if_nonfinite: bool = False
If True, raise RuntimeError when the computed total norm is inf or nan instead of silently scaling by a non-finite coefficient.
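
To make the interplay of norm_type, skipped entries and error_if_nonfinite concrete, here is a minimal NumPy sketch of how such a total norm could be computed. It is an illustration under stated assumptions, not lucid's implementation: `total_grad_norm` is a hypothetical helper, and gradients are represented as plain NumPy arrays (or None) rather than Parameter objects.

```python
import math

import numpy as np


def total_grad_norm(grads, norm_type=2.0, error_if_nonfinite=False):
    """Hypothetical helper: joint p-norm over a list of gradient arrays."""
    # Mirror the documented behaviour: entries without a gradient are skipped.
    present = [np.asarray(g, dtype=np.float64) for g in grads if g is not None]
    if not present:
        return 0.0

    if norm_type == math.inf:
        # Max-norm: largest absolute value across all gradients.
        total = max(float(np.abs(g).max()) for g in present)
    else:
        # Sum |g_i|^p over every element of every gradient, then take the p-th root.
        powered = sum(float((np.abs(g) ** norm_type).sum()) for g in present)
        total = powered ** (1.0 / norm_type)

    if error_if_nonfinite and not math.isfinite(total):
        raise RuntimeError(f"total gradient norm is non-finite: {total}")
    return total


# Assumed toy gradients; the None entry stands in for a parameter without .grad.
grads = [np.array([3.0, 0.0]), None, np.array([[0.0, -4.0]])]
l2_norm = total_grad_norm(grads)                      # Euclidean norm over all elements
inf_norm = total_grad_norm(grads, norm_type=math.inf)  # largest absolute entry
```

In this sketch the two non-empty gradients are treated as one four-element vector, so the Euclidean result depends only on the pooled elements, not on how they are split across arrays.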

Returns

Tensor

Scalar tensor holding the pre-clipping total norm. Useful for logging the gradient magnitude during training even when no actual clipping took place.
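
As a sketch of the logging use case, the snippet below summarises a series of pre-clipping norms collected over several steps. It assumes the returned scalar tensors have already been converted to Python floats (how to do that is not covered here), and `summarize_grad_norms` is a hypothetical helper, not part of lucid.

```python
def summarize_grad_norms(norms, max_norm):
    """Hypothetical helper: summarise logged pre-clipping gradient norms."""
    norms = [float(n) for n in norms]
    # A step was actually clipped only if its total norm exceeded max_norm.
    clipped = [n > max_norm for n in norms]
    return {
        "mean_norm": sum(norms) / len(norms),
        "peak_norm": max(norms),
        "clip_fraction": sum(clipped) / len(norms),
    }


# Assumed example values, one per training step.
logged = [0.5, 0.8, 4.0, 0.9]
stats = summarize_grad_norms(logged, max_norm=1.0)
```

A clip fraction that stays near 1 may suggest that max_norm is acting as a de facto learning-rate scale rather than as a guard against rare spikes.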

Notes

With combined norm $\|g\|_p = \left(\sum_i |g_i|^p\right)^{1/p}$ taken over every element of every gradient, the update is

$$g \;\mapsto\; g \cdot \min\!\left(1,\, \frac{\text{max\_norm}}{\|g\|_p + \epsilon}\right),$$

where $\epsilon = 10^{-6}$ guards against division by zero when all gradients vanish. Because every parameter is scaled by the same coefficient, the direction of the global update is preserved; only its magnitude is bounded.
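
The update rule can be sketched in a few lines of NumPy for the default Euclidean case. This is a minimal illustration assuming gradients are plain float arrays; `clip_by_global_norm` is a hypothetical name, and the code is not taken from lucid's source.

```python
import numpy as np


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Hypothetical helper: rescale gradient arrays in place, return the pre-clipping L2 norm."""
    present = [g for g in grads if g is not None]
    # Joint Euclidean norm, as if all gradients were one concatenated vector.
    total_norm = float(np.sqrt(sum(float((g ** 2).sum()) for g in present)))
    # One coefficient shared by every gradient, capped at 1 so nothing is scaled up.
    coef = min(1.0, max_norm / (total_norm + eps))
    for g in present:
        g *= coef  # in-place update of each array
    return total_norm


# Assumed toy gradients for two parameters.
grads = [np.array([3.0, 0.0]), np.array([0.0, 4.0])]
before = np.concatenate(grads)  # concatenate copies, so this is a snapshot
pre_clip_norm = clip_by_global_norm(grads, max_norm=1.0)
after = np.concatenate(grads)
post_clip_norm = float(np.linalg.norm(after))

# Gradients already inside the bound are left unchanged (coefficient is 1).
small = [np.array([0.1, 0.2])]
small_before = small[0].copy()
clip_by_global_norm(small, max_norm=1.0)
```

Because of the $\epsilon$ term, the clipped norm in this sketch ends up marginally below max_norm rather than exactly equal to it, while the normalised direction of the concatenated vector is unchanged.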

Examples

>>> import lucid
>>> from lucid.nn.utils import clip_grad_norm_
>>> # after loss.backward() ...
>>> total_norm = clip_grad_norm_(model.parameters(), max_norm=1.0)