clip_grad_norm_
clip_grad_norm_(parameters: Iterable[Parameter], max_norm: float, norm_type: float = 2.0, error_if_nonfinite: bool = False) → Tensor
Clip the global gradient norm of parameters in place.
Rescales every gradient so that the total norm —
computed across all parameters jointly, as if they were one long
concatenated vector — is at most max_norm. A staple of stable
Transformer / RNN training: prevents the occasional huge gradient
from derailing optimisation.
Parameters
parameters: iterable of Parameter
Parameters whose .grad should be clipped. Entries with grad is None are silently skipped.
max_norm: float
Upper bound on the total norm. The scaling coefficient is capped at 1, so gradients whose total norm is smaller than max_norm are untouched.
norm_type: float = 2.0
Order of the norm. Defaults to 2.0 (Euclidean). Pass math.inf for the max-norm (element-wise absolute maximum across all gradients).
error_if_nonfinite: bool = False
If True, raise RuntimeError when the computed total norm is inf or nan instead of silently scaling by a non-finite coefficient.
Returns
Tensor
Scalar tensor holding the pre-clipping total norm. Useful for logging the gradient magnitude during training even when no actual clipping took place.
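To make the meaning of the total norm concrete, here is a minimal pure-Python sketch of how a joint norm over several gradients could be computed. It is an illustration only, not the library's implementation: `total_grad_norm` is a hypothetical helper, gradients are modelled as plain lists of floats, and returning 0.0 when every entry is None is an assumption.

```python
import math

def total_grad_norm(grads, norm_type=2.0):
    """Joint norm over all gradients, treated as one concatenated vector."""
    # Skip missing gradients (None) and collect absolute values of the rest.
    values = [abs(v) for g in grads if g is not None for v in g]
    if not values:
        return 0.0  # assumption: nothing to measure means a zero norm
    if math.isinf(norm_type):
        # Max-norm: the largest absolute element across all gradients.
        return max(values)
    return sum(v ** norm_type for v in values) ** (1.0 / norm_type)

grads = [[3.0, 0.0], None, [0.0, -4.0]]
print(total_grad_norm(grads))            # 5.0
print(total_grad_norm(grads, math.inf))  # 4.0
```

The norm is taken over all elements jointly, so the result is 5.0 rather than the per-parameter norms 3.0 and 4.0.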
Notes
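With $\lVert g \rVert$ denoting the combined norm taken over every element of every gradient, the update is
$$g_i \leftarrow g_i \cdot \min\!\left(1,\ \frac{\text{max\_norm}}{\lVert g \rVert + \varepsilon}\right)$$
where the small constant $\varepsilon$ guards against division by zero when all gradients vanish. Because every parameter's gradient is scaled by the same coefficient, the direction of the global update is preserved; only its magnitude is bounded.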
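The scaling step described in these notes can be sketched in plain Python as follows. This is a rough illustration under stated assumptions, not the library's code: `clip_grads_sketch` is a hypothetical name, gradients are plain lists of floats, only the Euclidean norm is handled, the value `eps=1e-6` is an assumed placeholder, and non-finite handling is omitted.

```python
import math

def clip_grads_sketch(grads, max_norm, eps=1e-6):
    """Scale list-based gradients in place; return the pre-clipping 2-norm."""
    total = math.sqrt(sum(v * v for g in grads for v in g))
    # One shared coefficient, capped at 1 so small gradients stay untouched.
    # eps (assumed value) keeps the division safe when total is zero.
    coef = min(1.0, max_norm / (total + eps))
    for g in grads:
        for i in range(len(g)):
            g[i] *= coef
    return total

grads = [[3.0, 0.0], [0.0, 4.0]]
norm_before = clip_grads_sketch(grads, max_norm=1.0)
norm_after = math.sqrt(sum(v * v for g in grads for v in g))
print(norm_before)        # 5.0
print(norm_after <= 1.0)  # True
```

Because both gradients are multiplied by the same coefficient, the ratio between their entries is unchanged, which is what preserves the direction of the update.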
Examples
>>> import lucid
>>> from lucid.nn.utils import clip_grad_norm_
>>> # after loss.backward() ...
>>> total_norm = clip_grad_norm_(model.parameters(), max_norm=1.0)