nn.util.grad_norm¶
Function Signature¶
def grad_norm(parameters: Iterable[Tensor] | Tensor, norm_type: int = 2) -> Tensor
Parameters¶
parameters (Iterable[Tensor] | Tensor): Model parameters whose gradients will be measured. Parameters whose grad is None are skipped.
norm_type (int, optional): Order of the p-norm. Use a positive integer for \(p\) (e.g., 2 for L2). Default is 2.
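Per the signature, parameters accepts either form of the union: an iterable of Tensors or a single Tensor. A brief sketch, reusing the model from the Examples section below and assuming nn.Linear exposes its weight as a Tensor attribute (a PyTorch-style assumption, not documented here):

n_all = nn.util.grad_norm(model.parameters(), norm_type=2)  # iterable of Tensors
n_one = nn.util.grad_norm(model.fc1.weight, norm_type=2)    # a single Tensor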
Return Value¶
Tensor: The global gradient p-norm computed across all provided parameters, before any clipping or other modification.
Mathematical Definition¶
Let the parameter set be \(\{\theta_i\}_{i=1}^N\) with associated gradient tensors \(\{g_i\}_{i=1}^N\), where each \(g_i\) has the same shape as \(\theta_i\). Define \(\operatorname{vec}(g_i)\) as the flattened vector of \(g_i\).
Global p-norm:

\[\|g\|_p = \left( \sum_{i=1}^{N} \big\|\operatorname{vec}(g_i)\big\|_p^{p} \right)^{1/p}\]

where for each parameter \(i\),

\[\big\|\operatorname{vec}(g_i)\big\|_p = \left( \sum_{j} \big|\operatorname{vec}(g_i)_j\big|^{p} \right)^{1/p}.\]
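To make the two-step reduction concrete, here is a minimal NumPy sketch of the formula above. It illustrates the math only, not lucid's implementation; the random arrays stand in for the gradients \(g_i\).

import numpy as np

# Stand-in gradient tensors g_1, g_2 (shapes are arbitrary).
grads = [np.random.randn(8, 16), np.random.randn(16, 4)]
p = 2

# Per-parameter norms: ||vec(g_i)||_p
per_param = [np.linalg.norm(g.ravel(), ord=p) for g in grads]

# Global p-norm: (sum_i ||vec(g_i)||_p ** p) ** (1 / p)
global_norm = sum(n**p for n in per_param) ** (1.0 / p)

# Equivalent to the p-norm of one big concatenated gradient vector.
flat = np.concatenate([g.ravel() for g in grads])
assert np.isclose(global_norm, np.linalg.norm(flat, ord=p))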
Note
The gradients are not modified by grad_norm(); it only measures the global magnitude.
Examples¶
import lucid
import lucid.nn as nn
import lucid.nn.functional as F


class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 16)
        self.fc2 = nn.Linear(16, 4)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)


model = Tiny()
x = lucid.random.randn(32, 8)  # input batch of shape (N, D)
y = lucid.random.randint(0, 4, size=(32,))  # integer class targets

out = model(x)
loss = F.cross_entropy(out, y)
loss.backward()

n2 = nn.util.grad_norm(model.parameters(), norm_type=2)  # global L2 norm
print("L2:", n2)
Usage Tips¶
Tip
Use grad_norm() to monitor training stability. If the reported norm spikes,
consider applying lucid.nn.util.clip_grad_norm() right after backward().
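For instance, a training step can log the norm and clip only when it exceeds a threshold. The sketch below continues the Examples snippet; the max_norm keyword and the float() conversion of the returned Tensor are assumptions about the API, so check the lucid.nn.util.clip_grad_norm() reference for the exact signature.

MAX_NORM = 5.0  # hypothetical threshold chosen for this sketch

norm = nn.util.grad_norm(model.parameters(), norm_type=2)
if float(norm) > MAX_NORM:  # assumes the scalar Tensor converts via float()
    # max_norm is an assumed keyword; see lucid.nn.util.clip_grad_norm()
    nn.util.clip_grad_norm(model.parameters(), max_norm=MAX_NORM)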
Warning
Ensure that all parameters reside on the same device before using the returned value in device-sensitive logic.
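A quick sanity check, assuming each lucid parameter exposes a PyTorch-style device attribute (an assumption; adapt to lucid's actual API):

# Hypothetical check: assumes parameters carry a .device attribute.
devices = {p.device for p in model.parameters()}
assert len(devices) == 1, f"parameters span multiple devices: {devices}"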