class BatchNorm2d (extends _BatchNormBase)
BatchNorm2d(num_features: int, eps: float = 1e-05, momentum: float | None = 0.1, affine: bool = True, track_running_stats: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Batch normalization over a 4-D input (N, C, H, W).

Normalises each channel across the batch and spatial dimensions:

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \varepsilon}} \cdot \gamma + \beta$$

where $\mathrm{E}[x]$ and $\mathrm{Var}[x]$ are computed over the $(N, H, W)$ axes for each channel $c$.
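As a rough illustration of the formula above, the following pure-Python sketch normalises a nested-list input of shape (N, C, H, W) one channel at a time. The helper name batch_norm_2d_reference is hypothetical and is not part of lucid. It assumes the biased (population) variance, which the text above does not specify.

```python
import math

def batch_norm_2d_reference(x, gamma, beta, eps=1e-5):
    """Hypothetical reference helper, not part of lucid.

    x is a nested list indexed as x[n][c][h][w]; gamma and beta hold one
    value per channel. Assumes the biased (population) variance.
    """
    num_channels = len(x[0])
    # Copy the input structure so the result keeps the (N, C, H, W) shape.
    out = [[[list(row) for row in channel] for channel in sample] for sample in x]
    for c in range(num_channels):
        # Gather every value of channel c across the N, H and W axes.
        values = [v for sample in x for row in sample[c] for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        for n, sample in enumerate(x):
            for h, row in enumerate(sample[c]):
                for w, v in enumerate(row):
                    out[n][c][h][w] = (v - mean) * inv_std * gamma[c] + beta[c]
    return out

# Tiny input with N=2, C=2, H=1, W=2.
x = [
    [[[1.0, 2.0]], [[10.0, 10.0]]],
    [[[3.0, 4.0]], [[30.0, 30.0]]],
]
y = batch_norm_2d_reference(x, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With gamma set to 1 and beta set to 0, each output channel should have a mean close to 0 and a variance close to 1, regardless of the scale of the input channel.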

During training, batch statistics are used and the running statistics are updated via an exponential moving average with factor $m$ (the momentum argument):

$$\hat{\mu} \leftarrow (1 - m)\,\hat{\mu} + m\,\mu_{\text{batch}}$$

During evaluation (model.eval()), the stored running statistics running_mean and running_var are used instead.
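The running-mean update used during training can be sketched in a few lines of plain Python. The helper update_running_mean is hypothetical and is not part of lucid, and the zero starting value is an assumption made for the illustration.

```python
def update_running_mean(running_mean, batch_mean, momentum=0.1):
    """Hypothetical helper, not part of lucid: one EMA step per channel."""
    return [
        (1.0 - momentum) * r + momentum * b
        for r, b in zip(running_mean, batch_mean)
    ]

running_mean = [0.0, 0.0]  # assumed starting value for two channels
batch_means = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
for batch_mean in batch_means:
    running_mean = update_running_mean(running_mean, batch_mean)
```

With a momentum of 0.1 the running mean moves 10% of the way towards the batch mean on each step, so after three identical batches it has covered 1 - 0.9^3 = 27.1% of the gap.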

Parameters

num_features: int
Number of channels $C$.
eps: float = 1e-05
Small constant added to the variance for numerical stability. Default: 1e-5.
momentum: float or None = 0.1
Exponential moving average factor for running statistics. If None, uses cumulative moving average. Default: 0.1.
affine: bool = True
If True, learns per-channel scale $\gamma$ and shift $\beta$. Default: True.
track_running_stats: bool = True
If True, maintains running_mean, running_var, and num_batches_tracked. Default: True.
device: DeviceLike = None
Device for parameters and buffers. Default: None.
dtype: DTypeLike = None
Data type for parameters and buffers. Default: None.
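For the momentum=None case, one common way to realise a cumulative moving average is to use an update factor of 1/k, where k is the number of batches seen so far. The sketch below follows that convention. Whether lucid implements it in exactly this way is an assumption, and the helper cumulative_update is hypothetical.

```python
def cumulative_update(running_mean, batch_mean, num_batches_tracked):
    """Hypothetical helper, not part of lucid.

    Uses factor = 1 / k, so the result equals the plain average of all
    batch means seen so far.
    """
    num_batches_tracked += 1
    factor = 1.0 / num_batches_tracked
    new_mean = [
        (1.0 - factor) * r + factor * b
        for r, b in zip(running_mean, batch_mean)
    ]
    return new_mean, num_batches_tracked

cma_mean, batches_seen = [0.0], 0
for batch_mean in ([2.0], [4.0], [9.0]):
    cma_mean, batches_seen = cumulative_update(cma_mean, batch_mean, batches_seen)
```

Unlike the exponential average, every batch carries equal weight here, so the final value is the arithmetic mean of the three batch means.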

Attributes

weight: Parameter or None
Learnable scale $\gamma$ of shape (num_features,). None when affine=False.
bias: Parameter or None
Learnable shift $\beta$ of shape (num_features,). None when affine=False.
running_mean: Tensor or None
Running estimate of the per-channel mean, shape (num_features,). None when track_running_stats=False.
running_var: Tensor or None
Running estimate of the per-channel variance, shape (num_features,). None when track_running_stats=False.
num_batches_tracked: Tensor or None
Scalar counting the number of batches seen during training. None when track_running_stats=False.

Notes

  • Input: $(N, C, H, W)$
  • Output: $(N, C, H, W)$, the same shape as the input.
  • BatchNorm2d is the most commonly used normalization layer in convolutional neural networks. It stabilises training by keeping activations in a well-scaled range after each convolutional block.
  • At small batch sizes (e.g. $N < 8$) the batch statistics become noisy. Consider GroupNorm or InstanceNorm2d in those settings.
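The effect of batch size on the noise in batch statistics can be illustrated with a small standard-library simulation. This is a simplified sketch: it draws a single value per sample, whereas BatchNorm2d also averages over the H and W axes, and the helper spread_of_batch_means is hypothetical.

```python
import random
import statistics

def spread_of_batch_means(batch_size, trials=500, seed=0):
    """Hypothetical helper: standard deviation of the batch mean across trials."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Draw one batch from a standard normal distribution.
        batch = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        means.append(sum(batch) / batch_size)
    return statistics.stdev(means)

small = spread_of_batch_means(batch_size=4)
large = spread_of_batch_means(batch_size=64)
```

The spread of the batch mean shrinks roughly as one over the square root of the batch size, so the estimate from 4 samples fluctuates about four times as much as the estimate from 64 samples.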

Examples

>>> import lucid
>>> import lucid.nn as nn
>>> bn = nn.BatchNorm2d(64)
>>> x = lucid.randn(8, 64, 32, 32)
>>> out = bn(x)   # normalised per channel
>>> out.shape
(8, 64, 32, 32)
>>> # Eval mode uses running statistics (no batch dependence)
>>> bn.eval()
>>> with lucid.no_grad():
...     out = bn(x)