class BatchNorm2d extends _BatchNormBase

BatchNorm2d(num_features: int, eps: float = 1e-05, momentum: float | None = 0.1, affine: bool = True, track_running_stats: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Implementing kernel: C++ class BatchNormNdBackward

Batch normalization over a 4-D input (N, C, H, W).

Normalises each channel across the batch and spatial dimensions:

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \varepsilon}} \cdot \gamma + \beta$$

where $\mathrm{E}[x]$ and $\mathrm{Var}[x]$ are computed over the $(N, H, W)$ axes for each channel $c$ .
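The per-channel reduction can be sketched in NumPy as follows. This is an illustration of the formula only, not lucid's implementation: the helper name `batch_norm_2d` is hypothetical, and the use of the biased (population) variance is an assumption based on the common convention.

```python
import numpy as np

def batch_norm_2d(x, gamma, beta, eps=1e-5):
    """Normalise a (N, C, H, W) array per channel using batch statistics."""
    # Reduce over batch and spatial axes, leaving one statistic per channel.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)  # shape (1, C, 1, 1)
    var = x.var(axis=(0, 2, 3), keepdims=True)    # biased variance (assumption)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma and beta have shape (C,); reshape so they broadcast over N, H, W.
    return x_hat * gamma.reshape(1, -1, 1, 1) + beta.reshape(1, -1, 1, 1)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 5, 5))
y = batch_norm_2d(x, gamma=np.ones(4), beta=np.zeros(4))
```

With $\gamma = 1$ and $\beta = 0$, each channel of `y` has mean close to 0 and variance close to 1, regardless of the input's original offset and scale.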

During training, batch statistics are used and running statistics are updated via an exponential moving average:

$$\hat{\mu} \leftarrow (1 - m)\,\hat{\mu} + m\,\mu_{\text{batch}}$$
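A small worked example of this update rule in plain Python may help. The helper `update_running_stat` is hypothetical, and the starting value of 0.0 is chosen only for illustration.

```python
def update_running_stat(running, batch_stat, momentum=0.1):
    """One exponential-moving-average step: (1 - m) * running + m * batch_stat."""
    return (1.0 - momentum) * running + momentum * batch_stat

running_mean = 0.0  # illustrative starting value
for batch_mean in [2.0, 2.0, 2.0]:
    running_mean = update_running_stat(running_mean, batch_mean)
# After three steps: 0.0 -> 0.2 -> 0.38 -> 0.542
```

With $m = 0.1$ the running estimate moves only 10% of the way toward each new batch statistic, so it takes many batches to converge to the value 2.0.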

During evaluation (model.eval()), the stored running statistics running_mean and running_var are used instead.
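The practical consequence is that an evaluation-mode output for one sample does not depend on the other samples in the batch. The NumPy sketch below illustrates this; the helper `batch_norm_2d_eval` and the statistic values are hypothetical and stand in for whatever the layer has stored.

```python
import numpy as np

def batch_norm_2d_eval(x, running_mean, running_var, gamma, beta, eps=1e-5):
    """Normalise a (N, C, H, W) array with fixed per-channel statistics."""
    shape = (1, -1, 1, 1)  # broadcast (C,) vectors over N, H, W
    x_hat = (x - running_mean.reshape(shape)) / np.sqrt(running_var.reshape(shape) + eps)
    return x_hat * gamma.reshape(shape) + beta.reshape(shape)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3, 4, 4))
running_mean = np.array([0.1, -0.2, 0.3])  # made-up stored statistics
running_var = np.array([1.0, 2.0, 0.5])
gamma, beta = np.ones(3), np.zeros(3)

full = batch_norm_2d_eval(x, running_mean, running_var, gamma, beta)
single = batch_norm_2d_eval(x[:1], running_mean, running_var, gamma, beta)
# The first sample is normalised identically whether it is alone or in a batch.
same = np.allclose(full[:1], single)
```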

Parameters

num_features: int

Number of channels $C$.

eps: float = 1e-05

Small constant added to the variance for numerical stability. Default: 1e-5.

momentum: float or None = 0.1

Exponential moving average factor $m$ for the running statistics. If None, a cumulative moving average is used instead. Default: 0.1.

affine: bool = True

If True, learns per-channel scale $\gamma$ and shift $\beta$. Default: True.

track_running_stats: bool = True

If True, maintains running_mean, running_var, and num_batches_tracked. Default: True.

device: DeviceLike = None

Device for parameters and buffers. Default: None.

dtype: DTypeLike = None

Data type for parameters and buffers. Default: None.
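For the momentum=None case, a cumulative moving average can be sketched as below. This assumes the common convention of using 1 / num_batches_tracked as the update factor, which the text above does not spell out; the helper `cumulative_update` is hypothetical.

```python
def cumulative_update(running, batch_stat, num_batches_tracked):
    """Cumulative average step: the factor shrinks as more batches are seen."""
    factor = 1.0 / num_batches_tracked  # assumed convention, see lead-in
    return (1.0 - factor) * running + factor * batch_stat

running_mean = 0.0
num_batches_tracked = 0
for batch_mean in [1.0, 2.0, 3.0]:
    num_batches_tracked += 1
    running_mean = cumulative_update(running_mean, batch_mean, num_batches_tracked)
# running_mean is now the plain average of the three batch means.
```

Unlike a fixed momentum, this weights every batch equally, so early batches are never forgotten.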

Attributes

weight: Parameter or None

Learnable scale $\gamma$ of shape (num_features,). None when affine=False.

bias: Parameter or None

Learnable shift $\beta$ of shape (num_features,). None when affine=False.

running_mean: Tensor or None

Running estimate of the per-channel mean, shape (num_features,). None when track_running_stats=False.

running_var: Tensor or None

Running estimate of the per-channel variance, shape (num_features,). None when track_running_stats=False.

num_batches_tracked: Tensor or None

Scalar counting the number of batches seen during training. None when track_running_stats=False.

Notes

Input: $(N, C, H, W)$
Output: $(N, C, H, W)$ — same shape.

BatchNorm2d is the most commonly used normalization layer in convolutional neural networks. It stabilises training by keeping activations in a well-scaled range after each convolutional block.
At small batch sizes (e.g. $N < 8$) the batch statistics become noisy, because each channel's mean and variance are estimated from only $N \cdot H \cdot W$ values. Consider GroupNorm or InstanceNorm2d in those settings.
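The effect of batch size on the noise in the batch mean can be checked with a small NumPy simulation. This is a sketch on synthetic standard-normal data for a single channel; the helper `batch_mean_spread` and the chosen sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_mean_spread(batch_size, trials=500, hw=4):
    """Standard deviation of the per-channel batch mean across many random batches."""
    means = [rng.normal(size=(batch_size, hw, hw)).mean() for _ in range(trials)]
    return float(np.std(means))

spread_small = batch_mean_spread(2)    # mean estimated from 2 * 4 * 4 values
spread_large = batch_mean_spread(64)   # mean estimated from 64 * 4 * 4 values
```

The spread shrinks roughly as $1 / \sqrt{N \cdot H \cdot W}$, so `spread_small` comes out several times larger than `spread_large`.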

Examples

>>> import lucid
>>> import lucid.nn as nn
>>> bn = nn.BatchNorm2d(64)
>>> x = lucid.randn(8, 64, 32, 32)
>>> out = bn(x)   # normalised per channel
>>> out.shape
(8, 64, 32, 32)
>>> # Eval mode uses running statistics (no batch dependence)
>>> bn.eval()
>>> with lucid.no_grad():
...     out = bn(x)
