class BatchNorm3d extends _BatchNormBase

BatchNorm3d(num_features: int, eps: float = 1e-05, momentum: float | None = 0.1, affine: bool = True, track_running_stats: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Implementing kernel: C++ class BatchNormNdBackward

Batch normalization over a 5-D input (N, C, D, H, W).

Extends batch normalization to volumetric data by normalising each channel across the batch and all three spatial dimensions:

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \varepsilon}} \cdot \gamma + \beta$$

where $\mathrm{E}[x]$ and $\mathrm{Var}[x]$ are computed over the $(N, D, H, W)$ axes for each channel $c$.
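The per-channel reduction can be sketched in plain NumPy. This is an illustrative reference, not Lucid's implementation: the function name `batch_norm_3d_reference` is hypothetical, and the use of the biased (population) variance in the normalisation step is an assumption.

```python
import numpy as np

def batch_norm_3d_reference(x, gamma=None, beta=None, eps=1e-5):
    """Training-mode sketch of batch norm for an (N, C, D, H, W) array."""
    axes = (0, 2, 3, 4)  # reduce over N, D, H, W; keep the channel axis C
    mean = x.mean(axis=axes, keepdims=True)  # shape (1, C, 1, 1, 1)
    var = x.var(axis=axes, keepdims=True)    # biased variance (assumption)
    y = (x - mean) / np.sqrt(var + eps)
    if gamma is not None:  # optional per-channel scale of shape (C,)
        y = y * gamma.reshape(1, -1, 1, 1, 1)
    if beta is not None:   # optional per-channel shift of shape (C,)
        y = y + beta.reshape(1, -1, 1, 1, 1)
    return y

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 8, 5, 6, 7))
y = batch_norm_3d_reference(x)
```

With no scale or shift applied, each channel of `y` should have a mean close to 0 and a variance close to 1 over the `(N, D, H, W)` axes.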

The training/evaluation distinction is identical to BatchNorm2d: running statistics are updated during training and used as fixed normalisation constants during evaluation.

Parameters

num_features: int

Number of channels $C$.

eps: float = 1e-05

Small constant added to the variance for numerical stability. Default: 1e-5.

momentum: float or None = 0.1

EMA factor for the running statistics. None selects cumulative averaging. Default: 0.1.

affine: bool = True

If True, learns per-channel $\gamma$ and $\beta$. Default: True.

track_running_stats: bool = True

If True, maintains running_mean, running_var, and num_batches_tracked. Default: True.

device: DeviceLike = None

Device for parameters and buffers. Default: None.

dtype: DTypeLike = None

Data type for parameters and buffers. Default: None.
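The two `momentum` modes can be illustrated with a scalar running statistic. This sketch assumes the update convention common to PyTorch-style batch norm layers, `running = (1 - factor) * running + factor * batch_stat`, where `factor` is `momentum` or, for `momentum=None`, `1 / num_batches_tracked`. Whether Lucid uses exactly this rule is not confirmed here, and `update_running_stat` is a hypothetical helper.

```python
def update_running_stat(running, batch_stat, num_batches_tracked, momentum=0.1):
    """Return the updated running statistic and batch counter (hypothetical helper)."""
    num_batches_tracked += 1
    if momentum is None:
        # Cumulative averaging: every batch seen so far gets equal weight.
        factor = 1.0 / num_batches_tracked
    else:
        # Exponential moving average with a fixed factor.
        factor = momentum
    running = (1.0 - factor) * running + factor * batch_stat
    return running, num_batches_tracked

batch_means = [2.0, 4.0, 6.0]

ema, ema_count = 0.0, 0
for m in batch_means:
    ema, ema_count = update_running_stat(ema, m, ema_count, momentum=0.1)

cma, cma_count = 0.0, 0
for m in batch_means:
    cma, cma_count = update_running_stat(cma, m, cma_count, momentum=None)
```

Under these assumptions the cumulative average ends at the plain mean of the three batch means (4.0), while the EMA with `momentum=0.1` stays closer to its starting value of 0.0.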

Attributes

weight: Parameter or None

Learnable scale $\gamma$ of shape (num_features,). None when affine=False.

bias: Parameter or None

Learnable shift $\beta$ of shape (num_features,). None when affine=False.

running_mean: Tensor or None

Running per-channel mean, shape (num_features,). None when track_running_stats=False.

running_var: Tensor or None

Running per-channel variance, shape (num_features,). None when track_running_stats=False.

num_batches_tracked: Tensor or None

Scalar int64 counting training batches seen. None when track_running_stats=False.
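As a rough illustration of how these attributes combine in evaluation mode, the NumPy sketch below normalises with stored statistics instead of batch statistics. The function `batch_norm_3d_eval` is hypothetical and the arrays stand in for the module's buffers and parameters. It is not Lucid's kernel.

```python
import numpy as np

def batch_norm_3d_eval(x, running_mean, running_var, weight=None, bias=None, eps=1e-5):
    """Evaluation-mode sketch: normalise with stored per-channel statistics."""
    shape = (1, -1, 1, 1, 1)  # broadcast (C,) vectors against (N, C, D, H, W)
    y = (x - running_mean.reshape(shape)) / np.sqrt(running_var.reshape(shape) + eps)
    if weight is not None:  # corresponds to affine=True
        y = y * weight.reshape(shape)
    if bias is not None:
        y = y + bias.reshape(shape)
    return y

num_features = 3
x = np.ones((2, num_features, 4, 5, 6))
running_mean = np.zeros(num_features)   # shape (num_features,)
running_var = np.ones(num_features)     # shape (num_features,)
weight = np.full(num_features, 2.0)     # stands in for the learnable scale
bias = np.full(num_features, 1.0)       # stands in for the learnable shift

y = batch_norm_3d_eval(x, running_mean, running_var, weight, bias)
```

Because the statistics are fixed, the output for a given input does not depend on the other samples in the batch.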

Notes

Input: $(N, C, D, H, W)$
Output: $(N, C, D, H, W)$, the same shape as the input.

Typical applications include 3-D convolutional networks for video understanding, medical image segmentation (CT/MRI), and any domain where the data has a depth axis in addition to height and width.
Because each channel's statistics are computed over $N \cdot D \cdot H \cdot W$ elements, rather than the $N \cdot H \cdot W$ elements of BatchNorm2d, the variance estimate is generally more stable at the same batch size.
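To make the element-count comparison concrete, the short sketch below counts how many values feed each channel's statistics. The shapes are illustrative and `elements_per_channel` is a hypothetical helper.

```python
import math

def elements_per_channel(shape):
    """Number of values reduced per channel for an (N, C, *spatial) shape."""
    n, _channels, *spatial = shape
    return n * math.prod(spatial)

count_2d = elements_per_channel((4, 32, 32, 32))      # (N, C, H, W)
count_3d = elements_per_channel((4, 32, 16, 32, 32))  # (N, C, D, H, W)
ratio = count_3d // count_2d
```

Here the 3-D case reduces over `D = 16` times as many values per channel. How much that stabilises the estimate in practice depends on how correlated neighbouring voxels are.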

Examples

Normalizing activations from a 3-D convolution:
>>> import lucid
>>> import lucid.nn as nn
>>> bn3d = nn.BatchNorm3d(32)
>>> x = lucid.randn(4, 32, 16, 32, 32)   # (N, C, D, H, W)
>>> out = bn3d(x)
>>> out.shape
(4, 32, 16, 32, 32)
Disable affine parameters for a parameter-free normaliser:
>>> bn_no_affine = nn.BatchNorm3d(32, affine=False)
>>> out2 = bn_no_affine(x)
>>> out2.shape
(4, 32, 16, 32, 32)
