class BatchNorm1d

extends _BatchNormBase
BatchNorm1d(num_features: int, eps: float = 1e-05, momentum: float | None = 0.1, affine: bool = True, track_running_stats: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Batch normalization over a 2-D or 3-D input (N, C) or (N, C, L).

Normalises each channel over the batch dimension (and, for 3-D inputs, the length dimension as well):

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \varepsilon}} \cdot \gamma + \beta$$

For a 3-D input $(N, C, L)$, the statistics $\mathrm{E}[x]$ and $\mathrm{Var}[x]$ are computed over the $(N, L)$ axes for each channel $c$. For a 2-D input $(N, C)$, only the batch axis $N$ is reduced.
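The choice of reduction axes can be illustrated with a minimal NumPy sketch. This is not lucid's implementation: the helper name `batch_norm_reference` is hypothetical, and the use of the biased (population) variance is an assumption chosen to match the formula as written.

```python
import numpy as np

def batch_norm_reference(x, gamma, beta, eps=1e-5):
    """Reference batch norm for (N, C) or (N, C, L) arrays (hypothetical helper)."""
    if x.ndim == 2:
        axes = (0,)          # reduce over N only
        shape = (1, -1)      # broadcast per-channel values over N
    elif x.ndim == 3:
        axes = (0, 2)        # reduce over N and L
        shape = (1, -1, 1)   # broadcast per-channel values over N and L
    else:
        raise ValueError("expected a 2-D or 3-D input")
    mean = x.mean(axis=axes).reshape(shape)
    var = x.var(axis=axes).reshape(shape)   # biased variance (assumption)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return x_hat * gamma.reshape(shape) + beta.reshape(shape)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(16, 4, 50))   # (N, C, L)
y = batch_norm_reference(x, gamma=np.ones(4), beta=np.zeros(4))

# With gamma = 1 and beta = 0, each channel should have roughly
# zero mean and unit standard deviation over the (N, L) axes.
per_channel_mean = y.mean(axis=(0, 2))
per_channel_std = y.std(axis=(0, 2))
```

Passing a 2-D array of shape (N, C) to the same helper reduces over axis 0 only, which mirrors the two cases described above.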

During training, batch statistics are used and running statistics are updated via an exponential moving average:

$$\hat{\mu} \leftarrow (1 - m)\,\hat{\mu} + m\,\mu_{\text{batch}}$$
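As a worked illustration of this update rule, the sketch below applies three steps with $m = 0.1$ in plain Python. The function name `update_running_mean` is hypothetical, and starting the running mean at zero is an assumption made for the example.

```python
def update_running_mean(running_mean, batch_mean, momentum=0.1):
    """One exponential-moving-average step: (1 - m) * running + m * batch."""
    return [
        (1.0 - momentum) * r + momentum * b
        for r, b in zip(running_mean, batch_mean)
    ]

# Two channels, running mean initialised to zero (assumption).
running = [0.0, 0.0]

# Feed the same batch mean three times; the running value moves
# a fraction m of the remaining distance on every step.
for batch_mean in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    running = update_running_mean(running, batch_mean)
```

After $k$ identical batches the running value equals $(1 - (1 - m)^k)$ times the batch mean, so a small momentum makes the running statistics change slowly.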

During evaluation (model.eval()), the stored running_mean and running_var are used instead, making inference independent of batch composition.
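The batch-independence claim can be checked with a small NumPy sketch. The helper `batch_norm_eval` is hypothetical and omits the affine scale and shift for brevity; the stored statistics are made-up values for illustration.

```python
import numpy as np

def batch_norm_eval(x, running_mean, running_var, eps=1e-5):
    """Normalise an (N, C) input with stored statistics (hypothetical helper)."""
    return (x - running_mean) / np.sqrt(running_var + eps)

# Illustrative stored statistics for two channels.
running_mean = np.array([0.5, -1.0])
running_var = np.array([4.0, 0.25])

sample = np.array([[2.5, 0.0]])
others = np.array([[10.0, 10.0], [-3.0, 7.0]])

# The same sample, evaluated alone and alongside unrelated rows.
alone = batch_norm_eval(sample, running_mean, running_var)
in_batch = batch_norm_eval(np.vstack([sample, others]), running_mean, running_var)[:1]
```

Because no statistic is computed from the current batch, `alone` and `in_batch` agree, which would not hold if batch statistics were used.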

Parameters

num_features : int
Number of channels $C$.
eps : float = 1e-05
Small constant added to the variance for numerical stability. Default: 1e-5.
momentum : float or None = 0.1
Exponential moving average factor for running statistics. None uses a cumulative moving average. Default: 0.1.
affine : bool = True
If True, learns per-channel scale $\gamma$ and shift $\beta$. Default: True.
track_running_stats : bool = True
If True, maintains running_mean, running_var, and num_batches_tracked. Default: True.
device : DeviceLike = None
Device for parameters and buffers. Default: None.
dtype : DTypeLike = None
Data type for parameters and buffers. Default: None.
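For the momentum=None case, a cumulative moving average can be sketched as below. This assumes every batch is weighted equally, so the effective factor at the k-th batch is 1/k; the text above does not confirm that lucid computes it exactly this way, and the function name `cumulative_update` is hypothetical.

```python
def cumulative_update(running, batch_value, num_batches_tracked):
    """One cumulative-average step using factor 1 / (batches seen so far)."""
    num_batches_tracked += 1
    factor = 1.0 / num_batches_tracked          # assumption: equal weighting
    running = (1.0 - factor) * running + factor * batch_value
    return running, num_batches_tracked

running, count = 0.0, 0
for batch_mean in (2.0, 4.0, 6.0):
    running, count = cumulative_update(running, batch_mean, count)
```

Under this assumption the running value is the plain arithmetic mean of all batch statistics seen so far, which is why a batch counter such as num_batches_tracked is needed.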

Attributes

weight : Parameter or None
Learnable scale $\gamma$ of shape (num_features,). None when affine=False.
bias : Parameter or None
Learnable shift $\beta$ of shape (num_features,). None when affine=False.
running_mean : Tensor or None
Running per-channel mean, shape (num_features,). None when track_running_stats=False.
running_var : Tensor or None
Running per-channel variance, shape (num_features,). None when track_running_stats=False.
num_batches_tracked : Tensor or None
Scalar int64 counting batches seen during training. None when track_running_stats=False.

Notes

  • Input: $(N, C)$ or $(N, C, L)$
  • Output: same shape as the input.

Examples

2-D input (e.g. a linear layer's activations):
>>> import lucid
>>> import lucid.nn as nn
>>> bn = nn.BatchNorm1d(128)
>>> x = lucid.randn(32, 128)
>>> out = bn(x)
>>> out.shape
(32, 128)

3-D input (a temporal sequence with channels):
>>> bn_seq = nn.BatchNorm1d(64)
>>> x_seq = lucid.randn(16, 64, 200)   # (N, C, L)
>>> out_seq = bn_seq(x_seq)
>>> out_seq.shape
(16, 64, 200)