class BatchNorm1d

Extends: _BatchNormBase

BatchNorm1d(num_features: int, eps: float = 1e-05, momentum: float | None = 0.1, affine: bool = True, track_running_stats: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Implementing kernel: C++ class BatchNormNdBackward

Applies batch normalization to a 2-D input (N, C) or a 3-D input (N, C, L).

Normalises each channel across the batch (and, for 3-D inputs, the length) dimension:

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \varepsilon}} \cdot \gamma + \beta$$

For a 3-D input $(N, C, L)$, the statistics $\mathrm{E}[x]$ and $\mathrm{Var}[x]$ are computed over the $(N, L)$ axes for each channel $c$. For a 2-D input $(N, C)$, only the batch axis $N$ is reduced.
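The reduction axes described above can be illustrated with a minimal NumPy sketch. This is not Lucid's implementation: the function name batch_norm_train is hypothetical, and the use of the biased (population) variance is an assumption, since the text does not say which estimator is applied.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Normalise a (N, C) or (N, C, L) array per channel using batch statistics.

    Hypothetical helper for illustration only, not part of Lucid.
    """
    # Reduce over every axis except the channel axis (axis 1).
    axes = (0,) if x.ndim == 2 else (0, 2)
    mean = x.mean(axis=axes, keepdims=True)
    # Assumption: biased (population) variance, NumPy's default (ddof=0).
    var = x.var(axis=axes, keepdims=True)
    # Reshape the per-channel parameters so they broadcast along axis 1.
    shape = (1, -1) if x.ndim == 2 else (1, -1, 1)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return x_hat * gamma.reshape(shape) + beta.reshape(shape)


rng = np.random.default_rng(0)

# 3-D input (N, C, L): statistics are taken over the N and L axes.
x = rng.normal(loc=3.0, scale=2.0, size=(16, 4, 10))
y = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))

# 2-D input (N, C): only the batch axis is reduced.
x2 = rng.normal(size=(8, 4))
y2 = batch_norm_train(x2, gamma=np.ones(4), beta=np.zeros(4))
```

With gamma set to ones and beta set to zeros, each channel of the output has approximately zero mean and unit variance over the reduced axes.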

During training, batch statistics are used and running statistics are updated via an exponential moving average:

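$$\hat{\mu} \leftarrow (1 - m)\,\hat{\mu} + m\,\mu_{\text{batch}}$$

Here $m$ is momentum and $\hat{\mu}$ is running_mean; running_var is updated by the same rule from the batch variance.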

During evaluation (model.eval()), the stored running_mean and running_var are used instead, making inference independent of batch composition.
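The two modes can be sketched in NumPy as follows. The helper names update_running and batch_norm_eval are hypothetical, the initial buffer values are chosen only for illustration, and the affine scale and shift are omitted for brevity; none of this is taken from Lucid's source.

```python
import numpy as np


def update_running(running, batch_stat, momentum=0.1):
    """Exponential moving average: (1 - m) * running + m * batch_stat."""
    return (1.0 - momentum) * running + momentum * batch_stat


def batch_norm_eval(x, running_mean, running_var, eps=1e-5):
    """Normalise a (N, C) array with stored statistics (no affine step)."""
    return (x - running_mean) / np.sqrt(running_var + eps)


# Training: two batches with the same per-channel mean update the buffer.
running_mean = np.zeros(2)  # illustrative initial value
for batch_mean in (np.array([1.0, 2.0]), np.array([1.0, 2.0])):
    running_mean = update_running(running_mean, batch_mean)
# running_mean is now approximately [0.19, 0.38]

# Evaluation: a sample's output does not depend on the rest of the batch.
running_var = np.ones(2)  # illustrative value
x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
full = batch_norm_eval(x, running_mean, running_var)
single = batch_norm_eval(x[:1], running_mean, running_var)
```

Because the stored statistics are fixed, the first row of full equals single, which is the batch independence the text describes.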

Parameters

num_features: int

Number of channels $C$.

eps: float = 1e-05

Small constant added to the variance for numerical stability. Default: 1e-5.

momentum: float or None = 0.1

Exponential moving average factor for running statistics. None uses a cumulative moving average. Default: 0.1.

affine: bool = True

If True, learns per-channel scale $\gamma$ and shift $\beta$. Default: True.

track_running_stats: bool = True

If True, maintains running_mean, running_var, and num_batches_tracked. Default: True.

device: DeviceLike = None

Device for parameters and buffers. Default: None.

dtype: DTypeLike = None

Data type for parameters and buffers. Default: None.
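For the momentum=None case listed among the parameters, one common reading of "cumulative moving average" is an update factor of 1 / num_batches_tracked, which makes the running statistic the plain arithmetic mean of all batch statistics seen so far. The sketch below assumes that reading; the helper cumulative_update is hypothetical and is not confirmed by the text to match Lucid's behaviour.

```python
def cumulative_update(running, batch_stat, num_batches_tracked):
    """Cumulative moving average with factor 1 / num_batches_tracked.

    Assumption: this factor is one common convention, not taken from Lucid.
    """
    factor = 1.0 / num_batches_tracked
    return (1.0 - factor) * running + factor * batch_stat


running = 0.0
batch_means = [2.0, 4.0, 6.0]
for n, batch_mean in enumerate(batch_means, start=1):
    running = cumulative_update(running, batch_mean, n)

# Under this convention the result matches the arithmetic mean of the batches.
expected = sum(batch_means) / len(batch_means)
```

Unlike a fixed momentum, this weighting gives every batch equal influence, so early batches are not gradually forgotten.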

Attributes

weight: Parameter or None

Learnable scale $\gamma$ of shape (num_features,). None when affine=False.

bias: Parameter or None

Learnable shift $\beta$ of shape (num_features,). None when affine=False.

running_mean: Tensor or None

Running per-channel mean, shape (num_features,). None when track_running_stats=False.

running_var: Tensor or None

Running per-channel variance, shape (num_features,). None when track_running_stats=False.

num_batches_tracked: Tensor or None

Scalar int64 counting batches seen during training. None when track_running_stats=False.

Notes

Input: $(N, C)$ or $(N, C, L)$
Output: same shape as the input.

Examples

2-D input (e.g. a linear layer's activations):
>>> import lucid
>>> import lucid.nn as nn
>>> bn = nn.BatchNorm1d(128)
>>> x = lucid.randn(32, 128)
>>> out = bn(x)
>>> out.shape
(32, 128)

3-D input (a temporal sequence with channels):
>>> bn_seq = nn.BatchNorm1d(64)
>>> x_seq = lucid.randn(16, 64, 200)   # (N, C, L)
>>> out_seq = bn_seq(x_seq)
>>> out_seq.shape
(16, 64, 200)
