class GroupNorm extends Module

GroupNorm(num_groups: int, num_channels: int, eps: float = 1e-05, affine: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Implementing kernel: C++ class GroupNormBackward

Group normalization over the channel dimension.

Divides the $C$ channels into num_groups contiguous groups of size $C / \text{num\_groups}$ and normalises each group independently over its spatial elements:

$$y = \frac{x - \mu_g}{\sqrt{\sigma_g^2 + \varepsilon}} \cdot \gamma + \beta$$

where $\mu_g$ and $\sigma_g^2$ are the mean and variance computed over a single group (channels + spatial axes) for each sample in the batch.
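The formula above can be sketched in NumPy as a reference computation. This is an illustrative sketch, not the library's kernel: the helper name `group_norm_reference` is hypothetical, and the use of the biased (population) variance is an assumption that the text does not state explicitly.

```python
import numpy as np

def group_norm_reference(x, num_groups, gamma=None, beta=None, eps=1e-5):
    """NumPy sketch of group normalization for an input of shape (N, C, *)."""
    n, c = x.shape[0], x.shape[1]
    # Collapse each group (its channels plus all spatial elements) into one axis.
    grouped = x.reshape(n, num_groups, -1)
    mu = grouped.mean(axis=2, keepdims=True)
    var = grouped.var(axis=2, keepdims=True)  # assumption: biased variance
    y = ((grouped - mu) / np.sqrt(var + eps)).reshape(x.shape)
    # Per-channel affine: broadcast (C,) over the batch and spatial axes.
    param_shape = (1, c) + (1,) * (x.ndim - 2)
    if gamma is not None:
        y = y * gamma.reshape(param_shape)
    if beta is not None:
        y = y + beta.reshape(param_shape)
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 32, 8, 8))
y = group_norm_reference(x, num_groups=8)

# Without the affine step, every (sample, group) slice has mean ~0 and variance ~1.
per_group = y.reshape(4, 8, -1)
print(np.abs(per_group.mean(axis=2)).max())
print(per_group.var(axis=2).mean())
```

Reshaping to `(N, num_groups, -1)` relies on the groups being contiguous along the channel axis, as described above.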

Group Norm sits between two extremes: num_groups=1 recovers Layer Norm (all channels of a sample are normalised together), while num_groups=num_channels recovers Instance Norm (each channel is its own group). Unlike Batch Norm, Group Norm computes its statistics per sample, so they do not depend on the batch size. This keeps it stable for small batches and makes it well suited to detection and segmentation models.
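The two limiting cases and the batch-size independence can be checked numerically. The sketch below uses NumPy rather than lucid and omits the affine step; `normalize_groups` is a hypothetical helper, and the biased variance is an assumption.

```python
import numpy as np

EPS = 1e-5

def normalize_groups(x, num_groups, eps=EPS):
    """Group-wise standardization of an (N, C, H, W) array, no affine step."""
    g = x.reshape(x.shape[0], num_groups, -1)
    mu = g.mean(axis=2, keepdims=True)
    var = g.var(axis=2, keepdims=True)
    return ((g - mu) / np.sqrt(var + eps)).reshape(x.shape)

def standardize(x, axes, eps=EPS):
    """Standardize over the given axes, keeping the remaining axes separate."""
    mu = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 6, 5, 5))

# num_groups=1: one mean/variance per sample over (C, H, W), as in Layer Norm.
matches_layer = np.allclose(normalize_groups(x, 1), standardize(x, (1, 2, 3)))
# num_groups=C: one mean/variance per sample and channel, as in Instance Norm.
matches_instance = np.allclose(normalize_groups(x, 6), standardize(x, (2, 3)))
# A sample normalised alone matches the same sample normalised inside a batch.
batch_independent = np.allclose(normalize_groups(x, 3)[:1],
                                normalize_groups(x[:1], 3))
print(matches_layer, matches_instance, batch_independent)
```

The comparison covers only the normalization statistics; how each layer applies its learnable scale and shift is a separate question.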

Parameters

num_groups: int

Number of groups to divide the channels into. num_channels must be divisible by num_groups.

num_channels: int

Total number of channels $C$ expected in the input.

eps: float = 1e-05

Small constant $\varepsilon$ added to the group variance under the square root for numerical stability. Default: 1e-5.

affine: bool = True

If True, learns per-channel scale $\gamma$ and shift $\beta$ of shape (num_channels,). Default: True.

device: DeviceLike = None

Device for the learnable parameters. Default: None.

dtype: DTypeLike = None

Data type of the learnable parameters. Default: None.
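To illustrate why eps is needed, consider a group whose values are all identical, so its variance is exactly zero. The following pure-Python sketch (the `normalize` helper is hypothetical and works on a flat list standing in for one group) shows the normalization with and without eps.

```python
import math

def normalize(values, eps):
    """Standardize one group of values using the biased variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

constant_group = [3.0, 3.0, 3.0, 3.0]

# With eps > 0 the denominator stays positive and the output is all zeros.
print(normalize(constant_group, eps=1e-5))

# With eps = 0 the denominator is zero and plain Python raises.
try:
    normalize(constant_group, eps=0.0)
    failed = False
except ZeroDivisionError:
    failed = True
print(failed)
```

Array libraries usually produce NaN values in this situation rather than raising an exception, which is harder to notice during training.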

Attributes

weight: Parameter or None

Learnable per-channel scale $\gamma$ of shape (num_channels,), initialised to ones. None when affine=False.

bias: Parameter or None

Learnable per-channel shift $\beta$ of shape (num_channels,), initialised to zeros. None when affine=False.

Notes

Input: $(N, C, *)$ where $*$ denotes zero or more spatial dimensions and $C = \text{num\_channels}$ .
Output: same shape as the input.

num_channels must be divisible by num_groups; a ValueError is raised at the functional level if this is violated.
Although the statistics are computed per group, the weight and bias are applied per channel: they have shape (num_channels,) and are not element-wise over the full normalized region.
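The two constraints in these notes can be sketched in plain Python. Both helpers are hypothetical and only mirror the documented behaviour; they are not part of lucid, and the exact error message raised by the library is not specified here.

```python
def channels_per_group(num_groups, num_channels):
    """Return the group size, or raise ValueError if the split is uneven."""
    if num_channels % num_groups != 0:
        raise ValueError(
            f"num_channels ({num_channels}) must be divisible by "
            f"num_groups ({num_groups})"
        )
    return num_channels // num_groups

def affine_broadcast_shape(num_channels, input_ndim):
    """Shape that lets a (num_channels,) parameter broadcast over (N, C, *)."""
    return (1, num_channels) + (1,) * (input_ndim - 2)

print(channels_per_group(8, 32))      # 4 channels in each of the 8 groups
print(affine_broadcast_shape(32, 4))  # (1, 32, 1, 1) for an (N, C, H, W) input

try:
    channels_per_group(5, 32)
except ValueError as err:
    print(err)
```

Reshaping the parameters to `(1, C, 1, ...)` is one common way to apply a per-channel scale and shift; the library may implement the broadcast differently.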

Examples

32-channel input split into 8 groups:
>>> import lucid
>>> import lucid.nn as nn
>>> gn = nn.GroupNorm(num_groups=8, num_channels=32)
>>> x = lucid.randn(4, 32, 64, 64)
>>> out = gn(x)
>>> out.shape
(4, 32, 64, 64)
Layer-Norm equivalent (single group) on a 1-D sequence:
>>> gn_layer = nn.GroupNorm(num_groups=1, num_channels=128)
>>> x_seq = lucid.randn(16, 128, 200)   # (N, C, L)
>>> out_seq = gn_layer(x_seq)
>>> out_seq.shape
(16, 128, 200)

Used by: lucid.nn.modules

Constructors

__init__(num_groups: int, num_channels: int, eps: float = 1e-05, affine: bool = True, device: DeviceLike = None, dtype: DTypeLike = None) → None

Initialise the GroupNorm module. See the class docstring for parameter semantics.

Instance methods

extra_repr() → str

Return a string representation of the layer's configuration.

forward(x: Tensor) → Tensor

Apply normalisation to the input tensor.

Parameters

x: Tensor

Input tensor whose shape is documented in the class docstring.

Returns

Tensor

Normalised tensor of the same shape as x.
