class

GroupNorm

extends Module
GroupNorm(num_groups: int, num_channels: int, eps: float = 1e-05, affine: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Group normalization over the channel dimension.

Divides the $C$ channels into num_groups contiguous groups of size $C / \text{num\_groups}$ and normalises each group independently over its spatial elements:

$$y = \frac{x - \mu_g}{\sqrt{\sigma_g^2 + \varepsilon}} \cdot \gamma + \beta$$

where $\mu_g$ and $\sigma_g^2$ are the mean and variance computed over a single group (its channels and all spatial axes) for each sample in the batch.
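The formula can be sketched in plain NumPy. This is an illustrative reference, not the library's implementation: the function name `group_norm_reference` is hypothetical, and the use of the biased (population) variance is an assumption that the text above does not state.

```python
import numpy as np

def group_norm_reference(x, num_groups, gamma=None, beta=None, eps=1e-5):
    """Illustrative group norm for an array of shape (N, C, *)."""
    n, c = x.shape[0], x.shape[1]
    if c % num_groups != 0:
        raise ValueError("num_channels must be divisible by num_groups")
    # Fold each group's channels and all spatial axes into one axis,
    # so every (sample, group) pair gets its own mean and variance.
    grouped = x.reshape(n, num_groups, -1)
    mean = grouped.mean(axis=2, keepdims=True)
    var = grouped.var(axis=2, keepdims=True)  # assumption: biased variance
    y = ((grouped - mean) / np.sqrt(var + eps)).reshape(x.shape)
    # Per-channel affine: view (C,) as (1, C, 1, ...) to broadcast over (N, C, *).
    param_shape = (1, c) + (1,) * (x.ndim - 2)
    if gamma is not None:
        y = y * gamma.reshape(param_shape)
    if beta is not None:
        y = y + beta.reshape(param_shape)
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 32, 8, 8))
y = group_norm_reference(x, num_groups=8)

# Each (sample, group) slice should have mean close to 0 and variance close to 1.
per_group = y.reshape(4, 8, -1)
max_abs_mean = np.abs(per_group.mean(axis=2)).max()
mean_var = per_group.var(axis=2).mean()
```

With `gamma` and `beta` left as `None`, the sketch returns only the normalised values, which makes the per-group statistics easy to inspect.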

Group Norm sits between two extremes: num_groups=1 recovers Layer Norm (all channels are normalised together as one group), while num_groups=num_channels recovers Instance Norm (each channel is its own group). Unlike Batch Norm, Group Norm computes its statistics per sample, so they do not depend on the batch size. This makes it stable for small batches and well suited to detection and segmentation models.
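These relationships can be checked numerically. The sketch below uses NumPy rather than the library itself, omits the affine parameters, and assumes the biased variance. The helper `group_norm` is a hypothetical stand-in for the normalisation step only.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Normalisation step only (no affine), for input of shape (N, C, *)."""
    n = x.shape[0]
    g = x.reshape(n, num_groups, -1)
    mean = g.mean(axis=2, keepdims=True)
    var = g.var(axis=2, keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(x.shape)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 5, 5))
eps = 1e-5

# Layer-Norm style: one mean/variance per sample over (C, H, W).
layer = (x - x.mean(axis=(1, 2, 3), keepdims=True)) / np.sqrt(
    x.var(axis=(1, 2, 3), keepdims=True) + eps
)
# Instance-Norm style: one mean/variance per sample and channel over (H, W).
instance = (x - x.mean(axis=(2, 3), keepdims=True)) / np.sqrt(
    x.var(axis=(2, 3), keepdims=True) + eps
)

same_as_layer = np.allclose(group_norm(x, 1), layer)
same_as_instance = np.allclose(group_norm(x, 6), instance)

# Batch independence: a sample normalised alone matches the same sample
# normalised as part of a larger batch.
batch_independent = np.allclose(group_norm(x, 3)[:1], group_norm(x[:1], 3))
```

The comparison covers the normalisation statistics only. Whether the affine parameters of the corresponding layers also line up depends on how each layer shapes its weight and bias.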

Parameters

num_groups: int
Number of groups to divide the channels into. num_channels must be divisible by num_groups.
num_channels: int
Total number of channels $C$ expected in the input.
eps: float = 1e-05
Small constant for numerical stability. Default: 1e-5.
affine: bool = True
If True, learns per-channel scale $\gamma$ and shift $\beta$ of shape (num_channels,). Default: True.
device: DeviceLike = None
Device for the learnable parameters. Default: None.
dtype: DTypeLike = None
Data type of the learnable parameters. Default: None.

Attributes

weight: Parameter or None
Learnable per-channel scale $\gamma$ of shape (num_channels,), initialised to ones. None when affine=False.
bias: Parameter or None
Learnable per-channel shift $\beta$ of shape (num_channels,), initialised to zeros. None when affine=False.

Notes

  • Input: $(N, C, *)$ where $*$ denotes zero or more spatial dimensions and $C = \text{num\_channels}$.
  • Output: same shape as the input.
  • num_channels must be divisible by num_groups; a ValueError is raised at the functional level if this is violated.
  • Although they share names with the affine parameters of other normalisation layers, the weight and bias here are per-channel, with shape (num_channels,), rather than element-wise over the full normalised region.
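The shape contract in these notes can be illustrated with a small sketch. The helper `validate_group_norm_input` and its error messages are hypothetical and are not the library's code. The check that the second axis equals num_channels is an added assumption, since the notes only mention the divisibility error.

```python
import numpy as np

def validate_group_norm_input(shape, num_groups, num_channels):
    """Hypothetical helper: check the (N, C, *) contract, return channels per group."""
    if num_channels % num_groups != 0:
        raise ValueError(
            f"num_channels={num_channels} is not divisible by num_groups={num_groups}"
        )
    # Assumption: the channel axis is axis 1 and must match num_channels.
    if len(shape) < 2 or shape[1] != num_channels:
        raise ValueError(
            f"expected input of shape (N, {num_channels}, *), got {tuple(shape)}"
        )
    return num_channels // num_groups

channels_per_group = validate_group_norm_input(
    (4, 32, 64, 64), num_groups=8, num_channels=32
)

try:
    validate_group_norm_input((4, 32, 64, 64), num_groups=5, num_channels=32)
    rejected = False
except ValueError:
    rejected = True

# Per-channel parameters: (C,) viewed as (1, C, 1, 1) broadcasts over (N, C, H, W),
# so the output keeps the input's shape.
weight = np.ones(32)
x = np.zeros((4, 32, 64, 64))
scaled = x * weight.reshape(1, 32, 1, 1)
```

An explicit divisibility check matters because reshaping into groups can otherwise succeed silently whenever the total element count happens to divide evenly, which would mix channels across group boundaries.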

Examples

32-channel input split into 8 groups:
>>> import lucid
>>> import lucid.nn as nn
>>> gn = nn.GroupNorm(num_groups=8, num_channels=32)
>>> x = lucid.randn(4, 32, 64, 64)
>>> out = gn(x)
>>> out.shape
(4, 32, 64, 64)
Layer-Norm equivalent (single group) on a 1-D sequence:
>>> gn_layer = nn.GroupNorm(num_groups=1, num_channels=128)
>>> x_seq = lucid.randn(16, 128, 200)   # (N, C, L)
>>> out_seq = gn_layer(x_seq)
>>> out_seq.shape
(16, 128, 200)

Methods (3)

dunder

__init__

None
__init__(num_groups: int, num_channels: int, eps: float = 1e-05, affine: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Initialise the GroupNorm module. See the class docstring for parameter semantics.

fn

forward

Tensor
forward(x: Tensor)

Apply normalisation to the input tensor.

Parameters

x: Tensor
Input tensor of shape (N, C, *), as documented in the class docstring.

Returns

Tensor

Normalised tensor of the same shape as x.

fn

extra_repr

str
extra_repr()

Return a string representation of the layer's configuration.