group_norm

→Tensor

group_norm(x: Tensor, num_groups: int, weight: Tensor | None = None, bias: Tensor | None = None, eps: float = 1e-05)


Group normalization (Wu & He, 2018).

Splits the channel dimension into num_groups contiguous groups and normalises each (sample, group) slice independently over its channels and spatial axes. Like BatchNorm, it reduces over spatial positions; like LayerNorm, it computes its statistics per sample. Performance is therefore largely independent of batch size, which makes it a common choice for detection and segmentation models trained with very small batches.
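To make "contiguous groups" concrete, the sketch below maps a channel index to its group. It assumes groups are formed by simple integer division of the channel index, which is the usual convention; `group_of` is a hypothetical helper for illustration, not part of the library.

```python
def group_of(channel: int, num_channels: int, num_groups: int) -> int:
    """Index of the contiguous group holding `channel` (hypothetical helper)."""
    if num_channels % num_groups != 0:
        raise ValueError("num_channels must be divisible by num_groups")
    channels_per_group = num_channels // num_groups
    return channel // channels_per_group


# C = 6 channels split into 3 groups of 2 contiguous channels each.
assignment = [group_of(c, 6, 3) for c in range(6)]
print(assignment)  # [0, 0, 1, 1, 2, 2]
```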

Parameters

x: Tensor

Input of shape (N, C, *spatial) where C must be divisible by num_groups.

num_groups: int

Number of channel groups. Two limiting cases: num_groups == C reduces to InstanceNorm; num_groups == 1 reduces to LayerNorm over channels + spatial axes.

weight: Tensor | None = None

Per-channel scale $\gamma$ of shape (C,).

bias: Tensor | None = None

Per-channel shift $\beta$ of shape (C,).

eps: float = 1e-05

Small constant added to the variance inside the square root for numerical stability.
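The role of eps is easiest to see on a slice with zero variance. The NumPy sketch below is an illustration only (it does not call the library): it normalises one constant (sample, group) slice with and without the constant under the square root.

```python
import numpy as np

eps = 1e-5
# One (sample, group) slice whose values are all identical.
group = np.full((4, 8, 8), 3.0)

mean = group.mean()
var = group.var()  # exactly 0 for a constant slice

# Without eps the normalisation is 0 / 0, which yields NaN.
with np.errstate(divide="ignore", invalid="ignore"):
    without_eps = (group - mean) / np.sqrt(var)

# With eps the denominator is sqrt(eps) > 0, so the result is a clean 0.
with_eps = (group - mean) / np.sqrt(var + eps)
```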

Returns

Tensor

Same shape as x.

Notes

Math (let $G_g$ be the set of channels in group $g$, $g(c)$ the group containing channel $c$, and $S$ the set of spatial positions):

$$
\begin{aligned}
\mu_{n,g} &= \frac{1}{|G_g||S|} \sum_{c \in G_g} \sum_{s \in S} x_{n,c,s} \\
\sigma^2_{n,g} &= \frac{1}{|G_g||S|} \sum_{c \in G_g} \sum_{s \in S} (x_{n,c,s} - \mu_{n,g})^2 \\
y_{n,c,s} &= \gamma_c \cdot \frac{x_{n,c,s} - \mu_{n,g(c)}}{\sqrt{\sigma^2_{n,g(c)} + \epsilon}} + \beta_c
\end{aligned}
$$
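The three formulas translate almost line for line into array code. What follows is a minimal NumPy sketch, not the library's kernel: `group_norm_reference` is a hypothetical name, and treating `weight=None` / `bias=None` as "skip the scale / shift" is an assumption about the intended semantics.

```python
import numpy as np


def group_norm_reference(x, num_groups, weight=None, bias=None, eps=1e-5):
    """Illustrative group norm for an (N, C, *spatial) array."""
    n, c = x.shape[:2]
    if c % num_groups != 0:
        raise ValueError("C must be divisible by num_groups")

    # Flatten each (sample, group) slice: (N, G, (C // G) * prod(spatial)).
    grouped = x.reshape(n, num_groups, -1)
    mu = grouped.mean(axis=2, keepdims=True)
    var = ((grouped - mu) ** 2).mean(axis=2, keepdims=True)  # biased variance
    y = ((grouped - mu) / np.sqrt(var + eps)).reshape(x.shape)

    # Per-channel affine step; (1, C, 1, ..., 1) broadcasts over N and spatial.
    affine_shape = (1, c) + (1,) * (x.ndim - 2)
    if weight is not None:  # assumption: None means no scaling
        y = y * weight.reshape(affine_shape)
    if bias is not None:  # assumption: None means no shift
        y = y + bias.reshape(affine_shape)
    return y


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 32, 16, 16))
y = group_norm_reference(x, num_groups=8)

# Every (sample, group) slice should now have mean ~0 and variance ~1.
per_group = y.reshape(2, 8, -1)
group_means = per_group.mean(axis=2)
group_vars = per_group.var(axis=2)
```

The variance divides by the full element count $|G_g||S|$, matching the formula, rather than applying Bessel's correction.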

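Because the statistics never depend on other samples in the batch, the function computes the same result in training and evaluation. This avoids the train/eval mismatch that BatchNorm has to correct with running-statistics buffers.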
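The batch-independence claim can be checked directly on the statistics. The NumPy sketch below is an illustration with hypothetical helper names: it places the same sample in two different batches and compares the per-(sample, group) mean with the per-channel mean a BatchNorm-style reduction would use.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.standard_normal((1, 4, 5, 5))
others_a = rng.standard_normal((3, 4, 5, 5))
others_b = rng.standard_normal((3, 4, 5, 5)) + 2.0  # a shifted set of batch-mates


def group_mean_of_first(batch, num_groups=2):
    """Group-norm statistic for sample 0: mean over its channels and spatial axes."""
    n = batch.shape[0]
    return batch.reshape(n, num_groups, -1).mean(axis=2)[0]


def batch_mean(batch):
    """BatchNorm-style statistic: per channel, over batch and spatial axes."""
    return batch.mean(axis=(0, 2, 3))


batch_a = np.concatenate([sample, others_a])
batch_b = np.concatenate([sample, others_b])

# The group statistics of `sample` ignore its batch-mates entirely ...
group_stats_match = np.allclose(group_mean_of_first(batch_a), group_mean_of_first(batch_b))
# ... whereas the batch statistics move with the rest of the batch.
batch_stats_match = np.allclose(batch_mean(batch_a), batch_mean(batch_b))
```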

Examples

>>> import lucid
>>> from lucid.nn.functional import group_norm
>>> x = lucid.randn(2, 32, 16, 16)
>>> y = group_norm(x, num_groups=8)
>>> y.shape
(2, 32, 16, 16)
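The two limiting cases noted for num_groups can also be verified numerically. The sketch below uses plain NumPy rather than the library; `normalize` and `group_norm_ref` are hypothetical helpers, and the affine parameters are omitted.

```python
import numpy as np


def normalize(x, axes, eps=1e-5):
    """Zero-mean, unit-variance normalisation over `axes` (biased variance)."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def group_norm_ref(x, num_groups, eps=1e-5):
    """Illustrative group norm without affine parameters."""
    n, c = x.shape[:2]
    grouped = x.reshape(n, num_groups, c // num_groups, *x.shape[2:])
    # Reduce over every axis except (sample, group).
    axes = tuple(range(2, grouped.ndim))
    return normalize(grouped, axes, eps).reshape(x.shape)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 3, 3))

instance_norm = normalize(x, axes=(2, 3))   # per (sample, channel)
layer_norm = normalize(x, axes=(1, 2, 3))   # per sample, over C and spatial axes

same_as_instance = np.allclose(group_norm_ref(x, num_groups=4), instance_norm)
same_as_layer = np.allclose(group_norm_ref(x, num_groups=1), layer_norm)
```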
