nn.init.xavier_normal

lucid.nn.init.xavier_normal(tensor: Tensor, gain: int | float | complex = 1.0) -> None

The xavier_normal function initializes the input tensor with values sampled from a normal distribution \(\mathcal{N}(0, \sigma^2)\), where the standard deviation \(\sigma\) is calculated to maintain a stable variance of activations across layers.

This initialization method is commonly used with activation functions like sigmoid and tanh.

Function Signature

def xavier_normal(tensor: Tensor, gain: _Scalar = 1.0) -> None

Parameters

  • tensor (Tensor): The tensor to be initialized. The shape of the tensor determines the fan-in and fan-out for the initialization.

  • gain (_Scalar, optional): An optional scaling factor applied to the computed standard deviation. Defaults to 1.0.

Returns

  • None: The function modifies the tensor in-place with new values sampled from the normal distribution.
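
Since the function works in-place and returns None, the return value should not be assigned back to the tensor. A minimal sketch, using only the lucid.zeros and xavier_normal calls shown elsewhere on this page:

>>> import lucid
>>> from lucid.nn.init import xavier_normal
>>> w = lucid.zeros((3, 2))
>>> result = xavier_normal(w)  # w is filled in-place with sampled values
>>> print(result)
None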

Forward Calculation

The values in the tensor are sampled from a normal distribution \(\mathcal{N}(0, \sigma^2)\), where the standard deviation \(\sigma\) is calculated as:

\[\sigma = \text{gain} \cdot \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}}\]

Where:

  • \(\text{fan\_in}\) is the number of input units in the weight tensor.

  • \(\text{fan\_out}\) is the number of output units in the weight tensor.
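
As a quick check of the formula, the standard deviation for a weight tensor of shape (3, 2) can be computed by hand. The sketch below uses only Python's math module; the mapping of the two dimensions to fan-in and fan-out (here fan_in = 2, fan_out = 3, following the usual (out_features, in_features) convention) is an assumption, not something this page specifies.

>>> import math
>>> fan_in, fan_out = 2, 3  # assumed mapping for a (3, 2) weight tensor
>>> gain = 1.0
>>> sigma = gain * math.sqrt(2.0 / (fan_in + fan_out))
>>> round(sigma, 4)
0.6325

Because the formula is symmetric in fan-in and fan-out, the result is the same regardless of which dimension is treated as which.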

Examples

Basic Xavier Normal Initialization

>>> import lucid
>>> from lucid.nn.init import xavier_normal
>>> tensor = lucid.zeros((3, 2))
>>> xavier_normal(tensor)
>>> print(tensor)
Tensor([[ 0.123, -0.234],
        [ 0.342, -0.678],
        [ 0.678,  0.123]], requires_grad=False)

Xavier Normal Initialization with Gain

>>> tensor = lucid.zeros((4, 4))
>>> xavier_normal(tensor, gain=2.0)
>>> print(tensor)
Tensor([[ 0.563, -0.342,  0.421, -0.678],
        [-0.321,  0.654, -0.276,  0.345],
        [ 0.876,  0.124, -0.563, -0.234],
        [ 0.543, -0.234,  0.657, -0.421]], requires_grad=False)
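
Xavier Normal Initialization with a tanh-Friendly Gain

A gain of 5/3 is a common choice when the initialized layer feeds a tanh activation (this matches the value PyTorch's nn.init.calculate_gain("tanh") returns; whether lucid provides a comparable helper is not confirmed here, so the constant is written out explicitly). A minimal sketch:

>>> tensor = lucid.zeros((64, 128))
>>> xavier_normal(tensor, gain=5 / 3)  # scales the computed sigma by roughly 1.667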

Note

  • Xavier initialization is best suited for layers with symmetric activation functions such as tanh or sigmoid.

  • For ReLU activations, consider using Kaiming Initialization instead for better performance.