nn.functional.linear

lucid.nn.functional.linear(input_: Tensor, weight: Tensor, bias: Tensor | None = None) -> Tensor

The linear function applies a linear transformation to the incoming data: it multiplies the input tensor by the transpose of the weight tensor and, if provided, adds a bias tensor. This operation is fundamental in neural networks, particularly in fully connected layers.

Function Signature

def linear(input_: Tensor, weight: Tensor, bias: Tensor | None = None) -> Tensor

Parameters

  • input_ (Tensor):

    The input tensor of shape (N, *, in_features), where * represents any number of additional dimensions.

  • weight (Tensor):

    The weight tensor of shape (out_features, in_features). Each row of the weight tensor represents the weights for one output feature.

  • bias (Tensor, optional):

    The bias tensor of shape (1, out_features). If None, no bias is added.

Returns

  • Tensor:

    A new Tensor containing the result of the linear transformation. The shape of the output tensor is (N, *, out_features). If any of input_, weight, or bias requires gradients, the resulting tensor will also require gradients.
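
If the implementation follows this shape contract, any extra dimensions between the batch dimension and in_features pass through unchanged. A minimal sketch (assuming Tensor is importable from the top-level lucid package and exposes a NumPy-style shape attribute; the values are arbitrary):

>>> import lucid.nn.functional as F
>>> from lucid import Tensor
>>> x = Tensor([[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]])  # Shape: (1, 2, 3)
>>> w = Tensor([[1.0, 0.0, 0.0],
...             [0.0, 1.0, 0.0],
...             [0.0, 0.0, 1.0],
...             [1.0, 1.0, 1.0]])  # Shape: (4, 3), out_features=4
>>> F.linear(x, w).shape  # (N, *, out_features)
(1, 2, 4)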

Forward Calculation

The forward calculation for the linear operation is:

\[\mathbf{out} = \mathbf{input\_} \cdot \mathbf{weight}^\top + \mathbf{bias}\]

Equivalently, for each input row vector \(\mathbf{x}\) in input_:

\[\mathbf{out} = \mathbf{x} \cdot \mathbf{W}^\top + \mathbf{b}\]

where \(\mathbf{W}\) is the weight tensor and \(\mathbf{b}\) is the bias.
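
As a quick arithmetic check, the same computation can be written in plain NumPy (a reference sketch, not lucid's implementation):

>>> import numpy as np
>>> x = np.array([[1.0, 2.0, 3.0]])  # (N, in_features) = (1, 3)
>>> W = np.array([[4.0, 5.0, 6.0],
...               [7.0, 8.0, 9.0]])  # (out_features, in_features) = (2, 3)
>>> b = np.array([[0.5, -0.5]])      # (1, out_features)
>>> x @ W.T + b                      # same arithmetic as F.linear(x, W, b)
array([[32.5, 49.5]])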

Backward Gradient Calculation

For the tensors input_, weight, and bias involved in the linear operation, the partial derivatives of the output (out) with respect to each of them are:

Gradient with respect to \(\mathbf{input\_}\):

\[\frac{\partial \mathbf{out}}{\partial \mathbf{input\_}} = \mathbf{weight}\]

Gradient with respect to \(\mathbf{weight}\):

\[\frac{\partial \mathbf{out}}{\partial \mathbf{weight}} = \mathbf{input\_}^\top\]

Gradient with respect to \(\mathbf{bias}\):

\[\frac{\partial \mathbf{out}}{\partial \mathbf{bias}} = \mathbf{1}\]
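
During backpropagation these local derivatives are combined with the upstream gradient \(\partial L / \partial \mathbf{out}\) by the chain rule. A minimal NumPy sketch for a 2-D input (not lucid's implementation; the upstream gradient is taken to be all ones, matching the examples below):

>>> import numpy as np
>>> x = np.array([[1.0, 2.0, 3.0]])            # (1, 3)
>>> W = np.array([[4.0, 5.0, 6.0],
...               [7.0, 8.0, 9.0]])             # (2, 3)
>>> grad_out = np.ones((1, 2))                  # upstream gradient
>>> grad_out @ W                                # gradient w.r.t. input_
array([[11., 13., 15.]])
>>> grad_out.T @ x                              # gradient w.r.t. weight
array([[1., 2., 3.],
       [1., 2., 3.]])
>>> grad_out.sum(axis=0, keepdims=True)         # gradient w.r.t. bias
array([[1., 1.]])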

Examples

Using linear for a simple linear transformation without bias:

>>> import lucid.nn.functional as F
>>> from lucid import Tensor  # assuming Tensor is exported at the package top level
>>> input_ = Tensor([[1.0, 2.0, 3.0]], requires_grad=True)  # Shape: (1, 3)
>>> weight = Tensor([[4.0, 5.0, 6.0],
...                  [7.0, 8.0, 9.0]], requires_grad=True)  # Shape: (2, 3)
>>> out = F.linear(input_, weight)  # Shape: (1, 2)
>>> print(out)
Tensor([[32.0, 50.0]], grad=None)

Backpropagation computes gradients for both input_ and weight:

>>> out.backward()
>>> print(input_.grad)
[[11.0, 13.0, 15.0]]  # Column-wise sum of weight
>>> print(weight.grad)
[[1.0, 2.0, 3.0],
 [1.0, 2.0, 3.0]]  # input_ repeated for each output feature

Using linear with bias for a batch of inputs:

>>> import lucid.nn.functional as F
>>> from lucid import Tensor  # assuming Tensor is exported at the package top level
>>> input_ = Tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)  # Shape: (2, 2)
>>> weight = Tensor([[5.0, 6.0], [7.0, 8.0]], requires_grad=True)  # Shape: (2, 2)
>>> bias = Tensor([[9.0, 10.0]], requires_grad=True)  # Shape: (1, 2)
>>> out = F.linear(input_, weight, bias)  # Shape: (2, 2)
>>> print(out)
Tensor([[26.0, 33.0],
        [48.0, 63.0]], grad=None)

Backpropagation propagates gradients through the inputs, weights, and bias:

>>> out.backward()
>>> print(input_.grad)
[[12.0, 14.0],
 [12.0, 14.0]]  # Each row is the column-wise sum of weight
>>> print(weight.grad)
[[4.0, 6.0],
 [4.0, 6.0]]  # Each row is the column-wise sum of input_
>>> print(bias.grad)
[[2.0, 2.0]]  # Upstream gradient summed over the batch dimension