lucid.linalg.pinv¶
The pinv function computes the Moore-Penrose pseudo-inverse of a matrix. Given any matrix A, it returns the pseudo-inverse A⁺, which generalizes the concept of the inverse to non-square or singular matrices.
Function Signature¶
def pinv(a: Tensor) -> Tensor
Parameters¶
a (Tensor): The input tensor, which can be any two-dimensional tensor (matrix), regardless of shape or rank.
Returns¶
Tensor: The pseudo-inverse of the input matrix a.
Forward Calculation¶
The forward calculation for pinv computes the Moore-Penrose pseudo-inverse using Singular Value Decomposition (SVD):
Compute the thin (reduced) SVD of A:
\[\mathbf{A} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^\mathrm{T}\]where:
\(\mathbf{U}\) is a matrix whose orthonormal columns are the left singular vectors.
\(\mathbf{\Sigma}\) is a square diagonal matrix containing the singular values (non-negative real numbers, in descending order).
\(\mathbf{V}\) is a matrix whose orthonormal columns are the right singular vectors.
Compute the reciprocal of non-zero singular values:
\[\mathbf{\Sigma}^+ = \operatorname{diag}\left( \frac{1}{\sigma_i} \right)\]for each non-zero singular value σᵢ. Singular values close to zero can be regularized to avoid numerical instability.
Compute the pseudo-inverse:
\[\mathbf{A}^+ = \mathbf{V} \mathbf{\Sigma}^+ \mathbf{U}^\mathrm{T}\]
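A minimal NumPy sketch of these three steps follows; pinv_svd and its tol cutoff are illustrative names, not part of lucid's API:

import numpy as np

def pinv_svd(a: np.ndarray, tol: float = 1e-15) -> np.ndarray:
    u, s, vt = np.linalg.svd(a, full_matrices=False)   # step 1: thin SVD
    cutoff = tol * s.max()                             # step 2: invert only the
    s_inv = np.where(s > cutoff, 1.0 / s, 0.0)         #   singular values above a cutoff
    return vt.T @ np.diag(s_inv) @ u.T                 # step 3: A+ = V S+ U^T

a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(pinv_svd(a))   # agrees with np.linalg.pinv(a)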
Backward Gradient Calculation¶
The gradient of the pseudo-inverse with respect to the input matrix A follows from the differential of the pseudo-inverse and reuses the SVD components computed in the forward pass.
In a practical implementation, this involves the following steps (a NumPy sketch of the complete backward rule appears after the explanation below):
Compute the SVD of A (from the forward pass).
Compute the necessary intermediate matrices:
\(\mathbf{\Sigma}^{-2}\): Diagonal matrix with elements \(1/\sigma_i^2\), which enters through \(\mathbf{A}^+ (\mathbf{A}^+)^\mathrm{T} = \mathbf{V} \mathbf{\Sigma}^{-2} \mathbf{V}^\mathrm{T}\) and \((\mathbf{A}^+)^\mathrm{T} \mathbf{A}^+ = \mathbf{U} \mathbf{\Sigma}^{-2} \mathbf{U}^\mathrm{T}\).
Orthogonal projectors:
\[\mathbf{P}_U = \mathbf{U} \mathbf{U}^\mathrm{T} = \mathbf{A} \mathbf{A}^+, \quad \mathbf{P}_V = \mathbf{V} \mathbf{V}^\mathrm{T} = \mathbf{A}^+ \mathbf{A}\]
Compute the gradient:
\[\frac{\partial L}{\partial \mathbf{A}} = -\left( \mathbf{A}^+ \right)^\mathrm{T} \frac{\partial L}{\partial \mathbf{A}^+} \left( \mathbf{A}^+ \right)^\mathrm{T} + \left( \mathbf{I} - \mathbf{P}_U \right) \left( \frac{\partial L}{\partial \mathbf{A}^+} \right)^\mathrm{T} \mathbf{V} \mathbf{\Sigma}^{-2} \mathbf{V}^\mathrm{T} + \mathbf{U} \mathbf{\Sigma}^{-2} \mathbf{U}^\mathrm{T} \left( \frac{\partial L}{\partial \mathbf{A}^+} \right)^\mathrm{T} \left( \mathbf{I} - \mathbf{P}_V \right)\]where \(\frac{\partial L}{\partial \mathbf{A}^+}\) is the gradient of the loss function with respect to the pseudo-inverse A⁺.
Explanation:
When A is square and invertible, \(\mathbf{P}_U = \mathbf{P}_V = \mathbf{I}\), the two projector terms vanish, and the formula reduces to the familiar gradient of the matrix inverse, \(-\mathbf{A}^{-\mathrm{T}} \frac{\partial L}{\partial \mathbf{A}^{-1}} \mathbf{A}^{-\mathrm{T}}\).
The projector terms account for the non-square and potentially singular nature of A, propagating gradient components that lie outside the column and row spaces of A.
Together, these terms ensure that during backpropagation the gradients are correctly propagated through the pseudo-inverse operation.
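As a concrete reference, here is a NumPy sketch of this backward rule; pinv_backward is an illustrative name, and it maps the upstream gradient grad = ∂L/∂A⁺ to ∂L/∂A:

import numpy as np

def pinv_backward(a: np.ndarray, a_pinv: np.ndarray, grad: np.ndarray) -> np.ndarray:
    # a: (m, n), a_pinv: (n, m), grad = dL/dA+: (n, m); returns dL/dA: (m, n)
    m, n = a.shape
    p_u = a @ a_pinv    # A A+ = U U^T, projector onto the column space of A
    p_v = a_pinv @ a    # A+ A = V V^T, projector onto the row space of A
    term1 = -a_pinv.T @ grad @ a_pinv.T
    term2 = (np.eye(m) - p_u) @ grad.T @ (a_pinv @ a_pinv.T)   # A+ A+^T = V S^-2 V^T
    term3 = (a_pinv.T @ a_pinv) @ grad.T @ (np.eye(n) - p_v)   # A+^T A+ = U S^-2 U^T
    return term1 + term2 + term3

For a square invertible input, p_u and p_v are identity matrices and only term1 survives, recovering the gradient of the ordinary matrix inverse.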
Raises¶
Attention
LinAlgError: If the SVD computation does not converge during the calculation of the pseudo-inverse or its gradient.
ValueError: If the input tensor a is not a two-dimensional tensor.
Example¶
>>> import lucid
>>> a = lucid.Tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
>>> result = lucid.linalg.pinv(a)
>>> print(result)
Tensor([[-1.33333333, -0.33333333, 0.66666667],
[ 1.08333333, 0.33333333, -0.41666667]])
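To sanity-check the result, the four defining Moore-Penrose conditions can be verified numerically; the check below uses NumPy rather than lucid, purely for illustration:

>>> import numpy as np
>>> a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
>>> p = np.linalg.pinv(a)
>>> np.allclose(a @ p @ a, a)      # A A+ A == A
True
>>> np.allclose(p @ a @ p, p)      # A+ A A+ == A+
True
>>> np.allclose((a @ p).T, a @ p)  # A A+ is symmetric
True
>>> np.allclose((p @ a).T, p @ a)  # A+ A is symmetric
True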
Note
The pseudo-inverse is particularly useful for solving least squares problems and systems of linear equations that do not have a unique solution.
The function supports backpropagation, allowing it to be used in optimization problems involving pseudo-inverses (see the sketch following this note).
Numerical stability is maintained by handling singular values close to zero appropriately during the computation.
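A minimal backpropagation sketch, assuming lucid follows the PyTorch-style requires_grad / backward() / .grad convention (an assumption; consult the Tensor documentation for the exact API):

>>> import lucid
>>> a = lucid.Tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], requires_grad=True)
>>> out = lucid.linalg.pinv(a)
>>> out.sum().backward()  # scalar loss: sum of all entries of A+
>>> print(a.grad)         # dL/dA, same shape as a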
Additional Details¶
Singular Values Near Zero:
Small singular values can cause numerical instability due to division by very small numbers.
In practice, a threshold or regularization parameter may be used to avoid dividing by zero or extremely small values.
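One common alternative to a hard threshold is Tikhonov-style damping: instead of discarding small singular values, each \(1/\sigma_i\) is replaced by \(\sigma_i / (\sigma_i^2 + \lambda)\), which stays bounded as \(\sigma_i \to 0\). A sketch, where pinv_tikhonov and lam are illustrative names rather than lucid API:

import numpy as np

def pinv_tikhonov(a: np.ndarray, lam: float = 1e-10) -> np.ndarray:
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    s_damped = s / (s**2 + lam)     # bounded even as s -> 0
    return vt.T @ np.diag(s_damped) @ u.T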
Applications:
The pseudo-inverse is widely used in machine learning, statistics, and engineering for solving ill-posed problems.
It is essential in computing solutions to linear systems that are underdetermined or overdetermined.
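For instance, for an overdetermined system (more equations than unknowns), x = A⁺b is the least-squares solution, sketched here with NumPy:

import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # 3 equations, 2 unknowns
b = np.array([1.0, 2.0, 4.0])

x = np.linalg.pinv(a) @ b                     # minimizes ||a @ x - b||_2
print(x)
print(np.linalg.lstsq(a, b, rcond=None)[0])   # same result from the lstsq solver

For an underdetermined system, A⁺b instead yields the minimum-norm solution among all exact solutions.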
Performance Considerations:
Computing the SVD can be computationally intensive for large matrices.
For performance-critical applications, consider using approximations or specialized algorithms optimized for large-scale computations.
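One such option is a truncated randomized SVD. The sketch below builds a rank-k approximate pseudo-inverse using scikit-learn's randomized_svd; approx_pinv is an illustrative name, and the scikit-learn dependency is an assumption, not a lucid requirement:

import numpy as np
from sklearn.utils.extmath import randomized_svd

def approx_pinv(a: np.ndarray, rank: int) -> np.ndarray:
    # Only the top `rank` singular triplets are computed, which is far
    # cheaper than a full SVD when a is large and approximately low-rank.
    u, s, vt = randomized_svd(a, n_components=rank, random_state=0)
    return vt.T @ np.diag(1.0 / s) @ u.T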