Pseudoinverse
Encyclopedia
In mathematics
Mathematics
Mathematics is the study of quantity, space, structure, and change. Mathematicians seek out patterns and formulate new conjectures. Mathematicians resolve the truth or falsity of conjectures by mathematical proofs, which are arguments sufficient to convince other mathematicians of their validity...

, and in particular linear algebra
Linear algebra
Linear algebra is a branch of mathematics that studies vector spaces, also called linear spaces, along with linear functions that input one vector and output another. Such functions are called linear maps and can be represented by matrices if a basis is given. Thus matrix theory is often...

, a pseudoinverse of a matrix
Matrix (mathematics)
In mathematics, a matrix is a rectangular array of numbers, symbols, or expressions. The individual items in a matrix are called its elements or entries. An example of a matrix with six elements isMatrices of the same size can be added or subtracted element by element...

  is a generalization
Generalized inverse
In mathematics, a generalized inverse or pseudoinverse of a matrix A is a matrix that has some properties of the inverse matrix of A but not necessarily all of them...

 of the inverse matrix. The most widely known type of matrix pseudoinverse is the Moore–Penrose pseudoinverse, which was independently described by E. H. Moore
E. H. Moore
Eliakim Hastings Moore was an American mathematician.-Life:Moore, the son of a Methodist minister and grandson of US Congressman Eliakim H. Moore, discovered mathematics through a summer job at the Cincinnati Observatory while in high school. He learned mathematics at Yale University, where he was...

 in 1920, Arne Bjerhammar
Arne Bjerhammar
Arne Bjerhammar was a Swedish geodesist. He was professor at Royal Institute of Technology in Stockholm. He was born in Båstad, Scania in the south of Sweden.His research covered many fields of geodesy...

  in 1951 and Roger Penrose
Roger Penrose
Sir Roger Penrose OM FRS is an English mathematical physicist and Emeritus Rouse Ball Professor of Mathematics at the Mathematical Institute, University of Oxford and Emeritus Fellow of Wadham College...

 in 1955. Earlier, Fredholm
Erik Ivar Fredholm
Erik Ivar Fredholm was a Swedish mathematician who established the modern theory of integral equations. His 1903 paper in Acta Mathematica is considered to be one of the major landmarks in the establishment of operator theory.The lunar crater Fredholm is named after him.-List of publications:* E.I...

 had introduced the concept of a pseudoinverse of integral operators in 1903. When referred to a matrix, the term pseudoinverse, without further specification, is often used to indicate the Moore–Penrose pseudoinverse. The term generalized inverse
Generalized inverse
In mathematics, a generalized inverse or pseudoinverse of a matrix A is a matrix that has some properties of the inverse matrix of A but not necessarily all of them...

 is sometimes used as a synonym for pseudoinverse.

A common use of the Moore–Penrose pseudoinverse (hereafter, just pseudoinverse) is to compute a 'best fit' (least squares
Linear least squares
In statistics and mathematics, linear least squares is an approach to fitting a mathematical or statistical model to data in cases where the idealized value provided by the model for any data point is expressed linearly in terms of the unknown parameters of the model...

) solution to a system of linear equations that lacks a unique solution (see below under Applications).
Another use is to find the minimum (Euclidean
Euclidean
Euclidean relates to Euclid , a town or others. It may refer to:Geometry...

) norm solution to a system of linear equations with multiple solutions.

The pseudoinverse is defined and unique for all matrices whose entries are real
Real number
In mathematics, a real number is a value that represents a quantity along a continuum, such as -5 , 4/3 , 8.6 , √2 and π...

 or complex
Complex number
A complex number is a number consisting of a real part and an imaginary part. Complex numbers extend the idea of the one-dimensional number line to the two-dimensional complex plane by using the number line for the real part and adding a vertical axis to plot the imaginary part...

 numbers. It can be computed using the singular value decomposition
Singular value decomposition
In linear algebra, the singular value decomposition is a factorization of a real or complex matrix, with many useful applications in signal processing and statistics....

.

Notation

In the following discussion, the following conventions are adopted.
  • will denote one of the fields of real or complex numbers, denoted , respectively. The ring of matrices over is denoted by .
  • For , and denote the transpose and Hermitian transpose (also called conjugate transpose
    Conjugate transpose
    In mathematics, the conjugate transpose, Hermitian transpose, Hermitian conjugate, or adjoint matrix of an m-by-n matrix A with complex entries is the n-by-m matrix A* obtained from A by taking the transpose and then taking the complex conjugate of each entry...

    ) respectively. If , then .
  • For , then denotes the range
    Range (mathematics)
    In mathematics, the range of a function refers to either the codomain or the image of the function, depending upon usage. This ambiguity is illustrated by the function f that maps real numbers to real numbers with f = x^2. Some books say that range of this function is its codomain, the set of all...

     (image) of (the space spanned by the column vectors of ) and denotes the kernel (null space) of .
  • Finally, for any positive integer , denotes the identity matrix
    Identity matrix
    In linear algebra, the identity matrix or unit matrix of size n is the n×n square matrix with ones on the main diagonal and zeros elsewhere. It is denoted by In, or simply by I if the size is immaterial or can be trivially determined by the context...

    .

Definition

For ,
a Moore–Penrose pseudoinverse (hereafter, just pseudoinverse) of is defined as a matrix

satisfying all of the following four criteria:
  1.       ( need not be the general identity matrix, but it maps all column vectors of to themselves);
  2.       ( is a weak inverse for the multiplicative semigroup
    Semigroup
    In mathematics, a semigroup is an algebraic structure consisting of a set together with an associative binary operation. A semigroup generalizes a monoid in that there might not exist an identity element...

    );
  3.       ( is Hermitian); and
  4.       ( is also Hermitian).

Properties

Proofs for some of these facts may be found on a separate page here.

Existence and uniqueness

  • The Moore–Penrose pseudoinverse exists and is unique: for any matrix , there is precisely one matrix , that satisfies the four properties of the definition.


A matrix satisfying the first two conditions of the definition is known as a generalized inverse
Generalized inverse
In mathematics, a generalized inverse or pseudoinverse of a matrix A is a matrix that has some properties of the inverse matrix of A but not necessarily all of them...

. Generalized inverses always exist but are not in general unique. Uniqueness is a consequence of the last two conditions.

Basic properties

  • If has real entries, then so does .
  • If is invertible, the pseudoinverse and the inverse coincide: .
  • The pseudoinverse of a zero matrix is its transpose.
  • The pseudoinverse of the pseudoinverse is the original matrix: .
  • Pseudoinversion commutes with transposition, conjugation, and taking the conjugate transpose:
  • The pseudoinverse of a scalar multiple of is the reciprocal multiple of :
for .

Identities

The following identities can be used to cancel certain subexpressions or expand expressions involving pseudoinverses. Proofs for these properties can be found in the proofs subpage.

Reduction to Hermitian case

  • .
  • .

Products

If and either,
  • has orthonormal columns (i.e. ) or,
  • has orthonormal rows (i.e. ) or,
  • has all columns linearly independent (full column rank) and has all rows linearly independent (full row rank),

then .

Projectors

If and are
orthogonal projection operators
Projection (linear algebra)
In linear algebra and functional analysis, a projection is a linear transformation P from a vector space to itself such that P2 = P. It leaves its image unchanged....

 i.e. , , and i.e. they are Hermitian and idempotent.

Then the following hold:
  • and
  • is the orthogonal projector onto the range
    Range (mathematics)
    In mathematics, the range of a function refers to either the codomain or the image of the function, depending upon usage. This ambiguity is illustrated by the function f that maps real numbers to real numbers with f = x^2. Some books say that range of this function is its codomain, the set of all...

     of (which equals the orthogonal complement of the kernel of ).
  • is the orthogonal projector onto the range
    Range (mathematics)
    In mathematics, the range of a function refers to either the codomain or the image of the function, depending upon usage. This ambiguity is illustrated by the function f that maps real numbers to real numbers with f = x^2. Some books say that range of this function is its codomain, the set of all...

     of (which equals the orthogonal complement of the kernel of ).
  • is the orthogonal projector onto the kernel of .
  • is the orthogonal projector onto the kernel of .

Subspaces


Limit relations

  • The pseudoinverse can also be computed via limiting processes:. These limits exist even if or do not exist.

    Continuity

    • In contrast to ordinary matrix inversion, the process of taking pseudoinverses is not continuous
      Continuous function
      In mathematics, a continuous function is a function for which, intuitively, "small" changes in the input result in "small" changes in the output. Otherwise, a function is said to be "discontinuous". A continuous function with a continuous inverse function is called "bicontinuous".Continuity of...

      : if the sequence converges to the matrix (in the maximum norm or Frobenius norm, say), then need not converge to . However, if all the matrices have the same rank, will converge to .

    Scalars

    It is also possible to define a pseudoinverse for scalars and vectors. This amounts to treating these as matrices. The pseudoinverse of a scalar is zero if is zero and the reciprocal of otherwise:

    Vectors

    The pseudoinverse of the null (all zero) vector is the transposed null vector. The pseudoinverse of a non-null vector is the conjugate transposed vector divided by its squared magnitude:

    Linearly independent columns

    If the columns of are linearly independent
    Linear independence
    In linear algebra, a family of vectors is linearly independent if none of them can be written as a linear combination of finitely many other vectors in the collection. A family of vectors which is not linearly independent is called linearly dependent...


    (so that ), then
    is invertible. In this case, an explicit formula is:.
    It follows that is then a left inverse of
    :   .

    Linearly independent rows

    If the rows of are linearly independent (so that ), then
    is invertible. In this case, an explicit formula is:.
    It follows that is a right inverse of
    :   .

    Orthonormal columns or rows

    This is a special case of either full column rank or full row rank (treated above).
    If has orthonormal columns ()
    or orthonormal rows (),
    then .

    Circulant matrices

    For a Circulant matrix
    Circulant matrix
    In linear algebra, a circulant matrix is a special kind of Toeplitz matrix where each row vector is rotated one element to the right relative to the preceding row vector. In numerical analysis, circulant matrices are important because they are diagonalized by a discrete Fourier transform, and hence...

     , the singular value decomposition is given by the Fourier transform
    Fourier transform
    In mathematics, Fourier analysis is a subject area which grew from the study of Fourier series. The subject began with the study of the way general functions may be represented by sums of simpler trigonometric functions...

    ,
    that is the singular values are the Fourier coefficients.
    Let be the Discrete Fourier Transform (DFT) matrix
    DFT matrix
    A DFT matrix is an expression of a discrete Fourier transform as a matrix multiplication.-Definition:An N-point DFT is expressed as an N-by-N matrix multiplication as X = W x, where x is the original input signal, and X is the DFT of the signal.The transformation W of size N\times N can be defined...

    , then

    Rank decomposition

    Let denote the rank of
    . Then can be (rank) decomposed
    Rank factorization
    Given an m \times n matrix A of rank r, a rank decomposition or rank factorization of A is a product A=CF, where C is an m \times r matrix and F is an r \times n matrix....

     as
    where
    and
    are of rank .
    Then .

    The QR method

    For or
    computing the product or and their inverses explicitly is often a source of numerical rounding errors and computational cost in practice.
    An alternative approach using the QR decomposition
    QR decomposition
    In linear algebra, a QR decomposition of a matrix is a decomposition of a matrix A into a product A=QR of an orthogonal matrix Q and an upper triangular matrix R...

     of may be used instead.

    Considering the case when is of full column rank, so that
    . then the Cholesky decomposition
    Cholesky decomposition
    In linear algebra, the Cholesky decomposition or Cholesky triangle is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose. It was discovered by André-Louis Cholesky for real matrices...


    ,
    where is an upper triangular matrix, may be used.
    Multiplication by the inverse is then done easily by solving a system with multiple right-hand-sides,
    which may be solved by forward substitution followed by back substitution.

    The Cholesky decomposition may be computed without forming explicitly, by alternatively using the QR decomposition
    QR decomposition
    In linear algebra, a QR decomposition of a matrix is a decomposition of a matrix A into a product A=QR of an orthogonal matrix Q and an upper triangular matrix R...

     of ,
    where has orthonormal columns, , and
    is upper triangular. Then,
    so is the Cholesky factor of .

    The case of full row rank is treated similarly by using the formula
    and using a similar argument, swapping the roles of and
    .

    Singular value decomposition (SVD)

    A computationally simple and accurate way to compute the pseudoinverse is by using the singular value decomposition
    Singular value decomposition
    In linear algebra, the singular value decomposition is a factorization of a real or complex matrix, with many useful applications in signal processing and statistics....

    . If is the singular value decomposition of , then . For a diagonal matrix
    Diagonal matrix
    In linear algebra, a diagonal matrix is a matrix in which the entries outside the main diagonal are all zero. The diagonal entries themselves may or may not be zero...

     such as , we get the pseudoinverse by taking the reciprocal of each non-zero element on the diagonal, leaving the zeros in place, and transposing the resulting matrix. In numerical computation, only elements larger than some small tolerance are taken to be nonzero, and the others are replaced by zeros. For example, in the MATLAB
    MATLAB
    MATLAB is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages,...

     or NumPy function pinv, the tolerance is taken to be , where ε is the machine epsilon
    Machine epsilon
    Machine epsilon gives an upper bound on the relative error due to rounding in floating point arithmetic. This value characterizes computer arithmetic in the field of numerical analysis, and by extension in the subject of computational science...

    .

    The computational cost of this method is dominated by the cost of computing the SVD, which is several times higher than matrix-matrix multiplication, even if a state-of-the art implementation (such as that of LAPACK
    LAPACK
    -External links:* : a modern replacement for PLAPACK and ScaLAPACK* on Netlib.org* * * : a modern replacement for LAPACK that is MultiGPU ready* on Sourceforge.net* * optimized LAPACK for Solaris OS on SPARC/x86/x64 and Linux* * *...

    ) is used.

    The above procedure shows why taking the pseudoinverse is not a continuous operation: if the original matrix has a singular value 0 (a diagonal entry of the matrix above), then modifying slightly may turn this zero into a tiny positive number, thereby affecting the pseudoinverse dramatically as we now have to take the reciprocal of a tiny number.

    Block matrices

    Optimized approaches
    Block matrix pseudoinverse
    In mathematics, block matrix pseudoinverse is a formula of pseudoinverse of a partitioned matrix. This is useful for decomposing or approximating many algorithms updating parameters in signal processing, which are based on least squares method.- Derivation :...

     exist for calculating the pseudoinverse of block structured matrices.

    The iterative method of Ben-Israel and Cohen

    Another method for computing the pseudoinverse uses the recursion
    which is sometimes referred to as hyper-power sequence. This recursion produces a sequence converging quadratically to the pseudoinverse of if it is started with an appropriate satisfying . The choice (where , with denoting the largest singular value of ) has been argued not to be competitive to the method using the SVD mentioned above, because even for moderately ill-conditioned matrices it takes a long time before enters the region of quadratic convergence. However, if started with already close to the Moore–Penrose pseudoinverse and , for example , convergence is fast (quadratic).

    Updating the pseudoinverse

    For the cases where has full row or column rank, and the inverse of the correlation matrix ( for with full row rank or for full column rank) is already known, the pseudoinverse for matrices related to can be computed by applying the Sherman–Morrison–Woodbury formula to update the inverse of the correlation matrix, which may need less work. In particular, if the related matrix differs from the original one by only a changed, added or deleted row or column, incremental algorithms exist that exploit the relationship.

    Similarly, it is possible to update the Cholesky factor when a row or column is added, without creating the inverse of the correlation matrix explicitly. However, updating the pseudoinverse in the general rank-deficient case is much more complicated.

    Software libraries

    The package NumPy provides a pseudo-inverse calculation through its functions matrix.I and linalg.pinv. High quality implementations of SVD, QR, and back substitution are available in standard libraries, such as LAPACK
    LAPACK
    -External links:* : a modern replacement for PLAPACK and ScaLAPACK* on Netlib.org* * * : a modern replacement for LAPACK that is MultiGPU ready* on Sourceforge.net* * optimized LAPACK for Solaris OS on SPARC/x86/x64 and Linux* * *...

    . Writing one's own implementation of SVD is a major programming project that requires a significant numerical expertise. In special circumstances, such as parallel computing
    Parallel computing
    Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently . There are several different forms of parallel computing: bit-level,...

     or embedded computing, however, alternative implementations by QR or even the use of an explicit inverse might be preferable, and custom implementations may be unavoidable.

    Linear least-squares

    The pseudoinverse provides a least squares
    Linear least squares
    In statistics and mathematics, linear least squares is an approach to fitting a mathematical or statistical model to data in cases where the idealized value provided by the model for any data point is expressed linearly in terms of the unknown parameters of the model...

     solution to a system of linear equations.
    For , given a system of linear equations,

    in general, a vector which solves the system may not exist, or if one exists, it may not be unique. The pseudoinverse solves the "least-squares" problem as follows:
    • , we have where and denotes the Euclidean norm. This weak inequality holds with equality if and only if for any vector w; this provides an infinitude of minimizing solutions unless A has full column rank, in which case is a zero matrix.


    This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by
    the Frobenius norm. Let .
    • , we have where and denotes the Frobenius norm.

    Obtaining all solutions of a linear system

    If the linear system


    has any solutions, they are all given by


    for arbitrary vector w. Solution(s) exist if and only if . If the latter holds, then the solution is unique if and only if A has full column rank, in which case is a zero matrix.

    Minimum-norm solution to a linear system

    For linear systems with non-unique solutions (such as under-determined systems), the pseudoinverse may be used to construct the solution of minimum Euclidean norm
    among all solutions.
    • If is satisfiable, the vector is a solution, and satisfies for all solutions.


    This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by
    the Frobenius norm. Let .
    • If is satisfiable, the matrix is a solution, and satisfies for all solutions.

    Geometric construction

    This description suggests the following geometric construction for the result of applying the pseudoinverse of an × matrix to a vector. To find for given in , first project orthogonally onto the range of , finding a point in the range. Then form , i.e. find those vectors in that sends to . This will be an affine subspace of parallel to the kernel of . The element of this subspace that has the smallest length (i.e. is closest to the origin) is the answer we are looking for. It can be found by taking an arbitrary member of and projecting it orthogonally onto the orthogonal complement of the kernel of .

    Condition number

    Using the pseudoinverse and a matrix norm, one can define a condition number
    Condition number
    In the field of numerical analysis, the condition number of a function with respect to an argument measures the asymptotically worst case of how much the function can change in proportion to small changes in the argument...

     for any matrix:
    A large condition number implies that the problem of finding least-squares solutions to the corresponding system of linear equations is ill-conditioned in the sense that small errors in the entries of can lead to huge errors in the entries of the solution.

    Generalizations

    In order to solve more general least-squares problems, one could try to define Moore–Penrose pseudoinverses for all continuous linear operators between two Hilbert space
    Hilbert space
    The mathematical concept of a Hilbert space, named after David Hilbert, generalizes the notion of Euclidean space. It extends the methods of vector algebra and calculus from the two-dimensional Euclidean plane and three-dimensional space to spaces with any finite or infinite number of dimensions...

    s and , using the same four conditions as in our definition above. It turns out that not every continuous linear operator has a continuous linear pseudo-inverse in this sense. Those that do are precisely the ones whose range is closed
    Closed set
    In geometry, topology, and related branches of mathematics, a closed set is a set whose complement is an open set. In a topological space, a closed set can be defined as a set which contains all its limit points...

     in .

    In abstract algebra
    Abstract algebra
    Abstract algebra is the subject area of mathematics that studies algebraic structures, such as groups, rings, fields, modules, vector spaces, and algebras...

    , a Moore–Penrose pseudoinverse may be defined on a *-regular semigroup. This abstract definition coincides with the one in linear algebra.

    See also

    • Proofs involving the Moore–Penrose pseudoinverse
    • Drazin inverse
      Drazin inverse
      In mathematics, the Drazin inverse, named after Michael P. Drazin, is a kind of generalized inverse of a matrix.Let A be a square matrix. The index of A is the least nonnegative integer k such that rank = rank...

    • Hat matrix
      Hat matrix
      In statistics, the hat matrix, H, maps the vector of observed values to the vector of fitted values. It describes the influence each observed value has on each fitted value...

    • Inverse element
      Inverse element
      In abstract algebra, the idea of an inverse element generalises the concept of a negation, in relation to addition, and a reciprocal, in relation to multiplication. The intuition is of an element that can 'undo' the effect of combination with another given element...

    • Linear least squares
      Linear least squares
      In statistics and mathematics, linear least squares is an approach to fitting a mathematical or statistical model to data in cases where the idealized value provided by the model for any data point is expressed linearly in terms of the unknown parameters of the model...

    • Pseudo-determinant
      Pseudo-determinant
      In linear algebra and statistics, the pseudo-determinant is the product of all non-zero eigenvalues of a square matrix. It coincides with the regular determinant when the matrix is non-singular.- Definition :...

    • Von Neumann regular ring
      Von Neumann regular ring
      In mathematics, a von Neumann regular ring is a ring R such that for every a in R there exists an x in R withOne may think of x as a "weak inverse" of a...


    External links

    The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK