
Section 4.6 Matrix Factorizations and Eigenvalues

This section is a rather rapid tour of some cool ideas that get a lot of use in applied linear algebra. We are rather light on details here. The interested reader can consult sections 8.3–8.6 in the Nicholson textbook.

Subsection 4.6.1 Matrix Factorizations

Subsubsection 4.6.1.1 Positive Operators

Let $T$ be a linear operator defined by a matrix $A$. If $A$ is symmetric (for the case of $\mathbb{R}^n$) or hermitian (for the case of $\mathbb{C}^n$), we say that the operator $T$ is self-adjoint.
Definition 4.6.1.
A self-adjoint operator $T$ is positive if $x^H T x \geq 0$ for all vectors $x$. It is positive-definite if $x^H T x > 0$ for all nonzero $x$. If $T = T_A$ for some matrix $A$, we also refer to $A$ as a positive(-definite) matrix.
The definition of a positive matrix is equivalent to requiring that all its eigenvalues are non-negative. Every positive matrix $A$ has a unique positive square root: a matrix $R$ such that $R^2 = A$. Since $A$ is symmetric/hermitian, it can be diagonalized. Writing $A = PDP^{-1}$ where $P$ is orthogonal/unitary and
\[ D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}, \]
we have $R = PEP^{-1}$, where
\[ E = \begin{bmatrix} \sqrt{\lambda_1} & 0 & \cdots & 0 \\ 0 & \sqrt{\lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\lambda_n} \end{bmatrix}. \]
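For instance, here is a sketch of this construction in SymPy. The matrix $A$ below is just one possible choice of a positive-definite matrix (its eigenvalues are $9$ and $1$).

          from sympy import Matrix, sqrt
          A = Matrix([[5, 4], [4, 5]])          # one possible choice: eigenvalues 9 and 1
          P, D = A.diagonalize(normalize=True)  # P is orthogonal, since A is symmetric
          E = D.applyfunc(sqrt)                 # E has sqrt(lambda_i) on the diagonal
          R = P * E * P.T                       # the positive square root of A
          R, R**2 == A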
The following theorem gives us a simple way of generating positive matrices.
Theorem 4.6.2.
For any matrix $U$, the matrix $A = U^T U$ is positive.
Proof.
For any $x \neq 0$ in $\mathbb{R}^n$,
\[ x^T A x = x^T U^T U x = (Ux)^T (Ux) = \lVert Ux \rVert^2 \geq 0. \]
What is interesting is that the converse to the above statement is also true. The Cholesky factorization of a positive-definite matrix $A$ is given by $A = U^T U$, where $U$ is upper-triangular, with positive diagonal entries.
Even better, there is a very simple algorithm for obtaining the factorization: carry the matrix $A$ to triangular form, using only row operations of the type $R_i + kR_j \to R_i$. Then divide each row by the square root of its diagonal entry to obtain the matrix $U$.
The SymPy library contains the cholesky() algorithm. Note however that it produces a lower triangular matrix, rather than upper triangular. (That is, the output gives $L = U^T$ rather than $U$, so you will have $A = LL^T$.) Let’s give it a try. First, let’s create a positive-definite matrix.
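One way to do this is to take any invertible matrix $B$ and form $A = B^TB$, which is positive-definite by the theorem above. The particular $B$ here is just one possible choice.

          from sympy import Matrix
          B = Matrix([[1, 2, 0], [0, 1, 3], [2, 0, 1]])  # invertible, so A = B^T B is positive-definite
          A = B.T * B
          A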
Next, find the Cholesky factorization:
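Continuing with the matrix $A$ defined above, a sketch of the computation:

          from sympy import simplify
          L = A.cholesky()              # lower triangular, with A = L L^T
          L, simplify(L * L.T - A)      # the second output should be the zero matrix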
Note that $L$ is not the same as the matrix $B$!

Subsubsection 4.6.1.2 Singular Value Decomposition

For any $n \times n$ matrix $A$, the matrices $A^TA$ and $AA^T$ are both positive. (Exercise!) This means that we can define $\sqrt{A^TA}$, even if $A$ itself is not symmetric or positive.
  • Since $A^TA$ is symmetric, we know that it can be diagonalized.
  • Since $A^TA$ is positive, we know its eigenvalues are non-negative.
  • This means we can define the singular values $\sigma_i = \sqrt{\lambda_i}$ for each $i = 1, \ldots, n$.
  • Note: it’s possible to do this even if $A$ is not a square matrix!
The SymPy library has a function for computing the singular values of a matrix. Given a matrix A, the command A.singular_values() will return its singular values. Try this for a few different matrices below:
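For instance (the matrix here is just one possible choice; try your own entries):

          from sympy import Matrix
          A = Matrix([[3, 4], [0, 5]])
          A.singular_values()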
In fact, SymPy can even return singular values for a matrix with variable entries! Try the following example from the SymPy documentation.
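A sketch along the lines of that example, using a symbolic entry $x$ (declared positive so that the output simplifies):

          from sympy import Matrix, Symbol
          x = Symbol('x', positive=True)
          A = Matrix([[0, 1, 0], [0, x, 0], [-1, 0, 0]])
          A.singular_values()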
For an $n \times n$ matrix $A$, we might not be able to diagonalize $A$ (with a single orthonormal basis). However, it turns out that it’s always possible to find a pair of orthonormal bases $\{e_1, \ldots, e_n\}, \{f_1, \ldots, f_n\}$ such that
\[ Ax = \sigma_1 (x \cdot f_1)e_1 + \cdots + \sigma_n (x \cdot f_n)e_n. \]
In matrix form, $A = P\Sigma_A Q^T$ for orthogonal matrices $P, Q$.
In fact, this can be done even if $A$ is not square, which is arguably the more interesting case! Let $A$ be an $m \times n$ matrix. We will find an $m \times m$ orthogonal matrix $P$ and an $n \times n$ orthogonal matrix $Q$, such that $A = P\Sigma_A Q^T$, where $\Sigma_A$ is also $m \times n$.
The basis $\{f_1, \ldots, f_n\}$ is an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of $A^TA$, and the matrix $Q$ is the matrix whose columns are the vectors $f_i$. As a result, $Q$ is orthogonal.
The matrix $\Sigma_A$ is the same size as $A$. First, we list the positive singular values of $A$ in decreasing order:
\[ \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0. \]
Then, we let $D_A = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)$, and set
\[ \Sigma_A = \begin{bmatrix} D_A & 0 \\ 0 & 0 \end{bmatrix}. \]
That is, we put $D_A$ in the upper-left, and then fill in zeros as needed, until $\Sigma_A$ is the same size as $A$.
Next, we compute the vectors $e_i = \frac{1}{\lVert Af_i \rVert}Af_i$, for $i = 1, \ldots, r$. As shown in Nicholson, $\{e_1, \ldots, e_r\}$ will be an orthonormal basis for the column space of $A$. The matrix $P$ is constructed by extending this to an orthonormal basis of $\mathbb{R}^m$.
All of this is a lot of work to do by hand, but it turns out that it can be done numerically, and more importantly, efficiently, by a computer. The SymPy library has an SVD algorithm, but it will not be efficient for larger matrices. In practice, most Python users will use the SVD algorithm provided by NumPy; we will stick with SymPy for simplicity and consistency.
Remark 4.6.3.
The version of the SVD given above is not used in computations, since it tends to be more resource intensive. In particular, it requires us to store more information than necessary: the last $n - r$ rows of $Q^T$, and the last $m - r$ columns of $P$, get multiplied by rows/columns of zeros in $\Sigma_A$, so we don’t really need to keep track of them.
Instead, most algorithms that you find will give the $r \times r$ diagonal matrix $D_A$, consisting of the nonzero singular values; $P$ gets replaced by the $m \times r$ matrix consisting of its first $r$ columns, and $Q^T$ gets replaced by the $r \times n$ matrix consisting of its first $r$ rows. The resulting product is still equal to the original matrix.
In some cases, even the matrix $D_A$ is too large, and a decision is made to truncate to some smaller subset of singular values. In this case, the resulting product is no longer equal to the original matrix, but it does provide an approximation. A discussion can be found on Wikipedia.
Example 4.6.4.
Find the singular value decomposition of the matrix $A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & -1 \end{bmatrix}$.
Solution.
Using SymPy, we get the condensed SVD. First, let’s check the singular values.
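A sketch of the computation (the exact output order may depend on your SymPy version):

          from sympy import Matrix
          A = Matrix([[1, 1, 1], [1, 0, -1]])
          A.singular_values()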
Note that the values are not listed in decreasing order. Now, let’s ask for the singular value decomposition. The output consists of three matrices; the first line below assigns those matrices to the names P,S,Q.
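A sketch, assuming your SymPy version provides Matrix.singular_value_decomposition():

          P, S, Q = A.singular_value_decomposition()
          P, S, Q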
Note that the output is the “condensed” version, which doesn’t match the exposition above. It also doesn’t follow the same ordering convention: we’ll need to swap columns in each of the matrices. But it does give us a decomposition of the matrix A:
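Continuing from the cell above, we can confirm that the product recovers $A$:

          from sympy import simplify
          simplify(P * S * Q.T) == A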
To match our earlier presentation, we first set $\Sigma_A = \begin{bmatrix} \sqrt{3} & 0 & 0 \\ 0 & \sqrt{2} & 0 \end{bmatrix}$. Next, we need to extend the $3 \times 2$ matrix in the output above to a $3 \times 3$ matrix. We can do this by choosing any vector orthogonal to the two existing columns, and normalizing. Let’s use entries $1/\sqrt{6}, -2/\sqrt{6}, 1/\sqrt{6}$. Noting that we also need to swap the first two columns (to match the fact that we swapped columns in $\Sigma_A$), we get the matrix
\[ Q = \begin{bmatrix} \frac{\sqrt{3}}{3} & \frac{\sqrt{2}}{2} & \frac{\sqrt{6}}{6} \\ \frac{\sqrt{3}}{3} & 0 & -\frac{\sqrt{6}}{3} \\ \frac{\sqrt{3}}{3} & -\frac{\sqrt{2}}{2} & \frac{\sqrt{6}}{6} \end{bmatrix}. \]
Let’s check that it is indeed orthogonal.
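Entering $Q$ by hand and computing $Q^TQ$:

          from sympy import Matrix, sqrt, eye, simplify
          Q = Matrix([[sqrt(3)/3,  sqrt(2)/2,  sqrt(6)/6],
                      [sqrt(3)/3,  0,         -sqrt(6)/3],
                      [sqrt(3)/3, -sqrt(2)/2,  sqrt(6)/6]])
          simplify(Q.T * Q) == eye(3)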
Finally, we take $P = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ (again swapping columns), which is just the identity matrix. We therefore should expect that
\[ P\Sigma_A Q^T = \Sigma_A Q^T = A. \]
Let’s check.
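Continuing with $Q$ from above:

          from sympy import Matrix, sqrt, simplify
          A = Matrix([[1, 1, 1], [1, 0, -1]])
          Sigma_A = Matrix([[sqrt(3), 0, 0], [0, sqrt(2), 0]])
          simplify(Sigma_A * Q.T) == A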
It worked!
The Singular Value Decomposition has a lot of useful applications, some of which are described in Nicholson’s book. On a very fundamental level, the SVD provides us with information on some of the most essential properties of the matrix $A$, and of any system of equations with $A$ as its coefficient matrix.
Recall the following definitions for an $m \times n$ matrix $A$:
  1. The rank of $A$ is the number of leading ones in the RREF of $A$, which is also equal to the dimension of the column space of $A$ (or if you prefer, the dimension of $\operatorname{im}(T_A)$).
  2. The column space of $A$, denoted $\operatorname{col}(A)$, is the subspace of $\mathbb{R}^m$ spanned by the columns of $A$. (This is the image of the matrix transformation $T_A$; it is also the space of all vectors $b$ for which the system $Ax = b$ is consistent.)
  3. The row space of $A$, denoted $\operatorname{row}(A)$, is the span of the rows of $A$, viewed as column vectors in $\mathbb{R}^n$.
  4. The null space of $A$ is the space of solutions to the homogeneous system $Ax = 0$. This is, of course, equal to the kernel of the associated transformation $T_A$.
There are some interesting relationships among these spaces, which are left as an exercise.
Exercise 4.6.5.
Let A be an m×n matrix. Prove the following statements.
(a)
$(\operatorname{row}(A))^\perp = \operatorname{null}(A)$
Hint.
Note that $v \in \operatorname{null}(A)$ if and only if $Av = 0$, and $v \in (\operatorname{row}(A))^\perp$ if and only if $v \cdot r_i = 0$ for each row $r_i$ of $A$.
Note also that $(Av)^T = v^TA^T$ is the (dot) product of $v^T$ with each column of $A^T$, and each column of $A^T$ is a row of $A$.
(b)
$(\operatorname{col}(A))^\perp = \operatorname{null}(A^T)$
Hint.
Notice that $v \in \operatorname{null}(A^T)$ if and only if $A^Tv = 0$, and that $(A^Tv)^T = v^TA$. Your reasoning should be similar to that of the previous part.
Here’s the cool thing about the SVD. Let $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$ be the positive singular values of $A$. Let $q_1, \ldots, q_r, \ldots, q_n$ be the orthonormal basis of eigenvectors for $A^TA$, and let $p_1, \ldots, p_r, \ldots, p_m$ be the orthonormal basis of $\mathbb{R}^m$ constructed in the SVD algorithm. Then:
  1. $\operatorname{rank}(A) = r$
  2. $q_1, \ldots, q_r$ form a basis for $\operatorname{row}(A)$.
  3. $p_1, \ldots, p_r$ form a basis for $\operatorname{col}(A)$ (and thus, the “row rank” and “column rank” of $A$ are the same).
  4. $q_{r+1}, \ldots, q_n$ form a basis for $\operatorname{null}(A)$. (And these are therefore the basis solutions of $Ax = 0$!)
  5. $p_{r+1}, \ldots, p_m$ form a basis for $\operatorname{null}(A^T)$.
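As a quick sanity check of the first point, using the matrix from Example 4.6.4: the rank should equal the number of nonzero singular values.

          from sympy import Matrix
          A = Matrix([[1, 1, 1], [1, 0, -1]])
          A.rank(), A.singular_values()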
If you want to explore this further, have a look at the excellent notebook by Dr. Juan H. Klopper. The ipynb file can be found on his GitHub page. In it, he takes you through various approaches to finding the singular value decomposition, using the method above, as well as using NumPy and SciPy (which, for industrial applications, are superior to SymPy).

Subsubsection 4.6.1.3 QR Factorization

Suppose $A$ is an $m \times n$ matrix with independent columns. (Question: for this to happen, which is true: $m \geq n$, or $n \geq m$?)
A QR-factorization of $A$ is a factorization of the form $A = QR$, where $Q$ is $m \times n$, with orthonormal columns, and $R$ is an invertible upper-triangular ($n \times n$) matrix with positive diagonal entries. If $A$ is a square matrix, $Q$ will be orthogonal.
A lot of the methods we’re looking at here involve more sophisticated numerical techniques than SymPy is designed to handle. If we wanted to spend time on these topics, we’d have to learn a bit about the NumPy package, which has built-in tools for finding things like polar decomposition and singular value decomposition. However, SymPy does know how to do QR factorization. After defining a matrix A, we can use the command
          Q, R = A.QRdecomposition()
        
Let’s check that the matrix Q really is orthogonal:
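For instance (the matrix below is just one possible choice with independent columns):

          from sympy import Matrix, eye, simplify
          A = Matrix([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
          Q, R = A.QRdecomposition()
          Q, R, simplify(Q.T * Q) == eye(3)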
Details of how to perform the QR factorization can be found in Nicholson’s textbook. It’s essentially a consequence of performing the Gram-Schmidt algorithm on the columns of A, and keeping track of our work.
The calculation above is a symbolic computation, which is nice for understanding what’s going on. The reason why the QR factorization is useful in practice is that there are efficient numerical methods for doing it (with good control over rounding errors). Our next topic looks at a useful application of the QR factorization.

Subsection 4.6.2 Computing Eigenvalues

Our first method focuses on the dominant eigenvalue of a matrix. An eigenvalue is dominant if it is larger in absolute value than all other eigenvalues. For example, if $A$ has eigenvalues $1, 3, 2, 5$, then $5$ is the dominant eigenvalue.
If $A$ has eigenvalues $1, 3, 0, -4, 4$, then there is no dominant eigenvalue, since $|-4| = |4|$. Any eigenvector corresponding to a dominant eigenvalue is called a dominant eigenvector.

Subsubsection 4.6.2.1 The Power Method

If a matrix A has a dominant eigenvalue, there is a method for finding it (approximately) that does not involve finding and factoring the characteristic polynomial of A.
We start with some initial guess $x_0$ for a dominant eigenvector. We then set $x_{k+1} = Ax_k$ for each $k \geq 0$, giving a sequence
\[ x_0, Ax_0, A^2x_0, A^3x_0, \ldots. \]
We expect (for reasons we’ll explain) that $x_k \to x$ as $k \to \infty$, where $x$ is a dominant eigenvector. Let’s try an example.
The matrix we’ll use has dominant eigenvalue $\lambda = 7$. Let’s try an initial guess of $x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and see what happens.
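The matrix $A$ below is just one possible choice: its eigenvalues are $7$ and $-1$, so $7$ is dominant, with dominant eigenvector $(2, 3)$.

          from sympy import Matrix
          A = Matrix([[1, 4], [3, 5]])   # eigenvalues 7 and -1; dominant eigenvector (2, 3)
          x = Matrix([1, 0])
          for k in range(10):
              x = A * x                  # x_{k+1} = A x_k
          x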
We might want to confirm whether that rather large fraction is close to $2/3$. To do so, we can get the computer to divide the numerator by the denominator.
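Continuing from the cell above:

          x[0] / x[1]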
The above might show you the fraction rather than its decimal approximation. (This may depend on whether you’re in Sage or Jupyter.) To get the decimal, try wrapping the above in float() (or N(), or appending .evalf()).
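For instance:

          float(x[0] / x[1])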
For the eigenvalue, we note that if $Ax = \lambda x$, then
\[ \frac{x \cdot (Ax)}{\lVert x \rVert^2} = \frac{x \cdot (\lambda x)}{\lVert x \rVert^2} = \lambda. \]
This leads us to consider the Rayleigh quotients
\[ r_k = \frac{x_k \cdot x_{k+1}}{\lVert x_k \rVert^2}. \]
We can convert a rational number r to a float using either N(r) or r.evalf(). (The latter seems to be the better bet when working with a list.)
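A sketch of the first several Rayleigh quotients for the matrix above; the values should approach the dominant eigenvalue $7$.

          from sympy import Matrix
          A = Matrix([[1, 4], [3, 5]])
          x = Matrix([1, 0])
          quotients = []
          for k in range(8):
              x_next = A * x
              quotients.append(x.dot(x_next) / x.dot(x))   # r_k = (x_k . x_{k+1}) / |x_k|^2
              x = x_next
          [r.evalf() for r in quotients]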
So why does this work? Suppose $A$ is diagonalizable, so that there is a basis of eigenvectors $B = \{v_1, \ldots, v_n\}$ for $\mathbb{R}^n$. Suppose also that $A$ has dominant eigenvalue $\lambda_1$, with corresponding eigenvector $v_1$. Then our initial guess $x_0$ can be written in terms of this basis:
\[ x_0 = c_1v_1 + c_2v_2 + \cdots + c_nv_n, \]
for scalars $c_1, c_2, \ldots, c_n$.
For each natural number $k$ we have
\[ A^kx_0 = c_1A^kv_1 + c_2A^kv_2 + \cdots + c_nA^kv_n, \]
so
\[ x_k = c_1\lambda_1^kv_1 + c_2\lambda_2^kv_2 + \cdots + c_n\lambda_n^kv_n. \]
Dividing by $\lambda_1^k$, we get
\[ \frac{1}{\lambda_1^k}x_k = c_1v_1 + c_2\left(\frac{\lambda_2}{\lambda_1}\right)^kv_2 + \cdots + c_n\left(\frac{\lambda_n}{\lambda_1}\right)^kv_n, \]
and since $\lambda_1$ is larger than all the other eigenvalues in absolute value, the powers $\left(\frac{\lambda_m}{\lambda_1}\right)^k$ will be very small when $k$ is large.
If $v_1$ is an eigenvector corresponding to the dominant eigenvalue $\lambda_1$, then so is the scalar multiple $c_1\lambda_1^kv_1$, so $x_k$ is approximately equal to an eigenvector.
Two things are worth noting:
  1. The result does not depend on the initial guess $x_0$, unless we choose an $x_0$ for which the coefficient $c_1$ is $0$. If this turns out to be the case, we must choose another initial guess.
  2. If $A$ is not diagonalizable, we can still use this method, but we must choose $x_0$ from the subspace spanned by a set of eigenvectors for $A$.

Subsubsection 4.6.2.2 The QR Algorithm

Given an $n \times n$ matrix $A$, we know we can write $A = QR$, with $Q$ orthogonal and $R$ upper-triangular. The QR-algorithm exploits this fact. We set $A_1 = A$, and write $A_1 = Q_1R_1$.
Then we set $A_2 = R_1Q_1$, and factor: $A_2 = Q_2R_2$. Notice that $A_2 = R_1Q_1 = Q_1^TA_1Q_1$, since $R_1 = Q_1^TA_1$. Since $A_2$ is similar to $A_1$, $A_2$ has the same eigenvalues as $A_1 = A$.
Next, set $A_3 = R_2Q_2$, and factor as $A_3 = Q_3R_3$. Since $A_3 = Q_2^TA_2Q_2$, $A_3$ has the same eigenvalues as $A_2$. In fact, $A_3 = Q_2^T(Q_1^TAQ_1)Q_2 = (Q_1Q_2)^TA(Q_1Q_2)$.
After $k$ steps we have $A_{k+1} = (Q_1\cdots Q_k)^TA(Q_1\cdots Q_k)$, which still has the same eigenvalues as $A$. By some sort of dark magic, this sequence of matrices converges to an upper triangular matrix with the eigenvalues on the diagonal!
Consider the matrix
\[ A = \begin{bmatrix} 5 & 2 & 3 \\ 0 & 4 & 0 \\ 0 & 1 & 3 \end{bmatrix}. \]
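A sketch of the first step in SymPy:

          from sympy import Matrix
          A = Matrix([[5, 2, 3], [0, 4, 0], [0, 1, 3]])
          Q1, R1 = A.QRdecomposition()
          A2 = R1 * Q1
          A2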
Now we repeat the process:
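Continuing from above:

          Q2, R2 = A2.QRdecomposition()
          A3 = R2 * Q2
          A3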
Do this a few more times, and see what results! One way to code this as a loop is sketched below. The diagonal entries should get closer to $5, 4, 3$, respectively, and the $(3, 2)$ entry should get closer to $0$.
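One possible loop, converting to floating point at each step so that the entries stay easy to read (a sketch; the number of iterations is arbitrary):

          from sympy import Matrix
          A_k = Matrix([[5, 2, 3], [0, 4, 0], [0, 1, 3]])
          for k in range(10):
              Q, R = A_k.QRdecomposition()
              A_k = (R * Q).evalf(6)     # keep the entries numeric
          A_k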

Exercises 4.6.3 Exercises

1.

Find the singular values $\sigma_1 \geq \sigma_2$ of
\[ A = \begin{bmatrix} 9 & 0 \\ 0 & 4 \end{bmatrix}. \]

2.

Find the singular values $\sigma_1 \geq \sigma_2 \geq \sigma_3$ of
\[ A = \begin{bmatrix} 8 & 0 & 2 \\ 2 & 0 & 8 \end{bmatrix}. \]

3.

Find the QR factorization of the matrix $\begin{bmatrix} 3 & 1 & 1 \\ 2 & 1 & 2 \\ 6 & 1 & 5 \end{bmatrix}$.

4.

Find the QR factorization of the matrix $\begin{bmatrix} 3 & 1 & 7 \\ 6 & 4 & 2 \\ 6 & 1 & 1 \end{bmatrix}$.