Diagonalization of symmetric matrices

Section 4.2 Diagonalization of symmetric matrices

Recall that an

n \times n

matrix

A

is symmetric if

A^{T} = A .

Symmetry of

A

is equivalent to the following: for any vectors

x, y \in R^{n},

x \cdot (A y) = (A x) \cdot y .


To see that this is implied by the symmetry of

A,

note that

x \cdot (A y) = x^{T} (A y) = (x^{T} A^{T}) y = (A x)^{T} y = (A x) \cdot y .
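The identity can also be checked symbolically. The following is a minimal sketch in SymPy; the matrices `A` and `N` are hypothetical examples chosen for illustration, and the vectors have symbolic entries so that the check covers all x and y at once.

```python
from sympy import Matrix, symbols, expand

# Hypothetical symmetric matrix, chosen only for illustration
A = Matrix([[2, -1, 0],
            [-1, 3, 4],
            [0, 4, 1]])

# Vectors with symbolic entries, so the check is not tied to specific numbers
x = Matrix(list(symbols('x1:4')))
y = Matrix(list(symbols('y1:4')))

lhs = x.dot(A * y)    # x . (A y)
rhs = (A * x).dot(y)  # (A x) . y
difference = expand(lhs - rhs)
print(difference)

# A non-symmetric matrix can fail the same identity
N = Matrix([[1, 2], [0, 1]])
u = Matrix([1, 0])
v = Matrix([0, 1])
print(u.dot(N * v), (N * u).dot(v))
```

The first printed value is the expanded difference of the two sides for the symmetric matrix; the second line shows the two sides disagreeing for the non-symmetric one.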


Exercise 4.2.1.


Prove that if

x \cdot (A y) = (A x) \cdot y

for any

x, y \in R^{n},

then

A

is symmetric.

Hint.

If this condition is true for all

x, y \in R^{n},

then it is true in particular for the vectors in the standard basis for

R^{n} .


A useful property of symmetric matrices, mentioned earlier, is that eigenvectors corresponding to distinct eigenvalues are orthogonal.


Theorem 4.2.2.

If

A

is a symmetric matrix, then eigenvectors corresponding to distinct eigenvalues are orthogonal.

Strategy.

We want to show that if

x_{1}, x_{2}

are eigenvectors corresponding to distinct eigenvalues

λ_{1}, λ_{2},

then

x_{1} \cdot x_{2} = 0 .

It was pointed out above that since

A

is symmetric, we know

(A x_{1}) \cdot x_{2} = x_{1} \cdot (A x_{2}) .

Can you see how to use this, and the fact that

x_{1}, x_{2}

are eigenvectors, to prove the result?

Proof.

To see this, suppose

A

is symmetric, and that we have

A x_{1} = λ_{1} x_{1} and A x_{2} = λ_{2} x_{2},

with

x_{1} \neq 0, x_{2} \neq 0,

and

λ_{1} \neq λ_{2} .

We then have, since

A

is symmetric, and using the result above,

λ_{1} (x_{1} \cdot x_{2}) = (λ_{1} x_{1}) \cdot x_{2} = (A x_{1}) \cdot x_{2} = x_{1} \cdot (A x_{2}) = x_{1} \cdot (λ_{2} x_{2}) = λ_{2} (x_{1} \cdot x_{2}) .

It follows that

(λ_{1} - λ_{2}) (x_{1} \cdot x_{2}) = 0,

and since

λ_{1} \neq λ_{2},

we must have

x_{1} \cdot x_{2} = 0 .
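To see the theorem in action, here is a small SymPy sketch. The 2 × 2 matrix is a hypothetical example (it is not taken from the text), picked because it is symmetric with two distinct eigenvalues.

```python
from sympy import Matrix

# Hypothetical symmetric matrix with eigenvalues 1 and 3
A = Matrix([[2, 1],
            [1, 2]])

# eigenvects() returns (eigenvalue, multiplicity, [basis vectors]) triples
pairs = [(val, vecs[0]) for val, mult, vecs in A.eigenvects()]
(lam1, x1), (lam2, x2) = pairs

print(lam1, lam2)   # the two distinct eigenvalues
print(x1.dot(x2))   # dot product of the corresponding eigenvectors
```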


The procedure for diagonalizing a matrix is as follows: assuming that

\dim E_{λ} (A)

is equal to the multiplicity of

λ

for each distinct eigenvalue

λ,

we find a basis for

E_{λ} (A) .

The union of the bases for each eigenspace is then a basis of eigenvectors for

R^{n},

and the matrix

P

whose columns are those eigenvectors will satisfy

P^{- 1} A P = D,

where

D

is a diagonal matrix whose diagonal entries are the eigenvalues of

A .
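The procedure just described can be sketched in a few lines of SymPy. This is only an illustration under stated assumptions: the matrix below is a hypothetical example (diagonalizable but not symmetric), and the names `columns` and `eigenvalues` are ours.

```python
from sympy import Matrix, diag

# Hypothetical diagonalizable (not symmetric) matrix with eigenvalues 2 and 5
A = Matrix([[4, 1],
            [2, 3]])

columns = []      # eigenvectors, gathered eigenspace by eigenspace
eigenvalues = []  # matching eigenvalues, repeated according to multiplicity
for val, mult, basis in A.eigenvects():
    # The procedure needs dim E_val(A) to equal the multiplicity of val
    assert len(basis) == mult
    for vec in basis:
        columns.append(vec)
        eigenvalues.append(val)

P = Matrix.hstack(*columns)   # columns of P are the eigenvectors
D = diag(*eigenvalues)        # eigenvalues in the same order as the columns
print(P.inv() * A * P == D)
```

Note that the order of the diagonal entries of D is dictated by the order in which the eigenvectors are placed in P.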

If

A

is symmetric, we know that eigenvectors from different eigenspaces will be orthogonal to each other. If we further choose an orthogonal basis of eigenvectors for each eigenspace (which is possible via the Gram-Schmidt procedure), then we can construct an orthogonal basis of eigenvectors for

R^{n} .

Furthermore, if we normalize each vector, then we’ll have an orthonormal basis. The matrix

P

whose columns consist of these orthonormal basis vectors has a name.


Definition 4.2.3.


A matrix

P

is called orthogonal if

P^{T} = P^{- 1} .


Theorem 4.2.4.


A matrix

P

is orthogonal if and only if the columns of

P

form an orthonormal basis for

R^{n} .

Strategy.

This more or less amounts to the fact that

P^{T} = P^{- 1}

if and only if

P^{T} P = I,

and thinking about the matrix product in terms of dot products.
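The dot-product view can be made concrete: the (i, j) entry of the product of P transpose with P is the dot product of column i with column j of P. The sketch below checks this in SymPy for a hypothetical 2 × 2 matrix with orthonormal columns (a rotation by 45 degrees); the name `gram` is ours.

```python
from sympy import Matrix, sqrt, eye

# Hypothetical matrix whose columns are orthonormal (rotation by 45 degrees)
P = Matrix([[1/sqrt(2), -1/sqrt(2)],
            [1/sqrt(2),  1/sqrt(2)]])

# Entry (i, j) of P^T P is the dot product of column i with column j
gram = Matrix(2, 2, lambda i, j: P.col(i).dot(P.col(j)))
print(gram == P.T * P)
print(gram == eye(2))
```

The diagonal entries equal 1 exactly when each column has length 1, and the off-diagonal entries vanish exactly when distinct columns are orthogonal.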


A fun fact is that if the columns of

P

are orthonormal, then so are the rows. But this is not true if we ask for the columns to be merely orthogonal. For example, the columns of

A = [\begin{matrix} 1 & 0 & 5 \\ - 2 & 1 & 2 \\ 1 & 2 & - 1 \end{matrix}]

are orthogonal, but (as you can check) the rows are not. But if we normalize the columns, we get

P = [\begin{matrix} 1 / \sqrt{6} & 0 & 5 / \sqrt{30} \\ - 2 / \sqrt{6} & 1 / \sqrt{5} & 2 / \sqrt{30} \\ 1 / \sqrt{6} & 2 / \sqrt{5} & - 1 / \sqrt{30} \end{matrix}],

which, as you can confirm, is an orthogonal matrix.
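One way to carry out the suggested checks is with SymPy. The sketch below uses the matrix A from this example and normalizes its columns computationally rather than typing in the entries of P; the use of `is_diagonal` on the two products is our own shortcut for testing all the pairwise dot products at once.

```python
from sympy import Matrix, eye, simplify

A = Matrix([[1, 0, 5],
            [-2, 1, 2],
            [1, 2, -1]])

# A^T A collects the column dot products; A A^T collects the row dot products
print((A.T * A).is_diagonal())   # are the columns orthogonal?
print((A * A.T).is_diagonal())   # are the rows orthogonal?

# Divide each column by its length to get orthonormal columns
P = Matrix.hstack(*[A.col(j) / A.col(j).norm() for j in range(3)])
cols_check = (P.T * P).applyfunc(simplify)
rows_check = (P * P.T).applyfunc(simplify)
print(cols_check == eye(3), rows_check == eye(3))
```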


Definition 4.2.5.

An

n \times n

matrix

A

is said to be orthogonally diagonalizable if there exists an orthogonal matrix

P

such that

P^{T} A P

is diagonal.


The above definition leads to the following result, also known as the Principal Axes Theorem. A careful proof is quite difficult, and is omitted from this book. The hard part is showing that any symmetric matrix is orthogonally diagonalizable. There are a few ways to do this, most requiring induction on the size of the matrix. A common approach actually uses multivariable calculus! (Optimization via Lagrange multipliers, to be precise.) If you are reading this along with the book by Nicholson, there is a gap in his proof: in the induction step, he assumes the existence of a real eigenvalue of

A,

but this has to be proved!


Theorem 4.2.6. Real Spectral Theorem.


The following are equivalent for a real

n \times n

matrix

A :

$A$ is symmetric.
There is an orthonormal basis for $R^{n}$ consisting of eigenvectors of $A$.
$A$ is orthogonally diagonalizable.


Example 4.2.7.


Determine the eigenvalues of

A = [\begin{matrix} 5 & - 2 & - 4 \\ - 2 & 8 & - 2 \\ - 4 & - 2 & 5 \end{matrix}],

and find an orthogonal matrix

P

such that

P^{T} A P

is diagonal.

Solution.

We’ll solve this problem with the help of the computer.


    
        
    from sympy import Matrix, init_printing, factor
    init_printing()
    A = Matrix(3, 3, [5, -2, -4, -2, 8, -2, -4, -2, 5])
    # Characteristic polynomial of A, converted to an ordinary expression
    p = A.charpoly().as_expr()
    factor(p)

We get

c_{A} (x) = x (x - 9)^{2},

so our eigenvalues are

0

and

9 .

For

0

we have

E_{0} (A) = null (A) :


    
        
    # Basis for the null space of A, which is the eigenspace E_0(A)
    A.nullspace()

For

9

we have

E_{9} (A) = null (A - 9 I) .


    
        
    from sympy import eye
    # E_9(A) is the null space of A - 9I
    B = A - 9*eye(3)
    B.nullspace()

The approach above is useful as we’re trying to remind ourselves how eigenvalues and eigenvectors are defined and computed. Eventually we might want to be more efficient. Fortunately, there’s a command for that.


    
        
    A.eigenvects()

Note that the output above lists each eigenvalue, followed by its multiplicity, and then the associated eigenvectors.

This gives us a basis for

R^{3}

consisting of eigenvectors of

A,

but we want an orthogonal basis. Note that the eigenvector corresponding to

λ = 0

is orthogonal to both of the eigenvectors corresponding to

λ = 9 .

But these eigenvectors are not orthogonal to each other. To get an orthogonal basis for

E_{9} (A),

we apply the Gram-Schmidt algorithm.


    
        
    from sympy import GramSchmidt
    # Orthogonalize the basis of E_9(A) computed earlier
    L = B.nullspace()
    GramSchmidt(L)

This gives us an orthogonal basis of eigenvectors. Scaling to clear fractions, we have

{[\begin{matrix} 2 \\ 1 \\ 2 \end{matrix}], [\begin{matrix} - 1 \\ 2 \\ 0 \end{matrix}], [\begin{matrix} - 4 \\ - 2 \\ 5 \end{matrix}]}

From here, we need to normalize each vector to get the matrix

P .

But we might not like that the last vector has norm

\sqrt{45} .

One option to consider is to apply Gram-Schmidt with the vectors in the other order.


    
        
    # The same eigenvectors (scaled to clear fractions), in the other order
    L = [Matrix(3, 1, [-1, 0, 1]), Matrix(3, 1, [-1, 2, 0])]
    GramSchmidt(L)

That gives us the (slightly nicer) basis

{[\begin{matrix} 2 \\ 1 \\ 2 \end{matrix}], [\begin{matrix} - 1 \\ 0 \\ 1 \end{matrix}], [\begin{matrix} 1 \\ - 4 \\ 1 \end{matrix}]} .
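For readers who want to see where the third vector comes from, the Gram-Schmidt step for this ordering can be worked by hand. The labels v_1, u, and v_2 below are ours, introduced only for this computation; v_1 and u are the two eigenvectors for λ = 9, taken in the new order.

```latex
\begin{aligned}
v_1 &= (-1, 0, 1), \qquad u = (-1, 2, 0), \\
v_2 &= u - \frac{u \cdot v_1}{v_1 \cdot v_1}\, v_1
     = (-1, 2, 0) - \frac{1}{2}(-1, 0, 1)
     = \left(-\tfrac{1}{2},\, 2,\, -\tfrac{1}{2}\right), \\
-2\, v_2 &= (1, -4, 1).
\end{aligned}
```

Scaling by -2 clears the fractions without affecting orthogonality, and one can check directly that (1, -4, 1) is orthogonal to both (-1, 0, 1) and (2, 1, 2).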

The corresponding orthonormal basis is

B = {\frac{1}{3} [\begin{matrix} 2 \\ 1 \\ 2 \end{matrix}], \frac{1}{\sqrt{2}} [\begin{matrix} - 1 \\ 0 \\ 1 \end{matrix}], \frac{1}{\sqrt{18}} [\begin{matrix} 1 \\ - 4 \\ 1 \end{matrix}]} .

This gives us the matrix

P = [\begin{matrix} 2 / 3 & - 1 / \sqrt{2} & 1 / \sqrt{18} \\ 1 / 3 & 0 & - 4 / \sqrt{18} \\ 2 / 3 & 1 / \sqrt{2} & 1 / \sqrt{18} \end{matrix}] .

Let’s confirm that

P

is orthogonal.


    
        
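    from sympy import sqrt, Rational
    # Rational(2, 3) keeps the entries exact; a bare 2/3 would be a Python float
    P = Matrix(3, 3, [Rational(2, 3), -1/sqrt(2), 1/sqrt(18), Rational(1, 3), 0, -4/sqrt(18), Rational(2, 3), 1/sqrt(2), 1/sqrt(18)])
    P, P*P.transpose()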

Since

P P^{T} = I_{3},

we can conclude that

P^{T} = P^{- 1},

so

P

is orthogonal, as required. Finally, we diagonalize

A .


    
        
    Q = P.transpose()
    Q*A*P

Incidentally, the SymPy library for Python does have a diagonalization routine; however, it does not do orthogonal diagonalization by default. Here is what it provides for our matrix

A .


    
        
    A.diagonalize()
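The steps of this example can be collected into a single routine. The function `orthogonal_diagonalize` below is a hypothetical helper written for illustration, not part of SymPy; it is a sketch that assumes the input is a symmetric matrix whose eigenvalues SymPy can compute exactly, and it relies on `GramSchmidt` with its `orthonormal` option to produce unit vectors within each eigenspace.

```python
from sympy import Matrix, GramSchmidt, diag, eye, simplify

def orthogonal_diagonalize(A):
    """Return (P, D) with P orthogonal and P^T A P = D, for symmetric A.

    Hypothetical helper for illustration; not part of SymPy.
    """
    if A != A.T:
        raise ValueError("A must be symmetric")
    columns, eigenvalues = [], []
    for val, mult, basis in A.eigenvects():
        # Orthonormalize within each eigenspace; vectors from different
        # eigenspaces are already orthogonal because A is symmetric
        for vec in GramSchmidt(basis, orthonormal=True):
            columns.append(vec)
            eigenvalues.append(val)
    return Matrix.hstack(*columns), diag(*eigenvalues)

A = Matrix(3, 3, [5, -2, -4, -2, 8, -2, -4, -2, 5])
P, D = orthogonal_diagonalize(A)
is_orthogonal = (P.T * P).applyfunc(simplify) == eye(3)
is_diagonalized = (P.T * A * P).applyfunc(simplify) == D
print(is_orthogonal, is_diagonalized)
```

The matrix P produced this way need not match the one found by hand, since the choice of orthonormal basis for a repeated eigenvalue is not unique.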