
Section 4.4 Diagonalization of complex matrices

Recall that when we first defined vector spaces, we mentioned that a vector space can be defined over any field F. To keep things simple, we’ve mostly assumed F=R. But most of the theorems and proofs we’ve encountered go through unchanged if we work over a general field. (This is not quite true: over a finite field things can get more complicated. For example, if F=Z2={0,1}, then we get weird results like v+v=0, since 1+1=0.)
In fact, if we replace R by C, about the only thing we’d have to go back and change is the definition of the dot product. The reason for this is that although the complex numbers seem computationally more complicated, (which might mostly be because you don’t use them often enough) they follow the exact same algebraic rules as the real numbers. In other words, the arithmetic might be different, but the algebra is the same. There is one key difference between the two fields: over the complex numbers, every polynomial can be factored. This is important if you’re interested in finding eigenvalues.
This section is written based on the assumption that complex numbers were covered in a previous course. If this was not the case, or to review this material, see Appendix A before proceeding.

Subsection 4.4.1 Complex vectors

A complex vector space is simply a vector space where the scalars are elements of C rather than R. Examples include polynomials with complex coefficients, complex-valued functions, and Cn, which is defined exactly how you think it should be. In fact, one way to obtain Cn is to start with the exact same standard basis we use for Rn, and then take linear combinations using complex scalars.
We’ll write elements of \(\mathbb{C}^n\) as \(\mathbf{z}=(z_1,z_2,\ldots,z_n)\). The complex conjugate of \(\mathbf{z}\) is given by
\[\bar{\mathbf{z}}=(\bar{z}_1,\bar{z}_2,\ldots,\bar{z}_n).\]
The standard inner product on Cn looks a lot like the dot product on Rn, with one important difference: we apply a complex conjugate to the second vector.

Definition 4.4.1.

The standard inner product on \(\mathbb{C}^n\) is defined as follows: given \(\mathbf{z}=(z_1,z_2,\ldots,z_n)\) and \(\mathbf{w}=(w_1,w_2,\ldots,w_n)\),
\[\langle \mathbf{z},\mathbf{w}\rangle = \mathbf{z}\cdot\bar{\mathbf{w}} = z_1\bar{w}_1+z_2\bar{w}_2+\cdots+z_n\bar{w}_n.\]
If \(\mathbf{z},\mathbf{w}\) are real, this is just the usual dot product. The reason for using the complex conjugate is to ensure that we still have a positive-definite inner product on \(\mathbb{C}^n\):
\[\langle \mathbf{z},\mathbf{z}\rangle = z_1\bar{z}_1+z_2\bar{z}_2+\cdots+z_n\bar{z}_n = |z_1|^2+|z_2|^2+\cdots+|z_n|^2,\]
which shows that \(\langle \mathbf{z},\mathbf{z}\rangle \ge 0\), and \(\langle \mathbf{z},\mathbf{z}\rangle = 0\) if and only if \(\mathbf{z}=\mathbf{0}\).

Exercise 4.4.2.

Compute the dot product of z=(2i,3i,4+2i) and w=(3i,45i,2+2i).
This isn’t hard to do by hand, but it’s useful to know how to ask the computer to do it, too. Unfortunately, the dot product in SymPy does not include the complex conjugate. One likely reason for this is that while most mathematicians take the complex conjugate of the second vector, some mathematicians, and most physicists, put the conjugate on the first vector. So they may have decided to remain agnostic about this choice. We can manually apply the conjugate, using Z.dot(W.H). (The .H operation is the hermitian conjugate; see Definition 4.4.6 below.)
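A sketch of such a computation (the vector entries here are illustrative, not the ones from the exercise):

```python
from sympy import Matrix, I, expand

# Illustrative column vectors
Z = Matrix([1 + I, 2*I])
W = Matrix([3, 4 - I])

# <Z, W>: conjugate the second vector via .H before taking the dot product
inner = expand(Z.dot(W.H))
```

Note that Z.dot(W) on its own would omit the conjugate, which is why we pass W.H instead.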
Again, you might want to wrap that last term in simplify() (in which case you’ll get 226i for the dot product). Above, we saw that the complex inner product is designed to be positive definite, like the real inner product. The remaining properties of the complex inner product are given as follows.

Proposition 4.4.3.

For any \(\mathbf{z}_1,\mathbf{z}_2,\mathbf{z}_3\in\mathbb{C}^n\) and \(\alpha\in\mathbb{C}\), the standard inner product satisfies:

  1. \(\langle \mathbf{z}_1+\mathbf{z}_2,\mathbf{z}_3\rangle = \langle \mathbf{z}_1,\mathbf{z}_3\rangle + \langle \mathbf{z}_2,\mathbf{z}_3\rangle\), and similarly for addition in the second component;
  2. \(\langle \alpha\mathbf{z}_1,\mathbf{z}_2\rangle = \alpha\langle \mathbf{z}_1,\mathbf{z}_2\rangle\) and \(\langle \mathbf{z}_1,\alpha\mathbf{z}_2\rangle = \bar{\alpha}\langle \mathbf{z}_1,\mathbf{z}_2\rangle\);
  3. \(\langle \mathbf{z}_1,\mathbf{z}_2\rangle = \overline{\langle \mathbf{z}_2,\mathbf{z}_1\rangle}\);
  4. \(\langle \mathbf{z},\mathbf{z}\rangle \ge 0\), with equality if and only if \(\mathbf{z}=\mathbf{0}\).

Proof.

  1. Using the distributive properties of matrix multiplication and the transpose,
    \[\langle \mathbf{z}_1+\mathbf{z}_2,\mathbf{z}_3\rangle = (\mathbf{z}_1+\mathbf{z}_2)^T\bar{\mathbf{z}}_3 = (\mathbf{z}_1^T+\mathbf{z}_2^T)\bar{\mathbf{z}}_3 = \mathbf{z}_1^T\bar{\mathbf{z}}_3+\mathbf{z}_2^T\bar{\mathbf{z}}_3 = \langle \mathbf{z}_1,\mathbf{z}_3\rangle + \langle \mathbf{z}_2,\mathbf{z}_3\rangle.\]
    The proof is similar when the addition is in the second component. (But not identical -- you’ll need the fact that the complex conjugate is distributive over addition, rather than the transpose.)
  2. These again follow from writing the inner product as a matrix product:
    \[\langle \alpha\mathbf{z}_1,\mathbf{z}_2\rangle = (\alpha\mathbf{z}_1)^T\bar{\mathbf{z}}_2 = \alpha(\mathbf{z}_1^T\bar{\mathbf{z}}_2) = \alpha\langle \mathbf{z}_1,\mathbf{z}_2\rangle,\]
    and
    \[\langle \mathbf{z}_1,\alpha\mathbf{z}_2\rangle = \mathbf{z}_1^T\overline{\alpha\mathbf{z}_2} = \mathbf{z}_1^T(\bar{\alpha}\bar{\mathbf{z}}_2) = \bar{\alpha}(\mathbf{z}_1^T\bar{\mathbf{z}}_2) = \bar{\alpha}\langle \mathbf{z}_1,\mathbf{z}_2\rangle.\]
  3. Note that for any vectors \(\mathbf{z},\mathbf{w}\), \(\mathbf{z}^T\mathbf{w}\) is a number, and therefore equal to its own transpose. Thus, we have \(\mathbf{z}^T\mathbf{w} = (\mathbf{z}^T\mathbf{w})^T = \mathbf{w}^T\mathbf{z}\), and
    \[\langle \mathbf{z}_1,\mathbf{z}_2\rangle = \mathbf{z}_1^T\bar{\mathbf{z}}_2 = \bar{\mathbf{z}}_2^T\mathbf{z}_1 = \overline{\mathbf{z}_2^T\bar{\mathbf{z}}_1} = \overline{\langle \mathbf{z}_2,\mathbf{z}_1\rangle}.\]
  4. This was already demonstrated above.

Definition 4.4.4.

The norm of a vector \(\mathbf{z}=(z_1,z_2,\ldots,z_n)\) in \(\mathbb{C}^n\) is given by
\[\|\mathbf{z}\| = \sqrt{\langle \mathbf{z},\mathbf{z}\rangle} = \sqrt{|z_1|^2+|z_2|^2+\cdots+|z_n|^2}.\]
Note that much like the real norm, the complex norm satisfies \(\|\alpha\mathbf{z}\| = |\alpha|\,\|\mathbf{z}\|\) for any (complex) scalar \(\alpha\).
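In SymPy, this norm is computed with the .norm() method. A quick sketch, with entries chosen arbitrarily:

```python
from sympy import Matrix, I, sqrt

z = Matrix([3 + 4*I, 1 - I])

# norm = sqrt(|3+4i|^2 + |1-i|^2) = sqrt(25 + 2) = 3*sqrt(3)
n = z.norm()
```

The result is real, as the exercise below confirms it must be.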

Exercise 4.4.5.

    The norm of a complex vector is always a real number.
  • True.

  • Since the norm is computed using the modulus, which is always real and non-negative, the norm will be a real number as well. If you ever get a complex number for your norm, you’ve probably forgotten the complex conjugate somewhere.

Subsection 4.4.2 Complex matrices

Linear transformations are defined in exactly the same way, and a complex matrix is simply a matrix whose entries are complex numbers. There are two important operations defined on complex matrices: the conjugate, and the conjugate transpose (also known as the hermitian transpose).

Definition 4.4.6.

The conjugate of a matrix \(A=[a_{ij}]\in M_{mn}(\mathbb{C})\) is the matrix \(\bar{A}=[\bar{a}_{ij}]\). The conjugate transpose of \(A\) is the matrix \(A^H\) defined by
\[A^H = (\bar{A})^T = \overline{(A^T)}.\]
Note that many textbooks use the notation \(A^*\) for the conjugate transpose.

Definition 4.4.7.

An \(n\times n\) matrix \(A\in M_{nn}(\mathbb{C})\) is called hermitian if \(A^H = A\), and unitary if \(A^H = A^{-1}\). (A matrix is skew-hermitian if \(A^H = -A\).)
Hermitian and unitary matrices (or more accurately, linear operators) are very important in quantum mechanics. Indeed, hermitian matrices represent “observable” quantities, in part because their eigenvalues are real, as we’ll soon see. For us, hermitian and unitary matrices can simply be viewed as the complex counterparts of symmetric and orthogonal matrices, respectively. In fact, a real symmetric matrix is hermitian, since the conjugate has no effect on it, and similarly, a real orthogonal matrix is technically unitary. As with orthogonal matrices, a unitary matrix can also be characterized by the property that its rows and columns both form orthonormal bases.

Exercise 4.4.8.

Show that the matrix \(A = \begin{bmatrix} 4 & 1-i & 2+3i \\ 1+i & 5 & 7i \\ 2-3i & -7i & 4 \end{bmatrix}\) is hermitian, and that the matrix \(B = \frac{1}{2}\begin{bmatrix} 1+i & \sqrt{2} \\ 1-i & \sqrt{2}\,i \end{bmatrix}\) is unitary.
When using SymPy, the hermitian conjugate of a matrix A is executed using A.H. (There appears to also be an equivalent operation named Dagger coming from sympy.physics.quantum, but I’ve had more success with .H.) The complex unit is entered as I. So for the exercise above, we can do the following.
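A sketch of such a cell (the entries of A follow one reading of the exercise statement above; any hermitian matrix would do for the pattern):

```python
from sympy import Matrix, I

# The matrix A from the exercise (entries as reconstructed here)
A = Matrix([[4,       1 - I,  2 + 3*I],
            [1 + I,   5,      7*I],
            [2 - 3*I, -7*I,   4]])

A == A.H   # True: A equals its own conjugate transpose
```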
The last line verifies that A=AH. We could also replace it with A,A.H to explicitly see the two matrices side by side. Now, let’s confirm that B is unitary.
Hmm... That doesn’t look like the identity on the right. Maybe try replacing B*B.H with simplify(B*B.H). (You will want to add from sympy import simplify at the top of the cell.) Or you could try B.H, B**-1 to compare results. Actually, what’s interesting is that in a Sage cell, B.H == B**-1 yields False, but B.H == simplify(B**-1) yields True!
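Putting those suggestions together, a sketch of the unitary check (with B entered as read above):

```python
from sympy import Matrix, I, sqrt, simplify, eye

B = Matrix([[1 + I, sqrt(2)],
            [1 - I, sqrt(2)*I]]) / 2

# Without simplify, the entries of B*B.H are left unexpanded
simplify(B * B.H) == eye(2)   # True once simplified
```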
As mentioned above, hermitian matrices are the complex analogue of symmetric matrices. Recall that a key property of a symmetric matrix is its symmetry with respect to the dot product: for a symmetric matrix \(A\), we had \(\mathbf{x}\cdot(A\mathbf{y}) = (A\mathbf{x})\cdot\mathbf{y}\). Hermitian matrices exhibit the same behaviour with respect to the complex inner product.

Theorem 4.4.9.

An \(n\times n\) complex matrix \(A\) is hermitian if and only if \(\langle A\mathbf{z},\mathbf{w}\rangle = \langle \mathbf{z},A\mathbf{w}\rangle\) for all \(\mathbf{z},\mathbf{w}\in\mathbb{C}^n\).

Proof.

Note that the property \(A^H = A\) is equivalent to \(A^T = \bar{A}\). This gives us
\[\langle A\mathbf{z},\mathbf{w}\rangle = (A\mathbf{z})^T\bar{\mathbf{w}} = (\mathbf{z}^TA^T)\bar{\mathbf{w}} = (\mathbf{z}^T\bar{A})\bar{\mathbf{w}} = \mathbf{z}^T(\overline{A\mathbf{w}}) = \langle \mathbf{z},A\mathbf{w}\rangle.\]
Conversely, suppose \(\langle A\mathbf{z},\mathbf{w}\rangle = \langle \mathbf{z},A\mathbf{w}\rangle\) for all \(\mathbf{z},\mathbf{w}\in\mathbb{C}^n\), and let \(\{\mathbf{e}_1,\mathbf{e}_2,\ldots,\mathbf{e}_n\}\) denote the standard basis for \(\mathbb{C}^n\). Then
\[a_{ji} = \langle A\mathbf{e}_i,\mathbf{e}_j\rangle = \langle \mathbf{e}_i,A\mathbf{e}_j\rangle = \bar{a}_{ij},\]
which shows that \(A^T = \bar{A}\).
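This symmetry property is easy to spot-check with SymPy. A small sketch, using an arbitrarily chosen hermitian matrix and arbitrary vectors:

```python
from sympy import Matrix, I, expand

A = Matrix([[4, 1 - I], [1 + I, 5]])   # a small hermitian matrix
z = Matrix([1, I])
w = Matrix([2 - I, 3])

lhs = expand((A*z).dot(w.H))   # <Az, w>
rhs = expand(z.dot((A*w).H))   # <z, Aw>
lhs == rhs                     # True, as the theorem predicts
```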
Next, we’ve noted that one advantage of doing linear algebra over C is that every polynomial can be completely factored, including the characteristic polynomial. This means that we can always find eigenvalues for a matrix. When that matrix is hermitian, we get a surprising result.

Theorem 4.4.10.

Let \(A\) be a hermitian matrix. Then:

  1. The eigenvalues of \(A\) are real.
  2. Eigenvectors of \(A\) corresponding to distinct eigenvalues are orthogonal.

Proof.

  1. Suppose \(A\mathbf{z} = \lambda\mathbf{z}\) for some \(\lambda\in\mathbb{C}\) and \(\mathbf{z}\neq\mathbf{0}\). Then
    \[\lambda\langle \mathbf{z},\mathbf{z}\rangle = \langle \lambda\mathbf{z},\mathbf{z}\rangle = \langle A\mathbf{z},\mathbf{z}\rangle = \langle \mathbf{z},A\mathbf{z}\rangle = \langle \mathbf{z},\lambda\mathbf{z}\rangle = \bar{\lambda}\langle \mathbf{z},\mathbf{z}\rangle.\]
    Thus, \((\lambda-\bar{\lambda})\|\mathbf{z}\|^2 = 0\), and since \(\mathbf{z}\neq\mathbf{0}\), we must have \(\bar{\lambda} = \lambda\), which means \(\lambda\in\mathbb{R}\).
  2. Similarly, suppose \(\lambda_1\neq\lambda_2\) are eigenvalues of \(A\), with corresponding eigenvectors \(\mathbf{z},\mathbf{w}\). Then
    \[\lambda_1\langle \mathbf{z},\mathbf{w}\rangle = \langle \lambda_1\mathbf{z},\mathbf{w}\rangle = \langle A\mathbf{z},\mathbf{w}\rangle = \langle \mathbf{z},A\mathbf{w}\rangle = \langle \mathbf{z},\lambda_2\mathbf{w}\rangle = \bar{\lambda}_2\langle \mathbf{z},\mathbf{w}\rangle.\]
    This gives us \((\lambda_1-\bar{\lambda}_2)\langle \mathbf{z},\mathbf{w}\rangle = 0\). And since we already know \(\lambda_2\) must be real, and \(\lambda_1\neq\lambda_2\), we must have \(\langle \mathbf{z},\mathbf{w}\rangle = 0\).
In light of Theorem 4.4.10, we realize that diagonalization of hermitian matrices will follow the same script as for symmetric matrices. Indeed, the Gram-Schmidt Orthonormalization Algorithm applies equally well in \(\mathbb{C}^n\), as long as we replace the dot product with the complex inner product. This suggests the following.

Theorem 4.4.11.

If \(A\) is an \(n\times n\) hermitian matrix, then there exists a unitary matrix \(U\) such that \(U^HAU\) is diagonal.

Exercise 4.4.12.

Confirm that the matrix \(A = \begin{bmatrix} 4 & 3-i \\ 3+i & 1 \end{bmatrix}\) is hermitian. Then, find the eigenvalues of \(A\), and a unitary matrix \(U\) such that \(U^HAU\) is diagonal.
To do the above exercise using SymPy, we first define A and ask for the eigenvectors.
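A sketch of such a cell (with A as in the exercise):

```python
from sympy import Matrix, I

A = Matrix([[4, 3 - I], [3 + I, 1]])

# Output is a list of (eigenvalue, multiplicity, [eigenvectors]) tuples
A.eigenvects()
```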
We can now manually determine the matrix U, as we did above, and input it:
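One possible choice of U, built from normalized eigenvectors (the particular eigenvectors here are an assumption; any unit eigenvectors in the same order would do):

```python
from sympy import Matrix, I, sqrt

# [3 - I, 2] is an eigenvector for eigenvalue 6, with norm sqrt(14);
# [-3 + I, 5] is an eigenvector for eigenvalue -1, with norm sqrt(35)
u1 = Matrix([3 - I, 2]) / sqrt(14)
u2 = Matrix([-3 + I, 5]) / sqrt(35)

U = u1.row_join(u2)
```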
To confirm it’s unitary, add the line U*U.H to the above, and confirm that you get the identity matrix as output. You might need to use simplify(U*U.H) if the result is not clear. Now, to confirm that \(U^HAU\) really is diagonal, go back to the cell above and try (U.H)*A*U, just to remind yourself that adding the simplify command is often a good idea.
If you want to cut down on the manual labour involved, we can make use of some of the other tools SymPy provides. In the next cell, we’re going to assign the output of A.eigenvects() to a list. The only trouble is that the output of the eigenvector command is a list of lists. Each list item is a list (eigenvalue, multiplicity, [eigenvectors]).
Try the above modifications, in sequence. First, replacing the second line by L[0] will give the first list item, which is another list:
\[\left(-1,\ 1,\ \left[\begin{bmatrix} -\frac{3}{5}+\frac{i}{5} \\ 1 \end{bmatrix}\right]\right).\]
We want the third item in the list, so try (L[0])[2]. But note the extra set of brackets! There could (in theory) be more than one eigenvector, so this is a list with one item. To finally get the vector out, try ((L[0])[2])[0]. (There is probably a better way to do this. Someone who is more fluent in Python is welcome to advise.)
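Putting those steps together, a sketch of the extraction:

```python
from sympy import Matrix, I

A = Matrix([[4, 3 - I], [3 + I, 1]])
L = A.eigenvects()    # list of (eigenvalue, multiplicity, [eigenvectors])

lam = (L[0])[0]       # the first eigenvalue
v = ((L[0])[2])[0]    # its eigenvector, unwrapped from the inner list
```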
Now that we know how to extract the eigenvectors, we can normalize them, and join them to make a matrix. The norm of a vector v is simply v.norm(), and to join column vectors u1 and u2 to make a matrix, we can use the command u1.row_join(u2). We already defined the matrix A and list L above, but here is the whole routine in one cell, in case you didn’t run all the cells above.
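A sketch of the whole routine in one cell:

```python
from sympy import Matrix, I, simplify

A = Matrix([[4, 3 - I], [3 + I, 1]])
L = A.eigenvects()

# Extract one eigenvector per eigenvalue and normalize it
u1 = L[0][2][0]
u2 = L[1][2][0]
U = (u1 / u1.norm()).row_join(u2 / u2.norm())

# Diagonalize; without simplify the entries are a mess of radicals
D = (U.H * A * U).applyfunc(simplify)
```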
Believe me, you want the simplify command on that last matrix.
While Theorem 4.4.11 guarantees that any hermitian matrix can be “unitarily diagonalized”, there are also non-hermitian matrices for which this can be done as well. A classic example of this is the rotation matrix \(\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\). This is a real matrix with complex eigenvalues \(\pm i\), and while it is neither symmetric nor hermitian, it can be unitarily diagonalized. This should be contrasted with the real spectral theorem, where any matrix that can be orthogonally diagonalized is necessarily symmetric.
This suggests that perhaps hermitian matrices are not quite the correct class of matrix for which the spectral theorem should be stated. Indeed, it turns out there is a somewhat more general class of matrix: the normal matrices.

Definition 4.4.13.

An n×n matrix A is normal if AHA=AAH.

Exercise 4.4.14.

    Select all matrices below that are normal.
  • \(\begin{bmatrix} 3 & 1-3i \\ 1+3i & 4 \end{bmatrix}\)
  • This matrix is hermitian, and we know that every hermitian matrix is normal.
  • \(\begin{bmatrix} 1 & 3 \\ 0 & 2 \end{bmatrix}\)
  • This matrix is not normal; this can be confirmed by direct computation, or by noting that it cannot be unitarily diagonalized.
  • \(\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ i & -i \end{bmatrix}\)
  • This matrix is unitary, and every unitary matrix is normal.
  • \(\begin{bmatrix} i & 2i \\ 2i & 3i \end{bmatrix}\)
  • This matrix is neither hermitian nor unitary, but it is normal, which can be verified by direct computation.
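That direct computation is a one-liner in SymPy. A sketch, with the entries of the last matrix as reconstructed here:

```python
from sympy import Matrix, I

N = Matrix([[I, 2*I], [2*I, 3*I]])

N.H * N == N * N.H   # True: N is normal, even though N != N.H
```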
It turns out that a matrix \(A\) is normal if and only if \(A = UDU^H\) for some unitary matrix \(U\) and diagonal matrix \(D\). A further generalization is known as Schur’s Theorem.

Theorem 4.4.15. Schur’s Theorem.

For any \(A\in M_{nn}(\mathbb{C})\), there exists a unitary matrix \(U\) such that \(U^HAU = T\) is upper triangular, with the eigenvalues of \(A\) as its diagonal entries.
Using Schur’s Theorem, we can obtain a famous result, known as the Cayley-Hamilton Theorem, for the case of complex matrices. (It is true for real matrices as well, but we don’t yet have the tools to prove it.) The Cayley-Hamilton Theorem states that substituting any matrix into its characteristic polynomial results in the zero matrix. To understand this result, we should first explain how to define a polynomial of a matrix.
Given a polynomial \(p(x) = a_0 + a_1x + \cdots + a_nx^n\), we define \(p(A)\) as
\[p(A) = a_0I + a_1A + \cdots + a_nA^n.\]
(Note the presence of the identity matrix in the first term, since it does not make sense to add a scalar to a matrix.) Note further that since \((P^{-1}AP)^n = P^{-1}A^nP\) for any invertible matrix \(P\) and positive integer \(n\), we have \(p(U^HAU) = U^Hp(A)U\) for any polynomial \(p\) and unitary matrix \(U\).
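As a quick illustration of this definition, here is \(p(x) = x^2 - 3x + 2\) evaluated at a matrix (both chosen arbitrarily):

```python
from sympy import Matrix, eye

# p(x) = x**2 - 3x + 2, so p(A) = A**2 - 3A + 2I
A = Matrix([[1, 2], [0, 3]])
pA = A**2 - 3*A + 2*eye(2)
```

Since \(p(x) = (x-1)(x-2)\) and \(A\) has eigenvalue 1, the result is singular but not zero.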

Proof.

By Theorem 4.4.15, there exists a unitary matrix \(U\) such that \(A = UTU^H\), where \(T\) is upper triangular, and has the eigenvalues of \(A\) as diagonal entries. Since \(c_A(A) = c_A(UTU^H) = Uc_A(T)U^H\), and \(c_A(x) = c_T(x)\) (since \(A\) and \(T\) are similar), it suffices to show that \(c_A(A) = 0\) when \(A\) is upper triangular. (If you like, we are showing that \(c_T(T) = 0\), and deducing that \(c_A(A) = 0\).) But if \(A\) is upper triangular, so is \(xI - A\), and therefore, \(\det(xI - A)\) is just the product of the diagonal entries. That is,
\[c_A(x) = (x-\lambda_1)(x-\lambda_2)\cdots(x-\lambda_n),\]
so
\[c_A(A) = (A-\lambda_1I)(A-\lambda_2I)\cdots(A-\lambda_nI).\]
Since the first column of \(A\) is \(\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \end{bmatrix}^T\), the first column of \(A-\lambda_1I\) is identically zero. The second column of \(A-\lambda_2I\) similarly has the form \(\begin{bmatrix} k & 0 & \cdots & 0 \end{bmatrix}^T\) for some number \(k\).
It follows that the first two columns of \((A-\lambda_1I)(A-\lambda_2I)\) are identically zero. Since only the first two entries in the third column of \((A-\lambda_3I)\) can be nonzero, we find that the first three columns of \((A-\lambda_1I)(A-\lambda_2I)(A-\lambda_3I)\) are zero, and so on.
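The theorem is also easy to verify numerically. A sketch, using SymPy's charpoly and Horner's method to evaluate the characteristic polynomial at the matrix (the matrix here is chosen arbitrarily):

```python
from sympy import Matrix, I, symbols, eye, zeros, expand

x = symbols('x')
A = Matrix([[2, 1 - I], [1 + I, 3]])   # a sample hermitian matrix

p = A.charpoly(x)            # characteristic polynomial of A
cA = zeros(2, 2)
for c in p.all_coeffs():     # evaluate p at A by Horner's method
    cA = cA * A + c * eye(2)

cA.applyfunc(expand)         # the zero matrix, as Cayley-Hamilton predicts
```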

Exercises 4.4.3 Exercises

1.

Suppose \(A\) is a \(3\times 3\) matrix with real entries that has a complex eigenvalue \(6-2i\) with corresponding eigenvector \(\begin{bmatrix} 2+3i \\ 1 \\ 8i \end{bmatrix}\). Find another eigenvalue and eigenvector for \(A\).

2.

Give an example of a \(2\times 2\) matrix with no real eigenvalues.

3.

Find all the eigenvalues (real and complex) of the matrix
\[M = \begin{bmatrix} 3 & 6 & 2 \\ 5 & 2 & 1 \\ 5 & 3 & 0 \end{bmatrix}.\]

4.

Find all the eigenvalues (real and complex) of the matrix
\[M = \begin{bmatrix} 2 & 0 & 0 & 5 \\ 0 & 5 & 3 & 3 \\ 0 & 4 & 2 & 1 \\ 0 & 3 & 3 & 1 \end{bmatrix}.\]

5.

Let \(M = \begin{bmatrix} 3 & -9 \\ 9 & 3 \end{bmatrix}\). Find formulas for the entries of \(M^n\), where \(n\) is a positive integer. (Your formulas should not contain complex numbers.)

6.

Let
\[M = \begin{bmatrix} 5 & -3 & 10 \\ 3 & 5 & 5 \\ 0 & 0 & 1 \end{bmatrix}.\]
Find formulas for the entries of Mn, where n is a positive integer. (Your formulas should not contain complex numbers.)

7.

Let \(M = \begin{bmatrix} 3 & 1 & 3 \\ 3 & 3 & 1 \\ 3 & 2 & 1 \end{bmatrix}\). Find \(c_1\), \(c_2\), and \(c_3\) such that \(M^3 + c_1M^2 + c_2M + c_3I_3 = 0\), where \(I_3\) is the \(3\times 3\) identity matrix.