Section 3.2 The Gram-Schmidt Procedure
Note that the projection $\operatorname{proj}_{\vec{v}}\vec{u} = \left(\dfrac{\vec{u}\cdot\vec{v}}{\|\vec{v}\|^2}\right)\vec{v}$ defined in (3.2.1) looks just like one of the terms in the Fourier Expansion Theorem.
The motivation for the projection is as follows: given vectors $\vec{u}$ and $\vec{v}$, we want to find vectors $\vec{p}$ and $\vec{w}$ with the following properties:
- The vector $\vec{p}$ is parallel to the vector $\vec{v}$.
- The vectors $\vec{p}$ and $\vec{w}$ add to $\vec{u}$.
- The vectors $\vec{w}$ and $\vec{v}$ are orthogonal.
Motivation for the construction comes from Physics, where one needs to be able to decompose a force vector into parts that are parallel and orthogonal to a given direction.
To derive the formula, we note that the vector $\vec{p}$ must be a scalar multiple of $\vec{v}$, since it is parallel to $\vec{v}$; that is, $\vec{p} = c\vec{v}$ for some scalar $c$. Next, since $\vec{u}$, $\vec{p}$, and $\vec{w}$ form a right triangle, we know that $\vec{w} = \vec{u} - \vec{p} = \vec{u} - c\vec{v}$ must be orthogonal to $\vec{v}$, so $\vec{w}\cdot\vec{v} = 0$. But $\vec{w}\cdot\vec{v} = (\vec{u} - c\vec{v})\cdot\vec{v} = \vec{u}\cdot\vec{v} - c(\vec{v}\cdot\vec{v})$. Setting this equal to zero and solving for $c$ gives $c = \dfrac{\vec{u}\cdot\vec{v}}{\|\vec{v}\|^2}$, which is the formula in (3.2.1).
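For a quick computational check of this decomposition, we can try it in SymPy. (The particular vectors u and v below are arbitrary choices for illustration; this is a sketch, not part of the construction itself.)
from sympy import Matrix
u = Matrix([3, 1, 2])    # an arbitrary vector to decompose
v = Matrix([1, -1, 1])   # the direction to project onto
c = u.dot(v) / v.dot(v)  # the scalar c from the derivation above
p = c * v                # the part of u parallel to v
w = u - p                # the part of u orthogonal to v
p + w == u, w.dot(v)     # expect (True, 0)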
Exercise 3.2.2.
An important part of the projection construction is that the vector $\vec{u} - \operatorname{proj}_{\vec{v}}\vec{u}$ is orthogonal to $\vec{v}$. Our next result is a generalization of this observation.
Theorem 3.2.3. Orthogonal Lemma.
Let $\{\vec{v}_1,\ldots,\vec{v}_m\}$ be an orthogonal set of nonzero vectors in $\mathbb{R}^n$, and let $\vec{x}\in\mathbb{R}^n$. Define
$$\vec{v}_{m+1} = \vec{x} - \frac{\vec{x}\cdot\vec{v}_1}{\|\vec{v}_1\|^2}\vec{v}_1 - \cdots - \frac{\vec{x}\cdot\vec{v}_m}{\|\vec{v}_m\|^2}\vec{v}_m.$$
Then:
- $\vec{v}_{m+1}\cdot\vec{v}_i = 0$ for each $i = 1,\ldots,m$.
- If $\vec{x}\notin\operatorname{span}\{\vec{v}_1,\ldots,\vec{v}_m\}$, then $\vec{v}_{m+1}\ne\vec{0}$, and $\{\vec{v}_1,\ldots,\vec{v}_m,\vec{v}_{m+1}\}$ is an orthogonal set.
Strategy.
For the first part, try calculating the dot product $\vec{v}_{m+1}\cdot\vec{v}_j$, using the definition of $\vec{v}_{m+1}$. Don’t forget that $\vec{v}_i\cdot\vec{v}_j = 0$ if $i\ne j$, since you are assuming you have an orthogonal set of vectors.
For the second part, what does the Fourier Expansion Theorem say?
Proof.
- For any $j = 1,\ldots,m$, we have
$$\vec{v}_{m+1}\cdot\vec{v}_j = \vec{x}\cdot\vec{v}_j - \sum_{i=1}^{m}\frac{\vec{x}\cdot\vec{v}_i}{\|\vec{v}_i\|^2}(\vec{v}_i\cdot\vec{v}_j) = \vec{x}\cdot\vec{v}_j - \frac{\vec{x}\cdot\vec{v}_j}{\|\vec{v}_j\|^2}(\vec{v}_j\cdot\vec{v}_j) = 0,$$
since $\vec{v}_i\cdot\vec{v}_j = 0$ for $i\ne j$.
- It follows from the Fourier Expansion Theorem that $\vec{v}_{m+1} = \vec{0}$ if and only if $\vec{x}\in\operatorname{span}\{\vec{v}_1,\ldots,\vec{v}_m\}$, and the fact that $\{\vec{v}_1,\ldots,\vec{v}_m,\vec{v}_{m+1}\}$ is an orthogonal set then follows from the first part.
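As a sanity check of the lemma, here is a small SymPy computation. (The orthogonal set and the vector x below are arbitrary choices made for illustration.)
from sympy import Matrix
# an orthogonal set in R^3 and an arbitrary vector x
v1 = Matrix([1, 1, 0])
v2 = Matrix([1, -1, 0])
x = Matrix([2, 3, 4])
# subtract the Fourier-expansion terms of x with respect to v1 and v2
v3 = x - (x.dot(v1)/v1.dot(v1))*v1 - (x.dot(v2)/v2.dot(v2))*v2
v3, v3.dot(v1), v3.dot(v2)   # v3 is nonzero and orthogonal to both v1 and v2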
It follows from the Orthogonal Lemma that for any subspace $W\subseteq\mathbb{R}^n$, any set of orthogonal vectors in $W$ can be extended to an orthogonal basis of $W$. Since any set containing a single nonzero vector is orthogonal, it follows that every subspace has an orthogonal basis. (If $W = \{\vec{0}\}$, we consider the empty basis to be orthogonal.)
The procedure for creating an orthogonal basis is clear. Start with a single nonzero vector $\vec{x}_1\in W$, which we’ll also call $\vec{v}_1$. If $W\ne\operatorname{span}\{\vec{v}_1\}$, choose a vector $\vec{x}_2\in W$ with $\vec{x}_2\notin\operatorname{span}\{\vec{v}_1\}$. The Orthogonal Lemma then provides us with a vector $\vec{v}_2$ such that $\{\vec{v}_1,\vec{v}_2\}$ is orthogonal. If $W=\operatorname{span}\{\vec{v}_1,\vec{v}_2\}$, we’re done. Otherwise, we repeat the process, choosing $\vec{x}_3\notin\operatorname{span}\{\vec{v}_1,\vec{v}_2\}$ and then using the Orthogonal Lemma to obtain $\vec{v}_3$, and so on, until an orthogonal basis is obtained.
With one minor modification, the above procedure provides us with a major result. Suppose $W$ is a subspace of $\mathbb{R}^n$, and start with any basis $\{\vec{x}_1,\ldots,\vec{x}_k\}$ of $W$. By choosing our $\vec{x}_i$ in the procedure above to be these basis vectors, we obtain the Gram-Schmidt algorithm for constructing an orthogonal basis.
Theorem 3.2.4. Gram-Schmidt Orthonormalization Algorithm.
Let $W$ be a subspace of $\mathbb{R}^n$, and let $\{\vec{x}_1,\ldots,\vec{x}_k\}$ be a basis of $W$. Define $\vec{v}_1 = \vec{x}_1$, and for $i = 2,\ldots,k$ let
$$\vec{v}_i = \vec{x}_i - \frac{\vec{x}_i\cdot\vec{v}_1}{\|\vec{v}_1\|^2}\vec{v}_1 - \cdots - \frac{\vec{x}_i\cdot\vec{v}_{i-1}}{\|\vec{v}_{i-1}\|^2}\vec{v}_{i-1}.$$
Then $\{\vec{v}_1,\ldots,\vec{v}_k\}$ is an orthogonal basis for $W$.
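To make the formulas above concrete, here is a minimal hand-rolled sketch of the algorithm in SymPy. (The function name gram_schmidt and the sample basis are our own illustrative choices; SymPy's built-in GramSchmidt function is used later in this section.)
from sympy import Matrix
def gram_schmidt(xs):
    # hand-rolled sketch of the algorithm in the theorem above
    vs = []
    for x in xs:
        v = x
        for u in vs:
            v = v - (x.dot(u)/u.dot(u))*u   # subtract the projection of x onto u
        vs.append(v)
    return vs
# try it on a basis of R^3 (vectors chosen for illustration)
V = gram_schmidt([Matrix([1, 1, 1]), Matrix([1, 0, 1]), Matrix([1, 1, 0])])
[V[i].dot(V[j]) for i in range(3) for j in range(i + 1, 3)]   # all dot products are 0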
Of course, once we’ve used Gram-Schmidt to find an orthogonal basis, we can normalize each vector to get an orthonormal basis. The Gram-Schmidt algorithm is ideal when we know how to find a basis for a subspace, but need that basis to be orthogonal. For example, suppose we want an orthonormal basis for the nullspace of the matrix
$$A = \begin{bmatrix} 2 & -1 & 3 & 0 & 5\\ 0 & 2 & -3 & 1 & 4\\ -4 & 2 & -6 & 0 & -10\\ 2 & 1 & 0 & 1 & 9\end{bmatrix}.$$
First, we find any basis for the nullspace.
from sympy import Matrix, init_printing
init_printing()
A = Matrix([[2,-1,3,0,5],
            [0,2,-3,1,4],
            [-4,2,-6,0,-10],
            [2,1,0,1,9]])
A.nullspace()
Let’s make that basis look a little nicer by using some scalar multiplication to clear fractions.
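One way to do that rescaling, reusing the matrix A defined above (a sketch; the name B_scaled is our own, and we simply multiply each vector by the least common multiple of the denominators of its entries):
from sympy import ilcm
# clear fractions: scale each nullspace vector by the lcm of its denominators
B_scaled = [v * ilcm(*[entry.q for entry in v]) for v in A.nullspace()]
B_scaled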
This is definitely not an orthogonal basis. So we take $\vec{v}_1 = \vec{x}_1$ and
$$\vec{v}_2 = \vec{x}_2 - \frac{\vec{x}_2\cdot\vec{v}_1}{\|\vec{v}_1\|^2}\vec{v}_1,$$
which equals something we probably don’t want to try to simplify. Finally, we find
$$\vec{v}_3 = \vec{x}_3 - \frac{\vec{x}_3\cdot\vec{v}_1}{\|\vec{v}_1\|^2}\vec{v}_1 - \frac{\vec{x}_3\cdot\vec{v}_2}{\|\vec{v}_2\|^2}\vec{v}_2.$$
And now we probably get about five minutes into the fractions and say something that shouldn’t appear in print. This sounds like a job for the computer.
from sympy import GramSchmidt
B = A.nullspace()
GramSchmidt(B)
What if we want our vectors normalized? It turns out the GramSchmidt function has an optional second argument, which can be True or False. The default is False, which means the output is not normalized. Setting it to True gives an orthonormal basis:
GramSchmidt(B, True)
OK, so that’s nice, and fairly intimidating looking. Did it work? We can specify the vectors in our list by giving their positions, which are 0, 1, and 2, respectively.
L=GramSchmidt(B)
L[0],L[1],L[2]
Let’s compute dot products:
L[0].dot(L[1]),L[1].dot(L[2]),L[0].dot(L[2])
Let’s also confirm that these are indeed in the nullspace.
A*L[0],A*L[1],A*L[2]
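We could also confirm that the output of GramSchmidt(B, True) really is orthonormal by checking lengths; a quick sketch (the name N is ours):
N = GramSchmidt(B, True)
[v.norm() for v in N]   # each of these norms should work out to exactly 1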
Boom. Let’s try another example. This time we’ll keep the vectors a little smaller in case you want to try it by hand.
Example 3.2.5.
Confirm that the set $\{(1,-2,1), (3,0,-2), (-1,1,2)\}$ is a basis for $\mathbb{R}^3$, and use the Gram-Schmidt Orthonormalization Algorithm to find an orthonormal basis.
Solution.
First, note that we can actually jump right into the Gram-Schmidt procedure. If the set is not a basis, then it won’t be independent, and when we attempt to construct the third vector in our orthonormal basis, its projection onto the subspace spanned by the first two will be the same as the original vector, and we’ll get zero when we subtract the two.
We let $\vec{v}_1 = (1,-2,1)$ and set $\vec{x}_2 = (3,0,-2)$. Then we have
$$\vec{v}_2 = \vec{x}_2 - \frac{\vec{x}_2\cdot\vec{v}_1}{\|\vec{v}_1\|^2}\vec{v}_1 = (3,0,-2) - \tfrac{1}{6}(1,-2,1) = \left(\tfrac{17}{6}, \tfrac{1}{3}, -\tfrac{13}{6}\right).$$
Next, we compute
$$\vec{v}_3 = \vec{x}_3 - \frac{\vec{x}_3\cdot\vec{v}_1}{\|\vec{v}_1\|^2}\vec{v}_1 - \frac{\vec{x}_3\cdot\vec{v}_2}{\|\vec{v}_2\|^2}\vec{v}_2 = (-1,1,2) + \tfrac{1}{6}(1,-2,1) + \tfrac{41}{77}\left(\tfrac{17}{6}, \tfrac{1}{3}, -\tfrac{13}{6}\right) = \left(\tfrac{52}{77}, \tfrac{65}{77}, \tfrac{78}{77}\right).$$
We got it done! But doing this sort of thing by hand makes it possible that we made a calculation error somewhere. To check our work, we can turn to the computer.
from sympy import Matrix, init_printing, GramSchmidt
init_printing()
L=(Matrix([1,-2,1]),Matrix([3,0,-2]),Matrix([-1,1,2]))
GramSchmidt(L)
Success! Full disclosure: there was indeed a mistake in the manual computation. Whether it was a typo or a miscalculation, one entry of $\vec{v}_2$ was originally written down incorrectly. This led, as you might expect, to some very wrong answers for $\vec{v}_3$.
Exercises