In many applied scenarios, it is not practical (or even, perhaps, possible) to find an exact solution to a problem. Sometimes we may be working with imperfect data. Other times, we may be dealing with a problem that is
overdetermined,
1 such as a system of linear equations with more equations than variables. (Quite often, both of these issues may be present.)
An overdetermined system is quite likely to be inconsistent. That is, our problem requires finding a solution to a system where no such solution exists! When no solution is possible, we can ask whether it is instead possible to find a best approximation.
What would a best approximation look like in this case? Let denote the column space of which we know is a subspace of (assuming is an matrix). The subspace is precisely the set of all vectors such that has a solution. Among these vectors, we would like to find the one that is closest to in the sense that is as small as possible.
But we know exactly what this vector should be: the orthogonal projection of onto
The presentation in this worksheet is based on the one given in the text by Nicholson (see Section 5.6). Further details may be found there, or, for the more statistically inclined, on
Wikipedia 2 .
Given an inconsistent system we have two problems to solve:
Find the vector where
Find a vector such that
The vector is then our approximate solution.
This can be done directly, of course:
Find a basis for
Use this orthogonal basis to compute
Solve the system
But there is another way to proceed: we know that so for any vector Therefore,
for any vector
Therefore, or
Solving this system, called the normal equations for will yield our approximate solution.
To begin, let’s compare the two methods discussed above for finding an approximate solution. Consider the system of equations where
1.
Confirm that the system has no solution.
2.
Find an orthogonal basis for the column space of
3.
Compute the projection of onto the column space of
4.
5.
Solve the normal equations
(3.5.1) for
Next, we want to consider a problem found in many introductory science labs: finding a line of best fit. The situation is as follows: in some experiment, data points have been found. We would like to find a function such that for each the value of is as close as possible to
Note that we have only two parameters available to tune: and We assume that some reasoning or experimental evidence has led us to conclude that a linear fit is reasonable. The challenge here is to make precise what we mean by “as close as possible”. We have differences (sometimes called residuals) that we want to make small, by adjusting and But making one of the smaller might make another one larger!
A measure of the overall error in the fit of the line is given by the sum of squares
and this is the quantity that we want to minimize. (Hence the name, “least squares”.)
Let and note that Set and Then
where (Note that we are using to denote a different sort of vector than on the previous page.)
We can safely assume that an exact solution is impossible, so we search for an approximate one, with as small as possible. (Note that the magnitude of satisfies ) But a solution that makes as small as possible is exactly the sort of approximate solution that we just learned to calculate! Solving the normal equations for we find that
6.
Find the equation of the best fit line for the following set of data points:
7.
Suppose we were instead trying to find the best quadratic fit for a dataset. What would our parameters be? What would the matrix look like? Illustrate with an example of your own.