In many applied scenarios, it is not practical (or even, perhaps, possible) to find an exact solution to a problem. Sometimes we may be working with imperfect data. Other times, we may be dealing with a problem that is overdetermined, such as a system of linear equations with more equations than variables. (Quite often, both of these issues may be present.)

An overdetermined system is quite likely to be inconsistent. That is, our problem requires finding a solution to a system where no such solution exists! When no solution is possible, we can ask whether it is instead possible to find a best approximation.
What would a best approximation look like in this case? Let $U = \operatorname{col}(A)$ denote the column space of $A$, which we know is a subspace of $\mathbb{R}^m$ (assuming $A$ is an $m\times n$ matrix). The subspace $U$ is precisely the set of all vectors $\mathbf{y}$ such that $A\mathbf{x} = \mathbf{y}$ has a solution. Among these vectors, we would like to find the one that is closest to $\mathbf{b}$, in the sense that $\|\mathbf{y} - \mathbf{b}\|$ is as small as possible.

But we know exactly what this vector $\mathbf{y}$ should be: the orthogonal projection of $\mathbf{b}$ onto $U$.
The presentation in this worksheet is based on the one given in the text by Nicholson (see Section 5.6). Further details may be found there, or, for the more statistically inclined, on Wikipedia.
Given an inconsistent system $A\mathbf{x} = \mathbf{b}$, we have two problems to solve:

1. Find the vector $\mathbf{y} = \operatorname{proj}_U(\mathbf{b})$, where $U = \operatorname{col}(A)$.
2. Find a vector $\mathbf{z}$ such that $A\mathbf{z} = \mathbf{y}$.

The vector $\mathbf{z}$ is then our approximate solution.
This can be done directly, of course:

1. Find an orthogonal basis $\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ for $\operatorname{col}(A)$.
2. Use this orthogonal basis to compute the projection $\mathbf{y} = \operatorname{proj}_U(\mathbf{b})$.
3. Solve the system $A\mathbf{z} = \mathbf{y}$.
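As a concrete illustration of these three steps, here is a minimal NumPy sketch. The matrix $A$ and vector $\mathbf{b}$ are made-up stand-ins for a small inconsistent system, not the ones from the exercise below.

```python
import numpy as np

# Made-up overdetermined system: 3 equations, 2 unknowns (illustration only).
A = np.array([[3.0, -1.0],
              [1.0,  2.0],
              [2.0,  1.0]])
b = np.array([2.0, 0.0, 1.0])

# Step 1: orthogonal basis {u1, u2} for col(A), via Gram-Schmidt on the columns.
u1 = A[:, 0]
u2 = A[:, 1] - (A[:, 1] @ u1) / (u1 @ u1) * u1

# Step 2: y = proj_U(b) = (b.u1 / u1.u1) u1 + (b.u2 / u2.u2) u2.
y = (b @ u1) / (u1 @ u1) * u1 + (b @ u2) / (u2 @ u2) * u2

# Step 3: solve A z = y. This system is consistent by construction, so
# two independent rows of A already determine z.
z = np.linalg.solve(A[:2], y[:2])
print("y =", y)
print("z =", z)
```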
But there is another way to proceed: we know that $\mathbf{b} - \mathbf{y} = \mathbf{b} - A\mathbf{z}$ is orthogonal to $\operatorname{col}(A)$, so for any vector $\mathbf{x}$, the vector $A\mathbf{x}$ lies in $\operatorname{col}(A)$ and therefore

$$(A\mathbf{x}) \cdot (\mathbf{b} - A\mathbf{z}) = \mathbf{x}^T A^T (\mathbf{b} - A\mathbf{z}) = 0.$$

Since this holds for every $\mathbf{x}$, it follows that $A^T(\mathbf{b} - A\mathbf{z}) = \mathbf{0}$; that is,

$$A^T A \mathbf{z} = A^T \mathbf{b}. \tag{3.5.1}$$

Solving this system, called the normal equations for $\mathbf{z}$, will yield our approximate solution.
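A sketch of this second route, using the same made-up system as above: forming and solving the normal equations requires no basis computation at all.

```python
import numpy as np

A = np.array([[3.0, -1.0],
              [1.0,  2.0],
              [2.0,  1.0]])
b = np.array([2.0, 0.0, 1.0])

# Form and solve the normal equations A^T A z = A^T b directly.
z = np.linalg.solve(A.T @ A, A.T @ b)
print("z =", z)

# Sanity check: the residual b - A z should be orthogonal to col(A),
# so A^T (b - A z) should be (numerically) zero.
print("A^T (b - A z) =", A.T @ (b - A @ z))
```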
To begin, let's compare the two methods discussed above for finding an approximate solution. Consider the system of equations $A\mathbf{x} = \mathbf{b}$, for the given matrix $A$ and vector $\mathbf{b}$.
1. Confirm that the system has no solution.
2. Find an orthogonal basis for the column space of $A$.
3. Compute the projection $\mathbf{y}$ of $\mathbf{b}$ onto the column space of $A$.
4. Solve the system $A\mathbf{z} = \mathbf{y}$.
5. Solve the normal equations (3.5.1) for $\mathbf{z}$.
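If you would like to check your hand computations, one possible cross-check (an addition here, not part of the worksheet itself) is NumPy's built-in least-squares solver, shown below with the same made-up stand-ins for $A$ and $\mathbf{b}$ as before.

```python
import numpy as np

A = np.array([[3.0, -1.0],
              [1.0,  2.0],
              [2.0,  1.0]])
b = np.array([2.0, 0.0, 1.0])

# The system is inconsistent exactly when appending b raises the rank.
print("inconsistent:",
      np.linalg.matrix_rank(np.column_stack([A, b])) > np.linalg.matrix_rank(A))

# np.linalg.lstsq minimizes ||b - A z||, so it should reproduce the
# solution of the normal equations from either method above.
z, *_ = np.linalg.lstsq(A, b, rcond=None)
print("z =", z)
print("y = A z =", A @ z)
```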
Next, we want to consider a problem found in many introductory science labs: finding a line of best fit. The situation is as follows: in some experiment, data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ have been found. We would like to find a function $f(x) = a + bx$ such that for each $i$, the value of $f(x_i)$ is as close as possible to $y_i$. Note that we have only two parameters available to tune: $a$ and $b$.
We assume that some reasoning or experimental evidence has led us to conclude that a linear fit is reasonable. The challenge here is to make precise what we mean by "as close as possible". We have $n$ differences (sometimes called residuals) $d_i = y_i - f(x_i)$ that we want to make small, by adjusting $a$ and $b$. But making one of the $d_i$ smaller might make another one larger!

A measure of the overall error in the fit of the line is given by the sum of squares

$$S = d_1^2 + d_2^2 + \cdots + d_n^2,$$

and this is the quantity that we want to minimize. (Hence the name, "least squares".)
Let $\mathbf{z} = \begin{bmatrix} a \\ b \end{bmatrix}$, and note that $f(x) = \begin{bmatrix} 1 & x \end{bmatrix} \mathbf{z}$. Set

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \quad\text{and}\quad \mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix}.$$

Then $\mathbf{d} = \mathbf{y} - A\mathbf{z}$, where

$$A = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}.$$

(Note that we are using $\mathbf{y}$ to denote a different sort of vector than on the previous page.)

We can safely assume that an exact solution $\mathbf{z}$ is impossible, so we search for an approximate one, with $\|\mathbf{d}\| = \|\mathbf{y} - A\mathbf{z}\|$ as small as possible. (Note that the magnitude of $\mathbf{d}$ satisfies $\|\mathbf{d}\|^2 = S$.) But a solution $\mathbf{z}$ that makes $\|\mathbf{y} - A\mathbf{z}\|$ as small as possible is exactly the sort of approximate solution that we just learned to calculate! Solving the normal equations $A^T A \mathbf{z} = A^T \mathbf{y}$ for $\mathbf{z}$, we find that

$$\mathbf{z} = (A^T A)^{-1} A^T \mathbf{y}.$$
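Here is a minimal sketch of this whole recipe in NumPy, with a small made-up dataset (any data points would do):

```python
import numpy as np

# Made-up data points (x_i, y_i) for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.8])

# Design matrix A: each row is [1, x_i], so A z = y would say f(x_i) = y_i.
A = np.column_stack([np.ones_like(x), x])

# z = (A^T A)^{-1} A^T y, computed by solving the normal equations.
a, b = np.linalg.solve(A.T @ A, A.T @ y)
print(f"best fit line: f(x) = {a:.3f} + {b:.3f} x")

# The residual vector d = y - A z; its squared norm is the minimized S.
d = y - A @ np.array([a, b])
print("S = ||d||^2 =", d @ d)
```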
6. Find the equation of the best fit line for the following set of data points:
7. Suppose we were instead trying to find the best quadratic fit for a dataset. What would our parameters be? What would the matrix $A$ look like? Illustrate with an example of your own.
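As a hint for the quadratic case (one possible setup, not the only way to answer): writing $f(x) = a + bx + cx^2$ as $\begin{bmatrix} 1 & x & x^2 \end{bmatrix} \mathbf{z}$ leads to a design matrix with rows $\begin{bmatrix} 1 & x_i & x_i^2 \end{bmatrix}$, and the normal equations work exactly as before. A sketch with made-up data:

```python
import numpy as np

# One possible setup for a quadratic fit f(x) = a + b x + c x^2.
# The data values below are made up for illustration.
x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([ 2.9, 1.1, 1.2, 3.1, 7.0])

# Each row of A is [1, x_i, x_i^2]; the parameter vector is z = [a, b, c].
A = np.column_stack([np.ones_like(x), x, x**2])
z = np.linalg.solve(A.T @ A, A.T @ y)
print("a, b, c =", z)
```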