Section 4.3 Quadratic forms

If you’ve done a couple of calculus courses, you’ve probably encountered conic sections, like the ellipse

\frac{x^{2}}{a^{2}} + \frac{y^{2}}{b^{2}} = 1

or the parabola

\frac{y}{b} = \frac{x^{2}}{a^{2}} .

You might also recall that your instructor was careful to avoid conic sections with equations including “cross-terms” like

x y .

The reason for this is that sketching a conic section like

x^{2} + 4 x y + y^{2} = 1

requires the techniques of the previous section.

A basic fact about orthogonal matrices is that they preserve length. Indeed, for any vector

x

in

R^{n}

and any orthogonal matrix

P,

‖ P x ‖^{2} = (P x) \cdot (P x) = (P x)^{T} (P x) = (x^{T} P^{T}) (P x) = x^{T} x = ‖ x ‖^{2},

since

P^{T} P = I_{n} .

Note also that since

P^{T} P = I_{n}

and

det P^{T} = det P,

we have

det (P)^{2} = det (P^{T} P) = det (I_{n}) = 1,

so

det (P) = \pm 1 .

If

det P = 1,

we have what is called a special orthogonal matrix. In

R^{2}

and

R^{3},

multiplication by a special orthogonal matrix is simply a rotation. (If

det P = - 1,

there is also a reflection.)
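
The two facts above are easy to check symbolically. The following is a minimal sketch, assuming SymPy is available; the rotation by π/4 and the reflection across the first coordinate axis are illustrative choices, not matrices singled out by the text.

```python
from sympy import Matrix, eye, simplify, sqrt, symbols

# Rotation by pi/4 (an illustrative special orthogonal matrix).
P = Matrix([[1, -1], [1, 1]]) / sqrt(2)

# Orthogonality: P^T P should be the identity matrix.
is_orthogonal = (P.T * P).applyfunc(simplify) == eye(2)

# The determinant of a rotation is +1.
det_P = simplify(P.det())

# Length preservation for a generic vector (a, b):
# ||P x||^2 - ||x||^2 should simplify to zero.
a, b = symbols("a b", real=True)
x = Matrix([a, b])
length_difference = simplify((P * x).dot(P * x) - x.dot(x))

# A reflection (illustrative) is orthogonal with determinant -1.
F = Matrix([[1, 0], [0, -1]])
det_F = F.det()
```

Because the check uses symbolic entries a and b, it confirms length preservation for every vector in the plane, not just for a sample.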

We mentioned in the previous section that the Real Spectral Theorem is also referred to as the principal axes theorem. The name comes from the fact that one way to interpret the orthogonal diagonalization of a symmetric matrix is that we are rotating our coordinate system. The original coordinate axes are rotated to new coordinate axes, with respect to which the matrix

A

is diagonal. This will become more clear once we apply these ideas to the problem of conic sections mentioned above. First, a definition.

Definition 4.3.1.

A quadratic form on variables

x_{1}, x_{2}, \dots, x_{n}

is any expression of the form

q (x_{1}, \dots, x_{n}) = \sum_{i \leq j} a_{i j} x_{i} x_{j} .

For example,

q_{1} (x, y) = 4 x^{2} - 4 x y + 4 y^{2}

and

q_{2} (x, y, z) = 9 x^{2} + 4 y^{2} - 4 x y - 2 x z + z^{2}

are quadratic forms. Note that each term in a quadratic form is of degree two. We omit linear terms, since these can be absorbed by completing the square. The important observation is that every quadratic form can be associated to a symmetric matrix. The diagonal entries are the coefficients

a_{i i}

appearing in Definition 4.3.1, while the off-diagonal entries are half the corresponding coefficients

a_{i j} .

For example the two quadratic forms given above have the following associated matrices:

A_{1} = [\begin{matrix} 4 & - 2 \\ - 2 & 4 \end{matrix}] and A_{2} = [\begin{matrix} 9 & - 2 & - 1 \\ - 2 & 4 & 0 \\ - 1 & 0 & 1 \end{matrix}] .

The reason for this is that we can then write

q_{1} (x, y) = [\begin{matrix} x & y \end{matrix}] [\begin{matrix} 4 & - 2 \\ - 2 & 4 \end{matrix}] [\begin{matrix} x \\ y \end{matrix}]

and

q_{2} (x, y, z) = [\begin{matrix} x & y & z \end{matrix}] [\begin{matrix} 9 & - 2 & - 1 \\ - 2 & 4 & 0 \\ - 1 & 0 & 1 \end{matrix}] [\begin{matrix} x \\ y \\ z \end{matrix}] .
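
The rule for building the matrix (diagonal entries are the coefficients of the squares, off-diagonal entries are half the cross-term coefficients) can be sketched in SymPy as follows. The helper name `form_matrix` is hypothetical, introduced here only for illustration; it assumes its input really is a quadratic form in the listed variables.

```python
from sympy import Matrix, Poly, expand, symbols, zeros

def form_matrix(q, variables):
    """Symmetric matrix of the quadratic form q (hypothetical helper)."""
    p = Poly(q, *variables)
    n = len(variables)
    A = zeros(n, n)
    for i in range(n):
        for j in range(i, n):
            # Coefficient a_ij of the monomial x_i * x_j in q.
            c = p.coeff_monomial(variables[i] * variables[j])
            if i == j:
                A[i, i] = c                # diagonal: the coefficient itself
            else:
                A[i, j] = A[j, i] = c / 2  # off-diagonal: half the coefficient
    return A

x, y = symbols("x y")
q1 = 4*x**2 - 4*x*y + 4*y**2
A1 = form_matrix(q1, (x, y))

# Multiplying out v^T A1 v should recover the original form.
v = Matrix([x, y])
recovered = expand((v.T * A1 * v)[0, 0])
```

Splitting each cross-term coefficient evenly between the two symmetric positions is what makes the matrix symmetric; any other split would give the same form but lose orthogonal diagonalizability.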

Of course, the reason for wanting to associate a symmetric matrix to a quadratic form is that it can be orthogonally diagonalized. Consider the matrix

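A_{1} .

```python
from sympy import Matrix, init_printing, factor

init_printing()  # pretty-print output in a notebook or Sage cell

# Symmetric matrix associated with q_1(x, y) = 4x^2 - 4xy + 4y^2
A1 = Matrix(2, 2, [4, -2, -2, 4])

# Characteristic polynomial, factored so the eigenvalues can be read off
p = A1.charpoly().as_expr()
factor(p)
```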

We find distinct eigenvalues

λ_{1} = 2

and

λ_{2} = 6 .

Since

A

is symmetric, we know the corresponding eigenvectors will be orthogonal.

```python
# Eigenvalues, multiplicities, and eigenvectors of A1 (defined in the previous cell)
A1.eigenvects()
```

The resulting orthogonal matrix is

P = \frac{1}{\sqrt{2}} [\begin{matrix} 1 & - 1 \\ 1 & 1 \end{matrix}],

and we find

P^{T} A P = [\begin{matrix} 2 & 0 \\ 0 & 6 \end{matrix}], or A = P D P^{T},

where

D = [\begin{matrix} 2 & 0 \\ 0 & 6 \end{matrix}] .

If we define new variables

y_{1}, y_{2}

by

[\begin{matrix} y_{1} \\ y_{2} \end{matrix}] = P^{T} [\begin{matrix} x_{1} \\ x_{2} \end{matrix}],

then we find that

\begin{aligned} [\begin{matrix} x_{1} & x_{2} \end{matrix}] A [\begin{matrix} x_{1} \\ x_{2} \end{matrix}] & = ([\begin{matrix} x_{1} & x_{2} \end{matrix}] P) D (P^{T} [\begin{matrix} x_{1} \\ x_{2} \end{matrix}]) \\ & = [\begin{matrix} y_{1} & y_{2} \end{matrix}] [\begin{matrix} 2 & 0 \\ 0 & 6 \end{matrix}] [\begin{matrix} y_{1} \\ y_{2} \end{matrix}] \\ & = 2 y_{1}^{2} + 6 y_{2}^{2} . \end{aligned}

Note that there is no longer any cross term.
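
The computation above can be confirmed symbolically. This is a minimal sketch, assuming SymPy; it uses the fact that P is orthogonal, so the change of variables y = P^T x is equivalent to x = P y.

```python
from sympy import Matrix, diag, expand, simplify, sqrt, symbols

A = Matrix([[4, -2], [-2, 4]])
P = Matrix([[1, -1], [1, 1]]) / sqrt(2)

# Orthogonal diagonalization: P^T A P should equal diag(2, 6).
D = (P.T * A * P).applyfunc(simplify)

# Substitute x = P y into the form x^T A x.
y1, y2 = symbols("y1 y2")
yvec = Matrix([y1, y2])
xvec = P * yvec
q_in_y = simplify(expand((xvec.T * A * xvec)[0, 0]))

# Difference from the expected diagonal form; should be zero.
cross_term_check = simplify(q_in_y - (2*y1**2 + 6*y2**2))
```

The coefficient of y1*y2 in `q_in_y` is zero, which is exactly the statement that the cross term has been eliminated.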

Now, suppose we want to graph the conic

4 x_{1}^{2} - 4 x_{1} x_{2} + 4 x_{2}^{2} = 12 .

By changing to the variables

y_{1}, y_{2}

this becomes

2 y_{1}^{2} + 6 y_{2}^{2} = 12,

or

\frac{y_{1}^{2}}{6} + \frac{y_{2}^{2}}{2} = 1 .

This is the standard form of an ellipse, but in terms of new variables. How do we graph it? Returning to the definition of our new variables, we find

y_{1} = \frac{1}{\sqrt{2}} (x_{1} + x_{2})

and

y_{2} = \frac{1}{\sqrt{2}} (- x_{1} + x_{2}) .

The

y_{1}

axis should be the line

y_{2} = 0,

or

x_{1} = x_{2} .

(Note that this line points in the direction of the eigenvector

[\begin{matrix} 1 \\ 1 \end{matrix}] .)

The

y_{2}

axis should be the line

y_{1} = 0,

or

x_{1} = - x_{2},

which is in the direction of the eigenvector

[\begin{matrix} - 1 \\ 1 \end{matrix}] .

This lets us see that our new coordinate axes are simply a rotation (by

π / 4

) of the old coordinate axes, and our conic section is, accordingly, an ellipse that has been rotated by the same angle.
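
One way to produce points for a sketch of the rotated ellipse is to parametrize it in the new variables and map back with x = P y. The following is a rough illustration, assuming SymPy; the parameter name t is our own choice.

```python
from sympy import Matrix, atan2, cos, expand, pi, simplify, sin, sqrt, symbols

P = Matrix([[1, -1], [1, 1]]) / sqrt(2)

# Standard parametrization of y1^2/6 + y2^2/2 = 1 in the new coordinates.
t = symbols("t", real=True)
y = Matrix([sqrt(6) * cos(t), sqrt(2) * sin(t)])

# Map back to the original coordinates: x = P y.
x = P * y
x1, x2 = x[0], x[1]

# Every such point should satisfy 4 x1^2 - 4 x1 x2 + 4 x2^2 = 12.
residual = simplify(expand(4*x1**2 - 4*x1*x2 + 4*x2**2 - 12))

# Angle between the old x1-axis and the new y1-axis (first column of P).
angle = atan2(P[1, 0], P[0, 0])
is_quarter_pi = (angle == pi / 4)
```

Evaluating `x1` and `x2` at a range of values of t gives points that can be passed to any plotting tool.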

Remark 4.3.2.

One reason to study quadratic forms is the classification of critical points in calculus. You may recall (if you took Calculus 1) that for a twice-differentiable function

f (x),

if

f^{'} (c) = 0

and

f^{″} (c) > 0

at some number

c,

then

f

has a local minimum at

c .

Similarly, if

f^{'} (c) = 0

and

f^{″} (c) < 0,

then

f

has a local maximum at

c .

For functions of two or more variables, determining whether a critical point is a maximum or minimum (or something else) is more complicated. Or rather, it is more complicated for those unfamiliar with linear algebra! The second-order partial derivatives of our function can be arranged into a matrix called the Hessian matrix. For example, a function

f (x, y)

of two variables has first-order partial derivatives

f_{x} (x, y)

and

f_{y} (x, y)

with respect to

x

and

y,

respectively, and second-order partial derivatives

f_{x x} (x, y)

(twice with respect to

x

),

f_{x y} (x, y)

(first

x,

then

y

),

f_{y x} (x, y)

(first

y,

then

x

), and

f_{y y} (x, y)

(twice with respect to

y

).

The Hessian matrix at a point

(a, b)

is

H_{f} (a, b) = [\begin{matrix} f_{x x} (a, b) & f_{x y} (a, b) \\ f_{y x} (a, b) & f_{y y} (a, b) \end{matrix}] .

As long as the second-order partial derivatives are continuous at

(a, b),

it is guaranteed that the Hessian matrix is symmetric! That means that there is a corresponding quadratic form, and when the first-order derivatives

f_{x} (a, b)

and

f_{y} (a, b)

are both zero (a critical point), it turns out that this quadratic form provides the best quadratic approximation to

f (x, y)

near the point

(a, b) .

This is true for three or more variables as well.

The eigenvalues of this matrix then give us some information about the behaviour of our function near the critical point. If all eigenvalues are positive at a point, we say that the corresponding quadratic form is positive-definite, and the function

f

has a local minimum at that point. If all eigenvalues are negative at a point, we say that the corresponding quadratic form is negative-definite, and the function

f

has a local maximum at that point. If all eigenvalues are nonzero at a point, with some positive and some negative, we say that

f

has a saddle point. The corresponding quadratic form is called indefinite, and this term applies even if some eigenvalues are zero.

If a quadratic form corresponds to a symmetric matrix whose eigenvalues are all greater than or equal to zero, we say that the quadratic form is positive-semidefinite. Similarly, a negative-semidefinite quadratic form corresponds to a symmetric matrix whose eigenvalues are all less than or equal to zero.
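
The classification described in this remark can be sketched with SymPy's `hessian` function. The helper `classify` and the three sample functions below are hypothetical, chosen only because each has a critical point at the origin; the sketch also assumes the eigenvalues are explicit numbers whose signs SymPy can decide.

```python
from sympy import hessian, symbols

def classify(H):
    """Classify a symmetric matrix by its eigenvalue signs (hypothetical helper)."""
    eigs = list(H.eigenvals().keys())
    if all(e > 0 for e in eigs):
        return "positive-definite"
    if all(e < 0 for e in eigs):
        return "negative-definite"
    if any(e > 0 for e in eigs) and any(e < 0 for e in eigs):
        return "indefinite"
    # Only zero and one sign remain. The zero matrix is both positive-
    # and negative-semidefinite; this sketch reports it as positive.
    if all(e >= 0 for e in eigs):
        return "positive-semidefinite"
    return "negative-semidefinite"

x, y = symbols("x y")
origin = {x: 0, y: 0}

# Sample functions (illustrative), each with a critical point at (0, 0).
f = x**2 + x*y + y**2   # Hessian eigenvalues 1 and 3
g = x**2 - y**2         # Hessian eigenvalues 2 and -2
h = -x**2 - y**2        # Hessian eigenvalue -2 (repeated)

kind_f = classify(hessian(f, [x, y]).subs(origin))
kind_g = classify(hessian(g, [x, y]).subs(origin))
kind_h = classify(hessian(h, [x, y]).subs(origin))
```

Here f has a local minimum at the origin, g has a saddle point, and h has a local maximum. In the semidefinite cases the eigenvalues alone do not settle the behaviour of the function, so further analysis would be needed.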
