Packet 12

Matrices III: Inner Products

In this packet we study some interrelated concepts: orthogonal matrices, symmetric matrices, inner products, quadratic forms, and positive definite matrices.

Orthogonal matrices

A square matrix $Q$ is called orthogonal when it satisfies $Q^TQ = QQ^T = I$.

‘Orthogonal’ = ‘orthonormal’ for matrices

The term orthonormal is also used for the same class of matrices. (Warning! Not so for vectors! Orthogonal vectors are merely perpendicular, while orthonormal vectors are also unit vectors.)

Now consider the meaning of the formula $Q^TQ = I$. Remember the rule for matrix products (each entry is a row of the first factor dotted with a column of the second):

$$(MN)_{ij} = (\text{row } i \text{ of } M) \cdot (\text{column } j \text{ of } N).$$

What is $Q^T\mathbf{v}$? Each row of this vector is the dot product of the corresponding row of $Q^T$ with the vector $\mathbf{v}$. In turn, each row of $Q^T$ comes from a column of $Q$. Combining all these claims with the product rule, column $j$ of $Q^TQ$ has rows equal to the dot products of the various columns of $Q$ with the specific column $\mathbf{q}_j$ of $Q$. Specifically, $(Q^TQ)_{ij} = \mathbf{q}_i \cdot \mathbf{q}_j$. In conclusion, $Q^TQ = I$ means that

$$\mathbf{q}_i \cdot \mathbf{q}_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$

It follows that $Q^TQ = I$ is another way of saying that the columns $\mathbf{q}_1, \dots, \mathbf{q}_n$ of $Q$ are mutually orthonormal. So, if you put an orthonormal basis into the column vectors of a matrix, the resulting matrix is orthogonal. Conversely, if a matrix is orthogonal, then its columns form an orthonormal basis. (They are orthonormal by the definition of orthogonal, and they are a basis because they are perpendicular and thus independent.)

An important fact about orthogonal matrices is that they correspond geometrically to rotations and reflections (and combinations of these actions called ‘rotoreflections’).

Orthogonal matrices preserve the dot product, preserving lengths and angles

The action of an orthogonal matrix preserves the dot product:

$$(Q\mathbf{v}) \cdot (Q\mathbf{w}) = \mathbf{v} \cdot \mathbf{w}.$$

Why

To see this, recall (1) our interpretation of vectors as $n \times 1$ matrices (columns), and (2) the dot product with $\mathbf{v}$ as the action of the transposed row vector $\mathbf{v}^T$ acting by matrix multiplication on the left, so e.g. $\mathbf{v} \cdot \mathbf{w} = \mathbf{v}^T\mathbf{w}$. Using this, we derive:

$$(Q\mathbf{v}) \cdot (Q\mathbf{w}) = (Q\mathbf{v})^T(Q\mathbf{w}) = \mathbf{v}^TQ^TQ\mathbf{w} = \mathbf{v}^TI\mathbf{w} = \mathbf{v}^T\mathbf{w} = \mathbf{v} \cdot \mathbf{w}.$$
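Here is a minimal numerical sanity check of these two facts, using a 2D rotation matrix as the orthogonal matrix (the angle and vectors are just illustrative choices, not from the text):

```python
import numpy as np

# A 2D rotation by angle theta is an orthogonal matrix.
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Check Q^T Q = I (the columns are orthonormal).
print(np.allclose(Q.T @ Q, np.eye(2)))        # True

# Check that Q preserves the dot product: (Qv).(Qw) = v.w
v = np.array([1.0, 2.0])
w = np.array([-3.0, 0.5])
print(np.isclose((Q @ v) @ (Q @ w), v @ w))   # True
```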

Symmetric matrices

A symmetric matrix is a matrix that satisfies $A^T = A$. In terms of coefficients, this means it satisfies $a_{ij} = a_{ji}$. A symmetric matrix is symmetric about reflection in a mirror that runs down the main diagonal:

$$\begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix}.$$

An antisymmetric matrix (also called skew-symmetric) satisfies $A^T = -A$, in other words $a_{ij} = -a_{ji}$.

Symmetric matrices do arise naturally in many applications. This can happen for a variety of reasons. One example is PCA (principal component analysis): here the matrix in question arises as a “covariance matrix,” and it is symmetric because covariance is symmetric: $\operatorname{cov}(X, Y) = \operatorname{cov}(Y, X)$. Antisymmetric matrices are less common.

Symmetric matrices have some very nice properties:

Spectral theorem

Let $A$ be any $n \times n$ symmetric matrix, so $A^T = A$. Then:

  • (a) $A$ has the maximum number $n$ of eigenvalues, counted with multiplicity (some might be repeated, but they still exist).
  • (b) All eigenvalues of $A$ are real numbers.
  • (c) Eigenvectors with distinct eigenvalues are perpendicular to each other.
  • (d) There is an orthonormal basis $\mathbf{q}_1, \dots, \mathbf{q}_n$ such that $A$ becomes diagonal in this basis: in other words $Q^TAQ = D$ is diagonal, or equivalently $A = QDQ^T$. Notice that $Q = [\mathbf{q}_1 \ \cdots \ \mathbf{q}_n]$ is an orthogonal matrix, and the $\mathbf{q}_i$ are eigenvectors of $A$. (A quick numerical check of these facts is sketched just after this list.)
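As a sanity check of parts (a), (b), and (d), here is a sketch in NumPy; the matrix is an arbitrary symmetric example (not one from the text), and np.linalg.eigh is the routine designed for symmetric matrices:

```python
import numpy as np

# An arbitrary symmetric matrix (example only).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh returns real eigenvalues and orthonormal eigenvector columns for symmetric input.
eigenvalues, Q = np.linalg.eigh(A)

print(eigenvalues)                                     # n real eigenvalues
print(np.allclose(Q.T @ Q, np.eye(3)))                 # columns are orthonormal
print(np.allclose(Q @ np.diag(eigenvalues) @ Q.T, A))  # A = Q D Q^T
```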

Proofs of most of these facts are just a little beyond the scope of a first applied course. Part (c) is not hard though:

Orthogonal eigenvectors of a symmetric matrix

Suppose $A^T = A$, $A\mathbf{v} = \lambda\mathbf{v}$, and $A\mathbf{w} = \mu\mathbf{w}$ with $\lambda \neq \mu$. Assume, as we do for eigenvectors, that $\mathbf{v} \neq \mathbf{0}$ and $\mathbf{w} \neq \mathbf{0}$. Then consider the sequence:

$$\lambda(\mathbf{v} \cdot \mathbf{w}) = (A\mathbf{v})^T\mathbf{w} = \mathbf{v}^TA^T\mathbf{w} = \mathbf{v}^TA\mathbf{w} = \mathbf{v} \cdot (A\mathbf{w}) = \mu(\mathbf{v} \cdot \mathbf{w}).$$

Therefore, we find that $(\lambda - \mu)(\mathbf{v} \cdot \mathbf{w}) = 0$. This can only happen if $\lambda = \mu$, which we have denied by hypothesis, or else if $\mathbf{v} \cdot \mathbf{w} = 0$, which must therefore be true.

The equation $A = QDQ^T$ is equivalent to a very nice formula for $A$ called the spectral decomposition formula:

$$A = \lambda_1\mathbf{q}_1\mathbf{q}_1^T + \lambda_2\mathbf{q}_2\mathbf{q}_2^T + \cdots + \lambda_n\mathbf{q}_n\mathbf{q}_n^T.$$

In this formula, the $\mathbf{q}_i$ are the orthonormal basis given by the spectral theorem (remember they are also eigenvectors), and the $\lambda_i$ are the corresponding eigenvalues.

SVD prelude

The spectral decomposition is an instance of the singular value decomposition (SVD) which works for general matrices. Stay tuned for more in the next Packet!

A great thing about this formula is that each $\mathbf{q}_i\mathbf{q}_i^T$ is actually a projection matrix onto the line spanned by $\mathbf{q}_i$:

$$(\mathbf{q}_i\mathbf{q}_i^T)\,\mathbf{x} = \operatorname{proj}_{\mathbf{q}_i}(\mathbf{x}).$$

Why? Of course $(\mathbf{q}_i\mathbf{q}_i^T)\,\mathbf{x} = \mathbf{q}_i(\mathbf{q}_i^T\mathbf{x}) = \mathbf{q}_i(\mathbf{q}_i \cdot \mathbf{x})$. Since the $\mathbf{q}_i$ are unit vectors, we have $\operatorname{proj}_{\mathbf{q}_i}(\mathbf{x}) = (\mathbf{q}_i \cdot \mathbf{x})\,\mathbf{q}_i$. This is the same thing, just written with the scalar factor (the result of the dot product) in a different order.

Explaining the spectral decomposition formula

The spectral decomposition formula itself comes from writing out the matrices in $A = QDQ^T$:

$$A = \begin{bmatrix} \mathbf{q}_1 & \cdots & \mathbf{q}_n \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} \begin{bmatrix} \mathbf{q}_1^T \\ \vdots \\ \mathbf{q}_n^T \end{bmatrix} = \lambda_1\mathbf{q}_1\mathbf{q}_1^T + \cdots + \lambda_n\mathbf{q}_n\mathbf{q}_n^T.$$
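The formula is easy to verify numerically. Here is a sketch in NumPy (the matrix is again just an arbitrary symmetric example):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
eigenvalues, Q = np.linalg.eigh(A)

# Rebuild A as a weighted sum of rank-one projection matrices q_i q_i^T.
A_rebuilt = sum(lam * np.outer(q, q) for lam, q in zip(eigenvalues, Q.T))
print(np.allclose(A_rebuilt, A))  # True

# Each outer product q q^T projects onto the line spanned by the unit vector q.
q = Q[:, 0]
x = np.array([1.0, -2.0, 0.5])
print(np.allclose(np.outer(q, q) @ x, (q @ x) * q))  # True
```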

Quadratic forms and symmetric matrices

A quadratic form is a function from vectors to scalars that can be written as a polynomial with all terms having degree 2. For example, $x^2 + 4xy - 3y^2$ is a quadratic form in the variables $x$ and $y$.

In short, a quadratic form is just a polynomial in several variables that is homogeneous of degree 2. (‘Homogeneous’ just means that all terms have the same degree.) The difference between quadratic forms and quadratic polynomials is that the latter could also have terms of degree zero and degree one.

Applications of quadratic forms

The simplest possible quadratic form is the sum of squares: $q(\mathbf{x}) = x_1^2 + x_2^2 + \cdots + x_n^2$. Notice that this form gives the square distance from $\mathbf{x}$ to $\mathbf{0}$. One strong application of linear algebra is the ability to extremize quadratic forms. For example, in Linear Least Squares, we minimize the distance from a point to a certain span. We will shortly see how to represent this distance as a quadratic form. In Weighted LLS, we minimize the distance with unequal weights applied to each term, as in $w_1x_1^2 + w_2x_2^2 + \cdots + w_nx_n^2$. This kind of weighting is useful when some data is more reliable than other data, so it should be given more weight.

Question 12-01

[Removed!]

Key fact:

Matrix of a quadratic form

Every quadratic form can be represented uniquely by a symmetric matrix, meaning there is some symmetric $A$ such that $q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$.

So there is a perfect correspondence between quadratic forms and symmetric matrices.
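As a quick illustration in code (using a hypothetical form, not the packet's example), note that the coefficient of each cross term is split in half between the two matching off-diagonal entries of $A$:

```python
import numpy as np

# Hypothetical example: q(x, y) = 2x^2 + 6xy + 5y^2.
# The cross term 6xy is split in half between the two off-diagonal entries.
A = np.array([[2.0, 3.0],
              [3.0, 5.0]])

def q(v):
    # Evaluate the quadratic form x^T A x.
    return v @ A @ v

v = np.array([1.0, -2.0])
x, y = v
print(q(v), 2*x**2 + 6*x*y + 5*y**2)  # both give the same value
```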

Example

Quadratic form

Question 12-02

Matrix of a quadratic form

What is the matrix $A$ such that $\mathbf{x}^TA\mathbf{x}$ equals the given quadratic form?

A quadratic form can always be converted into a new quadratic form that has no cross terms by a suitable linear change of variables. If the original variables are the components of $\mathbf{x}$, then a linear change of variables would be a new vector of variables $\mathbf{y}$ defined by $\mathbf{x} = P\mathbf{y}$ for some change of basis transfer matrix $P$.

Principal axes theorem: eliminating cross terms

Given a quadratic form $q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$ defined by the symmetric matrix $A$, suppose $A = QDQ^T$ as in the spectral theorem. Set $\mathbf{y} = Q^T\mathbf{x}$, so also $\mathbf{x} = Q\mathbf{y}$. Then $q$ has no cross terms in the $\mathbf{y}$ variables.

Derivation

First plug in the new variables and manipulate:

$$q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} = (Q\mathbf{y})^TA(Q\mathbf{y}) = \mathbf{y}^TQ^TAQ\mathbf{y} = \mathbf{y}^TD\mathbf{y}.$$

This last expression has no cross terms:

$$\mathbf{y}^TD\mathbf{y} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2.$$

Example

Eliminating cross terms

Problem: Eliminate the cross terms in the quadratic form .

Solution: First compute the matrix $A$ of $q$. By reading off the coefficients (dividing the cross term in half), we see

Now we must write for an orthogonal matrix. We find the eigenvalues and then seek eigenvectors that are orthonormal. Observe that , so the eigenvalues are .

Then we have:

so gives the relation satisfied by the rows of an eigenvector. Choose and then normalize this to the unit vector .

Then we also have:

so and we choose as an eigenvector, and normalize to the unit vector . Notice that .

Now we set and and write our expression for :

In terms of the variables, we have . The formulas relating the variables to each other are given by interpreting as the change of variable transfer matrix:
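The whole cross-term elimination recipe can also be checked numerically in a few lines. This is a sketch on a hypothetical quadratic form (the specific numbers from the worked example above are not reproduced here):

```python
import numpy as np

# Hypothetical quadratic form: q(x, y) = 2x^2 + 6xy + 5y^2, with matrix A below.
A = np.array([[2.0, 3.0],
              [3.0, 5.0]])

# Diagonalize: A = Q D Q^T with orthonormal eigenvector columns in Q.
eigenvalues, Q = np.linalg.eigh(A)

# In the new variables y = Q^T x, the form becomes
# lambda_1 * y1^2 + lambda_2 * y2^2, with no cross term.
x = np.array([1.0, -1.0])
y = Q.T @ x
print(x @ A @ x)                   # value of q in the original variables
print(np.sum(eigenvalues * y**2))  # same value, computed without cross terms
```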

Exercise 12-01

Eliminating cross terms.

Repeat the example to eliminate cross terms for the quadratic form .

Constrained optimization, min-max principle

One simple yet important application of elimination of cross terms is constrained optimization. The goal of constrained optimization is to find the extreme values of a quadratic form $q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$ where $\mathbf{x}$ is constrained to be a unit vector, so $\|\mathbf{x}\| = 1$. Geometrically, this means: find the direction of fastest increase or direction of fastest decrease of $q$, and find the rate of increase / decrease in those directions.

If we have written $q$ so as to eliminate cross terms:

$$q = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2,$$

then the max / min of $q$ occurs for $\mathbf{y} = \mathbf{e}_i$, where $\lambda_i$ is the max / min coefficient.

If $q$ is not written without cross terms, then we can first eliminate cross terms completely, and find the max / min among the set of coefficients $\lambda_i$. This will give the max / min value, and the input where this occurs in terms of the $\mathbf{y}$ variables is at $\mathbf{y} = \mathbf{e}_i$. The change of variables must be made in reverse to find the corresponding input values $\mathbf{x}$ where the max / min occurs. If the eigenvector $\mathbf{q}_i$ corresponding to $\lambda_i$ has already been found, its components are by definition the components of the $\mathbf{x}$ that we seek.
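A sketch of this recipe in NumPy, on a hypothetical symmetric matrix: the smallest and largest eigenvalues give the min and max of $q$ over unit vectors, and the matching eigenvector columns give the inputs where they occur.

```python
import numpy as np

# Hypothetical symmetric matrix defining q(x) = x^T A x.
A = np.array([[2.0, 3.0],
              [3.0, 5.0]])

# eigh returns eigenvalues in ascending order, with matching eigenvector columns.
eigenvalues, Q = np.linalg.eigh(A)

min_value, max_value = eigenvalues[0], eigenvalues[-1]
min_direction, max_direction = Q[:, 0], Q[:, -1]

print(min_value, min_direction)   # min of q over unit vectors, and where it occurs
print(max_value, max_direction)   # max of q over unit vectors, and where it occurs

# Sanity check: evaluating q at the unit eigenvectors recovers the eigenvalues.
print(min_direction @ A @ min_direction, max_direction @ A @ max_direction)
```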

Example

Optimizing

In the Example of the previous section on eliminating cross terms, we obtained the formula for in terms of the variables . We can see directly from this formula that the min eigenvalue is and the max eigenvalue is .

The min of occurs when . In other words it is at the first eigenvector . In terms of the variables, this occurs at and , the components of in the standard basis.

The max of occurs when . In other words at , which corresponds to and .

Exercise 12-02

Optimizing

Repeat the example on this quadratic form, using your work in Exercise 12-01, to find the min and max of for a unit vector, and find the unit vectors that produce these extreme values.

However, it is not necessary to find all of the $\lambda_i$ and $\mathbf{q}_i$. These are the eigenvalues / eigenvectors, and there are some methods of computing eigenvalues that locate the largest / smallest values by a numerical procedure that does not waste time finding the other values. The power method is an example of such a method. (Read more about the power method online, if you like.)
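A minimal sketch of the power method: repeatedly multiply by $A$ and re-normalize, which (under the usual genericity assumptions) converges to an eigenvector for the eigenvalue of largest absolute value. The matrix and iteration count here are illustrative choices.

```python
import numpy as np

def power_method(A, num_iterations=1000, seed=0):
    """Estimate the eigenvalue of A largest in absolute value, and its eigenvector."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    for _ in range(num_iterations):
        x = A @ x
        x = x / np.linalg.norm(x)   # re-normalize so the iterate stays a unit vector
    eigenvalue = x @ A @ x          # Rayleigh quotient of the unit iterate
    return eigenvalue, x

A = np.array([[2.0, 3.0],
              [3.0, 5.0]])  # hypothetical symmetric matrix
print(power_method(A))      # compare with np.linalg.eigh(A)
```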

Min-max principle and Rayleigh quotient

A closely related topic is the min-max principle for finding eigenvalues using the Rayleigh quotient. Given any symmetric matrix $A$, its Rayleigh quotient is:

$$R(\mathbf{x}) = \frac{\mathbf{x}^TA\mathbf{x}}{\mathbf{x}^T\mathbf{x}}.$$

Notice that the value of $R(\mathbf{x})$ is the same as the value of the quadratic form generated by $A$, evaluated at the unit vector given by first normalizing the input $\mathbf{x}$. Therefore, maximizing $R(\mathbf{x})$ over all possible $\mathbf{x} \neq \mathbf{0}$ is the same as maximizing the quadratic form of $A$ over all possible unit vectors.

The min-max principle is simply this: the min / max eigenvalues of $A$ are the min / max values of $R(\mathbf{x})$, and these occur when $\mathbf{x}$ is an eigenvector for that eigenvalue. The use of this principle is to find min / max eigenvalues / eigenvectors of a symmetric $A$ by extremizing $R$ with numerical methods (like the power method and its variants).

The min-max principle and Rayleigh quotient are essentially just notation and terminology for the process of normalizing $\mathbf{x}$ and plugging it into the quadratic form arising from the symmetric matrix $A$.

Positive definite symmetric matrices

  • A positive definite matrix is by definition a symmetric matrix all of whose eigenvalues are greater than zero.
  • A negative definite matrix is by definition a symmetric matrix all of whose eigenvalues are less than zero.
  • An indefinite matrix is by definition a symmetric matrix whose eigenvalues have mixed signs.

For example, in terms of spectral decompositions, positive definite means $A = \lambda_1\mathbf{q}_1\mathbf{q}_1^T + \cdots + \lambda_n\mathbf{q}_n\mathbf{q}_n^T$ with every $\lambda_i > 0$, while negative definite means every $\lambda_i < 0$.

Since the spectral decomposition reveals the quadratic form in the new variables, $q = \lambda_1y_1^2 + \cdots + \lambda_ny_n^2$ where $\mathbf{y} = Q^T\mathbf{x}$, and these input variables cover the same set of possibilities as $\mathbf{x}$, we also know:

Quadratic forms and definiteness

  • When $A$ is positive definite, then its form $\mathbf{x}^TA\mathbf{x}$ is always positive for $\mathbf{x} \neq \mathbf{0}$.
  • When $A$ is negative definite, then its form $\mathbf{x}^TA\mathbf{x}$ is always negative for $\mathbf{x} \neq \mathbf{0}$. (A numerical check of definiteness is sketched just after this list.)
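A quick way to test definiteness numerically is to inspect the eigenvalues, as in this sketch (the tolerance and example matrices are illustrative choices; np.linalg.eigvalsh computes the eigenvalues of a symmetric matrix):

```python
import numpy as np

def classify_definiteness(A, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eigenvalues = np.linalg.eigvalsh(A)
    if np.all(eigenvalues > tol):
        return "positive definite"
    if np.all(eigenvalues < -tol):
        return "negative definite"
    return "indefinite (or semidefinite)"

print(classify_definiteness(np.array([[2.0, 1.0], [1.0, 2.0]])))  # positive definite
print(classify_definiteness(np.array([[0.0, 1.0], [1.0, 0.0]])))  # indefinite
```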

Inner products and symmetric matrices

An inner product is a function sending two vector inputs to one scalar output:

$$(\mathbf{v}, \mathbf{w}) \mapsto \langle \mathbf{v}, \mathbf{w} \rangle,$$

which is linear in both inputs considered separately.

Rules of inner products:

  • Symmetry: $\langle \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{w}, \mathbf{v} \rangle$.
  • Respects sums: $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle$ (also in the second slot, using symmetry).
  • Respects scalars: $\langle c\mathbf{v}, \mathbf{w} \rangle = c\,\langle \mathbf{v}, \mathbf{w} \rangle$ (also in the second slot, using symmetry).
  • $\langle \mathbf{v}, \mathbf{v} \rangle \geq 0$ for all $\mathbf{v}$, and $\langle \mathbf{v}, \mathbf{v} \rangle = 0$ only occurs when $\mathbf{v} = \mathbf{0}$.

We are very familiar with the dot product, which is a valid inner product:

$$\langle \mathbf{v}, \mathbf{w} \rangle = \mathbf{v} \cdot \mathbf{w} = v_1w_1 + v_2w_2 + \cdots + v_nw_n.$$

Using the linearity of matrix actions, we can create many candidate inner products by first acting by a symmetric matrix $A$ and then taking the usual dot product:

$$\langle \mathbf{v}, \mathbf{w} \rangle_A = (A\mathbf{v}) \cdot \mathbf{w} = \mathbf{v}^TA\mathbf{w}.$$

This definition of $\langle \mathbf{v}, \mathbf{w} \rangle_A$ is automatically symmetric (i.e. it satisfies the symmetry rule) because $A$ is symmetric:

$$(A\mathbf{v}) \cdot \mathbf{w} = \mathbf{v}^TA^T\mathbf{w} = \mathbf{v}^TA\mathbf{w} = \mathbf{v} \cdot (A\mathbf{w}) = (A\mathbf{w}) \cdot \mathbf{v}.$$

If you take any inner product given by a symmetric matrix, and plug the input $\mathbf{x}$ into both slots, you get a quadratic form:

$$\langle \mathbf{x}, \mathbf{x} \rangle_A = \mathbf{x}^TA\mathbf{x}.$$

The last property of inner products (namely that $\langle \mathbf{v}, \mathbf{v} \rangle > 0$ for $\mathbf{v} \neq \mathbf{0}$) means that a quadratic form created in this way by setting $q(\mathbf{x}) = \langle \mathbf{x}, \mathbf{x} \rangle$ is always a positive definite quadratic form.

Conversely, any inner product is given by some positive definite symmetric matrix. To find the matrix $A$ corresponding to an inner product, simply set $a_{ij} = \langle \mathbf{e}_i, \mathbf{e}_j \rangle$. This works because $A\mathbf{e}_j$ gives the $j$th column vector of $A$, and then $\mathbf{e}_i \cdot (A\mathbf{e}_j)$ gives the $i$th row entry of this column, which is therefore the row $i$ and column $j$ entry of $A$. By linearity, the inner product is then determined on every other pair of vectors by this data.
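A sketch of such a matrix-defined inner product in NumPy, using a hypothetical positive definite matrix; the last line recovers the matrix entries by feeding standard basis vectors into the inner product:

```python
import numpy as np

# Hypothetical positive definite symmetric matrix.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

def inner(v, w):
    # <v, w> = (Av) . w = v^T A w
    return v @ A @ w

v = np.array([1.0, 2.0])
w = np.array([-1.0, 4.0])
print(inner(v, w), inner(w, v))   # symmetry: same value both ways
print(inner(v, v) > 0)            # positivity for v != 0

# Recover a_ij as <e_i, e_j>.
e = np.eye(2)
print(np.array([[inner(e[i], e[j]) for j in range(2)] for i in range(2)]))  # equals A
```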

Addendum: determinants and eigenvectors

The determinant of a $2 \times 2$ matrix is given by the formula:

$$\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc.$$

The determinant of a $3 \times 3$ matrix is given by the formula:

$$\det\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} = a(ei - fh) - b(di - fg) + c(dh - eg).$$

Higher determinants

You may notice a pattern here that holds true in general and allows calculation of $n \times n$ determinants: traverse any row or column, using these entries as weights (with alternating signs), writing a weighted sum of the smaller determinants of the minor matrices (which are given by deleting the row and column in which the given entry in our traversal process lives). This process therefore reduces an $n \times n$ determinant to a linear combination of $(n-1) \times (n-1)$ determinants, and the reduction can be iterated until the $2 \times 2$ case is reached.
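Here is a sketch of this cofactor (Laplace) expansion as a recursive function, expanding along the first row; it is fine for small matrices, though much slower than practical methods. The example matrix is illustrative.

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row (for small matrices)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    if n == 2:
        return A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
    total = 0.0
    for j in range(n):
        # Minor: delete row 0 and column j; the signs alternate along the row.
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(det_cofactor(A), np.linalg.det(A))  # the two values should agree
```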

The characteristic polynomial of a matrix $A$ is defined as the determinant $p(\lambda) = \det(A - \lambda I)$, in which $\lambda$ is considered an indeterminate variable.

For example: if then and (using the determinant formula) we have:

Key fact for finding eigenvalues

The eigenvalues of a matrix $A$ are the roots of its characteristic polynomial $p(\lambda) = \det(A - \lambda I)$.

In the previous example, therefore, are the eigenvalues.

Once the eigenvalues are found by solving $\det(A - \lambda I) = 0$ for $\lambda$, the solutions can be plugged back in for $\lambda$ in the equation $(A - \lambda I)\mathbf{v} = \mathbf{0}$, and then we can try to find the eigenvectors $\mathbf{v}$ by solving this equation. For the $2 \times 2$ case, you can solve it using the fact that one row is a multiple of the other. In $3 \times 3$ and higher, you may need to perform row reduction on $A - \lambda I$ (with the specific value of $\lambda$ plugged in) in order to find solutions $\mathbf{v}$.

Notice that if $\mathbf{v}$ is an eigenvector, then $c\mathbf{v}$ is also an eigenvector for any nonzero scalar $c$. So really we are studying eigenlines, namely the spans of eigenvectors. The relation of proportionality between the components of $\mathbf{v}$ gives the direction of this line.

For in the previous example, solving would mean solving . Therefore and thus , so an eigenvector is given by and a line of eigenvectors is given by the span for all . For we find , and solving this we find and therefore is an eigenvector, as is everything in the span .
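Since the numbers of the example above are not reproduced here, the following sketch runs the same procedure on a hypothetical $2 \times 2$ matrix: build the characteristic polynomial, find its roots, and use the first row of $A - \lambda I$ to read off an eigenvector on each eigenline.

```python
import numpy as np

# Hypothetical 2x2 matrix (not the example from the text).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Characteristic polynomial: det(A - t I) = t^2 - trace(A) t + det(A).
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]
eigenvalues = np.roots(coeffs)
print(eigenvalues)  # roots of the characteristic polynomial (here 3 and 1)

# For each eigenvalue t, solve (A - t I)v = 0; in the 2x2 case one row is a
# multiple of the other, so the first row alone determines the eigenline.
for t in eigenvalues:
    a, b = (A - t * np.eye(2))[0]
    v = np.array([-b, a])        # satisfies a*v1 + b*v2 = 0
    print(t, v, A @ v - t * v)   # the residual is (numerically) zero
```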

Problems due 10 Apr 2024 by 12:00pm

Exercises for submission with HW (if not in class)

Quadratic form: eliminating cross terms and optimizing

  • (a) Eliminate cross terms in the quadratic form .
  • (b) Find the min and max values of from (a) where is a unit vector, and find the unit vectors that produce these extreme values. (Use your result in (a) to minimize in changed variables, and then change variables back again.)

Problem 12-01

Finding a spectral decomposition

Let .

  • (a) Find the eigenvalues of $A$. There will be only two distinct numbers, $\lambda_1$ and $\lambda_2$, not three.
  • (b) Row reduce $A - \lambda_1 I$ and $A - \lambda_2 I$ to solve the two equations $(A - \lambda_1 I)\mathbf{v} = \mathbf{0}$ and $(A - \lambda_2 I)\mathbf{v} = \mathbf{0}$. In this way, write down a set of 3 eigenvectors, two of which have one of the eigenvalues, and the third has the other eigenvalue.
  • (c) For the ‘double’ eigenvalue, convert its two eigenvectors to orthogonal vectors (subtract a projection as in Gram-Schmidt, unless they are already perpendicular).
  • (d) Normalize all your orthogonal eigenvectors to get unit eigenvectors. Call them $\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3$. Now the matrix $Q$ with these columns is an orthogonal matrix, its inverse is $Q^T$, and we know that $Q^TAQ = D$, where $D$ has the eigenvalues along the diagonal. (In order, so that the $i$th diagonal entry is the eigenvalue for $\mathbf{q}_i$.)
  • (e) Write out the spectral decomposition: $A = \lambda_1\mathbf{q}_1\mathbf{q}_1^T + \lambda_2\mathbf{q}_2\mathbf{q}_2^T + \lambda_3\mathbf{q}_3\mathbf{q}_3^T$ (with each eigenvalue matching its eigenvector). In other words, simply compute the three projection matrices $\mathbf{q}_i\mathbf{q}_i^T$, and plug them into this formula. Check that if you were to take the sum, you would get $A$ back.

Problem 12-02

Quadratic form

Let $A$ be the matrix as in Problem 12-01. Use results of 12-01 for this problem: you do not need to find the eigenvalues / eigenvectors again.

  • (a) What is the quadratic form $q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$ associated to $A$?
  • (b) What is the quadratic form in the $\mathbf{y}$ variables that eliminate cross terms?
  • (c) What are the max and min of $q(\mathbf{x})$ for unit vectors $\mathbf{x}$?
  • (d) What are the vectors $\mathbf{x}$ that yield the max and min found in (c)?

Problem 12-03

Functions of symmetric matrices

  • (a) Write out in coefficients the matrix when .
  • (b) Compute the matrix product when . (Notice that you have found .)
  • (c) Write out in coefficients the matrix when .
  • (d) Guess the coefficients in such that for as in (a). Verify your guess by writing out in coefficients and then calculating . (Notice that you have found .)
  • (e) Now drop the assumption that , and yet find by FOILing the product . To simplify the result, figure out what is for and .
  • (f) Based on (a)-(e), can you guess what should be in the matrix ? (Hint: consider that has a power series expansion: .)

Problem 12-04

Positive definite matrices

Suppose $A^T = A$ and $A^{-1}$ exists. (Symmetric and invertible.) Notice that the Spectral Theorem applies.

  • (a) What are the eigenvalues of $A^{-1}$ in terms of the eigenvalues of $A$?
  • (b) Suppose that $A$ is positive definite. Show that $A^{-1}$ is also positive definite. (Hint: make use of (a).)