Packet 11

Matrices II: Theory of Linearity

Linearity and the meaning of matrix multiplication

Matrix times vector Matrix multiplication preserves linear combinations:
$$A(c_1\mathbf{v}_1 + c_2\mathbf{v}_2) = c_1 A\mathbf{v}_1 + c_2 A\mathbf{v}_2.$$

This also means, of course, the preservation of scaling and (thus) of zero: $A(c\mathbf{v}) = cA\mathbf{v}$, and $A\mathbf{0} = \mathbf{0}$.
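
As a quick numerical sanity check of this property (a minimal numpy sketch with an arbitrarily chosen matrix and vectors, not taken from the packet):

```python
import numpy as np

# Arbitrary example matrix and vectors (chosen for illustration only).
A = np.array([[1., 2.],
              [3., 4.]])
v1 = np.array([1., -1.])
v2 = np.array([2., 5.])
c1, c2 = 3.0, -2.0

# A(c1*v1 + c2*v2) should equal c1*(A v1) + c2*(A v2).
lhs = A @ (c1 * v1 + c2 * v2)
rhs = c1 * (A @ v1) + c2 * (A @ v2)
print(np.allclose(lhs, rhs))  # True
```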

This fact about matrix actions is not merely a convenient property: it is part of the very essence and definition of matrices. This fact alone completely determines the formulas for matrix multiplication and for matrix actions on vectors, as we see next.

Matrix action on vectors determined by linearity

Suppose that $A$ acts upon vectors and preserves linear combinations. Since it acts on vectors, there is a well-defined image of every standard basis vector $\mathbf{e}_j$. Write $\mathbf{a}_j = A\mathbf{e}_j$ for these images. Note that nothing about matrix formulas is yet involved.

Now consider any vector $\mathbf{v}$. We are able to write $\mathbf{v}$ using components in the standard basis:
$$\mathbf{v} = v_1\mathbf{e}_1 + v_2\mathbf{e}_2 + \cdots + v_n\mathbf{e}_n,$$

and therefore linearity implies:
$$A\mathbf{v} = v_1 A\mathbf{e}_1 + v_2 A\mathbf{e}_2 + \cdots + v_n A\mathbf{e}_n = v_1\mathbf{a}_1 + v_2\mathbf{a}_2 + \cdots + v_n\mathbf{a}_n.$$

Now recall the formula for a matrix $A$ acting on a vector $\mathbf{v}$: the output is the linear combination of the columns of $A$ using the components of $\mathbf{v}$ as the coefficients. So, if we write the images $\mathbf{a}_1, \dots, \mathbf{a}_n$ into the columns of $A$, then the output of that formula (components of $\mathbf{v}$ as coefficients on combinations of the columns of $A$) is precisely the output demanded by linearity, when the images of the basis vectors are recorded in the matrix columns.
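
This observation can be checked numerically: record the (made-up) images of the standard basis vectors as the columns of a matrix and confirm that matrix-on-vector multiplication reproduces the predicted linear combination. The names img1 and img2 below are illustrative only.

```python
import numpy as np

# Suppose a linear action sends e1 and e2 to these (made-up) images.
img1 = np.array([2., 1.])   # image of e1
img2 = np.array([-1., 3.])  # image of e2

# Record the images as the columns of a matrix.
A = np.column_stack([img1, img2])

# For any v = v1*e1 + v2*e2, linearity predicts A v = v1*img1 + v2*img2.
v = np.array([4., -2.])
print(np.allclose(A @ v, v[0] * img1 + v[1] * img2))  # True
```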

In Question 04-04 we proved that matrix multiplication, as given by the formulas of Packet 04 (specifically in summation notation), does satisfy linearity:
$$A(c_1\mathbf{v}_1 + c_2\mathbf{v}_2) = c_1 A\mathbf{v}_1 + c_2 A\mathbf{v}_2.$$

Derivation of linearity of matrix multiplication

In summation notation, the $i$-th entry of the left-hand side is
$$\big(A(c_1\mathbf{v}_1 + c_2\mathbf{v}_2)\big)_i = \sum_j a_{ij}\,(c_1 v_{1,j} + c_2 v_{2,j}) = c_1\sum_j a_{ij}\,v_{1,j} + c_2\sum_j a_{ij}\,v_{2,j} = \big(c_1 A\mathbf{v}_1 + c_2 A\mathbf{v}_2\big)_i.$$
This sequence shows that the $i$-th row of the vector $A(c_1\mathbf{v}_1 + c_2\mathbf{v}_2)$ agrees with the $i$-th row of the vector $c_1 A\mathbf{v}_1 + c_2 A\mathbf{v}_2$. Since this is true for every row $i$, the vectors must be completely the same.

Question 11-01

Finding a matrix using linearity

What is the 2x2 matrix that sends to and to ?

(Hint: first determine where $\mathbf{e}_1$ and $\mathbf{e}_2$ are sent using linearity. Then: the matrix is given by using these two images as its first and second column vectors.)

Matrix times matrix We already have a definition of matrix multiplication as the “composition of matrix action,” namely that $(AB)\mathbf{v} = A(B\mathbf{v})$ for all vectors $\mathbf{v}$.

Since $A$ acts linearly and $B$ acts linearly, the above definition implies that $AB$ acts linearly:
$$(AB)(c_1\mathbf{v}_1 + c_2\mathbf{v}_2) = A\big(B(c_1\mathbf{v}_1 + c_2\mathbf{v}_2)\big) = A\big(c_1 B\mathbf{v}_1 + c_2 B\mathbf{v}_2\big) = c_1 (AB)\mathbf{v}_1 + c_2 (AB)\mathbf{v}_2.$$

Therefore, by our previous reasoning, we can represent the action of $AB$ using a matrix whose $j$-th column is the image of $\mathbf{e}_j$ under this composition action:
$$(AB)\mathbf{e}_j = A(B\mathbf{e}_j).$$

In other words, the columns of $AB$ must be the images of the columns of $B$ under the action of $A$.
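
A short sketch (with arbitrary small matrices) verifying this column-by-column description of the product:

```python
import numpy as np

A = np.array([[1., 0., 2.],
              [0., 1., 1.]])
B = np.array([[1., 4.],
              [2., 5.],
              [3., 6.]])

AB = A @ B
# Each column of AB is A applied to the corresponding column of B.
for j in range(B.shape[1]):
    assert np.allclose(AB[:, j], A @ B[:, j])
print(AB)
```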

Linearity is fundamental

In summary: the idea of matrix product, and the formula for matrix product, come from the prescriptive hypothesis of linearity of matrix action. We can do many things in linear algebra by thinking and working in terms of linearity, instead of in terms of matrix formulas for actions and multiplications.

Question 11-02

Linearity and quadratic forms

Consider the function given by . Is this function linear? If not, is it linear in each variable separately?

Question 11-03

Dual vectors

Suppose $\mathbf{u}$ is some vector. Define a function $f$ by the formula $f(\mathbf{v}) = \mathbf{u} \cdot \mathbf{v}$. Is this function linear? If so, find a matrix that represents the action of $f$. This matrix is often called the dual vector of $\mathbf{u}$.

Exercise 11-01

Projection and inclusion

Suppose and are two linear mappings given by the formulas:

  • (a) Check that and (given by these formulas) are linear mappings.
  • (b) Compute the matrices that correspond to and .
  • (c) Compute the matrix of the composition by evaluating the effect of this composite function upon the standard basis vectors .
  • (d) Compute the same matrix by multiplying the matrix of and the matrix of .
  • (e) Repeat (c) and (d) for . (This time you only need .)

Change of basis

The previous section encourages us to think of linearity as fundamental, and the matrix formulas as secondary, derivable from linearity.

To be specific, the array of numbers that we have taken for granted as the “matrix columns” can be understood as “really” just the images $A\mathbf{e}_1, \dots, A\mathbf{e}_n$ of the standard basis vectors $\mathbf{e}_1, \dots, \mathbf{e}_n$. When a new vector is given to us in the form $\mathbf{v} = v_1\mathbf{e}_1 + \cdots + v_n\mathbf{e}_n$, then the action of $A$ on this vector is calculated using linearity as $A\mathbf{v} = v_1(A\mathbf{e}_1) + \cdots + v_n(A\mathbf{e}_n)$. By writing these images into the columns of a matrix, and putting the coefficients into a column vector, we arrive at the usual matrix-on-vector multiplication formula.

Most importantly: we do not need to know what the basis vectors themselves are in order to make sense of the rows and columns of the matrix, provided we insist that the columns are supposed to give the images of those basis vectors.

This fact allows us to generalize the idea of a matrix to the idea of a matrix in a basis. The coefficient columns of a matrix in a basis are simply the images of the members of that basis. Images of other vectors are calculated by writing them in terms of the basis and then applying the matrix-on-vector multiplication formula.

Suppose we are given a specific basis $\mathbf{e}'_1, \dots, \mathbf{e}'_n$ for the space $\mathbb{R}^n$ that may not be the standard basis. Recall what this means: (a) that the set $\{\mathbf{e}'_1, \dots, \mathbf{e}'_n\}$ is independent, and (b) that the span of $\{\mathbf{e}'_1, \dots, \mathbf{e}'_n\}$ is all of $\mathbb{R}^n$. (Note: we use the prime notation because we would write $\mathbf{e}_1, \dots, \mathbf{e}_n$, with no prime, for the standard basis.)

Using (a) and (b), every vector $\mathbf{v}$ can be written with unique coefficients $c_1, \dots, c_n$ as a linear combination:
$$\mathbf{v} = c_1\mathbf{e}'_1 + c_2\mathbf{e}'_2 + \cdots + c_n\mathbf{e}'_n.$$

Now suppose we have some image vectors $\mathbf{a}'_1, \dots, \mathbf{a}'_n$, meaning that $A\mathbf{e}'_j = \mathbf{a}'_j$ for all $j$, and let us define the matrix of $A$ in this basis using this knowledge. By linearity we calculate:
$$A\mathbf{v} = c_1 A\mathbf{e}'_1 + \cdots + c_n A\mathbf{e}'_n = c_1\mathbf{a}'_1 + \cdots + c_n\mathbf{a}'_n.$$

This is just like the formulation of the action of $A$ on $\mathbf{v}$ using the standard basis vectors and their images under $A$. So, we define the matrix of $A$ in the basis as having column vectors $\mathbf{a}'_1, \dots, \mathbf{a}'_n$. This matrix is given in the basis in the sense that when a given vector is written with coefficients $c_1, \dots, c_n$ relative to the basis, the action of $A$ on that vector is computed using the usual matrix-on-vector multiplication formula, but with the column vectors $\mathbf{a}'_1, \dots, \mathbf{a}'_n$ in the matrix and the coefficients $c_1, \dots, c_n$ in the vector.
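
As a concrete illustration (with made-up numbers, since the packet's own example appears later), the sketch below records the images of two non-standard basis vectors as columns and checks that multiplying this array by the coefficient column reproduces the true image:

```python
import numpy as np

A = np.array([[2., 1.],
              [0., 3.]])   # a matrix acting in the standard basis (made up)
b1 = np.array([1., 1.])    # first vector of a non-standard basis (made up)
b2 = np.array([1., -1.])   # second vector of that basis

# Columns of the "matrix in the basis" (input side only) are the images A b_j.
A_in_basis = np.column_stack([A @ b1, A @ b2])

# A vector given by coefficients c in the new basis: v = c1*b1 + c2*b2.
c = np.array([3., 2.])
v = c[0] * b1 + c[1] * b2

# Acting on the coefficient column reproduces the true image A v.
print(np.allclose(A_in_basis @ c, A @ v))  # True
```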

It can be helpful when writing the array of coefficients of $\mathbf{v}$ to use a bracket notation with a subscript naming the basis. This notation refers to the array of numbers that are used to write $\mathbf{v}$ as a linear combination of the vectors that constitute the basis. The brackets signal that we mean the specific array in some non-standard basis, and the subscript of course specifies that basis.

Note: some writers also use brackets for matrices in a specified basis, for example writing . Going even further, some authors distinguish the matrix itself (which presupposes a basis) from the underlying linear transformation that determines it (using the presupposed basis to interpret its coefficients). In our notation, we only put brackets on vectors. The symbols and stand for distinct concrete matrices, but the visual relation between them serves as a reminder that performs the same action as when vector components are rewritten using the basis .

Example

Matrix written in acting on vector written in

Problem: The set of vectors

is a basis of $\mathbb{R}^2$. (These vectors are independent and there are two of them, so they span all of $\mathbb{R}^2$.)

Consider the matrix and the vector . Compute the array that represents in the new basis , and then compute . Calculate the image and compare this to the image . They should agree!

Solution: First we find . We must solve the system

The solution is . Therefore .

Next we seek the matrix of $A$ in the new basis. Its columns are given by the images under $A$ of the two new basis vectors. So we compute:

Therefore we have . This matrix sends to the vector and it sends to the vector .

Finally, observe on the one hand that

while on the other hand

Therefore the two computations agree, as the problem statement had anticipated.

In this example, the output was written as a vector in the standard basis. This means it equals . However, if we are given the input in terms of the new basis as , then we may wish to represent the output also in terms of this basis; in other words we may want . In a sense, the matrix has been adjusted to use the new basis on the input side only, but we may want a matrix that uses the new basis on the output side as well.

Such a matrix would be denoted by . This matrix operates on vectors that are given in terms of the new basis and it produces vectors that are also given in terms of the new basis. To study such matrices more effectively we introduce a new concept, the matrix of a change of basis.

Matrix of change of basis

Consider the problem of finding the coefficients $c_1, \dots, c_n$ such that $\mathbf{v} = c_1\mathbf{e}'_1 + \cdots + c_n\mathbf{e}'_n$ for some given $\mathbf{v}$. Let us write down the matrix with column vectors $\mathbf{e}'_1, \dots, \mathbf{e}'_n$, and call it the “transfer matrix” from the new basis to the standard basis.

Notice specifically that the transfer matrix accepts the input column of coefficients in the new basis and (using matrix-on-vector multiplication) returns the output vector given in the standard basis. Therefore, in order to find the coefficients of $\mathbf{v}$ in the new basis, we simply need to invert the transfer matrix and multiply its inverse by $\mathbf{v}$:

Now, because changes a vector from basis into a vector in basis , it is also fair to write:

Using transfer or change of basis matrices, we can compute the matrix of $A$ in the new basis just by taking a matrix product:
$$(\text{matrix of } A \text{ in the new basis}) = A \cdot (\text{transfer matrix}).$$

The transfer matrix first converts vectors from the new basis into the standard basis, and then we multiply those vectors by $A$. The result is the same as first computing the matrix of $A$ in the new basis and then acting by this matrix.

Finally, if we wish to put the outputs of $A$ back into the new basis, we just apply another transfer matrix:
$$(\text{matrix of } A \text{ with the new basis on both sides}) = (\text{transfer matrix})^{-1} \cdot A \cdot (\text{transfer matrix}).$$
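
The transfer-matrix bookkeeping can be summarized in a few lines of numpy, with a made-up matrix and basis; the variable S is an assumed name for the transfer matrix whose columns are the new basis vectors:

```python
import numpy as np

A = np.array([[2., 1.],
              [0., 3.]])
b1 = np.array([1., 1.])
b2 = np.array([1., -1.])
S = np.column_stack([b1, b2])   # transfer matrix: new basis -> standard basis
S_inv = np.linalg.inv(S)        # standard basis -> new basis

v = np.array([5., 1.])          # a vector written in the standard basis
coeffs = S_inv @ v              # its coefficients in the new basis

# Matrix of A with the new basis on the input side only: columns are A b_j.
A_input_side = A @ S
print(np.allclose(A_input_side @ coeffs, A @ v))            # True

# Matrix of A with the new basis on both input and output sides.
A_both_sides = S_inv @ A @ S
print(np.allclose(A_both_sides @ coeffs, S_inv @ (A @ v)))  # True
```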

Example

Finding using ‘change of basis’ transfer matrices

Problem: Find the transfer matrix of the example in the previous section and use it to calculate .

Solution: We know that

Notice that the matrix found in the example is quickly computed as the matrix product:

To find the matrix , we compute the inverse using the usual 2x2 formula for inverses:

Then we can calculate by taking the product:

Example

Directly computing change of basis matrix using row reduction

Consider the following two bases:

Problem: Find the transfer matrices and representing change of basis.

Solution: We only need to find one, since the other will be determined as the inverse of the first. Let us start with .

Let $B$ be the matrix whose columns are the vectors of the first basis, and let $C$ be the matrix whose columns are the vectors of the second basis. Now suppose we have found a matrix $X$ such that $CX = B$. Then $X$ is in fact the transfer matrix from the first basis to the second. Why? Observe that, by the definition of matrix multiplication, the equation $CX = B$ says column by column that $C$ times a column of $X$ equals the corresponding column of $B$; in other words, each column of $X$ records the coefficients needed to write the corresponding vector of the first basis as a linear combination of the vectors of the second basis, which is exactly what the transfer matrix does. These column equations precisely combine to the matrix equation $CX = B$.

So, we solve the equation $CX = B$ using row reduction on the augmented matrix $[\,C \mid B\,]$:

It follows that . By computing the inverse using the formula, we have .
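
In code, solving this matrix equation for the transfer matrix is a single call to a linear solver. The basis vectors below are invented for illustration; they are not the packet's own numbers:

```python
import numpy as np

# Two made-up bases of R^2, stored as matrix columns.
B = np.column_stack([np.array([1., 2.]), np.array([3., 1.])])   # first basis
C = np.column_stack([np.array([1., 1.]), np.array([1., -1.])])  # second basis

# Solve C X = B; then X is the transfer matrix from the first basis to the second.
X = np.linalg.solve(C, B)

# Check: C times each column of X reproduces the corresponding column of B.
print(np.allclose(C @ X, B))                                    # True
# The transfer matrix in the opposite direction is the inverse.
print(np.allclose(np.linalg.inv(X), np.linalg.solve(B, C)))     # True
```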

Exercise 11-02

Change of basis using row reduction

Find the change of basis transfer matrices between the following two bases:

Exercise 11-03

Changing basis without knowing vectors

Assume that and are bases of , and that

Find the change of basis transfer matrix , and also compute for the vector .

Image, kernel, transpose, rank

Definitions Suppose that $A$ is a matrix giving a linear transformation. (Notice that we don’t assume that $A$ is square.) In this section we introduce three important subspaces that are derived from $A$. These three subspaces are defined, and may be referred to and used, every time we have a matrix representing a linear transformation.

  • The image of $A$, written $\operatorname{im}(A)$, also called the column space or range, is the span of the columns of $A$, which is the same as the set of all possible outputs $A\mathbf{v}$ for any $\mathbf{v}$.
  • The kernel of $A$, written $\ker(A)$, also called the null space and written $\operatorname{null}(A)$, is the set of inputs that $A$ sends to $\mathbf{0}$, i.e. the set of $\mathbf{v}$ such that $A\mathbf{v} = \mathbf{0}$.
  • The co-kernel of $A$, written $\operatorname{coker}(A)$, also called the row space of $A$, is the span of the row vectors of $A$, which is the same as the image of the transpose $A^T$. It is also the full orthogonal complement of the kernel $\ker(A)$.

For the last one, we need to define the transpose of any matrix $A$, written as $A^T$, as the matrix which reflects $A$ across the main diagonal, swapping the roles of rows and columns.

Be aware that if $\mathbf{v}$ is a normal column vector, then its transpose $\mathbf{v}^T$ is a row vector. Another way to view this: every normal column vector with $n$ entries is actually an $n \times 1$ matrix, while its transpose row vector is a $1 \times n$ matrix.

Exercise 11-04

Transpose has ‘reversing property’

Show that $(AB)^T = B^T A^T$.

Hint: You should use the summation notation for matrix products:
$$(AB)_{ik} = \sum_j A_{ij} B_{jk}.$$
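
A numerical spot-check of the reversing property (this is not a proof, just a sanity check on arbitrary matrices):

```python
import numpy as np

A = np.arange(6, dtype=float).reshape(2, 3)   # arbitrary 2x3 matrix
B = np.arange(12, dtype=float).reshape(3, 4)  # arbitrary 3x4 matrix

print(np.allclose((A @ B).T, B.T @ A.T))  # True
```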

More about co-kernels The meaning of image and kernel is clear from these definitions, but the meaning of co-kernel is less so.

If $A$ is an $m \times n$ matrix with entries $a_{ij}$, then the row vectors of $A$ are just its rows, one for each $i = 1, \dots, m$. These are $1 \times n$ matrices. We will use the notation $\mathbf{r}_i$ for the row vectors.

By taking transposes, we can convert these row vectors into ordinary column vectors $\mathbf{r}_i^T$. Notice that the $i$-th row vector of $A$ equals (the transpose of) the $i$-th column vector of the transpose $A^T$.

Now consider the connection between row vectors and dot products. A row vector $\mathbf{r}$, being a $1 \times n$ matrix, acts upon a column vector $\mathbf{v}$ and returns a $1 \times 1$ vector, which is to say a scalar. This action is given by the formula:
$$\mathbf{r}\,\mathbf{v} = r_1 v_1 + r_2 v_2 + \cdots + r_n v_n.$$

This formula is the same as taking the dot product $\mathbf{r}^T \cdot \mathbf{v}$.

By generalizing this to every row of the matrix $A$, and recalling that the multiplication $A\mathbf{v}$ gives a vector each row of which is the corresponding row of $A$ dotted with $\mathbf{v}$, we obtain the general fact that $A\mathbf{v} = \mathbf{0}$ if and only if $\mathbf{v}$ is perpendicular to each of the row vectors transposed into ordinary vectors $\mathbf{r}_i^T$:
$$A\mathbf{v} = \mathbf{0} \iff \mathbf{r}_i^T \cdot \mathbf{v} = 0 \text{ for all } i.$$

Co-kernel is the orthogonal complement of the kernel

The co-kernel of $A$ is precisely the subspace of (transposes of) vectors in $\mathbb{R}^n$ which are perpendicular to the kernel of $A$.

Derivation

We know that every $\mathbf{v} \in \ker(A)$ is perpendicular to every (transposed) row vector of $A$. This implies that $\mathbf{v}$ is perpendicular to every vector in the span of the (transposed) row vectors of $A$. In other words, every $\mathbf{v} \in \ker(A)$ is perpendicular to everything in the co-kernel of $A$.

It remains only to show that if a vector $\mathbf{w}$ is perpendicular to every $\mathbf{v} \in \ker(A)$, then the vector $\mathbf{w}$ must be in $\operatorname{coker}(A)$, which is to say it must be a linear combination of (transposed) row vectors of $A$.

Suppose that $\mathbf{w}$ is not a linear combination of the (transposed) row vectors of $A$. Then (imitating the Gram–Schmidt process) define $\mathbf{u} = \mathbf{w} - \mathbf{p}$, where $\mathbf{p}$ is the projection of $\mathbf{w}$ onto the span of the (transposed) row vectors of $A$. Then $\mathbf{u}$ is perpendicular to all rows of $A$ and thus it belongs to $\ker(A)$. Furthermore, we know that $\mathbf{w} \cdot \mathbf{u} = (\mathbf{p} + \mathbf{u}) \cdot \mathbf{u} = \mathbf{p} \cdot \mathbf{u} + \mathbf{u} \cdot \mathbf{u} = \mathbf{u} \cdot \mathbf{u}$. Thus, $\mathbf{w} \cdot \mathbf{u} \neq 0$. (We know $\mathbf{u} \neq \mathbf{0}$ because $\mathbf{w}$ is not in $\operatorname{coker}(A)$.) In conclusion, our vector $\mathbf{w}$ is not perpendicular to the specific kernel vector $\mathbf{u}$.

Reversing the logic (i.e. taking the contrapositive), we have shown that if a vector is perpendicular to every $\mathbf{v} \in \ker(A)$, then it must lie in $\operatorname{coker}(A)$.
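
The orthogonality between the kernel and the row space can also be observed numerically; the matrix below is made up for illustration:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])      # made-up matrix with a nontrivial kernel

v = np.array([1., 1., -1.])       # A v = 0, so v is in the kernel
print(np.allclose(A @ v, 0))      # True

# v is perpendicular to every (transposed) row of A, hence to the whole row space.
for row in A:
    print(np.isclose(row @ v, 0))  # True, True
```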

Co-kernel is image of transpose

Observe that $\operatorname{coker}(A)$ is the same as (the transpose of) $\operatorname{im}(A^T)$:

The image of $A^T$ is another name for the column space of $A^T$, and the columns of $A^T$ are the same data as the rows of $A$, just transposed.

Rank The rank of a matrix $A$ is defined to be the dimension of the image of $A$. This number is the same as the number of independent columns of $A$. There are two fundamental theorems about the rank of a matrix which relate this number to the dimensions of the other natural subspaces derived from $A$.

Rank-Rank Theorem

The rank of a matrix $A$ equals the rank of its transpose $A^T$:
$$\operatorname{rank}(A) = \operatorname{rank}(A^T).$$

Observe that $\operatorname{im}(A^T)$ is the (transpose of the) row space of $A$. So the theorem says that the row space and the column space of $A$ always have the same number of independent vectors.

You can remember the reason for this theorem with the mnemonic:

Pivots give the independent columns and the independent rows.

Rank-Nullity Theorem

For any $m \times n$ matrix $A$, we have:
$$\operatorname{rank}(A) + \dim\ker(A) = n.$$

To remember this theorem, observe that ‘nullity’ refers to the dimension of the null space. To remember the reason for this theorem, we have another mnemonic:

Each of the columns is either a pivot or a free variable.

The mnemonics for these theorems point the way towards their proofs.

  • For matrices in RREF, both theorems are obvious from the mnemonics. (Check this!)
  • For matrices not in RREF, the key facts are:
    • (i) that row reduction is performed by left-multiplying by an invertible matrix representing a composite sequence of invertible row operations (row-adds, row-scales, row-swaps), and
    • (ii) that the dimension of the image of a matrix is unchanged when the matrix is left-multiplied by an invertible matrix, and
    • (iii) that the row space of a matrix is unchanged when the matrix is left-multiplied by a row reduction matrix, as in (ii).

Invertible matrices preserve subspaces

The second fact (ii) is actually much more general: multiplying by an invertible matrix preserves the dimensions of all subspaces.

This fact is not hard to prove, because invertible matrices send independent / dependent vectors to independent / dependent vectors, and thus bases to bases, and thus dimensions to dimensions.

In order to make use of these theorems, you should simply compute the RREF of a matrix and count the numbers of pivots and free variables.
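
One quick way to see both theorems in action is to let a computer algebra system compute the RREF and count pivots; the sketch below uses sympy with an invented matrix:

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3, 4],
               [2, 4, 6, 8],
               [1, 0, 1, 0]])     # a made-up 3x4 matrix

rref_A, pivot_cols = A.rref()     # RREF and the indices of the pivot columns
rank = len(pivot_cols)
nullity = A.cols - rank

print(rank, nullity)              # Rank-Nullity: rank + nullity == number of columns
print(rank == A.T.rank())         # Rank-Rank: rank(A) == rank(A^T) -> True
```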

Problems due 3 Apr 2024 by 12:00pm

Problem 11-01

Linearity

Suppose that are vectors and that represents some linear transformation acting on .

  • (a) Suppose we know that are independent. Do we automatically know that are independent? (Justify your answer. If false, a counterexample suffices; if true, an argument is required.)
  • (b) Suppose we know that are dependent. Do we automatically know that are dependent? (Justify your answer. If false, a counterexample suffices; if true, an argument is required.)
  • (c) If parametrizes a line passing through the origin, do we automatically know that is also a line passing through the origin? What if we know that is invertible?
  • (d) If parametrizes a line not passing through the origin, do we automatically know that is also a line not passing through the origin? What if we know that is invertible?
Problem 11-02

Changing bases

  • (a) Suppose that and are bases of . Suppose that is the matrix with column vectors equal to when written in the basis . Which of the following is satisfied by ? (i) or (ii) . (You must justify your answer.)
  • (b) Find the change of coordinates transfer matrices and between the following two bases of :
  • (c) Assume that $\{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ and $\{\mathbf{c}_1, \mathbf{c}_2, \mathbf{c}_3\}$ are bases of $\mathbb{R}^3$, and that
\begin{align*}
\mathbf{b}_1 &= 2\mathbf{c}_1 - \mathbf{c}_2 + \mathbf{c}_3 \\
\mathbf{b}_2 &= 3\mathbf{c}_2 + \mathbf{c}_3 \\
\mathbf{b}_3 &= -3\mathbf{c}_1 + 2\mathbf{c}_3.
\end{align*}

Problem 11-03

Rank-Rank and Rank-Nullity counting practice

Answer the following questions using the Rank-Rank and Rank-Nullity theorems. It may help to work with imaginary matrices in RREF and think about the pivots.

  • (a) Suppose is with . What is and ?
  • (b) Suppose is with . What is and ?
  • (c) Suppose is with . What is and ?
Problem 11-04

Bases and dimensions from row reduction

Row reduce the following matrix in order to find a basis for the row space, another basis for the column space, and a third basis for the kernel. By counting basis elements to determine dimensions, verify the two theorems from the rank section of this packet.

(Hint: for the column space, you should determine which columns of the RREF have pivots; the same columns of the original matrix will then be independent vectors! This happens because row reduction preserves independence / dependence relations among the column vectors of a matrix.)