Packet 11

Matrices II: Theory of Linearity

Linearity and the meaning of matrix multiplication

Matrix times vector Matrix multiplication preserves linear combinations:

A(λ1𝐮+λ2𝐯)=λ1(A𝐮)+λ2(A𝐯).

This also means, of course, the preservation of scaling and (thus) of zero: A(λ𝐱)=λ(A𝐱), A𝟎=𝟎.

This fact about matrix actions is not merely a convenient property: it is part of the very essence and definition of matrices. Linearity alone completely determines the formulas for matrix multiplication and matrix actions on vectors, as we see next.
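The preservation of linear combinations can be checked numerically. Here is a minimal sketch in Python with NumPy; the particular matrix and vectors are arbitrary choices for illustration, not taken from the packet:

```python
import numpy as np

# A hypothetical 2x3 matrix and two vectors, chosen arbitrarily.
A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 5.0]])
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 0.0, 1.0])
lam1, lam2 = 2.0, -3.0

# Linearity: A(lam1*u + lam2*v) equals lam1*(A u) + lam2*(A v).
lhs = A @ (lam1 * u + lam2 * v)
rhs = lam1 * (A @ u) + lam2 * (A @ v)
assert np.allclose(lhs, rhs)

# Special cases: preservation of scaling and of the zero vector.
assert np.allclose(A @ (5.0 * u), 5.0 * (A @ u))
assert np.allclose(A @ np.zeros(3), np.zeros(2))
```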

Matrix action on vectors determined by linearity

Suppose that A acts upon vectors and preserves linear combinations. Since it acts on vectors, there is a well-defined image of every standard basis vector 𝐞1, …, 𝐞n. Write 𝐚i = A𝐞i for these images. Note that nothing about matrix formulas is yet involved.

Now consider any vector 𝐱. We are able to write 𝐱 using components in the standard basis:

𝐱 = x1𝐞1 + ⋯ + xn𝐞n,

and therefore linearity implies:

A𝐱 = A(x1𝐞1 + ⋯ + xn𝐞n) = x1(A𝐞1) + ⋯ + xn(A𝐞n) = x1𝐚1 + ⋯ + xn𝐚n.

Now recall the formula for a matrix A acting on a vector 𝐱: the output is the linear combination of the columns of A using the xi as coefficients. So, if we write 𝐚1, …, 𝐚n into the columns of A, then A𝐱 according to this column-combination formula is precisely the vector given by linearity. In other words, the usual matrix action is forced upon us once the images 𝐚i of the basis vectors 𝐞i are recorded in the matrix columns.
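This construction can be carried out concretely. The sketch below (Python/NumPy, with arbitrary images 𝐚i chosen for illustration) records the images of the standard basis vectors as columns and checks that the resulting matrix reproduces the linear action:

```python
import numpy as np

# Hypothetical images a_i = A e_i of the standard basis vectors (arbitrary choices).
a1 = np.array([1.0, 4.0])
a2 = np.array([2.0, 5.0])
a3 = np.array([3.0, 6.0])

# Writing the images into the columns gives the matrix of the action.
A = np.column_stack([a1, a2, a3])

# For any x, linearity forces A x = x1*a1 + x2*a2 + x3*a3.
x = np.array([2.0, -1.0, 3.0])
assert np.allclose(A @ x, x[0] * a1 + x[1] * a2 + x[2] * a3)

# In particular, A e_i recovers the recorded image a_i.
e1 = np.array([1.0, 0.0, 0.0])
assert np.allclose(A @ e1, a1)
```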

In Question 04-04 we proved that matrix multiplication, as given by the summation-notation formulas in Packet 04, does satisfy linearity:

Derivation of linearity of matrix multiplication

(A(λ1𝐮 + λ2𝐯))i = ∑_{j=1}^{n} aij(λ1uj + λ2vj)
= ∑_{j=1}^{n} (aijλ1uj + aijλ2vj)
= ∑_{j=1}^{n} (λ1(aijuj) + λ2(aijvj))
= λ1 ∑_{j=1}^{n} aijuj + λ2 ∑_{j=1}^{n} aijvj
= λ1(A𝐮)i + λ2(A𝐯)i
= (λ1(A𝐮) + λ2(A𝐯))i.

This sequence shows that the ith entry of the vector A(λ1𝐮+λ2𝐯) agrees with the ith entry of the vector λ1(A𝐮)+λ2(A𝐯). Since this is true for every entry i, the vectors must be equal.

Question 11-01

Finding a matrix using linearity

What is the 2×2 matrix that sends 2𝐞1 − 2𝐞2 to (4, 2) and 3𝐞2 to (3, 6)?

(Hint: first determine where 𝐞1 and 𝐞2 are sent using linearity. Then: the matrix is given by using these images as column vectors 𝐚1 and 𝐚2.)

Matrix times matrix We already have a definition of matrix multiplication as the “composition of matrix action,” namely that (AB)𝐱=A(B𝐱) for all vectors 𝐱.

Since A acts linearly and B acts linearly, the above definition implies that AB acts linearly:

(AB)(λ1𝐮 + λ2𝐯) = A(B(λ1𝐮 + λ2𝐯))
= A(λ1(B𝐮) + λ2(B𝐯))
= λ1(A(B𝐮)) + λ2(A(B𝐯))
= λ1((AB)𝐮) + λ2((AB)𝐯).

Therefore, by our previous reasoning, we can represent the action of AB using a matrix where the ith column of the matrix of AB is the image of 𝐞i under this composition action:

(AB)i = (AB)𝐞i = A(B𝐞i) = A𝐛i, where (AB)i denotes the ith column of AB and 𝐛i denotes the ith column of B.

In other words, the columns of AB must be the images of the columns of B under the action of A.
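This column description of the product can be verified directly. A short sketch (Python/NumPy, with randomly generated matrices as an arbitrary illustration):

```python
import numpy as np

# Arbitrary compatible matrices, generated for illustration.
rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(3, 4)).astype(float)
B = rng.integers(-5, 5, size=(4, 2)).astype(float)

AB = A @ B

# Column i of AB is the image under A of column i of B.
for i in range(B.shape[1]):
    assert np.allclose(AB[:, i], A @ B[:, i])
```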

Linearity is fundamental

In summary: the idea of matrix product, and the formula for matrix product, come from the prescriptive hypothesis of linearity of matrix action. We can do many things in linear algebra by thinking and working in terms of linearity, instead of in terms of matrix formulas for actions and multiplications.

Question 11-02

Linearity and quadratic forms

Consider the function f: ℝ² → ℝ given by f(x, y) = xy. Is this function linear? If not, is it linear in each variable separately?

Question 11-03

Dual vectors

Suppose 𝐯 ∈ ℝⁿ is some vector. Define a function f𝐯: ℝⁿ → ℝ by the formula f𝐯(𝐱) = 𝐯 ⋅ 𝐱. Is this function linear? If so, find a 1×n matrix that represents the action of f𝐯. This matrix is often called the dual vector of 𝐯.

Exercise 11-01

Projection and inclusion

Suppose p: ℝ⁴ → ℝ² and i: ℝ² → ℝ⁴ are two linear mappings given by the formulas:

p: (u1, u2, u3, u4) ↦ (u3, u4),  i: (v1, v2) ↦ (v1, v2, 0, 0).
  • (a) Check that p and i (given by these formulas) are linear mappings.
  • (b) Compute the matrices that correspond to p and i.
  • (c) Compute the matrix of the composition i∘p by evaluating the effect of this composite function upon the standard basis vectors 𝐞1, …, 𝐞4.
  • (d) Compute the same matrix by multiplying the matrix of i and the matrix of p.
  • (e) Repeat (c) and (d) for p∘i. (This time you only need 𝐞1, 𝐞2.)

Change of basis

The previous section encourages us to think of linearity as fundamental, and the matrix formulas as secondary, derivable from linearity.

To be specific, the array of numbers that we have taken for granted as the “matrix columns” can be understood as “really” just the images 𝐚i of the standard basis vectors 𝐞i. When a new vector is given to us in the form x1𝐞1 + ⋯ + xn𝐞n, then the action of A on this vector is calculated using linearity as x1𝐚1 + ⋯ + xn𝐚n. By writing these 𝐚i into the columns of a matrix, and putting the coefficients (x1, …, xn) into the entries of a vector, we arrive at the usual matrix-on-vector multiplication formula.

Most importantly: we do not need to know what the vectors 𝐞i are to make sense of the rows and columns of the matrix A, provided we insist that the columns 𝐚i are supposed to give the images of 𝐞i.

This fact allows us to generalize the idea of a matrix to the idea of a matrix in a basis. The coefficient columns of a matrix A in a basis ℬ′ are simply the vector images of the members of the basis ℬ′. Images of other vectors are calculated by writing them in terms of the basis and then applying matrix-on-vector multiplication by A.

Suppose we are given a specific basis ℬ′ = {𝐟1, …, 𝐟n} for the space ℝⁿ that may not be the standard basis. Recall what this means: (a) that ℬ′ is independent, and (b) that the span of ℬ′ is all of ℝⁿ. (Note: we use the prime notation because we would write ℬ = {𝐞1, …, 𝐞n}, with no prime, for the standard basis.)

Using (a) and (b), every vector 𝐱 can be written with unique coefficients as a linear combination:

𝐱 = ν1𝐟1 + ⋯ + νn𝐟n.

Now suppose we have some image vectors 𝐠1, …, 𝐠n, meaning that A: 𝐟i ↦ 𝐠i for all i, and let us define the matrix Aℬ′ using this knowledge. By linearity we calculate:

A(𝐱) = ν1A(𝐟1) + ⋯ + νnA(𝐟n) = ν1𝐠1 + ⋯ + νn𝐠n.

This is just like the formulation of the action of A on 𝐱 using the basis vectors 𝐞i and their images 𝐚i under A. So, we define the matrix Aℬ′ as having column vectors 𝐠i. This matrix is given in the basis ℬ′ in the sense that when a given vector 𝐱 is written with coefficients ν1, …, νn, the action of A on 𝐱 is computed using the usual matrix-on-vector multiplication formula, but with the columns of Aℬ′ being the vectors 𝐠1, …, 𝐠n and the coefficient vector (ν1, …, νn) standing in for 𝐱.

It can be helpful, when writing the array (ν1, …, νn) of coefficients of 𝐱, to use the notation [𝐱]ℬ′. This notation refers to the array of numbers ν1, …, νn that are used to write 𝐱 as a linear combination ν1𝐟1 + ⋯ + νn𝐟n of the vectors 𝐟i that constitute the basis ℬ′. The brackets suggest the specific array in some non-standard basis, and the subscript ℬ′ specifies that basis.

Note: some writers also use brackets for matrices in a specified basis, for example writing [A]ℬ′. Going even further, some authors distinguish the matrix itself (which presupposes a basis) from the underlying linear transformation that determines it (using the presupposed basis to interpret its coefficients). In our notation, we only put brackets on vectors. The symbols A and Aℬ′ stand for distinct concrete matrices, but the visual relation between them serves as a reminder that Aℬ′ performs the same action as A when vector components are rewritten using the basis ℬ′.

Example

Matrix written in ℬ′ acting on vector written in ℬ′

Problem: The set of vectors

ℬ′ = {𝐟1 = (1, 1), 𝐟2 = (−1, 1)}

is a basis of ℝ². (These vectors are independent and there are two of them, so they span all of ℝ².)

Consider the matrix A = (2 2; 0 1) and the vector 𝐱 = (−2, 4). Compute the array [𝐱]ℬ′ that represents 𝐱 in the new basis ℬ′, and then compute Aℬ′. Calculate the image A𝐱 and compare this to the image Aℬ′[𝐱]ℬ′. They should agree!

Solution: First we find [𝐱]ℬ′. We must solve the system

ν1(1, 1) + ν2(−1, 1) = (−2, 4),  so:  ν1 − ν2 = −2,  ν1 + ν2 = 4.

The solution is ν1 = 1, ν2 = 3. Therefore [𝐱]ℬ′ = (1, 3).

Next we seek Aℬ′. The columns of Aℬ′ are given by the images under A of the basis vectors 𝐟1 and 𝐟2. So we compute:

A𝐟1 = (2 2; 0 1)(1, 1) = (4, 1),  A𝐟2 = (2 2; 0 1)(−1, 1) = (0, 1).

Therefore we have Aℬ′ = (4 0; 1 1). This matrix sends [𝐟1]ℬ′ = (1, 0) to the vector (4, 1) and it sends [𝐟2]ℬ′ = (0, 1) to the vector (0, 1).

Finally, observe on the one hand that

Aℬ′[𝐱]ℬ′ = (4 0; 1 1)(1, 3) = (4, 4),

while on the other hand

A𝐱 = (2 2; 0 1)(−2, 4) = (−4+8, 0+4) = (4, 4).

Therefore Aℬ′[𝐱]ℬ′ = A𝐱 as the problem statement had anticipated.

In this example, the output Aℬ′[𝐱]ℬ′ = (4, 4) was written as a vector in the standard basis. This means it equals 4𝐞1 + 4𝐞2. However, if we are given the input 𝐱 in terms of the new basis as 𝐟1 + 3𝐟2, then we may wish to represent the output also in terms of this basis; in other words we may want [A𝐱]ℬ′. In a sense, the matrix Aℬ′ has been adjusted to use the basis ℬ′ on the input side only, but we may want a matrix that uses the basis ℬ′ on the output side as well.

Such a matrix would be denoted by ℬ′Aℬ′. This matrix operates on vectors [𝐱]ℬ′ that are given in terms of ℬ′ and it produces vectors [𝐲]ℬ′ that are also given in terms of ℬ′. To study ℬ′Aℬ′ more effectively we introduce a new concept, the matrix of a change of basis.

Matrix of change of basis

Consider the problem of finding ν1, …, νn, the coefficients such that 𝐱 = ν1𝐟1 + ⋯ + νn𝐟n for some given 𝐱. Let us write ℬTℬ′ = (𝐟1 𝐟2 ⋯ 𝐟n) for the matrix with column vectors 𝐟1, …, 𝐟n, and call it the “transfer matrix” from basis ℬ′, or T for short.

Notice specifically that T accepts the input vector (ν1, …, νn) and (using matrix-on-vector multiplication) returns the output vector 𝐱 = T(ν1, …, νn) given in the standard basis. Therefore, in order to find (ν1, …, νn), we simply need to invert T and multiply its inverse by 𝐱:

(ν1, …, νn) = T⁻¹(x1, …, xn).

Now, because T⁻¹ changes a vector from basis ℬ into a vector in basis ℬ′, it is also fair to write:

ℬ′Tℬ = T⁻¹.

Using transfer or change of basis matrices, we can compute Aℬ′ just by taking a matrix product:

Aℬ′ = AT.

The operation T converts vectors from basis ℬ′ into vectors in basis ℬ, and then we multiply those vectors by A. The result is the same as first computing the matrix of A in the basis ℬ′, obtaining Aℬ′, and acting by that matrix.

Finally, if we wish to put the outputs of Aℬ′ back into the basis ℬ′, we just apply another transfer matrix:

ℬ′Aℬ′ = T⁻¹Aℬ′ = T⁻¹AT.
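The whole change-of-basis recipe can be verified numerically. The sketch below (Python/NumPy) uses an arbitrary matrix and an arbitrary non-standard basis, not the packet's example; it checks that the conjugated matrix acting on coordinates agrees with the original matrix acting on vectors:

```python
import numpy as np

# A hypothetical 2x2 matrix and a non-standard basis f1, f2 (arbitrary choices).
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
f1 = np.array([1.0, 1.0])
f2 = np.array([-1.0, 2.0])
T = np.column_stack([f1, f2])      # transfer matrix: B'-coordinates -> standard basis

A_bb = np.linalg.inv(T) @ A @ T    # the matrix written entirely in the basis B'

# Pick a vector by its B'-coordinates nu, so x = nu1*f1 + nu2*f2.
nu = np.array([2.0, -1.0])
x = T @ nu

# Acting on B'-coordinates with A_bb must agree with acting on x by A,
# once the output A x is also converted back into B'-coordinates.
assert np.allclose(A_bb @ nu, np.linalg.inv(T) @ (A @ x))
```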

Example

Finding ℬ′Aℬ′ using ‘change of basis’ transfer matrices

Problem: Find the transfer matrix of the example in the previous section and use it to calculate ℬ′Aℬ′.

Solution: We know that

T = (𝐟1 𝐟2) = (1 −1; 1 1).

Notice that the matrix Aℬ′ found in the example is quickly computed as the matrix product:

Aℬ′ = AT = (2 2; 0 1)(1 −1; 1 1) = (4 0; 1 1).

To find the matrix ℬ′Aℬ′, we compute the inverse of T using the usual 2×2 formula for inverses:

ℬ′Tℬ = T⁻¹ = (1 −1; 1 1)⁻¹ = (1/2 1/2; −1/2 1/2).

Then we can calculate ℬ′Aℬ′ by taking the product:

ℬ′Aℬ′ = T⁻¹Aℬ′ = (1/2 1/2; −1/2 1/2)(4 0; 1 1) = (5/2 1/2; −3/2 1/2).

Example

Directly computing change of basis matrix using row reduction

Consider the following two bases:

ℬ = {𝐛1 = (7, 5), 𝐛2 = (3, 1)},  𝒞 = {𝐜1 = (−1, 5), 𝐜2 = (2, −2)}.

Problem: Find the transfer matrices ℬT𝒞 and 𝒞Tℬ representing change of basis.

Solution: We only need to find one, since the other will be determined as the inverse of the first. Let us start with ℬT𝒞.

Let B = (𝐛1 𝐛2) and C = (𝐜1 𝐜2). Now suppose we have found a matrix X such that BX = C. Then X is in fact the transfer matrix ℬT𝒞. Why? Observe that by definition, ℬT𝒞(1, 0) = [𝐜1]ℬ = (x, y) where x𝐛1 + y𝐛2 = 𝐜1, and similarly ℬT𝒞(0, 1) = [𝐜2]ℬ = (z, w) where z𝐛1 + w𝐛2 = 𝐜2. But these two equations precisely combine to the matrix equation BX = C, where X = (x z; y w).

So, we solve the equation BX = C using row reduction on the augmented matrix (B | C):

(B | C) = (7 3 | −1 2; 5 1 | 5 −2) → (7 3 | −1 2; 0 −8/7 | 40/7 −24/7) → (1 3/7 | −1/7 2/7; 0 1 | −5 3) → (1 0 | 2 −1; 0 1 | −5 3).

It follows that ℬT𝒞 = X = (2 −1; −5 3). By computing the inverse 𝒞Tℬ = (ℬT𝒞)⁻¹ using the 2×2 formula, we have 𝒞Tℬ = (3 1; 5 2).
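The equation BX = C can also be solved numerically instead of by hand row reduction. A sketch in Python/NumPy, using an arbitrary pair of bases of ℝ² for illustration:

```python
import numpy as np

# Columns are the basis vectors (an arbitrary pair of bases of R^2).
B = np.array([[7.0, 3.0],
              [5.0, 1.0]])
C = np.array([[-1.0, 2.0],
              [ 5.0, -2.0]])

# Solving B X = C gives the transfer matrix from C-coordinates to B-coordinates.
X = np.linalg.solve(B, C)
assert np.allclose(B @ X, C)

# Coordinate-conversion property: if y holds the C-coordinates of a vector,
# then X @ y holds its B-coordinates (both describe the same standard-basis vector).
y = np.array([4.0, -3.0])
assert np.allclose(B @ (X @ y), C @ y)

# The reverse transfer matrix is simply the inverse of X.
assert np.allclose(np.linalg.inv(X) @ X, np.eye(2))
```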

Exercise 11-02

Change of basis using row reduction

Find the change of basis transfer matrices between the following two bases:

ℬ = {𝐛1 = (6, 1), 𝐛2 = (2, 0)},  𝒞 = {𝐜1 = (2, 1), 𝐜2 = (6, 2)}.
Exercise 11-03

Changing basis without knowing vectors

Assume that ℬ = {𝐛1, 𝐛2, 𝐛3} and 𝒞 = {𝐜1, 𝐜2, 𝐜3} are bases of ℝ³, and that

𝐛1 = 4𝐜1 − 𝐜2
𝐛2 = 𝐜1 + 𝐜2 + 𝐜3
𝐛3 = 𝐜2 − 2𝐜3.

Find the change of basis transfer matrix 𝒞Tℬ, and also compute [𝐱]𝒞 for the vector 𝐱 = 3𝐛1 + 4𝐛2 + 𝐛3.

Image, kernel, transpose, rank

Definitions Suppose that A: ℝⁿ → ℝᵐ is a matrix giving a linear transformation. (Notice that we don’t assume n = m.) In this section we introduce three important subspaces that are derived from A. These subspaces may be referred to and used every time we have a matrix A representing a linear transformation.

  • The image of A, written Im(A) and also called the column space or range, is the span of the columns of A, which is equivalent to the set of all possible outputs 𝐲 = A𝐱 for 𝐱 ∈ ℝⁿ.
  • The kernel of A, written Ker(A) and also called the null space (written Nul(A)), is the set of inputs that A sends to 𝟎, i.e. the set of 𝐱 ∈ ℝⁿ such that A𝐱 = 𝟎.
  • The co-kernel of A, written CoKer(A) and also called the row space of A, is the span of the row vectors of A, which is equivalent to the image of the transpose A𝖳. It is also equivalent to the orthogonal complement of the kernel Ker(A).

For the last one, we need to define the transpose of any matrix A, written as A𝖳, as the matrix which reflects A across the main diagonal, swapping the roles of rows and columns.

Be aware that if 𝐯 is an ordinary column vector, then its transpose 𝐯𝖳 is a row vector. Another way to view this: every column vector 𝐯 ∈ ℝⁿ is actually an n×1 matrix, while its transpose row vector 𝐯𝖳 is a 1×n matrix.
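The shape bookkeeping is easy to check in code. A small sketch in Python/NumPy, where the column vector is stored explicitly as an n×1 matrix (the numbers are arbitrary):

```python
import numpy as np

# A column vector in R^3, written explicitly as a 3x1 matrix.
v = np.array([[1.0], [2.0], [3.0]])
assert v.shape == (3, 1)

# Its transpose is a 1x3 row vector.
assert v.T.shape == (1, 3)

# The product v^T x of a 1x3 row with a 3x1 column is a 1x1 matrix: the dot product.
x = np.array([[4.0], [5.0], [6.0]])
assert (v.T @ x).shape == (1, 1)
assert (v.T @ x)[0, 0] == 1*4 + 2*5 + 3*6
```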

Exercise 11-04

Transpose has ‘reversing property’

Show that (AB)𝖳=B𝖳A𝖳.

Hint: You should use the summation notation for matrix products: (AB)ij = ∑_{k=1}^{n} aik bkj.

More about co-kernels The meaning of image and kernel is clear from these definitions, but the meaning of co-kernel is less so.

If A is an m×n matrix with entries (aij), then the row vectors of A are just its rows (ai1, …, ain) for each i = 1, …, m. These are 1×n matrices. We will use the notation 𝐚1, …, 𝐚m for the row vectors.

By taking transposes, we can convert these row vectors into ordinary vectors 𝐚i𝖳. Notice that the ith row vector of A equals the ith column vector of the transpose A𝖳.

Now consider the connection between row vectors and dot products. A row vector 𝐚i, being a 1×n matrix, acts upon an n×1 column vector and returns a 1×1 matrix, which is to say a scalar. This action is given by the formula:

𝐚i𝐱 = (ai1 ⋯ ain)(x1; …; xn) = ai1x1 + ⋯ + ainxn.

This formula is the same as taking the dot product 𝐚i𝖳 ⋅ 𝐱.

By generalizing this to every row of the matrix A, and recalling that the multiplication A𝐱 gives a vector whose ith entry is the ith row of A applied to 𝐱, we obtain the general fact that A𝐱 = 𝟎 if and only if 𝐱 is perpendicular to each of the row vectors transposed into ordinary vectors 𝐚i𝖳:

A𝐱 = 𝟎  ⟺  𝐚i𝖳 ⋅ 𝐱 = 0 for all i = 1, …, m.
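This equivalence is easy to confirm numerically. A sketch in Python/NumPy, using a hypothetical matrix with a nontrivial kernel (arbitrary choice):

```python
import numpy as np

# A hypothetical matrix with a nontrivial kernel (row 2 = 2 * row 1).
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])

# The vector x below is in Ker(A): A x = 0.
x = np.array([3.0, 0.0, -1.0])
assert np.allclose(A @ x, 0.0)

# Equivalently, x is perpendicular to every row of A.
for row in A:
    assert np.isclose(row @ x, 0.0)
```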

Co-kernel is the orthogonal complement of the kernel

The co-kernel of A is precisely the subspace of (transposes of) vectors in ℝⁿ which are perpendicular to the kernel of A.

Derivation

We know that every 𝐱Ker(A) is perpendicular to every (transposed) row vector of A. This implies that 𝐱 is perpendicular to every vector in the span of the (transposed) row vectors of A. In other words, every 𝐱Ker(A) is perpendicular to everything in the co-kernel of A.

It remains only to show that if a vector is perpendicular to every 𝐱 ∈ Ker(A), then the vector must be in CoKer(A), which is to say it must be a linear combination of (transposed) row vectors of A.

Suppose that 𝐯 is not a linear combination of the (transposed) row vectors of A. Then (imitating the Gram-Schmidt process) write 𝐯 = 𝐯∥ + 𝐯⊥, where 𝐯∥ = proj_CoKer(A)(𝐯) and 𝐯⊥ = 𝐯 − 𝐯∥. Then 𝐯⊥ is perpendicular to all rows of A and thus it belongs to Ker(A). Furthermore, we know that 𝐯∥ ⋅ 𝐯⊥ = 0. Thus, 𝐯 ⋅ 𝐯⊥ = 𝐯⊥ ⋅ 𝐯⊥ ≠ 0. (We know 𝐯⊥ ≠ 𝟎 because 𝐯 is not in CoKer(A).) In conclusion, our vector 𝐯 is not perpendicular to the specific vector 𝐯⊥ ∈ Ker(A).

Reversing the logic (i.e. taking the contrapositive), we have shown that if a vector is perpendicular to every 𝐱Ker(A), then it must lie in CoKer(A).

Co-kernel is image of transpose

Observe that Im(A𝖳) is the same as (the transpose of) CoKer(A):

The image of A𝖳 is another name for the column space of A𝖳, and columns of A𝖳 are the same data as rows of A, just transposed.

Rank The rank of a matrix A is defined to be the dimension of the image of A. This number is the same as the number of independent columns of A. There are two fundamental theorems about the rank of a matrix which relate this number to the dimensions of the other natural subspaces derived from A.

Rank-Rank Theorem

The rank of a matrix A equals the rank of its transpose A𝖳:

rank A = dim Im(A) = dim Im(A𝖳) = rank A𝖳.

Observe that Im(A𝖳) is the (transpose of the) row space of A. So the theorem says that the row space and the column space of A always have the same number of independent vectors.

You can remember the reason for this theorem with the mnemonic:

Pivots give the independent columns and the independent rows.

Rank-Nullity Theorem

For any matrix A: ℝⁿ → ℝᵐ, we have:

dim Im(A) + dim Ker(A) = n.

To remember this theorem, observe that ‘nullity’ refers to the dimension of the null space. To remember the reason for this theorem, we have another mnemonic:

Each of the n columns either contains a pivot or corresponds to a free variable.
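Both theorems can be spot-checked numerically. A sketch in Python/NumPy with an arbitrary 3×5 matrix whose third row is the sum of the first two, so its rank is 2:

```python
import numpy as np

# An arbitrary 3x5 matrix for illustration; row 3 = row 1 + row 2, so rank is 2.
A = np.array([[1.0, 2.0, 0.0, 1.0, 3.0],
              [0.0, 1.0, 1.0, 0.0, 1.0],
              [1.0, 3.0, 1.0, 1.0, 4.0]])

r = np.linalg.matrix_rank(A)
assert r == 2

# Rank-Rank: rank A = rank A^T.
assert np.linalg.matrix_rank(A.T) == r

# Rank-Nullity: dim Im(A) + dim Ker(A) = n (number of columns).
# The kernel dimension is the number of (numerically) zero singular values
# plus the column excess, read off from the SVD.
_, s, _ = np.linalg.svd(A)
nullity = A.shape[1] - np.sum(s > 1e-10)
assert r + nullity == A.shape[1]
```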

The mnemonics for these theorems point the way towards their proofs.

  • For matrices in RREF, both theorems are obvious from the mnemonics. (Check this!)
  • For matrices not in RREF, the key facts are:
    • (i) that row reduction is performed by left-multiplying A by an invertible matrix Q representing a composite sequence of invertible row operations (row-adds, row-scales, row-swaps), and
    • (ii) that the dimension of the image of QA is equal to the dimension of the image of A whenever Q is an invertible matrix.
    • (iii) that the row space of a matrix A is equal to the row space of the matrix QA when Q is a row reduction matrix, as in (ii).

Invertible matrices preserve subspaces

The second fact (ii) is actually much more general: multiplying by an invertible matrix preserves the dimensions of all subspaces.

This fact is not hard to prove, because invertible matrices send independent / dependent vectors to independent / dependent vectors, and thus bases to bases, and thus dimensions to dimensions.

In order to make use of these theorems, you should simply compute the RREF of a matrix and count the numbers of pivots and free variables.
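The RREF computation itself can be delegated to a computer algebra system. A sketch using SymPy's `Matrix.rref`, which returns the reduced form together with the pivot column indices (the matrix here is an arbitrary choice for illustration):

```python
import sympy as sp

# An arbitrary matrix for illustration.
A = sp.Matrix([[1, 2, 0, 1],
               [2, 4, 1, 3],
               [3, 6, 1, 4]])

R, pivots = A.rref()  # reduced row echelon form and the pivot column indices

rank = len(pivots)          # number of pivot columns
free = A.cols - rank        # remaining columns correspond to free variables

# Rank-Rank: the transpose has the same number of pivots.
assert len(A.T.rref()[1]) == rank

# Rank-Nullity: pivot columns + free columns = total columns.
assert rank + free == A.cols
```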

Problems due 3 Apr 2024 by 12:00pm

Problem 11-01

Linearity

Suppose that 𝐮1,,𝐮kn are vectors and that A represents some linear transformation acting on n.

  • (a) Suppose we know that 𝐮1,,𝐮k are independent. Do we automatically know that A𝐮1,,A𝐮k are independent? (Justify your answer. If false, a counterexample suffices; if true, an argument is required.)
  • (b) Suppose we know that 𝐮1,,𝐮k are dependent. Do we automatically know that A𝐮1,,A𝐮k are dependent? (Justify your answer. If false, a counterexample suffices; if true, an argument is required.)
  • (c) If ℓ(t) parametrizes a line passing through the origin, do we automatically know that Aℓ also parametrizes a line passing through the origin? What if we know that A is invertible?
  • (d) If ℓ(t) parametrizes a line not passing through the origin, do we automatically know that Aℓ also parametrizes a line not passing through the origin? What if we know that A is invertible?
Problem 11-02

Changing bases

  • (a) Suppose that ℬ = {𝐛1, 𝐛2, 𝐛3} and 𝒞 = {𝐜1, 𝐜2, 𝐜3} are bases of V. Suppose that A = ([𝐜1]ℬ [𝐜2]ℬ [𝐜3]ℬ) is the matrix with column vectors equal to 𝐜1, 𝐜2, 𝐜3 when written in the basis ℬ. Which of the following is satisfied by A? (i) A[𝐱]ℬ = [𝐱]𝒞 or (ii) A[𝐱]𝒞 = [𝐱]ℬ. (You must justify your answer.)
  • (b) Find the change of coordinates transfer matrices 𝒞Tℬ and ℬT𝒞 between the following two bases of ℝ²: ℬ = {𝐛1 = (1, 8), 𝐛2 = (1, 5)}, 𝒞 = {𝐜1 = (1, 4), 𝐜2 = (1, 1)}.
  • (c) Assume that ℬ = {𝐛1, 𝐛2, 𝐛3} and 𝒞 = {𝐜1, 𝐜2, 𝐜3} are bases of V, and that

𝐛1 = 2𝐜1 − 𝐜2 + 𝐜3
𝐛2 = 3𝐜2 + 𝐜3
𝐛3 = −3𝐜1 + 2𝐜3.

Find the change of basis transfer matrix 𝒞Tℬ, and also compute [𝐱]𝒞 for the vector 𝐱 = 𝐛1 − 2𝐛2 + 2𝐛3.
Problem 11-03

Rank-Rank and Rank-Nullity counting practice

Answer the following questions using the Rank-Rank and Rank-Nullity theorems. It may help to work with imaginary matrices in RREF and think about the pivots.

  • (a) Suppose A is 3×8 with rank A = 2. What are dim Ker(A) and dim CoKer(A)?
  • (b) Suppose A is 7×3 with rank A = 3. What are dim Ker(A) and dim CoKer(A)?
  • (c) Suppose A is 9×5 with dim Ker(A) = 2. What are rank A and dim CoKer(A)?
Problem 11-04

Bases and dimensions from row reduction

Row reduce the following matrix A in order to find a basis for the row space Im(A𝖳), another basis for the column space Im(A), and a third basis for the kernel Ker(A). By counting basis elements to determine dimensions, verify the two rank theorems from this packet.

A=(258017135153111971171353)

(Hint: for Im(A), you should determine which columns of QA have pivots; the same columns of A will then be independent vectors! This happens because row reduction Q preserves independence / dependence relations among the column vectors of A.)