Packet 07

Vectors III: Orthogonality

Subspaces of โ„n are extremely important. For example, if n represents the number of pixels in an image, a vector ๐ฏโˆˆโ„n might represent an image. The number n is very large, and even computers can take a (relatively) long time to perform operations on vectors in โ„n such as multiplying nร—n matrices to compose linear transformations.

Frequently a given dataset will belong (or 'nearly' belong) to a subspace of ℝⁿ of much smaller dimension. It is very useful to be able to describe and work with vectors in such a subspace in an efficient way. This typically means finding a good basis. A good basis should be easy to work with, and it should have a useful interpretation.

Orthogonal bases

An orthogonal basis {𝐮₁, …, 𝐮ₖ} is simply a basis in which the vectors are pairwise perpendicular, meaning 𝐮ᵢ·𝐮ⱼ = 0 for i ≠ j. It is very easy to tell if a basis is orthogonal: just compute all pairwise dot products. This is a central reason that orthogonal bases are very popular.

If a basis is orthogonal, there is a very fast way to find the components of any given vector in this basis. Suppose 𝐯 ∈ U ⊂ ℝⁿ, meaning 𝐯 is a vector in the subspace U of ℝⁿ. Suppose that U is k-dimensional with orthogonal basis {𝐮₁, …, 𝐮ₖ}. The components of 𝐯 in this basis are by definition the unique coefficients c₁, …, cₖ in the linear combination:

๐ฏ=c1๐ฎ1+โ‹ฏ+ck๐ฎk.

Now, take the dot product of both sides with 𝐮ᵢ, for any given i, using the fact that 𝐮ᵢ·𝐮ⱼ = 0 unless j = i:

๐ฎiโ‹…๐ฏ=c1(๐ฎiโ‹…๐ฎ1)+โ‹ฏ+ck(๐ฎiโ‹…๐ฎk)=0+โ‹ฏ+0+ci(๐ฎiโ‹…๐ฎi)+0+โ‹ฏ+0,

so

ci=๐ฎiโ‹…๐ฏ๐ฎiโ‹…๐ฎi.

Summarizing:

Finding components in an orthogonal basis

Given {๐ฎ1,โ€ฆ,๐ฎk} an orthogonal basis, the components of ๐ฏ=c1๐ฎ1+โ‹ฏ+ck๐ฎk are given by the formulas:

c1=๐ฎ1โ‹…๐ฏ๐ฎ1โ‹…๐ฎ1,c2=๐ฎ2โ‹…๐ฏ๐ฎ2โ‹…๐ฎ2,โ€ฆck=๐ฎkโ‹…๐ฏ๐ฎkโ‹…๐ฎk.

If {๐ฎ1,โ€ฆ,๐ฎk} is not only orthogonal but also orthonormal, then the components are even simpler:

c1=๐ฎ1โ‹…๐ฏ,c2=๐ฎ2โ‹…๐ฏ,โ€ฆck=๐ฎkโ‹…๐ฏ.

The fact that components are so easy to calculate for orthogonal bases is one of their most important features, and frequently is the main reason they are used.
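The component formulas translate directly into code. Here is a minimal sketch in plain Python (the helper names `dot` and `components`, and the example vectors, are illustrative choices, not from the packet):

```python
def dot(a, b):
    """Dot product of two vectors given as lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

def components(basis, v):
    """Components of v in an orthogonal basis: c_i = (u_i . v) / (u_i . u_i)."""
    return [dot(u, v) / dot(u, u) for u in basis]

# Example: an orthogonal basis of R^2 (chosen for illustration).
u1, u2 = [1, 1], [1, -1]
v = [3, 5]
c1, c2 = components([u1, u2], v)

# Reconstructing v from its components: c1*u1 + c2*u2 should equal v.
rebuilt = [c1 * a + c2 * b for a, b in zip(u1, u2)]
```

Note that no linear system is solved: each coefficient comes from two dot products, which is exactly the speed advantage the text describes.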

Question 07-01

Orthogonal vectors, finding components

Show that the set of vectors is orthogonal:

𝐮₁ = (1, 1, 0),  𝐮₂ = (−1, 1, 2),  𝐮₃ = (2, −2, 2),

and then compute the components of (1, 1, 1) in the basis {𝐮₁, 𝐮₂, 𝐮₃}.

The above method can be written out succinctly in what is sometimes called the Fourier expansion formula:

𝐯 = ∑ᵢ₌₁ᵏ ((𝐮ᵢ·𝐯)/(𝐮ᵢ·𝐮ᵢ)) 𝐮ᵢ.

A simple observation shows another advantage of orthogonal sets of vectors – that they are automatically independent:

Orthogonal vectors are automatically independent

Suppose that {๐ฎ1,โ€ฆ,๐ฎk} is an orthogonal set.

Assume hypothetically that x₁𝐮₁ + ⋯ + xₖ𝐮ₖ = 𝟎.

By dotting both sides with any 𝐮ᵢ, we see that xᵢ(𝐮ᵢ·𝐮ᵢ) = 0, so we must have xᵢ = 0 (because 𝐮ᵢ·𝐮ᵢ ≠ 0, since we assume 𝐮ᵢ ≠ 𝟎). This can be done for any i, so in fact xᵢ = 0 for all i.

Therefore {๐ฎ1,โ€ฆ,๐ฎk} is an independent set of vectors.

Projection

Projection and orthogonal bases

Recall the projection formula:

proj๐ฎi(๐ฏ)=(๐ฏโ‹…๐ฎi๐ฎiโ‹…๐ฎi)๐ฎi.

So cᵢ is just the scalar coefficient of the projection of 𝐯 onto 𝐮ᵢ (sometimes called the 'scalar projection').

Using this fact, we could equally decompose a vector according to an orthogonal basis as a sum of projections onto the basis elements:

๐ฏ=proj๐ฎ1(๐ฏ)+โ‹ฏ+proj๐ฎk(๐ฏ).

Projection to a subspace

The projection formula for proj_𝐮(𝐯) provides the closest vector to 𝐯 that is on the line spanned by 𝐮, and the difference 𝐯 − proj_𝐮(𝐯) is perpendicular to 𝐮. Furthermore, proj_𝐮(𝐯) does not depend on the length of 𝐮 at all; in fact it only depends on the subspace spanned by 𝐮, which is a line. Therefore, proj_𝐮 could really be called proj_L, where L = ⟨𝐮⟩ is the subspace spanned by 𝐮.

It is extremely useful to be able to compute the projection of a vector 𝐯 onto any subspace U ⊂ ℝⁿ. This result should be the unique vector 𝐰 inside U which is closest to 𝐯. It is also the unique vector 𝐰 such that the displacement 𝐯 − 𝐰 is perpendicular to U (meaning: perpendicular to every vector in U).

Now suppose that we have an orthogonal basis 𝐮₁, …, 𝐮ₖ of U. We can define the projection of 𝐯 ∈ ℝⁿ onto the subspace U in terms of this basis:

Projection of ๐ฏ to a subspace UโŠ‚โ„n

projU(๐ฏ)=proj๐ฎ1(๐ฏ)+โ‹ฏ+proj๐ฎk(๐ฏ).
  • Notice that proj_U(𝐯) is a linear combination of basis vectors. (Consider the separate formulas for proj_𝐮ᵢ(𝐯).) Therefore it lies inside the subspace U. We might have proj_U(𝐯) ≠ 𝐯 though, because here we do not assume that 𝐯 ∈ U.

  • In addition, using this formula we can deduce that the displacement vector 𝐱 = 𝐯 − proj_U(𝐯) is perpendicular to every vector in U. Consider the dot product against any basis vector 𝐮ᵢ:

๐ฎiโ‹…๐ฑ=๐ฎiโ‹…๐ฏโˆ’๐ฎiโ‹…projU(๐ฏ)=๐ฎiโ‹…๐ฏโˆ’(๐ฎiโ‹…(๐ฏโ‹…๐ฎ1๐ฎ1โ‹…๐ฎ1)๐ฎ1+โ‹ฏ+๐ฎiโ‹…(๐ฏโ‹…๐ฎk๐ฎkโ‹…๐ฎk)๐ฎk)=๐ฎiโ‹…๐ฏโˆ’((๐ฏโ‹…๐ฎ1๐ฎ1โ‹…๐ฎ1)(๐ฎiโ‹…๐ฎ1)+โ‹ฏ+(๐ฏโ‹…๐ฎk๐ฎkโ‹…๐ฎk)(๐ฎiโ‹…๐ฎk))=๐ฎiโ‹…๐ฏโˆ’(0+โ‹ฏ+0+(๐ฏโ‹…๐ฎi๐ฎiโ‹…๐ฎi)(๐ฎiโ‹…๐ฎi)+0+โ‹ฏ+0)=๐ฎiโ‹…๐ฏโˆ’๐ฏโ‹…๐ฎi(๐ฎiโ‹…๐ฎi)(๐ฎiโ‹…๐ฎi)=๐ฎiโ‹…๐ฏโˆ’๐ฏโ‹…๐ฎi=0.

So ๐ฎiโ‹…๐ฑ=0 for all i. Any element of U can be written as a linear combination of the basis vectors ๐ฎi, so ๐ฑ is perpendicular to any vector in U.

Question 07-02

Perpendicularity to a subspace

Explain and justify in more detail the last statement: "𝐮ᵢ·𝐱 = 0 for all i" implies "𝐮·𝐱 = 0 for all 𝐮 ∈ U."

To summarize, orthogonal bases allow us to:

  • Easily calculate the components of any vector in the orthogonal basis
  • Formulate the projection of any vector onto a subspace using an orthogonal basis of the subspace

Gram-Schmidt process

In this section, we learn a procedure to create an orthogonal basis of a given subspace, starting only with some vectors that are known to generate the subspace as their span. If the original vectors are not independent, we will discover that fact along the way, and possibly wind up with a smaller number of vectors.

Let us illustrate the process with a span of three vectors V = ⟨𝐯₁, 𝐯₂, 𝐯₃⟩ ⊂ ℝⁿ. We do not assume they are independent.

  • Write ๐ฐ1=๐ฏ1.
  • Exchange ๐ฏ2 for ๐ฐ2=๐ฏ2โˆ’proj๐ฐ1(๐ฏ2). Since ๐ฐ2 is a linear combination of ๐ฐ1 and ๐ฏ2, and ๐ฏ2 is a linear combination of ๐ฐ1 and ๐ฐ2, we know โŸจ๐ฏ1,๐ฏ2,๐ฏ3โŸฉ=โŸจ๐ฐ1,๐ฐ2,๐ฏ3โŸฉ. Furthermore, ๐ฐ1 and ๐ฐ2 are orthogonal.
  • Now exchange ๐ฏ3 for ๐ฐ3=๐ฏ3โˆ’projV2(๐ฏ3), where V2=โŸจ๐ฐ1,๐ฐ2โŸฉ. Since ๐ฐ3 is a linear combination of ๐ฏ3 and items in V2, we know โŸจ๐ฏ1,๐ฏ2,๐ฏ3โŸฉ=โŸจ๐ฐ1,๐ฐ2,๐ฐ3โŸฉ, and ๐ฐ3 is orthogonal to everything in V2.

We conclude with V = ⟨𝐰₁, 𝐰₂, 𝐰₃⟩, where 𝐰ᵢ·𝐰ⱼ = 0 unless i = j. So these vectors are orthogonal.

Question 07-03

Dependence?

What happens in the above Gram-Schmidt process if {𝐯₁, 𝐯₂, 𝐯₃} is a dependent set?

Example

Gram-Schmidt illustration for 3 vectors

Problem: Use the Gram-Schmidt process to find an orthogonal basis for the following span:

𝐯₁ = (1, 1, 0, 0),  𝐯₂ = (0, 1, 1, 0),  𝐯₃ = (1, 0, 1, 1).

Solution: We first define 𝐰₁ = 𝐯₁. Then compute proj_𝐰₁(𝐯₂):

proj๐ฐ1(๐ฏ2)=๐ฏ2โ‹…๐ฐ1๐ฐ1โ‹…๐ฐ1๐ฐ1=12๐ฐ1,

and therefore ๐ฐ2=๐ฏ2โˆ’12๐ฐ1=(โˆ’1/21/210). At this point we have โŸจ๐ฏ1,๐ฏ2,๐ฏ3โŸฉ=โŸจ๐ฐ1,๐ฐ2,๐ฏ3โŸฉ. Since we are just trying to preserve the span, it is convenient to change the definition of ๐ฐ2 to (โˆ’1120) by multiplying it by 2. This wonโ€™t change the span, and we eliminate the fractions.

Finally compute proj_V₂(𝐯₃). Since V₂ = ⟨𝐰₁, 𝐰₂⟩, and this is actually an orthogonal basis for V₂, we can use the formula that splits the projection into components according to the orthogonal basis:

projV2(๐ฏ3)=proj๐ฐ1(๐ฏ3)+proj๐ฐ2(๐ฏ3)=๐ฏ3โ‹…๐ฐ1๐ฐ1โ‹…๐ฐ1๐ฐ1+๐ฏ3โ‹…๐ฐ2๐ฐ2โ‹…๐ฐ2๐ฐ2=12๐ฐ1+16๐ฐ2=(1/32/31/30).

Therefore ๐ฐ3=(2/3โˆ’2/32/31), and we can scale this up as with ๐ฐ2 and use instead (2โˆ’223). So we have an orthogonal basis:

⟨(1, 1, 0, 0), (−1, 1, 2, 0), (2, −2, 2, 3)⟩.
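As a quick sanity check, the three vectors produced in this example can be confirmed pairwise orthogonal mechanically (plain Python; the names are illustrative):

```python
def dot(a, b):
    """Dot product of two vectors given as lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

# The three vectors produced by Gram-Schmidt in the example above.
w1 = [1, 1, 0, 0]
w2 = [-1, 1, 2, 0]
w3 = [2, -2, 2, 3]

# All three pairwise dot products should vanish.
pairwise = [dot(w1, w2), dot(w1, w3), dot(w2, w3)]
```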
Exercise 07-01

Gram-Schmidt practice

Perform Gram-Schmidt to find an orthogonal basis of the subspace:

⟨(2, 0, 1, 0), (0, 1, 1, 1), (2, 2, 3, 2)⟩.

General Gram-Schmidt process

The Gram-Schmidt process can be continued in the same way for more than three vectors. Here is the general rule, in the abstract, for 𝐯₁, …, 𝐯ₖ:

  • Set ๐ฐ1=๐ฏ1.
  • Write Vj=โŸจ๐ฏ1,โ€ฆ,๐ฏjโŸฉ.
  • Set ๐ฐj+1=๐ฏj+1โˆ’projVj(๐ฏj+1).
    • First step: use j=1,j+1=2.
    • We know that โŸจ๐ฐ1,โ€ฆ,๐ฐjโŸฉ=โŸจ๐ฏ1,โ€ฆ,๐ฏjโŸฉ by considering each formula ๐ฏj+1โˆ’projVj(๐ฏj+1).
    • We know that ๐ฐiโ‹…๐ฐj=0 when iโ‰ j (orthogonality of the ๐ฐi we are building).
    • Therefore we compute using: projVj(๐ฏj+1)=proj๐ฐ1(๐ฏj+1)+โ‹ฏ+proj๐ฐj(๐ฏj+1).
  • Repeat with the next step ๐ฐj+2.
  • Produce a new list of ๐ฐi vectors with โŸจ๐ฐ1,โ€ฆ,๐ฐkโŸฉ=โŸจ๐ฏ1,โ€ฆ,๐ฏkโŸฉ.
    • If the original vectors ๐ฏ1,โ€ฆ,๐ฏk were independent, then we never have ๐ฐj+1=0 for any j.
    • If there is some redundancy in ๐ฏ1,โ€ฆ,๐ฏk, then at some point projVj(๐ฏj+1)=๐ฏj+1 and ๐ฐj+1=0.
    • We can detect this and throw out any ๐ฐj=0 from the final list to obtain truly independent vectors ๐ฐ1โ€ฒ,โ€ฆ,๐ฐโ„“โ€ฒ, none of which is zero.
    • The dimension of the space โŸจ๐ฏ1,โ€ฆ,๐ฏkโŸฉ is then โ„“, then number of nonzero items ๐ฐj we obtained.
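The general procedure above can be sketched in plain Python. This is a possible implementation, with illustrative names and a small numerical tolerance standing in for the exact test 𝐰ⱼ₊₁ = 𝟎; it subtracts the line projections one at a time against the 𝐰's built so far, which is exactly the expansion of proj_Vⱼ given above:

```python
def dot(a, b):
    """Dot product of two vectors given as lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthogonal basis for span(vectors), dropping zero vectors.

    Implements w_{j+1} = v_{j+1} - proj_{V_j}(v_{j+1}), where the projection
    onto V_j is the sum of projections onto the orthogonal w's built so far.
    """
    ws = []
    for v in vectors:
        w = list(map(float, v))
        for u in ws:
            c = dot(w, u) / dot(u, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        if dot(w, w) > tol:      # w = 0 signals a dependent input vector
            ws.append(w)
    return ws

# Dependent example (made up): the third vector is the sum of the first two,
# so Gram-Schmidt should discard it and return only two basis vectors.
basis = gram_schmidt([[1, 1, 0], [1, 0, 1], [2, 1, 1]])
```

The length of the returned list is the dimension ℓ of the span, matching the last bullet point above.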

Application: distance from point to subspace

Many applications of linear algebra involve measuring the distance from a given vector to a subspace that does not contain the vector. Typically the subspace is given as the span of a list of known vectors.

Suppose ๐ฏโˆˆโ„n and U=โŸจ๐ฎ1,โ€ฆ,๐ฎkโŸฉโŠ‚โ„n is a subspace, given as the span of the vectors ๐ฎ1,โ€ฆ,๐ฎk.

Here is a standard solution to the problem of finding the distance from ๐ฏ to U:

  • First use Gram-Schmidt to produce an orthogonal basis {𝐰₁, …, 𝐰ₗ} of U. It's possible that ℓ < k.
  • Now compute the projection proj_U(𝐯).
  • The orthogonal part 𝐯⟂ = 𝐯 − proj_U(𝐯) points from proj_U(𝐯), the closest point in U, straight to 𝐯.
  • |𝐯⟂| is the distance from 𝐯 to U.
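The steps above can be strung together in a plain-Python sketch (the function names and the example are illustrative assumptions, not from the packet):

```python
import math

def dot(a, b):
    """Dot product of two vectors given as lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors, tol=1e-12):
    """Orthogonal basis for span(vectors), dropping zero vectors."""
    ws = []
    for v in vectors:
        w = list(map(float, v))
        for u in ws:
            c = dot(w, u) / dot(u, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        if dot(w, w) > tol:
            ws.append(w)
    return ws

def distance_to_subspace(v, spanning):
    """Distance from v to span(spanning), via an orthogonal basis."""
    ws = gram_schmidt(spanning)                 # step 1: orthogonal basis
    p = [0.0] * len(v)
    for u in ws:                                # step 2: proj_U(v)
        c = dot(v, u) / dot(u, u)
        p = [pi + c * ui for pi, ui in zip(p, u)]
    perp = [vi - pi for vi, pi in zip(v, p)]    # step 3: v_perp = v - proj_U(v)
    return math.sqrt(dot(perp, perp))           # step 4: |v_perp|

# Illustration: the distance from (0, 0, 5) to the xy-plane is 5.
d = distance_to_subspace([0, 0, 5], [[1, 0, 0], [0, 1, 0]])
```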

Example

Computing distance of point to subspace

Problem: Find the distance from the vector 𝐱 = (−1, −3, 2, 1) to the subspace given by the span

U = ⟨(0, 2, 3, −1), (1, −2, 1, −1), (2, 0, 1, 3)⟩.

Solution: Label the vectors in the span 𝐯₁, 𝐯₂, 𝐯₃, in order. Notice that they are already pairwise orthogonal, so we can skip Gram-Schmidt. Then compute proj_U(𝐱):

projU(๐ฑ)=proj๐ฏ1(๐ฑ)+proj๐ฏ2(๐ฑ)+proj๐ฏ3(๐ฑ)=๐ฑโ‹…๐ฏ1๐ฏ1โ‹…๐ฏ1๐ฏ1+๐ฑโ‹…๐ฏ2๐ฏ2โ‹…๐ฏ2๐ฏ2+๐ฑโ‹…๐ฏ3๐ฏ3โ‹…๐ฏ3๐ฏ3=โˆ’114๐ฏ1+67๐ฏ2+314๐ฏ3=(9/7โˆ’13/76/7โˆ’1/7).

Therefore

๐ฑโˆ’projU(๐ฑ)=(โˆ’16/7โˆ’8/78/78/7)

and the length of this vector is √(448/49) = 8/√7 ≈ 3.02, which is our answer.
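The arithmetic in this example can be verified exactly with plain Python, using rational arithmetic so no rounding enters (the variable names are illustrative; the data is from the example above):

```python
from fractions import Fraction

def dot(a, b):
    """Dot product of two vectors given as lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

# The spanning vectors (already pairwise orthogonal) and the point x.
v1 = [0, 2, 3, -1]
v2 = [1, -2, 1, -1]
v3 = [2, 0, 1, 3]
x = [-1, -3, 2, 1]

# Exact projection coefficients (x . v_i) / (v_i . v_i).
coeffs = [Fraction(dot(x, v), dot(v, v)) for v in (v1, v2, v3)]

# proj_U(x), assembled coordinate by coordinate.
p = [sum(c * vi for c, vi in zip(coeffs, col)) for col in zip(v1, v2, v3)]

# Squared distance |x - proj_U(x)|^2, which should equal 448/49.
perp = [xi - pi for xi, pi in zip(x, p)]
dist_sq = dot(perp, perp)
```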

Basic linear regression as projection

Linear regression: projection of the dependent data vector to the subspace of "best fit lines"

Basic linear regression is essentially an application of the above idea. Later we will focus on the details for a more general version; here is a preview summary:

  • Data is given by vectors 𝐱 and 𝐲 matching input to output by the order of components. If there are n data pairs (x, y), then 𝐱, 𝐲 will have n components.
  • The line of best fit should have the form y = mx + b. We seek m and b, as well as the correlation coefficient r², which expresses how close the data 𝐲 is to the best fit line. The closeness is measured by the sum of the squared distances from each component, i.e. from each data pair, to the line.
  • The span โŸจ๐ฑ,๐ŸโŸฉ is all vectors of the form {m๐ฑ+b๐Ÿ|m,bโˆˆโ„}. These vectors encode outputs of the possible fit lines evaluated at the input data ๐ฑ.
  • The best fit line is y = mx + b where m𝐱 + b𝟏 is the closest vector to 𝐲 in the span ⟨𝐱, 𝟏⟩.
    • Convert ⟨𝐱, 𝟏⟩ to the orthogonal basis ⟨𝐱̂, 𝟏⟩. Here 𝐱̂ = 𝐱 − x̄𝟏, where x̄ is the average of the components of 𝐱.
    • Compute 𝐳 = proj_⟨𝐱,𝟏⟩(𝐲) = proj_𝐱̂(𝐲) + proj_𝟏(𝐲).
    • Define m, b′ as the coefficients in proj_𝐱̂(𝐲) = m𝐱̂ and proj_𝟏(𝐲) = b′𝟏.
    • Therefore we have 𝐳 = m𝐱̂ + b′𝟏, so then 𝐳 = m𝐱 + (b′ − mx̄)𝟏, and we have found m and b = b′ − mx̄.
  • The distance |𝐲 − (m𝐱 + b𝟏)| determines the correlation r. The absolute distance is divided by the length of 𝐲̂ = 𝐲 − ȳ𝟏 to get a scale-invariant number. The final formula is: 1 − r² = |𝐲 − (m𝐱 + b𝟏)|² / |𝐲 − ȳ𝟏|².
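The projection recipe for m and b can be sketched in plain Python (the function name and the sample data are illustrative assumptions, not from the packet):

```python
def dot(a, b):
    """Dot product of two vectors given as lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

def best_fit_line(xs, ys):
    """Best fit y = m*x + b via projection onto the span of x_hat and 1."""
    n = len(xs)
    xbar = sum(xs) / n
    xhat = [x - xbar for x in xs]          # x_hat = x - xbar * 1
    m = dot(ys, xhat) / dot(xhat, xhat)    # coefficient of proj onto x_hat
    bprime = sum(ys) / n                   # coefficient of proj onto 1 (= ybar)
    b = bprime - m * xbar                  # shift back: b = b' - m * xbar
    return m, b

# Points lying exactly on y = 2x + 1 should be recovered exactly,
# since y is then already inside the span of x and 1.
m, b = best_fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Note that `dot(ys, xhat)` equals `dot(yhat, xhat)` because 𝐱̂ is perpendicular to 𝟏, which is why the code never needs to form 𝐲̂ explicitly.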
Exercise 07-02

Computing a best fit line

Following the steps above, compute the best fit line (i.e. compute m and b) for the data used in Problem 05-03:

(−3, 4), (1, −1), (0, −1), (4, −3), (−2, 1).

In Problem 05-03, you already found x̄ and ȳ, and implicitly 𝐱̂ and 𝐲̂.

Derivation of correlation coefficient formula

Here we verify that

|๐ฒโˆ’(m๐ฑ+b๐Ÿ)|2|๐ฒโˆ’yโ€พ๐Ÿ|2=1โˆ’(๐ฑ^โ‹…๐ฒ^)2(๐ฑ^โ‹…๐ฑ^)(๐ฒ^โ‹…๐ฒ^).

First, since |𝐲 − ȳ𝟏|² = 𝐲̂·𝐲̂ = |𝐲̂|², it is sufficient (confirm this for yourself!) to show that

|๐ฒโˆ’(m๐ฑ+b๐Ÿ)|2=|๐ฒ^|2โˆ’(๐ฑ^โ‹…๐ฒ^)2|๐ฑ^|2=|๐ฒ^|2โˆ’|proj๐ฑ^(๐ฒ)|2.

Since m and b are chosen to give the closest vector to 𝐲 via projection, we know:

๐ฒโˆ’(m๐ฑ+b๐Ÿ)=๐ฒโˆ’projโŸจ๐ฑ,๐ŸโŸฉ(๐ฒ)=๐ฒโˆ’projโŸจ๐ฑ^,๐ŸโŸฉ(๐ฒ)=๐ฒโˆ’๐ฒโ‹…๐ฑ^๐ฑ^โ‹…๐ฑ^๐ฑ^โˆ’๐ฒโ‹…๐Ÿ๐Ÿโ‹…๐Ÿ๐Ÿ=๐ฒโˆ’yโ€พ๐Ÿโˆ’๐ฒโ‹…๐ฑ^๐ฑ^โ‹…๐ฑ^๐ฑ^=๐ฒ^โˆ’proj๐ฑ^(๐ฒ)=๐ฒ^โˆ’proj๐ฑ^(๐ฒ^),

where we have used the fact that ȳ = (𝐲·𝟏)/(𝟏·𝟏), as well as the fact that 𝐲 = 𝐲̂ + ȳ𝟏 and 𝟏 is perpendicular to 𝐱̂, so the ȳ𝟏 part can be dropped from the final projection. (Confirm this for yourself!)

Finally, we know that proj_𝐱̂(𝐲̂) and 𝐲̂ − proj_𝐱̂(𝐲̂) are perpendicular. (The former lies in the span of 𝐱̂ and the latter is perpendicular to this span.) Therefore

|๐ฒ^|2=|๐ฒ^โˆ’proj๐ฑ^(๐ฒ^)+proj๐ฑ^(๐ฒ^)|2=|๐ฒ^โˆ’proj๐ฑ^(๐ฒ^)|2+|proj๐ฑ^(๐ฒ^)|2,|๐ฒ^โˆ’proj๐ฑ^(๐ฒ^)|2=|๐ฒ^|2โˆ’|proj๐ฑ^(๐ฒ^)|2,

and by plugging in we see |𝐲 − (m𝐱 + b𝟏)|² = |𝐲̂|² − |proj_𝐱̂(𝐲)|².

Problems due Tuesday 12 Mar 2024 by 11:59pm

Problem 07-01

Gram-Schmidt calculation

Perform Gram-Schmidt to find an orthogonal basis for the span of the column vectors of the following matrix:

(−10137−1121−53−6313−316−16−2521−5−7)
Problem 07-02

Distance point to plane

How far is the point 𝐱 = (1, 2, 3, 4) from the span:

⟨(1, −1, 0, 0), (1, 1, 1, 2)⟩?
Problem 07-03

Decomposition into subspace and its complement

Suppose VโŠ‚โ„n is a subspace. Prove that every vector ๐ฑโˆˆโ„n can be written uniquely as a sum:

๐ฑ=๐ฏ+๐ฐ

such that ๐ฏโˆˆV and ๐ฐ is perpendicular to V (i.e. perpendicular to everything in V).

(Hint: the standard way to show an item is unique is to posit two such items with the specified property, and show that they are equal to each other.)

Problem 07-04

Best fit line, nonzero averages

Find the best fit line for these five data points, and compute the correlation coefficient:

(−5, −5), (−4, 3), (−3, 1), (−2, −3), (−1, 4).