Theory 1

Write μ_X = E[X] and μ_Y = E[Y].

Observe that the random variables X - μ_X and Y - μ_Y are “centered at zero,” meaning that E[X - μ_X] = 0 = E[Y - μ_Y].

Covariance

Suppose X and Y are any two random variables on a probability model. The covariance of X and Y measures the typical synchronous deviation of X and Y from their respective means.

The defining formula for the covariance of X and Y is:

Cov[X,Y] = E[(X - μ_X)(Y - μ_Y)]

There is also a shorter formula:

Cov[X,Y] = E[XY] - E[X]E[Y]

To derive the shorter formula, first expand the product (XμX)(YμY) and then apply linearity.
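Carrying out that expansion explicitly, each step uses only linearity of expectation and the fact that μ_X and μ_Y are constants:

```latex
\begin{aligned}
\operatorname{Cov}[X,Y]
&= E[(X-\mu_X)(Y-\mu_Y)] \\
&= E[XY - \mu_Y X - \mu_X Y + \mu_X \mu_Y] \\
&= E[XY] - \mu_Y E[X] - \mu_X E[Y] + \mu_X \mu_Y \\
&= E[XY] - \mu_X \mu_Y - \mu_X \mu_Y + \mu_X \mu_Y \\
&= E[XY] - E[X]\,E[Y]
\end{aligned}
```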

Notice that covariance is always symmetric:

Cov[X,Y]=Cov[Y,X]

The self covariance equals the variance:

Cov[X,X]=Var[X]
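The defining and shorter formulas, and the self-covariance identity, can be checked numerically. The joint pmf below is a made-up example, not from the text:

```python
# pmf[(x, y)] = P(X = x, Y = y) for a small made-up joint distribution
pmf = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.3, (2, 1): 0.3}

def E(f):
    """Expectation of f(X, Y) under the joint pmf."""
    return sum(p * f(x, y) for (x, y), p in pmf.items())

mu_X = E(lambda x, y: x)
mu_Y = E(lambda x, y: y)

# Defining formula: Cov[X,Y] = E[(X - mu_X)(Y - mu_Y)]
cov_def = E(lambda x, y: (x - mu_X) * (y - mu_Y))

# Shorter formula: Cov[X,Y] = E[XY] - E[X]E[Y]
cov_short = E(lambda x, y: x * y) - mu_X * mu_Y
assert abs(cov_def - cov_short) < 1e-12

# Self-covariance equals variance: Cov[X,X] = Var[X]
var_X = E(lambda x, y: (x - mu_X) ** 2)
cov_XX = E(lambda x, y: x * x) - mu_X ** 2
assert abs(var_X - cov_XX) < 1e-12
```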

The sign of Cov[X,Y] reveals the correlation type between X and Y:

Correlation               Sign
Positively correlated     Cov[X,Y] > 0
Negatively correlated     Cov[X,Y] < 0
Uncorrelated              Cov[X,Y] = 0

Correlation coefficient

Suppose X and Y are any two random variables on a probability model.

Their correlation coefficient is a rescaled version of covariance that measures the synchronicity of deviations:

ρ[X,Y] = Cov[X,Y] / (σ_X σ_Y) = Cov[X/σ_X, Y/σ_Y]

The rescaling ensures:

-1 ≤ ρ[X,Y] ≤ +1


Covariance depends on the separate variances of X and Y as well as their relationship.

The correlation coefficient, because we have divided out σ_X σ_Y, depends only on their relationship.
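A quick numeric illustration, on made-up sample data: rescaling X changes the covariance but leaves the correlation coefficient untouched.

```python
import math

# Made-up paired samples, each pair equally likely
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Covariance of two equally-likely sample lists."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

def rho(a, b):
    """Correlation coefficient: covariance rescaled by both sigmas."""
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

r = rho(xs, ys)
assert -1.0 <= r <= 1.0

# Rescaling X by 100 scales the covariance but not the correlation.
assert abs(cov([100 * x for x in xs], ys) - 100 * cov(xs, ys)) < 1e-9
assert abs(rho([100 * x for x in xs], ys) - r) < 1e-9
```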

Theory 2

Covariance bilinearity

Given any three random variables X, Y, and Z, we have:

Cov[X+Y, Z] = Cov[X,Z] + Cov[Y,Z]
Cov[Z, X+Y] = Cov[Z,X] + Cov[Z,Y]
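Both identities can be sanity-checked numerically on made-up triples of outcomes, each taken as equally likely:

```python
# Made-up equally-likely joint outcomes (x, y, z)
triples = [(1, 2, 0), (2, 1, 1), (3, 3, 1), (0, 2, 4), (4, 0, 2)]
xs, ys, zs = zip(*triples)

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

sum_xy = [x + y for x, y in zip(xs, ys)]

# Cov[X+Y, Z] = Cov[X,Z] + Cov[Y,Z]
lhs = cov(sum_xy, zs)
rhs = cov(xs, zs) + cov(ys, zs)
assert abs(lhs - rhs) < 1e-12

# Symmetry gives the second identity: Cov[Z, X+Y] = Cov[Z,X] + Cov[Z,Y]
assert abs(cov(zs, sum_xy) - (cov(zs, xs) + cov(zs, ys))) < 1e-12
```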

Covariance and correlation: shift and scale

Covariance scales with each input, and ignores shifts:

Cov[aX+b,Y]=aCov[X,Y]=Cov[X,aY+b]

Whereas correlation ignores shifts entirely, and scaling affects only its sign:

ρ[aX+b,Y]=sign(a)ρ[X,Y]=ρ[X,aY+b]
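The contrast between the two rules shows up clearly in a numeric check; the data and the choice a = -3, b = 7 are made up for illustration:

```python
import math

# Made-up equally-likely paired samples
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 4.0]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

def rho(a, b):
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

a, b = -3.0, 7.0
scaled = [a * x + b for x in xs]

# Covariance: shift ignored, scale factor pulled out
assert abs(cov(scaled, ys) - a * cov(xs, ys)) < 1e-9

# Correlation: sign(a) = -1 here, so rho flips sign but keeps magnitude
assert abs(rho(scaled, ys) + rho(xs, ys)) < 1e-9
```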

Extra - Proof of covariance bilinearity

Cov[X+Y, Z]
= E[(X + Y - (μ_X + μ_Y))(Z - μ_Z)]
= E[((X - μ_X) + (Y - μ_Y))(Z - μ_Z)]
= E[(X - μ_X)(Z - μ_Z)] + E[(Y - μ_Y)(Z - μ_Z)]
= Cov[X,Z] + Cov[Y,Z]

Extra - Proof of covariance shift and scale rule

Cov[aX+b, Y]
= E[(aX+b)Y] - E[aX+b]E[Y]
= E[aXY + bY] - aE[X]E[Y] - E[b]E[Y]
= aE[XY] + bE[Y] - aE[X]E[Y] - bE[Y]
= a(E[XY] - E[X]E[Y])
= a Cov[X,Y]

Independence implies zero covariance

Suppose that X and Y are any two random variables on a probability model.

If X and Y are independent, then:

Cov[X,Y]=0

Proof:

We know both of these:

E[XY] = E[X]E[Y]   (independence)
Cov[X,Y] = E[XY] - μ_X μ_Y   (shorter form)

But E[XY] = E[X]E[Y] = μ_X μ_Y, so those terms cancel and Cov[X,Y] = 0.
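Independence can be simulated directly by building a product distribution P(X=x, Y=y) = P(X=x)·P(Y=y); the marginal pmfs below are made up:

```python
# Made-up marginal pmfs for X and Y
px = {0: 0.5, 1: 0.3, 2: 0.2}
py = {-1: 0.4, 1: 0.6}

# Product distribution encodes independence of X and Y
pmf = {(x, y): p * q for x, p in px.items() for y, q in py.items()}

def E(f):
    return sum(p * f(x, y) for (x, y), p in pmf.items())

mu_X = E(lambda x, y: x)
mu_Y = E(lambda x, y: y)

# Shorter form: Cov[X,Y] = E[XY] - mu_X * mu_Y, which vanishes here
cov = E(lambda x, y: x * y) - mu_X * mu_Y
assert abs(cov) < 1e-12
```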

Sum rule for variance

Suppose that X and Y are any two random variables on a probability space.

Then:

Var[X+Y]=Var[X]+Var[Y]+2Cov[X,Y]

When X and Y are independent:

Var[X+Y]=Var[X]+Var[Y]
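The sum rule can be verified on made-up paired data, where X and Y are not independent, so the 2 Cov[X,Y] term matters:

```python
# Made-up equally-likely paired outcomes
pairs = [(1.0, 2.0), (2.0, 2.0), (3.0, 1.0), (4.0, 5.0)]
xs, ys = zip(*pairs)

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

def var(v):
    return cov(v, v)

sums = [x + y for x, y in zip(xs, ys)]

# Var[X+Y] = Var[X] + Var[Y] + 2 Cov[X,Y]
assert abs(var(sums) - (var(xs) + var(ys) + 2 * cov(xs, ys))) < 1e-9
```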

Extra - Proof: Sum rule for variance

Var[X+Y]
= E[(X + Y - (μ_X + μ_Y))^2]
= E[((X - μ_X) + (Y - μ_Y))^2]
= E[(X - μ_X)^2 + (Y - μ_Y)^2 + 2(X - μ_X)(Y - μ_Y)]
= Var[X] + Var[Y] + 2Cov[X,Y]

Extra - Proof that -1 ≤ ρ ≤ +1

(1) Create standardizations:

X~ = (X - μ_X)/σ_X,   Y~ = (Y - μ_Y)/σ_Y

Now X~ and Y~ satisfy:

E[X~] = 0 = E[Y~]   and   Var[X~] = 1 = Var[Y~]

Observe that Var[W] ≥ 0 for any W. Variance can’t be negative.


(2) Apply the variance sum rule.

Apply to X~ and Y~:

0 ≤ Var[X~ + Y~] = Var[X~] + Var[Y~] + 2Cov[X~, Y~]

Simplify:

1 + 1 + 2Cov[X~, Y~] ≥ 0
1 + Cov[X~, Y~] ≥ 0

Notice effect of standardization:

Cov[X~,Y~]=ρ[X,Y]

Therefore ρ[X,Y] ≥ -1.


(3) Modify and reapply variance sum rule.

Change to X~ - Y~:

0 ≤ Var[X~ - Y~] = Var[X~] + Var[Y~] - 2Cov[X~, Y~]

Simplify:

1 + 1 - 2Cov[X~, Y~] ≥ 0
1 - Cov[X~, Y~] ≥ 0

Therefore ρ[X,Y] ≤ +1, which together with step (2) gives -1 ≤ ρ[X,Y] ≤ +1.
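The bound can also be observed empirically (an illustration, not a proof): correlation coefficients of pseudo-random samples always land in [-1, +1].

```python
import math
import random

random.seed(0)

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

def rho(a, b):
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

# Many made-up datasets: Y partially tracks X plus independent noise
for _ in range(100):
    xs = [random.gauss(0, 1) for _ in range(20)]
    ys = [0.5 * x + random.gauss(0, 1) for x in xs]
    r = rho(xs, ys)
    # Small tolerance allows for floating-point rounding
    assert -1.0 - 1e-9 <= r <= 1.0 + 1e-9
```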