Summations

01 Theory

Theory 1

In many contexts it is useful to consider random variables that are summations of a large number of variables.

Summation formulas: E[X] and Var[X]

Suppose $X$ is a large sum of random variables:

$$X = X_1 + X_2 + \cdots + X_n$$

Then:

$$E[X] = E[X_1] + E[X_2] + \cdots + E[X_n]$$

$$\operatorname{Var}[X] = \operatorname{Var}[X_1] + \cdots + \operatorname{Var}[X_n] + 2\sum_{i<j} \operatorname{Cov}[X_i, X_j]$$

If $X_i$ and $X_j$ are uncorrelated whenever $i \neq j$ (e.g. if they are independent):

$$\operatorname{Var}[X] = \operatorname{Var}[X_1] + \cdots + \operatorname{Var}[X_n]$$
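The uncorrelated case can be sanity-checked numerically. Below is a Monte Carlo sketch (the choice of three fair dice is my own): the sample mean and variance of the sum should land near $3 \cdot 3.5 = 10.5$ and $3 \cdot \frac{35}{12} \approx 8.75$.

```python
# Monte Carlo check: for independent summands, E and Var of the sum
# match the sums of the individual E's and Var's.
import random

random.seed(0)
n_trials = 200_000

# X = X1 + X2 + X3, where each Xi is an independent fair die.
sums = [sum(random.randint(1, 6) for _ in range(3)) for _ in range(n_trials)]

mean = sum(sums) / n_trials
var = sum((s - mean) ** 2 for s in sums) / n_trials

print(mean)  # close to 3 * 3.5   = 10.5
print(var)   # close to 3 * 35/12 = 8.75
```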

Extra - Derivation of variance of a sum

Using the definition:

$$\begin{aligned}
\operatorname{Var}[X_1 + \cdots + X_n] &= E\big[(X_1 + \cdots + X_n - (\mu_{X_1} + \cdots + \mu_{X_n}))^2\big]\\
&= E\big[((X_1 - \mu_{X_1}) + \cdots + (X_n - \mu_{X_n}))^2\big]\\
&= E\Big[\sum_{i,j}(X_i - \mu_{X_i})(X_j - \mu_{X_j})\Big]\\
&= \sum_{i,j}\operatorname{Cov}(X_i, X_j)\\
&= \sum_i \operatorname{Var}[X_i] + 2\sum_{i<j}\operatorname{Cov}[X_i, X_j]
\end{aligned}$$

In the last line we use the fact that $\operatorname{Cov}[X, X] = \operatorname{Var}[X]$ for the first term, and the symmetry property of covariance for the second term with the factor of 2.

02 Illustration

Example - Binomial expectation and variance

Suppose we have repeated Bernoulli trials $X_1, \ldots, X_n$ with $X_i \sim \operatorname{Ber}(p)$.

The sum is a binomial variable: $S_n = \sum_{i=1}^n X_i$.

We know $E[X_i] = p$ and $\operatorname{Var}[X_i] = pq$, where $q = 1 - p$.

The summation rule for expectation:

$$E[S_n] = \sum_{i=1}^n E[X_i] = \sum_{i=1}^n p = np$$

The summation rule for variance:

$$\operatorname{Var}[S_n] = \sum_{i=1}^n \operatorname{Var}[X_i] + 2\sum_{i<j}\operatorname{Cov}[X_i, X_j] = \sum_{i=1}^n pq + 2\sum_{i<j} 0 = npq$$
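These formulas can be cross-checked by computing $E$ and $\operatorname{Var}$ of a binomial directly from its PMF; the parameters $n = 20$, $p = 0.3$ below are my own choice for the sketch.

```python
# Compute E and Var of Bin(n, p) from the PMF and compare with np and npq.
from math import comb

n, p = 20, 0.3
q = 1 - p

pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(round(mean, 6))  # 6.0 (= np)
print(round(var, 6))   # 4.2 (= npq)
```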

Example - Pascal expectation and variance

(1) Let $X \sim \operatorname{Pasc}(m, p)$, the number of trials needed to obtain $m$ successes.

Let $X_1, X_2, \ldots, X_m$ be independent random variables, where:

  • $X_1$ counts the trials until the first success
  • $X_2$ counts the trials after the first success until the second success
  • $X_i$ counts the trials after the $(i-1)$th success until the $i$th success

Observe that $X = \sum_{i=1}^m X_i$.


(2) Notice that $X_i \sim \operatorname{Geom}(p)$ for every $i$. Therefore:

$$E[X_i] = \frac{1}{p} \qquad \operatorname{Var}[X_i] = \frac{1-p}{p^2}$$

(3) Using the summation rule, conclude:

$$E[X] = \sum_{i=1}^m \frac{1}{p} = \frac{m}{p} \qquad \operatorname{Var}[X] = \sum_{i=1}^m \frac{q}{p^2} = \frac{mq}{p^2}$$
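The argument above can be simulated directly: draw $m$ independent geometric waiting times, sum them, and compare the sample moments with $m/p$ and $mq/p^2$. The parameters $m = 5$, $p = 0.4$ and the helper below are my own sketch.

```python
# Simulate a Pascal variable as a sum of m independent Geom(p) waiting times.
import random

random.seed(1)
m, p = 5, 0.4
n_trials = 200_000

def geometric(p):
    """Number of Bernoulli(p) trials up to and including the first success."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

samples = [sum(geometric(p) for _ in range(m)) for _ in range(n_trials)]
mean = sum(x for x in samples) / n_trials
var = sum((x - mean) ** 2 for x in samples) / n_trials

print(mean)  # close to m/p       = 12.5
print(var)   # close to m*q/p^2   = 18.75
```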

Example - Multinomial covariances

Each trial of an experiment has possible outcomes labeled $1, \ldots, r$ with probabilities of occurrence $p_1, \ldots, p_r$. The experiment is run $n$ times.

Let $X_i$ count the number of occurrences of outcome $i$. So $X_i \sim \operatorname{Bin}(n, p_i)$.

Find $\operatorname{Cov}[X_i, X_j]$ for $i \neq j$.

Solution

Notice that $X_i + X_j$ is also a binomial variable with success probability $p = p_i + p_j$. (‘Success’ is an outcome of either $i$ or $j$. ‘Failure’ is any other outcome.)

The variance of a binomial is known to be $npq$.

Compute $\operatorname{Cov}[X_i, X_j]$ by solving:

$$\begin{aligned}
\operatorname{Var}[X_i + X_j] &= \operatorname{Var}[X_i] + \operatorname{Var}[X_j] + 2\operatorname{Cov}[X_i, X_j]\\
n(p_i + p_j)(1 - (p_i + p_j)) &= np_i(1 - p_i) + np_j(1 - p_j) + 2\operatorname{Cov}[X_i, X_j]\\
\operatorname{Cov}[X_i, X_j] &= -np_ip_j
\end{aligned}$$
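The negative covariance can be checked by simulation; the parameters ($n = 30$, three outcomes with the probabilities below) are my own choice for this sketch.

```python
# Monte Carlo check of Cov[Xi, Xj] = -n * pi * pj for multinomial counts.
import random

random.seed(2)
n = 30                      # trials per experiment
p = [0.2, 0.3, 0.5]         # outcome probabilities p1, p2, p3
n_experiments = 100_000

xi_vals, xj_vals = [], []
for _ in range(n_experiments):
    outcomes = random.choices(range(3), weights=p, k=n)
    xi_vals.append(outcomes.count(0))   # occurrences of outcome 1
    xj_vals.append(outcomes.count(1))   # occurrences of outcome 2

mi = sum(xi_vals) / n_experiments
mj = sum(xj_vals) / n_experiments
cov = sum((a - mi) * (b - mj) for a, b in zip(xi_vals, xj_vals)) / n_experiments

print(cov)  # close to -n * p1 * p2 = -1.8
```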

Example - Months with a birthday

Suppose study groups of 10 are formed from a large population.

For a typical study group, how many months out of the year contain a birthday of a member of the group? (Assume all 12 months have equal duration.)

Solution

(1) Let $X_i$ be $1$ if month $i$ contains a birthday, and $0$ otherwise.

So we seek $E[X_1 + \cdots + X_{12}]$. This equals $E[X_1] + \cdots + E[X_{12}]$.

The answer will be $12\,E[X_i]$ because all terms are equal.


(2) For a given i:

$$P[\text{no birthday in month } i] = \left(\frac{11}{12}\right)^{10}$$

The complement event:

$$P[\text{at least one birthday in month } i] = 1 - \left(\frac{11}{12}\right)^{10}$$

(3) Therefore:

$$12\,E[X_i] = 12\left(1 - \left(\frac{11}{12}\right)^{10}\right) \approx 6.97$$

Example - Hats in the air

All $n$ sailors throw their hats in the air, and each catches a random hat when the hats fall back down.

(a) How many sailors do you expect will catch the hat they own?

(b) What is the variance of this number?

Solution

Strangely, the answers are both 1, regardless of the number of sailors. Here is the reasoning:

(a) Let $X_i = 1$ when sailor $i$ catches their own hat, and $X_i = 0$ otherwise. Thus $X_i$ is Bernoulli with $p = 1/n$.

Now $X = \sum_{i=1}^n X_i$ counts the total number of hats caught by their owners.

Note that $E[X_i] = 1/n$. Therefore:

$$E[X] = E\Big[\sum_{i=1}^n X_i\Big] = \sum_{i=1}^n E[X_i] = \sum_{i=1}^n \frac{1}{n} = 1$$

(b) We know:

$$\operatorname{Var}[X] = \sum_{i=1}^n \operatorname{Var}[X_i] + 2\sum_{i<j}\operatorname{Cov}[X_i, X_j]$$

Now calculate $\operatorname{Var}[X_i]$:

Use $\operatorname{Var}[X_i] = E[X_i^2] - E[X_i]^2$. Observe that $X_i^2 = X_i$. Therefore:

$$\operatorname{Var}[X_i] = \frac{1}{n} - \frac{1}{n^2} = \frac{n-1}{n^2}$$

Now calculate $\operatorname{Cov}[X_i, X_j]$:

$$\operatorname{Cov}[X_i, X_j] = E[X_iX_j] - E[X_i]E[X_j]$$

We need to compute $E[X_iX_j]$.

Notice that $X_iX_j = 1$ when $i$ and $j$ both catch their own hats, and $0$ otherwise. So it is Bernoulli. (Sailor $i$ catches their own hat with probability $1/n$; given that, sailor $j$ catches theirs with probability $1/(n-1)$.) Then:

$$P[X_i = 1 \text{ and } X_j = 1] = \frac{1}{n(n-1)} \qquad \text{so} \qquad E[X_iX_j] = \frac{1}{n(n-1)}$$

Therefore:

$$\operatorname{Cov}[X_i, X_j] = \frac{1}{n(n-1)} - \frac{1}{n}\cdot\frac{1}{n} = \frac{1}{n^2(n-1)}$$

Putting everything together:

$$\operatorname{Var}[X] = \sum_{i=1}^n \frac{n-1}{n^2} + 2\sum_{i<j}\frac{1}{n^2(n-1)} = \frac{n-1}{n} + n(n-1)\cdot\frac{1}{n^2(n-1)} = \frac{n-1}{n} + \frac{1}{n} = 1$$
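A simulation makes the surprising answer concrete; $n = 8$ and the trial count below are my own choices, and any $n$ should give mean and variance near $1$.

```python
# Simulate the hat-matching count: mean and variance of the number of
# sailors who catch their own hat should both be close to 1.
import random

random.seed(4)
n = 8
n_trials = 100_000

counts = []
for _ in range(n_trials):
    hats = list(range(n))
    random.shuffle(hats)          # hats[i] is the hat caught by sailor i
    counts.append(sum(1 for i in range(n) if hats[i] == i))

mean = sum(counts) / n_trials
var = sum((c - mean) ** 2 for c in counts) / n_trials

print(mean)  # close to 1
print(var)   # close to 1
```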

Central Limit Theorem

03 Theory

Theory 1

Video by 3Blue1Brown:

IID variables

Random variables are called independent, identically distributed (IID) when they are independent of each other and all have the same distribution.

IID variables: Same distribution, different values

Independent variables cannot be perfectly correlated, so IID variables will take differing values on most outcomes: sharing a distribution means sharing probabilities, not values.

We do have:

$$\text{same distribution} \iff \text{same PMF or PDF}$$

Standardization

Suppose X is any random variable.

The standardization of X is:

$$Z = \frac{X - \mu_X}{\sigma_X}$$

The variable $Z$ has $E[Z] = 0$ and $\operatorname{Var}[Z] = 1$. We can reconstruct $X$ by:

$$X = \sigma_X Z + \mu_X$$

Suppose $X_1, X_2, \ldots, X_n$ is a collection of IID random variables.

Define:

$$S_n = \sum_{i=1}^n X_i \qquad Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$

where:

$$\mu = E[X_i] \qquad \sigma^2 = \operatorname{Var}[X_i] \qquad \text{(every } i\text{)}$$

So $Z_n$ is the standardization of $S_n$.

Let $Z$ be a standard normal random variable, $Z \sim \mathcal{N}(0, 1)$.

Central Limit Theorem

Suppose $S_n = \sum_{i=1}^n X_i$ for IID variables $X_i$, and let $Z_n$ be the standardization of $S_n$.

Then for any interval $[a, b]$:

$$\lim_{n \to \infty} P[a \le Z_n \le b] = \Phi(b) - \Phi(a)$$

We say that $Z_n$ converges in distribution to the standard normal $Z$.


The distribution of a very large sum of IID variables is determined merely by $\mu$ and $\sigma^2$ of the original IID variables, while the data of higher moments fades away.

The name “normal distribution” is used because it arises from a large sum of repetitions of any other kind of distribution. It is therefore ubiquitous in applications.
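A quick simulation sketch of the limit (the choice of $\operatorname{Uniform}(0,1)$ summands and the sample sizes are mine): standardize $S_n$ and compare the empirical $P[-1 \le Z_n \le 1]$ against $\Phi(1) - \Phi(-1) \approx 0.6827$.

```python
# Standardize sums of IID Uniform(0,1) variables and compare the empirical
# probability of landing in [-1, 1] with the standard normal value.
import math
import random

random.seed(5)
n = 50                               # summands per sum
n_trials = 40_000
mu, sigma = 0.5, math.sqrt(1 / 12)   # mean and sd of Uniform(0,1)

inside = 0
for _ in range(n_trials):
    s = sum(random.random() for _ in range(n))
    z = (s - n * mu) / (sigma * math.sqrt(n))
    if -1 <= z <= 1:
        inside += 1

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

print(inside / n_trials)   # empirical P[-1 <= Z_n <= 1]
print(phi(1) - phi(-1))    # ~ 0.6827
```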

Misuse of the CLT

It is important to learn when the CLT is applicable and when it is not. Many people (even professionals) apply it wrongly.

For example, sometimes one hears the claim that if enough students take an exam, the distribution of scores will be approximately normal. This is totally wrong!

Intuition for the CLT

The CLT is about the distribution of simultaneity, or (in other words) about accumulated alignment between independent variables.

With a large n, deviations of the total sum are predominantly created by many summands simultaneously deviating slightly from their means in the same direction, rather than by individual summands deviating a large amount.

Simultaneity across a large n of independent items is described by… the bell curve.

04 Illustration

Exercise - Test scores distribution

Explain what is wrong with the claim that test scores should be normally distributed when a large number of students take a test.

Can you imagine a scenario with a good argument that test scores would be normally distributed?

(Hint: think about the composition of a single test instead of the number of students taking the test.)

Exercise - Height follows a bell curve

The height of female American basketball players follows a bell curve. Why?

05 Theory - extra

Theory 2

06 Theory

Theory 3

Normal approximations rely on the limit stated in the CLT to approximate probabilities for large sums of variables.

Normal approximation of binomial

Let $S_n = X_1 + \cdots + X_n$ for IID variables $X_i$ with $\mu = E[X_i]$ and $\sigma^2 = \operatorname{Var}[X_i]$.

The normal approximation of $S_n$ is:

$$F_{S_n}(s) \approx \Phi\left(\frac{s - n\mu}{\sigma\sqrt{n}}\right)$$

For example, suppose $X_i \sim \operatorname{Ber}(p)$, so $S_n \sim \operatorname{Bin}(n, p)$. We know $\mu = p$ and $\sigma = \sqrt{pq}$. Therefore:

$$F_{S_n}(s) \approx \Phi\left(\frac{s - np}{\sqrt{npq}}\right)$$

A rule of thumb is that the normal approximation to the binomial is effective when $npq > 10$.

Efficient computation

This CDF is far easier to compute for large $n$ than the CDF of $S_n$ itself. The factorials in $\binom{n}{k}$ are hard to evaluate (even for a computer) when $n$ is large, and the summation over $k$ adds another factor of $n$ to the scaling cost.
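To illustrate the contrast, here is a sketch (the parameters $n = 400$, $p = 0.5$, $s = 210$ are my own) comparing the exact binomial CDF, computed by summing the PMF, against the single-evaluation normal approximation:

```python
# Exact Bin(n, p) CDF via math.comb versus Phi((s - np) / sqrt(npq)).
import math

def binom_cdf(s, n, p):
    """Exact CDF: sum the binomial PMF from 0 to s."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(s + 1))

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, p = 400, 0.5
q = 1 - p
s = 210

exact = binom_cdf(s, n, p)
approx = phi((s - n * p) / math.sqrt(n * p * q))
print(round(exact, 4), round(approx, 4))  # the two values agree to within ~0.02
```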

07 Illustration

Example - Binomial estimation: 10,000 flips

Flip a fair coin 10,000 times. Write H for the number of heads.

Estimate the probability that 4850<H<5100.

Solution

(1) Check the rule of thumb: $p = q = 0.5$ and $n = 10{,}000$, so $npq = 2500 > 10$ and the approximation is effective.


(2) Now, calculate needed quantities:

$$\mu = E[X_i] = 0.5 \qquad n\mu = 5000$$

$$\sigma^2 = \operatorname{Var}[X_i] = 0.25 \qquad \sigma = 0.5 \qquad \sigma\sqrt{n} = 50$$

(3) Set up CDF:

$$F_H(h) \approx \Phi\left(\frac{h - 5000}{50}\right)$$

(4) Compute desired probability:

$$\begin{aligned}
P[4850 < H < 5100] &= F_H(5100) - F_H(4850)\\
&\approx \Phi\!\left(\frac{100}{50}\right) - \Phi\!\left(-\frac{150}{50}\right)\\
&= \Phi(2) - \Phi(-3)\\
&\approx 0.9772 - (1 - 0.9987) = 0.9759
\end{aligned}$$
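The arithmetic can be verified in a couple of lines; the `phi` helper built on `math.erf` is my own sketch, not from the text.

```python
# Numeric check of Phi(2) - Phi(-3) for the coin-flip estimate.
import math

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

estimate = phi((5100 - 5000) / 50) - phi((4850 - 5000) / 50)
print(round(estimate, 4))  # 0.9759
```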

Example - Summing 1000 dice

Suppose 1,000 dice are rolled.

Estimate the probability that the total sum of rolled numbers is more than 3,600.

Solution

(1) Let $X_i$ be the number rolled on the $i$th die.

Let $S = \sum_{i=1}^{n} X_i$ with $n = 1000$, so $S$ sums up the rolled numbers.

We seek $P[S \ge 3600]$.


(2) Now, calculate needed quantities:

$$\mu = E[X_i] = \frac{7}{2} \qquad n\mu = 3500$$

$$\sigma^2 = \operatorname{Var}[X_i] = \frac{35}{12} \qquad \sigma\sqrt{n} = \sqrt{\frac{35000}{12}}$$

(3) Set up CDF:

$$F_S(s) \approx \Phi\left(\frac{s - 3500}{\sqrt{35000/12}}\right)$$

(4) Compute desired probability:

$$\begin{aligned}
P[S \ge 3600] &= 1 - F_S(3600)\\
&\approx 1 - \Phi\!\left(\frac{100}{54.01}\right)\\
&= 1 - \Phi(1.852)\\
&\approx 1 - 0.968 = 0.032
\end{aligned}$$
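A quick numeric check of the dice estimate; the `phi` helper built on `math.erf` is my own sketch.

```python
# Evaluate sigma * sqrt(n) and the tail probability 1 - Phi(100 / 54.01).
import math

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

sigma_sn = math.sqrt(35000 / 12)
estimate = 1 - phi((3600 - 3500) / sigma_sn)
print(round(sigma_sn, 2))  # 54.01
print(round(estimate, 3))  # 0.032
```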

Exercise - Estimating S1000

The probability that a random poker hand contains one pair is about $0.42$.

Estimate the probability that at least 450 out of 1000 poker hands will contain one pair.

Link to original

Exercise - Nutrition study

A nutrition review board will endorse a diet if at least 65% of those tested in a certain study with 100 participants show any positive effect.

Suppose the diet is bogus, but 50% of participants display some positive effect by pure chance.

What is the probability that it will be endorsed?

Answer

$$0.0019 \approx 1 - \Phi(2.9)$$
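A cross-check of the answer (my own sketch): the z-score $2.9$ corresponds to a continuity-corrected threshold of $64.5$ out of $100$, i.e. $(64.5 - 50)/5 = 2.9$; the exact binomial tail is also computed for comparison.

```python
# Normal approximation of the endorsement probability vs. the exact tail.
import math

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

approx = 1 - phi((64.5 - 50) / 5)   # z = 2.9
print(round(approx, 4))  # 0.0019

# Exact binomial tail P[X >= 65] for X ~ Bin(100, 0.5):
exact = sum(math.comb(100, k) for k in range(65, 101)) * 0.5**100
print(round(exact, 4))
```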
