Theory 1

Video by 3Blue1Brown:

IID variables

Random variables are called independent, identically distributed when they are independent and have the same distribution.

IID variables: Same distribution, different values

Independent variables cannot be correlated, so although IID variables share the same distribution, their realized values will disagree on most outcomes.

We do have:

same distribution $\iff$ same PMF or PDF
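As a quick illustration (a minimal Python sketch using only the standard library; the Exponential(1) distribution and sample size are arbitrary choices), two IID samples share the same statistics but take different values:

```python
import random

random.seed(0)

# Two IID samples from the same Exponential(1) distribution:
# identical distribution, but the realized values differ.
x = [random.expovariate(1.0) for _ in range(10_000)]
y = [random.expovariate(1.0) for _ in range(10_000)]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

print(mean_x, mean_y)   # both close to E[X] = 1
print(x[:3] == y[:3])   # False: the values themselves disagree
```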

Standardization

Suppose $X$ is any random variable.

The standardization of $X$ is:

$$Z = \frac{X - \mu_X}{\sigma_X}$$

The variable $Z$ has $E[Z] = 0$ and $\mathrm{Var}[Z] = 1$. We can reconstruct $X$ by:

$$X = \sigma_X Z + \mu_X$$
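Both identities can be checked numerically; this sketch (standard library only, with Exponential(2) as an arbitrary choice of $X$) standardizes a sample and verifies the round trip:

```python
import random

random.seed(1)

# X ~ Exponential(rate=2), so mu_X = 0.5 and sigma_X = 0.5.
mu_X, sigma_X = 0.5, 0.5
xs = [random.expovariate(2.0) for _ in range(100_000)]

# Standardize each sample: Z = (X - mu_X) / sigma_X
zs = [(x - mu_X) / sigma_X for x in xs]

mean_z = sum(zs) / len(zs)
var_z = sum((z - mean_z) ** 2 for z in zs) / len(zs)
print(round(mean_z, 3), round(var_z, 3))   # approximately 0 and 1

# Reconstruct X via X = sigma_X * Z + mu_X
x_back = [sigma_X * z + mu_X for z in zs]
assert all(abs(a - b) < 1e-12 for a, b in zip(xs, x_back))
```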

Suppose $X_1, X_2, \ldots, X_n$ is a collection of IID random variables.

Define:

$$S_n = \sum_{i=1}^{n} X_i \qquad\qquad Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$

where:

$$\mu = E[X_i] \qquad \sigma^2 = \mathrm{Var}[X_i] \qquad \text{(every } i\text{)}$$

So $Z_n$ is the standardization of $S_n$.

Let $Z$ be a standard normal random variable, $Z \sim \mathcal{N}(0, 1)$.

Central Limit Theorem

Suppose $S_n = \sum_{i=1}^{n} X_i$ for IID variables $X_i$, and let $Z_n$ be the standardization of $S_n$.

Then for any interval $[a, b]$:

$$\lim_{n \to \infty} P[a \le Z_n \le b] = \Phi(b) - \Phi(a)$$

We say that $Z_n$ converges in distribution to the standard normal $Z$.
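A small simulation makes the limit concrete (standard library only; Uniform(0, 1) summands and the interval $[-1, 1]$ are arbitrary choices). It compares the empirical probability $P[a \le Z_n \le b]$ to $\Phi(b) - \Phi(a)$:

```python
import math
import random

random.seed(2)

def phi_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# IID variables X_i ~ Uniform(0, 1): mu = 1/2, sigma^2 = 1/12.
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
n, trials = 200, 20_000
a, b = -1.0, 1.0

hits = 0
for _ in range(trials):
    s_n = sum(random.random() for _ in range(n))
    z_n = (s_n - n * mu) / (sigma * math.sqrt(n))
    if a <= z_n <= b:
        hits += 1

empirical = hits / trials
theoretical = phi_cdf(b) - phi_cdf(a)   # about 0.6827
print(empirical, theoretical)
```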


The distribution of a very large sum of IID variables is determined by just $\mu$ and $\sigma^2$ of the original IID variables, while the information carried by higher moments fades away.

The name “normal distribution” is used because it arises from a large sum of repetitions of any other kind of distribution. It is therefore ubiquitous in applications.

Misuse of the CLT

It is important to learn when the CLT is applicable and when it is not. Many people (even professionals) apply it wrongly.

For example, sometimes one hears the claim that if enough students take an exam, the distribution of scores will be approximately normal. This is totally wrong! The CLT concerns sums (or averages) of many IID variables; the histogram of individual scores is not such a sum, and as more students take the exam it simply approaches the distribution of a single student's score, whatever shape that may be.

Intuition for the CLT

The CLT is about the distribution of simultaneity, or (in other words) about accumulated alignment between independent variables.

With a large n, deviations of the total sum are predominantly created by simultaneous deviations of a large portion of summands away from their means, rather than by individual summands deviating by a large amount.

Simultaneity across a large n of independent items is described by… the bell curve.

Theory 2

Theory 3

Normal approximations rely on the limit stated in the CLT to approximate probabilities for large sums of variables.

Normal approximation

Let $S_n = X_1 + \cdots + X_n$ for IID variables $X_i$ with $\mu = E[X_i]$ and $\sigma^2 = \mathrm{Var}[X_i]$.

The normal approximation of $S_n$ is:

$$F_{S_n}(s) \approx \Phi\!\left(\frac{s - n\mu}{\sigma\sqrt{n}}\right)$$

For example, suppose $X_i \sim \mathrm{Ber}(p)$, so $S_n \sim \mathrm{Bin}(n, p)$. We know $\mu = p$ and $\sigma = \sqrt{pq}$, where $q = 1 - p$. Therefore:

$$F_{S_n}(s) \approx \Phi\!\left(\frac{s - np}{\sqrt{npq}}\right)$$

A rule of thumb is that the normal approximation to the binomial is effective when $npq > 10$.
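A sketch comparing the exact binomial CDF with its normal approximation (standard library only; $n = 100$, $p = 0.5$, and the evaluation point $s = 55$ are arbitrary choices that satisfy the $npq > 10$ rule of thumb):

```python
import math

def phi_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binom_cdf(s, n, p):
    # Exact CDF by summing the PMF (feasible here for modest n).
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(s + 1))

n, p = 100, 0.5
q = 1 - p
assert n * p * q > 10          # rule-of-thumb check

s = 55
exact = binom_cdf(s, n, p)
approx = phi_cdf((s - n * p) / math.sqrt(n * p * q))
print(round(exact, 4), round(approx, 4))
```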

Efficient computation

This CDF is far easier to compute for large $n$ than the CDF of $S_n$ itself. The factorials in $\binom{n}{k}$ are expensive to evaluate (even for a computer) when $n$ is large, and summing the PMF over $k$ multiplies the cost by another factor of $n$.

Theory 4

De Moivre-Laplace Continuity Correction Formula

The normal approximation to a discrete, integer-valued distribution, particularly for integers $a$ and $b$ close together, can be improved by widening the range by $0.5$ on either side:

$$P[a \le S_n \le b] \approx P[a - 0.5 \le \sigma\sqrt{n}\,Z + n\mu \le b + 0.5] = \Phi\!\left(\frac{b + 0.5 - n\mu}{\sigma\sqrt{n}}\right) - \Phi\!\left(\frac{a - 0.5 - n\mu}{\sigma\sqrt{n}}\right)$$
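This sketch (standard library only; $n = 100$, $p = 0.5$, and the interval $[45, 55]$ are arbitrary choices) shows the continuity-corrected approximation landing much closer to the exact binomial probability than the plain one:

```python
import math

def phi_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, p = 100, 0.5
q = 1 - p
mu, sd = n * p, math.sqrt(n * p * q)
a, b = 45, 55

# Exact probability P[a <= S_n <= b] for S_n ~ Bin(n, p).
exact = sum(math.comb(n, k) * p**k * q ** (n - k) for k in range(a, b + 1))

# Plain normal approximation vs. the continuity-corrected one.
plain = phi_cdf((b - mu) / sd) - phi_cdf((a - mu) / sd)
corrected = phi_cdf((b + 0.5 - mu) / sd) - phi_cdf((a - 0.5 - mu) / sd)

print(exact, plain, corrected)
```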