Summations
01 Theory
In many contexts it is useful to consider random variables that are summations of a large number of variables.
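Summation formulas:
$$E[X + Y] = E[X] + E[Y]$$
and
$$\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X, Y]$$
Suppose $S_n = X_1 + X_2 + \cdots + X_n$ is a large sum of random variables. Then:
$$E[S_n] = \sum_{i=1}^{n} E[X_i], \qquad \mathrm{Var}[S_n] = \sum_{i=1}^{n} \mathrm{Var}[X_i] + 2\sum_{i<j} \mathrm{Cov}[X_i, X_j]$$
If $X_i$ and $X_j$ are uncorrelated for all $i \neq j$ (e.g. if they are independent):
$$\mathrm{Var}[S_n] = \sum_{i=1}^{n} \mathrm{Var}[X_i]$$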
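The variance formula for a sum can be checked exactly on a small sample space. The sketch below is an illustration only: the variables `X` (first of two fair dice) and `Y` (total of both dice) are hypothetical choices, picked because they are correlated, so the covariance term matters.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair dice, all 36 outcomes equally likely.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(outcomes))

def expect(f):
    """Exact expectation of f(d1, d2) over the 36 outcomes."""
    return sum(p * f(d1, d2) for d1, d2 in outcomes)

def cov(f, g):
    """Exact covariance: Cov[f, g] = E[f g] - E[f] E[g]."""
    return expect(lambda a, b: f(a, b) * g(a, b)) - expect(f) * expect(g)

# Hypothetical example variables (not from the notes).
X = lambda d1, d2: d1            # value of the first die
Y = lambda d1, d2: d1 + d2       # total of both dice, correlated with X
S = lambda d1, d2: X(d1, d2) + Y(d1, d2)

var_X, var_Y, cov_XY = cov(X, X), cov(Y, Y), cov(X, Y)
var_S = cov(S, S)

print("E[X + Y]   =", expect(S), "=", expect(X), "+", expect(Y))
print("Var[X + Y] =", var_S)
print("Var[X] + Var[Y] + 2 Cov[X, Y] =", var_X + var_Y + 2 * cov_XY)
```

Because the arithmetic uses `Fraction`, the two sides agree exactly rather than up to rounding. Dropping the covariance term would undercount the variance here, since `X` and `Y` are positively correlated.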
Extra - Derivation of variance of a sum
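Using the definition of variance, with $S_n = X_1 + \cdots + X_n$ and $\mu_i = E[X_i]$:
$$\begin{aligned}
\mathrm{Var}[S_n] &= E\big[(S_n - E[S_n])^2\big] \\
&= E\Big[\Big(\sum_{i} (X_i - \mu_i)\Big)^2\Big] \\
&= \sum_{i}\sum_{j} E\big[(X_i - \mu_i)(X_j - \mu_j)\big] \\
&= \sum_{i}\sum_{j} \mathrm{Cov}[X_i, X_j] \\
&= \sum_{i} \mathrm{Var}[X_i] + 2\sum_{i<j} \mathrm{Cov}[X_i, X_j]
\end{aligned}$$
In the last line we use the fact that $\mathrm{Cov}[X_i, X_i] = \mathrm{Var}[X_i]$ for the first term, and the symmetry property of covariance, $\mathrm{Cov}[X_i, X_j] = \mathrm{Cov}[X_j, X_i]$, for the second term with the factor of 2.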
02 Illustration
Example - Binomial expectation and variance
Example - Pascal expectation and variance
Example - Multinomial covariances
Example - Hats in the air
Months with a birthday
Central Limit Theorem
03 Theory
IID variables
Random variables are called independent, identically distributed when they are independent and have the same distribution.
IID variables: Same distribution, different values
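Independent variables are uncorrelated, so the values taken by IID variables will typically disagree on most outcomes: knowing the value of one tells you nothing about the value of another.
What IID variables share is their distribution. For all $i$, $j$ and every $x$ we do have:
$$F_{X_i}(x) = F_{X_j}(x)$$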
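A small exact enumeration can make the "same distribution, different values" point concrete. The sketch below assumes a hypothetical pair of IID variables, namely the two faces shown by a pair of fair dice; it compares their PMFs and counts how often their values actually coincide.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

def pmf(values):
    """Exact PMF of a list of equally likely values."""
    counts = Counter(values)
    return {v: Fraction(c, len(values)) for v, c in sorted(counts.items())}

pmf_X = pmf([d1 for d1, _ in outcomes])   # X = first die
pmf_Y = pmf([d2 for _, d2 in outcomes])   # Y = second die

# Same distribution: the two PMFs are identical functions.
same_distribution = pmf_X == pmf_Y

# Different values: X and Y agree only on the diagonal outcomes.
agree = Fraction(sum(d1 == d2 for d1, d2 in outcomes), len(outcomes))

print("same distribution:", same_distribution)
print("P[X = Y] =", agree)
```

The PMFs match exactly, yet the two variables take the same value on only a small fraction of outcomes.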
Standardization
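Suppose $X$ is any random variable with mean $\mu_X$ and standard deviation $\sigma_X > 0$. The standardization of $X$ is:
$$Z = \frac{X - \mu_X}{\sigma_X}$$
The variable $Z$ has $E[Z] = 0$ and $\mathrm{Var}[Z] = 1$. We can reconstruct $X$ by:
$$X = \sigma_X Z + \mu_X$$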
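Standardization (subtract the mean, divide by the standard deviation) can be sketched numerically as follows. The PMF used here, the number of heads in two fair coin flips, is a hypothetical example and not taken from the notes.

```python
import math
from fractions import Fraction

# Hypothetical example: X = number of heads in two fair coin flips.
pmf_X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def mean(pmf):
    """Expectation of a discrete variable given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Variance of a discrete variable given as {value: probability}."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

mu_X = mean(pmf_X)
sigma_X = math.sqrt(variance(pmf_X))

# Standardize: each value x is mapped to (x - mu) / sigma, probabilities unchanged.
pmf_Z = {(x - mu_X) / sigma_X: p for x, p in pmf_X.items()}

# Reconstruct the original values from the standardized ones.
recovered = sorted(sigma_X * z + mu_X for z in pmf_Z)

print("E[Z]   =", mean(pmf_Z))
print("Var[Z] =", variance(pmf_Z))
print("recovered values of X:", recovered)
```

The standardized variable has mean 0 and variance 1 up to floating-point rounding, and the affine map back recovers the original values.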
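Suppose $X_1, X_2, \dots$ are IID variables.
Define:
$$S_n = X_1 + X_2 + \cdots + X_n, \qquad Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$
where:
$$\mu = E[X_i], \qquad \sigma^2 = \mathrm{Var}[X_i]$$
So $Z_n$ is the standardization of $S_n$, because $E[S_n] = n\mu$ and $\mathrm{Var}[S_n] = n\sigma^2$.
Let $Z \sim \mathcal{N}(0, 1)$ be a standard normal variable.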
Central Limit Theorem
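Suppose $S_n = X_1 + \cdots + X_n$ for IID variables $X_i$ with finite mean $\mu$ and finite variance $\sigma^2 > 0$, and $Z_n = \dfrac{S_n - n\mu}{\sigma\sqrt{n}}$ are the standardizations of $S_n$. Then for any interval $[a, b]$:
$$\lim_{n \to \infty} P[a \leq Z_n \leq b] = \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = \Phi(b) - \Phi(a)$$
We say that $Z_n$ converges in distribution to the standard normal $Z \sim \mathcal{N}(0, 1)$.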
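The limit in the theorem can be observed numerically. The sketch below assumes a hypothetical choice of summands (fair dice), builds the exact PMF of the sum of $n$ dice by repeated convolution, and compares the probability that the standardized sum lies in $[-1, 1]$ with the standard normal value $\Phi(1) - \Phi(-1)$.

```python
import math

def pmf_of_dice_sum(n):
    """Exact PMF of the sum of n fair dice, as a dict {total: probability}."""
    pmf = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for total, prob in pmf.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0.0) + prob / 6
        pmf = nxt
    return pmf

def phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Mean and standard deviation of a single fair die.
MU, SIGMA = 3.5, math.sqrt(35 / 12)

def prob_z_in_interval(n, a, b):
    """P[a <= Z_n <= b], where Z_n is the standardized sum of n dice."""
    pmf = pmf_of_dice_sum(n)
    lo = n * MU + a * SIGMA * math.sqrt(n)
    hi = n * MU + b * SIGMA * math.sqrt(n)
    return sum(p for s, p in pmf.items() if lo <= s <= hi)

limit = phi(1.0) - phi(-1.0)
errors = {n: abs(prob_z_in_interval(n, -1.0, 1.0) - limit) for n in (1, 10, 100, 400)}

for n, err in errors.items():
    print(f"n = {n:3d}: |P[-1 <= Z_n <= 1] - limit| = {err:.4f}")
```

Because the sum of dice is discrete, the gap need not shrink at every step as $n$ grows, but it becomes small for large $n$, as the theorem predicts.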
Here is a good explainer video by 3blue1brown.
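The distribution of a very large sum of IID variables is determined merely by the mean $\mu$ and variance $\sigma^2$ of the summands, together with the number of summands $n$.
The name “normal distribution” is used because it arises from a large sum of repetitions of any other kind of distribution (provided that distribution has finite variance). It is therefore ubiquitous in applications.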
Misuse of the CLT
It is important to learn when the CLT is applicable and when it is not. Many people (even professionals) apply it wrongly.
For example, sometimes one hears the claim that if enough students take an exam, the distribution of scores will be approximately normal. This is totally wrong!
Intuition for the CLT
The CLT is about the distribution of simultaneity, or (in other words) about accumulated alignment between independent variables.
With a large $n$, deviations of the total sum are predominantly created by simultaneous (coincidentally aligned) deviations of a large portion of summands away from their means, rather than by individual summands deviating a large amount. Simultaneity across a large number $n$ of independent items is described by… the bell curve.
Extra - Derivation of CLT
04 Illustration
Exercise - Test scores distribution
Exercise - Height follows a bell curve
05 Theory
Normal approximations rely on the limit stated in the CLT to approximate probabilities for large sums of variables.
Normal approximation
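Let $S_n = X_1 + \cdots + X_n$ for IID variables $X_i$ with $E[X_i] = \mu$ and $\mathrm{Var}[X_i] = \sigma^2$. The normal approximation of $S_n$ is:
$$F_{S_n}(s) = P[S_n \leq s] \approx \Phi\left(\frac{s - n\mu}{\sigma\sqrt{n}}\right)$$
where $\Phi$ is the CDF of the standard normal.
For example, suppose $S_n \sim \mathrm{Bin}(n, p)$, which is a sum of $n$ IID Bernoulli variables with $\mu = p$ and $\sigma^2 = p(1-p)$. Then:
$$F_{S_n}(s) \approx \Phi\left(\frac{s - np}{\sqrt{np(1-p)}}\right)$$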
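A rule of thumb is that the normal approximation to a binomial with parameters $n$ and $p$ is effective when the variance $np(1-p)$ is not small; a commonly quoted threshold is $np(1-p) \geq 10$.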
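To see the approximation at work, the sketch below compares an exact binomial CDF with its normal approximation. The parameter values `n = 100`, `p = 0.5`, `k = 55` are hypothetical, chosen only so that the exact sum is cheap to compute; no continuity correction is applied here.

```python
import math

def phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_cdf_exact(k, n, p):
    """P[S_n <= k] for S_n ~ Bin(n, p), summing the PMF term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def binom_cdf_normal(k, n, p):
    """Normal approximation: standardize k with mean np and variance np(1-p)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return phi((k - mu) / sigma)

# Hypothetical parameters for illustration.
n, p, k = 100, 0.5, 55

exact = binom_cdf_exact(k, n, p)
approx = binom_cdf_normal(k, n, p)

print(f"exact  P[S_n <= {k}] = {exact:.4f}")
print(f"normal approximation = {approx:.4f}")
print(f"absolute difference  = {abs(exact - approx):.4f}")
```

With these values $np(1-p) = 25$, and the two numbers agree to within a few hundredths. The remaining gap comes mostly from approximating a discrete distribution with a continuous one.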
Efficient computation
The normal CDF used in this approximation is far easier to compute for large $n$ than the CDF of $S_n$ itself. The factorials in $\binom{n}{k}$ are hard even for a computer when $n$ is large, and the summation over $k$ adds another factor of $n$ to the scaling cost.
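The difficulty with the factorials is easy to reproduce. The sketch below uses hypothetical values (`n = 10_000`, `k = 5_000`, `p = 0.5`) and shows that a naive floating-point evaluation of a single binomial PMF term fails, because the binomial coefficient is too large for a float while the power of `p` underflows to zero. The log-gamma workaround at the end is one common remedy, added here as an assumption and not taken from the notes.

```python
import math

# Hypothetical parameters for illustration.
n, k, p = 10_000, 5_000, 0.5

# Python integers are exact, so the coefficient itself can be computed,
# but it needs thousands of bits to store.
coeff = math.comb(n, k)
bits = coeff.bit_length()

# The power of p is far below the smallest positive float, so it becomes 0.0.
tiny = p**k

# Converting the huge coefficient to a float is not possible at all.
try:
    naive = float(coeff) * p**k * (1 - p) ** (n - k)
    overflowed = False
except OverflowError:
    overflowed = True

def log_binom_pmf(k, n, p):
    """log P[S_n = k] for Bin(n, p), using log-gamma to avoid huge factorials."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1 - p)

pmf_value = math.exp(log_binom_pmf(k, n, p))

print("bits needed for the coefficient:", bits)
print("p**k as a float:", tiny)
print("naive float evaluation overflowed:", overflowed)
print("PMF term via logarithms:", pmf_value)
```

Even with the logarithm trick, an exact CDF still requires summing up to $n$ such terms, whereas the normal approximation needs a single evaluation of $\Phi$.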
06 Illustration
Example - Binomial estimation: 10,000 flips
Example - Summing 1000 dice
Exercise - Estimating
The probability that a random poker hand contains one pair is 0.42.
Estimate the probability that at least 450 out of 1000 poker hands will contain one pair.
Exercise - Nutrition study
07 Theory
De Moivre-Laplace Continuity Correction Formula
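The normal approximation to a discrete distribution, for integers $a$ and $b$ close together, can be improved by extending the range by 0.5 on either side. For an integer-valued sum $S_n$ of $n$ IID variables with mean $\mu$ and variance $\sigma^2$:
$$P[a \leq S_n \leq b] \approx \Phi\left(\frac{b + 0.5 - n\mu}{\sigma\sqrt{n}}\right) - \Phi\left(\frac{a - 0.5 - n\mu}{\sigma\sqrt{n}}\right)$$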
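The effect of the correction is easiest to see on a narrow range. The sketch below uses hypothetical values (a binomial with `n = 100`, `p = 0.5`, and the range from `a = 48` to `b = 52`) and compares the exact probability with the normal approximation computed with and without the 0.5 adjustment.

```python
import math

def phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_pmf(i, n, p):
    """P[S_n = i] for S_n ~ Bin(n, p)."""
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)

# Hypothetical parameters for illustration.
n, p = 100, 0.5
a, b = 48, 52

mu = n * p                          # mean of S_n
sigma = math.sqrt(n * p * (1 - p))  # standard deviation of S_n

exact = sum(binom_pmf(i, n, p) for i in range(a, b + 1))

# Plain normal approximation over [a, b].
plain = phi((b - mu) / sigma) - phi((a - mu) / sigma)

# Continuity-corrected approximation over [a - 0.5, b + 0.5].
corrected = phi((b + 0.5 - mu) / sigma) - phi((a - 0.5 - mu) / sigma)

print(f"exact     = {exact:.4f}")
print(f"plain     = {plain:.4f}")
print(f"corrected = {corrected:.4f}")
```

The plain approximation misses half a unit of probability mass at each end of the range, which is a large share when $a$ and $b$ are close together. In the extreme case $a = b$ the plain approximation gives exactly zero, while the corrected one still gives a sensible estimate.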
08 Illustration
Example - Continuity correction of absurd normal approximation