Theory 1
Video by 3Blue1Brown:
IID variables
Random variables $X_1, X_2, \ldots$ are called independent, identically distributed (IID) when they are independent of one another and all have the same distribution.
IID variables: Same distribution, different values
Independent variables cannot be correlated, so the values taken by IID variables will disagree on most outcomes: the variables share a distribution, not values.
We do have equality of the distribution functions:
$$P(X_i \le t) = P(X_j \le t) \quad \text{for all } i, j \text{ and all } t.$$
Standardization
Suppose $X$ is any random variable with mean $\mu$ and standard deviation $\sigma$. The standardization of $X$ is:
$$Z = \frac{X - \mu}{\sigma}$$
The variable $Z$ has $E[Z] = 0$ and $\operatorname{Var}(Z) = 1$. We can reconstruct $X$ by:
$$X = \sigma Z + \mu$$
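A quick numerical illustration (a sketch using NumPy; the Gamma distribution here is an arbitrary choice, any distribution works):

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples from an arbitrary, non-normal distribution.
x = rng.gamma(shape=2.0, scale=3.0, size=100_000)

mu, sigma = x.mean(), x.std()   # sample estimates of mu and sigma

# Standardize: subtract the mean, divide by the standard deviation.
z = (x - mu) / sigma
print(z.mean(), z.std())        # approximately 0 and 1

# Reconstruct X from Z: X = sigma * Z + mu.
print(np.allclose(sigma * z + mu, x))   # True
```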
Suppose $X_1, X_2, \ldots, X_n$ are IID, each with mean $\mu$ and standard deviation $\sigma$. Define:
$$S_n = X_1 + X_2 + \cdots + X_n$$
where:
$$E[S_n] = n\mu, \qquad \operatorname{Var}(S_n) = n\sigma^2, \qquad \operatorname{SD}(S_n) = \sigma\sqrt{n}$$
So the standardization of $S_n$ is:
$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$
Let $Z_n$ denote this standardized sum in what follows.
Central Limit Theorem
Suppose $S_n = X_1 + \cdots + X_n$ for IID variables $X_i$, and $Z_n$ are the standardizations of $S_n$. Then for any interval $[a, b]$:
$$\lim_{n \to \infty} P(a \le Z_n \le b) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-z^2/2}\, dz$$
We say that $Z_n$ converges in distribution to the standard normal $Z \sim N(0, 1)$.
The distribution of a very large sum of IID variables is determined merely by the mean $\mu$ and standard deviation $\sigma$ of the summands, no matter what distribution the summands have.
The name “normal distribution” is used because it arises from a large sum of repetitions of any other kind of distribution. It is therefore ubiquitous in applications.
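The theorem is easy to see in simulation. Below is a minimal sketch (the choice of Exp(1) summands and the particular values of $n$ and the interval are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, trials = 1_000, 50_000
mu, sigma = 1.0, 1.0   # mean and standard deviation of Exp(1)

# Each row is one realization of S_n = X_1 + ... + X_n with X_i ~ Exp(1).
sums = rng.exponential(scale=1.0, size=(trials, n)).sum(axis=1)

# Standardize the sums.
z = (sums - n * mu) / (sigma * np.sqrt(n))

# P(-1 <= Z_n <= 1): simulated vs. standard normal.
print(np.mean((z >= -1) & (z <= 1)))           # close to 0.6827
print(stats.norm.cdf(1) - stats.norm.cdf(-1))  # 0.6827
```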
Misuse of the CLT
It is important to learn when the CLT is applicable and when it is not. Many people (even professionals) apply it incorrectly.
For example, one sometimes hears the claim that if enough students take an exam, the distribution of scores will be approximately normal. This is totally wrong! The CLT describes sums of many IID variables; each student's score is a single draw from the score distribution, so the histogram of scores simply approximates that distribution, whatever its shape. More students means a better approximation of the score distribution, not a more normal one.
Intuition for the CLT
The CLT is about the distribution of simultaneity, or (in other words) about accumulated alignment between independent variables.
With a large $n$, deviations of the total sum are predominantly created by simultaneous (coincidentally aligned) deviations of a large portion of summands away from their means, rather than by individual summands deviating a large amount. Simultaneity across a large number $n$ of independent items is described by… the bell curve.
Theory 2
Extra - Moment Generating Functions
Theory 1
In order to show why the CLT is true, we introduce the technique of moment generating functions. Recall that the $k^{\text{th}}$ moment of a distribution is simply $E[X^k]$. Write $\mu_k$ for this value. Recall the power series for $e^x$:
$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}$$
The function $e^x$ has the property of being a bijective differentiable map from $(-\infty, \infty)$ to $(0, \infty)$, and it converts addition to multiplication: $e^{a+b} = e^a e^b$. Given a random variable $X$, we can compose $e^x$ with $X$ to obtain a new variable. Define the moment generating function of $X$ as follows:
$$M_X(t) = E\left[e^{tX}\right]$$
This is a function of $t \in (-\infty, \infty)$ and returns values in $(0, \infty]$. It is called the moment generating function because it contains the data of all the higher moments $\mu_k$. They can be extracted by taking derivatives and evaluating at zero:
$$\mu_k = M_X^{(k)}(0)$$
It is reasonable to consider $M_X(t)$ as a formal power series in the variable $t$ that has the higher moments for coefficients:
$$M_X(t) = \sum_{k=0}^{\infty} \mu_k \frac{t^k}{k!}$$
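The definition can be checked numerically. A Monte Carlo sketch (the Uniform(0, 1) variable and the finite-difference step are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1_000_000)   # X ~ Uniform(0, 1)

def M(t):
    # Monte Carlo estimate of M_X(t) = E[e^{tX}].
    return np.mean(np.exp(t * x))

# Central difference for M'(0), which should recover E[X] = 1/2.
h = 1e-4
print((M(h) - M(-h)) / (2 * h))   # approximately 0.5
```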
Example - Moment generating function of a standard normal
We compute $M_Z(t)$ where $Z \sim N(0, 1)$. From the formula for expected value of a function of a random variable, we have:
$$M_Z(t) = E\left[e^{tZ}\right] = \int_{-\infty}^{\infty} e^{tz} \cdot \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\, dz$$
Complete the square in the exponent: $tz - \frac{z^2}{2} = -\frac{(z - t)^2}{2} + \frac{t^2}{2}$. Thus:
$$M_Z(t) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-(z-t)^2/2}\, e^{t^2/2}\, dz$$
The last factor can be taken outside the integral:
$$M_Z(t) = e^{t^2/2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-(z-t)^2/2}\, dz = e^{t^2/2}$$
since the remaining integrand is the density of $N(t, 1)$, which integrates to 1.
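A simulation check of this formula (a sketch; the sample size and values of $t$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)

for t in [0.5, 1.0, 1.5]:
    estimate = np.mean(np.exp(t * z))   # Monte Carlo E[e^{tZ}]
    exact = np.exp(t**2 / 2)            # the formula above
    print(t, estimate, exact)
```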
Exercise - Moment generating function of an exponential variable
Compute $M_X(t)$ for $X \sim \operatorname{Exp}(\lambda)$.
Moment generating functions have the remarkable property of encoding the distribution itself:
Distributions determined by MGFs
Assume $M_X(t)$ and $M_Y(t)$ both converge. If $M_X(t) = M_Y(t)$ for all $t$, then $X$ and $Y$ have the same distribution: $P(X \le x) = P(Y \le x)$ for all $x$. Moreover, if $M_X(t) = M_Y(t)$ for any interval of values $a < t < b$, then they agree for all $t$ and the conclusion still holds.
Be careful about moments vs. generating functions!
Sometimes the moments all exist, but they grow so fast that the moment generating function does not converge. For example, the log-normal distribution $e^X$ for $X \sim N(0, 1)$ has this property. The fact above does not apply when this happens.
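Concretely, the $k^{\text{th}}$ moment of this log-normal variable is $E[e^{kX}] = e^{k^2/2}$ (by the standard normal MGF computed above), and these grow so fast that the series terms $\mu_k t^k / k!$ blow up for every $t > 0$. A quick check (sketch):

```python
import math

t = 0.1   # any positive t shows the same blow-up
for k in [1, 5, 10, 20, 30]:
    mu_k = math.exp(k**2 / 2)               # k-th moment of e^X
    term = mu_k * t**k / math.factorial(k)  # k-th term of the MGF series
    print(k, term)                          # terms grow without bound
```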
When moment generating functions approximate each other, their corresponding distributions also approximate each other:
Distributions converge when MGFs converge
Suppose that $M_{X_n}(t) \to M_X(t)$ for all $t$ on some interval $-\varepsilon < t < \varepsilon$. (In particular, assume that $M_X(t)$ converges on some such interval.) Then for any $a \le b$, we have:
$$\lim_{n \to \infty} P(a \le X_n \le b) = P(a \le X \le b)$$
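For instance, the MGF of the standardized sum of $n$ IID Bernoulli(1/2) variables has a closed form, and we can watch it converge to $e^{t^2/2}$ (a sketch; the parameter choices are arbitrary):

```python
import numpy as np

def M_Zn(t, n):
    # MGF of Z_n = (S_n - n/2) / (sqrt(n)/2) with S_n ~ Binom(n, 1/2).
    s = np.sqrt(n)
    return np.exp(-t * s) * (0.5 + 0.5 * np.exp(2 * t / s)) ** n

for n in [10, 100, 1_000, 10_000]:
    print(n, M_Zn(1.0, n))
print("limit:", np.exp(0.5))   # MGF of N(0, 1) at t = 1
```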
Exercise - Using an MGF
Suppose $X$ is nonnegative and its MGF $M_X(t)$ is finite when $t$ is below some threshold and infinite when $t$ is above it. Find a bound on $P(X \ge a)$ using (a) Markov's Inequality, and (b) Chebyshev's Inequality.
Extra - Derivation of CLT
Theory 2
The main role of moment generating functions in the proof of the CLT is to convert the sum $S_n = X_1 + \cdots + X_n$ into a product by putting the sum into an exponent.
We have $Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$, and recall $S_n = X_1 + \cdots + X_n$, so $Z_n = \sum_{i=1}^n \frac{Y_i}{\sqrt{n}}$ where $Y_i = \frac{X_i - \mu}{\sigma}$, and $E[Y_i] = 0$ and $E[Y_i^2] = 1$. First, compute the MGF of $Z_n$. We have:
$$M_{Z_n}(t) = E\left[e^{tZ_n}\right] = E\left[e^{\frac{t}{\sqrt{n}}(Y_1 + \cdots + Y_n)}\right]$$
Exchange the sum in the exponent for a product of exponentials:
$$M_{Z_n}(t) = E\left[\prod_{i=1}^n e^{tY_i/\sqrt{n}}\right]$$
Now since the $Y_i$ are independent, the factors $e^{tY_i/\sqrt{n}}$ are also independent of each other. Use the product rule $E[UV] = E[U]\,E[V]$ when $U, V$ are independent to obtain:
$$M_{Z_n}(t) = \prod_{i=1}^n E\left[e^{tY_i/\sqrt{n}}\right]$$
Now expand the exponential in its Taylor series and use linearity of expectation:
$$E\left[e^{tY_i/\sqrt{n}}\right] = E\left[1 + \frac{tY_i}{\sqrt{n}} + \frac{t^2 Y_i^2}{2n} + \cdots\right] = 1 + \frac{t^2}{2n} + \cdots \approx 1 + \frac{t^2/2}{n}$$
We don't give a complete argument for the final approximation, but a few remarks are worthwhile. For fixed $t$, and assuming the moments $E[Y_i^k]$ have adequately bounded growth in $k$, the series in each factor converges for all $n$. Using Taylor's theorem we could write an error term as a shrinking function of $n$. The real trick of analysis is to show that in the product of $n$ factors, these error terms shrink fast enough that the limit value is not affected.
In any case, the factors of the last line are independent of $i$, so we have:
$$M_{Z_n}(t) \approx \left(1 + \frac{t^2/2}{n}\right)^n \longrightarrow e^{t^2/2} \quad \text{as } n \to \infty$$
But $e^{t^2/2}$ is the MGF of $Z \sim N(0, 1)$. Therefore $M_{Z_n}(t) \to M_Z(t)$, so $P(a \le Z_n \le b) \to P(a \le Z \le b)$.
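The last limit is just $(1 + x/n)^n \to e^x$ with $x = t^2/2$, which is easy to check numerically (sketch):

```python
import math

t = 1.0
for n in [10, 100, 1_000, 100_000]:
    print(n, (1 + t**2 / (2 * n)) ** n)
print("limit:", math.exp(t**2 / 2))   # e^{t^2/2}
```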
Theory 3
Normal approximations rely on the limit stated in the CLT to approximate probabilities for large sums of variables.
Normal approximation
Let $S_n = X_1 + \cdots + X_n$ for IID variables $X_i$ with $E[X_i] = \mu$ and $\operatorname{Var}(X_i) = \sigma^2$. The normal approximation of $S_n$ is:
$$P(a \le S_n \le b) \approx P(a \le W \le b), \qquad W \sim N(n\mu,\ n\sigma^2)$$
For example, suppose $S_n \sim \operatorname{Binom}(n, p)$, a sum of $n$ IID $\operatorname{Bernoulli}(p)$ variables, so $\mu = p$ and $\sigma^2 = p(1-p)$. Then $S_n$ is approximated by $N(np,\ np(1-p))$.
A common rule of thumb is that the normal approximation to the binomial is effective when $np \ge 10$ and $n(1-p) \ge 10$.
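A sketch with SciPy (the values $n = 1000$, $p = 0.3$, and the interval are arbitrary choices):

```python
import numpy as np
from scipy import stats

n, p = 1_000, 0.3
mu, sigma = n * p, np.sqrt(n * p * (1 - p))

# Exact binomial probability P(280 <= S_n <= 320).
exact = stats.binom.cdf(320, n, p) - stats.binom.cdf(279, n, p)

# Normal approximation with matching mean and variance.
approx = stats.norm.cdf(320, mu, sigma) - stats.norm.cdf(280, mu, sigma)

print(exact, approx)   # close; the continuity correction below improves this
```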
Efficient computation
This CDF is far easier to compute for large $n$ than the CDF of $\operatorname{Binom}(n, p)$ itself. The factorials in $\binom{n}{k}$ are hard (even for a computer) when $n$ is large, and the summation $\sum_{k=0}^{x} \binom{n}{k} p^k (1-p)^{n-k}$ adds another factor of $n$ to the scaling cost.
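To see the cost difference, here is a sketch comparing a naive $O(n)$ summation with exact binomial coefficients against the constant-time normal CDF (parameter values arbitrary; for much larger $n$ the exact sum becomes dramatically slower):

```python
import math
import time

def binom_cdf(x, n, p):
    # Naive O(n) summation; math.comb uses exact (large) integers.
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def norm_cdf(x, mu, sigma):
    # Constant-time normal CDF via the error function.
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p, x = 1_000, 0.5, 480

start = time.perf_counter()
exact = binom_cdf(x, n, p)
print("exact: ", exact, time.perf_counter() - start, "s")

start = time.perf_counter()
approx = norm_cdf(x, n * p, math.sqrt(n * p * (1 - p)))
print("approx:", approx, time.perf_counter() - start, "s")
```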
Theory 4
De Moivre-Laplace Continuity Correction Formula
The normal approximation to a discrete (integer-valued) distribution, for integers $a$ and $b$ close together, should be improved by adding 0.5 to the range on either side:
$$P(a \le S_n \le b) \approx P\left(a - \tfrac{1}{2} \le W \le b + \tfrac{1}{2}\right), \qquad W \sim N(n\mu,\ n\sigma^2)$$
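A sketch of the correction in action (parameter values arbitrary):

```python
import math
from scipy import stats

n, p = 50, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

a, b = 24, 26   # integers close together

# Exact binomial probability P(24 <= S_n <= 26).
exact = sum(stats.binom.pmf(k, n, p) for k in range(a, b + 1))

plain = stats.norm.cdf(b, mu, sigma) - stats.norm.cdf(a, mu, sigma)
corrected = stats.norm.cdf(b + 0.5, mu, sigma) - stats.norm.cdf(a - 0.5, mu, sigma)

print(exact, plain, corrected)   # the corrected value is far closer to exact
```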