All $n$ sailors throw their hats in the air, and each catches a random hat as the hats fall back down.
How many sailors do you expect will catch the hat they own?
What is the variance of this number?
Solution
Strangely, the answers are both 1, regardless of the number of sailors (for the variance, as long as there are at least two sailors). Here is the reasoning:
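(1) Let $X_i$ be an indicator of sailor $i$ catching their own hat. So $X_i = 1$ when sailor $i$ catches their own hat, and $X_i = 0$ otherwise. Thus $X_i$ is Bernoulli with success probability $1/n$, where $n$ is the number of sailors.
Then $X = \sum_{i=1}^n X_i$ counts the total number of hats caught by original owners.
(2) Note that $E[X_i] = 1/n$.
Therefore:
$$E[X] = \sum_{i=1}^n E[X_i] = n \cdot \frac{1}{n} = 1$$
(3) Similarly:
$$\mathrm{Var}[X] = \sum_{i=1}^n \mathrm{Var}[X_i] + 2\sum_{i<j} \mathrm{Cov}[X_i, X_j]$$
We need $\mathrm{Var}[X_i]$ and $\mathrm{Cov}[X_i, X_j]$.
(4) Use $\mathrm{Var}[X_i] = E[X_i^2] - E[X_i]^2$. Observe that $X_i^2 = X_i$. Therefore:
$$\mathrm{Var}[X_i] = \frac{1}{n} - \frac{1}{n^2} = \frac{n-1}{n^2}$$
(5) Now for covariance:
$$\mathrm{Cov}[X_i, X_j] = E[X_i X_j] - E[X_i]\,E[X_j]$$
We need to compute $E[X_i X_j]$.
Notice that $X_i X_j = 1$ when sailors $i$ and $j$ both catch their own hats, and 0 otherwise.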
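The remaining steps can be sketched as follows, assuming the hats land as a uniformly random permutation and writing $X_i$ for the indicator that sailor $i$ catches their own hat. Two distinct sailors $i \ne j$ both succeed with probability $\frac{1}{n(n-1)}$, so $\mathrm{Cov}[X_i, X_j] = \frac{1}{n(n-1)} - \frac{1}{n^2} = \frac{1}{n^2(n-1)}$, and summing $n$ variances and $n(n-1)$ ordered covariances gives $\frac{n-1}{n} + \frac{1}{n} = 1$. The Python sketch below is a brute-force check of both answers for small $n$; the function name `own_hat_moments` is illustrative, not from the notes.

```python
from fractions import Fraction
from itertools import permutations


def own_hat_moments(n):
    """Exact mean and variance of the number of sailors who catch their own hat.

    Assumes every assignment of hats to sailors (every permutation) is equally likely.
    """
    # For each permutation, count sailors i whose caught hat is hat i.
    counts = [
        sum(1 for sailor, hat in enumerate(perm) if sailor == hat)
        for perm in permutations(range(n))
    ]
    total = len(counts)
    mean = Fraction(sum(counts), total)
    second_moment = Fraction(sum(c * c for c in counts), total)
    return mean, second_moment - mean ** 2


for n in range(2, 7):
    mean, variance = own_hat_moments(n)
    print(f"n={n}: mean={mean}, variance={variance}")
```

Exact fractions are used instead of floats so the comparison with 1 is not blurred by rounding. Enumeration grows like $n!$, so this is only a check for small $n$.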
Random variables are called independent, identically distributed (IID) when they are independent and have the same distribution.
IID variables: Same distribution, different values
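Independent variables cannot be correlated, so the values taken by IID variables typically differ from one another on a given outcome, even though their distributions coincide.
We do have, for IID variables $X_i$ and $X_j$ and every $x$:
$$P[X_i \le x] = P[X_j \le x]$$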
Standardization
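Suppose $X$ is any random variable with mean $\mu_X$ and standard deviation $\sigma_X > 0$.
The standardization of $X$ is:
$$Z = \frac{X - \mu_X}{\sigma_X}$$
The variable $Z$ has $E[Z] = 0$ and $\mathrm{Var}[Z] = 1$. We can reconstruct $X$ by:
$$X = \sigma_X Z + \mu_X$$
Suppose $X_1, X_2, X_3, \dots$ is a collection of IID random variables.
Define:
$$S_n = \sum_{i=1}^n X_i, \qquad Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$
where:
$$\mu = E[X_i], \qquad \sigma^2 = \mathrm{Var}[X_i]$$
So $Z_n$ is the standardization of $S_n$.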
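As a concrete check of the standardization of a sum, here is a minimal Python sketch. It assumes the summands are rolls of a fair six-sided die (an illustrative choice, not from the notes) and writes $S_n$ for the sum of $n$ rolls and $Z_n = (S_n - n\mu)/(\sigma\sqrt{n})$ for its standardization. The helper names `convolve` and `pmf_of_sum` are hypothetical.

```python
import math


def convolve(a, b):
    """PMF of the sum of two independent variables with PMFs a and b (index = value)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def pmf_of_sum(pmf, n):
    """PMF of the sum of n IID copies of a variable with the given PMF."""
    out = [1.0]  # PMF of the constant 0
    for _ in range(n):
        out = convolve(out, pmf)
    return out


die = [0.0] + [1.0 / 6] * 6  # index k holds P[roll = k]
n = 10

mu = sum(k * q for k, q in enumerate(die))
sigma = math.sqrt(sum((k - mu) ** 2 * q for k, q in enumerate(die)))

pmf_s = pmf_of_sum(die, n)
# Value taken by Z_n on each possible value s of S_n.
z_values = [(s - n * mu) / (sigma * math.sqrt(n)) for s in range(len(pmf_s))]

mean_z = sum(z * q for z, q in zip(z_values, pmf_s))
var_z = sum(z * z * q for z, q in zip(z_values, pmf_s)) - mean_z ** 2
print(f"E[Z_n] = {mean_z:.6f}, Var[Z_n] = {var_z:.6f}")
```

Up to floating-point rounding, the mean and variance of $Z_n$ come out as 0 and 1 for any choice of `n`, which is the defining property of a standardization.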
Let $Z$ be a standard normal random variable, $Z \sim \mathcal{N}(0, 1)$.
Central Limit Theorem
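Suppose $S_n = \sum_{i=1}^n X_i$ for IID variables $X_i$ with mean $\mu$ and finite variance $\sigma^2 > 0$, and $Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$ are the standardizations of $S_n$.
Then for any interval $[a, b]$:
$$\lim_{n \to \infty} P[a \le Z_n \le b] = P[a \le Z \le b] = \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx$$
We say that $Z_n$ converges in distribution to the standard normal $Z$.
The distribution of a very large sum of IID variables is determined merely by $\mu$ and $\sigma^2$ from the original IID variables, while the data of higher moments fades away.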
The name “normal distribution” is used because it arises from a large sum of repetitions of any other kind of distribution with finite variance. It is therefore ubiquitous in applications.
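The limit in the theorem can be watched numerically. The sketch below assumes Bernoulli summands with $p = 0.3$ and the interval $[-1, 1]$ (both illustrative choices), computes the exact probability that the standardized sum lands in the interval, and compares it with the standard normal probability. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """CDF of the standard normal, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_standardized_sum_in(a, b, n, p):
    """Exact P[a <= Z_n <= b], where Z_n standardizes a sum of n IID Bernoulli(p)."""
    mu = p
    sigma = math.sqrt(p * (1.0 - p))
    total = 0.0
    for k in range(n + 1):
        z = (k - n * mu) / (sigma * math.sqrt(n))
        if a <= z <= b:
            # Binomial probability of exactly k successes.
            total += math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
    return total


a, b, p = -1.0, 1.0, 0.3
limit = normal_cdf(b) - normal_cdf(a)
for n in (10, 100, 1000):
    exact = prob_standardized_sum_in(a, b, n, p)
    print(f"n={n:5d}  P[a <= Z_n <= b] = {exact:.4f}   normal limit = {limit:.4f}")
```

The values of `n` are kept at 1000 or below because `math.comb` returns exact integers that eventually become too large to convert to a float.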
Misuse of the CLT
It is important to learn when the CLT is applicable and when it is not. Many people (even professionals) apply it wrongly.
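For example, sometimes one hears the claim that if enough students take an exam, the distribution of scores will be approximately normal. This is totally wrong! The CLT describes the sum (or average) of many IID variables, not the spread of the individual values; more students only reveal more clearly whatever shape the score distribution already has.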
Intuition for the CLT
The CLT is about the distribution of simultaneity, or (in other words) about accumulated alignment between independent variables.
With a large $n$, deviations of the total sum are predominantly created by simultaneous (correlated) deviations of a large portion of summands away from their means, rather than the contributions of individual summands deviating a large amount.
Simultaneity across a large number $n$ of independent items is described by… the bell curve.
Normal approximations rely on the limit stated in the CLT to approximate probabilities for large sums of variables.
Normal approximation
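Let $S_n = \sum_{i=1}^n X_i$ for IID variables $X_i$ with $E[X_i] = \mu$ and $\mathrm{Var}[X_i] = \sigma^2$.
The normal approximation of $F_{S_n}$ is:
$$F_{S_n}(s) \approx \Phi\!\left(\frac{s - n\mu}{\sigma\sqrt{n}}\right)$$
where $\Phi$ is the CDF of the standard normal.
For example, suppose $X_i \sim \mathrm{Ber}(p)$, so $S_n \sim \mathrm{Bin}(n, p)$. We know $\mu = p$ and $\sigma = \sqrt{p(1-p)}$. Therefore:
$$F_{S_n}(s) \approx \Phi\!\left(\frac{s - np}{\sqrt{np(1-p)}}\right)$$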
A rule of thumb is that the normal approximation to the binomial $\mathrm{Bin}(n, p)$ is effective when the variance $np(1-p)$ is large; a commonly quoted threshold is $np(1-p) > 10$.
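As a worked illustration with assumed numbers (not from the notes), take $n = 100$ fair coin flips, so $p = 1/2$, $np = 50$, and $\sqrt{np(1-p)} = 5$. Writing $S_n$ for the number of heads and $\Phi$ for the standard normal CDF, the approximation gives:

$$P[S_n \le 55] \approx \Phi\!\left(\frac{55 - 50}{5}\right) = \Phi(1) \approx 0.8413$$

Here $np(1-p) = 25$, so the variance is comfortably large.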
Efficient computation
This CDF is far easier to compute for large $n$ than the CDF of $S_n$ itself. The factorials in $\binom{n}{k}$ are hard even for a computer when $n$ is large, and the summation adds another factor of $n$ to the scaling cost.
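The cost difference can be seen in a small Python sketch, assuming a binomial sum with $n = 100{,}000$, $p = 0.5$, and cutoff $s = 50{,}100$ (illustrative values). The exact CDF is summed term by term, working in log space with `math.lgamma` so the huge binomial coefficients never have to be formed. The normal approximation is a single call to `math.erf`. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """CDF of the standard normal, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binomial_cdf_exact(s, n, p):
    """P[S_n <= s] for S_n ~ Bin(n, p), summing up to s + 1 terms in log space."""
    log_p = math.log(p)
    log_q = math.log(1.0 - p)
    total = 0.0
    for k in range(0, min(s, n) + 1):
        # log of C(n, k) * p^k * (1 - p)^(n - k), using lgamma for log-factorials.
        log_term = (
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * log_p + (n - k) * log_q
        )
        total += math.exp(log_term)
    return total


def binomial_cdf_normal(s, n, p):
    """Normal approximation to P[S_n <= s]: one evaluation of the normal CDF."""
    return normal_cdf((s - n * p) / math.sqrt(n * p * (1.0 - p)))


n, p, s = 100_000, 0.5, 50_100
exact = binomial_cdf_exact(s, n, p)
approx = binomial_cdf_normal(s, n, p)
print(f"exact  = {exact:.4f}")
print(f"approx = {approx:.4f}")
```

The exact routine loops over tens of thousands of terms, while the approximation does constant work regardless of `n`.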