Summations

01 Theory

Theory 1

In many contexts it is useful to consider random variables that are summations of a large number of variables.

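Summation formulas:

$$E[X + Y] = E[X] + E[Y] \qquad \text{and} \qquad \text{Var}[X + Y] = \text{Var}[X] + \text{Var}[Y] + 2\,\text{Cov}[X, Y]$$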

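Suppose $S_n$ is a large sum of random variables:

$$S_n = X_1 + X_2 + \cdots + X_n$$

Then:

$$E[S_n] = E[X_1] + E[X_2] + \cdots + E[X_n]$$

$$\text{Var}[S_n] = \sum_{i=1}^{n} \text{Var}[X_i] + 2 \sum_{i<j} \text{Cov}[X_i, X_j]$$

If $X_i$ and $X_j$ are uncorrelated for every $i \neq j$ (e.g. if they are independent):

$$\text{Var}[S_n] = \text{Var}[X_1] + \text{Var}[X_2] + \cdots + \text{Var}[X_n]$$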

Extra - Derivation of variance of a sum

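Using the definition, with $\mu_i = E[X_i]$:

$$\begin{aligned}
\text{Var}[S_n] &= E\big[(S_n - E[S_n])^2\big] = E\Big[\Big(\sum_{i=1}^{n} (X_i - \mu_i)\Big)^2\Big] \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} E\big[(X_i - \mu_i)(X_j - \mu_j)\big] = \sum_{i=1}^{n} \sum_{j=1}^{n} \text{Cov}[X_i, X_j] \\
&= \sum_{i=1}^{n} \text{Var}[X_i] + 2 \sum_{i<j} \text{Cov}[X_i, X_j]
\end{aligned}$$

In the last line we use the fact that $\text{Cov}[X_i, X_i] = \text{Var}[X_i]$ for the first term. For the second term we use the symmetry property $\text{Cov}[X_i, X_j] = \text{Cov}[X_j, X_i]$, which produces the factor of 2.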
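As a quick sanity check of the variance-of-a-sum identity, here is a minimal Python sketch. It computes $\text{Var}[S_n]$ in two ways, once from the definition and once from the sum-of-covariances formula, using exact fractions. The joint distribution `JOINT` is a made-up example with hypothetical values chosen so that the variables are correlated. It is not data from the text.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical joint distribution of (X1, X2, X3): each entry is
# (probability, (x1, x2, x3)). Values are made up so the X's are correlated.
JOINT = [
    (Fraction(1, 4), (0, 1, 2)),
    (Fraction(1, 4), (1, 1, 0)),
    (Fraction(1, 4), (2, 0, 1)),
    (Fraction(1, 4), (1, 3, 3)),
]


def expect(f):
    """E[f(X1, X2, X3)] under JOINT, computed exactly."""
    return sum(p * f(xs) for p, xs in JOINT)


def cov(i, j):
    """Cov[Xi, Xj] = E[Xi Xj] - E[Xi] E[Xj]; cov(i, i) is Var[Xi]."""
    e_ij = expect(lambda xs: xs[i] * xs[j])
    return e_ij - expect(lambda xs: xs[i]) * expect(lambda xs: xs[j])


def var_of_sum_direct():
    """Var[S] = E[(S - E[S])^2] with S = X1 + X2 + X3."""
    mean_s = expect(sum)
    return expect(lambda xs: (sum(xs) - mean_s) ** 2)


def var_of_sum_formula():
    """Sum of variances plus twice the sum of covariances over pairs i < j."""
    k = len(JOINT[0][1])
    variances = sum(cov(i, i) for i in range(k))
    covariances = sum(cov(i, j) for i, j in combinations(range(k), 2))
    return variances + 2 * covariances


print(var_of_sum_direct(), var_of_sum_formula())  # both 59/16
```

Exact `Fraction` arithmetic is used so the two sides can be compared with `==` rather than a floating-point tolerance.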

02 Illustration

Example - Binomial expectation and variance

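(1) Suppose we have $n$ repeated independent Bernoulli trials $X_1, \ldots, X_n$ with $X_i \sim \text{Ber}(p)$.

The sum $S_n = X_1 + \cdots + X_n$ is a binomial variable: $S_n \sim \text{Bin}(n, p)$.


(2) We know $E[X_i] = p$ and $\text{Var}[X_i] = p(1-p)$.

The summation rule for expectation:

$$E[S_n] = E[X_1] + \cdots + E[X_n] = np$$


(3) The summation rule for variance (the trials are independent, so all covariances vanish):

$$\text{Var}[S_n] = \text{Var}[X_1] + \cdots + \text{Var}[X_n] = np(1-p)$$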
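The binomial result can be checked directly against the pmf. This sketch computes the mean and variance of $\text{Bin}(n, p)$ by brute-force summation, using exact arithmetic with `fractions.Fraction`. The helper name `binomial_moments` and the values $n = 12$, $p = 1/3$ are illustrative choices.

```python
from fractions import Fraction
from math import comb


def binomial_moments(n, p):
    """Mean and variance of Bin(n, p), summed directly over the pmf."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    second_moment = sum(k * k * pk for k, pk in enumerate(pmf))
    return mean, second_moment - mean**2


n, p = 12, Fraction(1, 3)
mean, var = binomial_moments(n, p)
print(mean, var)          # 4 8/3, i.e. np and np(1-p)
```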

Example - Pascal expectation and variance

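(1) Let $X \sim \text{Pascal}(r, p)$ count the trials needed to obtain $r$ successes, where each trial independently succeeds with probability $p$.

Let $X_1, X_2, \ldots, X_r$ be independent random variables, where:

  • $X_1$ counts the trials until the first success
  • $X_2$ counts the trials after the first success until the second success
  • $X_r$ counts the trials after the $(r-1)^{\text{th}}$ success until the $r^{\text{th}}$ success

Observe that $X = X_1 + X_2 + \cdots + X_r$.


(2) Notice that $X_i \sim \text{Geom}(p)$ for every $i$. Therefore:

$$E[X_i] = \frac{1}{p}, \qquad \text{Var}[X_i] = \frac{1-p}{p^2}$$


(3) Using the summation rule, conclude:

$$E[X] = \frac{r}{p}, \qquad \text{Var}[X] = \frac{r(1-p)}{p^2}$$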
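To see the Pascal formulas numerically, the sketch below sums $k\,P(X = k)$ and $k^2 P(X = k)$. It uses the pmf $P(X = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}$ for $k \ge r$. The infinite series is truncated at a hypothetical cutoff `k_max`. That cutoff is an assumption of this sketch, adequate for moderate $r$ and $p$.

```python
from math import comb


def pascal_pmf(k, r, p):
    """P(X = k): the r-th success occurs on trial k (requires k >= r)."""
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)


def pascal_moments(r, p, k_max=2000):
    """Mean and variance of Pascal(r, p) from the pmf, truncating the
    infinite sum at k_max (an assumption; the tail must be negligible)."""
    ks = range(r, k_max + 1)
    mean = sum(k * pascal_pmf(k, r, p) for k in ks)
    second_moment = sum(k * k * pascal_pmf(k, r, p) for k in ks)
    return mean, second_moment - mean**2


r, p = 3, 0.3
mean, var = pascal_moments(r, p)
print(mean, r / p)                 # numerical sum vs. r/p
print(var, r * (1 - p) / p**2)     # numerical sum vs. r(1-p)/p^2
```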

Example - Multinomial covariances

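Each trial of an experiment has $r$ possible outcomes labeled $1, 2, \ldots, r$, with probabilities of occurrence $p_1, p_2, \ldots, p_r$. The experiment is run $n$ times.

Let $X_i$ count the number of occurrences of outcome $i$. So $X_i \sim \text{Bin}(n, p_i)$.

Find $\text{Cov}[X_i, X_j]$ for $i \neq j$.

Solution

Notice that $X_i + X_j$ is also a binomial variable, with success probability $p_i + p_j$. (‘Success’ is an outcome of either $i$ or $j$.)

The variance of a binomial is known to be $np(1-p)$ for whatever relevant $n$ and $p$.

So we compute $\text{Cov}[X_i, X_j]$ by solving:

$$\text{Var}[X_i + X_j] = \text{Var}[X_i] + \text{Var}[X_j] + 2\,\text{Cov}[X_i, X_j]$$

$$n(p_i + p_j)(1 - p_i - p_j) = np_i(1 - p_i) + np_j(1 - p_j) + 2\,\text{Cov}[X_i, X_j]$$

Expanding both sides and cancelling leaves $-2np_ip_j = 2\,\text{Cov}[X_i, X_j]$, so:

$$\text{Cov}[X_i, X_j] = -np_ip_j$$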
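Here is a brute-force check of $\text{Cov}[X_i, X_j] = -np_ip_j$ for three outcomes. The sketch enumerates every count vector $(a, b, c)$ with $a + b + c = n$ and weights it by its multinomial probability. It then computes $E[X_1X_2] - E[X_1]E[X_2]$ exactly. The probabilities $(1/5, 3/10, 1/2)$ and $n = 6$ are arbitrary illustrative values, and `multinomial_cov_12` is a hypothetical helper name.

```python
from fractions import Fraction
from math import factorial


def multinomial_cov_12(n, p1, p2, p3):
    """Exact Cov[X1, X2] for a 3-outcome multinomial with n trials."""
    e1 = e2 = e12 = Fraction(0)
    for a in range(n + 1):
        for b in range(n + 1 - a):
            c = n - a - b
            # Multinomial probability of counts (a, b, c).
            prob = (Fraction(factorial(n), factorial(a) * factorial(b) * factorial(c))
                    * p1**a * p2**b * p3**c)
            e1 += a * prob
            e2 += b * prob
            e12 += a * b * prob
    return e12 - e1 * e2


n = 6
p1, p2, p3 = Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)
print(multinomial_cov_12(n, p1, p2, p3), -n * p1 * p2)   # -9/25 -9/25
```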

Example - Hats in the air

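All $n$ sailors throw their hats in the air, and each catches a random hat as the hats fall down.

How many sailors do you expect will catch the hat they own? What is the variance of this number?

Solution

Surprisingly, the answers are both 1, regardless of the number of sailors (as long as there are at least two). Here is the reasoning:

(1) Let $X_i$ be an indicator of sailor $i$ catching their own hat. So $X_i = 1$ when sailor $i$ catches their own hat, and $X_i = 0$ otherwise. Thus $X_i$ is Bernoulli with success probability $\frac{1}{n}$.

Then $S = X_1 + X_2 + \cdots + X_n$ counts the total number of hats caught by original owners.


(2) Note that $E[X_i] = \frac{1}{n}$.

Therefore:

$$E[S] = E[X_1] + \cdots + E[X_n] = n \cdot \frac{1}{n} = 1$$


(3) Similarly:

$$\text{Var}[S] = \sum_{i=1}^{n} \text{Var}[X_i] + 2 \sum_{i<j} \text{Cov}[X_i, X_j]$$

We need $\text{Var}[X_i]$ and $\text{Cov}[X_i, X_j]$.


(4) Use $\text{Var}[X_i] = E[X_i^2] - E[X_i]^2$. Observe that $X_i^2 = X_i$. Therefore:

$$\text{Var}[X_i] = \frac{1}{n} - \frac{1}{n^2} = \frac{n-1}{n^2}$$


(5) Now for covariance:

$$\text{Cov}[X_i, X_j] = E[X_i X_j] - E[X_i]\,E[X_j]$$

We need to compute $E[X_i X_j]$.

Notice that $X_i X_j = 1$ when $i$ and $j$ both catch their own hats, and 0 otherwise.

We have (once sailor $i$ holds their own hat, sailor $j$ catches one of the remaining $n-1$ hats):

$$E[X_i X_j] = P(X_i = 1)\,P(X_j = 1 \mid X_i = 1) = \frac{1}{n} \cdot \frac{1}{n-1}$$

Therefore:

$$\text{Cov}[X_i, X_j] = \frac{1}{n(n-1)} - \frac{1}{n^2} = \frac{1}{n^2(n-1)}$$


(6) Putting everything together back in (3):

$$\text{Var}[S] = n \cdot \frac{n-1}{n^2} + 2\binom{n}{2} \cdot \frac{1}{n^2(n-1)} = \frac{n-1}{n} + \frac{1}{n} = 1$$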
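For small crews, the hat answer can be confirmed by listing every possible assignment of hats. This sketch assumes, as the solution does, that all $n!$ assignments (permutations) are equally likely. It counts fixed points exactly. Brute force is only practical for small $n$, and `matches_mean_var` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def matches_mean_var(n):
    """Exact mean and variance of the number of sailors who catch their own
    hat, assuming all n! hat assignments are equally likely."""
    # perm[i] is the hat caught by sailor i; a match is a fixed point.
    counts = [sum(1 for i, hat in enumerate(perm) if i == hat)
              for perm in permutations(range(n))]
    total = factorial(n)
    mean = Fraction(sum(counts), total)
    second_moment = Fraction(sum(c * c for c in counts), total)
    return mean, second_moment - mean**2


for n in range(2, 8):
    print(n, *matches_mean_var(n))   # prints n followed by 1 1
```

With a single sailor the variance is 0 instead, which is why the claim needs at least two sailors.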

Months with a birthday

Suppose study groups of 10 are formed from a large population.

For a typical study group, how many months out of the year contain a birthday of a member of the group? (Assume the 12 months have equal duration.)

Solution

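(1) Let $X_i$ be 1 if month $i$ contains a birthday, and 0 otherwise.

So we seek $E[X_1 + X_2 + \cdots + X_{12}]$. This equals $E[X_1] + E[X_2] + \cdots + E[X_{12}]$.

The answer will be $12\,E[X_1]$ because all terms are equal.


(2) For a given $i$:

$$E[X_i] = P(X_i = 1) = 1 - P(X_i = 0)$$

The complement event is that none of the 10 members has a birthday in month $i$:

$$P(X_i = 0) = \left(\frac{11}{12}\right)^{10}$$


(3) Therefore:

$$E[X_1 + \cdots + X_{12}] = 12\left(1 - \left(\frac{11}{12}\right)^{10}\right) \approx 6.97$$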
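The months calculation can be checked exactly for small groups by enumerating all $12^k$ equally likely birth-month assignments. The sketch assumes independent, uniformly distributed birth months, as the solution does. Brute force is only feasible for small $k$, so the check uses a group of 4 while the formula is evaluated for the group of 10. Function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def expected_months_formula(k, months=12):
    """12 * (1 - (11/12)^k): expected number of months containing a birthday."""
    return months * (1 - Fraction(months - 1, months) ** k)


def expected_months_brute_force(k, months=12):
    """Average number of distinct birth months over all months**k equally
    likely assignments of birth months to k people."""
    total = sum(len(set(bm)) for bm in product(range(months), repeat=k))
    return Fraction(total, months**k)


print(float(expected_months_formula(10)))     # about 6.97
print(expected_months_brute_force(4) == expected_months_formula(4))   # True
```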

Central Limit Theorem

03 Theory

Theory 1

Video by 3Blue1Brown:

IID variables

Random variables $X_1, X_2, \ldots$ are called independent, identically distributed (IID) when they are independent and have the same distribution.

IID variables: Same distribution, different values

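Independent variables are uncorrelated, so IID variables are not copies of one another: on a given outcome they can (and typically do) take different values.

We do have:

$$P(X_i \le x) = P(X_j \le x), \qquad E[X_i] = E[X_j], \qquad \text{Var}[X_i] = \text{Var}[X_j]$$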

Standardization

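Suppose $X$ is any random variable with mean $\mu_X$ and standard deviation $\sigma_X > 0$.

The standardization of $X$ is:

$$Z = \frac{X - \mu_X}{\sigma_X}$$

The variable $Z$ has $E[Z] = 0$ and $\text{Var}[Z] = 1$. We can reconstruct $X$ by:

$$X = \sigma_X Z + \mu_X$$


Suppose $X_1, X_2, \ldots$ is a collection of IID random variables.

Define:

$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$

where:

$$S_n = X_1 + X_2 + \cdots + X_n, \qquad \mu = E[X_i], \qquad \sigma^2 = \text{Var}[X_i]$$

So $Z_n$ is the standardization of $S_n$, since $E[S_n] = n\mu$ and $\text{Var}[S_n] = n\sigma^2$.

Let $Z$ be a standard normal random variable, $Z \sim \mathcal{N}(0, 1)$.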

Central Limit Theorem

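Suppose $S_n = X_1 + \cdots + X_n$ for IID variables $X_i$ with finite mean $\mu$ and finite variance $\sigma^2 > 0$, and $Z_n$ are the standardizations of $S_n$.

Then for any interval $[a, b]$:

$$\lim_{n \to \infty} P(a \le Z_n \le b) = P(a \le Z \le b) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\,dx$$

We say that $Z_n$ converges in distribution to the standard normal $Z$.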


The distribution of a very large sum of IID variables is determined merely by $\mu$ and $\sigma^2$ from the original IID variables, while the data of higher moments fades away.

The name “normal distribution” is used because it arises from a large sum of repetitions of any other kind of distribution (with finite variance). It is therefore ubiquitous in applications.
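To watch the CLT at work, this Python sketch computes the exact distribution of a sum of $n$ fair dice by repeated convolution. It then standardizes the sum to $Z_n$ and compares $P(-1 \le Z_n \le 1)$ with the standard normal value $P(-1 \le Z \le 1) \approx 0.6827$. The helper names are hypothetical, and $\Phi$ is evaluated with `math.erf`.

```python
import math


def sum_pmf(face_pmf, n):
    """Exact pmf of the sum of n IID copies of a variable with pmf face_pmf
    (a dict value -> probability), by repeated convolution."""
    dist = {0: 1.0}
    for _ in range(n):
        new = {}
        for s, ps in dist.items():
            for v, pv in face_pmf.items():
                new[s + v] = new.get(s + v, 0.0) + ps * pv
        dist = new
    return dist


def std_normal_cdf(z):
    """Phi(z) for a standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_standardized_in(face_pmf, n, a, b):
    """P(a <= Z_n <= b), where Z_n = (S_n - n*mu) / (sigma*sqrt(n))."""
    mu = sum(v * p for v, p in face_pmf.items())
    sigma = math.sqrt(sum((v - mu) ** 2 * p for v, p in face_pmf.items()))
    scale = sigma * math.sqrt(n)
    dist = sum_pmf(face_pmf, n)
    return sum(p for s, p in dist.items() if a <= (s - n * mu) / scale <= b)


die = {v: 1 / 6 for v in range(1, 7)}
target = std_normal_cdf(1) - std_normal_cdf(-1)   # P(-1 <= Z <= 1)
for n in (1, 10, 100):
    print(n, round(prob_standardized_in(die, n, -1, 1), 4), round(target, 4))
```

Exact convolution avoids simulation noise. The remaining gap from 0.6827 at moderate $n$ comes largely from the discreteness of the sum, which the continuity correction discussed later addresses.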

Misuse of the CLT

It is important to learn when the CLT is applicable and when it is not. Many people (even professionals) apply it wrongly.

For example, sometimes one hears the claim that if enough students take an exam, the distribution of scores will be approximately normal. This is totally wrong!

Intuition for the CLT

The CLT is about the distribution of simultaneity, or (in other words) about accumulated alignment between independent variables.

With a large $n$, deviations of the total sum are predominantly created by simultaneous deviations of a large portion of summands away from their means, in the same direction. They are not mainly created by individual summands deviating a large amount.

Simultaneity across a large number $n$ of independent items is described by… the bell curve.

Extra - Derivation of CLT

04 Illustration

Exercise - Test scores distribution

Explain what is wrong with the claim that test scores should be normally distributed when a large number of students take a test.

Can you imagine a scenario with a good argument that test scores would be normally distributed?

(Hint: think about the composition of a single test instead of the number of students taking the test.)

Exercise - Height follows a bell curve

The height of female American basketball players follows a bell curve. Why?

05 Theory

Theory 2

Normal approximations rely on the limit stated in the CLT to approximate probabilities for large sums of variables.

Normal approximation

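Let $S_n = X_1 + \cdots + X_n$ for IID variables $X_i$ with $E[X_i] = \mu$ and $\text{Var}[X_i] = \sigma^2$.

The normal approximation of $S_n$ is:

$$P(S_n \le x) \approx \Phi\left(\frac{x - n\mu}{\sigma\sqrt{n}}\right)$$

where $\Phi$ is the CDF of the standard normal $Z$.

For example, suppose $S_n \sim \text{Bin}(n, p)$, so $S_n = X_1 + \cdots + X_n$ with $X_i \sim \text{Ber}(p)$. We know $\mu = p$ and $\sigma^2 = p(1-p)$. Therefore:

$$P(S_n \le x) \approx \Phi\left(\frac{x - np}{\sqrt{np(1-p)}}\right)$$

A rule of thumb is that the normal approximation to the binomial is effective when $np(1-p) \ge 10$.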
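Here is a minimal sketch of the normal approximation in Python, with $\Phi$ built from `math.erf`. The binomial specialization and the rule-of-thumb check assume the threshold $np(1-p) \ge 10$. All function names are illustrative.

```python
import math


def std_normal_cdf(z):
    """Phi(z) for a standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_approx_cdf(x, n, mu, sigma):
    """P(S_n <= x) ~ Phi((x - n*mu) / (sigma*sqrt(n))) for a sum of n IID
    variables with mean mu and standard deviation sigma."""
    return std_normal_cdf((x - n * mu) / (sigma * math.sqrt(n)))


def binomial_normal_approx_cdf(x, n, p):
    """Binomial case: each X_i ~ Ber(p) has mu = p, sigma = sqrt(p(1-p))."""
    return normal_approx_cdf(x, n, p, math.sqrt(p * (1 - p)))


def rule_of_thumb_ok(n, p, threshold=10):
    """Assumed rule of thumb for the binomial case: n p (1 - p) >= 10."""
    return n * p * (1 - p) >= threshold


print(rule_of_thumb_ok(10_000, 0.5))                  # True
print(binomial_normal_approx_cdf(5100, 10_000, 0.5))  # roughly 0.977
```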

Efficient computation

This CDF is far easier to compute for large $n$ than the CDF of $S_n$ itself. The factorials in $\binom{n}{k}$ are hard even for a computer when $n$ is large, and the summation over $k$ adds another factor to the scaling cost.
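The cost difference can be seen in a sketch that computes the exact binomial CDF term by term next to the single-call normal approximation. The exact version works in log space via `math.lgamma`, so the factorials are never formed. The values $n = 10{,}000$, $p = 1/2$, $x = 5100$ are illustrative, and the sketch assumes $0 < p < 1$.

```python
import math


def binom_pmf(k, n, p):
    """Exact Bin(n, p) pmf, evaluated in log space so the huge factorials
    in C(n, k) never have to be formed explicitly."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def binom_cdf_exact(x, n, p):
    """P(S <= x): one pmf term per integer 0..min(floor(x), n)."""
    return sum(binom_pmf(k, n, p) for k in range(0, min(math.floor(x), n) + 1))


def binom_cdf_normal(x, n, p):
    """Normal approximation: a single erf evaluation, whatever n is."""
    z = (x - n * p) / math.sqrt(n * p * (1 - p))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n, p, x = 10_000, 0.5, 5_100
print(binom_cdf_exact(x, n, p), binom_cdf_normal(x, n, p))
```

The exact loop evaluates about $x$ pmf terms, while the approximation is one `erf` call regardless of $n$.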

06 Illustration

Example - Binomial estimation: 10,000 flips

Flip a fair coin 10,000 times. Write for the number of heads.

Estimate the probability that .

Solution

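(1) Check the rule of thumb: $n = 10{,}000$ and $p = \frac{1}{2}$, so $np(1-p) = 2500 \ge 10$ and the approximation is effective.


(2) Now, calculate needed quantities:

$$\mu = E[X_i] = \frac{1}{2}, \qquad \sigma = \sqrt{\tfrac{1}{2} \cdot \tfrac{1}{2}} = \frac{1}{2}, \qquad n\mu = 5000, \qquad \sigma\sqrt{n} = 50$$


(3) Set up CDF:

$$P(S \le x) \approx \Phi\left(\frac{x - 5000}{50}\right)$$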


(4) Compute desired probability:

Example - Summing 1000 dice

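One thousand dice are rolled.

Estimate the probability that the total sum of rolled numbers is more than 3,600.

Solution

(1) Let $X_i$ be the number rolled on the $i^{\text{th}}$ die.

Let $S = X_1 + X_2 + \cdots + X_{1000}$, so $S$ counts the total sum of rolled numbers.

We seek $P(S > 3600)$.


(2) Now, calculate needed quantities:

$$\mu = E[X_i] = 3.5, \qquad \sigma^2 = \text{Var}[X_i] = \frac{35}{12}, \qquad n\mu = 3500, \qquad \sigma\sqrt{n} = \sqrt{\frac{35000}{12}} \approx 54.0$$


(3) Set up CDF:

$$P(S \le x) \approx \Phi\left(\frac{x - 3500}{54.0}\right)$$


(4) Compute desired probability:

$$P(S > 3600) = 1 - P(S \le 3600) \approx 1 - \Phi\left(\frac{100}{54.0}\right) \approx 1 - \Phi(1.85) \approx 0.032$$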
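The arithmetic in steps (2) to (4) can be reproduced in a few lines of Python. $\Phi$ is computed from `math.erf`, and no continuity correction is applied, matching the solution above.

```python
import math


def std_normal_cdf(z):
    """Phi(z) for a standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n = 1000
faces = range(1, 7)
mu = sum(faces) / 6                              # E[X_i] = 3.5
var = sum((f - mu) ** 2 for f in faces) / 6      # Var[X_i] = 35/12
mean_s = n * mu                                  # E[S] = 3500
sd_s = math.sqrt(n * var)                        # about 54.0

# P(S > 3600) = 1 - P(S <= 3600), approximated with the normal CDF
prob = 1 - std_normal_cdf((3600 - mean_s) / sd_s)
print(round(prob, 4))
```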

Exercise - Estimating $S_{1000}$

The probability that a random poker hand contains exactly one pair is about 0.42.

Estimate the probability that at least 450 out of 1000 poker hands will contain one pair.

Exercise - Nutrition study

A nutrition review board will endorse a diet if it has any positive effect in at least 65% of those tested in a certain study with 100 participants.

Suppose the diet is bogus, but 50% of participants display some positive effect by pure chance.

What is the probability that it will be endorsed?

Answer

07 Theory

Theory 3

De Moivre-Laplace Continuity Correction Formula

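The normal approximation to a discrete (integer-valued) distribution, for integers $a$ and $b$ close together, should be improved by widening the range by 0.5 on either side:

$$P(a \le S_n \le b) \approx \Phi\left(\frac{b + 0.5 - n\mu}{\sigma\sqrt{n}}\right) - \Phi\left(\frac{a - 0.5 - n\mu}{\sigma\sqrt{n}}\right)$$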
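Here is a sketch of the continuity correction applied to $\text{Bin}(100, \tfrac12)$, compared against the exact probability $P(45 \le X \le 55)$. The test case is an illustrative choice, and `normal_approx_between` is a hypothetical helper. Its `continuity` flag switches the 0.5 widening on or off.

```python
import math


def std_normal_cdf(z):
    """Phi(z) for a standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_approx_between(a, b, mean, sd, continuity=True):
    """Approximate P(a <= S <= b) for an integer-valued S with the given mean
    and standard deviation; with continuity=True the range is widened by 0.5
    on each side before applying the normal CDF."""
    half = 0.5 if continuity else 0.0
    return std_normal_cdf((b + half - mean) / sd) - std_normal_cdf((a - half - mean) / sd)


def binom_between_exact(a, b, n, p):
    """Exact P(a <= X <= b) for X ~ Bin(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, b + 1))


n, p = 100, 0.5
mean, sd = n * p, math.sqrt(n * p * (1 - p))
exact = binom_between_exact(45, 55, n, p)
print(exact,
      normal_approx_between(45, 55, mean, sd),                    # corrected
      normal_approx_between(45, 55, mean, sd, continuity=False))  # uncorrected
```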

08 Illustration

Example - Continuity correction of absurd normal approximation

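Let $X$ denote the number of sixes rolled after $n$ rolls of a fair die. Estimate $P(X = k)$.

Solution

We have $X \sim \text{Bin}(n, \frac{1}{6})$, and $E[X] = \frac{n}{6}$ and $\text{Var}[X] = \frac{5n}{36}$.

The usual approximation, since $Z$ is continuous, gives an estimate of $P(k \le X \le k) \approx 0$, which is useless.

Now using the continuity correction:

$$P(X = k) = P(k \le X \le k) \approx \Phi\left(\frac{k + 0.5 - n/6}{\sqrt{5n/36}}\right) - \Phi\left(\frac{k - 0.5 - n/6}{\sqrt{5n/36}}\right)$$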

The exact solution is 0.0318, so this estimate is quite good: the error is 1.9%.
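The same comparison can be run for a single value $P(X = k)$. The values $n = 1000$ and $k = 170$ below are hypothetical, chosen only for illustration. The sketch computes the exact binomial probability in log space and the continuity-corrected normal estimate.

```python
import math


def std_normal_cdf(z):
    """Phi(z) for a standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sixes_point_prob(n, k):
    """Exact P(X = k) and its continuity-corrected normal estimate, for
    X ~ Bin(n, 1/6) counting sixes in n rolls of a fair die."""
    p = 1 / 6
    # Exact pmf in log space to avoid forming huge binomial coefficients.
    exact = math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                     + k * math.log(p) + (n - k) * math.log1p(-p))
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    # Without the correction, P(k <= X <= k) would be approximated by 0.
    estimate = std_normal_cdf((k + 0.5 - mean) / sd) - std_normal_cdf((k - 0.5 - mean) / sd)
    return exact, estimate


exact, estimate = sixes_point_prob(1000, 170)   # hypothetical n and k
print(exact, estimate, abs(estimate - exact) / exact)
```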
