Law of Large Numbers

Markov and Chebyshev

A tire shop has 500 customers per day on average.

(a) Estimate (give an upper bound on) the probability that more than 700 customers arrive today.

(b) Assume the variance in daily customers is 10. Repeat (a) with this information.

Solution

Write $X$ for the number of daily customers.

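(a) Using Markov’s inequality with $a = 700$ (valid because $X \ge 0$), we have: $P[X > 700] \le P[X \ge 700] \le \frac{E[X]}{700} = \frac{500}{700} \approx 0.714$.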

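(b) Using Chebyshev’s inequality with $a = 200$, we have: $P[X > 700] \le P[|X - 500| \ge 200] \le \frac{\operatorname{Var}(X)}{200^2} = \frac{10}{40000} = 0.00025$.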

The Chebyshev estimate is much smaller (0.00025 versus about 0.714), because it uses the extra information about the variance!
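The two bounds can be checked numerically. The sketch below is a minimal Python illustration; the helper names `markov_bound` and `chebyshev_bound` are hypothetical, and the code only evaluates the right-hand sides of the inequalities (Markov additionally requires $X \ge 0$).

```python
def markov_bound(mean, a):
    """Markov's inequality: P[X >= a] <= E[X] / a, valid for X >= 0 and a > 0."""
    return mean / a


def chebyshev_bound(variance, c):
    """Chebyshev's inequality: P[|X - E[X]| >= c] <= Var(X) / c^2, for c > 0."""
    return variance / c**2


mean, variance, threshold = 500, 10, 700

# (a) Only the mean is known, so use Markov with a = 700.
bound_a = markov_bound(mean, threshold)

# (b) The event {X > 700} is contained in {|X - 500| >= 200}, so use Chebyshev with c = 200.
bound_b = chebyshev_bound(variance, threshold - mean)

print(f"Markov bound:    {bound_a:.4f}")  # Markov bound:    0.7143
print(f"Chebyshev bound: {bound_b:.5f}")  # Chebyshev bound: 0.00025
```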

LLN: Average winnings

A roulette player bets as follows: he wins 100 dollars with probability 0.48 and loses 100 dollars with probability 0.52. The expected winnings after a single round are therefore $100 \cdot 0.48 - 100 \cdot 0.52 = -4$ dollars.

By the LLN, if the player plays repeatedly for a long time, he expects to lose 4 dollars per round on average.

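The ‘expects’ in the last sentence means: the distribution of the cumulative average winnings $\overline{W}_n = \frac{1}{n}(W_1 + \cdots + W_n)$, where $W_i$ is the winnings in round $i$, approaches the point-mass PMF $P(w) = 1$ for $w = -4$ and $P(w) = 0$ otherwise. Precisely: for every $\varepsilon > 0$, $P[|\overline{W}_n + 4| \ge \varepsilon] \to 0$ as $n \to \infty$.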

This contrasts with the ‘expects’ of expected value: the probability of achieving the expected value (or something near it) may be low or even zero! For example, a single round of this game always pays either +100 or -100 dollars, never -4.
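A quick simulation makes the two senses of ‘expects’ concrete. This is a minimal sketch; the function name `average_winnings`, the seed, and the round counts are arbitrary choices for illustration. A single round only ever returns +100 or -100, while averages over many rounds settle near -4.

```python
import random


def average_winnings(n_rounds, seed=0):
    """Simulate n_rounds of the bet and return the average winnings per round (dollars)."""
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    total = 0
    for _ in range(n_rounds):
        # Win 100 dollars with probability 0.48, otherwise lose 100 dollars.
        total += 100 if rng.random() < 0.48 else -100
    return total / n_rounds


# A single round is always +100 or -100; long-run averages drift toward -4.
for n in (1, 100, 10_000, 200_000):
    print(n, average_winnings(n))
```

With 200,000 rounds the standard deviation of the average is roughly $100/\sqrt{200000} \approx 0.22$ dollars, so the printed average should land close to -4.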

Enough samples

Suppose are IID samples of .

(a) Compute and and .

(b) Use the finite LLN to find such that:

(c) How many samples are needed to guarantee that:
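Parts (b) and (c) both rest on a Chebyshev-type bound for the sample mean. The sketch below assumes the "finite LLN" is the statement $P[|M_n - \mu| \ge \varepsilon] \le \frac{\sigma^2}{n\varepsilon^2}$ for the mean $M_n$ of $n$ IID samples with variance $\sigma^2$. The helper names and the numbers in the usage line are hypothetical, not the values of this exercise; `Fraction` keeps the arithmetic exact so the rounding up in `samples_needed` is reliable.

```python
import math
from fractions import Fraction


def lln_bound(variance, n, eps):
    """Chebyshev bound on P[|M_n - mu| >= eps] for the mean M_n of n IID samples."""
    return Fraction(variance) / (n * Fraction(eps) ** 2)


def samples_needed(variance, eps, delta):
    """Smallest n with variance / (n * eps^2) <= delta, so P[|M_n - mu| >= eps] <= delta."""
    return math.ceil(Fraction(variance) / (Fraction(eps) ** 2 * Fraction(delta)))


# Hypothetical numbers (not from the exercise): variance 1, eps = 0.1, delta = 0.05.
# Decimal strings are passed so Fraction reads them exactly (1/10 and 1/20).
n = samples_needed(1, "0.1", "0.05")
print(n)  # 2000
```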

Statistical testing

One-tail test: Weighted die

Your friend gives you a single ordinary die and says she is worried that it has been weighted to prefer the outcome 2. She wants you to test it.

Design a significance test, based on the data from 20 rolls of the die, to determine whether the die is weighted. Use significance level $\alpha = 0.05$.

Solution

Let $X$ count the number of 2s that come up in the 20 rolls.

The Claim: “the die is weighted to prefer 2”
The null hypothesis $H_0$: “the die is normal”

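Assuming $H_0$ is true, then $X \sim \text{Bin}(20, 1/6)$, and therefore: $P[X = k \mid H_0] = \binom{20}{k}\left(\tfrac{1}{6}\right)^k\left(\tfrac{5}{6}\right)^{20-k}$ for $k = 0, 1, \ldots, 20$.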


⚠️ Notice that “prefer 2” implies the claim is for more 2s than normal.

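Therefore: choose a one-tail rejection set $\{X \ge r\}$.

We need $r$ such that $P[X \ge r \mid H_0] \le 0.05$.

  • Equivalently: $P[X \le r - 1 \mid H_0] \ge 0.95$.

Solve for $r$ by computing the conditional CDF values $P[X \le k \mid H_0]$: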

k                 0      1      2      3      4      5      6      7
P[X ≤ k | H0]   0.026  0.130  0.329  0.567  0.769  0.898  0.963  0.989

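Therefore, choose $r - 1 = 6$, i.e. $r = 7$. Then $P[X \ge 7 \mid H_0] = 1 - 0.963 = 0.037 \le 0.05$, and no smaller integer $r$ has Type I error below 0.05 (for $r = 6$ it is $1 - 0.898 = 0.102$).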

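The final answer is: reject $H_0$ (and conclude the die is weighted toward 2) if at least 7 of the 20 rolls come up 2, i.e. the rejection set is $\{X \ge 7\}$.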
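The table and the choice of threshold can be reproduced directly from the binomial PMF. This is a minimal sketch using only the standard library; the function names are hypothetical, and the search simply returns the smallest $r$ whose Type I error $P[X \ge r \mid H_0]$ is at most $\alpha = 0.05$.

```python
from math import comb

N_ROLLS, P_TWO, ALPHA = 20, 1 / 6, 0.05


def pmf(k):
    """P[X = k | H0] for X ~ Bin(20, 1/6)."""
    return comb(N_ROLLS, k) * P_TWO**k * (1 - P_TWO) ** (N_ROLLS - k)


def cdf(k):
    """P[X <= k | H0]; an empty sum (k = -1) gives 0."""
    return sum(pmf(j) for j in range(k + 1))


def type_one_error(r):
    """P[X >= r | H0]: chance of wrongly rejecting a fair die with the rule X >= r."""
    return 1 - cdf(r - 1)


# Smallest threshold r whose Type I error is at most ALPHA.
r = next(r for r in range(N_ROLLS + 1) if type_one_error(r) <= ALPHA)

for k in range(8):
    print(k, round(cdf(k), 3))
print("r =", r, "Type I error =", round(type_one_error(r), 3))  # r = 7 Type I error = 0.037
```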

Two-tail test: Circuit voltage

A boosted AC circuit is supposed to maintain an average voltage of with a standard deviation of . Nothing else is known about the voltage distribution.

Design a two-tail test incorporating the data of 40 independent measurements to determine if the expected value of the voltage is truly . Use .

Solution

Use the sample mean of the 40 voltage measurements as the decision statistic.

The Claim to test: The null hypothesis :

Rejection region:

where is chosen so that


Assuming , we expect that:

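Recall Chebyshev’s inequality: for a random variable $Y$ with mean $\mu_Y$ and variance $\sigma_Y^2$, $P[|Y - \mu_Y| \ge c] \le \frac{\sigma_Y^2}{c^2}$. For the sample mean of $n$ IID measurements, $\sigma_Y^2$ is the variance of a single measurement divided by $n$.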


Now solve:

Therefore the rejection region should be:
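The Chebyshev step can be sketched in general form: setting $\frac{\sigma^2}{n c^2} = \alpha$ gives the half-width $c = \frac{\sigma}{\sqrt{n\alpha}}$, and rejecting when the sample mean is at least $c$ from the claimed mean keeps the Type I error at most $\alpha$ whatever the voltage distribution is. The function names are hypothetical, and the numbers in the usage line are placeholders, not the values of this problem.

```python
import math


def chebyshev_two_tail_threshold(sigma, n, alpha):
    """Half-width c with sigma^2 / (n * c^2) = alpha.

    By Chebyshev, P[|M_n - mu0| >= c | H0] <= alpha for the sample mean M_n of
    n IID measurements with standard deviation sigma, for any distribution.
    """
    return sigma / math.sqrt(n * alpha)


def reject(sample_mean, mu0, c):
    """Two-tail rule: reject H0 when the sample mean is at least c away from mu0."""
    return abs(sample_mean - mu0) >= c


# Placeholder values for illustration only: sigma = 2, n = 40, alpha = 0.1.
c = chebyshev_two_tail_threshold(2, 40, 0.1)
print(c)
```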

One-tail test with a Gaussian: Weight loss drug

Assume that in the background population in a specific demographic, the distribution of a person’s weight satisfies . Suppose that a pharmaceutical company has developed a weight-loss drug and plans to test it on a group of 64 individuals.

Design a test at the significance level to determine whether the drug is effective.

Solution

Since the drug is tested on 64 individuals, we use the sample mean as the decision statistic.

The Claim: “the drug is effective in reducing weight”
The null hypothesis $H_0$: “no effect: weights on the drug still follow the background distribution”

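Assuming $H_0$ is true, the sample mean of the 64 weights is Gaussian, with the same mean as the background distribution and variance equal to the background variance divided by 64.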

⚠️ One-tail test because the drug is expected to reduce weight (unidirectional).

Rejection region:


Compute that .

Since , we know that .


Furthermore:

Then:


Solve:

Therefore, the rejection region:
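The standardization steps above amount to finding the $\alpha$-quantile of the sample mean's Gaussian distribution, i.e. $r = \mu - z_{1-\alpha}\,\frac{\sigma}{\sqrt{n}}$, and rejecting $H_0$ when the sample mean is at most $r$. This is a minimal sketch using Python's `statistics.NormalDist`; the function name is hypothetical, and the mean, standard deviation, and $\alpha$ in the usage line are placeholders, not the values of this problem.

```python
from math import sqrt
from statistics import NormalDist


def gaussian_left_tail_threshold(mu, sigma, n, alpha):
    """Threshold r with P[M_n <= r] = alpha when M_n ~ N(mu, sigma^2 / n).

    Rejecting H0 when the sample mean is <= r gives Type I error exactly alpha.
    """
    sample_mean_dist = NormalDist(mu, sigma / sqrt(n))
    return sample_mean_dist.inv_cdf(alpha)


# Placeholder values for illustration only: mu = 180, sigma = 24, n = 64, alpha = 0.05.
r = gaussian_left_tail_threshold(180, 24, 64, 0.05)
print(round(r, 2))  # 175.07
```

Because the drug is expected to lower weight, only the left tail is used; a two-tail version would split $\alpha$ between both tails.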