Deviation and Large Numbers

01 Theory - Sample mean

Sample mean and its variance

The sample mean of a set $X_1, X_2, \ldots$ of IID random variables is an RV that averages the first $n$ instances:

$$M_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$

Statistics of the sample mean (for any $n$), where $\mu = E[X_i]$ and $\sigma^2 = \mathrm{Var}(X_i)$:

$$E[M_n] = \mu \qquad \mathrm{Var}(M_n) = \frac{\sigma^2}{n}$$

The sample mean is typically applied to repeated trials of an experiment. The trials are independent, and the probability distribution of outcomes should be identical from trial to trial.

Notice that the variance of the sample mean tends to 0 as $n \to \infty$. As more trials are repeated and the average of all results is taken, the fluctuations of this average shrink toward zero.

As $n \to \infty$, the distribution of $M_n$ will converge to a PMF with all the probability concentrated at $\mu$ and none elsewhere.
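A quick simulation illustrates the shrinking variance. This is a minimal sketch assuming Uniform(0, 1) trials (an illustrative choice, with $\mu = 1/2$ and $\sigma^2 = 1/12$); the sample sizes and repetition counts are likewise illustrative:

```python
import random

# Sketch: estimate Var(M_n) for the sample mean of IID Uniform(0, 1) draws
# (mu = 0.5, sigma^2 = 1/12).  All constants here are illustrative choices.
random.seed(0)

def sample_mean(n):
    """One realization of M_n = (X_1 + ... + X_n) / n."""
    return sum(random.random() for _ in range(n)) / n

for n in [10, 100, 1000]:
    means = [sample_mean(n) for _ in range(2000)]
    avg = sum(means) / len(means)
    var = sum((m - avg) ** 2 for m in means) / len(means)
    # The empirical variance should track (1/12)/n, shrinking toward 0.
    print(n, round(avg, 3), round(var, 5))
```

The printed variances fall roughly by a factor of 10 at each step, matching $\sigma^2/n$.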

02 Theory - Tail estimation

Every distribution must trail off to zero for large enough $|x|$. The regions where $f_X(x)$ (or $p_X(x)$) trails off to zero, at large magnitudes of $x$, are informally called ‘tails’.

Tail probabilities

A tail probability is a probability with one of these forms:

$$P(X \ge a) \qquad \text{or} \qquad P(X \le a)$$

Markov’s inequality

Assume that $X \ge 0$. Take any $a > 0$.

Then Markov’s inequality states:

$$P(X \ge a) \le \frac{E[X]}{a}$$

Chebyshev’s inequality

Take any RV $X$ with mean $\mu = E[X]$ and variance $\sigma^2 = \mathrm{Var}(X)$, and take any $a > 0$.

Then Chebyshev’s inequality states:

$$P(|X - \mu| \ge a) \le \frac{\sigma^2}{a^2}$$

Markov vs. Chebyshev

Chebyshev’s inequality works for any $X$, not just $X \ge 0$, and it usually gives a better estimate than Markov’s inequality.

The main value of Markov’s inequality is that it only requires knowledge of $E[X]$.

Think of Chebyshev’s inequality as a tightening of Markov’s inequality using the additional data of $\mathrm{Var}(X)$.
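To see the comparison concretely, here is a sketch that pits both bounds against an exact tail probability, assuming an Exponential(1) RV ($\mu = 1$, $\sigma^2 = 1$, $P(X \ge a) = e^{-a}$); the distribution is an illustrative choice:

```python
import math

# Sketch: compare the exact tail P(X >= a) of an Exponential(1) RV
# (mu = 1, sigma^2 = 1) with the Markov and Chebyshev bounds.
mu, var = 1.0, 1.0
for a in [2, 4, 8]:
    exact = math.exp(-a)              # P(X >= a) for Exponential(1)
    markov = mu / a                   # P(X >= a) <= E[X] / a
    chebyshev = var / (a - mu) ** 2   # P(X >= a) <= P(|X - mu| >= a - mu)
    print(a, round(exact, 5), round(markov, 4), round(chebyshev, 4))
```

Note that for $a$ close to $\mu$ the Chebyshev bound can be worse than Markov’s (even trivial, $\ge 1$); its advantage appears farther out in the tail, where the $1/a^2$ decay beats $1/a$.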

Derivation of Markov’s inequality - Continuous RV

Under the hypotheses that $X \ge 0$ and $a > 0$, we have:

$$P(X \ge a) = \int_a^\infty f_X(x)\, dx$$

On the range $x \ge a$ we may convert $f_X(x)$ to $\frac{x}{a} f_X(x)$, making the integrand bigger:

$$P(X \ge a) \le \int_a^\infty \frac{x}{a}\, f_X(x)\, dx$$

Simplify:

$$\int_a^\infty \frac{x}{a}\, f_X(x)\, dx = \frac{1}{a} \int_a^\infty x\, f_X(x)\, dx$$

Also:

$$\int_a^\infty x\, f_X(x)\, dx \le \int_0^\infty x\, f_X(x)\, dx = E[X]$$

Therefore:

$$P(X \ge a) \le \frac{E[X]}{a}$$

Extra - Derivation of Chebyshev’s inequality

Notice that the variable $(X - \mu)^2$ is always nonnegative. Chebyshev’s inequality is a simple application of Markov’s inequality to this variable.

Specifically, using $a^2$ as the Markov constant, Markov’s inequality yields:

$$P\big((X - \mu)^2 \ge a^2\big) \le \frac{E[(X - \mu)^2]}{a^2}$$

Then, by monotonicity of square roots:

$$(X - \mu)^2 \ge a^2 \quad \Longleftrightarrow \quad |X - \mu| \ge a$$

And of course $E[(X - \mu)^2] = \sigma^2$. Chebyshev’s inequality follows.

03 Illustration

Markov’s inequality derivation - Discrete RV

Derive Markov’s inequality for a discrete RV.
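One way the derivation can go, mirroring the continuous case with the integral replaced by a sum (a sketch, not the only route; assume $X \ge 0$ with PMF $p_X$ and take $a > 0$):

```latex
\begin{align*}
P(X \ge a) &= \sum_{x \ge a} p_X(x) \\
  &\le \sum_{x \ge a} \frac{x}{a}\, p_X(x)
     && \text{since } \tfrac{x}{a} \ge 1 \text{ on the range } x \ge a \\
  &\le \frac{1}{a} \sum_{x} x\, p_X(x)
     && \text{the added terms are nonnegative} \\
  &= \frac{E[X]}{a}.
\end{align*}
```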

Example - Markov and Chebyshev

04 Theory - Law of Large Numbers

Let $X_1, X_2, \ldots$ be a collection of IID random variables with $E[X_i] = \mu$ for every $i$, and $\mathrm{Var}(X_i) = \sigma^2$ for every $i$.

Recall the sample mean:

$$M_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$

Recall that $E[M_n] = \mu$ and $\mathrm{Var}(M_n) = \frac{\sigma^2}{n}$.

Law of Large Numbers (weak form)

For any $\varepsilon > 0$, by Chebyshev’s inequality we have:

$$P(|M_n - \mu| \ge \varepsilon) \le \frac{\sigma^2}{n \varepsilon^2}$$

Therefore:

$$\lim_{n \to \infty} P(|M_n - \mu| \ge \varepsilon) = 0$$

And the complement:

$$\lim_{n \to \infty} P(|M_n - \mu| < \varepsilon) = 1$$
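The statement can be checked empirically. This sketch assumes fair-coin (Bernoulli(1/2)) trials, with illustrative $\varepsilon$ and trial counts:

```python
import random

# Sketch: empirical check of the weak LLN for IID Bernoulli(0.5) trials
# (mu = 0.5, sigma^2 = 0.25).  epsilon and the counts are illustrative.
random.seed(0)
eps = 0.05

def deviation_prob(n, trials=2000):
    """Estimate P(|M_n - mu| >= eps) over repeated experiments."""
    bad = 0
    for _ in range(trials):
        m = sum(random.random() < 0.5 for _ in range(n)) / n
        if abs(m - 0.5) >= eps:
            bad += 1
    return bad / trials

for n in [20, 200, 2000]:
    bound = 0.25 / (n * eps ** 2)     # Chebyshev: sigma^2 / (n eps^2)
    print(n, deviation_prob(n), round(bound, 3))
```

The empirical deviation probability falls toward 0 as $n$ grows, always staying under the Chebyshev bound (which is vacuous for small $n$).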

05 Illustration

Example - LLN: Average winnings

Exercise - Enough samples
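
A computation of the kind this exercise title suggests: use the Chebyshev bound $\frac{\sigma^2}{n \varepsilon^2} \le \delta$ and solve for $n$. The numbers below are illustrative assumptions, not taken from the exercise:

```python
import math

# Sketch: how many samples n guarantee P(|M_n - mu| >= eps) <= delta?
# Chebyshev gives sigma^2 / (n eps^2) <= delta, i.e. n >= sigma^2 / (eps^2 delta).
# sigma2, eps, delta below are illustrative values.
def enough_samples(sigma2, eps, delta):
    return math.ceil(sigma2 / (eps ** 2 * delta))

print(enough_samples(1.0, 0.1, 0.05))   # sigma^2 = 1, eps = 0.1, delta = 0.05
```

Because Chebyshev is only an upper bound, this $n$ is sufficient but often far from necessary.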

Statistical testing

06 Theory - Significance testing

Significance test

Ingredients of a significance test (unary hypothesis test):

  • Null hypothesis: an event $H_0$
    • Identify a Claim
    • Then: $H_0$ is the background assumption (supposing the Claim isn’t known)
    • Goal is to invalidate $H_0$ in favor of the Claim
  • Rejection Region (decision rule): an event $R$
    • $R$ is unlikely assuming $H_0$
    • Directionality: $R$ is more likely if the Claim holds
    • Write $R$ in terms of a decision statistic $Y$ and a significance level $\alpha$
  • Ability to compute $P(R \mid H_0)$
    • Usually: inferred from $p_{Y \mid H_0}$ or $f_{Y \mid H_0}$
    • Adjust $R$ to achieve $P(R \mid H_0) \le \alpha$

Significance level

Suppose we are given a null hypothesis $H_0$ and a rejection region $R$.

The significance level of $R$ is:

$$\alpha = P(R \mid H_0)$$

Sometimes the conditioning on $H_0$ is dropped and we write $\alpha = P(R)$, e.g. when a background model without the assumption of $H_0$ is not known.

Null hypothesis implies a distribution

Frequently $H_0$ will not take the form of an event in a sample space, $H_0 \subseteq \Omega$.

Usually $\Omega$ is unspecified, yet $H_0$ determines a known distribution.

At a minimum, the assumption of $H_0$ must determine the numbers $P(R \mid H_0)$ for the candidate rejection regions $R$.

More generally, we do not need these details:

  • Background sample space: $\Omega$
  • Non-conditional distribution (full model): $p_Y$ or $f_Y$
  • Complement conditionals: $p_{Y \mid H_0^c}$ or $f_{Y \mid H_0^c}$

In basic statistical inference theory, there are two kinds of error.

  • Type I error concludes with rejecting $H_0$ when $H_0$ is true.
  • Type II error concludes with maintaining $H_0$ when $H_0$ is false.

Type I error is usually the bigger problem. Think “innocent until proven guilty”: we should not reject $H_0$ without strong evidence.

                             | $H_0$ is true            | $H_0$ is false
  Maintain null hypothesis   | Made right call          | Wrong acceptance (Type II)
  Reject null hypothesis     | Wrong rejection (Type I) | Made right call

To design a significance test at significance level $\alpha$, we must identify $H_0$, and specify $R$ with the property that $P(R \mid H_0) \le \alpha$.

When $R$ is written using a decision statistic $Y$, we must choose between:

  • One-tail rejection region: $R = \{Y \ge c\}$ with $P(Y \ge c \mid H_0) \le \alpha$, or $R = \{Y \le c\}$ with $P(Y \le c \mid H_0) \le \alpha$
  • Two-tail rejection region: $R = \{|Y - \mu_0| \ge c\}$ with $P(|Y - \mu_0| \ge c \mid H_0) \le \alpha$, where $\mu_0 = E[Y \mid H_0]$
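A numerical sketch of choosing a one-tail region: assume the null hypothesis that a die is fair and let $Y$ be the number of sixes in 60 rolls, so $Y \sim \mathrm{Binomial}(60, 1/6)$ under $H_0$. The roll count and $\alpha$ are illustrative choices:

```python
from math import comb

# Sketch: design a one-tail rejection region R = {Y >= c} for a die
# suspected of favoring sixes.  Under H0 (fair die), Y = number of sixes
# in 60 rolls is Binomial(60, 1/6).  All numbers are illustrative.
n, p, alpha = 60, 1 / 6, 0.05

def tail(c):
    """P(Y >= c | H0) for Y ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c, n + 1))

# Adjust c upward until the significance condition P(Y >= c | H0) <= alpha holds.
c = next(c for c in range(n + 1) if tail(c) <= alpha)
print("reject H0 when Y >=", c, "; significance:", round(tail(c), 4))
```

Taking the smallest such $c$ gives the largest rejection region that still meets the significance level.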

07 Illustration

Example - One-tail test: Weighted die

Example - Two-tail test: Circuit voltage

Example - One-tail test with a Gaussian: Weight loss drug