Deviations

01 Theory - Tail estimation

Theory 2 - Tail estimation

Every distribution must trail off to zero for large enough $|x|$. The regions of large $|x|$, where the probability trails off to zero, are informally called ‘tails’.

Tail probabilities

A tail probability is a probability with one of these forms, for any c>0:

$$P[X \ge c], \qquad P[X \le -c], \qquad P[|X - \mu_X| \ge c]$$

Markov’s inequality

Assume that $X \ge 0$. Take any $c > 0$.

Then Markov’s inequality states:

$$P[X \ge c] \le \frac{\mu_X}{c}$$
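Markov’s inequality is easy to check numerically. Below is a quick Python sketch (not part of the notes); the choice $X \sim \mathrm{Exp}(1)$, with $\mu_X = 1$, is purely illustrative:

```python
import random

random.seed(0)

# Empirically check Markov's inequality  P[X >= c] <= mu_X / c
# for X ~ Exponential(1), so mu_X = 1.  (Illustrative choice.)
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]

c = 3.0
tail_prob = sum(x >= c for x in samples) / n
markov_bound = 1.0 / c  # mu_X / c

print(f"P[X >= {c}] ~= {tail_prob:.4f}")  # true tail is e^-3 ~= 0.0498
print(f"Markov bound: {markov_bound:.4f}")
```

The bound holds but is loose: the true tail $e^{-3} \approx 0.05$ sits well below the Markov estimate $1/3$.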

Chebyshev’s inequality

Take any $X$, and any $c > 0$.

Then Chebyshev’s inequality states:

$$P[|X - \mu_X| \ge c] \le \frac{\sigma_X^2}{c^2}$$

Markov vs. Chebyshev

Chebyshev’s inequality works for any $X$, and it usually gives a better estimate than Markov’s inequality.

The main value of Markov’s inequality is that it requires knowledge only of $\mu_X$.

Think of Chebyshev’s inequality as a tightening of Markov’s inequality using the additional data of $\sigma_X$.
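To make the comparison concrete, here is a small Python sketch (illustrative, not from the notes) bounding the same tail of $X \sim \mathrm{Exp}(1)$ both ways, using $\mu_X = 1$ and $\sigma_X^2 = 1$:

```python
import math

# Bound P[X >= 4] for X ~ Exponential(1): mu_X = 1, sigma_X^2 = 1.
mu, var = 1.0, 1.0
threshold = 4.0

markov = mu / threshold            # Markov: P[X >= 4] <= 1/4

# The event X >= 4 implies |X - mu| >= 3, so Chebyshev also applies:
c = threshold - mu
chebyshev = var / c**2             # Chebyshev: P[|X - 1| >= 3] <= 1/9

exact = math.exp(-threshold)       # true tail of Exp(1)

print(f"Markov bound   : {markov:.4f}")     # 0.2500
print(f"Chebyshev bound: {chebyshev:.4f}")  # 0.1111
print(f"Exact tail     : {exact:.4f}")      # 0.0183
```

Chebyshev’s extra input, the variance, buys a noticeably tighter (though still conservative) bound.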

Derivation of Markov’s inequality - Continuous RV

Under the hypothesis that $X \ge 0$ and $c > 0$, we have:

$$\mu_X = E[X] = \int_0^\infty x\,f_X(x)\,dx = \int_0^c x\,f_X(x)\,dx + \int_c^\infty x\,f_X(x)\,dx$$

On the range $c \le x < \infty$ we may replace $x$ with $c$, making the integrand smaller:

$$\int_c^\infty x\,f_X(x)\,dx \ge \int_c^\infty c\,f_X(x)\,dx$$

Simplify:

$$\int_c^\infty c\,f_X(x)\,dx = c\int_c^\infty f_X(x)\,dx = c\,P[X \ge c]$$

Also:

$$\int_0^c x\,f_X(x)\,dx \ge 0$$

Therefore:

$$\int_0^\infty x\,f_X(x)\,dx \ge c\,P[X \ge c] \quad\Longrightarrow\quad E[X] \ge c\,P[X \ge c]$$

Dividing both sides by $c$ gives Markov’s inequality.

Extra - Derivation of Chebyshev’s inequality

Notice that the variable $(X - \mu_X)^2$ is always nonnegative. Chebyshev’s inequality is a simple application of Markov’s inequality to this variable.

Specifically, using c2 as the Markov constant, Markov’s inequality yields:

$$P[(X - \mu_X)^2 \ge c^2] \le \frac{E[(X - \mu_X)^2]}{c^2}$$

Then, by monotonicity of square roots:

$$(X - \mu_X)^2 \ge c^2 \iff |X - \mu_X| \ge c$$

And of course $E[(X - \mu_X)^2] = \sigma_X^2$. Chebyshev’s inequality follows.


02 Illustration

Example - Markov and Chebyshev

Markov and Chebyshev

A tire shop has 500 customers per day on average.

(a) Estimate the probability that more than 700 customers arrive today.

(b) Assume the variance in daily customers is 100. Repeat (a) with this information.

Solution

Write X for the number of daily customers.

(a) Using Markov’s inequality with $c = 700$, we have:

$$P[X \ge 700] \le \frac{500}{700} \approx 0.71$$

(b) Using Chebyshev’s inequality with $c = 200$, we have:

$$P[|X - 500| \ge 200] \le \frac{100}{200^2} = 0.0025$$

The Chebyshev estimate is much smaller! (Note that the event $X \ge 700$ implies $|X - 500| \ge 200$, so the Chebyshev bound also applies to the one-sided tail of part (a).)
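The two bounds from the example can be recomputed in a couple of lines of Python (a sketch mirroring the arithmetic above):

```python
# Tire shop: mean 500 customers/day, variance 100.
mean_customers = 500
markov_bound = mean_customers / 700    # Markov with c = 700

variance = 100
chebyshev_bound = variance / 200**2    # Chebyshev with c = 200

print(round(markov_bound, 3))   # 0.714
print(chebyshev_bound)          # 0.0025
```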


03 Theory - Sample mean

Theory 1 - Sample mean

Sample mean and its variance

The sample mean of a set $X_1, X_2, \ldots, X_n$ of IID random variables is an RV giving the average value:

$$M_n(X) = \frac{1}{n}(X_1 + \cdots + X_n)$$

Statistics of the sample mean:

$$E[M_n(X)] = E[X_i], \qquad \mathrm{Var}[M_n(X)] = \frac{\mathrm{Var}[X_i]}{n} \qquad \text{(for any } i\text{)}$$

The sample mean is typically applied to repeated trials of an experiment. The trials are independent, and the probability distribution of outcomes should be identical from trial to trial.

Notice that the variance of the sample mean tends to $0$ as $n \to \infty$. As more trials are repeated and the average of all results is taken, the fluctuations of this average shrink toward zero.

As $n \to \infty$ the distribution of $M_n(X)$ converges to a PMF with all the probability concentrated at $E[X_i]$ and none elsewhere.
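The $1/n$ decay of the variance is easy to see in simulation. The Python sketch below (an illustration, not from the notes) draws sample means of Uniform(0,1) variables, for which $\mathrm{Var}[X_i] = 1/12$:

```python
import random
import statistics

random.seed(1)

# Estimate Var[M_n(X)] for X_i ~ Uniform(0,1), where Var[X_i] = 1/12,
# and compare with the theoretical value Var[X_i] / n.
def sample_mean(n):
    return sum(random.random() for _ in range(n)) / n

trials = 2000
results = {}
for n in (1, 10, 100):
    means = [sample_mean(n) for _ in range(trials)]
    results[n] = statistics.pvariance(means)
    print(f"n={n:>3}: Var[M_n] ~= {results[n]:.5f}  (theory {1/12/n:.5f})")
```

Each tenfold increase in $n$ cuts the empirical variance by roughly a factor of ten.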


Large Numbers

04 Theory - Law of Large Numbers

Theory 3 - Law of Large Numbers

Let $X_1, X_2, \ldots, X_n$ be a collection of IID random variables with $\mu_X = E[X_i]$ and $\sigma_X^2 = \mathrm{Var}[X_i]$ (the same for every $i$).

Recall the sample mean:

$$M_n(X) = \frac{1}{n}(X_1 + \cdots + X_n), \qquad E[M_n(X)] = \mu_X, \qquad \mathrm{Var}[M_n(X)] = \frac{\sigma_X^2}{n}$$

Law of Large Numbers (weak form)

For any $c > 0$, by Chebyshev’s inequality we have:

$$P[|M_n(X) - \mu_X| \ge c] \le \frac{\sigma_X^2}{nc^2} \qquad \text{(finite LLN)}$$

Therefore:

$$\lim_{n \to \infty} P[|M_n(X) - \mu_X| < c] = 1 \qquad \text{(infinite LLN)}$$
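A simulation makes the weak LLN tangible. This Python sketch (illustrative, not from the notes) estimates $P[|M_n(X) - \mu_X| \ge c]$ for fair coin flips ($\mu_X = 0.5$, $\sigma_X^2 = 0.25$) and compares it with the finite-LLN bound:

```python
import random

random.seed(2)

# Estimate P[|M_n - 0.5| >= c] for Bernoulli(0.5) samples and
# compare with the finite-LLN bound sigma^2 / (n c^2) = 0.25 / (n c^2).
c = 0.05
trials = 1000
emp = {}
for n in (10, 100, 1000):
    bad = sum(
        abs(sum(random.random() < 0.5 for _ in range(n)) / n - 0.5) >= c
        for _ in range(trials)
    )
    emp[n] = bad / trials
    bound = min(0.25 / (n * c**2), 1.0)
    print(f"n={n:>4}: empirical {emp[n]:.3f}, finite-LLN bound {bound:.3f}")
```

Both the empirical probability and the bound shrink as $n$ grows, as the infinite LLN predicts.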

05 Illustration

Example - LLN: Average winnings

LLN: Average winnings

A roulette player bets as follows: he wins \$100 with probability 0.48 and loses \$100 with probability 0.52. The expected winnings after a single round is therefore $100 \cdot 0.48 - 100 \cdot 0.52 = -4$, i.e. an expected loss of \$4.

By the LLN, if the player plays repeatedly for a long time, he expects to lose $4 per round on average.

The ‘expects’ in the last sentence means: the PMF of the cumulative average winnings approaches this PMF:

$$P_{M_n(X)}(k) = \begin{cases} 1 & k = -4 \\ 0 & k \ne -4 \end{cases}$$

This contrasts with the ‘expects’ of expected value: the probability of achieving the expected value (or something near it) may be low or zero! For example, a single round of this game cannot result in a \$4 loss.
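A simulation of this game (a Python sketch, not part of the notes) shows the running average settling near the expected loss of \$4 per round:

```python
import random

random.seed(3)

# Simulate the bettor: +100 with probability 0.48, -100 otherwise.
# The average winnings per round should drift toward -4.
def average_winnings(rounds):
    total = sum(100 if random.random() < 0.48 else -100
                for _ in range(rounds))
    return total / rounds

for rounds in (100, 10_000, 1_000_000):
    print(f"{rounds:>9} rounds: average {average_winnings(rounds):+.2f}")
```

With few rounds the average fluctuates wildly; with many rounds it concentrates near $-4$.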


Exercise - Enough samples

Enough samples

Suppose $X_1, X_2, \ldots, X_n$ are IID samples of $X \sim \mathrm{Ber}(0.6)$.

(a) Compute $E[X_i]$, $\mathrm{Var}[X_i]$, and $\mathrm{Var}[M_{100}(X)]$.

(b) Use the finite LLN to find $\alpha$ such that:

$$P[|M_{100}(X) - 0.6| \ge 0.05] \le \alpha$$

(c) How many samples n are needed to guarantee that:

$$P[|M_n(X) - 0.6| \ge 0.1] \le 0.05$$