Theory 1 - Sample mean

Sample mean and its variance

The sample mean of a set $X_1, X_2, \dots, X_n$ of IID random variables is an RV giving the average value:

$$M_n(X) = \frac{1}{n}\,(X_1 + \cdots + X_n)$$

Statistics of the sample mean:

$$\mathbb{E}[M_n(X)] = \mathbb{E}[X_i], \qquad \operatorname{Var}[M_n(X)] = \frac{\operatorname{Var}[X_i]}{n} \qquad \text{(for any $i$)}$$

The sample mean is typically applied to repeated trials of an experiment. The trials are independent, and the probability distribution of outcomes should be identical from trial to trial.

Notice that the variance of the sample mean tends to $0$ as $n \to \infty$. As more trials are performed and the average of all results is taken, the fluctuations of this average shrink toward zero.

As $n \to \infty$, the distribution of $M_n(X)$ converges to a PMF with all the probability concentrated at $\mathbb{E}[X_i]$ and none elsewhere.
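A quick simulation illustrates the shrinking variance. This is a sketch, not from the source; the choice of Uniform$(0,1)$ samples (for which $\operatorname{Var}[X_i] = 1/12$) is an assumption made for illustration:

```python
import random

# Estimate Var[M_n(X)] empirically for IID Uniform(0,1) samples and
# compare it with the theoretical value Var[X_i] / n = 1 / (12 n).
random.seed(0)

def sample_mean(n):
    # One realization of M_n(X) for n Uniform(0,1) draws.
    return sum(random.random() for _ in range(n)) / n

def empirical_variance(n, trials=20000):
    # Variance of M_n(X) estimated over many repeated experiments.
    means = [sample_mean(n) for _ in range(trials)]
    avg = sum(means) / trials
    return sum((m - avg) ** 2 for m in means) / trials

for n in (1, 4, 16, 64):
    print(n, empirical_variance(n), 1 / (12 * n))
```

Each printed pair should roughly agree, and both columns shrink like $1/n$.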

Theory 2 - Tail estimation

Every distribution must trail off to zero for large enough $|x|$. The regions where the distribution trails off to zero (large magnitude of $x$) are informally called ‘tails’.

Tail probabilities

A tail probability is a probability of one of these forms, for any $c > 0$:

$$P[X \ge c], \qquad P[X \le -c], \qquad P\big[|X - \mu_X| \ge c\big]$$

Markov’s inequality

Assume that $X \ge 0$. Take any $c > 0$.

Then Markov’s inequality states:

$$P[X \ge c] \le \frac{\mu_X}{c}$$
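As a numerical sanity check (a sketch under assumed inputs, not part of the source), the following compares the empirical tail probability of an Exponential$(1)$ variable, for which $\mu_X = 1$, against the Markov bound $\mu_X / c$:

```python
import random

# Verify Markov's inequality P[X >= c] <= mu_X / c on simulated data.
random.seed(1)
samples = [random.expovariate(1.0) for _ in range(100000)]
mu = sum(samples) / len(samples)  # empirical estimate of mu_X (close to 1)

for c in (0.5, 1.0, 2.0, 4.0):
    tail = sum(x >= c for x in samples) / len(samples)
    bound = mu / c
    print(c, tail, bound)  # the empirical tail never exceeds the bound
```

For small $c$ the bound can exceed $1$ and is vacuous; it only becomes informative for $c > \mu_X$.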

Chebyshev’s inequality

Take any $X$, and any $c > 0$.

Then Chebyshev’s inequality states:

$$P\big[|X - \mu_X| \ge c\big] \le \frac{\sigma_X^2}{c^2}$$

Markov vs. Chebyshev

Chebyshev’s inequality works for any $X$, and it usually gives a better estimate than Markov’s inequality.

The main value of Markov’s inequality is that it only requires knowledge of $\mu_X$.

Think of Chebyshev’s inequality as a tightening of Markov’s inequality using the additional data of $\sigma_X$.
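To make the comparison concrete, here is a small sketch (an assumed example, not from the source) bounding the same tail $P[X \ge c]$ of an Exponential$(1)$ variable, where $\mu_X = 1$ and $\sigma_X^2 = 1$. Since $\{X \ge c\} \subseteq \{|X - \mu_X| \ge c - \mu_X\}$ for $c > \mu_X$, Chebyshev also bounds this upper tail:

```python
import math

# Compare Markov and Chebyshev bounds on P[X >= c] for X ~ Exponential(1).
mu, sigma2 = 1.0, 1.0

for c in (2.0, 3.0, 5.0):
    markov = mu / c
    # Chebyshev applied to |X - mu| >= c - mu, an event containing X >= c.
    chebyshev = sigma2 / (c - mu) ** 2
    exact = math.exp(-c)  # true tail of Exponential(1)
    print(c, markov, chebyshev, exact)
```

Note that Chebyshev wins for larger $c$ (e.g. $c = 3$ or $5$), but for $c$ close to $\mu_X$ (e.g. $c = 2$) its bound is vacuous while Markov's is not, which is why the text says "usually" rather than "always".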

Derivation of Markov’s inequality - Continuous RV

Under the hypothesis that $X \ge 0$ and $c > 0$, we have:

$$\mu_X = \mathbb{E}[X] = \int_0^\infty x f_X(x)\,dx = \int_0^c x f_X(x)\,dx + \int_c^\infty x f_X(x)\,dx$$

On the range $c \le x < \infty$ we may replace $x$ with $c$, making the integrand smaller:

$$\int_c^\infty x f_X(x)\,dx \ge \int_c^\infty c\, f_X(x)\,dx$$

Simplify:

$$\int_c^\infty c\, f_X(x)\,dx = c \int_c^\infty f_X(x)\,dx = c\,P[X \ge c]$$

Also:

$$\int_0^c x f_X(x)\,dx \ge 0$$

Therefore:

$$\int_0^\infty x f_X(x)\,dx \ge c\,P[X \ge c] \quad\Longrightarrow\quad \mathbb{E}[X] \ge c\,P[X \ge c]$$

Dividing both sides by $c$ gives Markov’s inequality.

Extra - Derivation of Chebyshev’s inequality

Notice that the variable $(X - \mu_X)^2$ is always nonnegative. Chebyshev’s inequality is a simple application of Markov’s inequality to this variable.

Specifically, using $c^2$ as the Markov constant, Markov’s inequality yields:

$$P\big[(X - \mu_X)^2 \ge c^2\big] \le \frac{\mathbb{E}\big[(X - \mu_X)^2\big]}{c^2}$$

Then, by monotonicity of square roots:

$$(X - \mu_X)^2 \ge c^2 \iff |X - \mu_X| \ge c$$

And of course $\mathbb{E}[(X - \mu_X)^2] = \sigma_X^2$. Chebyshev’s inequality follows.
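The whole derivation can be compressed into a single chain:

```latex
P\big[\,|X - \mu_X| \ge c\,\big]
  = P\big[(X - \mu_X)^2 \ge c^2\big]
  \le \frac{\mathbb{E}\big[(X - \mu_X)^2\big]}{c^2}
  = \frac{\sigma_X^2}{c^2}
```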

Theory 3 - Law of Large Numbers

Let $X_1, X_2, \dots, X_n$ be a collection of IID random variables with $\mu = \mathbb{E}[X_i]$ for any $i$ and $\sigma^2 = \operatorname{Var}[X_i]$ for any $i$.

Recall the sample mean:

$$M_n(X) = \frac{1}{n}\,(X_1 + \cdots + X_n), \qquad \mathbb{E}[M_n(X)] = \mu, \qquad \operatorname{Var}[M_n(X)] = \frac{\sigma^2}{n}$$

Law of Large Numbers (weak form)

For any $c > 0$, by Chebyshev’s inequality we have:

$$P\big[|M_n(X) - \mu| \ge c\big] \le \frac{\sigma^2}{n c^2} \qquad \text{(finite LLN)}$$

Therefore:

$$\lim_{n \to \infty} P\big[|M_n(X) - \mu| < c\big] = 1 \qquad \text{(infinite LLN)}$$
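A simulation sketch (Python; the fair-coin Bernoulli trials with $\mu = 0.5$ and the tolerance $c = 0.05$ are assumptions for illustration) shows $P[|M_n(X) - \mu| < c]$ approaching $1$ as $n$ grows:

```python
import random

random.seed(2)
mu, c = 0.5, 0.05  # fair coin; tolerance around the mean

def sample_mean(n):
    # One realization of M_n(X) for n Bernoulli(mu) trials.
    return sum(random.random() < mu for _ in range(n)) / n

def prob_within(n, trials=2000):
    # Empirical estimate of P[|M_n(X) - mu| < c] over repeated experiments.
    return sum(abs(sample_mean(n) - mu) < c for _ in range(trials)) / trials

for n in (10, 100, 1000):
    print(n, prob_within(n))
```

The printed probabilities increase with $n$, consistent with the Chebyshev bound $\sigma^2 / (n c^2)$ on the complementary event shrinking to zero.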