Theory 1

In order to show why the CLT is true, we introduce the technique of moment generating functions. Recall that the $n$th moment of a distribution $X$ is simply $E[X^n]$. Write $\mu_n$ for this value.

Recall the power series for ex:

$$e^x = 1 + x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \cdots$$

The function $f(x) = e^x$ has the property of being a bijective differentiable map from $\mathbb{R}$ to $\mathbb{R}_{>0}$, and it converts addition to multiplication: $e^{x+y} = e^x e^y$.

Given a random variable $X$, we can compose $X$ with $f(x) = e^x$ to obtain a new variable. Define the moment generating function of $X$ as follows:

$$M_X(t) = E[e^{tX}].$$

This is a function of $t$ and returns values in $\mathbb{R}$. It is called the moment generating function because it contains the data of all the higher moments $\mu_n$. They can be extracted by taking derivatives and evaluating at zero:

$$M_X(t) = 1 + E[X]\,t + E[X^2]\frac{t^2}{2!} + E[X^3]\frac{t^3}{3!} + \cdots$$

$$M_X^{(n)}(0) = E[X^n] = \mu_n.$$

It is reasonable to consider $M_X(t)$ as a formal power series in the variable $t$ whose coefficients are the higher moments $\mu_n$, each divided by $n!$.
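As a numerical illustration (a sketch of ours, not part of the original argument), the derivatives $M^{(n)}(0)$ can be approximated by central finite differences. We use the standard normal MGF $e^{t^2/2}$, which is computed in the example below; the helper names here are hypothetical.

```python
import math

def mgf_normal(t):
    # MGF of the standard normal: M_Z(t) = exp(t^2 / 2)
    return math.exp(t * t / 2)

def nth_moment(mgf, n, h=1e-2):
    # Central finite-difference approximation of the n-th derivative at 0:
    # M^(n)(0) ~ (1/h^n) * sum_k (-1)^k C(n,k) M((n/2 - k) h)
    return sum((-1) ** k * math.comb(n, k) * mgf((n / 2 - k) * h)
               for k in range(n + 1)) / h ** n

# For Z ~ N(0,1) the moments are mu_1 = 0, mu_2 = 1, mu_3 = 0, mu_4 = 3
moments = [nth_moment(mgf_normal, n) for n in (1, 2, 3, 4)]
```

Up to discretization error, this recovers $\mu_1 = 0$, $\mu_2 = 1$, $\mu_3 = 0$, $\mu_4 = 3$.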

Example - Moment generating function of a standard normal

We compute $M_Z(t)$ where $Z \sim \mathcal{N}(0,1)$. From the formula for the expected value of a function of a random variable, we have:

$$E[e^{tZ}] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{tx - x^2/2}\,dx.$$

Complete the square in the exponent: $tx - x^2/2 = -\frac{1}{2}(x-t)^2 + \frac{1}{2}t^2$. Thus:

$$e^{tx - x^2/2} = e^{-\frac{1}{2}(x-t)^2} \, e^{\frac{1}{2}t^2}.$$

The second factor does not depend on $x$, so it can be taken outside the integral; the remaining integral is a shifted Gaussian integral, equal to $\sqrt{2\pi}$:

$$E[e^{tZ}] = e^{t^2/2} \cdot \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}(x-t)^2}\,dx = e^{t^2/2} = M_Z(t).$$
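As a sanity check (our own sketch, not from the original text), we can estimate $E[e^{tZ}]$ by Monte Carlo and compare it with $e^{t^2/2}$; the sample size and the value of $t$ are arbitrary choices.

```python
import math
import random

random.seed(0)

t = 0.5
n = 200_000
# Monte Carlo estimate of E[e^{tZ}] for Z ~ N(0,1)
estimate = sum(math.exp(t * random.gauss(0, 1)) for _ in range(n)) / n
# Closed form derived above
exact = math.exp(t * t / 2)
```

The two values agree to within the usual $O(1/\sqrt{n})$ Monte Carlo error.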

Exercise - Moment generating function of an exponential variable

Compute $M_X(t)$ for $X \sim \operatorname{Exp}(\lambda)$.

Moment generating functions have the remarkable property of encoding the distribution itself:

Distributions determined by MGFs

Assume $M_X(t)$ and $M_Y(t)$ both converge. If $M_X(t) = M_Y(t)$ for all $t$, then $X$ and $Y$ have the same distribution: $X \sim Y$.

Moreover, if $M_X(t) = M_Y(t)$ for all $t$ in some interval $(-\varepsilon, \varepsilon)$, then $M_X(t) = M_Y(t)$ for all $t$ and $X \sim Y$.

Be careful about moments vs. generating functions!

Sometimes the moments all exist, but they grow so fast that the moment generating function does not converge. For example, the log-normal distribution $e^Z$ for $Z \sim \mathcal{N}(0,1)$ has this property.

The fact above does not apply when this happens.
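To see the failure concretely: the log-normal moments are $\mu_n = E[(e^Z)^n] = E[e^{nZ}] = M_Z(n) = e^{n^2/2}$ by the formula derived above, so the terms $\mu_n t^n / n!$ of the formal series grow without bound for every $t > 0$. A small sketch of ours, comparing terms on a log scale to avoid overflow (the helper name is hypothetical):

```python
import math

def log_series_term(n, t):
    # log of the n-th term mu_n t^n / n! of the formal MGF series,
    # where mu_n = e^{n^2/2} for the log-normal distribution e^Z
    return n * n / 2 + n * math.log(t) - math.lgamma(n + 1)

# Even for a tiny t, the terms eventually explode instead of shrinking:
terms = [log_series_term(n, t=0.1) for n in (5, 10, 20, 50)]
```

The quadratic $n^2/2$ in the exponent eventually dominates the $\ln n!$ term, so the series diverges for every $t > 0$.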

When moment generating functions approximate each other, their corresponding distributions also approximate each other:

Distributions converge when MGFs converge

Suppose that $M_{X_n}(t) \to M_X(t)$ for all $t$ in some interval $(-\varepsilon, +\varepsilon)$. (In particular, assume that $M_X(t)$ converges on some such interval.) Then for any interval $[a,b]$, we have:

$$\lim_{n \to \infty} P(X_n \in [a,b]) = P(X \in [a,b]).$$

Exercise - Using an MGF

Suppose $X$ is nonnegative and $M_X(t) = (1-2t)^{-3/2}$ when $t < 1/2$ and $M_X(t) = \infty$ when $t \ge 1/2$. Find a bound on $P(X > 8)$ using (a) Markov’s Inequality, and (b) Chebyshev’s Inequality.

Theory 2

The main role of moment generating functions in the proof of the CLT is to convert the sum $X_1 + \cdots + X_n$ into a product $e^{X_1} \cdots e^{X_n}$ by putting the sum into an exponent.

We have $S_n = X_1 + \cdots + X_n$, and recall $Z_n = \dfrac{S_n - n\mu}{\sigma\sqrt{n}}$, so $E[Z_n] = 0$ and $\operatorname{Var}(Z_n) = 1$. First, compute the MGF of $Z_n$. We have:

$$M_{Z_n}(t) = E[e^{tZ_n}] = E\left[\exp\left(t \cdot \frac{S_n - n\mu}{\sigma\sqrt{n}}\right)\right]$$

Exchange the sum in the exponent for a product of exponentials:

$$\exp\left(t \cdot \frac{S_n - n\mu}{\sigma\sqrt{n}}\right) = \exp\left(\frac{t}{\sigma\sqrt{n}} \sum_{i=1}^n (X_i - \mu)\right) = \prod_{i=1}^n \exp\left(\frac{t}{\sigma\sqrt{n}}(X_i - \mu)\right)$$

Now since the $X_i$ are independent, the factors $\exp\left(\frac{t}{\sigma\sqrt{n}}(X_i - \mu)\right)$ are also independent of each other. Use the product rule $E[XY] = E[X]\,E[Y]$ when $X, Y$ are independent to obtain:
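A quick Monte Carlo check of this product rule (our own sketch, with arbitrary choices): for independent $U(0,1)$ variables, $E[X] = E[Y] = 1/2$, so $E[XY]$ should come out close to $1/4 = E[X]\,E[Y]$.

```python
import random

random.seed(1)
n = 100_000
# Independent U(0,1) pairs: E[X] = E[Y] = 1/2, so E[XY] should be near 1/4
pairs = [(random.random(), random.random()) for _ in range(n)]
mean_x = sum(x for x, _ in pairs) / n
mean_y = sum(y for _, y in pairs) / n
mean_xy = sum(x * y for x, y in pairs) / n
```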

$$M_{Z_n}(t) = \prod_{i=1}^n E\left[\exp\left(\frac{t}{\sigma\sqrt{n}}(X_i - \mu)\right)\right]$$

Now expand the exponential in its Taylor series and use linearity of expectation:

$$\begin{aligned}
M_{Z_n}(t) &= \prod_{i=1}^n E\left[1 + \frac{t}{\sigma\sqrt{n}}(X_i - \mu) + \frac{1}{2!}\left(\frac{t}{\sigma\sqrt{n}}\right)^2 (X_i - \mu)^2 + \cdots\right] \\
&= \prod_{i=1}^n \left(1 + \frac{t}{\sigma\sqrt{n}}\,E[X_i - \mu] + \frac{1}{2!}\left(\frac{t}{\sigma\sqrt{n}}\right)^2 E[(X_i - \mu)^2] + \cdots\right) \\
&= \prod_{i=1}^n \left(1 + 0 + \frac{t^2}{2!\,n} + \cdots\right) \approx \prod_{i=1}^n \left(1 + \frac{t^2}{2n}\right).
\end{aligned}$$

We don’t give a complete argument for the final approximation, but a few remarks are worthwhile. For fixed $n, \sigma, \mu$, and assuming the moments $E[(X_i - \mu)^k]$ have adequately bounded growth in $k$, the series in each factor converges for all $t$. Using Taylor’s theorem we could write an error term as a shrinking function of $n$. The real trick of analysis is to show that in the product of $n$ factors, these error terms shrink fast enough that the limit value is not affected.

In any case, the factors in the last line do not depend on $i$, so we have:

$$M_{Z_n}(t) \approx \left(1 + \frac{t^2}{2n}\right)^n \xrightarrow{\,n \to \infty\,} \exp\left(\frac{t^2}{2}\right)$$
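The limit here is the familiar $(1 + a/n)^n \to e^a$ with $a = t^2/2$. A quick numerical check (our own sketch, with an arbitrary choice of $t$):

```python
import math

t = 1.0
target = math.exp(t * t / 2)  # e^{t^2/2}
# (1 + t^2/(2n))^n should approach e^{t^2/2} as n grows
approximations = [(1 + t * t / (2 * n)) ** n for n in (10, 100, 10_000)]
errors = [abs(a - target) for a in approximations]
```

The error shrinks roughly like $1/n$, consistent with the Taylor-remainder remarks above.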

But $e^{t^2/2}$ is the MGF of $Z \sim \mathcal{N}(0,1)$. Therefore $M_{Z_n}(t) \to M_Z(t)$, so $F_{Z_n}(x) \to F_Z(x)$.
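To close, a small simulation (our own sketch; the choice of $\operatorname{Exp}(1)$ summands, the sample sizes, and the interval $[-1,1]$ are all arbitrary) comparing $P(Z_n \in [-1,1])$ with the normal probability $\Phi(1) - \Phi(-1) = \operatorname{erf}(1/\sqrt{2})$:

```python
import math
import random

random.seed(2)

def z_n(n):
    # Standardized sum of n i.i.d. Exp(1) variables (mu = 1, sigma = 1)
    s = sum(random.expovariate(1.0) for _ in range(n))
    return (s - n) / math.sqrt(n)

n, trials = 30, 20_000
# Empirical P(Z_n in [-1, 1])
inside = sum(1 for _ in range(trials) if -1 <= z_n(n) <= 1) / trials

# P(Z in [-1, 1]) for Z ~ N(0,1), via the error function
normal_prob = math.erf(1 / math.sqrt(2))
```

Already at $n = 30$ the empirical probability sits close to $\Phi(1) - \Phi(-1) \approx 0.68$, as the CLT predicts.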