Theory 1 - Minimum mean square error

Suppose our problem is to estimate or guess or predict the value of a random variable X in a particular run of our experiment. Assume we have the distribution of X. Which value do we choose as our guess?

There is no single best answer to this question. The “best guess” number depends on additional factors in the problem context.

One method is to pick a value where the PMF or PDF of X is maximal, that is, a mode of the distribution. This is a value of highest probability (or density). There may be more than one!

Another method is to pick the expected value E[X]. This value may be impossible! For example, a fair die has E[X] = 3.5, a value that is never rolled.

For the normal distribution, or any symmetric unimodal distribution, these are the same value. For most distributions, though, they are not the same.
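As a quick illustration, consider a small skewed PMF (a made-up example, not from any particular model) where the two candidate guesses disagree:

```python
# Hypothetical skewed PMF, chosen so the mode and the mean disagree.
pmf = {0: 0.5, 1: 0.2, 2: 0.2, 10: 0.1}

mode = max(pmf, key=pmf.get)               # value where the PMF is maximal
mean = sum(x * p for x, p in pmf.items())  # E[X]

print("mode:", mode)  # the mode is 0
print("mean:", mean)  # E[X] = 1.6, a value X can never take
```

The mode and the mean can be arbitrarily far apart, and the mean need not be an attainable value of X.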


Mean square error (MSE)

Given some estimate \hat{x} for a random variable X, the mean square error (MSE) of \hat{x} is:

E[(X - \hat{x})^2]

The MSE quantifies the typical (square of the) error. Error here means the difference between the true value X and the estimate \hat{x}.

Other error measures are reasonable and useful in niche contexts, for example E[|X - \hat{x}|] or \max|X - \hat{x}|. They are less frequently used, so we do not consider their theory further.


In problem contexts where large errors are more costly than small errors (i.e. many real problems), the most likely value of X (the point with maximal PDF) may fare poorly as an estimate.

It turns out that the expected value E[X] also happens to be the value that minimizes the MSE.

Expected value minimizes MSE

Given a random variable X, its expected value \hat{x} = E[X] is the estimate of X with minimal mean square error.

The MSE for \hat{x} = E[X] is:

E[(X - \hat{x})^2] = Var[X]

Proof that E[X] minimizes MSE

Expand the MSE:

E[(X - \hat{x})^2] = E[X^2] - 2\hat{x}E[X] + \hat{x}^2

Now minimize this parabola. Differentiate:

\frac{d}{d\hat{x}} E[(X - \hat{x})^2] = 0 - 2E[X] + 2\hat{x}

Find zeros:

0 - 2E[X] + 2\hat{x} = 0 \implies 2\hat{x} = 2E[X] \implies \hat{x} = E[X]
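This result can also be checked numerically. The sketch below (with a hypothetical PMF, chosen only for illustration) scans a grid of candidate estimates and confirms that none beats E[X], where the MSE equals Var[X]:

```python
# Hypothetical PMF for the check.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}

def mse(x_hat):
    """E[(X - x_hat)^2] under the PMF."""
    return sum(p * (x - x_hat) ** 2 for x, p in pmf.items())

mean = sum(x * p for x, p in pmf.items())               # E[X] = 2.1
var = sum(p * (x - mean) ** 2 for x, p in pmf.items())  # Var[X] = 0.49

# Scan a grid of candidate estimates; none should beat the mean.
best = min((i / 100 for i in range(401)), key=mse)

assert abs(best - mean) < 1e-9      # the minimizer is E[X]
assert abs(mse(mean) - var) < 1e-9  # the minimal MSE equals Var[X]
```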

When the estimate \hat{x} is made in the absence of information (besides the distribution of X), it is called a blind estimate. Therefore, \hat{x}_B = E[X] is the blind minimal MSE estimate, and e_B = Var[X] is the error of this estimate.

In the presence of additional information, for example when an event A is known to have occurred, the MSE estimate is \hat{x}_A = E[X | A] and the error of this estimate is e_{X|A} = Var[X | A].

The MSE estimate can also be conditioned on another variable, say Y:

Minimal MSE of X given Y

The minimal MSE estimate of X given another random variable Y is:

\hat{x}_M(y) = E[X | Y = y]

The error of this estimate is Var[X | Y = y], which equals E[(X - \hat{x}_M(y))^2 | Y = y].

Notice that the minimal MSE estimate of X given Y can be used to define a random variable:

\hat{X}_M(Y) = E[X | Y] = \hat{x}_M(Y)

This is a variable derived from Y, given by composing Y with the function \hat{x}_M.

The variable \hat{X}_M(Y) provides the minimal MSE estimates of X when experimental outcomes are viewed as providing only the information in Y, and the model is used to derive estimates of X from this information.
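As a concrete sketch (the joint PMF below is hypothetical, chosen only for illustration), the estimate E[X | Y = y] can be computed directly from a joint distribution:

```python
# Hypothetical joint PMF: (x, y) -> P(X = x, Y = y).
joint = {
    (0, 0): 0.1, (1, 0): 0.3,
    (0, 1): 0.4, (1, 1): 0.2,
}

def x_hat_M(y):
    """Minimal MSE estimate of X given Y = y, i.e. E[X | Y = y]."""
    p_y = sum(p for (x, y2), p in joint.items() if y2 == y)
    return sum(x * p for (x, y2), p in joint.items() if y2 == y) / p_y

est0 = x_hat_M(0)  # = 0.3 / 0.4 = 0.75
est1 = x_hat_M(1)  # = 0.2 / 0.6 = 1/3
```

Composing this function with Y yields the random variable E[X | Y] defined above.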

Theory 2 - Line of minimal MSE

Linear approximation is very common in applied math.

One could consider the linearization of \hat{x}_M(y) (its tangent line at some point) instead of the exact function \hat{x}_M(y).

One could instead minimize the MSE over all linear functions of Y (each of which is a random variable). The line with minimal MSE is called the linear estimator.

The difference here is:

  • line of best fit at a single point vs.
  • line of best fit over the whole range of X and Y, weighted by likelihoods

Linear estimator: Line of minimal MSE

Let L(y) = ay + b be an arbitrary line, and let \hat{X}_L(Y) = L(Y) = aY + b.

The mean square error (MSE) of this line L is:

e_L(a, b) = E[(X - \hat{X}_L(Y))^2]

The linear estimator of X in terms of Y is the line L_{\min} with minimal MSE, and it is:

L_{\min}(y) = \rho_{X,Y} \frac{\sigma_X}{\sigma_Y} (y - \mu_Y) + \mu_X

The error value at the (best) linear estimator, e_{L_{\min}}, is:

e_{L_{\min}} = \sigma_X^2 (1 - \rho_{X,Y}^2)

Theorem: The error variable of the linear estimator, X - \hat{X}_{L_{\min}}(Y), is uncorrelated with Y.
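Both the error formula and the theorem can be verified numerically. The sketch below builds L_{\min} from the formula for a hypothetical joint PMF (the numbers are illustrative), then checks that its MSE equals \sigma_X^2 (1 - \rho_{X,Y}^2) and that the residual is uncorrelated with Y:

```python
# Hypothetical joint PMF: (x, y) -> P(X = x, Y = y).
joint = {(0, 0): 0.2, (1, 0): 0.2, (0, 1): 0.1, (1, 1): 0.5}

def E(f):
    """Expectation of f(X, Y) under the joint PMF."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mu_x) ** 2)
var_y = E(lambda x, y: (y - mu_y) ** 2)
rho = E(lambda x, y: (x - mu_x) * (y - mu_y)) / (var_x * var_y) ** 0.5

def L_min(y):
    """The line of minimal MSE, per the formula above."""
    return rho * (var_x / var_y) ** 0.5 * (y - mu_y) + mu_x

mse = E(lambda x, y: (x - L_min(y)) ** 2)
resid_cov = E(lambda x, y: (x - L_min(y)) * (y - mu_y))

assert abs(mse - var_x * (1 - rho ** 2)) < 1e-12  # error formula
assert abs(resid_cov) < 1e-12                     # residual uncorrelated with Y
```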

Slope and \rho_{X,Y}

Notice:

\frac{\hat{X}_{L_{\min}}(Y) - \mu_X}{\sigma_X} = \rho_{X,Y} \left( \frac{Y - \mu_Y}{\sigma_Y} \right)

Thus, for standardized variables X and Y, \rho_{X,Y} is the slope of the linear estimator.

[Figure: scatter plots of standardized X and Y with their lines of minimal MSE]

In each graph, E[X]=E[Y]=0 and Var[X]=Var[Y]=1.

The line of minimal MSE is the “best fit” line, \hat{X}_{L_{\min}}(Y) = \rho_{X,Y} Y.