Suppose our problem is to estimate or guess or predict the value of a random variable $X$ in one run of the experiment. Assume we have the distribution of $X$. Which value do we choose?
There is no single best answer to this question; the best answer depends on additional factors in the problem context.
One method is to pick a value where the PMF or PDF of $X$ is maximal. This is a value of highest probability. (There may be more than one.)
Another method is to pick the expected value $E[X]$.
For the normal distribution, or any symmetric unimodal distribution, these are the same value. For most distributions they are not the same value.
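A small numeric sketch of the gap between the two choices, using a hypothetical discrete PMF where the most likely value (the point of maximal PMF) and the expected value differ sharply:

```python
# Hypothetical PMF: value -> probability.
pmf = {0: 0.5, 1: 0.1, 2: 0.1, 10: 0.3}

# Value of highest probability (a point where the PMF is maximal).
mode = max(pmf, key=pmf.get)

# Expected value E[X].
mean = sum(x * p for x, p in pmf.items())

print(mode, round(mean, 2))  # 0 3.3
```

Half of the probability mass sits at 0, so 0 is the most probable single value, yet the expected value is pulled far away by the mass at 10.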
Mean square error
Given an estimate $\hat{x}$ for a random variable $X$, the mean square error (MSE) of $\hat{x}$ is:

$$e(\hat{x}) = E[(X - \hat{x})^2]$$
The MSE quantifies the typical (square of the) error, meaning the difference between the true value $X$ and the estimate $\hat{x}$. The expected value calculates the typical value of this error.
Other error measures are reasonable and useful in niche contexts, for example the mean absolute error $E[\,|X - \hat{x}|\,]$ or higher even powers such as $E[(X - \hat{x})^4]$. They are not frequently used, so we do not consider their theory further.
In problem contexts where large errors are more costly than small errors (many real problems), the most likely value of $X$ (a point with maximal PDF) may fare poorly as an estimate.
It turns out the expected value $E[X]$ is also the value that minimizes the MSE.
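This can be checked numerically. The sketch below reuses a hypothetical PMF, evaluates the MSE $E[(X - a)^2]$ over a fine grid of candidate estimates $a$, and confirms the best candidate sits at $E[X]$, not at the mode:

```python
# Hypothetical PMF: value -> probability.
pmf = {0: 0.5, 1: 0.1, 2: 0.1, 10: 0.3}

# Expected value E[X].
mean = sum(x * p for x, p in pmf.items())

def mse(a):
    """Mean square error E[(X - a)^2] of the estimate a."""
    return sum(p * (x - a) ** 2 for x, p in pmf.items())

# Scan candidate estimates on a grid of step 0.01.
candidates = [i / 100 for i in range(0, 1001)]
best = min(candidates, key=mse)

print(round(best, 2), round(mean, 2))  # 3.3 3.3
```

The grid minimizer lands on $E[X]$ (to within the grid spacing), even though the mode 0 carries half the probability mass.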
Minimal mean square error
Given a random variable $X$, its expectation $E[X]$ provides the estimate with minimal mean square error.
The MSE of this estimate is the variance:

$$e(E[X]) = E[(X - E[X])^2] = \mathrm{Var}(X)$$
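A quick numeric check of this identity, on a hypothetical PMF, using the computational form of the variance $\mathrm{Var}(X) = E[X^2] - (E[X])^2$:

```python
# Hypothetical PMF: value -> probability.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}

mean = sum(x * p for x, p in pmf.items())        # E[X]
ex2 = sum(x * x * p for x, p in pmf.items())     # E[X^2]

# MSE of the estimate E[X], and the variance computed independently.
mse_at_mean = sum(p * (x - mean) ** 2 for x, p in pmf.items())
variance = ex2 - mean ** 2

print(round(mse_at_mean, 6), round(variance, 6))  # 0.49 0.49
```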
Proof that $E[X]$ gives minimal MSE
Expand the MSE:

$$e(\hat{x}) = E[(X - \hat{x})^2] = E[X^2] - 2\hat{x}\,E[X] + \hat{x}^2$$

This is a parabola in $\hat{x}$, opening upward. Minimize it by differentiating:

$$\frac{de}{d\hat{x}} = -2E[X] + 2\hat{x}$$

Find the zero:

$$-2E[X] + 2\hat{x} = 0 \quad\Longrightarrow\quad \hat{x} = E[X]$$
When the estimate is made in the absence of information (besides the distribution of $X$), it is called a blind estimate. Therefore, $E[X]$ is the blind minimal MSE estimate, and $\mathrm{Var}(X)$ is the error of this estimate.
In the presence of additional information, namely when an event $A$ is known to have occurred, the minimal MSE estimate is $E[X \mid A]$ and the error of this estimate is $\mathrm{Var}(X \mid A)$.
The MSE estimate can also be conditioned on another variable, say $Y$.
Minimal MSE of $X$ given $Y$
The minimal MSE estimate of $X$ given another variable $Y$:

$$\hat{x}(y) = E[X \mid Y = y]$$
The error of this estimate is $E[(X - \hat{x}(y))^2 \mid Y = y]$, which equals $\mathrm{Var}(X \mid Y = y)$.
Notice that the minimal MSE estimate of $X$ given $Y$ can be used to define a random variable:

$$\hat{X} = E[X \mid Y]$$
This variable is derived from $Y$ by post-composition with the function $g(y) = E[X \mid Y = y]$; that is, $\hat{X} = g(Y)$.
The variable $E[X \mid Y]$ provides the minimal MSE estimates of $X$ when experimental outcomes are viewed as providing the information of $Y$ only, and the model is used to derive estimates of $X$ from this information.
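The construction above can be sketched numerically. Using a hypothetical joint PMF on $(X, Y)$, the code computes $g(y) = E[X \mid Y = y]$ for each value of $Y$, i.e. the function whose post-composition with $Y$ gives the derived variable $\hat{X} = E[X \mid Y]$:

```python
# Hypothetical joint PMF: (x, y) -> probability.
joint = {
    (0, 0): 0.2, (1, 0): 0.2,
    (0, 1): 0.1, (2, 1): 0.5,
}

def cond_mean(y):
    """g(y) = E[X | Y = y], computed from the joint PMF."""
    mass = sum(p for (x, yy), p in joint.items() if yy == y)      # P(Y = y)
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / mass

# The function g, tabulated over the range of Y.
g = {y: cond_mean(y) for y in {y for (_, y) in joint}}

print(round(g[0], 3), round(g[1], 3))  # 0.5 1.667
```

An outcome that reveals only $Y$ is then mapped through $g$ to produce the minimal MSE estimate of $X$ for that outcome.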