Statistical testing cont’d

01 Theory - Binary testing, MAP and ML

Binary hypothesis test

Ingredients of a binary hypothesis test:

  • Complementary hypotheses $H_0$ and $H_1$
    • Maybe also know the prior probabilities $P[H_0]$ and $P[H_1]$
    • Goal: determine which case we are in, $H_0$ or $H_1$
  • Decision rule made of complementary events $A_0$ and $A_1$
    • $A_0$ is likely given $H_0$, while $A_1$ is likely given $H_1$
    • Decision rule: outcome in $A_0$, accept $H_0$; outcome in $A_1$, accept $H_1$
    • Usually: written in terms of a decision statistic $X$ using a design
    • We cover three designs:
      • MAP and ML (minimize ‘error probability’)
      • MC (minimizes ‘error cost’)
    • Designs use $P_{X|H_0}$ and $P_{X|H_1}$ (or $f_{X|H_0}$, $f_{X|H_1}$) to construct $A_0$ and $A_1$

MAP design

Suppose we know:

  • Both prior probabilities $P[H_0]$ and $P[H_1]$
  • Both conditional distributions $P_{X|H_0}$ and $P_{X|H_1}$ (or $f_{X|H_0}$ and $f_{X|H_1}$)

The maximum a posteriori probability (MAP) design for a decision statistic $X$:

Discrete case: $A_1 = \left\{ x : \frac{P_{X|H_1}(x)}{P_{X|H_0}(x)} \ge \frac{P[H_0]}{P[H_1]} \right\}$

Continuous case: $A_1 = \left\{ x : \frac{f_{X|H_1}(x)}{f_{X|H_0}(x)} \ge \frac{P[H_0]}{P[H_1]} \right\}$

Then $A_0 = A_1^c$.

The MAP design minimizes the total probability of error.
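A minimal Python sketch of the MAP rule for a discrete decision statistic; the priors and conditional PMFs are made-up numbers, and the condition used is the cross-multiplied form of the ratio above.

```python
# Minimal sketch of the MAP rule for a discrete decision statistic X.
# Priors and conditional PMFs are made-up numbers for illustration.
p_H0, p_H1 = 0.8, 0.2                           # prior probabilities P[H0], P[H1]
pmf_H0 = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}     # P_{X|H0}(x)
pmf_H1 = {0: 0.05, 1: 0.15, 2: 0.3, 3: 0.5}     # P_{X|H1}(x)

# MAP condition, cross-multiplied: P[H1] * P_{X|H1}(x) >= P[H0] * P_{X|H0}(x)
A1 = {x for x in pmf_H0 if p_H1 * pmf_H1[x] >= p_H0 * pmf_H0[x]}
A0 = set(pmf_H0) - A1                           # A0 is the complement of A1
print("A1 =", sorted(A1), " A0 =", sorted(A0))  # A1 = [3]
```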

ML design

Suppose we know only:

  • Both conditional distributions $P_{X|H_0}$ and $P_{X|H_1}$ (or $f_{X|H_0}$ and $f_{X|H_1}$)

The maximum likelihood (ML) design for $X$: $A_1 = \left\{ x : \frac{P_{X|H_1}(x)}{P_{X|H_0}(x)} \ge 1 \right\}$ (use $f_{X|H_0}$, $f_{X|H_1}$ in the continuous case)

ML is a simplified version of MAP. (Set $P[H_0]$ and $P[H_1]$ to $1/2$.)
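The same sketch with the priors dropped (equivalently, both set to $1/2$) gives the ML region; the made-up PMFs are repeated so the snippet stands alone.

```python
# ML rule: compare likelihoods only (MAP with both priors set to 1/2).
pmf_H0 = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}     # P_{X|H0}(x), made-up numbers
pmf_H1 = {0: 0.05, 1: 0.15, 2: 0.3, 3: 0.5}     # P_{X|H1}(x)

A1_ML = {x for x in pmf_H0 if pmf_H1[x] >= pmf_H0[x]}
print("A1 (ML) =", sorted(A1_ML))               # [2, 3]: larger than the MAP region above
```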


The probability of a false alarm, a Type I error, is called $P_{FA} = P[A_1 \mid H_0]$.

The probability of a miss, a Type II error, is called $P_{MISS} = P[A_0 \mid H_1]$.

Total probability of error: $P_{ERR} = P_{FA}\,P[H_0] + P_{MISS}\,P[H_1]$
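With the made-up numbers from the MAP sketch above (where $A_1 = \{3\}$), these quantities work out as follows.

```python
# Error probabilities for the MAP region A1 = {3} from the earlier sketch.
p_H0, p_H1 = 0.8, 0.2
pmf_H0 = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}
pmf_H1 = {0: 0.05, 1: 0.15, 2: 0.3, 3: 0.5}
A1 = {3}

P_FA   = sum(pmf_H0[x] for x in A1)                        # P[A1 | H0] = 0.05
P_MISS = sum(p for x, p in pmf_H1.items() if x not in A1)  # P[A0 | H1] = 0.50
P_ERR  = p_H0 * P_FA + p_H1 * P_MISS                       # 0.8*0.05 + 0.2*0.5 = 0.14
print(P_FA, P_MISS, P_ERR)
```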

False alarm vs. false alarm

Suppose $X$ sets off a smoke alarm, and $H_0$ is ‘no fire’ and $H_1$ is ‘yes fire’.

Then $P_{FA} = P[A_1 \mid H_0]$ is the probability that we get an alarm assuming there is no fire.

This is not the probability of experiencing a false alarm (no context). That would be $P[A_1 \cap H_0] = P[A_1 \mid H_0]\,P[H_0]$.

This is not the probability of a given alarm being a false one. That would be $P[H_0 \mid A_1]$.
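A short computation with the same made-up numbers shows how different these three probabilities can be.

```python
# Three different "false alarm" probabilities, using the same made-up numbers
# (A1 = {3}, so P[A1 | H0] = 0.05 and P[A1 | H1] = 0.5).
p_H0, p_H1 = 0.8, 0.2
P_A1_given_H0 = 0.05                                # alarm, assuming no fire (P_FA)
P_A1_given_H1 = 0.50                                # alarm, assuming fire

P_A1_and_H0   = P_A1_given_H0 * p_H0                # experience a false alarm: 0.04
P_A1          = P_A1_and_H0 + P_A1_given_H1 * p_H1  # any alarm at all: 0.14
P_H0_given_A1 = P_A1_and_H0 / P_A1                  # a given alarm is false: ~0.29
print(P_A1_given_H0, P_A1_and_H0, P_H0_given_A1)
```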

02 Illustration

Example - ML test: Smoke detector


Example - MAP test: Smoke detector


03 Theory - MAP criterion proof

Explanation of MAP criterion - discrete case

First, we show that the MAP design selects $A_1$ for all those $x$ which render $H_1$ at least as likely as $H_0$.

Observe this calculation (Bayes' rule), for $i = 0, 1$:

$P[H_i \mid X = x] = \frac{P[H_i]\,P_{X|H_i}(x)}{P_X(x)}$

Now, take the condition for $x \in A_1$, and cross-multiply:

$P[H_1]\,P_{X|H_1}(x) \ge P[H_0]\,P_{X|H_0}(x)$

Divide both sides by $P_X(x)$ and apply the above calculation in reverse:

$P[H_1 \mid X = x] \ge P[H_0 \mid X = x]$

This is what we sought to prove.


Next, we verify that the MAP design minimizes the total probability of error.

The total probability of error is:

$P_{ERR} = P[A_1 \mid H_0]\,P[H_0] + P[A_0 \mid H_1]\,P[H_1]$

Expand this with summation notation (assuming the discrete case):

$P_{ERR} = \sum_{x \in A_1} P_{X|H_0}(x)\,P[H_0] \;+\; \sum_{x \in A_0} P_{X|H_1}(x)\,P[H_1]$

Now, how do we choose the set $A_1$ (and thus $A_0 = A_1^c$) in such a way that this sum is minimized?

Since all terms are nonnegative, and any $x$ may be placed in $A_0$ or in $A_1$ freely and independently of all other choices, the total sum is minimized when we minimize the impact of placing each $x$.

So, for each $x$, we place it in $A_1$ if:

$P_{X|H_0}(x)\,P[H_0] \le P_{X|H_1}(x)\,P[H_1]$

That is equivalent to the MAP condition.
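For a finite statistic this argument can also be checked by brute force: enumerate every possible acceptance set $A_1$ and confirm that none beats the MAP region. A sketch with the made-up numbers used earlier:

```python
from itertools import combinations

# Brute-force check that the MAP region minimizes P_ERR (made-up numbers).
p_H0, p_H1 = 0.8, 0.2
pmf_H0 = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}
pmf_H1 = {0: 0.05, 1: 0.15, 2: 0.3, 3: 0.5}
xs = list(pmf_H0)

def p_err(A1):
    p_fa   = sum(pmf_H0[x] for x in A1)
    p_miss = sum(pmf_H1[x] for x in xs if x not in A1)
    return p_H0 * p_fa + p_H1 * p_miss

all_regions = (set(c) for r in range(len(xs) + 1) for c in combinations(xs, r))
best = min(all_regions, key=p_err)
print(best, p_err(best))        # {3} with P_ERR = 0.14, matching the MAP design
```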

04 Theory - MC design

  • Write $C_{10}$ for the cost of a false alarm, i.e. the cost incurred when $H_0$ is true but we decided $H_1$.
    • Probability of incurring cost $C_{10}$ is $P[A_1 \mid H_0]\,P[H_0]$.
  • Write $C_{01}$ for the cost of a miss, i.e. the cost incurred when $H_1$ is true but we decided $H_0$.
    • Probability of incurring cost $C_{01}$ is $P[A_0 \mid H_1]\,P[H_1]$.

Expected value of cost incurred: $E[C] = C_{10}\,P[A_1 \mid H_0]\,P[H_0] + C_{01}\,P[A_0 \mid H_1]\,P[H_1]$

MC design

Suppose we know:

  • Both prior probabilities $P[H_0]$ and $P[H_1]$
  • Both conditional distributions $P_{X|H_0}$ and $P_{X|H_1}$ (or $f_{X|H_0}$ and $f_{X|H_1}$)

The minimum cost (MC) design for a decision statistic $X$:

Discrete case: $A_1 = \left\{ x : \frac{P_{X|H_1}(x)}{P_{X|H_0}(x)} \ge \frac{P[H_0]\,C_{10}}{P[H_1]\,C_{01}} \right\}$

Continuous case: $A_1 = \left\{ x : \frac{f_{X|H_1}(x)}{f_{X|H_0}(x)} \ge \frac{P[H_0]\,C_{10}}{P[H_1]\,C_{01}} \right\}$

Then $A_0 = A_1^c$.

The MC design minimizes the expected value of the cost of error.
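A sketch of the MC rule under assumed costs; the priors and PMFs are the same made-up numbers as before, and the costs are also made up.

```python
# MC rule with made-up costs: C10 = cost of a false alarm, C01 = cost of a miss.
p_H0, p_H1 = 0.8, 0.2
pmf_H0 = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}
pmf_H1 = {0: 0.05, 1: 0.15, 2: 0.3, 3: 0.5}
C10, C01 = 1.0, 10.0            # a miss (undetected fire) assumed 10x as costly

# MC condition, cross-multiplied: C01 * P[H1] * P_{X|H1}(x) >= C10 * P[H0] * P_{X|H0}(x)
A1_MC = {x for x in pmf_H0 if C01 * p_H1 * pmf_H1[x] >= C10 * p_H0 * pmf_H0[x]}
print("A1 (MC) =", sorted(A1_MC))   # [1, 2, 3]: the costly misses enlarge A1
```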

MC minimizes expected cost

Inside the argument that MAP minimizes total probability of error, we have this summation:

$P_{ERR} = \sum_{x \in A_1} P_{X|H_0}(x)\,P[H_0] \;+\; \sum_{x \in A_0} P_{X|H_1}(x)\,P[H_1]$

The expected value of the cost has a similar summation:

$E[C] = \sum_{x \in A_1} C_{10}\,P_{X|H_0}(x)\,P[H_0] \;+\; \sum_{x \in A_0} C_{01}\,P_{X|H_1}(x)\,P[H_1]$

Following the same reasoning, we see that the cost is minimized if each $x$ is placed into $A_1$ precisely when the MC design condition is satisfied, and otherwise it is placed into $A_0$.
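The same brute-force check as before, now for expected cost, confirms the MC region is optimal under these made-up numbers.

```python
from itertools import combinations

# Brute-force check that the MC region minimizes expected cost (made-up numbers).
p_H0, p_H1 = 0.8, 0.2
pmf_H0 = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}
pmf_H1 = {0: 0.05, 1: 0.15, 2: 0.3, 3: 0.5}
C10, C01 = 1.0, 10.0
xs = list(pmf_H0)

def expected_cost(A1):
    cost_fa   = C10 * p_H0 * sum(pmf_H0[x] for x in A1)
    cost_miss = C01 * p_H1 * sum(pmf_H1[x] for x in xs if x not in A1)
    return cost_fa + cost_miss

all_regions = (set(c) for r in range(len(xs) + 1) for c in combinations(xs, r))
best = min(all_regions, key=expected_cost)
print(best, expected_cost(best))    # {1, 2, 3} with E[C] = 0.5, matching MC
```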

05 Illustration

Example - MC Test: Smoke detector


Mean square error

06 Theory - Minimum mean square error

Suppose our problem is to estimate or guess or predict the value of a random variable $X$ in one run of the experiment. Assume we have the distribution of $X$. Which value do we choose?

There is no single best answer to this question. The best answer is a function of additional factors in the problem context.

One method is to pick a value where the PMF or PDF of $X$ is maximal. This is a value of highest probability. (There may be more than one.)

Another method is to pick the expected value $E[X]$.

For the normal distribution, or any symmetrical distribution, these are the same value. For most distributions they are not the same value.


Mean square error

Given an estimate $\hat{x}$ for a random variable $X$, the mean square error (MSE) of $\hat{x}$ is:

$e(\hat{x}) = E\big[(X - \hat{x})^2\big]$

The MSE quantifies the typical (square of the) error, meaning the difference between the true value $X$ and the estimate $\hat{x}$. The expected value calculates the typical value of this squared error.

Other error measures are reasonable and useful in niche contexts, for example the mean absolute error $E[\,|X - \hat{x}|\,]$ or higher even powers such as $E[(X - \hat{x})^4]$. They are not frequently used, so we do not consider their theory further.


In problem contexts where large errors are more costly than small errors (many real problems), the most likely value of $X$ (the point with maximal PDF) may fare poorly as an estimate.

It turns out the expected value $E[X]$ also happens to be the value that minimizes the MSE.

Minimal mean square error

Given a random variable $X$, its expectation $E[X]$ provides the estimate with minimal mean square error.

The MSE of the estimate $\hat{x} = E[X]$ itself:

$e\big(E[X]\big) = E\big[(X - E[X])^2\big] = \mathrm{Var}[X]$

Proof that $\hat{x} = E[X]$ gives minimal MSE

Expand the MSE:

$e(\hat{x}) = E\big[(X - \hat{x})^2\big] = E[X^2] - 2\,\hat{x}\,E[X] + \hat{x}^2$

Minimize this parabola in $\hat{x}$. Differentiate:

$\frac{de}{d\hat{x}} = -2\,E[X] + 2\,\hat{x}$

Find zeros:

$\hat{x} = E[X]$
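A quick numeric check of this result for a made-up PMF: evaluate the MSE over a grid of candidate estimates and confirm the minimizer agrees with $E[X]$.

```python
# Numeric check that x_hat = E[X] minimizes the MSE, for a made-up PMF.
pmf = {1: 0.2, 2: 0.5, 4: 0.3}                   # P_X(x), made-up numbers
EX = sum(x * p for x, p in pmf.items())          # E[X] = 2.4

def mse(x_hat):
    return sum((x - x_hat) ** 2 * p for x, p in pmf.items())

candidates = [i / 100 for i in range(0, 501)]    # candidate estimates 0.00 .. 5.00
best = min(candidates, key=mse)
print(best, EX, mse(best))   # grid minimizer 2.4 agrees with E[X]; MSE there is Var[X] = 1.24
```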


When the estimate is made in the absence of information (besides the distribution of $X$), it is called a blind estimate. Therefore, $\hat{x}_B = E[X]$ is the blind minimal MSE estimate, and $\mathrm{Var}[X]$ is the error of this estimate.

In the presence of additional information, namely that an event $A$ is known to have occurred, the MSE estimate is $\hat{x}_A = E[X \mid A]$ and the error of this estimate is $\mathrm{Var}[X \mid A]$.

The MSE estimate can also be conditioned on another variable, say $Y$.

Minimal MSE of $X$ given $Y$

The minimal MSE estimate of $X$ given another variable $Y$:

$\hat{x}_M(y) = E[X \mid Y = y]$

The error of this estimate is $E\big[(X - \hat{x}_M(Y))^2\big]$, which equals $E\big[\mathrm{Var}[X \mid Y]\big]$.

Notice that the minimal MSE estimate of $X$ given $Y$ can be used to define a random variable:

$\hat{X}_M = \hat{x}_M(Y) = E[X \mid Y]$

This variable is a derived variable of $Y$ given by post-composition with the function $\hat{x}_M(y) = E[X \mid Y = y]$.

The variable $\hat{X}_M$ provides the minimal MSE estimates of $X$ when experimental outcomes are viewed as providing the information of $Y$ only, and the model is used to derive estimates of $X$ from this information.
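A sketch of computing $\hat{x}_M(y) = E[X \mid Y = y]$ from a made-up joint PMF:

```python
# Minimal MSE estimate E[X | Y = y] computed from a made-up joint PMF P_{X,Y}.
joint = {(0, 0): 0.3, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.4}    # (x, y): probability

def x_hat_M(y):
    """Conditional mean of X given Y = y."""
    p_y = sum(p for (x, yy), p in joint.items() if yy == y)      # marginal P_Y(y)
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y

print(x_hat_M(0), x_hat_M(1))     # E[X|Y=0] = 0.25, E[X|Y=1] ≈ 0.667
```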

07 Illustration

Example - Minimal MSE estimate given PMF, given fixed event


Exercise - Minimal MSE estimate from joint PDF


08 Theory - Line of minimal MSE

Linear approximation is very common in applied math.

One could consider the linearization of $\hat{x}_M(y) = E[X \mid Y = y]$ (its tangent line) instead of the exact function $\hat{x}_M$.

Instead, one can minimize the MSE over all possible linear functions of $Y$. The line with minimal MSE is called the linear estimator.

Line of minimal MSE

Let $\hat{X}_L$ be the line $\hat{X}_L = aY + b$. Let $\mu_X = E[X]$, $\mu_Y = E[Y]$, $\sigma_X^2 = \mathrm{Var}[X]$, $\sigma_Y^2 = \mathrm{Var}[Y]$, and $\rho = \rho_{X,Y}$.

The mean square error (MSE) of $\hat{X}_L$ is:

$e_L(a, b) = E\big[(X - (aY + b))^2\big]$

The linear estimator is the line with minimal MSE, and it is:

$\hat{X}_L = \rho\,\frac{\sigma_X}{\sigma_Y}\,(Y - \mu_Y) + \mu_X$

The minimal error value is:

$e_L^* = \sigma_X^2\,(1 - \rho^2)$

The variable of minimal error, $X - \hat{X}_L$, is uncorrelated with $Y$.
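A sketch of the linear estimator computed directly from the moment formulas above, with made-up values for the means, standard deviations, and correlation.

```python
# Line of minimal MSE from made-up moments, using
# slope a = rho * sigma_X / sigma_Y and intercept b = mu_X - a * mu_Y.
mu_X, mu_Y = 3.0, 1.0
sigma_X, sigma_Y = 2.0, 0.5
rho = 0.6                                    # correlation coefficient of X and Y

a = rho * sigma_X / sigma_Y                  # slope = 2.4
b = mu_X - a * mu_Y                          # intercept = 0.6
e_L = sigma_X ** 2 * (1 - rho ** 2)          # minimal MSE = 4 * 0.64 = 2.56
print(f"X_hat_L = {a} * Y + {b},  minimal MSE = {e_L}")
```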

Slope and $\rho$

Notice:

$\frac{\hat{X}_L - \mu_X}{\sigma_X} = \rho\,\frac{Y - \mu_Y}{\sigma_Y}$

Thus, $\rho$ is the slope of the minimal MSE line for the standardized variables $(X - \mu_X)/\sigma_X$ and $(Y - \mu_Y)/\sigma_Y$.

(Figure: the line of minimal MSE, the “best fit” line $\hat{X}_L$, shown in each graph.)

09 Illustration

Example - Estimating on a variable interval


Exercise - Line of minimal MSE given joint PDF
