Significance testing

06 Theory - Significance testing

Theory 1 - Significance testing

Significance test

Ingredients of a significance test (unary hypothesis test):

  • H0 — Null hypothesis event
    • Identify a Claim
    • Then: H0 is background assumption (supposing Claim isn’t known)
    • Goal is to invalidate H0 in favor of Claim
  • R — Rejection Region event (decision rule)
    • R is written in terms of decision statistic X and significance level α
    • R is unlikely assuming H0; R is more likely if the Claim holds
    • “If X falls in R, this Test rejects H0.”
  • P[R|H0] — Able to compute this
    • Usually: inferred from fX|H0 or PX|H0
    • Adjust R to achieve P[R|H0]=α

Significance level

Suppose we are given a null hypothesis H0 and a rejection region R.

The significance level of R is:

α=P[R|H0]=P[reject H0|H0 is true]

Sometimes the condition is dropped and we write α=P[R], e.g. when a background model without assuming H0 is not known.

Null hypothesis implies a distribution

Usually S is unspecified, yet H0 determines a known distribution.

In this case H0 will not take the form of an event H0 ⊆ S in a sample space.

At a minimum, H0 must determine P[R|H0].

We do NOT need these details:

  • Background sample space S
  • Non-conditional distribution (full model): fX or PX
  • Complement conditionals: fX|H0c or PX|H0c

In basic statistical inference theory, there are two kinds of error.

  • Type I error concludes with rejecting H0 when H0 is true.
  • Type II error concludes with maintaining H0 when H0 is false.

Type I error is usually a bigger problem. We want to consider H0 as “innocent until proven guilty.”

                          | H0 is true                     | H0 is false
Maintain null hypothesis  | Made right call                | Wrong acceptance (Type II Error)
Reject null hypothesis    | Wrong rejection (Type I Error) | Made right call

To design a significance test at α, we must identify H0, and specify R with the property that P[R|H0]=α.

When R is written using a variable X, we must choose between:

  • One-tail rejection region: {x : x ≥ r} or {x : x ≤ r}
  • Two-tail rejection region: {x : |x − μ| ≥ c}

07 Illustration

Example - One-tail test: Weighted die

One-tail test: Weighted die

Your friend gives you a single regular die and says she is worried that it has been weighted to prefer the outcome 2. She wants you to test it.

Design a significance test for the data of 20 rolls of the die to determine whether the die is weighted. Use significance level α=0.05.

Solution

Let X count the number of 2s that come up.

The Claim: “the die is weighted to prefer 2”
The null hypothesis H0: “the die is normal”

Assuming H0 is true, then X ∼ Bin(20, 1/6), and therefore:

PX|H0(k) = (20 choose k) (1/6)^k (5/6)^(20−k)

⚠️ Notice that “prefer 2” implies the claim is for more 2s than normal.

Therefore: Choose a one-tail rejection region.

Need r such that:

P[X > r | H0] ≤ 0.05 ⟺ P[X ≤ r | H0] ≥ 0.95

Solve for r by computing conditional CDF values:

k:         0     1     2     3     4     5     6     7
FX|H0(k):  0.026 0.130 0.329 0.567 0.769 0.898 0.963 0.989

Therefore, choose r = 6:

P[X > 6 | H0] ≈ 0.037 < 0.04, but P[X > 5 | H0] ≈ 0.102 > 0.05. Final answer:

R = {x | x > 6}
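As a sanity check, the threshold search can be sketched in a few lines (a minimal sketch, assuming SciPy is available; `binom.cdf` plays the role of the conditional CDF FX|H0):

```python
# Minimal sketch: find the smallest one-tail threshold r with P[X > r | H0] <= alpha.
from scipy.stats import binom

n, p, alpha = 20, 1/6, 0.05   # 20 rolls, fair-die chance of a 2, significance level

r = 0
while 1 - binom.cdf(r, n, p) > alpha:
    r += 1

print(r)                        # chosen threshold
print(1 - binom.cdf(r, n, p))   # achieved level P[X > r | H0], about 0.037
```

With these numbers the loop stops at r = 6, matching the CDF table above.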

Two-tail test: Circuit voltage

Two-tail test: Circuit voltage

A boosted AC circuit is supposed to maintain an average voltage of 130V with a standard deviation of 2.1V. Nothing else is known about the voltage distribution.

Design a two-tail test incorporating the data of 40 independent measurements to determine if the expected value of the voltage is truly 130V. Use α=0.02.

Solution

Use M40(V) as the decision statistic, i.e. the sample mean of 40 measurements of V.

The Claim to test: E[V] ≠ 130

The null hypothesis H0: E[V]=130

Rejection region:

|M40 − 130| ≥ c

where c is chosen so that P[|M40 − 130| ≥ c] = 0.02


Assuming H0, we expect that:

E[M40] = 130,  σ²M40 = 2.1²/40 ≈ 0.110

Recall Chebyshev’s inequality:

P[|M40 − 130| ≥ c] ≤ σ²M40/c² ≈ 0.110/c²

Now solve:

0.110/c² = 0.02 ⟹ c ≈ 2.348

Therefore the rejection region should be:

M40 < 127.65 or 132.35 < M40
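The Chebyshev computation above can be reproduced directly (a minimal sketch, standard library only):

```python
# Minimal sketch: two-tail threshold from Chebyshev's inequality.
import math

mu, sigma, n, alpha = 130, 2.1, 40, 0.02

var_mean = sigma**2 / n           # variance of the sample mean M40
c = math.sqrt(var_mean / alpha)   # solve var_mean / c^2 = alpha for c

print(round(c, 3))                         # half-width of the rejection region
print(round(mu - c, 2), round(mu + c, 2))  # rejection boundaries
```

Because Chebyshev's inequality is only an upper bound, the resulting test is conservative: the true significance level is at most 0.02.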

One-tail test with a Gaussian: Weight loss drug

One-tail test with a Gaussian: Weight loss drug

Assume that in the background population in a specific demographic, the distribution of a person’s weight W satisfies W ∼ 𝒩(190, 24²). Suppose that a pharmaceutical company has developed a weight-loss drug and plans to test it on a group of 64 individuals.

Design a test at the α=0.01 significance level to determine whether the drug is effective.

Solution

Since the drug is tested on 64 individuals, we use the sample mean M64(W) as the decision statistic.

The Claim: “the drug is effective in reducing weight”

The null hypothesis H0: “no effect: weights on the drug still follow 𝒩(190, 24²)”

Assuming H0 is true, then W ∼ 𝒩(190, 24²).

⚠️ One-tail test because the drug is expected to reduce weight (unidirectional). Rejection region:

M64(W) ≤ r

Calculate:

σ²M64 = 24²/64 = 9

⚠️ The standardized M64(W) is approximately normal!

(Standardizing M64(W) removes the 1/n factor, so the CLT applies to the sample mean just as it does to the sum.)

So, standardize and apply CLT:

(M64(W) − 190)/3 ≈ 𝒩(0,1),  P[M64(W) ≤ r] ≈ P[Z ≤ (r − 190)/3] = Φ((r − 190)/3)

Solve:

P[M64(W) ≤ r] = 0.01 ⟺ Φ((r − 190)/3) = 0.01 ⟺ Φ((190 − r)/3) = 0.99 ⟺ (190 − r)/3 = 2.33 ⟺ r = 183.01

Therefore, the rejection region:

M64(W) ≤ 183.01
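A quick check with SciPy's normal quantile function `norm.ppf` (a sketch; the exact quantile 2.326 gives r ≈ 183.02 rather than the 183.01 obtained from the rounded table value 2.33):

```python
# Minimal sketch: one-tail Gaussian threshold via the CLT.
from scipy.stats import norm

mu, sigma, n, alpha = 190, 24, 64, 0.01

sigma_mean = sigma / n**0.5             # std of M64 = 24/8 = 3
r = mu + sigma_mean * norm.ppf(alpha)   # solves P[M64 <= r | H0] = alpha

print(round(r, 2))
```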

Binary hypothesis testing

01 Theory - Binary testing, MAP and ML

Theory 1 - Binary testing, MAP and ML

Binary hypothesis test

Ingredients of a binary hypothesis test:

  • H0 and H1 — Complementary hypotheses
    • Maybe also know the prior probabilities P[H0] and P[H1]
    • Goal: determine which case we are in, H0 or H1
  • A0 and A1 — Complementary events of the Decision Rule
    • Directionality: given H0, A0 is likely; given H1, A1 is likely
    • Decision Rule: outcome A0, accept H0; outcome A1, accept H1
    • Usually: Ai written in terms of decision statistic X using a design
    • We cover three designs:
      • MAP and ML (minimize ‘error probability’)
      • MC (minimizes ‘error cost’)
    • Designs use PX|H0 and PX|H1 (or fX|H0, fX|H1) to construct A0 and A1

MAP design

Suppose we know:

  • P[H0] and P[H1]
    • Both prior probabilities
  • PX|H0(x) and PX|H1(x) (or fX|H0(x) and fX|H1(x))
    • Both conditional distributions

The maximum a posteriori probability (MAP) design for a decision statistic X:

A0=set of x for which:

Discrete case:

PX|H0(x)P[H0] ≥ PX|H1(x)P[H1]

Continuous case:

fX|H0(x)P[H0] ≥ fX|H1(x)P[H1]

And A1 = {x | x ∉ A0}.

The MAP design minimizes the total probability of error.

ML design

Suppose we don’t know the priors, we know only:

  • PX|H0(x) and PX|H1(x) (or fX|H0(x) and fX|H1(x))
    • Both conditional distributions

The maximum likelihood (ML) design for X:

A0 = set of x for which:

PX|H0(x) ≥ PX|H1(x)   (discrete)
fX|H0(x) ≥ fX|H1(x)   (continuous)

ML is a simplified version of MAP. (Set P[H0] and P[H1] to 0.5.)


The false alarm error rate is called PFA. The miss error rate is called PMiss.

PFA = P[A1|H0],  PMiss = P[A0|H1]

Total probability of error:

PERR=P[A1|H0]P[H0]+P[A0|H1]P[H1]

Wrong meanings of PFA

Suppose A1 sets off a smoke alarm, and H0 is ‘no fire’ and H1 is ‘yes fire’.

Then PFA is the probability that we get an alarm given that there is no fire.

This is not the probability of experiencing a false alarm (no conditioning). That would be P[A1 ∩ H0].

This is not the probability that a given alarm is a false one. That would be P[H0|A1].
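A quick numeric contrast of the three quantities (the priors 0.95/0.05 and the rates 0.01/0.90 below are hypothetical numbers chosen for illustration, not from the text):

```python
# Distinguish the three "false alarm" probabilities for the smoke-alarm setup.
p_h0 = 0.95          # prior: no fire (assumed)
p_h1 = 0.05          # prior: fire (assumed)
pfa = 0.01           # P[A1 | H0], the false-alarm rate (assumed)
p_detect = 0.90      # P[A1 | H1] (assumed)

# P[A1 ∩ H0]: probability of experiencing a false alarm, no conditioning
p_alarm_and_no_fire = pfa * p_h0                      # 0.0095

# P[H0 | A1]: probability that a given alarm is false (Bayes)
p_alarm = pfa * p_h0 + p_detect * p_h1                # total probability of alarm
p_false_given_alarm = p_alarm_and_no_fire / p_alarm   # ≈ 0.174

print(p_alarm_and_no_fire, round(p_false_given_alarm, 3))
```

Even with a 1% false-alarm rate, about 17% of alarms are false here, because fires are rare: the three quantities can differ by an order of magnitude.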


02 Illustration

Example - ML test: Smoke detector

ML test: Smoke detector

Suppose that a smoke detector sensor is configured to produce 8V when there is smoke, and 0V otherwise. But there is background noise with distribution 𝒩(0, 3²).

Design an ML test for the detector electronics to decide whether to activate the alarm.

What are the three error probabilities? (Type I, Type II, Total.)

Solution

First, establish the conditional distributions:

X|H0 ∼ 𝒩(0, 3²),  X|H1 ∼ 𝒩(8, 3²)

Density functions:

fX|H0(x) = 1/√(2π·9) e^(−(1/2)((x−0)/3)²)
fX|H1(x) = 1/√(2π·9) e^(−(1/2)((x−8)/3)²)

The ML condition becomes:

1/√(2π·9) e^(−(1/2)((x−0)/3)²)  ≥?  1/√(2π·9) e^(−(1/2)((x−8)/3)²)
⟺ −(1/2)((x−0)/3)²  ≥?  −(1/2)((x−8)/3)²
⟺ x²  ≤?  (x−8)²
⟺ x ≤ 4

Therefore, A0 is x ≤ 4, while A1 is x > 4.

The decision rule is: activate alarm when x>4.


Type I error:

PFA = P[A1|H0] = P[X > 4 | H0] = 1 − P[(X−0)/3 ≤ (4−0)/3 | H0] ≈ 1 − P[Z ≤ 1.3333] ≈ 0.0912

Type II error:

PMiss = P[A0|H1] = P[X ≤ 4 | H1] = P[(X−8)/3 ≤ (4−8)/3 | H1] ≈ P[Z ≤ −1.3333] ≈ 0.0912

Total error:

PERR = PFA·0.5 + PMiss·0.5 ≈ 0.0912
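The whole ML computation can be checked numerically (a sketch, assuming SciPy):

```python
# Minimal sketch: ML test for two Gaussians with equal variance.
from scipy.stats import norm

mu0, mu1, sigma = 0.0, 8.0, 3.0

# ML threshold: the likelihoods are equal at the midpoint of the two means
threshold = (mu0 + mu1) / 2                           # 4.0

pfa = 1 - norm.cdf(threshold, loc=mu0, scale=sigma)   # P[X > 4 | H0]
pmiss = norm.cdf(threshold, loc=mu1, scale=sigma)     # P[X <= 4 | H1]
perr = 0.5 * pfa + 0.5 * pmiss                        # ML implicitly weights priors 0.5/0.5

print(round(pfa, 4), round(pmiss, 4), round(perr, 4))
```

By symmetry of the two Gaussians about the midpoint, PFA and PMiss come out identical here.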

Example - MAP test: Smoke detector

MAP test: Smoke detector

Suppose that a smoke detector sensor is configured to produce 8V when there is smoke, and 0V otherwise. But there is background noise with distribution 𝒩(0, 3²).

Suppose that the background chance of smoke is 5%. Design a MAP test for the alarm.

What are the three error probabilities? (Type I, Type II, Total.)

Solution

First, establish priors:

P[H0] = 0.95,  P[H1] = 0.05

The MAP condition becomes:

1/√(2π·9) e^(−(1/2)((x−0)/3)²) · 0.95  ≥?  1/√(2π·9) e^(−(1/2)((x−8)/3)²) · 0.05
⟺ e^(−(1/2)((x−0)/3)²)  ≥?  e^(−(1/2)((x−8)/3)²) · (0.05/0.95)
⟺ −(1/2)((x−0)/3)²  ≥?  −(1/2)((x−8)/3)² + ln(0.05/0.95)
⟺ x²  ≤?  (x−8)² − 18·ln(0.05/0.95)
⟺ x ≤ 7.31

Therefore, A0 is x ≤ 7.31, while A1 is x > 7.31.

The decision rule is: activate alarm when x>7.31.


Type I error:

PFA = P[A1|H0] = P[X > 7.31 | H0] ≈ 1 − P[Z ≤ 2.4367] ≈ 0.007411

Type II error:

PMiss = P[A0|H1] = P[X ≤ 7.31 | H1] ≈ P[Z ≤ −0.23] ≈ 0.4090

Total error:

PERR = PFA·0.95 + PMiss·0.05 ≈ 0.02749
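The same check for the MAP rule (a sketch, assuming SciPy). The closed-form threshold below — the ML midpoint shifted by σ²/(μ1−μ0)·ln(P[H0]/P[H1]) — is just the last line of the derivation above solved for x:

```python
# Minimal sketch: MAP test for two Gaussians with equal variance and unequal priors.
import math
from scipy.stats import norm

mu0, mu1, sigma = 0.0, 8.0, 3.0
p0, p1 = 0.95, 0.05

# MAP threshold: midpoint shifted toward the less likely hypothesis
threshold = (mu0 + mu1) / 2 + sigma**2 / (mu1 - mu0) * math.log(p0 / p1)

pfa = 1 - norm.cdf(threshold, loc=mu0, scale=sigma)   # P[X > threshold | H0]
pmiss = norm.cdf(threshold, loc=mu1, scale=sigma)     # P[X <= threshold | H1]
perr = p0 * pfa + p1 * pmiss

print(round(threshold, 2), round(pfa, 6), round(pmiss, 4), round(perr, 5))
```

Note how the strong prior against fire pushes the threshold from 4 up to about 7.31, trading a much lower PFA for a much higher PMiss.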

03 Theory - MAP criterion proof

Theory 2 - MAP criterion proof

Explanation of MAP criterion - discrete case

First, we show that the MAP design selects for A0 all those x which render H0 more likely than H1. This will be used in the next step to show that MAP minimizes probability of error.

Observe this calculation:

P[Hi|X=x] = P[X=x|Hi]·P[Hi] / P[X=x]   (Bayes’ Rule)
         = PX|Hi(x)·P[Hi] / P[X=x]   (Conditional PMF)

Recall the MAP criterion:

PX|H0(x)P[H0] ≥ PX|H1(x)P[H1]

Divide both sides by P[X=x] and apply the calculation above in reverse:

P[H0|X=x] ≥ P[H1|X=x]

This is what we sought to prove.


Next, we verify that the MAP design minimizes the total probability of error.

The total probability of error is:

PERR=P[A1|H0]P[H0]+P[A0|H1]P[H1]

Expand this with summation notation (assuming the discrete case):

Σx∈A1 PX|H0(x)P[H0] + Σx∈A0 PX|H1(x)P[H1]

Now, how do we choose the set A0 (and thus A1=A0c) in such a way that this sum is minimized?

Since all terms are nonnegative, and each x may be placed in A0 or in A1 freely and independently of all other choices, the total sum is minimized by minimizing the contribution of each x separately.

So, for each x, we place it in A0 if:

PX|H0(x)P[H0] ≥ PX|H1(x)P[H1]

That is equivalent to the MAP criterion.
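The placement argument can be verified by brute force on a small discrete model (the pmf values and priors below are hypothetical, chosen for illustration): enumerate every possible choice of A0 and confirm that none beats the MAP choice.

```python
# Brute-force check that the MAP rule minimizes P_ERR on a toy model.
from itertools import product

xs = [0, 1, 2, 3]
p_h0, p_h1 = 0.7, 0.3                      # priors (assumed)
pmf0 = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}   # P_{X|H0} (assumed)
pmf1 = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}   # P_{X|H1} (assumed)

def perr(a0):
    """Total error probability for the rule 'accept H0 iff x in a0'."""
    a1 = [x for x in xs if x not in a0]
    return sum(pmf0[x] for x in a1) * p_h0 + sum(pmf1[x] for x in a0) * p_h1

# MAP rule: x goes to A0 iff PX|H0(x)P[H0] >= PX|H1(x)P[H1]
map_a0 = [x for x in xs if pmf0[x] * p_h0 >= pmf1[x] * p_h1]

# Enumerate all 2^4 candidate A0 sets and take the best achievable error
best = min(perr([x for x, keep in zip(xs, bits) if keep])
           for bits in product([0, 1], repeat=len(xs)))

assert abs(perr(map_a0) - best) < 1e-12
print(map_a0, perr(map_a0))
```

Here MAP assigns A0 = {0, 1, 2}, and no other subset achieves a smaller total error, exactly as the term-by-term argument predicts.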


04 Theory - MC design

Theory 3 - MC design

  • Write C10 for cost of false alarm, i.e. cost when H0 is true but decided H1.
    • Probability of incurring cost C10 is PFA·P[H0].
  • Write C01 for cost of miss, i.e. cost when H1 is true but decided H0.
    • Probability of incurring cost C01 is PMiss·P[H1].

Expected value of cost incurred

E[C]=P[A1|H0]P[H0]C10+P[A0|H1]P[H1]C01

MC design

Suppose we know:

  • Both prior probabilities P[H0] and P[H1]
  • Both conditional distributions PX|H0(x) and PX|H1(x) (or fX|H0(x) and fX|H1(x))

The minimum cost (MC) design for a decision statistic X:

A0=set of x for which:

Discrete case:

PX|H0(x)P[H0]C10 ≥ PX|H1(x)P[H1]C01

Continuous case:

fX|H0(x)P[H0]C10 ≥ fX|H1(x)P[H1]C01

Then A1 = {x | x ∉ A0}.

The MC design minimizes the expected value of the cost of error.

MC minimizes expected cost

Inside the argument that MAP minimizes total probability of error, we have this summation:

PERR = Σx∈A1 PX|H0(x)P[H0] + Σx∈A0 PX|H1(x)P[H1]

The expected value of the cost has a similar summation:

E[C] = Σx∈A1 PX|H0(x)P[H0]C10 + Σx∈A0 PX|H1(x)P[H1]C01

Following the same reasoning, we see that the cost is minimized if each x is placed into A0 precisely when the MC design condition is satisfied, and otherwise it is placed into A1.


05 Illustration

Example - MC Test: Smoke detector

MC Test: Smoke detector

Suppose that a smoke detector sensor is configured to produce 8V when there is smoke, and 0V otherwise. But there is background noise with distribution 𝒩(0, 3²).

Suppose that the background chance of smoke is 5%. Suppose the cost of a miss is 50× the cost of a false alarm. Design an MC test for the alarm.

Compute the expected cost.

Solution

We have priors:

P[H0]=0.95P[H1]=0.05

And we have costs:

C10 = 1,  C01 = 50

(The ratio of these numbers is all that matters in the inequalities of the condition.)

The MC condition becomes:

1/√(2π·9) e^(−(1/2)((x−0)/3)²) · 0.95 · 1  ≥?  1/√(2π·9) e^(−(1/2)((x−8)/3)²) · 0.05 · 50
⟺ e^(−(1/2)((x−0)/3)²)  ≥?  e^(−(1/2)((x−8)/3)²) · (2.5/0.95)
⟺ −(1/2)((x−0)/3)²  ≥?  −(1/2)((x−8)/3)² + ln(2.5/0.95)
⟺ x²  ≤?  (x−8)² − 18·ln(2.5/0.95)
⟺ x ≤ 2.91

Therefore, A0 is x ≤ 2.91, while A1 is x > 2.91.

The decision rule is: activate alarm when x>2.91.


Type I error:

PFA = P[A1|H0] = P[X > 2.91 | H0] ≈ 0.1660

Type II error:

PMiss = P[A0|H1] = P[X ≤ 2.91 | H1] ≈ 0.04488

Total error:

PERR = PFA·0.95 + PMiss·0.05 ≈ 0.1599

PMF of total cost:

PC(c) = 0.002244 if c = 50 (miss)
        0.1577   if c = 1  (false alarm)
        0.840056 if c = 0  (no error)

Therefore E[C] ≈ 0.27.
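The MC numbers can be reproduced the same way (a sketch, assuming SciPy); the threshold is the MAP formula with each prior weighted by its cost:

```python
# Minimal sketch: minimum-cost (MC) test for the smoke detector.
import math
from scipy.stats import norm

mu0, mu1, sigma = 0.0, 8.0, 3.0
p0, p1 = 0.95, 0.05
c10, c01 = 1.0, 50.0   # cost of false alarm, cost of miss

# MC threshold: MAP threshold with cost-weighted priors p0*c10 and p1*c01
threshold = (mu0 + mu1) / 2 + sigma**2 / (mu1 - mu0) * math.log(p0 * c10 / (p1 * c01))

pfa = 1 - norm.cdf(threshold, loc=mu0, scale=sigma)   # P[X > threshold | H0]
pmiss = norm.cdf(threshold, loc=mu1, scale=sigma)     # P[X <= threshold | H1]
expected_cost = pfa * p0 * c10 + pmiss * p1 * c01

print(round(threshold, 2), round(expected_cost, 2))
```

The heavy miss cost pulls the threshold down from the MAP value 7.31 to about 2.91: the alarm fires much more readily when misses are 50× as costly as false alarms.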
