Bayes’ Theorem

01 Theory

Theory 1

Bayes’ Theorem

For any events A and B:

P[B|A]=P[B]P[A|B]P[A]

Note: Bayes’ Theorem is sometimes called Bayes’ Rule.

Bayes’ Theorem - Derivation

Start with the observation that AB=BA, in other words event “A AND B” equals event “B AND A”.

Apply the multiplication rule to each product:

P[AB]=P[A]P[B|A]P[BA]=P[B]P[A|B]

Equate them and rearrange:

P[AB]=P[BA]P[A]P[B|A]=P[B]P[A|B]P[B|A]=P[B]P[A|B]P[A]

The main application of Bayes’ Theorem is to calculate P[A|B] when it is easy to calculate P[B|A] from the problem setup. Often this occurs in multi-stage experiments where event A describes outcomes of an intermediate stage.

Note: These lecture notes use alphabetical order A, B as a mnemonic for temporal or logical order, i.e. that A comes first in time, or that A is the prior conditional from which it is easy to calculate B.

Link to original

02 Illustration

Example - Bayes’ Theorem - COVID tests

Bayes’ Theorem: COVID tests

Assume that 0.5% of people have COVID. Suppose a COVID test gives a (true) positive on 96% of patients who have COVID, but gives a (false) positive on 2% of patients who do not have COVID. Bob tests positive. What is the probability that Bob has COVID?

Solution

(1) Label events:

  • Event AP: Bob is actually positive for COVID
  • Event AN: Bob is actually negative; note AN=APc
  • Event TP: Bob tests positive
  • Event TN: Bob tests negative; note TN=TPc

(2) Identify known data:

  • Know: P[TP|AP]=96%
  • Know: P[TP|AN]=2%
  • Know: P[AP]=0.5% and therefore P[AN]=99.5%

We seek: P[AP|TP]


(3) Translate Bayes’ Theorem:

Set A=TP and B=AP in Bayes’:

P[AP|TP]=P[AP]P[TP|AP]P[TP]

We know all values on the right except P[TP]


(4) Denominator: apply Total Probability (Division into Cases):

Observe that TPAP and TPAN are exclusive events, and that:

TP=TPAPTPAN.

Therefore:

P[TP]=P[AP]P[TP|AP]+P[AN]P[TP|AN]

Plug in data and compute:

P[TP]=5100096100+995100021000.0247

(5) Plug in and compute:

P[AP|TP]=P[AP]P[TP|AP]P[TP]0.960.0050.024719%

Intuition - COVID testing

Some people find this low number surprising. In order to repair your intuition, think about it like this: roughly 2.5% of tests are positive, with roughly 2% coming from false positives, and roughly 0.5% from true positives. Only 1/5 of all the positive results are true ones!

(This rough approximation assumes that 96%100%.)

If two tests both come back positive, the odds of COVID are now 98%.

If only people with symptoms are tested, so that, say, 20% of those tested have COVID, that is, P[AP|TP]=20%, then one positive test implies a COVID probability of 92%.

Link to original

Practice exercise

Inferring bin from marble

There are marbles in bins in a room:

  • Bin 1 holds 7 red and 5 green marbles.
  • Bin 2 holds 4 red and 3 green marbles.

Your friend goes in the room, shuts the door, and selects a random bin, then draws a random marble. (Equal odds for each bin, then equal odds for each marble in that bin.) He comes out and shows you a red marble.

What is the probability that this red marble was taken from Bin 1?

Link to original

Independence

03 Theory

Theory 1

Two events are independent when information about one of them does not change our probability estimate for the other.

Independence

Events A and B are independent when these (logically equivalent) equations hold:

  • P[B|A]=P[B]
  • P[A|B]=P[A]
  • P[BA]=P[B]P[A]

Note that the last equation is symmetric in A and B:

  • Check: BA=AB and P[B]P[A]=P[A]P[B]
  • This symmetric version is the preferred definition of the concept of independence.

Multiple-independence

A collection of events A1,,An is mutually independent when every subcollection Ai1,,Aik satisfies:

P[Ai1Aik]=P[Ai1]P[Aik]

A potentially weaker condition for a collection A1,,An is called pairwise independence, which holds when all 2-member subcollections are independent:

P[AiAj]=P[Ai]P[Aj]for allij

One could also define 3-member independence, or n-member independence. Plain ‘independence’ means any-member independence.

Link to original

04 Illustration

Practice exercise

Independence and complements

Prove that these are logically equivalent statements:

  • A and B are independent
  • A and Bc are independent
  • Ac and Bc are independent

Make sure you demonstrate both directions of each equivalency.

Link to original

Example - Checking independence by hand

Independence by hand: red and green marbles

A bin contains 4 red and 7 green marbles. Two marbles are drawn.

Let R1 be the event that the first marble is red, and let G2 be the event that the second marble is green.

(a) Show that R1 and G2 are independent if the marbles are drawn with replacement.

(b) Show that R1 and G2 are not independent if the marbles are drawn without replacement.

Solution

(a) With replacement.

Identify knowns:

  • Know: P[R1]=411
  • Know: P[G2]=711

Now compute both sides of independence relation:

P[R1G2]=P[R1]P[G2]

The right side is 411711.

For P[R1G2], we have 47 ways to get R1G2, and 112 total outcomes. So left side is 47112, which equals the right side.


(b) Without replacement. This is a bit harder.

(1) Identify knowns:

Know: P[R1]=411 and therefore P[R1c]=711

We seek: P[G2] and P[R1G2]


(2) Find P[G2] using Total Probability (Division into Cases):

G2=G2R1G2R1cP[G2]=P[R1]P[G2|R1]+P[R1c]P[G2|R1c]

Find RHS factors by counting, then compute:

P[G2]=411710+71161070110

(3) Find P[R1G2] using multiplication rule:

P[R1G2]=P[R1]P[G2|R1]41171028110

(4) Compare both sides:

Left side: P[R1G2]=28110.

Right side:

P[R1]P[G2]=41170110=28121

But 2811028121 so P[R1G2]P[R1]P[G2] and they are not independent.

Link to original

Tree diagrams

05 Theory

Theory 1

A tree diagram depicts the components of a multi-stage experiment. Nodes represent sources of randomness.

center

An outcome of the experiment is represented by a complete path taken from the root (left-most node, only one option) to a leaf (right-most node, many options). The branch chosen at a given node represents the outcome of a “sub-experiment.” So a complete path encodes the outcomes of all sub-experiments along the way.

Each branch emanating from a node is labeled with a probability value. This is the probability that the sub-experiment of that node has the outcome of that branch. (In the example, 0.8=P[A|B1].) This is also the conditional probability of the branch’s right node, given its left node as known.

Therefore, branch values from any given node must sum to 1.

The probability of a given outcome is the product of the probabilities along each branch of the path from the root to that outcome.

For example, for outcome AB1, we have P[AB1]=P[A]P[B1|A].

Generally, remember that

P[ABCD]=P[ABC]P[D|ABC]=P[AB]P[C|AB]P[D|ABC]=P[A]P[B|A]P[C|AB]P[D|ABC]

This overall outcome probability may be written at the final leaf. (Not to be confused with the branch value of the last branch.)

One can also use a tree diagram to remember quickly how to calculate certain probabilities.

For example, what is P[A] in the diagram?

  • Answer: add up the path probabilities for all paths terminating in A. We obtain:
0.24+0.36+0.180.78

For example, what is P[B1|N]?

  • Answer: divide the leaf probability of B1N by the total probability of N. We obtain:
P[B1|N]=0.060.06+0.04+0.120.27Link to original

06 Illustration

Example - Tree diagrams: Marble transferred, marble drawn

Marble transferred, marble drawn

Setup:

  • Bin 1 holds five red and four green marbles.
  • Bin 2 holds four red and five green marbles.

Experiment:

  • You take a random marble from Bin 1 and put it in Bin 2 and shake Bin 2.
  • Then you draw a random marble from Bin 2 and look at it.

Questions:

(a) What is the probability you draw a red marble?

(b) Supposing that you drew a red marble, what is the probability that a red marble was transferred?

Solution

Construct the tree diagram:

Identify sub-experiments, label events, compute probabilities:

center


(a) Compute P[DR]:

Add up leaf numbers for DR at leaf:

P[DR]=2590+1690=4190

(b) Compute P[TR|DR]:

Conditional probability definition:

P[TR|DR]=P[TRDR]P[DR]25/9041/902541

Interpretation: value of desired path over value of all possible paths.

Link to original

Counting

07 Theory

Theory 1

In many “games of chance,” it is assumed based on symmetry principles that all outcomes are equally likely. From this assumption we infer a rule for the probability measure P[].

P[A]=|A||S|

In words: the probability of event A is the number of outcomes in A divided by the total number of possible outcomes.

When this formula applies, it is important to be able to count the total outcomes as well as the outcomes that satisfy various conditions.

Permutations

Permutations count the number of ordered lists one can form from a set of items. For a list of r items taken from a total collection of n items, the number of permutations is:

P(n,r)=n!(nr)!

Why is this formula true?

There are n choices for the first item. Then n1 for the second item, after the first has been chosen and removed from the set of possibilities. Then n2 for the third, then …, then nr+1 for the rth item. So the total number of possibilities is the product:

n(n1)(n2)(nr+1)

We can express this with factorials using a technical observation:

n!(nr)!=n(n1)(n2)(nr+1)(nr)(nr1)1n(n1)(n2)(nr+1)(nr)(nr1)1n(n1)(n2)(nr+1)

Combinations

Combinations count the number of subsets (ignoring order) one can form from some items. For a subset of r items taken from a total collection of n items, the number of combinations is:

C(n,r)=(nr)=n!r!(nr)!

This formula can be derived from the formula for permutations.

The set of possible permutations can be partitioned into combinations: each combination determines a subset. By additionally specifying an ordering of the elements in a chosen subset, we obtain a permutation. For a given subset of r elements taken from n items, there are r! ways to determine an ordering of them in a list. Therefore, the number of permutations must be a factor of r! times the number of combinations. (For every combination of size r, there are r! ways to order the items in a list.)

The notation (nr) is also called the binomial coefficient because it provides the coefficient values of a binomial expansion:

(x+y)n=i=1n(ni)xniyi

For example:

(x+y)4=x4+4x3y+6x2y2+4xy3+y4

There are also higher combinations that give multinomial coefficients:

Multinomial coefficient

The general multinomial coefficient is defined by the formula:

(nr1,r2,,rk)=n!r1!r2!rk!

where ri and r1+r2++rk=n.

The multinomial coefficient measures the number of ways to partition n items into subsets with sizes r1,r2,,rk, respectively.

Notice that (53,2)=(53), so we have already defined these values (i.e. with k=2) when we defined binomial coefficients. But when k>2, the formula gives new values. They correspond to the coefficients in multinomial expansions. For example, k=3 gives the coefficients for (x+y+z)n.

Link to original

08 Illustration

Practice exercise

Counting teams with Cooper

A team of 3 student volunteers is formed at random from a class of 40. What is the probability that Cooper is on the team?

Link to original

Example - Counting VA license plates

Counting VA license plates

VA license plates have three letters (with no I, O, or Q) followed by four numerals. A random plate is seen on the road.

(a) What is the probability that the numerals occur in strictly increasing order?

(b) What is the probability that at least one number is repeated?

Solution

(a) Numerals in increasing order.

(1) Count total plates:

  • Have 232323 options for letters.
  • Have 10101010 options for numbers.

Thus 233104 possible plates.


(2) Count ways to have 4 numerals that occur in increasing order:

There are (104) ways to choose 4 distinct numerals from 10 options.

For each choice of four distinct numerals (i.e. for each combination), there is exactly one ordering that’s increasing.

Therefore, there are (104) ways to have 4 numerals that occur in increasing order.


(3) Count ways to have a list of 3 letters (excluding I, O, Q).

There are 26 total letters, 3 are excluded, thus 23 options for each letter.

Repetition is allowed, thus we have 232323=233 total ways.


(3) Compute probability:

Total count of desired plates (taking the product of possibilities):

233(104)

Let E label the event that a plate has increasing numerals. The counting formula for probability is:

P[E]=|E||S|

Evaluate using our data:

P[E]233(104)23310410!4!6!10000211000

(b) At least one number repeated.

“At least” is hard to work with! Lots of ways that can happen.

Try the complement event, which is much simpler: “no repetition”

Let Ec be event that no numbers are repeated. All are distinct. Then E is our desired event.

Count the possibilities:

|Ec|=23232310987

The total number of plates is still 233104.

Therefore, license plates with at least one number repeated:

|E|=|S||E|2331042331098760348320

Desired outcomes over total outcomes:

|E||S|603483202331040.496 Link to original

Example - Combinations: Groups with Haley and Hugo

Haley and Hugo from 2 groups of 3

A UVA class has 40 students. Suppose the professor chooses 3 students on Wednesday at random, and again 3 on Friday. What is the probability that Haley is chosen today and Hugo on Friday?

Solution

(1) Count total outcomes:

  • We have (403) possible groups chosen Wednesday.
  • We have (403) possible groups chosen Friday.

Therefore (403)×(403) possible groups in total. (Product of possibilities.)


(2) Count desired outcomes:

The possible groups of 3 that include Haley can be counted by counting the subgroups of 2 formed of the other students in Haley’s group.

  • Therefore we have (392) groups that contain Haley.
  • Similarly, we have (392) groups that contain Hugo.

Therefore we count (392)×(392) total desired outcomes.


(3) Compute probability:

Let E label the desired event. By the counting rule:

P[E]=|E||S|

Evaluate using our data:

P[E](392)×(392)(403)×(403)(39382!4039383!)2(340)2 Link to original

Counting out 4 teams

Counting out 4 teams

A board game requires 4 teams of players. How many configurations of teams are there out of a total of 17 players if the number of players per team is 4, 4, 4, 5, respectively.

Link to original