In my pocket I have a jumble of coins: 5 dimes, 4 quarters, 3 nickels, 3 pennies, and one big 50-cent piece. I draw three at random. What is the expected total value of the three coins?
Solution
02
(1) Let $X_1$ be the value of the first coin drawn, let $X_2$ be the value of the second coin drawn, and let $X_3$ be the value of the third coin drawn. The central trick for solving this problem efficiently is to notice that $X_1, X_2, X_3$ are all identically distributed.
One can see this by the following argument: using an ordered triple $(X_1, X_2, X_3)$, write down all possible ordered draws. Notice that the number of triples where $X_1$ is a dime equals the number of triples where $X_2$ is a dime, which equals the number of triples where $X_3$ is a dime.
The same observation holds for every other coin value. Thus, the distributions of $X_1, X_2, X_3$ are all the same.
(2) Another, nicer, argument is to notice that we can swap $X_1$ and $X_2$ in these ordered triples without changing the overall set, and similarly for $X_2$ and $X_3$. Swapping is therefore a bijection between the outcomes of $X_1$; $X_2$; and $X_3$, which implies identical distribution.
Thus, we have that $E[X_1] = E[X_2] = E[X_3]$.
(3) Let $S = X_1 + X_2 + X_3$ be the sum of the values of the three coins. Then, by linearity of expectation,
$$E[S] = E[X_1] + E[X_2] + E[X_3] = 3\,E[X_1] = 3 \cdot \frac{5(10) + 4(25) + 3(5) + 3(1) + 50}{16} = 3 \cdot \frac{218}{16} = 40.875 \text{ cents}.$$
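The identical-distribution argument can be checked numerically. The sketch below is only an illustration: it computes the expected total once by linearity (three times the average coin value) and once by brute-force enumeration of every unordered three-coin draw. The variable names are illustrative and the coin values are in cents.

```python
from fractions import Fraction
from itertools import combinations

# Coin values in cents, taken from the problem statement.
coins = [10] * 5 + [25] * 4 + [5] * 3 + [1] * 3 + [50]

# Linearity of expectation: every draw has the same marginal distribution,
# so the expected total is 3 times the average value of a coin in the pocket.
expected_by_linearity = 3 * Fraction(sum(coins), len(coins))

# Brute-force check: average the total over all equally likely 3-coin draws.
# combinations() treats coins at different positions as distinct objects.
draws = list(combinations(coins, 3))
expected_by_enumeration = Fraction(sum(sum(d) for d in draws), len(draws))

print(expected_by_linearity, float(expected_by_linearity))
print(expected_by_enumeration == expected_by_linearity)
```

Exact fractions are used so that the two results can be compared for equality without floating-point noise.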
A 100 Watt light bulb’s expected lifetime is 600 hours, with variance 360,000. An advertising board uses one of these light bulbs at a time, and when one burns out, it is immediately replaced with another. (The lifetime of each bulb is independent of the others.) Let the continuous random variable $T$ be the total number of hours of advertising from 10 bulbs.
(a) Find the expected value of $T$.
(b) Find the variance of $T$.
(c) Use the CLT to approximate the probability that $T$ is less than 5,500 hours. (You should decide whether it is appropriate to use the continuity correction.)
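One possible way to organize the computation for this problem is sketched below. It assumes the total is treated as a sum of 10 independent lifetimes and that no continuity correction is applied, since the total is a continuous random variable. The helper `normal_cdf` and all variable names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n = 10                       # number of bulbs
mean_one = 600.0             # expected lifetime of one bulb (hours)
var_one = 360_000.0          # variance of one bulb's lifetime

mean_total = n * mean_one    # expectations always add
var_total = n * var_one      # variances add because lifetimes are independent
sd_total = math.sqrt(var_total)

# CLT approximation of P(total < 5500); no continuity correction is used
# because the total lifetime is continuous.
z = (5500.0 - mean_total) / sd_total
prob = normal_cdf(z)

print(mean_total, var_total, round(z, 4), round(prob, 4))
```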
Frank is a competitive hot dog eater. He eats in with .
What is the probability that Frank manages to consume in or less, in an upcoming competition? Use a normal approximation from the CLT to estimate this probability.
State the reason that the normal approximation is applicable.
Solution
04
(1) The normal approximation in this case is applicable since:
Assumptions:
Frank eats a large number of hot dogs $\implies$ the sample size, $n$, is sufficiently large.
We assume that the amount of time Frank spends on each hot dog does not depend on how many he has had previously $\implies$ the times to consume each hot dog are independent and identically distributed.
(2) Let be the time taken to eat the -th hot dog. Let be the time taken to eat hot dogs. Then seconds with seconds.
Suppose a lottery game requires that you purchase a $10 game card and advertises a 10% probability of winning a prize.
Use the Central Limit Theorem and the continuity correction to approximate the probability of winning at least 20 times when you purchase 100 of these game cards.
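A minimal sketch of the approximation asked for here, assuming the number of wins is modeled as Binomial(100, 0.1) with independent cards. With the continuity correction, "at least 20" becomes "greater than 19.5" on the normal scale. The exact binomial tail is computed only for comparison; all names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 100, 0.10
mean = n * p                        # expected number of wins
sd = math.sqrt(n * p * (1 - p))     # standard deviation of the number of wins

# Continuity correction: P(X >= 20) is approximated by P(Y > 19.5).
z = (19.5 - mean) / sd
approx = 1.0 - normal_cdf(z)

# Exact binomial tail, for comparison with the approximation.
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(20, n + 1))

print(round(z, 4), approx, exact)
```

The winning count sits far in the right tail of a skewed distribution, so the normal approximation and the exact value can differ noticeably in relative terms.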
At a high school math competition, students take a test with 10 questions. Each question is worth one point and the probability of a student getting any one question correct is 0.55, independent of the other questions.
(a) Find the variance of $\bar{X}$, the average score for 15 students.
(b) Use the Law of Large Numbers to find an upper bound for the probability that $\bar{X}$ is greater than 6.
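The following sketch shows one way to set up this problem. It assumes that each student's score is Binomial(10, 0.55), that the 15 students are independent, and that the intended Law of Large Numbers bound is the Chebyshev-type bound used in the weak law. These are assumptions, not statements from the problem.

```python
n_questions, p_correct = 10, 0.55
n_students = 15

# Mean and variance of one student's score (binomial model).
mean_score = n_questions * p_correct
var_score = n_questions * p_correct * (1 - p_correct)

# The average of independent scores has variance var_score / n_students.
var_mean = var_score / n_students

# Chebyshev: P(|avg - mean| >= c) <= var_mean / c**2. The event {avg > 6}
# is contained in {|avg - mean| >= 6 - mean}, so the same bound applies.
c = 6 - mean_score
bound = var_mean / c**2

print(var_mean, bound)
```

The bound is loose because Chebyshev's inequality controls both tails at once and uses only the variance.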
A factory assembly line machine is cutting paperclips to length before folding. Each paperclip is supposed to be long. The length of paperclips is approximately normally distributed with standard deviation .
(a) Design a significance test with that is based on the average of 5 measurements (sample mean). What is the rejection region? What is the probability of Type I error?
(b) What is the probability of Type II error, given that the average paperclip length on the machine is actually ?
Solution
01
(a)
Null Hypothesis $H_0$: “The expected paper clip length is inches”
Alternative Hypothesis $H_1$: “The expected paper clip length is not inches”
Let be the length of the paperclips for our sample. For the sample, assume and . Thus, .
By symmetry, since we want a two-tailed test, it suffices to find the rejection region at one tail (we can then extrapolate for the second tail). We then have, for the lower rejection region:
Solving for , we have . By symmetry, the lower bound for our upper rejection region is . Thus, our rejection region is
By definition, $P(\text{Type I error}) = \alpha$.
(b)
A Type II error occurs when the null hypothesis is not rejected even though it is actually false.
A redditor claims that 10% of people have blue eyes, but you think it is not that many. You work at the DMV for the summer, so you write down the eye color recorded on drivers’ licenses of various people in the database.
(a) Suppose you record the eye color of 1000 people and let be the number that are blue. If the rejection region is , what is the significance level of the test?
(b) Take again the experiment in (a). If you want a significance level of , what should the rejection region be in your test?
(c) Suppose the fact is that 7% of people have blue eyes. How likely is it that your test in (b) rejects $H_0$?
The number of days it takes for a package to arrive after being shipped with a particular company is a random variable, $X$. When the shipping process is operating at full capacity and delays are not common, the PMF of $X$ is given in the following table:

x          1      2      3      4      5      6      7      8      9
P(X = x)   0.041  0.229  0.379  0.237  0.045  0.021  0.019  0.017  0.012
Design a significance test at the level that uses the value of X for one package to test the null hypothesis: the shipping process is operating at full capacity. You should clearly state which values of X are in the rejection region.
Solution
03
The rejection region is $\{8, 9\}$. So if shipping takes 8 or 9 days, we will reject $H_0$.
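The Type I error probability of a candidate rejection region is the total null-hypothesis probability of the values it contains. The sketch below evaluates this for the region of 8 or 9 days and, for comparison, for a region that also includes 7 days. The function name `type_one_error` is illustrative.

```python
# PMF of the delivery time under the null hypothesis (full capacity),
# copied from the table in the problem statement.
pmf_h0 = {1: 0.041, 2: 0.229, 3: 0.379, 4: 0.237, 5: 0.045,
          6: 0.021, 7: 0.019, 8: 0.017, 9: 0.012}

def type_one_error(region):
    """Probability, under the null hypothesis, that X falls in the region."""
    return sum(pmf_h0[x] for x in region)

alpha_89 = type_one_error({8, 9})        # region used in the solution
alpha_789 = type_one_error({7, 8, 9})    # a larger region, for comparison

print(round(alpha_89, 3), round(alpha_789, 3))
```

Comparing each candidate against the required significance level shows which regions are admissible for the test.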
In a digital communication channel, it is assumed that a bit is received in error with probability . Someone challenges this hypothesis: they believe the error rate is higher than . Assume 100,000 bits are transmitted. Design a one-tailed significance test using and , the number of bits received in error, to decide whether to reject the hypothesis that the error rate is . Your rejection region should be of the form . You do not have to use the continuity correction.
You are testing gram samples of pure Uranium to see if they are enriched. You have a Geiger counter that counts a number of gamma rays that come from nearby fission events in 1 second intervals after you press the count button.
If the sample is enriched, you expect a Poisson distribution of gamma rays in the counter with an average of 20. If the sample is not enriched (the null hypothesis), the average count will be 10.
(a) Design an ML test to decide whether it is ordinary or enriched (). What is ? What are the probabilities of Type I, Type II, and Total error?
(b) After running the test many times, you have noticed that 70% of the samples are ordinary, while 30% are enriched. Now design an MAP test. What is ? What are the probabilities of Type I, Type II, and Total error?
(c) Missing a bit of enriched Uranium is obviously a major problem. The damage to your reputation and pocketbook of missing enriched Uranium is the damage caused by incorrectly labeling ordinary Uranium as enriched. Now design an MC test. What is ? What are the probabilities of Type I, Type II, and Total error?
(d) What is the expected cost of each application of the MC test, assuming the cost of a false alarm is $10,000? What is this number for the MAP test?
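Parts (a) and (b) can be explored numerically with a sketch like the one below. It assumes a threshold test of the form "decide enriched when the count is at least some cutoff", which is justified because the Poisson likelihood ratio increases with the count. For the ML test the total error is computed with equal priors, which is an assumption. The function names are illustrative, and part (c) is not covered because it needs the cost ratio.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson random variable with mean lam."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

def threshold(prior0, prior1, lam0=10.0, lam1=20.0):
    """Smallest count at which prior1 * P(x | H1) exceeds prior0 * P(x | H0)."""
    x = 0
    while prior1 * poisson_pmf(x, lam1) <= prior0 * poisson_pmf(x, lam0):
        x += 1
    return x

def error_rates(x0, prior0, prior1, lam0=10.0, lam1=20.0):
    """Type I, Type II and total error for the rule: decide enriched if X >= x0."""
    type1 = 1.0 - poisson_cdf(x0 - 1, lam0)   # false alarm: X >= x0 under H0
    type2 = poisson_cdf(x0 - 1, lam1)         # miss: X < x0 under H1
    return type1, type2, prior0 * type1 + prior1 * type2

ml_x0 = threshold(0.5, 0.5)     # ML test: priors are ignored (treated as equal)
map_x0 = threshold(0.7, 0.3)    # MAP test: 70% ordinary, 30% enriched

print(ml_x0, error_rates(ml_x0, 0.5, 0.5))
print(map_x0, error_rates(map_x0, 0.7, 0.3))
```

Putting more prior weight on ordinary samples pushes the cutoff upward, which lowers the false alarm rate and raises the miss rate.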
A metal detector for an event produces a reading, $X$, that varies between 0 and 10 according to the PDFs given below. (Note $X$ is a continuous random variable.)
Based on the reading, a security guard will stop and search a person or let them pass. Suppose it is known that 10% of people passing through security are carrying metal objects.
$H_0$: a person is not carrying metal objects
$H_1$: a person is carrying metal objects
Suppose it is 20 times worse to neglect searching someone who is carrying metal than to search someone who is not carrying metal. Design a minimum cost test that uses the value of the reading, X to decide whether the security guard will stop that person. Clearly state the decision rule.
A doctor is planning to use a new, inexpensive medical test to detect a particular disease. The test score, $X$, tends to be higher for patients with the disease. The PMFs for the test score for patients with and without the disease are shown below. From a previously used, more expensive test, it is known that 20% of the population has this disease.

Patients without the disease:

x          1      2      3      4      5
P(X = x)   0.5    0.3    0.15   0.05   0

Patients with the disease:

x          1      2      3      4      5
P(X = x)   0.05   0.1    0.3    0.35   0.2
Design a binary hypothesis test that will minimize the doctor’s probability of error. Let $H_0$: the patient does not have the disease and $H_1$: the patient does have the disease. Determine for which test scores the doctor should diagnose the patient as having the disease. Clearly denote which scores result in which decisions.
Solution
08
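To minimize the probability of error, compare the prior-weighted PMFs at each score and choose the hypothesis with the larger value:

x                    1      2      3      4      5
0.8 P(X = x | H0)    0.40   0.24   0.12   0.04   0
0.2 P(X = x | H1)    0.01   0.02   0.06   0.07   0.04

If $X = 4$ or $X = 5$, the doctor should diagnose the patient as having the disease.
If $X \in \{1, 2, 3\}$, the doctor should not diagnose the patient as having the disease.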
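The decision rule can be reproduced with a short sketch, assuming the minimum-error test is the MAP rule: at each score, pick the hypothesis whose prior-weighted PMF is larger. The variable names are illustrative.

```python
# Priors from the problem statement: 20% of the population has the disease.
prior_h0, prior_h1 = 0.8, 0.2

# PMFs of the test score without (H0) and with (H1) the disease.
pmf_h0 = {1: 0.5, 2: 0.3, 3: 0.15, 4: 0.05, 5: 0.0}
pmf_h1 = {1: 0.05, 2: 0.1, 3: 0.3, 4: 0.35, 5: 0.2}

# MAP rule: diagnose the disease when the weighted H1 probability is larger.
diagnose = [x for x in pmf_h0 if prior_h1 * pmf_h1[x] > prior_h0 * pmf_h0[x]]

# At each score the rule errs with the smaller of the two weighted values,
# so the overall error probability is the sum of those minima.
p_error = sum(min(prior_h0 * pmf_h0[x], prior_h1 * pmf_h1[x]) for x in pmf_h0)

print(diagnose, round(p_error, 4))
```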