The sample mean of a set of IID random variables $X_1, X_2, \ldots$ is an RV that averages the first $n$ instances:

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Statistics of the sample mean (for any $n$), writing $\mu = E[X_i]$ and $\sigma^2 = \operatorname{Var}(X_i)$:

$$E[\bar{X}_n] = \mu, \qquad \operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n}$$
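Both formulas follow from linearity of expectation and the fact that variances of independent RVs add:

$$E[\bar{X}_n] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\, n\mu = \mu, \qquad \operatorname{Var}(\bar{X}_n) = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}$$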
The sample mean is typically applied to repeated trials of an experiment. The trials are independent, and the probability distribution of outcomes should be identical from trial to trial.
Notice that the variance of the sample mean tends to 0 as $n \to \infty$. As more trials are performed and their results averaged, the fluctuations of the average shrink toward zero.
As $n \to \infty$, the distribution of $\bar{X}_n$ converges to a PMF with all the probability concentrated at $\mu$ and none elsewhere.
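To see this concretely, here is a minimal simulation sketch (a hypothetical illustration, not from the notes, assuming rolls of a fair six-sided die, which has $\mu = 3.5$ and $\operatorname{Var}(X) = 35/12$):

```python
import random

def sample_mean(n):
    """Average of n IID rolls of a fair six-sided die (mu = 3.5)."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# Var(one roll) = 35/12, so Var(sample mean) = 35/(12*n) -> 0 as n grows.
for n in [10, 100, 1000, 10000]:
    means = [sample_mean(n) for _ in range(200)]
    avg = sum(means) / len(means)
    var = sum((m - avg) ** 2 for m in means) / len(means)
    print(f"n={n:5d}  empirical Var(sample mean)={var:.5f}  theory={35/(12*n):.5f}")
```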
02 Theory - Tail estimation
Every distribution must trail off to zero for large enough $|x|$. The regions where $f_X(x)$ (or $p_X(x)$) trails off to zero (large magnitude of $x$) are informally called ‘tails’.
Tail probabilities
A tail probability is a probability with one of these forms:

$$P(X \ge a), \qquad P(X \le a), \qquad P(|X - \mu| \ge a)$$
Markov’s inequality
Assume that $X \ge 0$. Take any $a > 0$.
Then Markov’s inequality states:

$$P(X \ge a) \le \frac{E[X]}{a}$$
Chebyshev’s inequality
Take any RV $X$, and any $a > 0$.
Then Chebyshev’s inequality states:

$$P\big(|X - \mu| \ge a\big) \le \frac{\operatorname{Var}(X)}{a^2}, \qquad \text{where } \mu = E[X]$$
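A standard corollary (included here for illustration): taking $a = k\sigma$, where $\sigma = \sqrt{\operatorname{Var}(X)}$, gives a scale-free version:

$$P\big(|X - \mu| \ge k\sigma\big) \le \frac{1}{k^2}$$

For instance, no RV with finite variance lies 3 or more standard deviations from its mean with probability greater than $1/9$.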
Markov vs. Chebyshev
Chebyshev’s inequality works for any RV $X$ (not just nonnegative ones), and it usually gives a better estimate than Markov’s inequality.
The main value of Markov’s inequality is that it only requires knowledge of $E[X]$.
Think of Chebyshev’s inequality as a tightening of Markov’s inequality using the additional data of $\operatorname{Var}(X)$.
Derivation of Markov’s inequality - Continuous RV
Under the hypothesis that $X \ge 0$ and $a > 0$, we have:

$$P(X \ge a) = \int_a^\infty f_X(x)\,dx$$

On the range $x \ge a$ we may convert $f_X(x)$ to $\frac{x}{a} f_X(x)$, making the integrand bigger:

$$P(X \ge a) \le \int_a^\infty \frac{x}{a}\, f_X(x)\,dx$$

Simplify:

$$\int_a^\infty \frac{x}{a}\, f_X(x)\,dx = \frac{1}{a}\int_a^\infty x\, f_X(x)\,dx$$

Also:

$$\int_a^\infty x\, f_X(x)\,dx \le \int_0^\infty x\, f_X(x)\,dx = E[X] \qquad (\text{since } x f_X(x) \ge 0)$$

Therefore:

$$P(X \ge a) \le \frac{E[X]}{a}$$
Extra - Derivation of Chebyshev’s inequality
Notice that the variable $(X - \mu)^2$, where $\mu = E[X]$, is always nonnegative. Chebyshev’s inequality is a simple application of Markov’s inequality to this variable.
Specifically, using $a^2$ as the Markov constant, Markov’s inequality yields:

$$P\big((X - \mu)^2 \ge a^2\big) \le \frac{E\big[(X - \mu)^2\big]}{a^2}$$

Then, by monotonicity of square roots:

$$P\big((X - \mu)^2 \ge a^2\big) = P\big(|X - \mu| \ge a\big)$$

And of course $E\big[(X - \mu)^2\big] = \operatorname{Var}(X)$. Chebyshev’s inequality follows.
03 Illustration
Markov’s inequality derivation - Discrete RV
Derive Markov’s inequality for a discrete RV.
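One possible solution sketch, mirroring the continuous derivation with sums in place of integrals (assuming $X \ge 0$ is discrete and $a > 0$):

$$a\,P(X \ge a) = \sum_{x \ge a} a\, p_X(x) \le \sum_{x \ge a} x\, p_X(x) \le \sum_{x} x\, p_X(x) = E[X]$$

Dividing by $a$ gives $P(X \ge a) \le E[X]/a$.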
Example - Markov and Chebyshev
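The worked example itself is not reproduced here; the following is an illustrative sketch (assuming an Exponential(1) RV, so $E[X] = \operatorname{Var}(X) = 1$ and the exact tail is $e^{-a}$) comparing both bounds to the truth. The one-sided Chebyshev step $P(X \ge a) \le P(|X - \mu| \ge a - \mu)$ requires $a > \mu$:

```python
import math

mu, var = 1.0, 1.0  # Exponential(1): E[X] = 1, Var(X) = 1

for a in [2, 3, 5, 10]:
    exact = math.exp(-a)           # P(X >= a) for Exponential(1)
    markov = mu / a                # Markov: P(X >= a) <= E[X]/a
    cheby = var / (a - mu) ** 2    # Chebyshev: P(X >= a) <= Var(X)/(a - mu)^2
    print(f"a={a:2d}  exact={exact:.5f}  Markov<={markov:.4f}  Chebyshev<={cheby:.4f}")
```

For small $a$ Markov can actually win; Chebyshev overtakes it as $a$ grows, matching the discussion above.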
04 Theory - Law of Large Numbers
Let $X_1, X_2, \ldots$ be a collection of IID random variables with $E[X_i] = \mu$ for any $i$, and $\operatorname{Var}(X_i) = \sigma^2$ for any $i$.
Recall the sample mean:

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Recall that $E[\bar{X}_n] = \mu$ and $\operatorname{Var}(\bar{X}_n) = \sigma^2 / n$.
Law of Large Numbers (weak form)
For any $\varepsilon > 0$, by Chebyshev’s inequality we have:

$$P\big(|\bar{X}_n - \mu| \ge \varepsilon\big) \le \frac{\sigma^2}{n\varepsilon^2}$$

Therefore:

$$\lim_{n \to \infty} P\big(|\bar{X}_n - \mu| \ge \varepsilon\big) = 0$$

And the complement:

$$\lim_{n \to \infty} P\big(|\bar{X}_n - \mu| < \varepsilon\big) = 1$$
05 Illustration
Example - LLN: Average winnings
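The example content is not reproduced here; below is a hypothetical stand-in: a game paying $+1$ with probability $0.4$ and $-1$ otherwise, so the expected winnings per round are $\mu = -0.2$:

```python
import random

def avg_winnings(n):
    """Average winnings per round: +1 w.p. 0.4, -1 w.p. 0.6, so mu = -0.2."""
    return sum(1 if random.random() < 0.4 else -1 for _ in range(n)) / n

# By the LLN, the average concentrates near mu = -0.2 as n grows.
for n in [10, 100, 1000, 10000, 100000]:
    print(f"n={n:6d}  average winnings per round = {avg_winnings(n):+.4f}")
```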
Exercise - Enough samples
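The exercise details are not reproduced here, but the standard question of this kind asks: how many samples $n$ guarantee $P(|\bar{X}_n - \mu| \ge \varepsilon) \le \delta$? Chebyshev’s inequality gives a sufficient condition:

$$\frac{\sigma^2}{n\varepsilon^2} \le \delta \quad \Longleftrightarrow \quad n \ge \frac{\sigma^2}{\varepsilon^2 \delta}$$

For example, with $\sigma^2 = 1$, $\varepsilon = 0.1$, and $\delta = 0.05$, any $n \ge 2000$ suffices.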
Statistical testing
06 Theory - Significance testing
Significance test
Ingredients of a significance test (unary hypothesis test):
- Null hypothesis event $H_0$
  - Identify a Claim
  - Then: $H_0$ is the background assumption (supposing the Claim isn’t known)
  - Goal is to invalidate $H_0$ in favor of the Claim
- Rejection Region (decision rule): an event $R$
  - $R$ is unlikely assuming $H_0$
  - Directionality: $R$ is more likely if the Claim holds
  - Write $R$ in terms of a decision statistic $T$ and significance level $\alpha$
- Ability to compute $P(R \mid H_0)$
  - Usually: inferred from $p_{T \mid H_0}(t)$ or $f_{T \mid H_0}(t)$
  - Adjust $R$ to achieve $P(R \mid H_0) \le \alpha$
Significance level
Suppose we are given a null hypothesis $H_0$ and a rejection region $R$.
The significance level of $R$ is:

$$\alpha = P(R \mid H_0)$$

Sometimes the condition is dropped and we write $\alpha = P(R)$, e.g. when a background model without assuming $H_0$ is not known.
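As a hypothetical mini-example (not from the notes): flip a coin $n = 10$ times, let the decision statistic $T$ count heads, take $H_0 =$ ‘the coin is fair’, and use the rejection region $R = \{T \ge 9\}$. Then:

$$\alpha = P(T \ge 9 \mid H_0) = \frac{\binom{10}{9} + \binom{10}{10}}{2^{10}} = \frac{11}{1024} \approx 0.011$$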
Null hypothesis implies a distribution
Frequently $H_0$ will not take the form of an event in a sample space, $H_0 \subseteq \Omega$.
Usually $\Omega$ is unspecified, yet $H_0$ determines a known distribution.
At a minimum, the assumption of $H_0$ must determine the numbers $p_{T \mid H_0}(t)$ or $f_{T \mid H_0}(t)$.
More generally, we do not need these details:
- Background sample space $\Omega$
- Non-conditional distribution (full model): $p_T(t)$ or $f_T(t)$
- Complement conditionals: $p_{T \mid H_0^c}(t)$ or $f_{T \mid H_0^c}(t)$
In basic statistical inference theory, there are two kinds of error.
Type I error concludes by rejecting $H_0$ when $H_0$ is true.
Type II error concludes by maintaining $H_0$ when $H_0$ is false.
Type I error is usually the bigger problem: think “innocent until proven guilty.”
|  | $H_0$ is true | $H_0$ is false |
| --- | --- | --- |
| Maintain null hypothesis | Made right call | Wrong acceptance (Type II) |
| Reject null hypothesis | Wrong rejection (Type I) | Made right call |
To design a significance test at significance level $\alpha$, we must identify $H_0$, and specify $R$ with the property that $P(R \mid H_0) \le \alpha$.
When $R$ is written using a decision statistic $T$, we must choose between one-sided regions, $\{T \ge c\}$ or $\{T \le c\}$, and two-sided regions, $\{|T - t_0| \ge c\}$, depending on the directionality of the Claim.
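As a sketch of this design process (a hypothetical setup, not from the notes: a one-sided test of $H_0 =$ ‘the coin is fair’ using $T =$ number of heads in $n$ flips), we can adjust the threshold $c$ until $P(T \ge c \mid H_0) \le \alpha$:

```python
from math import comb

def binom_tail(n, c, p=0.5):
    """P(T >= c) when T ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))

def smallest_threshold(n, alpha):
    """Smallest c such that the rejection region {T >= c} has level <= alpha under H0."""
    for c in range(n + 1):
        if binom_tail(n, c) <= alpha:
            return c

n, alpha = 100, 0.05
c = smallest_threshold(n, alpha)
print(f"Reject H0 (fair coin) when T >= {c}; achieved level = {binom_tail(n, c):.4f}")
```

Scanning $c$ upward finds the smallest rejection region meeting the target level; any larger $c$ also works but wastes power (raises the Type II error rate).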