It is often of limited use to know the value of an estimator $\widehat{\theta}$ given an observed collection of observations, since the single value does not indicate how close we should expect $\widehat{\theta}$ to be to $\theta$. For example, if a poll estimates that a randomly selected voter has 46% probability of being a supporter of candidate A and a 42% probability of being a supporter of candidate B, then knowing more information about the distributions of the estimators is essential if we want to know how reliable those estimates are.
Definition (Confidence interval)
Consider an unknown probability distribution $\nu$ from which we get $n$ independent observations $X_1, \ldots, X_n$, and suppose that $\theta$ is the value of some statistical functional of $\nu$. A confidence interval for $\theta$ is an interval-valued function of the sample data $X_1, \ldots, X_n$. A confidence interval has confidence level $1 - \alpha$ if it contains $\theta$ with probability at least $1 - \alpha$.
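Consider a distribution $\nu$ of the form $\operatorname{Unif}([0, b])$, and let $T$ be the maximum functional (so $T(\nu) = b$). Find the shortest 90% confidence interval for $b$ based on 10 independent observations.
Solution. We expect $b$ to be a little larger than the largest observation $M$, so we look for a confidence interval of the form $(M, kM)$ with $k > 1$. We'd like to make the interval short so that it's as informative as possible, but it must still be long enough to contain $b$ with the required probability.
For example, the probability that all 10 observations will be less than 90% of $b$ is $0.9^{10} \approx 34.9\%$. So with probability about 65.1%, we will trap the value of $b$ in the interval $(M, M/0.9)$.
We can replace 90% with a variable $p$ and solve the equation $1 - p^{10} = 0.9$ to find that $(M, M/0.1^{1/10}) \approx (M, 1.259M)$ is the shortest 90% confidence interval of this form.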
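The calculation above can be checked numerically. The following is a minimal sketch, assuming the observations are uniform on $[0, b]$ and the interval runs from the sample maximum $M$ to $M/p$ with $1 - p^n$ equal to the desired level. The function names `uniform_max_interval` and `coverage`, and the choice $b = 1$ for the simulation, are illustrative assumptions rather than part of the exercise.

```python
import random

def uniform_max_interval(sample, level=0.9):
    """Return (M, M / p), where M is the sample maximum and p solves 1 - p**n = level."""
    n = len(sample)
    m = max(sample)
    p = (1 - level) ** (1 / n)
    return m, m / p

def coverage(b=1.0, n=10, level=0.9, trials=20_000, seed=0):
    """Estimate how often the interval traps b over repeated simulated samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.uniform(0, b) for _ in range(n)]
        lo, hi = uniform_max_interval(sample, level)
        hits += lo <= b <= hi
    return hits / trials

print(round(0.9 ** 10, 3))         # chance that all 10 observations fall below 0.9 b
print(round(0.1 ** (-1 / 10), 4))  # multiplier k for the 90% interval (M, k M)
print(coverage())                  # simulated coverage, expected to be close to 0.9
```

The simulated coverage should land near 90%, since the interval misses $b$ only when every observation falls below $pb$.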
This first example was exceptionally amenable to analysis because we can solve exactly for the relevant probabilities. Estimators based on sums of observations are more typical, and in those cases we usually use the normal approximation:
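Show that if $\widehat{\theta}$ is unbiased and approximately normally distributed, then $(\widehat{\theta} - k \operatorname{se}(\widehat{\theta}), \widehat{\theta} + k \operatorname{se}(\widehat{\theta}))$ is an approximate $2\Phi(k) - 1$ confidence interval, where $\Phi$ is the CDF of the standard normal distribution.
Solution. A normal random variable is within $k$ standard deviations of its mean with probability $\Phi(k) - \Phi(-k) = 2\Phi(k) - 1$. Since the mean of $\widehat{\theta}$ is $\theta$, this implies that $(\widehat{\theta} - k \operatorname{se}(\widehat{\theta}), \widehat{\theta} + k \operatorname{se}(\widehat{\theta}))$ includes $\theta$ with probability approximately $2\Phi(k) - 1$.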
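To get a feel for how the confidence level depends on the number of standard errors $k$, here is a small sketch that evaluates $2\Phi(k) - 1$ using the error function from Python's standard library. The helper names `std_normal_cdf` and `coverage_level` are hypothetical, chosen only for this illustration.

```python
from math import erf, sqrt

def std_normal_cdf(x):
    """CDF of the standard normal distribution, written in terms of erf."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def coverage_level(k):
    """Probability that a normal random variable lies within k standard deviations of its mean."""
    return 2 * std_normal_cdf(k) - 1

for k in (1.0, 1.96, 2.0, 3.0):
    print(k, round(coverage_level(k), 4))
```

Taking $k = 1.96$ gives a level very close to 95%, which is why that multiplier appears so often in practice.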
One thousand people are polled, and 462 of them express a preference for candidate A, while 417 express a preference for candidate B. Suppose that the 1000 preferences are chosen independently from a distribution on $\{\text{A}, \text{B}, \text{other}\}$ which assigns probability mass $p_A$ and $p_B$ to the first two outcomes. Use the normal approximation to find 95% confidence intervals for the functionals $p_A$ and $p_B$.
Note: although it is a bit of a cheat, you can approximate $p_A$ with $\widehat{p}_A$ when you calculate the standard error (and similarly for B).
Solution. The standard deviation of a Bernoulli random variable with parameter $p$ is $\sqrt{p(1-p)}$. Therefore, the average of 1000 independent observations from such a distribution has standard deviation $\sqrt{p(1-p)/1000}$ and is within $1.96\sqrt{p(1-p)/1000}$ units of $p$ (on the number line) with probability about 95%.
Although we don't know the value of $p_A$ in this expression, we don't lose too much by approximating it with $\widehat{p}_A = 0.462$. Making this substitution, we get a confidence interval of $46.2\% \pm 3.1\%$. The half-width for B works out to the same value to the nearest tenth of a percentage point, so we get $41.7\% \pm 3.1\%$ as a 95% confidence interval for $p_B$.
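The arithmetic in this solution can be reproduced with a few lines of code. This is a sketch under the same assumptions as the exercise: the normal approximation, with the unknown proportion replaced by its estimate in the standard error. The function name `proportion_interval` and the default multiplier `z=1.96` are choices made for this illustration.

```python
from math import sqrt

def proportion_interval(successes, n, z=1.96):
    """Normal-approximation interval for a proportion, plugging the estimate into the standard error."""
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

for name, count in (("A", 462), ("B", 417)):
    lo, hi = proportion_interval(count, 1000)
    print(f"candidate {name}: ({lo:.3f}, {hi:.3f})")
```

Both half-widths come out to about 3.1 percentage points, and the two intervals overlap.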
Warning. In the standard confidence interval framework (as described above), the value $\theta$ of the statistical functional is not random. Rather, it is the values of our estimators that are random, even though they will realize concrete real-number values once the data are collected. This is opposite to the way probability questions are usually framed (asking for a given random variable how much of its probability mass lies in a particular, fixed interval).
One way to avoid the pitfall of thinking of the parameter as random is to speak of the random confidence interval trapping the value of the statistical functional, rather than speaking of the unknown parameter as falling into the given interval.
Suppose we have a 95% confidence interval $(L, U)$ for $\theta$. For each of the following statements, determine whether it's true or false.
Given observed values $\ell$ and $u$ of $L$ and $U$, $\theta$ has a 95% chance of falling within $(\ell, u)$.
Suppose we have a large number of draws from the distribution, and we progressively update $L$ and $U$ according to the observations we've made so far. Then the sequence of confidence intervals contains $\theta$ at least 95% of the time, on average.
If we are estimating a function-valued feature of rather than a single number (for example, a regression function), then we might want to provide a confidence band which traps the whole graph of the function with specified probability (we'll see an example, the DKW theorem, in the next section).
Definition (Confidence band)
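Let $I \subset \mathbb{R}$, and suppose that $T$ is a function from the set of distributions to the set of real-valued functions on $I$.
A $1 - \alpha$ confidence band for $T(\nu)$ is a pair of random functions $y_{\min}$ and $y_{\max}$ from $I$ to $\mathbb{R}$, defined in terms of $n$ independent observations from $\nu$ and having $y_{\min} \leq T(\nu) \leq y_{\max}$ everywhere on $I$ with probability at least $1 - \alpha$.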