Confidence Intervals

Knowing the properties of a population or a random variable, we were able to predict distributions of a sample. Now, we are interested in taking a sample and using it to estimate the properties of the population or random variable.

Samples for estimating proportions

Steps for estimating the proportion of the population from the sample:

Calculate the proportion of the sample. This provides an estimate for the population.
Obtain the standard deviation of the population if available. For large samples (more than 20 observations), the standard deviation of the sample can be used as an estimate. For a proportion, it is the square root of p(1-p), where p is the proportion in the sample.
The standard error of the estimated proportion is the standard deviation divided by the square root of the sample size.
The confidence interval is centered on the estimated proportion from the sample. The distance of the interval's endpoints from the estimate depends on the level of confidence. For a 95% confidence level, the distance is the standard error times 1.960 (approximately 2; the text calls this value z*, see section 6.1).
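
The four steps above can be sketched in a few lines of Python. This is only an illustration under the large-sample assumption stated in the steps; the function name and the counts in the example (120 successes out of 400) are made up for the sketch and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p = successes / n                    # step 1: proportion in the sample
    sd = sqrt(p * (1 - p))               # step 2: estimated standard deviation
    se = sd / sqrt(n)                    # step 3: standard error
    # z* is the normal quantile that leaves (1 - confidence) / 2 in each tail;
    # for confidence=0.95 it is about 1.960.
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * se                 # step 4: distance from the estimate
    return p - margin, p + margin


# Hypothetical data: 120 of 400 sampled observations have the property.
low, high = proportion_confidence_interval(successes=120, n=400)
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```

Passing a different confidence value (for example 0.90 or 0.99) changes only z*, and therefore only the width of the interval, not its center.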

Samples for estimating a mean

Steps for estimating the mean of the population from the sample:

Calculate the mean of the sample. This provides an estimate for the population.
Obtain the standard deviation of the population if available. The standard deviation of the sample can be used as an estimate for large samples (greater than 20).
The standard error of the estimated mean is the standard deviation divided by the square root of the sample size.
The confidence interval is centered on the estimated mean from the sample. The distance of the interval's endpoints from the estimate depends on the level of confidence. For a 95% confidence level, the distance is the standard error times 1.960 (approximately 2; the text calls this value z*, see section 6.1).
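
The same steps for a mean might look like the sketch below. It uses the population standard deviation when one is supplied and otherwise falls back on the sample standard deviation, as described in the steps. The function name, the `population_sd` parameter, and the sample values are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95, population_sd=None):
    """Normal-based (z*) confidence interval for a population mean."""
    x_bar = mean(sample)                                  # step 1: sample mean
    # step 2: use the population value if known, else estimate from the sample
    sd = population_sd if population_sd is not None else stdev(sample)
    se = sd / sqrt(len(sample))                           # step 3: standard error
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.960 for 95%
    margin = z_star * se                                  # step 4
    return x_bar - margin, x_bar + margin


# Hypothetical sample of 25 measurements.
sample = [48, 52] * 12 + [50]
low, high = mean_confidence_interval(sample)
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```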

T-distribution

Most statistical tools use the t-distribution for calculating a confidence interval for a mean. The t-distribution accounts for the additional error introduced by using the standard deviation of the sample to estimate the standard deviation of the population (or random variable). Use of the t-distribution assumes a normal distribution for the data, but it still produces good approximations for large sample sizes (at least 15 without outliers or strong skewness; see Robustness of the t procedures in section 7.1).
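
To see the effect on the interval, the sketch below compares the z* and t* margins for one small sample. The critical value depends on the degrees of freedom (sample size minus one). Here it is hard-coded as 2.145, the two-sided 95% value for 14 degrees of freedom taken from a standard t table, so the example applies only to samples of 15. A statistics package would normally look this value up for you. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, stdev

# Two-sided 95% critical value for 14 degrees of freedom (n = 15),
# copied from a standard t table rather than computed.
T_STAR_95_DF14 = 2.145


def margins_z_and_t(sample, t_star):
    """Return (z margin, t margin) for a 95% interval around the sample mean."""
    se = stdev(sample) / sqrt(len(sample))   # standard error from the sample
    z_star = NormalDist().inv_cdf(0.975)     # about 1.960
    return z_star * se, t_star * se


# Hypothetical sample of 15 measurements.
sample = [9.8, 10.2, 10.1, 9.7, 10.4, 10.0, 9.9, 10.3,
          10.1, 9.6, 10.2, 10.0, 9.8, 10.5, 9.9]
z_margin, t_margin = margins_z_and_t(sample, T_STAR_95_DF14)
print(f"z margin: {z_margin:.3f}   t margin: {t_margin:.3f}")
```

The t margin is always the wider of the two. The gap shrinks as the sample grows, because t* approaches z* for large degrees of freedom.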

Sources of error

Two things can go wrong in producing an estimate:

Chance error, also called statistical error. This is simply the possibility of randomly choosing a sample that is not representative of the population.
Selection bias. This involves having a sampling method that does not produce a good random sample. Calculating a statistical confidence interval is useless if your sampling method tends to select observations that are not representative of the population.
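
A small simulation can make the difference between the two concrete. The sketch below draws many samples from an artificial population and counts how often the 95% interval contains the true mean, first with fair random sampling and then with a sampling method that can only reach the upper half of the population. The population, the biased sampling rule, and all names are invented for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev


def covers(sample, true_mean, z_star=1.960):
    """True if the 95% interval built from the sample contains true_mean."""
    margin = z_star * stdev(sample) / sqrt(len(sample))
    return abs(mean(sample) - true_mean) <= margin


def coverage(population, pool, n=50, trials=500, seed=1):
    """Fraction of intervals, from samples drawn out of pool, that cover the population mean."""
    rng = random.Random(seed)
    true_mean = mean(population)
    hits = sum(covers(rng.sample(pool, n), true_mean) for _ in range(trials))
    return hits / trials


# Artificial population: 10,000 values, roughly normal around 50.
rng = random.Random(0)
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Biased method: only the larger half of the population can be selected.
biased_pool = sorted(population)[len(population) // 2:]

print("fair sampling coverage:  ", coverage(population, population))
print("biased sampling coverage:", coverage(population, biased_pool))
```

With fair sampling, the misses are chance error, and one would expect the coverage to land near the nominal 95%. With the biased method the intervals are just as narrow but centered in the wrong place, so the coverage should fall far below that.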

Last modified: Wed Feb 09 13:08:47 Central Standard Time 2005