Confidence Intervals
Knowing the properties of a population or a random variable, we
were able to predict the distribution of a sample. Now we reverse
the question: we take a sample and use it to estimate the
properties of the population or random variable.
Samples for estimating proportions
Steps for estimating the proportion of the population from a
sample:
- Calculate the proportion of the sample. This provides an
estimate of the population proportion.
- Obtain the standard deviation of the population if
available. For large samples (greater than 20), the standard
deviation estimated from the sample can be used instead. It is
calculated as the square root of p(1 - p), where p is the
proportion in the sample.
- The standard error of the estimated proportion is the
standard deviation divided by the square root of the sample
size.
- The confidence interval is centered on the estimated
proportion from the sample. The distance from the estimate to
each endpoint of the interval depends on the level of confidence.
For a 95% confidence level, the distance is the standard error
times 1.960 (approximately 2; the text calls this value z*, see
section 6.1).
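The steps above can be sketched in Python as follows. This is a minimal illustration of the large-sample approach described here (sample standard deviation sqrt(p(1 - p)) and the normal critical value z*); the function name proportion_ci and the counts of 120 out of 400 are made up for the example. The standard library's statistics.NormalDist().inv_cdf computes z* for any confidence level, giving about 1.960 for 95%.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n                    # step 1: sample proportion
    sd = math.sqrt(p_hat * (1 - p_hat))      # step 2: estimated SD, sqrt(p(1 - p))
    se = sd / math.sqrt(n)                   # step 3: standard error
    # z* leaves (1 - confidence) / 2 in each tail; about 1.960 for 95%.
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * se                     # step 4: distance to each endpoint
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 120 "yes" answers out of 400.
low, high = proportion_ci(120, 400)
print(f"{low:.3f} to {high:.3f}")  # 0.255 to 0.345
```

Passing a different confidence level, for example 0.99, changes only z* and therefore the width of the interval, not its center.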
Samples for estimating a mean
Steps for estimating the mean of the population from a
sample:
- Calculate the mean of the sample. This provides an
estimate of the population mean.
- Obtain the standard deviation of the population if
available. The standard deviation of the sample can be used as
an estimate for large samples (greater than 20).
- The standard error of the estimated mean is the
standard deviation divided by the square root of the sample
size.
- The confidence interval is centered on the estimated
mean from the sample. The distance from the estimate to
each endpoint of the interval depends on the level of confidence.
For a 95% confidence level, the distance is the standard error
times 1.960 (approximately 2; the text calls this value z*, see
section 6.1).
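The same steps for a mean, as a minimal Python sketch. The function mean_ci and its population_sd parameter are hypothetical names; when population_sd is omitted, the sketch falls back to the sample standard deviation (statistics.stdev), which these notes recommend only for large samples. The data 1 through 25 with a known standard deviation of 5 are invented so the arithmetic is easy to check: the standard error is 5 / sqrt(25) = 1.

```python
import math
import statistics
from statistics import NormalDist

def mean_ci(data, confidence=0.95, population_sd=None):
    """Normal-based confidence interval for a population mean."""
    n = len(data)
    x_bar = statistics.mean(data)            # step 1: sample mean
    # Step 2: use the population SD if known, else the sample SD.
    sd = population_sd if population_sd is not None else statistics.stdev(data)
    se = sd / math.sqrt(n)                   # step 3: standard error
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.960 for 95%
    margin = z_star * se                     # step 4: distance to each endpoint
    return x_bar - margin, x_bar + margin

# Hypothetical data with an assumed known population SD of 5.
low, high = mean_ci(list(range(1, 26)), population_sd=5)
print(f"{low:.3f} to {high:.3f}")  # 11.040 to 14.960
```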
T-distribution
Most statistical tools use the t-distribution, rather than the
normal distribution, to calculate a confidence interval for a mean.
The t-distribution accounts for the extra uncertainty that comes
from using the standard deviation of the sample to estimate the
standard deviation of the population (or random variable). The
t-distribution assumes that the data are normally distributed, but
it still gives good approximations for sample sizes of at least 15
when the data have no outliers or strong skewness (see Robustness
of the t procedures in section 7.1).
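A sketch of a 95% t-interval for a mean, assuming only the Python standard library. Because the standard library has no t quantile function, the hypothetical helpers t_pdf, t_cdf and t_critical compute the critical value t* numerically (Simpson's rule for the area under the t density, plus bisection); in practice a library routine such as scipy.stats.t.ppf would normally be used instead. The interval uses the sample standard deviation and n - 1 degrees of freedom.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma keeps the normalizing constant finite for large df.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: one half plus the area from 0 to x (Simpson's rule)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """t* such that P(-t* <= T <= t*) = confidence, found by bisection."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(data, confidence=0.95):
    """Confidence interval for the mean using t* with n - 1 degrees of freedom."""
    n = len(data)
    x_bar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # sample SD estimates population SD
    margin = t_critical(confidence, n - 1) * se
    return x_bar - margin, x_bar + margin

print(round(t_critical(0.95, 10), 3))  # compare with the table value 2.228
low, high = t_interval([1, 2, 3, 4, 5])
print(f"{low:.3f} to {high:.3f}")
```

As the degrees of freedom grow, t* approaches the normal value 1.960, which is why the two methods agree closely for large samples.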
Sources of error
Two things can go wrong in producing an estimate:
- Chance error, also called statistical error. This is
the possibility that a properly drawn random sample happens
not to be representative of the population.
- Selection bias. This comes from a sampling method that
does not produce a good random sample. Calculating a
statistical confidence interval is useless if your sampling
method tends to select observations that are not representative
of the population, because the interval accounts only for chance
error.
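A small simulation can make the difference between the two sources of error concrete. This sketch assumes a very large population in which a fraction true_p would answer "yes", and models each sampling method by the chance sampled_p that a selected person answers "yes"; the names coverage and covers, and the values 0.4, 0.5 and 200, are illustrative assumptions. With random sampling, chance error makes roughly 5% of the 95% intervals miss true_p; with the biased method far more miss, and a larger sample would not fix that.

```python
import math
import random
from statistics import NormalDist

def covers(sample, true_p, z_star):
    """Does the large-sample interval from this 0/1 sample contain true_p?"""
    n = len(sample)
    p_hat = sum(sample) / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se <= true_p <= p_hat + z_star * se

def coverage(true_p, sampled_p, n=200, trials=1000, seed=1):
    """Fraction of 95% intervals that contain true_p.

    sampled_p == true_p models a good random sample (only chance error);
    any other value models a selection method that over- or
    under-represents "yes" answers (selection bias).
    """
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(0.975)
    hits = 0
    for _ in range(trials):
        sample = [1 if rng.random() < sampled_p else 0 for _ in range(n)]
        hits += covers(sample, true_p, z_star)
    return hits / trials

print("random sampling:", coverage(0.4, 0.4))  # close to 0.95
print("biased sampling:", coverage(0.4, 0.5))  # far below 0.95
```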
Last modified: Wed Feb 09 13:08:47 Central Standard
Time 2005