Distributions of sets of observations and samples from populations

In a simple random sample (SRS), each item in the population has an equal chance of being selected. Based on what we know of the population, we can predict what the sampling distribution will be. Later, we will examine the sample and make predictions about the population.

We often call the standard deviation of a sampling distribution the standard error.

Observation sets for Counts and Proportions

These apply to binary ("success" or "failure") outcomes: each element in the population has one of two possible outcomes. Knowing the probability or proportion of "success" outcomes in the population, we can predict the distribution of counts and proportions in a set of observations of size n.

Binomial setting

These cases are described by a binomial distribution. Here are the properties of a binomial setting:

Our sample consists of n observations.
The n observations are collected independently. That is, choosing one observation from the population does not influence how the other observations are chosen.
Each observation falls into one of two categories. We can call these two categories "success" and "failure".
The probability of "success", called p, is the same for all observations.

The binomial distribution is often characterized as B(n, p).

Note: we will discuss the technical difference between a binomial setting and a Simple Random Sample in class. For large populations the difference is not significant.

Properties of binomial distributions

If we obtain multiple sets of observations, the distribution of counts has the following properties:

The mean count is np.
The standard deviation of the counts is the square root of the quantity np(1 - p).

If we obtain multiple sets of observations, the distribution of proportions has the following properties:

The mean proportion is p.
The standard deviation of the proportion is the square root of the quantity p(1 - p) / n.
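
The formulas above can be checked with a short calculation. The following is a minimal Python sketch; the function names are our own, and the values n = 100 and p = 0.5 are chosen only for illustration.

```python
import math

def binomial_count_stats(n, p):
    """Mean and standard deviation of the count of successes for B(n, p)."""
    return n * p, math.sqrt(n * p * (1 - p))

def binomial_proportion_stats(n, p):
    """Mean and standard deviation of the proportion of successes for B(n, p)."""
    return p, math.sqrt(p * (1 - p) / n)

# Illustrative values only: 100 observations with success probability 0.5.
count_mean, count_sd = binomial_count_stats(100, 0.5)
prop_mean, prop_sd = binomial_proportion_stats(100, 0.5)
print("counts:      mean =", count_mean, " sd =", count_sd)
print("proportions: mean =", prop_mean, " sd =", prop_sd)
```

With these values, the counts have mean 50 and standard deviation 5, and the proportions have mean 0.5 and standard deviation 0.05.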

Examples of binomial distributions

Coin tosses: If we call heads a success, the probability of tossing heads is p.
Left-handedness of a randomly chosen person: We can let p be the proportion of left-handed people in the general population. This is estimated at 13%.

Here is a simulation program that generates 500 samples of 100 coin tosses.
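
The original program is not reproduced in these notes. As a stand-in, a simulation of this kind might look like the following Python sketch. The function name, the fixed seed, and the assumption of a fair coin (p = 0.5) are ours, not taken from the original program.

```python
import random
import statistics

def simulate_counts(num_samples=500, n=100, p=0.5, seed=0):
    """Return the number of heads in each of num_samples sets of n tosses."""
    rng = random.Random(seed)  # fixed seed (our choice) so runs are repeatable
    counts = []
    for _ in range(num_samples):
        # Each toss is a success (heads) with probability p.
        heads = sum(1 for _ in range(n) if rng.random() < p)
        counts.append(heads)
    return counts

counts = simulate_counts()
print("mean count:  ", statistics.mean(counts))
print("sd of counts:", statistics.pstdev(counts))
```

For a fair coin, the mean count should come out close to np = 50 and the standard deviation close to the square root of np(1 - p), which is 5.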

Observation sets for Means and Sums

In the binomial setting, each observation has only two possible outcomes (e.g. "success" or "failure"). However, populations often consist of items that take many different values. In this case, our set of observations has a mean value.

Distribution of means from a set of observations

If we collect many sets of observations, the mean of their means equals the mean of the population. The standard deviation of their means equals the standard deviation of the population divided by the square root of the size of the observation set (n).

For large populations, means and standard deviations of SRSs can also be calculated using the above rule.
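
To see this rule at work, we can draw many sets of observations and compare the results with the prediction. The Python sketch below is only an illustration: the population (the six faces of a fair die, drawn with replacement), the set size n = 25, and the number of sets are all assumptions of ours.

```python
import math
import random
import statistics

# Hypothetical population: the six faces of a fair die, drawn with replacement.
population = [1, 2, 3, 4, 5, 6]
pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)

def means_of_sets(num_sets=2000, n=25, seed=1):
    """Return the mean of each of num_sets sets of n independent observations."""
    rng = random.Random(seed)
    return [statistics.mean(rng.choices(population, k=n)) for _ in range(num_sets)]

n = 25
means = means_of_sets(n=n)
print("mean of means:", statistics.mean(means), " population mean:", pop_mean)
print("sd of means:  ", statistics.pstdev(means), " predicted:", pop_sd / math.sqrt(n))
```

The mean of the means should land near the population mean of 3.5, and the standard deviation of the means near the population standard deviation divided by 5 (the square root of 25).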

Central Limit Theorem

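For sufficiently large sample sizes, means and sums from SRSs approximately fit a normal distribution, even when the population itself is not normally distributed.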

For means, the distribution has the following parameters:

The mean is the population's mean (or the mean of the random variable).
The standard deviation is the standard deviation of the population (or of the random variable) divided by the square root of the sample size (n).

For sums, the distribution has the following parameters:

The mean is the population's mean (or the mean of the random variable) times the sample size.
The standard deviation is the standard deviation of the population (or of the random variable) times the square root of the sample size (n).
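
The parameters for sums can be illustrated with a simulation. The Python sketch below draws from a skewed random variable (an exponential distribution with mean 1 and standard deviation 1, our choice for illustration) and checks the sums against the predicted mean and standard deviation. As a rough check of the normal shape, it also counts how many sums fall within one standard deviation of the mean; for a normal distribution this is about 68%.

```python
import math
import random
import statistics

def simulate_sums(num_samples=2000, n=50, seed=2):
    """Return the sum of n exponential(1) draws for each of num_samples samples."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) for _ in range(num_samples)]

n = 50
sums = simulate_sums(n=n)
predicted_mean = 1.0 * n           # mean of the random variable times n
predicted_sd = 1.0 * math.sqrt(n)  # sd of the random variable times sqrt(n)

# Fraction of sums within one predicted standard deviation of the predicted mean.
within_one_sd = sum(abs(s - predicted_mean) <= predicted_sd for s in sums) / len(sums)

print("mean of sums:", statistics.mean(sums), " predicted:", predicted_mean)
print("sd of sums:  ", statistics.pstdev(sums), " predicted:", predicted_sd)
print("fraction within one sd:", within_one_sd)
```

Dividing each sum by n gives the corresponding sample mean, so the same simulation can be used to check the parameters for means.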

Note that the text does not discuss calculating sums from a sample.

Example

This Excel file has the dates of 100 pennies. We will discuss taking simple random samples from this population of 100.
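
The penny dates themselves are not reproduced in these notes, so the Python sketch below uses the labels 1 through 100 as a hypothetical stand-in for the 100 pennies. It shows one way to draw a simple random sample without replacement; the sample size of 10 and the seed are our own choices.

```python
import random

# Hypothetical stand-in: label the pennies 1..100 (the actual dates are not used here).
penny_labels = list(range(1, 101))

rng = random.Random(3)  # fixed seed (our choice) so the sample is repeatable
# random.sample draws without replacement, so every set of 10 pennies is equally likely.
srs = rng.sample(penny_labels, k=10)
print(sorted(srs))
```

With the real data, each selected label would be looked up in the file to get that penny's date.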

Last modified: Wed Feb 09 13:08:47 Central Standard Time 2005