Project 1


IT 223 -- Project 2a

Part A: Analyzing a Univariate Dataset

Collect a univariate dataset containing at least 15 observations that follows the ideal measurement model:
- Actual measurement = true measurement + random error
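To see what data from this model can look like, here is a minimal simulation sketch. The true value of 30, the error standard deviation of 0.5, and the choice of normally distributed errors are all assumptions made for illustration; real measurement errors need not be normal.

```r
# Simulate the ideal measurement model: actual = true value + random error.
# Every number below is made up for illustration.
set.seed(223)                             # fixed seed so the run is repeatable
true_value <- 30                          # hypothetical true measurement
error <- rnorm(15, mean = 0, sd = 0.5)    # assumed: normal errors centered at 0
actual <- true_value + error              # 15 simulated observations
print(round(actual, 2))
```

Because the errors are centered at zero, the simulated observations scatter around the true value with no systematic drift.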
Here are some suggestions:
- Time for a cup of water to boil on a stove
- Time for your dog to come to you when you call it
- Time to go to work over the same route
- Time for a Red Line train to travel from the Fullerton stop to the Jackson stop
- Time for an email message sent to yourself to arrive
- Number of words on random pages of a book that does not contain figures or tables
- Number of words in newspaper articles
- Your pulse rate measured at random times during the day
- Your weight measured at random times during the day
- Weights of packages of meat at a store, where all packages have approximately the same weight (you may need to go to more than one store)
- Weight of each potato in a 10 pound bag
- Lengths of "30 second" television commercials in seconds
- Newspaper prices of used cars of the same make and model
- Heights of persons of the same gender in this class
- The roundtrip ping time for an IP packet
- The actual time for a spring-loaded kitchen timer to "time" 30 seconds. (Because of random error, there may be more variation than you think.)
Then use R to answer these questions. Type the answers to the starred questions at the top of your output file (.docx file type). In your answers, include the R statements that you used to obtain your answer. Also make sure that your name is shown at the top of your submission as well as in the filename.
Important: Do not sort the data before performing the analyses.
1. *Describe your dataset and the circumstances under which you collected the data.
2. Print your dataset.
3. *What are Q0, Q1, Q2, Q3, and Q4?
4. *What are the sample mean and standard deviation?
5. Use R to plot histograms with three different interval widths.
6. *Use R to plot a boxplot. What does the boxplot tell you?
7. *List any moderate or extreme outliers using the boxplot. See Hint 1 below.
8. Create a new column of z-scores in the dataset. Print the z-scores.
9. *List any moderate or extreme outliers using z-scores. See Hint 2 below.
10. *Plot your dataset by observation number. Describe this plot using the terms unbiased, biased, homoscedastic, and heteroscedastic. Can you think of any lurking variables, also called confounding variables, that might cause your dataset to deviate from an ideal measurement model? A lurking variable is a variable not included in the dataset that might affect the results.
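As a rough sketch of how the summary statistics and plots in questions 3 through 6 might be produced, the example below uses a made-up vector `d` of 15 commute times in minutes; substitute your own data. Note that `quantile()` uses R's default quartile convention (type 7), so Q1 and Q3 may differ slightly from values computed by hand with another convention.

```r
# Hypothetical data: 15 made-up commute times in minutes, in collection order.
d <- c(31.2, 29.8, 30.5, 32.1, 28.9, 30.0, 31.7, 29.4,
       30.8, 33.0, 29.1, 30.3, 31.0, 29.9, 30.6)

q <- quantile(d)     # Q0 (min), Q1, Q2 (median), Q3, Q4 (max), default type 7
print(q)
print(mean(d))       # sample mean
print(sd(d))         # sample standard deviation (n - 1 denominator)

# Histograms with three different interval widths, set through the breaks.
for (width in c(0.5, 1, 2)) {
  breaks <- seq(floor(min(d)), ceiling(max(d)) + width, by = width)
  hist(d, breaks = breaks, main = paste("Interval width", width))
}

boxplot(d, horizontal = TRUE)   # boxplot of the same data
```

The interval widths 0.5, 1, and 2 are arbitrary choices for this made-up data; pick widths that suit the scale of your own measurements.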

Hints:

To find outliers using Q1, Q3, and IQR:
- An extreme outlier is a data point that is more than 3 IQRs below Q1 or more than 3 IQRs above Q3.
- A moderate outlier is a data point that is more than 1.5 IQRs below Q1 or more than 1.5 IQRs above Q3, and is not an extreme outlier.
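The quartile-based fences can be sketched in R as follows. The vector `d` is made-up data, and the variable names (`extreme`, `moderate`, and so on) are illustrative, not required by the assignment. The quartiles come from `quantile()` with its default convention, which may differ slightly from hand-computed quartiles.

```r
# Hypothetical data with two unusually large values at the end.
d <- c(30.1, 29.7, 30.4, 29.9, 30.2, 30.0, 29.8, 30.3,
       30.5, 29.6, 30.1, 30.2, 29.9, 31.6, 34.0)

q1  <- quantile(d, 0.25, names = FALSE)   # first quartile
q3  <- quantile(d, 0.75, names = FALSE)   # third quartile
iqr <- q3 - q1                            # interquartile range

# Extreme: more than 3 IQRs beyond Q1 or Q3.
extreme  <- d < q1 - 3 * iqr | d > q3 + 3 * iqr
# Moderate: more than 1.5 IQRs beyond Q1 or Q3, but not extreme.
moderate <- (d < q1 - 1.5 * iqr | d > q3 + 1.5 * iqr) & !extreme

print(d[extreme])    # extreme outliers
print(d[moderate])   # moderate outliers
```

Logical indexing keeps the data in its original order, so no sorting is needed.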
To find outliers using z-scores:
- An extreme outlier is a data point that has a z-score of more than 3 or less than -3.
- A moderate (mild) outlier is a data point that has a z-score greater than 2 or less than -2, and is not an extreme outlier.
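A minimal sketch of the z-score rule, again on made-up data. The data frame name `dataset` and the column names `d` and `z` are assumptions for illustration; the z-scores are stored as a new column, as question 8 asks.

```r
# Hypothetical data with two unusually large values at the end.
d <- c(30.1, 29.7, 30.4, 29.9, 30.2, 30.0, 29.8, 30.3,
       30.5, 29.6, 30.1, 30.2, 29.9, 31.6, 34.0)

dataset <- data.frame(d = d)                 # one-column dataset
dataset$z <- (d - mean(d)) / sd(d)           # new column of z-scores
print(dataset)

extreme_z  <- abs(dataset$z) > 3                  # z above 3 or below -3
moderate_z <- abs(dataset$z) > 2 & !extreme_z     # beyond 2 but not extreme

print(dataset$d[extreme_z])    # extreme outliers by z-score
print(dataset$d[moderate_z])   # moderate outliers by z-score
```

The z-score rule and the quartile rule need not flag the same points, because an outlier inflates the mean and standard deviation that the z-scores are built from.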
To compute z-scores:
```
> z <- (d - mean(d)) / sd(d)
```
To plot the dataset d by observation number:
```
> plot(1:length(d), d)
```
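As an optional variation on the plot above, the sketch below adds axis labels and a dashed reference line at the sample mean, which can make drift or a change in spread over time easier to see. The data are made up, and the reference line is a suggestion, not a requirement of the assignment.

```r
# Hypothetical data, kept in the order it was collected (not sorted).
d <- c(31.2, 29.8, 30.5, 32.1, 28.9, 30.0, 31.7, 29.4,
       30.8, 33.0, 29.1, 30.3, 31.0, 29.9, 30.6)

obs <- seq_along(d)                  # observation numbers 1, 2, ..., n
plot(obs, d,
     xlab = "Observation number",
     ylab = "Measurement")
abline(h = mean(d), lty = 2)         # dashed horizontal line at the sample mean
```

`seq_along(d)` gives the same sequence as `1:length(d)` for a non-empty vector.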