Collect a univariate dataset containing at
least 15 observations that follows the ideal measurement
model:
Actual measurement = true measurement + random error
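As an illustration of this model, the R sketch below simulates 15 measurements. The true value (180) and the normally distributed error with standard deviation 5 are hypothetical choices, not part of the assignment. Real data will not come with a known true value.

```r
# Simulate the ideal measurement model:
#   actual measurement = true measurement + random error
# true_value and the error SD are hypothetical values chosen for illustration.
set.seed(42)                                  # reproducible random errors
n <- 15                                       # number of observations
true_value <- 180                             # hypothetical true measurement
random_error <- rnorm(n, mean = 0, sd = 5)    # errors centered at zero
actual <- true_value + random_error           # what we would actually record
print(round(actual, 1))
```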
Here are some suggestions:
Time for a cup of water to boil on a stove
Time for your dog to come to you when you call it
Time to go to work over the same route
Time for a Red Line train to travel from the Fullerton stop to the Jackson stop
Time for an email message sent to yourself to arrive
Number of words on random pages of a book that does not contain figures or tables
Number of words in newspaper articles
Your pulse rate measured at random times during the day
Your weight measured at random times during the day
Weights of packages of meat that are approximately the same weight (you may need to visit more than one store)
Weight of each potato in a 10 pound bag
Lengths, in seconds, of "30-second" television commercials
Prices of used cars of the same make and model, as listed in a newspaper
Heights of persons of the same gender in this class
Round-trip ping time for an IP packet
Time for a spring-loaded kitchen timer to "time" 30 seconds
(Because of random error, there may be more variation than you think.)
Then use R to answer these questions. Type the answers to the starred
questions at the top of your output file (.docx file type). In your answers,
include the R statements that you used to obtain them.
Also make sure that your name is shown at the top of your submission
as well as in the filename.
Important: Do not sort the data before performing the
analyses.
*Describe your dataset and the circumstances under which you collected the data.
Print your dataset.
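One way to enter and print the data in R while keeping the values in collection order. The 15 values in `x` are hypothetical placeholders (for example, seconds), not real measurements; replace them with your own. If your data are in a file, `read.csv()` is an alternative way to load them.

```r
# Hypothetical measurements, listed in the order they were collected (unsorted).
x <- c(182, 175, 190, 178, 185, 181, 176, 188, 179, 183, 177, 186, 180, 184, 230)

# Store them in a data frame with the observation number as its own column.
dat <- data.frame(obs = seq_along(x), value = x)
print(dat)
```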
*What are Q0, Q1, Q2, Q3, and Q4 (the minimum, first quartile, median, third quartile, and maximum)?
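A sketch using the same hypothetical placeholder values. `quantile()` uses R's default method (type 7); `fivenum()` returns Tukey's hinges, which `boxplot()` uses and which can differ slightly from `quantile()` for some sample sizes. Both functions sort internally, so `x` itself stays in collection order.

```r
x <- c(182, 175, 190, 178, 185, 181, 176, 188, 179, 183, 177, 186, 180, 184, 230)

# Q0 (minimum), Q1, Q2 (median), Q3, Q4 (maximum)
q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
print(q)

# Tukey's five-number summary, for comparison
print(fivenum(x))
```

For these 15 placeholder values, both functions give 175, 178.5, 182, 185.5, and 230.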
*What are the sample mean and standard deviation?
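A minimal sketch with the same hypothetical placeholder values. `sd()` computes the sample standard deviation, dividing by n - 1 rather than n.

```r
x <- c(182, 175, 190, 178, 185, 181, 176, 188, 179, 183, 177, 186, 180, 184, 230)

m <- mean(x)   # sample mean
s <- sd(x)     # sample standard deviation (n - 1 denominator)
cat("Mean:", round(m, 3), "\n")
cat("SD:  ", round(s, 3), "\n")
```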
Use R to plot histograms with three different interval widths.
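A sketch that draws three histograms side by side. The widths 2, 5, and 10 are hypothetical choices suited to the placeholder data; pick widths that fit the range of your own measurements. Passing an explicit sequence to `breaks` fixes the interval width, whereas a single number passed to `breaks` is only a suggestion that R may adjust to "pretty" values.

```r
x <- c(182, 175, 190, 178, 185, 181, 176, 188, 179, 183, 177, 186, 180, 184, 230)

widths <- c(2, 5, 10)          # hypothetical interval widths
op <- par(mfrow = c(1, 3))     # three plots in one row
for (w in widths) {
  # Breakpoints run from a multiple of w at or below min(x)
  # to a multiple of w at or above max(x), spaced w apart.
  brks <- seq(floor(min(x) / w) * w, ceiling(max(x) / w) * w, by = w)
  hist(x, breaks = brks, main = paste("Width =", w), xlab = "Measurement")
}
par(op)                        # restore the previous plot layout
```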
*Use R to plot a boxplot. What does the boxplot tell you?
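A sketch with the same hypothetical placeholder values. `boxplot.stats()` returns the numbers the boxplot draws: the whisker ends, the hinges, the median, and any points more than 1.5 hinge spreads beyond the hinges. The plot itself does not distinguish moderate from extreme outliers; the hints below give rules for that.

```r
x <- c(182, 175, 190, 178, 185, 181, 176, 188, 179, 183, 177, 186, 180, 184, 230)

boxplot(x, ylab = "Measurement", main = "Boxplot of measurements")

b <- boxplot.stats(x)
print(b$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(b$out)    # points plotted individually beyond the whiskers
```

For the placeholder data, the whiskers end at 175 and 190, and 230 is drawn as a separate point.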
*List any moderate or extreme outliers using the boxplot. See Hint 1 below.
Create a new column of z-scores in the dataset.
Print the z-scores.
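A minimal sketch that adds a `z` column computed as z = (x - mean) / sd, using the same hypothetical placeholder data. `scale()` would give the same values but returns a one-column matrix, so the formula is written out here.

```r
x <- c(182, 175, 190, 178, 185, 181, 176, 188, 179, 183, 177, 186, 180, 184, 230)
dat <- data.frame(obs = seq_along(x), value = x)

# z-score: distance from the mean in units of the sample standard deviation
dat$z <- (dat$value - mean(dat$value)) / sd(dat$value)
print(round(dat$z, 3))
```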
*List any moderate or extreme outliers using z-scores. See Hint 2
below.
*Plot your dataset by observation number. Describe this plot using
the terms unbiased, biased, homoscedastic, and heteroscedastic.
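A sketch of the plot by observation number, using the same hypothetical placeholder values. The dashed line marks the sample mean; the red straight-line trend from `lm()` is an optional addition. A slope near zero suggests no drift over time, and points scattered in a band of roughly constant width suggest homoscedasticity; a funnel shape would suggest heteroscedasticity.

```r
x <- c(182, 175, 190, 178, 185, 181, 176, 188, 179, 183, 177, 186, 180, 184, 230)

obs <- seq_along(x)              # observation number (collection order)
plot(obs, x, type = "b", xlab = "Observation number", ylab = "Measurement",
     main = "Measurements in collection order")
abline(h = mean(x), lty = 2)     # dashed reference line at the sample mean

fit <- lm(x ~ obs)               # straight-line trend over observation number
abline(fit, col = "red")
print(coef(fit))                 # intercept and slope per observation
```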
Can you think of any lurking variables, also called
confounding variables, that might cause your dataset
to deviate from an ideal measurement model? A lurking variable is a variable
not included in the dataset that might affect the results.
Hints:
Hint 1: To find outliers using Q1, Q3, and the interquartile range (IQR = Q3 - Q1):
An extreme outlier is a data point that is more than 3 IQRs below Q1
or more than 3 IQRs above Q3.
A moderate outlier is a data point that is more than 1.5 IQRs below Q1
or more than 1.5 IQRs above Q3, and is not an extreme outlier.
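A sketch of the IQR rules in R, using the same hypothetical placeholder values. Q1 and Q3 come from `quantile()` (type 7), so these fences can differ slightly from the hinges that `boxplot()` uses.

```r
x <- c(182, 175, 190, 178, 185, 181, 176, 188, 179, 183, 177, 186, 180, 184, 230)

q1  <- unname(quantile(x, 0.25))
q3  <- unname(quantile(x, 0.75))
iqr <- q3 - q1                                   # same value as IQR(x)

extreme  <- x < q1 - 3 * iqr | x > q3 + 3 * iqr
moderate <- (x < q1 - 1.5 * iqr | x > q3 + 1.5 * iqr) & !extreme

cat("Extreme outliers (obs):", which(extreme), "\n")
cat("Moderate outliers (obs):", which(moderate), "\n")
```

For the placeholder data, Q1 = 178.5, Q3 = 185.5, and IQR = 7, so only observation 15 (value 230) is flagged, as an extreme outlier.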
Hint 2: To find outliers using z-scores:
An extreme outlier is a data point that has a z-score of more than
3 or less than -3.
A moderate outlier is a data point that has a z-score greater than
2 or less than -2, and is not an extreme outlier.
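A sketch of the z-score rules in R with the same hypothetical placeholder values. An extreme value also inflates the standard deviation it is divided by, and with n observations no z-score can exceed (n - 1)/sqrt(n) in absolute value (about 3.61 for n = 15), so in small samples the z-score rules can flag fewer points than the IQR rules.

```r
x <- c(182, 175, 190, 178, 185, 181, 176, 188, 179, 183, 177, 186, 180, 184, 230)

z <- (x - mean(x)) / sd(x)

extreme  <- abs(z) > 3
moderate <- abs(z) > 2 & !extreme

cat("Extreme outliers (obs):", which(extreme), "\n")
cat("Moderate outliers (obs):", which(moderate), "\n")
print(round(z[extreme | moderate], 3))   # z-scores of the flagged points
```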