Collect a univariate dataset containing at
least 15 observations that follows the ideal measurement
model:
Actual measurement = true measurement + random error
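To get a feel for what data from this model looks like, here is a minimal R sketch that simulates it. The true value of 180 seconds, the error standard deviation of 5, and the seed are assumptions chosen only for illustration; in the project, your own collected measurements take the place of the simulated ones.

```r
# Simulate the ideal measurement model:
#   actual measurement = true measurement + random error
set.seed(1)                                  # hypothetical seed, for reproducibility
n <- 15                                      # minimum number of observations required
true_value <- 180                            # hypothetical true measurement (seconds)
random_error <- rnorm(n, mean = 0, sd = 5)   # hypothetical error distribution
actual <- true_value + random_error
print(round(actual, 1))
```

Each simulated value scatters around the true value, which is the pattern your collected data should show.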
Here are some suggestions:
Time for a cup of water to boil on a stove
Time for your dog to come to you when you call it
Time to go to work over the same route
Time for a Red Line train to travel from the Fullerton stop to the Jackson stop
Time for an email message sent to yourself to arrive
Number of words on random pages of a book that does not contain figures or tables
Number of words in newspaper articles
Your pulse rate measured at random times during the day
Your weight measured at random times during the day
Weights of packages of meat of approximately the same weight at a store (you may need to go to more than one store)
Weight of each potato in a 10 pound bag
Lengths of "30 second" television commercials in seconds
Newspaper prices of used cars of the same make and model
Heights of persons of the same gender in this class
The roundtrip ping time for an IP packet
The amount of time for a spring-loaded kitchen timer to "time" 30 seconds.
(Because of random error, there may be more variation than you think.)
Then use R to answer these questions. Type the answers to the starred
questions at the top of your output file (.docx file type). In your answers,
include the R statements that you used to obtain your answer.
Also make sure that your name is shown at the top of your submission
as well as in the filename.
Important: Do not sort the data before performing the
analyses.
*Describe your dataset and the circumstances under which you collected the data.
Create a CSV file that contains your dataset. Use R to create a dataframe
from your CSV file and print your dataset.
*What are Q0, Q1, Q2, Q3, and Q4?
*What are the sample mean and standard deviation?
Use R to plot three histograms: one with the default bins, and two other
histograms where you set the bin boundaries (break points) explicitly. See the
example in the discussion of Project 1a at the end of the Jan 21 Notes if you
are not sure how to do this.
*Use R to plot a boxplot. What does the boxplot tell you?
*List any moderate or extreme outliers using the boxplot. See Hint 1 below.
Create a new column of z-scores in the dataset.
Print the z-scores.
*List any moderate or extreme outliers using z-scores. See Hint 2
below.
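The steps in the list above might look like the following sketch. The file name `mydata.csv`, the column name `x`, the data values, and the break points are all hypothetical placeholders, not real measurements. The sketch writes a small CSV first only so that it runs on its own.

```r
# Hypothetical data (made-up values), written to a CSV so the sketch is
# self-contained. In the project, create the CSV from your own measurements.
x <- c(182, 175, 190, 168, 185, 179, 205, 177, 183, 172, 188, 176, 181, 240, 174)
csv_path <- file.path(tempdir(), "mydata.csv")   # hypothetical file name
write.csv(data.frame(x = x), csv_path, row.names = FALSE)

# Create a dataframe from the CSV file and print it (data left unsorted).
df <- read.csv(csv_path)
print(df)

# Q0, Q1, Q2, Q3, Q4 using R's default quantile method (type 7).
q <- quantile(df$x)
print(q)

# Sample mean and sample standard deviation.
m <- mean(df$x)
s <- sd(df$x)
cat("mean =", m, " sd =", s, "\n")

# Histograms: default bins, then two sets of explicit break points.
hist(df$x, main = "Default bins")
hist(df$x, breaks = seq(160, 240, by = 10), main = "Bin width 10")
hist(df$x, breaks = seq(160, 240, by = 20), main = "Bin width 20")

# Boxplot of the data.
boxplot(df$x, main = "Boxplot of x")

# New column of z-scores, then print it.
df$z <- (df$x - m) / s
print(df$z)
```

Explicit break points must span the whole range of the data, otherwise `hist` stops with an error. Quartile definitions vary between textbooks and software, so values from `quantile` may differ slightly from a hand calculation.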
Hints:
To find outliers using Q1, Q3, and IQR:
An extreme outlier is a data point that is more than 3 IQRs below Q1
or more than 3 IQRs above Q3.
A moderate outlier is a data point that is more than 1.5 IQRs below Q1
or more than 1.5 IQRs above Q3, and is not an extreme outlier.
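The IQR rule can be sketched in R as follows. The data vector is made up for illustration and the variable names are hypothetical; substitute the column from your own dataframe. The quartiles come from R's default `quantile` method, which is an assumption about which quartile definition to use.

```r
# Hypothetical data (made-up values); replace with your own column, e.g. df$x.
x <- c(182, 175, 190, 168, 185, 179, 205, 177, 183, 172, 188, 176, 181, 240, 174)

q1 <- quantile(x, 0.25, names = FALSE)
q3 <- quantile(x, 0.75, names = FALSE)
iqr <- q3 - q1                        # same value as IQR(x)

# Extreme: more than 3 IQRs below Q1 or above Q3.
is_extreme <- x < q1 - 3 * iqr | x > q3 + 3 * iqr

# Moderate: more than 1.5 IQRs below Q1 or above Q3, but not extreme.
is_moderate <- (x < q1 - 1.5 * iqr | x > q3 + 1.5 * iqr) & !is_extreme

extreme_outliers <- x[is_extreme]
moderate_outliers <- x[is_moderate]
print(extreme_outliers)
print(moderate_outliers)
```

Logical indexing keeps the data in its original order, so no sorting is needed.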
To find outliers using z-scores:
An extreme outlier is a data point that has a z-score of more than
3 or less than -3.
A moderate outlier is a data point that has a z-score greater than
2 or less than -2, and is not an extreme outlier.
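The z-score rule can be sketched the same way. Again the data vector is made up and the variable names are hypothetical. The z-scores here use the sample standard deviation from `sd`, which divides by n - 1.

```r
# Hypothetical data (made-up values); replace with your own column, e.g. df$x.
x <- c(182, 175, 190, 168, 185, 179, 205, 177, 183, 172, 188, 176, 181, 240, 174)

# z-score: distance from the sample mean in units of the sample standard deviation.
z <- (x - mean(x)) / sd(x)

# Extreme: z-score above 3 or below -3.
is_extreme <- abs(z) > 3

# Non-extreme outlier: z-score above 2 or below -2, but not extreme.
is_moderate <- abs(z) > 2 & !is_extreme

extreme_outliers <- x[is_extreme]
moderate_outliers <- x[is_moderate]
print(extreme_outliers)
print(moderate_outliers)
```

The z-score rule and the IQR rule need not flag the same points, because a large outlier inflates the standard deviation and so shrinks every z-score.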