IT 223 -- Apr 3, 2024

Review Exercises

  1. What is the proper filename for project submissions?
  2. Matching:
    a. Fisher      1. First stated the Central Limit Theorem.
    b. Cotes       2. Tried to find the "ideal man" that nature was trying to produce.
    c. Tukey       3. Was the first demographer to use statistics.
    d. de Moivre   4. Coined the term "exploratory data analysis."
    e. Pascal      5. Father of modern statistics.
    f. Gauss       6. First described the least squares method.
    g. Graunt      7. First studied the theory of errors applied to astronomy.
    h. Galton      8. First applied the theory of probability to gambling.
    i. Quetelet    9. Introduced the concept of correlation.

    Answer: a. 5; b. 7; c. 4; d. 1; e. 8; f. 6; g. 3; h. 9; i. 2.
  3. What is a lurking variable?
    Answer: A variable that is not included in the dataset, but should be, because it affects the variables being studied. A lurking variable is often also called a confounding variable.
  4. Why is it important for a clinical trial to be randomized and double blind?
    Answer: A clinical trial should be randomized to minimize the effect of lurking variables. Randomization ensures that effects due to variables not included in the dataset are similarly distributed between the treatment group and the control group. A clinical trial should be double blind so that psychological effects that result from the patient knowing whether he or she has received the treatment or the placebo do not bias the results. The doctor treating the patient should also not know whether that patient has received the treatment or the placebo, so that the doctor's expectations do not affect the patient's care or the assessment of the outcome.
  5. What is the difference between a controlled experiment and an observational study?
    Answer: In a controlled experiment, each subject is randomly assigned the treatment or the placebo, to reduce the effect of lurking variables. In an observational study, no experiment is performed and no treatment is assigned. The data is just reported and analyzed "as is."
  6. Give examples of categorical, ordinal, and continuous variables.
    Answer: categorical: gender (M, F, X (non-binary)); ordinal: year in college (1, 2, 3, 4); continuous: height.
  7. Create a stemplot from these Celsius temperatures:
    38 54 52 49 65 58
    Answer:
    6|5
    5|428
    4|9
    3|8
    
  8. True or False. In the R language, vectors are zero-based.
    Answer: False, R vectors are one-based.
  9. Which R function is used to create a numeric vector?
    Answer: the c function, which means combine. For example:
    x <- c(4, 2, 7, 5)
    
  10. What is the R assignment operator?
    Answer: <-, for example,
    x <- c(4, 2, 7, 5)
    
    -> is also an assignment operator, with the variable on the right:
    c(4, 2, 7, 5) -> x
    
    The = operator can also be used for assignment, but it is not recommended, because = also has other meanings, such as naming the arguments in a function call.
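
The stemplot in exercise 7 can be checked in R. The following is only a minimal sketch: the variable names (temps, stems, leaves, groups) are chosen here for illustration, and the leaves are sorted within each stem, so the row for stem 5 reads 248. Base R also has a built-in stem function that prints a similar display.

```r
# Celsius temperatures from exercise 7
temps <- c(38, 54, 52, 49, 65, 58)

# Stem = tens digit, leaf = ones digit
stems  <- temps %/% 10
leaves <- temps %% 10

# Group the leaves by stem, then sort the leaves within each stem
groups <- split(leaves, stems)
groups <- lapply(groups, sort)

# Print one row per stem, largest stem on top
for (s in rev(names(groups))) {
  cat(s, "|", paste(groups[[s]], collapse = ""), "\n", sep = "")
}
```

The %/% operator is integer division and %% is the remainder, so 58 is split into stem 5 and leaf 8.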

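Exercises 8 to 10 can be tried together in a short R session. This is a sketch for illustration only: the names first, last, zero, y, and same are not from the notes.

```r
# Create a numeric vector with c() and the <- assignment operator
x <- c(4, 2, 7, 5)

# R vectors are one-based: the first element is at index 1
first <- x[1]
last  <- x[length(x)]

# Index 0 does not return an element; it gives an empty vector
zero <- x[0]

# The -> operator assigns in the other direction
c(4, 2, 7, 5) -> y

# x and y hold the same values
same <- identical(x, y)

print(first)
print(last)
print(length(zero))
print(same)
```

Because indexing starts at 1, x[length(x)] is the last element of the vector.
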
Quartiles

Boxplots

Histograms

Practice Problems

  1. Match the descriptions with the histograms below:
    1. The gender of all persons in a college class (male = 0, female = 1).
    2. The handedness of all persons in a college class (left handed = 0, right handed = 1).
    3. The heights of all married persons counted separately.
    4. The heights of all persons in families where both parents are 28 years old or less.
    5. The heights of all automobiles.
    6. The incomes of all persons in the U.S.

    Ans: 1 iv, 2 iii, 3 i, 4 ii, 5 v, 6 vi.

Using R

Project 2

  • the number of years of schooling of all persons in the U.S.