- Create a scatterplot of Spending vs. Year. Answer:
> year <- c(2003, 2006, 2009)
> spending <- c(40.1, 47.8, 54.9)
> plot(year, spending, xlab="Year", ylab="Spending")
- Use R to obtain $\bar{x}$, $\bar{y}$, $SD^+_x$, $SD^+_y$, and $r_{xy}$. Answer:
> year
[1] 2003 2006 2009
> spending
[1] 40.1 47.8 54.9
> mean(year)
[1] 2006
> mean(spending)
[1] 47.6
> sd(year)
[1] 3
> sd(spending)
[1] 7.402027
> cor(year, spending)
[1] 0.9997262
- Obtain the regression equation by hand using the statistics from Question 2b.
When performing hand calculations, you can use R as a calculator.
Verify your answer with the R lm function. Answer:
The regression equation is
$y - \bar{y} = (r_{xy} \cdot SD^+_y / SD^+_x)(x - \bar{x})$
$y - 47.6 = (0.9997262 \cdot 7.402027 / 3)(x - 2006)$
$y - 47.6 = 2.466667 (x - 2006)$
$y = 2.466667 x - 2.466667 \cdot 2006 + 47.6$
$y = 2.466667 x - 4900.534$
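Since the exercise allows using R as a calculator, the hand computation above can be sketched as follows. The variable names `r`, `slope`, and `intercept` are illustrative and do not come from the exercise.

```r
# Data from the exercise, repeated so the block runs on its own
year <- c(2003, 2006, 2009)
spending <- c(40.1, 47.8, 54.9)

# Slope: r * SD_y / SD_x
r <- cor(year, spending)
slope <- r * sd(spending) / sd(year)

# Intercept: ybar - slope * xbar
intercept <- mean(spending) - slope * mean(year)

slope      # about 2.466667
intercept  # about -4900.533
```

Working with the unrounded values stored in R avoids the small discrepancy between -4900.534 (from the rounded slope) and the -4900.533 that lm reports.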
To compute the regression equation using R, we first need to create a data frame
containing the data. Then we use the lm function to obtain the regression model:
> df <- data.frame(x=year, y=spending)
> model <- lm(y ~ x, data=df)
> print(model)
Call:
lm(formula = y ~ x, data = df)
Coefficients:
(Intercept) x
-4900.533 2.467
- Compute the predicted values by hand. Check your
answer using this R function call:
> pred <- predict(model)
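Answer: Use the model obtained in Exercise 2c. Keep the slope at full precision
(2.466667): the rounded value 2.467 printed by lm is multiplied by a year near
2000, so the rounding error would shift each prediction by about 0.67.
$\hat{y}_1 = 2.466667 x_1 - 4900.534 = 2.466667 \cdot 2003 - 4900.534 = 40.2$
$\hat{y}_2 = 2.466667 x_2 - 4900.534 = 2.466667 \cdot 2006 - 4900.534 = 47.6$
$\hat{y}_3 = 2.466667 x_3 - 4900.534 = 2.466667 \cdot 2009 - 4900.534 = 55.0$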
We can perform this calculation in one line using R:
> 2.466667 * c(2003, 2006, 2009) - 4900.534
[1] 40.2 47.6 55.0
Check your answer using the R predict function, which obtains the
predicted values from the model:
> pred <- predict(model)
> pred
1 2 3
40.2 47.6 55.0
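One way to avoid retyping rounded coefficients is to pull them from the fitted model with coef. This is a minimal sketch; the names `b` and `by_hand` are illustrative.

```r
# Rebuild the model from the exercise so the block runs on its own
df <- data.frame(x = c(2003, 2006, 2009), y = c(40.1, 47.8, 54.9))
model <- lm(y ~ x, data = df)

# coef() returns the unrounded intercept and slope as a named vector
b <- coef(model)
by_hand <- b[["(Intercept)"]] + b[["x"]] * df$x

# Compare with predict(); unname() drops the 1, 2, 3 labels before comparing
all.equal(by_hand, unname(predict(model)))
```

all.equal compares within a small numerical tolerance, which is the appropriate check for floating-point results.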
- Compute the residuals, defined as
$\hat{e}_i = y_i - \hat{y}_i$
$\hat{e}_1 = y_1 - \hat{y}_1 = 40.1 - 40.2 = -0.1$
$\hat{e}_2 = y_2 - \hat{y}_2 = 47.8 - 47.6 = 0.2$
$\hat{e}_3 = y_3 - \hat{y}_3 = 54.9 - 55.0 = -0.1$
Check your answer using the R resid function, which
computes the residuals from the model. Answer:
> resid(model)
1 2 3
-0.1 0.2 -0.1
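The definition of the residuals can also be applied directly in R, as in this short sketch. The name `e_hand` is illustrative. For a least-squares line with an intercept, the residuals sum to zero, which gives a quick sanity check.

```r
# Rebuild the model from the exercise so the block runs on its own
df <- data.frame(x = c(2003, 2006, 2009), y = c(40.1, 47.8, 54.9))
model <- lm(y ~ x, data = df)

# Residual = observed value minus predicted value
e_hand <- df$y - predict(model)

all.equal(e_hand, resid(model))  # should agree with resid()
sum(resid(model))                # zero up to floating-point error
```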
- Create the residual plot of residuals vs. predicted values. Answer:
> plot(predict(model), resid(model),
+ xlab="Predicted Values", ylab="Residuals",
+ main="Residual Plot")
- Compute the normal scores by hand (n=3).
Answer: The normal scores for n=3 divide the area under the standard normal
density into 4 equal parts of 25% each. The z-scores that do this are
-0.67, 0.00, and 0.67. They can be found using the R qnorm function:
> qnorm(c(0.25, 0.5, 0.75))
[1] -0.6744898 0.0000000 0.6744898
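The same idea extends to any n: the normal scores cut the standard normal density into n+1 equal areas, so the i-th score is the quantile at i/(n+1). A minimal sketch, with `normal_scores` as a hypothetical helper name:

```r
# Normal scores for a sample of size n: quantiles at 1/(n+1), ..., n/(n+1)
normal_scores <- function(n) {
  qnorm(seq_len(n) / (n + 1))
}

normal_scores(3)  # same values as qnorm(c(0.25, 0.5, 0.75))
```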
- Create a normal plot of the residuals. Answer:
> qqnorm(resid(model), main="Normal Plot of Residuals")
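Note that qqnorm places its theoretical quantiles using ppoints, which for n=3 gives about -0.87, 0, and 0.87 rather than the hand-computed scores -0.67, 0, and 0.67. To plot against the hand-computed normal scores instead, something like the following sketch could be used. The names `e` and `z` are illustrative.

```r
# Rebuild the model from the exercise so the block runs on its own
df <- data.frame(x = c(2003, 2006, 2009), y = c(40.1, 47.8, 54.9))
model <- lm(y ~ x, data = df)

# Sorted residuals paired with the normal scores at i/(n+1)
e <- sort(resid(model))
z <- qnorm(seq_along(e) / (length(e) + 1))

plot(z, e, xlab="Normal Scores", ylab="Sorted Residuals",
     main="Normal Plot of Residuals (hand-computed scores)")

# For comparison: the plotting positions qqnorm uses by default
qnorm(ppoints(length(e)))
```

With only three points, either choice of scores gives the same ordering of points, so the overall shape of the plot is unaffected.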