CSC 423 -- Project 4
Not required for CSC 324 students.
Each problem is worth 10 points.
Part A. Prestige Dataset (40 pts.)
- Use the Prestige Dataset prestige.txt. This dataset is obtained
from 1971 Canadian Census data about various professions
that existed at that time.
- Fields in the dataset:
- Title of the profession (title)
- Average years of education for persons in profession (education)
- Average income of profession (income)
- Percentage of women in profession (women)
- Average prestige rating (1-100) for profession in survey (prestige)
- Construct a regression model to predict the prestige of a profession based on a subset of the
independent variables. Include the log of income and the square root of income
as possible independent variables.
Do not use more than one of income, log(income), and sqrt(income) in any one
model. You will need to create new variables such as log_income and
sqrt_income before obtaining your regression models.
- For your chosen final model or models, include a residual analysis.
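The derived income variables and the one-income-form-per-model rule described above can be sketched in Python as follows. This is only an illustration, not the SAS or R workflow the course uses: the two rows are made-up placeholder values (the real data come from prestige.txt), the natural log is an assumption since the assignment does not name a base, and `candidate_models` is a hypothetical helper.

```python
import math
from itertools import combinations

# Placeholder rows for illustration only; these are NOT values from prestige.txt.
rows = [
    {"title": "profession_a", "education": 13.0, "income": 12000.0, "women": 11.0, "prestige": 69.0},
    {"title": "profession_b", "education": 8.5, "income": 4000.0, "women": 60.0, "prestige": 35.0},
]

# Create the transformed variables before fitting any model.
# Assumption: natural logarithm; the assignment does not specify a base.
for row in rows:
    row["log_income"] = math.log(row["income"])
    row["sqrt_income"] = math.sqrt(row["income"])

INCOME_FORMS = ["income", "log_income", "sqrt_income"]
OTHER_PREDICTORS = ["education", "women"]


def candidate_models():
    """List every non-empty predictor subset that uses at most one income form."""
    models = []
    for k in range(len(OTHER_PREDICTORS) + 1):
        for others in combinations(OTHER_PREDICTORS, k):
            for income_form in [None] + INCOME_FORMS:
                predictors = list(others)
                if income_form is not None:
                    predictors.append(income_form)
                if predictors:  # skip the empty model
                    models.append(predictors)
    return models


models = candidate_models()
for predictors in models:
    print("prestige ~ " + " + ".join(predictors))
```

Enumerating the subsets this way makes it easy to confirm that no candidate model mixes income, log_income, and sqrt_income.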
Part B. Response Dataset (60 pts.)
- Use the ResponseSurface Dataset response.txt. This dataset shows the
yield of a chemical process for various combinations of temperature and
pressure.
- Fields in the dataset:
- Temperature of solution in chemical process (temperature).
- Pressure of solution in chemical process (pressure).
- Yield of chemical process (yield).
- Scale each independent variable linearly so that its minimum value is -1 and its maximum
value is 1. Assign to t the scaled value of temperature
and assign to p the scaled value of pressure. Create the new variables like this:
t = (temperature - 250) / 10
p = (pressure - 145) / 5
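The two formulas above are instances of a general linear rescaling to [-1, 1]. A minimal Python sketch follows, assuming the ranges implied by those formulas (temperature from 240 to 260, pressure from 140 to 150); `scale_to_unit_interval` is a hypothetical helper name, not part of the assignment.

```python
def scale_to_unit_interval(value, lo, hi):
    """Linearly map the interval [lo, hi] onto [-1, 1]."""
    center = (lo + hi) / 2       # maps to 0
    half_range = (hi - lo) / 2   # distance from the center to either end
    return (value - center) / half_range


# Ranges implied by the assignment's formulas (an inference, not stated in the data description):
# temperature 240..260 gives center 250 and half-range 10
# pressure    140..150 gives center 145 and half-range 5
t = scale_to_unit_interval(255.0, 240.0, 260.0)
p = scale_to_unit_interval(142.5, 140.0, 150.0)
print(t, p)
```

With these ranges the helper reduces to t = (temperature - 250) / 10 and p = (pressure - 145) / 5.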
- If the independent variables are denoted as t (temperature) and p
(pressure), and the dependent variable is denoted as y (yield), set up a regression model
to predict y from the independent variables t, t*t, p, p*p, t*p. (This is called the full quadratic model.)
What is the resulting regression equation for predicting y from these independent variables?
- If you are using SAS, use proc glm to compute the regression equation. This generates the contour plot automatically, which
can be used to check your answer for Problem B4.
- If you are using SAS, also use proc reg to compute the regression equation. This generates residual plots automatically.
- If you are using R, you are not required to create a contour plot; however,
if you wish to create one, see the Surface Example.
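For readers who want to see what fitting the full quadratic model involves, here is a rough Python sketch that solves the least-squares normal equations directly. It is not the course's method (SAS or R would be used for the project), the data are synthetic rather than taken from response.txt, and the function names are hypothetical.

```python
def design_row(t, p):
    """Columns of the full quadratic model: 1, t, t*t, p, p*p, t*p."""
    return [1.0, t, t * t, p, p * p, t * p]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_full_quadratic(ts, ps, ys):
    """Return [b0, b_t, b_tt, b_p, b_pp, b_tp] from the normal equations X'X b = X'y."""
    x_rows = [design_row(t, p) for t, p in zip(ts, ps)]
    k = 6
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x_rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic check on a 3 x 3 grid of scaled values; NOT the response.txt data.
grid = [-1.0, 0.0, 1.0]
ts = [t for t in grid for _ in grid]
ps = [p for _ in grid for p in grid]
true_b = [50.0, 2.0, -3.0, 1.0, -4.0, 0.5]  # made-up coefficients
ys = [sum(b * x for b, x in zip(true_b, design_row(t, p))) for t, p in zip(ts, ps)]
coefficients = fit_full_quadratic(ts, ps, ys)
print([round(c, 6) for c in coefficients])
```

Solving the normal equations is fine for a small, well-scaled design like this one; statistical packages typically use more numerically stable decompositions.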
- Find the critical point of the regression equation by computing the partial derivatives
∂y/∂t and ∂y/∂p, setting them to 0, and solving the set of two equations in two unknowns for the optimal
values of t and p. The Surface Example shows how to do this.
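The critical-point step can be sketched in Python as below. For y = b0 + b1*t + b2*t*t + b3*p + b4*p*p + b5*t*p, setting both partial derivatives to 0 gives a 2 x 2 linear system, solved here with Cramer's rule. The coefficient values are made up so that the answer is known in advance, the function names are hypothetical, and this is not necessarily how the Surface Example does it.

```python
def critical_point(b0, b1, b2, b3, b4, b5):
    """Critical point of y = b0 + b1*t + b2*t*t + b3*p + b4*p*p + b5*t*p.

    dy/dt = b1 + 2*b2*t + b5*p = 0
    dy/dp = b3 + b5*t + 2*b4*p = 0
    """
    det = 4.0 * b2 * b4 - b5 * b5  # determinant of the Hessian [[2*b2, b5], [b5, 2*b4]]
    if det == 0:
        raise ValueError("no unique critical point: the 2 x 2 system is singular")
    t = (b5 * b3 - 2.0 * b1 * b4) / det
    p = (b1 * b5 - 2.0 * b2 * b3) / det
    return t, p


def classify(b2, b4, b5):
    """Second-derivative test for the quadratic surface."""
    det = 4.0 * b2 * b4 - b5 * b5
    if det < 0:
        return "saddle point"
    if det > 0:
        return "maximum" if b2 < 0 else "minimum"
    return "inconclusive"


# Made-up coefficients: y = -(t - 0.5)**2 - (p + 0.25)**2 expanded into the six terms.
b = (-0.3125, 1.0, -1.0, -0.5, -1.0, 0.0)
t_opt, p_opt = critical_point(*b)
kind = classify(b[2], b[4], b[5])

# Undo the scaling t = (temperature - 250) / 10 and p = (pressure - 145) / 5.
temperature_opt = 250.0 + 10.0 * t_opt
pressure_opt = 145.0 + 5.0 * p_opt
print(t_opt, p_opt, kind, temperature_opt, pressure_opt)
```

A critical point is only an optimum if the second-derivative test says so, which is why the sketch also reports whether the point is a maximum, a minimum, or a saddle point.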
Part B is an introduction to a field of study called response surface analysis.