CSC 423 -- Project 4

Not required for CSC 324 students.

Each problem is worth 10 points.

Part A. Prestige Dataset (40 pts.)

  1. Use the Prestige Dataset prestige.txt. This dataset is obtained from 1971 Canadian Census data about various professions that existed at that time.
  2. Fields in the dataset:
    1. Title of the profession (title)
    2. Average years of education for persons in profession (education)
    3. Average salary of profession (income)
    4. Percentage of women in profession (Women)
    5. Average prestige rating (1-100) for profession in survey

  3. Construct a regression model to predict the prestige of a profession based on a subset of the independent variables. Include the log of income and the square root of income as possible independent variables. Do not use more than one of income, log(income), and sqrt(income) in any one model.  You will need to create new variables such as log_income and sqrt_income before obtaining your regression models.
  4. For your chosen final model or models, include a residual analysis.
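
The derived variables named in step 3 can be created before any model is fitted. The following is a minimal Python sketch, not the course's SAS or R code. It assumes "log" means the natural logarithm, and the two rows are hypothetical placeholders rather than values from prestige.txt. The helper name add_income_transforms is likewise invented for illustration.

```python
import math

# Hypothetical rows for illustration only; NOT values from prestige.txt.
rows = [
    {"title": "occupation_a", "income": 4000.0},
    {"title": "occupation_b", "income": 12000.0},
]

def add_income_transforms(rows):
    """Return copies of the rows with log_income and sqrt_income added."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the original row is left unchanged
        new_row["log_income"] = math.log(row["income"])    # natural log (assumption)
        new_row["sqrt_income"] = math.sqrt(row["income"])
        out.append(new_row)
    return out

augmented = add_income_transforms(rows)
for row in augmented:
    print(row["title"], round(row["log_income"], 3), round(row["sqrt_income"], 3))
```

Each candidate model would then use exactly one of income, log_income, or sqrt_income, as the assignment requires.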

Part B. Response Dataset (60 pts.)

  1. Use the ResponseSurface Dataset response.txt. This dataset shows the yield of a chemical process for various combinations of temperature and pressure.
  2. Fields in the dataset:
    1. Temperature of solution in chemical process (temperature).
    2. Pressure of solution in chemical process (pressure).
    3. Yield of chemical process (yield).

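  3. Scale each independent variable linearly so that its minimum value is -1 and its maximum value is 1. Assign to t the scaled value of temperature and assign to p the scaled value of pressure. Create the new variables like this: for a variable x with minimum value xmin and maximum value xmax, the scaled value is 2*(x - xmin)/(xmax - xmin) - 1.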
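
A minimal Python sketch of this linear scaling, offered as an illustration rather than as the course's SAS or R code. The function name scale_to_unit_interval and the sample readings are hypothetical; the readings are not taken from response.txt.

```python
def scale_to_unit_interval(values):
    """Linearly map values so that the minimum becomes -1 and the maximum becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant variable")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

# Hypothetical readings for illustration only; NOT values from response.txt.
temperature = [80.0, 90.0, 100.0]
pressure = [50.0, 55.0, 60.0]

t = scale_to_unit_interval(temperature)
p = scale_to_unit_interval(pressure)
print(t)  # [-1.0, 0.0, 1.0]
print(p)  # [-1.0, 0.0, 1.0]
```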

  4. If the independent variables are denoted as t (temperature) and p (pressure), and the dependent variable is denoted as y (yield), set up a regression model to predict y from the independent variables t, t*t, p, p*p, t*p. (This is called the full quadratic model.) What is the resulting regression equation for predicting y from these independent variables?
    1. If you are using SAS, use proc glm to compute the regression equation. This generates the contour plot automatically, which can be used to check your answer for Problem B4.
    2. If you are using SAS, also use proc reg to compute the regression equation. This generates residual plots automatically.
    3. If you are using R, you are not required to create a contour plot. If you wish to create one, see the Surface Example.

  5. Find the critical point of the regression equation by computing the partial derivatives ∂y/∂t and ∂y/∂p, setting them to 0, and solving the resulting two equations in two unknowns for the optimal values of t and p. The Surface Example shows how to do this.
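
The critical-point calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not the method of the Surface Example. It assumes the fitted model is written as y = b0 + b1*t + b2*t*t + b3*p + b4*p*p + b5*t*p. The coefficient names b0 to b5, the function names, and the numeric values are hypothetical, not results from response.txt.

```python
def critical_point(b1, b2, b3, b4, b5):
    """Solve dy/dt = 0 and dy/dp = 0 for the full quadratic model.

    dy/dt = b1 + 2*b2*t + b5*p = 0
    dy/dp = b3 + b5*t + 2*b4*p = 0
    The 2x2 linear system is solved with Cramer's rule.
    """
    det = 4.0 * b2 * b4 - b5 * b5
    if det == 0:
        raise ValueError("no unique critical point (singular system)")
    t_star = (b5 * b3 - 2.0 * b4 * b1) / det
    p_star = (b5 * b1 - 2.0 * b2 * b3) / det
    return t_star, p_star

def classify(b2, b4, b5):
    """Classify the critical point with the second-derivative test."""
    det = 4.0 * b2 * b4 - b5 * b5  # determinant of the Hessian
    if det < 0:
        return "saddle point"
    if det > 0:
        return "maximum" if b2 < 0 else "minimum"
    return "inconclusive"

# Hypothetical coefficients for illustration only; NOT fitted values.
b1, b2, b3, b4, b5 = 1.0, -1.0, 1.0, -1.0, 1.0
t_star, p_star = critical_point(b1, b2, b3, b4, b5)
print(t_star, p_star)       # 1.0 1.0
print(classify(b2, b4, b5)) # maximum
```

The intercept b0 does not appear because it drops out of both partial derivatives. If the model is fitted on the scaled variables t and p, the critical point is in scaled units.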

Part B is an introduction to a field of study called response surface analysis.