Please find the accompanying file Kraft_with_Error.txt. In this assignment, the goal is to model a physical property of chemical compounds called the Krafft point from three potential predictor variables, using a dataset of n = 32 observations, and to assess the fit of this model with regression diagnostics. The variables are:
Y = Krafft point (Krafft), the dependent (response) variable
X1 = Randic Index (RA)
X2 = Reciprocal of volume of the tail of the molecule (VTINV)
X3 = Heat of Formation (HEAT)
The regression model we will consider is the multiple linear regression of Krafft on all three predictors:
Y = β0 + β1X1 + β2X2 + β3X3 + ε
One data point has been altered in the file Kraft_with_Error.txt. Using only diagnostics produced by SPSS (the outlier detection measures discussed in previous lectures), determine which data point has been altered from the original dataset (Krafft.txt). The altered data point has a Krafft point value far from what is expected: it is either much larger or much smaller than the regression model predicts.
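The main SPSS measure for spotting a single wildly wrong response value is the studentized deleted residual. As a cross-check, here is a minimal Python/NumPy sketch of how that quantity is computed. It is an illustration rather than part of the required SPSS workflow. The function names are hypothetical, and it assumes the predictors have already been loaded into an (n, 3) array in the order RA, VTINV, HEAT, with Krafft as the response. The file layout is not described here, so loading is left out.

```python
import numpy as np

def studentized_deleted_residuals(X, y):
    """Studentized deleted (jackknifed) residuals t_i for an OLS fit with intercept.

    X: (n, k) predictor matrix WITHOUT an intercept column (one is added here).
    y: (n,) response vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])    # design matrix with intercept
    p = Xd.shape[1]                          # number of coefficients incl. intercept
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    e = y - Xd @ beta                        # ordinary residuals
    Q, _ = np.linalg.qr(Xd)                  # hat matrix H = Q Q^T for full-rank Xd
    h = np.sum(Q**2, axis=1)                 # leverages h_ii
    sse = e @ e
    # Closed form that avoids refitting the model n times:
    # t_i = e_i * sqrt((n - p - 1) / (SSE * (1 - h_ii) - e_i^2))
    return e * np.sqrt((n - p - 1) / (sse * (1.0 - h) - e**2))

def most_extreme_case(X, y):
    """0-based index and value of the largest |t_i|, the prime suspect for the altered point."""
    t = studentized_deleted_residuals(X, y)
    i = int(np.argmax(np.abs(t)))
    return i, t[i]
```

The returned index is 0-based, whereas SPSS case numbers start at 1. A case whose |t_i| is far larger than all the others, well beyond the usual 2 to 3 range, is the natural candidate for the altered Krafft value.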
Once you have identified the data point whose Krafft point value is an obvious extreme outlier, correct it by changing its Krafft value to 29.0. Then rerun the regression analysis in SPSS on the corrected dataset, and evaluate the proposed model using the regression diagnostics SPSS produces. You may use any of the diagnostics for outlier and influential data point detection, as well as collinearity diagnostics. Be sure to include the following in your analysis:
• Start with a correlation matrix of all variables in the dataset, along with partial regression plots.
• Perform an analysis of residuals to detect outliers and violations of the model assumptions. This includes inspecting:
o Studentized deleted (SD, i.e., jackknifed) residuals, leverages, DFFITS, and DFBETAS.
• Diagnostic plots (e.g., residual plots such as a P-P plot or a histogram of the SD residuals).
• Measures of collinearity:
o Variance Inflation Factors
o Variance proportions
o Eigenvalues
o Condition Number and Indices.
• Based on your analysis, what would you propose as the best model for the Krafft point, using either all three predictor variables or a subset of them?
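For the first item above (correlation matrix and partial regression plots), the following minimal NumPy sketch shows what SPSS computes. The function names are hypothetical, and X is assumed to be the (n, 3) predictor array. A partial regression (added-variable) plot for predictor x_j plots the residuals of y regressed on the other predictors against the residuals of x_j regressed on those same predictors:

```python
import numpy as np

def correlation_matrix(X, y):
    """Pearson correlations among [y, x1, ..., xk]; row/column 0 is the response."""
    return np.corrcoef(np.column_stack([y, X]), rowvar=False)

def added_variable_coords(X, y, j):
    """Points (ex, ey) for the partial regression plot of predictor j.

    ey: residuals of y regressed on the other predictors (plus intercept)
    ex: residuals of x_j regressed on the same other predictors
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    ey = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    ex = X[:, j] - Z @ np.linalg.lstsq(Z, X[:, j], rcond=None)[0]
    return ex, ey
```

Plot ey against ex with any plotting library. By the Frisch-Waugh-Lovell theorem, the least-squares slope through these points equals the coefficient of x_j in the full model. That is why the plot isolates x_j's contribution after adjusting for the other predictors and makes points that drive that coefficient easy to see.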
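The residual and influence measures listed above (leverages, DFFITS, DFBETAS) all follow from a single OLS fit. Here is a minimal NumPy sketch, assuming a model with an intercept. The function names and the cutoffs are assumptions: the cutoffs are common textbook rules of thumb, not values fixed by the assignment. If I recall SPSS's conventions correctly, its saved leverage (LEVER) is the centered value h_ii − 1/n and its standardized versions are SDFIT and SDBETA, so check your SPSS output definitions before comparing numbers.

```python
import numpy as np

def influence_measures(X, y):
    """Leverages h_ii, DFFITS and DFBETAS (n x p, intercept first) for OLS with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])
    p = Xd.shape[1]
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    beta = XtX_inv @ Xd.T @ y
    e = y - Xd @ beta
    h = np.einsum("ij,jk,ik->i", Xd, XtX_inv, Xd)   # h_ii = x_i' (X'X)^{-1} x_i
    sse = e @ e
    mse_del = (sse - e**2 / (1 - h)) / (n - p - 1)   # MSE with case i deleted
    t = e / np.sqrt(mse_del * (1 - h))                # studentized deleted residuals
    dffits = t * np.sqrt(h / (1 - h))
    # b - b_(i) = (X'X)^{-1} x_i e_i / (1 - h_ii); column i holds case i's change
    delta = (XtX_inv @ Xd.T) * (e / (1 - h))
    dfbetas = delta / np.sqrt(np.outer(np.diag(XtX_inv), mse_del))
    return h, dffits, dfbetas.T

def flag_influential(h, dffits, dfbetas):
    """0-based indices exceeding common rule-of-thumb cutoffs."""
    n, p = dfbetas.shape
    return {
        "leverage": np.where(h > 2 * p / n)[0],
        "dffits": np.where(np.abs(dffits) > 2 * np.sqrt(p / n))[0],
        "dfbetas": np.where((np.abs(dfbetas) > 2 / np.sqrt(n)).any(axis=1))[0],
    }
```

The closed forms avoid refitting the model once per deleted case. For a dataset of 32 observations, a stricter cutoff of 1 for |DFFITS| and |DFBETAS| is also commonly used, so treat the flags as a shortlist to inspect rather than a verdict.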
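The collinearity measures listed above can be reproduced with the following sketch. It assumes that SPSS's collinearity diagnostics work on the design matrix including the constant, with each column scaled to unit length but not centered, which matches SPSS's description of "the scaled and uncentered cross-products matrix". The function name is hypothetical. Condition indices are sqrt(λ_max / λ_j), and the largest one is the condition number.

```python
import numpy as np

def collinearity_diagnostics(X):
    """VIFs, eigenvalues, condition indices and variance proportions.

    X: (n, k) predictor matrix WITHOUT an intercept column.
    var_prop has one row per dimension (eigenvalue) and one column per
    coefficient (constant first); each column sums to 1.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # VIF_k = 1 / (1 - R_k^2) = k-th diagonal of the inverse predictor correlation matrix
    vif = np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))
    # Design matrix with constant, columns scaled to unit length (not centered)
    Xd = np.column_stack([np.ones(n), X])
    Xs = Xd / np.linalg.norm(Xd, axis=0)
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    eigvals = s**2                                # eigenvalues of Xs'Xs, largest first
    cond_index = np.sqrt(eigvals[0] / eigvals)    # last entry = condition number
    phi = (Vt.T**2) / eigvals                     # phi[k, j] = v_kj^2 / lambda_j
    var_prop = (phi / phi.sum(axis=1, keepdims=True)).T
    return vif, eigvals, cond_index, var_prop
```

Because each scaled column has unit length, the eigenvalues sum to the number of coefficients (4 here). A dimension with a large condition index that also carries high variance proportions for two or more coefficients points to the predictors involved in a near-dependency.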
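For the final question, one simple way to compare the full model with its subsets is to fit every non-empty subset of predictors and rank them by adjusted R². The sketch below does this. The function name is hypothetical, and adjusted R² is only one possible criterion (Mallows' Cp or PRESS are common alternatives). Any choice should also be weighed against the residual, influence, and collinearity findings.

```python
import itertools
import numpy as np

def subset_table(X, y, names):
    """Fit every non-empty predictor subset; return (names, adjusted R^2) sorted best first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    sst = np.sum((y - y.mean()) ** 2)
    rows = []
    for r in range(1, k + 1):
        for cols in itertools.combinations(range(k), r):
            Xd = np.column_stack([np.ones(n), X[:, list(cols)]])
            b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
            sse = np.sum((y - Xd @ b) ** 2)
            p = Xd.shape[1]                       # coefficients incl. intercept
            adj_r2 = 1 - (sse / (n - p)) / (sst / (n - 1))
            rows.append(([names[c] for c in cols], adj_r2))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

With three predictors this fits only 7 models, so an exhaustive search is cheap. For example, `subset_table(X, y, ["RA", "VTINV", "HEAT"])` would list the candidate Krafft models from best to worst adjusted R².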