## Analyzing Accounts

Open the file Assignment 4 Westend House Sales 2021.xlsx. The workbook contains the Revised data (with missing values estimated), the Dictionary, and the model showing Price as a function of Space and Beds, including the Residuals.

1. Can we improve the model by taking account of the condition of the basement or whether there is a garage? Copy the Residuals to a new sheet. Copy the variables, Basement and Garage, to this sheet.
a. Construct a Box and Whisker chart of the Residuals. Take a screen shot of the chart and paste it into your assignment.
b. Edit the chart to put Basement on the horizontal axis. Label the horizontal axis as Basement Condition. Take a screen shot of the chart and paste it into your assignment.
c. Edit the chart to remove Basement and put Garage on the horizontal axis. Label the horizontal axis as Garage. Take a screen shot of the chart and paste it into your assignment.
d. Do you think taking account of Basement or Garage will improve the model?
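If you would like to back up what you see in the charts with numbers, the sketch below computes the five-number summary (minimum, Q1, median, Q3, maximum) of the residuals within each category, which is the information a Box and Whisker chart displays. The residuals and garage labels are made-up placeholders, not values from the Westend data, and the function names are only illustrative. Quartiles are computed with Python's "inclusive" method, which may differ slightly from the quartiles Excel draws.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))

def summarize_by_group(residuals, groups):
    """Split residuals by category label and summarize each group."""
    by_group = {}
    for r, g in zip(residuals, groups):
        by_group.setdefault(g, []).append(r)
    return {g: five_number_summary(v) for g, v in by_group.items()}

# Hypothetical residuals (in $000) and garage labels, for illustration only.
residuals = [-40, 10, 25, -15, 60, -5, 30, -20]
garage = ["no", "yes", "yes", "no", "yes", "no", "yes", "no"]

summary = summarize_by_group(residuals, garage)
for label, stats in summary.items():
    print(label, stats)
```

If the boxes for two categories sit at clearly different heights (different medians and quartiles), that variable explains part of what the current model misses.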
2. Let us recode Garage as a 0-1 variable and add it to the model with Space and Beds.
• Copy Beds, Space, Garage and Price to a new sheet.
• Insert a column between Space and Garage and call it Garage01.
• In the first data cell of the Garage01 column (cell C2, assuming Garage01 is in column C and Garage is in column D), type =IF(D2="yes",1,0). Use straight quotation marks; Excel rejects curly quotes. If done correctly, the result should be 1, since the first home has a garage.
• Copy this formula into the rest of the Garage01 column.
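The recoding steps above can be sketched outside Excel as well. The short Python function below mirrors the logic of =IF(D2="yes",1,0): it returns 1 when the value is "yes" and 0 for anything else, including blanks. The sample values are hypothetical, and the comparison ignores case, as Excel's text comparison does.

```python
def recode_yes_no(values):
    """Return 1 for "yes" (any capitalization) and 0 for every other value."""
    return [1 if str(v).lower() == "yes" else 0 for v in values]

# Hypothetical Garage column, for illustration only.
garage = ["yes", "no", "Yes", "no", ""]
garage01 = recode_yes_no(garage)
print(garage01)  # [1, 0, 1, 0, 0]
```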
a. Fit the regression model Estimated Average Price = intercept + coeff1\*Space + coeff2\*Beds + coeff3\*Garage01. Take a screen shot of the Regression output and paste it into your assignment.
b. Write out the regression equation.
c. Explain the meaning of the coefficient for Garage01.
d. In Assignment 3, we found that just using Space and Beds we obtained an R-sq of 62.3% and a standard error of \$83,683. What are the R-sq and standard error for the new model? Are our predictions better?
e. In question 1, you likely found that the presence of a garage did not affect the residuals and that its inclusion would not be helpful. Look at the standard error for the Garage01 coefficient. The value of the coefficient is an estimate of the correct value. By the Empirical Rule, we should not be surprised if the true value of the Garage01 coefficient is one standard error more or less than the estimated value. Within what range would you expect the true value to be?
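The range asked for in part (e) is simple arithmetic: estimate minus and plus a multiple of the standard error. The sketch below shows the calculation for one and two standard errors (by the Empirical Rule, roughly 68% and 95% coverage). The estimate and standard error are placeholder numbers, not the values from your Regression output, and the function names are only illustrative.

```python
def coefficient_range(estimate, std_error, k=1):
    """Return (low, high) = estimate -/+ k standard errors."""
    return (estimate - k * std_error, estimate + k * std_error)

def range_includes_zero(estimate, std_error, k=2):
    """True if zero lies within k standard errors of the estimate."""
    low, high = coefficient_range(estimate, std_error, k)
    return low <= 0 <= high

# Placeholder values, for illustration only.
estimate = 20000.0
std_error = 12000.0

one_se = coefficient_range(estimate, std_error, k=1)
two_se = coefficient_range(estimate, std_error, k=2)
print(one_se)  # (8000.0, 32000.0)
print(two_se)  # (-4000.0, 44000.0)
print(range_includes_zero(estimate, std_error, k=2))  # True
```

When zero falls inside the range, the data are consistent with the variable having no effect at all.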
3. We previously found that the List price was a very good predictor of the final selling Price. COVID is believed to have had a significant impact on the housing market. The variable “Days” reflects how many days before February 20, 2021 the house was put up for sale. For example, a house with Days=300 would be a house that was listed for sale in late April 2020. If you were to look at the distribution of Days, you would find that there was a surge in houses on the market in summer 2020, but then the supply dropped in the fall of 2020. It is believed that the increase in supply occurred because prices rose rapidly due to COVID.

But if the List price is too high, it will likely take longer to sell. The variable Market reflects the number of days that the house was on the market before it sold. If there is strong demand for a house, it will receive one or more offers very soon after listing, although it may take some time to close the sale.

So Days and Market may give additional insight into the relationship between List and Price. Copy the columns List, Days, Market and Price into a new sheet.

a. Fit the model Estimated Average Price = intercept + coeff1\*List + coeff2\*Days + coeff3\*Market. Take a screen shot of the Regression output and paste it into your assignment.
b. For the simple model in Assignment #3, with Price predicted from List alone, the R-sq was 90.77% and the standard error was \$41,166. What are the R-sq and standard error for the new model?
c. Explain the coefficient for Days. What does it tell you about prices during the pandemic?
d. Look at the standard error of the Days coefficient. Do you think that the real average increase in house prices could really be as low as zero? That is, the coefficient is just an estimate of the average change in house prices per day. Could the estimate be so far off that there really is no increase in average prices?
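When comparing models in part (b), it can help to see how the two summary figures are computed. The sketch below calculates R-sq as 1 - SSE/SST and the standard error of the regression as sqrt(SSE / (n - k - 1)), where k is the number of predictors. The actual and predicted prices are made-up numbers, and the function name is only illustrative.

```python
from math import sqrt

def fit_summary(actual, predicted, n_predictors):
    """Return (R-sq, standard error) for a model fitted with an intercept."""
    n = len(actual)
    mean_y = sum(actual) / n
    # Sum of squared residuals and total sum of squares about the mean.
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    sst = sum((y - mean_y) ** 2 for y in actual)
    r_sq = 1 - sse / sst
    std_err = sqrt(sse / (n - n_predictors - 1))
    return r_sq, std_err

# Hypothetical prices (in $000), for illustration only.
actual = [100, 200, 300, 400, 500]
predicted = [110, 190, 310, 390, 500]

r_sq, std_err = fit_summary(actual, predicted, n_predictors=1)
print(round(r_sq, 3), round(std_err, 2))  # 0.996 11.55
```

A higher R-sq together with a lower standard error indicates tighter predictions. The standard error also charges for each added predictor through the n - k - 1 divisor.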

1. In Assignment #2, we examined patterns among variables in the MGSC 1206 student surveys of 2020 and 2021 by using pivot tables. We used % of Row or Column totals for several questions. The values in the tables were estimates of probabilities and conditional probabilities based on relative frequencies. To better understand these pivot tables, it is useful to examine the probabilities in more detail. Let us look at the questions that asked students how often they worried about grades and how often they experienced depression. The observed frequencies are shown below. Rows and columns have been grouped so that there are only three levels for each variable. If you wish to use Excel to answer the questions, open the file Assignment 4 data – class surveys 2020 and 2021.xlsx. You can also answer the questions by hand if you wish.

a. What is the probability that a randomly selected student is depressed more than once per month?
b. What is the probability that a randomly selected student is depressed more than once per month and worries about grades most of the time?
c. What is the probability that a randomly selected student is depressed more than once per month or worries about grades most of the time?
d. Among students who are depressed more than once per month, what is the chance that a randomly selected student also worries about grades most of the time?
e. Among students who worry about grades most of the time, what is the chance that a randomly selected student is also depressed more than once per month?
f. Are the events “depressed more than once per month” and “worry about grades most of the time” independent? Explain and justify your answer using probability.
g. Are the events “depressed more than once per month” and “worry about grades most of the time” mutually exclusive? If they are not mutually exclusive, identify any pair of events that are mutually exclusive in this exercise and explain why they are mutually exclusive.
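The calculations in parts (a) to (g) all come from the same table of counts. The sketch below works through them for a made-up 3-by-3 table. The counts and the row and column labels are hypothetical, not the survey results, so your answers will differ. Here A is "depressed more than once per month" and B is "worries about grades most of the time".

```python
from fractions import Fraction

# Hypothetical counts, NOT the survey data. Rows: depression; columns: worry.
counts = {
    "depressed_often":     {"worry_most": 30, "worry_some": 20, "worry_rarely": 10},
    "depressed_sometimes": {"worry_most": 25, "worry_some": 30, "worry_rarely": 15},
    "depressed_rarely":    {"worry_most": 15, "worry_some": 30, "worry_rarely": 25},
}
total = sum(sum(row.values()) for row in counts.values())

def p_row(row):
    """Marginal probability of a depression level."""
    return Fraction(sum(counts[row].values()), total)

def p_col(col):
    """Marginal probability of a worry level."""
    return Fraction(sum(r[col] for r in counts.values()), total)

def p_joint(row, col):
    """Joint probability of one cell of the table."""
    return Fraction(counts[row][col], total)

p_a = p_row("depressed_often")                      # P(A)
p_b = p_col("worry_most")                           # P(B)
p_a_and_b = p_joint("depressed_often", "worry_most")  # P(A and B)
p_a_or_b = p_a + p_b - p_a_and_b                    # addition rule
p_b_given_a = p_a_and_b / p_a                       # P(B | A)
p_a_given_b = p_a_and_b / p_b                       # P(A | B)

independent = p_a_and_b == p_a * p_b   # independent only if P(A and B) = P(A)P(B)
mutually_exclusive = p_a_and_b == 0    # exclusive only if they never occur together

print(p_a, p_a_and_b, p_a_or_b, p_b_given_a, p_a_given_b)
print(independent, mutually_exclusive)
```

Two different levels of the same variable (for example, two rows of the table) cannot occur for the same student, which is what mutually exclusive means.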