Description
This assignment relates to the ‘Real-estate’ data set. People buying or selling houses would like to know how much they can expect to get, or pay, for a property. This is also a concern for those who make mortgage loans and for those who tax real estate (and who are more likely than individual homeowners to commission statistical studies). The price of a house depends on its physical characteristics, including size, features, quality of construction, and age. It also depends on location and on current market conditions. You are approached by a research group that has data on a sample of residential sales in a Midwestern city; the variables are described in the table below.
- Read the data into R. Call the loaded data “real.estate”.
- Answer the following sub-questions:
- Use the “summary()” function to identify the types of variables. Which variables are categorical? Which variables are quantitative? Are there any concerns in the summary table? Explain.
- Use the “pairs()” or “gpairs()” function to produce a scatterplot matrix of the first ten columns (variables) of the data. Recall that you can reference the first ten columns of a matrix A using A[, 1:10]. Are there any interesting patterns? Which variables seem to be associated with the sales price? Explain.
- Use the “as.factor()” function to convert the categorical variables to factors.
- Fit the models, addressing the following as you build them:
- Fit the null model and the full model.
- Find the best set of predictors using stepwise procedures.
- Find the best sets of predictors using the best subset approach.
- Considering the models from parts (ii) and (iii), that is, the stepwise and best-subset results, choose the best model.
- Interpret the coefficients of the best model in this context.
- Evaluate the best model.
- Check the assumptions using the “plot()” function.
- Continue exploring the data, and provide a brief summary of what you discover.
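
The steps above can be sketched in R roughly as follows. This is only an illustrative outline on simulated data, not a solution. The column names (price, sqft, age, quality) and the simulated values are hypothetical stand-ins, since the actual variables are the ones listed in the assignment's table. The best-subset step assumes the “leaps” package is installed and is skipped otherwise.

```r
# Simulated stand-in for the real data set (hypothetical columns and values).
set.seed(1)
n <- 200
real.estate <- data.frame(
  sqft    = round(runif(n, 800, 3500)),
  age     = sample(1:80, n, replace = TRUE),
  quality = sample(1:3, n, replace = TRUE)  # stored as integers, really a category
)
real.estate$price <- 50000 + 90 * real.estate$sqft - 400 * real.estate$age +
  c(40000, 15000, 0)[real.estate$quality] + rnorm(n, sd = 20000)

# Inspect variable types; quality is summarised as numeric at this point.
summary(real.estate)

# Scatterplot matrix of the columns (the assignment uses the first ten).
pairs(real.estate)

# Recode the categorical variable as a factor.
real.estate$quality <- as.factor(real.estate$quality)

# Null model (intercept only) and full model (all predictors).
null.model <- lm(price ~ 1, data = real.estate)
full.model <- lm(price ~ ., data = real.estate)

# Stepwise search in both directions, starting from the null model.
# step() compares candidate models by AIC by default.
step.model <- step(null.model,
                   scope = list(lower = formula(null.model),
                                upper = formula(full.model)),
                   direction = "both", trace = 0)
summary(step.model)

# Best-subset search, only if the leaps package is available.
if (requireNamespace("leaps", quietly = TRUE)) {
  subsets <- leaps::regsubsets(price ~ ., data = real.estate, nvmax = 4)
  print(summary(subsets)$bic)  # one BIC value per subset size
}

# Standard diagnostic plots for checking the model assumptions.
par(mfrow = c(2, 2))
plot(step.model)
```

Starting the stepwise search from the null model with direction = "both" is just one reasonable choice; starting from the full model, or using BIC via the k argument of step(), may select a different set of predictors.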