STATISTICS 110/201, FALL 2017
Homework #5 Solutions
Assigned Mon, November 6; Due Wed, November 15

For this assignment, use the Diamonds dataset in the Stat2Data library. The dataset is used in examples in Sections 3.4 and 3.5 of the book and in part of the Discussion on Friday, November 3. The pages from the book with the exercises below are linked on the course webpage, along with this assignment, for those who don't have a copy of the book.

1. Do Exercise 3.23 on page 157. For each model in parts a to d, show the results of "summary(model)" from R, then choose a model and explain your choice, as instructed. R instructions for creating second-order models and adding quadratic (squared) terms and interactions were given in Discussion 5 (Nov 3) and Lecture 11 (Nov 6).

a.

    > summary(depth)
    Call:
    lm(formula = TotalPrice ~ Depth + I(Depth^2))

    Residuals:
       Min     1Q Median     3Q    Max
     -9323  -4251  -2676   2134  45513

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -28406.783 112211.790  -0.253    0.800
    Depth          766.369   3353.222   0.229    0.819
    I(Depth^2)      -3.233     24.869  -0.130    0.897

    Residual standard error: 7616 on 348 degrees of freedom
    Multiple R-squared: 0.04748, Adjusted R-squared: 0.042
    F-statistic: 8.673 on 2 and 348 DF, p-value: 0.0002111

b.

    > summary(both)
    Call:
    lm(formula = TotalPrice ~ Carat + Depth)

    Residuals:
        Min      1Q  Median      3Q     Max
    -9234.7 -1223.7  -274.3  1161.0 16368.6

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  1059.24    1918.36   0.552    0.581
    Carat       15087.01     320.96  47.006  < 2e-16 ***
    Depth        -134.94      30.92  -4.364 1.68e-05 ***

    Residual standard error: 2809 on 348 degrees of freedom
    Multiple R-squared: 0.8704, Adjusted R-squared: 0.8696
    F-statistic: 1168 on 2 and 348 DF, p-value: < 2.2e-16

c.

    > summary(inter)
    Call:
    lm(formula = TotalPrice ~ Depth * Carat)

    Residuals:
        Min      1Q  Median      3Q     Max
    -8254.4 -1311.5  -157.2  1131.8 14513.9

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)  31171.41    4219.58   7.387 1.13e-12 ***
    Depth         -598.18      65.47  -9.137  < 2e-16 ***
    Carat       -11827.73    3436.47  -3.442 0.000648 ***
    Depth:Carat    408.45      51.96   7.861 4.84e-14 ***

    Residual standard error: 2592 on 347 degrees of freedom
    Multiple R-squared: 0.89, Adjusted R-squared: 0.889
    F-statistic: 935.7 on 3 and 347 DF, p-value: < 2.2e-16

d.

    > summary(second)
    Call:
    lm(formula = TotalPrice ~ Depth * Carat + I(Depth^2) + I(Carat^2))

    Residuals:
         Min       1Q   Median       3Q      Max
    -12196.1   -652.7    -38.5    485.7  10582.2

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept)  24338.820  30297.912   0.803   0.4223
    Depth         -728.700    904.439  -0.806   0.4210
    Carat         7573.620   3040.787   2.491   0.0132 *
    I(Depth^2)       5.276      6.727   0.784   0.4333
    I(Carat^2)    4761.592    330.246  14.418   <2e-16 ***
    Depth:Carat    -83.891     53.530  -1.567   0.1180

    Residual standard error: 2053 on 345 degrees of freedom
    Multiple R-squared: 0.9313, Adjusted R-squared: 0.9304
    F-statistic: 936.1 on 5 and 345 DF, p-value: < 2.2e-16

Choice of best model: The usual criteria are the largest Adjusted R-squared, the smallest MSE, or the smallest residual standard error. These criteria always pick the same model, because for a fixed response and dataset each one is a function of MSE alone. By any of them, the best of the four models here and the two in Example 3.11 is the one in part (d), the full second-order model.

2. Do Exercise 3.24 on page 158. R instructions for creating the log of a variable were given in Lecture 3 (Oct 9) and Discussion 2 (Oct 13). R instructions for creating plots to check conditions were given in Discussion 2 (Oct 13) and Discussion 5 (Nov 3).

3.24 a. To examine the constant variance condition, the appropriate graph is the plot of residuals versus fitted values, shown on the right. To examine the normality condition, an appropriate graph is a normal probability plot, shown on the right below, or a histogram of the residuals, shown on the left below. All three graphs show that the conditions are not met. In the residuals versus fitted values plot, the variance appears to increase as the fitted values increase. In the normal probability plot, the points deviate substantially from the line, which indicates substantial non-normality. The histogram shows the same problem.
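The model choice in Exercise 3.23 relies on the claim that Adjusted R-squared and the residual standard error always agree. This can be checked from the printed output alone. Adjusted R-squared equals 1 - MSE/s², where MSE = RSE² and s² is the sample variance of TotalPrice. It follows that RSE²/(1 - Adjusted R²) should recover roughly the same s² for every model. The Python sketch below is illustrative only and the model labels are hypothetical shorthand. It re-uses the reported numbers and does not refit anything.

```python
# Consistency check for the model choice in Exercise 3.23.
# Numbers are copied from the summary() output above; nothing is refit.
# Adjusted R^2 = 1 - MSE / s_y^2, with MSE = RSE^2 and s_y^2 the sample
# variance of TotalPrice, which is the same for every model on these data.

models = {
    "a: Depth + Depth^2": {"rse": 7616, "adj_r2": 0.042},
    "b: Carat + Depth": {"rse": 2809, "adj_r2": 0.8696},
    "c: Depth * Carat": {"rse": 2592, "adj_r2": 0.889},
    "d: full second order": {"rse": 2053, "adj_r2": 0.9304},
}

# Rank by smallest residual standard error and by largest Adjusted R^2.
by_rse = sorted(models, key=lambda name: models[name]["rse"])
by_adj_r2 = sorted(models, key=lambda name: models[name]["adj_r2"], reverse=True)

# Back out s_y^2 = RSE^2 / (1 - Adjusted R^2) for each model; it should be
# nearly constant because all four models share the same response.
implied_var = {
    name: m["rse"] ** 2 / (1 - m["adj_r2"]) for name, m in models.items()
}

print("Ranking by RSE:   ", by_rse)
print("Ranking by adj R2:", by_adj_r2)
for name, v in implied_var.items():
    print(f"{name:22s} implied sd of TotalPrice = {v ** 0.5:8.1f}")
```

With these inputs, both rankings come out as d, c, b, a. The implied standard deviation of TotalPrice is roughly 7,780 for all four models, and the small differences come from rounding in R's printed output.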

3.24 b. The output for the model with logprice as the response is shown below. The model is still a reasonable choice. It has a high Adjusted R-squared, and all of the coefficients have relatively small p-values, although two of them (I(Depth^2) and Depth:Carat) are above 0.05. You might want to try dropping the interaction term to see what happens, but that is not required.

    > summary(second2)
    Call:
    lm(formula = logprice ~ Depth * Carat + I(Depth^2) + I(Carat^2))

    Residuals:
         Min       1Q   Median       3Q      Max
    -0.85021 -0.13209  0.01441  0.13613  0.79710

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) 13.5049624  3.4020467   3.970 8.76e-05 ***
    Depth       -0.2027689  0.1015563  -1.997   0.0467 *
    Carat        2.5863485  0.3414393   7.575 3.33e-13 ***
    I(Depth^2)   0.0013384  0.0007553   1.772   0.0773 .
    I(Carat^2)  -0.5714071  0.0370821 -15.409  < 2e-16 ***
    Depth:Carat  0.0095943  0.0060107   1.596   0.1114

    Residual standard error: 0.2306 on 345 degrees of freedom
    Multiple R-squared: 0.9302, Adjusted R-squared: 0.9292
    F-statistic: 919.9 on 5 and 345 DF, p-value: < 2.2e-16

3.24 c. The plot of the residuals versus fitted values is shown on the right, and the plots to check normality are shown below. The log transformation is not perfect, but it has clearly helped with both the constant variance and the normality conditions.
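Exercise 3.25 below uses a nested F test, which compares two models through their residual sums of squares: F = [(RSS_reduced - RSS_full)/q] / [RSS_full/df_full], where q is the number of coefficients set to 0 under the null hypothesis. Below is a minimal Python sketch that assumes only the RSS values and residual degrees of freedom printed by anova(). The helper name nested_f is hypothetical.

```python
# Nested (partial) F statistic, reproduced from the anova() output for
# Exercise 3.25. Values are copied from the printed table; nothing is refit.

def nested_f(rss_reduced, df_reduced, rss_full, df_full):
    """Partial F statistic comparing a reduced model with a full model.

    rss_* are residual sums of squares; df_* are residual degrees of freedom.
    """
    q = df_reduced - df_full            # number of coefficients being tested
    extra_ss = rss_reduced - rss_full   # drop in RSS from adding those terms
    mse_full = rss_full / df_full       # error variance estimate from full model
    return (extra_ss / q) / mse_full

# Reduced model: TotalPrice ~ Carat + I(Carat^2)                         (348 df)
# Full model:    TotalPrice ~ Depth * Carat + I(Depth^2) + I(Carat^2)    (345 df)
f_stat = nested_f(1574044410, 348, 1454702094, 345)
print(f"F = {f_stat:.4f}")
```

This reproduces F ≈ 9.4345. The p-value still comes from R, since Python's standard library has no F distribution. As a cross-check, sqrt(1454702094/345) ≈ 2053, which is the residual standard error reported for the full second-order model in Exercise 3.23 d.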

3. Do Exercise 3.25 on page 158. R instructions for nested F tests were given in Lecture 8 (Oct 25), Lecture 11 (Nov 6), and Discussion 5 (Nov 3).

3.25 The models being compared are:

Full model:
$$\text{TotalPrice} = \beta_0 + \beta_1\,\text{Carat} + \beta_2\,\text{Depth} + \beta_3\,\text{Carat}^2 + \beta_4\,\text{Depth}^2 + \beta_5\,\text{Carat}\cdot\text{Depth} + \varepsilon$$

Reduced model:
$$\text{TotalPrice} = \beta_0 + \beta_1\,\text{Carat} + \beta_2\,\text{Carat}^2 + \varepsilon$$

Using the notation of the full model, the hypotheses are:

Null: $\beta_2 = \beta_4 = \beta_5 = 0$. In context, Depth, $\text{Depth}^2$ and $\text{Carat}\cdot\text{Depth}$ can be removed from the model.
Alternative: Not all of these coefficients are 0. In context, at least one of the terms involving Depth is needed in the model.

From the R output below, $F = 9.43$ and $p = 5.24 \times 10^{-6}$, so we clearly reject the null hypothesis. We conclude that at least one of the terms involving Depth is needed in the model. The R output is as follows:

    > second <- lm(TotalPrice ~ Depth * Carat + I(Depth^2) + I(Carat^2), data = Diamonds)
    > nodepth <- lm(TotalPrice ~ Carat + I(Carat^2), data = Diamonds)
    > anova(nodepth, second)
    Analysis of Variance Table

    Model 1: TotalPrice ~ Carat + I(Carat^2)
    Model 2: TotalPrice ~ Depth * Carat + I(Depth^2) + I(Carat^2)
      Res.Df        RSS Df Sum of Sq      F    Pr(>F)
    1    348 1574044410
    2    345 1454702094  3 119342316 9.4345 5.24e-06 ***

4. Do Exercise 3.26 on page 158. R instructions for prediction and confidence intervals for simple linear regression were given in Lecture 6 (Oct 18) and Discussion 3 (Oct 20), but you will need to extend them to multiple predictors. Here is an example of the command to create a confidence interval for the second-order model using the StateSAT data, for a state with 41% Takers and an Expend value of $25:

    predict(secondorder, list(Takers = 41, Expend = 25), se.fit = FALSE, interval = "confidence")

You could also define newdata first and then use it in the predict command:

    newdata <- data.frame(Takers = 41, Expend = 25)
    predict(secondorder, newdata, se.fit = FALSE, interval = "confidence")

3.26 a. The answer can be found using R or by computing it from the equation given in Example 3.11. The predicted price is $1794.84. (Depending on rounding, your answer could differ slightly.)

3.26 b. Here are the R code and results for a 95% confidence interval:

    > quad <- lm(TotalPrice ~ Carat + I(Carat^2), data = Diamonds)
    > predict(quad, list(Carat = 0.5), se.fit = FALSE, interval = "confidence")
           fit      lwr      upr
    1 1794.843 1424.296 2165.389

The 95% confidence interval is $1424.30 to $2165.39. Interpretation: We are 95% confident that the mean price of all 0.5 carat diamonds is between $1424.30 and $2165.39.

3.26 c. Here are the R code and results for a 95% prediction interval:

    > quad <- lm(TotalPrice ~ Carat + I(Carat^2), data = Diamonds)
    > predict(quad, list(Carat = 0.5), se.fit = FALSE, interval = "prediction")
           fit       lwr      upr
    1 1794.843 -2404.462 5994.147

The 95% prediction interval is -$2404.46 to $5994.15. Negative prices don't make sense, so we would write the prediction interval as $0 to $5994.15. Interpretation: Here are some possible ways to write it:

- We are 95% confident that the price of a randomly selected 0.5 carat diamond will be between $0 and $5994.15.
- We expect that 95% of all 0.5 carat diamonds will cost between $0 and $5994.15.
- For all 0.5 carat diamonds, about 95% of them will cost between $0 and $5994.15.

In any of the above, you could replace "between $0 and $5994.15" with something like "no more than $5994.15". For instance: "For all 0.5 carat diamonds, about 95% of them will cost no more than $5994.15."

3.26 d.
If you didn't already do so for 3.24, you first need to create the log variable, or you can take the log directly in the lm command:

    > lnfit <- lm(log(TotalPrice) ~ Depth * Carat + I(Depth^2) + I(Carat^2), data = Diamonds)

Then create the intervals:

    > predict(lnfit, list(Carat = 0.5, Depth = 62), se.fit = FALSE, interval = "confidence")
           fit      lwr      upr
    1 7.525992 7.484671 7.567314
    > predict(lnfit, list(Carat = 0.5, Depth = 62), se.fit = FALSE, interval = "prediction")
           fit      lwr      upr
    1 7.525992 7.070612 7.981373

Finally, exponentiate the point estimate and the interval endpoints to get the point estimate and intervals in dollars:

    > exp(7.525992)
    [1] 1855.653
    # The point estimate is $1855.65.
    > exp(7.484671)
    [1] 1780.538
    > exp(7.567314)
    [1] 1933.939
    # The confidence interval is $1780.54 to $1933.94.
    > exp(7.070612)
    [1] 1176.868
    > exp(7.981373)
    [1] 2925.946
    # The prediction interval is $1176.87 to $2925.95.
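The back-transformation in 3.26 d can also be sketched in Python, using the log-scale values printed by predict() above. Because exp() is increasing, it maps each interval endpoint to the matching endpoint in dollars. That is why the intervals remain valid after exponentiating. This is only an illustration, and the variable names are hypothetical.

```python
import math

# Log-scale results from predict(lnfit, ...) in 3.26 d (copied, not refit).
log_fit = 7.525992
log_ci = (7.484671, 7.567314)   # 95% confidence interval for the mean log price
log_pi = (7.070612, 7.981373)   # 95% prediction interval for a single log price

# exp() is strictly increasing, so it maps the lower/upper endpoints on the
# log scale to the lower/upper endpoints on the dollar scale.
price_fit = math.exp(log_fit)
price_ci = tuple(math.exp(x) for x in log_ci)
price_pi = tuple(math.exp(x) for x in log_pi)

print(f"Point estimate: ${price_fit:.2f}")
print(f"95% CI: ${price_ci[0]:.2f} to ${price_ci[1]:.2f}")
print(f"95% PI: ${price_pi[0]:.2f} to ${price_pi[1]:.2f}")
```

This prints the same dollar values as the R session above. One caveat applies if the errors are roughly normal on the log scale. In that case, exp() of the predicted mean log price estimates the median price rather than the mean price. The back-transformed confidence interval is therefore most naturally read as an interval for the median price of 0.5 carat, depth 62 diamonds.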