General Business 706 Midterm #3 November 25, 1997

There are 9 questions on this exam for a total of 40 points. Please be sure to put your name and ID in the spaces provided below. If you feel any question is unreasonable, you may submit a written discussion of your reasons to the proctor or instructor after the examination is over. During the examination you may ask questions of the proctors, but based on past data, 99% of the time the response to your question will be, "That is part of the question."

This is a closed-book, closed-note exam, with the exception that you are allowed six 8-1/2 x 11" sheets (both sides) for notes. It is possible to earn partial credit on all problems by showing your work. If you need more room, feel free to use the back side of this exam. Good Luck!

Name                    ID

Questions 1-5 relate to the following information

Computer Pricing Problem

For executives "on the go," small, portable microcomputers called notebooks are becoming an important tool in today's business world. In their August 1996 issue, PC Magazine compared a sample of the leading notebooks on the market at that time. All notebooks compared were Pentium class IBM-compatible machines. The comparisons included price as well as a number of features of the notebooks. Descriptions of the variables are in the attached table. Basic summary statistics are given on the page following the description table.

(4pts) 1 Manufacturers provide signals of quality through the product warranty. Does the warranty variable appear to affect price? To answer this, consider a one-way model of Price using warranty as an explanatory variable. Test whether warranty is a statistically significant factor. In your response, state the null and alternative hypotheses, the test statistic, your decision criterion and the result of the testing procedure. The computer output follows.

One-Way Analysis of Variance

Analysis of Variance for Price
Source     DF        SS       MS     F      P
Warranty    3  16707355  5569118  7.04  0.000
Error      54  42723141   791169
Total      57  59430496

Level   N    Mean   StDev
1      34  3515.3   641.1
2       2  2798.5   283.5
3      20  4580.7  1235.4
4       2  3995.0   282.8

Pooled StDev = 889.5

(5pts) 2 After further investigation, it was found that the variable LCD may have a significant impact on the price. To assess this impact in the presence of the categorical variable warranty, an added variable plot was created. Computer output to create the added variable plot is on the following page. The correlation in the added variable plot is 0.559.

i. Describe what we learn from an added variable plot. In your description, cite the partial correlation coefficient and compare this plot to a scatter plot.

ii. Use a t-test to decide whether LCD will be an important variable if you were to add it to the model described in Question 1. Use a 5% significance level.

iii. Is your decision from part (ii) supported by the added variable plot? Interpret the relationship between price and LCD in the added variable plot.

Computer Output to Create an Added Variable Plot

MTB > Oneway Price Warranty RESI1.

One-Way Analysis of Variance

Analysis of Variance for Price
Source     DF        SS       MS     F      P
Warranty    3  16707355  5569118  7.04  0.000
Error      54  42723141   791169
Total      57  59430496

Level   N    Mean   StDev
1      34  3515.3   641.1
2       2  2798.5   283.5
3      20  4580.7  1235.4
4       2  3995.0   282.8

Pooled StDev = 889.5

MTB > Oneway LCD Warranty RESI2.

One-Way Analysis of Variance

Analysis of Variance for LCD
Source     DF      SS     MS     F      P
Warranty    3   1.739  0.580  1.24  0.303
Error      54  25.140  0.466
Total      57  26.879

Level   N    Mean  StDev
1      34  11.038  0.666
2       2  10.400  0.000
3      20  11.100  0.732
4       2  11.700  0.566

Pooled StDev = 0.682

Added Variable Plot of Price versus LCD. The correlation coefficient of this plot is 0.559.

[Plot: RESI1 on the vertical axis (about -2000 to 2000) versus RESI2 on the horizontal axis (about -1 to 1).]
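As a rough illustration of how RESI1 and RESI2 are formed, the sketch below mirrors the two Oneway commands in Python. The six notebooks listed are made-up values, not the exam data, and the helper names residuals_by_group and correlation are hypothetical. It relies only on the fact that, in a one-way model, each residual is the observation minus its group mean.

```python
from math import sqrt
from statistics import mean


def residuals_by_group(values, groups):
    """Residuals from a one-way model: each value minus its own group mean."""
    group_means = {
        g: mean(v for v, gg in zip(values, groups) if gg == g)
        for g in set(groups)
    }
    return [v - group_means[g] for v, g in zip(values, groups)]


def correlation(x, y):
    """Ordinary Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for six notebooks (NOT the PC Magazine sample).
warranty = [1, 1, 1, 3, 3, 3]
lcd = [10.4, 11.3, 12.1, 10.4, 11.3, 12.1]
price = [3000, 3500, 4100, 4100, 4500, 5200]

resi1 = residuals_by_group(price, warranty)  # price with warranty removed
resi2 = residuals_by_group(lcd, warranty)    # LCD with warranty removed

# The correlation of the two residual series is the partial correlation
# of price and LCD, controlling for warranty.
partial_r = correlation(resi2, resi1)
print(round(partial_r, 3))
```

Plotting resi1 against resi2 gives the added variable plot; the correlation of those two series plays the role of the 0.559 quoted in the output.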

(6pts) 3 A final model was fit, using the categorical variables Warranty and Processo, the indicator variables Battery and Direct and the continuous variable LCD. The computer output follows. The most expensive notebook in the sample was observation number 34, the IBM ThinkPad 760ED, priced at $6,999.

i. Compute the leverage for this observation.

ii. Is this leverage unusual?

iii. The fitted value for the 34th observation is $5,644.87. Suppose that the ThinkPad was priced incorrectly and the true price should be $6,499. If you had re-run the regression with this new value of the response, then what would be the corresponding fitted value?

Regression Model Using Categorical Variables Warranty and Processo, Continuous Variables LCD, Battery and Direct

Factor    Levels  Values
Warranty       4  1 2 3 4
Processo       6  75 90 100 120 133 150

Source  DF        SS       MS
Model   11  40715423  3701402
Error   46  18715073   406849
Total   57  59430496

Term        Coef  StDev      T      P
Constant    -421   1795  -0.23  0.815
LCD        427.3  159.8   2.67  0.010
Battery   -638.4  204.7  -3.12  0.003
direct    -439.3  233.0  -1.89  0.066

Unusual Observations for Price
Obs    Price      Fit  Residual  St Resid
  6  2699.00  2699.00     -0.00         * X
  8  3499.00  4918.38  -1419.38     -2.44R
 17  6647.00  5006.47   1640.53      2.78R
 34  6999.00  5644.87   1354.13      2.26R
 49  3999.00  2541.76   1457.24      2.42R

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

(3pts) 4 A final model was fit, using the categorical variables Warranty and Processo, the indicator variables Battery and Direct and the continuous variable LCD. The computer output is above. The sixth observation has a zero residual and is a high leverage point. What characteristics of this notebook would yield these two unusual signals?

(8pts) 5 A final model was fit, using the categorical variables Warranty and Processo, the indicator variables Battery and Direct and the continuous variable LCD. The computer output is above. Suppose that you are interested in the effects of direct marketing on the price.

i. Assuming that this is the correct model, does direct marketing lead to lower prices? Justify your response in terms of a formal test of hypothesis using a t-test. Use a 5% level of significance and a one-sided test.

ii. Determine the p-value for your test in part (i).

iii. Criticism has been leveled regarding the sample of notebooks that you have selected for your test. Describe a population of interest that may be relevant and the sampling frame that you used.

iv. Another potential criticism regards limited dependent variable bias. Describe the limited nature of your dependent variable.

Variable Definitions

Variable Name       Variable Description
Computer            Name of the computer

Response Variable
Price               Price of the computer (street or direct)

Continuous Explanatory Variables
Battery Hours       The battery life length, in hours
LCD                 The size of the display, in inches
Travel Weight       The weight of the machine with necessary accessories for travel
Installed           The size of the installed hard drive, in GB
Maximum Hard        The maximum size of hard drive the system will support

Categorical Explanatory Variables
Processor           The speed of the CPU
Maximum Ram         The maximum amount of RAM the system can support
Graphics            The amount of graphics memory, in MB
Type of Graphics    Type of graphics system. 0 = DRAM, 1 = EDO DRAM, 2 = VRAM, 3 = SDRAM
Warranty            Standard warranty on parts/labor, in years
CD Speed            The speed of the CD-ROM drive. If no drive available, speed = 0

Indicator Explanatory Variables
Swappable           An indicator, 1 = yes, 0 = no. A modular floppy drive that can be removed
Internal            An indicator, 1 = yes, 0 = no. A built-in floppy drive
External            An indicator, 1 = yes, 0 = no. A floppy drive external to the computer
CD Rom              An indicator, 1 = yes, 0 = no. An integrated or connectable CD-ROM drive
Cache               An indicator showing if the system employs a processing cache
Modem               An indicator, 1 = yes, 0 = no. Indicates system has a modem
Infrared            An indicator, 1 = yes, 0 = no. A wireless information transfer device
NTSC in             An indicator, 1 = yes, 0 = no. A video input
NTSC out            An indicator, 1 = yes, 0 = no. A video output
Touch Pad           An indicator, 1 = yes, 0 = no. A pointing device
Track Ball          An indicator, 1 = yes, 0 = no. A pointing device
Point Stick         An indicator, 1 = yes, 0 = no. A pointing device
Battery Type        1 = Nickel Hydride, 0 = Lithium Ion
Direct              An indicator for direct computer companies such as Dell
Technical Support   An indicator, 1 = yes, 0 = no. 24-hour, 7-day live technical support

References
Metz, Cade. "Portable Property." PC Magazine, August 1996: 100-195.
Descriptive Statistics for Continuous Variables

Variable   N  N*    Mean  Median   StDev     Min     Max
Price     58   0    3875    3757    1021    2299    6999
Batt-hrs  57   1   2.153   2.000   0.903   0.767   4.867
LCD       58   0  11.060  11.250   0.687  10.200  12.100
travel w  58   0   7.984   8.100   1.086   4.700  10.300
Installe  58   0  1.0341  1.0000  0.2464  0.6860  1.3000
Max Hard  58   0  1.2974  1.3000  0.2947  0.8100  2.1000

Tabulated Statistics for Categorical Variables

Each row gives the level, the number of notebooks at that level, and their mean price.

Rows: Processo
 75    5  3300.8
 90    1  2699.0
100   13  3144.0
120   16  3769.6
133   20  4628.6
150    3  3920.0

Rows: Max Ram
 32    9  3400.6
 40   36  3687.0
 48    1  5198.0
 64    8  4056.0
 72    1  3795.0
 80    2  6823.0
144    1  6299.0

Rows: Graphics
  1   46  3837.3
  2   12  4017.3

Rows: Type Gra
  0   34  3751.9
  1    5  4791.6
  2   18  3800.1
  3    1  4799.0

Rows: Warranty
  1   34  3515.3
  2    2  2798.5
  3   20  4580.7
  4    2  3995.0

Rows: CD speed
  0   13  3027.5
  2    1  3325.0
  4   39  4044.0
  6    5  4864.4

Descriptive Statistics for Indicator Variables

Variable   N   Mean
Swappabl  58  0.638
internal  58  0.328
external  58  0.035
CD Rom    58  0.776
Cache     58  0.845
modem     58  0.138
infrared  58  0.897
NTSC in   58  0.017
NTSC out  58  0.276
touch pa  58  0.724
track ba  58  0.052
Point St  58  0.207
Battery   58  0.535
direct    58  0.276
Tech Sup  58  0.431

Questions 6-9 relate to the following information

Do lower prices correspond to higher sales? A response to this question is important to manufacturers competing in the United States automobile market, as in many other markets. To investigate this question, base model prices of 48 types of cars were collected from the April 1989 issue of Consumer Reports. Also in this issue was information on the miles per gallon (MPG) of each car, a measure of size of the car in terms of the number of passengers (2, 4, 5, or 6), and the manufacturer of each car. Manufacturers were categorized into six groups: FORD, General Motors (GM), JAPANESE (not including Mitsubishi), Mitsubishi (MITSUBSH), CHRYSLER, and EUROPE. The response variable of interest was the number of automobiles sold, collected from Automotive Weekly and Automotive Facts and Figures.

[Dotplot of SALES; horizontal axis from 0 to 400000.]

Variable   N     Mean  Median    StDev    Min      Max
SALES     48  123,337  85,150  101,693  2,743  416,957

REGRESSION ANALYSIS USING SALES AS THE RESPONSE

SALES = 13316 - 2.89 PRICE + 19,190 SIZE - 56,437 MITSUBSH + 127,515 FORD + 81,250 GM + 119,173 JAPANESE - 162 CHRYSLER

s = 82,163

[Plot of Standardized Residuals versus Fitted Values: SRES1 on the vertical axis (about -2 to 3) versus FITS1 on the horizontal axis (about 0 to 200000).]

(3pts) 6 You use residual analysis to evaluate the model using SALES as the response. As part of your analysis, you consider the plot of the standardized residuals versus the fitted values. (This plot is on the previous page.)

i. When we plot the standardized residuals versus the fitted values, for what are we looking?

ii. What does this plot tell you about the model using SALES as the response? How can you use this information to improve the model?
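To make the role of the manufacturer indicators concrete, here is a small sketch that evaluates the fitted SALES equation quoted above. The coefficients are copied from that equation; the function name predicted_sales and the example car (price 10,000, five passengers) are hypothetical. EUROPE has no indicator in the equation, so it acts as the reference level.

```python
# Coefficients copied from the fitted SALES equation in the output above.
COEF = {
    "CONSTANT": 13316,
    "PRICE": -2.89,
    "SIZE": 19190,
    "MITSUBSH": -56437,
    "FORD": 127515,
    "GM": 81250,
    "JAPANESE": 119173,
    "CHRYSLER": -162,
}
# Manufacturer groups that have their own indicator variable.
INDICATOR_GROUPS = ("MITSUBSH", "FORD", "GM", "JAPANESE", "CHRYSLER")


def predicted_sales(price, size, manufacturer):
    """Fitted SALES for a car; EUROPE is the omitted (reference) group."""
    fit = COEF["CONSTANT"] + COEF["PRICE"] * price + COEF["SIZE"] * size
    if manufacturer in INDICATOR_GROUPS:
        fit += COEF[manufacturer]
    elif manufacturer != "EUROPE":
        raise ValueError(f"unknown manufacturer group: {manufacturer}")
    return fit


# Hypothetical car: same price and size, two different manufacturer groups.
ford_fit = predicted_sales(10000, 5, "FORD")
europe_fit = predicted_sales(10000, 5, "EUROPE")
print(ford_fit - europe_fit)  # equals the FORD coefficient, up to rounding
```

Holding price and size fixed, the difference between any group and EUROPE is just that group's coefficient, which is how the indicator coefficients should be read.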

MTB > Stepwise LN_SALES PRICE MPG SIZE MITSUBSH - EUROPE ;
SUBC> FEnter 4;
SUBC> FRemove 4.

Stepwise Regression

Response is LN_SALES on 9 predictors, with N = 48

Step          1       2       3       4       5         6         7
Constant 11.534  11.612   9.385   9.865   9.046    10.477    11.119

EUROPE    -1.92   -2.00   -1.92   -2.10   -2.04     -1.58     -1.50
T-Value   -4.97   -5.61   -5.94   -6.87   -7.02     -4.83     -4.80

MITSUBSH          -1.66   -1.96   -2.09   -2.07     -2.12     -2.08
T-Value           -3.05   -3.91   -4.49   -4.69     -5.10     -5.06

MPG                       0.065   0.056   0.051     0.018
T-Value                    3.31    3.04    2.94      0.84

CHRYSLER                          -0.65   -0.70     -0.79     -0.83
T-Value                           -2.88   -3.27     -3.88     -4.23

SIZE                                      0.216     0.262     0.277
T-Value                                    2.47      3.11      3.39

PRICE                                            -0.00004  -0.00005
T-Value                                             -2.56     -3.97

S         0.817   0.752   0.680   0.630   0.596     0.560     0.558
R-Sq      34.96   46.12   56.87   63.85   68.43     72.78     72.31

(4pts) 7 Above is a stepwise regression, using LN_SALES as the response.

i. In Stage 3, the variable associated with MPG has a t-ratio equal to 3.31, yet this variable has been removed in Stage 7. Explain how this can happen. Describe both the algorithm and the relationship among the variables.

ii. Suppose that a conservative student wants to use stepwise regression but would like to use a t-ratio = 2.5 to enter the model (corresponding to an FENTER = 2.5^2 = 6.25). Using the above computer output, what model would be suggested using this version of stepwise regression?

REGRESSION ANALYSIS USING LN_SALES AS THE RESPONSE

The regression equation is
LN_SALES = 9.61 - 0.000044 PRICE + 0.270 SIZE - 0.556 MITSUBSH + 1.68 FORD + 1.43 GM + 1.55 JAPANESE + 0.689 CHRYSLER

Predictor         Coef       StDev
Constant        9.6143      0.5036
PRICE      -0.00004424  0.00001177
SIZE           0.27003     0.08320
MITSUBSH       -0.5562      0.5210
FORD            1.6845      0.3730
GM              1.4289      0.3263
JAPANESE        1.5517      0.3713
CHRYSLER        0.6887      0.3498

s = 0.5648

(4pts) 8 Consider now the model using LN_SALES as the response.

i. The standard deviation of the explanatory variable SIZE is 1.010. Use this to calculate the Variance Inflation Factor (VIF) of SIZE for this model.

ii. What does the VIF of SIZE measure? Is this an unusually high VIF? Why, or why not?

(3pts) 9 We have two models from which to choose, the model with SALES as the response variable and the one with LN_SALES as the response variable. What is the purpose of using an out-of-sample validation technique for choosing between these two models? In two or three sentences, briefly describe how to perform an out-of-sample validation technique.

Solutions to Midterm #3, Fall, 1997, Gen Bus 706

1. To test the significance of the categorical variable WARRANTY, our null is $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ against the alternative $H_a$: some $\mu$'s are not the same. The test statistic is the F-ratio = 7.04. We compare this to an F-value of 2.75, from the F-table with degrees of freedom $df_1 = 3$ and $df_2 = 54$. Because 7.04 exceeds 2.75, we reject $H_0$ and conclude that WARRANTY is a statistically significant factor in explaining the price of notebooks.

2. i) The added variable plot provides the relationship between PRICE and LCD after removing the effects of WARRANTY. A scatter plot, on the other hand, provides the relationship of PRICE and LCD without removing the effects of any possible significant variables. The partial correlation coefficient is 0.559.

ii) We have the partial correlation

$$0.559 = r(\mathrm{PRICE}, \mathrm{LCD} \mid \mathrm{WARRANTY}) = \frac{t(b_{LCD})}{\sqrt{t(b_{LCD})^2 + n - (k+1)}}.$$

With n = 58 and k + 1 = 4, we can therefore compute the t-ratio $t(b_{LCD}) = 4.9$, which is used to test the significance of the variable LCD. Since the t-ratio is larger than 2 in absolute value, we conclude that LCD is a statistically useful variable.

iii) This conclusion is supported by the added variable plot. The plot exhibits a positive relationship between the residuals; the correlation at 55.9% is fairly high.

3. i) The relation

$$\text{std. residual}_{34} = \frac{\hat{e}_{34}}{s\sqrt{1 - h_{34}}}$$

implies that the leverage is

$$h_{34} = 1 - \frac{\hat{e}_{34}^2}{s^2\,(\text{std. residual}_{34})^2} = 1 - \frac{1354.13^2}{406849\,(2.26)^2} = 0.1176.$$

ii) To be unusual, a leverage should exceed $3\bar{h} = 3(k+1)/n = 3(11+1)/58 = 0.62$. In this case, the leverage is not unusual.

iii) A fitted value is a linear combination of responses, i.e. $\hat{y}_i = \sum_k h_{ik} y_k$, where the $h$'s are entries in the hat matrix whose diagonals are the measures of leverage. Thus, the corresponding fitted value is 5644.87 + 0.1176 (6499 - 6999) = 5,586.07.

4. The zero residual indicates that the fitted value equals the observed value and the high leverage indicates an unusual set of explanatory variables. That is, this notebook has an unusual set of features. From the summary statistics, we see that there is only one notebook with a 90 megahertz processor. Thus, the indicator variable for this level of processor uniquely flags this point. This guarantees a zero residual and generally yields a high leverage point. The set of explanatory variables is unusual because it is the only observation to have a 1 for this indicator variable.

5. i) We wish to test the null $H_0: \beta_{DIRECT} = 0$ against the alternative $H_a: \beta_{DIRECT} < 0$. The test statistic is the t-ratio = -1.89. We compare this to the t-value = -1.68 from the table with df = n - (k+1) = 46. Because -1.89 is below -1.68, we reject the null and conclude that direct marketing generally leads to lower prices.

ii) p-value = Prob(t < -1.89) = 1 - 0.97 = 3%.

iii) There are many possible populations of interest; for example, we may be interested in notebooks produced by Apple as well as IBM-compatibles. The sampling frame consists of the PC Magazine list of Pentium class IBM-compatible machines.

iv) The price of notebook computers cannot be negative. Therefore, the response must be greater than or equal to zero.

6. i) To check the assumption of constant variability of errors.

ii) The graph exhibits increasing variability of errors, and hence, heteroscedastic errors. This suggests examining a transform of the response, such as the logarithmic transform.

7. i) When the additional variables CHRYSLER, SIZE and PRICE enter the model, MPG becomes insignificant. Intuitively, these additional variables account for the information contained in MPG. The algorithm allows this to happen because, at each stage, it checks to see whether superfluous variables are in the model and, if detected, removes them.

ii) Stop at step 4; the stepwise regression would have selected the variables EUROPE, MITSUBSH, MPG, and CHRYSLER. The next candidate, SIZE, enters with a t-ratio of only 2.47, which is below 2.5.

8. i) Since

$$se(b_{SIZE}) = s\,\frac{\sqrt{VIF}}{s_{SIZE}\sqrt{n-1}}, \qquad 0.08320 = 0.5648\,\frac{\sqrt{VIF}}{1.010\sqrt{48-1}},$$

then VIF = 1.04.

ii) VIFs measure collinearity, that is, the extent to which one explanatory variable is a linear combination of the others. This value, because it is below the usual cut-off of 10, does not indicate collinearity.

9. Out-of-sample validation procedures help to validate our choice of a model and provide protection against data snooping. They involve dividing the data into two subsamples: one to be used to fit regression models and the other to be used to validate the models. Two important statistics that summarize these procedures are PRESS and SSPE. By fitting a regression model on one scale, say LN_SALES, we can transform predictions to the other scale, say SALES. In this way, models based on different transforms of the response may be compared to one another.
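The arithmetic in Solutions 2, 3 and 8 can be retraced with a few lines of Python. This is only a sketch for checking the quoted figures: every input is a number printed in the exam output or the solutions, and the variable names are made up for illustration. For Solution 2 the partial correlation formula is inverted to give t = r sqrt(df / (1 - r^2)), using the n - (k+1) = 54 stated there.

```python
from math import sqrt

# Solution 2(ii): invert r = t / sqrt(t^2 + df) to get t from r.
r_partial = 0.559
df_added = 58 - 4  # n - (k + 1), as stated in the solution
t_lcd = r_partial * sqrt(df_added / (1 - r_partial ** 2))

# Solution 3(i): leverage from the residual and the standardized residual,
# using std. residual = e / (s * sqrt(1 - h)).
error_mean_square = 406849  # s^2 of the final model
residual_34 = 1354.13
std_residual_34 = 2.26
h_34 = 1 - residual_34 ** 2 / (error_mean_square * std_residual_34 ** 2)

# Solution 3(ii): cutoff of three times the average leverage, 3(k+1)/n.
leverage_cutoff = 3 * (11 + 1) / 58

# Solution 3(iii): changing y_34 moves its own fitted value by h_34 * change.
new_fit_34 = 5644.87 + h_34 * (6499 - 6999)

# Solution 8(i): solve se(b) = s * sqrt(VIF) / (s_x * sqrt(n - 1)) for VIF.
se_size, s_ln_sales, sd_size, n_cars = 0.08320, 0.5648, 1.010, 48
vif_size = (se_size * sd_size * sqrt(n_cars - 1) / s_ln_sales) ** 2

print(f"t-ratio for LCD:        {t_lcd:.2f}")
print(f"leverage h_34:          {h_34:.4f}")
print(f"leverage cutoff:        {leverage_cutoff:.2f}")
print(f"revised fitted value:   {new_fit_34:.2f}")
print(f"VIF of SIZE:            {vif_size:.2f}")
```

Small differences in the last digit relative to the solutions come from rounding the leverage to 0.1176 before it is reused.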