

Example of More than 2 Categories, and Analysis of Covariance Example

    > attach(grocery)
    > boxplot(Sales~Discount, ylab="Sales", xlab="Discount")

[Figure: boxplots of Sales (axis from about 160 to 240) by Discount level: 10.00%, 15.00%, 5.00%.]

    > tapply(Sales,Discount,mean)
      10.00%   15.00%    5.00%
    217.7500 213.5833 203.5000
    > tapply(Sales,Discount,sd)
      10.00%   15.00%    5.00%
    35.00162 26.66785 37.86939

Question: Is there a statistically significant difference in population mean sales for the different discount levels?

Two versions in R: the aov command, and the lm command as covered in Friday discussion. See next page for output.

Using the aov command, followed by summary:

    > AOVModel<-aov(Sales~Discount)
    > summary(AOVModel)
                Df Sum Sq Mean Sq F value Pr(>F)
    Discount     2   1288   644.2   0.573  0.569
    Residuals   33  37074  1123.5

Using the lm command, followed by anova:

    > LMVersion<-lm(Sales~as.factor(Discount))
    > anova(LMVersion)
                        Df Sum Sq Mean Sq F value Pr(>F)
    as.factor(Discount)  2   1288  644.19  0.5734 0.5691
    Residuals           33  37074 1123.46

Using the lm command, followed by summary:

    > summary(LMVersion)
    Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
    (Intercept)                217.750      9.676  22.505   <2e-16 ***
    as.factor(Discount)15.00%   -4.167     13.684  -0.304    0.763
    as.factor(Discount)5.00%   -14.250     13.684  -1.041    0.305

    Residual standard error: 33.52 on 33 degrees of freedom
    Multiple R-squared: 0.03358, Adjusted R-squared: -0.02499
    F-statistic: 0.5734 on 2 and 33 DF, p-value: 0.5691

Using any of the versions, do not reject H0; conclude that discount level does not have a statistically significant effect on sales.
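As a check on the tables above, the F statistic and R-squared can be recomputed from the printed sums of squares and mean squares. The symbols MS, SS, k (number of discount levels), and n (number of cases) are notation introduced here for illustration, not taken from the handout, and the inputs are the rounded values shown in the output.

```latex
\begin{aligned}
F &= \frac{MS_{\text{Discount}}}{MSE} = \frac{644.19}{1123.46} \approx 0.5734,
  \qquad \text{df} = (k-1,\; n-k) = (2,\; 33) \ \text{with } k = 3,\ n = 36 \\[4pt]
R^2 &= \frac{SS_{\text{Discount}}}{SS_{\text{Discount}} + SS_{\text{Residuals}}}
     = \frac{1288}{1288 + 37074} \approx 0.0336
\end{aligned}
```

Both values agree with the lm summary (F-statistic 0.5734, Multiple R-squared 0.03358) up to rounding.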

NOW add a covariate of X = Price of the item. Explanatory notes on white board.

[Figure: Scatterplot of Sales vs Price, with points marked by Discount (5.00%, 10.00%, 15.00%); Case 5 is labeled.]

    > AOC<-lm(Sales~Price+as.factor(Discount))
    > anova(AOC)
                        Df Sum Sq Mean Sq  F value    Pr(>F)
    Price                1  36718   36718 1391.366 < 2.2e-16 ***
    as.factor(Discount)  2    800     400   15.149 2.348e-05 ***
    Residuals           32    844      26

NOW we can reject H0 and conclude Discount does have an effect on Sales, after accounting for Price.
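One way to write the model behind this table is sketched here. The beta symbols and the indicator names D15 and D5 are notation assumed for illustration (the handout's own version was given on the white board). The 10.00% level is taken as the baseline because the lm summary output reports coefficients only for 15.00% and 5.00%.

```latex
\begin{aligned}
\text{Sales}_i &= \beta_0 + \beta_1\,\text{Price}_i + \beta_2\,D15_i + \beta_3\,D5_i + \varepsilon_i \\[4pt]
H_0 &: \beta_2 = \beta_3 = 0 \quad \text{(no Discount effect after accounting for Price)} \\[4pt]
F &= \frac{SS_{\text{Discount}\mid\text{Price}} / 2}{SS_{\text{Residuals}} / 32}
   \approx \frac{800 / 2}{844 / 32} \approx 15.2
\end{aligned}
```

The small gap between this value and the reported 15.149 comes from using the rounded sums of squares printed in the table.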

Sample version of model for each group and tests with conclusions on board. (See next page for adjusted R-squared, which is now almost 0.98.)

NOTE: Order matters for the anova command but not for the summary command:

    > AOC<-lm(Sales~Price+as.factor(Discount))  #Tests Price, then Discount
    > anova(AOC)
                        Df Sum Sq Mean Sq  F value    Pr(>F)
    Price                1  36718   36718 1391.366 < 2.2e-16 ***
    as.factor(Discount)  2    800     400   15.149 2.348e-05 ***
    Residuals           32    844      26

    > AOCOrder<-lm(Sales~as.factor(Discount)+Price)  #Discount, then Price
    > anova(AOCOrder)
                        Df Sum Sq Mean Sq F value    Pr(>F)
    as.factor(Discount)  2   1288     644   24.41 3.648e-07 ***
    Price                1  36230   36230 1372.84 < 2.2e-16 ***
    Residuals           32    844      26

Question: Why is the Factor (Discount) now statistically significant even before adding Price, when it wasn't when the model was run without Price at all?

Answer: The MSE is now computed after accounting for Price. It's the MSE for the full model. Adding Price has explained a very large amount of the previously unexplained residual/error. Discount's sum of squares is still 1288 (mean square 644), but it is now divided by an MSE of about 26 instead of 1123.5, so F rises from 0.57 to 24.41.
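The order dependence can be sketched on simulated data. Everything in this block is an assumption made for illustration: the data are generated, not the handout's grocery data, and the names shift, fit_price_first, and fit_discount_first are hypothetical. It shows that the sequential sums of squares reported by anova change with term order, while the residual line and the fitted coefficients do not.

```r
# Hypothetical data for illustration only; NOT the handout's grocery data.
set.seed(1)
Discount <- rep(c("5.00%", "10.00%", "15.00%"), each = 12)
# Let Price differ a little by group so the two predictors are correlated
Price <- runif(36, min = 8, max = 9.5) + ifelse(Discount == "5.00%", 0.3, 0)
shift <- unname(c("5.00%" = -7, "10.00%" = 0, "15.00%" = 5)[Discount])
Sales <- -460 + 80 * Price + shift + rnorm(36, sd = 5)

fit_price_first    <- lm(Sales ~ Price + as.factor(Discount))
fit_discount_first <- lm(Sales ~ as.factor(Discount) + Price)

# anova() on an lm fit reports sequential sums of squares:
# each term is tested after the terms listed before it
a1 <- anova(fit_price_first)
a2 <- anova(fit_discount_first)
print(a1)
print(a2)

# The fitted model itself is the same; only the coefficient order differs
c1 <- coef(fit_price_first)
c2 <- coef(fit_discount_first)
print(all.equal(c1[names(c2)], c2))
```

In both tables the sums of squares add up to the same total and the Residuals row is identical; only the split between Price and Discount changes.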

Question: Does it matter whether you put the covariate or the factor in the model first?

Answer: Order does not matter for the summary command, but it does matter for the anova table. And the results of summary never test the Factor as a whole. Individual added intercept terms are tested:

    > summary(AOC)
    Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
    (Intercept)               -466.132     18.517 -25.173   <2e-16 ***
    Price                       79.591      2.148  37.052   <2e-16 ***
    as.factor(Discount)15.00%    4.655      2.111   2.205   0.0347 *
    as.factor(Discount)5.00%    -6.822      2.107  -3.238   0.0028 **

    Residual standard error: 5.137 on 32 degrees of freedom
    Multiple R-squared: 0.978, Adjusted R-squared: 0.9759
    F-statistic: 473.9 on 3 and 32 DF, p-value: < 2.2e-16

    > summary(AOCOrder)
    Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
    (Intercept)               -466.132     18.517 -25.173   <2e-16 ***
    as.factor(Discount)15.00%    4.655      2.111   2.205   0.0347 *
    as.factor(Discount)5.00%    -6.822      2.107  -3.238   0.0028 **
    Price                       79.591      2.148  37.052   <2e-16 ***

    Residual standard error: 5.137 on 32 degrees of freedom
    Multiple R-squared: 0.978, Adjusted R-squared: 0.9759
    F-statistic: 473.9 on 3 and 32 DF, p-value: < 2.2e-16
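Read as a fitted line for each discount group, the coefficients above give three parallel lines with a common Price slope. This is a sketch of that reading, taking 10.00% as the baseline level because it has no coefficient of its own in the output; the hat notation is introduced here for illustration.

```latex
\begin{aligned}
\text{Discount } 10.00\%: \quad \widehat{\text{Sales}} &= -466.132 + 79.591\,\text{Price} \\
\text{Discount } 15.00\%: \quad \widehat{\text{Sales}} &= (-466.132 + 4.655) + 79.591\,\text{Price} = -461.477 + 79.591\,\text{Price} \\
\text{Discount } 5.00\%: \quad \widehat{\text{Sales}} &= (-466.132 - 6.822) + 79.591\,\text{Price} = -472.954 + 79.591\,\text{Price}
\end{aligned}
```

Each t test in the summary table therefore compares one group's intercept with the 10.00% intercept at a fixed Price, which is why it does not test the Factor as a whole.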

Assessing fit: Both plots look good.

[Figure: Residuals vs Fitted plot and Normal Q-Q plot of standardized residuals for lm(Sales ~ Price + as.factor(Discount)); cases 3, 5, and 7 are labeled in both.]

The only case that may be a problem is the one labeled as 5. It has a large standardized residual. Its predicted Sales = 188.45, actual Sales = 174 and estimated s.d. = 5.137. No obvious explanation, so don't remove case!

[Figure: Residuals vs Leverage plot with Cook's distance contours for lm(Sales ~ Price + as.factor(Discount)); cases 2, 5, and 7 are labeled.]
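From the numbers quoted for case 5, a rough standardized residual is (174 - 188.45) / 5.137, or about -2.81. R's rstandard also divides by sqrt(1 - leverage), so the plotted value is at least this large in absolute size. The sketch below illustrates that relationship on simulated data. The data and the names fit, rough, manual, and worst are assumptions made for illustration; this is not the handout's grocery data or its case 5.

```r
# Hypothetical data for illustration only; NOT the handout's grocery data.
set.seed(2)
Discount <- rep(c("5.00%", "10.00%", "15.00%"), each = 12)
Price <- runif(36, min = 8, max = 9.5)
shift <- unname(c("5.00%" = -7, "10.00%" = 0, "15.00%" = 5)[Discount])
Sales <- -460 + 80 * Price + shift + rnorm(36, sd = 5)

fit <- lm(Sales ~ Price + as.factor(Discount))

s <- summary(fit)$sigma     # residual standard error
h <- hatvalues(fit)         # leverage of each case
rough <- resid(fit) / s     # residual / s, ignoring leverage
std <- rstandard(fit)       # standardized residuals as computed by R

# rstandard() divides each residual by s * sqrt(1 - leverage)
manual <- resid(fit) / (s * sqrt(1 - h))

# The three cases with the largest standardized residuals
worst <- order(abs(std), decreasing = TRUE)[1:3]
print(data.frame(case = worst, rough = rough[worst], std = std[worst],
                 cooks = cooks.distance(fit)[worst]))
```

Assuming the handout's diagnostic plots came from calling plot on the fitted model, plot(fit, which = c(1, 2, 5)) would draw the Residuals vs Fitted, Normal Q-Q, and Residuals vs Leverage panels for this simulated fit.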