Business Statistics: A First Course Fifth Edition Chapter 12 Correlation and Simple Linear Regression Business Statistics: A First Course, 5e 2009 Prentice-Hall, Inc. Chap 12-1
Learning Objectives
In this chapter, you learn:
To calculate the coefficient of correlation.
The meaning of the regression coefficients \(b_0\) and \(b_1\).
How to use regression analysis to predict the value of a dependent variable based on an independent variable.

Correlation vs. Regression
A scatter plot can be used to show the relationship between two variables.
Correlation analysis is used to measure the strength of the association (linear relationship) between two variables.
Correlation is only concerned with the strength of the relationship.
No causal effect is implied by correlation.
Scatter plots were first presented in Ch. 2.
Correlation was first presented in Ch. 3.

Coefficient of Correlation
Measures the relative strength of the linear relationship between two numerical variables.
Sample coefficient of correlation:
\[ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, S_x S_y} \]
where \(\bar{x}\) and \(\bar{y}\) are the means of the x- and y-values, and \(S_x\) and \(S_y\) are the standard deviations of the x- and y-values.

Shortcut Formula
\[ r = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)}} \]
where \(\bar{x}\) and \(\bar{y}\) are the means of the x- and y-values.
Question: Will outliers affect the correlation? Yes.

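The definition of r and the shortcut formula are algebraically equivalent, and the outlier question can be checked numerically. The Python sketch below is illustrative only: the function names and the toy data are hypothetical and do not come from the slides.

```python
from math import sqrt


def corr_definition(x, y):
    """r from the definition: sum of cross-deviations / ((n - 1) * Sx * Sy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)


def corr_shortcut(x, y):
    """r from the shortcut formula, using raw sums only."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    numerator = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    denominator = sqrt(
        (sum(xi ** 2 for xi in x) - n * x_bar ** 2)
        * (sum(yi ** 2 for yi in y) - n * y_bar ** 2)
    )
    return numerator / denominator


# Hypothetical toy data (not from the slides): a perfectly linear pattern.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_clean = corr_shortcut(x, y)

# The same data with one outlying point appended.
x_out = x + [6]
y_out = y + [0]
r_out = corr_shortcut(x_out, y_out)

print(round(r_clean, 4))  # 1.0
print(round(r_out, 4))    # 0.1429
```

In this toy case a single outlier pulls r from 1 down to about 0.14, which is why it is worth inspecting a scatter plot before relying on r.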
Features of the Coefficient of Correlation
The population coefficient of correlation is referred to as ρ.
The sample coefficient of correlation is referred to as r.
Both ρ and r have the following features:
Unit free.
Range between -1 and +1.
The closer to -1, the stronger the negative linear relationship.
The closer to +1, the stronger the positive linear relationship.
The closer to 0, the weaker the linear relationship.

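The "unit free" and range properties can be checked with a short Python sketch. The data, the helper name `correlation`, and the unit conversions are hypothetical choices made for illustration; they are not part of the slides.

```python
from math import sqrt


def correlation(x, y):
    """Sample coefficient of correlation via the shortcut formula."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    numerator = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar
    denominator = sqrt(
        (sum(a * a for a in x) - n * x_bar ** 2)
        * (sum(b * b for b in y) - n * y_bar ** 2)
    )
    return numerator / denominator


# Hypothetical toy data (not from the slides).
size_sqft = [1000, 1500, 2000, 2500, 3000]
price_k = [150, 210, 240, 330, 350]  # price in $1000s

r = correlation(size_sqft, price_k)

# Change the units: square feet -> square metres, $1000s -> dollars.
size_m2 = [s * 0.092903 for s in size_sqft]
price_usd = [p * 1000 for p in price_k]
r_rescaled = correlation(size_m2, price_usd)

# Reversing the direction of one variable flips the sign of r, not its size.
r_flipped = correlation(size_sqft, [-p for p in price_k])

print(abs(r - r_rescaled) < 1e-9)  # True: r does not depend on the units
print(-1 <= r <= 1)                # True: r stays within [-1, +1]
print(abs(r + r_flipped) < 1e-9)   # True: same strength, opposite sign
```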
Scatter Plots of Sample Data with Various Coefficients of Correlation
[Figure: five scatter plot panels with r = -1, r = -0.6, r = 0, r = +0.3, and r = +1]

Correlation Coefficient Example: Real Estate Agent
A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet).
A random sample of 10 houses is selected.
Dependent variable (Y) = house price in $1000s
Independent variable (X) = square feet

Correlation Coefficient Example: Data
House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700

Simple Linear Regression Example: Scatter Plot
[Figure: scatter plot for the house price model, with house price ($1000s) on the vertical axis (0 to 450) and square feet on the horizontal axis (0 to 3000)]

Calculations
With n = 10:
\[ \sum_{i=1}^{10} x_i = 17150, \quad \sum_{i=1}^{10} y_i = 2865, \quad \sum_{i=1}^{10} x_i^2 = 30983750, \quad \sum_{i=1}^{10} y_i^2 = 853423, \quad \sum_{i=1}^{10} x_i y_i = 5085975 \]
\[ r = \frac{5085975 - 10(1715)(286.5)}{\sqrt{\left(30983750 - 10(1715)^2\right)\left(853423 - 10(286.5)^2\right)}} = 0.7621 \]
There is a positive relationship between the selling price of a home and its size.

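The sums and the value of r can be reproduced from the ten houses in the data table. This is a minimal Python sketch using only the standard library; the variable names are illustrative choices, not notation from the slides.

```python
from math import sqrt

# Data from the example: house price in $1000s (y) and size in square feet (x).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
sum_x = sum(sqft)
sum_y = sum(price)
sum_x2 = sum(x * x for x in sqft)
sum_y2 = sum(y * y for y in price)
sum_xy = sum(x * y for x, y in zip(sqft, price))

x_bar = sum_x / n  # 1715.0
y_bar = sum_y / n  # 286.5

# Shortcut formula for the sample coefficient of correlation.
numerator = sum_xy - n * x_bar * y_bar
denominator = sqrt((sum_x2 - n * x_bar ** 2) * (sum_y2 - n * y_bar ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 17150 2865 30983750 853423 5085975
print(round(r, 4))                           # 0.7621
```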
Introduction to Regression Analysis
Regression analysis is used to:
Predict the value of a dependent variable based on the value of at least one independent variable.
Explain the impact of changes in an independent variable on the dependent variable.
Dependent variable: the variable we wish to predict or explain.
Independent variable: the variable used to predict or explain the dependent variable.

Simple Linear Regression Model
Only one independent variable, X.
The relationship between X and Y is described by a linear function.
Changes in Y are assumed to be related to changes in X.

Types of Relationships
[Figure: example scatter plots of linear relationships and curvilinear relationships]

Types of Relationships (continued)
[Figure: example scatter plots of strong relationships and weak relationships]

Types of Relationships (continued)
[Figure: example scatter plots showing no relationship]

Simple Linear Regression Model
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
where \(Y_i\) is the dependent variable, \(\beta_0\) is the population intercept, \(\beta_1\) is the population slope coefficient, \(X_i\) is the independent variable, and \(\varepsilon_i\) is the random error term.
\(\beta_0 + \beta_1 X_i\) is the linear component; \(\varepsilon_i\) is the random error component.

Simple Linear Regression Model (continued)
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
[Figure: the population regression line with intercept \(\beta_0\) and slope \(\beta_1\). For a given \(X_i\), the observed value of Y lies off the line and the predicted value of Y lies on it; the vertical gap between them is the random error \(\varepsilon_i\) for that \(X_i\) value.]

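The split into a linear component and a random error component can be illustrated by simulating a few observations. Everything in this Python sketch is an assumption made for illustration: the parameter values, the normally distributed error, and the function name `simulate` do not come from the slides.

```python
import random

# Hypothetical population parameters, chosen only for illustration.
BETA_0 = 100.0  # population intercept
BETA_1 = 0.1    # population slope coefficient
SIGMA = 40.0    # spread of the random error (normal errors are an assumption)

rng = random.Random(0)  # fixed seed so the run is repeatable


def simulate(x_values):
    """Return (X, linear component, random error, observed Y) for each X."""
    rows = []
    for x in x_values:
        linear = BETA_0 + BETA_1 * x    # point on the population regression line
        error = rng.gauss(0.0, SIGMA)   # vertical gap between Y and the line
        rows.append((x, linear, error, linear + error))
    return rows


sample = simulate([1000, 1500, 2000, 2500])
for x, linear, error, y in sample:
    print(f"X={x}  linear={linear:.1f}  error={error:+.1f}  Y={y:.1f}")
```

Each observed Y differs from the line only by its own error term, which is the gap drawn in the figure.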
Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line:
\[ \hat{Y}_i = b_0 + b_1 X_i \]
where \(\hat{Y}_i\) is the estimated (or predicted) value of Y for observation i, \(b_0\) is the estimate of the regression intercept, \(b_1\) is the estimate of the regression slope, and \(X_i\) is the value of X for observation i.

Finding the Regression Equation
The coefficients \(b_0\) and \(b_1\) are given by:
The slope \(b_1\):
\[ b_1 = r \frac{S_y}{S_x} \]
The intercept \(b_0\):
\[ b_0 = \bar{y} - b_1 \bar{x} \]
where
\[ S_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}, \qquad S_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \]

Interpretation of the Slope and the Intercept
\(b_0\) is the estimated mean value of Y when the value of X is zero.
\(b_1\) is the estimated change in the mean value of Y as a result of a one-unit change in X.

Simple Linear Regression Example: Recall the Real Estate Example
[Figure: scatter plot for the house price model, with house price ($1000s) on the vertical axis (0 to 450) and square feet on the horizontal axis (0 to 3000)]

Calculations
\[ b_1 = r \frac{S_y}{S_x} = 0.7621 \times \frac{60.1854}{417.8649} = 0.10977 \]
\[ b_0 = \bar{y} - b_1 \bar{x} = 286.5 - 0.10977 \times 1715 = 98.248 \]
The value 98.248 is obtained with the unrounded slope; using the rounded slope 0.10977 gives about 98.244.

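These coefficient calculations can be reproduced from the raw data. The Python sketch below uses the standard library's `statistics.mean` and `statistics.stdev` (the sample standard deviation, with n - 1 in the denominator); the variable names are illustrative.

```python
from statistics import mean, stdev

# Data from the example: house price in $1000s (y) and size in square feet (x).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
x_bar, y_bar = mean(sqft), mean(price)
s_x, s_y = stdev(sqft), stdev(price)  # sample standard deviations

# Coefficient of correlation from its definition.
cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
r = cross / ((n - 1) * s_x * s_y)

b1 = r * s_y / s_x        # slope
b0 = y_bar - b1 * x_bar   # intercept, computed with the unrounded slope

print(round(s_x, 4), round(s_y, 4))  # 417.8649 60.1854
print(round(b1, 5), round(b0, 3))    # 0.10977 98.248
```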
Simple Linear Regression Example: Graphical Representation
[Figure: scatter plot and prediction line for the house price model, with house price ($1000s) on the vertical axis and square feet on the horizontal axis; intercept = 98.248, slope = 0.10977]
house price = 98.24833 + 0.10977 (square feet)

Simple Linear Regression Example: Interpretation of \(b_0\)
house price = 98.24833 + 0.10977 (square feet)
\(b_0\) is the estimated mean value of Y when the value of X is zero (if X = 0 is in the range of observed X values).
Because a house cannot have a square footage of 0, \(b_0\) has no practical application.

Simple Linear Regression Example: Interpreting \(b_1\)
house price = 98.24833 + 0.10977 (square feet)
\(b_1\) estimates the change in the mean value of Y as a result of a one-unit increase in X.
Here, \(b_1\) = 0.10977 tells us that the mean value of a house increases by 0.10977($1000) = $109.77, on average, for each additional square foot of size.

Simple Linear Regression Example: Making Predictions
Predict the price for a house with 2000 square feet:
house price = 98.25 + 0.1098 (sq. ft.) = 98.25 + 0.1098(2000) = 317.85
The predicted price for a house with 2000 square feet is 317.85 ($1000s) = $317,850.

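The prediction step is a direct evaluation of the prediction line. This Python sketch uses the rounded coefficients from the slide (98.25 and 0.1098); the function name `predict_price_thousands` is a hypothetical choice.

```python
# Rounded coefficients as used on the slide
# (price in $1000s, size in square feet).
B0 = 98.25
B1 = 0.1098


def predict_price_thousands(square_feet):
    """Predicted house price in $1000s for a given size in square feet."""
    return B0 + B1 * square_feet


predicted = predict_price_thousands(2000)
print(f"{predicted:.2f} ($1000s) = ${predicted * 1000:,.0f}")
# 317.85 ($1000s) = $317,850

# One extra square foot changes the prediction by the slope.
step = predict_price_thousands(2001) - predict_price_thousands(2000)
print(f"{step:.4f}")  # 0.1098
```

With the unrounded coefficients the prediction would be slightly lower (about 317.78), so the last digits depend on how much rounding is applied before predicting.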
Coefficient of Determination, \(r^2\)
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable.
The coefficient of determination is also called r-squared and is denoted as \(r^2\).
\[ r^2 = (\text{correlation between } X \text{ and } Y)^2 \]
Note: \(0 \le r^2 \le 1\)

Examples of \(r^2\) Values
\(r^2 = 1\)
Perfect linear relationship between X and Y: 100% of the variation in Y is explained by variation in X.

Examples of \(r^2\) Values
\(0 < r^2 < 1\)
Weaker linear relationships between X and Y: some but not all of the variation in Y is explained by variation in X.

Examples of \(r^2\) Values
\(r^2 = 0\)
No linear relationship between X and Y: the value of Y does not depend on X. (None of the variation in Y is explained by variation in X.)

Simple Linear Regression Example: Coefficient of Determination, \(r^2\)
\[ r^2 = 0.7621^2 = 0.5808 \]
58.08% of the variation in house prices is explained by variation in square feet.

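The "explained variation" reading of \(r^2\) can be checked by splitting the variation in house prices around the fitted line. In this Python sketch the names `sst`, `ssr`, and `sse` (total, explained, and unexplained sums of squares) are conventional labels that do not appear on the slides, and the slope is computed as \(S_{xy}/S_{xx}\), which is algebraically the same as \(r S_y / S_x\).

```python
# Data from the example: house price in $1000s (y) and size in square feet (x).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

# Least-squares prediction line.
s_xx = sum((x - x_bar) ** 2 for x in sqft)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in sqft]

sst = sum((y - y_bar) ** 2 for y in price)             # total variation in Y
ssr = sum((f - y_bar) ** 2 for f in fitted)            # explained by X
sse = sum((y - f) ** 2 for y, f in zip(price, fitted)) # left unexplained

r_squared = ssr / sst
print(round(r_squared, 4))  # 0.5808
```

The ratio of explained to total variation matches the squared correlation, 0.7621 squared, up to rounding.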
Chapter Summary
Introduced the correlation coefficient.
Introduced types of regression models.
Discussed determining the simple linear regression equation.
Described measures of variation.
Discussed residual analysis.