The Least Squares Regression Line
Section 5.3

Cathy Poliak, Ph.D.
cathy@math.uh.edu
Office hours: T Th 1:30 pm - 3:30 pm 620 PGH & 5:30 pm - 7:00 pm CASA
Department of Mathematics
University of Houston
March 3, 2016

Outline
1. Beginning Questions
2. Least-Squares Regression
3. Prediction
4. Coefficient of Determination

Popper Set Up
Fill in all of the proper bubbles.
Use a #2 pencil.
This is popper number 10.

Popper #10 Questions
We are looking at the relationship between how many items are on a shelf (shelf space) and the number of items sold in a week. The correlation coefficient is r = 0.827.
1. What is the direction of the relationship between shelf space and weekly sales?
   a) Positive  b) Negative  c) No direction
2. What is the strength of the relationship between shelf space and weekly sales?
   a) Strong  b) Moderate  c) Weak
3. What form does the relationship between shelf space and weekly sales appear to have?
   a) Linear  b) Non-linear

Scatterplot
[Figure: scatterplot of Number Sold (vertical axis, 150 to 300) against Shelf Space in feet (horizontal axis, 5 to 20)]

Examining relationships
Correlation measures the direction and strength of the straight-line relationship between two quantitative variables.
If a scatterplot shows a linear relationship, we would like to summarize this overall pattern by drawing a line on the scatterplot.
A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.
A regression line is used when one of the variables helps explain or predict the other.

Least-Squares regression
The least-squares regression line (LSRL) of Y on X is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

Least-Squares
[Figure: scatterplot of Number Sold (vertical axis, 150 to 300) against Shelf Space in feet (horizontal axis, 5 to 20)]

Example
The marketing manager of a supermarket chain would like to use shelf space to predict the sales of coffee. A random sample of 12 stores is selected, with the following results.

Store   Shelf Space (ft)   Weekly Sales (# sold)
  1            5                  160
  2            5                  220
  3            5                  140
  4           10                  190
  5           10                  240
  6           10                  260
  7           15                  230
  8           15                  270
  9           15                  280
 10           20                  260
 11           20                  290
 12           20                  310

Equation of the least-squares regression line
Let x be the explanatory variable and y be the response variable for n individuals.
From the data calculate the means x̄ and ȳ and the standard deviations s_x and s_y of the two variables, and their correlation r.
The least-squares line is the equation ŷ = a + bx
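
As a minimal sketch of this first step, the five summary statistics can be computed in R from the coffee data in the example. The data frame name shelf and the column names shelf_space and num_sold are taken from the R slide later in this deck; how the instructor actually loaded the data is not shown, so building the data frame by hand here is an assumption.

```r
# Coffee data typed in from the example table (12 stores)
shelf <- data.frame(
  shelf_space = c(5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20),          # feet
  num_sold    = c(160, 220, 140, 190, 240, 260, 230, 270, 280, 260, 290, 310)  # units per week
)

# Means and sample standard deviations of each variable
x_bar <- mean(shelf$shelf_space)
y_bar <- mean(shelf$num_sold)
s_x   <- sd(shelf$shelf_space)
s_y   <- sd(shelf$num_sold)

# Correlation between the two variables
r <- cor(shelf$shelf_space, shelf$num_sold)

print(c(x_bar = x_bar, s_x = s_x, y_bar = y_bar, s_y = s_y, r = r))
```

These values should agree, up to rounding, with the summary statistics quoted for the coffee example elsewhere in the deck.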

Equation of the least-squares regression line
The least-squares line is the equation ŷ = a + bx.
In the example of the supermarket sales, let Y = weekly sales and X = shelf space.
The least-squares regression equation to predict weekly sales based on shelf space is
ŷ = 145 + 7.4x
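
To illustrate what "as small as possible" means for this line, the sketch below computes the sum of squared vertical distances for ŷ = 145 + 7.4x and for two other lines. The helper name sse and both comparison lines are hypothetical choices made only for this illustration.

```r
# Coffee data from the example (shelf space in feet, units sold per week)
shelf_space <- c(5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20)
num_sold    <- c(160, 220, 140, 190, 240, 260, 230, 270, 280, 260, 290, 310)

# Sum of squared vertical distances between the data and the line y = a + b*x
sse <- function(a, b) {
  sum((num_sold - (a + b * shelf_space))^2)
}

sse_lsrl  <- sse(145, 7.4)    # the least-squares line from the slide
sse_flat  <- sse(237.5, 0)    # hypothetical: horizontal line at the mean of y
sse_steep <- sse(130, 8.5)    # hypothetical: a steeper line with a lower intercept

print(c(lsrl = sse_lsrl, flat = sse_flat, steep = sse_steep))
```

Any other choice of intercept and slope passed to sse gives a larger value than the least-squares line does.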

Popper #10 Questions
The least-squares regression equation to predict weekly sales (Y) based on shelf space (X) is ŷ = 145 + 7.4x.
4. What is the slope of this linear equation?
   a) 145  b) 7.4  c) 19.5  d) None of these
5. What is the y-intercept of this linear equation?
   a) 145  b) 7.4  c) 19.5  d) None of these
6. If there is a shelf space of 12 feet, what would be the predicted weekly sales?
   a) 145  b) 7.4  c) 233.8  d) 18

Calculating the least squares regression equation by hand
Let X be the explanatory variable and Y be the response variable for n individuals.
1. From the data calculate the means x̄ and ȳ and the standard deviations s_x and s_y of the two variables, and their correlation r.
2. Calculate the slope: b = r (s_y / s_x)
3. Calculate the y-intercept: a = ȳ - b x̄
4. Then the equation is: ŷ = a + bx. Substitute the slope for b and the y-intercept for a, and leave ŷ and x as variables (do not put any numbers in for ŷ and x).

Finding Equation for Coffee Sales
Given these values, determine the least-squares regression line (LSRL) equation for predicting number sold based on shelf space.
Explanatory variable: Shelf space (X): x̄ = 12.5 feet, s_x = 5.83874 feet.
Response variable: Sales (Y): ȳ = 237.5 units sold, s_y = 52.2451 units sold.
Correlation: r = 0.827
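
A short sketch of the by-hand calculation, using the standard formulas b = r (s_y / s_x) for the slope and a = ȳ - b x̄ for the y-intercept with the summary values on this slide. The variable names are illustrative only.

```r
# Summary statistics given on the slide
x_bar <- 12.5      # mean shelf space (feet)
s_x   <- 5.83874   # standard deviation of shelf space
y_bar <- 237.5     # mean weekly sales (units)
s_y   <- 52.2451   # standard deviation of weekly sales
r     <- 0.827     # correlation

# Slope: r times the ratio of the standard deviations
b <- r * s_y / s_x

# Intercept: chosen so the line passes through (x_bar, y_bar)
a <- y_bar - b * x_bar

print(round(c(intercept = a, slope = b), 1))
```

Rounded to one decimal place, this reproduces the equation ŷ = 145 + 7.4x used throughout the coffee example.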

Popper #10 Questions
The following are descriptive statistics for estimating the cost of a car (Y) by the age of the car (X).
Estimated Cost (Y): ȳ = 10360.93, s_y = 5482.3372
Age (X): x̄ = 5.214, s_x = 2.940
Correlation of Estimated Cost and Age: r = -0.8224
7. Calculate the slope b of the regression line to predict cost (Y) by age (X).
   a) 5.214  b) -0.8224  c) -1533.56  d) 0.0004
8. Calculate the y-intercept, a.
   a) 10355.716  b) 18356.911  c) 15889.11  d) 0.6763
9. Give the least squares equation of the line to predict cost by age.
   a) ŷ = 18356.91 - 1533.56x
   b) ŷ = 10355.72 - 0.8224x
   c) 10360.93 = 18357 - 1533.56(5.2)
   d) None of these

Least-Squares Regression Line Using R
Use the command: lm(yvariable ~ xvariable). For the coffee sales example:

    > lm(shelf$num_sold ~ shelf$shelf_space)

    Call:
    lm(formula = shelf$num_sold ~ shelf$shelf_space)

    Coefficients:
          (Intercept)  shelf$shelf_space
                145.0                7.4

The Intercept value is a, and the other value is b, the slope. Thus the equation in this example is: ŷ = 145 + 7.4x.

Least-Squares Regression Line Using TI-83(84)
1. Make sure the diagnostics are turned on by pressing 2ND CATALOG, scrolling down to DiagnosticOn, and pressing ENTER.
2. Choose STAT, then CALC, then 8:LinReg(a+bx).
3. Make sure your Xlist is L1 and your Ylist is L2, then select Calculate.
4. a is your y-intercept and b is your slope.

Prediction
The equation of the regression line makes prediction easy. Substitute an x-value into the equation.
Predict the weekly sales of coffee with shelf space of 12 feet.
ŷ = 145 + 7.4 × 12 = 233.8
Thus 233.8 is the predicted weekly number of units of coffee sold for a store that has 12 feet of shelf space.
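
The same prediction can be sketched in R with predict() on a fitted lm object. The data frame is rebuilt by hand here so the example is self-contained, and the name new_store is a hypothetical choice for this illustration.

```r
# Coffee data from the example table
shelf <- data.frame(
  shelf_space = c(5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20),
  num_sold    = c(160, 220, 140, 190, 240, 260, 230, 270, 280, 260, 290, 310)
)

# Fit the least-squares line of num_sold on shelf_space
fit <- lm(num_sold ~ shelf_space, data = shelf)

# New x-value: a store with 12 feet of shelf space.
# The column name must match the explanatory variable used in the formula.
new_store <- data.frame(shelf_space = 12)

predicted <- predict(fit, newdata = new_store)
print(predicted)
```

Using the data argument in lm, rather than shelf$num_sold ~ shelf$shelf_space, is what lets predict() match the new x-value to the explanatory variable by name.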

Popper #10 Questions
The least-squares regression line equation to predict the cost of the car (Y) by the age of the car (X) is: ŷ = 18358 - 1534x.
10. Predict the cost of an automobile for a 5 year old car.
    a) $18358  b) $1534  c) $10688  d) $11.96
11. Predict the cost of an automobile for a 20 year old car.
    a) $18358  b) -$1534  c) -$12,322  d) $49038

Facts about least-squares regression
Fact 1: Along the regression line, a change of one standard deviation in x corresponds to a change of r standard deviations in the predicted value ŷ. This is where the slope b = r (s_y / s_x) comes from.
Fact 2: The least-squares regression line always passes through the point (x̄, ȳ). That is why we can get the y-intercept as a = ȳ - b x̄.
Fact 3: The distinction between explanatory and response variables is essential in regression.
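
These three facts can be checked numerically on the coffee data. The sketch below is an illustration under the assumption that the data frame is built by hand as shown; fit_swapped is a hypothetical name for the regression with the two roles exchanged.

```r
# Coffee data from the example table
shelf <- data.frame(
  shelf_space = c(5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20),
  num_sold    = c(160, 220, 140, 190, 240, 260, 230, 270, 280, 260, 290, 310)
)

fit <- lm(num_sold ~ shelf_space, data = shelf)
a <- coef(fit)[["(Intercept)"]]
b <- coef(fit)[["shelf_space"]]

x_bar <- mean(shelf$shelf_space); s_x <- sd(shelf$shelf_space)
y_bar <- mean(shelf$num_sold);    s_y <- sd(shelf$num_sold)
r <- cor(shelf$shelf_space, shelf$num_sold)

# Fact 1: moving one s_x along the line changes the prediction by r * s_y
change_per_sd <- b * s_x

# Fact 2: the line passes through (x_bar, y_bar)
y_at_x_bar <- a + b * x_bar

# Fact 3: regressing x on y is a different line, not the algebraic inverse
fit_swapped   <- lm(shelf_space ~ num_sold, data = shelf)
slope_swapped <- coef(fit_swapped)[["num_sold"]]

print(c(change_per_sd = change_per_sd, r_times_s_y = r * s_y))
print(c(y_at_x_bar = y_at_x_bar, y_bar = y_bar))
print(c(slope_swapped = slope_swapped, one_over_b = 1 / b))
```

The first two printed pairs agree, while the last pair does not, which is why it matters which variable is treated as the response.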

R^2
The square of the correlation, R^2, describes the strength of a straight-line relationship.
Its formal name is the Coefficient of Determination.
It is the fraction of the variation in Y, usually reported as a percent, that is explained by the regression equation.
R^2 is a measure of how successful the regression was in explaining the response.

R^2 for coffee sales
The correlation is r = 0.827
The coefficient of determination is R^2 = 0.827^2 = 0.684
This means that 68.4% of the variation in coffee sales can be explained by the equation.
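
As a hedged illustration, the coefficient of determination for the coffee data can be obtained in R three ways that should agree: squaring the correlation, reading r.squared from summary() of the fitted model, and computing the share of variation explained directly. The variable names are illustrative.

```r
# Coffee data from the example table
shelf <- data.frame(
  shelf_space = c(5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20),
  num_sold    = c(160, 220, 140, 190, 240, 260, 230, 270, 280, 260, 290, 310)
)

fit <- lm(num_sold ~ shelf_space, data = shelf)

# 1. Square of the correlation
r2_from_r <- cor(shelf$shelf_space, shelf$num_sold)^2

# 2. Value reported by the fitted model
r2_from_fit <- summary(fit)$r.squared

# 3. One minus (unexplained variation / total variation)
sse <- sum(residuals(fit)^2)                           # variation left around the line
sst <- sum((shelf$num_sold - mean(shelf$num_sold))^2)  # total variation in y
r2_from_ss <- 1 - sse / sst

print(round(c(r2_from_r, r2_from_fit, r2_from_ss), 3))
```

The third form makes the "percent of variation explained" reading concrete: it compares the scatter left around the line with the total scatter in weekly sales.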

Popper #10 Questions
The following is an output of a least-squares regression equation in the TI-84.
[Figure: TI-84 LinReg output screen]
12. What percent of the variation in the y-variable can be explained by this regression equation?
    a) 67%  b) 6.7845%  c) 77%  d) 88%