The Least Squares Regression Line

The Least Squares Regression Line
Section 5.3
Cathy Poliak, Ph.D.
cathy@math.uh.edu
Office hours: T Th 1:30 pm - 3:30 pm 620 PGH & 5:30 pm - 7:00 pm CASA
Department of Mathematics
University of Houston
March 3, 2016

Outline
1. Beginning Questions
2. Least-Squares Regression
3. Prediction
4. Coefficient of Determination

Popper Set Up
Fill in all of the proper bubbles. Use a #2 pencil. This is popper number 10.

Popper #10 Questions
We are looking at the relationship between how many items are on a shelf (shelf space) and the number of items sold in a week. The correlation coefficient is r = 0.827.
1. What is the direction of the relationship between shelf space and weekly sales?
a) Positive b) Negative c) No direction
2. What is the strength of the relationship between shelf space and weekly sales?
a) Strong b) Moderate c) Weak
3. What form does the relationship between shelf space and weekly sales appear to have?
a) Linear b) Non-linear

Scatterplot
[Figure: scatterplot of Number Sold (vertical axis) against Shelf Space in feet (horizontal axis).]

Examining relationships
Correlation measures the direction and strength of the straight-line relationship between two quantitative variables.
If a scatterplot shows a linear relationship, we would like to summarize this overall pattern by drawing a line on the scatterplot.
A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.
The equation of this line is used when one of the variables helps explain or predict the other.

Least-Squares regression
The least-squares regression line (LSRL) of Y on X is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
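
Written out in symbols, the definition above says the following. This is a sketch in the notation of the later slides (candidate line ŷ = a + bx, data points (x_i, y_i) for i = 1, ..., n); the label SSE is introduced here for convenience and does not appear in the slides.

```latex
\[
  \mathrm{SSE}(a, b) \;=\; \sum_{i=1}^{n} \bigl( y_i - (a + b\,x_i) \bigr)^2 ,
  \qquad
  \text{LSRL: the values of } a \text{ and } b \text{ that make } \mathrm{SSE}(a, b) \text{ as small as possible.}
\]
```

Each term y_i - (a + b x_i) is the vertical distance from a data point to the line, so points far above or below the line contribute the most to the sum.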

Least-Squares
[Figure: Number Sold (vertical axis) against Shelf Space in feet (horizontal axis).]

Example
The marketing manager of a supermarket chain would like to use shelf space to predict the sales of coffee. A random sample of 12 stores is selected, with the following results.
Store   Shelf Space (ft)   Weekly Sales (# sold)
1       5                  160
2       5                  220
3       5                  140
4       10                 190
5       10                 240
6       10                 260
7       15                 230
8       15                 270
9       15                 280
10      20                 260
11      20                 290
12      20                 310

Equation of the least-squares regression line
Let x be the explanatory variable and y be the response variable for n individuals. From the data calculate the means x̄ and ȳ and the standard deviations s_x and s_y of the two variables, and their correlation r.
The least-squares line is the equation ŷ = a + bx.

Equation of the least-squares regression line
The least-squares line is the equation ŷ = a + bx.
In the example of the supermarket sales, let Y = weekly sales and X = shelf space. The least-squares regression equation to predict weekly sales based on shelf space is
ŷ = 145 + 7.4x
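
The equation ŷ = 145 + 7.4x can be reproduced directly from the 12 stores in the example. The following is a minimal Python sketch, not part of the original slides; the function name `least_squares_line` and the variable names are illustrative choices, and it uses the standard closed-form solution (sum of cross-deviations divided by sum of squared x-deviations) rather than any particular library.

```python
# Coffee data from the example slide: shelf space (feet) and weekly sales (units).
shelf_space = [5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20]
num_sold = [160, 220, 140, 190, 240, 260, 230, 270, 280, 260, 290, 310]


def least_squares_line(x, y):
    """Return (a, b) for the line y-hat = a + b*x that minimizes the
    sum of squared vertical distances to the points (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx        # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


a, b = least_squares_line(shelf_space, num_sold)
print(f"y-hat = {a:.1f} + {b:.1f}x")  # prints: y-hat = 145.0 + 7.4x
```

The result agrees with the equation stated on the slide.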

Popper #10 Questions
The least-squares regression equation to predict weekly sales (Y) based on shelf space (X) is ŷ = 145 + 7.4x.
4. What is the slope of this linear equation?
a) 145 b) 7.4 c) 19.5 d) None of these
5. What is the y-intercept of this linear equation?
a) 145 b) 7.4 c) 19.5 d) None of these
6. If there is a shelf space of 12 feet, what would be the weekly sales?
a) 145 b) 7.4 c) 233.8 d) 18

Calculating the least squares regression equation by hand
Let X be the explanatory variable and Y be the response variable for n individuals.
1. From the data calculate the means x̄ and ȳ and the standard deviations s_x and s_y of the two variables, and their correlation r.
2. Calculate the slope: b = r (s_y / s_x)
3. Calculate the y-intercept: a = ȳ - b x̄
4. Then the equation is: ŷ = a + bx.
Substitute the slope for b and the y-intercept for a, and leave ŷ and x as variables (do not put any numbers in for ŷ and x).

Finding Equation for Coffee Sales
Given these values, determine the least-squares regression line (LSRL) equation for predicting number sold based on shelf space.
Explanatory variable: Shelf space (X), x̄ = 12.5 feet, s_x = 5.83874 feet.
Response variable: Sales (Y), ȳ = 237.5 units sold, s_y = 52.2451 units sold.
Correlation: r = 0.827
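
One way to carry out this calculation is with the standard formulas b = r (s_y / s_x) for the slope and a = ȳ - b x̄ for the y-intercept. The short Python sketch below applies them to the summary values on this slide; the function name `line_from_summary` and its parameter names are illustrative and not from the slides.

```python
def line_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) for y-hat = a + b*x from means, standard deviations,
    and the correlation, using b = r * s_y / s_x and a = y_bar - b * x_bar."""
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


# Summary statistics for the coffee example, as given on the slide.
a, b = line_from_summary(x_bar=12.5, s_x=5.83874, y_bar=237.5, s_y=52.2451, r=0.827)
print(f"slope b = {b:.2f}, intercept a = {a:.1f}")  # prints: slope b = 7.40, intercept a = 145.0
```

Rounded, this gives ŷ = 145 + 7.4x. Small differences in later decimal places come from the rounded value r = 0.827.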

Popper #10 Questions
The following are descriptive statistics for estimating the cost of a car (Y) from the age of the car (X).
Estimated Cost (Y): ȳ = 10360.93, s_y = 5482.3372
Age (X): x̄ = 5.214, s_x = 2.940
Correlation of Estimated Cost and Age: r = -0.8224
7. Calculate the slope b of the regression line to predict cost (Y) by age (X).
a) 5.214 b) -0.8224 c) -1533.56 d) 0.0004
8. Calculate the y-intercept, a.
a) 10355.716 b) 18356.911 c) 15889.11 d) 0.6763
9. Give the least squares equation of the line to predict cost by age.
a) ŷ = 18356.91 - 1533.56x
b) ŷ = 10355.72 - 0.8224x
c) 10360.93 = 18357 - 1533.56(5.2)
d) None of these

Least-Squares Regression Line Using R
Use the command: lm(yvariable ~ Xvariable). For the coffee sales example:
> lm(shelf$num_sold ~ shelf$shelf_space)
Call:
lm(formula = shelf$num_sold ~ shelf$shelf_space)
Coefficients:
      (Intercept)  shelf$shelf_space
            145.0                7.4
The Intercept value is a and the other value is b, the slope. Thus the equation in this example is: ŷ = 145 + 7.4x.

Least-Squares Regression Line Using TI-83(84)
1. Make sure diagnostics are turned on: press 2ND CATALOG and scroll down to DiagnosticOn.
2. Choose STAT CALC, then 8:LinReg(a+bx).
3. Make sure your Xlist is L1 and your Ylist is L2, then select Calculate.
4. a is your y-intercept and b is your slope.

Prediction
The equation of the regression line makes prediction easy. Substitute an x-value into the equation.
Predict the weekly sales of coffee with shelf space of 12 feet.
ŷ = 145 + 7.4 × 12 = 233.8
Thus 233.8 is the predicted weekly number of units of coffee sold at a store that has 12 feet of shelf space.
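
The substitution above can also be written as a small function. This is an illustrative Python sketch, assuming the coffee-sales coefficients a = 145 and b = 7.4 from the slides; the function name `predict_weekly_sales` is hypothetical.

```python
def predict_weekly_sales(shelf_space_ft, a=145.0, b=7.4):
    """Predicted weekly units sold from y-hat = a + b*x.

    The defaults are the coffee-sales coefficients from the slides.
    """
    return a + b * shelf_space_ft


predicted = predict_weekly_sales(12)
print(round(predicted, 1))  # prints: 233.8
```

The same function works for any other shelf space by changing the argument.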

Popper #10 Questions
The least-squares regression line equation to predict the cost of a car (Y) by the age of the car (X) is: ŷ = 18358 - 1534x.
10. Predict the cost of an automobile for a 5 year old car.
a) $18358 b) $1534 c) $10688 d) $11.96
11. Predict the cost of an automobile for a 20 year old car.
a) $18358 b) -$1534 c) -$12,322 d) $49038

Facts about least-squares regression
Fact 1: A change of one standard deviation in x corresponds to a change of r standard deviations in y. (This describes the slope b, also written b_1.)
Fact 2: The least-squares regression line always passes through the point (x̄, ȳ). That is why we can get the y-intercept a (also written b_0).
Fact 3: The distinction between explanatory and response variables is essential in regression.
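
Facts 1 and 2 both follow from the standard formulas for the slope and intercept, b = r (s_y / s_x) and a = ȳ - b x̄. A short derivation, written in the a + bx notation used for the regression equation:

```latex
\begin{align*}
  \text{Fact 2:}\quad \hat{y}\,\big|_{x=\bar{x}}
    &= a + b\,\bar{x}
     = (\bar{y} - b\,\bar{x}) + b\,\bar{x}
     = \bar{y}, \\[4pt]
  \text{Fact 1:}\quad \hat{y}\,\big|_{x=\bar{x}+s_x} - \hat{y}\,\big|_{x=\bar{x}}
    &= b\,s_x
     = r\,\frac{s_y}{s_x}\,s_x
     = r\,s_y .
\end{align*}
```

The first line shows that substituting x̄ into the line returns ȳ. The second shows that increasing x by one standard deviation s_x changes the prediction by r standard deviations of y.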

R²
The square of the correlation, R², describes the strength of a straight-line relationship.
Its formal name is the Coefficient of Determination.
It is the percent of the variation in Y that is explained by the regression equation.
R² is a measure of how successful the regression was in explaining the response.

R² for coffee sales
The correlation is r = 0.827.
The coefficient of determination is R² = 0.827² = 0.684.
This means that 68.4% of the variation in coffee sales can be explained by the equation.
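
The "percent of variation explained" reading can be checked against the raw coffee data. The Python sketch below is illustrative and not from the slides: it computes R² as 1 - SSE/SST, where SSE (sum of squared residuals about the fitted line) and SST (sum of squared deviations of y about its mean) are labels introduced here. For a least-squares line with one explanatory variable, this equals the square of the correlation.

```python
# Coffee data from the example slide: shelf space (feet) and weekly sales (units).
shelf_space = [5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20]
num_sold = [160, 220, 140, 190, 240, 260, 230, 270, 280, 260, 290, 310]


def r_squared(x, y):
    """Fraction of the variation in y explained by the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Fit the least-squares line y-hat = a + b*x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    # Variation left over after the fit versus total variation in y.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


r2 = r_squared(shelf_space, num_sold)
print(f"R^2 = {r2:.3f}")  # prints: R^2 = 0.684
```

This matches 0.827² = 0.684 from the slide.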

Popper #10 Questions
The following is an output of a least-squares regression equation in the TI-84.
[Figure: TI-84 regression output.]
12. What percent of the variation in the y-variable can be explained by this regression equation?
a) 67% b) 6.7845% c) 77% d) 88%