Regression. Lecture Notes VII


Regression Lecture Notes VII Statistics 112, Fall 2002

Outline

- Predicting Y based on X.
- Use of the conditional mean (the regression function) to make predictions.
- Prediction based on a sample.
- Regression line.
- Least squares line.

Reading for today: Section 2.3. Also review the concept of a conditional distribution and the conditional mean from chapter 4.
Reading for next time: Section 2.4.

A Prediction Problem

You are trying to determine how many years to stay in school. You are currently focusing on the economic aspects and want to predict what your earnings will be for various levels of education. Suppose that you have available the joint distribution of education and earnings from a population of individuals like yourself.

Table 1: Conditional distribution of earnings given education

                 Earnings
Educ. (years)    $30,000   $40,000   $50,000   $60,000
12               0.7       0.1       0.1       0.1
14               0.1       0.7       0.1       0.1
16               0.1       0.1       0.7       0.1
18               0.1       0.1       0.1       0.7

- What is your prediction of your earnings if you only graduate from high school?
- What is your prediction of your earnings if you graduate from college?

Conditional Mean for Prediction

One reasonable approach to predicting your earnings for a given level of schooling: use the mean level of earnings for each level of schooling.

Educ. (years)    Mean Earnings
12               ____
14               ____
16               ____
18               ____

The distribution of earnings among those who obtained a given level of schooling (e.g., high school) is called the conditional distribution of earnings given the level of schooling.

The mean (expectation) of earnings among those who obtained a certain level of schooling is called the conditional mean (expectation) of earnings for the given level of schooling.
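The conditional means left blank above can be computed directly from Table 1. The following is a minimal sketch, assuming the probabilities in Table 1 are exact; the names `cond_dist`, `conditional_mean`, and `cond_means` are illustrative and do not come from the notes.

```python
# Conditional distribution of earnings given education, copied from Table 1.
# Each row lists P(earnings = level | education) for the four earnings levels.
earnings_levels = [30_000, 40_000, 50_000, 60_000]
cond_dist = {
    12: [0.7, 0.1, 0.1, 0.1],
    14: [0.1, 0.7, 0.1, 0.1],
    16: [0.1, 0.1, 0.7, 0.1],
    18: [0.1, 0.1, 0.1, 0.7],
}


def conditional_mean(probs, values):
    """Mean of `values` weighted by the conditional probabilities `probs`."""
    return sum(p * v for p, v in zip(probs, values))


# Conditional mean of earnings for each education level.
cond_means = {
    educ: conditional_mean(probs, earnings_levels)
    for educ, probs in cond_dist.items()
}

for educ, mean_earnings in cond_means.items():
    print(f"Educ. {educ}: mean earnings = ${mean_earnings:,.0f}")
```

Under these assumptions the conditional means come out to $36,000, $42,000, $48,000, and $54,000 for 12, 14, 16, and 18 years of education.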

The Regression Function

A response variable measures an outcome of a study and is often denoted by Y.

An explanatory variable explains, predicts or causes changes in the response variable and is often denoted by X.

We are often interested in predicting Y based on X:

X (explanatory)                          Y (response)
Education                                Earnings
Height                                   Weight
Past DJIA                                Present DJIA
Blood alcohol content                    Probability of an accident
Exposure to radioactive contamination    Cancer mortality

The conditional mean of Y given X is a reasonable prediction of Y given X. This conditional mean is denoted by \(E(Y \mid X)\) and it is called the regression function (it is a function of X).

No prediction is infallible because there is scatter about the conditional mean, e.g., a high school graduate will occasionally earn $60,000.

Linear Regression Function

Sometimes, the regression function is approximately linear in X, i.e., the plot of \(E(Y \mid X)\) vs. X is a straight line, \(E(Y \mid X) = \beta_0 + \beta_1 X\).

What are the interpretations of the intercept and slope of a linear regression function?

- The intercept \(\beta_0\) is the value of \(E(Y \mid X)\) when \(X = 0\).
- The slope \(\beta_1\) is the amount that \(E(Y \mid X)\) increases for every one unit increase in X, e.g., your expected earnings increase $6,000 for each extra year of education.
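As a worked check of linearity, the conditional means implied by Table 1 can be written out by hand. This sketch assumes the Table 1 probabilities are exact; the symbols \(\beta_0\) and \(\beta_1\) for the intercept and slope are notation introduced here for illustration.

```latex
\begin{align*}
E(Y \mid X = 12) &= 0.7(30{,}000) + 0.1(40{,}000) + 0.1(50{,}000) + 0.1(60{,}000) = 36{,}000 \\
E(Y \mid X = 14) &= 0.1(30{,}000) + 0.7(40{,}000) + 0.1(50{,}000) + 0.1(60{,}000) = 42{,}000 \\
E(Y \mid X = 16) &= 0.1(30{,}000) + 0.1(40{,}000) + 0.7(50{,}000) + 0.1(60{,}000) = 48{,}000 \\
E(Y \mid X = 18) &= 0.1(30{,}000) + 0.1(40{,}000) + 0.1(50{,}000) + 0.7(60{,}000) = 54{,}000 \\[4pt]
\beta_1 &= \frac{42{,}000 - 36{,}000}{14 - 12} = 3{,}000, \qquad
\beta_0 = 36{,}000 - 3{,}000 \cdot 12 = 0
\end{align*}
```

For the Table 1 population, then, the four conditional means lie exactly on the line \(E(Y \mid X = x) = 3{,}000\,x\): each two-year step in education raises expected earnings by $6,000, which is $3,000 per year. The intercept refers to zero years of education, far outside the range of 12 to 18 years covered by the table.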

Prediction Based on a Sample

Typically, we do not know the population regression function and must estimate it based on a sample from the population, \((X_1, Y_1), \ldots, (X_n, Y_n)\).

Suppose that in a random sample of your family and friends, you obtain the following data:

Name    Educ.    Earn.
AC      12       $30,000
DE      12       $20,000
BJ      12       $25,000
TL      14       $33,000
QJ      16       $38,000
RE      16       $32,000
LB      16       $40,000
AJ      16       $30,000
PG      18       $40,000
JU      18       $50,000

A natural idea for predicting Y based on X is to use the sample mean of Y given \(X = x\), i.e., the mean in a vertical strip of the scatterplot.

Some problems with this approach:

- There may only be a small number of observations for each x in the sample, meaning that the sample estimate of \(E(Y \mid X = x)\) will have high variance. What is a 95% confidence interval for the mean earnings for college graduates (without a master's degree) assuming that the population distribution is normal?
- What if we want to predict the earnings for someone who attends one year of college? How would we do it?
- The plot of the sample means of Y given X (the graph of averages) will typically not be a line even if the population regression function is linear.
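The first and third problems can be seen numerically in the family-and-friends sample. The sketch below assumes that college graduates are the four people with 16 years of education and that the interval is the usual one-sample t-interval; the critical value 3.182 (the 0.975 quantile of the t distribution with 3 degrees of freedom) is hard-coded from a standard table, and all variable names are illustrative.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Sample from the notes: (years of education, earnings in dollars).
sample = [
    (12, 30_000), (12, 20_000), (12, 25_000), (14, 33_000), (16, 38_000),
    (16, 32_000), (16, 40_000), (16, 30_000), (18, 40_000), (18, 50_000),
]

# Group earnings by education level (the "vertical strips" of the scatterplot).
strips = defaultdict(list)
for educ, earn in sample:
    strips[educ].append(earn)

# Graph of averages: sample mean of earnings within each strip.
graph_of_averages = {educ: mean(vals) for educ, vals in sorted(strips.items())}

# 95% t-interval for the mean earnings of college graduates (16 years).
college = strips[16]
n = len(college)
T_CRIT_DF3 = 3.182  # table value for 3 degrees of freedom (assumption: n = 4)
assert n - 1 == 3   # the hard-coded critical value only applies when n = 4

center = mean(college)
margin = T_CRIT_DF3 * stdev(college) / sqrt(n)
ci = (center - margin, center + margin)

print("Graph of averages:", graph_of_averages)
print(f"95% CI for college graduates: (${ci[0]:,.0f}, ${ci[1]:,.0f})")
```

With this sample the strip means are $25,000, $33,000, $35,000, and $45,000, which do not fall on a straight line, and the interval for college graduates runs from roughly $27,400 to $42,600. There is no strip at 13 years, so this approach gives no prediction at all for someone with one year of college.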

The Regression Line

A regression line is a straight line that best fits the graph of averages. It is an estimate of the population regression function when the population regression function is linear.

Which line?

Prediction errors. Because we want to use the line to predict Y based on X, it is sensible to choose the line that makes the smallest prediction errors. The prediction error for point \((X_i, Y_i)\) is

\(\text{error}_i = Y_i - \hat{Y}_i\), where \(\hat{Y}_i\) is the predicted value of Y based on \(X_i\),

i.e., the vertical distance between the regression line at \(X_i\) and \(Y_i\).
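To make the idea of prediction errors concrete, the sketch below computes the vertical distances for two candidate lines on the family-and-friends sample. Both candidate lines are hypothetical, chosen only for illustration, and the function names are not from the notes. The lines are compared by their sum of squared errors, one possible way of summarizing the size of the errors.

```python
# Sample from the notes: (years of education, earnings in dollars).
sample = [
    (12, 30_000), (12, 20_000), (12, 25_000), (14, 33_000), (16, 38_000),
    (16, 32_000), (16, 40_000), (16, 30_000), (18, 40_000), (18, 50_000),
]


def prediction_errors(intercept, slope, data):
    """Observed Y minus the line's prediction at X, for every data point."""
    return [y - (intercept + slope * x) for x, y in data]


def sum_squared_errors(intercept, slope, data):
    """Sum of the squared prediction errors for the given line."""
    return sum(e ** 2 for e in prediction_errors(intercept, slope, data))


# Two hypothetical candidate lines (intercept, slope), for illustration only.
line_a = (0.0, 2_000.0)        # earnings = 0 + 2000 * educ
line_b = (-10_000.0, 3_000.0)  # earnings = -10000 + 3000 * educ

sse_a = sum_squared_errors(*line_a, sample)
sse_b = sum_squared_errors(*line_b, sample)

print("Errors for line A:", prediction_errors(*line_a, sample))
print(f"SSE for line A: {sse_a:,.0f}")
print(f"SSE for line B: {sse_b:,.0f}")
```

On this sample, line B has the smaller sum of squared errors, so by that measure it fits the data better than line A.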

The Least Squares Line

One way to judge the magnitude of prediction errors is to square them. The least squares line is the line that makes the sum of squared prediction errors smallest, i.e., the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

Least squares line: \(\hat{Y} = b_0 + b_1 X\).

- What is the interpretation of \(b_0\) and \(b_1\)?
- What would you predict your earnings to be if you graduate from college?
- What would you predict your earnings to be if you attend one year of college?
- What would you predict your earnings to be if you obtain a doctorate?
- Which of the above predictions do you feel most confident about? Which prediction do you feel least confident about?
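The least squares line for the family-and-friends sample can be computed with the standard closed-form formulas: slope = S_xy / S_xx and intercept = mean(Y) - slope * mean(X). This is a minimal sketch. The names `b0`, `b1`, and `predict` are illustrative, and the choice of 21 years of education to represent a doctorate is an assumption made here, not a figure from the notes.

```python
from statistics import mean

# Sample from the notes: (years of education, earnings in dollars).
sample = [
    (12, 30_000), (12, 20_000), (12, 25_000), (14, 33_000), (16, 38_000),
    (16, 32_000), (16, 40_000), (16, 30_000), (18, 40_000), (18, 50_000),
]

xs = [x for x, _ in sample]
ys = [y for _, y in sample]
x_bar, y_bar = mean(xs), mean(ys)

# Closed-form least squares estimates for a line with one predictor.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in sample)
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = s_xy / s_xx         # slope: change in predicted earnings per year
b0 = y_bar - b1 * x_bar  # intercept: predicted earnings at 0 years


def predict(educ_years):
    """Predicted earnings from the fitted least squares line."""
    return b0 + b1 * educ_years


print(f"Least squares line: earnings = {b0:,.0f} + {b1:,.0f} * educ")
print(f"College graduate (16 years): ${predict(16):,.0f}")
print(f"One year of college (13 years): ${predict(13):,.0f}")
# Assumption: a doctorate is taken to be about 21 years of education.
print(f"Doctorate (assumed 21 years): ${predict(21):,.0f}")
```

For this sample the fitted slope is $3,040 per year of education and the intercept is -$11,800, giving predictions of $36,840 at 16 years and $27,720 at 13 years. The doctorate prediction lies outside the observed range of 12 to 18 years, so it is an extrapolation and deserves the least confidence.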