Non-linearities in Simple Regression

1. Example: Monthly Earnings and Years of Education

In this tutorial, we will focus on an example that explores the relationship between total monthly earnings and years of education. The code below downloads a CSV file that includes data from 1980 for 935 individuals on variables including their total monthly earnings (MonthlyEarnings) and a number of variables that could influence income, including years of education (YearsEdu), and assigns it to a dataset that we call wages.

    wages <- read.csv("http://murraylax.org/datasets/wage2.csv")

We estimate the simple regression with the following call to lm() and store the output in an object we call lmwages:

    lmwages <- lm(wages$MonthlyEarnings ~ wages$YearsEdu)

2. Log Function

It may not be appropriate to assume a linear relationship between years of education and monthly earnings. With a linear relationship, we assume that each year of education results in the same dollar increase in monthly earnings. It may be more appropriate to suggest that each year of education leads to a similar percentage increase in monthly earnings. To estimate such a relationship, we estimate the following regression equation that includes the natural logarithm of the dependent variable (monthly earnings):

$\ln(y_i) = b_0 + b_1 x_i + e_i$

where $y_i$ denotes the income of individual i, $\ln(y_i)$ is the natural logarithm of $y_i$, and $x_i$ denotes the number of years of education of individual i.

When we have a relationship of the form $\ln(y) = b_0 + b_1 x$, this can be transformed to the exponential function, $y = \exp(b_0 + b_1 x)$. To get an idea what this function looks like, we can make up some numbers for $b_0$ and $b_1$ and plot the function. In the line of code below we create a function called expfun and set it equal to the function $f(x) = \exp(2 + 5x)$.

    expfun <- function(x) exp(2 + 5*x)

We can see a plot of this function with a simple call to plot():

    plot(expfun)

[Figure: plot of expfun(x) against x]

We can see that this kind of relationship implies that the outcome variable, y, increases at an increasing rate as x increases.

Let us look at what the function looks like if instead the coefficient $b_1$ is negative. In the code below, we create the function expfun again, but with a coefficient on x equal to -5 instead of +5, then plot it.

    expfun <- function(x) exp(2 - 5*x)
    plot(expfun)

[Figure: plot of expfun(x) against x, with the negative coefficient]

Here we see that the outcome variable, y, decreases as x increases, and at a decreasing rate.
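One way to see why the log specification corresponds to a constant percentage change is to compare the function at x and at x + 1. The sketch below uses made-up coefficients (b0 = 2 and b1 = 0.05) and a hypothetical function name, loglinfun. Neither comes from the wages data.

```r
# Made-up coefficients, chosen only for illustration (not estimates)
b0 <- 2
b1 <- 0.05

# Hypothetical name, used so that expfun above is not overwritten
loglinfun <- function(x) exp(b0 + b1 * x)

# Ratio of the function one unit apart, evaluated at several values of x
x <- c(8, 12, 16)
ratio <- loglinfun(x + 1) / loglinfun(x)

ratio     # the same value at every x
exp(b1)   # that common value
```

The ratio does not depend on x, so each additional unit of x multiplies y by exp(b1), which is a fixed percentage change of 100 * (exp(b1) - 1).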

3. Regression with a log dependent variable

We estimate the regression equation with the log of monthly earnings as the outcome variable with the following call to lm() that assigns the output to an object that we call loglmwages:

    loglmwages <- lm(log(wages$MonthlyEarnings) ~ wages$YearsEdu)

We can view the summary of the regression output with the following call to summary():

    summary(loglmwages)

    Call:
    lm(formula = log(wages$MonthlyEarnings) ~ wages$YearsEdu)

    Residuals:
         Min       1Q   Median       3Q      Max
    -1.94620 -0.24832  0.03507  0.27440  1.28106

    Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
    (Intercept)    5.973062   0.081374   73.40   <2e-16 ***
    wages$YearsEdu 0.059839   0.005963   10.04   <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.4003 on 933 degrees of freedom
    Multiple R-squared:  0.09742,  Adjusted R-squared:  0.09645
    F-statistic: 100.7 on 1 and 933 DF,  p-value: < 2.2e-16

The coefficient on years of education is equal to 0.0598, which is how much the predicted value for ln(monthly earnings) increases when educational attainment increases by one year. We can express this mathematically as,

$\frac{\Delta \ln(\hat{y})}{\Delta x} = 0.0598$

It turns out that this is a close approximation to the percentage increase in y from a one unit increase in x. That is, $\Delta \ln(\hat{y}) \approx \%\Delta \hat{y}$, with the percentage change expressed as a decimal. Therefore, our regression predicts that one additional year of education is associated with approximately a 6% higher monthly salary.

4. Log-Log Relationship

Let us instead consider the possibility of the following non-linear relationship between monthly earnings and educational attainment:

$\ln(y_i) = b_0 + b_1 \ln(x_i) + \epsilon_i$

Let us make up some numbers for $b_0$ and $b_1$ to visualize what such a function looks like. First let us solve for $y_i$ by taking the exponential function of both sides of the equation. This yields the equivalent function:

$y_i = \exp(b_0 + b_1 \ln(x_i) + \epsilon_i)$

In the code below, we make up a function with $b_0 = 2$ and $b_1 = 5$, call it loglogfun, and plot the curve to see what it looks like:

    loglogfun <- function(x) exp(2 + 5*log(x))
    plot(loglogfun)

[Figure: plot of loglogfun(x) against x]

The function also predicts that y increases at an increasing rate with x, but an examination of the magnitude of the y axis labels reveals that the rate of increase is smaller.

We can estimate a log-log regression of monthly earnings on educational attainment with the following call to lm(), where the output is assigned to an object we call lglglmwages:

    lglglmwages <- lm(log(wages$MonthlyEarnings) ~ log(wages$YearsEdu))

We summarize the output with the following call to the summary() function:

    summary(lglglmwages)

    Call:
    lm(formula = log(wages$MonthlyEarnings) ~ log(wages$YearsEdu))

    Residuals:
         Min       1Q   Median       3Q      Max
    -1.94925 -0.24818  0.03866  0.27282  1.27167

    Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
    (Intercept)          4.63932    0.21297   21.78   <2e-16 ***
    log(wages$YearsEdu)  0.82694    0.08215   10.07   <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.4002 on 933 degrees of freedom
    Multiple R-squared:  0.09796,  Adjusted R-squared:  0.09699
    F-statistic: 101.3 on 1 and 933 DF,  p-value: < 2.2e-16

The coefficient $b_1 = 0.8269$ is a measure of how much the natural log of earnings increases when the natural log of educational attainment increases by one unit, expressed mathematically as,

$\frac{\Delta \ln(\hat{y})}{\Delta \ln(x)} = 0.8269$

It turns out that this is approximately equal to the predicted percentage increase in y when x increases by one percent. Mathematically,

$\frac{\Delta \ln(\hat{y})}{\Delta \ln(x)} \approx \frac{\%\Delta \hat{y}}{\%\Delta x} = 0.8269$

That is, monthly earnings on average are 0.83% higher for each 1% increase in educational attainment. In economics, we call a measure like this an elasticity.
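The percentage interpretations in Sections 3 and 4 are approximations, and they can be compared against the exact changes implied by the fitted equations. The sketch below copies the two slope estimates from the summary() outputs by hand, so it runs without the data. The variable names are hypothetical, and the calculation ignores the error term.

```r
# Slope estimates copied from the two summary() outputs above
b1_loglin <- 0.059839   # log(earnings) on years of education
b1_loglog <- 0.82694    # log(earnings) on log(years of education)

# Log-linear model: exact percentage change in predicted earnings
# for one additional year of education
pct_one_year <- 100 * (exp(b1_loglin) - 1)

# Log-log model: exact percentage change in predicted earnings
# for a 1% increase in years of education
pct_one_percent <- 100 * (1.01^b1_loglog - 1)

round(pct_one_year, 2)
round(pct_one_percent, 3)
```

Both exact values sit close to the approximations in the text: a little over 6% for one more year of education, and about 0.83% for a 1% increase in education. The approximation is best when the coefficient, or the percentage change considered, is small.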