Econometrics and Economic Data


Chapter 1

What is a regression?

By using the regression model, we can evaluate the magnitude of change in one variable due to a certain change in another variable. For example, an economist can estimate the amount of change in food expenditure due to a certain change in the income of a household by using the regression model. A sociologist may want to estimate the increase in the crime rate due to a particular increase in the unemployment rate.

Besides answering these questions, a regression model also helps predict the value of one variable for a given value of another variable. For example, by using the regression line, we can predict the (approximate) food expenditure of a household with a given income.

Suppose economists are investigating the relationship between food expenditure and income. What factors or variables does a household consider when deciding how much money it should spend on food every week or every month? Certainly, the income of the household is a factor. Is it the only factor? Many other variables also affect food expenditure, such as:

- assets owned by the household
- size of the household
- preferences and tastes
- any special dietary needs

These variables are called independent or explanatory variables because they all vary independently, and they explain the variation in food expenditures among different households. In other words, these variables explain why different households spend different amounts of money on food. Food expenditure is called the dependent variable because it depends on the independent variables. Studying the effect of two or more independent variables on a dependent variable using regression analysis is called multiple regression.

If we choose only one (usually the most important) independent variable and study the effect of that single variable on a dependent variable, it is called a simple regression. A regression model is a mathematical equation that describes the relationship between two or more variables. A simple regression includes only two variables: one independent and one dependent. Note that whether it is a simple or a multiple regression analysis, it always includes one and only one dependent variable. It is the number of independent variables that differs between simple and multiple regressions.

The relationship between two variables in a regression analysis is expressed by a mathematical equation called a regression equation or model. A regression equation, when plotted, may assume one of many possible shapes, including a straight line. A regression equation that gives a straight-line relationship between two variables is called a linear regression model; otherwise, the model is called a nonlinear regression model. The next two diagrams show a linear and a nonlinear relationship between the dependent variable food expenditure and the independent variable income.

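The equation of a linear relationship between x and y is written as

$$y = A + Bx \tag{1}$$

where y is the dependent variable, A is the constant term or y-intercept, B is the coefficient of x or slope, and x is the independent variable.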

Model 1 is called a deterministic model. It gives an exact relationship between x and y. This model simply states that y is determined exactly by x and that for a given value of x there is one and only one (unique) value of y.

However, in many cases the relationship between variables is not exact. For instance, if y is food expenditure and x is income, then model 1 would state that food expenditure is determined by income only and that all households with the same income spend the same amount on food. But as mentioned earlier, food expenditure is determined by many variables, only one of which is included in model 1. In reality, different households with the same income spend different amounts of money on food because of differences in the size of the household, the assets they own, and their preferences and tastes.

To take these variables into consideration and to make our model complete, we add another term to the right side of model 1. This term is called the random error term. It is denoted by ε (Greek letter epsilon). Because of this term, model 2 is no longer deterministic; it is a probabilistic (statistical) model:

$$y = A + Bx + \varepsilon \tag{2}$$

The random error term is included in the model to represent the following two phenomena.

1. Missing or omitted variables: The random error term is included to capture the effect of all those missing or omitted variables that have not been included in the model.
2. Random variation: Human behavior is unpredictable. A household may have many parties during one month and spend more than usual on food during that month. The variation in food expenditure for such reasons may be called random variation.

In model 2, A and B are the population parameters. The regression line obtained for model 2 by using the population data is called the population regression line. The values of A and B in the population regression line are called the true values of the y-intercept and slope.

As we know, population data are difficult to obtain. As a result, we almost always use sample data to estimate model 2. The values of the y-intercept and slope calculated from sample data on x and y are called the estimated values of A and B and are denoted by a and b.

Using a and b, we write the estimated regression model as

$$\hat{y} = a + bx \tag{3}$$

where ŷ (read as y hat) is the estimated or predicted value of y for a given value of x. Equation 3 is called the estimated regression model; it gives the regression of y on x.

Scatter Diagram

Suppose we take a sample of seven households from a low to moderate income neighborhood and collect information on their incomes and food expenditures for the past month. The information obtained (in hundreds of dollars) is given in the table next. Each pair consists of one observation on income and a second on food expenditure. For example, the first household's income for the past month was $3500 and its food expenditure was $900.

By plotting all seven pairs of values, we obtain a scatter diagram or scatterplot. The following figure gives the scatter diagram for the data of the previous table. Each dot in this diagram represents one household. A scatter diagram is helpful in detecting a relationship between two variables.

By looking at the scatter diagram, we can observe that there exists a strong linear relationship between food expenditure and income. If a straight line is drawn through the points, the points will be scattered closely around the line. In fact, we can draw many straight lines through the scatter of points. Each line will give different values for a and b of model 3.

In regression analysis, we try to find a line that best fits the points in the scatter diagram. Such a line provides the best possible description of the relationship between the dependent and independent variables. The least squares method, discussed in the next section, gives such a line. The line obtained by using the least squares method is called the least squares regression line.

The value of y obtained for a member from the survey is called the observed or actual value of y. As mentioned earlier, the value of y, denoted by ŷ, obtained for a given x by using the regression line is called the predicted value of y. The random error ε denotes the difference between the actual value of y and the predicted value of y for population data. For example, for a given household, ε is the difference between what this household actually spent on food during the past month and what is predicted using the population regression line. The ε is also called the residual because it measures the surplus (positive or negative) of actual food expenditure over what is predicted by using the regression model.

If we estimate model 2 by using sample data, the difference between the actual y and the predicted y based on this estimation cannot be denoted by ε. The random error for the sample regression model is denoted by e. Thus, e is an estimator of ε. If we estimate model 2 using sample data, then the value of e is given by

e = actual food expenditure − predicted food expenditure = y − ŷ

That is, e is the vertical distance between the actual position of a household and the point on the regression line.

The value of an error is positive if the point that gives the actual food expenditure is above the regression line and negative if it is below the regression line. The sum of these errors is always zero. In other words, the sum of the actual food expenditures for the seven households included in the sample will be the same as the sum of the food expenditures predicted from the regression model.

$$\sum e = \sum (y - \hat{y}) = 0$$

To find the line that best fits the scatter of points, we cannot minimize the sum of errors, because positive and negative errors cancel each other. Instead, we minimize the error sum of squares, denoted by SSE, which is obtained by adding the squares of errors.

$$\text{SSE} = \sum e^2 = \sum (y - \hat{y})^2$$

The values of a and b that give the minimum SSE are called the least squares estimates of A and B, and they are

$$b = \frac{SS_{xy}}{SS_{xx}} \quad \text{and} \quad a = \bar{y} - b\bar{x}$$

where

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} \quad \text{and} \quad SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$$

The least squares regression line ŷ = a + bx is called the regression of y on x.

The equations above are for estimating a sample regression line. If we have access to a population data set, we can find the population regression line by using the same formulas with a little adaptation: we replace a by A, b by B, and n by N in all these formulas, and use the values of Σx, Σy, Σxy, and Σx² calculated for population data to make the required computations.
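The reason the sum of errors cannot be used as a fitting criterion can be checked numerically. The sketch below is only an illustration: it uses the seven income and food expenditure pairs from the worked example in this chapter, and the helper name `error_sums` and the alternative slopes 0.0, 0.15, and 0.40 are arbitrary choices made here, not part of the original notes. Every line that passes through the point of means has a zero sum of errors, yet only the least squares slope gives the smallest SSE.

```python
import numpy as np

# Seven (income, food expenditure) pairs in hundreds of dollars,
# taken from the worked example in this chapter.
x = np.array([35, 49, 21, 39, 15, 28, 25], dtype=float)
y = np.array([9, 15, 7, 11, 5, 8, 9], dtype=float)


def error_sums(slope):
    """Return (sum of errors, SSE) for the line with the given slope
    that passes through the point of means (x-bar, y-bar)."""
    intercept = y.mean() - slope * x.mean()
    e = y - (intercept + slope * x)
    return e.sum(), (e ** 2).sum()


# Least squares slope from b = SSxy / SSxx.
n = len(x)
ss_xy = (x * y).sum() - x.sum() * y.sum() / n
ss_xx = (x ** 2).sum() - x.sum() ** 2 / n
b = ss_xy / ss_xx

# Compare the least squares slope with a few arbitrary alternatives.
results = {slope: error_sums(slope) for slope in (0.0, 0.15, b, 0.40)}
for slope, (sum_e, sse) in results.items():
    print(f"slope={slope:.4f}  sum of errors={sum_e:+.6f}  SSE={sse:.4f}")
```

All four lines report a sum of errors of zero (up to floating-point rounding), so that criterion cannot separate them; the SSE column does.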

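Example: Find the least squares regression line for the data on incomes and food expenditures of the seven households given in the following table. Use income as the independent variable and food expenditure as the dependent variable.

Income x    Expenditure y    xy      x²
35          9                315     1225
49          15               735     2401
21          7                147     441
39          11               429     1521
15          5                75      225
28          8                224     784
25          9                225     625
Σx = 212    Σy = 64          Σxy = 2150    Σx² = 7222

$$\begin{aligned}
\bar{x} &= \frac{\sum x}{n} = \frac{212}{7} = 30.2857 \\
\bar{y} &= \frac{\sum y}{n} = \frac{64}{7} = 9.1429 \\
SS_{xy} &= \sum xy - \frac{(\sum x)(\sum y)}{n} = 2150 - \frac{(212)(64)}{7} = 211.7143 \\
SS_{xx} &= \sum x^2 - \frac{(\sum x)^2}{n} = 7222 - \frac{(212)^2}{7} = 801.4286 \\
b &= \frac{SS_{xy}}{SS_{xx}} = \frac{211.7143}{801.4286} = .2642 \\
a &= \bar{y} - b\bar{x} = 9.1429 - (.2642)(30.2857) = 1.1414 \\
\hat{y} &= 1.1414 + .2642x
\end{aligned}$$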
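The hand calculation above can be reproduced in a few lines. This is a minimal sketch that assumes NumPy is available; the variable names are chosen here for illustration, and `np.polyfit` is used only as an independent cross-check of the textbook formulas.

```python
import numpy as np

# Data from the table above, in hundreds of dollars.
income = np.array([35, 49, 21, 39, 15, 28, 25], dtype=float)   # x
expenditure = np.array([9, 15, 7, 11, 5, 8, 9], dtype=float)   # y

n = len(income)
sum_x, sum_y = income.sum(), expenditure.sum()
sum_xy = (income * expenditure).sum()
sum_x2 = (income ** 2).sum()

# Least squares estimates from SSxy and SSxx.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n
b = ss_xy / ss_xx
a = sum_y / n - b * sum_x / n

# Independent cross-check: a degree-1 polynomial fit returns [slope, intercept].
slope_np, intercept_np = np.polyfit(income, expenditure, deg=1)

print(f"sums: x={sum_x:.0f}, y={sum_y:.0f}, xy={sum_xy:.0f}, x^2={sum_x2:.0f}")
print(f"SSxy={ss_xy:.4f}, SSxx={ss_xx:.4f}")
print(f"b={b:.4f}, a={a:.4f}")
```

At full precision the intercept comes out as about 1.1422 rather than 1.1414. The difference arises because the hand calculation rounds b to .2642 before computing a; the slope agrees to four decimal places.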

Using this estimated regression model, we can find the predicted value of y for any specific value of x. Suppose we randomly select a household whose monthly income is $3500, so that x = 35. The predicted value of food expenditure for this household is

ŷ = 1.1414 + (.2642)(35) = 10.3884 hundred dollars = $1038.84

Based on our regression line, we predict that a household with a monthly income of $3500 is expected to spend $1038.84 per month on food. This value of ŷ can also be interpreted as a point estimator of the mean value of y for x = 35. We can state that, on average, all households with a monthly income of $3500 spend about $1038.84 per month on food.

But in our data on seven households, there is one household whose income is $3500. The actual food expenditure for that household is $900. The difference between the actual and predicted values gives the error of prediction. Thus, the error of prediction for this household is

e = y − ŷ = 9.00 − 10.3884 = −1.3884 hundred dollars = −$138.84

The negative error indicates that the predicted value of y is greater than the actual value of y.

Thus, if we use the regression model, this household's food expenditure is overestimated by $138.84.

Interpretation of a and b

How do we interpret a = 1.1414 and b = .2642 obtained in the previous example?

Interpretation of a

Consider a household with zero income. Using the estimated regression line, we get the predicted value of y for x = 0 as $114.14. We can state that a household with no income is expected to spend $114.14 per month on food. We can also state that the point estimate of the average monthly food expenditure for all households with zero income is $114.14.

We should be very careful when making this interpretation of a. In our sample of seven households, the incomes vary from a minimum of $1500 to a maximum of $4900. Hence, our regression line is valid only for the values of x between 15 and 49. If we predict y for a value of x outside this range, the prediction usually will not hold true.

Interpretation of b

The value of b in a regression model gives the change in y due to a change of one unit in x. By using the regression equation obtained in the example, we see:

When x = 30, ŷ = 1.1414 + .2642(30) = 9.0674
When x = 31, ŷ = 1.1414 + .2642(31) = 9.3316

Hence, when x increased by one unit, from 30 to 31, ŷ increased by 9.3316 − 9.0674 = .2642, which is the value of b. Because our unit of measurement is hundreds of dollars, we can state that, on average, a $100 increase in income will result in a $26.42 increase in food expenditure. We can also state that, on average, a $1 increase in the income of a household will increase food expenditure by $.2642.
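Both points above, the limited range over which the line is valid and the meaning of b as the change in ŷ per unit of x, can be made concrete in code. The following is a small sketch, not part of the original notes: the function name `predict_food_expenditure` and the choice to emit a warning (rather than raise an error) outside the sampled range are assumptions made for illustration.

```python
import warnings

# Rounded estimates from the worked example (units: hundreds of dollars).
A_HAT = 1.1414
B_HAT = 0.2642
X_MIN, X_MAX = 15, 49  # smallest and largest income in the sample


def predict_food_expenditure(income):
    """Predicted monthly food expenditure (hundreds of dollars) for a
    given monthly income (hundreds of dollars)."""
    if not X_MIN <= income <= X_MAX:
        # Hypothetical policy: warn instead of refusing to predict.
        warnings.warn(
            f"income={income} is outside the sampled range "
            f"[{X_MIN}, {X_MAX}]; the prediction is an extrapolation",
            stacklevel=2,
        )
    return A_HAT + B_HAT * income


# A one-unit increase in x changes the prediction by exactly b.
change = predict_food_expenditure(31) - predict_food_expenditure(30)
print(f"change in predicted y for a one-unit rise in x: {change:.4f}")
```

Calling `predict_food_expenditure(0)` still returns the intercept, but the warning flags that a zero-income household lies outside the data used to fit the line.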

When b is positive, an increase in x will lead to an increase in y and a decrease in x will lead to a decrease in y. In other words, the movements in x and y are in the same direction. Such a relationship between x and y is called a positive linear relationship.

When b is negative, an increase in x will lead to a decrease in y and a decrease in x will cause an increase in y. The changes in x and y in this case are in opposite directions. Such a relationship between x and y is called a negative linear relationship.

Assumptions of the Regression Model

Like any other theory, linear regression analysis is based on certain assumptions. Consider the population regression model

$$y = A + Bx + \varepsilon \tag{4}$$

Four assumptions are made about this model. These assumptions are made about the population regression model and not about the sample regression model.

Assumption 1: The random error term ε has a mean equal to zero for each x.
Assumption 2: The errors associated with different observations are independent.
Assumption 3: For any given x, the distribution of errors is normal.
Assumption 4: The distribution of population errors for each x has the same (constant) standard deviation, which is denoted by σ.
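One way to see what these assumptions describe is to simulate a population model that satisfies all four and then estimate it from a sample. The sketch below is illustrative only: the "true" values A = 1.0, B = 0.25, and σ = 1.0, the sample size, and the uniform distribution of x over the range 15 to 49 are all hypothetical choices made here, not values from the notes.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical population parameters (assumptions for this illustration).
A_TRUE, B_TRUE, SIGMA = 1.0, 0.25, 1.0
n = 10_000

x = rng.uniform(15, 49, size=n)

# Errors drawn independently (Assumption 2) from a normal distribution
# (Assumption 3) with mean zero (Assumption 1) and the same standard
# deviation for every x (Assumption 4).
eps = rng.normal(loc=0.0, scale=SIGMA, size=n)
y = A_TRUE + B_TRUE * x + eps

# Least squares estimates from the sample.
ss_xy = np.sum(x * y) - x.sum() * y.sum() / n
ss_xx = np.sum(x ** 2) - x.sum() ** 2 / n
b = ss_xy / ss_xx
a = y.mean() - b * x.mean()

# Sample estimate of sigma from the residuals (n - 2 degrees of freedom).
residual_sd = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))

print(f"a={a:.3f} (true {A_TRUE}), b={b:.4f} (true {B_TRUE}), "
      f"residual sd={residual_sd:.3f} (true {SIGMA})")
```

With a sample this large, the estimates a and b land close to the population values A and B, which is the sense in which sample data are used to estimate the population regression line.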

What is econometrics?
- Econometrics = use of statistical methods to analyze economic data
- Econometricians typically analyze nonexperimental data

Typical goals of econometric analysis
- Estimating relationships between economic variables
- Testing economic theories and hypotheses
- Forecasting economic variables
- Evaluating and implementing government and business policy

Steps in econometric analysis
1) Economic model (this step is often skipped)
2) Econometric model

Economic models
- May be micro- or macro-models
- Often use optimizing behaviour, equilibrium modeling, ...
- Establish relationships between economic variables
- Examples: demand equations, pricing equations, ...

Economic model of crime (Becker (1968))
- Derives an equation for criminal activity based on utility maximization
- Dependent variable: hours spent in criminal activities
- Explanatory variables: wage of criminal activities, wage for legal employment, other income, probability of getting caught, probability of conviction if caught, expected sentence, age
- Functional form of the relationship is not specified
- The equation could have been postulated without economic modeling

Model of job training and worker productivity
- What is the effect of additional training on worker productivity?
- Formal economic theory is not really needed to derive the equation: hourly wage is a function of years of formal education, years of workforce experience, and weeks spent in job training
- Other factors may be relevant, but these are the most important (?)

Econometric model of criminal activity
- The functional form has to be specified
- Variables may have to be approximated by other quantities
- Dependent variable: a measure of criminal activity
- Explanatory variables: wage for legal employment, other income, frequency of prior arrests, frequency of conviction, average sentence length after conviction, age
- Error term: unobserved determinants of criminal activity, e.g. moral character, wage in criminal activity, family background

Econometric model of job training and worker productivity
- Dependent variable: hourly wage
- Explanatory variables: years of formal education, years of workforce experience, weeks spent in job training
- Error term: unobserved determinants of the wage, e.g. innate ability, quality of education, family background
- Most of econometrics deals with the specification of the error
- Econometric models may be used for hypothesis testing
- For example, the parameter on weeks spent in job training represents the effect of training on the wage
- How large is this effect? Is it different from zero?

Econometric analysis requires data
- Different kinds of economic data sets: cross-sectional data, time series data, pooled cross sections, panel/longitudinal data
- Econometric methods depend on the nature of the data used
- Use of inappropriate methods may lead to misleading results

Cross-sectional data sets
- Sample of individuals, households, firms, cities, states, countries, or other units of interest at a given point of time or in a given period
- Cross-sectional observations are more or less independent
- For example, pure random sampling from a population
- Sometimes pure random sampling is violated, e.g. when units refuse to respond in surveys, or when sampling is characterized by clustering
- Cross-sectional data are typically encountered in applied microeconomics

[Table: cross-sectional data set on wages and other characteristics. Callouts: observation number, hourly wage, indicator variables (1 = yes, 0 = no)]

[Table: cross-sectional data on growth rates and country characteristics. Callouts: growth rate of real per capita GDP, government consumption as a percentage of GDP, adult secondary education rates]

Time series data
- Observations of a variable or several variables over time
- For example, stock prices, money supply, consumer price index, gross domestic product, annual homicide rates, automobile sales, ...
- Time series observations are typically serially correlated
- Ordering of observations conveys important information
- Data frequency: daily, weekly, monthly, quarterly, annually, ...
- Typical features of time series: trends and seasonality
- Typical applications: applied macroeconomics and finance

[Table: time series data on minimum wages and related variables. Callouts: average minimum wage for the given year, average coverage rate, unemployment rate, gross national product]

Pooled cross sections
- Two or more cross sections are combined in one data set
- Cross sections are drawn independently of each other
- Pooled cross sections are often used to evaluate policy changes
- Example: Evaluate the effect of a change in property taxes on house prices
  - Random sample of house prices for the year 1993
  - A new random sample of house prices for the year 1995
  - Compare before/after (1993: before reform, 1995: after reform)

[Table: pooled cross sections on housing prices. Callouts: property tax, size of house in square feet, number of bathrooms, before reform, after reform]

Panel or longitudinal data
- The same cross-sectional units are followed over time
- Panel data have a cross-sectional and a time series dimension
- Panel data can be used to account for time-invariant unobservables
- Panel data can be used to model lagged responses
- Example: City crime statistics; each city is observed in two years
  - Time-invariant unobserved city characteristics may be modeled
  - Effect of police on crime rates may exhibit a time lag
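How a two-year panel can account for time-invariant unobservables can be sketched with simulated data. Note that the technique used here, first differencing, is one standard approach chosen for this illustration; the notes themselves do not name a method. All numbers are hypothetical: the effect of police on crime (-0.5), the strength of the unobserved city effect, and the number of cities are assumptions, and only the years 1986 and 1990 echo the city crime example.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

n_cities = 5_000
BETA = -0.5  # hypothetical true effect of police on crime

# Unobserved city characteristic that does not change over time and
# raises both the crime level and the number of police.
city_effect = rng.normal(size=n_cities)

police = {yr: 5.0 + city_effect + rng.normal(size=n_cities)
          for yr in (1986, 1990)}
crime = {yr: 10.0 + BETA * police[yr] + 3.0 * city_effect
             + rng.normal(size=n_cities)
         for yr in (1986, 1990)}


def ols_slope(x, y):
    """Simple regression slope b = SSxy / SSxx (deviation form)."""
    xd = x - x.mean()
    return np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)


# Pooling both years ignores the city effect and gives a biased slope.
pooled = ols_slope(np.concatenate([police[1986], police[1990]]),
                   np.concatenate([crime[1986], crime[1990]]))

# Differencing the two years removes anything constant within a city.
first_diff = ols_slope(police[1990] - police[1986],
                       crime[1990] - crime[1986])

print(f"pooled slope: {pooled:.3f}")
print(f"first-difference slope: {first_diff:.3f} (true effect {BETA})")
```

In this construction the pooled regression suggests that more police go with more crime, because the city effect drives both, while the differenced regression recovers a value close to the assumed effect.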

[Table: two-year panel data on city crime statistics. Callouts: each city has two time series observations, number of police in 1986, number of police in 1990]

Causality and the notion of ceteris paribus
- Definition of the causal effect of x on y: "How does variable y change if variable x is changed but all other relevant factors are held constant?"
- Most economic questions are ceteris paribus questions
- It is important to define which causal effect one is interested in
- It is useful to describe how an experiment would have to be designed to infer the causal effect in question

Causal effect of fertilizer on crop yield
- "By how much will the production of soybeans increase if one increases the amount of fertilizer applied to the ground?"
- Implicit assumption: all other factors that influence crop yield, such as quality of land, rainfall, presence of parasites etc., are held fixed
- Experiment: Choose several one-acre plots of land; randomly assign different amounts of fertilizer to the different plots; compare yields
- The experiment works because the amount of fertilizer applied is unrelated to other factors influencing crop yields

Measuring the return to education
- "If a person is chosen from the population and given another year of education, by how much will his or her wage increase?"
- Implicit assumption: all other factors that influence wages, such as experience, family background, intelligence etc., are held fixed
- Experiment: Choose a group of people; randomly assign different amounts of education to them (infeasible!); compare wage outcomes
- Problem without random assignment: the amount of education is related to other factors that influence wages (e.g. intelligence)
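The contrast between random assignment and self-selected treatment in the two examples above can be illustrated with a small simulation of the fertilizer case. Everything in this sketch is hypothetical: the true effect of 2.0 per unit of fertilizer, the role of an unobserved land quality variable, and the rule by which farmers apply more fertilizer to better land are assumptions made only to show the mechanism.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

n_plots = 20_000
TRUE_EFFECT = 2.0  # hypothetical yield gain per unit of fertilizer

# Unobserved factor that also influences crop yield.
land_quality = rng.normal(size=n_plots)


def crop_yield(fertilizer):
    """Yield depends on fertilizer, land quality, and random noise."""
    return (50.0 + TRUE_EFFECT * fertilizer + 5.0 * land_quality
            + rng.normal(size=n_plots))


def ols_slope(x, y):
    """Simple regression slope b = SSxy / SSxx (deviation form)."""
    xd = x - x.mean()
    return np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)


# Experiment: fertilizer assigned at random, unrelated to land quality.
fert_random = rng.uniform(0, 10, size=n_plots)

# No experiment: more fertilizer is applied on better land.
fert_chosen = 5.0 + 2.0 * land_quality + rng.normal(size=n_plots)

slope_random = ols_slope(fert_random, crop_yield(fert_random))
slope_chosen = ols_slope(fert_chosen, crop_yield(fert_chosen))

print(f"random assignment: {slope_random:.3f} (true effect {TRUE_EFFECT})")
print(f"self-selected:     {slope_chosen:.3f}")
```

Under random assignment the regression slope stays close to the assumed effect. When fertilizer use is tied to land quality, the slope also picks up the effect of land quality and overstates the return to fertilizer, which mirrors the education and intelligence problem described above.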

Effect of law enforcement on city crime level
- "If a city is randomly chosen and given ten additional police officers, by how much would its crime rate fall?"
- Alternatively: "If two cities are the same in all respects, except that city A has ten more police officers, by how much would the two cities' crime rates differ?"
- Experiment: Randomly assign the number of police officers to a large number of cities
- In reality, the number of police officers will be determined by the crime rate (simultaneous determination of crime and number of police)

Effect of the minimum wage on unemployment
- "By how much (if at all) will unemployment increase if the minimum wage is increased by a certain amount (holding other things fixed)?"
- Experiment: Government randomly chooses the minimum wage each year and observes unemployment outcomes
- The experiment will work because the level of the minimum wage is unrelated to other factors determining unemployment
- In reality, the level of the minimum wage will depend on political and economic factors that also influence unemployment

Testing predictions of economic theories
- Economic theories are not always stated in terms of causal effects
- For example, the expectations hypothesis states that long-term interest rates equal compounded expected short-term interest rates
- An implication is that the interest rate of a three-month T-bill should be equal to the expected interest rate for the first three months of a six-month T-bill; this can be tested using econometric methods