Your Name (Please print)

Did you agree to take the optional portion of the final exam? Yes No (Your online answer will be used to verify your response.)

Directions

There are two parts to the final exam. Everyone must take the first part of the exam, which includes 20 questions. Students who elected to take the optional portion of the final exam are required to complete the second part. For those who elected to take the optional portion, your second midterm score will be recalculated as the average of your original score and your score on the optional portion of the exam. All questions are worth 4 points each unless indicated otherwise. Place all answers in the space provided below or within each question. Round all numerical answers to the nearest hundredth (e.g., 1.23) unless told otherwise. The formula sheet and the tables of the standard normal CDF and the F and chi-squared distributions will be provided as a separate document.

Using data from the 2015 Current Population Survey, I estimated a regression of workers' hourly wages as a function of their age, years of schooling, and a dummy variable indicating whether the person is female.

. reg hrwage age age2 _school female

      Source |       SS           df       MS      Number of obs   =    97,330
-------------+----------------------------------   F(4, 97325)     =   6357.59
       Model |  1897494.59         4  474373.646   Prob > F        =    0.0000
    Residual |  7261940.53    97,325  74.6153663   R-squared       =    0.2072
-------------+----------------------------------   Adj R-squared   =    0.2071
       Total |  9159435.11    97,329  94.1079751   Root MSE        =     8.638

------------------------------------------------------------------------------
      hrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .7363723   .0105456    69.83   0.000     .7157031    .7570416
        age2 |  -.0069645   .0001226   -56.83   0.000    -.0072047   -.0067243
     _school |   1.336863   .0119937   111.46   0.000     1.313355     1.36037
      female |  -2.929198   .0557887   -52.51   0.000    -3.038544   -2.819853
       _cons |  -16.33226   .2480879   -65.83   0.000    -16.81851   -15.84601
------------------------------------------------------------------------------

. predict uhat, res
. gen uhat2=uhat^2
. reg uhat2 age age2 _school female

      Source |       SS           df       MS      Number of obs   =    97,330
-------------+----------------------------------   F(4, 97325)     =   1052.66
       Model |   275992027         4  68998006.8   Prob > F        =    0.0000
    Residual |  6.3793e+09    97,325  65546.5201   R-squared       =    0.0415
-------------+----------------------------------   Adj R-squared   =    0.0414
       Total |  6.6553e+09    97,329   68379.487   Root MSE        =    256.02

------------------------------------------------------------------------------
       uhat2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   4.307184   .3125586    13.78   0.000     3.694572    4.919795
        age2 |  -.0314069   .0036323    -8.65   0.000    -.0385261   -.0242877
     _school |   18.99781   .3554801    53.44   0.000     18.30108    19.69455
      female |  -28.98642   1.653511   -17.53   0.000    -32.22729   -25.74556
       _cons |  -274.5837   7.353029   -37.34   0.000    -288.9955   -260.1718
------------------------------------------------------------------------------

1. Based on the above regressions, the value of the chi-squared test statistic for the null hypothesis that the residuals in the wage equation are homoskedastic is 4039.195 (n × R² = 97,330 × 0.0415), and the test statistic has a chi-squared distribution with 4 degrees of freedom.

2. The null hypothesis that the residuals in the wage equation are homoskedastic is rejected at the .05 level if the chi-squared statistic is (greater, less) than 9.49.

3. (6 points) If you wanted to perform the simple form of the White test for heteroskedasticity for the above wage equation,

a. What regression command would you execute in Stata? (Use the variable names defined above so that there is no ambiguity in your answer.)

reg uhat2 yhat yhat2

where yhat and yhat2 are the predicted hourly wage and its square from the hrwage regression (created, for example, with predict yhat, xb and gen yhat2=yhat^2 immediately after reg hrwage age age2 _school female).

b. What test statement would you issue after the above regression?

test yhat=yhat2=0

What would be the distribution of this test statistic (e.g., F-distribution, chi-squared, normal? How many degrees of freedom)?

F(2, 97327): there are 2 restrictions, and the auxiliary regression has 97,330 - 3 = 97,327 residual degrees of freedom.

4. Based on the above regressions, what age maximizes the variance of the residuals (round your answer to the nearest year of age)?

Setting the derivative of 4.307(age) - .0314(age²) to zero gives age = 4.307/(2 × .0314) = 68.58, or about 69 years.

5. If the null hypothesis of homoskedasticity is rejected,
a. The standard OLS estimates of the coefficients remain unbiased
b. The standard OLS estimates of the standard errors for the coefficients are incorrect
c. The standard OLS estimates of the coefficients are no longer efficient
d. All of the above
e. Only b and c

6. Suppose you have data on total employment by county and you estimate the following regression:

$emp_i = \alpha_0 + \alpha_1 pop_i + \alpha_2 (pop_i \times age_i) + e_i$

where the subscript i indexes the county, emp represents total employment in the county, pop represents the total population, and age is the average age of people living in the county. The residuals (e) are likely to be
a. Heteroskedastic, and their variance will be greater in counties with larger populations.
b. Homoskedastic, and their variance will be greater in counties with larger populations.
c. Heteroskedastic, and their variance will be greater in counties with smaller populations.
d. Homoskedastic, and their variance will be greater in counties with smaller populations.

7. If the fraction of the population that works is lower among older workers, but employment always increases with population, we would expect
a. $\alpha_1 + \alpha_2 > 0$
b. $\alpha_2 < 0$
c. Both of the above
d. None of the above
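The arithmetic behind questions 1 through 4 can be re-derived from the printed output alone. The short Python sketch below uses Python only as a calculator (the exam itself works in Stata): it plugs in the reported n, the auxiliary R-squared, and the auxiliary age coefficients. The 9.49 critical value is taken from question 2 rather than computed.

```python
# Minimal sketch: re-derive the answers to questions 1-4 from the numbers
# printed in the Stata output above (no raw data needed).

N_OBS = 97_330            # Number of obs in both regressions
R2_AUX = 0.0415           # R-squared from reg uhat2 age age2 _school female
CHI2_CRIT_4DF_05 = 9.49   # 5% critical value of chi-squared(4), as given in question 2

# Question 1: LM statistic in its n * R^2 form
lm_stat = N_OBS * R2_AUX

# Question 2: reject homoskedasticity if the statistic exceeds the critical value
reject = lm_stat > CHI2_CRIT_4DF_05

# Question 3b: the simple White test regresses uhat2 on a constant, yhat and yhat2,
# so the F-test of yhat = yhat2 = 0 has 2 numerator df and n - 3 denominator df
white_df = (2, N_OBS - 3)

# Question 4: the fitted variance b1*age + b2*age^2 peaks at age = -b1 / (2*b2)
B_AGE = 4.307184
B_AGE2 = -0.0314069
age_star = -B_AGE / (2 * B_AGE2)

print(f"LM statistic = {lm_stat:.3f}, reject H0: {reject}")
print(f"White test F df = {white_df}")
print(f"variance-maximizing age = {age_star:.2f}")
```

With the full-precision coefficients the turning point is about 68.57; the answer key's 68.58 comes from the rounded values 4.307 and .0314. Both round to 69 years.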

To answer the next several questions, consider the following data drawn from an article by Li, Yi, and Zhang (2011).¹ The article studied whether the introduction of China's one-child policy caused the male-female sex ratio in China to increase. The one-child policy started in 1980 and applied to the Han Chinese but not to minorities. The table below gives the percentage of newborns that are male for the Han and for minorities. The pre-policy period is 1973-1979; the post-policy period (i.e., after the one-child policy was implemented) is 1980-1990.

Birth cohort    Han (% male)    Minorities (% male)    Sample size
1973-1979       51.53           51.49                  1,521,563
1980-1990       52.34           51.24                  2,334,926

8. (6 points) What is the diff-in-diff estimate of the effect of the one-child policy on the fraction of births that are male? Be sure to indicate whether the one-child policy increased or decreased the fraction of births that are male.

(52.34 - 51.53) - (51.24 - 51.49) = 0.81 - (-0.25) = 1.06, i.e., the one-child policy increased the percentage of births that are male by 1.06 percentage points.

9. Some commentators suggested that an outbreak of hepatitis B in the 1980s may have caused the percentage of male births in China to increase.

a. (4 points) Assuming that hepatitis B does increase the probability of male births, under what conditions would the outbreak of hepatitis B cause NO bias in the diff-in-diff estimate of the impact of the one-child policy on male births?

No bias would be created if hepatitis B caused the same increase in the probability of male births for the Han and for minorities, since such a common effect is removed by the diff-in-diff estimator.

b. (4 points) Assuming that hepatitis B does increase the probability of male births, under what conditions would the outbreak of hepatitis B cause an UPWARD bias in the diff-in-diff estimate of the impact of the one-child policy on male births?

It would cause an upward bias in the diff-in-diff estimate if hepatitis B caused a larger increase in the probability of male births among the Han (the treated group).

¹ Li, Hongbin, Junjian Yi, and Junsen Zhang. "Estimating the effect of the one-child policy on the sex ratio imbalance in China: identification based on the difference-in-differences." Demography 48, no. 4 (2011): 1535-1557.
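A minimal sketch of the diff-in-diff arithmetic in questions 8 and 9. The shares come from the table above and are treated as what would be observed without any outbreak; the hepatitis B "shocks" are hypothetical amounts added to each group's post-1980 share. The sketch shows that the estimate moves by exactly the difference between the two groups' shocks.

```python
# Percent of births that are male, from the table above
HAN_PRE, HAN_POST = 51.53, 52.34   # treated group (Han)
MIN_PRE, MIN_POST = 51.49, 51.24   # control group (minorities)


def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change for the treated group minus change for the control group."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)


def did_with_shock(shock_han, shock_min):
    """DiD if a hypothetical post-1980 shock (e.g. hepatitis B) had added
    shock_han and shock_min percentage points to each group's post-period share."""
    return diff_in_diff(HAN_PRE, HAN_POST + shock_han, MIN_PRE, MIN_POST + shock_min)


did = diff_in_diff(HAN_PRE, HAN_POST, MIN_PRE, MIN_POST)
equal_shock = did_with_shock(0.3, 0.3)      # same effect for both groups: no bias
bigger_for_han = did_with_shock(0.5, 0.1)   # larger effect for the Han: upward bias

print(f"DiD estimate: {did:.2f} percentage points")
print(f"equal shock: {equal_shock:.2f}, larger shock for Han: {bigger_for_han:.2f}")
```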

10. (6 points) The authors used a linear probability model to estimate their diff-in-diff equation. The dependent variable was a dummy that equals one if a birth was male and 0 if it was female. They had data on nearly 4 million births spanning 1973-1990. Label the dummy indicating a male child as $M_i$. Write out the regression model that you would estimate to obtain the diff-in-diff estimate. Be sure to define any variables included in your regression and to point out which coefficient is the estimate of the diff-in-diff effect of the one-child policy on the probability that a child is male.

$M_i = \alpha_0 + \alpha_1 After_i + \alpha_2 Han_i + \alpha_3 (Han_i \times After_i) + u_i$

where $After_i$ is a dummy that equals one for any birth in 1980 or later (the post-policy period), and $Han_i$ is a dummy that equals one for any Han birth. $\alpha_3$ is the diff-in-diff estimate of the effect of the one-child policy on the probability that the child is male.

11. (6 points) If you estimate the above linear probability model, why shouldn't you use the standard OLS estimates of the standard errors? What should you do instead?

You shouldn't use the standard OLS standard errors because the LPM has heteroskedasticity built into the error term: the variance of the error is $p_i(1-p_i)$, where $p_i$ is the probability of a yes (i.e., a male child), and $p_i$ varies with the regressors. The model should instead be estimated with heteroskedasticity-robust standard errors or by weighted least squares. WLS should use weights of $1/[\hat{p}_i(1-\hat{p}_i)]$, which requires all of the predicted probabilities to lie strictly between 0 and 1.
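The sketch below fits the regression from question 10 by OLS on a hypothetical birth-level data set. The cell sizes (10,000 births per group and period) are an assumption made for illustration; only the male shares come from the table. Because the model is saturated, the coefficient on the interaction reproduces the diff-in-diff of cell means (0.0106, i.e. 1.06 percentage points). The sketch also computes conventional and HC1 robust standard errors (HC1 is the small-sample scaling used by Stata's reg ..., robust) and the WLS estimate from question 11.

```python
import numpy as np

# Hypothetical data: 10,000 births per (Han, After) cell (an assumption);
# male shares match the table, expressed as proportions.
CELL_N = 10_000
male_share = {  # (han, after) -> share of births that are male
    (1, 0): 0.5153, (1, 1): 0.5234,
    (0, 0): 0.5149, (0, 1): 0.5124,
}

rows = []
for (h, t), share in male_share.items():
    n_male = round(share * CELL_N)
    m = np.r_[np.ones(n_male), np.zeros(CELL_N - n_male)]
    rows.append(np.column_stack([m, np.full(CELL_N, h), np.full(CELL_N, t)]))
data = np.vstack(rows)
M, han, after = data[:, 0], data[:, 1], data[:, 2]

# Regressors: constant, After, Han, Han x After
X = np.column_stack([np.ones_like(M), after, han, han * after])
n, k = X.shape

# OLS coefficients; alpha[3] is the diff-in-diff estimate
XtX_inv = np.linalg.inv(X.T @ X)
alpha = XtX_inv @ X.T @ M
resid = M - X @ alpha

# Conventional OLS standard errors (valid only under homoskedasticity)
sigma2 = resid @ resid / (n - k)
se_ols = np.sqrt(np.diag(sigma2 * XtX_inv))

# Heteroskedasticity-robust sandwich standard errors with HC1 scaling
meat = X.T @ (X * resid[:, None] ** 2)
se_hc1 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv) * n / (n - k))

# WLS with weights 1 / [p_hat (1 - p_hat)]; requires 0 < p_hat < 1
p_hat = X @ alpha
assert np.all((p_hat > 0) & (p_hat < 1))
w = 1.0 / (p_hat * (1.0 - p_hat))
Xw = X * np.sqrt(w)[:, None]
Mw = M * np.sqrt(w)
alpha_wls = np.linalg.lstsq(Xw, Mw, rcond=None)[0]

print("alpha3 (OLS):", round(alpha[3], 4))
print("SE of alpha3, OLS vs HC1:", se_ols[3], se_hc1[3])
print("alpha3 (WLS):", round(alpha_wls[3], 4))
```

Because every cell's share is close to one half, $p_i(1-p_i)$ barely differs across cells here, so the two sets of standard errors come out similar. Only the robust ones remain valid when the variances do differ.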

You must answer 5 of the last 6 questions. Each question is worth 6 points. Write SKIPPED on the one question you do not answer. If you answer all 6, I will grade the first 5.

A financial services company is interested in understanding whether a financial education seminar increases the amount that people save. To address this question, they collect data on the percentage of workers' earnings that they contribute to their 401k pension plan, along with information about the workers. The regression they estimate is:

(1) $S_i = \beta_0 + \beta_1 sem_i + \beta_2 age_i + \beta_3 fem_i + \beta_4 educ_i + e_i$

where $S_i$ is the percentage of the worker's income that is saved (i.e., put into the 401k plan), sem is a dummy variable indicating whether the worker attended the seminar, age is the worker's age, fem is a dummy variable indicating whether the person is female, and educ is the worker's number of years of formal schooling. In considering the questions below, assume that each worker in the sample has the opportunity to attend the seminar, but the choice to attend is voluntary.

12. (6 points) Explain why a simple OLS regression would likely suffer from an endogeneity problem that would cause a biased estimate of $\beta_1$. Be sure to discuss whether you believe the bias will be positive or negative and why.

Since people decide for themselves whether to attend the seminar, it is likely that $\text{cov}(sem_i, e_i) \neq 0$. For example, if people with unobserved characteristics that cause them to save more are also more likely to attend the seminar, then $\text{cov}(sem_i, e_i) > 0$ and the estimated effect of the seminar on saving ($\beta_1$) will be biased upward.

13. (6 points) Suppose that the financial planning seminars were held on weekends and workers would have to travel from home to attend. If you have data for each worker on the distance they would have to travel ($D_i$), would this be an appropriate instrument for attendance at the seminar? Be sure that you precisely define the necessary conditions for distance to be an appropriate instrument and whether these properties are likely to be satisfied in this case.

The two necessary conditions are:
(i) relevance: $\text{cov}(D_i, sem_i) \neq 0$, and
(ii) exogeneity: $\text{cov}(D_i, e_i) = 0$.
We expect (i) to hold because those who live farther away will be less likely to attend the seminar. We expect (ii) to hold unless there is some reason to believe that workers' choice of how far from the seminar location to live is systematically related to their saving preferences.
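A small simulation can make questions 12 and 13 concrete, and it carries the example through the two-stage procedure of question 14. Everything in the data-generating process is an assumption chosen for illustration: an unobserved saving preference raises both saving and the chance of attending, while distance lowers attendance but is independent of the preference. Under those assumptions OLS overstates $\beta_1$, and 2SLS with distance as the instrument recovers it.

```python
import numpy as np

# Hypothetical data-generating process (all numbers are assumptions)
rng = np.random.default_rng(0)
n = 200_000
TRUE_BETA1 = 1.0

pref = rng.normal(size=n)                 # unobserved saving preference, part of e_i
dist = rng.uniform(0, 50, size=n)         # D_i, distance to the seminar
age = rng.uniform(25, 65, size=n)
fem = rng.integers(0, 2, size=n)
educ = rng.integers(10, 21, size=n)

# Attendance: more likely with a high preference, less likely with distance
latent = 0.8 * pref - 0.05 * dist + rng.normal(size=n)
sem = (latent > -1.0).astype(float)

e = 2.0 * pref + rng.normal(size=n)       # the error contains the preference
S = 2.0 + TRUE_BETA1 * sem + 0.05 * age + 0.5 * fem + 0.2 * educ + e


def ols(y, *cols):
    """OLS with an intercept; returns [const, coef1, coef2, ...]."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Question 12: OLS is biased upward because cov(sem, e) > 0
b_ols = ols(S, sem, age, fem, educ)

# Question 13: check the two IV conditions in the simulated sample
relevance = np.corrcoef(dist, sem)[0, 1]   # clearly nonzero (negative)
exogeneity = np.corrcoef(dist, e)[0, 1]    # close to zero

# Question 14: two-stage least squares done by hand
pi = ols(sem, age, fem, educ, dist)                       # first stage
sem_hat = pi[0] + pi[1] * age + pi[2] * fem + pi[3] * educ + pi[4] * dist
b_2sls = ols(S, sem_hat, age, fem, educ)                  # second stage

print(f"OLS beta1 = {b_ols[1]:.3f}, 2SLS beta1 = {b_2sls[1]:.3f} (true {TRUE_BETA1})")
print(f"corr(D, sem) = {relevance:.3f}, corr(D, e) = {exogeneity:.3f}")
```

Running the second stage by hand gives the right point estimate but the wrong standard errors. In Stata, ivregress 2sls S age fem educ (sem = D) estimates both stages together and reports valid standard errors (variable names as defined in the question).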

14. (6 points) Assuming distance is an appropriate instrument for seminar attendance, describe the 2 steps of the two-stage least squares process that you would use to estimate the regression in (1). Use the variable names already defined in describing the two-stage process.

Step 1: Estimate the following reduced-form (first-stage) regression for seminar attendance:
$sem_i = \pi_0 + \pi_1 age_i + \pi_2 fem_i + \pi_3 educ_i + \pi_4 D_i + v_i$
Step 2: Estimate the original structural model in (1) after replacing the endogenous variable with its predicted value from Step 1:
$S_i = \beta_0 + \beta_1 \widehat{sem}_i + \beta_2 age_i + \beta_3 fem_i + \beta_4 educ_i + e_i$

(6 points) Suppose that you have panel data on worker saving that includes years before and after the financial planning seminars are offered. Explain how the use of panel data and a worker-specific fixed effects model could eliminate the endogeneity bias in OLS estimation of the savings equation in (1).

Reconsider the original regression in (1) as the panel model below, where the subscript i indexes the person and t indexes the time period:
(1a) $S_{it} = \beta_0 + \beta_1 sem_{it} + \beta_2 age_{it} + \beta_3 fem_{it} + \beta_4 educ_{it} + a_i + e_{it}$
Suppose that the unobserved saving preferences of an individual can be captured in an individual fixed effect ($a_i$) that does not change over time. Panel data allow us to difference out this effect and eliminate any endogeneity bias caused by $\text{cov}(sem_{it}, a_i) \neq 0$. That is, with panel data that include fixed effects for each individual, we can estimate (1a) as
(1b) $S^*_{it} = \beta_1 sem^*_{it} + \beta_2 age^*_{it} + \beta_4 educ^*_{it} + e^*_{it}$
where the * superscript indicates that the variable is measured as a deviation from its i-specific (person-specific) mean over the panel. Taking deviations from means causes $a_i$ (along with the constant $\beta_0$ and the time-invariant fem) to be differenced out of the regression, so any bias caused by $\text{cov}(sem_{it}, a_i) \neq 0$ in OLS is eliminated.
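The within transformation can be sketched on a hypothetical panel. The data-generating process is an assumption: a fixed preference $a_i$ raises both saving and attendance, sem_it indicates attendance in period t, and seminars are offered only in the last two of four periods. educ and fem are left out to keep the example short. Pooled OLS leaves $a_i$ in the error and is biased upward; demeaning by worker removes $a_i$ exactly, and the within estimate should land close to the assumed $\beta_1 = 1$.

```python
import numpy as np

# Hypothetical panel (all numbers are assumptions)
rng = np.random.default_rng(1)
N_WORKERS, T = 5_000, 4
TRUE_BETA1 = 1.0

worker = np.repeat(np.arange(N_WORKERS), T)     # person index i
period = np.tile(np.arange(T), N_WORKERS)       # time index t
a = np.repeat(rng.normal(size=N_WORKERS), T)    # fixed effect a_i
age = np.repeat(rng.uniform(25, 60, size=N_WORKERS), T) + period

# Seminars offered only in periods 2 and 3; high-a_i workers attend more often
sem = ((period >= 2) & (a + rng.normal(size=N_WORKERS * T) > 0)).astype(float)
S = 2.0 + TRUE_BETA1 * sem + 0.05 * age + 2.0 * a + rng.normal(size=N_WORKERS * T)


def demean_by(x, groups):
    """Within transformation: subtract each group's own mean from x."""
    sums = np.bincount(groups, weights=x)
    counts = np.bincount(groups)
    return x - (sums / counts)[groups]


# Pooled OLS ignores a_i, which is correlated with sem
X_pool = np.column_stack([np.ones_like(S), sem, age])
b_pool = np.linalg.lstsq(X_pool, S, rcond=None)[0]

# Fixed effects: demean every variable by worker (the constant drops out too)
S_star = demean_by(S, worker)
X_fe = np.column_stack([demean_by(sem, worker), demean_by(age, worker)])
b_fe = np.linalg.lstsq(X_fe, S_star, rcond=None)[0]

# The fixed effect itself is differenced out exactly
a_star = demean_by(a, worker)

print(f"pooled OLS beta1 = {b_pool[1]:.3f}, FE beta1 = {b_fe[0]:.3f}")
```

In Stata, declaring the panel with xtset and then running xtreg with the fe option performs the same within transformation.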

15. If you estimate the savings equation in (1) with panel data and fixed effects for each worker, can you still include the female dummy variable in your regression? Why or why not?

No. The female dummy can no longer be included because the deviation of a time-invariant variable from its individual-specific mean is always zero, so fem is perfectly collinear with the worker fixed effects and its coefficient cannot be estimated.
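The toy example below illustrates question 15 with made-up values for three workers observed over three periods. After demeaning by worker, a time-invariant dummy such as fem becomes a column of zeros. The design matrix therefore loses rank, and the fem coefficient cannot be estimated.

```python
import numpy as np

# Hypothetical 3-worker, 3-period panel (made-up values)
worker = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
fem = np.array([1, 1, 1, 0, 0, 0, 1, 1, 1], dtype=float)  # never changes within a worker
sem = np.array([0, 0, 1, 0, 1, 1, 0, 0, 0], dtype=float)  # varies over time


def demean_by(x, groups):
    """Within transformation: subtract each worker's own mean."""
    sums = np.bincount(groups, weights=x)
    counts = np.bincount(groups)
    return x - (sums / counts)[groups]


fem_star = demean_by(fem, worker)
sem_star = demean_by(sem, worker)

# fem* is identically zero, so [sem*, fem*] has rank 1 rather than 2
X_star = np.column_stack([sem_star, fem_star])
rank = np.linalg.matrix_rank(X_star)

print("fem*:", fem_star)
print("rank of [sem*, fem*]:", rank, "out of", X_star.shape[1])
```

Stata's xtreg with the fe option handles this case by dropping the variable and printing a note that it was omitted because of collinearity.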