
San Francisco State University
Michael Bar
ECON 312, Fall 2018

Final Exam, section 1
Tuesday, December 18
1 hour, 30 minutes

Name:

Instructions
1. This is a closed-book, closed-notes exam.
2. You can use one double-sided sheet of paper, letter size (8½ × 11 in or 215.9 × 279.4 mm), with any content you want.
3. No calculators of any kind are allowed.
4. Show all the calculations, and explain your steps.
5. If you need more space, use the back of the page.
6. Fully label all graphs.

Good Luck

1. (40 points). Jennifer is close to completing her bachelor's degree in economics, and she is considering pursuing a master's degree. For her ECON 690 project she collected a sample of workers with either a bachelor's degree or a master's degree, with the following variables:

inc - respondent's total annual income (in $)
afqt - percentile score on U.S. Armed Forces Qualifying Exam
female - dummy (=1 if female, 0 otherwise)
black - dummy (=1 if black, 0 otherwise)
ma - dummy (=1 if highest level of education is master's degree, 0 if highest level of education is bachelor's degree)

Jennifer estimated two models, and her results are reported in the next table. The numbers in parentheses are 95% confidence intervals.

Dependent variable: log(inc)

                      model1                          model2
Constant              10.376*** (10.234, 10.519)      10.374*** (10.231, 10.516)
afqt                   0.005*** (0.004, 0.007)         0.006*** (0.004, 0.007)
female                -0.315*** (-0.388, -0.242)      -0.318*** (-0.395, -0.241)
black                  0.010    (-0.094, 0.115)       -0.005    (-0.112, 0.102)
ma                     0.104*   (-0.017, 0.226)        0.046    (-0.124, 0.216)
female:ma                                              0.054    (-0.189, 0.298)
black:ma                                               0.296    (-0.087, 0.680)
Observations           848                             848
R^2                    0.142                           0.145
Adjusted R^2           0.138                           0.139
Residual Std. Error    0.534 (df = 843)                0.534 (df = 841)
F Statistic            34.928*** (df = 4; 843)         23.699*** (df = 6; 841)
Note: * p<0.1; ** p<0.05; *** p<0.01

a. Demonstrate how you would use the fitted equation from model1 to predict the total income of a Hispanic female with a master's degree, who scored at the 95th percentile on her Armed Forces Qualifying Exam (afqt = 95). No need to calculate the final number, just write the fitted equation and substitute the values.

$\widehat{inc} = \exp(b_1 + b_2 \cdot afqt + b_3 \cdot female + b_4 \cdot black + b_5 \cdot ma)$
$= \exp(10.376 + 0.005 \cdot 95 - 0.315 \cdot 1 + 0.010 \cdot 0 + 0.104 \cdot 1)$
$= \exp(10.376 + 0.005 \cdot 95 - 0.315 + 0.104)$

Here black = 0 because the worker is Hispanic. We exponentiate because the dependent variable is log(inc).
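Although the exam does not ask for the final number, the substitution in part a can be checked numerically. The following is a minimal sketch, assuming the model1 coefficients exactly as printed in the table; the dictionary names `coef` and `person` are illustrative and not part of the exam.

```python
import math

# Coefficients of model1 as reported in the table (dependent variable: log(inc)).
coef = {"const": 10.376, "afqt": 0.005, "female": -0.315, "black": 0.010, "ma": 0.104}

# Profile from part a: Hispanic female (black = 0), master's degree, afqt = 95.
person = {"const": 1, "afqt": 95, "female": 1, "black": 0, "ma": 1}

# Fitted value of log(inc), then exponentiate to get back to dollars.
log_inc_hat = sum(coef[k] * person[k] for k in coef)
inc_hat = math.exp(log_inc_hat)

print(f"predicted log(inc) = {log_inc_hat:.3f}")
print(f"predicted inc      = {inc_hat:,.0f}")
```

Because the coefficients are rounded to three decimals, the dollar figure is only as precise as the table allows.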

b. Interpret the estimated coefficient on ma in model1.

$b_5 = 0.104$ means that workers with a master's degree earn approximately 10.4% more in annual income than workers with a bachelor's degree, holding all other regressors the same (i.e. gender, race, and score on the Armed Forces Qualifying Exam).

c. Interpret the estimated coefficient on female in model1.

$b_3 = -0.315$ means that female workers' annual income is approximately 31.5% lower than that of male workers, holding all other regressors the same (i.e. race, education level, and score on the Armed Forces Qualifying Exam).

d. Suppose that Jennifer wants to test whether black workers' income is different from the income of non-black workers. Write the null and alternative hypotheses for her test, based on model1.

$H_0: \beta_4 = 0$
$H_1: \beta_4 \neq 0$
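The percentage readings in parts b and c use the approximation 100·b, which is accurate for small coefficients. As a hedged illustration, the sketch below compares it with the exact change implied by a dummy coefficient in a log(inc) model, 100·(exp(b) − 1); the helper name `pct_effect` is hypothetical.

```python
import math

def pct_effect(b):
    """Exact percentage change in inc implied by a dummy coefficient b in a log(inc) model."""
    return 100 * (math.exp(b) - 1)

# Coefficients on ma and female from model1.
for name, b in [("ma", 0.104), ("female", -0.315)]:
    print(f"{name}: approximate {100 * b:.1f}%, exact {pct_effect(b):.1f}%")
```

The gap between the two readings grows with the size of the coefficient, which is why the female effect differs more noticeably than the ma effect.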

e. Interpret the estimated coefficient on female:ma in model2.

$b_6 = 0.054$ is the difference between the benefit from a master's degree for female and for male workers. That is, the income gain from a master's degree is approximately 5.4 percentage points larger for female workers than for male workers, holding other regressors fixed (race, and score on the Armed Forces Qualifying Exam).

Steps.
$benefit_{female,ma} = \widehat{\log(inc)}_{female,ma} - \widehat{\log(inc)}_{female,bach} = b_5 + b_6$
$benefit_{male,ma} = \widehat{\log(inc)}_{male,ma} - \widehat{\log(inc)}_{male,bach} = b_5$
Thus, the difference between the benefit of female and male workers is:
$benefit_{female,ma} - benefit_{male,ma} = b_6$

f. Suppose that Jennifer wants to test whether black workers' benefits from a master's degree are different from non-black workers' benefits from a master's degree. Write the null and alternative hypotheses for her test, based on model2.

$H_0: \beta_7 = 0$
$H_1: \beta_7 \neq 0$

g. Based on the reported confidence intervals, what is your conclusion about the test in the last section? Explain your answer.

The 95% confidence interval for $\beta_7$, (-0.087, 0.680), contains all the null values of $\beta_7$ that cannot be rejected at significance level $\alpha = 5\%$ against a two-sided alternative. Since the reported confidence interval contains 0, we fail to reject the null hypothesis at significance level $\alpha = 5\%$. We conclude that there is no statistically significant evidence that black workers' benefits from a master's degree differ from those of non-black workers.
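The reasoning in part g can be sketched in a few lines: check whether the interval covers 0, and, as an optional cross-check, back out the implied standard error and t-statistic. This sketch assumes the interval was built with the normal critical value 1.96 (with 841 degrees of freedom the t critical value is nearly identical); the function name `contains` is illustrative.

```python
def contains(ci, value=0.0):
    """Return True if the confidence interval (lo, hi) covers the given value."""
    lo, hi = ci
    return lo <= value <= hi

# Estimate and 95% confidence interval for black:ma in model2.
b7, ci_b7 = 0.296, (-0.087, 0.680)
z = 1.96  # assumption: normal critical value for a 95% two-sided interval

# Half the interval width equals z * se, so the standard error can be recovered.
se_b7 = (ci_b7[1] - ci_b7[0]) / (2 * z)
t_stat = b7 / se_b7
reject = not contains(ci_b7, 0.0)

print(f"implied se = {se_b7:.4f}, t = {t_stat:.2f}, reject H0 at 5%: {reject}")
```

A t-statistic below 1.96 in absolute value is the same statement as the interval covering 0, so both routes lead to the conclusion given in part g.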

h. Suppose that Jennifer wants to test whether the income of black workers with a master's degree is higher than the income of black workers with a bachelor's degree. Write the null and alternative hypotheses for her test, based on model2.

$H_0: \beta_5 + \beta_7 = 0$
$H_1: \beta_5 + \beta_7 > 0$

2. (5 points). Suppose you estimated a regression model using cross-sectional data. Your model has good overall fit, but individual coefficients are insignificant and have unreasonable magnitudes. Your model likely suffers from (circle the correct answer):
a. Autocorrelation
b. Spurious regression
c. Heteroscedasticity
d. Omitted variable bias
e. Imperfect multicollinearity

3. (5 points). Suppose that you are working on a project and you found dozens of data variables. You decided to include all the variables in your regression model, because you strongly believe that "more is better". What is the likely adverse consequence of your approach (circle the correct answer)?
a. Biased OLS estimators
b. Inefficient OLS estimators
c. Inconsistent OLS estimators
d. Robust standard errors
e. None of the above

4. (20 points). Simone serves as an expert witness in a discrimination lawsuit against a major mortgage lending company. She collected data on 2,380 loan applications from that company, with the following variables:

deny = 1 if mortgage application was denied, 0 otherwise
black = 1 if applicant is black, 0 if non-black
dir ratio of debt payments to total income of applicant, in %
lvr ratio of loan amount to value of property, in %
cs credit score (in points, higher value is better)
dmi = 1 if applicant was denied mortgage insurance, 0 otherwise

Simone estimated the probit and logit models, and her results (marginal effects) are given in the next table. Estimated standard errors are in parentheses, and the constant is omitted:

Dependent variable: deny

                     Probit mfx             Logit mfx
black                 0.0852*** (0.0215)     0.0738*** (0.0197)
dir                   0.0039*** (0.0006)     0.0036*** (0.0006)
lvr                   0.0013*** (0.0004)     0.0014*** (0.0004)
cs                   -0.0299*** (0.0031)    -0.0262*** (0.0027)
dmi                   0.7825*** (0.0605)     0.8043*** (0.0585)
Pseudo R^2            0.2345                 0.2368
p-value               0                      0
Observations          2,381                  2,381
Log Likelihood       -667.7190              -665.6904
Akaike Inf. Crit.     1,347.4380             1,343.3810
Note: * p<0.1; ** p<0.05; *** p<0.01

a. Interpret the estimated marginal effect of black in the probit model.

mfx(black) = 0.0852 means that black applicants are 8.52 percentage points more likely to be denied a mortgage than non-black applicants, holding all other regressors (mortgage characteristics) at their sample mean values.

b. Suppose Simone wants to test statistically whether the lending company discriminates against black applicants. Write the null and alternative hypotheses of this test.

Let $\beta_2$ be the unknown marginal effect of black. The test is therefore:
$H_0: \beta_2 = 0$
$H_1: \beta_2 > 0$
Remark: If black applicants are being discriminated against, then their chances of being denied a mortgage are higher, i.e. this is an upper-tail test.

c. Interpret the estimated marginal effect of dir in the probit model.

mfx(dir) = 0.0039 means that a 1 percentage point increase in the ratio of debt payments to the applicant's income increases the probability of mortgage application denial by 0.39 percentage points, holding all regressors at their sample average values.

d. Interpret the estimated marginal effect of cs in the probit model.

mfx(cs) = -0.0299 means that a 1 point increase in the applicant's credit score lowers the probability of mortgage application denial by 2.99 percentage points (about 3 percentage points), holding all regressors at their sample average values.
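The upper-tail test in part b can be illustrated with the numbers in the table. This is a minimal sketch, assuming the reported standard error applies to the probit marginal effect of black and that the test statistic is approximately standard normal; `upper_tail_p` is a hypothetical helper built on the standard library's `math.erfc`.

```python
import math

def upper_tail_p(z):
    """P(Z > z) for a standard normal random variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Probit marginal effect of black and its reported standard error.
mfx_black, se_black = 0.0852, 0.0215

z_stat = mfx_black / se_black
p_value = upper_tail_p(z_stat)

print(f"z = {z_stat:.3f}, one-sided p-value = {p_value:.6f}")
```

A p-value below 0.01 agrees with the three stars reported next to the estimate, so the null of no discrimination would be rejected at the usual significance levels.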

5. (10 points). Suppose that you estimated a regression model using OLS, and the plot of residuals against the fitted values looks like the next figure.

[Figure: plot of residuals against fitted values]

a. (3 points). What kind of econometric problem does your model likely suffer from?

Heteroscedasticity.

b. (4 points). What are the consequences of the problem in the previous section?

OLS estimators are inefficient.
Estimated standard errors are biased, and therefore statistical hypothesis tests are invalid.

c. (3 points). Propose a practical solution to the problem you identified in section a.

The most practical solution is to compute and report robust standard errors (in R, using the sandwich package).
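To make the idea of robust standard errors in part c concrete, here is a minimal sketch for a regression with a single regressor, comparing the conventional standard error of the slope with the White (HC0) version. The data are made up for illustration, the function name `ols_simple` is hypothetical, and this is not the sandwich package's implementation.

```python
import math

def ols_simple(x, y):
    """OLS of y on a constant and x; returns slope, conventional SE, and HC0 SE."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Conventional SE assumes the same error variance for every observation.
    se_conv = math.sqrt(sum(e ** 2 for e in resid) / (n - 2) / sxx)
    # HC0 SE weights each squared residual by its own squared deviation of x.
    se_hc0 = math.sqrt(sum(((xi - xbar) * e) ** 2 for xi, e in zip(x, resid))) / sxx
    return slope, se_conv, se_hc0

# Made-up data in which the noise grows with x (heteroscedasticity).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.8, 6.5, 7.4, 11.0, 10.5, 15.5, 13.9]

slope, se_conv, se_hc0 = ols_simple(x, y)
print(f"slope = {slope:.3f}, conventional SE = {se_conv:.3f}, HC0 SE = {se_hc0:.3f}")
```

The slope estimate is identical under both approaches; only the standard error, and hence the test statistics, change.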

6. (10 points). Suppose that Kevin estimated two models, and his fitted equations are:

$\widehat{EARNINGS} = b_1 + 3S + 2EXP$
$\widehat{EXP} = d_1 - 0.2S$

where $S$ is schooling and $EXP$ is experience. Dray is another researcher who estimated the following model:

$\widehat{EARNINGS} = \tilde{b}_1 + \tilde{b}_2 S$

a. (3 points). Suppose that Kevin's model is the correct one. What is the econometric problem in Dray's model?

Omitted variable bias.

b. (4 points). What are the likely consequences of the problem in the previous section?

i. Biased and inconsistent estimator of the coefficient on schooling,
ii. Biased standard errors of estimators, which makes all statistical tests invalid.

c. (3 points). What would be the value of Dray's estimated coefficient on schooling, $\tilde{b}_2$?

$\tilde{b}_2 = b_2 + b_3 d_2 = 3 + 2 \cdot (-0.2) = 2.6$

Here $b_2 = 3$ and $b_3 = 2$ are Kevin's coefficients on $S$ and $EXP$, and $d_2 = -0.2$ is the slope from his regression of $EXP$ on $S$.
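The omitted variable formula in part c can be verified on a small constructed data set. The sketch below uses made-up values of S, an intercept of 20 for the experience equation, and an intercept of 5 for the earnings equation (all assumptions for illustration); the deviations `u` are chosen to sum to zero and to be orthogonal to S, so the regression of EXP on S has slope exactly -0.2.

```python
def slope(x, y):
    """OLS slope from a regression of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

S = [10, 12, 14, 16]            # schooling (made-up values)
u = [1, -1, -1, 1]              # sums to 0 and is orthogonal to S
EXP = [20 - 0.2 * s + ui for s, ui in zip(S, u)]
EARNINGS = [5 + 3 * s + 2 * e for s, e in zip(S, EXP)]

d2 = slope(S, EXP)              # slope of EXP on S
b2_short = slope(S, EARNINGS)   # Dray's slope when EXP is omitted

print(f"d2 = {d2:.2f}, short-regression slope = {b2_short:.2f}")
```

The short regression attributes to schooling both its own effect (3) and the effect of the experience it is correlated with (2 times -0.2), which reproduces the value 2.6.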

7. (10 points). Suppose that you estimated a time series model, and the plot of the autocorrelation function looks like the next figure.

[Figure: plot of the autocorrelation function]

a. (3 points). What kind of econometric problem does your model likely suffer from?

Autocorrelation (or serial correlation).

b. (4 points). What are the consequences of the problem in the previous section?

OLS estimators are inefficient.
Estimated standard errors are biased, and therefore statistical hypothesis tests are invalid.

c. (3 points). Propose one practical solution to the problem you identified in section a.

The most practical solution is to compute and report robust standard errors that account for autocorrelation (HAC, e.g. Newey-West; in R, using the sandwich package).
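The autocorrelation function referred to in this question can be computed directly from a residual series. The following is a minimal sketch using a made-up series of residuals and the common approximate 95% band of ±1.96/√n; the function name `acf` and the data are illustrative assumptions, not taken from the exam.

```python
def acf(series, max_lag):
    """Sample autocorrelations of a series at lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

# Made-up residuals that drift slowly, as positively autocorrelated errors would.
resid = [1.0, 1.5, 1.2, 0.4, -0.3, -1.1, -1.6, -1.0, -0.2, 0.6, 1.1, 0.9]

rho = acf(resid, 3)
band = 1.96 / len(resid) ** 0.5  # approximate 95% band under no autocorrelation

for k, r in enumerate(rho):
    flag = "outside band" if k > 0 and abs(r) > band else ""
    print(f"lag {k}: {r:+.3f} {flag}")
```

Autocorrelations at low lags that fall outside the band are the pattern that points to the problem identified in part a.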