
San Francisco State University
Michael Bar
ECON 312, Spring 2018

Final Exam, section 1
Thursday, May 17
1 hour, 30 minutes

Name:

Instructions
1. This is a closed-book, closed-notes exam.
2. You can use one double-sided sheet of paper, letter size (8½ × 11 in or 215.9 × 279.4 mm), with any content you want.
3. No calculators of any kind are allowed.
4. Show all the calculations, and explain your steps.
5. If you need more space, use the back of the page.
6. Fully label all graphs.

Good Luck

1. (50 points). Ailee studies discrimination in the labor market based on looks. She and her team interviewed 1260 workers, assigned each of them a good-looks score from 1 to 5 (best looking is 5), and recorded the following variables:

wage - hourly wage (in dollars)
educ - years of schooling
exper - years of workforce experience
expersq - $exper^2$ (experience squared)
female - dummy (=1 if female)
black - dummy (=1 if black)
belavg - dummy (=1 if looks < 3, i.e. below average looks)
union - dummy (=1 if union member)

Ailee estimated two models, and the results are reported in the next table. The numbers in parentheses are 95% confidence intervals.

| Dependent variable: | wage (1) | log(wage) (2) |
|---|---|---|
| Constant | -1.0223 (-2.4116, 0.3669) | 0.4913 *** (0.3344, 0.6482) |
| educ | 0.4262 *** (0.3347, 0.5176) | 0.0684 *** (0.0581, 0.0788) |
| exper | 0.2508 *** (0.1755, 0.3262) | 0.0392 *** (0.0307, 0.0477) |
| expersq | -0.0039 *** (-0.0056, -0.0022) | -0.0006 *** (-0.0008, -0.0004) |
| female | -2.4415 *** (-2.9457, -1.9373) | -0.4347 *** (-0.4916, -0.3777) |
| black | 0.1148 (-0.7814, 1.0109) | -0.0627 (-0.1639, 0.0385) |
| belavg | -0.8934 ** (-1.5977, -0.1892) | -0.1439 *** (-0.2234, -0.0643) |
| union | 0.7109 *** (0.1886, 1.2332) | 0.1891 *** (0.1301, 0.2481) |
| Observations | 1,260 | 1,260 |
| $R^2$ | 0.2092 | 0.3798 |
| Adjusted $R^2$ | 0.2048 | 0.3763 |
| F Statistic (df = 7; 1252) | 47.3110 *** | 109.5361 *** |

Note: * p<0.1; ** p<0.05; *** p<0.01

a. Which of the two models fits the data better, model 1 or model 2? Explain how you reached this conclusion.

Model 2 has the better fit, based on its higher Adjusted $R^2$. Model 2 explains nearly 38% of the variation in its dependent variable, while model 1 explains only about 20%.
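The Adjusted $R^2$ values in the table can be reproduced from the reported $R^2$, the sample size, and the number of estimated coefficients. The following is a minimal sketch, assuming the usual formula $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k}$ with $n = 1260$ and $k = 8$ (seven regressors plus the constant); the function name is illustrative and not part of the exam.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k estimated coefficients
    (including the constant) and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

n, k = 1260, 8  # observations and coefficients, taken from the table

adj_model1 = adjusted_r2(0.2092, n, k)  # dependent variable: wage
adj_model2 = adjusted_r2(0.3798, n, k)  # dependent variable: log(wage)

print(round(adj_model1, 4), round(adj_model2, 4))
```

Rounded to four decimals, both values agree with the Adjusted $R^2$ row of the table.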

b. Demonstrate how you would use the fitted equation from model 2 to predict the hourly wage of a white male with 16 years of education, 10 years of experience, average or above looks, who is a union member. No need to calculate the final number, just write the correct equation and substitute the correct values.

$$\widehat{wage} = \exp(b_1 + b_2\, educ + b_3\, exper + b_4\, expersq + b_5\, female + b_6\, black + b_7\, belavg + b_8\, union)$$
$$= \exp(0.4913 + 0.0684 \cdot 16 + 0.0392 \cdot 10 - 0.0006 \cdot 10^2 + 0.1891)$$

The terms for female, black and belavg drop out because these dummies equal 0 for this worker. We exponentiate because the dependent variable in model 2 is log(wage).

c. Interpret the estimated coefficient on belavg in model 1.

$b_7 = -0.8934$ means that workers with below average looks earn about 89 cents an hour less than workers with average or better looks, holding all other regressors the same (i.e. education, experience, race, gender, union membership).

d. Interpret the estimated coefficient on belavg in model 2.

$b_7 = -0.1439$ means that workers with below average looks earn about 14% less per hour than workers with average or better looks, holding all other regressors the same (i.e. education, experience, race, gender, union membership).
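The substitution in part b and the percentage reading in part d can be checked numerically. The sketch below uses only the model 2 coefficients from the table; the dictionary layout and variable names are illustrative assumptions, and the exam itself does not ask for the final number.

```python
import math

# Model 2 coefficients from the table (dependent variable: log(wage)).
b = {"const": 0.4913, "educ": 0.0684, "exper": 0.0392, "expersq": -0.0006,
     "female": -0.4347, "black": -0.0627, "belavg": -0.1439, "union": 0.1891}

# White male, 16 years of education, 10 years of experience,
# average or better looks, union member.
worker = {"const": 1, "educ": 16, "exper": 10, "expersq": 10 ** 2,
          "female": 0, "black": 0, "belavg": 0, "union": 1}

# Fitted value of log(wage), then exponentiate to get the wage in dollars.
log_wage = sum(b[name] * worker[name] for name in b)
predicted_wage = math.exp(log_wage)

# Exact proportional wage difference implied by the belavg coefficient.
belavg_effect = math.exp(b["belavg"]) - 1

print(f"predicted log(wage) = {log_wage:.4f}")
print(f"predicted wage      = {predicted_wage:.2f} dollars per hour")
print(f"belavg effect       = {belavg_effect:.1%}")
```

The exact effect $\exp(-0.1439) - 1$ is about -13.4%, close to the 14% approximation obtained by reading the coefficient directly.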

e. Interpret the estimated coefficients on exper and expersq (jointly). Offer a possible explanation for why these estimated results make sense.

$b_3 > 0$ and $b_4 < 0$ mean that hourly wage is increasing in experience up to a certain level, and decreasing beyond that level, holding all other regressors constant. These results make sense because people with a lot of experience are also older, and may be less productive due to poor health and outdated skills.

f. Interpret the estimated coefficient on union in model 1.

$b_8 = 0.7109$ means that union members earn about 71 cents per hour more than non-union workers with the same characteristics (education, experience, gender, race, looks).

g. Suppose that Ailee wants to test whether black workers' earnings are different from the earnings of non-black workers. Write the null and alternative hypotheses for her test.

$$H_0: \beta_6 = 0$$
$$H_1: \beta_6 \neq 0$$
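The experience level at which the wage profile in part e peaks follows from setting the derivative $b_3 + 2 b_4\, exper$ to zero, which gives $exper^* = -b_3 / (2 b_4)$. The sketch below applies this to the reported coefficients; the function name is an illustrative assumption, and because the table rounds expersq to four decimals the results are only approximate.

```python
def turning_point(b_exper, b_expersq):
    """Experience level at which b_exper*exper + b_expersq*exper**2 peaks.

    Solves the first-order condition b_exper + 2*b_expersq*exper = 0.
    """
    return -b_exper / (2 * b_expersq)

peak_model1 = turning_point(0.2508, -0.0039)  # wage equation
peak_model2 = turning_point(0.0392, -0.0006)  # log(wage) equation

print(f"model 1 peak: {peak_model1:.1f} years of experience")
print(f"model 2 peak: {peak_model2:.1f} years of experience")
```

Both models place the peak at roughly 32 to 33 years of experience, which is consistent with the explanation based on age, health and outdated skills.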

h. Based on the reported confidence interval for model 2, what is your conclusion about the test in the last section? Explain your answer.

The 95% confidence interval for $\beta_6$, (-0.1639, 0.0385), contains all the null values of $\beta_6$ that cannot be rejected at significance level $\alpha = 5\%$ against a two-sided alternative. Since the reported confidence interval contains 0, we fail to reject the null hypothesis at significance level $\alpha = 5\%$. We conclude that there is no statistically significant evidence that black workers' earnings differ from those of non-black workers with the same characteristics (education, experience, gender, looks, and union membership).

i. Explain the meaning of the *** next to the estimated coefficient on female, and write the conclusion for the underlying hypothesis test.

The *** means that the p-value for the two-sided test is less than 1%.

$$H_0: \beta_5 = 0$$
$$H_1: \beta_5 \neq 0$$

Thus, we reject the null hypothesis at any significance level of 1% or higher. We conclude that female workers' earnings are different from those of male workers with the same characteristics (education, experience, race, looks, and union membership).

j. Suppose Ailee wants to investigate whether the gender earnings gap is the same for black and non-black workers. How would Ailee need to change her models in order to be able to estimate such a differential earnings gap?

Ailee would need to add an interaction term female*black as an additional regressor to her models.
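The confidence-interval reasoning in parts h and i can be sketched in a few lines. The example below checks whether each model 2 interval contains zero and backs out an approximate t-statistic from the interval width. It assumes the intervals were built as estimate ± 1.96 standard errors (a normal approximation, reasonable with 1252 degrees of freedom); the helper names are hypothetical.

```python
def contains_zero(ci):
    """True if the interval (low, high) includes 0, i.e. H0: beta = 0
    cannot be rejected at the level matching the interval."""
    low, high = ci
    return low <= 0 <= high

def t_from_ci(estimate, ci, critical=1.96):
    """Approximate t-statistic implied by a symmetric confidence interval.

    Assumes ci = estimate +/- critical * standard_error.
    """
    standard_error = (ci[1] - ci[0]) / 2 / critical
    return estimate / standard_error

# Model 2 estimates and 95% confidence intervals from the table.
black_est, black_ci = -0.0627, (-0.1639, 0.0385)
female_est, female_ci = -0.4347, (-0.4916, -0.3777)

black_t = t_from_ci(black_est, black_ci)
female_t = t_from_ci(female_est, female_ci)

print("black:  contains 0 =", contains_zero(black_ci), " t =", round(black_t, 2))
print("female: contains 0 =", contains_zero(female_ci), " t =", round(female_t, 2))
```

The interval for black contains zero and the implied |t| is below 1.96, while the interval for female excludes zero with a very large implied |t|, matching the absence of stars on black and the *** on female.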

2. (20 points). Linda is investigating the chances of surviving the Titanic crash for different passengers. She collected data on 2201 passengers and crew, with the following characteristics:

Survived = {Yes, No}
Class = {1st, 2nd, 3rd, Crew}, 1st class is the most luxurious
Sex = {Male, Female}
Age = {Child, Adult}

Linda estimated the linear probability model, probit and logit, and her results (marginal effects) are given in the next table (estimated standard errors are in parentheses):

Predicting Probability of Surviving the Titanic Crash

| | OLS | Probit | Logit |
|---|---|---|---|
| (Intercept) | 0.4081 *** (0.0432) | | |
| Class1st | 0.1756 *** (0.0280) | 0.2029 *** (0.0366) | 0.1965 *** (0.0381) |
| Class2nd | -0.0105 (0.0288) | -0.0310 (0.0340) | -0.0328 (0.0346) |
| Class3rd | -0.1312 *** (0.0216) | -0.1623 *** (0.0253) | -0.1782 *** (0.0260) |
| Female | 0.4907 *** (0.0230) | 0.5298 *** (0.0254) | 0.5381 *** (0.0259) |
| Adult | -0.1813 *** (0.0410) | -0.2219 *** (0.0560) | -0.2510 *** (0.0600) |
| $R^2$ | 0.2529 | | |
| Adj. $R^2$ | 0.2512 | | |
| Num. obs. | 2201 | 2201 | 2201 |
| F statistic | 148.6389 | | |
| Pseudo $R^2$ | | 0.2011 | 0.2020 |

*** p < 0.001, ** p < 0.01, * p < 0.05

a. What is the reference (omitted) category for Class?

Crew

b. Interpret the estimated marginal effect of Female in the logit model.

$me(female) = 0.5381$ means that female passengers have a nearly 54 percentage point higher probability of surviving the Titanic crash than male passengers, holding all the other characteristics at their sample averages.

c. Interpret the estimated marginal effect of Adult in the logit model.

$me(adult) = -0.251$ means that adult passengers have a 25 percentage point lower probability of surviving the Titanic crash than children, holding all the other characteristics at their sample averages.

d. Interpret the estimated marginal effect of 1st class in the probit model.

$me(1st) = 0.2029$ means that passengers in 1st class had an about 20 percentage point higher probability of surviving the Titanic crash than crew members, holding all the other characteristics at their sample averages.
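The OLS column of the table is a linear probability model, so its coefficients are directly differences in predicted survival probability and can serve as a rough cross-check on the probit and logit marginal effects. The sketch below uses only the OLS estimates from the table; the function and the choice of example passengers are illustrative assumptions.

```python
# OLS (linear probability model) coefficients from the table.
lpm = {"const": 0.4081, "Class1st": 0.1756, "Class2nd": -0.0105,
       "Class3rd": -0.1312, "Female": 0.4907, "Adult": -0.1813}

def predicted_probability(**dummies):
    """Predicted survival probability from the linear probability model.

    Dummies that are not passed default to 0, i.e. the reference
    categories: crew, male, child.
    """
    return lpm["const"] + sum(lpm[name] * value for name, value in dummies.items())

# Adult first-class passengers, female versus male.
p_female = predicted_probability(Class1st=1, Female=1, Adult=1)
p_male = predicted_probability(Class1st=1, Adult=1)
gap = p_female - p_male

print(f"adult female, 1st class: {p_female:.4f}")
print(f"adult male, 1st class:   {p_male:.4f}")
print(f"difference:              {gap:.4f}")
```

The difference equals the Female coefficient, about 49 percentage points, which is of the same order as the logit marginal effect of 0.5381 interpreted in part b.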

3. (10 points). The following diagrams show plots of residuals against one of the regressors.

a. Which of the above panels are likely to be generated from heteroscedastic models? Circle from the following list: a, b, c, d.

Notice that in every panel the residuals, which are estimates of the error terms, are not evenly spread around zero.

b. Which of the above cases of heteroscedasticity is the Goldfeld-Quandt test likely to miss (not detect)? Circle from the following list: a, b, c, d.

The Goldfeld-Quandt test will most likely miss panels c and d, because the test is based on comparing the residual sum of squares in the left part and the right part of the panel, and these look symmetric.
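The reason the Goldfeld-Quandt test misses symmetric patterns can be illustrated with a small simulation. The sketch below is not a reproduction of the exam's panels, which are not available here: it assumes two hypothetical error structures, one whose spread grows with the regressor and one whose spread is symmetric around the middle of the regressor's range, and compares the ratio of residual sums of squares from the high and low subsamples. All function names and parameter values are illustrative.

```python
import random

def ols_rss(x, y):
    """Residual sum of squares from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

def goldfeld_quandt_ratio(x, y, drop_frac=0.2):
    """RSS of the high-x subsample divided by RSS of the low-x subsample.

    Observations are sorted by x and the middle drop_frac is discarded.
    """
    pairs = sorted(zip(x, y))
    k = int(len(pairs) * (1 - drop_frac) / 2)  # size of each outer subsample
    low, high = pairs[:k], pairs[-k:]
    rss_low = ols_rss([p[0] for p in low], [p[1] for p in low])
    rss_high = ols_rss([p[0] for p in high], [p[1] for p in high])
    return rss_high / rss_low

def simulate(error_sd, n=2000, seed=0):
    """Draw x uniformly on [0, 10] and y = 1 + 2x + u with sd(u) = error_sd(x)."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [1 + 2 * xi + rng.gauss(0, error_sd(xi)) for xi in x]
    return x, y

# Error spread grows with x: the two subsamples have very different RSS.
ratio_increasing = goldfeld_quandt_ratio(*simulate(lambda x: 0.5 + x))

# Error spread is symmetric around x = 5 (hypothetical pattern): both
# subsamples have similar RSS even though the errors are heteroscedastic.
ratio_symmetric = goldfeld_quandt_ratio(*simulate(lambda x: 0.5 + abs(x - 5)))

print(f"increasing spread: RSS ratio = {ratio_increasing:.2f}")
print(f"symmetric spread:  RSS ratio = {ratio_symmetric:.2f}")
```

With the increasing spread the ratio is far above 1 and the test would reject homoscedasticity, while with the symmetric spread the ratio stays close to 1 and the heteroscedasticity goes undetected.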

4. (10 points). Consider two models of earnings, with regressors S (years of schooling) and EXP (years of experience).

[true model]: $EARNINGS_i = \beta_1 + \beta_2 S_i + \beta_3 EXP_i + u_i$
[misspecified model]: $EARNINGS_i = \beta_1 + \beta_2 S_i + u_i$

a. Suppose that a researcher estimates the misspecified model, and obtains an estimate $b_2$ of $\beta_2$. Is $b_2$ likely to overestimate or underestimate the true parameter $\beta_2$? Prove your answer.

We proved in class that the bias from omitting a relevant variable, such as experience, is:

$$bias(b_2) = \beta_3 d_2$$

where $d_2$ is the OLS estimator of $\delta_2$ in the regression of the omitted variable on the included variable:

$$EXP_i = \delta_1 + \delta_2 S_i + v_i$$

Thus, if the impact of schooling on experience is negative, i.e. $d_2 < 0$, and the impact of experience on earnings is positive, i.e. $\beta_3 > 0$, we have

$$bias(b_2) = \underbrace{\beta_3}_{>0}\,\underbrace{d_2}_{<0} < 0$$

Thus, $b_2$ is likely to underestimate $\beta_2$.

b. Provide economic intuition for the result in the previous section.

When we estimate the true model, the estimate $b_2$ of $\beta_2$ measures the impact of an additional year of schooling on earnings, holding experience fixed. This is the net effect of schooling on earnings. However, when we estimate the misspecified model, the estimator $b_2$ of $\beta_2$ measures both the net effect of schooling on earnings and the indirect, negative effect of schooling on earnings through lower experience: $S \uparrow \Rightarrow EXP \downarrow \Rightarrow EARNINGS \downarrow$. This second, negative effect of schooling on earnings biases the estimated coefficient on schooling downwards.
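The omitted-variable result can be illustrated on simulated data. In the sketch below every parameter value (the true coefficients, the negative link between schooling and experience, the sample size) is a hypothetical choice for illustration and is not taken from the exam. It checks that the slope from the short regression equals the long-regression slope plus the estimated coefficient on experience times $d_2$, so the short regression underestimates the effect of schooling when $d_2 < 0$.

```python
import random

def cross(a, b):
    """Sum of cross products of deviations from the mean."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

rng = random.Random(1)
n = 5000
beta2, beta3 = 0.8, 0.3  # hypothetical true effects of S and EXP
delta2 = -1.0            # hypothetical: each year of schooling costs a year of experience

S = [rng.uniform(8, 20) for _ in range(n)]
EXP = [30 + delta2 * s + rng.gauss(0, 3) for s in S]
EARN = [2 + beta2 * s + beta3 * e + rng.gauss(0, 2) for s, e in zip(S, EXP)]

# Misspecified (short) regression: EARN on S only.
b2_short = cross(S, EARN) / cross(S, S)

# Auxiliary regression: EXP on S.
d2 = cross(S, EXP) / cross(S, S)

# True (long) regression: EARN on S and EXP, solved from the normal equations.
den = cross(S, S) * cross(EXP, EXP) - cross(S, EXP) ** 2
b2_long = (cross(S, EARN) * cross(EXP, EXP) - cross(EXP, EARN) * cross(S, EXP)) / den
b3_long = (cross(EXP, EARN) * cross(S, S) - cross(S, EARN) * cross(S, EXP)) / den

print(f"b2 (long)  = {b2_long:.3f}")
print(f"b2 (short) = {b2_short:.3f}")
print(f"b2 (long) + b3 * d2 = {b2_long + b3_long * d2:.3f}")
```

In the sample, the short-regression slope equals $b_2 + b_3 d_2$ from the long regression up to floating-point error, which is the sample counterpart of the bias formula $\beta_3 d_2$.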

5. (10 points). The next figure plots the autocorrelation function of residuals from some regression model with time series data, for lags $k = 1, 2, \ldots, 40$. The dashed lines give the 95% confidence bands for the autocorrelation.

a. Based on the above graph, what is your conclusion about the presence of serial correlation? Circle the correct answer.

i. There is no evidence of serial correlation.
ii. There is evidence of serial correlation at lags $k = 2, 4$.
iii. There is evidence of serial correlation at all lags $k = 1, 2, \ldots, 40$.
iv. None of the above.

b. Suppose that you are estimating a regression model using time series data, and some of the variables contain trends. Briefly describe one solution to avoid the problem of spurious regression.

i. Detrending (removing the trend from variables) before using them in the regression.
ii. Normalizing: expressing the variables as ratios of the original variable to some other key variable. For example, $C_t / GDP_t$ and $def_t / GDP_t$ are normalized consumption and normalized deficit, both expressed as a fraction of GDP.
iii. Including time as a regressor.
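The detrending solution in part b can be sketched on simulated data. The example below generates two series that are unrelated except for each having its own linear time trend (all slopes, noise levels and the sample length are hypothetical), removes a fitted linear trend from each, and compares the correlation before and after. A linear trend is assumed here for simplicity; other trend specifications are possible.

```python
import random

def ols(x, y):
    """Intercept and slope from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def detrend(series):
    """Residuals from regressing the series on a linear time index."""
    t = range(len(series))
    intercept, slope = ols(t, series)
    return [v - intercept - slope * ti for ti, v in zip(t, series)]

def corr(a, b):
    """Sample correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(42)
T = 200
# Two independent noise series, each with its own deterministic trend.
x = [0.5 * t + rng.gauss(0, 3) for t in range(T)]
y = [0.3 * t + rng.gauss(0, 3) for t in range(T)]

corr_raw = corr(x, y)
corr_detrended = corr(detrend(x), detrend(y))

print(f"correlation of raw series:       {corr_raw:.2f}")
print(f"correlation of detrended series: {corr_detrended:.2f}")
```

The raw series are highly correlated only because both trend upwards, while the detrended series show essentially no relationship, which is why a regression on the raw series would be spurious.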