Review questions for Multinomial Logit/Probit, Tobit, Heckit, Quantile Regressions

1. I estimated a multinomial logit model of employment behavior using data from the 2006 Current Population Survey. The three possible outcomes for a person are employed (outcome=1), unemployed (outcome=2), and out of the labor force (outcome=3). The coefficients for outcomes 2 and 3 are presented below; the coefficients for outcome 1 are normalized to zero.

VARIABLES                 Unemployed     Out of labor force
female                    0.0575         0.677
                          (2.647)        (74.59)
age                       -0.129         -0.305
                          (-33.56)       (-211.3)
Age-squared               0.00122        0.00379
                          (25.35)        (225.6)
# of kids aged 0-5        0.00907        0.181
                          (0.490)        (22.78)
# of kids aged 6-17       0.0711         0.199
                          (6.557)        (42.17)
Constant                  -0.309         3.711
                          (-4.467)       (132.5)
Observations              325458         325458

a. Compute the probability that a 40-year-old male with no kids is
   i. employed
   ii. unemployed
b. After estimating the above multinomial logit model, I executed the following Stata commands and received the output listed below:

. mfx, predict(p outcome(2))

Marginal effects after mlogit
      y  = Pr(emp==2) (predict, p outcome(2))
         =  .02700302
variable        dy/dx        Std. Err.      z
female*      -.0048006       .00056      -8.50
age          -.0005028       .0001       -4.94
age2         -3.61e-06       .00000      -2.94
#kids<5      -.0014705       .00048      -3.05
#kids 6-17   -.0000104       .00028      -0.04
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx, predict(p outcome(3))

Marginal effects after mlogit
      y  = Pr(emp==3) (predict, p outcome(3))
         =  .34939412
variable        dy/dx        Std. Err.      z
female*       .1515857       .00198       76.51
age          -.0681668       .00034     -198.50
age2          .0008495       .00000      209.74
#kids<5       .0410849       .0018        22.88
#kids 6-17    .0446071       .00106       41.91
(*) dy/dx is for discrete change of dummy variable from 0 to 1

Use the above results to compute the effect of having an additional child under the age of 5 on the probability that a person is employed. Show how you derived your estimate.

d. Suppose you wish to test that children have different effects on the employment behavior of men and women. Explain how you could test this hypothesis. Define the variables you would construct, the model(s) you would estimate, how you would construct your test statistic, the distribution of the test statistic, and how you would decide whether to reject the null hypothesis.
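The arithmetic behind part (a) and the marginal-effect question can be sketched in a few lines of Python. This is a minimal sketch, assuming that "Age-squared" (age2) is simply age squared; the dictionary keys (const, female, kids0_5, ...) are hypothetical labels, not Stata variable names. For the 40-year-old man it should give roughly 0.894 for employed and 0.027 for unemployed. Because the three probabilities always sum to one, their derivatives with respect to the number of young children sum to zero, so the effect on Pr(employed) is minus the sum of the two reported dy/dx values (about -0.0396). Note that mfx evaluates these derivatives at the sample means by default, so this is the effect for a person with average characteristics, not specifically for the man in part (a).

```python
import math

# Multinomial logit coefficients from the Question 1 table. Outcome 1
# (employed) is the base category, so its linear index is fixed at zero.
COEFS = {
    2: {"const": -0.309, "female": 0.0575, "age": -0.129, "age2": 0.00122,
        "kids0_5": 0.00907, "kids6_17": 0.0711},  # unemployed
    3: {"const": 3.711, "female": 0.677, "age": -0.305, "age2": 0.00379,
        "kids0_5": 0.181, "kids6_17": 0.199},     # out of the labor force
}


def mlogit_probs(x):
    """Return {outcome: probability} for regressor values x (a dict)."""
    index = {1: 0.0}  # base outcome
    for outcome, b in COEFS.items():
        index[outcome] = b["const"] + sum(b[name] * value for name, value in x.items())
    denom = sum(math.exp(v) for v in index.values())
    return {k: math.exp(v) / denom for k, v in index.items()}


# Part (a): 40-year-old man with no children (age2 assumed to be age squared)
man_40 = {"female": 0, "age": 40, "age2": 40 ** 2, "kids0_5": 0, "kids6_17": 0}
probs = mlogit_probs(man_40)

# Marginal effect of one more child under 5 on Pr(employed), using the mfx
# dy/dx values: the three derivatives must sum to zero.
me_unemployed = -0.0014705
me_out_of_lf = 0.0410849
me_employed = -(me_unemployed + me_out_of_lf)

print({k: round(p, 4) for k, p in probs.items()})
print(round(me_employed, 7))
```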

2. Using data from IRS Form 5500 filings by U.S. pension plans, I estimated a model of contributions to pension plans as

   $\ln(1 + c_i) = \alpha_0 + U_i\alpha_1 + PD_i\alpha_2 + e_i$

where the subscript i indicates the pension plan, c is employer contributions per participant, U is a dummy that equals one if the plan is a union plan, and PD is a dummy that equals one if the plan is participant directed (meaning that the employee decides how to invest the money). Note that employer contributions can equal zero, since some pension plans are funded entirely by employee contributions. I estimated a Tobit and an OLS version of the model. The results are below. Sigma represents the standard error of the residual. Standard errors of the coefficients are presented in parentheses below each coefficient.

Variable                  OLS           Tobit
Union                     -0.0626***    -0.0676***
                          (0.012)       (0.014)
Participant Directed      -0.307***     -0.311***
                          (0.0068)      (0.0080)
Constant                  5.894***      5.684***
                          (0.0057)      (0.0067)
Sigma                     2.7434        3.226***
Observations              744615        744615

a. Explain why the coefficients change in the observed direction when switching from OLS to Tobit estimation.
b. Use the Tobit model to make the following predictions for a participant-directed, non-union plan. Give a brief outline of each calculation, and be sure to account for the fact that the dependent variable is ln(1+c), not c.
   i. employer contributions per capita
   ii. employer contributions per capita, conditional on non-zero contributions being made
   iii. the predicted probability that no contribution is made by the employer
   iv. the predicted probability that per capita employer contributions are between $1000 and $2000 (is this a bit like an ordinal probit question?)
c. Use the Tobit model to predict the effect of a switch to participant direction on per capita contributions. Give a brief outline of your calculation.
d. Explain how you could test whether the effect of unionism on employer contributions has been constant over time (there are 18 years of data in the sample). Describe the models you would estimate, how you form the appropriate test statistic, the distribution of the test statistic (including degrees of freedom), and the conditions under which you would reject the null hypothesis.
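One way to carry out parts (b) and (c) is sketched below. It assumes, as the Tobit does, that the latent y* = ln(1 + c*) is normal with mean mu = x'alpha and standard deviation sigma, censored at zero, so observed c = exp(y*) - 1 when y* > 0 and c = 0 otherwise. The mean then needs the lognormal partial expectation E[exp(y*) 1{y* > 0}] = exp(mu + sigma^2/2) Phi((mu + sigma^2)/sigma). With sigma = 3.226 the factor exp(sigma^2/2) is roughly 180, so mean predictions are very sensitive to the normality assumption; by contrast, exp(mu) - 1 (about $214 here) is the model's median contribution. Function names are hypothetical, and the probability in (b)(iv) is computed by mapping the dollar bounds into ln(1 + c), much like the cut points of an ordered probit.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf() and pdf()

# Tobit estimates from the Question 2 table (non-union plan, so U = 0)
CONST, B_PD, SIGMA = 5.684, -0.311, 3.226


def contribution_predictions(mu, s):
    """Predictions for c when y* = ln(1 + c*) ~ N(mu, s**2), censored at zero.

    Observed c = exp(y*) - 1 when y* > 0 and c = 0 otherwise.
    """
    p_pos = STD_NORMAL.cdf(mu / s)  # Pr(c > 0)
    # E[exp(y*) * 1{y* > 0}]: partial expectation of a lognormal variable
    partial = math.exp(mu + s ** 2 / 2) * STD_NORMAL.cdf((mu + s ** 2) / s)
    e_c = partial - p_pos  # E[c], zeros included
    return {"E[c]": e_c, "E[c | c > 0]": e_c / p_pos, "Pr(c = 0)": 1 - p_pos}


def prob_between(lo, hi, mu, s):
    """Pr(lo <= c <= hi) for 0 < lo < hi: map dollar bounds into ln(1 + c)."""
    upper = (math.log(1 + hi) - mu) / s
    lower = (math.log(1 + lo) - mu) / s
    return STD_NORMAL.cdf(upper) - STD_NORMAL.cdf(lower)


mu_pd = CONST + B_PD  # participant directed, non-union
pred_pd = contribution_predictions(mu_pd, SIGMA)
p_1000_2000 = prob_between(1000, 2000, mu_pd, SIGMA)

# Part (c): discrete switch of the PD dummy from 0 to 1 for a non-union plan
effect_pd = pred_pd["E[c]"] - contribution_predictions(CONST, SIGMA)["E[c]"]

print(pred_pd, round(p_1000_2000, 4), round(effect_pd, 2))
```

Because PD is a dummy, part (c) is computed as a discrete difference in predicted contributions rather than a derivative.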

3. Using data from the March 2011 CPS, I estimated an OLS model explaining the number of children living in a household with an adult woman present as a function of the woman's age (and its square), her years of education, and her marital status. The never-married dummy variable is omitted, and the sample is restricted to women aged 21-50.

Table 1. OLS and Tobit estimates of determinants of number of children living in households (t-statistics are in parentheses)

VARIABLES                 OLS           Tobit
Age                       0.0311        0.0292
                          (6.270)       (4.257)
Age2                      -0.000558     -0.000594
                          (-7.953)      (-6.104)
Years of education        -0.0637       -0.0860
                          (-29.71)      (-29.04)
married                   0.581         0.905
                          (40.65)       (44.58)
divorced                  0.169         0.330
                          (8.536)       (11.83)
Constant                  3.279         3.806
                          (30.39)       (25.40)
Sigma                     1.202         1.598
Observations              49,104        49,104
R-squared                 0.056

a. Compare the OLS and Tobit coefficient estimates. What pattern do you observe? Why should you expect this pattern? Provide a rationale for the direction of the bias in OLS.
b. Use the Tobit model to predict each of the following for a 30-year-old married woman with 12 years of education. Provide a brief outline of how you computed your answer.
   i. the expected number of children
   ii. the expected number of children, conditional on having more than 0 children
   iii. the expected number of children, conditional on having more than 2 children
   iv. the probability of having 4 or more children
c. What is the marginal effect of another year of education (compared to never married) on the expected number of children for the person described in (b)? Provide a brief outline of how you computed your answer for
   i. the OLS model
   ii. the Tobit model
d. How will the relative size of the OLS and Tobit estimates compare for married vs. never-married workers? Just provide a qualitative comparison; no numbers are required. Justify your answer.
e. Suppose that you want to test the hypothesis that the Tobit regression coefficients (NOT just the intercepts) are identical across three racial categories. Explain how you could test this with an LR test. Precisely describe the restricted and unrestricted models, the degrees of freedom for your test statistic, and the conditions under which you would reject the null hypothesis.
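The standard Tobit formulas for parts (b) and (c) can be checked numerically with the sketch below, which uses only the Python standard library. It treats the number of children as a continuous, left-censored variable, as the Tobit does; in particular, Pr(4 or more) uses 4 as the threshold rather than a continuity correction such as 3.5, which is an assumption. The formulas are E[y] = Phi(z) xb + sigma phi(z) with z = xb/sigma, E[y | y > 0] = xb + sigma phi(z)/Phi(z), and E[y | y > a] = xb + sigma phi(alpha)/(1 - Phi(alpha)) with alpha = (a - xb)/sigma. The helper name tobit_predictions is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Tobit estimates from Table 1
B = {"const": 3.806, "age": 0.0292, "age2": -0.000594, "educ": -0.0860,
     "married": 0.905, "divorced": 0.330}
SIGMA = 1.598
B_EDUC_OLS = -0.0637


def tobit_predictions(xb, s, a=2.0, k=4.0):
    """Predictions from a Tobit left-censored at zero, latent mean xb, sd s."""
    z = xb / s
    p_pos = STD_NORMAL.cdf(z)
    e_y = p_pos * xb + s * STD_NORMAL.pdf(z)            # E[y], zeros included
    e_y_pos = xb + s * STD_NORMAL.pdf(z) / p_pos         # E[y | y > 0]
    alpha = (a - xb) / s                                 # truncation at a > 0
    e_y_gt_a = xb + s * STD_NORMAL.pdf(alpha) / (1 - STD_NORMAL.cdf(alpha))
    p_ge_k = 1 - STD_NORMAL.cdf((k - xb) / s)            # Pr(y >= k)
    return {"E[y]": e_y, "E[y|y>0]": e_y_pos, "E[y|y>a]": e_y_gt_a,
            "Pr(y>=k)": p_ge_k, "Phi(z)": p_pos}


# 30-year-old married woman with 12 years of education
xb = (B["const"] + B["age"] * 30 + B["age2"] * 30 ** 2
      + B["educ"] * 12 + B["married"])
pred = tobit_predictions(xb, SIGMA)

# Part (c): the OLS slope is the marginal effect directly; the Tobit marginal
# effect on E[y] scales the coefficient by Phi(xb / sigma).
me_ols = B_EDUC_OLS
me_tobit = pred["Phi(z)"] * B["educ"]

print(round(xb, 4), {key: round(v, 4) for key, v in pred.items()})
print(me_ols, round(me_tobit, 4))
```

For this woman Phi(z) is close to one, so the Tobit marginal effect is nearly the full coefficient; for groups with more zeros the scaling factor is smaller.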

4. (20 points) Using data from the March 2008 CPS, I estimate Tobit models of annual Social Security income. The sample includes 62-70 year old men. The only controls that I used in the model are age dummies (62 excluded) and the person's years of education, measured as the deviation from the mean. T-statistics are in parentheses.

Tobit estimates for Social Security income
Age dummies
  63                                    4869***       (7.364)
  64                                    6652***       (10.23)
  65                                    10176***      (16.01)
  66                                    14532***      (22.20)
  67                                    15570***      (23.11)
  68                                    16819***      (24.70)
  69                                    15992***      (23.86)
  70                                    15991***      (23.04)
Years of education                      -347.6***     (-3.135)
  (measured as deviation from mean)
Constant                                -4329***      (-8.801)
σ (std. deviation of residual)          10849***      (78.26)
Observations                            5783
Log-likelihood                          -40784.617

Provide a description of how you derived your estimates for the questions below.

a) Based on the estimated model for men, what is the probability that a 62-year-old with the average amount of education would
   i. receive no Social Security income
   ii. receive Social Security income of $5,000-10,000
   iii. receive Social Security income of more than $10,000
b) For the same man described in (a), what is
   i. the expected annual Social Security income?
   ii. the expected annual Social Security income conditional on receiving a non-zero income?
c) For the man described in (a), what is the marginal effect of turning 63 on his expected Social Security benefit?

To answer the next part of this question, you need some background on how Social Security benefits are determined. To be eligible for a Social Security retirement benefit, a person must be at least 62 years old and have contributed into the system for at least 40 quarters (10 years) over one's lifetime. For men, this means that virtually everyone is eligible to collect. The size of the benefit depends on two factors: (1) the person's average Social Security earnings over the highest 35 years of their career; and (2) when the person files for benefits. The person can file as early as 62 but receives a delayed retirement bonus for every year that they postpone retirement. For example, if a person is eligible for a $10,000 benefit at age 62, they would be entitled to a check of $10,500 if they postpone filing to age 63. Since more educated workers have higher incomes, we would expect that higher levels of education would lead to higher Social Security benefits for those who have filed. At the same time, research has shown that increased education leads to later retirement dates, perhaps because more educated workers are in less physically demanding jobs and thus choose to continue working to a later age.

d) Does the Tobit model allow for the possibility that increased education will reduce the probability of receiving a benefit but increase the size of the benefit conditional on receiving nonzero Social Security income (or vice versa)? Justify your answer by explaining what parameter(s) in the Tobit model determine the direction of these two effects.
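A short sketch of the calculations in parts (a)-(c) follows. Because age 62 is the reference group and education is measured as a deviation from its mean, the linear index for this man is just the constant, -4329. Each probability is a normal tail area evaluated at (threshold - xb)/sigma, and the expectations use the standard Tobit formula E[y] = Phi(z) xb + sigma phi(z). Turning 63 switches a dummy, so the sketch takes a discrete change in E[y] rather than a derivative. The helper names are hypothetical; the approximate values checked in the tests (for example, about 0.655 for no income and about $2,500 for the unconditional mean) come from running these formulas on the table's point estimates.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Tobit estimates from the Question 4 table (dollars per year)
CONST, B_AGE63, SIGMA = -4329.0, 4869.0, 10849.0


def tobit_mean(xb, s):
    """Unconditional E[y] for a Tobit censored at zero."""
    z = xb / s
    return STD_NORMAL.cdf(z) * xb + s * STD_NORMAL.pdf(z)


def prob_above(c, xb, s):
    """Pr(y > c) for a threshold c >= 0."""
    return 1 - STD_NORMAL.cdf((c - xb) / s)


# Age 62 is the reference group and education is a deviation from the mean,
# so the linear index is just the constant.
xb62 = CONST
p_zero = 1 - prob_above(0, xb62, SIGMA)
p_5k_10k = prob_above(5_000, xb62, SIGMA) - prob_above(10_000, xb62, SIGMA)
p_over_10k = prob_above(10_000, xb62, SIGMA)

e_y = tobit_mean(xb62, SIGMA)
e_y_pos = e_y / prob_above(0, xb62, SIGMA)  # E[y | y > 0]

# Part (c): turning 63 switches a dummy, so use a discrete change in E[y]
effect_63 = tobit_mean(xb62 + B_AGE63, SIGMA) - e_y

print(round(p_zero, 4), round(p_5k_10k, 4), round(p_over_10k, 4),
      round(e_y), round(e_y_pos), round(effect_63))
```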

5. As an alternative to the Tobit model, I used a Heckit to estimate the determinants of Social Security benefits. In this model, I treat the decision to file for Social Security benefits as a sample selection problem. The controls are identical to those used for the Tobit, except that I add marital status and the number of children living at home as controls in the sample selection equation (but not in the Social Security benefit equation). The reference group is age 62 and never married for both the Social Security and sample selection equations. T-statistics are in parentheses.

                                   (1)                 (2)
                                   Social Security     Sample Selection
                                                       (i.e. receive SS benefit)
Age dummies
  63                               159.5               0.380***
                                   (0.225)             (5.605)
  64                               -66.29              0.534***
                                   (-0.0798)           (7.946)
  65                               65.45               0.821***
                                   (0.0609)            (12.31)
  66                               -1282               1.361***
                                   (-0.848)            (18.49)
  67                               -1466               1.524***
                                   (-0.903)            (19.27)
  68                               -1753               1.748***
                                   (-0.997)            (20.49)
  69                               -2133               1.652***
                                   (-1.249)            (20.24)
  70                               -2406               1.679***
                                   (-1.388)            (19.45)
Years of education                 793.5***            -0.0855***
  (dev. from mean)                 (7.490)             (-6.513)
married                                                0.229***
                                                       (2.800)
widowed                                                0.476***
                                                       (4.020)
divorced                                               0.281***
                                                       (3.017)
# children at home                                     -0.227***
                                                       (-5.508)
lambda                             -5648***
                                   (-3.215)
Constant                           17903***            -0.788***
                                   (8.400)             (-8.689)
Observations                       5783                5783

a. What advantage does the Heckit have over the Tobit? Also, based on the Heckit estimates found here, what evidence is there for or against the underlying assumptions of the Tobit model?
b. Suppose that instead of Heckit, I had estimated the model using OLS for the sample of men receiving a Social Security check (i.e. excluding those with zero Social Security income). Would you expect that the estimated effect of education on Social Security benefits would be over- or under-estimated in the OLS model? Justify your prediction.
c. For the same person that you used in question 4(a) (a 62-year-old with average education), and assuming he is married with no children at home, predict each of the following and provide a brief outline of how you derived your answers.
   i. probability of receiving Social Security
   ii. expected Social Security income
   iii. expected Social Security income conditional on receiving a nonzero benefit
d. Given that the decision to file for Social Security is affected by a wide range of variables that we have not controlled for (e.g. health, other sources of wealth, work preferences, physical demands of job, etc.), provide a story that would lead to the type of sample selection observed here.
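The predictions in part (c) can be sketched as follows, assuming the table comes from Heckman's two-step estimator, so that the reported lambda is the coefficient on the inverse Mills ratio phi(z'gamma)/Phi(z'gamma) computed from the selection index, and assuming that men who do not file have zero Social Security income. For a married 62-year-old with mean education and no children at home, the selection index is the constant plus the married coefficient, and the benefit index is just the constant. Under these assumptions the sketch gives roughly 0.288 for the probability of receiving a benefit.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Selection (probit) index: age 62, mean education, married, no kids at home
z_sel = -0.788 + 0.229                            # constant + married dummy
p_receive = STD_NORMAL.cdf(z_sel)                 # (i) Pr(receive benefit)

# Benefit equation: constant only for this person, plus lambda * Mills ratio
inv_mills = STD_NORMAL.pdf(z_sel) / STD_NORMAL.cdf(z_sel)
e_benefit_pos = 17903 - 5648 * inv_mills          # (iii) E[y | receives]
e_benefit = p_receive * e_benefit_pos             # (ii) zero for non-recipients

print(round(p_receive, 4), round(e_benefit_pos), round(e_benefit))
```

Unlike the Tobit, the probability of receiving a benefit and its conditional size come from different equations here, so education can move them in opposite directions.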

6. A study by Joshua Angrist [1] investigated the effect of voluntary military service on post-service earnings. According to his estimates, the difference in mean earnings of veterans (those who were in the military) and non-veterans is $1233 annually.

a. To examine the true effect of military service on earnings, Angrist estimates a simple OLS model:

   $Y_i = X_i\beta + V_i\alpha + e_i$

where $Y_i$ is annual earnings, $X_i$ is a vector of controls describing person i's earnings potential (e.g. education, age), and $V_i$ is a dummy that equals one for veterans. The estimated coefficient on $V_i$ is -$197, and it is statistically significant. What could cause the regression estimate of the military effect to be NEGATIVE while the simple difference in means suggests that military service increases earnings by over $1000?
b. For the cohort of people in the sample examined by Angrist, there was no draft and military service was voluntary. Since military service was voluntary, is the OLS estimate of the veteran effect likely to be biased upwards or downwards? Explain any assumptions that you make about behavior and why they would lead to either an upward or a downward bias.
c. Explain what model you could estimate to eliminate the bias in (b) and get the true effect of veteran status on earnings. Describe the estimation process and any additional data or variables that you will need to estimate the model. Also, indicate what parameters in the estimated model reveal the true effect of military service on earnings.
d. Explain how you could use the estimated parameters described in (c) to estimate the total difference in earnings for two people who are identical in all respects except that one is a veteran and one is not.
e. During the late 1960s, there was a draft in which eligible men were randomly selected from the population (birthdates were drawn at random to determine a person's order in the draft). If the OLS earnings equation described in (a) was estimated using people who were age-eligible for the military in the late 1960s (the "draft cohort") instead of the 1990s cohort (the "voluntary cohort"), do you think the estimated effect of military service would rise or fall? Would the estimated effect of military service be closer to the true effect for the draft or the voluntary cohort? Explain the basis for your prediction.

[1] Angrist, Joshua (1998). Estimating the Labor Market Impact of Voluntary Military Service Using Social Security Data on Military Applicants. Econometrica 66, 249-88.

7. Using 7 years of data from the Survey of Consumer Finances gathered between 1989 and 2007 (the survey is conducted once every 3 years), I estimated several regressions to examine the factors that influence the value of vehicles owned by a household. The dependent variable in each case is the natural log of the real value of all vehicles owned (in 2007 dollars). The control variables are the natural log of real household income; dummy variables indicating whether the household has a married couple (omitted dummy), a single female, or a single male; and year dummies (1989 omitted).

                   OLS          10th quantile   50th quantile   90th quantile
Ln(real income)    0.423***     0.409***        0.429***        0.445***
                   (0.0038)     (0.0086)        (0.0047)        (0.0058)
Single female      -0.548***    -0.571***       -0.556***       -0.517***
                   (0.016)      (0.030)         (0.019)         (0.023)
Single male        -0.329***    -0.463***       -0.327***       -0.191***
                   (0.015)      (0.029)         (0.019)         (0.023)
1992               0.00310      0.0509          -0.0732***      0.0237
                   (0.022)      (0.040)         (0.027)         (0.032)
1995               0.242***     0.368***        0.210***        0.168***
                   (0.021)      (0.040)         (0.026)         (0.032)
1998               0.246***     0.440***        0.188***        0.155***
                   (0.021)      (0.040)         (0.027)         (0.032)
2001               0.300***     0.510***        0.246***        0.177***
                   (0.021)      (0.040)         (0.026)         (0.032)
2004               0.261***     0.401***        0.214***        0.207***
                   (0.021)      (0.039)         (0.026)         (0.031)
2007               0.238***     0.395***        0.197***        0.159***
                   (0.021)      (0.039)         (0.026)         (0.032)
Constant           4.897***     3.839***        4.941***        5.655***
                   (0.048)      (0.11)          (0.059)         (0.071)
Observations       24801        24801           24801           24801

a. Based on the regressions above, controlling for income, sex, and marital status, what has happened to the mean value of cars owned between 1989 and 2007? Explain how you came to your conclusion.
b. Controlling for income, sex, and marital status, what has happened to the range of car values owned over time? Explain how you came to your conclusion.
c. For a given income and year, is the range of car values owned greater among the single male or the single female population? Explain how you came to your conclusion.
d. For a married couple with $100,000 of income in 2007, what is the projected range of car values (from the 10th to the 90th percentile)? Be sure to note the use of log transformation in some of the variables when you do your calculations.
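Part (d), together with the spread comparisons behind (b) and (c), can be sketched with the point estimates alone. Married couples are the omitted household type, so only the income term and the 2007 dummy are added to each constant. Because quantiles are preserved under monotone transformations, exponentiating a predicted quantile of ln(value) gives the corresponding quantile of the dollar value; the same shortcut does not work for the OLS mean. Under the linear quantile model, this gives a 10th-90th percentile range of roughly $7,650 to $56,200.

```python
import math

LN_INCOME = math.log(100_000)

# Married couple is the omitted household type; add the 2007 year dummy.
log_q10 = 3.839 + 0.409 * LN_INCOME + 0.395
log_q90 = 5.655 + 0.445 * LN_INCOME + 0.159

# Quantiles are preserved by monotone transformations, so exponentiating a
# predicted quantile of ln(value) gives the same quantile of value itself.
q10_dollars = math.exp(log_q10)
q90_dollars = math.exp(log_q90)

# Parts (b) and (c): change in the 90th-10th log spread relative to the
# omitted category (1989 for years, married couples for household type)
spread_2007_vs_1989 = 0.159 - 0.395
spread_single_male = -0.191 - (-0.463)
spread_single_female = -0.517 - (-0.571)

print(round(q10_dollars), round(q90_dollars))
print(spread_2007_vs_1989, spread_single_male, spread_single_female)
```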

8. An article by Rangvid (2010) [2] uses data on students in Denmark to investigate peer effects. The hypothesis is that the academic performance of a student's peers influences one's own academic performance; that is, ceteris paribus, students in a classroom with brighter peers will do better. To investigate the hypothesis, Rangvid estimates the effect of the average academic score of a student's peers on the student's own performance using both OLS and quantile regression methods. Other controls for family background, teacher quality, and class size are also included. The OLS estimate of the peer effect is given by the horizontal line in the figure below. The estimated effect from various quantile regressions is given by the bold line. The dashed lines represent confidence intervals for the estimates.

Suppose that a school system is considering tracking students. This would put all the high performers in one classroom and the low performers in another. Given the information provided above, describe how you can tell
a. whose academic performance would improve.
b. whose academic performance would worsen.
c. whether the average academic performance of all the students combined would rise or fall.

[2] Rangvid (2010). Educational Peer Effects: Quantile Regression Evidence from Denmark with PISA2000 Data. Unpublished manuscript.