Fixed Effects Maximum Likelihood Estimation of a Flexibly Parametric Proportional Hazard Model with an Application to Job Exits


Published in Economics Letters, 2012

Audrey Light*
Department of Economics, Ohio State University
410 Arps Hall, 1945 N. High Street, Columbus, OH 43210, USA
light.20@osu.edu

Yoshiaki Omori
Faculty of Economics, Yokohama National University
79-3 Tokiwadai, Hodogaya Ward, Yokohama 240-8501, Japan
omori@ynu.ac.jp

Abstract: We extend the fixed effects maximum likelihood estimator to a proportional hazard model with a flexibly parametric baseline hazard. We use the method to estimate a job duration model for young men, and show that failure to account for unobserved fixed effects causes negative schooling and union effects to be downward biased.

Key words: Proportional hazard model, fixed effects
JEL code: C41

*Corresponding author. +1(614)292-0493 (phone), +1(614)292-3906 (fax)

1. Introduction

We extend Yamaguchi's (1986) fixed effects maximum likelihood (FEML) estimator to the proportional hazard model (PHM) with a flexibly parametric baseline hazard similar to the one proposed by Han and Hausman (1990) and Meyer (1990). After describing the estimation method, which requires that the baseline hazard and covariates be constant within each duration interval, we use Monte Carlo experiments to demonstrate that FEML performs well when these key assumptions are met, even with as few as five spells per individual. We then use the model to identify effects of union membership, schooling attainment, and other potentially endogenous variables on young men's job durations.

This extension of FEML provides a tractable method for contending with endogenous covariates in a variety of hazard model applications. A more general solution to endogenous covariates is to estimate multiple duration models simultaneously (van den Berg 2001). However, this approach cannot incorporate multiple endogenous regressors without a substantial increase in computational burden, and it requires that endogenous variables be modeled as duration variables. Simultaneous estimation of a PHM and discrete choice or linear models for endogenous regressors skirts the need for multiple duration models, but generally requires distributional assumptions for identification.

Fixed effects duration models overcome these problems as long as the endogenous covariates are only correlated with time-invariant, individual-specific unobservables. Three approaches, namely fixed effects conditional likelihood (FECL), fixed effects partial likelihood (FEPL) and fixed effects marginal likelihood (FEMGL), use stratification on individuals to remove individual-specific parameters from the likelihood function, thereby avoiding inconsistency due to the presence of these nuisance parameters (Chamberlain 1985; Cox and Lewis 1966; Ridder and Tunali 1999; Yamaguchi 1986).
As long as the observation period is long enough to minimize the inconsistency that these nuisance parameters induce, however, the FEML approach dominates FECL, FEPL and FEMGL by (i) requiring less stringent assumptions about the underlying stochastic process; (ii) allowing all forms of time-varying covariates to be included; and (iii) identifying the baseline hazard, which allows conditional survival probabilities to be predicted. We note that a fourth approach to the PHM with fixed effects, which entails nonparametric estimation of the baseline hazard (Horowitz and Lee 2004), might dominate FEML in cases with dependent censoring or a baseline hazard that even a flexible parameterization fails to fit adequately.

2. Fixed Effects Maximum Likelihood Estimator

Given data on $j = 1, \ldots, J_i$ job durations for $i = 1, \ldots, n$ individuals, we model the hazard rate for individual $i$ on job $j$ at tenure $t$ as a standard PHM:

$$\lambda_{ij}(t) = \nu_i \, \lambda_{0t} \exp(X_{ijt}\beta), \qquad (2.1)$$

where $\nu_i$ represents time-constant, individual unobservables, $\lambda_{0t}$ represents the baseline hazard, $X_{ijt}$ is a vector of time-varying covariates, and $\beta$ is a vector of coefficients to be estimated. In contrast to the random effects PHM, we do not have to make a distributional assumption for $\nu_i$ or assume $\nu_i$ is uncorrelated with the covariates. Multiple spells for each individual and time-varying covariates aid identification.

Yamaguchi (1986) proposes a FEML estimator for the case where (i) precise failure times are assumed known; and (ii) the baseline hazard is assumed to be a constant, exponential function. He shows that the first order condition for the maximization of the resulting log-likelihood function can be solved analytically for $\nu_i$ as a function of $\beta$ and the baseline parameter, given the data; this function is then substituted into the log-likelihood function to obtain FEML estimators for the parameters.

We extend Yamaguchi's FEML approach as follows. Following Han and Hausman (1990), Kiefer (1988) and Prentice and Gloeckler (1978), we divide the time axis into unit intervals $(0,1], \ldots, (t-1,t]$, although any uniform spell length can be used in estimation. We assume all covariates are constant within each interval and, in contrast to Yamaguchi (1986), we assume that the baseline hazard is constant within each interval rather than constant for all $t$. The log-likelihood function is

$$\ln L = \sum_{i=1}^{n} \sum_{j=1}^{J_i} \Big\{ D_{ij}\big[\ln \nu_i + X_{ij\tilde{T}_{ij}}\beta + \delta_{\tilde{T}_{ij}}\big] - \nu_i \Big[ \sum_{t=1}^{\tilde{T}_{ij}-1} \exp\big(X_{ijt}\beta + \delta_t\big) + \exp\big(X_{ij\tilde{T}_{ij}}\beta + \delta_{\tilde{T}_{ij}} + \ln(T_{ij} - \tilde{T}_{ij} + 1)\big) \Big] \Big\}, \qquad (2.2)$$

where $\delta_t = \ln \int_{t-1}^{t} \lambda_0(s)\,ds$ is the log of the baseline hazard integrated over the interval, $D_{ij}$ is a dummy variable indicating a failure, $T_{ij}$ is the exact duration, and $\tilde{T}_{ij}$ is the grouped duration such that $\tilde{T}_{ij} - 1 < T_{ij} \le \tilde{T}_{ij}$. Differentiating (2.2) with respect to $\nu_i$ and setting the result to zero yields a first order condition that can be solved individually for each $i$ due to the interval-constant baseline hazard assumption:

$$\hat{\nu}_i = \frac{\sum_{j=1}^{J_i} D_{ij}}{\sum_{j=1}^{J_i} \Big[ \sum_{t=1}^{\tilde{T}_{ij}-1} \exp\big(X_{ijt}\beta + \delta_t\big) + \exp\big(X_{ij\tilde{T}_{ij}}\beta + \delta_{\tilde{T}_{ij}} + \ln(T_{ij} - \tilde{T}_{ij} + 1)\big) \Big]}. \qquad (2.3)$$

We could treat $\delta_t$ as free parameters for each interval, but to reduce the number of parameters we express it as a flexible polynomial in $t$ with coefficients $\alpha_p$. FEML estimates for $\beta$ and $\alpha_p$ are obtained by replacing $\nu_i$ in (2.2) with $\hat{\nu}_i$ and maximizing the resulting function with respect to the parameters.

As discussed in Yamaguchi (1986), the FEML estimator has two limitations. First, it does not identify coefficients for covariates that are time-constant within and across jobs. Second, the likelihood function depends on the $\nu_i$, which are incidental parameters, so estimates are inconsistent. As demonstrated below, neither shortcoming appears to be significant in our application.

3. Monte Carlo Results

We generate data using model (2.1) with one endogenous covariate and one exogenous covariate. We assume the two interval-specific random terms are independently and identically distributed as standard normal, and that $\ln \nu_i$ and the second individual-specific heterogeneity term are bivariate normal with variances equal to one and covariance equal to -0.5.[1] We set $\beta_1 = 1$ and $\beta_2 = -1$, and make three alternative assumptions about the log of the baseline hazard integrated over the interval: (1) exponential with $\gamma = 1.4$; (2) Weibull with $\alpha = 0.5$, $\gamma = 1.4$; and (3) Weibull with $\alpha = 1.5$, $\gamma = 0.86$. The exponential function is consistent with the assumption (section 2) that the baseline hazard is constant over each interval; this assumption is violated when we use the Weibull function, especially as the degree of duration dependence increases.

[1] In our application, innate ability, determination, and other unobserved factors that explain endogenous covariates such as union status and schooling attainment are likely to be negatively correlated with stick-to-itiveness and other unobservables that increase the job exit hazard.

For each parameterization we generate 1,000 samples consisting of 100 individuals with, alternately, 5, 10 and 100 spells per person. To generate data for each spell for each individual, we first draw the pair of heterogeneity terms from the bivariate normal distribution. We then set $t=0$, draw the two interval-specific terms from the standard normal distribution, and compute the conditional probability that the spell survives the first interval. We draw a random number from a uniform distribution and judge the spell to fail if this number exceeds the computed conditional probability; otherwise, we increment $t$ and repeat the process, censoring all spells at $t=32$. Finally, after judging a spell to fail within a given interval, we compute its completed duration in continuous time by applying the inverse transform technique to the cumulative distribution function for duration, conditional on completion in the given interval.

We use the FEML approach described in section 2 to estimate the coefficients $\beta_1$ and $\beta_2$ plus the six parameters in the $\delta_t$ function for each alternative model. In addition to estimating the FEML with alternative parameterizations of the baseline function, we also produce maximum likelihood (ML) estimates of a model that ignores unobserved heterogeneity ($\nu_i = 1$). In Table 1, we report estimates for $\beta_1$ and $\beta_2$ that represent the means and standard errors across samples of estimated coefficients.

Table 1 reveals that FEML performs well when we use the exponential baseline hazard. With 100 spells per person, we obtain estimates of 1.002 and -1.001, which are close to the true values of 1 and -1. The estimates remain reasonably close to their true values (while becoming less precise) when the number of spells per person falls to 10 and even to 5. In contrast, ML estimates are severely biased (0.515 and -0.713) when we ignore unobserved heterogeneity and use 10 spells per person. When we assume a Weibull hazard, the estimated coefficients are biased (in opposite directions) by 42% when $\alpha = 0.5$ and $\gamma = 1.4$, and by 12% when $\alpha = 1.5$ and $\gamma = 0.86$. The Weibull with $\alpha = 0.5$ and $\gamma = 1.4$ exhibits extreme, negative duration dependence over the first interval, so this model's poor performance demonstrates that violating the condition that the baseline hazard be constant within each interval becomes more of a problem as duration dependence increases.
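The spell-generation steps of section 3 and the concentrated likelihood of section 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the constant `DELTA`, the construction of the endogenous covariate as an individual factor plus noise, and all function names (`draw_spell`, `simulate`, `profile_loglik`) are hypothetical choices made for the example. Only the exponential (interval-constant) baseline case is covered.

```python
import math
import random

BETA = (1.0, -1.0)  # true coefficients reported in the text
T_MAX = 32          # censoring point reported in the text
DELTA = -2.0        # ASSUMPTION: illustrative constant log integrated baseline hazard


def draw_spell(ln_nu, mu, rng):
    """Simulate one spell; return (T, D, rows), rows = covariates per interval entered."""
    rows = []
    for t in range(1, T_MAX + 1):
        x1 = mu + rng.gauss(0.0, 1.0)  # ASSUMPTION: endogenous covariate = individual factor + noise
        x2 = rng.gauss(0.0, 1.0)       # exogenous covariate
        rows.append((x1, x2))
        h = math.exp(ln_nu + BETA[0] * x1 + BETA[1] * x2 + DELTA)  # integrated hazard in interval t
        p_survive = math.exp(-h)
        if rng.random() > p_survive:   # spell fails in interval t
            u = 1.0 - rng.random()     # uniform on (0, 1]
            # Inverse transform of the duration CDF, conditional on failing in this interval.
            s = min(1.0, -math.log1p(-u * (1.0 - p_survive)) / h)
            return (t - 1) + s, 1, rows
    return float(T_MAX), 0, rows       # censored at T_MAX


def simulate(n_people, n_spells, seed=0):
    """Return a list of individuals, each a list of (T, D, rows) spells."""
    rng = random.Random(seed)
    people = []
    for _ in range(n_people):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        ln_nu = z1
        mu = -0.5 * z1 + math.sqrt(0.75) * z2  # unit variances, covariance -0.5
        people.append([draw_spell(ln_nu, mu, rng) for _ in range(n_spells)])
    return people


def profile_loglik(people, beta, delta):
    """Log-likelihood with each nu_i replaced by its closed-form maximizer.

    `delta` is a callable mapping the interval index t to delta_t.
    """
    total = 0.0
    for spells in people:
        events, exposure, exit_index = 0, 0.0, 0.0
        for T, D, rows in spells:
            last = len(rows)  # grouped duration: index of the last interval entered
            for t, (x1, x2) in enumerate(rows, start=1):
                frac = T - (last - 1) if t == last else 1.0  # share of the last interval at risk
                exposure += frac * math.exp(beta[0] * x1 + beta[1] * x2 + delta(t))
            x1, x2 = rows[-1]
            exit_index += D * (beta[0] * x1 + beta[1] * x2 + delta(last))
            events += D
        if events > 0:  # with no failures nu_hat = 0 and the contribution is zero
            nu_hat = events / exposure
            total += exit_index + events * math.log(nu_hat) - events
    return total
```

A call such as `profile_loglik(simulate(100, 5), (1.0, -1.0), lambda t: -2.0)` evaluates the concentrated log-likelihood at the true coefficients. Because each individual effect is concentrated out, adding a constant to `delta` leaves the value unchanged, which is one way to see that only the shape of the baseline hazard, and not its level, is identified separately from the fixed effects.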
In further experiments not reported in Table 1, we find that the bias falls by only 10% when, using $\alpha = 0.5$ and $\gamma = 1.4$, we decrease the interval length for the baseline hazard by half while holding constant the frequency of changes in time-varying covariates.

4. Application to Job Duration Data

We use the PHM with fixed effects and, for comparison, a PHM with no unobserved heterogeneity and a PHM with $\ln \nu_i$ treated as a random draw from a standard normal distribution to estimate a job exit hazard with 1979-2008 data from the National Longitudinal Survey of Youth (NLSY79). Among the 5,579 men in the NLSY79 who were not in the military oversample, we select a sample of 3,357 men who were born in 1960-64 and still in school when the survey began tracking employment in January 1978 (thus ensuring non-left-censored work histories), and who are observed working for at least two unique employers between initial school exit and the 2008 interview. Using 13-week intervals, we obtain a sample with 232,211 interval-specific observations for 32,630 jobs held by these men. The number of spells per person ranges from two to 40 with a mean of 9.7 and standard deviation of 6.2. Based on our Monte Carlo results we believe that bias due to the incidental parameters problem will be minimal.

In Table 2, we report means and standard deviations for select covariates included in the hazard model. Years of schooling and union status are key covariates insofar as they are time-varying, and are likely to be correlated with time-constant unobservables represented by $\nu_i$ (innate ability, patience, etc.).[2] Other time-varying covariates include years of work experience and its square, and the average hourly wage (in 1996 dollars). For the non-FE models, we include time-constant indicators of race/ethnicity and an age-adjusted, 1980 score from the Armed Forces Qualification Test. Other covariates not shown in Table 2 include industry and occupation dummies and controls for the local unemployment rate, rural location, and average unemployment insurance benefits in the state of residence.

[2] Years of schooling is often thought of as time-invariant once the career begins, but 35% of our sample members increment their highest grade completed after their initial school exit, which we define as the start date of the first nonenrollment spell lasting 13 months or longer.

Table 2 reveals that the estimated coefficient for union status is -0.270 when unobserved heterogeneity is ignored (column A), -0.283 when unobservables are assumed to be normally distributed random variables that are uncorrelated with covariates (column B), and -0.247 when the FEML approach is used to account for endogeneity (column C). Similarly, the estimated schooling coefficient is -0.044 in column A, -0.047 in column B, and -0.037 in column C. For these and other coefficients, we do not always reject the null hypothesis of equality across specifications. Focusing on point estimates, however, we conclude that the failure to account for unobservables causes the estimated effects of union membership and schooling on job exits to be downward biased by 9% and 19%, respectively. The biases become even more severe (15% for union status and 27% for schooling) when we use the random effects model and assume all covariates are exogenous.

In contrast to the FECL, FEPL and FEMGL models, the FEML model identifies the baseline hazard and thus allows us to assess the magnitudes of the effects of endogenous covariates by computing conditional survival probabilities. Focusing on the schooling effect, we predict that a man with 16 years of schooling, three years of job tenure, and mean or modal values for all other covariates has a 62% chance of remaining with his current employer for another 13-week interval. This predicted survival probability falls to 50% if we use the random effects model (with exogenous schooling) and 44% if we ignore unobserved heterogeneity.
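Under the interval-constant hazard of section 2, the probability of surviving interval $t$, given survival to its start, is $\exp(-\nu_i \exp(X_{ijt}\beta + \delta_t))$. The sketch below shows how such conditional survival probabilities could be computed once estimates are in hand. The function names and all numeric inputs are hypothetical; they are not the estimates behind the 62%, 50% and 44% figures quoted above.

```python
import math


def interval_survival(ln_nu, xb, delta_t):
    """P(survive interval t | survived to its start) under an interval-constant hazard."""
    return math.exp(-math.exp(ln_nu + xb + delta_t))


def multi_interval_survival(ln_nu, xbs, deltas):
    """Probability of surviving several consecutive intervals (product of one-interval terms)."""
    prob = 1.0
    for xb, delta_t in zip(xbs, deltas):
        prob *= interval_survival(ln_nu, xb, delta_t)
    return prob


# Hypothetical inputs chosen so that the integrated hazard equals ln(2) in each interval.
index = math.log(math.log(2.0))
one = interval_survival(ln_nu=0.0, xb=index, delta_t=0.0)
two = multi_interval_survival(0.0, [index, index], [0.0, 0.0])
```

With an integrated hazard of ln(2) per interval, the one-interval survival probability is one half and the two-interval probability is one quarter, which makes the example easy to check by hand.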

Acknowledgements: This research was funded by a grant to Light from the National Science Foundation and a grant to Omori from the Japan Society for the Promotion of Science (Grant-in-Aid for Scientific Research (C)20530193). We thank Shiying Zhang for excellent research assistance.

References

Chamberlain, G., 1985. Heterogeneity, Omitted Variable Bias and Duration Dependence. In: Heckman, J.J., Singer, B. (Eds.), Longitudinal Analysis of Labor Market Data. Cambridge University Press, Cambridge, pp. 3-38.

Cox, D.R., Lewis, P.A.W., 1966. The Statistical Analysis of Events. Chapman and Hall, London.

Han, A., Hausman, J.A., 1990. Specification and Semiparametric Estimation of Duration Models. Journal of Applied Econometrics 5, 1-28.

Horowitz, J.L., Lee, S., 2004. Semiparametric Estimation of a Panel Data Proportional Hazards Model with Fixed Effects. Journal of Econometrics 119, 155-198.

Kiefer, N., 1988. Analysis of Grouped Duration Data. Contemporary Mathematics 80, 107-137.

Meyer, B., 1990. Unemployment Insurance and Unemployment Spells. Econometrica 58, 757-782.

Prentice, R.L., Gloeckler, L.A., 1978. Regression Analysis of Grouped Survival Data with Application to Breast Cancer Data. Biometrics 34, 57-67.

Ridder, G., Tunali, I., 1999. Stratified Partial Likelihood Estimation. Journal of Econometrics 92, 193-232.

van den Berg, G.J., 2001. Duration Models: Specification, Identification, and Multiple Durations. In: Heckman, J.J., Leamer, E. (Eds.), Handbook of Econometrics 5. Elsevier Science, Amsterdam, pp. 3381-3460.

Yamaguchi, K., 1986. Alternative Approaches to Unobserved Heterogeneity in the Analysis of Repeatable Events. Sociological Methodology 16, 213-249.

Table 1: Monte Carlo Estimates

                                    Spells       Estimated model: FEML        Estimated model: ML with ν_i = 1
True model                          per person   β1             β2             β1             β2
Exponential baseline, γ=1.4         5            1.042 (.066)   -1.042 (.063)
Exponential baseline, γ=1.4         10           1.024 (.042)   -1.021 (.042)  0.515 (.034)   -0.713 (.047)
Exponential baseline, γ=1.4         100          1.002 (.029)   -1.001 (.028)
Weibull baseline, α=.5, γ=1.4       10           1.427 (.065)   -1.428 (.062)  0.541 (.041)   -0.752 (.048)
Weibull baseline, α=1.5, γ=.86      10           0.881 (.061)   -0.881 (.067)  0.541 (.041)   -0.752 (.048)
True values                                      1              -1             1              -1

Note: Each simulation is based on 1,000 samples containing 100 individuals. Standard errors are in parentheses. The estimated model in the right-most columns (maximum likelihood with ν_i = 1) ignores unobserved heterogeneity.

Table 2: Maximum Likelihood Estimates for Three Alternative Job Duration Hazard Models

                                          (A) ML with ν_i = 1   (B) RE           (C) FEML
Covariate              Mean (S.D.)        Coeff. (S.E.)         Coeff. (S.E.)    Coeff. (S.E.)
Years of school        12.73 (2.30)       -.044 (.004)          -.047 (.009)     -.037 (.010)
1 if union member        .15              -.270 (.022)          -.283 (.027)     -.247 (.025)
Years of experience     9.93 (7.42)       -.075 (.003)          -.094 (.004)     -.069 (.003)
Average hourly wage    13.63 (22.44)      -.014 (.001)          -.014 (.001)     -.013 (.001)
1 if black               .28               .025 (.018)           .072 (.051)
1 if Hispanic            .18               .010 (.020)          -.018 (.051)
AFQT score              1.35 (27.53)      -.001 (.000)          -.000 (.002)

Note: Model A ignores unobserved heterogeneity; B assumes unobserved factors are random effects that are uncorrelated with all covariates; C assumes fixed effects. All three models are estimated with 232,211 observations for 3,357 men using NLSY79 data. See text for other covariates included in each model.
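The percentage biases quoted in section 4 can be checked against the coefficients in Table 2. The short sketch below assumes that each bias is measured as the gap in magnitude between a column's coefficient and the FEML (column C) coefficient, relative to the FEML coefficient; that definition is an interpretation, and the helper name `pct_diff_from_feml` is hypothetical.

```python
# Coefficients copied from Table 2 (columns A, B and C).
coef = {
    "union": {"A": -0.270, "B": -0.283, "C": -0.247},
    "schooling": {"A": -0.044, "B": -0.047, "C": -0.037},
}


def pct_diff_from_feml(name, column):
    """Percent by which a column's coefficient exceeds the FEML (column C) one in magnitude."""
    feml = abs(coef[name]["C"])
    # ASSUMPTION: bias is expressed relative to the FEML point estimate.
    return 100.0 * (abs(coef[name][column]) - feml) / feml


# Rounded percentage gaps for the no-heterogeneity (A) and random effects (B) models.
bias = {(n, c): round(pct_diff_from_feml(n, c)) for n in coef for c in ("A", "B")}
```

Under this definition the rounded values reproduce the 9% and 19% figures for column A and the 15% and 27% figures for column B.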