PhD Program in Transportation
Transport Demand Modeling
João de Abreu e Silva
Session 11: Binary and Ordered Choice Models
PhD in Transportation / Transport Demand Modelling
Heteroscedasticity

Homoscedasticity of ε_i in the binary choice model is likely to be violated in micro-level data.
There are no robust parametric approaches to model fitting and analysis, so the variance has to be modeled explicitly.
Heteroscedasticity necessitates some care in interpreting the coefficients for a variable w_ik that could be in x, in z, or in both.

Partial effects: a variable that appears in both x and z affects the probability through the mean and through the variance.
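One common way to write this model is sketched below. It assumes a multiplicative exponential variance function; the symbols γ (slopes on x) and θ (variance parameters on z) are assumptions, chosen to match the restriction θ = 0 tested later in this session. F is the logistic or normal CDF and f its density; γ_k and θ_k are the coefficients on w_ik in x and in z (zero where the variable is absent).

```latex
\begin{aligned}
\Pr(y_i = 1 \mid x_i, z_i) &= F\!\left(\frac{\gamma' x_i}{\exp(\theta' z_i)}\right), \\[4pt]
\frac{\partial \Pr(y_i = 1 \mid x_i, z_i)}{\partial w_{ik}}
  &= f\!\left(\frac{\gamma' x_i}{\exp(\theta' z_i)}\right)
     \frac{\gamma_k - (\gamma' x_i)\,\theta_k}{\exp(\theta' z_i)} .
\end{aligned}
```

Under this form, a variable that enters only the variance has a partial effect with the opposite sign of θ_k whenever the index γ'x_i is positive. This would be consistent with the output in this session, where FEMALE has a negative variance coefficient and a positive marginal effect.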
The log-likelihood function also changes.
The LM test provides a convenient way to test for heteroscedasticity, because the model is easily estimated under the restriction θ = 0 (the ordinary homoscedastic model).
The LM statistic is built from the terms g_i and w_i, evaluated at the restricted estimates.
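As a minimal sketch of how the log-likelihood changes, the function below evaluates a heteroscedastic logit log-likelihood. It assumes the index is scaled by exp(θ'z_i); the function names and the toy data are hypothetical and are not taken from the healthcare data set.

```python
import math

def logistic_cdf(v):
    """Logistic CDF, written to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def hetero_logit_loglik(gamma, theta, y, X, Z):
    """Sum over i of ln F(q_i * gamma'x_i / exp(theta'z_i)), with q_i = 2*y_i - 1.

    The exponential variance function is an assumption of this sketch.
    """
    total = 0.0
    for yi, xi, zi in zip(y, X, Z):
        index = sum(g * x for g, x in zip(gamma, xi))
        scale = math.exp(sum(t * z for t, z in zip(theta, zi)))
        q = 2 * yi - 1
        total += math.log(logistic_cdf(q * index / scale))
    return total

# Hypothetical toy data: constant plus one regressor in x, two dummies in z.
y = [1, 0, 1, 1, 0]
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 0.3], [1.0, 2.0], [1.0, -0.4]]
Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
gamma = [0.2, 0.9]
theta = [-0.6, 0.2]

print("heteroscedastic logL:", hetero_logit_loglik(gamma, theta, y, X, Z))
print("logL with theta = 0 :", hetero_logit_loglik(gamma, [0.0, 0.0], y, X, Z))
```

Setting θ = 0 makes the scale equal to 1 for every observation, so the function collapses to the ordinary logit log-likelihood. That is the restricted model used by the LM test.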
Use the database healthcare_greene.lpj.

Estimate a binary logit:
  Y = doctor
  Betas (regressors) = age, educ, married, hhninc, hhkids, self, plus the constant term

Estimate a binary logit with heteroscedasticity:
  The variance is a function of the variables female and working

Compute the Wald and LM tests.
Binary logit

  NAMELIST ;x=one,age,educ,married,hhninc,hhkids,self$
  LOGIT    ;Lhs=doctor
           ;Rhs=x
           ;Marginal Effects$

Heteroscedastic binary logit

  NAMELIST ;x=one,age,educ,married,hhninc,hhkids,self
           ;w=female,working$
  LOGIT    ;Lhs=doctor
           ;Rhs=x
           ;Hetero
           ;Rh2=w
           ;Marginal Effects$
General parameters

Homoscedastic Model
+---------------------------------------------+
  Binary Logit Model for Binary Choice
  Maximum Likelihood Estimates
  Model estimated: Oct 28, 2010 at 10:12:54AM.
  Dependent variable               DOCTOR
  Weighting variable               None
  Number of observations           27326
  Iterations completed             4
  Log likelihood function          -17624.39
  Number of parameters             7
  Info. Criterion: AIC  =          1.29045
    Finite Sample: AIC  =          1.29045
  Info. Criterion: BIC  =          1.29255
  Info. Criterion: HQIC =          1.29113
  Restricted log likelihood        -18019.55
  McFadden Pseudo R-squared        .0219299
  Chi squared                      790.3324
  Degrees of freedom               6
  Prob[ChiSqd > value] =           .0000000
  Hosmer-Lemeshow chi-squared =    100.07793
  P-value = .00000 with deg.fr. =  8
+---------------------------------------------+

Heteroscedastic Model
+---------------------------------------------+
  Binary Logit Model for Binary Choice
  Maximum Likelihood Estimates
  Model estimated: Oct 28, 2010 at 10:20:16AM.
  Dependent variable               DOCTOR
  Weighting variable               None
  Number of observations           27326
  Iterations completed             20
  Log likelihood function          -17482.31
  Number of parameters             9
  Info. Criterion: AIC  =          1.28020
    Finite Sample: AIC  =          1.28020
  Info. Criterion: BIC  =          1.28290
  Info. Criterion: HQIC =          1.28107
  Restricted log likelihood        -18019.55
  McFadden Pseudo R-squared        .0298142
  Chi squared                      1074.478
  Degrees of freedom               6
  Prob[ChiSqd > value] =           .0000000
  Hosmer-Lemeshow chi-squared =    251.61966
  P-value = .00000 with deg.fr. =  8
  Heteroscedastic Logit Model for Binary Data
+---------------------------------------------+
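Several of the reported fit measures can be recomputed from the two log-likelihoods alone. The sketch below does this as a check; the per-observation normalization of AIC and BIC is inferred from the printed values and is an assumption about how the software reports them. The function name fit_stats is hypothetical.

```python
import math

N_OBS = 27326              # number of observations, from the output
LL_RESTRICTED = -18019.55  # restricted (constant-only) log-likelihood

def fit_stats(loglik, n_params, n_obs=N_OBS, ll0=LL_RESTRICTED):
    """Recompute pseudo R-squared, chi-squared, AIC and BIC from log-likelihoods."""
    return {
        "pseudo_r2": 1.0 - loglik / ll0,                # McFadden
        "chi_sq": 2.0 * (loglik - ll0),                 # LR test against constant only
        "aic": (-2.0 * loglik + 2.0 * n_params) / n_obs,
        "bic": (-2.0 * loglik + n_params * math.log(n_obs)) / n_obs,
    }

homo = fit_stats(-17624.39, 7)
hetero = fit_stats(-17482.31, 9)

for name, stats in (("homoscedastic", homo), ("heteroscedastic", hetero)):
    print(name, {k: round(v, 5) for k, v in stats.items()})
```

The recomputed values agree with the printed ones up to the rounding of the log-likelihoods. Both information criteria are lower for the heteroscedastic model.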
Coefficients

Homoscedastic Model
+---------+--------------+----------------+----------+----------+------------+
|Variable | Coefficient  | Standard Error | b/St.Er. | P[|Z|>z] | Mean of X  |
+---------+--------------+----------------+----------+----------+------------+
Characteristics in numerator of Prob[Y = 1]
 Constant    .19630207      .09148261        2.146     .0319
 AGE         .02154436      .00129096       16.689     .0000    43.5256898
 EDUC       -.04283708      .00566578       -7.561     .0000    11.3206310
 MARRIED     .07866593      .03334957        2.359     .0183      .75861817
 HHNINC     -.12838019      .07545127       -1.701     .0888      .35208362
 HHKIDS     -.21642397      .02961699       -7.307     .0000      .40273000
 SELF       -.50886016      .05126489       -9.926     .0000      .06217522

Heteroscedastic Model
+---------+--------------+----------------+----------+----------+------------+
|Variable | Coefficient  | Standard Error | b/St.Er. | P[|Z|>z] | Mean of X  |
+---------+--------------+----------------+----------+----------+------------+
Characteristics in numerator of Prob[Y = 1]
 Constant    .13038579      .07993987        1.631     .1029
 AGE         .01267775      .00117014       10.834     .0000    43.5256898
 EDUC       -.01474345      .00493778       -2.986     .0028    11.3206310
 MARRIED     .05367866      .02570463        2.088     .0368      .75861817
 HHNINC     -.04535008      .05945239        -.763     .4456      .35208362
 HHKIDS     -.16931423      .02524340       -6.707     .0000      .40273000
 SELF       -.34757934      .04988114       -6.968     .0000      .06217522
Disturbance Variance Terms
 FEMALE     -.64128849      .05321731      -12.050     .0000      .47877479
 WORKING     .23148882      .04524144        5.117     .0000      .67704750
Marginal effects

Homoscedastic Model
+---------+--------------+----------------+----------+----------+------------+
|Variable | Coefficient  | Standard Error | b/St.Er. | P[|Z|>z] | Elasticity |
+---------+--------------+----------------+----------+----------+------------+
Marginal effect for variable in probability
(for the dummy variables MARRIED, HHKIDS and SELF the marginal effect is P|1 - P|0)
 Constant    .04560723      .02123634        2.148     .0317
 AGE         .00500544      .00029937       16.720     .0000      .34422172
 EDUC       -.00995242      .00131612       -7.562     .0000     -.17801212
 MARRIED     .01837205      .00782737        2.347     .0189      .02202070
 HHNINC     -.02982681      .01752908       -1.702     .0888     -.01659216
 HHKIDS     -.05051956      .00693740       -7.282     .0000     -.03214577
 SELF       -.12335991      .01273732       -9.685     .0000     -.01211830

Heteroscedastic Model
+---------+--------------+----------------+----------+----------+------------+
|Variable | Coefficient  | Standard Error | b/St.Er. | P[|Z|>z] | Elasticity |
+---------+--------------+----------------+----------+----------+------------+
Characteristics in numerator of Prob[Y = 1]
 Constant    .03540389      .02141397        1.653     .0983
 AGE         .00344241      .00029542       11.653     .0000      .23862175
 EDUC       -.00400332      .00131402       -3.047     .0023     -.07217585
 MARRIED     .01457547      .00699227        2.085     .0371      .01760950
 HHNINC     -.01231399      .01622256        -.759     .4478     -.00690472
 HHKIDS     -.04597420      .00631202       -7.284     .0000     -.02948693
 SELF       -.09437885      .01284107       -7.350     .0000     -.00934530
Disturbance Variance Terms
 FEMALE      .07840115      .00417011       18.801     .0000      .05977989
 WORKING    -.02830082      .00782409       -3.617     .0003     -.03051544
Homoscedastic Model

Predictions for Binary Choice Model. Predicted value is 1 when probability
is greater than .500000, 0 otherwise.
Note, column or row total percentages may not sum to 100% because of rounding.
Percentages are of full sample.
+------+---------------------------------+----------------+
|Actual|         Predicted Value         |                |
|Value |       0        |       1        |  Total Actual  |
+------+----------------+----------------+----------------+
|  0   |   615 (  2.3%) |  9520 ( 34.8%) | 10135 ( 37.1%) |
|  1   |   583 (  2.1%) | 16608 ( 60.8%) | 17191 ( 62.9%) |
+------+----------------+----------------+----------------+
|Total |  1198 (  4.4%) | 26128 ( 95.6%) | 27326 (100.0%) |
+------+----------------+----------------+----------------+

Heteroscedastic Model

Predictions for Binary Choice Model. Predicted value is 1 when probability
is greater than .500000, 0 otherwise.
Note, column or row total percentages may not sum to 100% because of rounding.
Percentages are of full sample.
+------+---------------------------------+----------------+
|Actual|         Predicted Value         |                |
|Value |       0        |       1        |  Total Actual  |
+------+----------------+----------------+----------------+
|  0   |   304 (  1.1%) |  9831 ( 36.0%) | 10135 ( 37.1%) |
|  1   |   269 (  1.0%) | 16922 ( 61.9%) | 17191 ( 62.9%) |
+------+----------------+----------------+----------------+
|Total |   573 (  2.1%) | 26753 ( 97.9%) | 27326 (100.0%) |
+------+----------------+----------------+----------------+
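The prediction tables can be summarized by the share of correct predictions. The sketch below computes it for both models and compares it with a naive rule that always predicts 1; the cell counts are copied from the tables, while the function name hit_rate and the choice of benchmark are assumptions of this example.

```python
def hit_rate(table):
    """Share of correct predictions; rows are actual 0/1, columns predicted 0/1."""
    correct = table[0][0] + table[1][1]
    total = sum(sum(row) for row in table)
    return correct / total

# Cell counts copied from the prediction tables.
homoscedastic = [[615, 9520], [583, 16608]]
heteroscedastic = [[304, 9831], [269, 16922]]

# Naive benchmark: always predict y = 1, correct for every actual 1.
naive = 17191 / 27326

print(f"homoscedastic   hit rate: {hit_rate(homoscedastic):.4f}")
print(f"heteroscedastic hit rate: {hit_rate(heteroscedastic):.4f}")
print(f"always predict 1        : {naive:.4f}")
```

With the 0.5 threshold, both models classify almost every observation as 1, so their hit rates are only marginally above the naive rule. The gain in log-likelihood from the heteroscedastic model does not show up in this measure.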
Wald test

  NAMELIST ;x=one,age,educ,married,hhninc,hhkids,self
           ;w=female,working$
  LOGIT    ;Lhs=doctor
           ;Rhs=x
           ;Hetero
           ;Rh2=w
           ;Wald: b(8)=0,b(9)=0$
+---------------------------------------------+
  Binary Logit Model for Binary Choice
  Maximum Likelihood Estimates
  Model estimated: Oct 28, 2010 at 10:20:20AM.
  Dependent variable               DOCTOR
  Weighting variable               None
  Number of observations           27326
  Iterations completed             20
  Log likelihood function          -17482.31
  Number of parameters             9
  Info. Criterion: AIC  =          1.28020
    Finite Sample: AIC  =          1.28020
  Info. Criterion: BIC  =          1.28290
  Info. Criterion: HQIC =          1.28107
  Restricted log likelihood        -18019.55
  McFadden Pseudo R-squared        .0298142
  Chi squared                      1074.478
  Degrees of freedom               6
  Prob[ChiSqd > value] =           .0000000
  Hosmer-Lemeshow chi-squared =    251.61966
  P-value = .00000 with deg.fr. =  8
  Heteroscedastic Logit Model for Binary Data
  Wald test of 2 linear restrictions
  Chi-squared = 236.01, Sig. level = .00000
+---------------------------------------------+

The Wald test rejects the hypothesis that the variance coefficients on female and working are both equal to zero.
LM test

  NAMELIST ;x=one,age,educ,married,hhninc,hhkids,self$
  LOGIT    ;Lhs=doctor
           ;Rhs=x
           ;Marginal Effects$
  NAMELIST ;w=female,working$
  LOGIT    ;Lhs=doctor
           ;Rhs=x
           ;Hetero
           ;Rh2=w
           ;Start=b,0,0
           ;Maxit=0$
  LOGIT    ;Lhs=doctor
           ;Rhs=x
           ;Hetero
           ;Rh2=w
           ;Marginal Effects$
+---------------------------------------------+
  Binary Logit Model for Binary Choice
  Maximum Likelihood Estimates
  Model estimated: Oct 28, 2010 at 10:20:21AM.
  Dependent variable               DOCTOR
  Weighting variable               None
  Number of observations           27326
  Iterations completed             1
  LM Stat. at start values         273.9433
  LM statistic kept as scalar      LMSTAT
  Log likelihood function          -17624.39
  Number of parameters             9
  Info. Criterion: AIC  =          1.29059
    Finite Sample: AIC  =          1.29059
  Info. Criterion: BIC  =          1.29330
  Info. Criterion: HQIC =          1.29147
  Restricted log likelihood        -18019.55
  McFadden Pseudo R-squared        .0219299
  Chi squared                      790.3324
  Degrees of freedom               6
  Prob[ChiSqd > value] =           .0000000
  Hosmer-Lemeshow chi-squared =    100.07793
  P-value = .00000 with deg.fr. =  8
  Heteroscedastic Logit Model for Binary Data
+---------------------------------------------+

The LM statistic rejects the hypothesis that θ = 0 (chi-squared with 2 degrees of freedom).
The LR test also rejects the hypothesis of homoscedasticity:
  LL homoscedastic model:   -17624.39
  LL heteroscedastic model: -17482.31
  LR = 2 x [-17482.31 - (-17624.39)] = 284.16
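The three test statistics can be compared against the same chi-squared reference distribution. The sketch below recomputes LR from the two log-likelihoods and evaluates all three statistics with 2 degrees of freedom. It relies on the fact that the chi-squared survival function with 2 degrees of freedom is exp(-x/2), so no statistics library is needed; the variable names are hypothetical.

```python
import math

def chi2_sf_2df(x):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)

# 5% critical value for 2 degrees of freedom, from inverting exp(-x/2) = 0.05.
CRITICAL_5PCT = -2.0 * math.log(0.05)

ll_homo = -17624.39    # restricted model (theta = 0)
ll_hetero = -17482.31  # unrestricted heteroscedastic model

statistics = {
    "LR": 2.0 * (ll_hetero - ll_homo),
    "Wald": 236.01,     # reported by the ;Wald: option
    "LM": 273.9433,     # reported at the start values
}

print(f"5% critical value, 2 d.f.: {CRITICAL_5PCT:.3f}")
for name, value in statistics.items():
    reject = value > CRITICAL_5PCT
    print(f"{name:4s} = {value:9.4f}  p-value = {chi2_sf_2df(value):.3e}  reject: {reject}")
```

All three statistics are far above the critical value, so the Wald, LM and LR tests agree in rejecting homoscedasticity.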
Panel Data

Model for a possibly unbalanced panel.

Effects model: u_i is the unobserved, individual-specific heterogeneity.

Random effects model: u_i is unrelated to x_it, so that the conditional distribution f(u_i | x_it) does not depend on x_it.

Fixed effects model: f(u_i | x_it) is unrestricted; u_i and x_it may be correlated.
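A usual way to write the effects model for a binary outcome is the latent-variable form below. This is a sketch under assumptions: γ for the slopes and ε_it for the idiosyncratic disturbance are notation chosen here, and T_i varies across individuals because the panel may be unbalanced.

```latex
y_{it}^{*} = \gamma' x_{it} + u_i + \varepsilon_{it}, \qquad
y_{it} = \mathbf{1}\!\left[\, y_{it}^{*} > 0 \,\right], \qquad
i = 1,\dots,n, \quad t = 1,\dots,T_i .
```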
Panel Data

This modeling framework presents several difficulties and unconventional estimation problems:

Estimation of the random effects model requires very strong assumptions about the heterogeneity.

The fixed effects model encounters an incidental parameters problem that renders the maximum likelihood estimator inconsistent even when the model is properly specified.

As in the linear model, there cannot be any time-invariant variables in a fixed effects binary choice model.
Fixed effects

When the appropriate model is either a fixed or random effects specification (or any other specification that involves correlation across observations), the pooled estimator obtained by ignoring the panel nature of the data will usually be inconsistent.

Fixed effects model:
d_it is a dummy variable that is equal to 1 in every period for individual i and zero otherwise.
x_it are the nonconstant variables in the model.
Fixed effects

The K elements of γ and the n individual constant terms are the parameters to be estimated.

In the linear regression, estimation is made possible by transforming the data to deviations from group means, which eliminates the individual-specific constants from the estimator. This transformation is not usable here because the model is nonlinear.

To estimate the parameters of this model, it is necessary to compute the possibly huge number of constant terms at the same time as γ.

This is often seen as a practical obstacle to estimation, because of the need to invert a potentially large second-derivatives matrix. Greene (2010) argues that this is a misconception.
The problems with the fixed effects estimator are statistical, not practical.

The estimator relies on T_i increasing for the constant terms to be consistent: in essence, each α_i is estimated with T_i observations. But in this case T_i is fixed and generally quite small, so the estimators of the constant terms are inconsistent (they do not converge to the true values).

Because the estimator of γ is a function of the estimators of α_i, the MLE of γ is not consistent either. This is the incidental parameters problem.

In a small sample (small T_i), the upward bias in the estimators is on the order of T/(T - 1).

The seriousness of this bias is contentious.
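To get a feel for the magnitude, the sketch below tabulates the factor T/(T - 1) for a few panel lengths. It takes the order-of-magnitude expression at face value, which is an assumption: the actual bias depends on the model and the data. The function name bias_factor is hypothetical.

```python
def bias_factor(T):
    """Approximate inflation factor T/(T - 1) for the fixed effects MLE of the slopes."""
    if T < 2:
        raise ValueError("at least two periods are needed")
    return T / (T - 1)

for T in (2, 3, 5, 8, 10, 20):
    factor = bias_factor(T)
    print(f"T = {T:2d}  factor = {factor:.3f}  approximate bias = {100 * (factor - 1):.1f}%")
```

For T = 2 the factor is 2, that is, the coefficients are roughly doubled. The factor falls toward 1 as T grows, which is why the problem is most acute in short panels.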
Since the fixed effects approach does not require an assumption of orthogonality between the independent variables and the heterogeneity, it is an appealing approach.

There is a tradeoff between this advantage and the incidental parameters problem.
Random effects model

v_it and u_i are independent random variables (X are the independent variables).
Random effects model

The probit model is obtained by assuming that both v_it and u_i are normally distributed.
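A common way to write the random effects probit is sketched below. It assumes the usual normalization Var[v_it] = 1, with σ_u² denoting the variance of u_i; these symbols and the composite disturbance ε_it = v_it + u_i are notation assumed here.

```latex
\begin{aligned}
\varepsilon_{it} &= v_{it} + u_i, \qquad
\operatorname{Var}[\varepsilon_{it}] = 1 + \sigma_u^2, \qquad
\rho = \operatorname{Corr}[\varepsilon_{it}, \varepsilon_{is}]
     = \frac{\sigma_u^2}{1 + \sigma_u^2} \quad (t \neq s), \\[4pt]
\Pr(y_{it} = 1 \mid x_{it}) &= \Phi\!\left( \frac{\gamma' x_{it}}{\sqrt{1 + \sigma_u^2}} \right).
\end{aligned}
```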
Random effects model

When the data are pooled and the within-group correlation is ignored, the maximum likelihood estimator provides a consistent estimator of γ/(1 + σ_u²)^(1/2), where σ_u² is the variance of u_i, but not of γ itself.

As an estimator of γ, it is inconsistent (biased toward zero).

The estimated asymptotic covariance matrix is also inappropriate because the observations within groups are correlated.

Partial effects for the random effects probit model
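The size of the attenuation in the pooled estimator can be illustrated numerically. The sketch below assumes the probit normalization Var[v_it] = 1, so that ρ = σ_u²/(1 + σ_u²) and the pooled estimator converges to γ multiplied by 1/sqrt(1 + σ_u²) = sqrt(1 - ρ). The function names and the values of ρ are hypothetical.

```python
import math

def sigma_u_squared(rho):
    """Variance of u_i implied by the within-group correlation rho (Var[v_it] = 1)."""
    return rho / (1.0 - rho)

def pooled_scale(rho):
    """Factor by which the pooled probit coefficients are shrunk relative to gamma."""
    return 1.0 / math.sqrt(1.0 + sigma_u_squared(rho))

for rho in (0.0, 0.1, 0.3, 0.5, 0.7):
    print(f"rho = {rho:.1f}  pooled estimate converges to {pooled_scale(rho):.3f} x gamma")
```

With ρ = 0 there is no heterogeneity and no attenuation. With ρ = 0.5 the pooled coefficients converge to about 71% of γ.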
Random effects model

Assuming the data x_it are well behaved, the pooled model does produce the appropriate estimator of the partial effects in the random effects probit model.

This establishes a case for estimating the pooled model, with an appropriate correction to the estimator of the asymptotic covariance matrix.

Maximum likelihood estimator for the random effects model:
It requires the computation of integrals with no closed form.
The parameters may also be estimated by maximum simulated likelihood.
The model is typically specified using the normal distribution (probit model) for both v_it and u_i.
Using the simulation-based estimator, the logit model could be used for either or both terms.
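The simulation-based approach can be sketched for a single individual: the integral over u_i is replaced by an average over random draws. The code below is a minimal illustration under assumptions, not the estimator used by any particular package. It assumes a probit for v_it, normal u_i with standard deviation sigma_u, and hypothetical data and parameter values; a full estimator would sum the logs of these contributions over individuals and maximize over γ and sigma_u.

```python
import math
import random

def norm_cdf(v):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def simulated_contribution(gamma, sigma_u, y_i, X_i, draws):
    """Average over draws of prod_t Phi(q_it * (gamma'x_it + sigma_u * u_r))."""
    total = 0.0
    for u in draws:
        prod = 1.0
        for y_it, x_it in zip(y_i, X_i):
            index = sum(g * x for g, x in zip(gamma, x_it)) + sigma_u * u
            q = 2 * y_it - 1  # +1 for y = 1, -1 for y = 0
            prod *= norm_cdf(q * index)
        total += prod
    return total / len(draws)

# Hypothetical individual observed for T_i = 3 periods (constant plus one regressor).
y_i = [1, 0, 1]
X_i = [[1.0, 0.2], [1.0, -0.5], [1.0, 1.1]]
gamma = [0.3, 0.8]

rng = random.Random(0)  # fixed seed so the draws are reproducible
draws = [rng.gauss(0.0, 1.0) for _ in range(500)]

contribution = simulated_contribution(gamma, 0.7, y_i, X_i, draws)
print("simulated log-likelihood contribution:", math.log(contribution))
```

Reusing the same draws at every evaluation keeps the simulated likelihood a smooth function of the parameters. Replacing norm_cdf with a logistic CDF would give the logit variant for v_it.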
Testing for heterogeneity

Test whether there is heterogeneity. With homogeneity (α_i = α), the model can be estimated as a pooled probit or logit model.

Random effects model:
A Wald (t) test of the statistical significance of the estimate of ρ is appropriate.
A likelihood ratio test can be carried out by comparing the log-likelihoods of the random effects and pooled models.
Testing for heterogeneity

Testing for heterogeneity in the fixed effects case is more difficult.

It is not possible to test the hypothesis using the likelihood ratio test because the two likelihoods are not comparable. The conditional likelihood is based on a restricted data set that excludes individuals for which y_it is the same in every period.

Since the individual effects are not estimated, none of the usual tests of restrictions can be used.

Hausman's test is a natural one to use here. A large value casts doubt on the hypothesis of homogeneity, but it is not definitive.

It seems that there is no conclusive test for fixed effects versus no effects.
Testing for fixed or random effects

Variable addition test:
The effects in the fixed effects model are projected on the means of the (time-varying) regressors.
If the random effects model is appropriate, then the coefficients on the group means should be zero.
If θ ≠ 0, this casts doubt on the random effects model, which suggests the fixed effects model as a preferable alternative.
The Wald and likelihood ratio tests should be usable.

This is a proposed middle ground between fixed and random effects.
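The projection behind the variable addition test can be sketched as follows. This form is often called the Mundlak approach; the symbols γ for the slopes, η_i for the remaining random effect and ε_it for the idiosyncratic disturbance are notation assumed here.

```latex
\begin{aligned}
u_i &= \theta' \bar{x}_i + \eta_i, \qquad
\bar{x}_i = \frac{1}{T_i} \sum_{t=1}^{T_i} x_{it}, \\[4pt]
y_{it}^{*} &= \gamma' x_{it} + \theta' \bar{x}_i + \eta_i + \varepsilon_{it}, \qquad
H_0 : \theta = 0 .
\end{aligned}
```

Under this sketch, the augmented model is estimated as a random effects model with the group means added as regressors, and H_0 is tested with a Wald or likelihood ratio statistic on θ.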