Phd Program in Transportation. Transport Demand Modeling. Session 11


Phd Program in Transportation Transport Demand Modeling João de Abreu e Silva Session 11 Binary and Ordered Choice Models Phd in Transportation / Transport Demand Modelling 1/26

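Heteroscedasticity

Homoscedasticity of $\varepsilon_i$ in the binary choice model is likely to be violated in micro-level data, and there are no robust parametric approaches to model fitting and analysis when it fails. A common specification lets the scale of the disturbance depend on a set of variables $z_i$:

$$\Pr(y_i = 1 \mid x_i, z_i) = F\left(\frac{x_i'\beta}{\exp(z_i'\theta)}\right)$$

Heteroscedasticity necessitates some care in interpreting the coefficients for a variable $w_{ik}$ that could be in $x$ or $z$ or both.

Partial effects:

$$\frac{\partial \Pr(y_i = 1 \mid x_i, z_i)}{\partial w_{ik}} = f\left(\frac{x_i'\beta}{\exp(z_i'\theta)}\right)\frac{\beta_k - (x_i'\beta)\,\theta_k}{\exp(z_i'\theta)}$$

where $\beta_k = 0$ if $w_{ik}$ is not in $x_i$ and $\theta_k = 0$ if it is not in $z_i$.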
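A minimal sketch of the partial-effect formula, evaluated at the sample means with the heteroscedastic logit estimates reported in this session's output. It assumes that $F$ is the logistic cdf and that NLOGIT evaluates the marginal effects at the means of the data; under those assumptions it reproduces, up to rounding, the reported effects for AGE (0.00344241) and FEMALE (0.07840115). FEMALE enters only the variance, so its whole effect comes from the $-(x_i'\beta)\theta_k$ term.

```python
import math

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

# Heteroscedastic logit estimates and sample means, copied from this session's output
beta = {"ONE": 0.13038579, "AGE": 0.01267775, "EDUC": -0.01474345,
        "MARRIED": 0.05367866, "HHNINC": -0.04535008, "HHKIDS": -0.16931423,
        "SELF": -0.34757934}
theta = {"FEMALE": -0.64128849, "WORKING": 0.23148882}
xbar = {"ONE": 1.0, "AGE": 43.5256898, "EDUC": 11.3206310, "MARRIED": 0.75861817,
        "HHNINC": 0.35208362, "HHKIDS": 0.40273000, "SELF": 0.06217522}
zbar = {"FEMALE": 0.47877479, "WORKING": 0.67704750}

def partial_effect(name):
    """dProb/dw_k at the means for a variable w_k that may be in x, z or both."""
    xb = sum(beta[k] * xbar[k] for k in beta)                 # x'beta
    scale = math.exp(sum(theta[k] * zbar[k] for k in theta))  # exp(z'theta)
    p = logistic(xb / scale)
    density = p * (1.0 - p)                                   # logistic density f(.)
    beta_k = beta.get(name, 0.0)                              # 0 if w_k is not in x
    theta_k = theta.get(name, 0.0)                            # 0 if w_k is not in z
    return density * (beta_k - xb * theta_k) / scale

for name in ("AGE", "EDUC", "FEMALE", "WORKING"):
    print(name, round(partial_effect(name), 8))
```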

The log-likelihood function also changes, since the index is now $x_i'\beta / \exp(z_i'\theta)$. The LM test provides a convenient way to test for heteroscedasticity: it only requires estimating the restricted model, i.e. assuming that $\theta = 0$, which is the ordinary homoscedastic model.
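A minimal sketch of why the LM test is convenient: only the restricted (homoscedastic) logit is estimated, and the score of the heteroscedastic log-likelihood is evaluated there. At $\theta = 0$ the derivative of the index $x_i'\beta/\exp(z_i'\theta)$ with respect to $\theta$ is $-(x_i'\beta)z_i$. The sketch uses the outer-product-of-gradients form $LM = g'(G'G)^{-1}g$ on simulated data with a hypothetical single variance dummy; NLOGIT may use a different estimator of the information matrix, so its value need not coincide with this form.

```python
import numpy as np

def fit_logit(X, y, iters=50):
    """Homoscedastic logit MLE by Newton-Raphson (the restricted model, theta = 0)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = -(X * (p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta -= step                      # beta_new = beta - H^{-1} g
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def lm_hetero_logit(X, Z, y):
    """OPG LM statistic for H0: theta = 0 in Prob = Lambda(x'b / exp(z'theta))."""
    beta = fit_logit(X, y)
    xb = X @ beta
    resid = y - 1.0 / (1.0 + np.exp(-xb))          # d lnL_i / d index
    # Per-observation scores at (beta_hat, theta = 0)
    G = np.column_stack([resid[:, None] * X,                  # w.r.t. beta
                         resid[:, None] * (-xb[:, None] * Z)])  # w.r.t. theta
    g = G.sum(axis=0)
    lm = float(g @ np.linalg.solve(G.T @ G, g))
    return lm, G, beta

# Hypothetical simulated data: one regressor plus constant, one variance dummy
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.integers(0, 2, size=(n, 1)).astype(float)
index = (X @ np.array([0.5, 1.0])) / np.exp(Z @ np.array([0.7]))
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-index))).astype(float)

lm, G, beta_hat = lm_hetero_logit(X, Z, y)
print(f"LM = {lm:.2f} (chi-squared, {Z.shape[1]} d.f.; 5% critical value 3.84)")
```

The score for $\beta$ sums to zero at the restricted estimates, so the statistic is driven entirely by the $\theta$ part of the score.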

Exercise
Use the database healthcare_greene.lpj.
Estimate a binary logit with Y = doctor and, as regressors (betas), age, educ, married, hhninc, hhkids, self, plus the constant term.
Estimate a binary logit with heteroscedasticity, in which the variance is a function of the variables female and working.
Compute the Wald and LM tests.

Binary logit
NAMELIST ;x=one,age,educ,married,hhninc,hhkids,self$
LOGIT ;Lhs=doctor ;Rhs=x ;Marginal Effects$

Heteroscedastic binary logit
NAMELIST ;x=one,age,educ,married,hhninc,hhkids,self ;w=female,working$
LOGIT ;Lhs=doctor ;Rhs=x ;Hetero ;Rh2=w ;Marginal Effects$
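For readers without NLOGIT, a rough Python analogue of the two LOGIT commands is sketched below. It is not NLOGIT's algorithm, and because healthcare_greene.lpj is an NLOGIT project file the sketch uses simulated data with hypothetical variables (one regressor plus a constant, two variance dummies playing the role of female and working). As in the output above, the variance function has no constant term: a constant in $z$ could not be identified separately from the scale of $\beta$.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, X, Z, y):
    """Negative log-likelihood of the heteroscedastic logit."""
    k = X.shape[1]
    beta, theta = params[:k], params[k:]
    a = (X @ beta) / np.exp(Z @ theta)
    # -log Lambda(a) = log(1 + e^{-a}); -log(1 - Lambda(a)) = log(1 + e^{a})
    return np.sum(y * np.logaddexp(0.0, -a) + (1.0 - y) * np.logaddexp(0.0, a))

# Hypothetical simulated data
rng = np.random.default_rng(1)
n = 20000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.integers(0, 2, size=(n, 2)).astype(float)
true = np.array([0.3, 0.8, -0.6, 0.25])
a = (X @ true[:2]) / np.exp(Z @ true[2:])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-a))).astype(float)

# Step 1: homoscedastic logit (theta held at 0)
homo = minimize(lambda b: neg_loglik(np.r_[b, 0.0, 0.0], X, Z, y),
                np.zeros(2), method="BFGS")
# Step 2: heteroscedastic logit, started at the homoscedastic estimates with theta = 0
het = minimize(neg_loglik, np.r_[homo.x, 0.0, 0.0], args=(X, Z, y), method="BFGS")

print("LL homoscedastic:", -homo.fun, " LL heteroscedastic:", -het.fun)
print("beta:", het.x[:2], " theta:", het.x[2:])
```

Starting the second fit at the first one mirrors the nesting of the two models: the heteroscedastic log-likelihood can only be equal to or higher than the homoscedastic one.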

General parameters

Statistic                      Homoscedastic model         Heteroscedastic model
Model                          Binary logit                Heteroscedastic logit for binary data
Estimation                     Maximum likelihood          Maximum likelihood
Model estimated                Oct 28, 2010, 10:12:54AM    Oct 28, 2010, 10:20:16AM
Dependent variable             DOCTOR                      DOCTOR
Weighting variable             None                        None
Number of observations         27326                       27326
Iterations completed           4                           20
Log likelihood function        -17624.39                   -17482.31
Number of parameters           7                           9
Info. criterion: AIC           1.29045                     1.28020
Finite sample: AIC             1.29045                     1.28020
Info. criterion: BIC           1.29255                     1.28290
Info. criterion: HQIC          1.29113                     1.28107
Restricted log likelihood      -18019.55                   -18019.55
McFadden pseudo R-squared      0.0219299                   0.0298142
Chi squared                    790.3324                    1074.478
Degrees of freedom             6                           6
Prob[ChiSqd > value]           0.0000000                   0.0000000
Hosmer-Lemeshow chi-squared    100.07793                   251.61966
H-L P-value (8 deg.fr.)        0.00000                     0.00000

Coefficients

Homoscedastic model: characteristics in numerator of Prob[Y = 1]
Variable    Coefficient    Std. Error    b/St.Er.   P[|Z|>z]   Mean of X
Constant     0.19630207    0.09148261      2.146     0.0319
AGE          0.02154436    0.00129096     16.689     0.0000    43.5256898
EDUC        -0.04283708    0.00566578     -7.561     0.0000    11.3206310
MARRIED      0.07866593    0.03334957      2.359     0.0183     0.75861817
HHNINC      -0.12838019    0.07545127     -1.701     0.0888     0.35208362
HHKIDS      -0.21642397    0.02961699     -7.307     0.0000     0.40273000
SELF        -0.50886016    0.05126489     -9.926     0.0000     0.06217522

Heteroscedastic model: characteristics in numerator of Prob[Y = 1]
Variable    Coefficient    Std. Error    b/St.Er.   P[|Z|>z]   Mean of X
Constant     0.13038579    0.07993987      1.631     0.1029
AGE          0.01267775    0.00117014     10.834     0.0000    43.5256898
EDUC        -0.01474345    0.00493778     -2.986     0.0028    11.3206310
MARRIED      0.05367866    0.02570463      2.088     0.0368     0.75861817
HHNINC      -0.04535008    0.05945239     -0.763     0.4456     0.35208362
HHKIDS      -0.16931423    0.02524340     -6.707     0.0000     0.40273000
SELF        -0.34757934    0.04988114     -6.968     0.0000     0.06217522
Heteroscedastic model: disturbance variance terms
FEMALE      -0.64128849    0.05321731    -12.050     0.0000     0.47877479
WORKING      0.23148882    0.04524144      5.117     0.0000     0.67704750

Marginal effects

Homoscedastic model: marginal effect for variable in probability
Variable    Coefficient    Std. Error    b/St.Er.   P[|Z|>z]   Elasticity
Constant     0.04560723    0.02123634      2.148     0.0317
AGE          0.00500544    0.00029937     16.720     0.0000     0.34422172
EDUC        -0.00995242    0.00131612     -7.562     0.0000    -0.17801212
MARRIED*     0.01837205    0.00782737      2.347     0.0189     0.02202070
HHNINC      -0.02982681    0.01752908     -1.702     0.0888    -0.01659216
HHKIDS*     -0.05051956    0.00693740     -7.282     0.0000    -0.03214577
SELF*       -0.12335991    0.01273732     -9.685     0.0000    -0.01211830
* Marginal effect for dummy variable is P1 - P0.

Heteroscedastic model: characteristics in numerator of Prob[Y = 1]
Variable    Coefficient    Std. Error    b/St.Er.   P[|Z|>z]   Elasticity
Constant     0.03540389    0.02141397      1.653     0.0983
AGE          0.00344241    0.00029542     11.653     0.0000     0.23862175
EDUC        -0.00400332    0.00131402     -3.047     0.0023    -0.07217585
MARRIED      0.01457547    0.00699227      2.085     0.0371     0.01760950
HHNINC      -0.01231399    0.01622256     -0.759     0.4478    -0.00690472
HHKIDS      -0.04597420    0.00631202     -7.284     0.0000    -0.02948693
SELF        -0.09437885    0.01284107     -7.350     0.0000    -0.00934530
Heteroscedastic model: disturbance variance terms
FEMALE       0.07840115    0.00417011     18.801     0.0000     0.05977989
WORKING     -0.02830082    0.00782409     -3.617     0.0003    -0.03051544
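The marginal effects and elasticities of the homoscedastic model can be checked by hand. A minimal sketch, assuming (as the output suggests) that NLOGIT evaluates them at the sample means: a continuous variable gets $\Lambda(\bar{x}'b)[1-\Lambda(\bar{x}'b)]\,b_k$, a dummy gets $P_1 - P_0$ with the other variables at their means, and the elasticity is the effect times $\bar{x}_k / \Lambda(\bar{x}'b)$. With the reported coefficients and means this reproduces, up to rounding, the values for AGE, EDUC and MARRIED.

```python
import math

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

# Homoscedastic logit coefficients and sample means, as reported in this session
coef = {"ONE": 0.19630207, "AGE": 0.02154436, "EDUC": -0.04283708,
        "MARRIED": 0.07866593, "HHNINC": -0.12838019, "HHKIDS": -0.21642397,
        "SELF": -0.50886016}
mean = {"ONE": 1.0, "AGE": 43.5256898, "EDUC": 11.3206310, "MARRIED": 0.75861817,
        "HHNINC": 0.35208362, "HHKIDS": 0.40273000, "SELF": 0.06217522}

xb = sum(coef[k] * mean[k] for k in coef)   # index at the means
p_bar = logistic(xb)                        # probability at the means
scale = p_bar * (1.0 - p_bar)               # logistic density at the means

def marginal_effect(name):
    """Derivative-based effect for a continuous variable."""
    return scale * coef[name]

def dummy_effect(name):
    """P(1) - P(0) for a dummy, with the other variables at their means."""
    base = xb - coef[name] * mean[name]
    return logistic(base + coef[name]) - logistic(base)

def elasticity(name, effect):
    return effect * mean[name] / p_bar

me_age = marginal_effect("AGE")
me_married = dummy_effect("MARRIED")
print(me_age, elasticity("AGE", me_age))           # about 0.00500544, 0.34422
print(me_married, elasticity("MARRIED", me_married))  # about 0.01837205, 0.02202
```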

Predictions for binary choice model. Predicted value is 1 when probability is greater than 0.500000, 0 otherwise. Percentages are of the full sample; column or row total percentages may not sum to 100% because of rounding.

Homoscedastic model
Actual     Predicted 0     Predicted 1      Total actual
0          615 (2.3%)      9520 (34.8%)     10135 (37.1%)
1          583 (2.1%)      16608 (60.8%)    17191 (62.9%)
Total      1198 (4.4%)     26128 (95.6%)    27326 (100.0%)

Heteroscedastic model
Actual     Predicted 0     Predicted 1      Total actual
0          304 (1.1%)      9831 (36.0%)     10135 (37.1%)
1          269 (1.0%)      16922 (61.9%)    17191 (62.9%)
Total      573 (2.1%)      26753 (97.9%)    27326 (100.0%)
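Both models predict y = 1 for almost the whole sample. A short sketch computing the share of correct predictions from the counts above shows that both hit rates are about 63%, barely above the 62.9% obtained by always predicting the more frequent outcome (y = 1).

```python
def hit_rate(table):
    """Share of correct predictions in a 2x2 table {(actual, predicted): count}."""
    return (table[(0, 0)] + table[(1, 1)]) / sum(table.values())

homo = {(0, 0): 615, (0, 1): 9520, (1, 0): 583, (1, 1): 16608}
het = {(0, 0): 304, (0, 1): 9831, (1, 0): 269, (1, 1): 16922}
naive = 17191 / 27326      # always predict y = 1

for label, rate in (("homoscedastic", hit_rate(homo)),
                    ("heteroscedastic", hit_rate(het)),
                    ("always predict 1", naive)):
    print(f"{label:17s} {rate:.4f}")
```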

Wald test
NAMELIST ;x=one,age,educ,married,hhninc,hhkids,self ;w=female,working$
LOGIT ;Lhs=doctor ;Rhs=x ;Hetero ;Rh2=w ;Wald: b(8)=0,b(9)=0$

Heteroscedastic logit model for binary data, re-estimated with the Wald option (model estimated Oct 28, 2010 at 10:20:20AM). All general parameters are identical to those of the heteroscedastic model above (27326 observations, 20 iterations, log likelihood function -17482.31, 9 parameters, restricted log likelihood -18019.55, McFadden pseudo R-squared 0.0298142, chi squared 1074.478, Hosmer-Lemeshow chi-squared 251.61966).
Wald test of 2 linear restrictions: Chi-squared = 236.01, Sig. level = 0.00000.
The Wald test rejects the hypothesis that the coefficients on female and working in the variance function are both equal to zero.
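The Wald statistic for $H_0: Rb = q$ is $(Rb-q)'[RVR']^{-1}(Rb-q)$, where ;Wald: b(8)=0,b(9)=0 selects the 8th and 9th parameters (FEMALE and WORKING). A minimal sketch of the computation follows; the check uses made-up numbers, because the covariance between the FEMALE and WORKING estimates is not printed and 236.01 cannot be reproduced from the table. Simply adding the squared t-ratios (12.050² + 5.117² ≈ 171.4) would ignore that covariance.

```python
import numpy as np

def wald_test(b, V, R, q):
    """Wald statistic W = (Rb - q)'[R V R']^{-1}(Rb - q) for H0: R b = q."""
    d = R @ b - q
    return float(d @ np.linalg.solve(R @ V @ R.T, d))

# Selecting the 8th and 9th of 9 parameters, as in ;Wald: b(8)=0,b(9)=0
R = np.zeros((2, 9))
R[0, 7] = 1.0   # NLOGIT counts parameters from 1, Python indexes from 0
R[1, 8] = 1.0
q = np.zeros(2)

# Toy check with made-up numbers (not the session's estimates)
b_toy = np.array([2.0, 1.0])
V_toy = np.diag([1.0, 0.25])
w_toy = wald_test(b_toy, V_toy, np.eye(2), np.zeros(2))   # 2^2/1 + 1^2/0.25 = 8
print(w_toy)
```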

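LM test
First estimate the homoscedastic logit; then start the heteroscedastic model at those estimates with the two variance coefficients set to zero (;Start=b,0,0) and no iterations (;Maxit=0), so that the LM statistic is computed at the restricted estimates. The last command estimates the unrestricted model.
NAMELIST ;x=one,age,educ,married,hhninc,hhkids,self$
LOGIT ;Lhs=doctor ;Rhs=x ;Marginal Effects$
NAMELIST ;w=female,working$
LOGIT ;Lhs=doctor ;Rhs=x ;Hetero ;Rh2=w ;Start=b,0,0 ;Maxit=0$
LOGIT ;Lhs=doctor ;Rhs=x ;Hetero ;Rh2=w ;Marginal Effects$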

Heteroscedastic logit model for binary data evaluated at the start values (model estimated Oct 28, 2010 at 10:20:21AM):
Number of observations 27326; iterations completed 1.
LM statistic at start values 273.9433 (kept as scalar LMSTAT).
Log likelihood function -17624.39; number of parameters 9.
AIC 1.29059; finite sample AIC 1.29059; BIC 1.29330; HQIC 1.29147.
Restricted log likelihood -18019.55; McFadden pseudo R-squared 0.0219299.
Chi squared 790.3324 with 6 degrees of freedom, Prob[ChiSqd > value] = 0.0000000.
Hosmer-Lemeshow chi-squared 100.07793, P-value 0.00000 with 8 deg.fr.

The LM statistic (chi-squared with 2 degrees of freedom) rejects the hypothesis that $\theta = 0$. The LR test also rejects homoscedasticity: LL homoscedastic model = -17624.39, LL heteroscedastic model = -17482.31, so LR = 2 × (-17482.31 - (-17624.39)) = 284.16.
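All three statistics (LR, LM and Wald) are chi-squared with 2 degrees of freedom here, and for 2 degrees of freedom the survival function has the closed form $\exp(-x/2)$. A minimal sketch recomputing the LR statistic from the two reported log-likelihoods and the p-values of all three:

```python
import math

ll_homo = -17624.39      # homoscedastic logit (restricted model)
ll_het = -17482.31       # heteroscedastic logit (unrestricted model)
lr = 2.0 * (ll_het - ll_homo)

def chi2_sf_2df(x):
    """P[chi-squared(2) > x], which equals exp(-x / 2) exactly."""
    return math.exp(-x / 2.0)

crit_5pct = -2.0 * math.log(0.05)   # 5% critical value with 2 d.f. (about 5.99)
print(f"LR   = {lr:.2f}, p = {chi2_sf_2df(lr):.3g}")
print(f"LM   = 273.94, p = {chi2_sf_2df(273.9433):.3g}")
print(f"Wald = 236.01, p = {chi2_sf_2df(236.01):.3g}")
```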

Panel data

Model for a possibly unbalanced panel:

$$y_{it}^* = x_{it}'\beta + \varepsilon_{it} + u_i, \qquad y_{it} = 1(y_{it}^* > 0), \qquad i = 1,\dots,n, \quad t = 1,\dots,T_i$$

Effects model: $u_i$ is the unobserved, individual-specific heterogeneity.
Random effects model: $u_i$ is unrelated to $x_{it}$, so that the conditional distribution $f(u_i \mid x_{it})$ does not depend on $x_{it}$.
Fixed effects model: $f(u_i \mid x_{it})$ is unrestricted; $u_i$ and $x_{it}$ may be correlated.

Panel data

This modeling framework presents several difficulties and unconventional estimation problems:
Estimation of the random effects model requires very strong assumptions about the heterogeneity.
The fixed effects model encounters an incidental parameters problem that renders the maximum likelihood estimator inconsistent even when the model is properly specified.
As in the linear model, a fixed effects binary choice model cannot include time-invariant variables.

Fixed effects

When the appropriate model is either a fixed or a random effects specification (or any other specification that involves correlation across observations), the pooled estimator obtained by ignoring the panel nature of the data will usually be inconsistent.

Fixed effects model:

$$y_{it}^* = \alpha_i d_{it} + x_{it}'\beta + \varepsilon_{it}$$

where $d_{it}$ is a dummy variable equal to 1 in every period for individual $i$ and zero otherwise, and $x_{it}$ contains the nonconstant variables in the model.

Fixed effects

The $K$ elements of $\beta$ and the $n$ individual constant terms are the parameters to be estimated. In the linear regression model, estimation is made possible by transforming the data to deviations from group means, which eliminates the individual-specific constants from the estimator. This transformation is not available in this nonlinear model. To estimate the parameters, it is necessary to compute the possibly huge number of constant terms at the same time as $\beta$. This is usually seen as a practical obstacle to estimating the model, because a potentially large matrix of second derivatives has to be inverted; Greene (2010) argues that this is a misconception.

The problems with the fixed effects estimator are statistical, not practical. The consistency of the estimated constant terms relies on $T_i$ increasing: in essence, each $\alpha_i$ is estimated with $T_i$ observations. Here, however, $T_i$ is fixed and generally quite small, so the estimators of the constant terms are inconsistent (they do not converge). Because the estimator of $\beta$ is a function of the estimators of the $\alpha_i$, the MLE of $\beta$ is not consistent either: this is the incidental parameters problem. In small samples (small $T_i$), the upward bias in the estimators is of the order of $T/(T-1)$. How serious this bias is remains contentious.
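For the special case $T_i = 2$ with a single regressor, the incidental parameters bias of the fixed effects logit can be derived exactly (a known textbook result, sketched here rather than taken from these slides). Only individuals who switch ($y_{i1} + y_{i2} = 1$) are informative; write $d_i = x_{i2} - x_{i1}$, $s_i = y_{i2} - y_{i1} = \pm 1$, and $\Lambda$ for the logistic cdf. The unconditional MLE, which estimates every $\alpha_i$, turns out to be exactly twice the conditional MLE, which is consistent:

```latex
\begin{aligned}
&\text{Unconditional, } y_i = (0,1):\quad
  \ln L_i(\alpha_i,\beta) = \ln\bigl[1-\Lambda(\alpha_i + x_{i1}\beta)\bigr] + \ln\Lambda(\alpha_i + x_{i2}\beta) \\
&\frac{\partial \ln L_i}{\partial \alpha_i} = 0
  \;\Rightarrow\; \Lambda(\alpha_i + x_{i1}\beta) + \Lambda(\alpha_i + x_{i2}\beta) = 1
  \;\Rightarrow\; \hat\alpha_i(\beta) = -\tfrac{1}{2}(x_{i1} + x_{i2})\beta \\
&\text{Concentrated (either switching pattern):}\quad
  \ln L_i\bigl(\hat\alpha_i(\beta),\beta\bigr) = 2\ln\Lambda\!\left(\tfrac{1}{2}\,s_i d_i \beta\right) \\
&\text{Conditional:}\quad
  \ln L_i^{c}(\beta) = \ln\Pr\bigl(y_i \mid y_{i1} + y_{i2} = 1\bigr) = \ln\Lambda(s_i d_i \beta) \\
&\Rightarrow\; \hat\beta_{FE} = 2\,\hat\beta_{C}, \qquad
  \operatorname{plim}\hat\beta_{FE} = 2\beta = \tfrac{T}{T-1}\,\beta \quad (T = 2)
\end{aligned}
```

The step from the concentrated to the conditional log-likelihood uses $\Lambda(-a) = 1 - \Lambda(a)$: both are the same function of $\beta/2$ and $\beta$ respectively, so their maximizers differ by a factor of 2.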

Since the fixed effects approach does not require the assumption that the independent variables are orthogonal to the heterogeneity, it is an appealing approach. The tradeoff is between this advantage and the incidental parameters problem.

Random effects model

$$y_{it}^* = x_{it}'\beta + v_{it} + u_i, \qquad y_{it} = 1(y_{it}^* > 0)$$

where $v_{it}$ and $u_i$ are independent random variables, both unrelated to $X$ (the independent variables).

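Random effects model

The probit model is obtained by assuming $v_{it} \sim N(0, 1)$ and $u_i \sim N(0, \sigma_u^2)$, so that

$$\Pr(y_{it} = 1 \mid x_{it}, u_i) = \Phi(x_{it}'\beta + u_i), \qquad \rho = \operatorname{Corr}(v_{it} + u_i,\; v_{is} + u_i) = \frac{\sigma_u^2}{1 + \sigma_u^2}, \quad t \neq s.$$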

Random effects model

When the data are pooled and the within-group correlation is ignored, the maximum likelihood estimator is a consistent estimator of $\beta/\sqrt{1 + \sigma_u^2}$, but not of $\beta$. As an estimator of $\beta$ it is inconsistent (biased toward zero). The estimated asymptotic covariance matrix is also inappropriate, because the observations within groups are correlated.

Partial effects for the random effects probit model (averaging over $u_i$):

$$\frac{\partial \Pr(y_{it} = 1 \mid x_{it})}{\partial x_{it}} = \frac{\beta}{\sqrt{1 + \sigma_u^2}}\;\phi\!\left(\frac{x_{it}'\beta}{\sqrt{1 + \sigma_u^2}}\right)$$
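A small simulation illustrates the attenuation. It generates a hypothetical unbalanced random effects probit panel (1 to 7 periods per individual, $\beta = 1$, $\sigma_u = 1$), fits a pooled probit that ignores the panel structure, and compares the estimate with $\beta/\sqrt{1+\sigma_u^2} \approx 0.707$. All values are illustrative assumptions, not estimates from the session's data.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(42)
n_groups, beta, sigma_u = 4000, 1.0, 1.0              # hypothetical true values
sizes = rng.integers(1, 8, size=n_groups)             # 1 to 7 periods per individual
group = np.repeat(np.arange(n_groups), sizes)         # individual index for each observation
x = rng.normal(size=group.size)
u = rng.normal(scale=sigma_u, size=n_groups)[group]   # u_i, shared by all periods of i
v = rng.normal(size=group.size)                       # v_it ~ N(0, 1)
y = (beta * x + u + v > 0).astype(float)

def pooled_probit_negll(b):
    """Negative log-likelihood of a probit that ignores the panel structure."""
    q = 2.0 * y - 1.0                                 # +1 if y = 1, -1 if y = 0
    return -np.sum(norm.logcdf(q * b[0] * x))

b_pooled = minimize(pooled_probit_negll, np.array([0.5]), method="BFGS").x[0]
target = beta / np.sqrt(1.0 + sigma_u**2)
print(f"pooled estimate {b_pooled:.3f}, beta/sqrt(1+sigma_u^2) = {target:.3f}, beta = {beta}")
```

Because the pooled estimate targets the scaled coefficient, plugging it into $\phi(\cdot)$ gives the averaged partial effects directly, which is the point made on the next slide.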

Random effects model

Assuming the data $x_{it}$ are well behaved, the pooled model does produce the appropriate estimator of the partial effects in the random effects probit model. This establishes a case for estimating the pooled model, with an appropriate correction to the estimator of the asymptotic covariance matrix.

Maximum likelihood estimator for the random effects model: it requires the computation of integrals that have no closed form. The parameters may also be estimated by maximum simulated likelihood. The model is typically specified using the normal distribution (probit model) for both $v_{it}$ and $u_i$; with the simulation-based estimator, the logit model could be used for either or both terms.
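The integral over $u_i$ can be replaced by an average over random draws. A minimal sketch of the simulated log-likelihood of the random effects probit, $\sum_i \ln \frac{1}{R}\sum_r \prod_t \Phi[q_{it}(x_{it}'\beta + \sigma_u w_{ir})]$ with $q_{it} = 2y_{it} - 1$ and standard normal draws $w_{ir}$ held fixed across iterations; maximizing it over $(\beta, \sigma_u)$ with any optimizer gives the MSL estimator. The data and parameter values are hypothetical. With $\sigma_u = 0$ the function collapses to the pooled probit log-likelihood, which serves as a check.

```python
import numpy as np
from scipy.stats import norm

def re_probit_simulated_loglik(beta, sigma_u, X, y, group, draws):
    """Simulated log-likelihood of a random effects probit.

    group: integer individual index 0..n-1 for each observation.
    draws: (n, R) standard normal draws, kept fixed across iterations.
    """
    q = 2.0 * y - 1.0                                    # +1 / -1 sign inside Phi
    idx = (X @ beta)[:, None] + sigma_u * draws[group]   # x_it'b + sigma_u * w_ir, (n_obs, R)
    log_phi = norm.logcdf(q[:, None] * idx)
    # log of the product over t within each individual, for every draw r
    log_prod = np.zeros(draws.shape)
    np.add.at(log_prod, group, log_phi)
    # log of the average over draws, computed as log-mean-exp for stability
    m = log_prod.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.mean(np.exp(log_prod - m), axis=1))))

# Hypothetical balanced example: 50 individuals, 3 periods, 200 draws
rng = np.random.default_rng(0)
n, T, R = 50, 3, 200
group = np.repeat(np.arange(n), T)
X = np.column_stack([np.ones(n * T), rng.normal(size=n * T)])
y = (rng.uniform(size=n * T) < 0.5).astype(float)
draws = rng.standard_normal((n, R))
b = np.array([0.1, 0.5])

ll_re = re_probit_simulated_loglik(b, 0.8, X, y, group, draws)
ll_sigma0 = re_probit_simulated_loglik(b, 0.0, X, y, group, draws)
ll_pooled = norm.logcdf((2 * y - 1) * (X @ b)).sum()
print(ll_re, ll_sigma0, ll_pooled)
```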

Testing for heterogeneity

The first question is whether there is any heterogeneity at all. With homogeneity ($\alpha_i = \alpha$), the model can be estimated as a pooled probit or logit model.
Random effects model: a Wald ($t$) test of the statistical significance of the estimate of $\rho$ is appropriate, as is a likelihood ratio test comparing the log likelihoods of the random effects and pooled models.
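If the software reports $\sigma_u$ rather than $\rho$, the $t$ test on $\rho = \sigma_u^2/(1+\sigma_u^2)$ can be built with the delta method, using $d\rho/d\sigma_u = 2\sigma_u/(1+\sigma_u^2)^2$. A minimal sketch with a hypothetical estimate $\hat\sigma_u = 1.0$ and standard error 0.1:

```python
def rho_from_sigma(sigma_u):
    """rho = sigma_u^2 / (1 + sigma_u^2) in the random effects probit."""
    return sigma_u**2 / (1.0 + sigma_u**2)

def rho_t_test(sigma_u, se_sigma_u):
    """t statistic for rho from an estimate of sigma_u, via the delta method."""
    rho = rho_from_sigma(sigma_u)
    d_rho = 2.0 * sigma_u / (1.0 + sigma_u**2) ** 2   # d rho / d sigma_u
    se_rho = abs(d_rho) * se_sigma_u
    return rho, se_rho, rho / se_rho

rho, se_rho, t = rho_t_test(1.0, 0.1)   # hypothetical values
print(rho, se_rho, t)                   # 0.5, 0.05, 10.0 (up to floating point)
```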

Testing for heterogeneity

Testing for heterogeneity in the fixed effects case is more difficult. The hypothesis cannot be tested with the likelihood ratio test, because the two likelihoods are not comparable: the conditional likelihood is based on a restricted data set that excludes individuals for whom $y_{it}$ is the same in every period. Since the individual effects are not estimated, none of the usual tests of restrictions can be used. Hausman's test, comparing the fixed effects (conditional) estimator with the pooled estimator, is a natural one to use here. A large value casts doubt on the hypothesis of homogeneity, but it is not definitive. There seems to be no conclusive test for fixed effects versus no effects.
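A minimal sketch of the Hausman statistic $H = d'(V_{FE} - V_{pooled})^{-1}d$ with $d = b_{FE} - b_{pooled}$. Only coefficients estimated by both models can be compared, since the conditional (fixed effects) logit has no constant and no time-invariant variables. In finite samples $V_{FE} - V_{pooled}$ need not be positive definite, in which case the statistic is not usable as is. The numbers in the check are made up.

```python
import numpy as np

def hausman(b_fe, V_fe, b_pooled, V_pooled):
    """H = (b_fe - b_pooled)'[V_fe - V_pooled]^{-1}(b_fe - b_pooled).

    Under H0 (no heterogeneity), H is chi-squared with len(b_fe) degrees of freedom.
    """
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_pooled, dtype=float)
    V = np.atleast_2d(np.asarray(V_fe, dtype=float) - np.asarray(V_pooled, dtype=float))
    return float(d @ np.linalg.solve(V, d))

h = hausman([1.2], [[0.05]], [1.0], [[0.01]])   # made-up numbers: 0.2^2 / 0.04
print(h)
```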

Testing for fixed or random effects

Variable addition test: add the group means of the time-varying regressors, $\bar{x}_i$, to the random effects model,

$$y_{it}^* = x_{it}'\beta + \bar{x}_i'\theta + v_{it} + u_i$$

If the random effects model is appropriate, the coefficients on the group means should be zero. If $\theta \neq 0$, this casts doubt on the random effects model and suggests the fixed effects model as a preferable alternative. The Wald and likelihood ratio tests should be usable. Projecting the effects of the fixed effects model on the means of the (time-varying) regressors in this way is a proposed middle ground between fixed and random effects.
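The variable addition test (often associated with Mundlak) only needs the individual means appended to the regressors; the random effects model is then estimated on the augmented data and $\theta = 0$ is tested with a Wald or LR test. A minimal sketch of the data step for a hypothetical panel in long format, which works for unbalanced panels because each mean uses that individual's own number of periods:

```python
import numpy as np

def add_group_means(X, group):
    """Append the individual means of the time-varying columns of X.

    group: integer individual index 0..n-1 for each row (unbalanced panels allowed).
    Returns [X, xbar_i], with xbar_i repeated on every row of individual i.
    """
    n_groups = group.max() + 1
    counts = np.bincount(group, minlength=n_groups).astype(float)
    sums = np.zeros((n_groups, X.shape[1]))
    np.add.at(sums, group, X)                 # sum of x_it over t for each i
    means = sums / counts[:, None]
    return np.column_stack([X, means[group]])

# Toy unbalanced panel: individual 0 observed twice, individual 1 once
X = np.array([[1.0], [3.0], [10.0]])
group = np.array([0, 0, 1])
X_aug = add_group_means(X, group)
print(X_aug)   # rows: [1, 2], [3, 2], [10, 10]
```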