Part I: Discrete Choice Models (Theory and Applications)


1 Part I: Discrete Choice Models (Theory and Applications) Mauricio Sarrias Universidad Católica del Norte Workshop SOCHER 2017 Fondecyt Project N , Individual-specific inference for choice models October 5, 2017

2
1 Binary outcomes: Introduction to Linear Probability Model (LPM); WLS; Example; Non-Linear Probability Model; Probit and Logit; Estimation; Marginal Effects; Goodness-of-Fit; Stata Example
2 Ordinal outcomes: Motivation; Latent Approach; ML Estimation; Interpretation of Parameters; Stata Example
3 Multinomial Logit Model: Introduction; RUM; Probabilities and Estimation; Formulas using gmnl package; An example

3 Outline of this workshop Part I: Discrete choice models (Theory and Applications): Binary outcomes. Ordered outcomes. Multinomial Logit Model. Part II: Discrete choice models and individual-heterogeneity (Theory and Applications): Binary outcomes. Multinomial Logit Model.


5 Binary Dependent Variable
We will study methods for estimating models with a binary dependent variable:
$$y_i = \begin{cases} 1 & \text{if some event occurs} \\ 0 & \text{if the event does not occur} \end{cases}$$
Some examples:
- Is an adult a member of the labor force?
- Did a citizen vote in the last election?
- Does a high school student decide to go to college?
- Is a consumer more likely to buy the same brand or try a new brand?
- Does the individual migrate?

6 Binary Dependent Variable
So, the question is: how do we estimate a model when the dependent variable is binary? The first approach is to apply OLS as if the dependent variable were continuous.

7 What if OLS...?
The linear probability model (LPM) is the linear regression model applied to a binary dependent variable. The structural model is:
$$y_i = x_i'\beta + \varepsilon_i,$$
where
$$y_i = \begin{cases} 1 & \text{if some event occurs} \\ 0 & \text{if the event does not occur} \end{cases}$$
When $y$ is a binary random variable, then:
$$E(y_i \mid x_i) = [1 \cdot \Pr(y_i = 1 \mid x_i)] + [0 \cdot \Pr(y_i = 0 \mid x_i)] = \Pr(y_i = 1 \mid x_i) = x_i'\beta$$

8 Linear Probability Model For a Single Independent Variable

9 Problems with the LPM
Heteroskedasticity: If a binary random variable has mean $\mu$, then its variance is $\mu(1 - \mu)$. Then:
$$\operatorname{Var}(y_i \mid x_i) = \Pr(y_i = 1 \mid x_i)\,[1 - \Pr(y_i = 1 \mid x_i)] = x_i'\beta\,(1 - x_i'\beta),$$
which implies that the variance of the errors depends on the $x$'s and is not constant.
Nonsensical predictions: The LPM can predict values of $y$ that are negative or greater than 1.
Functional form: Since the model is linear, a unit increase in $x_k$ results in a change of $\beta_k$ in the probability of the event. The change is the same regardless of the current value of $x$.
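The first two problems can be seen numerically in a minimal Python sketch. The coefficients `ALPHA` and `BETA` are hypothetical values chosen only for illustration; they are not estimates from these slides.

```python
# Hypothetical LPM coefficients, chosen only for illustration.
ALPHA, BETA = -0.2, 0.15

def lpm_prob(x):
    """Fitted 'probability' x'beta from the linear probability model."""
    return ALPHA + BETA * x

def lpm_variance(x):
    """Conditional variance x'beta * (1 - x'beta) implied by the LPM."""
    p = lpm_prob(x)
    return p * (1.0 - p)

for x in (0.0, 2.0, 5.0, 9.0):
    p = lpm_prob(x)
    inside = 0.0 <= p <= 1.0
    print(f"x={x:4.1f}  fitted={p:6.3f}  variance={lpm_variance(x):7.4f}  within [0,1]: {inside}")
```

The implied variance changes with `x`, and for fitted values outside the unit interval it even becomes negative, which is one reason the weights used later for WLS need care.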

10 Problems with the LPM More on Functional Form
Consider the following two specifications:
$$y_i = \alpha + \beta x_i + \delta d_i + \varepsilon_i$$
$$y_i = F(\alpha + \beta x_i + \delta d_i)$$
where $d_i$ is a dummy variable. For the first specification, the discrete change in $y$ as $d$ changes from 0 to 1, holding $x$ constant, is:
$$\frac{\Delta y}{\Delta d} = (\alpha + \beta x_i + \delta \cdot 1) - (\alpha + \beta x_i + \delta \cdot 0) = \delta$$
For our second function, the discrete change is?
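One way to explore the question is to evaluate both discrete changes at several values of $x$. The sketch below assumes $F$ is the logistic cdf and uses hypothetical parameter values (`ALPHA`, `BETA`, `DELTA`); any other cdf could be passed in instead.

```python
import math

def logistic_cdf(v):
    """Standard logistic cdf, used here as one possible choice of F."""
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical parameter values, chosen only for illustration.
ALPHA, BETA, DELTA = -1.0, 0.5, 0.8

def discrete_change_linear(x):
    """Change in y when d goes from 0 to 1 in the linear specification."""
    return (ALPHA + BETA * x + DELTA * 1) - (ALPHA + BETA * x + DELTA * 0)

def discrete_change_nonlinear(x, F=logistic_cdf):
    """Change in y when d goes from 0 to 1 in y = F(alpha + beta*x + delta*d)."""
    return F(ALPHA + BETA * x + DELTA) - F(ALPHA + BETA * x)

for x in (0.0, 2.0, 6.0):
    print(x, discrete_change_linear(x), round(discrete_change_nonlinear(x), 4))
```

In the linear case the change is always $\delta$; in the nonlinear case it is $F(\alpha + \beta x_i + \delta) - F(\alpha + \beta x_i)$, which depends on where $x_i$ sits on the curve.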

11 Problems with the LPM More on Functional Form


13 WLS
Since we know the exact form of the heteroskedasticity function, we can use Weighted Least Squares (WLS). In this case:
$$E(\varepsilon_i^2 \mid X) = \operatorname{Var}(\varepsilon_i \mid X) = \sigma_0^2\, h_i(X) \tag{1}$$
where
$$h_i(X) = x_i'\beta\,(1 - x_i'\beta) \tag{2}$$
Then we can compute the WLS estimator:
$$\hat\beta_{WLS} = \left[\sum_{i=1}^{n} w_i x_i x_i'\right]^{-1} \left[\sum_{i=1}^{n} w_i x_i y_i\right], \qquad w_i = 1/h_i \tag{3}$$
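The estimator in (3) can be sketched in two steps for the simplest case of an intercept and one regressor. This is an illustration under stated assumptions, not the procedure used in the Stata example: the data are made up, $h_i$ is replaced by its OLS-based estimate, and the fitted values are clipped to `[EPS, 1 - EPS]` (an assumption added here) so that the weights stay positive.

```python
def weighted_ls(x, y, w):
    """Solve the 2x2 weighted normal equations for y = b0 + b1 * x."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx ** 2
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swy - b1 * swx) / sw
    return b0, b1

# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

# Step 1: OLS is WLS with unit weights.
b0_ols, b1_ols = weighted_ls(x, y, [1.0] * len(x))

# Step 2: estimate h_i = yhat_i * (1 - yhat_i) and reweight with w_i = 1 / h_i.
EPS = 0.01  # clipping bound: an assumption of this sketch
yhat = [min(max(b0_ols + b1_ols * xi, EPS), 1.0 - EPS) for xi in x]
weights = [1.0 / (p * (1.0 - p)) for p in yhat]
b0_wls, b1_wls = weighted_ls(x, y, weights)

print("OLS:", b0_ols, b1_ols)
print("WLS:", b0_wls, b1_wls)
```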


15 Determinants of Personal Computer Ownership
Consider the following model:
$$PC_i = \beta_0 + \beta_1\, hsgpa_i + \beta_2\, ACT_i + \beta_3\, parcoll_i + \varepsilon_i \tag{4}$$
where:
- PC: binary indicator equal to unity if the student owns a computer, zero otherwise.
- hsgpa: high school GPA.
- ACT: achievement test score.
- parcoll: binary indicator equal to unity if at least one parent attended college.[1]
We use the data in GPA1.dta to estimate the probability of owning a computer.
[1] Separate college indicators for the mother and father do not yield individually significant results, as these are pretty highly correlated.

16
    * Open Data
    cd "/Users/mauriciosarrias/Documents/Clases/Discrete Choice Mode
    use "GPA1.DTA", clear
    * Gen parcoll (equal to 1 if at least one parent attended college)
    quietly {
        gen parcoll = 0
        replace parcoll = 1 if fathcoll == 1 | mothcoll == 1
        reg PC hsgpa ACT parcoll
    }
    * Plot Predicted Values
    quietly margins, at(hsgpa = (16(0.5)33)) atmeans
    marginsplot, yline(0, lcolor(red)) yline(1, lcolor(red))

18
    * Predicted value
    qui reg PC hsgpa ACT parcoll
    predict yhat, xb
    sum yhat

        Variable    Obs    Mean    Std. Dev.    Min    M
        yhat

    * Weights
    gen h_hat = yhat * (1 - yhat)

19
    * Models
    quietly eststo ols : reg PC hsgpa ACT parcoll
    quietly eststo olsr: reg PC hsgpa ACT parcoll, robust
    quietly eststo wls : reg PC hsgpa ACT parcoll [aweight = 1/h_hat]
    esttab ols olsr wls, b se ///
        mtitle("ols" "OLS Rob" "WLS")

                     (1)         (2)         (3)
                     OLS       OLS Rob       WLS
    hsgpa
                   (0.137)     (0.141)     (0.130)
    ACT
                  (0.0155)    (0.0161)    (0.0155)
    parcoll        0.221*      0.221*      0.215*
                  (0.0930)    (0.0880)    (0.0863)
    _cons
                   (0.491)     (0.496)     (0.477)
    N
    Standard errors in parentheses

20 WLS Final Remarks
WLS and robust standard errors help us deal with heteroskedasticity, but they do not solve the problems related to interpretation.


22 Latent Variable Approach
The latent variable $y^*$ is assumed to be linearly related to the observed $x$'s through the structural model:
$$y_i^* = x_i'\beta + \varepsilon_i$$
The latent variable $y^*$ is linked to the observed binary variable $y$ by the measurement equation:
$$y_i = \begin{cases} 1, & \text{if } y_i^* > \tau \\ 0, & \text{if } y_i^* \le \tau \end{cases} \tag{5}$$

23 Latent Variable Approach

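24 Latent Variable Approach
Setting the threshold to $\tau = 0$:
$$\begin{aligned}
\Pr(y_i = 1 \mid x_i) &= \Pr(y_i^* > 0 \mid x_i) \\
&= \Pr(x_i'\beta + \varepsilon_i > 0 \mid x_i) \\
&= \Pr(\varepsilon_i > -x_i'\beta \mid x_i) \\
&= 1 - \Pr(\varepsilon_i \le -x_i'\beta \mid x_i) && \text{since } \Pr(X > x) = 1 - \Pr(X \le x) \\
&= \Pr(\varepsilon_i \le x_i'\beta \mid x_i) && \text{by symmetry} \\
&= F(x_i'\beta) = \int_{-\infty}^{x_i'\beta} f(\varepsilon_i)\, d\varepsilon_i
\end{aligned}$$
As usual, we assume that $E(\varepsilon \mid X) = 0$.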

25 Latent Variable Approach


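27 Probit: Error term distributed as Normal
When $\varepsilon$ is normal with $E(\varepsilon \mid X) = 0$ and $\operatorname{Var}(\varepsilon \mid X) = 1$, the pdf is
$$\phi(\varepsilon) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\varepsilon^2}{2}\right)$$
and the cumulative distribution function (cdf) is:
$$\Phi(\varepsilon) = \int_{-\infty}^{\varepsilon} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt$$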

28 Logit: Error term distributed as Logistic
In the logistic model, the errors are assumed to have a standard logistic distribution with mean 0 and variance $\pi^2/3$. This unusual variance is chosen because it results in a particularly simple equation for the pdf:
$$\lambda(\varepsilon) = \frac{\exp(\varepsilon)}{[1 + \exp(\varepsilon)]^2}$$
and an even simpler equation for the cdf:
$$\Lambda(\varepsilon) = \frac{\exp(\varepsilon)}{1 + \exp(\varepsilon)}$$
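The four functions on the probit and logit slides can be written down directly, as in the Python sketch below. The normal cdf is computed through the error function `math.erf`, which is a standard identity rather than something stated in the slides.

```python
import math

def normal_pdf(e):
    """Standard normal density phi(e)."""
    return math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)

def normal_cdf(e):
    """Standard normal cdf Phi(e), written with the error function."""
    return 0.5 * (1.0 + math.erf(e / math.sqrt(2.0)))

def logistic_pdf(e):
    """Standard logistic density lambda(e)."""
    return math.exp(e) / (1.0 + math.exp(e)) ** 2

def logistic_cdf(e):
    """Standard logistic cdf Lambda(e)."""
    return math.exp(e) / (1.0 + math.exp(e))

for e in (-2.0, 0.0, 2.0):
    print(e, round(normal_cdf(e), 4), round(logistic_cdf(e), 4))
```

A useful check is that the logistic density equals $\Lambda(\varepsilon)[1 - \Lambda(\varepsilon)]$, the form that appears in the logit marginal effects.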

29 Normal and Logistic Distribution

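30 Probit
If the $\varepsilon_i$'s are independently and normally distributed, $\varepsilon_i \sim N(0, \sigma^2)$, then
$$\begin{aligned}
\Pr(y_i = 1 \mid x_i) &= \Pr\left(\frac{\varepsilon_i}{\sigma} > -\frac{x_i'\beta}{\sigma} \,\Big|\, x_i\right) \\
&= 1 - \Phi\left(-\frac{x_i'\beta}{\sigma}\right) \\
&= \Phi\left(\frac{x_i'\beta}{\sigma}\right) && \text{by symmetry of the standard normal distribution}
\end{aligned}$$
The probability depends on $\beta$ and $\sigma$, but only the ratio $\beta/\sigma$ is identified, not the individual parameters $\beta$ and $\sigma$. For instance, if $\beta$ and $\sigma$ are each multiplied by a positive constant $c$, the probability remains unchanged. Typically, we set $\sigma = 1$ as a normalization.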
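The identification argument can be checked numerically. In this sketch the covariate vector, `beta`, `sigma`, and the constant `c` are all hypothetical values picked for illustration.

```python
import math

def normal_cdf(v):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def probit_prob(x, beta, sigma):
    """Pr(y = 1 | x) = Phi(x'beta / sigma)."""
    index = sum(xk * bk for xk, bk in zip(x, beta))
    return normal_cdf(index / sigma)

# Hypothetical values, for illustration only.
x = [1.0, 0.5, -2.0]
beta = [0.3, 1.2, 0.4]
sigma = 2.0
c = 5.0

p_original = probit_prob(x, beta, sigma)
p_rescaled = probit_prob(x, [c * b for b in beta], c * sigma)
print(p_original, p_rescaled)
```

Both calls return the same probability, so the data cannot distinguish $(\beta, \sigma)$ from $(c\beta, c\sigma)$.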


32 ML Estimation
The outcome is Bernoulli distributed (the binomial distribution with just one trial). A very convenient compact notation for the density of $y_i$, or more formally its probability mass function, is:
$$f(y_i \mid x_i) = P_i^{y_i} (1 - P_i)^{1 - y_i}, \qquad y_i = 1, 0$$
where $P_i = F(x_i'\beta)$. This yields probabilities $P_i$ and $(1 - P_i)$, since $f(1) = P^1 (1 - P)^0 = P$ and $f(0) = P^0 (1 - P)^1 = 1 - P$.
Assuming that the observations are independent of each other, the joint probability function (or likelihood function) is:
$$\Pr(Y_1 = y_1, Y_2 = y_2, \dots, Y_n = y_n \mid X) = \prod_{i=1}^{n} f(y_i \mid x_i)$$
The likelihood function for a sample of $n$ observations can be written as
$$L(\beta \mid \underbrace{y, X}_{\text{data}}) = \prod_{i=1}^{n} [F(x_i'\beta)]^{y_i} [1 - F(x_i'\beta)]^{1 - y_i}$$

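33 Log-Likelihood Function
Taking logs, we obtain the log-likelihood function, which must be maximized:
$$\begin{aligned}
\log L(\beta \mid \text{data}) &= \log\left(\prod_{i=1}^{n} [F(x_i'\beta)]^{y_i} [1 - F(x_i'\beta)]^{1 - y_i}\right) \\
&= \sum_{i=1}^{n} \left\{ \log\left([F(x_i'\beta)]^{y_i}\right) + \log\left([1 - F(x_i'\beta)]^{1 - y_i}\right) \right\} \\
&= \sum_{i=1}^{n} \left\{ y_i \log[F(x_i'\beta)] + (1 - y_i) \log[1 - F(x_i'\beta)] \right\}
\end{aligned}$$
Useful trick: if the distribution is symmetric, as the normal and logistic are, then $1 - F(x_i'\beta) = F(-x_i'\beta)$. Let $q_i = 2y_i - 1$. Then:
$$\log L(\beta \mid \text{data}) = \sum_{i=1}^{n} \log F(q_i\, x_i'\beta)$$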
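Both forms of the log-likelihood can be coded and compared. The sketch below uses the logistic cdf for $F$, with made-up data and a hypothetical parameter vector `beta`; it only evaluates the function and does not maximize it.

```python
import math

def logistic_cdf(v):
    return 1.0 / (1.0 + math.exp(-v))

def dot(x, beta):
    return sum(xk * bk for xk, bk in zip(x, beta))

def loglik_bernoulli(beta, X, y, F=logistic_cdf):
    """Sum of y*log F + (1 - y)*log(1 - F)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = F(dot(xi, beta))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def loglik_symmetric(beta, X, y, F=logistic_cdf):
    """Same quantity via q = 2y - 1, valid when F is symmetric around zero."""
    return sum(math.log(F((2 * yi - 1) * dot(xi, beta))) for xi, yi in zip(X, y))

# Made-up data: first column is the constant.
X = [[1.0, -1.2], [1.0, 0.3], [1.0, 0.8], [1.0, 2.1], [1.0, -0.4]]
y = [0, 0, 1, 1, 1]
beta = [-0.1, 0.9]  # hypothetical parameter values

ll_a = loglik_bernoulli(beta, X, y)
ll_b = loglik_symmetric(beta, X, y)
print(ll_a, ll_b)
```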

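34 Asymptotic Distribution of Probit and Logit Model
Recall that
$$\sqrt{n}\,(\hat\beta - \beta_0) \xrightarrow{d} N(0, I^{-1})$$
where $I$ is the Fisher information, defined as:
$$I = -E\left[\frac{\partial^2}{\partial\beta\,\partial\beta'} \log f(y_i \mid x_i; \beta)\right]$$
Then, an estimator of the negative expected Hessian (the information matrix) is given by:
$$\hat I(\hat\beta) = -\frac{1}{n} E\left[\frac{\partial^2 \log L}{\partial\beta\,\partial\beta'}\right] = \frac{1}{n} \sum_{i=1}^{n} \frac{f^2(x_i'\hat\beta)}{F(x_i'\hat\beta)\,[1 - F(x_i'\hat\beta)]}\, x_i x_i'$$
We can also use the following estimators:
$$\left(-\sum_{i=1}^{n} H(w_i, \hat\beta)\right)^{-1} \tag{6}$$
or the outer product of the gradients.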

35 What about small samples? Griffiths et al. (1987)
We have three estimators of the asymptotic covariance matrix:
- NR: the inverse of the negative of the Hessian matrix from the log-likelihood function.
- Scoring: the inverse of the information matrix.
- BHHH: the inverse of the outer product of the first derivatives of the log-likelihood function.
They are asymptotically equivalent, but their performance can vary in finite samples. Griffiths et al. find that, on average, the Hessian matrix and the information matrix give almost identical results and lead to more accurate estimates of the asymptotic covariance matrix than does the estimator based on first derivatives.
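To make the comparison concrete, here is a small Python sketch for a logit with a single regressor and no intercept, fitted by Newton-Raphson on made-up data. For the logit the Hessian does not depend on $y$, so the NR and scoring variances coincide exactly; only the BHHH variance differs. This is an illustration, not a replication of Griffiths et al.

```python
import math

def logistic_cdf(v):
    return 1.0 / (1.0 + math.exp(-v))

# Made-up data: one regressor, no intercept.
x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

def score(beta):
    """First derivative of the logit log-likelihood."""
    return sum((yi - logistic_cdf(xi * beta)) * xi for xi, yi in zip(x, y))

def neg_hessian(beta):
    """Negative second derivative (equal to the information for the logit)."""
    return sum(logistic_cdf(xi * beta) * (1.0 - logistic_cdf(xi * beta)) * xi * xi
               for xi in x)

def outer_product(beta):
    """Sum of squared individual scores (BHHH)."""
    return sum(((yi - logistic_cdf(xi * beta)) * xi) ** 2 for xi, yi in zip(x, y))

beta_hat = 0.0
for _ in range(50):  # Newton-Raphson iterations
    step = score(beta_hat) / neg_hessian(beta_hat)
    beta_hat += step
    if abs(step) < 1e-12:
        break

var_nr = 1.0 / neg_hessian(beta_hat)      # NR and scoring (identical for logit)
var_bhhh = 1.0 / outer_product(beta_hat)  # BHHH
print(beta_hat, var_nr, var_bhhh)
```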


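37 Marginal Effects
Recall that:
$$E(y_i \mid x_i) = [1 \cdot \Pr(y_i = 1 \mid x_i)] + [0 \cdot \Pr(y_i = 0 \mid x_i)] = \Pr(y_i = 1 \mid x_i) = F(x_i'\beta)$$
The marginal effect is given by:
$$\underbrace{\frac{\partial E(y_i \mid x_i)}{\partial x_i}}_{(K \times 1)} = \left[\frac{dF(x_i'\beta)}{d(x_i'\beta)}\right] \beta = \underbrace{f(x_i'\beta)}_{(1 \times 1)}\, \underbrace{\beta}_{(K \times 1)}$$
where $f(\cdot)$ is the probability density function. So:
Probit: $\dfrac{\partial E(y_i \mid x_i)}{\partial x_i} = \phi(x_i'\beta)\,\beta$
Logit: $\dfrac{\partial E(y_i \mid x_i)}{\partial x_i} = \Lambda(x_i'\beta)\,[1 - \Lambda(x_i'\beta)]\,\beta$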

38 Marginal Effects: Comments
Some aspects are important to note:
- MEs vary with the values of $x$.
- Current practices:
  - Calculate MEs at the means of the variables.
  - Calculate MEs at specific values.
  - Evaluate MEs at every observation and use the sample average of the individual MEs.

39 Marginal Effects: Dummy variable
The appropriate ME for a binary independent variable, say $d$, would be:
$$ME = \Pr\left(y_i = 1 \mid \bar{x}_{(d)}, d_i = 1\right) - \Pr\left(y_i = 1 \mid \bar{x}_{(d)}, d_i = 0\right)$$
where $\bar{x}_{(d)}$ denotes the means of all the other variables in the model.

40 Average Partial Effects
Current practice: average partial effects. The quantity of interest is:
$$APE = E_x\left[\frac{\partial E(y \mid x)}{\partial x}\right]$$
In practical terms, this suggests the computation:
$$\widehat{APE} = \hat\gamma = \frac{1}{n} \sum_{i=1}^{n} f(x_i'\hat\beta)\, \hat\beta$$
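The difference between evaluating the marginal effect at the sample means and averaging it over observations can be seen in a short Python sketch for the logit. The coefficient vector `beta_hat` and the data are hypothetical; in practice `beta_hat` would be the ML estimate.

```python
import math

def logistic_cdf(v):
    return 1.0 / (1.0 + math.exp(-v))

def logistic_pdf(v):
    p = logistic_cdf(v)
    return p * (1.0 - p)

def dot(x, beta):
    return sum(xk * bk for xk, bk in zip(x, beta))

beta_hat = [-0.5, 0.8]  # hypothetical (intercept, slope)
X = [[1.0, xi] for xi in (-2.0, -1.0, 0.0, 0.5, 1.0, 3.0)]  # made-up data

def marginal_effects(row):
    """f(x'beta) * beta evaluated at one covariate vector."""
    scale = logistic_pdf(dot(row, beta_hat))
    return [scale * b for b in beta_hat]

# Marginal effect at the means of the variables.
x_bar = [sum(col) / len(X) for col in zip(*X)]
me_at_means = marginal_effects(x_bar)

# Average partial effect: average the individual effects over the sample.
ape = [sum(marginal_effects(row)[k] for row in X) / len(X)
       for k in range(len(beta_hat))]

print("ME at means:", me_at_means[1])
print("APE:        ", ape[1])
```

The two numbers differ because $f(\cdot)$ is nonlinear, so the average of the density is not the density at the average.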


42 Goodness-of-Fit
A number of suggestions have been made for how to evaluate the overall quality of a binary response model.
- First approach: mimic the $R^2$ measure.
- Assess the predictive performance of the model.
$R^2$ is not directly applicable in non-linear models such as binary response models, since we do not have a proper variance decomposition result. So-called pseudo-$R^2$ measures have been suggested instead.

43 Goodness-of-Fit
Let:
- $\log L(\hat\beta_r)$: the value of the maximized log-likelihood function in the constant-only model.
- $\log L(\hat\beta_u)$: the maximized log-likelihood value in the full model.
Note that the value of the log-likelihood function is always negative, so:
$$\frac{\log L(\hat\beta_u)}{\log L(\hat\beta_r)} = \frac{|\log L(\hat\beta_u)|}{|\log L(\hat\beta_r)|}$$
So that:
$$0 \le 1 - \frac{\log L(\hat\beta_u)}{\log L(\hat\beta_r)} = R^2_{McFadden} \le 1$$
The McFadden $R^2$ will be zero if the full model has no explanatory power.
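A minimal Python sketch of the McFadden measure follows. The constant-only log-likelihood is computed from the sample share of ones; the full-model value `ll_full` is a hypothetical number standing in for the output of an actual estimation.

```python
import math

def loglik_constant_only(y):
    """Maximized log-likelihood of a binary model with only a constant."""
    n = len(y)
    n1 = sum(y)
    p = n1 / n
    return n1 * math.log(p) + (n - n1) * math.log(1.0 - p)

def mcfadden_r2(ll_unrestricted, ll_restricted):
    """1 - log L(full) / log L(constant only)."""
    return 1.0 - ll_unrestricted / ll_restricted

y = [0, 0, 1, 0, 1, 0, 1, 1]   # made-up outcomes
ll_restricted = loglik_constant_only(y)
ll_full = -4.2                 # hypothetical full-model log-likelihood

print(mcfadden_r2(ll_full, ll_restricted))
```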

44 Information Measures
Definition (Akaike's Information Criterion (AIC)): Akaike's (1973) information criterion is defined as
$$AIC = \frac{-2 \log L + 2K}{n} \tag{7}$$
where $\log L$ is the log-likelihood of the model and $K$ is the number of parameters in the model.
- Larger values of $\log L$ indicate a better fit.
- $-2 \log L$ ranges from 0 to $+\infty$, with smaller values indicating a better fit.
- As $K$ increases, $-2 \log L$ becomes smaller, since more parameters make what is observed more likely.
- $2K$ is added as a penalty.
- All else being equal, smaller values suggest a better fitting model.
- Use it to compare models across different samples or to compare non-nested models.

45 Information Measures
Definition (Bayes Information Criterion (BIC)): The Bayesian information criterion is defined as
$$BIC = \frac{-2 \log L + K \log n}{n} \tag{8}$$
where $\log L$ is the log-likelihood of the model, $K$ is the number of parameters in the model, and $n$ is the number of individuals.
It is possible to increase the likelihood by adding parameters, but doing so may result in overfitting. Both BIC and AIC address this problem by introducing a penalty term for the number of parameters in the model; the penalty term is larger in BIC than in AIC whenever $\log n > 2$, that is, for $n \ge 8$.
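Both criteria are one-line computations. The Python sketch below follows the per-observation scaling used in these slides (division by $n$); many software packages report the unscaled versions instead. The log-likelihood, `k`, and `n` passed in are hypothetical.

```python
import math

def aic(loglik, k, n):
    """AIC as defined above: (-2 log L + 2K) / n."""
    return (-2.0 * loglik + 2.0 * k) / n

def bic(loglik, k, n):
    """BIC as defined above: (-2 log L + K log n) / n."""
    return (-2.0 * loglik + k * math.log(n)) / n

# Hypothetical values: log L = -50 with K = 3 parameters and n = 100.
print(aic(-50.0, 3, 100))
print(bic(-50.0, 3, 100))
```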


47 Stata Example Open BinaryStata.do


49 Ordinal Outcomes When a variable is ordinal, its categories can be ranked from low to high, but the distances between adjacent categories are unknown. Examples: Kind of Job: low skilled, medium skilled, and highly skilled. Subjective Health Status: very bad, bad, good, very good. Likert Scales: strongly agree, agree, have no opinion, disagree, or strongly disagree with a statement.

50 Ordinal Outcomes
Researchers often treat ordinal dependent variables as if they were interval. The categories of the dependent variable are numbered sequentially and the linear regression model (LRM) is used. This involves the implicit assumption that the intervals between adjacent categories are equal. For example, on a Likert scale the distance between strongly agreeing and agreeing is assumed to be the same as the distance between agreeing and being neutral. This assumption is generally not correct. See, for example, McKelvey and Zavoina (1975) and Winship and Mare (1984).


52 Latent Approach
Consider the latent variable $y_i^*$, ranging from $-\infty$ to $\infty$, determined by the following structural model:
$$y_i^* = x_i'\beta + \varepsilon_i \tag{9}$$
However, we do not observe $y_i^*$. We only observe a discrete variable $y_i$, which is thought of as providing incomplete information about the underlying $y_i^*$ according to the following rule:
$$y_i = \begin{cases} 1 & \text{if } y_i^* \le \kappa_1 \\ 2 & \text{if } \kappa_1 < y_i^* \le \kappa_2 \\ 3 & \text{if } \kappa_2 < y_i^* \end{cases} \tag{10}$$
A more general mechanism that accounts for the ordered nature of $y_i$ is
$$y_i = j \iff \kappa_{j-1} < y_i^* \le \kappa_j, \qquad j = 1, \dots, J \tag{11}$$
The $\kappa$'s are called thresholds or cutpoints. The extreme categories 1 and $J$ are defined by open-ended intervals with $\kappa_0 = -\infty$ and $\kappa_J = \infty$. When $J = 2$, we have the binary response model (BRM).

53 Latent Approach
Consider the following question: How do you grade your current health status? The possible answers are: 1 = Poor, 2 = Fair, 3 = Good, and 4 = Very Good.
Assume that this ordinal variable is related to a continuous latent variable $y^*$ that indicates an individual's underlying health status. The observed $y$ is related to $y^*$ according to:
$$y_i = \begin{cases} 1 = \text{Poor} & \text{if } \kappa_0 = -\infty \le y_i^* < \kappa_1 \\ 2 = \text{Fair} & \text{if } \kappa_1 \le y_i^* < \kappa_2 \\ 3 = \text{Good} & \text{if } \kappa_2 \le y_i^* < \kappa_3 \\ 4 = \text{Very Good} & \text{if } \kappa_3 \le y_i^* < \kappa_4 = \infty \end{cases} \tag{12}$$

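55 The probabilities
First, consider the probability that $y = 1$. This implies that:
$$\begin{aligned}
\Pr(y_i = 1 \mid x_i) &= \Pr(\kappa_0 \le y_i^* < \kappa_1 \mid x_i) \\
&= \Pr(\kappa_0 \le x_i'\beta + \varepsilon_i < \kappa_1 \mid x_i) \\
&= \Pr(\kappa_0 - x_i'\beta \le \varepsilon_i < \kappa_1 - x_i'\beta \mid x_i) \\
&= \Pr(\varepsilon_i < \kappa_1 - x_i'\beta \mid x_i) - \Pr(\varepsilon_i < \kappa_0 - x_i'\beta \mid x_i) \\
&= F(\kappa_1 - x_i'\beta) - F(\kappa_0 - x_i'\beta)
\end{aligned} \tag{13}$$
So, generalizing, we obtain:
$$\Pr(y_i = j \mid x_i) = F(\kappa_j - x_i'\beta) - F(\kappa_{j-1} - x_i'\beta) \tag{14}$$
Note that:
$$F(\kappa_0 - x_i'\beta) = F(-\infty - x_i'\beta) = 0 \tag{15}$$
and
$$F(\kappa_J - x_i'\beta) = F(\infty - x_i'\beta) = 1 \tag{16}$$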
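Equation (14) translates directly into code once the cutpoints are padded with $\kappa_0 = -\infty$ and $\kappa_J = \infty$. The sketch below assumes a standard normal $F$ (the ordered probit case) and uses hypothetical cutpoints and index value.

```python
import math

def normal_cdf(v):
    """Standard normal cdf, with the infinite limits handled explicitly."""
    if math.isinf(v):
        return 1.0 if v > 0 else 0.0
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def ordered_probs(index, cutpoints, F=normal_cdf):
    """Pr(y = j | x) = F(kappa_j - x'beta) - F(kappa_{j-1} - x'beta), j = 1..J."""
    kappa = [-math.inf] + list(cutpoints) + [math.inf]
    return [F(kappa[j] - index) - F(kappa[j - 1] - index)
            for j in range(1, len(kappa))]

# Hypothetical cutpoints and linear index x'beta = 0.
probs = ordered_probs(index=0.0, cutpoints=[-1.0, 0.0, 1.0])
print(probs, sum(probs))
```

Because the differences telescope, the $J$ probabilities always sum to one regardless of the cutpoints.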

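56 Ordered Probit Model
In the ordered probit model, we assume that the error term follows a standard normal distribution, $F(\varepsilon) = \Phi(\varepsilon)$. In this case, the probabilities are:
$$\Pr(y_i = j \mid x_i) = \Phi\left(\frac{\kappa_j - x_i'\beta}{\sigma_\varepsilon}\right) - \Phi\left(\frac{\kappa_{j-1} - x_i'\beta}{\sigma_\varepsilon}\right) \tag{17}$$
For identification we need $\sigma_\varepsilon = 1$.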

57 Ordered Probit Model

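58 Ordered Logit Model
If the $\varepsilon_i$ are iid logistically distributed, then
$$\Pr(y_i = j \mid x_i) = \Lambda\left(\frac{\kappa_j - x_i'\beta}{\sigma_\varepsilon}\right) - \Lambda\left(\frac{\kappa_{j-1} - x_i'\beta}{\sigma_\varepsilon}\right) \tag{18}$$
where $\sigma_\varepsilon = \sqrt{\pi^2/3}$.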

59 Identification
Since $y^*$ is latent, its mean and variance cannot be estimated.
- For logit, $\operatorname{Var}(\varepsilon \mid x) = \pi^2/3$.
- For probit, $\operatorname{Var}(\varepsilon \mid x) = 1$.
Can we estimate a constant in this model?


61 Estimation
The conditional probability function is:
$$f(y_i \mid x_i; \beta, \kappa_1, \dots, \kappa_{J-1}) = P_{i1}^{d_{i1}} \cdots P_{iJ}^{d_{iJ}} = \prod_{j=1}^{J} P_{ij}^{d_{ij}} \tag{19}$$
where $d_{ij}$ is a binary indicator equal to one if $y_i = j$ and 0 otherwise. For a sample of $n$ independent pairs of observations $(y_i, x_i)$, the likelihood function is given by
$$L(\theta; y, X) = \prod_{i=1}^{n} \prod_{j=1}^{J} P_{ij}^{d_{ij}} \tag{20}$$

62 Estimation
Taking logarithms converts products into sums, and we can write the log-likelihood function as
$$\log L(\theta; y, X) = \sum_{i=1}^{n} \sum_{j=1}^{J} d_{ij} \log P_{ij} \tag{21}$$
ML estimation of the parameters yields consistent, asymptotically efficient, and asymptotically normally distributed estimators.
We can use LR, Wald, and score tests to test general restrictions, and the invariance property together with the delta method for estimation and inference on predicted probabilities or marginal probability effects.


64 Compensating Variation
Definition (CV): The variation in two regressors such that the latent variable does not change.
Let $x_{il}$ denote the $l$-th element of $x_i$ and $\beta_l$ the corresponding parameter, and let $m$ index the $m$-th elements of the vectors $x_i$ and $\beta$. Now consider a change in $x_{il}$ and $x_{im}$ at the same time, such that $\Delta y_i^* = 0$ (and therefore all probabilities are unchanged). This requires:
$$0 = \beta_l \Delta x_{il} + \beta_m \Delta x_{im} \quad \text{or} \quad \frac{\Delta x_{il}}{\Delta x_{im}} = -\frac{\beta_m}{\beta_l} \tag{22}$$
If, for example, $x_{il}$ is logarithmic income and $x_{im}$ is a dummy variable indicating unemployment, then the above ratio gives a trade-off ratio: the relative increase in income required to compensate for the negative effect of unemployment.

65 Compensating Variation
In some applications, it could be interesting to know how much an explanatory variable should change to reach the next higher response category. For this purpose, one could consider the ratio of the interval length to the parameter:
$$\frac{\kappa_j - \kappa_{j-1}}{\beta_l} \tag{23}$$
The smaller this ratio is in absolute terms, the smaller the maximum change in $x_{il}$ required to move the response from $y_i = j$ to $y_i = j + 1$.

66 Discrete Probability Effect
The ceteris paribus effect of a discrete change in one explanatory variable, say the $l$-th element of $x_i$, is simply the difference between the probabilities before and after the change, given the values of the other variables:
$$\begin{aligned}
\Delta P_{ij} &= \Pr(y_i = j \mid x_i + \Delta x_{il}) - \Pr(y_i = j \mid x_i) \\
&= \left[F(\kappa_j - x_i'\beta - \Delta x_{il}\beta_l) - F(\kappa_{j-1} - x_i'\beta - \Delta x_{il}\beta_l)\right] - \left[F(\kappa_j - x_i'\beta) - F(\kappa_{j-1} - x_i'\beta)\right]
\end{aligned} \tag{24}$$

67 Shift in Density Due to a Change $\Delta x_i > 0$ ($\beta > 0$)

68 Marginal Effects
The marginal probability effect (MPE) is given by:
$$MPE_{ijl} = \frac{\partial P_{ij}}{\partial x_{il}} = \left[f(\kappa_{j-1} - x_i'\beta) - f(\kappa_j - x_i'\beta)\right] \beta_l \tag{25}$$
- Neither the signs nor the magnitudes of the coefficients are directly interpretable in ordered choice models.
- MPEs are functions of the covariates and therefore vary across individuals.
- To calculate average marginal probability effects (AMPEs), we have to take expectations with respect to $x$, which are estimated consistently by replacing $\beta$ with its ML estimate $\hat\beta$ and averaging over the sample.

69 Marginal Effects
Furthermore, in the computation of the MPE, the only certainties in the signs of the partial effects in this model are as follows, where we consider a variable with a positive coefficient:
- Increases in that variable will increase the probability in the highest cell and decrease the probability in the lowest cell.
- The sum of all the changes will be zero, since the new probabilities must still sum to one.
- The effects will begin at $\Pr(y_i = 1)$ with one or more negative values, then change to a set of positive values; there will be one sign change (single crossing).
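These sign properties can be verified numerically. The Python sketch below computes the marginal probability effects for an ordered probit with hypothetical cutpoints, a hypothetical linear index, and a positive coefficient `beta_l`.

```python
import math

def normal_pdf(v):
    """Standard normal density; zero at the infinite limits."""
    if math.isinf(v):
        return 0.0
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def ordered_mpe(index, cutpoints, beta_l, f=normal_pdf):
    """[f(kappa_{j-1} - x'beta) - f(kappa_j - x'beta)] * beta_l for j = 1..J."""
    kappa = [-math.inf] + list(cutpoints) + [math.inf]
    return [(f(kappa[j - 1] - index) - f(kappa[j] - index)) * beta_l
            for j in range(1, len(kappa))]

# Hypothetical values: x'beta = 0.3, three cutpoints, beta_l = 0.5 > 0.
effects = ordered_mpe(index=0.3, cutpoints=[-1.0, 0.0, 1.0], beta_l=0.5)
sign_changes = sum(1 for a, b in zip(effects, effects[1:]) if a * b < 0)

print(effects)
print("sum of effects:", sum(effects), " sign changes:", sign_changes)
```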

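70 Marginal Effects
One might also be interested in cumulative values of the partial effects, such as
$$\frac{\partial \Pr(y_i \le j \mid x_i)}{\partial x_{il}} = \sum_{m=1}^{j} \left[f(\kappa_{m-1} - x_i'\beta) - f(\kappa_m - x_i'\beta)\right] \beta_l = -f(\kappa_j - x_i'\beta)\, \beta_l \tag{26}$$
where the last equality follows because the sum telescopes and $f(\kappa_0 - x_i'\beta) = 0$.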


72 Stata Example Open Ordered Lab.do


74 Introduction
Type of dependent variable: discrete choices, that is, the choice of one among several mutually exclusive alternatives:
- Commuting mode: train, car, bicycle, walking.
- Type of car: standard car, hybrid car, electric car.
Type of data:
- Revealed preference data: the data are the observed choices of individuals for, say, a transport mode.
- Stated preference data: individuals face a virtual choice situation, for example the choice between different types of automated cars.

75 Design of the choice experiment Figure: Sample of a Choice Situation Presented to Respondents


77 Random Utility Model
The utility of individual $i$ from alternative $j$ is:
$$U_{ij} = V_{ij} + \varepsilon_{ij}$$
where $V_{ij}$ is the deterministic component and $\varepsilon_{ij}$ is the random component. This random component is intended to capture the various unpredictable choices people make.
Deterministic part: it is usually assumed that $V_{ij}$ is a linear combination of the observed explanatory variables:
$$V_{ij} = \alpha_j + x_{ij}'\beta + z_i'\gamma_j + w_{ij}'\delta_j$$

78 Random Utility Model
Three kinds of variables:
- alternative-specific variables $x_{ij}$ with a generic coefficient $\beta$.
- individual-specific variables $z_i$ with alternative-specific coefficients $\gamma_j$.
- alternative-specific variables $w_{ij}$ with an alternative-specific coefficient $\delta_j$.

79 Random Utility Model
Only differences in utility matter. For two alternatives $j$ and $k$:
$$V_{ij} - V_{ik} = (\alpha_j - \alpha_k) + \beta(x_{ij} - x_{ik}) + (\gamma_j - \gamma_k) z_i + (\delta_j w_{ij} - \delta_k w_{ik})$$
- Coefficients for individual-specific variables (including the intercept) should be alternative-specific. A normalization is required, for example $\gamma_1 = 0$.
- Coefficients for alternative-specific variables may (or may not) be alternative-specific. Cost is alternative-specific, but the cost of a standard vehicle may not have the same impact on utility as the cost of an electric car.


81 Probabilities
If the error terms are EV type I, then:
$$P_{ij} = \frac{e^{V_{ij}}}{\sum_{k} e^{V_{ik}}} = \frac{e^{x_{ij}'\beta}}{\sum_{k} e^{x_{ik}'\beta}} \tag{27}$$
For a sample of $n$ independent pairs of observations $(y_i, x_{ij})$, the log-likelihood function is given by
$$\log L(\theta; y, X) = \sum_{i} \sum_{j} y_{ij} \log P_{ij} \tag{28}$$
where $y_{ij}$ is a dummy variable equal to 1 if individual $i$ made choice $j$ and 0 otherwise.
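The choice probabilities and log-likelihood can be sketched in a few lines of Python. Subtracting the maximum utility before exponentiating is a numerical safeguard added in this sketch (it is not part of the slides); it leaves the probabilities unchanged because only utility differences matter. The utilities passed in are hypothetical.

```python
import math

def mnl_probs(V):
    """Logit probabilities for one individual from deterministic utilities V_j."""
    v_max = max(V)  # shift for numerical stability; probabilities are unchanged
    expv = [math.exp(v - v_max) for v in V]
    total = sum(expv)
    return [e / total for e in expv]

def mnl_loglik(utilities, choices):
    """Sum over individuals of log P_ij for the chosen alternative j (0-based)."""
    return sum(math.log(mnl_probs(V)[j]) for V, j in zip(utilities, choices))

# Hypothetical utilities for two individuals and three alternatives.
utilities = [[0.5, 1.5, -0.2], [1.0, 0.2, 0.4]]
choices = [1, 0]

print(mnl_probs(utilities[0]))
print(mnl_loglik(utilities, choices))
```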

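82 Marginal effects
The marginal effects with respect to individual-specific variables $z_i$ (with alternative-specific coefficients $\gamma_j$) or alternative-specific variables $x_{ij}$ (with generic coefficient $\beta$) are:
$$\frac{\partial P_{ij}}{\partial z_i} = P_{ij}\left(\gamma_j - \sum_{l} P_{il}\gamma_l\right)$$
$$\frac{\partial P_{ij}}{\partial x_{ij}} = \beta P_{ij}(1 - P_{ij})$$
$$\frac{\partial P_{ij}}{\partial x_{il}} = -\beta P_{ij} P_{il}$$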
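The own and cross effects of an alternative-specific variable can be coded and compared with a finite-difference derivative. In this Python sketch the generic coefficient `COEF` and the attribute levels `x` are hypothetical, and utility is assumed to depend on that single attribute only.

```python
import math

def mnl_probs(V):
    v_max = max(V)
    expv = [math.exp(v - v_max) for v in V]
    total = sum(expv)
    return [e / total for e in expv]

COEF = -0.4           # hypothetical generic coefficient (for example, on cost)
x = [2.0, 3.5, 1.0]   # hypothetical attribute level of each alternative

def probs_at(x_levels):
    """Choice probabilities when utility is COEF * attribute level."""
    return mnl_probs([COEF * xl for xl in x_levels])

P = probs_at(x)

def effect(j, l):
    """dP_j / dx_l: own effect if j == l, cross effect otherwise."""
    if j == l:
        return COEF * P[j] * (1.0 - P[j])
    return -COEF * P[j] * P[l]

# Effects of raising the attribute of alternative 0 on all probabilities.
print([effect(j, 0) for j in range(3)])
```

With a negative coefficient, raising the attribute of one alternative lowers its own probability and raises the others by amounts that sum to the same magnitude.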


84 gmnl package in R

85 Formulas in gmnl package
    # load packages and data
    library("gmnl")
    library("mlogit")
    data("TravelMode", package = "AER")
    with(TravelMode, prop.table(table(mode[choice == "yes"])))
    ##
    ## air train bus car
    ##
    head(TravelMode)
    ##   individual mode choice wait vcost travel gcost income size
    ## 1 1 air no
    ## 2 1 train no
    ## 3 1 bus no
    ## 4 1 car yes
    ## 5 2 air no
    ## 6 2 train no

86 Formulas in gmnl package
The data are in long format (one row per available mode). We transform the data into the mlogit format:
    # transform data
    TM <- mlogit.data(TravelMode, choice = "choice", shape = "long",
                      alt.levels = c("air", "train", "bus", "car"))

87 Formulas in gmnl package
Suppose we want to estimate a multinomial logit model where the variables wait and vcost are alternative-specific variables with a generic coefficient $\beta$, income is an individual-specific variable with an alternative-specific coefficient $\gamma_j$, and the variable travel is an alternative-specific variable with an alternative-specific coefficient $\delta_j$.
    f1 <- choice ~ wait + vcost | income | travel

88 Formulas in gmnl package
By default, the alternative-specific constants (ASC) for each alternative are included. They can be omitted by adding +0 or -1 in the second part of the formula.
    f2 <- choice ~ wait + vcost | income + 0 | travel
    f2 <- choice ~ wait + vcost | income - 1 | travel

89 Formulas in gmnl package
Some parts may be omitted when there is no ambiguity. For instance, a model with only individual-specific variables can be specified as follows:
    f2 <- choice ~ 0 | income + size | 0
    f2 <- choice ~ 0 | income + size | 1


91 Example
The dataset Train contains data from a stated preference survey in the Netherlands. Users are asked to choose between two train trips characterized by four attributes:
- price: the price in cents of guilders,
- time: travel time in minutes,
- change: the number of changes,
- comfort: the class of comfort, 0, 1 or 2, with 0 being the most comfortable class.
    data("Train", package = "mlogit")
    Tr <- mlogit.data(Train, shape = "wide", choice = "choice", varying = 4:11,
                      sep = "", alt.levels = c(1, 2), id = "id")

92 Example
We first convert price and time into more meaningful units, euros and hours (1 guilder is euros):
    Tr$price <- Tr$price / 100 *
    Tr$time <- Tr$time / 60

93 Example
Now we estimate the model. Since both alternatives are virtually equal (both are train trips), it is appropriate to use only generic coefficients and to remove the ASC:
    mnl <- gmnl(choice ~ price + time + change + comfort | -1, data = Tr, model = "mnl")

94 Example
    summary(mnl)
    ##
    ## Model estimated on: Thu Oct 05 18:16:
    ##
    ## Call:
    ## gmnl(formula = choice ~ price + time + change + comfort | -1,
    ##     data = Tr, model = "mnl", method = "nr")
    ##
    ## Frequencies of categories:
    ##
    ## 1 2
    ##
    ##
    ## The estimation took: 0h:0m:0s
    ##
    ## Coefficients:
    ##          Estimate Std. Error z-value Pr(>|z|)
    ## price                                < 2.2e-16 ***
    ## time                                 < 2.2e-16 ***
    ## change                               e-08 ***
    ## comfort                              < 2.2e-16 ***
    ## ---
    ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Optimization of log-likelihood by Newton-Raphson maximisation
    ## Log Likelihood:
    ## Number of observations: 2929

95 Example
Coefficients are not directly interpretable, but dividing them by the price coefficient gives the willingness to pay (WTP):
    coef(mnl)[-1] / coef(mnl)[1]
    ## time change comfort
    ##
or we can use
    wtp.gmnl(mnl, wrt = "price")
    ##
    ## Willigness-to-pay respect to: price
    ##
    ##         Estimate Std. Error t-value Pr(>|t|)
    ## time                                < 2.2e-16 ***
    ## change                              e-09 ***
    ## comfort                             < 2.2e-16 ***
    ## ---
    ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We obtain a value of 26 euros for an hour of traveling, 5 euros for a change, and 14 euros to access a more comfortable class.

96 Example
Now we use the Fishing data set.
    data("Fishing", package = "mlogit")
    head(Fishing, 3)
    ##   mode price.beach price.pier price.boat price.charter catch.beach
    ## 1 charter
    ## 2 charter
    ## 3 boat
    ##   catch.pier catch.boat catch.charter income
    ##
    ##
    ##
    Fish <- mlogit.data(Fishing, shape = "wide", varying = 2:9, choice = "mode")
    mnl.fish <- gmnl(mode ~ price | income | catch, data = Fish, model = "mnl")

97 Example
    summary(mnl.fish)
    ##
    ## Model estimated on: Thu Oct 05 18:16:
    ##
    ## Call:
    ## gmnl(formula = mode ~ price | income | catch, data = Fish, model = "mnl",
    ##     method = "nr")
    ##
    ## Frequencies of categories:
    ##
    ## beach boat charter pier
    ##
    ##
    ## The estimation took: 0h:0m:1s
    ##
    ## Coefficients:
    ##                     Estimate Std. Error z-value Pr(>|z|)
    ## boat:(intercept)    e e **
    ## charter:(intercept) e e e-13 ***
    ## pier:(intercept)    e e ***
    ## price               e e < 2.2e-16 ***
    ## boat:income         e e
    ## charter:income      e e
    ## pier:income         e e **
    ## beach:catch         e e e-05 ***
    ## boat:catch          e e e-06 ***
    ## charter:catch       e e e-07 ***
    ## pier:catch          e e ***
    ## ---
    ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Optimization of log-likelihood by Newton-Raphson maximisation
    ## Log Likelihood:
    ## Number of observations: 1182
    ## Number of iterations: 6
    ## Exit of MLE: successive function values within tolerance limit

98 Example
Predicted probabilities for the chosen outcome, or for all alternatives if outcome = FALSE:
    head(fitted(mnl.fish))
    ## 1.beach 2.beach 3.beach 4.beach 5.beach 6.beach
    ##
    head(fitted(mnl.fish, outcome = FALSE))
    ##      beach boat charter pier
    ## [1,]
    ## [2,]
    ## [3,]
    ## [4,]
    ## [5,]
    ## [6,]
