Econometrics II Multinomial Choice Models

Paul Kattuman
Cambridge Judge Business School
February 9, 2018

Last Week: Binary Choice Models (Recap)
- Sample of i.i.d. observations $i = 1, \dots, n$ of the binary dependent variable $Y_i$ and explanatory variables $X_{1i}, \dots, X_{Ki}$.
- The probability that the dependent variable takes the value 1 is modelled as
  $P(Y_i = 1 \mid X_{1i}, \dots, X_{Ki}) = g^{-1}(Z_i) = g^{-1}(\beta_0 + \beta_1 X_{1i} + \dots + \beta_K X_{Ki})$
- $\beta_0$ to $\beta_K$: the $K + 1$ parameters (an intercept and $K$ slopes).
- $Z_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_K X_{Ki}$: the linear predictor.
- The function $g^{-1}$ maps the linear predictor into $[0, 1]$ and generally satisfies $g^{-1}(-\infty) = 0$, $g^{-1}(\infty) = 1$ and $\partial g^{-1}(Z) / \partial Z > 0$.

Last Week: Binary Choice Models (Recap)
- Logit model: the function $g^{-1}$ is the CDF of the logistic distribution. Response probabilities are
  $E(Y_i \mid X_i) = P(Y_i = 1 \mid X_{1i}, \dots, X_{Ki}) = \frac{e^{X_i \beta}}{1 + e^{X_i \beta}} = \frac{1}{1 + e^{-X_i \beta}}$
- Probit model: the function $g^{-1}$ is the CDF of the $N(0, 1)$ distribution
  $E(Y_i \mid X_i) = P(Y_i = 1 \mid X_{1i}, \dots, X_{Ki}) = \Phi(X_i \beta) = \int_{-\infty}^{X_i \beta} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt$
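A minimal sketch of the two response probabilities, using only the Python standard library. The coefficient values and the regressor value are made up for illustration; they do not come from the lecture.

```python
import math

def logit_prob(z):
    """Logistic CDF: P(Y = 1) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_prob(z):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical coefficients (intercept, slope) and one observation.
beta = [-1.0, 0.5]
x = [1.0, 3.0]  # the leading 1.0 multiplies the intercept

# Linear predictor Z_i = beta_0 + beta_1 * X_1i
z = sum(b * v for b, v in zip(beta, x))

p_logit = logit_prob(z)
p_probit = probit_prob(z)
print(p_logit, p_probit)
```

Both functions map the same linear predictor into [0, 1]; they differ only in the assumed distribution of the error.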

Latent Variable Interpretation of Binary Choice
- Random utility interpretation; Probit and Logit are similar.
- Utility ($U$) difference for $i$ between $Y_i = 1$ and $Y_i = 0$:
  $U_{1i} - U_{0i} = Y_i^* = \beta_0 + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + u_i$, with $E(u_i) = 0$
- Utility maximisation: $Y_i = 1$ only if $Y_i^* > 0$, $Y_i = 0$ otherwise. Only the choice $Y_i$ is observed.
- If observations are i.i.d., the explanatory variables are exogenous, and the error $u_i \mid X_i \sim N(0, \sigma_u^2)$ (normal and homoskedastic), then the probability that $i$ chooses $Y_i = 1$ is
  $P(Y_i = 1 \mid X_{1i}, \dots, X_{Ki}) = \Phi\left( \frac{\beta_0}{\sigma_u} + \frac{\beta_1}{\sigma_u} X_{1i} + \dots + \frac{\beta_K}{\sigma_u} X_{Ki} \right)$
- The Probit model: $\sigma_u$ is set to unity (hence the assumption of a standard normal distribution for $u$). Only the ratios $\beta_j / \sigma_u$ are estimated.
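The point that only the ratios $\beta_j / \sigma_u$ are identified can be checked numerically. The sketch below uses hypothetical parameter values: scaling every $\beta_j$ and $\sigma_u$ by the same factor leaves the choice probability unchanged.

```python
import math

def std_normal_cdf(z):
    """Phi(z), the N(0, 1) CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_choice(beta, sigma_u, x):
    """P(Y = 1 | x) when u ~ N(0, sigma_u^2): Phi(x'beta / sigma_u)."""
    z = sum(b * v for b, v in zip(beta, x))
    return std_normal_cdf(z / sigma_u)

x = [1.0, 2.0]                           # constant and one regressor (made up)
p_a = prob_choice([0.2, 0.4], 1.0, x)    # sigma_u = 1
p_b = prob_choice([0.6, 1.2], 3.0, x)    # beta and sigma_u both tripled
print(p_a, p_b)
```

Because the data cannot distinguish the two parameter sets, fixing $\sigma_u = 1$ is a normalisation rather than a substantive restriction.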

Latent Variable Model / Random Utility Model

Multinomial Choice Models: Introduction
- Choice among several (> 2) discrete alternatives
- Assuming Independence of Irrelevant Alternatives (IIA)
- Examples:
  - Transportation: car, bus, railroad, bicycle (unordered choice)
  - Voting: Con, Lab, Libdem (unordered choice)
  - Market survey: product Bad, Fair, Good, Excellent (ordered)
  - Rating of bonds: B, B++, A, A++ (ordered)
- Dependent variable $Y_i \in \{0, 1, \dots, J\}$: individual choice between $J + 1$ numbered, unordered options
- Theory and intuition are similar to binary choice models: a multinomial choice can be viewed as multiple binary choices
- Models: Multinomial Logit and Probit, Conditional Logit, Mixed Logit

Multinomial Choice (among unordered alternatives): Types of Data
- Chooser: the individual who chooses among several alternatives
- Choice: the alternatives / options faced by the chooser
- Three types of models:
  - Models for choice based on chooser-specific data. How do the chooser's characteristics affect choice among alternatives? Regressors vary across choosers (individuals).
  - Models for choice based on choice-specific data. How do the characteristics or features of the various alternatives affect individuals' choice among them? Regressors vary across alternatives.
  - Models for choice based on both chooser-specific and choice-specific data.

Random Utility / Latent Variable Framework with Multiple Choices
- Individuals make the choices that yield the highest utility
- $U_{ji}$: utility of $i$ when choosing option $j$
  - subscripts $j = 0, \dots, J$ index choices
  - subscripts $i = 1, \dots, n$ index individuals (choosers)
- Random utility formulation:
  $U_{ji} = V_{ji}$ (observable utility) $+ \, u_{ji}$ (random utility)
- Observable utility ($V_{ji}$) is related to the observed explanatory variables pertaining to $i$ through choice-specific parameters pertaining to $j$: $V_{ji} = X_i \beta_j$
- Random utility ($u_{ji}$) captures how the utility of choice $j$ varies across individuals $i$ as a random variable

Random Utility / Latent Variable Framework with Multiple Choices
- Individuals make the choices that yield the highest utility
- Utility maximisation: $i$ chooses $j$ if $U_{ji} - U_{si} \geq 0$ for all $s \neq j$
- Idea: compare the utility of every option to a benchmark, allowing for the random utility component
- Any one option can be chosen as the benchmark (option 0)

Random Utility / Latent Variable Framework with Multiple Choices
- Basis for the observed choice of chooser $i$ ($Y_i = j$) is the unobserved, latent variable
  $Y_{ji}^* = U_{ji} - U_{0i} = (V_{ji} + u_{ji}) - (V_{0i} + u_{0i})$
- Observed choice:
  $Y_i = 0$ if $Y_{ji}^* < 0$ for all $j = 1, \dots, J$
  $Y_i = 1$ if $Y_{1i}^* \geq Y_{ji}^*$ for all $j = 1, \dots, J$
  $Y_i = 2$ if $Y_{2i}^* \geq Y_{ji}^*$ for all $j = 1, \dots, J$
  $\vdots$
  $Y_i = J$ if $Y_{Ji}^* \geq Y_{ji}^*$ for all $j = 1, \dots, J$
  (Since $Y_{0i}^* = 0$ by construction, choosing an option $j \geq 1$ also requires $Y_{ji}^* \geq 0$.)
- Underlying model for utility $U$, with observable and unobservable components:
  $U_{ji} = \underbrace{\beta_{j0} + \beta_{j1} X_{1ji} + \beta_{j2} X_{2ji} + \dots + \beta_{jK} X_{Kji}}_{V_{ji}} + u_{ji}$

Random Utility / Latent Variable Framework with Multiple Choices: Multinomial Logit Formulation
- E.g.: what is the probability that option 1 is chosen? Option 1 must beat every other option $j \neq 1$ (including the benchmark $j = 0$, for which $Y_{0i}^* = 0$):
  $\Pr(Y_i = 1) = \Pr(Y_{1i}^* > Y_{ji}^* \ \forall j \neq 1)$
  $= \Pr(Y_{1i}^* - Y_{ji}^* > 0 \ \forall j \neq 1)$
  $= \Pr(\beta_1 X_i + u_{1i} - (\beta_j X_i + u_{ji}) > 0 \ \forall j \neq 1)$
  $= \Pr((\beta_1 - \beta_j) X_i > u_{ji} - u_{1i} \ \forall j \neq 1)$
- Assume that the random utilities $u_{ji}$ independently and identically follow an extreme value distribution with CDF $F(u_{ji}) = e^{-e^{-u_{ji}}}$. Then the difference of two such random variables (e.g., $u_{ji} - u_{1i}$) follows a logistic distribution, which leads to a logit model.
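The link between extreme value errors and logit probabilities can be illustrated by simulation. The sketch below assumes hypothetical systematic utilities $V_0, V_1, V_2$, draws i.i.d. errors with CDF $e^{-e^{-u}}$ by inverse transform, lets each simulated chooser pick the highest-utility option, and compares the simulated shares with the closed form $e^{V_j} / \sum_s e^{V_s}$.

```python
import math
import random

def logit_probs(v):
    """Closed-form logit probabilities from systematic utilities v[0..J]."""
    expv = [math.exp(val) for val in v]
    total = sum(expv)
    return [e / total for e in expv]

def extreme_value_draw(rng):
    """Draw from the CDF exp(-exp(-u)) by inverse transform."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
    return -math.log(-math.log(u))

def simulate_shares(v, n_draws, seed=0):
    """Share of simulated choosers picking each option by utility maximisation."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        utilities = [val + extreme_value_draw(rng) for val in v]
        counts[utilities.index(max(utilities))] += 1
    return [c / n_draws for c in counts]

v = [0.0, 0.5, -0.3]  # hypothetical V_0, V_1, V_2
analytic = logit_probs(v)
simulated = simulate_shares(v, 20000)
print(analytic)
print(simulated)
```

With 20,000 draws the simulated shares should sit close to the analytic probabilities; the remaining gap is sampling noise.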

Random Utility / Latent Variable Framework with Multiple Choices
  $U_{ji} = \beta_{j0} + \beta_{j1} X_{1ji} + \beta_{j2} X_{2ji} + \dots + \beta_{jK} X_{Kji} + u_{ji}$
- $U_{ji}$ is unobserved: the $J + 1$ utility equations cannot be estimated directly
- But the choice $Y_i = j$ arises out of the utility comparisons that $i$ makes: $U_{ji}$ with $U_{si}$ for all options $s$
- Can think of comparing $J$ options with a common benchmark to obtain a model for $\frac{P(Y_i = j)}{P(Y_i = 0)}$ in terms of $X_i$, for $j = 1, \dots, J$: $J$ equations
- Different coefficient sets $\beta_{j0}, \dots, \beta_{jK}$ in the $J$ equations
- $\beta_{jk}$ is indexed by both choice $j$ and regressor $k$
  - choice attributes (e.g. prices) can matter for choice
  - chooser attributes (e.g. age) can matter for choice
- Ideally, explanatory variables for multinomial choice should include attributes of choosers and/or of choices
- It is difficult to include both types of regressors (e.g., age of chooser and price of brand) in the same model

Chooser-specific / Choice-specific Explanatory Variables: MN Logit/Probit and Conditional Logit Models
- E.g.: travellers choose among modes: car, air, train, bus
- MN Logit / MN Probit models: regressors are chooser attributes
  - Explanatory variables only on attributes of the traveller (income, age), not on attributes of travel modes
- Conditional Logit model: regressors are choice attributes
  - Explanatory variables only on attributes of the travel mode (travel cost per mile, say), not on attributes of travellers
- Mixed Nominal Model: for choice based on chooser-specific and choice-specific data

Multinomial Logit and Probit Models
- Recap: equation underlying the Multinomial Logit and Probit models
  $U_{ji} = \beta_{j0} + \beta_{j1} X_{1i} + \beta_{j2} X_{2i} + \dots + \beta_{jK} X_{Ki} + u_{ji}$
- Explanatory variables $X_{ki}$ describe only the chooser's attributes, not attributes of the choices
- Parameters $\beta_{jk}$ differ across choices ($j = 1, \dots, J$)
- MN Logit and MN Probit arise out of different assumptions about the error (random utility, $u_{ji}$)
  - MN Logit: the random utilities independently and identically follow an extreme value distribution with CDF $F(u_{ji}) = e^{-e^{-u_{ji}}}$; the difference of two such error terms follows a logistic distribution (as in logit)
  - MN Probit: the random utilities (not necessarily independent) follow the multivariate normal distribution

Multinomial Logit Model: Probability that chooser $i$ opts for choice $j$
- Model: the log-odds ratio is linear in $X_i \beta_j$ (as in binary logit):
  $\log\left( \frac{\Pr(Y_i = j)}{\Pr(Y_i = 0)} \right) = X_i \beta_j$
  $\frac{\Pr(Y_i = j)}{\Pr(Y_i = 0)} = e^{X_i \beta_j}$
  $\Pr(Y_i = j) = \Pr(Y_i = 0) \, e^{X_i \beta_j}$
- Probabilities add up to 1:
  $\Pr(Y_i = 0) + \sum_{s=1}^{J} \Pr(Y_i = s) = 1$
- Substitute for $\Pr(Y_i = s)$:
  $\Pr(Y_i = 0) + \sum_{s=1}^{J} \Pr(Y_i = 0) \, e^{X_i \beta_s} = 1$

Multinomial Logit Model
  $\Pr(Y_i = 0) + \sum_{s=1}^{J} \Pr(Y_i = 0) \, e^{X_i \beta_s} = 1$
  $\Pr(Y_i = 0) \left( 1 + \sum_{s=1}^{J} e^{X_i \beta_s} \right) = 1$
  $\Pr(Y_i = 0) = \frac{1}{1 + \sum_{s=1}^{J} e^{X_i \beta_s}}$
- The probability of individual $i$ making choice $j$ ($j \neq 0$) is:
  $\Pr(Y_i = j \mid X_i) = \frac{e^{X_i \beta_j}}{1 + \sum_{s=1}^{J} e^{X_i \beta_s}}$
- Without a normalisation, the parameter vectors $\beta_j$ are not uniquely identified: a vector $q$ added to all of them ($\beta_j + q$) cancels in the choice probabilities
- $\beta_j$ is identified by setting $\beta_0 = 0$ for the benchmark / reference choice
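A small sketch of the choice probabilities and of the identification argument, with made-up coefficients. The helper `mnl_probs` is a hypothetical name; it takes one coefficient vector per choice, including the reference choice, so that the effect of adding a common vector $q$ to every $\beta_j$ can be checked directly.

```python
import math

def mnl_probs(x, betas):
    """Choice probabilities; betas[j] is the coefficient vector of choice j = 0..J."""
    scores = [sum(b * v for b, v in zip(beta_j, x)) for beta_j in betas]
    m = max(scores)  # subtracting the max avoids overflow and cancels in the ratio
    expv = [math.exp(s - m) for s in scores]
    total = sum(expv)
    return [e / total for e in expv]

x = [1.0, 2.0]                                   # constant and one regressor (made up)
betas = [[0.0, 0.0], [0.3, -0.2], [-0.5, 0.4]]   # beta_0 = 0 for the reference choice
q = [1.5, -0.7]                                  # arbitrary vector added to every beta_j
shifted = [[b + qk for b, qk in zip(beta_j, q)] for beta_j in betas]

p = mnl_probs(x, betas)
p_shifted = mnl_probs(x, shifted)
print(p)
print(p_shifted)
```

The two probability vectors coincide, which is why one coefficient vector has to be pinned down (here $\beta_0 = 0$) before the others can be estimated.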

Multinomial Logit Model
- Note: sum of probabilities = 1, so the $J + 1$ choice probabilities (for each $i$) $[\Pr(Y_i = 0), \Pr(Y_i = 1), \dots, \Pr(Y_i = J)]$ are not independent of each other
- Hence one choice (labelled 0) is set as the base / reference:
  $Y_{ji}^* = U_{ji} - U_{0i}$
  $U_{ji} = \beta_{j0} + \beta_{j1} X_{1i} + \beta_{j2} X_{2i} + \dots + \beta_{jK} X_{Ki} + u_{ji}$
  $U_{0i} = \beta_{00} + \beta_{01} X_{1i} + \beta_{02} X_{2i} + \dots + \beta_{0K} X_{Ki} + u_{0i}$
- For the reference choice 0, we set $\beta_{0k} = 0$ for all $k = 0, 1, \dots, K$:
  $\Pr(Y_i = 0) = \frac{e^{X_i \cdot 0}}{1 + \sum_{s=1}^{J} e^{X_i \beta_s}} = \frac{1}{1 + \sum_{s=1}^{J} e^{X_i \beta_s}}$

Multinomial Logit Model: Example
- Type of college = f(chooser characteristics)
- Source: Gujarati, Econometrics by Example, Chapter 9
- Figure: Choice options: no college (reference: j = 1), 2-year college (j = 2), 4-year college (j = 3). Explanatory variables are all chooser characteristics. Coefficients apply to the log odds for choices 2 and 3 over choice 1.

Multinomial Logit Model: Example (continued)
- Source: Gujarati, Econometrics by Example, Chapter 9

Pros and Cons of Multinomial Logit
- Recall: the random utility of choice $j$ ($u_{j\cdot}$) is different in each of the $J$ equations
- The MNL model assumes that the errors in the different equations are uncorrelated with one another
- This is a strong assumption: it implies that the choice probabilities satisfy the Independence of Irrelevant Alternatives (IIA) property

Pros and Cons of Multinomial Logit: Implication of IIA
- Consider the ratio of the probabilities of two choices $j$ and $k$:
  $\frac{\Pr(Y_i = j \mid X_i)}{\Pr(Y_i = k \mid X_i)} = \frac{e^{X_i \beta_j} / \left(1 + \sum_{s=1}^{J} e^{X_i \beta_s}\right)}{e^{X_i \beta_k} / \left(1 + \sum_{s=1}^{J} e^{X_i \beta_s}\right)} = \frac{e^{X_i \beta_j}}{e^{X_i \beta_k}} = e^{X_i (\beta_j - \beta_k)}$
- Implying: when choosing between the two options, the other options (other than $j$ and $k$) do not matter
- In particular, relative to the reference choice:
  $\frac{\Pr(Y_i = j \mid X_i)}{\Pr(Y_i = 0 \mid X_i)} = \frac{e^{X_i \beta_j}}{e^{X_i \cdot 0}} = e^{X_i \beta_j}$
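The cancellation of the denominator can be seen numerically. In this sketch the values of $X_i \beta_j$ are hypothetical; the ratio of two choice probabilities is computed with four alternatives and again after one alternative is dropped from the choice set.

```python
import math

def logit_probs(v):
    """Logit probabilities from index values v[j] = X_i beta_j."""
    expv = [math.exp(val) for val in v]
    total = sum(expv)
    return [e / total for e in expv]

v = [0.0, 0.8, -0.4, 1.1]        # hypothetical X_i beta_j for j = 0..3, with beta_0 = 0
p_full = logit_probs(v)          # all four alternatives available
p_reduced = logit_probs(v[:3])   # alternative 3 removed from the choice set

ratio_full = p_full[1] / p_full[2]
ratio_reduced = p_reduced[1] / p_reduced[2]
ratio_formula = math.exp(v[1] - v[2])  # exp(X_i (beta_1 - beta_2))
print(ratio_full, ratio_reduced, ratio_formula)
```

All three numbers agree: the individual probabilities change when the choice set changes, but their ratio does not.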

Independence of Irrelevant Alternatives: Transportation Example
- Suppose that initially a commuter has a choice between car ($Y = 0$) and public transport ($Y = 1$)
- The IIA property relates to the odds ratio $\frac{\Pr(Y = 1)}{\Pr(Y = 0)}$
- IIA implies that this odds ratio will be the same, regardless of what the other options are
- E.g., suppose that initially $\frac{\Pr(Y = 1)}{\Pr(Y = 0)} = 1$
  - The commuter is equally likely to take the car or public transport
  - It must be that $\Pr(Y = 1) = \Pr(Y = 0) = 0.5$

Independence of Irrelevant Alternatives: Example where IIA is satisfied
- Now suppose a bicycle lane is constructed: commuters can cycle to work ($Y = 2$ becomes an option)
- IIA implies that the addition of the new alternative does not alter the fact that $\frac{\Pr(Y = 1)}{\Pr(Y = 0)} = 1$
- Reasonable? Possibly yes, in this case
  - E.g., if 20% of commuters start cycling, and these cyclists are drawn equally from the other two options
  - Could end up with $\Pr(Y = 2) = 0.2$ and $\Pr(Y = 1) = \Pr(Y = 0) = 0.40$
  - This still implies $\frac{\Pr(Y = 1)}{\Pr(Y = 0)} = 1$: IIA is satisfied

Red Bus-Blue Bus Problem: Example where IIA is not satisfied
- The original choice is between car ($Y = 0$) and a Red Bus ($Y = 1$)
- Suppose $\Pr(Y = 0) = \Pr(Y = 1) = 1/2$, thus $\frac{\Pr(Y = 1)}{\Pr(Y = 0)} = 1$: equally likely to take car or bus
- The company paints half the buses Blue: a new option ($Y = 2$)
- The Blue Bus is virtually identical to the Red Bus. The new option leaves you just as likely to choose the car (the choice between Red Bus and car is not independent of the introduction of the Blue Bus)
- Implying $\Pr(Y = 0) = 0.5$ and $\Pr(Y = 1) = \Pr(Y = 2) = 0.25$
- $\frac{\Pr(Y = 1)}{\Pr(Y = 0)} = 0.5$: the odds ratio is changed by the new option
- This violates IIA and is not allowed for by MN Logit
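The contrast can be made concrete. The sketch below assumes, for illustration only, that car and bus have equal systematic utility and that the Blue Bus enters a multinomial logit with the same utility as the Red Bus. The logit then predicts one third for each option, whereas the intuitive outcome described above is (0.5, 0.25, 0.25).

```python
import math

def logit_probs(v):
    """Logit probabilities from systematic utilities."""
    expv = [math.exp(val) for val in v]
    total = sum(expv)
    return [e / total for e in expv]

v_car, v_bus = 0.0, 0.0                          # assumed equal systematic utilities
before = logit_probs([v_car, v_bus])             # car, red bus
after_mnl = logit_probs([v_car, v_bus, v_bus])   # car, red bus, blue bus
intuitive = [0.5, 0.25, 0.25]                    # car share unchanged, buses split

odds_mnl = after_mnl[1] / after_mnl[0]           # red bus vs car under logit
odds_intuitive = intuitive[1] / intuitive[0]     # red bus vs car, intuitive outcome
print(before, after_mnl, odds_mnl, odds_intuitive)
```

The logit keeps the Red Bus to car odds at 1 by pulling the car share down to one third, which is the implausible prediction the example is meant to expose.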

Is IIA reasonable?
- It depends on your empirical application: sometimes it is reasonable, but other times not
- There are extensions of multinomial logit models (not covered in this course) that do not have the restrictive IIA property
  - Nested logit model and mixed logit model
- All of these alternatives are computationally more demanding

Multinomial Probit Models: Pros and Cons
- The errors in the different equations could be correlated with one another (some choices are more similar than others?)
- E.g., if the choice is between 3 options, there are two regressions involving the utility differences $Y_{1i}^*$ and $Y_{2i}^*$
- The multivariate normal distribution allows for $\mathrm{corr}(u_{1i}, u_{2i}) \neq 0$
- The multivariate normal distribution leads to Multinomial Probit, which is flexible
- But the number of correlations is $J(J - 1)/2$, which may grow large
- MN Probit is computationally intensive because all correlations must be estimated: typically used only if the number of options is relatively small (say, 3)
- We do not pursue MN Probit in this course

Multinomial Logit Model: Interpretation
- Relative risk = odds ratio = (probability of choice $j$) / (probability of the reference choice)
  $RR = \frac{\Pr(Y = j \mid X_i)}{\Pr(Y = 0 \mid X_i)} = e^{X_i \beta_j}$
  $\log\left( \frac{\Pr(Y = j \mid X_i)}{\Pr(Y = 0 \mid X_i)} \right) = X_i \beta_j$
  $\frac{\partial \log(\Pr_j / \Pr_0)}{\partial X_k} = \beta_{jk}$
- A positive and significant coefficient $\beta_{jk}$ implies that the relative probability of choice $j$ over the benchmark increases with variable $k$
- Comparing choice $j$ with choice $l$:
  $\frac{\partial \log(\Pr_j / \Pr_l)}{\partial X_k} = \beta_{jk} - \beta_{lk}$

Multinomial Logit Model: Interpretation: Relative Risk Ratio (RRR)
- How is the relative risk of $j$ affected by a unit change in $X_k$?
- $RRR_j$ w.r.t. regressor $X_k$: ratio of the relative risk of $j$ at $X_k = x_k + 1$ to the relative risk of $j$ at $X_k = x_k$
  $RRR_j \text{ w.r.t. } X_k = \frac{\Pr(Y = j \mid X_1, \dots, X_k + 1, \dots, X_K) \, / \, \Pr(Y = 0 \mid X_1, \dots, X_k + 1, \dots, X_K)}{\Pr(Y = j \mid X_1, \dots, X_k, \dots, X_K) \, / \, \Pr(Y = 0 \mid X_1, \dots, X_k, \dots, X_K)}$
  $= \frac{e^{\beta_{j0} + \beta_{j1} X_1 + \dots + \beta_{jk} (X_k + 1) + \dots + \beta_{jK} X_K}}{e^{\beta_{j0} + \beta_{j1} X_1 + \dots + \beta_{jk} X_k + \dots + \beta_{jK} X_K}} = e^{\beta_{jk}}$
- Exponentiated coefficients are the RRRs for the corresponding regressors
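A quick numerical check of the RRR result, using invented coefficients and regressor values. The relative risk of choice 1 is computed at $X_1 = 2$ and at $X_1 = 3$, and the ratio of the two is compared with $e^{\beta_{11}}$.

```python
import math

def mnl_probs(x, betas):
    """Choice probabilities; betas[0] is the zero vector of the reference choice."""
    scores = [sum(b * v for b, v in zip(beta_j, x)) for beta_j in betas]
    expv = [math.exp(s) for s in scores]
    total = sum(expv)
    return [e / total for e in expv]

def relative_risk(x, betas, j):
    """Pr(Y = j | x) / Pr(Y = 0 | x)."""
    p = mnl_probs(x, betas)
    return p[j] / p[0]

# Hypothetical coefficients: rows are choices 0, 1, 2; columns are constant, X_1, X_2.
betas = [[0.0, 0.0, 0.0], [0.2, 0.5, -0.1], [-0.3, 0.1, 0.4]]
x = [1.0, 2.0, 1.0]
x_plus = [1.0, 3.0, 1.0]  # X_1 raised by one unit, everything else unchanged

rrr = relative_risk(x_plus, betas, 1) / relative_risk(x, betas, 1)
print(rrr, math.exp(betas[1][1]))
```

The result does not depend on the starting value of $X_1$ or on the other regressors, which is what makes the RRR a convenient summary.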

Multinomial Logit Model: Interpretation: Marginal Effect
- Marginal effect of a regressor $X_k$ on the probability of choice $j$:
  $\frac{\partial \Pr(Y_i = j)}{\partial X_k} = \frac{\partial \Pr_{ij}}{\partial X_k} = \widehat{\Pr}_{ij} \left( \hat\beta_{jk} - \sum_{h=1}^{J} \widehat{\Pr}_{ih} \hat\beta_{hk} \right)$
- The magnitude depends not only on the parameter $\beta_{jk}$ but also on the other parameters
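The marginal effect formula can be checked against a finite-difference derivative. The coefficients below are invented, and chosen so that $\beta_{1k} > 0$ while the marginal effect on choice 1 is negative, because the coefficient of a competing choice is larger.

```python
import math

def mnl_probs(x, betas):
    """Choice probabilities; betas[0] is the zero vector of the reference choice."""
    scores = [sum(b * v for b, v in zip(beta_j, x)) for beta_j in betas]
    expv = [math.exp(s) for s in scores]
    total = sum(expv)
    return [e / total for e in expv]

def marginal_effect(x, betas, j, k):
    """Pr_j * (beta_jk - sum_h Pr_h * beta_hk); the h = 0 term is zero."""
    p = mnl_probs(x, betas)
    weighted = sum(p[h] * betas[h][k] for h in range(len(betas)))
    return p[j] * (betas[j][k] - weighted)

def numeric_effect(x, betas, j, k, step=1e-6):
    """Central finite difference of Pr_j with respect to x[k]."""
    up = list(x)
    down = list(x)
    up[k] += step
    down[k] -= step
    return (mnl_probs(up, betas)[j] - mnl_probs(down, betas)[j]) / (2.0 * step)

betas = [[0.0, 0.0], [0.1, 0.2], [0.0, 1.0]]  # hypothetical; column 1 is the regressor
x = [1.0, 1.0]

me_1 = marginal_effect(x, betas, 1, 1)
fd_1 = numeric_effect(x, betas, 1, 1)
effects = [marginal_effect(x, betas, j, 1) for j in range(3)]
print(me_1, fd_1, sum(effects))
```

The marginal effects across all choices sum to zero, since the probabilities must continue to add up to one.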

Multinomial Logit Model: Interpretation: Predicted probability of choice
- Can calculate the predicted probability of each option for each chooser / type of chooser: $\widehat{\Pr}(Y_i = j)$ for $i = 1, \dots, N$ and $j = 0, \dots, J$
  $\widehat{\Pr}(Y_i = j \mid X_i) = \frac{e^{X_i \hat\beta_j}}{1 + \sum_{s=1}^{J} e^{X_i \hat\beta_s}}$

Multinomial Logit Model: Estimation
- Estimation using Maximum Likelihood
- The log likelihood function:
  $\log L = \sum_{i=1}^{N} \sum_{j=0}^{J} d_{ij} \log \Pr_{ij}$
  where $d_{ij} = 1$ if $i$ chooses option $j$ and $d_{ij} = 0$ otherwise, and
  $\Pr_{ij} = \Pr(Y_i = j \mid X_i) = \frac{e^{X_i \beta_j}}{1 + \sum_{s=1}^{J} e^{X_i \beta_s}}$, with $\beta_0 = 0$
- The ML estimator of $\beta$ is consistent, asymptotically efficient and asymptotically normally distributed
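A sketch of how the log likelihood is evaluated for given parameters. The four observations and both coefficient sets are invented for illustration, and no optimiser is included: in practice the function would be passed to a numerical maximiser.

```python
import math

def mnl_probs(x, betas):
    """Choice probabilities; betas[0] is the zero vector of the reference choice."""
    scores = [sum(b * v for b, v in zip(beta_j, x)) for beta_j in betas]
    expv = [math.exp(s) for s in scores]
    total = sum(expv)
    return [e / total for e in expv]

def log_likelihood(X, y, betas):
    """Sum of log Pr(Y_i = y_i | X_i); the indicator d_ij selects the chosen option."""
    return sum(math.log(mnl_probs(x_i, betas)[y_i]) for x_i, y_i in zip(X, y))

# Tiny made-up sample: a constant plus one regressor; choices coded 0, 1, 2.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, -0.2], [1.0, 2.0]]
y = [0, 2, 1, 2]

betas_zero = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]   # every option has probability 1/3
betas_alt = [[0.0, 0.0], [0.1, -0.3], [-0.5, 0.9]]  # hypothetical candidate values

ll_zero = log_likelihood(X, y, betas_zero)
ll_alt = log_likelihood(X, y, betas_alt)
print(ll_zero, ll_alt)
```

Maximum likelihood searches over the $\beta_j$ ($j = 1, \dots, J$) for the values that make this sum as large as possible; here the candidate values fit the made-up sample better than the all-zero benchmark.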

Multinomial Logit Model: Tests
- MLE-based tests (in the Lab):
  - The model as a whole is significant
  - A variable has no effect
  - Two choices can be combined

References
- J. Scott Long (1997), Regression Models for Categorical and Limited Dependent Variables, Sage
Optional:
- Gujarati (2014), Econometrics by Example, Chapter 9 (Basic)
- Wooldridge (2010), Econometric Analysis of Cross Section and Panel Data, Chapter 15
- Greene (2003), Econometric Analysis, Chapter 21
Paper:
- Bruno Cassiman and Reinhilde Veugelers (2006), In Search of Complementarity in Innovation Strategy: Internal R&D and External Knowledge Acquisition, Management Science, Vol. 52, No. 1, pp. 68-82