Lecture 1: Logit. Quantitative Methods for Economic Analysis. Seyed Ali Madani Zadeh and Hosein Joshaghani. Sharif University of Technology

February 2017

Road map

1. Discrete Choice Models
2. Binary Logit
3. Logit
4. Power and Limitations of Logit
   - Taste Variation
   - Independence from Irrelevant Alternatives
   - Dynamic models
5. Derivatives and Elasticities
6. Estimation: Maximum Likelihood (ML)
7. Review of a few case studies
   - Transportation: McFadden et al. (1977)
   - Health:
   - Education:
   - Labor: Card (2001) JLE, Sicherman (1991) JLE

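General Discrete Choice Model

Discrete choice models are usually derived under an assumption of utility-maximizing behavior by the decision maker.

Utility = representative utility (observable) + error (unobservable):

$$U_{ni} = V_{ni} + \varepsilon_{ni}$$

Choice probability:

$$
\begin{aligned}
P_{ni} &= \Pr(U_{ni} > U_{nj}\ \ \forall j \neq i) \\
       &= \Pr(V_{ni} + \varepsilon_{ni} > V_{nj} + \varepsilon_{nj}\ \ \forall j \neq i) \\
       &= \Pr(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj}\ \ \forall j \neq i) \\
       &= \int_{\varepsilon} I(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj}\ \ \forall j \neq i)\, f(\varepsilon)\, d\varepsilon
\end{aligned}
$$

where $I(\cdot)$ is an indicator function, $\varepsilon = \varepsilon_{nj} - \varepsilon_{ni}$, and $f(\cdot)$ is the pdf of $\varepsilon$.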
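
The integral of the indicator function can be approximated by simulation for any error distribution: draw the errors, record which alternative has the highest utility, and average. The sketch below is only an illustration under stated assumptions; the function name `simulate_choice_probs`, the representative utilities, and the choice of iid standard normal errors are hypothetical and not taken from the lecture.

```python
import random

def simulate_choice_probs(v, draw_error, n_draws, rng):
    """Approximate P_ni = Prob(U_ni > U_nj for all j != i) by Monte Carlo.

    v          : list of representative utilities V_ni, one per alternative
    draw_error : function taking an rng and returning one error draw
    """
    counts = [0] * len(v)
    for _ in range(n_draws):
        # Utility = representative utility + error, drawn independently
        utilities = [v_j + draw_error(rng) for v_j in v]
        chosen = max(range(len(v)), key=lambda j: utilities[j])
        counts[chosen] += 1  # the indicator equals 1 for the chosen alternative
    return [c / n_draws for c in counts]

rng = random.Random(1)
v = [1.0, 0.5, 0.0]  # hypothetical representative utilities
probs = simulate_choice_probs(v, lambda r: r.gauss(0.0, 1.0), 50_000, rng)
print([round(p, 3) for p in probs])
```

With normal errors there is no closed form for these probabilities, which is one reason the closed-form logit case is convenient.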

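Binary Logit Model

Only two choices: $i$ is either 1 or 2.

Type I Extreme Value (Gumbel) distribution:

$$f(\varepsilon_{ni}) = e^{-\varepsilon_{ni}}\, e^{-e^{-\varepsilon_{ni}}}, \qquad F(\varepsilon_{ni}) = e^{-e^{-\varepsilon_{ni}}}$$

The difference between two extreme value variables is distributed logistic. That is, if $\varepsilon_{ni}$ and $\varepsilon_{nj}$ are iid extreme value, then $\varepsilon = \varepsilon_{nj} - \varepsilon_{ni}$ follows the logistic distribution (proof: see problem set 2!):

$$f_{\varepsilon}(s) = \frac{e^{s}}{(1 + e^{s})^{2}}, \qquad F_{\varepsilon}(s) = \frac{e^{s}}{1 + e^{s}}$$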
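
The claim that the difference of two iid extreme value draws is logistic can be checked numerically (this is a sanity check, not the proof asked for in the problem set). The sketch below draws Gumbel variates by inverting $F$ and compares the empirical CDF of the difference with the logistic CDF; the helper names, seed, and evaluation points are illustrative assumptions.

```python
import math
import random

def draw_gumbel(rng):
    """Draw from the type I extreme value distribution by inverting F(x) = exp(-exp(-x))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def logistic_cdf(s):
    """F(s) = e^s / (1 + e^s), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-s))

rng = random.Random(0)
n_draws = 100_000
diffs = [draw_gumbel(rng) - draw_gumbel(rng) for _ in range(n_draws)]

for s in (-2.0, -0.5, 0.0, 1.0, 2.5):
    empirical = sum(d < s for d in diffs) / n_draws
    print(f"s={s:+.1f}  empirical={empirical:.4f}  logistic={logistic_cdf(s):.4f}")
```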

Binary Logit Model

Closed-form solution for the choice probability:

$$P_{ni} = F_{\varepsilon}(V_{ni} - V_{nj}) = \frac{e^{V_{ni} - V_{nj}}}{1 + e^{V_{ni} - V_{nj}}}$$

Linear assumption: $V_{ni} = X_{ni}\beta$

$$P_{ni} = \frac{e^{(X_{ni} - X_{nj})\beta}}{1 + e^{(X_{ni} - X_{nj})\beta}}, \qquad P_{nj} = 1 - P_{ni}$$
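
A minimal sketch of the closed-form binary logit probability under the linear assumption. The function name `binary_logit_prob`, the two attributes (cost and time), and all numbers are hypothetical, chosen only to illustrate the formula.

```python
import math

def binary_logit_prob(x_i, x_j, beta):
    """P_ni = exp((X_ni - X_nj) beta) / (1 + exp((X_ni - X_nj) beta))."""
    v_diff = sum(b * (a - c) for a, c, b in zip(x_i, x_j, beta))
    # 1 / (1 + exp(-d)) is algebraically the same as exp(d) / (1 + exp(d))
    return 1.0 / (1.0 + math.exp(-v_diff))

beta = [-0.1, -0.05]   # hypothetical coefficients on cost and travel time
x_car = [4.0, 20.0]    # hypothetical cost and time for car
x_bus = [2.0, 35.0]    # hypothetical cost and time for bus

p_car = binary_logit_prob(x_car, x_bus, beta)
p_bus = binary_logit_prob(x_bus, x_car, beta)
print(round(p_car, 3), round(p_bus, 3))
```

Only attribute differences between the two alternatives enter the probability, so any variable that takes the same value for both alternatives drops out.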

Binary Logit Model

Two main assumptions:
- Distributional assumption: error differences are distributed logistic.
- Independence: the unobserved portion of utility for one alternative is unrelated to the unobserved portion of utility for another alternative.

The ultimate goal of the researcher is to represent utility so well that the only remaining aspects constitute simply white noise; that is, the goal is to specify utility well enough that a logit model is appropriate.

Seen in this way, the logit model is the ideal rather than a restriction.

Binary Logit Model: Misspecification

If you think that the unobserved portion of utility is correlated over alternatives given your specification of representative utility, then you have three options:
1. use a different model that allows for correlated errors, such as those described later (Probit, Mixed Logit, GEV),
2. respecify representative utility so that the source of the correlation is captured explicitly and thus the remaining errors are independent,
3. use the logit model under the current specification of representative utility, considering the model to be an approximation.

Binary Logit Model: Improvements

Improving bus service in areas where the service is so poor that few travelers take the bus would be less effective than making the same improvement in areas where bus service is already sufficiently good to induce a moderate share of travelers to choose it (but not so good that nearly everyone does).
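
The statement above reflects the S-shape of the logit curve: the same increase in representative utility moves the choice probability most when the baseline share is moderate. A small sketch, with hypothetical utility differences and a hypothetical size of the improvement:

```python
import math

def logit_share(v_diff):
    """Binary logit share of the bus, given V_bus - V_car."""
    return 1.0 / (1.0 + math.exp(-v_diff))

def share_gain(v_diff, improvement):
    """Increase in the bus share when V_bus rises by `improvement`."""
    return logit_share(v_diff + improvement) - logit_share(v_diff)

improvement = 0.5  # hypothetical gain in the representative utility of the bus
for label, v_diff in (("poor service", -3.0),
                      ("moderate service", 0.0),
                      ("very good service", 3.0)):
    print(f"{label:18s} baseline={logit_share(v_diff):.3f} "
          f"gain={share_gain(v_diff, improvement):.3f}")
```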

Multinomial Logit Model: McFadden (1974)

Given independence, for each given $\varepsilon_{ni}$, the probability that decision maker $n$ chooses $i$ is:

$$P_{ni} \mid \varepsilon_{ni} = \Pr(\varepsilon_{nj} < \varepsilon_{ni} + V_{ni} - V_{nj}\ \ \forall j \neq i) = \prod_{j \neq i} e^{-e^{-(\varepsilon_{ni} + V_{ni} - V_{nj})}}$$

Notice the role of the independence and distributional assumptions.

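Multinomial Logit Model: McFadden (1974)

Of course, $\varepsilon_{ni}$ is not given, so the choice probability is

$$P_{ni} = \int \left( \prod_{j \neq i} e^{-e^{-(\varepsilon_{ni} + V_{ni} - V_{nj})}} \right) e^{-\varepsilon_{ni}}\, e^{-e^{-\varepsilon_{ni}}}\, d\varepsilon_{ni} = \frac{e^{V_{ni}}}{\sum_j e^{V_{nj}}}$$

Proof: see problem set 2!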
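
The closed-form logit probabilities are straightforward to compute. The sketch below subtracts the largest utility before exponentiating, a common numerical safeguard that leaves the probabilities unchanged because only utility differences matter. The function name `logit_probs` and the utilities are illustrative assumptions.

```python
import math

def logit_probs(v):
    """Logit choice probabilities P_ni = exp(V_ni) / sum_j exp(V_nj)."""
    v_max = max(v)  # shifting all utilities by a constant does not change P
    exp_v = [math.exp(x - v_max) for x in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]

v = [1.0, 0.5, -0.3]  # hypothetical representative utilities
p = logit_probs(v)
print([round(x, 3) for x in p])
```

With two alternatives this reduces to the binary logit formula $e^{V_{ni} - V_{nj}} / (1 + e^{V_{ni} - V_{nj}})$.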

Identification: The Scale Parameter

So far, we have assumed that $\varepsilon_{ni}$ has a Gumbel distribution, whose variance is $\pi^2/6$. What if in our model $\varepsilon_{ni}$ has a different variance, say $\sigma^2 \pi^2 / 6$?

$$U_{ni} = V_{ni} + \varepsilon_{ni}$$

Since the scale of utility is irrelevant to behavior, utility can be divided by $\sigma$ without changing behavior. The choice probabilities are

$$P_{ni} = \frac{e^{(\beta'/\sigma)\, x_{ni}}}{\sum_j e^{(\beta'/\sigma)\, x_{nj}}}$$

The parameter $\sigma$ is called the scale parameter.

Identification: The Scale Parameter

- Only $\beta/\sigma$ can be estimated; $\beta$ and $\sigma$ are NOT separately identified.
- The parameters $\beta$ are estimated, but these estimated parameters are actually estimates of the original coefficients $\beta$ divided by the scale parameter $\sigma$.
- The coefficients that are estimated indicate the effect of each observed variable relative to the variance of the unobserved factors.
- A larger variance in unobserved factors leads to smaller coefficients, even if the observed factors have the same effect on utility.
- Poor specification $\Rightarrow$ higher variance of unobserved factors $\Rightarrow$ smaller coefficients.
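
A small numerical illustration of why $\beta$ and $\sigma$ are not separately identified: doubling both leaves every choice probability unchanged, while raising $\sigma$ alone pulls the probabilities toward equal shares. The attribute values, coefficients, and function names below are hypothetical.

```python
import math

def logit_probs(v):
    v_max = max(v)
    exp_v = [math.exp(x - v_max) for x in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]

def probs_with_scale(beta, sigma, x):
    """Logit probabilities with utilities (beta / sigma)' x_ni."""
    v = [sum(b * x_k for b, x_k in zip(beta, x_i)) / sigma for x_i in x]
    return logit_probs(v)

x = [[1.0, 2.0], [0.5, 3.0], [2.0, 1.0]]  # hypothetical attributes, 3 alternatives
beta = [0.6, -0.4]                         # hypothetical original coefficients

p_base = probs_with_scale(beta, 1.0, x)
p_rescaled = probs_with_scale([2.0 * b for b in beta], 2.0, x)  # same beta / sigma
p_noisier = probs_with_scale(beta, 4.0, x)  # same beta, more unobserved variance

print([round(p, 3) for p in p_base])
print([round(p, 3) for p in p_rescaled])
print([round(p, 3) for p in p_noisier])
```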

Power and Limitations of Logit

1. Systematic taste variation.
2. Proportional substitution across alternatives: the IIA assumption.
3. Dynamic models: unobserved factors have to be independent over time in repeated choice situations.

Taste Variation

Logit can represent systematic taste variation, that is, taste variation that relates to observed characteristics of the decision maker:
- larger families prefer larger cars
- families with more kids prefer homes close to schools

but not random taste variation, that is, differences in tastes that cannot be linked to observed characteristics:
- families with more frequent road trips prefer larger cars
- more religious families prefer homes closer to mosques

Taste Variation

If taste variation is at least partly random, logit is a misspecification.

As an approximation, logit might be able to capture the average tastes fairly well even when tastes are random, since the logit formula seems to be fairly robust to misspecifications.

The researcher might therefore choose to use logit even when he knows that tastes have a random component, for the sake of simplicity.

Independence from Irrelevant Alternatives

When the attributes of one alternative improve (e.g., its price drops), the probability of its being chosen rises.

When a cell-phone manufacturer launches a new product with extra features, the firm is interested in knowing the extent to which the new product will draw customers away from its other cell phones rather than from competitors' phones.

The logit model implies a certain pattern of substitution across alternatives. If substitution actually occurs in this way given the researcher's specification of representative utility, then the logit model is appropriate.

To allow for more general patterns of substitution and to investigate which pattern is most accurate, more flexible models are needed.

Independence from Irrelevant Alternatives

The relative odds of choosing $i$ over $k$ are the same no matter what other alternatives are available or what the attributes of the other alternatives are:

$$\frac{P_{ni}}{P_{nk}} = \frac{e^{V_{ni}} / \sum_j e^{V_{nj}}}{e^{V_{nk}} / \sum_j e^{V_{nj}}} = \frac{e^{V_{ni}}}{e^{V_{nk}}} = e^{V_{ni} - V_{nk}} = e^{(X_{ni} - X_{nk})\beta}$$

Since the ratio is independent of alternatives other than $i$ and $k$, it is said to be independent from irrelevant alternatives.

The IIA property is realistic in some choice situations but not in others; the classic counterexample is the red-bus/blue-bus problem.
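
The red-bus/blue-bus problem can be reproduced in a few lines. Suppose, hypothetically, that car and blue bus have equal representative utility, and a red bus identical to the blue bus is then added. Logit keeps the car-to-blue-bus odds fixed, so the car share falls from 1/2 to 1/3, whereas intuition suggests the two buses should simply split the original bus share. The utilities below are illustrative assumptions.

```python
import math

def logit_probs(v):
    v_max = max(v)
    exp_v = [math.exp(x - v_max) for x in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]

v_car, v_blue_bus = 0.5, 0.5  # hypothetical: equal representative utility
v_red_bus = v_blue_bus        # the red bus is identical to the blue bus

p_two = logit_probs([v_car, v_blue_bus])
p_three = logit_probs([v_car, v_blue_bus, v_red_bus])

ratio_two = p_two[0] / p_two[1]        # odds of car versus blue bus, two alternatives
ratio_three = p_three[0] / p_three[1]  # same odds after the red bus is added

print(p_two, p_three)
print(ratio_two, ratio_three, math.exp(v_car - v_blue_bus))
```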

Panel Data

Derivatives and Elasticities

To what extent do these probabilities change in response to a change in some observed factor?
1. To what extent will the probability of choosing a given car increase if the vehicle's fuel efficiency is improved?
2. To what extent will the probability of households choosing, say, a Toyota decrease if the fuel efficiency of a Honda improves?

$$\frac{\partial P_{ni}}{\partial z_{ni}} = \frac{\partial V_{ni}}{\partial z_{ni}}\, P_{ni}\,(1 - P_{ni})$$

Proof: see problem set 2!

What is this partial derivative if utility is linear in $z_{ni}$?

This derivative is largest when $P_{ni} = 1/2$. Interpret.
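
The own-derivative formula can be checked against a finite difference. The sketch assumes utility is linear in the attribute, $V_{n1} = \beta_z z_{n1}$, so that $\partial V_{n1} / \partial z_{n1} = \beta_z$; the coefficient, attribute level, and the other alternatives' utilities are hypothetical.

```python
import math

def logit_probs(v):
    v_max = max(v)
    exp_v = [math.exp(x - v_max) for x in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]

def prob_of_first(z_first, beta_z, other_utils):
    """P_n1 when V_n1 = beta_z * z_first and the other V_nj are held fixed."""
    return logit_probs([beta_z * z_first] + list(other_utils))[0]

beta_z = 0.8               # hypothetical coefficient on z
z = 1.5                    # hypothetical attribute level of alternative 1
other_utils = [0.4, -0.2]  # hypothetical utilities of the other alternatives

p = prob_of_first(z, beta_z, other_utils)
analytic = beta_z * p * (1.0 - p)  # (dV/dz) * P * (1 - P)

h = 1e-6  # step for the central finite difference
numeric = (prob_of_first(z + h, beta_z, other_utils)
           - prob_of_first(z - h, beta_z, other_utils)) / (2.0 * h)

print(analytic, numeric)
```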

Derivatives and Elasticities

Cross derivatives:

$$\frac{\partial P_{ni}}{\partial z_{nj}} = -\frac{\partial V_{nj}}{\partial z_{nj}}\, P_{ni}\, P_{nj}$$

Proof: see problem set 2!

What is this partial derivative if utility is linear in $z_{nj}$?

How does this derivative change with $P_{ni}$ and $P_{nj}$? Interpret.
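
A numerical check of the cross derivative, again assuming utility linear in the attribute ($V_{nj} = \beta_z z_{nj}$, a hypothetical specification). It also verifies that the own derivative and the cross derivatives with respect to $z_{nj}$ sum to zero, as they must because the probabilities sum to one.

```python
import math

def logit_probs(v):
    v_max = max(v)
    exp_v = [math.exp(x - v_max) for x in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]

def probs_given_z(z, beta_z):
    """Choice probabilities when V_nj = beta_z * z_nj for every alternative."""
    return logit_probs([beta_z * z_j for z_j in z])

beta_z = -0.5        # hypothetical coefficient (e.g. on price)
z = [2.0, 3.0, 1.0]  # hypothetical attribute levels
p = probs_given_z(z, beta_z)

i, j = 0, 1
analytic_cross = -beta_z * p[i] * p[j]  # -(dV_nj/dz_nj) * P_ni * P_nj

h = 1e-6
z_up = list(z)
z_up[j] += h
z_down = list(z)
z_down[j] -= h
numeric_cross = (probs_given_z(z_up, beta_z)[i]
                 - probs_given_z(z_down, beta_z)[i]) / (2.0 * h)

# Own derivative of P_nj plus all cross derivatives with respect to z_nj
own = beta_z * p[j] * (1.0 - p[j])
cross_sum = sum(-beta_z * p[k] * p[j] for k in range(len(z)) if k != j)

print(analytic_cross, numeric_cross, own + cross_sum)
```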

Log Likelihood function

Contribution of an individual observation to the likelihood:

$$\prod_i (P_{ni})^{y_{ni}}$$

where $y_{ni} = 1$ if person $n$ chose $i$ and zero otherwise. Assuming independence across decision makers,

$$L(\beta) = \prod_{n=1}^{N} \prod_i (P_{ni})^{y_{ni}}$$

Then the log likelihood is

$$LL(\beta) = \sum_{n=1}^{N} \sum_i y_{ni} \ln P_{ni}$$

McFadden (1974) shows that $LL(\beta)$ is globally concave for linear-in-parameters utility.
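
Because $y_{ni}$ is one only for the chosen alternative, the log likelihood is the sum of $\ln P_{ni}$ over each person's chosen alternative. A minimal sketch for linear-in-parameters utility; the data layout (a list of `(chosen_index, x)` pairs) and the tiny data set are hypothetical.

```python
import math

def logit_probs(v):
    v_max = max(v)
    exp_v = [math.exp(x - v_max) for x in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]

def log_likelihood(beta, observations):
    """LL(beta) = sum_n sum_i y_ni ln P_ni.

    observations: list of (chosen_index, x), where x[i] is the attribute
    list of alternative i for that decision maker.
    """
    total = 0.0
    for chosen, x in observations:
        v = [sum(b * x_k for b, x_k in zip(beta, x_i)) for x_i in x]
        total += math.log(logit_probs(v)[chosen])  # only the chosen term has y_ni = 1
    return total

# Hypothetical data: 3 decision makers, 3 alternatives, 2 attributes
observations = [
    (0, [[1.0, 0.5], [0.0, 1.0], [0.5, 0.0]]),
    (2, [[0.2, 0.1], [0.9, 0.3], [0.4, 0.8]]),
    (1, [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]),
]

print(log_likelihood([0.0, 0.0], observations))
print(log_likelihood([0.5, 0.5], observations))
```

At $\beta = 0$ every alternative is equally likely, so the first value equals $N \ln(1/J) = 3 \ln(1/3)$.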

Estimation: ML

Maximum Likelihood Estimator:

$$\hat\beta_{ML} = \arg\max_{\beta} \log L(\beta)$$

First order condition:

$$\frac{\partial \log L(\beta)}{\partial \beta} = 0$$

It can be shown that the FOC is equivalent to (see problem set 2 for the proof!)

$$\frac{1}{N} \sum_n \sum_i y_{ni}\, x_{ni} = \frac{1}{N} \sum_n \sum_i P_{ni}\, x_{ni}$$

$\hat\beta_{ML}$ makes the predicted average of each explanatory variable equal to the observed average in the sample.
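
The first order condition can be illustrated on simulated data. The sketch below is one possible implementation, not the method used in the lecture: it simulates choices from a logit model with a single attribute and a hypothetical true coefficient, maximizes the log likelihood by Newton's method (reasonable here because the function is globally concave), and then compares the observed and predicted averages of the attribute. All names, the seed, and the sample size are assumptions.

```python
import math
import random

def logit_probs(v):
    v_max = max(v)
    exp_v = [math.exp(x - v_max) for x in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]

def draw_gumbel(rng):
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_data(n_obs, n_alts, beta_true, rng):
    """Each person picks the alternative with the highest U = beta * x + Gumbel error."""
    data = []
    for _ in range(n_obs):
        x = [rng.uniform(-2.0, 2.0) for _ in range(n_alts)]
        u = [beta_true * x_i + draw_gumbel(rng) for x_i in x]
        chosen = max(range(n_alts), key=lambda i: u[i])
        data.append((chosen, x))
    return data

def estimate_beta(data, max_iter=50, tol=1e-12):
    """Newton's method on the scalar logit log likelihood, starting from 0."""
    beta = 0.0
    for _ in range(max_iter):
        gradient, hessian = 0.0, 0.0
        for chosen, x in data:
            p = logit_probs([beta * x_i for x_i in x])
            x_bar = sum(p_i * x_i for p_i, x_i in zip(p, x))
            gradient += x[chosen] - x_bar  # sum_i (y_ni - P_ni) x_ni
            hessian -= sum(p_i * (x_i - x_bar) ** 2 for p_i, x_i in zip(p, x))
        step = -gradient / hessian
        beta += step
        if abs(step) < tol:
            break
    return beta

rng = random.Random(42)
beta_true = 1.0  # hypothetical true coefficient
data = simulate_data(2000, 3, beta_true, rng)
beta_hat = estimate_beta(data)

n = len(data)
observed_avg = sum(x[chosen] for chosen, x in data) / n
predicted_avg = sum(
    sum(p_i * x_i for p_i, x_i in zip(logit_probs([beta_hat * x_i for x_i in x]), x))
    for _, x in data
) / n

print(beta_hat, observed_avg, predicted_avg)
```

At the estimate, the observed and predicted averages agree up to numerical precision, which is exactly the first order condition.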

Goodness of Fit

The likelihood ratio index:

$$\rho = 1 - \frac{\log L(\hat\beta)}{\log L(0)}$$

It is important to note that the likelihood ratio index is not at all similar in its interpretation to the $R^2$ used in regression, despite both statistics having the same range.
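
A short sketch of the likelihood ratio index. The null log likelihood is computed under the assumption that, with all coefficients at zero, each of the $J$ alternatives is equally likely for every decision maker; the fitted log likelihood of -400 is a made-up number used only for illustration.

```python
import math

def likelihood_ratio_index(ll_hat, ll_null):
    """rho = 1 - log L(beta_hat) / log L(0)."""
    return 1.0 - ll_hat / ll_null

n_obs, n_alts = 500, 3                    # hypothetical sample size and choice set
ll_null = n_obs * math.log(1.0 / n_alts)  # log L(0): equal shares for all alternatives
ll_hat = -400.0                           # hypothetical log likelihood at beta_hat

rho = likelihood_ratio_index(ll_hat, ll_null)
print(rho)
```

The index is 0 when the estimated model does no better than the zero-coefficient model and approaches 1 as the fitted log likelihood approaches zero.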

Case Study 1: Transportation

McFadden (1977)
- Introduction of a new transit method: BART
- Why do we need a behavioral model?
- Structural versus reduced-form estimation


Case Study 2: Labor Market Impacts of Immigrants

David Card (2001) JLE
- This paper estimated a set of multinomial logit models for the probabilities of working in six different occupation groups.
- The estimated coefficients were then used to assign probabilities of working in different occupations.


Case Study 3: Redistribution of Resources within the Family

McGarry and Schoeni (1995) JHR
- Do parents give greater financial assistance to their adult children who have lower incomes?
- Do adult children give greater financial and time assistance to their less wealthy elderly parents?

Why are the redistributional aspects of transfers important?
- Altruism reduces the effect of government assistance programs, because government assistance crowds out familial assistance.
- As discussed by Barro, if individuals are altruistic, then there is no difference between tax and debt.


Table 12 (continued)

Results suggest that parents give more to their less well-off children and elderly parents.

Case Study 4: Agglomeration Benefits and Location Choice

Head et al. (1995) JIE
- Do externalities based on geographical proximity affect firms' location choice? How pervasive is this effect?
- Do agglomeration effects operate on a nationality-specific basis?
- This paper estimates a location choice model using data on Japanese investors who established new manufacturing plants in the US.

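A logit model for the location decision of Japanese industries

The profitability of state $s$ for investor $j$ is given by

$$\theta_s + \alpha_{US} \ln A^{US}_{js} + \alpha_{J} \ln A^{J}_{js} + \alpha_{G} \ln A^{G}_{js} + \varepsilon_{js} \qquad (1)$$

- $\theta_s$ captures the attractiveness of state $s$ to the average investor.
- $A^{US}_{js}$, $A^{J}_{js}$, and $A^{G}_{js}$ are agglomeration variables, measured as counts of US establishments, Japanese establishments, and establishments belonging to the same Japanese industrial group (keiretsu).

$$\Pr(js) = \frac{\exp\left(\theta_s + \sum_{i \in \{US, J, G\}} \alpha_i \ln A^{i}_{js}\right)}{\sum_{l} \exp\left(\theta_l + \sum_{i \in \{US, J, G\}} \alpha_i \ln A^{i}_{jl}\right)} \qquad (2)$$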


- Japanese establishments do not simply mimic the geographical pattern of US establishments.
- Instead, initial investments by Japanese firms spur subsequent investors in the same industry or industrial group to select the same states.
- This pattern of location choice supports the agglomeration-externalities theory rather than the inter-state endowment differences theory.

References

Card, David. "Immigrant inflows, native outflows, and the local labor market impacts of higher immigration." Journal of Labor Economics 19, no. 1 (2001): 22-64.

Head, Keith, John Ries, and Deborah Swenson. "Agglomeration benefits and location choice: Evidence from Japanese manufacturing investments in the United States." Journal of International Economics 38, no. 3 (1995): 223-247.

McGarry, Kathleen, and Robert F. Schoeni. "Transfer behavior in the health and retirement study: Measurement and the redistribution of resources within the family." Journal of Human Resources (1995): S184-S226.