
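Basic Econometrics in Transportation: Logit Models
Amir Samimi, Civil Engineering Department, Sharif University of Technology
Primary source: Discrete Choice Methods with Simulation (Train)

Choice Probabilities

The utility that decision maker n obtains from alternative j is $U_{nj} = V_{nj} + \varepsilon_{nj}$ for all j. The logit model is obtained by assuming that each $\varepsilon_{nj}$ is IID extreme value. For each unobserved component of utility, the PDF and CDF are:

$$f(\varepsilon_{nj}) = e^{-\varepsilon_{nj}} e^{-e^{-\varepsilon_{nj}}}, \qquad F(\varepsilon_{nj}) = e^{-e^{-\varepsilon_{nj}}}$$

The variance of this distribution is $\pi^2/6$. The difference between two extreme value variables, $\varepsilon^*_{nji} = \varepsilon_{nj} - \varepsilon_{ni}$, is distributed logistic:

$$F(\varepsilon^*_{nji}) = \frac{e^{\varepsilon^*_{nji}}}{1 + e^{\varepsilon^*_{nji}}}$$

The key assumption is not so much the shape of the distribution as that the errors are independent of each other. One should specify utility well enough that the remaining error is essentially white noise, so that a logit model is appropriate.

If the researcher thinks that the unobserved portion of utility is correlated over alternatives given her specification of representative utility, then she has three options:
- Use a different model that allows for correlated errors.
- Re-specify representative utility so that the source of the correlation is captured explicitly and the remaining errors are independent.
- Use the logit model under the current specification of representative utility, considering the model to be an approximation.

Logit Choice Probabilities Derivation

The probability that decision maker n chooses alternative i is

$$P_{ni} = \text{Prob}(V_{ni} + \varepsilon_{ni} > V_{nj} + \varepsilon_{nj} \ \forall j \neq i) = \text{Prob}(\varepsilon_{nj} < \varepsilon_{ni} + V_{ni} - V_{nj} \ \forall j \neq i)$$

For a given $\varepsilon_{ni}$, this is the cumulative distribution for each $\varepsilon_{nj}$ evaluated at $\varepsilon_{ni} + V_{ni} - V_{nj}$. Because the errors are independent, it is the product of the individual CDFs:

$$P_{ni} \mid \varepsilon_{ni} = \prod_{j \neq i} e^{-e^{-(\varepsilon_{ni} + V_{ni} - V_{nj})}}$$

Of course, $\varepsilon_{ni}$ is not given, and so the choice probability is the integral of $P_{ni} \mid \varepsilon_{ni}$ over all values of $\varepsilon_{ni}$, weighted by its density:

$$P_{ni} = \int \left( \prod_{j \neq i} e^{-e^{-(\varepsilon_{ni} + V_{ni} - V_{nj})}} \right) e^{-\varepsilon_{ni}} e^{-e^{-\varepsilon_{ni}}} \, d\varepsilon_{ni}$$

Some algebraic manipulation of this integral results in:

$$P_{ni} = \frac{e^{V_{ni}}}{\sum_j e^{V_{nj}}}$$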
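As a small numerical illustration of the logit formula $P_{ni} = e^{V_{ni}} / \sum_j e^{V_{nj}}$, the following Python sketch computes choice probabilities from a list of representative utilities. The function name `logit_probabilities` and the utility values are hypothetical and chosen only for illustration. Subtracting the maximum utility before exponentiating is a common numerical safeguard and is not part of the source derivation.

```python
import math


def logit_probabilities(v):
    """Return logit choice probabilities for a list of representative utilities."""
    # Subtracting the maximum leaves all probability ratios unchanged
    # and avoids overflow in exp() for large utilities.
    v_max = max(v)
    exp_v = [math.exp(x - v_max) for x in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]


# Hypothetical representative utilities for three alternatives.
utilities = [1.0, 0.5, -0.2]
probs = logit_probabilities(utilities)
print(probs)
print(sum(probs))
```

The probabilities are strictly between zero and one and sum to one, and the ratio of any two of them depends only on the difference between the two utilities.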

Properties of Logit Probabilities

- $P_{ni}$ is necessarily between zero and one.
- The logit probability for an alternative is never exactly zero.
- Choice probabilities for all alternatives sum to one.
- The relation of the logit probability to representative utility is sigmoid, or S-shaped. The sigmoid shape is shared by most discrete choice models and has important implications for policy makers.
- The model is easily interpretable: the ratio of two coefficients has economic meaning.

Scale Parameter

Setting the variance to $\pi^2/6$ is equivalent to normalizing the model for the scale of utility. Utility can be expressed as $U^*_{nj} = V_{nj} + \varepsilon^*_{nj}$, where the unobserved portion has variance $\sigma^2 \pi^2/6$. Dividing by $\sigma$, this becomes $U_{nj} = V_{nj}/\sigma + \varepsilon_{nj}$, where $\varepsilon_{nj}$ has variance $\pi^2/6$. The choice probability becomes:

$$P_{ni} = \frac{e^{V_{ni}/\sigma}}{\sum_j e^{V_{nj}/\sigma}}$$

With $V_{nj} = \beta^{*\prime} x_{nj}$, only the ratio $\beta^*/\sigma$ can be estimated ($\beta^*$ and $\sigma$ are not separately identified). A larger variance in unobserved factors leads to smaller estimated coefficients, even if the observed factors have the same effect on utility. Willingness to pay, values of time, and other measures of marginal rates of substitution are not affected by the scale parameter:

$$\frac{\beta_1}{\beta_2} = \frac{\beta^*_1/\sigma}{\beta^*_2/\sigma} = \frac{\beta^*_1}{\beta^*_2}$$

Power and Limitations of Logit

Three topics explain the power and limitations of logit models:
- Taste variation: logit can represent systematic taste variation but not random taste variation.
- Substitution patterns: the logit model implies proportional substitution across alternatives, given the researcher's specification of representative utility.
- Repeated choices over time (panel data): if unobserved factors are independent over time in repeated choice situations, then logit can capture the dynamics of repeated choice. However, logit cannot handle situations where unobserved factors are correlated over time.

Taste Variation

Logit can represent only systematic taste variation. Consider the choice among makes and models of cars to buy. Two observed attributes are purchase price (PP) and shoulder room (SR).
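So utility is written as $U_{nj} = \alpha_n SR_j + \beta_n PP_j + \varepsilon_{nj}$.

Suppose the value of shoulder room varies only with household size (M): $\alpha_n = \rho M_n$. Suppose the importance of purchase price is inversely related to income (I): $\beta_n = \theta / I_n$. Substituting these relations:

$$U_{nj} = \rho (M_n SR_j) + \theta (PP_j / I_n) + \varepsilon_{nj}$$

This is systematic taste variation, which logit handles. Now suppose tastes also have a random part: $\alpha_n = \rho M_n + \mu_n$ and $\beta_n = \theta / I_n + \eta_n$. Then

$$U_{nj} = \rho (M_n SR_j) + \mu_n SR_j + \theta (PP_j / I_n) + \eta_n PP_j + \varepsilon_{nj} = \rho (M_n SR_j) + \theta (PP_j / I_n) + \tilde{\varepsilon}_{nj}$$

where $\tilde{\varepsilon}_{nj} = \mu_n SR_j + \eta_n PP_j + \varepsilon_{nj}$. The new error term cannot possibly be distributed independently and identically, as required for the logit formulation.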
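To see why the composite error fails the IID requirement, the covariance can be worked out under some illustrative assumptions that are not stated in the source: $\mu_n$ and $\eta_n$ have mean zero and variances $\sigma_\mu^2$ and $\sigma_\eta^2$, and they are independent of each other and of the extreme value terms $\varepsilon_{nj}$. Writing the composite error as $\tilde{\varepsilon}_{nj} = \mu_n SR_j + \eta_n PP_j + \varepsilon_{nj}$, a sketch of the calculation is:

```latex
% Assumptions (illustrative): E[\mu_n] = E[\eta_n] = 0,
% Var(\mu_n) = \sigma_\mu^2, Var(\eta_n) = \sigma_\eta^2,
% and \mu_n, \eta_n, \varepsilon_{nj} mutually independent.
\begin{align*}
\operatorname{Cov}(\tilde{\varepsilon}_{nj}, \tilde{\varepsilon}_{nk})
  &= \sigma_\mu^2 \, SR_j \, SR_k + \sigma_\eta^2 \, PP_j \, PP_k
  \qquad (j \neq k) \\
\operatorname{Var}(\tilde{\varepsilon}_{nj})
  &= \sigma_\mu^2 \, SR_j^2 + \sigma_\eta^2 \, PP_j^2 + \frac{\pi^2}{6}
\end{align*}
```

Under these assumptions the covariance is nonzero whenever the attributes are nonzero, so the errors are correlated across alternatives, and the variance differs across alternatives with different attributes, so they are not identically distributed either.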

Substitution Patterns

An increase in the probability of one alternative necessarily means a decrease in probability for other alternatives. Two questions matter: what market share will an alternative obtain, and from which alternatives will that share be drawn? Logit implies a certain substitution pattern. If that pattern actually occurs, the logit model is appropriate. To allow for more general patterns, more flexible models are needed.

IIA (Independence from Irrelevant Alternatives) property: for any two alternatives i and k, the ratio of logit probabilities does not depend on any other alternative:

$$\frac{P_{ni}}{P_{nk}} = \frac{e^{V_{ni}}}{e^{V_{nk}}} = e^{V_{ni} - V_{nk}}$$

IIA

While the IIA property is realistic in some choice situations, it is clearly inappropriate in others.

Red-bus blue-bus problem:
- Assume the choice probabilities of car and blue bus are equal: $P_c = P_{bb} = 0.5$.
- Now a red bus is introduced that is exactly like the blue bus, and thus $P_{rb}/P_{bb} = 1$.
- In logit, $P_c/P_{bb}$ is the same whether or not the red bus exists.
- The only probabilities for which $P_c/P_{bb} = 1$ and $P_{rb}/P_{bb} = 1$ are $P_c = P_{bb} = P_{rb} = 1/3$.
- In real life, we would expect $P_c = 1/2$ and $P_{bb} = P_{rb} = 1/4$.

A new express bus:
- It is introduced along a line that already has standard bus service.
- This new mode might be expected to reduce the probability of regular bus by a greater proportion than it reduces the probability of car.
- The logit model overpredicts transit share in this situation.

Proportional Substitution

Consider changing an attribute of alternative j. We want to know the effect of this change on the probabilities for all other alternatives. For utility that is linear in the attribute $z_{nj}$ with coefficient $\beta_z$, the elasticity of $P_{ni}$ with respect to $z_{nj}$ is

$$E_{i z_{nj}} = -\beta_z z_{nj} P_{nj}$$

This cross-elasticity is the same for all i. An improvement in the attributes of an alternative reduces the probabilities for all the other alternatives by the same percentage. This pattern of substitution is a manifestation of the IIA property.

Electric car example: the shares of large gas, small gas, and small electric cars are 0.66, 0.33, and 0.01. We want to subsidize electric cars and increase their share to 0.10. By logit, the probability for the large gas car would drop from 0.66 to 0.60, and that for the small gas car would drop by the same proportion (roughly nine percent), from 0.33 to 0.30. This pattern of substitution is clearly unrealistic: because the electric car is small, the subsidy would be expected to draw more from small gas cars than from large ones.

Advantages of IIA

- It is possible to estimate model parameters consistently on a subset of alternatives for each sampled decision maker. Since relative probabilities within a subset of alternatives are unaffected by the attributes or existence of alternatives not in the subset, exclusion of alternatives in estimation does not affect the consistency of the estimator. At an extreme, the number of alternatives might be so large as to preclude estimation altogether if it were not possible to utilize a subset of alternatives.
- IIA also helps when one is interested in examining choices among a subset of alternatives and not among all alternatives, for example the choice only between car and bus modes for travel to work. If IIA holds, one can exclude persons who used walking, bicycling, etc.
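The arithmetic of the electric car example can be checked with a short Python sketch of proportional substitution. When one alternative's share is raised to a target value, logit (through IIA) scales every other share by the same factor, so that the ratios among the other alternatives are unchanged. The function name `proportional_substitution` is hypothetical; the shares are the ones quoted in the example.

```python
def proportional_substitution(shares, target_index, new_share):
    """Raise one alternative's share to new_share and scale all other
    shares by a common factor, as logit (IIA) implies."""
    old_share = shares[target_index]
    # Common factor applied to every other alternative so shares still sum to one.
    factor = (1.0 - new_share) / (1.0 - old_share)
    return [
        new_share if i == target_index else s * factor
        for i, s in enumerate(shares)
    ]


# Shares of large gas, small gas, and small electric cars from the example.
shares = [0.66, 0.33, 0.01]
new_shares = proportional_substitution(shares, target_index=2, new_share=0.10)
print([round(s, 2) for s in new_shares])
```

Both gas cars lose the same proportion of their share (a factor of 0.90/0.99), which is the pattern the slide describes as unrealistic for this market.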

Tests of IIA

Whether IIA holds in a particular setting is an empirical question, open to statistical investigation. Two types of tests are suggested:
1. The model can be reestimated on a subset of the alternatives. Under IIA, the ratio of probabilities for any two alternatives is the same whether or not other alternatives are available, so the estimates on the subset should not differ significantly from those on the full set.
2. The model can be reestimated with new, cross-alternative variables, that is, variables from one alternative entering the utility of another alternative. If $P_{ni}/P_{nk}$ depends on alternative j (no IIA), attributes of alternative j will enter significantly the utility of alternatives i or k within a logit specification.

The introduction of non-IIA models makes testing IIA easier than before.

Panel Data

Data on current and past vehicle purchases of sampled households are one example of repeated choices; data of this kind are called panel data.

A logit model can be used to examine panel data if the unobserved factors are independent over the repeated choices. Each choice situation by each decision maker becomes a separate observation.

Dynamic aspects of behavior can be represented: the dependent variable in previous periods can be entered as an explanatory variable. Inclusion of the lagged variable does not induce inconsistency in estimation, since for a logit model the errors are assumed to be independent over time.

The assumption of independent errors over time is severe. When it is not plausible, one can:
- Use a model that allows unobserved factors to be correlated over time.
- Respecify utility to bring the sources of the unobserved dynamics into the model.

Consumer Surplus

Definition: a person's consumer surplus is the utility, in dollar terms, that the person receives in the choice situation: $CS_n = (1/\alpha_n) \max_j (U_{nj})$, where $\alpha_n$ is the marginal utility of income.

Consumer surplus matters for policy analysis. If a light rail system is being considered in a city, it is important to measure the benefits of the project to see if they justify the costs.

Expected consumer surplus: it can be shown that, up to the factor $1/\alpha_n$ and an unknown additive constant C, expected consumer surplus in a logit model is simply the log of the denominator of the choice probability (the log-sum term):

$$E(CS_n) = \frac{1}{\alpha_n} \ln \left( \sum_j e^{V_{nj}} \right) + C$$

This resemblance has no economic meaning; it is just the outcome of the mathematical form of the extreme value distribution.

Derivatives

How do probabilities change with a change in some observed factor? For utility that is linear in an attribute z with coefficient $\beta_z$:

$$\frac{\partial P_{ni}}{\partial z_{ni}} = \beta_z P_{ni} (1 - P_{ni}), \qquad \frac{\partial P_{ni}}{\partial z_{nj}} = -\beta_z P_{ni} P_{nj}$$

How do you interpret these?
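The log-sum expression for expected consumer surplus lends itself to a short numerical sketch. Because the additive constant is unknown, only changes in expected consumer surplus are meaningful, so the Python code below computes the difference in log-sums before and after a policy. The function names, the utility values, and the marginal utility of income `alpha` are all hypothetical and chosen only for illustration.

```python
import math


def logsum(v):
    """Log of the logit denominator, ln(sum_j exp(V_j)), computed stably."""
    v_max = max(v)
    return v_max + math.log(sum(math.exp(x - v_max) for x in v))


def change_in_expected_cs(v_before, v_after, alpha):
    """Change in expected consumer surplus:
    (1 / alpha) * (logsum after - logsum before).
    The unknown additive constant cancels in the difference."""
    return (logsum(v_after) - logsum(v_before)) / alpha


# Hypothetical example: car and bus exist; a light rail alternative is added.
v_before = [0.4, -0.3]        # car, bus
v_after = [0.4, -0.3, 0.1]    # car, bus, light rail
alpha = 0.05                  # assumed marginal utility of income (utils per dollar)

print(change_in_expected_cs(v_before, v_after, alpha))
```

Adding an alternative can only raise the log-sum, so the computed change is positive; how large it is in dollars depends entirely on the assumed `alpha`.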

Elasticities

Response is usually measured by elasticity rather than by derivative, because elasticity is normalized for the variable's units. For utility that is linear in an attribute z with coefficient $\beta_z$, the own- and cross-elasticities are:

$$E_{i z_{ni}} = \beta_z z_{ni} (1 - P_{ni}), \qquad E_{i z_{nj}} = -\beta_z z_{nj} P_{nj}$$

Estimation

There are various estimation methods under different sampling procedures. We discuss estimation under the most prominent of these sampling schemes:
- Estimation when the sample is exogenous.
- Estimation on a subset of alternatives.
- Estimation with certain types of choice-based (i.e., non-exogenous) samples.

Estimation on an Exogenous Sample

Assume:
- The sample is random or stratified random.
- Explanatory variables are exogenous to the choice situation. That is, the variables entering representative utility are independent of the unobserved component.

Maximum-likelihood procedures can be applied to logit. The probability of person n choosing the alternative that he was observed to choose can be expressed as

$$\prod_i (P_{ni})^{y_{ni}}$$

where $y_{ni} = 1$ if person n chose i and zero otherwise. Assuming independence of agents, the probability of each person in the sample choosing the alternative that he chose is

$$L(\beta) = \prod_n \prod_i (P_{ni})^{y_{ni}}, \qquad LL(\beta) = \sum_n \sum_i y_{ni} \ln P_{ni}$$

McFadden shows that $LL(\beta)$ is globally concave for linear-in-parameters utility. The first-order optimality condition becomes

$$\sum_n \sum_i (y_{ni} - P_{ni}) x_{ni} = 0$$

The maximum likelihood estimates of $\beta$ are those that:
- Make the predicted average of each x equal to the observed average in the sample: $\frac{1}{N} \sum_n \sum_i y_{ni} x_{ni} = \frac{1}{N} \sum_n \sum_i P_{ni} x_{ni}$.
- Make the sample covariance of the residuals $y_{ni} - P_{ni}$ with the explanatory variables zero.
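The log-likelihood and its first-order condition can be sketched in a few lines of Python. The sketch below assumes linear-in-parameters utility and maximizes $LL(\beta)$ by fixed-step gradient ascent, which is a deliberate simplification for readability (estimation software typically uses Newton-type methods). The data set, the function names, the step size, and the iteration count are all hypothetical. Each observation is a pair: a list of attribute vectors (one per alternative) and the index of the chosen alternative.

```python
import math


def choice_probs(beta, x_n):
    """Logit probabilities for one person; x_n[i] is the attribute vector of alternative i."""
    v = [sum(b * x for b, x in zip(beta, x_alt)) for x_alt in x_n]
    v_max = max(v)
    e = [math.exp(u - v_max) for u in v]
    s = sum(e)
    return [t / s for t in e]


def log_likelihood(beta, data):
    """LL(beta) = sum_n ln P_n(chosen alternative)."""
    return sum(math.log(choice_probs(beta, x_n)[chosen]) for x_n, chosen in data)


def gradient(beta, data):
    """Gradient of LL: sum_n sum_i (y_ni - P_ni) x_ni."""
    g = [0.0] * len(beta)
    for x_n, chosen in data:
        p = choice_probs(beta, x_n)
        for i, x_alt in enumerate(x_n):
            resid = (1.0 if i == chosen else 0.0) - p[i]
            for k, x in enumerate(x_alt):
                g[k] += resid * x
    return g


def estimate(data, n_params, step=0.1, iterations=2000):
    """Fixed-step gradient ascent; relies on LL being globally concave."""
    beta = [0.0] * n_params
    for _ in range(iterations):
        g = gradient(beta, data)
        beta = [b + step * gk for b, gk in zip(beta, g)]
    return beta


# Hypothetical data: two alternatives, one attribute (cost), five people.
data = [
    ([[1.0], [2.0]], 0),
    ([[2.0], [1.0]], 1),
    ([[1.0], [3.0]], 1),
    ([[3.0], [1.0]], 1),
    ([[2.0], [1.5]], 0),
]

beta_hat = estimate(data, n_params=1)
print(beta_hat, log_likelihood(beta_hat, data), gradient(beta_hat, data))
```

At the estimate the gradient is numerically zero, which is the first-order condition stated above: the residuals $y_{ni} - P_{ni}$ are uncorrelated with the explanatory variable in the sample.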

Estimation on a Subset of Alternatives

Denote:
- The full set of alternatives as F and a subset of alternatives as K.
- $q(K \mid i)$ as the probability that K is selected given that alternative i is chosen. We have $q(K \mid i) = 0$ for any K that does not include i. (Why?)
- $P_{ni}$ as the probability that person n chooses alternative i from the full set.
- $P_n(i \mid K)$ as the probability that the person chooses alternative i conditional on the researcher selecting subset K for him. (Our goal is to determine this.)

The joint probability that K and i are selected can be expressed in two forms:
- $\text{Prob}(K, i) = q(K \mid i) P_{ni}$.
- $\text{Prob}(K, i) = P_n(i \mid K) Q(K)$, where $Q(K) = \sum_{j \in F} P_{nj} q(K \mid j)$ is the probability of selecting K marginal over all the alternatives that the person could choose.

Equate these two expressions and solve for $P_n(i \mid K)$. Because $q(K \mid j) = 0$ for j not in K, the sum reduces to the alternatives in K:

$$P_n(i \mid K) = \frac{P_{ni} \, q(K \mid i)}{\sum_{j \in F} P_{nj} \, q(K \mid j)} = \frac{e^{V_{ni}} \, q(K \mid i)}{\sum_{j \in K} e^{V_{nj}} \, q(K \mid j)}$$

If the selection process is designed so that $q(K \mid j)$ is the same for all $j \in K$ (the uniform conditioning property):

$$P_n(i \mid K) = \frac{e^{V_{ni}}}{\sum_{j \in K} e^{V_{nj}}}$$

The conditional log-likelihood under the uniform conditioning property is

$$CLL(\beta) = \sum_n \sum_{i \in K_n} y_{ni} \ln \frac{e^{V_{ni}}}{\sum_{j \in K_n} e^{V_{nj}}}$$

Since information on alternatives not in each subset is excluded, the estimator based on CLL is consistent but not efficient.

If a selection process does not exhibit the uniform conditioning property, $q(K \mid i)$ should be included in the model:

$$CLL(\beta) = \sum_n \sum_{i \in K_n} y_{ni} \ln \frac{e^{V_{ni} + \ln q(K_n \mid i)}}{\sum_{j \in K_n} e^{V_{nj} + \ln q(K_n \mid j)}}$$

That is, $\ln q(K_n \mid j)$ is added to the representative utility of each alternative and its coefficient is constrained to 1. Why would one want to use a selection procedure that does not satisfy the uniform conditioning property?

Estimation on Choice-Based Samples

Example of choice of home location: to identify factors that contribute to choosing one particular community, one might draw randomly from within that community at a rate different from all other communities. This procedure assures that the researcher has an adequate number of people in the sample from the area of interest.

Sampling procedures:
- Purely choice-based: the population is divided into those who choose each alternative, and decision makers are drawn randomly within each group, at different rates.
- Hybrid of choice-based and exogenous: an exogenous sample is supplemented with a sample drawn on the basis of the households' choices.
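For estimation on a subset of alternatives, the conditional probability $P_n(i \mid K)$ can be sketched in Python as a logit formula over the sampled subset, with $\ln q(K \mid j)$ added to each utility when the selection process is not uniform. The function name `conditional_probs`, the utilities, and the subset are hypothetical. The example also checks numerically that, under uniform conditioning, the subset probabilities equal the full-set probabilities renormalized over K.

```python
import math


def conditional_probs(v_subset, log_q=None):
    """P_n(i | K) for the alternatives in a sampled subset K.

    v_subset: representative utilities of the alternatives in K.
    log_q:    ln q(K | j) for each j in K; None means uniform conditioning,
              in which case the correction terms cancel.
    """
    if log_q is None:
        log_q = [0.0] * len(v_subset)
    adjusted = [v + lq for v, lq in zip(v_subset, log_q)]
    a_max = max(adjusted)
    e = [math.exp(a - a_max) for a in adjusted]
    s = sum(e)
    return [t / s for t in e]


# Hypothetical utilities for a full set F of four alternatives.
v_full = [0.5, 0.2, -0.1, 0.0]
subset = [0, 2]  # indices of the alternatives in K

p_full = conditional_probs(v_full)  # plain logit on the full set
p_sub = conditional_probs([v_full[i] for i in subset])

# Under uniform conditioning, P_n(i | K) equals P_ni renormalized over K.
renormalized = [p_full[i] / sum(p_full[j] for j in subset) for i in subset]
print(p_sub, renormalized)
```

Passing unequal values in `log_q` shifts the conditional probabilities, which is why the correction term must be kept in the model when uniform conditioning fails.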

Estimation on Choice-Based Samples

Purely choice-based estimation: if one is using a purely choice-based sample and includes an alternative-specific constant, then estimating a regular logit model produces consistent estimates for all the model parameters except the alternative-specific constants. These constants can be adjusted to be consistent. With $\hat{\alpha}_j$ the estimated constant for alternative j, the adjusted constant is

$$\hat{\alpha}_j + \ln(A_j / S_j)$$

where
- $A_j$ is the share of decision makers in the population who chose alternative j.
- $S_j$ is the share in the choice-based sample who chose alternative j.

Goodness of Fit

Percent correctly predicted: the percentage of sampled decision makers for which the highest-probability alternative and the chosen alternative are the same. It is based on the idea that the decision maker is predicted by the researcher to choose the alternative with the highest probability. The procedure misses the point of probabilities, gives obviously inaccurate market shares, and seems to imply that the researcher has perfect information.

Hypothesis Testing

Standard t-statistics are used to test hypotheses about individual parameters, such as whether a parameter is 0.

More complex hypotheses can be tested by a likelihood ratio test. $H_0$ can be expressed as constraints on the values of the parameters, such as:
- Several parameters are zero.
- Two or more parameters are equal.

Define the ratio of likelihoods $R = L(\hat{\beta}^H) / L(\hat{\beta})$, where $L(\hat{\beta}^H)$ is the constrained maximum value of the likelihood function under $H_0$ and $L(\hat{\beta})$ is the unconstrained maximum of the likelihood function. The test statistic, defined as $-2 \log R$, is distributed chi-squared with degrees of freedom equal to the number of restrictions implied by the null hypothesis.

Bay Area Rapid Transit (BART)

A prominent test of logit capabilities took place in the mid-1970s: McFadden applied logit to commuters' mode choices to predict BART ridership. Four modes were considered to be available for the trip to work:
1. Driving a car by oneself,
2. Taking the bus and walking to the bus stop,
3. Taking the bus and driving to the bus stop,
4. Carpooling.
Explanatory variables:
- Time and cost of each mode (determined from home and work locations), with travel time differentiated as walk, wait, and on-vehicle time.
- Commuters' characteristics: income, household size, number of cars and drivers in the household, and whether the commuter was head of household.

A logit model with linear-in-parameters utility was estimated on these data.
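The likelihood ratio test described under Hypothesis Testing can be sketched in Python as follows. Since $-2 \log R = -2\,(LL(\hat{\beta}^H) - LL(\hat{\beta}))$, the statistic needs only the two maximized log-likelihood values. The function names and the log-likelihood values are hypothetical; the small table holds the standard 5% critical values of the chi-squared distribution for one to three degrees of freedom, and any other significance level or number of restrictions would need a proper chi-squared quantile.

```python
def likelihood_ratio_statistic(ll_constrained, ll_unconstrained):
    """Test statistic -2 ln R = -2 * (LL under H0 - unconstrained LL)."""
    return -2.0 * (ll_constrained - ll_unconstrained)


# Standard 5% critical values of chi-squared for 1, 2, and 3 degrees of freedom.
CHI2_CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}


def reject_null(ll_constrained, ll_unconstrained, n_restrictions):
    """Return True if H0 is rejected at the 5% level."""
    stat = likelihood_ratio_statistic(ll_constrained, ll_unconstrained)
    return stat > CHI2_CRITICAL_5PCT[n_restrictions]


# Hypothetical maximized log-likelihoods: H0 sets two parameters to zero.
ll_h0 = -120.5
ll_full = -115.2

print(likelihood_ratio_statistic(ll_h0, ll_full))
print(reject_null(ll_h0, ll_full, n_restrictions=2))
```

The constrained log-likelihood can never exceed the unconstrained one, so the statistic is non-negative; a large value indicates that the restrictions cost too much fit to be attributed to sampling noise.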

Bay Area Rapid Transit (BART)

Forecast and actual shares were compared for each mode. BART demand was forecast to be 6.3%, compared with an actual share of 6.2%.

Homework 8

Discrete Choice Methods with Simulation (Kenneth E. Train), Problem set 1 on logit: elsa.berkeley.edu/users/train/ec244ps1.zip

1. Exercise 1 [20 points]
2. Exercise 2 [10 points]
3. Exercise 3 [20 points]
4. Exercise 4 [20 points]
5. Exercise 5 [15 points]
6. Exercise 6 [15 points]

Assignment weight factor = 2