Tutorial: Discrete choice analysis Masaryk University, Brno November 6, 2015

Prepared by Stefanie Peer and Paul Koster, November 2, 2015

1 Introduction

Discrete choice analysis is widely applied in transport analysis, and the logit model of McFadden (1974) has been the workhorse model for many applied transport studies. For example, it has been used to study the choice of transport mode and to value non-market goods such as travel time and traffic safety. In this tutorial you will develop practical skills that enable you to analyze discrete choice data. For this assignment you need to estimate the value of travel time using data obtained from a stated preference experiment among car peak commuters. Please do not hesitate to ask questions during the exercises.

2 Trading travel time and money

The data were collected using a stated choice experiment that aims at estimating the monetary value attached to reductions in travel time (in short: value of time). Respondents make 6 choices each, in which they need to trade off travel costs and travel time. An example of such a trade-off is given below.

Suppose that these are the only two existing alternatives to travel from home to work. Indicate which one you prefer.

                            Alternative 1   Alternative 2
Travel costs (in Euro)            6               8
Travel time (in minutes)         40              30

Clearly there is a trade-off between the faster and more expensive alternative (Alternative 2) and the slower and cheaper alternative (Alternative 1).¹ Therefore, whether a respondent chooses Alternative 1 or Alternative 2, we learn something about how he or she trades off money against travel time. To make this explicit, we first look for the value of travel time (VOTT) at which the respondent would be indifferent between Alternatives 1 and 2. This value is usually referred to as the bid-value. Suppose that Alternative 1 is always the slowest alternative. Then the bid-value is given by Eq. 1:

$$\text{bid} = -\frac{C_1 - C_2}{T_1 - T_2}, \tag{1}$$

where $C_1$ and $C_2$ are the travel costs and $T_1$ and $T_2$ the travel times of Alternatives 1 and 2, respectively. The bid-value of the example is therefore $-(6 - 8)/(40 - 30) = 2/10$ Euro per minute, or 12 Euro per hour. Now suppose a respondent chooses Alternative 1 (i.e. the slower and cheaper alternative). Then we learn that the VOTT of this respondent is lower than (or equal to) 12 Euro/hour. Similarly, a choice for Alternative 2 implies that the respondent's VOTT is larger than 12 Euro/hour. Rather than looking at the trade-offs in each choice situation separately, we would prefer to analyze the preferences of the whole sample of respondents. Therefore we proceed with a model that is able to analyze large datasets.

3 Binary logit analysis

The logit model can account for errors in decision making and for other factors that are unobserved by the researcher but influence the choice of a respondent. We start by writing down the utility function for alternative $j = 1, 2$ and for choice $n = 1, \ldots, N$:

$$U_{jn} = V_{jn}[\beta; T_{jn}; C_{jn}] + \epsilon_{jn} \tag{2}$$

The utility function consists of two components: a systematic component given by $V_{jn}[.]$,² and a random component $\epsilon_{jn}$. The systematic component is a function of the travel time and travel cost ($T_{jn}$ and $C_{jn}$) and of the sensitivities to changes in these variables (i.e. the coefficients to be estimated), denoted by $\beta$. If we assume that the $\epsilon_{jn}$ are independently and identically extreme value (Gumbel) distributed, so that their difference $\epsilon_{2n} - \epsilon_{1n}$ follows a logistic distribution, the probability that alternative $j = 1, 2$ is chosen is given by:

$$P_{jn} = \frac{\exp(V_{jn}[.])}{\exp(V_{1n}[.]) + \exp(V_{2n}[.])} \tag{3}$$

The beauty of Eq. 3 is that it is a simple closed-form expression. The value of travel time is then given by the ratio of the marginal utilities of travel time and travel cost, as in Eq. 4:

$$VOTT = \frac{\partial V[.] / \partial T}{\partial V[.] / \partial C} \tag{4}$$

¹ That is also how the dataset to be used in the exercises is structured.
² For convenience $V_{jn}[.]$ is used, but of course $V_{jn}$ is a function of several variables.
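The bid-value of Eq. 1 and the choice probabilities of Eq. 3 can be checked numerically with a few lines of Python. The following is a minimal sketch, not part of the original course material: the function names are hypothetical, the systematic utility is assumed to be linear in cost and time, and the coefficient values are chosen for illustration only (they are not estimates).

```python
import math

def bid_value(c1, c2, t1, t2):
    """Bid-value of Eq. 1 in Euro per minute; alternative 1 is the slower one."""
    return -(c1 - c2) / (t1 - t2)

def utility(beta_c, beta_t, cost, time):
    """Linear systematic utility (an assumed functional form for this sketch)."""
    return beta_c * cost + beta_t * time

def logit_probabilities(v1, v2):
    """Binary logit probabilities of Eq. 3 for systematic utilities v1 and v2."""
    m = max(v1, v2)  # subtract the maximum for numerical stability
    e1, e2 = math.exp(v1 - m), math.exp(v2 - m)
    return e1 / (e1 + e2), e2 / (e1 + e2)

# Example choice situation from Section 2.
bid = bid_value(c1=6, c2=8, t1=40, t2=30)
print(f"bid-value: {bid:.2f} Euro/minute = {60 * bid:.0f} Euro/hour")

# Hypothetical coefficients, chosen for illustration only (not estimates).
beta_c, beta_t = -0.5, -0.1
p1, p2 = logit_probabilities(utility(beta_c, beta_t, 6, 40),
                             utility(beta_c, beta_t, 8, 30))
print(f"P(alternative 1) = {p1:.3f}, P(alternative 2) = {p2:.3f}")
```

The illustrative coefficients are picked so that beta_t / beta_c equals the bid-value of 0.2 Euro per minute. Both alternatives then have the same systematic utility and each is chosen with probability 0.5, which is the indifference case described in Section 2.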

The ratio in Eq. 4 indicates the willingness to pay for reducing travel time by one unit. It is of key importance to understand that the VOTT is a ratio of marginal utilities. When nonlinear formulations of the systematic utility are used, this ratio may be more complicated. Throughout this exercise we assume that a linear systematic utility applies, as in Eq. 5:

$$V = \beta_C C + \beta_T T \tag{5}$$

Then the VOTT is given by:

$$VOTT = \beta_T / \beta_C, \tag{6}$$

which is the ratio of the sensitivities to changes in travel time and travel costs. More general specifications of Eqs. 5 and 6 will be discussed in the following exercises.

4 Exercise I: Exploratory analysis

Applied analysis should always start with an exploratory analysis of the data. This provides insights into the quality of the data and some general trends. For this analysis you should use Excel and the dataset brno2014.xlsx. Here you find data for 1357 respondents making 6 choices each. The columns in the dataset indicate the variables. In the tab varnames you find a dictionary of the variables. Read this carefully before you proceed.

Ex. 1. Make summary statistics (minimum, average, maximum) of the reference travel time and income, for the whole sample and for males and females separately.³

Ex. 2. How many times is the fastest alternative chosen? How many times is the slowest alternative chosen?

Ex. 3. What is the probability that the fastest alternative is chosen? What is the probability that the slowest alternative is chosen?

Ex. 4. Are there any dominant alternatives (that is, alternatives that are both cheaper AND faster) included in the dataset?

5 Exercise II: Logit analysis

For this analysis you should use Biogeme and the dataset brno2014.dat. Biogeme is open source software developed by Michel Bierlaire (Bierlaire, 2003). Open the file VOTT BL.mod.⁴ Read the file carefully in order to understand its structure. What are the parameters to be estimated? Where is the utility specified? How does the file relate to the dataset? If you understand the file, you can proceed. If you have any questions, please ask!

Ex. 5. Estimate a logit model with the linear specification of utility of Eq. 5 using the Biogeme model file VOTT BL.mod.⁵ Do the coefficients have the expected sign? Are the coefficients significantly different from 0? Calculate the VOTT in Euro/hour using Eq. 6.

Ex. 6. Try to use the [Exclude] section in the file VOTT BL.mod to exclude some data. For example, to consider only respondents with a lower income, exclude the others using: income > 4. Several statements can be combined using the OR operator, for example: (statement 1) || (statement 2)

6 Exercise III: Covariates

It may well be that the VOTT depends on the income of the respondent, because respondents with a high income are likely to have a lower sensitivity to changes in travel costs. Therefore we extend the utility function of Eq. 5 to incorporate an interaction effect with the variable INC in your dataset:

$$V = \beta_C C + \beta_{C,INC} \, C \cdot INC + \beta_T T \tag{7}$$

Ex. 7. What is the interpretation of $\beta_{C,INC}$? Derive the formula for the VOTT as in Eq. 6.

Ex. 8. Adjust the Biogeme model file VOTT BL.mod to incorporate the interaction effect of travel costs and income. Store the new model file as VOTT BL covars.mod. First, calculate the interaction variable $C \cdot INC$ in the [Expressions] section. Second, use Eq. 7 and change the utility function in the [Utilities] section. Third, add the new parameter $\beta_{C,INC}$ in the [Beta] section.

Ex. 9. Estimate the model and interpret your results.⁶ What is the VOTT at the average income level? Compare this average to your findings in Ex. 6. What are the maximum and minimum VOTT?

³ The Excel function averageif is useful in this context.
⁴ Right click on the file, use "open with" and choose Wordpad.
⁵ This can be done in the following way:
   1) Go to biogeme.epfl.ch, go to downloads and click on (Executables for) Windows. Go to the download folder, choose gui and, inside this folder, open guibiogeme. You should then see a graphical user interface for Biogeme.
   2) To specify the model, click on the Select File button (the highest one) and select the model file (in this case VOTT BL.mod).
   3) To specify the data, click on the Select File button (the lower one) and select the data file brno2014.dat.
   4) To estimate the model, click on estimate. The model will be estimated.
   5) When the estimation is finished, click Display File to see the results. You can also open the html file that is created in the directory you are working in.
⁶ You can use the income variable as if it were continuous. Thus, you do not have to take into account that it is in fact an ordinal variable.

Ex. 10. Test whether males and females have different marginal utilities of travel time (i.e. different $\beta_T$). First, write down the extension of the utility function of Eq. 7. Second, adjust the Biogeme model file and save it. Third, run the model and interpret your results. Derive the VOTT at minimum, average and maximum income levels for males and females separately.

7 Exercise IV: Unobserved heterogeneity - cross-sectional mixed logit

For this exercise you should again use Biogeme and the dataset brno2014.dat. The standard logit model assumes that unobserved effects are captured by the error term. For example, education may have an effect on the VOTT, but if we do not measure education (i.e. the variable is not in our data), the effect of education is unobserved. This effect will therefore end up in the random part of the utility. Ignoring unobserved heterogeneity may lead to biases in the logit estimates, and therefore it is important to control for these effects. The common workhorse model to do so is the mixed logit model. The mixed logit model estimates a distribution of preferences instead of a single VOTT, assuming that the distribution of preferences is continuous. An alternative is the class of latent class (or: discrete mixture) models, which assume a distribution with discrete mass points (classes), whereby the researcher determines the number of classes. Because of limited time we will only consider the mixed logit model here. Assume a linear-in-parameters utility function as in Eq. 5. The choice probability of Eq. 3 then changes to:

$$P_{jn} = \int \frac{\exp(\beta_C C_{jn} + \beta_T T_{jn})}{\exp(\beta_C C_{1n} + \beta_T T_{1n}) + \exp(\beta_C C_{2n} + \beta_T T_{2n})} \, f[\beta_C] \, d\beta_C \tag{8}$$

The mixed logit probability is a continuous mixture of logit probabilities, where the mixture is governed by the probability distribution $f[\beta_C]$. A mixture can be viewed as a weighted average of logit probabilities. We are interested in estimating $f[\beta_C]$, the distribution of the cost coefficient in the sample. Meanwhile, the time coefficient is still assumed to be homogeneous across choices. Throughout the analysis, a normal distribution is assumed for $\beta_C$.

Ex. 11. Use the Biogeme model file VOTT CSLC.mod to estimate the model and interpret your results (running the model may take some time).⁷

Ex. 12. Again try to include the interaction variables income and gender. Compare your results with the results of Exercises I and II.

⁷ If an error occurs due to numerical problems, lower the number of draws in the [Draws] section and try to estimate the model again. Use the results of the estimation that works and gradually increase the number of draws.
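The integral in Eq. 8 has no closed form, so in practice it is approximated by simulation: draw many values of the cost coefficient from its assumed normal distribution and average the resulting logit probabilities. The sketch below illustrates this idea in plain Python. It is an illustration under stated assumptions, not Biogeme's implementation: the function names, the parameter values (mean, standard deviation, time coefficient) and the use of simple pseudo-random draws are all hypothetical choices made for this example.

```python
import math
import random

def logit_prob_alt1(beta_c, beta_t, c1, t1, c2, t2):
    """Binary logit probability of alternative 1 under linear utility."""
    dv = (beta_c * c2 + beta_t * t2) - (beta_c * c1 + beta_t * t1)  # V2 - V1
    return 1.0 / (1.0 + math.exp(dv))

def simulated_mixed_logit_prob(mu_c, sigma_c, beta_t, c1, t1, c2, t2,
                               n_draws=1000, seed=0):
    """Approximate Eq. 8 by averaging logit probabilities over normal draws of beta_C."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        beta_c = rng.gauss(mu_c, sigma_c)  # one draw from the assumed f[beta_C]
        total += logit_prob_alt1(beta_c, beta_t, c1, t1, c2, t2)
    return total / n_draws

# Illustrative values only (not estimates), applied to the Section 2 example.
p_fixed = logit_prob_alt1(-0.5, -0.1, 6, 40, 8, 30)
p_mixed = simulated_mixed_logit_prob(mu_c=-0.5, sigma_c=0.2, beta_t=-0.1,
                                     c1=6, t1=40, c2=8, t2=30)
print(f"standard logit: {p_fixed:.3f}, simulated mixed logit: {p_mixed:.3f}")
```

The argument n_draws plays a role comparable to the number of draws set in the [Draws] section: more draws give a more precise approximation of the integral at the cost of longer computation. Setting sigma_c to 0 removes the heterogeneity, and the simulated probability then coincides with the standard logit probability.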

8 Exercise V: Unobserved heterogeneity - panel mixed logit

Skip this exercise if you are short on time, as the time required to estimate the model is quite long. For this exercise you should again use Biogeme and the dataset brno2014.dat. The cross-sectional mixed logit model (as estimated in the previous section) assumes that the error terms over a series of choices are uncorrelated, and it therefore analyzes the probability of an isolated choice. In other words, it does not take into account that each person makes 6 choices rather than just 1, and that the preferences across these 6 choices will be similar. The panel mixed logit model goes one step further and analyzes the probability that an individual makes a sequence of choices. Assume a linear-in-parameters utility function as in Eq. 5. The probability of the sequence of choices made by individual $i$ is given by:

$$P_{ji} = \int \prod_{z=1}^{Z(i)} \frac{\exp(\beta_C C_{jiz} + \beta_T T_{jiz})}{\exp(\beta_C C_{1iz} + \beta_T T_{1iz}) + \exp(\beta_C C_{2iz} + \beta_T T_{2iz})} \, f[\beta_C] \, d\beta_C, \tag{9}$$

where $j$ denotes the alternative chosen in choice $z$ and $Z(i)$ is the number of choices of individual $i$ (in our case 6 for each individual). The key difference with Eq. 8 is that instead of analyzing each choice separately, we now analyze the probability of a sequence of choices, which is given by the product of the choice probabilities of each separate choice. Therefore Eq. 9 integrates over the product of a sequence of choice probabilities, where $\beta_C$ is kept constant over the series of choices of an individual.

Ex. 13. Use the Biogeme model file VOTT PMXL.mod to estimate the model and interpret your results (running the model may take some time). Furthermore, compare your results with the estimates of Ex. 12.

9 Exercise VI: Advanced examples from the Biogeme website

Go to http://biogeme.epfl.ch/swissmetro/examples.html and download the dataset concerning the Swissmetro. First check what the dataset is about.

Ex. 14. Run some of the advanced models shown on the site using the Swissmetro dataset (e.g. (cross-)nested logit). Become acquainted with the logic of the models and make some amendments (e.g. add some exclude statements or interaction terms). See in which way the results change.

Ex. 15. Check the pythonbiogeme version of the models, and try to understand their structure and syntax.
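As a closing illustration of the difference between Eqs. 8 and 9, the sketch below simulates the probability of a sequence of choices in two ways: with one draw of the cost coefficient held fixed across all choices of an individual (the panel case of Eq. 9), and as the product of separately averaged probabilities (the cross-sectional case of Eq. 8). It is a hedged example rather than Biogeme's implementation: the function names, the three choice situations and all parameter values are hypothetical and serve only to show the mechanics.

```python
import math
import random

def prob_chosen(beta_c, beta_t, costs, times, chosen):
    """Logit probability of the chosen alternative (index 0 or 1) in one choice."""
    v = [beta_c * c + beta_t * t for c, t in zip(costs, times)]
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(x - m) for x in v]
    return e[chosen] / sum(e)

def panel_sequence_prob(mu_c, sigma_c, beta_t, situations, n_draws=500, seed=0):
    """Approximate Eq. 9: average over draws of the product of logit probabilities."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        beta_c = rng.gauss(mu_c, sigma_c)  # one draw, held fixed for the individual
        product = 1.0
        for costs, times, chosen in situations:
            product *= prob_chosen(beta_c, beta_t, costs, times, chosen)
        total += product
    return total / n_draws

# Hypothetical individual who always picks the cheaper, slower alternative (index 0).
# Each situation: (costs in Euro, times in minutes, index of chosen alternative).
situations = [((6, 8), (40, 30), 0),
              ((5, 9), (45, 30), 0),
              ((4, 7), (50, 35), 0)]

# Illustrative parameter values only (not estimates).
p_panel = panel_sequence_prob(-0.5, 0.2, -0.1, situations)

# Cross-sectional treatment: average each choice separately, then multiply.
p_separate = 1.0
for s in situations:
    p_separate *= panel_sequence_prob(-0.5, 0.2, -0.1, [s])

print(f"panel: {p_panel:.4f}, product of separate choices: {p_separate:.4f}")
```

For a respondent who consistently chooses the same type of alternative, the panel probability is at least as large as the product of the separately averaged probabilities, because a single draw of the cost coefficient that explains one choice well also explains the others well. This is the correlation across choices that the cross-sectional model ignores.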