Getting Started in Logit and Ordered Logit Regression (ver. 3.1 beta)


Oscar Torres-Reyna
Data Consultant
otorres@princeton.edu
http://dss.princeton.edu/training/

Logit model

Use logit models whenever your dependent variable is binary (also called a dummy), that is, it takes the values 0 or 1. Logit regression is a nonlinear regression model that forces the output (the predicted values) to lie between 0 and 1. Logit models estimate the probability that your dependent variable equals 1 (Y=1). This is the probability that some event happens.

Logit model

From Stock & Watson, key concept 9.3, the logit model is:

$$\Pr(Y = 1 \mid X_1, X_2, \dots, X_K) = F(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_K X_K) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_K X_K)}}$$

Logit and probit models are basically the same; the difference is in the distribution:

    Logit:  cumulative standard logistic distribution (F)
    Probit: cumulative standard normal distribution (Φ)

Both models provide similar results.

Logit model

In Stata you run the model as follows (dependent variable first, then the independent variables):

    . logit y_bin x1 x2 x3 x4 x5 x6 x7

    Iteration 0: log likelihood = -5.97
    Iteration 1: log likelihood = -9.384
    Iteration 2: log likelihood = -65.56847
    Iteration 3: log likelihood = -60.76756
    Iteration 4: log likelihood = -60.4443
    Iteration 5: log likelihood = -60.44

    Logistic regression            Number of obs = 490
                                   LR chi2(7)    = 83.06
                                   Prob > chi2   = 0.0000
    Log likelihood = -60.44        Pseudo R2     = 0.3633

    y_bin    Coef.      Std. Err.   z       P>|z|   [95% Conf. Interval]
    x1       .69763     .759677     .53     0.5     -.0758     .64657
    x2       -.50059    .459846     -.7     0.087   -.536837   .0360653
    x3       .50445     .4868       0.77    0.439   -.7647     .4063306
    x4       .36497     .53434      .38     0.07    .06447     .6656973
    x5       -.334      .467796     -.3     0.033   -.600804   -.054386
    x6       -.36499    .566993     -0.87   0.385   -.443749   .70975
    x7       3.06987    .36348      8.83    0.000   .4959      3.98744
    _cons    .5864      .3997       3.97    0.000   .803585    .368695

    Note: failure and success completely determined.

How to read the output:

Prob > chi2: if this number is < 0.05 then your model is ok. This is a likelihood-ratio test of whether the coefficients in the model are jointly different from zero.

Coef.: logit coefficients are in log-odds units and cannot be read as regular OLS coefficients. To interpret them you need to estimate the predicted probabilities of Y=1 (see the next section).

z: tests the hypothesis that each coefficient is different from 0. To reject the null of a zero coefficient, the z-value has to be higher than 1.96 in absolute value (for 95% confidence). If this is the case then you can say that the variable has a significant influence on your dependent variable (y). The larger the absolute z, the stronger the evidence that the variable is relevant.

P>|z|: two-tail p-values test the hypothesis that each coefficient is different from 0. To reject the null, the p-value has to be lower than 0.05 (95% confidence; you could also choose an alpha of 0.10). If this is the case then you can say that the variable has a significant influence on your dependent variable (y).
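As a minimal sketch of the same workflow on data that anyone can load, the block below fits a logit to Stata's bundled auto dataset and reads the model-level statistics back from the stored results. The dataset and the variables foreign, mpg and weight are assumptions standing in for y_bin and the x's; they are not the data used above.

```stata
* Illustration only: auto.dta ships with Stata and is NOT the tutorial's data
sysuse auto, clear
logit foreign mpg weight

* LR chi2 is twice the gap between the fitted and the null log likelihood
display "LR chi2      = " e(chi2)
display "2*(ll - ll0) = " 2*(e(ll) - e(ll_0))

* The p-value and pseudo R2 shown in the output header
display "Prob > chi2  = " e(p)
display "Pseudo R2    = " e(r2_p)
```

The two displayed chi2 values should agree, which makes explicit that the "Prob > chi2" line compares the fitted model against a constant-only model.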

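Logit: predicted probabilities

After running the model:

    . logit y_bin x1 x2 x3 x4 x5 x6 x7

type:

    . predict y_bin_hat    /* These are the predicted probabilities of Y=1 */

To see the estimates for the first five cases, type:

    . browse y_bin x1 x2 x3 x4 x5 x6 x7 y_bin_hat

To estimate the probability of Y=1 for the first row by hand, substitute that row's values of X into the logit regression equation:

$$\Pr(Y = 1 \mid X_1, X_2, \dots, X_7) = \frac{1}{1 + e^{-(\hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \hat\beta_3 X_3 + \hat\beta_4 X_4 + \hat\beta_5 X_5 + \hat\beta_6 X_6 + \hat\beta_7 X_7)}}$$

where $\hat\beta_0$ is the estimated constant (_cons) and $\hat\beta_1, \dots, \hat\beta_7$ are the coefficients reported by logit. For the first case, given its values of X, there is a 79% probability that Y=1 (the computed value is 0.78404).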
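The substitution described above can be checked mechanically. The following sketch, again on the bundled auto dataset (an assumption, not the tutorial's data), compares what predict returns with the logistic formula applied by hand to the linear prediction. The variable names p_hat, xb_hat and p_manual are illustrative.

```stata
* Illustration only: auto.dta is a stand-in for the tutorial's data
sysuse auto, clear
quietly logit foreign mpg weight

* Default prediction after logit: Pr(foreign = 1)
predict p_hat

* Linear prediction b0 + b1*mpg + b2*weight
predict xb_hat, xb

* Apply the logistic CDF to the linear prediction by hand
generate p_manual = 1 / (1 + exp(-xb_hat))

list foreign mpg weight p_hat p_manual in 1/5
```

The two probability columns should match row by row; Stata's invlogit() function computes the same transformation.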

Logit: odds ratio

You can request odds ratios rather than logit coefficients by adding the option or (after the comma):

    . logit y_bin x1 x2 x3 x4 x5 x6 x7, or

    Iteration 0: log likelihood = -5.97
    Iteration 1: log likelihood = -9.384
    Iteration 2: log likelihood = -65.56847
    Iteration 3: log likelihood = -60.76756
    Iteration 4: log likelihood = -60.4443
    Iteration 5: log likelihood = -60.44

    Logistic regression            Number of obs = 490
                                   LR chi2(7)    = 83.06
                                   Prob > chi2   = 0.0000
    Log likelihood = -60.44        Pseudo R2     = 0.3633

    y_bin    Odds Ratio  Std. Err.   z       P>|z|   [95% Conf. Interval]
    x1       .309653     .304567     .53     0.5     .97646     .84904
    x2       .7787547    .3686       -.7     0.087   .5849765   .03674
    x3       .93         .66738      0.77    0.439   .838453    .5099
    x4       .440474     .076        .38     0.07    .066356    .945847
    x5       .736        .07396      -.3     0.033   .5483705   .974883
    x6       .8778       .367534     -0.87   0.385   .649307    .8646
    x7       4.70453     8.97405     8.83    0.000   .45        50.3378

    Note: failure and success completely determined.

How to read the output:

Prob > chi2: if this number is < 0.05 then your model is ok. This is a likelihood-ratio test of whether the coefficients in the model are jointly different from zero.

Odds Ratio: the multiplicative change in the odds of Y=1 when X increases by 1 unit. Each odds ratio is exp(logit coefficient). If the OR > 1 then the odds of Y=1 increase; if the OR < 1 then the odds of Y=1 decrease. This matches the sign of the corresponding logit coefficient.

z and P>|z|: these test the hypothesis that each coefficient is different from 0 and are read exactly as in the logit coefficient output: reject the null when the absolute z-value is above 1.96 or, equivalently, when the p-value is below 0.05 (you could also choose an alpha of 0.10).
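To make the link between the two output formats concrete, here is a small sketch on the bundled auto dataset (an assumption, not the tutorial's data). The or option changes only the display, so the stored coefficients are still log-odds and can be exponentiated by hand.

```stata
* Illustration only: auto.dta is a stand-in for the tutorial's data
sysuse auto, clear
logit foreign mpg weight, or

* The stored coefficients remain in log-odds units;
* exp() of each one reproduces the odds ratio in the table
display "OR for mpg    = " exp(_b[mpg])
display "OR for weight = " exp(_b[weight])
```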

Logit: adjust

After running the logit model you can use the command adjust to estimate predicted probabilities or predicted odds at different levels of a variable (in particular for categorical or nominal variables). You can also use the command prvalue, explained at the end of the document.

Predicted odds at each level of variable x1. For example, when x1 = 1 the predicted odds of Y=1 are about 7.8 (leaving the other X's as they are):

    . adjust, by(x1) exp

    Dependent variable: y_bin     Command: logit
    Variables left as is: x2, x3, x4, x5, x6, x7

    x1    exp(xb)
    1     7.834
    2     0.379
    3     7.9768

    Key: exp(xb) = exp(xb)

Predicted probabilities at each level of variable x1. For example, when x1 = 1 the probability of Y=1 is 88% (leaving the other X's as they are):

    . adjust, by(x1) pr

    Dependent variable: y_bin     Command: logit
    Variables left as is: x2, x3, x4, x5, x6, x7

    x1    pr
    1     .88666
    2     .973
    3     .879484

    Key: pr = Probability

NOTE: Please see http://www.ats.ucla.edu/stat/stata/library/odds_ratio_logistic.htm
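In Stata 11 and later, adjust has been superseded by the built-in margins command. The sketch below shows predictions in the same spirit using margins on the bundled auto dataset. The dataset, the variables and the chosen mpg values are assumptions for illustration, and at() with atmeans is an analogous calculation rather than an exact reproduction of adjust, by().

```stata
* Illustration only: auto.dta is a stand-in for the tutorial's data
sysuse auto, clear
quietly logit foreign mpg weight

* Predicted Pr(foreign = 1) at three mpg values, weight held at its mean
margins, at(mpg = (15 20 25)) atmeans

* The same predictions expressed as odds, exp(xb)
margins, at(mpg = (15 20 25)) atmeans expression(exp(predict(xb)))
```

Unlike adjust, by(), which leaves the other variables as observed within each group, atmeans fixes them at their sample means, so the two approaches answer slightly different questions.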

Ordinal logit

When a dependent variable has more than two categories and the values of each category have a meaningful sequential order, where a value is indeed higher than the previous one, then you can use ordinal logit. Here is an example of this type of variable:

    . tab y_ordinal

    Agreement level   Freq.    Percent    Cum.
    Disagree          190      38.78      38.78
    Neutral           104      21.22      60.00
    Agree             196      40.00      100.00
    Total             490      100.00

Ordinal logit: the setup

The command is ologit, followed by the dependent variable and then the independent variables:

    . ologit y_ordinal x1 x2 x3 x4 x5 x6 x7

    Iteration 0: log likelihood = -50.79694
    Iteration 1: log likelihood = -475.83683
    Iteration 2: log likelihood = -458.8354
    Iteration 3: log likelihood = -458.383
    Iteration 4: log likelihood = -458.3845

    Ordered logistic regression    Number of obs = 490
                                   LR chi2(7)    = 4.83
                                   Prob > chi2   = 0.0000
    Log likelihood = -458.3845     Pseudo R2     = 0.98

    y_ordinal  Coef.      Std. Err.   z       P>|z|   [95% Conf. Interval]
    x1         .088       .09588      .30     0.0     .033079    .40868
    x2         -.054357   .089953     -0.60   0.546   -.305834   .8779
    x3         .066394    .09503      .5      0.49    -.0746775  .879563
    x4         .479       .093585     .46     0.04    .0456697   .4037885
    x5         -.90978    .09074      -3.     0.00    -.4704886  -.3707
    x6         .0034756   .0860736    0.04    0.968   -.6555     .7767
    x7         .566       .7853       8.79    0.000   .684       .9558
    /cut1      -.558058   .03594                      -.7558463  -.3497654
    /cut2      .538937    .07893                      .3374604   .740387

    Note: observation completely determined. Standard errors questionable.

How to read the output:

Prob > chi2: if this number is < 0.05 then your model is ok. This is a likelihood-ratio test of whether the coefficients in the model are jointly different from zero.

Coef.: ordered logit coefficients are in log-odds units and cannot be read as regular OLS coefficients. To interpret them you need to estimate the predicted probabilities (see the next section).

/cut1, /cut2: ancillary parameters that define the thresholds between adjacent categories (see the next section).

z: tests the hypothesis that each coefficient is different from 0. To reject the null of a zero coefficient, the z-value has to be higher than 1.96 in absolute value (for 95% confidence). If this is the case then you can say that the variable has a significant influence on your dependent variable (y). The larger the absolute z, the stronger the evidence that the variable is relevant.

P>|z|: two-tail p-values test the hypothesis that each coefficient is different from 0. To reject the null, the p-value has to be lower than 0.05 (95% confidence; you could also choose an alpha of 0.10). If this is the case then you can say that the variable has a significant influence on your dependent variable (y).

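Ordinal logit: predicted probabilities

Following Hamilton (2006, p.79), ologit estimates a score, S, as a linear function of the X's:

$$S = \hat\beta_1 X_1 + \hat\beta_2 X_2 + \hat\beta_3 X_3 + \hat\beta_4 X_4 + \hat\beta_5 X_5 + \hat\beta_6 X_6 + \hat\beta_7 X_7$$

where $\hat\beta_1, \dots, \hat\beta_7$ are the coefficients reported by ologit. Predicted probabilities are estimated as:

$$P(\text{y\_ordinal} = \text{disagree}) = P(S + u \le \text{\_cut1}) = P(S + u \le -0.558058)$$

$$P(\text{y\_ordinal} = \text{neutral}) = P(\text{\_cut1} < S + u \le \text{\_cut2}) = P(-0.558058 < S + u \le 0.538937)$$

$$P(\text{y\_ordinal} = \text{agree}) = P(\text{\_cut2} < S + u) = P(0.538937 < S + u)$$

where u is a logistically distributed error term.

To estimate predicted probabilities, type predict right after the ologit model. Unlike logit, this time you need to name one new variable for each category of the ordinal variable (y_ordinal). Type:

    . predict disagree neutral agree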
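The cutpoint formulas above can be verified by hand. The sketch below does so on the bundled auto dataset, with rep78 (five ordered categories) standing in for y_ordinal; the dataset and all variable and scalar names are assumptions for illustration. Because u is logistic, P(S + u <= cut) equals invlogit(cut - S).

```stata
* Illustration only: auto.dta is a stand-in for the tutorial's data
sysuse auto, clear
quietly ologit rep78 mpg weight

* One new variable per category of rep78 (five categories)
predict p1 p2 p3 p4 p5

* The score S (linear prediction, cutpoints not included)
predict s_hat, xb

* In e(b) the cutpoints follow the two slope coefficients
matrix b = e(b)
scalar cut1 = b[1,3]
scalar cut2 = b[1,4]

* Lowest category: P(S + u <= cut1)
generate p1_manual = invlogit(cut1 - s_hat)

* Second category: P(cut1 < S + u <= cut2)
generate p2_manual = invlogit(cut2 - s_hat) - invlogit(cut1 - s_hat)

list rep78 p1 p1_manual p2 p2_manual in 1/5
```

Reading the cutpoints by position from e(b) avoids relying on how a particular Stata version names them.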

Ordinal logit: predicted probabilities

To read these probabilities, as an example, type:

    . browse country disagree neutral agree if year==1999

In 1999 there is a 6% probability of agreement in Australia, compared to a 58% probability of disagreement in Brazil, while Denmark seems to be quite undecided.

Predicted probabilities: using prvalue

After running ologit (or logit) you can use the command prvalue to estimate the probabilities for each event. prvalue is a user-written command. If you do not have it, type findit spost, select spost9_ado from http://www.indiana.edu/~jslsoc/stata and click on (click here to install).

If you type prvalue without any option you will get the probabilities for each category when all independent variables are set to their mean values:

    . prvalue

    ologit: Predictions for y_ordinal
    Confidence intervals by delta method

                          95% Conf. Interval
    Pr(y=Disagree|x):  0.367    [ 0.359,  0.4094]
    Pr(y=Neutral|x):   0.643    [ 0.97,   0.3090]
    Pr(y=Agree|x):     0.3730   [ 0.36,   0.498]

          x1        x2         x3        x4      x5         x6          x7
    x=    .000408   -8.94e-0   -.60e-0   -.e-0   .539e-09   -9.744e-0   -6.040e-0

You can also estimate probabilities for a particular profile (type help prvalue for more details):

    . prvalue, x(x1= x2=3 x3=0 x4=- x5= x6= x6=9 x7=4)

    ologit: Predictions for y_ordinal
    Confidence intervals by delta method

                          95% Conf. Interval
    Pr(y=Disagree|x):  0.009    [-0.0033, 0.0090]
    Pr(y=Neutral|x):   0.0055   [-0.006,  0.07]
    Pr(y=Agree|x):     0.996    [ 0.9738, .0094]

          x1   x2   x3   x4   x5   x6   x7
    x=         3    0    -         9    4

For more info go to: http://www.ats.ucla.edu/stat/stata/dae/probit.htm

Predicted probabilities: using prvalue

If you want to estimate the impact on the probability of changing the value of a variable, you can use the options save and dif (type help prvalue for more details).

First, estimate the probabilities when x1=1 and all other independent variables are held at their mean values. Notice the save option:

    . prvalue, x(x1=1) save

    ologit: Predictions for y_ordinal
    Confidence intervals by delta method

                          95% Conf. Interval
    Pr(y=Disagree|x):  0.3837   [ 0.3098, 0.4576]
    Pr(y=Neutral|x):   0.64     [ 0.95,   0.3087]
    Pr(y=Agree|x):     0.35     [ 0.806,  0.438]

          x1   x2         x3        x4      x5         x6          x7
    x=    1    -8.94e-0   -.60e-0   -.e-0   .539e-09   -9.744e-0   -6.040e-0

Then estimate the probabilities when x1=2 and all other independent variables are held at their mean values. Notice the dif option, which reports the change relative to the saved values:

    . prvalue, x(x1=2) dif

    ologit: Change in Predictions for y_ordinal
    Confidence intervals by delta method

                       Current   Saved    Change   95% CI for Change
    Pr(y=Disagree|x):  0.367     0.3837   -0.00    [-0.0737, 0.037]
    Pr(y=Neutral|x):   0.643     0.64     0.0003   [-0.006,  0.003]
    Pr(y=Agree|x):     0.3730    0.35     0.008    [-0.099,  0.074]

              x1   x2         x3        x4      x5         x6          x7
    Current=  2    -8.94e-0   -.60e-0   -.e-0   .539e-09   -9.744e-0   -6.040e-0
    Saved=    1    -8.94e-0   -.60e-0   -.e-0   .539e-09   -9.744e-0   -6.040e-0
    Diff=     1    0          0         0       0          0           0

Here you can see the impact of x1 when it changes from 1 to 2. For example, the probability of y=Agree goes from 35% to 37% (with all other independent variables held constant at their mean values).

NOTE: You can do the same with logit or probit models.
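Where the user-written prvalue is not installed, a comparable before-and-after comparison can be sketched with the built-in margins command (Stata 11 or later). The example below uses the bundled auto dataset with rep78 as the ordinal outcome; the dataset, the mpg values 20 and 25, and the matrix name m are assumptions for illustration, not part of the original analysis.

```stata
* Illustration only: auto.dta is a stand-in for the tutorial's data
sysuse auto, clear
quietly ologit rep78 mpg weight

* Pr(rep78 = 5) at mpg = 20 and mpg = 25, weight held at its mean
margins, at(mpg = (20 25)) atmeans predict(outcome(5))

* r(b) holds the two probabilities; their difference is the change
matrix m = r(b)
display "Pr at mpg = 20: " m[1,1]
display "Pr at mpg = 25: " m[1,2]
display "Change:         " m[1,2] - m[1,1]
```

This sketch reports only the point estimate of the change; unlike prvalue with dif, it does not produce a confidence interval for the difference.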

Useful links / Recommended books

Useful links

    DSS Online Training Section: http://dss.princeton.edu/training/
    UCLA Resources to learn and use STATA: http://www.ats.ucla.edu/stat/stata/
    DSS help-sheets for STATA: http://dss/online_help/stats_packages/stata/stata.htm
    Introduction to Stata (PDF), Christopher F. Baum, Boston College, USA. A 67-page description of Stata, its key features and benefits, and other useful information: http://fmwww.bc.edu/gstat/docs/stataintro.pdf
    STATA FAQ website: http://stata.com/support/faqs/
    Princeton DSS Libguides: http://libguides.princeton.edu/dss

Books

    Introduction to Econometrics / James H. Stock, Mark W. Watson. 2nd ed., Boston: Pearson Addison Wesley, 2007.
    Data Analysis Using Regression and Multilevel/Hierarchical Models / Andrew Gelman, Jennifer Hill. Cambridge; New York: Cambridge University Press, 2007.
    Econometric Analysis / William H. Greene. 6th ed., Upper Saddle River, N.J.: Prentice Hall, 2008.
    Designing Social Inquiry: Scientific Inference in Qualitative Research / Gary King, Robert O. Keohane, Sidney Verba. Princeton University Press, 1994.
    Unifying Political Methodology: The Likelihood Theory of Statistical Inference / Gary King. Cambridge University Press, 1989.
    Statistical Analysis: An Interdisciplinary Introduction to Univariate & Multivariate Methods / Sam Kachigan. New York: Radius Press, c1986.
    Statistics with Stata (updated for version 9) / Lawrence Hamilton. Thomson Brooks/Cole, 2006.