Categorical Outcomes. Statistical Modelling in Stata: Categorical Outcomes. R by C Table: Example. Nominal Outcomes. Mark Lunt.

Similar documents
sociology SO5032 Quantitative Research Methods Brendan Halpin, Sociology, University of Limerick Spring 2018 SO5032 Quantitative Research Methods

Final Exam - section 1. Thursday, December hours, 30 minutes

İnsan TUNALI 8 November 2018 Econ 511: Econometrics I. ASSIGNMENT 7 STATA Supplement

Logistic Regression Analysis

Module 9: Single-level and Multilevel Models for Ordinal Responses. Stata Practical 1

u panel_lecture . sum

[BINARY DEPENDENT VARIABLE ESTIMATION WITH STATA]

Multinomial Logit Models - Overview Richard Williams, University of Notre Dame, Last revised February 13, 2017

Model fit assessment via marginal model plots

Econ 371 Problem Set #4 Answer Sheet. 6.2 This question asks you to use the results from column (1) in the table on page 213.

tm / / / / / / / / / / / / Statistics/Data Analysis User: Klick Project: Limited Dependent Variables{space -6}

EC327: Limited Dependent Variables and Sample Selection Binomial probit: probit

Labor Market Returns to Two- and Four- Year Colleges. Paper by Kane and Rouse Replicated by Andreas Kraft

ECON Introductory Econometrics. Seminar 4. Stock and Watson Chapter 8

Example 2.3: CEO Salary and Return on Equity. Salary for ROE = 0. Salary for ROE = 30. Example 2.4: Wage and Education

Morten Frydenberg Wednesday, 12 May 2004

Heteroskedasticity. . reg wage black exper educ married tenure

Sociology Exam 3 Answer Key - DRAFT May 8, 2007

*1A. Basic Descriptive Statistics sum housereg drive elecbill affidavit witness adddoc income male age literacy educ occup cityyears if control==1

Your Name (Please print) Did you agree to take the optional portion of the final exam Yes No. Directions

The Multivariate Regression Model

Assignment #5 Solutions: Chapter 14 Q1.

Labor Force Participation and the Wage Gap Detailed Notes and Code Econometrics 113 Spring 2014

Problem Set 6 ANSWERS

Module 4 Bivariate Regressions

Sean Howard Econometrics Final Project Paper. An Analysis of the Determinants and Factors of Physical Education Attendance in the Fourth Quarter

The data definition file provided by the authors is reproduced below: Obs: 1500 home sales in Stockton, CA from Oct 1, 1996 to Nov 30, 1998

Maximum Likelihood Estimation Richard Williams, University of Notre Dame, Last revised January 10, 2017

Technical Documentation for Household Demographics Projection

Cameron ECON 132 (Health Economics): FIRST MIDTERM EXAM (A) Fall 17

Getting Started in Logit and Ordered Logit Regression (ver. 3.1 beta)

Advanced Econometrics

Econometrics is. The estimation of relationships suggested by economic theory

Maximum Likelihood Estimation Richard Williams, University of Notre Dame, Last revised January 13, 2018

Professor Brad Jones University of Arizona POL 681, SPRING 2004 INTERACTIONS and STATA: Companion To Lecture Notes on Statistical Interactions

Problem Set 9 Heteroskedasticty Answers

F^3: F tests, Functional Forms and Favorite Coefficient Models

Allison notes there are two conditions for using fixed effects methods.

You created this PDF from an application that is not licensed to print to novapdf printer (

Getting Started in Logit and Ordered Logit Regression (ver. 3.1 beta)

Quantitative Techniques Term 2

Modeling wages of females in the UK

Review questions for Multinomial Logit/Probit, Tobit, Heckit, Quantile Regressions

Nonlinear Econometric Analysis (ECO 722) Answers to Homework 4

The relationship between GDP, labor force and health expenditure in European countries

Dummy variables 9/22/2015. Are wages different across union/nonunion jobs. Treatment Control Y X X i identifies treatment

An Examination of the Impact of the Texas Methodist Foundation Clergy Development Program. on the United Methodist Church in Texas

West Coast Stata Users Group Meeting, October 25, 2007

Solutions for Session 5: Linear Models

STATA Program for OLS cps87_or.do

Stat 328, Summer 2005

Effect of Education on Wage Earning

Limited Dependent Variables

Lecture 21: Logit Models for Multinomial Responses Continued

Handout seminar 6, ECON4150

Testing the Solow Growth Theory

Why do the youth in Jamaica neither study nor work? Evidence from JSLC 2001

A generalized Hosmer Lemeshow goodness-of-fit test for multinomial logistic regression models

Table 4. Probit model of union membership. Probit coefficients are presented below. Data from March 2008 Current Population Survey.

AIC = Log likelihood = BIC =

Longitudinal Logistic Regression: Breastfeeding of Nepalese Children

South African Dataset for MAMS

Introduction to fractional outcome regression models using the fracreg and betareg commands

Prof. Dr. Ben Jann. University of Bern, Institute of Sociology, Fabrikstrasse 8, CH-3012 Bern

Impact of Household Income on Poverty Levels

Time series data: Part 2

Example 7.1: Hourly Wage Equation Average wage for women

STATA log file for Time-Varying Covariates (TVC) Duration Model Estimations.

gologit2 documentation Richard Williams, Department of Sociology, University of Notre Dame Last revised February 1, 2007

STA 4504/5503 Sample questions for exam True-False questions.

List of figures. I General information 1

Duration Models: Parametric Models

Chapter 11 Part 6. Correlation Continued. LOWESS Regression

An Introduction to Event History Analysis

Visualisierung von Nicht-Linearität bzw. Heteroskedastizität

Ordinal Multinomial Logistic Regression. Thom M. Suhy Southern Methodist University May14th, 2013

Estimating Ordered Categorical Variables Using Panel Data: A Generalised Ordered Probit Model with an Autofit Procedure

ORDERED MULTINOMIAL LOGISTIC REGRESSION ANALYSIS. Pooja Shivraj Southern Methodist University

Catherine De Vries, Spyros Kosmidis & Andreas Murr

Description Remarks and examples References Also see

COMPLEMENTARITY ANALYSIS IN MULTINOMIAL

Logit Models for Binary Data

Statistical Models of Stocks and Bonds. Zachary D Easterling: Department of Economics. The University of Akron

Creation of Synthetic Discrete Response Regression Models

Sociology 704: Topics in Multivariate Statistics Instructor: Natasha Sarkisian. Binary Logit

Econometric Methods for Valuation Analysis

Generalized Multilevel Regression Example for a Binary Outcome

Postestimation commands predict Remarks and examples References Also see

Day 3C Simulation: Maximum Simulated Likelihood

1) The Effect of Recent Tax Changes on Taxable Income

Multiple Regression and Logistic Regression II. Dajiang 525 Apr

To be two or not be two, that is a LOGISTIC question

Dummy Variables. 1. Example: Factors Affecting Monthly Earnings

Religion and Volunteerism

Chapter 6 Part 3 October 21, Bootstrapping

CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

Example 8.1: Log Wage Equation with Heteroscedasticity-Robust Standard Errors

Relation between Income Inequality and Economic Growth

Didacticiel - Études de cas. In this tutorial, we show how to implement a multinomial logistic regression with TANAGRA.

Intro to GLM Day 2: GLM and Maximum Likelihood

Transcription:

Categorical Outcomes Statistical Modelling in Stata: Categorical Outcomes Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester Nominal Ordinal 28/11/2017 R by C Table: Example Categorical, more than two outcomes No ordering on outcomes Females Males Total Indemnity 234 (51%) 60 (40%) 294 (48%) Prepaid 196 (42%) 81 (53%) 277 (45%) No Insurance 32 (7%) 13 (8%) 45 (7%) Total 462 (100%) 154 (100%) 616 (100%) χ 2 = 6.32, p = 0.04 tab insure male, co chi2

Analysing an R by C Table Odds Ratios from Tables χ 2 -test: says if there is an association Need to assess what that association is Can calculate odds ratios for each row compared to a baseline row Prepaid vs Indemnity OR for males = 81 234 60 196 = 1.61 No Insurance vs Indemnity OR for males = 13 234 60 32 = 1.58 Multiple Logistic Regression Models Multiple Logistic Regression Models: Example Previous results can be duplicated with 2 logistic regression models Prepaid vs Indemnity No Insurance vs Indemnity Logistic regression model can be extended to more predictors Logistic regression model can include continuous variables. logistic insure1 male insure1 Odds Ratio Std. Err. z P> z [95% Conf. Interval] male 1.611735.3157844 2.44 0.015 1.09779 2.36629. logistic insure2 male insure2 Odds Ratio Std. Err. z P> z [95% Conf. Interval] male 1.584375.5693029 1.28 0.200.7834322 3.204163

Example. mlogit insure male, rrr It would be convenient to have a single analysis give all the information Can be done with multinomial logistic regression Also provides more efficient estimates (narrower confidence intervals) in most cases. Multinomial logistic regression Number of obs = 616 LR chi2(2) = 6.38 Prob > chi2 = 0.0413 Log likelihood = -553.40712 Pseudo R2 = 0.0057 insure RRR Std. Err. z P> z [95% Conf. Interval] Prepaid male 1.611735.3157844 2.44 0.015 1.09779 2.36629 Uninsure male 1.584375.5693021 1.28 0.200.7834329 3.20416 (Outcome insure==indemnity is the comparison group) in Stata Using predict after mlogit Command mlogit Option rrr (Relative risk ratio) gives odds ratios, rather than coefficients Option basecategory sets the baseline or reference category Can predict probability of each outcome Need to give k variables predict p1-p3, p Can predict probability of one particular outcome Need to specfy which with outcome option predict p2, p outcome(2)

Using predict after mlogit: Example Using lincom after mlogit. by male: summ p1-p3 -> male = 0 Variable Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- p1 477.5064935 0.5064935.5064935 p2 477.4242424 0.4242424.4242424 p3 477.0692641 0.0692641.0692641 -> male = 1 Variable Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- p1 167.3896104 0.3896104.3896104 p2 167.525974 0.525974.525974 p3 167.0844156 0.0844156.0844156 Can use lincom to test if coefficients are different calculate odds of being in a given outcome category Need to specify which outcome category we are interested in Normally, use the option eform to get odds ratios, rather than coefficients Using lincom after mlogit Ordinal Outcomes. lincom [Prepaid]male - [Uninsure]male ( 1) [Prepaid]male - [Uninsure]male = 0 insure Coef. Std. Err. z P> z [95% Conf. Interval] (1).017121.3544299 0.05 0.961 -.6775487.7117908 Can ignore ordering, use multinomial model Can use a test for trend Can use an ordered logistic regression model

Test for Trend Test for Trend: Example χ 2 -test tests for any differences between columns (or rows) Not very powerful against a linear change in proportions Can divide the χ 2 -statistic into two parts: linear trend and variations around the linear trend. Test for trend more powerful against a trend Has no power to detect other differences Often used for ordinal predictors Treatment A Treatment B Total Healed 12 (38%) 5 (16%) 17 (27%) Improved 10 (31%) 8 (25%) 18 (28%) No Change 4 (13%) 8 (25%) 12 (19%) Worse 6 (19%) 11 (34%) 17 (27%) Total 32 (100%) 32 (100%) 34 (100%) Test for Trend: Results Test for Trend: Caveat. ptrendi 12 5 1 \ 10 8 2 \ 4 8 3 \ 6 11 4 +------------------------+ r nr _prop x ------------------------ 1. 12 5 0.706 1.00 2. 10 8 0.556 2.00 3. 4 8 0.333 3.00 4. 6 11 0.353 4.00 +------------------------+ Trend analysis for proportions ------------------------------ Regression of p = r/(r+nr) on x: Slope = -.12521, std. error =.0546, Z = 2.293 Test for trend only tests for a linear association between predictors and outcome. U-shaped or inverted U-shaped associations will not be detected. Overall chi2(3) = 5.909, pr>chi2 = 0.1161 Chi2(1) for trend = 5.259, pr>chi2 = 0.0218 Chi2(2) for departure = 0.650, pr>chi2 = 0.7226

Test for Trend in Stata Fitting an ordinal predictor Test for trend often used, should know about it Not implemented in base stata: see http://www.stata.com/support/faqs/stat/trend.html Very rarely the best thing to do: If trend variable is the outcome, use ordinal logistic regression If trend variable is a predictor: fit both categorical & continuous, testparm categoricals if non-significant, use continuous variable if significant, use categorical variables writing score 30 40 50 60 70 1 2 3 4 5 6. regress write oread i.oread note: 6.oread omitted because of collinearity Source SS df MS Number of obs = 200 -------------+------------------------------ F( 5, 194) = 22.77 Model 6612.82672 5 1322.56534 Prob > F = 0.0000 Residual 11266.0483 194 58.0724138 R-squared = 0.3699 -------------+------------------------------ Adj R-squared = 0.3536 Total 17878.875 199 89.843593 Root MSE = 7.6205 write Coef. Std. Err. t P> t [95% Conf. Interval] oread 3.288889 1.606548 2.05 0.042.1203466 6.457431 oread 2-6.669841 6.339542-1.05 0.294-19.17311 5.833432 3-3.666385 4.761676-0.77 0.442-13.05768 5.724914 4.3641026 3.568089 0.10 0.919-6.673124 7.401329 5.4233918 2.825015 0.15 0.881-5.148294 5.995078 6 0 (omitted) _cons 42.71111 9.158732 4.66 0.000 24.64764 60.77458. testparm i.oread ( 1) 2.oread = 0 ( 2) 3.oread = 0 ( 3) 4.oread = 0 ( 4) 5.oread = 0 Dose Response Don t confuse trend with dose response All three models may have significant trend test Only first model has a dose-response effect Other models better fitted using categorical variables Genetic Model Genotype aa aa AA Additive(dose-response) 0 0.1 0.2 Dominant 0 0.2 0.2 Recessive 0 0 0.2 F( 4, 194) = 1.36 Prob > F = 0.2497

Ordinal Regression: Example Ordinal Regression: Using Tables Treatment A Treatment B Total Healed 12 (38%) 5 (16%) 17 (27%) Improved 10 (31%) 8 (25%) 18 (28%) No Change 4 (13%) 8 (25%) 12 (19%) Worse 6 (19%) 11 (34%) 17 (27%) Total 32 (100%) 32 (100%) 34 (100%) Dichotomise outcome to Better or Worse Can split the table in three places This produces 3 odds ratios Suppose these three odds ratios are estimates of the same quantity Odds of being in a worse group rather than a better one Ordinal Regression Example: Using Tables Ordered Polytomous Logistic Regression Treatment A Treatment B Total Healed 12 (38%) 5 (16%) 17 (27%) Improved 10 (31%) 8 (25%) 18 (28%) No Change 4 (13%) 8 (25%) 12 (19%) Worse 6 (19%) 11 (34%) 17 (27%) Total 32 (100%) 32 (100%) 34 (100%) OR 1 = (12+10+4) 11 (5+8+8) 6 = 2.3 (1) OR 2 = (12+10) (8+11) (5+8) (4+6) = 3.2 (2) OR 3 = (12) (8+8+11) 5 (10+4+6) = 3.2 (3) Where p i log( ) = α i + βx 1 p i p i = probability of being in a category up to and including the i th α i = Log-odds of being in a category up to and including the i th if x = 0 β = Log of the odds ratio for being in a category up to and including the i th if x = 1, relative to x = 0

Ordinal regression in Stata Ordinal Regression in Stata: Example ologit fits ordinal regression models Option or gives odds ratios rather than coefficients Can compare likelihood to mlogit model to see if common odds ratio assumption is valid predict works as after mlogit. ologit outcome treat, or Iteration 3: log likelihood = -85.2492 Ordered logit estimates Number of obs = 64 LR chi2(1) = 5.49 Prob > chi2 = 0.0191 Log likelihood = -85.2492 Pseudo R2 = 0.0312 outcome Odds Ratio Std. Err. z P> z [95% Conf. Interval] treat 2.932028 1.367427 2.31 0.021 1.175407 7.31388 Ordinal Regression Caveats Assumption that same β fits all outcome categories should be tested AIC, BIC or LR test compared to mlogit model User-written gologit2 can also be used Allows for some variables to satisfy proportional odds, others not Option autofit() selects variables that violate proportional odds There are a variety of other, less widely used, ordinal regression models: see Sander Greenland: Alternative Models for Ordinal Logistic Regression, Statistics in Medicine, 1994, pp1665-1677.