ORDERED MULTINOMIAL LOGISTIC REGRESSION ANALYSIS. Pooja Shivraj Southern Methodist University



KINDS OF REGRESSION ANALYSES
Linear regression
Logistic regression
  Dichotomous dependent variable (yes/no, died/didn't die, at risk/not at risk, etc.)
  Predicts the probability of a person belonging to that category.

QUICK REVIEW: LOGISTIC REGRESSION
Values calculated from linear regression are continuous, so they need to be transformed onto a 0-1 scale to represent a probability, since $0 \le p \le 1$.
Logistic regression probability is calculated by:
$$\hat{p} = \frac{e^{B_1 x + B_0}}{1 + e^{B_1 x + B_0}}$$
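As a small illustration of the transform above, the following sketch maps a few linear-predictor values onto the 0-1 scale. The coefficients `b0` and `b1` and the helper name `inv_logit` are made up for illustration and do not come from the slides; base R's `plogis()` computes the same quantity.

```r
# Logistic (inverse-logit) transform: maps any real number into (0, 1).
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

# Hypothetical coefficients, chosen only for illustration.
b0 <- -1
b1 <- 0.5
x <- c(-10, 0, 2, 10)

# Predicted probabilities for each x.
p_hat <- inv_logit(b1 * x + b0)
print(p_hat)

# Base R's plogis() gives the same result.
print(all.equal(p_hat, plogis(b1 * x + b0)))
```

However extreme the value of `x`, the result stays strictly between 0 and 1, which is what makes it usable as a probability.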

CLASS EXAMPLE: LOGISTIC REGRESSION
Probability of a person complying with a mammogram, based on whether or not they get a physician's recommendation.

CLASS EXAMPLE: LOGISTIC REGRESSION
$$\hat{p} = \frac{e^{B_1 x + B_0}}{1 + e^{B_1 x + B_0}}$$
Probability of complying if NOT recommended by physician:
$$\hat{p} = \frac{e^{2.29(0) - 1.84}}{1 + e^{2.29(0) - 1.84}} = 0.14$$
Probability of complying if recommended by physician:
$$\hat{p} = \frac{e^{2.29(1) - 1.84}}{1 + e^{2.29(1) - 1.84}} = 0.61$$
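The two probabilities in the class example can be checked in R. This is a minimal sketch using the coefficients shown on the slide (B0 = -1.84, B1 = 2.29); the variable names are illustrative only.

```r
# Coefficients from the mammogram class example.
b0 <- -1.84   # intercept
b1 <- 2.29    # effect of a physician's recommendation

# x = 0: not recommended, x = 1: recommended.
recommended <- c(0, 1)

# plogis(eta) is exp(eta) / (1 + exp(eta)).
p_comply <- plogis(b1 * recommended + b0)
names(p_comply) <- c("not recommended", "recommended")
print(round(p_comply, 2))
```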

ORDERED MULTINOMIAL LOGISTIC REGRESSION ANALYSIS
A type of logistic regression that allows more than two discrete outcomes.
Outcomes are ordinal:
  Yes, maybe, no
  First, second, third place
  Gold, silver, bronze medals
  Strongly agree, agree, neutral, disagree, strongly disagree

ASSUMPTION
No perfect predictions: a value of a predictor variable cannot correspond solely to one value of the dependent variable. Check using crosstabs.

ORDERED LOGISTIC REGRESSION: EXAMPLE
Load libraries:
library(arm)
library(psych)
Load data:
pooj <- read.csv("http://www.ats.ucla.edu/stat/r/dae/ologit.csv")

ORDERED LOGISTIC REGRESSION: EXAMPLE
Variables:
  apply: college juniors' reported likelihood of applying to grad school (0 = unlikely, 1 = somewhat likely, 2 = very likely)
  pared: whether at least one parent has a graduate degree (0 = no, 1 = yes)
  public: whether the undergraduate institution is public or private (0 = private, 1 = public)
  gpa: college GPA

> str(pooj)
'data.frame': 400 obs. of 4 variables:
 $ apply : int 2 1 0 1 1 0 1 1 0 1 ...
 $ pared : int 0 1 1 0 0 0 0 0 0 1 ...
 $ public: int 0 0 1 0 0 1 0 0 0 0 ...
 $ gpa   : num 3.26 3.21 3.94 2.81 2.53 ...
> table(pooj$apply)
  0   1   2
220 140  40
> table(pooj$pared)
  0   1
337  63
> table(pooj$public)
  0   1
343  57

CHECK ASSUMPTION: CROSS-TABS
> xtabs(~pooj$pared+pooj$apply)
          pooj$apply
pooj$pared   0   1   2
         0 200 110  27
         1  20  30  13
> xtabs(~pooj$public+pooj$apply)
           pooj$apply
pooj$public   0   1   2
          0 189 124  30
          1  31  16  10
Why is this important?
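One way to approach the question on this slide: if any cell of a cross-tab were empty, that predictor value would perfectly rule out an outcome category, and the corresponding estimates could become unstable. The sketch below rebuilds the two cross-tabs from the counts printed above, so it runs without downloading the data, and checks for empty cells. The helper name `has_empty_cell` is hypothetical.

```r
# Cross-tab counts copied from the slide (rows: predictor, columns: apply).
pared_by_apply <- matrix(c(200, 110, 27,
                            20,  30, 13),
                         nrow = 2, byrow = TRUE,
                         dimnames = list(pared = c("0", "1"),
                                         apply = c("0", "1", "2")))
public_by_apply <- matrix(c(189, 124, 30,
                             31,  16, 10),
                          nrow = 2, byrow = TRUE,
                          dimnames = list(public = c("0", "1"),
                                          apply = c("0", "1", "2")))

# TRUE if some predictor/outcome combination never occurs in the data.
has_empty_cell <- function(tab) any(tab == 0)

print(has_empty_cell(pared_by_apply))
print(has_empty_cell(public_by_apply))

# Row proportions show how the outcome distribution shifts with the predictor.
print(round(prop.table(pared_by_apply, margin = 1), 2))
```

Here every cell is populated, so neither predictor perfectly predicts a category of `apply`.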

SINGLE PREDICTOR MODEL - GPA
> library(arm)
> summary(m1 <- bayespolr(as.ordered(pooj$apply) ~ pooj$gpa))
Call:
bayespolr(formula = as.ordered(pooj$apply) ~ pooj$gpa)

Coefficients:
          Value Std. Error t value
pooj$gpa 0.7109     0.2471   2.877

Intercepts:
     Value Std. Error t value
0|1 2.3306     0.7502  3.1065
1|2 4.3505     0.7744  5.6179

Residual Deviance: 737.6921
AIC: 743.6921
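For readers without the arm package, a closely related fit can be sketched with `polr()` from MASS, which estimates the same kind of ordered logit model by maximum likelihood (without the prior that `bayespolr()` adds). The data below are simulated, not the UCLA data set; the true slope (0.7) and cutpoints (2.3, 4.35) are assumptions chosen to resemble the output above.

```r
library(MASS)  # provides polr()

set.seed(42)
n <- 2000

# Simulated predictor and latent propensity (slope 0.7 is an assumption).
gpa <- runif(n, min = 2, max = 4)
latent <- 0.7 * gpa + rlogis(n)

# Cut the latent scale at two assumed cutpoints to get an ordered outcome.
apply_sim <- cut(latent, breaks = c(-Inf, 2.3, 4.35, Inf),
                 labels = c("unlikely", "somewhat likely", "very likely"),
                 ordered_result = TRUE)
sim <- data.frame(apply = apply_sim, gpa = gpa)

# Ordered logit fit: logit P(Y <= k) = zeta_k - b * gpa.
fit <- polr(apply ~ gpa, data = sim, Hess = TRUE)
print(coef(fit))   # slope estimate
print(fit$zeta)    # cutpoint (intercept) estimates

# Predicted category probabilities for a student with GPA 3.
p_gpa3 <- predict(fit, newdata = data.frame(gpa = 3), type = "probs")
print(round(p_gpa3, 3))
```

The sign convention matters when converting to probabilities by hand: the linear predictor is subtracted from each cutpoint.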

CUMULATIVE DISTRIBUTION FUNCTION
[Figure: cumulative distribution function with the cutpoints 0|1 and 1|2 marked]

LABELING COEFFICIENTS
Coefficients:
          Value Std. Error t value
pooj$gpa 0.7109     0.2471   2.877

Intercepts:
     Value Std. Error t value
0|1 2.3306     0.7502  3.1065
1|2 4.3505     0.7744  5.6179

Coefficient of the model:
coef <- m1$coef
Intercepts of the model:
intercept <- m1$zeta
Let us look at the likelihood of students with an average GPA applying to graduate school.
> (x <- mean(pooj$gpa))
[1] 2.998925

TRANSFORMING OUTCOMES TO PROBABILITIES
prob <- function(input) { exp(input) / (1 + exp(input)) }
(p0 <- prob(intercept[1] - coef * x))
0.5493198
(p1 <- prob(intercept[2] - coef * x) - p0)
0.3525213
(p2 <- 1 - (p0 + p1))
0.0981589
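The three steps above generalize to any number of categories: compute the cumulative probabilities P(Y <= k) at each cutpoint, then take successive differences. A minimal sketch, using the rounded estimates printed on the slides rather than the fitted object, so the last digits may differ slightly from the values shown; the helper name `ordered_probs` is hypothetical.

```r
# Category probabilities from a linear predictor and ordered cutpoints.
ordered_probs <- function(eta, cutpoints) {
  cum <- plogis(cutpoints - eta)   # P(Y <= k) for each cutpoint
  diff(c(0, cum, 1))               # successive differences give P(Y = k)
}

# Rounded estimates copied from the single-predictor model output.
zeta <- c(2.3306, 4.3505)   # cutpoints 0|1 and 1|2
b_gpa <- 0.7109             # GPA coefficient
mean_gpa <- 2.998925        # average GPA

p <- ordered_probs(b_gpa * mean_gpa, zeta)
names(p) <- c("unlikely", "somewhat likely", "very likely")
print(round(p, 3))
```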

WHY NOT USE LINEAR REGRESSION?
> summary(linreg <- lm(pooj$apply ~ pooj$gpa))
Call:
lm(formula = pooj$apply ~ pooj$gpa)

Residuals:
    Min      1Q  Median      3Q     Max
-0.7917 -0.5554 -0.3962  0.4786  1.6012

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.22016    0.25224  -0.873  0.38329
pooj$gpa     0.25681    0.08338   3.080  0.00221 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6628 on 398 degrees of freedom
Multiple R-squared: 0.02328, Adjusted R-squared: 0.02083
F-statistic: 9.486 on 1 and 398 DF, p-value: 0.002214

AND OUR ASSUMPTIONS AREN'T MET

LINEAR REGRESSION VERSUS ORDERED LOGISTIC REGRESSION
The decision between linear regression and ordered multinomial regression is not always black and white. When you have a large number of categories that can be considered equally spaced, simple linear regression is a possible alternative (Gelman & Hill, 2007).
Moral of the story: always start by checking the assumptions of the model.

USING MULTIPLE PREDICTORS
> summary(m2 <- bayespolr(as.ordered(apply) ~ gpa + pared + public, pooj))
Call:
bayespolr(formula = as.ordered(apply) ~ gpa + pared + public, pooj)

Coefficients:
            Value Std. Error    t value
gpa     0.6041463  0.2577039  2.3443424
pared   1.0274106  0.2636348  3.8970973
public -0.0528103  0.2931885 -0.1801240

Intercepts:
     Value Std. Error t value
0|1 2.1638     0.7710  2.8064
1|2 4.2518     0.7955  5.3449

Residual Deviance: 727.002
AIC: 737.002
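The coefficients above are on the log-odds scale. One common way to read them, not shown on the slides, is to exponentiate them into odds ratios. The sketch below does this with the printed estimates and standard errors; the approximate 95% intervals use a normal approximation (estimate plus or minus 1.96 standard errors), which is an assumption added here.

```r
# Estimates and standard errors copied from the model output above.
coefs <- c(gpa = 0.6041463, pared = 1.0274106, public = -0.0528103)
se    <- c(gpa = 0.2577039, pared = 0.2636348, public = 0.2931885)

# Odds ratios: multiplicative change in the odds of a higher category
# for a one-unit increase in the predictor.
odds_ratio <- exp(coefs)

# Approximate 95% intervals on the odds-ratio scale (normal approximation).
ci <- exp(cbind(lower = coefs - 1.96 * se, upper = coefs + 1.96 * se))

print(round(cbind(odds_ratio, ci), 2))
```

Under this reading, having a parent with a graduate degree is associated with roughly 2.8 times the odds of being in a higher category of `apply`, while the interval for `public` includes 1.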

TRANSFORMING OUTCOMES TO PROBABILITIES
> (coef <- m2$coef)
       gpa      pared     public
 0.6041463  1.0274106 -0.0528103
> (intercept <- m2$zeta)
     0|1      1|2
2.163841 4.251774
> (x1 <- cbind(0:4, 0, .14))
     [,1] [,2] [,3]
[1,]    0    0 0.14
[2,]    1    0 0.14
[3,]    2    0 0.14
[4,]    3    0 0.14
[5,]    4    0 0.14
> (x2 <- cbind(0:4, 1, .14))
     [,1] [,2] [,3]
[1,]    0    1 0.14
[2,]    1    1 0.14
[3,]    2    1 0.14
[4,]    3    1 0.14
[5,]    4    1 0.14

TRANSFORMING OUTCOMES TO PROBABILITIES
prob <- function(var) { exp(var) / (1 + exp(var)) }
> (p1 <- prob(intercept[1] - x1 %*% coef))
          [,1]
[1,] 0.9119769
[2,] 0.8498732
[3,] 0.7556908
[4,] 0.6282669
[5,] 0.4801055
> (p2 <- prob(intercept[2] - x1 %*% coef) - p1)
           [,1]
[1,] 0.07538029
[2,] 0.12722869
[3,] 0.20318345
[4,] 0.29895089
[5,] 0.39428044
> (p3 <- 1 - (p1 + p2))
           [,1]
[1,] 0.01264281
[2,] 0.02289816
[3,] 0.04112575
[4,] 0.07278223
[5,] 0.12561404

TRANSFORMING OUTCOMES TO PROBABILITIES
> (p4 <- prob(intercept[1] - x2 %*% coef))
          [,1]
[1,] 0.7876055
[2,] 0.6695483
[3,] 0.5254116
[4,] 0.3769123
[5,] 0.2484150
> (p5 <- prob(intercept[2] - x2 %*% coef) - p4)
         [,1]
[1,] 0.177854
[2,] 0.268999
[3,] 0.367579
[4,] 0.443221
[5,] 0.465167
> (p6 <- 1 - (p4 + p5))
         [,1]
[1,] 0.034540
[2,] 0.061452
[3,] 0.107009
[4,] 0.179866
[5,] 0.286418
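The six separate vectors above can also be produced in one pass per design matrix, which avoids mixing up which cumulative probability is subtracted from which. This is a sketch using the m2 estimates printed earlier; the function name `category_probs` and the column labels are hypothetical, and the exact digits depend on which intercept values are in the workspace, so they need not match the printed output to the last decimal.

```r
# Estimates copied from the multiple-predictor model output.
coefs <- c(gpa = 0.6041463, pared = 1.0274106, public = -0.0528103)
zeta  <- c("0|1" = 2.163841, "1|2" = 4.251774)

# Returns one row per observation and one column per outcome category.
category_probs <- function(newdata, coefs, zeta) {
  eta <- as.vector(newdata %*% coefs)       # linear predictor per row
  z   <- outer(-eta, zeta, "+")             # zeta_k - eta_i for each row and cutpoint
  cum <- 1 / (1 + exp(-z))                  # P(Y <= k)
  probs <- cbind(cum, 1) - cbind(0, cum)    # differences of cumulative probabilities
  colnames(probs) <- c("unlikely", "somewhat likely", "very likely")
  probs
}

# GPA from 0 to 4, public held at 0.14, pared = 0 versus pared = 1.
x_pared0 <- cbind(gpa = 0:4, pared = 0, public = 0.14)
x_pared1 <- cbind(gpa = 0:4, pared = 1, public = 0.14)

probs0 <- category_probs(x_pared0, coefs, zeta)
probs1 <- category_probs(x_pared1, coefs, zeta)
print(round(probs0, 3))
print(round(probs1, 3))
```

Each row sums to 1 by construction, which is a quick sanity check on any hand computation of these probabilities.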

PLOTTING THE RESULTS
Undergrad.GPA <- 0:4
plot(Undergrad.GPA, p1, type="l", col=1, ylim=c(0,1))
lines(0:4, p2, col=2)
lines(0:4, p3, col=3)
lines(0:4, p4, col=1, lty=2)
lines(0:4, p5, col=2, lty=2)
lines(0:4, p6, col=3, lty=2)
legend(1.5, 1, legend=c("P(unlikely)", "P(somewhat likely)", "P(very likely)", "Line Type when Pared = 0", "Line Type when Pared = 1"), col=c(1:3,1,1), lty=c(1,1,1,1,2))

PRACTICE
Read in the following table (Quinn, n.d.):
practice <- read.table("http://www.stat.washington.edu/quinn/classes/536/data/nes96r.dat", header=TRUE)
Task: Run a regression using the ordered multinomial logistic model to predict the variation in the dependent variable ClinLR using the independent variables PID and educ.
  ClinLR = Ordinal variable from 1-7 indicating one's view of Bill Clinton's political leanings, where 1 = extremely liberal, 2 = liberal, 3 = slightly liberal, 4 = moderate, 5 = slightly conservative, 6 = conservative, 7 = extremely conservative.
  PID = Ordinal variable from 0-6 indicating one's own political identification, where 0 = Strong Democrat and 6 = Strong Republican.
  educ = Ordinal variable from 1-7 indicating one's own level of education, where 1 = 8 grades or less and no diploma, 2 = 9-11 grades, no further schooling, 3 = High school diploma or equivalency test, 4 = More than 12 years of schooling, no higher degree, 5 = Junior or community college level degree (AA degrees), 6 = BA level degrees; 17+ years, no postgraduate degree, 7 = Advanced degree.

REFERENCES
Gelman, A. & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press.
Quinn, K. (n.d.). Retrieved from http://www.stat.washington.edu/quinn/classes/536/data/nes96r.dat
UCLA: Academic Technology Services. (n.d.). Retrieved from http://www.ats.ucla.edu/stat/r/dae/ologit.csv