ORDERED MULTINOMIAL LOGISTIC REGRESSION ANALYSIS Pooja Shivraj Southern Methodist University
KINDS OF REGRESSION ANALYSES
Linear Regression
Logistic Regression
  Dichotomous dependent variable (yes/no, died/didn't die, at risk/not at risk, etc.)
  Predicts the probability of a person belonging to that category.
QUICK REVIEW: LOGISTIC REGRESSION
Values calculated from linear regression are continuous, so they need to be transformed onto a 0-1 scale to represent a probability, since $0 \le p \le 1$.
Logistic regression probability is calculated by:
$$\hat{p} = \frac{e^{B_1 x + B_0}}{1 + e^{B_1 x + B_0}}$$
CLASS EXAMPLE: LOGISTIC REGRESSION
Probability of a person complying with a mammogram, based on whether or not they receive a physician's recommendation.
CLASS EXAMPLE: LOGISTIC REGRESSION
$$\hat{p} = \frac{e^{B_1 x + B_0}}{1 + e^{B_1 x + B_0}}$$
Probability of complying if NOT recommended by physician:
$$\hat{p} = \frac{e^{2.29(0) - 1.84}}{1 + e^{2.29(0) - 1.84}} = 0.14$$
Probability of complying if recommended by physician:
$$\hat{p} = \frac{e^{2.29(1) - 1.84}}{1 + e^{2.29(1) - 1.84}} = 0.61$$
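The two probabilities in this class example can be reproduced in R. The sketch below is only an illustration: the names `inv_logit`, `b0`, and `b1` are introduced here, and the coefficient values are the ones quoted in the example.

```r
# Inverse-logit transform, written out to mirror the logistic formula.
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

b1 <- 2.29   # slope for physician recommendation (from the class example)
b0 <- -1.84  # intercept (from the class example)

p_not_recommended <- inv_logit(b1 * 0 + b0)
p_recommended     <- inv_logit(b1 * 1 + b0)

round(c(p_not_recommended, p_recommended), 2)
# 0.14 0.61

# Base R's plogis() computes the same transform.
all.equal(inv_logit(b0), plogis(b0))
```

In practice `plogis()` is the safer choice for large inputs, since `exp()` can overflow in the hand-written version.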
ORDERED MULTINOMIAL LOGISTIC REGRESSION ANALYSIS
Type of logistic regression that allows more than two discrete outcomes.
Outcomes are ordinal:
  Yes, maybe, no
  First, second, third place
  Gold, silver, bronze medals
  Strongly agree, agree, neutral, disagree, strongly disagree
ASSUMPTION
No perfect predictions: one predictor variable value cannot solely correspond to one dependent variable value. Check using crosstabs.
ORDERED LOGISTIC REGRESSION EXAMPLE
Load libraries:
library(arm)
library(psych)
Load data:
pooj <- read.csv("http://www.ats.ucla.edu/stat/r/dae/ologit.csv")
ORDERED LOGISTIC REGRESSION EXAMPLE
Variables:
  apply: college juniors' reported likelihood of applying to grad school (0 = unlikely, 1 = somewhat likely, 2 = very likely)
  pared: whether at least one parent has a graduate degree (0 = no, 1 = yes)
  public: whether the undergraduate institution is public or private (0 = private, 1 = public)
  gpa: college GPA
> str(pooj)
'data.frame': 400 obs. of 4 variables:
 $ apply : int 2 1 0 1 1 0 1 1 0 1 ...
 $ pared : int 0 1 1 0 0 0 0 0 0 1 ...
 $ public: int 0 0 1 0 0 1 0 0 0 0 ...
 $ gpa   : num 3.26 3.21 3.94 2.81 2.53 ...
> table(pooj$apply)
  0   1   2
220 140  40
> table(pooj$pared)
  0   1
337  63
> table(pooj$public)
  0   1
343  57
CHECK ASSUMPTION: CROSS-TABS
> xtabs(~pooj$pared+pooj$apply)
          pooj$apply
pooj$pared   0   1   2
         0 200 110  27
         1  20  30  13
> xtabs(~pooj$public+pooj$apply)
           pooj$apply
pooj$public   0   1   2
          0 189 124  30
          1  31  16  10
Why is this important?
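One way to approach the question: an empty cell would mean that some predictor value never occurs with some outcome category, which is the "perfect prediction" situation the assumption rules out. The sketch below re-enters the counts from the cross-tabs by hand and checks for empty cells. The helper `no_empty_cells` is a hypothetical name introduced here, not part of any package.

```r
# Counts copied from the xtabs() output (rows: predictor value, columns: apply).
pared_by_apply <- matrix(c(200, 110, 27,
                            20,  30, 13),
                         nrow = 2, byrow = TRUE,
                         dimnames = list(pared = c("0", "1"),
                                         apply = c("0", "1", "2")))
public_by_apply <- matrix(c(189, 124, 30,
                             31,  16, 10),
                          nrow = 2, byrow = TRUE,
                          dimnames = list(public = c("0", "1"),
                                          apply = c("0", "1", "2")))

# Hypothetical helper: TRUE when every predictor/outcome combination is observed.
no_empty_cells <- function(tab) all(tab > 0)

no_empty_cells(pared_by_apply)
no_empty_cells(public_by_apply)

# Row-wise proportions show how the outcome distribution shifts with the predictor.
round(prop.table(pared_by_apply, margin = 1), 2)
```

With the real data frame loaded, the same check could be applied directly to the result of `xtabs()`.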
SINGLE PREDICTOR MODEL - GPA
> library(arm)
> summary(m1<-bayespolr(as.ordered(pooj$apply)~pooj$gpa))
Call:
bayespolr(formula = as.ordered(pooj$apply) ~ pooj$gpa)

Coefficients:
          Value Std. Error t value
pooj$gpa 0.7109     0.2471   2.877

Intercepts:
     Value Std. Error t value
0|1 2.3306     0.7502  3.1065
1|2 4.3505     0.7744  5.6179

Residual Deviance: 737.6921
AIC: 743.6921
CUMULATIVE DISTRIBUTION FUNCTION
[Figure: cumulative distribution function with the cutpoints 0|1 and 1|2 marked]
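The relationship between the cumulative curve and the category probabilities can be written out as follows. This is a sketch under one assumption: that the linear predictor is subtracted from each cutpoint, which is the form the later slides use when they compute `intercept - coef*x`. Here $\zeta_{0|1}$ and $\zeta_{1|2}$ stand for the two intercepts (cutpoints) from the model summary and $B$ for the GPA coefficient; the symbols are introduced for illustration.

```latex
\begin{aligned}
P(Y \le 0) &= \frac{e^{\zeta_{0|1} - Bx}}{1 + e^{\zeta_{0|1} - Bx}} \\
P(Y \le 1) &= \frac{e^{\zeta_{1|2} - Bx}}{1 + e^{\zeta_{1|2} - Bx}} \\
P(Y = 0) &= P(Y \le 0) \\
P(Y = 1) &= P(Y \le 1) - P(Y \le 0) \\
P(Y = 2) &= 1 - P(Y \le 1)
\end{aligned}
```

Each cutpoint therefore yields one cumulative probability, and the probability of a single category is the difference between adjacent cumulative probabilities.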
LABELING COEFFICIENTS
Coefficients:
          Value Std. Error t value
pooj$gpa 0.7109     0.2471   2.877

Intercepts:
     Value Std. Error t value
0|1 2.3306     0.7502  3.1065
1|2 4.3505     0.7744  5.6179

Coefficient of the model:
coef <- m1$coef
Intercepts of the model:
intercept <- m1$zeta
Let us look at how likely students with an average GPA are to apply to graduate school.
> (x <- mean(pooj$gpa))
[1] 2.998925
TRANSFORMING OUTCOMES TO PROBABILITIES
> prob <- function(input){exp(input)/(1+exp(input))}
> (p0 <- prob(intercept[1] - coef*x))       # P(unlikely)
0.5493198
> (p1 <- prob(intercept[2] - coef*x) - p0)  # P(somewhat likely)
0.3525213
> (p2 <- 1 - (p0 + p1))                     # P(very likely)
0.0981589
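The same calculation can be wrapped in a small function that returns all three category probabilities at once. This is a minimal sketch: `ordered_probs` is a hypothetical helper introduced here, and the coefficient and cutpoints are typed in from the printed `m1` summary. Because those printed values are rounded to four decimals, the results should agree with the values above only to about three decimal places.

```r
# Values copied (rounded) from the printed m1 summary; x is the mean GPA.
coef_gpa <- 0.7109
zeta <- c("0|1" = 2.3306, "1|2" = 4.3505)
x <- 2.998925

# Hypothetical helper: category probabilities from cutpoints and a linear predictor.
ordered_probs <- function(zeta, eta) {
  cum <- c(plogis(zeta - eta), 1)   # P(Y <= k) at each cutpoint, then 1 for the top category
  probs <- diff(c(0, cum))          # successive differences give P(Y = k)
  names(probs) <- c("unlikely", "somewhat likely", "very likely")
  probs
}

p <- ordered_probs(zeta, coef_gpa * x)
round(p, 3)
sum(p)  # the three category probabilities sum to 1
```

Taking differences of the cumulative probabilities in a single step avoids keeping track of which intermediate value to subtract.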
WHY NOT USE LINEAR REGRESSION?
> summary(linreg<-lm(pooj$apply~pooj$gpa))

Call:
lm(formula = pooj$apply ~ pooj$gpa)

Residuals:
    Min      1Q  Median      3Q     Max
-0.7917 -0.5554 -0.3962  0.4786  1.6012

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.22016    0.25224  -0.873  0.38329
pooj$gpa     0.25681    0.08338   3.080  0.00221 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6628 on 398 degrees of freedom
Multiple R-squared: 0.02328, Adjusted R-squared: 0.02083
F-statistic: 9.486 on 1 and 398 DF, p-value: 0.002214
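One way to see the mismatch between a linear fit and an ordinal outcome is to look at the fitted values themselves. The sketch below uses the coefficients printed in the `lm()` summary and evaluates the line at GPA values 0 through 4. It is an illustration of the issue, not a replacement for residual diagnostics.

```r
# Coefficients copied from the lm() summary.
b0 <- -0.22016
b1 <-  0.25681

gpa <- 0:4
fitted_apply <- b0 + b1 * gpa
round(fitted_apply, 2)

# The outcome only takes the values 0, 1 and 2, so check which fitted values
# are valid categories and which fall below the lowest category.
fitted_apply %in% c(0, 1, 2)
fitted_apply < 0
```

None of the fitted values is one of the three categories, and the value at GPA 0 is negative, which has no interpretation on the 0-2 response scale.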
AND OUR ASSUMPTIONS AREN'T MET
LINEAR REGRESSION VERSUS ORDERED LOGISTIC REGRESSION
The decision between linear regression and ordered multinomial regression is not always black and white.
When you have a large number of categories that can be considered equally spaced, simple linear regression is a reasonable alternative (Gelman & Hill, 2007).
Moral of the story: Always start by checking the assumptions of the model.
USING MULTIPLE PREDICTORS
> summary(m2 <- bayespolr(as.ordered(apply) ~ gpa + pared + public, pooj))
Call:
bayespolr(formula = as.ordered(apply) ~ gpa + pared + public, pooj)

Coefficients:
            Value Std. Error    t value
gpa     0.6041463  0.2577039  2.3443424
pared   1.0274106  0.2636348  3.8970973
public -0.0528103  0.2931885 -0.1801240

Intercepts:
     Value Std. Error t value
0|1 2.1638     0.7710  2.8064
1|2 4.2518     0.7955  5.3449

Residual Deviance: 727.002
AIC: 737.002
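The coefficients are on the log-odds scale, so exponentiating them gives odds ratios, which are often easier to read. The sketch below types in the coefficients from the printed `m2` summary. The interpretation in the comments assumes the linear predictor is subtracted from the cutpoints, so that a positive coefficient shifts probability toward the higher categories.

```r
# Coefficients copied from the printed m2 summary.
coefs <- c(gpa = 0.6041463, pared = 1.0274106, public = -0.0528103)

# Exponentiating a log-odds coefficient gives an odds ratio.
odds_ratios <- exp(coefs)
round(odds_ratios, 2)
#    gpa  pared public
#   1.83   2.79   0.95

# Reading, under the stated assumption: holding the other predictors fixed,
# having a parent with a graduate degree multiplies the odds of being in a
# higher "apply" category by roughly 2.8.
```

An odds ratio close to 1, as for `public`, goes with a coefficient close to 0, in line with its small t value in the summary.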
TRANSFORMING OUTCOMES TO PROBABILITIES
> (coef <- m2$coef)
       gpa      pared     public
 0.6041463  1.0274106 -0.0528103
> (intercept <- m2$zeta)
     0|1      1|2
2.163841 4.251774
> (x1 <- cbind(0:4, 0, .14))   # GPA 0 to 4, pared = 0, public = 0.14
     [,1] [,2] [,3]
[1,]    0    0 0.14
[2,]    1    0 0.14
[3,]    2    0 0.14
[4,]    3    0 0.14
[5,]    4    0 0.14
> (x2 <- cbind(0:4, 1, .14))   # GPA 0 to 4, pared = 1, public = 0.14
     [,1] [,2] [,3]
[1,]    0    1 0.14
[2,]    1    1 0.14
[3,]    2    1 0.14
[4,]    3    1 0.14
[5,]    4    1 0.14
TRANSFORMING OUTCOMES TO PROBABILITIES
> prob <- function(var){exp(var)/(1+exp(var))}
> (p1 <- prob(intercept[1] - x1 %*% coef))       # P(unlikely), pared = 0
          [,1]
[1,] 0.9119769
[2,] 0.8498732
[3,] 0.7556908
[4,] 0.6282669
[5,] 0.4801055
> (p2 <- prob(intercept[2] - x1 %*% coef) - p1)  # P(somewhat likely), pared = 0
           [,1]
[1,] 0.07538029
[2,] 0.12722869
[3,] 0.20318345
[4,] 0.29895089
[5,] 0.39428044
> (p3 <- 1 - (p1 + p2))                          # P(very likely), pared = 0
           [,1]
[1,] 0.01264281
[2,] 0.02289816
[3,] 0.04112575
[4,] 0.07278223
[5,] 0.12561404
TRANSFORMING OUTCOMES TO PROBABILITIES
> (p4 <- prob(intercept[1] - x2 %*% coef))       # P(unlikely), pared = 1
          [,1]
[1,] 0.7876055
[2,] 0.6695483
[3,] 0.5254116
[4,] 0.3769123
[5,] 0.2484150
> (p5 <- prob(intercept[2] - x2 %*% coef) - p4)  # P(somewhat likely), pared = 1; subtract p4, not p1
         [,1]
[1,] 0.177854
[2,] 0.268999
[3,] 0.367579
[4,] 0.443221
[5,] 0.465082
> (p6 <- 1 - (p4 + p5))                          # P(very likely), pared = 1
         [,1]
[1,] 0.034540
[2,] 0.061452
[3,] 0.107009
[4,] 0.179866
[5,] 0.286418
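The step-by-step calculation can also be collected into a single function that returns one row of category probabilities per row of predictor values, which removes the need to track which intermediate vector to subtract. The sketch below is an illustration only: `category_probs` is a hypothetical helper, and the coefficients and cutpoints are typed in from the printed `m2` output.

```r
# Coefficients and cutpoints copied from the printed m2 output.
coefs <- c(gpa = 0.6041463, pared = 1.0274106, public = -0.0528103)
zeta  <- c("0|1" = 2.163841, "1|2" = 4.251774)

# Hypothetical helper: one row of category probabilities per row of X.
category_probs <- function(X, coefs, zeta) {
  eta <- as.vector(X %*% coefs)            # linear predictor for each row
  cum <- plogis(outer(-eta, zeta, "+"))    # P(Y <= k): rows = cases, columns = cutpoints
  out <- cbind(cum[, 1], cum[, 2] - cum[, 1], 1 - cum[, 2])
  colnames(out) <- c("unlikely", "somewhat likely", "very likely")
  out
}

x1 <- cbind(gpa = 0:4, pared = 0, public = 0.14)
x2 <- cbind(gpa = 0:4, pared = 1, public = 0.14)

probs_pared0 <- category_probs(x1, coefs, zeta)
probs_pared1 <- category_probs(x2, coefs, zeta)

rowSums(probs_pared0)  # each row should sum to 1
rowSums(probs_pared1)
```

The columns of each result can be passed to `lines()` directly, for example `probs_pared1[, "very likely"]`.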
PLOTTING THE RESULTS
undergrad.gpa <- 0:4
plot(undergrad.gpa, p1, type="l", col=1, ylim=c(0,1))
lines(0:4, p2, col=2)
lines(0:4, p3, col=3)
lines(0:4, p4, col=1, lty=2)
lines(0:4, p5, col=2, lty=2)
lines(0:4, p6, col=3, lty=2)
legend(1.5, 1, legend=c("P(unlikely)", "P(somewhat likely)", "P(very likely)", "Line Type when Pared = 0", "Line Type when Pared = 1"), col=c(1:3,1,1), lty=c(1,1,1,1,2))
PRACTICE
Read in the following table (Quinn, n.d.):
practice <- read.table("http://www.stat.washington.edu/quinn/classes/536/data/nes96r.dat", header=TRUE)
Task: Run a regression using the ordered multinomial logistic model to predict the variation in the dependent variable ClinLR using the independent variables PID and educ.
  ClinLR = Ordinal variable from 1-7 indicating one's view of Bill Clinton's political leanings, where 1 = extremely liberal, 2 = liberal, 3 = slightly liberal, 4 = moderate, 5 = slightly conservative, 6 = conservative, 7 = extremely conservative.
  PID = Ordinal variable from 0-6 indicating one's own political identification, where 0 = Strong Democrat and 6 = Strong Republican.
  educ = Ordinal variable from 1-7 indicating one's own level of education, where 1 = 8 grades or less and no diploma, 2 = 9-11 grades, no further schooling, 3 = High school diploma or equivalency test, 4 = More than 12 years of schooling, no higher degree, 5 = Junior or community college level degree (AA degrees), 6 = BA level degrees; 17+ years, no postgraduate degree, 7 = Advanced degree.
REFERENCES
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press.
Quinn, K. (n.d.). Retrieved from http://www.stat.washington.edu/quinn/classes/536/data/nes96r.dat
UCLA: Academic Technology Services. (n.d.). Retrieved from http://www.ats.ucla.edu/stat/r/dae/ologit.csv