Addiction - Multinomial Model

February 8, 2012

First, the addiction data are loaded and attached.

```
> library(catdata)
> data(addiction)
> attach(addiction)
```

For the multinomial logit model, the function multinom from the nnet package is used.

```
> library(nnet)
```

The response ill has to be converted to a factor, both in the attached copy and in the data frame.

```
> ill <- as.factor(ill)
> addiction$ill <- as.factor(addiction$ill)
```

The first model contains the covariates gender and university and a linear effect of age.

```
> multinom0 <- multinom(ill ~ gender + age + university, data=addiction)
# weights:  15 (8 variable)
initial  value 749.253581
iter  10 value 675.937605
final  value 675.208456
converged
> summary(multinom0)
multinom(formula = ill ~ gender + age + university, data = addiction)

  (Intercept)    gender        age university
1   -1.160717 0.4366061 0.02991096   1.622052
2   -2.015571 0.2879080 0.04208660   1.067295

Std. Errors:
  (Intercept)    gender         age university
1   0.2654366 0.1938408 0.006235135  0.2534615
2   0.3076299 0.2207805 0.006821200  0.2891136

Residual Deviance: 1350.417
AIC: 1366.417
```
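The multinom fit is a baseline-category logit model. With level 0 of ill (labelled weak-willed in the plots) as reference, each other category k has a linear predictor eta_k = x'beta_k, and P(ill = k) = exp(eta_k) / (1 + sum_j exp(eta_j)); the reference category gets 1 / (1 + sum_j exp(eta_j)). The base-R sketch below plugs the printed multinom0 coefficients into this formula for one illustrative profile, a 40-year-old man with a university degree. That profile is an assumption chosen for illustration, and the coding gender = 0 for men is taken from how datamale is built later in the tutorial. It also checks that the printed AIC is the residual deviance plus twice the 8 estimated parameters.

```r
# Coefficients of multinom0 as printed by summary(); rows are the
# non-reference categories "1" and "2" of ill, the reference is "0".
beta <- rbind(
  "1" = c(-1.160717, 0.4366061, 0.02991096, 1.622052),
  "2" = c(-2.015571, 0.2879080, 0.04208660, 1.067295)
)
colnames(beta) <- c("(Intercept)", "gender", "age", "university")

# Baseline-category logit: eta_k = x' beta_k, with eta = 0 for the reference.
category_probs <- function(x, beta) {
  eta <- c(0, drop(beta %*% x))   # prepend the reference category
  p <- exp(eta) / sum(exp(eta))   # softmax over the three categories
  names(p) <- c("0", rownames(beta))
  p
}

# Illustrative profile (assumption): man (gender = 0), age 40, university degree.
x <- c(1, 0, 40, 1)
p <- category_probs(x, beta)
print(round(p, 4))

# The printed AIC equals the residual deviance plus 2 * (number of parameters).
resid_dev <- 1350.417
aic <- resid_dev + 2 * length(beta)
print(aic)   # 1366.417
```

The same softmax is what predict(..., type = "probs") applies to every row of new data.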

Another way to fit multinomial response models is the function vglm from the VGAM package.

```
> library(VGAM)
> multivgam0 <- vglm(ill ~ gender + age + university, multinomial(reflevel=1),
+                    data=addiction)
> summary(multivgam0)
vglm(formula = ill ~ gender + age + university, family = multinomial(reflevel = 1), data = addiction)

Pearson Residuals:
                       Min       1Q   Median       3Q    Max
log(mu[,2]/mu[,1]) -4.4464 -0.83311 -0.41954  0.99377 1.5516
log(mu[,3]/mu[,1]) -4.2426 -0.55806 -0.27917 -0.18371 2.4954

                  Value Std. Error t value
(Intercept):1 -1.160714  0.2654346 -4.3729
(Intercept):2 -2.015564  0.3076272 -6.5520
gender:1       0.436607  0.1938397  2.2524
gender:2       0.287912  0.2207791  1.3041
age:1          0.029911  0.0062350  4.7972
age:2          0.042086  0.0068211  6.1700
university:1   1.622048  0.2534585  6.3997
university:2   1.067287  0.2891095  3.6916

Number of linear predictors: 2
Names of linear predictors: log(mu[,2]/mu[,1]), log(mu[,3]/mu[,1])
Dispersion Parameter for multinomial family: 1
Residual Deviance: 1350.417 on 1356 degrees of freedom
Log-likelihood: -675.2085 on 1356 degrees of freedom
Number of Iterations: 4
```

Both functions yield the same parameter estimates (differing at most in the sixth significant digit) and the same residual deviance. The second model adds a quadratic effect of age.

```
> addiction$age2 <- addiction$age^2
> multinom1 <- update(multinom0, . ~ . + age2)
# weights:  18 (10 variable)
initial  value 749.253581
iter  10 value 666.374546
final  value 658.875161
converged
```
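Two numbers in the multivgam0 summary can be recomputed by hand. The "t value" column is the Wald ratio estimate / standard error; with the dispersion fixed at 1 it is commonly referred to a standard normal distribution, which is the assumption used for the p-values below. The residual degrees of freedom are n * M - p, with M = 2 linear predictors and p = 8 coefficients. The sample size n is not printed, but multinom starts from equal probabilities for the three categories, so its "initial value 749.253581" should equal n * log(3); the sketch recovers n from that figure, which is an inference from the output rather than something the tutorial states.

```r
# Estimates and standard errors as printed by summary(multivgam0).
est <- c(-1.160714, -2.015564, 0.436607, 0.287912,
          0.029911,  0.042086, 1.622048, 1.067287)
se  <- c(0.2654346, 0.3076272, 0.1938397, 0.2207791,
         0.0062350, 0.0068211, 0.2534585, 0.2891095)
names(est) <- names(se) <- c("(Intercept):1", "(Intercept):2", "gender:1",
  "gender:2", "age:1", "age:2", "university:1", "university:2")

# Wald statistic ("t value" in the output) and a two-sided normal p-value.
z <- est / se
p_value <- 2 * pnorm(-abs(z))
print(round(cbind(z, p_value), 4))

# Sample size implied by multinom's starting value n * log(3) (an inference).
n <- round(749.253581 / log(3))
M <- 2            # linear predictors: one per non-reference category
p <- length(est)  # estimated coefficients
print(c(n = n, resid_df = n * M - p))
```

Because the printed estimates and standard errors are rounded, the recomputed ratios can differ from the VGAM column in the last digit.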

```
> summary(multinom1)
multinom(formula = ill ~ gender + age + university + age2, data = addiction)

  (Intercept)    gender       age university         age2
1   -3.720298 0.5264935 0.1840509  1.4546712 -0.001891845
2   -3.502998 0.3562860 0.1357464  0.9362573 -0.001173966

Std. Errors:
  (Intercept)    gender         age university         age2
1 0.011047538 0.1023630 0.008783214 0.11373313 0.0001533591
2 0.008699935 0.0827317 0.009064134 0.09599875 0.0001540031

Residual Deviance: 1317.75
AIC: 1337.75
> multivgam1 <- vglm(ill ~ gender + age + university + age2, multinomial(reflevel=1),
+                    data=addiction)
> summary(multivgam1)
vglm(formula = ill ~ gender + age + university + age2, family = multinomial(reflevel = 1), data = addiction)

Pearson Residuals:
                       Min       1Q   Median       3Q    Max
log(mu[,2]/mu[,1]) -3.4647 -0.69123 -0.35630  0.85570 2.7077
log(mu[,3]/mu[,1]) -2.8800 -0.48233 -0.28217 -0.18006 2.8677

                   Value Std. Error t value
(Intercept):1 -3.7202408 0.54661481 -6.8060
(Intercept):2 -3.5029582 0.59581914 -5.8792
gender:1       0.5264746 0.20083037  2.6215
gender:2       0.3562789 0.22432535  1.5882
age:1          0.1840478 0.02860279  6.4346
age:2          0.1357440 0.03010190  4.5095
university:1   1.4546676 0.25770640  5.6447
university:2   0.9362483 0.29040051  3.2240
age2:1        -0.0018918 0.00033580 -5.6336
age2:2        -0.0011739 0.00033989 -3.4539

Number of linear predictors: 2
Names of linear predictors: log(mu[,2]/mu[,1]), log(mu[,3]/mu[,1])
Dispersion Parameter for multinomial family: 1
Residual Deviance: 1317.75 on 1354 degrees of freedom
Log-likelihood: -658.8752 on 1354 degrees of freedom
Number of Iterations: 4
```

Note that once age is included quadratically, the standard errors reported by nnet and VGAM differ markedly, while the parameter estimates again agree. Every nnet standard error is smaller than its VGAM counterpart; for the first intercept, for example, nnet reports 0.011 and VGAM 0.547.

Next, the function anova performs a likelihood ratio test of whether the quadratic term is needed.

```
> anova(multinom0, multinom1)
Likelihood ratio tests of Multinomial Models

Response: ill
                             Model Resid. df Resid. Dev   Test    Df LR stat.
1        gender + age + university      1356   1350.417
2 gender + age + university + age2      1354   1317.750 1 vs 2     2 32.66659
       Pr(Chi)
1
2 8.063801e-08
> multinom1$dev - multinom0$dev
[1] -32.66659
```

The deviance drops by 32.67 on 2 degrees of freedom (p = 8.06e-08), so the quadratic term clearly improves the fit.

Now the probabilities of the responses are plotted against age. First, a sequence spanning the range of age is created.

```
> minage <- min(na.omit(age))
> maxage <- max(na.omit(age))
> ageindex <- seq(minage, maxage, 0.1)
> n <- length(ageindex)
```

Next, the vectors for the other covariates and the data sets for men (gender = 0) and women (gender = 1), all with a university degree (university = 1), are built.

```
> ageindex2 <- ageindex^2
> gender1 <- rep(1, n)
> gender0 <- rep(0, n)
> university1 <- rep(1, n)
> datamale <- as.data.frame(cbind(gender=gender0, age=ageindex, university=
+   university1, age2=ageindex2))
> datafemale <- as.data.frame(cbind(gender=gender1, age=ageindex, university=
+   university1, age2=ageindex2))
```

For these data sets, the probabilities based on model multinom1 are computed.

```
> probsmale <- predict(multinom1, datamale, type="probs")
> probsfemale <- predict(multinom1, datafemale, type="probs")
```
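The anova result can be reproduced from the two residual deviances alone: the test statistic is the drop in deviance, referred to a chi-square distribution whose degrees of freedom equal the number of added parameters (one age2 coefficient for each of the two non-reference logits). This base-R sketch uses the rounded deviances printed in the summaries, so the statistic matches the anova value only to about three decimals.

```r
# Residual deviances and parameter counts of the two nested fits (from the summaries).
dev0 <- 1350.417; npar0 <- 8    # multinom0: gender + age + university
dev1 <- 1317.750; npar1 <- 10   # multinom1: gender + age + university + age2

lr_stat <- dev0 - dev1          # likelihood ratio statistic
lr_df   <- npar1 - npar0        # added parameters
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(stat = lr_stat, df = lr_df, p = p_value))

# With 2 degrees of freedom the chi-square upper tail has the closed form exp(-x/2).
stopifnot(isTRUE(all.equal(p_value, exp(-lr_stat / 2))))

# The AIC values from the summaries point the same way (smaller is better).
print(c(aic0 = dev0 + 2 * npar0, aic1 = dev1 + 2 * npar1))
```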

Now the probabilities can be plotted.

```
> par(cex=1.4, lwd=2)
> plot(ageindex, probsmale[,1], type="l", lty=1, ylim=c(0,1), main=
+   "men with university degree", ylab="probabilities")
> lines(ageindex, probsmale[,2], lty="dotted")
> lines(ageindex, probsmale[,3], lty="dashed")
> legend("topright", legend=c("weak-willed", "diseased", "both"), lty=c("solid",
+   "dotted", "dashed"))
```

[Figure: "men with university degree": probabilities of weak-willed (solid), diseased (dotted) and both (dashed) plotted against ageindex.]

```
> par(cex=1.4, lwd=2)
> plot(ageindex, probsfemale[,1], type="l", lty=1, ylim=c(0,1), main=
+   "women with university degree", ylab="probabilities")
> lines(ageindex, probsfemale[,2], lty="dotted")
> lines(ageindex, probsfemale[,3], lty="dashed")
> legend("topright", legend=c("weak-willed", "diseased", "both"),
+   lty=c("solid", "dotted", "dashed"))
```

[Figure: "women with university degree": probabilities of weak-willed (solid), diseased (dotted) and both (dashed) plotted against ageindex.]
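The plotted curves are the softmax of the two fitted linear predictors, each of which is quadratic in age. The base-R sketch below recomputes the curves for men with a university degree directly from the printed multinom1 coefficients, so it needs neither nnet nor the data, and it locates the vertex of each log-odds parabola at age = -beta_age / (2 * beta_age2). The 20 to 80 age grid is an assumption suggested by the plot axis, not the tutorial's exact ageindex, which runs from min(age) to max(age).

```r
# multinom1 coefficients as printed by summary(multinom1);
# columns: (Intercept), gender, age, university, age2.
beta1 <- rbind(
  "1" = c(-3.720298, 0.5264935, 0.1840509, 1.4546712, -0.001891845),
  "2" = c(-3.502998, 0.3562860, 0.1357464, 0.9362573, -0.001173966)
)

# Hypothetical age grid (assumption); the tutorial uses min(age)..max(age) by 0.1.
age_grid <- seq(20, 80, by = 0.1)
# Design matrix for men (gender = 0) with a university degree (university = 1).
X <- cbind(1, 0, age_grid, 1, age_grid^2)

eta <- cbind(0, X %*% t(beta1))        # reference category "0" has eta = 0
probs <- exp(eta) / rowSums(exp(eta))  # row-wise softmax
colnames(probs) <- c("weak-willed", "diseased", "both")

# Both age2 coefficients are negative, so each log-odds curve is a downward
# parabola with its maximum at age = -beta_age / (2 * beta_age2).
vertex <- -beta1[, 3] / (2 * beta1[, 5])
print(round(vertex, 1))

# Grid age at which each category's predicted probability is largest.
print(age_grid[apply(probs, 2, which.max)])
```

The vertices refer to the log-odds against the weak-willed category; the ages at which the probabilities themselves peak can differ, because each probability depends on both linear predictors at once.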