1 Modeling Counts & ZIP: Extended Example
Carolyn J. Anderson
Department of Educational Psychology
University of Illinois at Urbana-Champaign
Modeling Counts, Slide 1 of 36

2 Outline
- A little exploratory analysis.
- Revised models.
- Zero inflated models (something new).

3 The data are from: Espelage, D.L., Holt, M.K., & Henkel, R.R. (2004). Examination of peer-group contextual effects on aggression during early adolescence. Child Development, 74.
Two ways to measure bullying:
- Self report: the 9-item Illinois Bully Scale (Espelage & Holt, 2001).
- Peer nominations: kids list everyone whom they view as a bully. The total number of nominations a child receives is a measure of that child's bullying.
Peer nominations are more objective than self report, and it's getting harder to obtain IRB approval for peer nominations.
Model peer nominations (a count) with the self report measure (bully scale) as a predictor variable, ignoring clustering.

5 The Predictor Variable [figure not recovered]

6 The Predictor Variable [figure not recovered]

7 SAS for these

    data bullynom;
      input BULLYSC BULLYNM;
      datalines;
    ...
    ;
    run;

8 SAS for Distribution of Peer Nominations

    options nogstyle;
    ods select Quantiles MyHist;
    proc univariate data=bullynom;
      var BULLYNM;
      histogram BULLYNM / cfill=ltgray midpoints=... name='MyHist';
      inset n='Sample Size' mean='Mean' std='Standard Deviation' / position=ne;
      title 'Distribution of Number of Bully Nominations';
    run;
    options gstyle;

9 SAS for Bully Scale

    options nogstyle;
    ods select Quantiles BullyScaleHist;
    proc univariate data=bullynom;
      var bullysc;
      histogram bullysc / lognormal gamma cfill=ltgray name='BullyScaleHist';
      inset n='Sample Size' mean='Mean' std='Standard Deviation'
            min='Minimum' max='Maximum' / position=ne;
      inset lognormal gamma / position=e;
      title 'Distribution of the Self Report Scale of Bullyness';
    run;
    options gstyle;

10 Relationship Between the Measures [figure not recovered]

11 We Know So Far That...
- Both variables are highly positively skewed.
- There are a lot of kids who did not receive any peer nominations.
- There does appear to be a relationship between peer nominations and scale score.
- Mean peer nominations is much smaller than the variance: 2.49 < ...
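
The mean-versus-variance comparison on the slide above is the usual quick check for overdispersion relative to a Poisson distribution. A minimal Python sketch of that check follows; the counts are made up for illustration and are not the study data, and the variable names are assumptions.

```python
from statistics import mean, variance

# Hypothetical nomination counts (NOT the study data): many zeros, a few large values.
counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 5, 9, 14]

m = mean(counts)
v = variance(counts)                        # sample variance (n - 1 denominator)
dispersion_ratio = v / m                    # near 1 for Poisson data; much larger suggests overdispersion
zero_share = counts.count(0) / len(counts)  # proportion of kids with no nominations

print(f"mean={m:.2f} variance={v:.2f} ratio={dispersion_ratio:.2f} zeros={zero_share:.0%}")
```

A ratio well above 1 together with a large share of zeros is the pattern that motivates the negative binomial and zero inflated models later in the deck.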

12 Starting Model:
- Random component: $Y_{ij}$ = the number of nominations received by kid $i$ in peer group $j$, with a Poisson distribution.
- Linear predictor: $\beta_0 + \beta_1(\text{bullysc})_{ij} = \beta_0 + \beta_1 x_{ij}$
- The link is the log, the canonical link.
The initial model is a standard Poisson regression model,
$$E(Y_{ij}) = \mu_{ij} = \exp[\beta_0 + \beta_1 x_{ij}]$$
where
$$P(Y_{ij} = y) = \frac{e^{-\mu_{ij}}\,\mu_{ij}^{y}}{y!}$$
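
As a rough illustration of the starting model, the sketch below evaluates the log-link mean and the Poisson probabilities for one kid. The coefficient values and the bully-scale score are hypothetical placeholders, not estimates from the data.

```python
import math

def poisson_mean(x, beta0, beta1):
    """Log link: E(Y) = exp(beta0 + beta1 * x)."""
    return math.exp(beta0 + beta1 * x)

def poisson_pmf(y, mu):
    """P(Y = y) = exp(-mu) * mu**y / y!, computed on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

# Hypothetical coefficients and bully-scale score, for illustration only.
beta0, beta1 = -0.5, 0.8
mu = poisson_mean(2.0, beta0, beta1)

# Probabilities of receiving 0, 1, ..., 5 nominations under the model.
probs = [poisson_pmf(y, mu) for y in range(6)]
print(f"mu={mu:.3f}", [round(p, 3) for p in probs])
```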

13 Fit of Poisson Regression Model (model fit and then grouped to look at fit) [figure not recovered]

14 Fit of Marginal Distribution [figure not recovered]

15 To deal with the overdispersion, we'll change the random component to Negative Binomial. The GLM has
- Random component: Negative Binomial
- Linear predictor: $\beta_0 + \beta_1 x_{ij}$
- Link: log
Our next model is
$$Y_{ij} = \mu_{ij}\,\epsilon_{ij} = \underbrace{\exp[\beta_0 + \beta_1 x_{ij}]}_{\text{Poisson}}\;\underbrace{\epsilon_{ij}}_{\text{Gamma}}$$
where $E(\epsilon_{ij}) = 1$ and $\mathrm{var}(\epsilon_{ij}) = 1/\phi$ ($\phi$ is the dispersion parameter), so that
$$E(Y_{ij} \mid x_{ij}) = \mu_{ij} = \exp[\beta_0 + \beta_1 x_{ij}]$$
$$\mathrm{var}(Y_{ij} \mid x_{ij}) = \mu_{ij} + \mu_{ij}^2/\phi$$
and
$$P(Y_{ij} = y) = \frac{\Gamma(y + \phi)}{y!\,\Gamma(\phi)} \left(\frac{\phi}{\phi + \mu_{ij}}\right)^{\phi} \left(\frac{\mu_{ij}}{\phi + \mu_{ij}}\right)^{y}$$
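
One way to convince yourself that the probability function above has mean $\mu$ and variance $\mu + \mu^2/\phi$ is to sum it numerically. The sketch below does that; the values of `mu` and `phi` are hypothetical, and the truncation point of the sum is an assumption that is safe for these particular values.

```python
import math

def negbin_pmf(y, mu, phi):
    """Negative binomial probability with mean mu and dispersion parameter phi."""
    log_p = (math.lgamma(y + phi) - math.lgamma(y + 1) - math.lgamma(phi)
             + phi * math.log(phi / (phi + mu))
             + y * math.log(mu / (phi + mu)))
    return math.exp(log_p)

# Hypothetical values, for illustration only.
mu, phi = 2.5, 0.4

# Truncate the infinite sums; the tail beyond 2000 is negligible for these values.
ys = range(2000)
total = sum(negbin_pmf(y, mu, phi) for y in ys)
nb_mean = sum(y * negbin_pmf(y, mu, phi) for y in ys)
nb_var = sum((y - nb_mean) ** 2 * negbin_pmf(y, mu, phi) for y in ys)

print(f"sum={total:.6f} mean={nb_mean:.4f} var={nb_var:.4f} "
      f"(formula: {mu + mu ** 2 / phi:.4f})")
```

Smaller values of `phi` give more extra-Poisson variation, since the added term is `mu**2 / phi`.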

16 Fit Statistics & Parameter Estimates
df = 289 for all of these.

    Dist      Link   G²    X²    X²/df   AIC   BIC
    Poisson   log    ...   ...   ...     ...   ...
    NegBin    log    ...   ...   ...     ...   ...

            Poisson                       Negative Binomial
    Parm    est.   se    Wald   p         est.   se    Wald   p
    β0      ...    ...   ...    < .01     ...    ...   ...    < .01
    β1      ...    ...   ...    < .01     ...    ...   ...    < .01
    1/φ                                   ...

For interpretation, exp(.81) = 2.25 and exp(1.09) = 2.98.
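
With a log link, exponentiating the slope gives the multiplicative change in expected nominations per one-unit increase in the bully scale. The sketch below reproduces that arithmetic from the two quoted slopes; assigning .81 to the Poisson model and 1.09 to the negative binomial model follows the order on the slide and is otherwise an assumption. From the rounded slope, exp(1.09) evaluates to about 2.97; the slide's 2.98 presumably comes from the unrounded estimate.

```python
import math

# Slopes as quoted on the slide (rounded to two decimals).
beta1_poisson = 0.81
beta1_negbin = 1.09

# exp(beta1) = ratio of expected counts for x + 1 versus x under a log link.
rr_poisson = math.exp(beta1_poisson)
rr_negbin = math.exp(beta1_negbin)

# A k-unit change multiplies the expected count by exp(k * beta1).
k = 2
rr_negbin_k = math.exp(k * beta1_negbin)

print(f"Poisson: {rr_poisson:.3f}  NegBin: {rr_negbin:.3f}  NegBin, {k} units: {rr_negbin_k:.3f}")
```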

17 Fit of Negative Binomial Model to Data [figure not recovered]

18 Fit of Marginal Distribution [figure not recovered]

19 Change of the Link Function
The relationship between $Y_{ij}$ and $x_{ij}$ looks like a straight line...
The new GLM:
- Random component: Negative Binomial
- Linear predictor: $\beta_0 + \beta_1 x_{ij}$
- Link: identity
This model is
$$Y_{ij} = \mu_{ij}\,\epsilon_{ij} = \underbrace{(\beta_0 + \beta_1 x_{ij})}_{\text{Poisson}}\;\underbrace{\epsilon_{ij}}_{\text{Gamma}}$$
where $E(\epsilon_{ij}) = 1$ and $\mathrm{var}(\epsilon_{ij}) = 1/\phi$, so that
$$E(Y_{ij} \mid x_{ij}) = \mu_{ij} = \beta_0 + \beta_1 x_{ij}$$
and
$$P(Y_{ij} = y) = \frac{\Gamma(y + \phi)}{y!\,\Gamma(\phi)} \left(\frac{\phi}{\phi + \mu_{ij}}\right)^{\phi} \left(\frac{\mu_{ij}}{\phi + \mu_{ij}}\right)^{y}$$

20 Fit Statistics & Parameter Estimates
df = 289 for all of these.

    Dist      Link       G²    X²    X²/df   AIC   BIC
    Poisson   log        ...   ...   ...     ...   ...
    NegBin    log        ...   ...   ...     ...   ...
    NegBin    Identity   ...   ...   ...     ...   ...

            Log Link                      Identity Link
    Parm    est.   se    Wald   p         est.   se    Wald   p
    β0      ...    ...   ...    < .01     ...    ...   ...    < .01
    β1      ...    ...   ...    < .01     ...    ...   ...    < .01
    1/φ     ...                           ...

For interpretation, a one unit change in bully scale leads to exp(1.09) = 2.98 times as many nominations (log link) or 3.07 more nominations (identity link).
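
The two interpretations differ in kind: the log link gives a constant ratio between expected counts, while the identity link gives a constant difference. The sketch below contrasts them using the quoted slopes (1.09 and 3.07); the intercepts and the scale scores are hypothetical, chosen only so the identity-link means stay positive.

```python
import math

def mean_log_link(x, b0, b1):
    """Expected count under a log link."""
    return math.exp(b0 + b1 * x)

def mean_identity_link(x, b0, b1):
    """Expected count under an identity link."""
    return b0 + b1 * x

b1_log, b1_id = 1.09, 3.07   # slopes quoted on the slide
b0_log, b0_id = -1.0, -2.0   # hypothetical intercepts, for illustration only

for x in (1.0, 2.0, 3.0):
    ratio = mean_log_link(x + 1, b0_log, b1_log) / mean_log_link(x, b0_log, b1_log)
    diff = mean_identity_link(x + 1, b0_id, b1_id) - mean_identity_link(x, b0_id, b1_id)
    print(f"x={x}: log-link ratio={ratio:.3f}, identity-link difference={diff:.2f}")
```

The ratio and the difference are the same at every value of x, which is what "times larger" versus "more nominations" expresses.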

21 Fit of Negative Binomial Model w/ Identity [figure not recovered]

22 Fit of Marginal Distribution w/ Identity [figure not recovered]

23 The bully scale can reasonably be used in lieu of the peer nominations. Support for this comes from:
- The similarity of the marginal distributions for the two measures (both positively skewed).
- Goodness of fit of the negative binomial regression with identity link function.
Qualifications (i.e., more to be done):
- Add in other variables known to be related to bullying (e.g., gender) to try to account for extra variability (i.e., systematic vs. random).
- More modeling that takes into account peer groupings (i.e., see whether there are errors or systematic differences between peer groups).

24 Models for situations where there might be two underlying types or groups: one group that follows the regression model and the other that just gives 0's.
Recommended supplemental reading:
- Long, J.S. (1997). Regression Models for Categorical and Limited Dependent Variables.
- Erdman, D., Jackson, L., & Sinko, A. (2008). Zero-Inflated Poisson and Zero-Inflated Negative Binomial Models Using the COUNTREG Procedure (Paper ...). SAS Institute Inc., Cary, NC. PROC COUNTREG is in SAS v9.2.
- Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models. NY: Chapman & Hall.

25 Basic Zero Inflated Model (e.g., ZIP)
The basic model is essentially a latent class type model of the form
$$P(Y_{ij} = y \mid x_{ij}) = \begin{cases} \pi + (1 - \pi)\,P(0 \mid x_{ij}) & \text{for } y = 0 \\ (1 - \pi)\,P(y \mid x_{ij}) & \text{for } y > 0 \end{cases}$$
where $\pi$ = the probability of being in the zero-only type or class, and $P(0 \mid x_{ij})$ and $P(y \mid x_{ij})$ are based on some model, such as Poisson or Negative Binomial regression.
The ZIP model is a zero inflated Poisson, usually with a log link:
$$P(Y_{ij} = y \mid x_{ij}) = \begin{cases} \pi + (1 - \pi)\exp(-\mu_{ij}) & \text{for } y = 0 \\ (1 - \pi)\,\dfrac{\exp(-\mu_{ij})\,\mu_{ij}^{y}}{y!} & \text{for } y > 0 \end{cases}$$
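
A small sketch of the ZIP probability function may help make the two-class structure concrete. The values of `mu` and `pi` below are hypothetical, and the function name is an assumption made for this example.

```python
import math

def zip_pmf(y, mu, pi):
    """Zero inflated Poisson: a point mass at 0 (probability pi) mixed with Poisson(mu)."""
    poisson = math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))
    if y == 0:
        return pi + (1.0 - pi) * poisson   # zeros come from both classes
    return (1.0 - pi) * poisson            # positive counts come only from the Poisson class

# Hypothetical values, for illustration only.
mu, pi = 3.0, 0.4

p0_zip = zip_pmf(0, mu, pi)
p0_poisson = zip_pmf(0, mu, 0.0)   # pi = 0 reduces to the ordinary Poisson
print(f"P(Y=0): ZIP={p0_zip:.3f}, Poisson={p0_poisson:.3f}")
```

With these values the ZIP puts far more probability on zero than the plain Poisson with the same `mu`, which is the feature the bully nomination data call for.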

26 ZIP Model (continued)
$$P(Y_{ij} = y \mid x_{ij}) = \begin{cases} \pi + (1 - \pi)\exp(-\mu_{ij}) & \text{for } y = 0 \\ (1 - \pi)\,\dfrac{\exp(-\mu_{ij})\,\mu_{ij}^{y}}{y!} & \text{for } y > 0 \end{cases}$$
Mean:
$$E(Y_{ij} \mid x_{ij}) = (0 \cdot \pi) + \mu_{ij}(1 - \pi) = \mu_{ij} - \mu_{ij}\pi$$
Variance:
$$\mathrm{var}(Y_{ij} \mid x_{ij}) = \mu_{ij}(1 - \pi)(1 + \mu_{ij}\pi)$$
Note that if $\pi = 0$, we simply have a standard Poisson regression with log link.
The ZIP model can be extended by noting that class membership is dichotomous, so we can do a logistic regression (or other model for binary data) on the probability of class membership, e.g., a logit model,
$$\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \gamma_0 + \gamma_1 z_{1ij} + \cdots + \gamma_q z_{qij}$$
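
The variance formula on this slide is stated without derivation. One way to get it, sketched here using only the standard Poisson moments $E(Y) = \mu$ and $E(Y^2) = \mu + \mu^2$ for the non-zero class (subscripts dropped for readability):

```latex
\begin{align*}
E(Y \mid x)   &= \pi \cdot 0 + (1 - \pi)\,\mu = (1 - \pi)\,\mu \\
E(Y^2 \mid x) &= \pi \cdot 0 + (1 - \pi)\,(\mu + \mu^2) \\
\mathrm{var}(Y \mid x)
  &= (1 - \pi)(\mu + \mu^2) - (1 - \pi)^2 \mu^2 \\
  &= \mu (1 - \pi)\,\bigl[\,1 + \mu - (1 - \pi)\mu\,\bigr] \\
  &= \mu (1 - \pi)(1 + \mu \pi)
\end{align*}
```

Dividing the variance by the mean gives $1 + \mu\pi$, so whenever $\pi > 0$ the ZIP variance exceeds its mean, which is how zero inflation accounts for overdispersion.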

27 ZIP and Bully Nominations [figure not recovered]

28 Extending the ZIP
Since class membership is dichotomous, we can do a logistic regression (or other model for binary data) on the probability of class membership. For example,
$$\log\left(\frac{\pi_{i}}{1 - \pi_{i}}\right) = \gamma_0 + \gamma_1 z_{1i} + \cdots + \gamma_q z_{qi}$$
For our bully nominations, we could try
$$\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \gamma_0 + \gamma_1(\text{bully scale})_{ij}$$
A comparison of how well various ZIP models fit the data:

    Dist.   Link    Model for π   df    G²    X²    AIC   BIC
    Poi     log     none          ...   ...   ...   ...   ...
    Poi     log     logit         ...   ...   ...   ...   ...
    Poi     Ident   logit         ...   ...   ...   ...   ...

29 ZIP Model Parameter Estimates and How to Interpret Them

            ZIP w/o model for π           ZIP with logit model for π
    parm    est    se    Wald   p         est    se    Wald   p
    β0      ...    ...   ...    < .01     ...    ...   ...    < .01
    β1      ...    ...   ...    < .01     ...    ...   ...    < .01
    γ0      ...    ...   ...    ...       ...    ...   ...    < .01
    γ1                                    ...    ...   ...    < .01

ZIP w/o model for π: exp(0.60) = 1.82 and
$$\hat{\pi} = \frac{\exp(0.21)}{1 + \exp(0.21)} = .55$$
ZIP with logit model for π: exp(0.60) = 1.82 and
$$\hat{\pi}_{ij} = \frac{\exp(\hat{\gamma}_0 + \hat{\gamma}_1(\text{bullysc})_{ij})}{1 + \exp(\hat{\gamma}_0 + \hat{\gamma}_1(\text{bullysc})_{ij})}$$
Note that exp(.77) = ...
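
The two conversions quoted on this slide can be reproduced in a few lines: the inverse logit turns the intercept 0.21 into the estimated probability of the zero-only class, and exponentiating 0.60 gives the rate ratio for the Poisson part. In the sketch below only the values 0.21 and 0.60 come from the slide; `gamma0`, `gamma1`, and the scale scores in the second half are hypothetical.

```python
import math

def inv_logit(eta):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

pi_hat = inv_logit(0.21)      # ZIP without a model for pi; slide reports .55
rate_ratio = math.exp(0.60)   # Poisson part; slide reports 1.82
print(f"pi_hat={pi_hat:.2f} rate_ratio={rate_ratio:.2f}")

# With a logit model for pi, the zero-class probability varies with the bully scale.
# gamma0 and gamma1 are made-up values, NOT the fitted estimates.
gamma0, gamma1 = 2.0, -0.8
pi_by_score = {x: inv_logit(gamma0 + gamma1 * x) for x in (1.0, 2.0, 3.0)}
for x, p in pi_by_score.items():
    print(f"bullysc={x}: pi={p:.3f}")
```

With a negative `gamma1`, kids with higher self-reported bullying are less likely to belong to the class that always receives zero nominations.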

30 ZIP w/ logit model and Bully Nominations [figure not recovered]

31 Comparing all Fitted [figure not recovered]

32 Comparing all Fitted [figure not recovered]

33 Mix and Match
- You can also have a zero inflated Negative Binomial model.
- You can specify a model other than logit for the mixing probability.
- You can do all this as a multi-level (random effects) model.
In SAS:
- If you use v 9.1, to fit a ZIP you have to use PROC NLMIXED or PROC GENMOD with programming statements.
- If you use v 9.2, a ZIP can be fit easily using PROC GENMOD.
- For a zero inflated Negative Binomial you can use PROC NLMIXED.
- v 9.2 also has PROC COUNTREG in SAS/ETS; however, it doesn't appear to be in the version that I have. Documentation on it can be found at default/countreg toc.htm
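
To illustrate the first point, a zero inflated negative binomial simply swaps the Poisson probabilities in the zero inflated model for negative binomial ones. The sketch below is a minimal version under that assumption; the parameter values and function names are hypothetical.

```python
import math

def negbin_pmf(y, mu, phi):
    """Negative binomial probability with mean mu and dispersion parameter phi."""
    log_p = (math.lgamma(y + phi) - math.lgamma(y + 1) - math.lgamma(phi)
             + phi * math.log(phi / (phi + mu))
             + y * math.log(mu / (phi + mu)))
    return math.exp(log_p)

def zinb_pmf(y, mu, phi, pi):
    """Zero inflated negative binomial: point mass at 0 mixed with NegBin(mu, phi)."""
    base = negbin_pmf(y, mu, phi)
    if y == 0:
        return pi + (1.0 - pi) * base
    return (1.0 - pi) * base

# Hypothetical values, for illustration only.
mu, phi, pi = 2.5, 0.8, 0.3

total = sum(zinb_pmf(y, mu, phi, pi) for y in range(3000))
zinb_mean = sum(y * zinb_pmf(y, mu, phi, pi) for y in range(3000))
print(f"sum={total:.6f} mean={zinb_mean:.4f} (expected {(1 - pi) * mu:.4f})")
```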

34 These work for v 9.1 and beyond:

    /* Poisson Regression */
    proc genmod data=bullynom;
      model bullynm = bullysc / link=log dist=poi type3;
      output out=genmodpoi pred=fitpoi upper=uppoi lower=lopoi stdreschi=res_poi;
      title1 'Poisson Regression';
    run;

    /* Negative Binomial Regression */
    proc genmod data=bullynom;
      model bullynm = bullysc / link=log dist=negbin type3;
      output out=genmodnb pred=nbfit upper=nbup lower=nblo stdreschi=res_negbin;
      title1 'Negative Binomial';
    run;

35 These work for v 9.2 and beyond:

    /* Zero Inflated Poisson Regression w/o model for inflation probability */
    proc genmod data=bullynom;
      model bullynm = bullysc / link=log dist=zip type3 obstats;
      zeromodel / link=logit;
      output out=zip1 pred=zipfit1;
      title1 'Zero Inflated Poisson';
    run;

    /* Zero Inflated Poisson Regression w/ logit model for inflation probability */
    proc genmod data=bullynom;
      model bullynm = bullysc / link=log dist=zip type3;
      zeromodel bullysc / link=logit;
      output out=zip2 pred=zipfit2;
      title1 'ZIP w/ logit model for inflation probability';
    run;

36 SAS v 9.1 using NLMIXED

    proc nlmixed data=bullynom;
      /* Some starting values */
      parms beta0=... beta1=... a0=1;
      /* linear predictor for the inflation probability */
      linpinf = a0 + a1*bullysc;
      /* infprob = inflation probability for zeros */
      /*         = logistic transform of the linear predictor */
      infprob = 1/(1+exp(-linpinf));
      /* Poisson mean */
      mu = exp(beta0 + beta1*bullysc);
      /* Build the ZIP log likelihood */
      if bullynm=0 then ll = log(infprob + (1-infprob)*exp(-mu));
      else ll = log((1-infprob)) - mu + bullynm*log(mu) - lgamma(bullynm + 1);
      model bullynm ~ general(ll);
      title 'Zero Inflated Poisson regression';
    run;

SAS demo...
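
The log likelihood that the NLMIXED program builds can be checked outside SAS. The Python sketch below mirrors the same statements line for line (same variable names) and evaluates the total log likelihood at one set of parameter values. The parameter values and the four data rows are hypothetical; this only evaluates the function and does not maximize it the way PROC NLMIXED does.

```python
import math

def zip_loglik(bullynm, bullysc, beta0, beta1, a0, a1):
    """Log likelihood contribution of one kid, following the NLMIXED statements."""
    linpinf = a0 + a1 * bullysc                   # linear predictor for inflation probability
    infprob = 1.0 / (1.0 + math.exp(-linpinf))    # logistic transform
    mu = math.exp(beta0 + beta1 * bullysc)        # Poisson mean (log link)
    if bullynm == 0:
        return math.log(infprob + (1.0 - infprob) * math.exp(-mu))
    return (math.log(1.0 - infprob) - mu + bullynm * math.log(mu)
            - math.lgamma(bullynm + 1))

# Hypothetical parameter values and (bullysc, bullynm) rows, for illustration only.
params = dict(beta0=0.2, beta1=0.6, a0=1.0, a1=-0.8)
rows = [(1.0, 0), (1.2, 0), (2.5, 4), (3.0, 7)]

total_ll = sum(zip_loglik(y, x, **params) for x, y in rows)
print(f"total log likelihood = {total_ll:.4f}")
```

Because each contribution is the log of a ZIP probability, exponentiating and summing over all possible counts for a fixed `bullysc` should give 1, which is a convenient sanity check on the formula.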
