Causal effects in mediation analysis with limited-dependent variables


By: Mårten Schultzberg
Department of Statistics, Uppsala University
Supervisor: Fan Yang-Wallentin
2016

Contents

1 Introduction
    Mediation analysis in general
    Direct and indirect effects
    Counterfactual-based causal effects in mediation analysis
    Limited-dependent variable
    Research questions
2 Methodology
    Mediation analysis
        The simple mediation model and its motivation
        Adding relations to the simple mediation model
        Estimation
    Two-group regression
    Limited-dependent variable analysis
        The Two-part model
    Mediation analysis with limited-dependent variables
    The counterfactual framework
        Effect notation and calculations for mediation
        Assumptions for causal effect of mediation models
3 Mediation, two-part M
    Model formulation
    Estimation
    Derivation of effects
        Conditional expected value of Y
        Causal effects
4 Mediation, two-part M and two-part Y
    Model formulation
    Estimation
    Derivation of causal effects
        Conditional expected values
        Causal effects
5 Monte Carlo simulations
    Synthetic models and true data generating processes
    Weak and Moderately strong effects model
    Censoring
    Model 1 - Two-part M
        Model 1 - Two-part M, Weak
        Model 1 - Two-part M, Moderately strong
    Model 2 - Two-part M, Two-part Y
        Model 2 - Two-part M, Two-part Y, Weak
        Model 2 - Two-part M, Two-part Y, Moderately strong
    Estimation
    Results
    Outcome variables
    Two-part M
    Two-part M, Two-part Y
    Sensitivity analysis
6 Discussion
7 Conclusion
A Appendix - Derivation - Two-part M
B Appendix - Derivation - Two-part M, two-part Y
C Appendix - Mplus syntax

Abstract

Mediation analysis is used to separate the direct and indirect effects of an exposure variable on an outcome variable. In this thesis, a mediation model is extended to account for censored mediator and outcome variables. The two-part framework is used to account for the censoring. The counterfactual-based causal effects of this model are derived. A Monte Carlo study is performed to evaluate the behaviour of the causal effects that account for censoring, and to compare them with methods that estimate the causal effects without accounting for censoring. The results of the Monte Carlo study show that the effects accounting for censoring have substantially smaller bias when censoring is present. The proposed effects also seem to come at a low cost, with unbiased estimates for sample sizes as small as 100 for the two-part mediator model. In the case of a limited mediator and outcome, sample sizes larger than 300 are required for reliable improvements. A small sensitivity analysis stresses the need for further development of the two-part models.

Keywords: counterfactuals, two-part model, potential outcome

1 Introduction

The introduction gives a brief overview of and motivation for the study, followed by the research questions.

1.1 Mediation analysis in general

Mediation analysis is used to quantify the effects that an exposure variable has on an outcome variable, mediated by some intermediate variable. For example, a gene that causes cancer may also cause increased cigarette usage, which in turn causes cancer. The effect of the gene on cancer is then mediated by cigarette usage. The intermediate variable, cigarette usage, is often called the mediator variable, or the mediator. The hypothesised relationship in a simple mediation model is that an exposure variable X causes some change in a mediator variable M, which in turn causes a change in an outcome variable Y (Hayes, 2013). Mediation analysis has become widely used in the social sciences and in biomedical studies, especially since the influential paper by Baron and Kenny (1986). The claim that it can open up the black box, answering questions such as "Through what mechanism does X affect Y?" or "How does a change in X affect Y?", probably explains its widespread use. For a thorough overview of traditional mediation analysis, see Hayes (2013).

In recent years the causal claims of these models and their limitations have been investigated in detail. The potential outcome and counterfactual frameworks have been developed and applied, contributing general definitions of causal effects and of inference in mediation analysis. The causal mediation literature has also focused on acknowledging and assessing the strong assumptions on which the causal interpretations of these effects rely (Imai et al., 2010; Pearl, 2001; Robins and Greenland, 1992; VanderWeele, 2015).

1.2 Direct and indirect effects

Separating direct and indirect effects in mediation is essentially a tool for making complex relations comprehensible. If a variable X affects both M and Y, but M also affects Y, then how should the effect of X on Y be separated from the effect of X on Y through M? The corresponding question in the cancer example would be "How is the direct effect of the gene on cancer separated from the effect of the gene through cigarette usage on cancer?" As will be demonstrated, several research questions can be answered once the set of causal effects is defined.

The traditional way of calculating the indirect effect is called the product method and is credited to Baron and Kenny (1986); the method is often even referred to as the Baron and Kenny method. Baron and Kenny (1986) has been one of the most influential papers in the mediation field, making the product method commonly applied. The product method is adequate for linear mediation models with continuous mediators and outcomes. In some research areas this is the most commonly applied mediation model (Rucker et al.). However, as Robins and Greenland (1992) and Pearl (2001) pointed out, the product method is unable to account for non-continuous mediators and outcomes, or for mediation models with moderation and other non-linear functional forms. General effect definitions, building on the potential outcome framework (Rubin, 1974), were suggested by Robins and Greenland (1992) and Pearl (2001).

1.3 Counterfactual-based causal effects in mediation analysis

The causal effects based on counterfactuals offer a general definition of causal effects. The definition does not assume any functional form or model and can be applied to a wide range of mediation models of varying complexity (Pearl). More recently, causal effects in many special cases of mediation models have been derived from these definitions (Muthén et al., 2016; Vanderweele, 2012; VanderWeele and Vansteelandt, 2010; Wang and Albert, 2012).

1.4 Limited-dependent variable

In many research situations limited-dependent variables are encountered. Figure 1 shows an example of a sample from a limited variable, censored from below. The histogram is characteristic of a censored variable, with many observations at one point and no observations below that point, as if the range of the observations were limited by something.
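
The kind of sample described above can be imitated with a short simulation. The sketch below is only an illustration and not the code behind Figure 1: the censoring point `LIMIT = 0.0`, the seed and the helper `bin_counts` are assumptions made here, and the text histogram is a crude stand-in for a plot.

```python
import random
from statistics import NormalDist

random.seed(1)  # arbitrary seed, for reproducibility only

MEAN, SD = 1.0, 1.0   # mean and variance equal to 1, as in the Figure 1 caption
LIMIT = 0.0           # assumed censoring point (hypothetical choice)
N = 1000

# Draw the uncensored (latent) variable and censor it from below.
latent = [random.gauss(MEAN, SD) for _ in range(N)]
observed = [max(LIMIT, v) for v in latent]

# Point mass at the limit versus its theoretical probability.
share_at_limit = sum(v == LIMIT for v in observed) / N
expected_share = NormalDist(MEAN, SD).cdf(LIMIT)


def bin_counts(values, start, width, n_bins):
    """Count values in equal-width bins; the last bin collects the upper tail."""
    counts = [0] * n_bins
    for v in values:
        k = min(int((v - start) / width), n_bins - 1)
        counts[k] += 1
    return counts


counts = bin_counts(observed, LIMIT, 0.5, 9)
for k, c in enumerate(counts):
    print(f"{LIMIT + 0.5 * k:4.1f} | " + "#" * (c // 10))
print(f"share at limit: {share_at_limit:.3f} (theoretical {expected_share:.3f})")
```

The share of observations stacked at the limit should be close to the probability that the uncensored normal variable falls below the limit, which is what produces the spike at the left edge of the histogram.
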
The importance of accounting for censoring has been pointed out by, e.g., Tobin (1958), Cragg (1971), Jones (1989) and Brown et al. If limited-dependent variables are not handled, model estimates will be biased. Thus, to estimate effects without bias in a mediation model where some dependent variable is limited, special methods are required. Limited-dependent outcome variables in mediation were recently handled in Muthén et al. (2016). However, the mediator in a mediation analysis is also a dependent variable, namely in the regression on the exposure.

Figure 1: A sample of 1000 observations from a limited normal variable with mean and variance equal to 1. Censored in the point

If the bias found in regression analysis with limited-dependent variables transfers to mediation analysis, this might have severe consequences for the effect estimates and conclusions drawn from mediation analysis. It is therefore of interest to investigate the impact of accounting for limited-dependent variables as mediators and outcome variables.

1.5 Research questions

The aim of this study is to answer the following questions:

1. a) How can the mediation model be formulated to account for a limited mediator and/or outcome variable?
   b) What are the additional assumptions for the two-part mediation models compared to the simple mediation model?
2. How are the counterfactual-based causal effects for the two-part mediation models derived?
3. Does acknowledging and accounting for limited mediators and/or outcome variables improve the accuracy of the causal effect estimates?
4. What are the sampling behaviours of the causal effects for the two-part mediation models?
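
The regression bias that motivates these questions can be made concrete with a small simulation. The sketch below is an illustration under assumptions made here, not an analysis from the thesis: a linear model with hypothetical coefficients `B0 = B1 = 1`, a standard normal exposure, and an outcome censored from below at zero that is then analysed with ordinary least squares as if it were continuous.

```python
import random
from statistics import mean

random.seed(2)  # arbitrary seed, for reproducibility only


def slope(x, y):
    """OLS slope of a simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


N = 5000
B0, B1 = 1.0, 1.0  # hypothetical data generating values

x = [random.gauss(0.0, 1.0) for _ in range(N)]
y_latent = [B0 + B1 * xi + random.gauss(0.0, 1.0) for xi in x]
y_censored = [max(0.0, v) for v in y_latent]   # censoring from below at zero

slope_latent = slope(x, y_latent)
slope_censored = slope(x, y_censored)
share_censored = sum(v == 0.0 for v in y_censored) / N

print(f"share censored:            {share_censored:.2f}")
print(f"slope, uncensored outcome: {slope_latent:.2f}")
print(f"slope, censored outcome:   {slope_censored:.2f}")
```

With these settings the slope estimated from the censored outcome is pulled towards zero, while the slope from the uncensored outcome stays close to the data generating value.
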

The remaining parts of the thesis have the following structure. Section 2 gives a detailed overview of the methods and motivations for the formulation of the limited-dependent variable models. Section 3 contains the model formulation and causal effect derivations for the two-part M model. Section 4 contains the model formulation and causal effect derivations for the two-part M, two-part Y model. In Section 5 Monte Carlo simulations are performed to evaluate the small-sample properties of these models. Sections 6 and 7 contain the discussion and conclusions of the study.

2 Methodology

In this section all the parts necessary to construct the two-part mediation models are introduced and motivated in detail.

2.1 Mediation analysis

In this section the development and properties of mediation analysis are presented. The simple mediation model is presented and extended to become more suitable for this study.

2.1.1 The simple mediation model and its motivation

The simple mediation model is illustrated in Figure 2. The exposure X affects the outcome Y both directly and indirectly, mediated by the mediator M. Rather than focusing on the size of the total effect of an exposure on an outcome, mediation analysis directs special attention to the "how" part. That is, how, or by what means, does the exposure affect the outcome? Through what intermediate steps does the exposure affect the outcome? An easy way to motivate the need for answers to this kind of question is through the perspective of policy makers. In many situations an exposure cannot be regulated by policies, although some mediators might be. Drawing on the example from VanderWeele (2015), originally analysed in Vanderweele (2012), the risk of lung cancer is investigated. A genetic variant of a chromosome, X, is believed to affect the risk of lung cancer, Y. Moreover, evidence has shown that this genetic variant affects smoking behaviour, making carriers of the genetic variant smoke more.
It is known that smoking cigarettes increases the risk of lung cancer. It is possible that the genetic variant of the chromosome causes cancer only through its effect on cigarette usage. In that case, policy makers can try to reduce cigarette usage through laws and taxes in order to decrease the number of lung cancer patients. It is also possible that the indirect effect of the gene through cigarette usage on the risk of lung cancer is small, and that the gene directly causes cancer. In the latter scenario, it might be difficult for policy makers to take effective action to decrease the number of patients diagnosed with lung cancer. This over-simplified example illustrates the importance of understanding the role that the mechanisms themselves play in effective policy making. If one can quantify and compare the importance of single mediators, resources can be directed more effectively. This way of approaching questions transfers to a wide range of situations in various research fields.

Figure 2: The simple mediation model. X is the exposure variable, M the mediator and Y the outcome.

The model formulation of the simple mediation model is displayed in Equation 1,

\[
\begin{aligned}
M_i &= \gamma_0 + \gamma_1 X_i + \epsilon_{Mi} \\
Y_i &= \beta_0 + \beta_1 M_i + \beta_2 X_i + \epsilon_{Yi}
\end{aligned}
\tag{1}
\]

where $X_i$ is the exposure variable for individual i, and $M_i$ and $Y_i$ are the mediator and outcome of individual i, both assumed continuous. $\epsilon_{Mi}$ and $\epsilon_{Yi}$ are the error terms. The error terms are most commonly assumed to be iid normally distributed with mean zero and uncorrelated with X and M. The model is constructed from two linear models: one in which the mediator is modelled by the exposure, and one in which the outcome is modelled by the mediator and the exposure.

2.1.2 Adding relations to the simple mediation model

In most situations the simple mediation model is too parsimonious to capture the relevant mechanisms. For example, the assumption of no unmeasured confounder between M and Y (see the section on assumptions for causal effects for details) cannot be guaranteed to be fulfilled, but by adding relevant covariates the violation might be substantially reduced. Moreover, interaction between M and X is common; VanderWeele (2015) even suggests that it might generally be better to keep interaction terms in the analysis even when non-significant interaction estimates are found. In Figure 3, a path diagram for a mediation model with a covariate affecting M and Y is displayed. Additionally, interaction between M and X is visualized by a path from X to the path between M and Y. In the social sciences interaction is more often referred to as moderation. The interaction term poses no problems in estimation; however, the traditional product method-based direct and indirect effects are no longer applicable (Pearl). The model including the interaction and covariate can be written as in Equation 2.

Figure 3: Mediation model with covariate and interaction between M and X. C is the covariate, X the exposure variable, M the mediator and Y the outcome.

\[
\begin{aligned}
M_i &= \gamma_0 + \gamma_1 X_i + \gamma_2 C_i + \epsilon_{Mi} \\
Y_i &= \beta_0 + \beta_1 M_i + \beta_2 X_i + \beta_3 M_i X_i + \beta_4 C_i + \epsilon_{Yi}
\end{aligned}
\tag{2}
\]

The model formulation in Equation 2 is similar to that of the simple mediation model in Equation 1, with the covariate C and the interaction term MX added. $X_i$ and $C_i$ are the exposure and covariate variables for individual i. $M_i$ and $Y_i$ are the mediator and outcome for individual i. Again the error terms $\epsilon_{Mi}$ and $\epsilon_{Yi}$ are usually assumed to be iid normally distributed with mean zero, uncorrelated with X, C and M.
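
Equation 2 consists of two linear regressions, each of which can be fitted by least squares. The following sketch simulates data from Equation 2 and recovers the coefficients. The parameter values, the sample size, the binary exposure and the helper `ols` are assumptions introduced here for illustration; they are not taken from the thesis.

```python
import random

random.seed(3)  # arbitrary seed, for reproducibility only


def ols(rows, y):
    """Least-squares coefficients; each row holds one observation's regressors."""
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        for k in range(p):
            if k != col:
                f = a[k][col] / a[col][col]
                a[k] = [u - f * v for u, v in zip(a[k], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


GAMMA = (0.5, 1.0, 0.3)            # gamma_0, gamma_1, gamma_2 (hypothetical)
BETA = (1.0, 0.8, 0.5, 0.2, 0.4)   # beta_0, ..., beta_4 (hypothetical)
N = 4000

X = [float(random.random() < 0.5) for _ in range(N)]   # binary exposure
C = [random.gauss(0.0, 1.0) for _ in range(N)]         # covariate
M = [GAMMA[0] + GAMMA[1] * x + GAMMA[2] * c + random.gauss(0.0, 1.0)
     for x, c in zip(X, C)]
Y = [BETA[0] + BETA[1] * m + BETA[2] * x + BETA[3] * m * x + BETA[4] * c
     + random.gauss(0.0, 1.0) for m, x, c in zip(M, X, C)]

# Mediator regression: M on (1, X, C).
gamma_hat = ols([[1.0, x, c] for x, c in zip(X, C)], M)
# Outcome regression: Y on (1, M, X, M*X, C).
beta_hat = ols([[1.0, m, x, m * x, c] for m, x, c in zip(M, X, C)], Y)

print("gamma estimates:", [round(g, 2) for g in gamma_hat])
print("beta estimates: ", [round(b, 2) for b in beta_hat])
```

The interaction column `m * x` is just one more regressor in the outcome regression, which is why it causes no difficulty in estimation even though it changes how direct and indirect effects have to be defined.
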

2.1.3 Estimation

The simple mediation model (Equation 1) and the extended mediation model (Equation 2) are estimated with maximum likelihood (ML) estimation or ordinary least squares (OLS). If the error terms are independent and normally distributed, OLS estimation of the two regression models one by one gives the same result as ML estimation of the whole system simultaneously. The log-likelihood function of these models is given in Equation 3. The right-hand expression in Equation 3 implies that if the two terms have no parameters in common, as in typical cases, they can be maximized separately. The log-likelihood can be expressed as

\[
\log L = \sum_{i=1}^{n} \log[y_i, m_i \mid x_i, c_i] = \sum_{i=1}^{n} \log[y_i \mid m_i, x_i, c_i] + \sum_{i=1}^{n} \log[m_i \mid x_i, c_i]
\tag{3}
\]

where log[...] is the log of the conditional density function.

2.2 Two-group regression

Two-group regression is a special case of multi-group regression, fitting two different regressions to two subgroups within a sample. This can be compared to a single model with a dummy variable estimating the mean difference between two subgroups in a sample. The main difference is that two-group regression allows for different covariates in the two regressions. For the variables that are common to the two regressions, the coefficients can be constrained to be the same, or allowed to differ, between the groups. The mean difference between the subgroups that a dummy coefficient would estimate in a single regression model is, in the two-group setting, estimated by the intercept difference. If the two models include the same covariates and all slope coefficients are constrained to be equal between the models, the intercept difference will be exactly the same as the dummy variable coefficient in a single model. If some different covariates are included and/or common covariates are not constrained, the intercept difference will not be the same as the dummy coefficient.
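
The relation between the intercept difference and the dummy coefficient can be checked numerically. The sketch below uses simulated data with one covariate; the group-specific slopes, the sample size and the helper functions `ols` and `line_fit` are hypothetical choices made here. The constrained two-group fit is computed from the pooled within-group slope, the dummy model from a general least-squares helper.

```python
import random
from statistics import mean

random.seed(4)  # arbitrary seed, for reproducibility only


def ols(rows, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        for k in range(p):
            if k != col:
                f = a[k][col] / a[col][col]
                a[k] = [u - f * v for u, v in zip(a[k], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def line_fit(x, y):
    """Intercept and slope of a simple linear regression."""
    mx, my = mean(x), mean(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - b * mx, b


N = 2000
d = [i % 2 for i in range(N)]                       # group indicator
x = [random.gauss(float(di), 1.0) for di in d]      # covariate, shifted in group 1
y = [(2.0 + 1.5 * xi if di else 1.0 + 1.0 * xi) + random.gauss(0.0, 1.0)
     for di, xi in zip(d, x)]                       # true slopes differ by group

# (a) Single regression with a dummy: y = b0 + b1*d + b2*x.
dummy_fit = ols([[1.0, float(di), xi] for di, xi in zip(d, x)], y)

# (b) Two-group regression with the slope constrained to be equal.
xs = {g: [xi for di, xi in zip(d, x) if di == g] for g in (0, 1)}
ys = {g: [yi for di, yi in zip(d, y) if di == g] for g in (0, 1)}
mx = {g: mean(xs[g]) for g in (0, 1)}
my = {g: mean(ys[g]) for g in (0, 1)}
sxy = sum((u - mx[g]) * (v - my[g]) for g in (0, 1) for u, v in zip(xs[g], ys[g]))
sxx = sum((u - mx[g]) ** 2 for g in (0, 1) for u in xs[g])
common_slope = sxy / sxx
diff_constrained = (my[1] - common_slope * mx[1]) - (my[0] - common_slope * mx[0])

# (c) Two-group regression without constraints: two separate fits.
diff_free = line_fit(xs[1], ys[1])[0] - line_fit(xs[0], ys[0])[0]

print(f"dummy coefficient:                  {dummy_fit[1]:.4f}")
print(f"intercept difference, constrained:  {diff_constrained:.4f}")
print(f"intercept difference, unconstrained: {diff_free:.4f}")
```

The first two numbers agree up to rounding error, while the unconstrained intercept difference deviates because each group is then allowed its own slope.
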
Additionally, two-group regression makes it possible to use different transformations of the same variable in the two groups. The technical difference between two separate regressions and a two-group regression is that common parameters constrained to be equal in the two regressions can be estimated using all available information from both subsets of the sample. However, if no parameter is set to be common, two separate regressions and the two-group regression will give exactly the same estimates. Hence, two-group regression is motivated when two subgroups of a sample are believed to have substantially different relationships between the covariates and the outcome for some covariates, but equal relationships for others. As in the case of the limited mediator M (see details in Section 3), it might be believed that the relation between the exposure and the outcome is similar for the M=0 and the M>0 groups. However, the M>0 group might have a relation with Y that the fixed M=0 group will not have. Two-group regression makes it possible to fit one linear regression of Y on C and X for the M=0 part, and another linear regression for the M>0 part in which the logarithm of M is added to the independent variables C and X. The slope of Y on X can be constrained to be the same in both regressions, allowing the estimate of this parameter to be based on the full data set.

The reasons mentioned above indicate that a two-group regression of the outcome in the two-part mediator models gives a flexible model. The possibility to constrain slopes of common variables is preserved, while still allowing for different covariates in the regressions. If all the slopes are constrained to be equal, the regression collapses back into a single linear regression. Similarly, if no parameter is constrained, it will simply be two separate regressions. Two-group regression of Y makes general applications of the derived causal effects possible.

2.3 Limited-dependent variable analysis

Limited-dependent variables go by many names depending on the context, e.g. two-part, hurdle, corner solution outcome or censored variable. A limited variable is a variable that for some reason is censored from above and/or below, having a point mass at the limit, see Figure 1 (the case of truncated variables is beyond the scope of this study).
Sometimes such variables are referred to as suffering from ceiling or floor effects, respectively. This is probably because in histograms of such variables it looks as if the observations hit the ceiling and/or floor, with a lot of observations at one value and no values above/below. To describe the principle of how to handle limited-dependent variables it is useful to consider only one type of censoring, even though all results can be used for censoring both from above and from below. For the current study only censoring from below at zero will be considered, to simplify examples and derivations without loss of generality.

There are different methods of handling limited-dependent variables. Most methods have in common that the variable is split into two parts: one binary part handling the large number of zeros, and one continuous part for the non-zero part of the variable. Usually this is modelled by one binary regression and one standard linear regression, using the same covariates in both regressions. One of the first ways to handle limited-dependent variables was proposed in Tobin (1958). His method, today known as the Tobit model, is widely applied. One limitation of the Tobit model is that it only allows equal signs for the corresponding parameters in the two regressions (Wooldridge, 2002). If the binary part has a substantially different data generating process than the positive part, it is in some cases also reasonable that the effects of certain independent variables have different signs on the two parts of the dependent variable. Cragg (1971) suggested two extensions which solve this limitation of the Tobit model: the truncated normal hurdle and the log-normal hurdle. In these models the regressions of the binary and the continuous part of the limited-dependent variable are estimated independently. Thus, the coefficients of the independent variables are allowed to have different signs and sizes on the two different parts of the dependent variable. Throughout this study these models will be referred to as two-part models. For a thorough overview and comparison of different ways of handling limited-dependent variables, and how they differ from sample selection problems, see Wooldridge (2002) and Greene.

2.3.1 The Two-part model

Two-part modelling splits the limited-dependent variable into two parts: one binary zero/non-zero part and one positive part. The intuition is that a first mechanism decides whether the variable will take a positive value or not, and, if the value is non-zero, a second mechanism decides what positive number it will take.
For example: "Will an individual smoke or not?" and, if yes, "How much will the individual smoke?" Zero in this setting is viewed as a category and not as a continuous numeric value. That is, a person who smokes zero cigarettes a day is simply a non-smoker. The zero indicates that the person belongs to the group of non-smokers, rather than an amount. It might seem an unimportant distinction, but understanding why it is not is crucial for the motivation of the two-part model. The two-part model is based on the idea that there might be a more substantial difference between a non-smoker and a smoker than between a "one cigarette a day" smoker and a "two cigarettes a day" smoker. Even though the difference in the number of smoked cigarettes between the "zero cigarettes a day" smoker and a "one cigarette a day" smoker is the same as that between the "one cigarette a day" smoker and the "two cigarettes a day" smoker, the two who actually smoke might have more characteristics in common. It is likely that different mechanisms explain whether you choose to smoke or not, and how much you choose to smoke. This reasoning implies that there are situations where the dependent variable has a point mass at zero but two-part analysis is not suitable. If the group in the point mass is not viewed as a group of observations with substantially different characteristics from the other observations, then the variable is not suitable for two-part analysis.

In practice the zero/non-zero part will be estimated with a binary regression and the continuous part with standard linear regression. The probit and logit models are naturally considered for the binary part. Given the small difference in estimation results between the two (Gill, 2000), probit is chosen to make the derivations in Appendix A and B simpler. Even though in theory the two-part model collapses back to the standard regression for small amounts of censoring, there has to be a certain amount of censoring for the estimation procedure to work well. The binary regression will behave badly if too small a share of the observations belongs to one group. Hence, the estimated coefficients, and therefore the causal effects, from a two-part model will never coincide exactly with the classical estimates. The probit estimation will break down more severely the closer the amount of censoring gets to zero. This estimation limitation is discussed in detail later in the thesis.

For the continuous part of the two-part variable a distributional assumption has to be made. This is crucial for the derivations of the effects.
The density function of the continuous part of the two-part variable has an important role in the derivations. The most common assumption is that the continuous part of the two-part dependent variable is normally or log-normally distributed. The experience of the author is that this is often a somewhat strong assumption that is not likely to be fulfilled. The sensitivity to this assumption has, to the best knowledge of the author, not been investigated in detail. Sensitivity is discussed further in a later section.
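
The role of the distributional assumption can be illustrated with the expected value of a two-part variable. The sketch below assumes, for illustration only, a probit model for the zero/non-zero part and a log-normal distribution for the positive part, with the two parts generated independently; the parameter names and values are hypothetical and do not come from the thesis. Under these assumptions the expected value is the probability of a positive value times the log-normal mean, $\Phi(a_0 + a_1 x)\exp(g_0 + g_1 x + s^2/2)$, which the code compares with a Monte Carlo average.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(5)  # arbitrary seed, for reproducibility only
PHI = NormalDist().cdf  # standard normal distribution function

# Hypothetical parameters: binary (probit) part and positive (log-normal) part.
A0, A1 = 0.3, -0.5        # exposure lowers the probability of a non-zero value
G0, G1, S = 0.2, 0.4, 0.5  # exposure raises the level, given a non-zero value


def draw_two_part(x):
    """One draw of the two-part variable for exposure value x."""
    if A0 + A1 * x + random.gauss(0.0, 1.0) <= 0.0:
        return 0.0                                    # zero part
    return math.exp(random.gauss(G0 + G1 * x, S))     # positive part


def expected_value(x):
    """P(non-zero) times the log-normal mean of the positive part."""
    return PHI(A0 + A1 * x) * math.exp(G0 + G1 * x + S ** 2 / 2)


results = {}
for x in (0.0, 1.0):
    sims = [draw_two_part(x) for _ in range(100_000)]
    results[x] = (mean(sims), expected_value(x))
    print(f"x = {x}: simulated {results[x][0]:.3f}, formula {results[x][1]:.3f}")
```

The term $s^2/2$ enters only because of the log-normal assumption, which is one way to see why derived effects depend on the distribution chosen for the positive part. The opposite signs of `A1` and `G1` show a case where the exposure lowers the probability of a non-zero value while raising its level, something a two-part model allows.
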

2.4 Mediation analysis with limited-dependent variables

Mediation analysis is special regarding independent/dependent relations. As can be seen in Figure 2, even the simple mediation model implies two dependent variables: M is dependent on X, and Y is dependent on both X and M. This means that important considerations normally investigated for the dependent variable should, in a mediation setting, be investigated for at least two variables. The focus of this study is to establish the importance of accounting for limited-dependent variables in mediation analysis. In order to cover all cases of limited-dependent variables in a simple mediation setting, three cases need to be considered. In Case 1 the outcome Y is limited, in Case 2 the mediator M is limited and in Case 3 both Y and M are limited. Case 1 is the most obvious, since the outcome Y is what would be viewed as the only dependent variable in most regression settings. Many ways of handling Case 1 in regression analysis have been suggested in the literature (Cragg, 1971; Duan et al.). Limited-dependent variables in mediation analysis were recently handled in Muthén et al. (2016), where causal effects for the two-part approach to mediation with a limited outcome are derived. A related approach is given in Wang and Albert (2012), where causal effects in mediation with limited counts are handled. The second and third cases have, to the best knowledge of the author, not been investigated. They are covered in detail in Sections 3 and 4. First, the counterfactual framework, used to define the causal effects of these models, is presented.

2.5 The counterfactual framework

In order to understand the counterfactual framework it is helpful to use a simple example, where the exposure variable is a dichotomous treatment variable. An individual can be given the treatment, or not given the treatment, at an arbitrary time point t.
The outcome, say health on a continuous scale, is measured after the exposure, at time point t+1. The effect one wishes to measure is the difference between the individual's health at time point t+1 if given the treatment, and the individual's health at time point t+1 if not given the treatment. This is of course impossible to retrieve, since one person can at time t only receive one treatment, and can thus at time t+1 only have received either the treatment or not. This is nevertheless the effect of interest, since it is the true treatment effect, i.e. true in the sense that the healing effect of time would not distort the measure of the treatment effect. Unfortunately, this cannot be resolved by giving both treatments one after the other to one individual: there would be carry-over effects, and the time points would not be the same.

Rubin (1974) started with a similar setup and suggested that, since only one outcome can be observed for each individual, the unobserved outcome could be called the potential outcome. That is, the health that an individual would potentially have had at time t+1 if given the other treatment. Rubin suggested that focus should be shifted from the individual effect, which is logically impossible to retrieve, to the effects at group level. The expected value for an individual, conditional on being given the treatment or not, could then be calculated. The difference between the expected value of health given treatment and the expected value of health given no treatment can then be used as an estimate of the desired treatment effect. This was soon adopted and refined by a large number of researchers in different fields (Imbens and Angrist, 1994; Pearl, 1995, 2001; Robins and Greenland, 1992; Spirtes et al., 1993); see Wooldridge (2002) and VanderWeele (2015) for recent overviews. Attempting to generalize the causal effect definitions in the mediation field, Robins and Greenland (1992) and Pearl (2001) suggested counterfactual-based effect definitions as a complement to the traditionally used product method (Baron and Kenny, 1986).

2.5.1 Effect notation and calculations for mediation

To present the effect definitions from Robins and Greenland (1992) and Pearl (2001), some convenient notation is first defined. Consider the mediation model in Figure 2, and for simplicity let X be dichotomous. Let $Y_0$ be the outcome of an individual who was exposed to X=0 and $Y_1$ the outcome of an individual who was exposed to X=1. Additionally, let $Y_{1m}$ be the outcome of an individual who was exposed to X=1 and for whom M was set to the value m.
Now let M0 be M conditioned on X=x 0. This means that Y 1M0 or short Y 1,0 is the outcome of a individual with X=1, however with M set to whatever it would have been conditioned on X=0. This can be generalized to non-dichotomous X, where the last expression would be Y x1,mx 0 for arbitrary chosen points x 1 and x 0. The Controlled Direct Effect CDE, for when X changes from x 0 to x 1, is defined CDEm=Y x1 m Y x0 m. However, since two different values can never be observed for one individual,the average effect is being 15

17 considered for all effects presented below. The Average of CDE is defined as CDEm = E[Y x1 m Y x0 m] 4 where Y x1 m is Y conditioned on X=x 1 and M=m, Y x0 m is Y conditioned on X=x 0 and M=m. CDE can be interpreted as the effect X has on Y when it changes from x 0 to x 1 when M is fixed to m. If returning to the example of lung cancer in from VanderWeele 2015, then CDE10= Y 1,10 Y 2,10, corresponds to the effect of moving from not having, to having the genetic variant of chromosome gene, if smoking 10 cigarettes per day. Even though this kind of effect is often interesting, the fixed m=10 does not correspond to a natural situation. A more natural situation might be considered if instead of fixing M letting it take the value it would have taken conditioned on the x 0 considered. The Pure Natural Direct Effect PNDE is defined as P NDEm = E[Y x1 Mx 0 Y x0 Mx 0 ] 5 and can be interpreted as effect X has on Y when it changes from x 0 to x 1 when M takes the value it would take on average for X = x 0. Applying this to the lung cancer example will give the effect of moving from not having, to having the genetic variant, given smoking the amount of cigarettes the average individual does in the absence of the gene. This effect is natural in the sense that M takes a value it would naturally do on average for one of the values of X considered. A corresponding Total Natural Indirect Effect TNIE, with the average TNIE being defined as T NIEm = E[Y x1 Mx 1 Y x1,mx 0 ] 6 and can be interpreted as the effect X have trough M on Y when X changes from x 0 to x 1. In the lung cancer example this would correspond to the effect of moving from not having to having the gene has on the risk of lung cancer only by affecting the number of cigarettes smoked per day. In addition to the Pure Natural effects there is also Total Natural effects; the Total Natural Direct Effect and the Total Natural Indirect Effect. The 16

18 difference is on which value of X, Y or M is conditioned. T NDEm = E[Y x1 Mx 1 Y x0,mx 1 ] 7 P NIEm = E[Y x0 Mx 1 Y x0,mx 0 ] 8 In linear mediation models there is no difference between the Pure Natural and the Total Natural effects, however for non-linear models the difference can be substantial. The counterfactual based causal effects of course also covers the usual treatment effect. That is, the difference on the outcome if given the treatment or not, or in our continuous exposure case: The difference on the outcome if exposed to x 0 or x 1. This is the total effect of the exposure on the outcome, the sum of all indirect and direct effects of X on Y. The Total effect is defined T Em = E[Y x1 Mx 1 Y x0,mx 0 ] 9 One important aspect of these counterfactual based effects is the fact that they always fulfil the relation TE = TNIE+PNDE. This property is obvious from the definition and is an important key to why these effects does not rely on any specific functional form. The product method effects only fulfil this relation for linear models Assumptions for causal effect of mediation models The review of the assumptions is based on that off VanderWeele 2015, which offers a thorough overview. There are four assumptions for establishing causal interpretations of all the effects. Assumption 1 - No unmeasured confounding of the exposure-outcome relationship Assumption 2 - No unmeasured confounding of the mediator-outcome relationship Assumption 3 - No unmeasured confounding of the exposure-mediator relationship Assumption 4 - No mediator-outcome confounder that is dependent on the exposure The first two assumptions implies that the covariates included in the model have to be sufficient to control for the confounding relations between the exposure and the outcome, 17

and between the mediator and the outcome. Assumption 1 can be fulfilled by randomization in the assignment of exposure; however, this is not always the case for Assumption 2. Assumptions 1 and 2 are necessary and sufficient for the controlled causal effects. Assumptions 1 and 2 are also necessary for the natural effects, but two additional assumptions are needed to ensure the causal interpretations of the natural effects. The third assumption means that the variables influencing the level of both the exposure and the mediator must be controlled. The fourth and final assumption is often viewed as a strong assumption, since it means that all confounders of the mediator and the outcome must be independent of the exposure.

It is important to recognize that randomization does not make all assumptions in mediation fulfilled. This was emphasised by Judd and Kenny (1981) and James and Brett (1984), but not in Baron and Kenny (1986), and is therefore a less widely spread notion. It also implies that careful data collection and caution about controlling for confounders are particularly important for causal mediation models to be reliable. If all four assumptions are fulfilled, the effects defined above are said to have causal interpretations. However, the causality relies on some additional implicit assumptions of temporal ordering. The temporal ordering may be implied by the word causal, but is worth pointing out as mediation analysis is often performed on cross-sectional data. Even though causal interpretations can in some cases be made with cross-sectional data, this relies heavily on assumptions. With that said, Hayes (2013, p. 89) makes a statement regarding assumptions, interpretations and analysis in general, and even though it does not cover counterfactuals, it is still worth quoting:

"Sometimes theory and solid arguments is the only foundation upon which a causal claim can be built given limitations of our data. But I see no problem conducting the kind of analysis I describe in the following chapters even when causal claims rest on shaky grounds. It is our brains that interpret the place and meaning on the mathematical procedures used, not the procedures themselves."

The importance of assumptions is, according to Hayes, not necessarily to always fulfil them but to understand and acknowledge the limitations they impose on the interpretations. Assumptions 1-4 are not testable, and an analyst can of course never know whether all relevant confounding relations are captured. In order to make these effects useful without too much doubt, sensitivity analysis is suggested by many (Imai et al., 2010; Pearl, 2001; VanderWeele, 2015). The idea is to explicitly display how much the result of a mediation analysis relies on the assumption. This is done, e.g., by showing how large the effect

of an unmeasured confounder on the exposure and the outcome must be to fully explain the effect of the exposure on the outcome. This can be done by trying different effect sizes of the unmeasured confounder on the exposure and the mediator. Although these values are arbitrary in some sense, they give a good indication of how strong the estimated effects are relative to the assumption. If it only takes a weak confounder to account for the indirect effects, then the validity of the assumptions is crucial for reliable interpretations. On the other hand, if it takes a huge, implausible effect of the confounder on the exposure and the mediator to account for the indirect effect, then the interpretations may be more reliable. Sensitivity analysis procedures of this kind are available for all four assumptions (VanderWeele, 2015). In his book from 2015, VanderWeele strongly advocates that sensitivity analysis should always be presented together with mediation analysis and causal effect interpretations. This would arguably create a good standard for reporting mediation results, as well as making mediation analysis less prone to accusations of relying on unreasonable assumptions. For a recent intuitive, less technical review of mediation with causal effects and assumptions, see also Keele.

3 Mediation, two-part M

3.1 Model formulation

If the mediator M is limited, the regression of M on the exposure X is affected. If this had been a single regression outside the mediation model, all the limitations discussed in Section 2.3 would apply. The question is whether the gains found when accounting for censoring in regression transfer to the case of a mediation model with a limited mediator. In Figure 4 a mediation model similar to that in Figure 3 is expanded to a two-part M model, to account for a limited mediator. The mediator M is separated into a binary zero/non-zero part measured with a dummy M*, and one non-zero continuous part M.
That is, if M* = 1 the observation is not censored and has a continuous value; if M* = 0 the observation is censored and has no continuous value. The two-part model of M on X and C is constructed from one probit model of M*, modelling the probability of being at the censoring point zero or not (recall that a floor effect at zero is assumed without loss of generality). M* is assumed to be generated from a dichotomized normal distribution. Additionally, one linear model accounts for the continuous part of the variable. Both the binary and the continuous variable for M are then brought into the linear regression model of Y. Moreover, a two-group regression model of Y is used, one for the censored group and one for the uncensored group. In the group where M = 0, M is not a covariate since M is fixed.

Since distributional assumptions (usually normality) have to be made for the continuous part of M, the logarithm transformation is often suggested to better meet this assumption. Many times the two-part variable has a long right tail and better resembles a normal curve after taking logarithms, i.e. the continuous part of M is assumed to be lognormally distributed. The lognormal case will be the focus of this study. The main reason for this is to make comparisons with earlier two-part models easier. However, the derivations of the causal effects also apply for a normally distributed M (see details in Appendix A). The implied model formulation is displayed in Equation 10.

\[ Y_{i \mid M_i > 0} = \beta_0^1 + \beta_1 \log(M_i) + \beta_2^1 X_i + \beta_3 \log(M_i) X_i + \beta_4^1 C_i + \epsilon_{y_i} \tag{10a} \]

\[ Y_{i \mid M_i = 0} = \beta_0^2 + \beta_2^2 X_i + \beta_4^2 C_i + \epsilon_{y_i} \tag{10b} \]

\[ \log(M_i)_{\mid M_i > 0} = \gamma_0 + \gamma_1 X_i + \gamma_2 C_i + \epsilon_{m_i} \tag{10c} \]

\[ \operatorname{probit}(\Pr(M_i > 0)) = \kappa_0 + \kappa_1 X_i + \kappa_2 C_i \tag{10d} \]

where by assumption $\epsilon_{y_i} \sim N(0, \sigma_y^2)$ and $\epsilon_{m_i} \sim N(0, \sigma_m^2)$. Equations 10a and 10b are the two-group regression of Y. The two-group regression is motivated in detail in Section 2.2. One benefit of using the two-group model is that it allows M and M* to have different intercepts and different slopes for X and C. The interaction between X and the positive part of M is quantified by $\beta_3$. The interaction between X and the zero/non-zero part of the mediator, M*, is somewhat less obvious. If $\beta_2^1$ and $\beta_2^2$ are allowed to differ in the two-group regression, their difference is a measure of the interaction effect of the binary M*.
In the effect derivations $\beta_2^1$ and $\beta_2^2$ will be left unconstrained until the final simplifications, so that the full interaction model with interactions for M and M* can be obtained. However, the focus of the final effect calculations is the case where X is only allowed to interact with the effect of M on Y through $\beta_3$, so that $\beta_2^1$ is set equal to $\beta_2^2$. The difference between $\beta_0^1$ and $\beta_0^2$ will capture the mean difference in Y for the two parts of M. This is discussed in detail in Section 2.2. Equations 10c and 10d correspond to the two-part regression of M on X and

C. Both the exposure and the covariate are allowed to have different effects on M and M*. The probit model makes this a non-linear mediation model. The functional form is of importance since the main objective in mediation analysis is often to estimate the direct and indirect effects of the model. As was shown by Robins and Greenland (1992) and Pearl (2001), the classical product method (Baron and Kenny, 1986) for calculating effects from the mediation analysis does not apply to non-linear models. Instead, the counterfactual framework will be used to correctly define these effects.

Figure 4 shows the path diagram implied by Equation 10. The exposure variable X affects the mediator, where the mediator is two-part and therefore divided into the binary zero/non-zero part M* and the non-zero continuous part M. X is allowed to moderate the effect of M on Y. The covariate C affects M, M* and Y.

Figure 4: Path diagram for a mediation model with a two-part mediator and two-group outcome, with interaction between the exposure and the mediator. M* is the observed binary variable coding for an observation being censored or not censored.

3.2 Estimation

Maximum likelihood (ML) estimation will be used for all models. The two-part mediator model above has a more complicated likelihood function since the mediator M is a

combination of a binary variable and a continuous variable. For a sample $i = 1, \ldots, N$, ordered so that the first n observations have $M_i > 0$,

\[ L(M_i \mid x_i, c_i) = \prod_{i=1}^{n} \Pr(M_i > 0 \mid x_i, c_i)\, f(M_i \mid M_i > 0, x_i, c_i) \prod_{i=n+1}^{N} \left[ 1 - \Pr(M_i > 0 \mid x_i, c_i) \right] \tag{11a} \]

\[ L(Y_i, M_i \mid x_i, c_i) = \prod_{i=1}^{n} \Pr(M_i > 0 \mid x_i, c_i)\, f(M_i \mid M_i > 0, x_i, c_i)\, f(Y_i \mid M_i > 0, x_i, c_i) \prod_{i=n+1}^{N} \left[ 1 - \Pr(M_i > 0 \mid x_i, c_i) \right] f(Y_i \mid M_i = 0, x_i, c_i) \tag{11b} \]

In Duan et al. the likelihood of a two-part dependent variable is shown to be Equation 11a, which implies the full likelihood of Equation 11b. The expressions $f(M_i \mid \ldots)$ and $f(Y_i \mid \ldots)$ are the conditional densities of M and Y. Note that in the two-group modelling of Y the conditional density of Y is not restricted to be the same in the first and the second product in Equation 11b.

3.3 Derivation of effects

In this study the effects of X and C on Y are assumed equal for both groups of M. Thus $\beta_2^1 = \beta_2^2$ and $\beta_4^1 = \beta_4^2$, indicated by a dropped superscript. Results for forming the effects without these restrictions can be found in Appendix A.

3.3.1 Conditional expected value of Y

One of the conditional expectations used to define the causal effects is shown in Equation 12. For a detailed explanation see Section 2.5.

\[
\begin{aligned}
E[Y(x_1, \log M(x_0))] &= \beta_0^2 + (\beta_0^1 - \beta_0^2)\, \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
&\quad + (\beta_2 x_1 + \beta_4 c) \left[ 1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) + \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \right] \\
&\quad + \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\, (\beta_1 + \beta_3 x_1)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c) \\
&= \beta_0^2 + (\beta_0^1 - \beta_0^2)\, \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) + \beta_2 x_1 + \beta_4 c \\
&\quad + \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\, (\beta_1 + \beta_3 x_1)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)
\end{aligned}
\tag{12}
\]
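As an informal check of Equation 12, the closed-form expectation can be compared with the simulated mean of the counterfactual outcome Y(x_1, log M(x_0)), where the mediator is generated under x_0 and the outcome under x_1. The following is a minimal sketch in Python, assuming the restricted model in which the effects of X and C on Y are equal in both groups. The parameter values, the dictionary keys and the function names are hypothetical and chosen only for illustration; they are not the settings of the Monte Carlo study in Section 5.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

# Hypothetical parameter values (illustration only).
# b0_pos / b0_zero are the intercepts of Y in the M > 0 and M = 0 groups.
P = dict(b0_pos=1.0, b0_zero=0.4, b1=0.5, b2=0.3, b3=0.2, b4=0.1,
         g0=0.2, g1=0.4, g2=0.1, k0=0.1, k1=0.5, k2=0.2,
         sd_y=1.0, sd_m=0.8)


def closed_form(x_y, x_m, c, p=P):
    """Equation 12: E[Y(x_y, log M(x_m))] for covariate value c."""
    pr_pos = PHI(p["k0"] + p["k1"] * x_m + p["k2"] * c)
    mean_logm = p["g0"] + p["g1"] * x_m + p["g2"] * c
    return (p["b0_zero"] + (p["b0_pos"] - p["b0_zero"]) * pr_pos
            + p["b2"] * x_y + p["b4"] * c
            + pr_pos * (p["b1"] + p["b3"] * x_y) * mean_logm)


def simulate(x_y, x_m, c, n=100_000, seed=1, p=P):
    """Monte Carlo mean of Y when M is drawn under x_m and Y under x_y."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # Probit part (10d): M > 0 with probability Phi(k0 + k1*x_m + k2*c).
        m_positive = rng.gauss(0.0, 1.0) < p["k0"] + p["k1"] * x_m + p["k2"] * c
        if m_positive:
            # Continuous part (10c) on the log scale, then outcome model (10a).
            log_m = (p["g0"] + p["g1"] * x_m + p["g2"] * c
                     + rng.gauss(0.0, p["sd_m"]))
            mean_y = (p["b0_pos"] + p["b1"] * log_m + p["b2"] * x_y
                      + p["b3"] * log_m * x_y + p["b4"] * c)
        else:
            # Outcome model for the censored group (10b).
            mean_y = p["b0_zero"] + p["b2"] * x_y + p["b4"] * c
        total += mean_y + rng.gauss(0.0, p["sd_y"])
    return total / n


if __name__ == "__main__":
    print(closed_form(1.0, 0.0, 0.5), simulate(1.0, 0.0, 0.5))
```

With a large number of draws the two numbers should agree up to simulation error; a persistent gap would indicate that the closed form and the data generating process do not describe the same model.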

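3.3.2 Causal effects

The complete derivation is displayed in Appendix A. The simplified effects are given in Equations 13-17.

The Total Natural Indirect Effect

\[
\begin{aligned}
TNIE &= E[Y(x_1, \log M(x_1)) \mid C = c] - E[Y(x_1, \log M(x_0)) \mid C = c] \\
&= (\beta_0^1 - \beta_0^2) \left[ \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \right] \\
&\quad + (\beta_1 + \beta_3 x_1) \left[ \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c) \right]
\end{aligned}
\tag{13}
\]

The Pure Natural Direct Effect

\[
\begin{aligned}
PNDE &= E[Y(x_1, \log M(x_0)) \mid C = c] - E[Y(x_0, \log M(x_0)) \mid C = c] \\
&= \beta_2 (x_1 - x_0) + \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)\, \beta_3 (x_1 - x_0)
\end{aligned}
\tag{14}
\]

The Pure Natural Indirect Effect

\[
\begin{aligned}
PNIE &= E[Y(x_0, \log M(x_1)) \mid C = c] - E[Y(x_0, \log M(x_0)) \mid C = c] \\
&= (\beta_0^1 - \beta_0^2) \left[ \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \right] \\
&\quad + (\beta_1 + \beta_3 x_0) \left[ \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c) \right]
\end{aligned}
\tag{15}
\]

The Total Natural Direct Effect

\[
\begin{aligned}
TNDE &= E[Y(x_1, \log M(x_1)) \mid C = c] - E[Y(x_0, \log M(x_1)) \mid C = c] \\
&= \beta_2 (x_1 - x_0) + \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)\, \beta_3 (x_1 - x_0)
\end{aligned}
\tag{16}
\]

The Total Effect

\[
\begin{aligned}
TE &= E[Y(x_1, \log M(x_1)) \mid C = c] - E[Y(x_0, \log M(x_0)) \mid C = c] \\
&= (\beta_0^1 - \beta_0^2) \left[ \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \right] + \beta_2 (x_1 - x_0) \\
&\quad + \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)(\beta_1 + \beta_3 x_1)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c) \\
&\quad - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)(\beta_1 + \beta_3 x_0)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)
\end{aligned}
\tag{17}
\]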
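The simplified effects in Equations 13-17 translate directly into a few lines of code. The following is a minimal sketch in Python, assuming the restricted two-part M model with parameters supplied as a dictionary; in practice these would be the ML estimates. The function name, the dictionary keys and the numerical values are hypothetical and serve only as an illustration.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

# Hypothetical parameter values (illustration only).
EXAMPLE_PARAMS = dict(b0_pos=1.0, b0_zero=0.4, b1=0.5, b2=0.3, b3=0.2,
                      g0=0.2, g1=0.4, g2=0.1, k0=0.1, k1=0.5, k2=0.2)


def two_part_m_effects(x0, x1, c, p):
    """Causal effects of Equations 13-17 for a change in X from x0 to x1."""

    def pr_pos(x):      # Pr(M > 0 | x, c), the probit part (10d)
        return PHI(p["k0"] + p["k1"] * x + p["k2"] * c)

    def mean_logm(x):   # E[log M | M > 0, x, c], the continuous part (10c)
        return p["g0"] + p["g1"] * x + p["g2"] * c

    dx = x1 - x0
    d_intercept = p["b0_pos"] - p["b0_zero"]
    d_prob = pr_pos(x1) - pr_pos(x0)
    d_prob_mean = pr_pos(x1) * mean_logm(x1) - pr_pos(x0) * mean_logm(x0)

    tnie = d_intercept * d_prob + (p["b1"] + p["b3"] * x1) * d_prob_mean   # (13)
    pnde = p["b2"] * dx + pr_pos(x0) * mean_logm(x0) * p["b3"] * dx        # (14)
    pnie = d_intercept * d_prob + (p["b1"] + p["b3"] * x0) * d_prob_mean   # (15)
    tnde = p["b2"] * dx + pr_pos(x1) * mean_logm(x1) * p["b3"] * dx        # (16)
    te = (d_intercept * d_prob + p["b2"] * dx                              # (17)
          + pr_pos(x1) * (p["b1"] + p["b3"] * x1) * mean_logm(x1)
          - pr_pos(x0) * (p["b1"] + p["b3"] * x0) * mean_logm(x0))
    return {"TNIE": tnie, "PNDE": pnde, "PNIE": pnie, "TNDE": tnde, "TE": te}


if __name__ == "__main__":
    for name, value in two_part_m_effects(0.0, 1.0, 0.5, EXAMPLE_PARAMS).items():
        print(f"{name}: {value:.4f}")
```

A useful sanity check on any implementation is that TE equals both TNIE + PNDE and TNDE + PNIE, and that the pure and total versions of each effect coincide when the interaction parameter (here the hypothetical key `b3`) is zero.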

4 Mediation, two-part M and two-part Y

In this section the mediation model where both the mediator and the outcome are limited is considered.

4.1 Model formulation

If the mediator M and the outcome Y are limited, both the regression of M on X and the regression of Y on M and X are affected. Again the two-part model will be used to account for the censoring in both M and Y. The two-group regression setup for Y implies that two more regressions will be added due to the combination with the two-part model. Both the continuous part of M and the continuous part of Y rely on distributional assumptions. In this study both dependent variables are assumed to follow the lognormal distribution. The model formulation of a two-part M, two-part Y mediation model is displayed in Equation 18.

\[ \log(Y_{i \mid M > 0}) = \beta_0^1 + \beta_1 \log(M_i) + \beta_2^1 X_i + \beta_3 \log(M_i) X_i + \beta_4^1 C_i + \epsilon_{y_i} \tag{18a} \]

\[ \operatorname{probit}(\Pr(Y_{i \mid M > 0} > 0)) = \theta_0^1 + \theta_1 \log(M_i) + \theta_2^1 X_i + \theta_3 \log(M_i) X_i + \theta_4^1 C_i \tag{18b} \]

\[ \log(Y_{i \mid M = 0}) = \beta_0^2 + \beta_2^2 X_i + \beta_4^2 C_i + \epsilon_{y_i} \tag{18c} \]

\[ \operatorname{probit}(\Pr(Y_{i \mid M = 0} > 0)) = \theta_0^2 + \theta_2^2 X_i + \theta_4^2 C_i \tag{18d} \]

\[ \log(M_i)_{\mid M_i > 0} = \gamma_0 + \gamma_1 X_i + \gamma_2 C_i + \epsilon_{m_i} \tag{18e} \]

\[ \operatorname{probit}(\Pr(M_i > 0)) = \kappa_0 + \kappa_1 X_i + \kappa_2 C_i \tag{18f} \]

where by assumption $\epsilon_{y_i} \sim N(0, \sigma_y^2)$ and $\epsilon_{m_i} \sim N(0, \sigma_m^2)$. This is a non-linear model and the counterfactual framework will be used to define the effects. The path diagram of Equation 18 is shown in Figure 5. The mediator M and the outcome Y are both separated into one zero/non-zero part, M* and Y*, and one non-zero corresponding continuous

part, M and Y. If M* = 1 the observation is not censored and has a corresponding continuous value M. If M* = 0 the observation is censored and has no continuous value. The same goes for Y* and Y. M* and Y* are assumed to be generated from dichotomized normal distributions. The exposure X affects M*, M, Y* and Y. M affects only the two-part Y belonging to the group M > 0. Moreover, X is allowed to moderate the effect of M on Y. Additionally, a covariate measured at the same time point as X is allowed to affect both parts of the mediator and the outcome. The full model implied by Figure 5 is kept throughout the derivations; in the last step, however, some parameters will be restricted to limit the scope of the Monte Carlo simulation study in Section 5. Expressions for calculating unrestricted effects are available in Appendix B. This model allows for moderation between the zero/non-zero part of the mediator, M*, and the effect of X on Y, since the slopes in the two-group analysis of Y are not restricted. That is, the difference between the slopes of Y on X in the two groups is a measure of the interaction effect. The detailed motivation of this model formulation is discussed in Section 2. As in the two-part M model in Equation 10, the zero/non-zero parts are modelled with probit regression, and all the linear parts are modelled with standard linear regressions.

4.2 Estimation

The likelihood function in Equation 11 is extended in Equation 19 to account for two-part modelling of both M and Y. The four combinations of zero/non-zero M and Y are represented by one product each in Equation 19. The expressions $f(M_i \mid \ldots)$ and $f(Y_i \mid \ldots)$

are the conditional densities of M and Y.

Figure 5: Path diagram for a mediation model with a two-part mediator M and two-part outcome Y, combined with a two-group model of Y. M* and Y* are the observed binary variables coding for an observation being censored or not censored.

Let n be the size of a random sample such that $n = n_{g_1} + n_{g_2} + n_{g_3} + n_{g_4}$ and $i = 1, \ldots, n$.

\[
\begin{aligned}
L(Y_i, M_i \mid x_i, c_i) = {} & \prod_{i \in g_1} \Pr(M_i > 0 \mid x_i, c_i)\, \Pr(Y_i > 0 \mid M_i > 0, x_i, c_i)\, f(M_i \mid M_i > 0, x_i, c_i)\, f(Y_i \mid Y_i > 0, M_i > 0, x_i, c_i) \\
\times & \prod_{i \in g_2} \left[ 1 - \Pr(M_i > 0 \mid x_i, c_i) \right] \Pr(Y_i > 0 \mid M_i = 0, x_i, c_i)\, f(Y_i \mid Y_i > 0, M_i = 0, x_i, c_i) \\
\times & \prod_{i \in g_3} \Pr(M_i > 0 \mid x_i, c_i) \left[ 1 - \Pr(Y_i > 0 \mid M_i > 0, x_i, c_i) \right] f(M_i \mid M_i > 0, x_i, c_i) \\
\times & \prod_{i \in g_4} \left[ 1 - \Pr(M_i > 0 \mid x_i, c_i) \right] \left[ 1 - \Pr(Y_i > 0 \mid M_i = 0, x_i, c_i) \right]
\end{aligned}
\tag{19}
\]
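To make the structure of Equation 19 concrete, the log-likelihood can be written as a sum of one contribution per observation, where the contribution depends on which of the four zero/non-zero combinations the observation belongs to. The following is a minimal sketch in Python under two stated assumptions: the densities of the positive parts are taken on the original (lognormal) scale, and the group with M = 0 and Y = 0 uses the probit model for Y in the M = 0 group (Equation 18d). The function names, dictionary keys and parameter values are hypothetical; the estimation in this thesis is carried out in Mplus, not with this code.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

# Hypothetical parameter values (illustration only); "_pos" and "_zero"
# refer to the M > 0 and M = 0 groups of the two-group model of Y.
EXAMPLE = dict(
    b0_pos=1.0, b1=0.5, b2_pos=0.3, b3=0.2, b4_pos=0.1,    # 18a
    t0_pos=0.2, t1=0.3, t2_pos=0.4, t3=0.1, t4_pos=0.1,    # 18b
    b0_zero=0.4, b2_zero=0.3, b4_zero=0.1,                 # 18c
    t0_zero=-0.1, t2_zero=0.4, t4_zero=0.1,                # 18d
    g0=0.2, g1=0.4, g2=0.1, k0=0.1, k1=0.5, k2=0.2,        # 18e, 18f
    sd_y=1.0, sd_m=0.8)


def log_lognormal_pdf(v, mean_log, sd_log):
    """Log density of a lognormal variable at v > 0."""
    z = (math.log(v) - mean_log) / sd_log
    return (-0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
            - math.log(sd_log) - math.log(v))


def loglik_obs(y, m, x, c, p):
    """Log-likelihood contribution of one observation (one factor of Eq. 19)."""
    pr_m = PHI(p["k0"] + p["k1"] * x + p["k2"] * c)
    if m > 0:
        log_m = math.log(m)
        ll = math.log(pr_m) + log_lognormal_pdf(
            m, p["g0"] + p["g1"] * x + p["g2"] * c, p["sd_m"])
        eta_probit = (p["t0_pos"] + p["t1"] * log_m + p["t2_pos"] * x
                      + p["t3"] * log_m * x + p["t4_pos"] * c)
        eta_linear = (p["b0_pos"] + p["b1"] * log_m + p["b2_pos"] * x
                      + p["b3"] * log_m * x + p["b4_pos"] * c)
    else:
        ll = math.log(1.0 - pr_m)
        eta_probit = p["t0_zero"] + p["t2_zero"] * x + p["t4_zero"] * c
        eta_linear = p["b0_zero"] + p["b2_zero"] * x + p["b4_zero"] * c
    pr_y = PHI(eta_probit)
    if y > 0:
        return ll + math.log(pr_y) + log_lognormal_pdf(y, eta_linear, p["sd_y"])
    return ll + math.log(1.0 - pr_y)


def loglik(data, p):
    """Sum of contributions over observations given as (y, m, x, c) tuples."""
    return sum(loglik_obs(y, m, x, c, p) for y, m, x, c in data)


if __name__ == "__main__":
    sample = [(2.0, 1.5, 1.0, 0.5), (0.0, 1.5, 0.0, 0.5),
              (1.2, 0.0, 1.0, 0.5), (0.0, 0.0, 0.0, 0.5)]
    print(loglik(sample, EXAMPLE))
```

The four tuples in the usage example correspond to the groups g_1, g_3, g_2 and g_4 of Equation 19, in that order. Passing the negative of `loglik` to a general-purpose optimizer would give ML estimates for this sketch.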

4.3 Derivation of causal effects

In this study the effects of X and C on Y are assumed equal for both groups of M. Thus $\beta_2^1 = \beta_2^2$, $\beta_4^1 = \beta_4^2$, $\theta_2^1 = \theta_2^2$ and $\theta_4^1 = \theta_4^2$, indicated by a dropped superscript. Results for forming the unrestricted effects can be found in the detailed derivation in Appendix B.

4.3.1 Conditional expected values

One of the conditional expectations used to define the causal effects is shown in Equation 20. For a detailed explanation of these conditional expected values see Section 2.5.

\[
\begin{aligned}
E[Y(x_0, \log M(x_0))] = \phi \exp(\beta_2 x_0 + \beta_4 c) \Bigg[ & \exp(\beta_0^2)\, \Phi(\theta_0^2 + \theta_2 x_0 + \theta_4 c) \left[ 1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \right] \\
& + \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\, \exp\!\left( \beta_0^1 + \frac{\mu^2 (b^2 - 1)}{2 \sigma_m^2} \right) \Phi\!\left( \frac{\theta_0^1 + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0)\, b \mu}{\sqrt{1 + (\theta_1 + \theta_3 x_0)^2 \sigma_m^2}} \right) \Bigg]
\end{aligned}
\tag{20}
\]

4.3.2 Causal effects

The complete derivation is displayed in Appendix B. The simplified effects, given the restrictions mentioned above, are given below. Note that b and $\mu$ in Equation 20 are substituted (see details in Appendix B). Some further simplifications are possible, however without gaining simplicity and with loss of intuition.
