Causal effects in mediation analysis with limited-dependent variables
By: Mårten Schultzberg
Department of Statistics, Uppsala University
Supervisor: Fan Yang-Wallentin
2016
Contents

1 Introduction
    Mediation analysis in general
    Direct and indirect effects
    Counterfactual-based causal effects in mediation analysis
    Limited-dependent variable
    Research questions
2 Methodology
    Mediation analysis
    The simple mediation model and its motivation
    Adding relations to the simple mediation model
    Estimation
    Two-group regression
    Limited-dependent variable analysis
    The Two-part model
    Mediation analysis with limited-dependent variables
    The counterfactual framework
    Effect notation and calculations for mediation
    Assumptions for causal effect of mediation models
3 Mediation, two-part M
    Model formulation
    Estimation
    Derivation of effects
    Conditional expected value of Y
    Causal effects
4 Mediation, two-part M and two-part Y
    Model formulation
    Estimation
    Derivation of causal effects
    Conditional expected values
    Causal effects
5 Monte Carlo simulations
    Synthetic models and true data generating processes
    Weak and Moderately strong effects model
    Censoring
    Model 1 - Two-part M
    Model 1 - Two-part M, Weak
    Model 1 - Two-part M, Moderately strong
    Model 2 - Two-part M, Two-part Y
    Model 2 - Two-part M, Two-part Y, Weak
    Model 2 - Two-part M, Two-part Y, Moderately strong
    Estimation
    Results
    Outcome variables
    Two-part M
    Two-part M, Two-part Y
    Sensitivity analysis
6 Discussion
7 Conclusion
A Appendix - Derivation - Two-part M
B Appendix - Derivation - Two-part M, two-part Y
C Appendix - Mplus syntax
Abstract

Mediation is used to separate direct and indirect effects of an exposure variable on an outcome variable. In this thesis, a mediation model is extended to account for a censored mediator and outcome variable. The two-part framework is used to account for the censoring. The counterfactual-based causal effects of this model are derived. A Monte Carlo study is performed to evaluate the behaviour of the causal effects accounting for censoring, together with a comparison with methods for estimating the causal effects without accounting for censoring. The results of the Monte Carlo study show that the effects accounting for censoring have substantially smaller bias when censoring is present. The proposed effects also seem to come at a low cost, with unbiased estimates for sample sizes as small as 100 for the two-part mediator model. In the case of a limited mediator and outcome, sample sizes larger than 300 are required for reliable improvements. A small sensitivity analysis stresses the need for further development of the two-part models.

Keywords: counterfactuals, two-part model, potential outcome
1 Introduction

This introduction gives a brief overview of, and motivation for, the study, followed by the research questions.

1.1 Mediation analysis in general

Mediation analysis is used to quantify the effects that an exposure variable has on an outcome variable, mediated by some intermediate variable. For example, a gene that causes cancer may also cause increased cigarette usage, which in turn causes cancer. The effect of the gene on cancer is then mediated by cigarette usage. The intermediate variable, cigarette usage, is often called the mediator variable, or the mediator. The hypothesised relationship in a simple mediation model is that an exposure variable X causes some change in a mediator variable M, which in turn causes a change in an outcome variable Y (Hayes, 2013). Mediation analysis has become widely used in the social sciences and in biomedical studies, especially since the influential paper by Baron and Kenny (1986). The claimed possibility of opening up the black box, answering questions such as "Through what mechanism does X affect Y?" or "How does a change in X affect Y?", is probably one explanation for its widespread use. For a thorough overview of traditional mediation analysis see Hayes (2013).

In recent years the causal claims of these models, and their limitations, have been investigated in detail. The potential outcome and counterfactual frameworks have been developed and applied, contributing general definitions of causal effects and inference in mediation analysis. The causal mediation literature has also focused on acknowledging and assessing the strong assumptions on which the causal interpretations of these effects rely (Imai et al., 2010; Pearl, 2001; Robins and Greenland, 1992; VanderWeele).

1.2 Direct and indirect effects

Separating direct and indirect effects in mediation is essentially a tool for making complex relations comprehensible. If a variable X affects both M and Y, but M also affects Y, then how should the effect of X on Y be separated from the effect of X on Y through M?
The corresponding question in the cancer example would be: how is the direct effect of the gene on cancer separated from the effect of the gene through
cigarette usage on cancer? As will be demonstrated, several research questions can be answered once the set of causal effects is defined. The traditional way of calculating the indirect effect is called the product method and is credited to Baron and Kenny (1986); the method is often even referred to as the Baron and Kenny method. Baron and Kenny (1986) has been one of the most influential papers in the mediation field, making the product method commonly applied. The product method is adequate for linear mediation models with continuous mediators and outcomes. In some research areas this is the most commonly applied mediation model (Rucker et al.). However, as Robins and Greenland (1992) and Pearl (2001) pointed out, the product method is unable to account for non-continuous mediators and outcomes, or for mediation models with moderation and other non-linear functional forms. General effect definitions, building on the potential outcome framework (Rubin, 1974), were suggested by Robins and Greenland (1992) and Pearl (2001).

1.3 Counterfactual-based causal effects in mediation analysis

The causal effects based on counterfactuals offer general causal effect definitions. The definitions do not assume any functional form or model and can be applied to a wide range of mediation models of varying complexity (Pearl). More recently, causal effects in many special cases of mediation models have been derived from these definitions (Muthén et al., 2016; Vanderweele, 2012; VanderWeele and Vansteelandt, 2010; Wang and Albert, 2012).

1.4 Limited-dependent variable

In many research situations limited-dependent variables are encountered. Figure 1 shows an example of a sample from a limited variable, censored from below. The histogram is characteristic of a censored variable, with many observations at one point and no observations below that point, as if the range of the observations were limited by something.
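The point mass described above can be reproduced with a few lines of Python. This is only a minimal sketch: the latent distribution follows the caption of Figure 1 (normal, mean and variance 1, 1000 observations), while the censoring point `limit = 0.0`, the seed and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2016)  # arbitrary seed, for reproducibility only

n = 1000
limit = 0.0  # hypothetical censoring point, assumed for this illustration

# Latent normal variable with mean 1 and variance 1.
latent = rng.normal(loc=1.0, scale=1.0, size=n)

# Censoring from below: every value under the limit is recorded at the limit.
observed = np.maximum(latent, limit)

share_at_limit = np.mean(observed == limit)
n_below_limit = int(np.sum(observed < limit))

print(f"share of observations at the limit: {share_at_limit:.3f}")
print(f"observations below the limit:       {n_below_limit}")
print(f"latent mean {latent.mean():.3f}, observed mean {observed.mean():.3f}")
```

Under these assumptions roughly one observation in six piles up at the limit, and the mean of the observed variable lies above the mean of the latent one, which is one simple way in which ignoring the limit distorts summaries of the variable.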
The importance of accounting for censoring has been pointed out by e.g. Tobin (1958), Cragg (1971), Jones (1989) and Brown et al. If limited-dependent variables are not handled, model estimates will be biased. Thus, to estimate effects without bias in a mediation model where some dependent variable is limited, special methods are required. Limited-dependent outcome variables in mediation were recently handled in Muthén et al.
(2016). However, the mediator in a mediation analysis is also a dependent variable, namely in the regression on the exposure.

Figure 1: A sample of 1000 observations from a limited normal variable with mean and variance equal to 1, censored from below.

If the bias found in regression analysis with limited-dependent variables transfers to mediation analysis, this might have severe consequences for the effect estimates and for the conclusions drawn from mediation analysis. It is therefore of interest to investigate the impact of accounting for limited-dependent variables as mediators and outcome variables.

1.5 Research questions

The aim of this study is to answer the following questions:

1. a) How can the mediation model be formulated to account for a limited mediator and/or outcome variable?
   b) What are the additional assumptions for the two-part mediation models compared to the simple mediation model?
2. How are the counterfactual-based causal effects for the two-part mediation models derived?
3. Does acknowledging and accounting for limited mediators and/or outcome variables improve the accuracy of the causal effect estimates?
4. What are the sampling behaviours of the causal effects for the two-part mediation models?
The remaining parts of the thesis are structured as follows. Section 2 gives a detailed overview of the methods and motivations for the formulation of the limited-dependent variable models. Section 3 contains the model formulation and causal effect derivations for the two-part M model. Section 4 contains the model formulation and causal effect derivations for the two-part M, two-part Y model. In Section 5 Monte Carlo simulations are performed to evaluate the small-sample properties of these models. Sections 6 and 7 contain the discussion and conclusions of the study.

2 Methodology

In this section all the parts necessary to construct the two-part mediation models are introduced and motivated in detail.

2.1 Mediation analysis

In this section the development and properties of mediation analysis are presented. The simple mediation model is presented and extended to become more suitable for this study.

2.1.1 The simple mediation model and its motivation

The simple mediation model is illustrated in Figure 2. The exposure X affects the outcome Y, both directly and indirectly through the mediator M. Rather than focusing on the size of the total effect of an exposure on an outcome, mediation analysis directs special attention to the "how" part. That is, how, or by what means, does the exposure affect the outcome? Through what intermediate steps does the exposure affect the outcome? An easy way to motivate the need for answers to this kind of question is through the perspective of policy makers. In many situations an exposure cannot be regulated by policies, but some mediators might be. Drawing on the example from VanderWeele (2015), originally analysed in Vanderweele (2012), the risk of lung cancer is investigated. A genetic variant on a chromosome, X, is believed to affect the risk of lung cancer, Y. Moreover, evidence has shown that this genetic variant affects smoking behaviour, making carriers of the genetic variant smoke more.
It is known that smoking cigarettes increases the risk of lung cancer. It is possible that the genetic variant of the chromosome is causing cancer only through its effect on cigarette usage. In that case, the policy makers
can try to reduce cigarette usage through laws and taxes in order to decrease the number of lung cancer patients. It is also possible that the indirect effect of the gene through cigarette usage on the risk of lung cancer is small, and that the gene directly causes cancer. In the latter scenario, it might be difficult for policy makers to take effective action to decrease the number of patients diagnosed with lung cancer. This over-simplified example illustrates the importance of understanding the role that the mechanisms themselves play in effective policy making. If one can quantify and compare the importance of single mediators, resources can be directed more effectively. This way of approaching questions transfers to a wide range of situations, in various research fields.

Figure 2: The simple mediation model. X is the exposure variable, M the mediator and Y the outcome.

The model formulation of the simple mediation model is displayed in Equation 1.

$$
\begin{aligned}
M_i &= \gamma_0 + \gamma_1 X_i + \epsilon_{M_i} \\
Y_i &= \beta_0 + \beta_1 M_i + \beta_2 X_i + \epsilon_{Y_i}
\end{aligned}
\tag{1}
$$

Here $X_i$ is the exposure variable for individual i, and $M_i$ and $Y_i$ are the mediator and outcome of individual i, both assumed continuous. $\epsilon_{M_i}$ and $\epsilon_{Y_i}$ are the error terms. The error terms are most commonly assumed to be iid normally distributed with mean zero and uncorrelated with X and M. The model is constructed from two linear models: one in which the mediator is modelled by the exposure, and one in which the outcome is modelled by the mediator and the exposure.

2.1.2 Adding relations to the simple mediation model

In most situations the simple mediation model is too parsimonious to capture the relevant mechanisms. For example, the assumption of no unmeasured confounder between M
and Y (see the discussion of assumptions below for details) cannot be guaranteed to be fulfilled, but by adding relevant covariates the violation might be substantially reduced. Moreover, interaction between M and X is common; VanderWeele (2015) even suggests that it might generally be better to keep interaction terms in the analysis even when non-significant interaction estimates are found. In Figure 3, a path diagram for a mediation model with a covariate affecting M and Y is displayed. Additionally, interaction between M and X is visualized by a path from X to the path between M and Y. In the social sciences interaction is more often referred to as moderation. The interaction term poses no problems in estimation; however, the traditional product-method-based direct and indirect effects are no longer applicable (Pearl). The model including the interaction and covariate can be written as in Equation 2.

Figure 3: Mediation model with covariate and interaction between M and X. C is the covariate, X the exposure variable, M the mediator and Y the outcome.

$$
\begin{aligned}
M_i &= \gamma_0 + \gamma_1 X_i + \gamma_2 C_i + \epsilon_{M_i} \\
Y_i &= \beta_0 + \beta_1 M_i + \beta_2 X_i + \beta_3 M_i X_i + \beta_4 C_i + \epsilon_{Y_i}
\end{aligned}
\tag{2}
$$

The model formulation in Equation 2 is similar to that of the simple mediation model in Equation 1, with the covariate C and the interaction term MX added. $X_i$ and $C_i$ are the exposure and covariate for individual i. $M_i$ and $Y_i$ are the mediator and outcome for individual i. Again the error terms $\epsilon_{M_i}$ and $\epsilon_{Y_i}$ are usually assumed to be iid normally distributed with mean zero, uncorrelated with X, C and M.
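To make the structure of Equation 2 concrete, the following Python sketch simulates data from the extended mediation model and recovers the coefficients with two separate least-squares fits. It is a sketch under stated assumptions rather than the thesis's own procedure: the parameter values, the sample size, the seed and the helper `ols` are all hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n = 5000

# Hypothetical true parameter values, chosen only for illustration.
g0, g1, g2 = 0.5, 0.8, 0.3                    # gamma_0, gamma_1, gamma_2
b0, b1, b2, b3, b4 = 1.0, 0.6, 0.4, 0.2, 0.5  # beta_0, ..., beta_4

X = rng.integers(0, 2, size=n).astype(float)  # dichotomous exposure
C = rng.normal(size=n)                        # covariate
M = g0 + g1 * X + g2 * C + rng.normal(size=n)
Y = b0 + b1 * M + b2 * X + b3 * M * X + b4 * C + rng.normal(size=n)


def ols(columns, response):
    """Least-squares coefficients, with an intercept placed first."""
    design = np.column_stack([np.ones(len(response))] + columns)
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


gamma_hat = ols([X, C], M)            # mediator equation
beta_hat = ols([M, X, M * X, C], Y)   # outcome equation with interaction M*X

print("gamma estimates:", np.round(gamma_hat, 2))
print("beta estimates: ", np.round(beta_hat, 2))
```

With a non-zero interaction coefficient (here `b3`), the effect of X on Y depends on the level of M, so a single product of coefficients no longer summarises the indirect effect. This is the situation in which the counterfactual-based definitions become useful.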
2.1.3 Estimation

The simple mediation model (Equation 1) and the extended mediation model (Equation 2) are estimated with Maximum Likelihood (ML) estimation or ordinary least squares (OLS). If the error terms are independent and normally distributed, OLS estimation of the two regression models one by one gives the same result as ML estimation of the whole system simultaneously. The likelihood function of these models is given in Equation 3. The right-hand expression in Equation 3 implies that if there are no common parameters, as in typical cases, the terms can be maximized separately. The log-likelihood can be expressed as

$$
\log L = \sum_{i=1}^{n} \log[y_i, m_i \mid x_i, c_i] = \sum_{i=1}^{n} \log[y_i \mid m_i, x_i, c_i] + \sum_{i=1}^{n} \log[m_i \mid x_i, c_i]
\tag{3}
$$

where $\log[\ldots]$ is the log of the conditional density function.

2.2 Two-group regression

Two-group regression is a special case of multi-group regression, fitting two different regressions to two subgroups within a sample. This can be compared to a single model with a dummy variable estimating the mean difference between two subgroups in a sample. The main difference is that two-group regression allows for different covariates in the two regressions. For the variables that are common to the two regressions, the coefficients can be constrained to be the same, or allowed to take different values, between the groups. The mean difference between the subgroups that a dummy coefficient would estimate in a single regression model is also estimated in the two-group setting, by the intercept difference. If the two models include the same covariates and all coefficients are constrained to be equal between the models, the intercept difference will be exactly the same as a dummy variable coefficient in a single model. If some different covariates are included and/or common covariates are not constrained, the intercept difference will not be the same as the dummy coefficient.
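The relation between the intercept difference and the dummy coefficient can be checked numerically. The sketch below is illustrative only: it assumes a common residual variance in the two groups, so that the constrained two-group fit reduces to least squares on the stacked data, and the data generating process, the seed and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
n = 400
group = rng.integers(0, 2, size=n)  # subgroup indicator (0 or 1)
x = rng.normal(size=n)
# Hypothetical data generating process in which the slope differs by group.
y = 1.0 + 0.5 * group + (0.8 + 0.6 * group) * x + rng.normal(size=n)


def least_squares(design, response):
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


in_g0 = group == 0
in_g1 = group == 1

# (a) Two-group regression with the slope constrained to be equal:
#     one intercept per group and a single shared slope.
stacked = np.column_stack([in_g0, in_g1, x]).astype(float)
int0, int1, shared_slope = least_squares(stacked, y)
constrained_gap = int1 - int0

# (b) Single regression with a dummy variable for group 1.
dummy_design = np.column_stack([np.ones(n), in_g1, x]).astype(float)
_, dummy_coef, _ = least_squares(dummy_design, y)

# (c) Unconstrained two-group regression: two separate fits.
fit0 = least_squares(np.column_stack([np.ones(in_g0.sum()), x[in_g0]]), y[in_g0])
fit1 = least_squares(np.column_stack([np.ones(in_g1.sum()), x[in_g1]]), y[in_g1])
unconstrained_gap = fit1[0] - fit0[0]

print(f"constrained intercept gap:   {constrained_gap:.4f}")
print(f"dummy coefficient:           {dummy_coef:.4f}")
print(f"unconstrained intercept gap: {unconstrained_gap:.4f}")
```

Fits (a) and (b) are two parameterisations of the same model, so the first two numbers agree up to floating-point error. Once the slope is freed in (c), the intercept difference generally moves away from the dummy coefficient.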
Additionally, two-group regression makes it possible to use different transformations of the same variable in the two groups. The technical difference between two separate regressions and a two-group regression is that common parameters constrained to be equal in the two regressions can be estimated using all available information from both subsets of the sample. However, if no parameter
is set to be common, two separate regressions and the two-group regression will give exactly the same estimates. Hence, two-group regression is motivated when two subgroups of a sample are believed to have substantially different relationships between the covariates and the outcome for some covariates, but equal relationships for others. As in the case of the limited mediator M (see details in Section 3), it might be believed that the relation between the exposure and the outcome is similar for the M=0 and the M>0 groups. However, the M>0 group might have a relation between M and Y that the M=0 group, where M is fixed, cannot have. Two-group regression makes it possible to fit one linear regression of Y on C and X for the M=0 part, and another linear regression for the M>0 part in which the logarithm of M can be added to the independent variables C and X. The slope of Y on X can be constrained to be the same in both regressions, allowing the estimate of this parameter to be based on the full data set.

The reasons mentioned above indicate that a two-group regression of the outcome in the two-part mediator models gives a flexible model. The possibility to constrain slopes of common variables is preserved, while still allowing for different covariates in the regressions. If all the slopes are constrained to be equal, the regression collapses back into a single linear regression. Similarly, if no parameter is constrained, it will simply be two separate regressions. Two-group regression of Y makes general applications of the derived causal effects possible.

2.3 Limited-dependent variable analysis

Limited-dependent variables go by many names depending on the context, e.g. two-part, hurdle, corner solution outcome or censored variable. A limited variable is a variable that for some reason is censored from above and/or below, having a point mass at the limit (see Figure 1); the case of truncated variables is beyond the scope of this study.
Sometimes such variables are said to suffer from ceiling or floor effects, respectively. This is probably because, in histograms of such variables, it looks as if the observations hit the ceiling and/or floor, with many observations at one value and no values above or below it. To describe the principle of how to handle limited-dependent variables it is useful to consider only one type of censoring, even though all results can be used for censoring both from above and from below. For the current study only censoring from below at zero will be considered to simplify examples and derivations,
without loss of generality. There are different methods of handling limited-dependent variables. Most methods have in common that the variable is split into two parts: one binary part handling the large number of zeros, and one continuous part for the non-zero part of the variable. Usually this is modelled by one binary regression and one standard linear regression, using the same covariates in both regressions. One of the first ways to handle limited-dependent variables was proposed in Tobin (1958). His method, today known as the Tobit model, is widely applied. One limitation of the Tobit model is that it only allows equal signs for the corresponding parameters in the two regressions (Wooldridge, 2002). If the binary part has a substantially different data generating process than the positive part, it is in some cases also reasonable that certain independent variables have effects of different signs on the two parts of the dependent variable. Cragg (1971) suggested two extensions which solve this limitation of the Tobit model: the truncated normal hurdle and the log-normal hurdle. In these models the regressions for the binary and the continuous parts of the limited-dependent variable are estimated independently. Thus, the coefficients of the independent variables are allowed to have different signs and sizes for the two different parts of the dependent variable. Throughout this study these models will be referred to as two-part models. For a thorough overview and comparison of different ways of handling limited-dependent variables, and how they differ from sample selection problems, see Wooldridge (2002) and Greene.

2.3.1 The Two-part model

Two-part modelling splits the limited-dependent variable into two parts: one binary zero/non-zero part and one positive part. The intuition is that a first mechanism decides whether the variable will take a positive value or not and, if the value is non-zero, a second mechanism decides what positive number it will take.
For example: "Will an individual smoke or not?" and, if yes, "How much will the individual smoke?". Zero in this setting is viewed as a category and not as a continuous numeric value. That is, a person who smokes zero cigarettes a day is simply a non-smoker. The zero indicates that the person belongs to the group of non-smokers, rather than an amount. It might seem an unimportant distinction; however, understanding why it is not is crucial for the motivation of the two-part model. The two-part model is based on the idea that there might be a more
substantial difference between a non-smoker and a smoker than between a one-cigarette-a-day smoker and a two-cigarettes-a-day smoker. Even though the difference in the number of smoked cigarettes between the zero-cigarettes-a-day "smoker" and the one-cigarette-a-day smoker is the same as that between the one-cigarette-a-day smoker and the two-cigarettes-a-day smoker, the two who actually smoke might have more characteristics in common. It is likely that different mechanisms explain whether you choose to smoke or not, and how much you choose to smoke. This reasoning implies that there are situations where the dependent variable has a point mass at zero but two-part analysis is not suitable. If the group in the point mass is not viewed as a group of observations with substantially different characteristics from the other observations, then the variable is not suitable for two-part analysis.

In practice the zero/non-zero part is estimated with a binary regression and the continuous part with standard linear regression. The probit and logit models are natural candidates for the binary part. Given the small difference in estimation results between the two (Gill, 2000), probit is chosen to make the derivations in Appendix A and B simpler. Even though, in theory, the two-part model collapses back to the standard regression for small amounts of censoring, there has to be a certain amount of censoring for the estimation procedure to work well. The binary regression will behave badly if too small a share of the observations belongs to one group. Hence, the estimated coefficients, and therefore the causal effects, from a two-part model will never coincide exactly with the classical estimates. The probit estimation breaks down more severely the closer the amount of censoring gets to zero. This estimation limitation is discussed in detail later in the thesis.

For the continuous part of the two-part variable a distributional assumption has to be made. This is crucial for the derivations of the effects.
The density function of the continuous part of the two-part variable has an important role in the derivations. The most common assumption is that the continuous part of the two-part dependent variable is normally or log-normally distributed. The experience of the author is that this is often a somewhat strong assumption, not likely to be fulfilled. The sensitivity to this assumption has, to the best knowledge of the author, not been investigated in detail. Sensitivity is discussed further later in the thesis.
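The two-part idea described in this section can be sketched in Python as follows. The sketch assumes a probit model for the zero/non-zero part and a log-normal positive part, which is one of the specifications mentioned above; the symbols `a0`, `a1`, `c0`, `c1`, `sigma`, the helper functions and the small Fisher-scoring routine are assumptions made for illustration and are not the thesis's notation or its Mplus implementation.

```python
import math
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
n = 4000
x = rng.normal(size=n)

# Hypothetical true values: x raises the chance of a positive value but
# lowers its size, a sign pattern that the Tobit model does not allow.
a0, a1 = 0.3, 0.7                # P(M > 0 | x) = Phi(a0 + a1 * x)
c0, c1, sigma = 0.5, -0.4, 0.6   # log M | M > 0 ~ N(c0 + c1 * x, sigma^2)


def norm_cdf(z):
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))


def norm_pdf(z):
    return np.exp(-0.5 * np.asarray(z) ** 2) / math.sqrt(2.0 * math.pi)


positive = rng.normal(size=n) < a0 + a1 * x
m = np.where(positive, np.exp(c0 + c1 * x + sigma * rng.normal(size=n)), 0.0)

design = np.column_stack([np.ones(n), x])


def fit_probit(design, outcome, iterations=25):
    """Probit regression by Fisher scoring (a minimal sketch)."""
    coef = np.zeros(design.shape[1])
    for _ in range(iterations):
        eta = design @ coef
        p = np.clip(norm_cdf(eta), 1e-10, 1.0 - 1e-10)
        d = norm_pdf(eta)
        score = design.T @ (d * (outcome - p) / (p * (1.0 - p)))
        info = (design * (d ** 2 / (p * (1.0 - p)))[:, None]).T @ design
        coef = coef + np.linalg.solve(info, score)
    return coef


# Part 1: binary regression for zero versus non-zero.
probit_hat = fit_probit(design, positive.astype(float))

# Part 2: linear regression of log M on x among the positive observations.
log_m = np.log(m[positive])
linear_hat, *_ = np.linalg.lstsq(design[positive], log_m, rcond=None)
resid = log_m - design[positive] @ linear_hat
sigma2_hat = resid @ resid / (positive.sum() - 2)


def expected_m(x_value):
    """E[M | x] = P(M > 0 | x) * E[M | M > 0, x] under the assumed model."""
    p = norm_cdf(probit_hat[0] + probit_hat[1] * x_value)
    return float(p * math.exp(linear_hat[0] + linear_hat[1] * x_value + sigma2_hat / 2.0))


print("probit part:  ", np.round(probit_hat, 2))
print("positive part:", np.round(linear_hat, 2))
print("E[M | x = 0]: ", round(expected_m(0.0), 3))
```

The two parts are fitted independently, so the slope on `x` is free to be positive in the binary part and negative in the positive part. The last function shows why the distributional assumption matters: the term `sigma2_hat / 2` in the conditional mean comes directly from the log-normal density.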
2.4 Mediation analysis with limited-dependent variables

Mediation analysis is special with regard to independent/dependent relations. As can be seen in Figure 2, even the simple mediation model implies two dependent variables: M is dependent on X, and Y is dependent on both X and M. This means that important considerations normally investigated for the dependent variable should be investigated for at least two variables in a mediation setting. The focus of this study is to establish the importance of accounting for limited-dependent variables in mediation analysis. In order to cover all cases of limited-dependent variables in a simple mediation setting, three cases need to be considered. In Case 1 the outcome Y is limited, in Case 2 the mediator M is limited, and in Case 3 both Y and M are limited. Case 1 is the most obvious, since the outcome Y is what would be viewed as the only dependent variable in most regression settings. Many ways of handling Case 1 in regression analysis have been suggested in the literature (Cragg, 1971; Duan et al.). Limited-dependent variables in mediation analysis were recently handled in Muthén et al. (2016), where causal effects for the two-part approach to mediation with a limited outcome are derived. A related approach is given in Wang and Albert (2012), where causal effects in mediation with limited counts are handled. The second and third cases have, to the best knowledge of the author, not been investigated. They are covered in detail in Sections 3 and 4. First the counterfactual framework, used to define the causal effects of these models, is presented.

2.5 The counterfactual framework

In order to understand the counterfactual framework it is helpful to use a simple example, where the exposure variable is a dichotomous treatment variable. An individual can be given the treatment, or not given the treatment, at an arbitrary time point t.
The outcome, say health on a continuous scale, is measured after the exposure, at time point t+1. The effect one would like to measure is the difference between the individual's health at time point t+1 if given the treatment, and the individual's health at time point t+1 if not given the treatment. This is of course impossible to retrieve, since one person can only receive one treatment at time t, and can thus at time t+1 only have either received the treatment or not. This is the effect of interest since it is the true treatment effect
i.e. true in the sense that the healing effect of time would not distort the measure of the treatment effect. Unfortunately, this cannot be resolved by giving both treatments one after the other to one individual, because of carry-over effects and because the time points would no longer be the same. Rubin (1974) started with a similar setup and suggested that, since only one outcome can be observed for each individual, the unobserved outcome could be called the potential outcome: that is, the health that an individual would potentially have had at time t+1 if given the other treatment. Rubin suggested that focus should be shifted from the individual effect, which is logically impossible to retrieve, to effects at group level. The expected value for an individual, conditional on being given the treatment or not, can then be calculated. The difference between the expected value of health given treatment and the expected value of health given no treatment can then be used as an estimate of the desired treatment effect. This was soon adopted and refined by a large number of researchers in different fields (Imbens and Angrist, 1994; Pearl, 1995, 2001; Robins and Greenland, 1992; Spirtes et al., 1993); see Wooldridge (2002) and VanderWeele (2015) for recent overviews. Attempting to generalize the causal effect definitions in the mediation field, Robins and Greenland (1992) and Pearl (2001) suggested counterfactual-based effect definitions as a complement to the traditionally used product method (Baron and Kenny, 1986).

2.5.1 Effect notation and calculations for mediation

To present the effect definitions from Robins and Greenland (1992) and Pearl (2001), some convenient notation is first defined. Consider the mediation model in Figure 2, and for simplicity let X be dichotomous. Let $Y_0$ be the outcome of an individual who was exposed to X=0, and $Y_1$ the outcome of an individual who was exposed to X=1. Additionally, let $Y_{1m}$ be the outcome of an individual who was exposed to X=1 and for whom M was set to the value m.
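The shift from individual to group-level effects can be illustrated with a small simulation. The sketch below assumes randomised treatment assignment and a hypothetical data generating process; in real data only one of `y0` and `y1` is available for each individual, so `true_average_effect` can be computed here only because both potential outcomes are simulated.

```python
import numpy as np

rng = np.random.default_rng(11)  # arbitrary seed
n = 20000

# Hypothetical potential outcomes for health at time t+1: y0 without the
# treatment and y1 with the treatment. Individual effects vary around 5.
baseline = rng.normal(loc=50.0, scale=10.0, size=n)
y0 = baseline
y1 = baseline + 5.0 + rng.normal(scale=2.0, size=n)

# Average of the individual effects, unobservable outside a simulation.
true_average_effect = np.mean(y1 - y0)

# Randomised assignment: each individual reveals only one potential outcome.
treated = rng.integers(0, 2, size=n).astype(bool)
y_observed = np.where(treated, y1, y0)

# Group-level estimate: difference between the two observed group means.
estimated_effect = y_observed[treated].mean() - y_observed[~treated].mean()

print(f"average of individual effects: {true_average_effect:.3f}")
print(f"difference in group means:     {estimated_effect:.3f}")
```

Because assignment is random in this sketch, the difference in observed group means is close to the average of the unobservable individual effects. Without randomisation the same comparison would need the kind of no-unmeasured-confounding assumptions that the causal mediation literature makes explicit.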
Now let $M_0$ be M conditioned on $X = x_0$. This means that $Y_{1 M_0}$, or $Y_{1,0}$ for short, is the outcome of an individual with X=1, but with M set to whatever it would have been conditioned on X=0. This can be generalized to non-dichotomous X, where the last expression would be $Y_{x_1, M_{x_0}}$ for arbitrarily chosen points $x_1$ and $x_0$. The Controlled Direct Effect (CDE), for when X changes from $x_0$ to $x_1$, is defined as $CDE(m) = Y_{x_1 m} - Y_{x_0 m}$. However, since two different values can never be observed for one individual, the average effect is
considered for all effects presented below. The average CDE is defined as

$$
CDE(m) = E[Y_{x_1 m} - Y_{x_0 m}]
\tag{4}
$$

where $Y_{x_1 m}$ is Y conditioned on $X = x_1$ and $M = m$, and $Y_{x_0 m}$ is Y conditioned on $X = x_0$ and $M = m$. The CDE can be interpreted as the effect X has on Y when it changes from $x_0$ to $x_1$ while M is fixed at m. Returning to the lung cancer example from VanderWeele (2015), $CDE(10) = Y_{1,10} - Y_{0,10}$ corresponds to the effect of moving from not having to having the genetic variant, if smoking 10 cigarettes per day. Even though this kind of effect is often interesting, the fixed m=10 does not correspond to a natural situation. A more natural situation arises if, instead of fixing M, it is allowed to take the value it would have taken conditioned on the $x_0$ considered. The Pure Natural Direct Effect (PNDE) is defined as

$$
PNDE = E[Y_{x_1 M_{x_0}} - Y_{x_0 M_{x_0}}]
\tag{5}
$$

and can be interpreted as the effect X has on Y when it changes from $x_0$ to $x_1$ while M takes the value it would take on average for $X = x_0$. Applying this to the lung cancer example gives the effect of moving from not having to having the genetic variant, given smoking the number of cigarettes the average individual does in the absence of the gene. This effect is natural in the sense that M takes a value it would naturally take on average for one of the values of X considered. The corresponding indirect effect is the Total Natural Indirect Effect (TNIE), with the average TNIE defined as

$$
TNIE = E[Y_{x_1 M_{x_1}} - Y_{x_1 M_{x_0}}]
\tag{6}
$$

which can be interpreted as the effect X has on Y through M when X changes from $x_0$ to $x_1$. In the lung cancer example this corresponds to the effect that moving from not having to having the gene has on the risk of lung cancer only by affecting the number of cigarettes smoked per day. In addition to the PNDE and the TNIE there are also the Total Natural Direct Effect (TNDE) and the Pure Natural Indirect Effect (PNIE). The
difference lies in the value of X on which Y or M is conditioned.

$$
TNDE = E[Y_{x_1 M_{x_1}} - Y_{x_0 M_{x_1}}]
\tag{7}
$$

$$
PNIE = E[Y_{x_0 M_{x_1}} - Y_{x_0 M_{x_0}}]
\tag{8}
$$

In linear mediation models there is no difference between the Pure Natural and the Total Natural effects; for non-linear models, however, the difference can be substantial. The counterfactual-based causal effects of course also cover the usual treatment effect: that is, the difference in the outcome if given the treatment or not or, in the continuous exposure case, the difference in the outcome if exposed to $x_0$ or $x_1$. This is the total effect of the exposure on the outcome, the sum of all indirect and direct effects of X on Y. The Total Effect (TE) is defined as

$$
TE = E[Y_{x_1 M_{x_1}} - Y_{x_0 M_{x_0}}]
\tag{9}
$$

One important aspect of these counterfactual-based effects is that they always fulfil the relation TE = TNIE + PNDE. This property follows directly from the definitions, since the term $E[Y_{x_1 M_{x_0}}]$ cancels in the sum, and it is an important key to why these effects do not rely on any specific functional form. The product method effects only fulfil this relation for linear models.

2.5.2 Assumptions for causal effect of mediation models

The review of the assumptions is based on that of VanderWeele (2015), which offers a thorough overview. There are four assumptions for establishing causal interpretations of all the effects.

Assumption 1 - No unmeasured confounding of the exposure-outcome relationship
Assumption 2 - No unmeasured confounding of the mediator-outcome relationship
Assumption 3 - No unmeasured confounding of the exposure-mediator relationship
Assumption 4 - No mediator-outcome confounder that is dependent on the exposure

The first two assumptions imply that the covariates included in the model have to be sufficient to control for the confounding relations between the exposure and the outcome,
and between the mediator and the outcome. Assumption 1 can be fulfilled by randomizing the assignment of the exposure; however, this is not always the case for Assumption 2. Assumptions 1 and 2 are necessary and sufficient for controlled causal effects. Assumptions 1 and 2 are also necessary for the natural effects, but two additional assumptions are needed to ensure the causal interpretation of the natural effects. The third assumption means that the variables influencing the level of both the exposure and the mediator must be controlled for. The fourth and final assumption is often viewed as a strong assumption, since it means that all confounders of the mediator and the outcome must be independent of the exposure.

It is important to recognize that randomization does not fulfil all assumptions in mediation. This was emphasised by Judd and Kenny (1981) and James and Brett (1984), but not by Baron and Kenny (1986), and the notion is therefore less widely spread. It also implies that careful data collection and control for confounders are particularly important for causal mediation models to be reliable. If all four assumptions are fulfilled, the effects defined above are said to have causal interpretations. However, the causality relies on some additional implicit assumptions of temporal ordering. The temporal ordering may be implied by the word causal, but it is worth pointing out since mediation analysis is often performed on cross-sectional data. Even though causal interpretations can in some cases be made with cross-sectional data, this relies heavily on assumptions. With that said, Hayes (2013, p. 89) makes a statement regarding assumptions, interpretations and analysis in general, and even though it does not cover counterfactuals, it is still worth quoting: "Sometimes theory and solid arguments is the only foundation upon which a causal claim can be built given limitations of our data. But I see no problem conducting the kind of analysis I describe in the following chapters even when causal claims rest on shaky grounds. It is our brains that interpret the place and meaning on the mathematical procedures used, not the procedures themselves."

The importance of assumptions is, according to Hayes, not necessarily to always fulfil them but to understand and acknowledge the limitations they impose on the interpretations. Assumptions 1-4 are not testable, and an analyst can of course never know whether all relevant confounding relations are captured. In order to make these effects useful without too much doubt, sensitivity analysis is suggested by many (Imai et al., 2010; Pearl, 2001; VanderWeele, 2015). The idea is to explicitly display how much the result of a mediation analysis relies on the assumptions. This is done, for example, by showing how large the effect
of an unmeasured confounder on the exposure and the outcome must be to fully explain the effect of the exposure on the outcome. This can be done by trying different effect sizes of the unmeasured confounder on the exposure and the mediator. Although these values are arbitrary in some sense, they give a good indication of how strong the estimated effects are relative to the assumption. If it only takes a weak confounder to account for the indirect effects, then it is crucial that the assumptions hold for the interpretations to be reliable. On the other hand, if it takes a huge, implausible effect of the confounder on the exposure and the mediator to account for the indirect effect, then the interpretations may be more reliable. This kind of sensitivity analysis procedure is available for all four assumptions (VanderWeele, 2015). In his book from 2015, VanderWeele strongly recommends that sensitivity analysis should always be presented together with mediation analysis and causal effect interpretations. This would arguably create a good standard for reporting mediation results, as well as making mediation analysis less prone to accusations of relying on unreasonable assumptions. For a recent intuitive, less technical, review of mediation with causal effects and assumptions see also Keele.

3 Mediation, two-part M

3.1 Model formulation

If the mediator M is limited, the regression of M on the exposure X is affected. If this had been a single regression outside the mediation model, all the limitations discussed in Section 2.3 would apply. The question is whether the gains found from accounting for censoring in regression transfer to the case of a mediation model with a limited mediator. In Figure 4 a mediation model similar to that in Figure 3 is expanded to a two-part M model, to account for a limited mediator. The mediator M is separated into a binary zero/non-zero part measured with a dummy M*, and one non-zero continuous part M.
That is, if M* = 1 the observation is not censored and also has a continuous value, and if M* = 0 the observation is censored and has no continuous value. The two-part model of M on X and C is constructed from one probit model of M*, modelling the probability of being at the censoring point zero or not (recall that a floor effect at zero is assumed without loss of generality). M* is assumed to be generated from a dichotomized normal distribution. Additionally, one linear model accounts for the continuous part of the variable. Both the binary and the continuous variable for M are then brought into the linear regression model of Y. Moreover, a two-group regression model of Y is used, one for the censored group and one for the uncensored group. In the group where M = 0, M is not a covariate since M is fixed.

Since distributional assumptions (usually normality) have to be made for the continuous part of M, the logarithm transformation is often suggested to better meet this assumption. Many times the two-part variable has a long right tail and resembles a normal curve better after taking logarithms, i.e. the continuous part of M is assumed to be lognormally distributed. The lognormal case will be the focus of this study. The main reason for this is to make comparisons with earlier two-part models easier. However, the derivations of the causal effects also apply for normally distributed M (see details in Appendix A). The implied model formulation is displayed in Equation 10.

$$Y_i \mid M_i > 0 = \beta_0^{(1)} + \beta_1 \log M_i + \beta_2^{(1)} X_i + \beta_3 \log M_i \, X_i + \beta_4^{(1)} C_i + \epsilon_{y_i} \tag{10a}$$

$$Y_i \mid M_i = 0 = \beta_0^{(2)} + \beta_2^{(2)} X_i + \beta_4^{(2)} C_i + \epsilon_{y_i} \tag{10b}$$

$$\log M_i \mid M_i > 0 = \gamma_0 + \gamma_1 X_i + \gamma_2 C_i + \epsilon_{m_i} \tag{10c}$$

$$\mathrm{probit}(\Pr(M_i > 0)) = \kappa_0 + \kappa_1 X_i + \kappa_2 C_i \tag{10d}$$

where by assumption $\epsilon_{y_i} \sim N(0, \sigma_y^2)$ and $\epsilon_{m_i} \sim N(0, \sigma_m^2)$. The superscript (1) refers to the group with M > 0 and the superscript (2) to the group with M = 0. Equations 10a and 10b are the two-group regression of Y. The two-group regression is motivated in detail in Section 2.2. One benefit of using the two-group model is that it allows M and M* to have different intercepts and different slopes for X and C. The interaction between X and the positive part of M is quantified by $\beta_3$. The interaction between X and the zero/non-zero part of the mediator, M*, is somewhat less obvious. If $\beta_2^{(1)}$ and $\beta_2^{(2)}$ are allowed to differ in the two-group regression, their difference is a measure of the interaction effect of the binary M*.
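To make the data generating process behind Equation 10 concrete, the following sketch simulates observations from a two-part mediator model. It is only an illustration: the function name `simulate_two_part_m`, the dictionary keys and all parameter values are hypothetical and not taken from this study, X and C are drawn as standard normal purely for convenience, and the slopes of X and C on Y are set equal in the two groups.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, used for the probit part


def simulate_two_part_m(n, par, seed=1):
    """Draw n observations (x, c, m, y) from a model shaped like Equation 10.

    `par` holds hypothetical parameter values; none come from the thesis.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # exposure (assumed standard normal)
        c = rng.gauss(0.0, 1.0)  # covariate (assumed standard normal)
        # Binary part (10d): probit model for the event M > 0.
        p_pos = PHI(par["k0"] + par["k1"] * x + par["k2"] * c)
        if rng.random() < p_pos:
            # Continuous part (10c): log M is normal given M > 0.
            log_m = (par["g0"] + par["g1"] * x + par["g2"] * c
                     + rng.gauss(0.0, par["sd_m"]))
            m = math.exp(log_m)
            # Outcome in the M > 0 group (10a), with X-by-log(M) interaction.
            mean_y = (par["b0_pos"] + par["b1"] * log_m + par["b2"] * x
                      + par["b3"] * log_m * x + par["b4"] * c)
        else:
            m = 0.0
            # Outcome in the M = 0 group (10b): M is not a covariate here.
            mean_y = par["b0_zero"] + par["b2"] * x + par["b4"] * c
        y = mean_y + rng.gauss(0.0, par["sd_y"])
        rows.append((x, c, m, y))
    return rows


# Hypothetical parameter values chosen only for illustration.
PAR = {"k0": 0.0, "k1": 0.5, "k2": 0.3,
       "g0": 0.2, "g1": 0.4, "g2": 0.1, "sd_m": 0.8,
       "b0_pos": 1.0, "b0_zero": 0.5, "b1": 0.6, "b2": 0.3,
       "b3": 0.2, "b4": 0.1, "sd_y": 1.0}

data = simulate_two_part_m(5000, PAR)
share_zero = sum(1 for _, _, m, _ in data if m == 0.0) / len(data)
print(f"share of censored mediators: {share_zero:.3f}")
```

With the intercept of the probit part set to zero, roughly half of the simulated mediators are censored; changing `k0` shifts the degree of censoring.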
In the effect derivations $\beta_2^{(1)}$ and $\beta_2^{(2)}$ will be unconstrained until the final simplifications, so that the full interaction model with interactions for both M and M* can be obtained. However, the focus of the final effect calculations is the case where X is only allowed to interact with the effect of M on Y through $\beta_3$, so that $\beta_2^{(1)}$ is set equal to $\beta_2^{(2)}$. The difference between $\beta_0^{(1)}$ and $\beta_0^{(2)}$ will capture the mean difference in Y for the two parts of M. This is discussed in detail in Section 2.2. Equations 10c and 10d correspond to the two-part regression of M on X and
C. Both the exposure and the covariate are allowed to have different effects on M and M*. The probit model will create a non-linear mediation model. The functional form is of importance since the main objective in mediation analysis is often to estimate the direct and indirect effects of the model. As was shown by Robins and Greenland (1992) and Pearl (2001), the classical product method (Baron and Kenny, 1986) for calculating effects from the mediation analysis does not apply to non-linear models. Instead the counterfactual framework will be used to correctly define these effects. Figure 4 shows the path diagram implied by Equation 10. The exposure variable X affects the mediator, where the mediator is two-part and therefore divided into the binary zero/non-zero part M* and the non-zero continuous part M. X is allowed to moderate the effect of M on Y. The covariate C affects M, M* and Y.

Figure 4: Path diagram for a mediation model with a two-part mediator and two-group outcome, with interaction between the exposure and the mediator. M* is the observed binary variable coding for an observation being censored or not censored.

3.2 Estimation

Maximum likelihood (ML) estimation will be used for all models. The two-part mediator model above has a more complicated likelihood function, since the mediator M is a
combination of a binary variable and a continuous variable. For a sample $i = 1, \ldots, N$, ordered so that observations $1, \ldots, n$ have $M_i > 0$ and observations $n+1, \ldots, N$ have $M_i = 0$,

$$L(M_i \mid x_i, c_i) = \prod_{i=1}^{n} \Pr(M_i > 0 \mid x_i, c_i)\, f(m_i \mid M_i > 0, x_i, c_i) \prod_{i=n+1}^{N} \left[1 - \Pr(M_i > 0 \mid x_i, c_i)\right] \tag{11a}$$

$$L(Y_i, M_i \mid x_i, c_i) = \prod_{i=1}^{n} \Pr(M_i > 0 \mid x_i, c_i)\, f(m_i \mid M_i > 0, x_i, c_i)\, f(y_i \mid M_i > 0, x_i, c_i) \prod_{i=n+1}^{N} \left[1 - \Pr(M_i > 0 \mid x_i, c_i)\right] f(y_i \mid M_i = 0, x_i, c_i) \tag{11b}$$

In Duan et al. the likelihood of a two-part dependent variable is shown to be Equation 11a, which implies the full likelihood of Equation 11b. The expressions $f(m_i \mid \ldots)$ and $f(y_i \mid \ldots)$ are the conditional densities of M and Y. Note that in the two-group modelling of Y the conditional density of Y is not restricted to be the same in the first and the second product in Equation 11b.

3.3 Derivation of effects

In this study the effects of X and C on Y are assumed equal for both groups of M. Thus $\beta_2^{(1)} = \beta_2^{(2)}$ and $\beta_4^{(1)} = \beta_4^{(2)}$, indicated by a dropped superscript. Results for forming the effects without these restrictions can be found in Appendix A.

3.3.1 Conditional expected value of Y

One of the conditional expectations used to define the causal effects is shown in Equation 12. For a detailed explanation see Section 2.5.

$$
\begin{aligned}
E[Y(x_1, \log M(x_0))] ={}& \beta_0^{(2)} + (\beta_0^{(1)} - \beta_0^{(2)})\,\Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
& + (\beta_2 x_1 + \beta_4 c)\left[1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) + \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\right] \\
& + \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\,(\beta_1 + \beta_3 x_1)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c) \\
={}& \beta_0^{(2)} + (\beta_0^{(1)} - \beta_0^{(2)})\,\Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) + \beta_2 x_1 + \beta_4 c \\
& + \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\,(\beta_1 + \beta_3 x_1)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)
\end{aligned}
\tag{12}
$$
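The conditional expectation in Equation 12 can be read as a weighted average of the expected outcome in the two groups of M. The sketch below evaluates it in both the mixture form and the simplified form and confirms that the two agree. The function names, dictionary keys and parameter values are hypothetical choices made for illustration only.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def expected_y_mixture(x_y, x_m, c, p):
    """E[Y(x_y, log M(x_m)) | C = c] as a mixture over the two groups of M."""
    p_pos = PHI(p["k0"] + p["k1"] * x_m + p["k2"] * c)   # Pr(M > 0) under x_m
    mean_log_m = p["g0"] + p["g1"] * x_m + p["g2"] * c   # E[log M | M > 0] under x_m
    # Expected outcome in the M > 0 group, exposure set to x_y.
    y_pos = (p["b0_pos"] + p["b2"] * x_y + p["b4"] * c
             + (p["b1"] + p["b3"] * x_y) * mean_log_m)
    # Expected outcome in the M = 0 group, exposure set to x_y.
    y_zero = p["b0_zero"] + p["b2"] * x_y + p["b4"] * c
    return p_pos * y_pos + (1.0 - p_pos) * y_zero


def expected_y_eq12(x_y, x_m, c, p):
    """The same quantity in the simplified form of Equation 12."""
    p_pos = PHI(p["k0"] + p["k1"] * x_m + p["k2"] * c)
    mean_log_m = p["g0"] + p["g1"] * x_m + p["g2"] * c
    return (p["b0_zero"] + (p["b0_pos"] - p["b0_zero"]) * p_pos
            + p["b2"] * x_y + p["b4"] * c
            + p_pos * (p["b1"] + p["b3"] * x_y) * mean_log_m)


# Hypothetical parameter values, not taken from the thesis.
P = {"k0": 0.1, "k1": 0.5, "k2": 0.3, "g0": 0.2, "g1": 0.4, "g2": 0.1,
     "b0_pos": 1.0, "b0_zero": 0.5, "b1": 0.6, "b2": 0.3, "b3": 0.2, "b4": 0.1}

val_mixture = expected_y_mixture(1.0, 0.0, 0.5, P)
val_eq12 = expected_y_eq12(1.0, 0.0, 0.5, P)
print(val_mixture, val_eq12)
```

Because log M enters the outcome model linearly, the expectation over the continuous part only requires the mean of log M, which is why no integration is needed here.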
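3.3.2 Causal effects

The complete derivation is displayed in Appendix A. The simplified effects are given in Equations 13 to 17.

The Total Natural Indirect Effect

$$
\begin{aligned}
\mathrm{TNIE} ={}& E[Y(x_1, \log M(x_1)) \mid C = c] - E[Y(x_1, \log M(x_0)) \mid C = c] \\
={}& (\beta_0^{(1)} - \beta_0^{(2)})\left[\Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\right] \\
& + (\beta_1 + \beta_3 x_1)\left[\Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)\right]
\end{aligned}
\tag{13}
$$

The Pure Natural Direct Effect

$$
\begin{aligned}
\mathrm{PNDE} ={}& E[Y(x_1, \log M(x_0)) \mid C = c] - E[Y(x_0, \log M(x_0)) \mid C = c] \\
={}& \beta_2 (x_1 - x_0) + \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)\,\beta_3 (x_1 - x_0)
\end{aligned}
\tag{14}
$$

The Pure Natural Indirect Effect

$$
\begin{aligned}
\mathrm{PNIE} ={}& E[Y(x_0, \log M(x_1)) \mid C = c] - E[Y(x_0, \log M(x_0)) \mid C = c] \\
={}& (\beta_0^{(1)} - \beta_0^{(2)})\left[\Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\right] \\
& + (\beta_1 + \beta_3 x_0)\left[\Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)\right]
\end{aligned}
\tag{15}
$$

The Total Natural Direct Effect

$$
\begin{aligned}
\mathrm{TNDE} ={}& E[Y(x_1, \log M(x_1)) \mid C = c] - E[Y(x_0, \log M(x_1)) \mid C = c] \\
={}& \beta_2 (x_1 - x_0) + \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)\,\beta_3 (x_1 - x_0)
\end{aligned}
\tag{16}
$$

The Total Effect

$$
\begin{aligned}
\mathrm{TE} ={}& E[Y(x_1, \log M(x_1)) \mid C = c] - E[Y(x_0, \log M(x_0)) \mid C = c] \\
={}& (\beta_0^{(1)} - \beta_0^{(2)})\left[\Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\right] + \beta_2 (x_1 - x_0) \\
& + \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)(\beta_1 + \beta_3 x_1)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c) \\
& - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)(\beta_1 + \beta_3 x_0)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)
\end{aligned}
\tag{17}
$$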
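As an informal check on Equations 13 to 17, the sketch below evaluates the five effects for a given change in the exposure and confirms numerically that TE = TNIE + PNDE = TNDE + PNIE. The function name `two_part_m_effects`, the dictionary keys and the parameter values are hypothetical and serve only as an illustration.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def two_part_m_effects(x0, x1, c, p):
    """Evaluate Equations 13-17 for a change in exposure from x0 to x1 at C = c."""
    phi0 = PHI(p["k0"] + p["k1"] * x0 + p["k2"] * c)  # Pr(M > 0) under x0
    phi1 = PHI(p["k0"] + p["k1"] * x1 + p["k2"] * c)  # Pr(M > 0) under x1
    mu0 = p["g0"] + p["g1"] * x0 + p["g2"] * c        # E[log M | M > 0] under x0
    mu1 = p["g0"] + p["g1"] * x1 + p["g2"] * c        # E[log M | M > 0] under x1
    d_int = p["b0_pos"] - p["b0_zero"]                # intercept difference between groups
    dx = x1 - x0
    tnie = d_int * (phi1 - phi0) + (p["b1"] + p["b3"] * x1) * (phi1 * mu1 - phi0 * mu0)
    pnie = d_int * (phi1 - phi0) + (p["b1"] + p["b3"] * x0) * (phi1 * mu1 - phi0 * mu0)
    pnde = p["b2"] * dx + phi0 * mu0 * p["b3"] * dx
    tnde = p["b2"] * dx + phi1 * mu1 * p["b3"] * dx
    te = (d_int * (phi1 - phi0) + p["b2"] * dx
          + phi1 * (p["b1"] + p["b3"] * x1) * mu1
          - phi0 * (p["b1"] + p["b3"] * x0) * mu0)
    return {"TNIE": tnie, "PNDE": pnde, "PNIE": pnie, "TNDE": tnde, "TE": te}


# Hypothetical parameter values, not taken from the thesis.
P = {"k0": 0.1, "k1": 0.5, "k2": 0.3, "g0": 0.2, "g1": 0.4, "g2": 0.1,
     "b0_pos": 1.0, "b0_zero": 0.5, "b1": 0.6, "b2": 0.3, "b3": 0.2}

effects = two_part_m_effects(x0=0.0, x1=1.0, c=0.5, p=P)
for name, value in effects.items():
    print(f"{name}: {value:.4f}")
```

Setting the interaction parameter `b3` to zero makes the Pure Natural and Total Natural versions of each effect coincide, which mirrors the remark that the two only differ when the exposure moderates the effect of the mediator.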
4 Mediation, two-part M and two-part Y

In this section the mediation model where both the mediator and the outcome are limited is considered.

4.1 Model formulation

If the mediator M and the outcome Y are both limited, both the regression of M on X and the regression of Y on M and X are affected. Again the two-part model will be used to account for the censoring in both M and Y. The two-group regression setup for Y implies that two more regressions will be added due to the combination with the two-part model. Both the continuous part of M and the continuous part of Y rely on distributional assumptions. In this study both dependent variables are assumed to follow the lognormal distribution. The model formulation of a two-part M, two-part Y mediation model is displayed in Equation 18.

$$\log Y_i \mid M_i > 0 = \beta_0^{(1)} + \beta_1 \log M_i + \beta_2^{(1)} X_i + \beta_3 \log M_i \, X_i + \beta_4^{(1)} C_i + \epsilon_{y_i} \tag{18a}$$

$$\mathrm{probit}(\Pr(Y_i > 0 \mid M_i > 0)) = \theta_0^{(1)} + \theta_1 \log M_i + \theta_2^{(1)} X_i + \theta_3 \log M_i \, X_i + \theta_4^{(1)} C_i \tag{18b}$$

$$\log Y_i \mid M_i = 0 = \beta_0^{(2)} + \beta_2^{(2)} X_i + \beta_4^{(2)} C_i + \epsilon_{y_i} \tag{18c}$$

$$\mathrm{probit}(\Pr(Y_i > 0 \mid M_i = 0)) = \theta_0^{(2)} + \theta_2^{(2)} X_i + \theta_4^{(2)} C_i \tag{18d}$$

$$\log M_i \mid M_i > 0 = \gamma_0 + \gamma_1 X_i + \gamma_2 C_i + \epsilon_{m_i} \tag{18e}$$

$$\mathrm{probit}(\Pr(M_i > 0)) = \kappa_0 + \kappa_1 X_i + \kappa_2 C_i \tag{18f}$$

where by assumption $\epsilon_{y_i} \sim N(0, \sigma_y^2)$ and $\epsilon_{m_i} \sim N(0, \sigma_m^2)$. This is a non-linear model and the counterfactual framework will be used to define the effects. The path diagram of Equation 18 is shown in Figure 5. The mediator M and the outcome Y are both separated into one zero/non-zero part, M* and Y*, and one non-zero corresponding continuous
part, M and Y. If M* = 1 the observation is not censored and has a corresponding continuous value M. If M* = 0 the observation is censored and has no continuous value. The same goes for Y* and Y. M* and Y* are assumed to be generated from dichotomized normal distributions. The exposure X affects M*, M, Y* and Y. M affects only the two-part Y belonging to the group M > 0. Moreover, X is allowed to moderate the effect of M on Y. Additionally, a covariate measured at the same time point as X is allowed to affect both parts of the mediator and the outcome. The full model implied by Figure 5 is kept throughout the derivations; however, in the last step some parameters will be restricted to limit the scope of the Monte Carlo simulation study in Section 5. Expressions for calculating unrestricted effects are available in Appendix B. This model allows the zero/non-zero part of the mediator, M*, to moderate the effect of X on Y, since the slopes in the two-group analysis of Y are not restricted. That is, the difference between the slopes of Y on X in the two groups is a measure of the interaction effect. The detailed motivation of this model formulation is discussed in Section 2. As in the two-part M model in Equation 10, the zero/non-zero parts are modelled with probit regression, and all the linear parts are modelled with standard linear regressions.

Figure 5: Path diagram for a mediation model with a two-part mediator M and two-part outcome Y, combined with a two-group model of Y. M* and Y* are the binary observed variables coding for an observation being censored or not censored.

4.2 Estimation

The likelihood function in Equation 11 is extended in Equation 19 to account for two-part modelling of both M and Y. The four combinations of zero/non-zero M and Y are each represented by one product in Equation 19. The expressions $f(m_i \mid \ldots)$ and $f(y_i \mid \ldots)$ are the conditional densities of M and Y. Let the random sample be of size $n = n_{g_1} + n_{g_2} + n_{g_3} + n_{g_4}$ with $i = 1, \ldots, n$, where $g_1, \ldots, g_4$ denote the four groups.

$$
\begin{aligned}
L(Y_i, M_i \mid x_i, c_i) ={}& \prod_{i \in g_1} \Pr(M_i > 0 \mid x_i, c_i)\, \Pr(Y_i > 0 \mid M_i > 0, x_i, c_i)\, f(m_i \mid M_i > 0, x_i, c_i)\, f(y_i \mid Y_i > 0, M_i > 0, x_i, c_i) \\
& \times \prod_{i \in g_2} \left[1 - \Pr(M_i > 0 \mid x_i, c_i)\right] \Pr(Y_i > 0 \mid M_i = 0, x_i, c_i)\, f(y_i \mid Y_i > 0, M_i = 0, x_i, c_i) \\
& \times \prod_{i \in g_3} \Pr(M_i > 0 \mid x_i, c_i) \left[1 - \Pr(Y_i > 0 \mid M_i > 0, x_i, c_i)\right] f(m_i \mid M_i > 0, x_i, c_i) \\
& \times \prod_{i \in g_4} \left[1 - \Pr(M_i > 0 \mid x_i, c_i)\right] \left[1 - \Pr(Y_i > 0 \mid M_i = 0, x_i, c_i)\right]
\end{aligned}
\tag{19}
$$
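The four products in Equation 19 translate into four kinds of log-likelihood contributions, one per combination of zero/non-zero M and Y. The sketch below computes the contribution of a single observation. It assumes lognormal continuous parts, so each density is a normal density of the log value multiplied by the Jacobian 1/value, and it conditions the M = 0, Y = 0 group on M = 0. The function name, dictionary keys and parameter values are hypothetical, and the snippet is not the estimation routine used in this study.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for the probit parts


def log_lik_contribution(m, y, x, c, p):
    """Log-likelihood contribution of one observation under a model like Equation 18."""
    p_m = STD.cdf(p["k0"] + p["k1"] * x + p["k2"] * c)  # Pr(M > 0 | x, c)
    if m > 0:
        log_m = math.log(m)
        mu_m = p["g0"] + p["g1"] * x + p["g2"] * c
        # Lognormal density of m: normal density of log(m) times the Jacobian 1/m.
        ll = math.log(p_m) + math.log(NormalDist(mu_m, p["sd_m"]).pdf(log_m)) - log_m
        lin_y = (p["t0_pos"] + p["t1"] * log_m + p["t2"] * x
                 + p["t3"] * log_m * x + p["t4"] * c)
        mu_y = (p["b0_pos"] + p["b1"] * log_m + p["b2"] * x
                + p["b3"] * log_m * x + p["b4"] * c)
    else:
        ll = math.log(1.0 - p_m)
        lin_y = p["t0_zero"] + p["t2"] * x + p["t4"] * c
        mu_y = p["b0_zero"] + p["b2"] * x + p["b4"] * c
    p_y = STD.cdf(lin_y)  # Pr(Y > 0) given the group of M
    if y > 0:
        log_y = math.log(y)
        ll += math.log(p_y) + math.log(NormalDist(mu_y, p["sd_y"]).pdf(log_y)) - log_y
    else:
        ll += math.log(1.0 - p_y)
    return ll


# Hypothetical parameter values, not taken from the thesis.
P = {"k0": 0.2, "k1": 0.5, "k2": 0.3, "g0": 0.2, "g1": 0.4, "g2": 0.1, "sd_m": 0.8,
     "t0_pos": 0.3, "t0_zero": -0.1, "t1": 0.4, "t2": 0.2, "t3": 0.1, "t4": 0.1,
     "b0_pos": 1.0, "b0_zero": 0.5, "b1": 0.6, "b2": 0.3, "b3": 0.2, "b4": 0.1,
     "sd_y": 1.0}

# One observation from each of the four groups (values are made up).
sample = [(1.5, 2.0, 0.3, -0.2), (0.0, 1.2, -0.5, 0.4),
          (0.7, 0.0, 1.0, 0.1), (0.0, 0.0, -1.0, 0.0)]
total_ll = sum(log_lik_contribution(m, y, x, c, P) for m, y, x, c in sample)
print(f"log-likelihood of the four example observations: {total_ll:.4f}")
```

Summing these contributions over a sample and maximizing over the parameters would correspond to ML estimation of the model; the optimization step itself is left out here.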
4.3 Derivation of causal effects

In this study the effects of X and C on Y are assumed equal for both groups of M. Thus $\beta_2^{(1)} = \beta_2^{(2)}$, $\beta_4^{(1)} = \beta_4^{(2)}$, $\theta_2^{(1)} = \theta_2^{(2)}$ and $\theta_4^{(1)} = \theta_4^{(2)}$, indicated by a dropped superscript. Results for forming the unrestricted effects can be found in the detailed derivation in Appendix B.

4.3.1 Conditional expected values

One of the conditional expectations used to define the causal effects is shown in Equation 20. For a detailed explanation of these conditional expected values see Section 2.5.

$$
\begin{aligned}
E[Y(x_0, \log M(x_0))] = \varphi \exp(\beta_2 x_0 + \beta_4 c) \Bigg\{ & \exp\!\left(\beta_0^{(2)}\right) \Phi\!\left(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\right) \left[1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\right] \\
& + \exp\!\left(\beta_0^{(1)} - \frac{\mu^2 - b^2}{2\sigma_M^2}\right) \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\, \Phi\!\left(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0)\, b}{\sqrt{1 + (\theta_1 + \theta_3 x_0)^2 \sigma_M^2}}\right) \Bigg\}
\end{aligned}
\tag{20}
$$

4.3.2 Causal effects

The complete derivation is displayed in Appendix B. The simplified effects, given the restrictions mentioned above, are given in the equations that follow. Note that b and µ in Equation 20 are substituted (see details in Appendix B). Some further simplifications are possible, but they would not make the expressions simpler and would come at a loss of intuition.
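The b and µ appearing in Equation 20 are consistent with completing the square when a probit probability and an exponential term are averaged over a normally distributed log M. The sketch below checks that standard identity numerically. It rests on the assumption that µ is the mean of log M and that b = µ + βσ², where β stands for the slope of log M in the outcome model; this is one reading of the notation, and Appendix B remains the authoritative source. All names (`a`, `t`, `beta`) and values are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal


def closed_form(a, t, beta, mu, sigma):
    """E[Phi(a + t*Z) * exp(beta*Z)] for Z ~ N(mu, sigma^2), by completing the square."""
    b = mu + beta * sigma ** 2  # shifted mean after completing the square
    scale = math.exp((b ** 2 - mu ** 2) / (2 * sigma ** 2))
    return scale * STD.cdf((a + t * b) / math.sqrt(1 + t ** 2 * sigma ** 2))


def numeric(a, t, beta, mu, sigma, steps=20000, width=10.0):
    """The same expectation by a midpoint rule over mu +/- width * sigma."""
    dist = NormalDist(mu, sigma)
    lower = mu - width * sigma
    h = 2 * width * sigma / steps
    total = 0.0
    for k in range(steps):
        z = lower + (k + 0.5) * h
        total += STD.cdf(a + t * z) * math.exp(beta * z) * dist.pdf(z)
    return total * h


# Hypothetical values: a and t play the role of the probit intercept and slope,
# beta the slope of log M in the outcome model.
exact = closed_form(a=0.2, t=0.7, beta=0.5, mu=0.3, sigma=0.8)
approx = numeric(a=0.2, t=0.7, beta=0.5, mu=0.3, sigma=0.8)
print(exact, approx)
```

The agreement between the two values illustrates why the expectation over the continuous part of M has a closed form, even though the model is non-linear in both the probit and the lognormal part.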
More informationAssicurazioni Generali: An Option Pricing Case with NAGARCH
Assicurazioni Generali: An Option Pricing Case with NAGARCH Assicurazioni Generali: Business Snapshot Find our latest analyses and trade ideas on bsic.it Assicurazioni Generali SpA is an Italy-based insurance
More informationOn Some Test Statistics for Testing the Population Skewness and Kurtosis: An Empirical Study
Florida International University FIU Digital Commons FIU Electronic Theses and Dissertations University Graduate School 8-26-2016 On Some Test Statistics for Testing the Population Skewness and Kurtosis:
More informationAnalysis of truncated data with application to the operational risk estimation
Analysis of truncated data with application to the operational risk estimation Petr Volf 1 Abstract. Researchers interested in the estimation of operational risk often face problems arising from the structure
More informationLecture 10: Alternatives to OLS with limited dependent variables, part 1. PEA vs APE Logit/Probit
Lecture 10: Alternatives to OLS with limited dependent variables, part 1 PEA vs APE Logit/Probit PEA vs APE PEA: partial effect at the average The effect of some x on y for a hypothetical case with sample
More informationThe following content is provided under a Creative Commons license. Your support
MITOCW Recitation 6 The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make
More informationIntroductory Econometrics for Finance
Introductory Econometrics for Finance SECOND EDITION Chris Brooks The ICMA Centre, University of Reading CAMBRIDGE UNIVERSITY PRESS List of figures List of tables List of boxes List of screenshots Preface
More informationIEOR E4602: Quantitative Risk Management
IEOR E4602: Quantitative Risk Management Basic Concepts and Techniques of Risk Management Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationGeographical and Temporal Variations in the Effects of Right-to-Carry Laws on Crime
Geographical and Temporal Variations in the Effects of Right-to-Carry Laws on Crime Florenz Plassmann Department of Economics, SUNY Binghamton, Binghamton, NY 13902-6000 T. Nicolaus Tideman Department
More informationSIMULATION OF ELECTRICITY MARKETS
SIMULATION OF ELECTRICITY MARKETS MONTE CARLO METHODS Lectures 15-18 in EG2050 System Planning Mikael Amelin 1 COURSE OBJECTIVES To pass the course, the students should show that they are able to - apply
More informationThe Multinomial Logit Model Revisited: A Semiparametric Approach in Discrete Choice Analysis
The Multinomial Logit Model Revisited: A Semiparametric Approach in Discrete Choice Analysis Dr. Baibing Li, Loughborough University Wednesday, 02 February 2011-16:00 Location: Room 610, Skempton (Civil
More informationLecture 5 Theory of Finance 1
Lecture 5 Theory of Finance 1 Simon Hubbert s.hubbert@bbk.ac.uk January 24, 2007 1 Introduction In the previous lecture we derived the famous Capital Asset Pricing Model (CAPM) for expected asset returns,
More informationApproximating the Confidence Intervals for Sharpe Style Weights
Approximating the Confidence Intervals for Sharpe Style Weights Angelo Lobosco and Dan DiBartolomeo Style analysis is a form of constrained regression that uses a weighted combination of market indexes
More informationFactors in Implied Volatility Skew in Corn Futures Options
1 Factors in Implied Volatility Skew in Corn Futures Options Weiyu Guo* University of Nebraska Omaha 6001 Dodge Street, Omaha, NE 68182 Phone 402-554-2655 Email: wguo@unomaha.edu and Tie Su University
More informationMaximum Likelihood Estimation
Maximum Likelihood Estimation EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #6 EPSY 905: Maximum Likelihood In This Lecture The basics of maximum likelihood estimation Ø The engine that
More informationLECTURE 2: MULTIPERIOD MODELS AND TREES
LECTURE 2: MULTIPERIOD MODELS AND TREES 1. Introduction One-period models, which were the subject of Lecture 1, are of limited usefulness in the pricing and hedging of derivative securities. In real-world
More informationEstimation of a parametric function associated with the lognormal distribution 1
Communications in Statistics Theory and Methods Estimation of a parametric function associated with the lognormal distribution Jiangtao Gou a,b and Ajit C. Tamhane c, a Department of Mathematics and Statistics,
More informationEmpirical Methods for Corporate Finance. Regression Discontinuity Design
Empirical Methods for Corporate Finance Regression Discontinuity Design Basic Idea of RDD Observations (e.g. firms, individuals, ) are treated based on cutoff rules that are known ex ante For instance,
More informationGetting Started with CGE Modeling
Getting Started with CGE Modeling Lecture Notes for Economics 8433 Thomas F. Rutherford University of Colorado January 24, 2000 1 A Quick Introduction to CGE Modeling When a students begins to learn general
More informationModels of Multinomial Qualitative Response
Models of Multinomial Qualitative Response Multinomial Logit Models October 22, 2015 Dependent Variable as a Multinomial Outcome Suppose we observe an economic choice that is a binary signal from amongst
More informationMarkowitz portfolio theory
Markowitz portfolio theory Farhad Amu, Marcus Millegård February 9, 2009 1 Introduction Optimizing a portfolio is a major area in nance. The objective is to maximize the yield and simultaneously minimize
More informationWeek 2 Quantitative Analysis of Financial Markets Hypothesis Testing and Confidence Intervals
Week 2 Quantitative Analysis of Financial Markets Hypothesis Testing and Confidence Intervals Christopher Ting http://www.mysmu.edu/faculty/christophert/ Christopher Ting : christopherting@smu.edu.sg :
More informationCHAPTER II LITERATURE STUDY
CHAPTER II LITERATURE STUDY 2.1. Risk Management Monetary crisis that strike Indonesia during 1998 and 1999 has caused bad impact to numerous government s and commercial s bank. Most of those banks eventually
More informationChoice Probabilities. Logit Choice Probabilities Derivation. Choice Probabilities. Basic Econometrics in Transportation.
1/31 Choice Probabilities Basic Econometrics in Transportation Logit Models Amir Samimi Civil Engineering Department Sharif University of Technology Primary Source: Discrete Choice Methods with Simulation
More informationChapter 4: Commonly Used Distributions. Statistics for Engineers and Scientists Fourth Edition William Navidi
Chapter 4: Commonly Used Distributions Statistics for Engineers and Scientists Fourth Edition William Navidi 2014 by Education. This is proprietary material solely for authorized instructor use. Not authorized
More information4: SINGLE-PERIOD MARKET MODELS
4: SINGLE-PERIOD MARKET MODELS Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 4: Single-Period Market Models 1 / 87 General Single-Period
More informationChapter 1 Microeconomics of Consumer Theory
Chapter Microeconomics of Consumer Theory The two broad categories of decision-makers in an economy are consumers and firms. Each individual in each of these groups makes its decisions in order to achieve
More informationPoint Estimation. Some General Concepts of Point Estimation. Example. Estimator quality
Point Estimation Some General Concepts of Point Estimation Statistical inference = conclusions about parameters Parameters == population characteristics A point estimate of a parameter is a value (based
More informationEconometrics and Economic Data
Econometrics and Economic Data Chapter 1 What is a regression? By using the regression model, we can evaluate the magnitude of change in one variable due to a certain change in another variable. For example,
More informationEquity, Vacancy, and Time to Sale in Real Estate.
Title: Author: Address: E-Mail: Equity, Vacancy, and Time to Sale in Real Estate. Thomas W. Zuehlke Department of Economics Florida State University Tallahassee, Florida 32306 U.S.A. tzuehlke@mailer.fsu.edu
More informationPoint Estimators. STATISTICS Lecture no. 10. Department of Econometrics FEM UO Brno office 69a, tel
STATISTICS Lecture no. 10 Department of Econometrics FEM UO Brno office 69a, tel. 973 442029 email:jiri.neubauer@unob.cz 8. 12. 2009 Introduction Suppose that we manufacture lightbulbs and we want to state
More information1 Roy model: Chiswick (1978) and Borjas (1987)
14.662, Spring 2015: Problem Set 3 Due Wednesday 22 April (before class) Heidi L. Williams TA: Peter Hull 1 Roy model: Chiswick (1978) and Borjas (1987) Chiswick (1978) is interested in estimating regressions
More informationCourse information FN3142 Quantitative finance
Course information 015 16 FN314 Quantitative finance This course is aimed at students interested in obtaining a thorough grounding in market finance and related empirical methods. Prerequisite If taken
More informationPRE CONFERENCE WORKSHOP 3
PRE CONFERENCE WORKSHOP 3 Stress testing operational risk for capital planning and capital adequacy PART 2: Monday, March 18th, 2013, New York Presenter: Alexander Cavallo, NORTHERN TRUST 1 Disclaimer
More informationInternational Financial Markets 1. How Capital Markets Work
International Financial Markets Lecture Notes: E-Mail: Colloquium: www.rainer-maurer.de rainer.maurer@hs-pforzheim.de Friday 15.30-17.00 (room W4.1.03) -1-1.1. Supply and Demand on Capital Markets 1.1.1.
More informationAdvanced Operations Research Prof. G. Srinivasan Dept of Management Studies Indian Institute of Technology, Madras
Advanced Operations Research Prof. G. Srinivasan Dept of Management Studies Indian Institute of Technology, Madras Lecture 23 Minimum Cost Flow Problem In this lecture, we will discuss the minimum cost
More informationExercise 14 Interest Rates in Binomial Grids
Exercise 4 Interest Rates in Binomial Grids Financial Models in Excel, F65/F65D Peter Raahauge December 5, 2003 The objective with this exercise is to introduce the methodology needed to price callable
More informationThe Optimization Process: An example of portfolio optimization
ISyE 6669: Deterministic Optimization The Optimization Process: An example of portfolio optimization Shabbir Ahmed Fall 2002 1 Introduction Optimization can be roughly defined as a quantitative approach
More informationThe mean-variance portfolio choice framework and its generalizations
The mean-variance portfolio choice framework and its generalizations Prof. Massimo Guidolin 20135 Theory of Finance, Part I (Sept. October) Fall 2014 Outline and objectives The backward, three-step solution
More informationBloomberg. Portfolio Value-at-Risk. Sridhar Gollamudi & Bryan Weber. September 22, Version 1.0
Portfolio Value-at-Risk Sridhar Gollamudi & Bryan Weber September 22, 2011 Version 1.0 Table of Contents 1 Portfolio Value-at-Risk 2 2 Fundamental Factor Models 3 3 Valuation methodology 5 3.1 Linear factor
More informationRandom Variables and Probability Distributions
Chapter 3 Random Variables and Probability Distributions Chapter Three Random Variables and Probability Distributions 3. Introduction An event is defined as the possible outcome of an experiment. In engineering
More information