Resampling techniques to determine direction of effects in linear regression models


Wolfgang Wiedermann, Michael Hagmann, Michael Kossmeier, & Alexander von Eye
University of Vienna, Department of Psychology

Corresponding Author: Wolfgang Wiedermann, University of Vienna, Unit of Research Methods, Liebiggasse 5, A-1010 Vienna, Austria. wolfgang.wiedermann@univie.ac.at

Acknowledgement: The authors are indebted to Rainer W. Alexandrowicz, Ingrid Koller, and Anna P. Nutt.

Abstract

Previous studies have shown that, in the context of linear regression analysis, the cube of the Pearson correlation coefficient can be expressed by the ratio of the third moment of the response variable to the third moment of the explanatory variable (Dodge & Rousson, 2001). This relation implies that the skewness of the response variable is always smaller than the skewness of the explanatory variable, and directional dependency can be determined based on the third moments of variables. The current study extends the concept of directional dependency and focuses on distributional properties of the residuals of two competing linear regression models. It is shown that the residual skewness of the mis-specified regression model is larger than the residual skewness of the true regression model. Based on this result, three significance tests are developed that can be used to determine the direction of dependence in non-normally distributed samples. A Monte Carlo simulation experiment is performed to analyze robustness and power properties of the proposed tests under various degrees of correlation, sample sizes, and population distributions. Additionally, an empirical example is provided which underlines important assumptions of the proposed resampling procedures. Recommendations are given for making decisions concerning the direction of effects based on the three significance tests.

Keywords: direction of effects, directional dependence, permutation, bootstrap, significance test

Concepts of correlation and linear regression are widely used in observational research. Although both concepts are closely related, important differences exist. A correlation coefficient is usually defined by a symmetric formula, which reflects the fact that the two variables share the same role (i.e., no statement about causation is made). In contrast, regression analysis requires defining an explanatory (independent) variable and a response (dependent) variable. Thus, regression analysis requires a theoretical derivation of the causal relation between the variables of interest, which predefines the direction of a potentially observable effect. In many applications the direction of effects is obvious because reversing the direction seems implausible. For example, when observing a negative correlation between age and alcohol intake, it is more plausible to assume that growing age causes a reduction of alcohol consumption due to a maturing-out effect (e.g., Muthén & Muthén, 2000) than to assume that alcohol intake rejuvenates individuals. Thus, in a linear regression model it is more plausible to regress alcohol intake on age than vice versa.

However, particularly in observational research, examples exist where causal inference is more challenging because of (at least) two alternatives to causation: reverse causation and confounding as alternative explanations for the association between an exposure (X) and an outcome (Y). Reverse causation means that both directions are conceivable to explain the observed effect (i.e., X causes Y vs. Y causes X). Confounding means that a third variable (Z) influences the causal relationship between X and Y (McGue, Osler, & Christensen, 2010; McNamee, 2003). Consider, for example, cannabis use and the occurrence of schizophrenia. Here, it might be less clear whether cannabis use causes schizophrenia (causation) or whether patients with schizophrenia diagnoses are more likely to use cannabis (reverse causation; Arseneault et al., 2004; Degenhardt et al., 2009). In addition, the association between cannabis consumption and schizophrenia might also be confounded by additional risk factors such as familial functioning or medical history (Smit, Bolier, & Cuijpers, 2004).

Standard linear regression or correlational analyses do not allow one to derive conclusions concerning the direction of effects. This can be demonstrated by considering that the Pearson correlation of two continuous standardized variables (X and Y) is mathematically identical both to the regression slope of X when Y is regressed on X and to the regression slope of Y when X is regressed on Y (see, for example, Rodgers & Nicewander, 1988; von Eye & DeShon, 2012a).

Recently, Dodge and Rousson (2001) discussed asymmetric properties of the correlation coefficient (see also Dodge & Rousson, 2000; Dodge & Yadegari, 2010; Muddapur, 2003; von Eye & DeShon, 2008) and have shown that the cube of the correlation coefficient can be expressed as the ratio of the skewness (i.e., the standardized third central moment) of the response variable to that of the explanatory variable, under the assumption that the explanatory variable is not normally distributed, an assumption which seems reasonable for data obtained in the social sciences (Micceri, 1989). This asymmetric property implies that statements about directional dependence can be made based on indicators of skewness. Von Eye and DeShon (2008) and Dodge and Yadegari (2010) have extended this idea to the case of the fourth and fifth order moments.

The aim of the present study is to propose statistical inference methods for deciding which regression model seems more appropriate to describe the data generating process (i.e., Y is regressed on X versus X is regressed on Y). The article is structured as follows. First, we give a brief introduction to the concept of directional dependence. Second, instead of focusing on the third moment of the explanatory and the response variable, we focus on properties of the third moments of the residuals in a standard linear regression setting. We show that, under certain assumptions, the skewness of model residuals can serve as a basis to determine the direction of effects. Third, based on this proposition, three significance tests are suggested for the comparison of two competing regression models. Fourth, results of a Monte Carlo simulation are presented to assess the accuracy of the proposed tests. Fifth, we present an empirical example to illustrate important characteristics of the proposed tests. Finally, practical recommendations, limitations, and potential future research directions are discussed.

The concept of directional dependence

Let X and Y be two continuous variables. Then the two linear regression models

$$Y = a_{YX} + b_{YX} X + \varepsilon_{YX} \tag{1}$$

and

$$X = a_{XY} + b_{XY} Y + \varepsilon_{XY} \tag{2}$$

can be estimated to represent the data generating process. Here, $a_{YX}$ and $a_{XY}$ refer to the intercepts of the models, $b_{YX}$ and $b_{XY}$ refer to the slope parameters, and $\varepsilon_{YX}$ and $\varepsilon_{XY}$ denote the model residuals. Generally, the residuals are assumed to be normally distributed (with an expected value of zero; $E[\varepsilon_{YX}] = E[\varepsilon_{XY}] = 0$), homoscedastic, and independent of the explanatory variable. Without loss of generality and for simplicity, we further assume in the following that both variables X and Y are skewed and positively correlated. Let

$$\rho_{XY} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} \tag{3}$$

be the Pearson correlation coefficient, where $\mathrm{cov}(X, Y)$ is the covariance of X and Y and $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, respectively. In addition, let

$$\gamma_X = \frac{E[(X - E[X])^3]}{\sigma_X^3} \tag{4}$$

and

$$\gamma_Y = \frac{E[(Y - E[Y])^3]}{\sigma_Y^3} \tag{5}$$

be the skewness of X and Y, i.e., the standardized third central moments of the variables. Dodge and Rousson (2000, 2001) as well as Dodge and Yadegari (2010) have shown that the following relationship between the third moments of the variables holds for the linear regression model given in Equation (1):

$$\gamma_Y = \rho_{XY}^3 \, \gamma_X + (1 - \rho_{XY}^2)^{3/2} \, \gamma_{\varepsilon_{YX}}, \tag{6}$$

where $\gamma_{\varepsilon_{YX}}$ refers to the skewness of the residuals when regressing Y on X. Furthermore, assuming symmetrically distributed residuals ($\gamma_{\varepsilon_{YX}} = 0$) and an asymmetric distribution for X ($\gamma_X \neq 0$), it follows that $\rho_{XY}^3 = \gamma_Y / \gamma_X$. Because the range of the correlation is $-1 \leq \rho_{XY} \leq 1$, this relation implies that the skewness of the response variable will always be smaller in absolute value than the skewness of the explanatory variable (for $|\rho_{XY}| < 1$). Based on this implication, the following decisions about the direction of effects can be derived: (1) if $|\gamma_Y| < |\gamma_X|$, then X is the explanatory variable and Y is the response, (2) if $|\gamma_X| < |\gamma_Y|$, then Y is the explanatory variable and X is the response, and (3) if $|\gamma_X| = |\gamma_Y|$, then the direction of the effect cannot be determined (cf. von Eye & DeShon, 2012a).

In addition to these descriptive guidelines, von Eye and DeShon (2008, 2012a) and Pornprasertmanit and Little (2012) propose using significance tests to determine directional dependence. The basic idea given in von Eye and DeShon (2008, 2012a) is to use D'Agostino's normality test (D'Agostino, 1971) to assess whether X and/or Y significantly deviate from normality. Pornprasertmanit and Little (2012) suggest using bootstrapping techniques to evaluate the higher moments of X and Y.
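The relation between the cubed correlation and the skewness ratio can be checked with a small simulation. The following is a minimal sketch, not the authors' code: the helper names `skewness` and `pearson`, the seed, the sample size, and the slope of 0.8 are illustrative assumptions. X is drawn from a gamma distribution with shape 1 (skewness 2), and Y is generated from X plus a symmetric normal error.

```python
import math
import random


def skewness(values):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)                                # illustrative seed
n = 50000
x = [rng.gammavariate(1.0, 1.0) for _ in range(n)]    # skewed explanatory variable
y = [0.8 * xi + rng.gauss(0.0, 1.0) for xi in x]      # symmetric error added to X

gx, gy, r = skewness(x), skewness(y), pearson(x, y)
print("skewness of X:", round(gx, 3))
print("skewness of Y:", round(gy, 3))
print("rho cubed:", round(r ** 3, 3), "skewness ratio:", round(gy / gx, 3))
```

In such a run the response should be visibly less skewed than the explanatory variable, and the skewness ratio should lie close to the cubed correlation, up to sampling error.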

Some distributional properties of the residuals

We can use the concept of directional dependence as a starting point for the following considerations: Instead of focusing on distributional properties of X and Y, we focus on distributional properties of the residuals $\varepsilon_{YX}$ and $\varepsilon_{XY}$. Von Eye and DeShon (2012a) emphasize that if a symmetric error variable (e.g., normally distributed noise) is added to a skewed explanatory variable (X), the resulting variable Y will be less skewed than X. As discussed in the previous section, directional dependency implies that (under the true model) the skewness of the response variable will always be less than the skewness of the explanatory variable. It follows that the residuals of the mis-specified model ($\varepsilon_{XY}$) will be more asymmetric than the residuals of the correctly specified model ($\varepsilon_{YX}$) and that the skewness of the residuals ($\gamma_{\varepsilon_{YX}}$ and $\gamma_{\varepsilon_{XY}}$) can be used to decide between the two models.

We now describe the first three moments of the residuals of the mis-specified model ($\varepsilon_{XY}$) in detail. For simplicity and without loss of generality we set the intercepts in both models ($a_{YX}$ and $a_{XY}$) to zero. Considering the relationship

$$\rho_{XY}^2 = b_{YX} \, b_{XY} \tag{7}$$

and inserting (1) in (2), we obtain the following model for the error $\varepsilon_{XY}$:

$$\varepsilon_{XY} = (1 - \rho_{XY}^2) X - b_{XY} \, \varepsilon_{YX}. \tag{8}$$

Assuming that X and $\varepsilon_{YX}$ are independent, the following relation holds for the variances in Equation (8):

$$\sigma_{\varepsilon_{XY}}^2 = (1 - \rho_{XY}^2)^2 \sigma_X^2 + b_{XY}^2 \sigma_{\varepsilon_{YX}}^2 = \sigma_X^2 (1 - \rho_{XY}^2). \tag{9}$$

In the same fashion, we take third central moments in Equation (8) and arrive at

$$E[(\varepsilon_{XY} - E[\varepsilon_{XY}])^3] = (1 - \rho_{XY}^2)^3 \, E[(X - E[X])^3] - b_{XY}^3 \, E[(\varepsilon_{YX} - E[\varepsilon_{YX}])^3]. \tag{10}$$

Now, assuming a symmetric distribution for $\varepsilon_{YX}$ (i.e., the true model) in (10), we obtain the following implications:

1) The skewness of X and the skewness of $\varepsilon_{XY}$ have the same sign.
2) Assuming that the remaining terms are fixed, the skewness of $\varepsilon_{XY}$ increases with increasing skewness of X.
3) Assuming that the remaining terms are fixed, the skewness of $\varepsilon_{XY}$ decreases with increasing correlation ρ.

From these implications, we can conclude that the skewness of model residuals can be used as a valuable indicator to determine the direction of effects. In the following section, we describe three significance tests for evaluating the symmetry of model residuals.

Significance tests for deciding on the direction of effects

Pornprasertmanit and Little (2012) suggested using bootstrapping techniques (see, e.g., Efron & Tibshirani, 1993) to decide whether $\gamma_X$ and $\gamma_Y$ deviate from expectation, and whether the difference in skewness estimates deviates from expectation. Many previous studies have demonstrated that the application of resampling techniques preserves Type I error rates and leads to improved power in various settings of statistical inference (Keselman et al., 2008; Wilcox, Keselman, & Kowalchuk, 1998; Wilcox, 2005). In the following section we introduce three resampling approaches to assess whether the observed skewness of the residuals deviates significantly from zero.
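Before turning to the tests, it may help to express the third moment of the mis-specified residuals in standardized form. The following worked step is a sketch derived here, not a formula taken from the text. It starts from $\varepsilon_{XY} = (1 - \rho_{XY}^2) X - b_{XY} \varepsilon_{YX}$ with zero intercepts, assumes that $X$ and $\varepsilon_{YX}$ are independent, and uses the standard identities $b_{XY} = \rho_{XY} \sigma_X / \sigma_Y$, $\sigma_{\varepsilon_{YX}}^2 = \sigma_Y^2 (1 - \rho_{XY}^2)$, and $\sigma_{\varepsilon_{XY}}^2 = \sigma_X^2 (1 - \rho_{XY}^2)$.

```latex
\begin{align*}
\gamma_{\varepsilon_{XY}}
  &= \frac{E[(\varepsilon_{XY} - E[\varepsilon_{XY}])^3]}{\sigma_{\varepsilon_{XY}}^3}
   = \frac{(1 - \rho_{XY}^2)^3 \, \sigma_X^3 \, \gamma_X
           - b_{XY}^3 \, \sigma_{\varepsilon_{YX}}^3 \, \gamma_{\varepsilon_{YX}}}
          {\sigma_X^3 \, (1 - \rho_{XY}^2)^{3/2}} \\
  &= (1 - \rho_{XY}^2)^{3/2} \, \gamma_X - \rho_{XY}^3 \, \gamma_{\varepsilon_{YX}} .
\end{align*}
```

With symmetric errors in the true model ($\gamma_{\varepsilon_{YX}} = 0$) this reduces to $\gamma_{\varepsilon_{XY}} = (1 - \rho_{XY}^2)^{3/2} \gamma_X$: the residual skewness of the mis-specified model has the sign of $\gamma_X$, grows in proportion to $\gamma_X$, and shrinks toward zero as $|\rho_{XY}|$ approaches 1.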

Nonparametric bootstrap test. Previous studies that investigated the accuracy of bootstrap strategies in the linear regression context have mainly focused on robust inference on regression coefficients (Afifi et al., 2007; Wilcox, 2003, 2005; Wu, 1986). We suggest the application of a nonparametric bootstrap to construct a bootstrapped cumulative distribution function (bcdf) of the skewness estimates of model residuals. In detail, bootstrap samples of size n (i.e., the number of observations) are randomly drawn with replacement from the observed model residuals. Let $\hat{\gamma}^*$ be the skewness estimate based on a bootstrap sample. This process is repeated m times and the resulting bcdf consists of m bootstrap skewness estimates. In order to test the null hypothesis $H_0: \gamma_\varepsilon = 0$, let $p^*$ be the probability that the bootstrap estimate is greater than the theoretical value of zero, i.e., $p^* = P(\hat{\gamma}^* > 0)$. Then $p^*$ can be estimated as the proportion of bootstrap estimates for which $\hat{\gamma}^*$ exceeds zero,

$$\hat{p}^* = \frac{1}{m} \sum_{i=1}^{m} I(\hat{\gamma}_i^* > 0),$$

where $I(\cdot)$ is an indicator function which takes value 1 if $\hat{\gamma}_i^* > 0$ is true, and 0 otherwise. Under the null hypothesis, $\hat{p}^*$ approaches a uniform distribution as n and m increase (Wilcox, 2005), and the null hypothesis is rejected if the resulting p-value is smaller than the nominal significance level α.

Parametric bootstrap test. Second, instead of a nonparametric bootstrap, a parametric approach can be applied as well. Algorithmically, model residuals are first used to estimate the variance of the error distribution ($\hat{\sigma}_\varepsilon^2$). Next, m samples of size n are drawn from a normal distribution with zero expectation and the estimated variance. Again, for each bootstrap sample the skewness $\hat{\gamma}^*$ is estimated. The proportion of bootstrap samples for which $\hat{\gamma}^*$ exceeds the original skewness estimate $\hat{\gamma}$ can be used as an estimate for the probability of $\hat{\gamma}$ given the assumption of a symmetric error distribution:

$$\hat{p} = \frac{1}{m} \sum_{i=1}^{m} I(\hat{\gamma}_i^* > \hat{\gamma}),$$

where $I(\cdot)$ is an indicator function which takes value 1 if $\hat{\gamma}_i^* > \hat{\gamma}$ is true, and 0 otherwise. If $\hat{p}$ is smaller than a chosen nominal significance level (α), the hypothesis of a symmetric error is rejected. Obviously, the proposed approach is not restricted to the normal distribution. Any symmetric distribution with an expected value of zero could be considered as well. However, the assumption of normally distributed errors is commonly made in the context of linear regression. Thus, we decided to consider this special case.

Permutation test. Third, we suggest applying principles of permutation (Fisher, 1935; Neyman, 1923; Pitman, 1937) to assess the direction of effects. Permutation techniques have successfully been used to evaluate the impact of explanatory variables in linear regression settings (Anderson, 2001; Freedman & Lane, 1983; ter Braak, 1992). We propose the following procedure: Under the true model we assume a symmetric error distribution ($H_0: \gamma_\varepsilon = 0$) with an expected value of zero. In other words, for n residuals we would expect n/2 residuals to be less than zero and n/2 to be greater than zero. Given the assumption of exchangeability under $H_0$, the signs of the residuals can be randomly shuffled. First, permutations of the absolute values of the residuals are generated, so that n/2 residuals are assigned to the group with negative sign and the remaining n/2 residuals are assigned to the group with positive sign. Second, for each permutation sample the skewness $\hat{\gamma}^*$ is estimated. Again, this step is repeated m times. The proportion of permutations where $\hat{\gamma}^*$ exceeds the originally observed skewness of the residuals can again be used as an estimate for the probability of $\hat{\gamma}$ under the assumption of a symmetrically distributed error:

$$\hat{p} = \frac{1}{m} \sum_{i=1}^{m} I(\hat{\gamma}_i^* > \hat{\gamma}).$$

Again, $I(\cdot)$ is an indicator function which takes value 1 if $\hat{\gamma}_i^* > \hat{\gamma}$ is true, and 0 otherwise. If $\hat{p}$ is smaller than the chosen significance level α, the null hypothesis of symmetry is rejected. It is important to note that the proposed test is only asymptotically equivalent to an exact permutation procedure (complete enumeration). For large sample sizes it is computationally infeasible to establish the null distribution using all possible permutations (Dwass, 1957). The current approach of randomly sampling permutations is also known as Monte Carlo sampling (Hope, 1968; Nichols & Holmer, 2001). However, we use the term permutation test to avoid confusion with the Monte Carlo simulation experiment described below.

Based on these resampling tests we suggest the following steps in deciding about the direction of causation:

1. Decide whether X or Y is the explanatory variable based on theoretical considerations and estimate the parameters of the corresponding regression model (e.g., Model 1: $Y = a_{YX} + b_{YX} X + \varepsilon_{YX}$).
2. Examine distributional properties of the Model 1 residuals ($\varepsilon_{YX}$) using the resampling tests (i.e., test whether the null hypothesis of residual symmetry, $H_0: \gamma_{\varepsilon_{YX}} = 0$, can be retained).
3. Estimate the parameters of the competing regression model (Model 2: $X = a_{XY} + b_{XY} Y + \varepsilon_{XY}$) to obtain the corresponding residuals.
4. Examine distributional properties of the Model 2 residuals ($\varepsilon_{XY}$) using the resampling tests (i.e., test whether the null hypothesis of residual symmetry, $H_0: \gamma_{\varepsilon_{XY}} = 0$, can be retained).
5. If $H_0: \gamma_{\varepsilon_{YX}} = 0$ can be retained and $H_0: \gamma_{\varepsilon_{XY}} = 0$ can be rejected, Y is the response variable and X is the explanatory variable. If $H_0: \gamma_{\varepsilon_{YX}} = 0$ can be rejected and $H_0: \gamma_{\varepsilon_{XY}} = 0$ can be retained, X is the response variable and Y is the explanatory variable. If both null hypotheses are rejected (or both are retained), no decision concerning the direction of the effect can be made based on the methodology proposed in this article.

To evaluate the performance of the proposed resampling tests for the identification of the regression model of the true data generating process, a Monte Carlo simulation experiment was conducted, which is explained in detail in the following section.

Methods

To assess the performance of the proposed methods, a Monte Carlo simulation was conducted using the R statistical environment (R Core Team, 2013). The following simulation parameters were varied: skewness of the explanatory variable ($\gamma_X$), magnitude of correlation (ρ), sample size (n), and type of error distribution. These experimental factors are described in the following paragraphs.

Skewness of the explanatory variable. The explanatory variable X followed a gamma distribution with pre-specified shape and scale parameters c and d, respectively. The scale parameter was fixed at d = 1 throughout the simulation study. The shape parameter c, which is related to the skewness of X through $\gamma_X = 2 / \sqrt{c}$ (see, for example, Evans, Hastings, & Peacock, 2000), was chosen to obtain the desired skewness values of $\gamma_X$ = 0.5, 1, 2, 3, and 4.

Magnitude of correlation. To ensure that the predictor (X) and the response variable (Y) show the desired magnitude of correlation, Y was generated according to Equation (1), i.e., Equation (1) constitutes the true underlying model throughout the study. The slope parameter (b) was obtained by $b = \rho \, \sigma_\varepsilon / (\sigma_X \sqrt{1 - \rho^2})$ and chosen to generate the desired correlations of ρ = 0, 0.1, ..., 0.9 (in steps of 0.1). Here, $\sigma_\varepsilon^2$ denotes the residual variance, which was set to $\sigma_\varepsilon^2 = 1$.

Distribution of the residuals. The parametric bootstrap test relies on the assumption of normally distributed residuals. To analyze the behavior of the procedures when this assumption is violated, we considered two cases for the distribution of the residuals. In case A, the residuals followed a standard normal distribution N(0, 1), which is congruent with the distributional assumption of the parametric test. In case B, to mimic a symmetrically distributed error which violates the assumption of normality, residuals were sampled from a Laplace distribution with zero mean and unit variance, L(0, 1). Laplace distributed variates are expected to show a skewness of 0 and a kurtosis of 6 (Evans, Hastings, & Peacock, 2000).

Number of observations. Sample sizes (n) were 50, 100, 150, 200, 250, and 500.

For each pair of variables X and Y, two linear regression models were estimated according to Equations (1) and (2). In the first model Y was regressed on X, which corresponds to the true data generating process (i.e., the true model). In the second model X was regressed on Y, which contradicts the true underlying model. Next, the residuals of the competing models ($\varepsilon_{YX}$ and $\varepsilon_{XY}$) were evaluated using the parametric bootstrap test, the nonparametric bootstrap test, and the permutation test. The number of resamplings for each test was set to m = 2000 to obtain reliable estimates for the significance of skewness deviations. For each experimental cell of the 5 (skewness of X) × 10 (magnitude of correlation) × 6 (sample size) × 2 (distribution of the residuals) design, 1000 repetitions were realized. The rates of falsely rejecting the true model (i.e., $H_0: \gamma_{\varepsilon_{YX}} = 0$ is rejected) and the rates of correctly rejecting the false model (i.e., $H_0: \gamma_{\varepsilon_{XY}} = 0$ is rejected) were chosen as descriptive measures for each combination of simulation parameters. We expected the first rate (the Type I error rate) to be near the chosen nominal significance level (α), which was set at α = .05. The second rate corresponds to the power of the procedure to identify the true regression model. To evaluate the robustness of Type I error rates, a 20 % robustness criterion was chosen. Thus, given α = .05, a test was considered robust if the empirical Type I error rates do not fall outside the interval of 4-6 %.

To evaluate the influence of simulation parameters on the test performances, standard ANOVA techniques were applied. Instead of a classical test statistic (such as a t or F value), the proposed tests provide p-values based on the resampling procedures. Thus, logit-transformed p-values (i.e., $\ln[p / (1 - p)]$) were used as the dependent variable to avoid floor/ceiling effects due to the bounded interval of [0, 1].¹ The implemented R program is freely available upon request. The source codes for the proposed tests as well as an example of application are given in the appendix.

¹ We also performed logistic regressions using the statistical decision of a test (i.e., a binary indicator whether the null hypothesis is accepted or rejected) as the dependent variable. No noteworthy differences resulted between the results of the logistic regression and the ANOVA models. Results of the logistic regression can be obtained from the authors upon request.

Results

Case A: Normally-distributed residuals

Type I Error

Figure 1 gives the Type I error rates (i.e., rejecting the null hypothesis of residual symmetry of the correctly specified regression model) of the three procedures as a function of the correlation (ρ) and the skewness of X. Overall, the Type I error rates do not systematically vary by magnitude of correlation or level of skewness. The permutation test tends to suggest overly liberal decisions for all simulated scenarios. That is, the Type I error rates are consistently above the robustness boundary of 6 %. As expected, the parametric bootstrap test performs best in protecting the nominal significance level. In the majority of cases the Type I error rates of the nonparametric bootstrap test also lie within the selected robustness interval of 4-6 %. Figure 2 shows the effect of sample size on empirical Type I error rates of the three tests. Non-robustness of the permutation test is even more pronounced for small sample sizes. Type I error rates of the bootstrap procedures lie within the robustness interval and, again, the parametric bootstrap test outperforms the nonparametric bootstrap test.
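A scaled-down version of one cell of this Type I error check can be sketched as follows. The sketch is in Python rather than the authors' R code, uses only the parametric bootstrap test, and runs far fewer repetitions and resamples than the study (80 and 150 instead of 1000 and 2000), so its estimate is only a rough indication. The function names, the seed, and the construction of unit-variance Laplace errors as the difference of two exponential variates are assumptions made for illustration.

```python
import math
import random


def skewness(values):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def ols_residuals(x, y):
    """Residuals from regressing y on x (simple regression with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [(b - my) - slope * (a - mx) for a, b in zip(x, y)]


def parametric_bootstrap_p(res, m, rng):
    """Share of normal bootstrap samples whose skewness exceeds the observed skewness."""
    n = len(res)
    sd = math.sqrt(sum(r * r for r in res) / (n - 2))
    observed = skewness(res)
    return sum(skewness([rng.gauss(0.0, sd) for _ in range(n)]) > observed
               for _ in range(m)) / m


def gamma_shape(skew_x):
    """Shape c of a gamma variable whose skewness equals 2 / sqrt(c)."""
    return 4.0 / skew_x ** 2


def simulate_pair(n, skew_x, rho, rng, laplace=False):
    """Draw (x, y) with y = b * x + error, unit error variance, and correlation rho."""
    shape = gamma_shape(skew_x)
    # With scale 1 the standard deviation of X is sqrt(shape).
    slope = rho / (math.sqrt(shape) * math.sqrt(1.0 - rho ** 2))
    x = [rng.gammavariate(shape, 1.0) for _ in range(n)]
    if laplace:
        rate = math.sqrt(2.0)  # difference of two exponentials: Laplace with variance 1
        errors = [rng.expovariate(rate) - rng.expovariate(rate) for _ in range(n)]
    else:
        errors = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return x, [slope * xi + e for xi, e in zip(x, errors)]


def type_one_error_rate(n, skew_x, rho, reps, m, rng, alpha=0.05):
    """Rejection rate of residual symmetry for the correctly specified model."""
    rejections = 0
    for _ in range(reps):
        x, y = simulate_pair(n, skew_x, rho, rng)
        rejections += parametric_bootstrap_p(ols_residuals(x, y), m, rng) < alpha
    return rejections / reps


rng = random.Random(2013)                             # illustrative seed
rate = type_one_error_rate(n=50, skew_x=2.0, rho=0.5, reps=80, m=150, rng=rng)
print("empirical Type I error rate:", rate)
```

With so few repetitions the estimate is too noisy to be judged against the 4-6 % robustness interval; the sketch only illustrates how a single design cell is generated and evaluated.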

Figure 1: Type I error rates for $H_0: \gamma_{\varepsilon_{YX}} = 0$ as a function of skewness and correlation (simulating normally distributed residuals).

Figure 2: Type I error rates for $H_0: \gamma_{\varepsilon_{YX}} = 0$ as a function of sample size (simulating normally distributed residuals).

To analyze the sensitivity of the tests to the simulation factors we employed ANOVAs using the logit-transformed p-values of each test as the dependent variable. The skewness of X ($\gamma_X$), magnitude of correlation (ρ), and sample size (n) were selected as independent variables. According to the simulation setup each of the 5 × 10 × 6 = 300 experimental cells contained 1000 observations. Because of the large number of observations in each cell, we will focus on partial η² values instead of p-values. Details of the ANOVA model are given in Table 1. Overall, R² values as a measure of model fit are close to zero for all ANOVAs. In addition, partial η² estimates suggest that the experimental factors of the simulation do not affect the distribution of logit values and, thus, explain close to nothing of the variation in logits (all partial η² < 0.001). The null hypothesis of the ANOVA states that observations across groups are randomly sampled values of the same underlying population, which is reflected in the equality of means. The current results suggest that average logit values of each test do not systematically vary across experimental conditions. In other words, the test statistics are not affected by the simulation factors. For the correctly specified regression model we conclude that the tests reject the null hypothesis of residual symmetry only by chance, according to the nominal significance level, regardless of sample size, skewness of X, and magnitude of correlation.
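The two quantities used in this analysis, the logit of a resampling p-value and partial η², are simple to compute. The snippet below is a minimal sketch with hypothetical helper names. Clipping p-values away from 0 and 1 is an assumption added here, because a resampling p-value can be exactly 0 or 1 (for example 0 out of 2000 resamples), where the logit is undefined; how such values were handled in the study is not stated.

```python
import math


def logit(p, eps=1e-4):
    """ln(p / (1 - p)), with p clipped to [eps, 1 - eps] (clipping is an assumption)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


print(logit(0.05))                       # negative: small p-values map to small logits
print(logit(0.50))                       # zero at p = 0.5
print(partial_eta_squared(1.0, 999.0))   # an effect explaining 0.1 % of the variation
```

Because the logit is monotone in p, comparisons of average logits across design cells translate directly into comparisons of typical p-values.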

Table 1: ANOVA results for the Type I error simulation with normally distributed error terms. Columns: Source, df, Type III Sum of Squares, Mean Squares, F-value, p-value, partial η². For each test (parametric bootstrap, nonparametric bootstrap, permutation; R² reported per test) the rows are Correlation, Skewness, Correlation × Skewness, Sample Size, Correlation × Sample Size, Skewness × Sample Size, and Correlation × Sample Size × Skewness.

Power

Next, we ask questions concerning the power of the proposed procedures (i.e., the rejection rates of the null hypothesis of residual symmetry of the mis-specified model). Figure 3 summarizes the power properties of the parametric bootstrap, the nonparametric bootstrap, and the permutation test. Overall, the power of all tests increases with the skewness of X and the number of observations (n). In addition, for a fixed level of skewness the power of the tests decreases with the magnitude of correlation between X and Y (Figure 4 shows this effect more explicitly). For highly skewed explanatory variables and highly correlated variables the parametric bootstrap test outperforms the nonparametric bootstrap and the permutation test. For more symmetric explanatory variables, the three significance tests turn out to be equally powerful.

Again, to further explore the sensitivity of the three tests, ANOVAs were performed using the logit-transformed p-values of the corresponding significance tests as the dependent variable. Results are summarized in Table 2. Overall, model fit estimates varied from R² = 0.73 to 0.77 depending on the significance test. The largest effects can be observed for the magnitude of correlation (partial η² values range from 0.59 to 0.63) and the level of skewness of X (all partial η² values > 0.4). Partial effect size estimates for sample size were 0.17 or larger. Average logit values decrease with the skewness of X ($\gamma_X$) and the number of observations (n). Because the logit, $\ln[p / (1 - p)]$, increases monotonically in p, smaller logit values represent smaller resampling p-values and, thus, more power. Conversely, average logit values increase with the magnitude of correlation, which results in higher resampling p-values and less power.

In addition, the two-way interactions correlation × skewness, correlation × sample size, and skewness × sample size are statistically meaningful (partial η² range from 0.02 to 0.12). For a fixed level of correlation, average logit values decrease with the skewness of X (i.e., the power increases). However, with increasing magnitude of correlation the average logit values increase, i.e., an increasing magnitude of correlation decreases the power of the tests. Similarly, average logit values decrease with increasing sample size for a fixed level of ρ. In other words, all tests are more powerful for larger samples. However, for a fixed number of observations the average logit values again increase as a function of ρ (i.e., the power of the tests is reduced for highly correlated variables). The meaningful two-way interaction skewness × sample size suggests that the systematic decrease in average logits is even more pronounced for large sample sizes.

Figure 3: Observed power for $H_0: \gamma_{\varepsilon_{XY}} = 0$ as a function of sample size, correlation, and skewness (simulating normally distributed residuals). Circles represent the nonparametric bootstrap, triangles the parametric bootstrap, and squares the permutation test.

Figure 4: Observed power for $H_0: \gamma_{\varepsilon_{XY}} = 0$ as a function of correlation (simulating normally distributed residuals).

Finally, ANOVAs reveal significant three-way interactions (correlation × sample size × skewness) for all significance tests. For a given fixed level of skewness the power of each test decreases with increasing correlation between X and Y. An increase in the skewness of X or an increase in sample size counters this pattern and can keep the power of the test at a sufficient level even for higher correlations (Figure 3). This pattern is about the same for each test and can be expected from the result given in Equation (10).
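The expected pattern can be illustrated numerically. Under the assumptions behind Equation (10) (symmetric errors in the true model, X independent of the true error), dividing the third central moment of the mis-specified residuals by their cubed standard deviation gives a residual skewness of $(1 - \rho^2)^{3/2} \gamma_X$. This closed form is derived here as an illustration and is not quoted from the text; the function name is hypothetical.

```python
def misspecified_residual_skewness(skew_x, rho):
    """Skewness of the residuals of the mis-specified model, assuming symmetric true errors."""
    return (1.0 - rho ** 2) ** 1.5 * skew_x


# Rows: skewness of X; columns: correlation between X and Y.
for skew_x in (0.5, 2.0, 4.0):
    row = [round(misspecified_residual_skewness(skew_x, rho), 3) for rho in (0.1, 0.5, 0.9)]
    print(skew_x, row)
```

For a skewness of 2, for example, the implied residual skewness drops from about 1.97 at ρ = 0.1 to about 0.17 at ρ = 0.9, which is why larger samples or more strongly skewed explanatory variables are needed to retain power at high correlations.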

Table 2: ANOVA results for the power simulation with normally distributed error terms. Columns: Source, df, Type III Sum of Squares, Mean Squares, F-value, p-value, partial η². For each test (parametric bootstrap, nonparametric bootstrap, permutation; R² reported per test) the rows are Correlation, Skewness, Correlation × Skewness, Sample Size, Correlation × Sample Size, Skewness × Sample Size, and Correlation × Sample Size × Skewness.

Case B: Laplace-distributed residuals

Type I Error

So far, we focused on cases where the assumption of normality of residuals is fulfilled. The parametric bootstrap test relies on this assumption. Now, we focus on scenarios where residuals are drawn from a symmetric but non-normal distribution (i.e., Laplace-distributed residuals). Here, we expect the nonparametric bootstrap test and the permutation test to show better robustness and power properties. Figure 5 gives the Type I error rates (i.e., rejection rates of the null hypothesis of residual symmetry of the correctly specified model) for the three tests as a function of correlation (ρ) and skewness ($\gamma_X$). In a strict sense, all tests fail to meet the robustness criterion of 4-6 %. However, departures from the 6 % boundary are small for the nonparametric bootstrap test and the permutation test. As expected, non-robustness is more pronounced for the parametric bootstrap test (Type I error rates are > 24 % for all scenarios). Overall, the magnitude of Type I errors does not depend on the skewness of X or the magnitude of correlation between X and Y.

Figure 5: Type I error rates for $H_0: \gamma_{\varepsilon_{YX}} = 0$ as a function of skewness and correlation (simulating Laplace distributed residuals).

Figure 6 gives the Type I error rates as a function of sample size. The performance of the permutation test improves with increasing sample size and meets the robustness criterion for n > 150. In contrast, non-robustness of the parametric bootstrap test tends to increase even further for larger sample sizes.

Figure 6: Type I error rates for $H_0: \gamma_{\varepsilon_{YX}} = 0$ as a function of sample size (simulating Laplace distributed residuals).

Table 3 gives the ANOVA results, again using the logit-transformed p-values as the dependent variable. R² values as well as partial η² estimates were generally close to zero in all models. This again suggests that the factors of the simulation do not affect average logits of the parametric bootstrap, the nonparametric bootstrap, and the permutation test.

24 Table 3: ANOVA results for the Type I error simulation with Laplace distributed error terms. Source df Typ III Sum of Squares Mean Squares F-value p-value partial η² parametric bootstrap (R² = ) Correlation Skewness Correlation Skewness Sample Size Correlation Sample Size Skewness Sample Size Correlation Sample Size Skewness nonparametric bootstrap (R² = ) Correlation Skewness Correlation Skewness Sample Size Correlation Sample Size Skewness Sample Size Correlation Sample Size Skewness permutation (R² = ) Correlation Skewness Correlation Skewness Sample Size Correlation Sample Size Skewness Sample Size Correlation Sample Size Skewness Power Finally, we ask whether power functions of the proposed tests are affected by nonnormally distributed residuals. Again, the power of the tests constitutes the rejection rate of the null hypothesis of residual symmetry of the mis-specified model. Figure 7 shows the probability of rejecting the null hypothesis of symmetry of residuals as a function of sample size, correlation, and skewness of the explanatory variable. Results of the Type I error rates suggest that the three tests are unable to protect the nominal significance level according to 24

the 20 % robustness criterion. In particular, the parametric bootstrap test shows empirical Type I error rates larger than 24 % (instead of 5 %). Thus, the power functions of the three procedures are not comparable in the usual sense. In this case, Zhang and Boos (1994) suggest the comparison of adjusted power estimates. Following this suggestion, we estimated the adjusted nominal significance level (i.e., the critical p-value), which corresponds to the 95th quantile of the empirically observed distribution of resampling p-values for all three tests under the null hypothesis of indistinguishable regression models. Figure 7 shows the power using this adjusted level (instead of 5 %) as the nominal significance level for the considered sample sizes, magnitudes of correlation, and skewness of the explanatory variable.

Generally, patterns do not differ from those observed in the case of normally distributed residuals. Again, the power increases with the skewness of the explanatory variable and the number of observations. Adjusted power estimates are inversely related to the magnitude of correlation, i.e., the three significance tests lose power with increasing ρ (see also Figure 8). Both the permutation test and the nonparametric bootstrap test outperform the parametric bootstrap test, as expected. This power superiority is more pronounced for more symmetric explanatory variables.

ANOVAs were again performed using the logit-transformed p-values as the dependent variable to further explore the sensitivity of the three tests. Results are summarized in Table 4. Overall, model fit estimates were R² = 0.64 or larger. Again, the strongest effects were observed for the magnitude of correlation between X and Y (partial η² ranging from 0.45 to 0.63) and the skewness of the explanatory variable X (all partial η² > 0.35). Effect size estimates for the sample size factor were 0.11 or larger. The average logit values decrease with increasing skewness and increasing sample size. Conversely, average logit values increase with the underlying correlation between X and Y. In addition, small effect sizes were observed for the two-way interactions correlation × skewness and correlation × sample size, as well as for skewness × sample size (all

partial η² ≈ 0.03). Again, average logits decrease as the skewness of X and the number of observations increase. However, for a fixed degree of skewness or a fixed number of observations, average logits increase as a function of the magnitude of correlation, which means that the tests lose power with increasing correlation. Conversely, the power of the tests increases with increasing skewness and sample size. The effect size estimates for the three-way interaction correlation × sample size × skewness were 0.05 or larger. Again, larger sample sizes together with highly skewed explanatory variables restore the power even for highly correlated data.
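The idea behind size-adjusted power in the spirit of Zhang and Boos (1994) can be illustrated with a short sketch. It assumes that a test rejects for small p-values, so the adjusted critical p-value is taken as the lower α-quantile of the p-values simulated under the null hypothesis (equivalently, the 95th percentile of 1 − p for α = 0.05). The function names and the use of `numpy.quantile` are choices made for this illustration, not the authors' implementation.

```python
import numpy as np


def adjusted_alpha(null_p, alpha=0.05):
    """Critical p-value that yields a rejection rate of alpha under H0.

    Assumption: the test rejects for small p-values, so the critical
    value is the lower alpha-quantile of the simulated null p-values.
    """
    return float(np.quantile(np.asarray(null_p, dtype=float), alpha))


def adjusted_power(alt_p, null_p, alpha=0.05):
    """Rejection rate under the alternative at the size-adjusted level."""
    crit = adjusted_alpha(null_p, alpha)
    return float(np.mean(np.asarray(alt_p, dtype=float) <= crit))
```

For a liberal test, the adjusted critical p-value is smaller than the nominal 5 %, so the adjusted power is lower than the unadjusted rejection rate. This is what makes power estimates of tests with different empirical sizes comparable.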

Figure 7: Adjusted power as a function of sample size, correlation, and skewness (simulating Laplace distributed residuals). Circles represent the nonparametric bootstrap, triangles the parametric bootstrap, and squares the permutation test.

Figure 8: Adjusted power as a function of correlation (simulating Laplace distributed residuals).

Comparison of statistical decisions

Applying the proposed significance tests in practice involves statistical decisions concerning the skewness estimate of the residuals of one model (e.g., Y is regressed on X) and statistical inference on the skewness estimate of the residuals of the competing model (i.e., X is regressed on Y). Throughout the study, the first model constitutes the true data generating process. In this case, we would expect 1) that the null hypothesis of residual symmetry is retained for the true model and 2) that the corresponding null hypothesis for the mis-specified model is rejected. Thus, in the following section we are interested in the behavior of the tests considering the combined statistical decisions for both null hypotheses. In particular, we focus on the

power of the tests in terms of model selection (i.e., retaining the null hypothesis for the true model and simultaneously rejecting the null hypothesis for the mis-specified model), and we focus on those outcomes that do not enable researchers to draw conclusions concerning the direction of the observed effect, i.e., both null hypotheses are retained or both null hypotheses are rejected.

Table 4: ANOVA results for the power simulation with Laplace distributed error terms.
Factor; df; Type III Sum of Squares; Mean Squares; F-value; p-value; partial η²
parametric bootstrap (R² = ): Correlation (p < ); Skewness (p < ); Correlation × Skewness (p < ); Sample Size (p < ); Correlation × Sample Size (p < ); Skewness × Sample Size (p < ); Correlation × Sample Size × Skewness (p < )
nonparametric bootstrap (R² = ): Correlation (p < ); Skewness (p < ); Correlation × Skewness (p < ); Sample Size (p < ); Correlation × Sample Size (p < ); Skewness × Sample Size (p < ); Correlation × Sample Size × Skewness (p < )
permutation (R² = ): Correlation (p < ); Skewness (p < ); Correlation × Skewness (p < ); Sample Size (p < ); Correlation × Sample Size (p < ); Skewness × Sample Size (p < ); Correlation × Sample Size × Skewness (p < )

Figure 9 shows the power of the three tests in terms of identifying the true model as a function of sample size, magnitude of correlation, and skewness of X simulating normally

distributed residuals. Each line represents the proportion of correct model selections based on the combined inference concerning both null hypotheses. The power to identify the true model again increases with sample size and skewness of X, and it decreases with the magnitude of the correlation. However, high correlations together with high skewness and larger sample sizes conserve the power of the procedure. Generally, the parametric bootstrap test shows a power advantage for highly skewed predictors and highly correlated variables. For example, consider the case of n = 200, ρ = 0.8, and a skewness of X of 4: Here, the parametric bootstrap test is able to select the correct model in 93.2 % of the simulated samples, whereas the nonparametric bootstrap test and the permutation test identify the correct model in 83.3 % and 80.4 % of the simulated samples, respectively.

Figure 10 shows the (unadjusted) power functions in terms of model selection simulating Laplace distributed residuals. Overall, power increases with sample size and skewness of X and decreases with the magnitude of correlation. In the majority of the simulated scenarios, the parametric bootstrap test is less powerful than the nonparametric bootstrap and permutation tests. A reversed picture can be observed for highly correlated variables (ρ > 0.7). Here, the parametric bootstrap test seems to be more powerful than the two competitors. However, this power advantage must be interpreted with caution, for two reasons: First, Figures 5 and 6 reveal that the Type I error rates of the parametric test are far above the nominal significance level. In other words, statistical decisions concerning residual symmetry of the correctly specified regression model are too liberal, which also biases statistical decisions in terms of model selection. Second, the comparison of the unadjusted power functions of the three tests for Laplace distributed residuals suggests that the power loss of the parametric test is less pronounced for highly correlated variables. This scenario also conserves the power in terms of combined inference. In line with suggestions of

Zhang and Boos (1994), unadjusted power functions are not shown here (these results can be obtained from the first author upon request).

Next, we focus on those outcomes where no distinct decision about causation can be made. Tables 5 and 6 give the relative frequencies for those cases where both significance tests lead to a statistically significant result. Findings are presented for the sample sizes n = 50, 150, 250, and 500 and ρ = 0, 0.2, 0.4, 0.6, and 0.8. Results of the remaining experimental conditions are very similar to the presented findings. For normally distributed residuals (see Table 5), the percentages of rejecting both null hypotheses across all simulated samples are 3.62 %, 4.00 %, and 4.87 % for the parametric bootstrap, the nonparametric bootstrap, and the permutation test, respectively. In general, these percentages decrease with increasing correlation between X and Y and increase with sample size n and skewness of X.

Table 6 shows the results for Laplace distributed residuals. Overall, the percentages of combined rejection of zero skewness are 4.55 % and 3.88 % for the nonparametric bootstrap test and the permutation test, respectively. In contrast, due to the heavily inflated Type I error rates of the parametric bootstrap test, we observe a considerably higher percentage of combined rejections for this test. Again, for all tests the percentages of combined rejections decrease with increasing correlation. Conversely, an increase of combined rejections can be observed as a function of the skewness of X. For the parametric bootstrap and the nonparametric bootstrap, we also observe an increase of combined rejections for larger sample sizes. In contrast, the percentages of combined rejections of the permutation test slightly decline with increasing sample size.
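The combined decisions discussed in this section can be tabulated from two vectors of p-values, one for the residual symmetry test of each model. The sketch below is an illustration under stated assumptions: the function name, the outcome labels, and the strict comparison `p < alpha` are hypothetical choices, and the fourth outcome (rejecting symmetry only for the true model) is labelled here as a wrong selection although the text does not name it.

```python
import numpy as np


def classify_decisions(p_true, p_mis, alpha=0.05):
    """Relative frequencies of the four combined test outcomes.

    p_true: p-values of the symmetry test for the true model's residuals
    p_mis:  p-values of the symmetry test for the mis-specified model
    """
    reject_true = np.asarray(p_true, dtype=float) < alpha
    reject_mis = np.asarray(p_mis, dtype=float) < alpha
    return {
        # retain H0 for the true model, reject it for the mis-specified one
        "correct_selection": float(np.mean(~reject_true & reject_mis)),
        # both null hypotheses rejected: no decision possible
        "both_rejected": float(np.mean(reject_true & reject_mis)),
        # both null hypotheses retained: no decision possible
        "both_retained": float(np.mean(~reject_true & ~reject_mis)),
        # assumed label: evidence points to the reverse direction
        "wrong_selection": float(np.mean(reject_true & ~reject_mis)),
    }
```

The four frequencies sum to one, so the proportion of undecidable samples is the sum of `both_rejected` and `both_retained`.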

Figure 9: Power in terms of model selection (i.e., retaining the null hypothesis for the true model and rejecting it for the mis-specified model) as a function of sample size, correlation, and skewness (simulating normally distributed residuals). Circles represent the nonparametric bootstrap, triangles the parametric bootstrap, and squares the permutation test.

Figure 10: Power in terms of model selection (i.e., retaining the null hypothesis for the true model and rejecting it for the mis-specified model) as a function of sample size, correlation, and skewness (simulating Laplace distributed residuals). Circles represent the nonparametric bootstrap, triangles the parametric bootstrap, and squares the permutation test.

Table 5: Relative frequencies of events where the skewness estimates of both error terms significantly deviate from zero (simulating normally distributed error terms).
Columns: Parametric bootstrap test; Nonparametric bootstrap test; Permutation test (each by sample size n)
Rows: ρ, grouped by skewness of X

Table 6: Relative frequencies of events where the skewness estimates of both error terms significantly deviate from zero (simulating Laplace distributed error terms).
Columns: Parametric bootstrap test; Nonparametric bootstrap test; Permutation test (each by sample size n)
Rows: ρ, grouped by skewness of X

Next, we are interested in the second scenario where no distinct decision about the direction of the effect can be made, i.e., the case in which both significance tests suggest retaining the null hypotheses of zero skewness of residuals. These rates can be interpreted as the combined Type II error rates. Table 7 shows the combined Type II error rates for the case of normally distributed residuals. Across all generated samples, the percentages of a combined Type II error are 20.45 %, %, and % for the parametric bootstrap, the nonparametric bootstrap, and the permutation test, respectively (which is in line with the

Appendix A (Pornprasertmanit & Little, in press) Mathematical Proof

Appendix A (Pornprasertmanit & Little, in press) Mathematical Proof Appendix A (Pornprasertmanit & Little, in press) Mathematical Proof Definition We begin by defining notations that are needed for later sections. First, we define moment as the mean of a random variable

More information

Subject CS1 Actuarial Statistics 1 Core Principles. Syllabus. for the 2019 exams. 1 June 2018

Subject CS1 Actuarial Statistics 1 Core Principles. Syllabus. for the 2019 exams. 1 June 2018 ` Subject CS1 Actuarial Statistics 1 Core Principles Syllabus for the 2019 exams 1 June 2018 Copyright in this Core Reading is the property of the Institute and Faculty of Actuaries who are the sole distributors.

More information

Characteristics of measures of directional dependence - Monte Carlo studies

Characteristics of measures of directional dependence - Monte Carlo studies Characteristics of measures of directional dependence - Monte Carlo studies Alexander von Eye Richard P. DeShon Michigan State University Characteristics of measures of directional dependence - Monte Carlo

More information

PARAMETRIC AND NON-PARAMETRIC BOOTSTRAP: A SIMULATION STUDY FOR A LINEAR REGRESSION WITH RESIDUALS FROM A MIXTURE OF LAPLACE DISTRIBUTIONS

PARAMETRIC AND NON-PARAMETRIC BOOTSTRAP: A SIMULATION STUDY FOR A LINEAR REGRESSION WITH RESIDUALS FROM A MIXTURE OF LAPLACE DISTRIBUTIONS PARAMETRIC AND NON-PARAMETRIC BOOTSTRAP: A SIMULATION STUDY FOR A LINEAR REGRESSION WITH RESIDUALS FROM A MIXTURE OF LAPLACE DISTRIBUTIONS Melfi Alrasheedi School of Business, King Faisal University, Saudi

More information

KARACHI UNIVERSITY BUSINESS SCHOOL UNIVERSITY OF KARACHI BS (BBA) VI

KARACHI UNIVERSITY BUSINESS SCHOOL UNIVERSITY OF KARACHI BS (BBA) VI 88 P a g e B S ( B B A ) S y l l a b u s KARACHI UNIVERSITY BUSINESS SCHOOL UNIVERSITY OF KARACHI BS (BBA) VI Course Title : STATISTICS Course Number : BA(BS) 532 Credit Hours : 03 Course 1. Statistical

More information

Robust Critical Values for the Jarque-bera Test for Normality

Robust Critical Values for the Jarque-bera Test for Normality Robust Critical Values for the Jarque-bera Test for Normality PANAGIOTIS MANTALOS Jönköping International Business School Jönköping University JIBS Working Papers No. 00-8 ROBUST CRITICAL VALUES FOR THE

More information

Power of t-test for Simple Linear Regression Model with Non-normal Error Distribution: A Quantile Function Distribution Approach

Power of t-test for Simple Linear Regression Model with Non-normal Error Distribution: A Quantile Function Distribution Approach Available Online Publications J. Sci. Res. 4 (3), 609-622 (2012) JOURNAL OF SCIENTIFIC RESEARCH www.banglajol.info/index.php/jsr of t-test for Simple Linear Regression Model with Non-normal Error Distribution:

More information

Empirical tests of directional dependence

Empirical tests of directional dependence Empirical tests of directional dependence Felix Thoemmes, Sarah Moore, & Marina Yamasaki Cornell University Directional dependence Dodge & Rousson (2000, 2001) Attempt at devising a statistical tests that

More information

On Some Test Statistics for Testing the Population Skewness and Kurtosis: An Empirical Study

On Some Test Statistics for Testing the Population Skewness and Kurtosis: An Empirical Study Florida International University FIU Digital Commons FIU Electronic Theses and Dissertations University Graduate School 8-26-2016 On Some Test Statistics for Testing the Population Skewness and Kurtosis:

More information

The Two-Sample Independent Sample t Test

The Two-Sample Independent Sample t Test Department of Psychology and Human Development Vanderbilt University 1 Introduction 2 3 The General Formula The Equal-n Formula 4 5 6 Independence Normality Homogeneity of Variances 7 Non-Normality Unequal

More information

ELEMENTS OF MONTE CARLO SIMULATION

ELEMENTS OF MONTE CARLO SIMULATION APPENDIX B ELEMENTS OF MONTE CARLO SIMULATION B. GENERAL CONCEPT The basic idea of Monte Carlo simulation is to create a series of experimental samples using a random number sequence. According to the

More information

Contents Part I Descriptive Statistics 1 Introduction and Framework Population, Sample, and Observations Variables Quali

Contents Part I Descriptive Statistics 1 Introduction and Framework Population, Sample, and Observations Variables Quali Part I Descriptive Statistics 1 Introduction and Framework... 3 1.1 Population, Sample, and Observations... 3 1.2 Variables.... 4 1.2.1 Qualitative and Quantitative Variables.... 5 1.2.2 Discrete and Continuous

More information

Sample Size for Assessing Agreement between Two Methods of Measurement by Bland Altman Method

Sample Size for Assessing Agreement between Two Methods of Measurement by Bland Altman Method Meng-Jie Lu 1 / Wei-Hua Zhong 1 / Yu-Xiu Liu 1 / Hua-Zhang Miao 1 / Yong-Chang Li 1 / Mu-Huo Ji 2 Sample Size for Assessing Agreement between Two Methods of Measurement by Bland Altman Method Abstract:

More information

On Some Statistics for Testing the Skewness in a Population: An. Empirical Study

On Some Statistics for Testing the Skewness in a Population: An. Empirical Study Available at http://pvamu.edu/aam Appl. Appl. Math. ISSN: 1932-9466 Vol. 12, Issue 2 (December 2017), pp. 726-752 Applications and Applied Mathematics: An International Journal (AAM) On Some Statistics

More information

2018 AAPM: Normal and non normal distributions: Why understanding distributions are important when designing experiments and analyzing data

2018 AAPM: Normal and non normal distributions: Why understanding distributions are important when designing experiments and analyzing data Statistical Failings that Keep Us All in the Dark Normal and non normal distributions: Why understanding distributions are important when designing experiments and Conflict of Interest Disclosure I have

More information

Window Width Selection for L 2 Adjusted Quantile Regression

Window Width Selection for L 2 Adjusted Quantile Regression Window Width Selection for L 2 Adjusted Quantile Regression Yoonsuh Jung, The Ohio State University Steven N. MacEachern, The Ohio State University Yoonkyung Lee, The Ohio State University Technical Report

More information

Lecture 6: Non Normal Distributions

Lecture 6: Non Normal Distributions Lecture 6: Non Normal Distributions and their Uses in GARCH Modelling Prof. Massimo Guidolin 20192 Financial Econometrics Spring 2015 Overview Non-normalities in (standardized) residuals from asset return

More information

Chapter 7. Inferences about Population Variances

Chapter 7. Inferences about Population Variances Chapter 7. Inferences about Population Variances Introduction () The variability of a population s values is as important as the population mean. Hypothetical distribution of E. coli concentrations from

More information

**BEGINNING OF EXAMINATION** A random sample of five observations from a population is:

**BEGINNING OF EXAMINATION** A random sample of five observations from a population is: **BEGINNING OF EXAMINATION** 1. You are given: (i) A random sample of five observations from a population is: 0.2 0.7 0.9 1.1 1.3 (ii) You use the Kolmogorov-Smirnov test for testing the null hypothesis,

More information

Measuring and managing market risk June 2003

Measuring and managing market risk June 2003 Page 1 of 8 Measuring and managing market risk June 2003 Investment management is largely concerned with risk management. In the management of the Petroleum Fund, considerable emphasis is therefore placed

More information

Contents. An Overview of Statistical Applications CHAPTER 1. Contents (ix) Preface... (vii)

Contents. An Overview of Statistical Applications CHAPTER 1. Contents (ix) Preface... (vii) Contents (ix) Contents Preface... (vii) CHAPTER 1 An Overview of Statistical Applications 1.1 Introduction... 1 1. Probability Functions and Statistics... 1..1 Discrete versus Continuous Functions... 1..

More information

FE670 Algorithmic Trading Strategies. Stevens Institute of Technology

FE670 Algorithmic Trading Strategies. Stevens Institute of Technology FE670 Algorithmic Trading Strategies Lecture 4. Cross-Sectional Models and Trading Strategies Steve Yang Stevens Institute of Technology 09/26/2013 Outline 1 Cross-Sectional Methods for Evaluation of Factor

More information

XLSTAT TIP SHEET FOR BUSINESS STATISTICS CENGAGE LEARNING

XLSTAT TIP SHEET FOR BUSINESS STATISTICS CENGAGE LEARNING XLSTAT TIP SHEET FOR BUSINESS STATISTICS CENGAGE LEARNING INTRODUCTION XLSTAT makes accessible to anyone a powerful, complete and user-friendly data analysis and statistical solution. Accessibility to

More information

Ideal Bootstrapping and Exact Recombination: Applications to Auction Experiments

Ideal Bootstrapping and Exact Recombination: Applications to Auction Experiments Ideal Bootstrapping and Exact Recombination: Applications to Auction Experiments Carl T. Bergstrom University of Washington, Seattle, WA Theodore C. Bergstrom University of California, Santa Barbara Rodney

More information

Market Variables and Financial Distress. Giovanni Fernandez Stetson University

Market Variables and Financial Distress. Giovanni Fernandez Stetson University Market Variables and Financial Distress Giovanni Fernandez Stetson University In this paper, I investigate the predictive ability of market variables in correctly predicting and distinguishing going concern

More information

Simulating the Need of Working Capital for Decision Making in Investments

Simulating the Need of Working Capital for Decision Making in Investments INT J COMPUT COMMUN, ISSN 1841-9836 8(1):87-96, February, 2013. Simulating the Need of Working Capital for Decision Making in Investments M. Nagy, V. Burca, C. Butaci, G. Bologa Mariana Nagy Aurel Vlaicu

More information

Chapter 2 Uncertainty Analysis and Sampling Techniques

Chapter 2 Uncertainty Analysis and Sampling Techniques Chapter 2 Uncertainty Analysis and Sampling Techniques The probabilistic or stochastic modeling (Fig. 2.) iterative loop in the stochastic optimization procedure (Fig..4 in Chap. ) involves:. Specifying

More information

STA 4504/5503 Sample questions for exam True-False questions.

STA 4504/5503 Sample questions for exam True-False questions. STA 4504/5503 Sample questions for exam 2 1. True-False questions. (a) For General Social Survey data on Y = political ideology (categories liberal, moderate, conservative), X 1 = gender (1 = female, 0

More information

10/1/2012. PSY 511: Advanced Statistics for Psychological and Behavioral Research 1

10/1/2012. PSY 511: Advanced Statistics for Psychological and Behavioral Research 1 PSY 511: Advanced Statistics for Psychological and Behavioral Research 1 Pivotal subject: distributions of statistics. Foundation linchpin important crucial You need sampling distributions to make inferences:

More information

Value at Risk Ch.12. PAK Study Manual

Value at Risk Ch.12. PAK Study Manual Value at Risk Ch.12 Related Learning Objectives 3a) Apply and construct risk metrics to quantify major types of risk exposure such as market risk, credit risk, liquidity risk, regulatory risk etc., and

More information

ESTIMATION OF MODIFIED MEASURE OF SKEWNESS. Elsayed Ali Habib *

ESTIMATION OF MODIFIED MEASURE OF SKEWNESS. Elsayed Ali Habib * Electronic Journal of Applied Statistical Analysis EJASA, Electron. J. App. Stat. Anal. (2011), Vol. 4, Issue 1, 56 70 e-issn 2070-5948, DOI 10.1285/i20705948v4n1p56 2008 Università del Salento http://siba-ese.unile.it/index.php/ejasa/index

More information

UPDATED IAA EDUCATION SYLLABUS

UPDATED IAA EDUCATION SYLLABUS II. UPDATED IAA EDUCATION SYLLABUS A. Supporting Learning Areas 1. STATISTICS Aim: To enable students to apply core statistical techniques to actuarial applications in insurance, pensions and emerging

More information

9. Logit and Probit Models For Dichotomous Data

9. Logit and Probit Models For Dichotomous Data Sociology 740 John Fox Lecture Notes 9. Logit and Probit Models For Dichotomous Data Copyright 2014 by John Fox Logit and Probit Models for Dichotomous Responses 1 1. Goals: I To show how models similar

More information

The mean-variance portfolio choice framework and its generalizations

The mean-variance portfolio choice framework and its generalizations The mean-variance portfolio choice framework and its generalizations Prof. Massimo Guidolin 20135 Theory of Finance, Part I (Sept. October) Fall 2014 Outline and objectives The backward, three-step solution

More information

Does Calendar Time Portfolio Approach Really Lack Power?

Does Calendar Time Portfolio Approach Really Lack Power? International Journal of Business and Management; Vol. 9, No. 9; 2014 ISSN 1833-3850 E-ISSN 1833-8119 Published by Canadian Center of Science and Education Does Calendar Time Portfolio Approach Really

More information

An Improved Skewness Measure

An Improved Skewness Measure An Improved Skewness Measure Richard A. Groeneveld Professor Emeritus, Department of Statistics Iowa State University ragroeneveld@valley.net Glen Meeden School of Statistics University of Minnesota Minneapolis,

More information

Some developments about a new nonparametric test based on Gini s mean difference

Some developments about a new nonparametric test based on Gini s mean difference Some developments about a new nonparametric test based on Gini s mean difference Claudio Giovanni Borroni and Manuela Cazzaro Dipartimento di Metodi Quantitativi per le Scienze Economiche ed Aziendali

More information

- International Scientific Journal about Simulation Volume: Issue: 2 Pages: ISSN

- International Scientific Journal about Simulation Volume: Issue: 2 Pages: ISSN Received: 13 June 016 Accepted: 17 July 016 MONTE CARLO SIMULATION FOR ANOVA TU of Košice, Faculty SjF, Institute of Special Technical Sciences, Department of Applied Mathematics and Informatics, Letná

More information

Market Risk Analysis Volume I

Market Risk Analysis Volume I Market Risk Analysis Volume I Quantitative Methods in Finance Carol Alexander John Wiley & Sons, Ltd List of Figures List of Tables List of Examples Foreword Preface to Volume I xiii xvi xvii xix xxiii

More information

Asymptotic Distribution Free Interval Estimation

Asymptotic Distribution Free Interval Estimation D.L. Coffman et al.: ADF Intraclass Correlation 2008 Methodology Hogrefe Coefficient 2008; & Huber Vol. Publishers for 4(1):4 9 ICC Asymptotic Distribution Free Interval Estimation for an Intraclass Correlation

More information

The Effect of Kurtosis on the Cross-Section of Stock Returns

The Effect of Kurtosis on the Cross-Section of Stock Returns Utah State University DigitalCommons@USU All Graduate Plan B and other Reports Graduate Studies 5-2012 The Effect of Kurtosis on the Cross-Section of Stock Returns Abdullah Al Masud Utah State University

More information

Liquidity skewness premium

Liquidity skewness premium Liquidity skewness premium Giho Jeong, Jangkoo Kang, and Kyung Yoon Kwon * Abstract Risk-averse investors may dislike decrease of liquidity rather than increase of liquidity, and thus there can be asymmetric

More information

Approximating the Confidence Intervals for Sharpe Style Weights

Approximating the Confidence Intervals for Sharpe Style Weights Approximating the Confidence Intervals for Sharpe Style Weights Angelo Lobosco and Dan DiBartolomeo Style analysis is a form of constrained regression that uses a weighted combination of market indexes

More information

Keywords coefficient omega, reliability, Likert-type ítems.

Keywords coefficient omega, reliability, Likert-type ítems. ASYMPTOTICALLY DISTRIBUTION FREE (ADF) INTERVAL ESTIMATION OF COEFFICIENT ALPHA IE Working Paper WP06-4 05-1-006 Alberto Maydeu Olivares Donna L. Coffman Instituto de Empresa The Methodology Center Marketing

More information

Introduction to Statistical Data Analysis II

Introduction to Statistical Data Analysis II Introduction to Statistical Data Analysis II JULY 2011 Afsaneh Yazdani Preface Major branches of Statistics: - Descriptive Statistics - Inferential Statistics Preface What is Inferential Statistics? Preface

More information

Bootstrap Inference for Multiple Imputation Under Uncongeniality

Bootstrap Inference for Multiple Imputation Under Uncongeniality Bootstrap Inference for Multiple Imputation Under Uncongeniality Jonathan Bartlett www.thestatsgeek.com www.missingdata.org.uk Department of Mathematical Sciences University of Bath, UK Joint Statistical

More information

The Great Moderation Flattens Fat Tails: Disappearing Leptokurtosis

The Great Moderation Flattens Fat Tails: Disappearing Leptokurtosis The Great Moderation Flattens Fat Tails: Disappearing Leptokurtosis WenShwo Fang Department of Economics Feng Chia University 100 WenHwa Road, Taichung, TAIWAN Stephen M. Miller* College of Business University

More information

Analysis of 2x2 Cross-Over Designs using T-Tests for Non-Inferiority

Analysis of 2x2 Cross-Over Designs using T-Tests for Non-Inferiority Chapter 235 Analysis of 2x2 Cross-Over Designs using -ests for Non-Inferiority Introduction his procedure analyzes data from a two-treatment, two-period (2x2) cross-over design where the goal is to demonstrate

More information

Joensuu, Finland, August 20 26, 2006

Joensuu, Finland, August 20 26, 2006 Session Number: 4C Session Title: Improving Estimates from Survey Data Session Organizer(s): Stephen Jenkins, olly Sutherland Session Chair: Stephen Jenkins Paper Prepared for the 9th General Conference

More information

CHAPTER 6 DATA ANALYSIS AND INTERPRETATION

CHAPTER 6 DATA ANALYSIS AND INTERPRETATION 208 CHAPTER 6 DATA ANALYSIS AND INTERPRETATION Sr. No. Content Page No. 6.1 Introduction 212 6.2 Reliability and Normality of Data 212 6.3 Descriptive Analysis 213 6.4 Cross Tabulation 218 6.5 Chi Square

More information

Inferences on Correlation Coefficients of Bivariate Log-normal Distributions

Inferences on Correlation Coefficients of Bivariate Log-normal Distributions Inferences on Correlation Coefficients of Bivariate Log-normal Distributions Guoyi Zhang 1 and Zhongxue Chen 2 Abstract This article considers inference on correlation coefficients of bivariate log-normal

More information

Prediction Market Prices as Martingales: Theory and Analysis. David Klein Statistics 157

Prediction Market Prices as Martingales: Theory and Analysis. David Klein Statistics 157 Prediction Market Prices as Martingales: Theory and Analysis David Klein Statistics 157 Introduction With prediction markets growing in number and in prominence in various domains, the construction of

More information

Testing for the martingale hypothesis in Asian stock prices: a wild bootstrap approach

Testing for the martingale hypothesis in Asian stock prices: a wild bootstrap approach Testing for the martingale hypothesis in Asian stock prices: a wild bootstrap approach Jae H. Kim Department of Econometrics and Business Statistics Monash University, Caulfield East, VIC 3145, Australia

More information

Factors in Implied Volatility Skew in Corn Futures Options
