Technology Support Center Issue


United States Environmental Protection Agency
Office of Research and Development
Office of Solid Waste and Emergency Response
EPA/600/R-02/084
October 2002

Estimation of the Exposure Point Concentration Term Using a Gamma Distribution

Anita Singh 1, Ashok K. Singh 2, and Ross J. Iaci 3

The Technology Support Projects, Technology Support Center (TSC) for Monitoring and Site Characterization was established in 1987 as a result of an agreement between the Office of Research and Development (ORD), the Office of Solid Waste and Emergency Response (OSWER) and all ten Regional Offices. The objectives of the Technology Support Project and the TSC were to make available and provide ORD's state-of-the-science contaminant characterization technologies and expertise to Regional staff, facilitate the evaluation and application of site characterization technologies at Superfund and RCRA sites, and to improve communications between Regions and ORD Laboratories. The TSC identified a need to provide federal, state, and private environmental scientists working on hazardous waste sites with a technical issue paper that identifies data assessment applications that can be implemented to better define and identify the distribution of hazardous waste site contaminants. The examples given in this Issue paper and the recommendations provided were the result of numerous data assessment approaches performed by the TSC at hazardous waste sites.

This paper was prepared by Anita Singh, Ashok K. Singh, and Ross J. Iaci. Support for this project was provided by the EPA National Exposure Research Laboratory's Environmental Sciences Division with the assistance of the Superfund Technical Support Projects Technology Support Center for Monitoring and Site Characterization, OSWER's Technology Innovation Office, the U.S. DOE Idaho National Engineering and Environmental Laboratory, and the Associated Western Universities Faculty Fellowship Program.
For further information, contact Christopher Sibert, Technology Support Center Director, at (702) , Anita Singh at (702) , or A. K. Singh at (702) .

1 Lockheed Martin Environmental Services, 1050 East Flamingo, Ste. E120, Las Vegas, NV
2 Department of Mathematics, University of Nevada, Las Vegas, NV
3 Department of Statistics, University of Georgia, Athens, GA

Technology Support Center for Monitoring and Site Characterization, National Exposure Research Laboratory, Environmental Sciences Division, Las Vegas, NV
Technology Innovation Office, Office of Solid Waste and Emergency Response, U.S. EPA, Washington, D.C. Walter W. Kovalick, Jr., Ph.D., Director
289CMB02.RPT Rev. 3/6/03

In Superfund and RCRA projects of the U.S. EPA, cleanup, exposure, and risk assessment decisions are often made based upon the mean concentrations of the contaminants of potential concern. A 95% upper confidence limit (UCL) of the population mean is used to estimate the exposure point concentration (EPC) term (EPA, 1992), to determine the attainment of cleanup standards (EPA, 1989), to estimate background level contaminant concentrations, or to compare the soil concentrations with the site specific soil screening levels (EPA, 1996). It is, therefore, important to compute an accurate and stable 95% UCL of the population mean from the available data. The formula for computing a UCL depends upon the data distribution. Typically, environmental data are positively skewed, and

a lognormal distribution (EPA, 1992) is often used to model such data distributions. The H-statistic (Land, 1971) based upper confidence limit of the mean (denoted henceforth as H-UCL) is used in these applications. However, recent research in this area (Hardin and Gilbert, 1992; Singh, et al., 1997, 1999; and Schultz and Griffin, 1999) suggests that this may not be an appropriate choice. It is observed that for large values of standard deviation (e.g., exceeding ) of the log-transformed data, the use of the H-statistic leads to unreasonably large, unstable, and impractical UCL values. This is especially true for sample sets of smaller sizes (e.g., n < 20-25). The H-UCL is also very sensitive to a few low or high values. For example, the addition of a sample below detection limit can cause the H-UCL to increase by a large amount. Realizing that the use of the H-statistic can result in an unreasonably large UCL, it has been recommended (EPA, 1992) to use the maximum observed value as an estimate of the UCL (EPC term) in cases where the H-UCL exceeds the maximum observed value. Also, when the sample size is 5 or less, the maximum observed concentration is often used as an estimate of the EPC term.

However, it is observed that for highly skewed data sets, use of the maximum observed concentration may not provide the specified 95% coverage to the population mean (as shown in Section 5). This is especially true for samples of small size (e.g., 5-10). For larger data sets (e.g., $n \ge 20$), the use of the maximum observed value results in an overestimate of the 95% UCL of the population mean. For such highly skewed data sets, use of a gamma distribution based UCL of the mean provides a viable option. A positively skewed data set can quite often be modeled by lognormal as well as gamma distributions. Due to its relative computational ease, however, the lognormal distribution is typically the one used to model positively skewed data sets.
However, use of a lognormal model for an environmental data set unjustifiably elevates the minimum variance unbiased estimate of the mean and its UCL to levels that may not be applicable in practice. In this paper, we propose the use of a gamma distribution to model positively skewed data sets. The objective of the present work is to study procedures which can be used to compute a stable and accurate UCL of the mean based upon a gamma distribution. Several parametric and non-parametric (e.g., standard bootstrap, bootstrap-t, Hall's bootstrap, Chebyshev inequality) methods of computing a UCL of the unknown population mean, $\mu$, have also been considered. Monte Carlo simulation experiments have been performed to compare the performances of these methods. The comparison of the various methods has been evaluated in terms of the coverage (confidence coefficient) probabilities achieved by the various UCLs. Based upon this study, in Section 6, recommendations have been made about the computation of a UCL of the mean for skewed data distributions originating from various environmental applications.

1. Introduction

Suppose the Regional Project Manager (RPM) of a Superfund site believes that the mean concentration of the contaminant of potential concern (COPC) exceeds a specified cleanup standard, $C_s$, but the potentially responsible party (PRP) claims that the mean concentration is below $C_s$. In statistical terminology this can be stated in terms of testing of hypotheses. The hypotheses of interest are the null hypothesis that the mean concentration exceeds the cleanup standard, $H_0: \mu \ge C_s$, versus the alternative hypothesis, $H_a: \mu < C_s$. This formulation of the problem is protective of the environment because it assumes that the area in question is contaminated, and the burden of testing is to show otherwise.

In order to perform a test of these hypotheses, a random sample is collected from the site and concentrations of the COPC in these samples are determined. A suitable statistical test is then used to make a decision. A convenient way to perform a test of hypotheses about an unknown population parameter is first to compute a confidence interval for the parameter, and then reject $H_0$ if the hypothesized value, in this case the cleanup standard, $C_s$, lies outside of the confidence interval. For the one-sided hypotheses mentioned above, the test is based on a one-sided UCL of the mean. A one-sided UCL is a statistic such that the true population mean, $\mu$, is less than the UCL with a prescribed probability or level of confidence, say

$(1 - \alpha)$. For example, if the UCL is a 95% one-sided upper confidence limit, then $\mu <$ UCL with 95% confidence (or with 0.95 probability), and the set of all real numbers less than UCL forms a 95% upper one-sided confidence interval. The corresponding statistical test will reject $H_0$ (i.e., declare the site clean) if UCL $< C_s$, and the significance level of this test, or false positive error rate, is $\alpha$. This follows because if the site is contaminated (i.e., $\mu \ge C_s$), then the probability of declaring it clean is the probability that UCL $< C_s$, which is at most $\alpha$.

Testing of these hypotheses and computation of a UCL of the mean depend upon the population distribution of the COPC concentrations. Several procedures are available in the literature of environmental statistics to compute a UCL of the mean of a normal or a lognormal distribution (e.g., Singh, Singh, and Engelhardt, 1997, 1999; Schultz and Griffin, 1999). In this paper, we focus our effort on the inference procedures for an unknown population mean based upon a gamma distribution. The objective here is to study procedures that can be used to compute an accurate and stable UCL of the mean. Several parametric (Johnson, 1978; Chen, 1995; and Grice and Bain, 1980) and nonparametric (e.g., standard bootstrap, bootstrap-t (Efron, 1982, Hall, 1988), Hall's bootstrap (Hall, 1992), Chebyshev inequality) methods of computing a UCL of the population mean, $\mu$, of a skewed distribution have also been considered. The comparison of the various methods has been performed in terms of the coverage (confidence coefficient) probabilities provided by the various 95% UCLs. Monte Carlo simulation experiments have been performed to compare the performances of these methods. Based upon this study, recommendations have been made about the computation of a UCL of the mean for skewed data distributions originating from various environmental applications.
Section 2 has a brief description of the gamma distribution and a discussion of goodness-of-fit tests for the gamma distribution. Section 2 also describes estimation of gamma parameters and the computation of the UCL of the mean based upon a gamma distribution. Section 3 describes the various other methods which can be used to compute a UCL of the population mean. Section 4 has some examples illustrating the procedures used. Section 5 discusses the Monte Carlo experiments used to illustrate these methods and results. Section 6 consists of our recommendations for dealing with heavily skewed data sets.

2. The Gamma Distribution

A continuous random variable, $X$ (e.g., COPC concentration), is said to follow a two-parameter gamma distribution, $G(k, \theta)$, with parameters $k > 0$ and $\theta > 0$, if its probability density function is given by the following equation:

$$f(x; k, \theta) = \frac{1}{\theta^{k}\,\Gamma(k)}\, x^{k-1} e^{-x/\theta}, \quad x > 0, \tag{1}$$

and zero otherwise. The parameter $k$ is the shape parameter, and $\theta$ is the scale parameter (the location parameter is set to zero). Plots of the gamma distribution, $G(k, \theta)$, for varying choices of the shape parameter, $k$, and the scale parameter, $\theta$, are shown in Figures 1-4. These figures have been generated using the statistical software package MINITAB. The mean, variance, and skewness of a gamma distribution, $G(k, \theta)$, are given as follows:

$$\text{Mean} = \mu = k\theta. \tag{2}$$

$$\text{Variance} = \sigma^2 = k\theta^2. \tag{3}$$

$$\text{Skewness} = 2/\sqrt{k}. \tag{4}$$

From equation (4), it is noted that skewness increases as the shape parameter $k$ decreases. Figures 1 and 2 have the graphs of highly skewed distributions. As $k$ increases, skewness decreases, and consequently a gamma distribution starts approaching a normal distribution for larger values of $k$ (e.g., $k \ge 10$), as can be seen in Figures 3 and 4. Thus for larger values of $k$, the UCL based upon a gamma distribution and a UCL based upon a normal distribution are in close agreement.
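
Equations (1) through (4) can be checked numerically with a few lines of code. The following is a minimal sketch using only the Python standard library; the function names `gamma_pdf` and `gamma_moments` are illustrative and do not come from the paper.

```python
import math


def gamma_pdf(x, k, theta):
    """Density of G(k, theta) as in equation (1); zero for x <= 0."""
    if x <= 0:
        return 0.0
    # Work on the log scale so that small shape values do not overflow.
    log_f = ((k - 1.0) * math.log(x) - x / theta
             - k * math.log(theta) - math.lgamma(k))
    return math.exp(log_f)


def gamma_moments(k, theta):
    """Mean, variance, and skewness from equations (2)-(4)."""
    return {
        "mean": k * theta,
        "variance": k * theta ** 2,
        "skewness": 2.0 / math.sqrt(k),
    }


# Skewness depends on the shape parameter only, not on the scale.
for k in (0.1, 0.5, 2.0, 10.0):
    print(k, gamma_moments(k, 50.0)["skewness"])
```

Running the loop shows the skewness falling as $k$ grows, which is the behavior described for Figures 1-4.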
From Figures 1-4, it can also be seen that the scale parameter, $\theta$, simply affects the scale of the distribution and has no effect on the shape of the gamma distribution. In practice, a highly skewed data set can be fitted by both lognormal and gamma distributions. However, the difference between the UCLs obtained using the two

distributions can be enormous. This is especially true when the shape parameter is small (e.g., $k < 1$). This is illustrated in examples 2-5 given in Section 4.

Figure 1. Graphs of the gamma distributions G(0.1, 1), G(0.2, 1), and G(0.5, 1).

Figure 2. Graphs of the gamma distributions G(0.1, 50), G(0.2, 50), and G(0.5, 50).

Figure 3. Graphs of the gamma distributions G(2, 1), G(4, 1), and G(10, 1).

Figure 4. Graphs of the gamma distributions G(2, 50), G(4, 50), and G(10, 50).

2.1 Goodness-of-Fit Tests for Gamma Distribution

Since the goodness-of-fit tests for gamma distributions are not readily available, a brief description of those tests is given here. Several tests based upon empirical distribution functions (EDF) exist in the statistical literature, and can be used to test for a gamma distribution. These tests include the Kolmogorov-Smirnov $D$-test statistic, the Anderson-Darling $A^2$-test statistic, and the Cramér-von Mises test statistics, $W^2$ and $U^2$ (e.g., see D'Agostino and Stephens (1986), page 101). The exact critical values of these statistics are not available; this is especially true when the shape

parameter, $k$, of the gamma distribution is less than 1. Some asymptotic upper-tail critical values of the test statistics, $W^2$, $A^2$, and $U^2$, are given in D'Agostino and Stephens (1986) for values of the shape parameter $k \ge 1$ (pages ). Schneider (1978) also studied the goodness-of-fit tests for the gamma distribution. He derived the critical values of the Kolmogorov-Smirnov $D$-test statistic for selected values of the shape parameter, $k$, and the sample size for the gamma distribution with unknown parameters. All of these tests are right-tailed. This means that if a computed test statistic exceeds its respective $\alpha 100\%$ critical value, the null hypothesis of a gamma distribution will be rejected at the $\alpha$ level of significance.

Most of the commercially available software packages such as SAS and S-PLUS do not provide the goodness-of-fit tests for a gamma distribution. The software ExpertFit (developed by Law & Associates, Inc., 2001) performs a goodness-of-fit test for the gamma distribution using the Anderson-Darling test statistic, $A^2$, and the Kolmogorov-Smirnov test statistic. Due to the unavailability of the exact critical values of the general test statistics, ExpertFit (Law and Kelton (2000)) uses approximate critical values of the test statistic under the assumption that all parameters (e.g., shape and scale) of the distribution are known, that is, that the distribution is completely specified as given in Stephens (1970). Those critical values are the generic critical values for all completely specified distributions. ExpertFit uses these generic critical values to test for a gamma distribution. These critical values are also given on page 105 of D'Agostino and Stephens (1986). The authors of this article also developed a program, GamGood (2002), to test for a gamma distribution. This program computes the various goodness-of-fit test statistics using the formulae as given on page 101 of D'Agostino and Stephens (1986).
In this paper, we also use the smoothed percentage points of the Kolmogorov-Smirnov (K-S) $D$-test statistic as computed by Schneider and Clickner (1976) and Schneider (1978) to test for a gamma distribution. An illustration of the goodness-of-fit test for a gamma distribution is given in Example 1.

Example 1

The following data set of size 20 is given by Grice and Bain (1980): 152, 152, 115, 109, 137, 88, 94, 77, 160, 165, 125, 40, 128, 123, 136, 101, 62, 153, 83, and 69. None of the parameters of the underlying distribution are known. The various goodness-of-fit test statistics are given by $A^2$ = , $W^2$ = , $U^2$ = , and $D$ = . The estimated shape parameter, $k$, for this data set is (see Example 1, to be continued). For a shape parameter of 7.513, the asymptotic 5% critical values (Table 4-21, page 155, D'Agostino and Stephens, 1986) of these statistics are $A^2 = 0.755$, $W^2 = 0.127$, and $U^2 = 0.117$, and the critical value of the K-S statistic is $D$ = (Table 7 of Schneider, 1978). Since all of the test statistics are less than their respective critical values, there is insufficient evidence at the 0.05 level of significance to conclude that the data do not follow a gamma distribution.

2.2 Estimation of Parameters of the Gamma Distribution

Next, we consider the estimation of the parameters of a gamma distribution. The population mean and variance of a gamma distribution, $G(k, \theta)$, are functions of both parameters, $k$ and $\theta$. In order to estimate the mean, one has to obtain estimates of $k$ and $\theta$. Computation of the maximum likelihood estimate (MLE) of $k$ is quite complex and requires the computation of digamma and trigamma functions (Choi and Wette, 1969). Several authors (Choi and Wette, 1969, Bowman and Shenton, 1988, Johnson, Kotz, and Balakrishnan, 1994) have studied the estimation of shape and scale parameters of a gamma distribution.
The maximum likelihood estimation procedure to estimate shape and scale parameters of a gamma distribution is described below. Let $x_1, x_2, \ldots, x_n$ be a random sample (of COPC concentrations) of size $n$ from a gamma distribution, $G(k, \theta)$, with unknown shape and scale parameters $k$ and $\theta$, respectively. The log likelihood function is given as follows:

$$\log L(x_1, \ldots, x_n; k, \theta) = -nk \log\theta - n \log\Gamma(k) + (k - 1)\sum_{i=1}^{n} \log x_i - \frac{1}{\theta}\sum_{i=1}^{n} x_i. \tag{5}$$

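To find the MLEs of $k$ and $\theta$, which are $\hat{k}$ and $\hat{\theta}$, respectively, we differentiate the log likelihood function as given in (5) with respect to $k$ and $\theta$, and set the derivatives to zero. This results in the following two equations:

$$\log\hat{\theta} + \frac{\Gamma'(\hat{k})}{\Gamma(\hat{k})} = \frac{1}{n}\sum_{i=1}^{n} \log x_i, \tag{6}$$

$$\hat{k}\,\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}. \tag{7}$$

Solving equation (7) for $\hat{\theta}$ and substituting the result in equation (6), we get the following equation:

$$\frac{\Gamma'(\hat{k})}{\Gamma(\hat{k})} - \log\hat{k} = \frac{1}{n}\sum_{i=1}^{n} \log x_i - \log\bar{x}. \tag{8}$$

There does not exist a closed form solution of equation (8). This equation needs to be solved numerically for $\hat{k}$, which requires the computation of digamma and trigamma functions. This is quite easy to do using a personal computer. An estimate of $k$ can be computed iteratively by using the Newton-Raphson (Faires and Burden, 1993) method, leading to the following iterative equation:

$$\hat{k}_l = \hat{k}_{l-1} - \frac{\log\hat{k}_{l-1} - \Psi(\hat{k}_{l-1}) - M}{1/\hat{k}_{l-1} - \Psi'(\hat{k}_{l-1})}. \tag{9}$$

The iterative process stops when $\hat{k}$ starts to converge. In practice, convergence is typically achieved in fewer than 10 iterations. In equation (9):

$$M = \log\bar{x} - \frac{1}{n}\sum_{i=1}^{n} \log x_i, \qquad \Psi(k) = \frac{d}{dk}\log\Gamma(k), \qquad \Psi'(k) = \frac{d}{dk}\Psi(k),$$

where $\Psi(k)$ is the digamma function and $\Psi'(k)$ is the trigamma function. In order to obtain the MLEs of $k$ and $\theta$, one needs to compute the digamma and trigamma functions. Good approximate values for these two functions (Choi and Wette, 1969) can be obtained using the following approximations. For $k \ge 8$, these functions are approximated by:

$$\Psi(k) \approx \log k - \frac{1}{2k} - \frac{1}{12k^2} + \frac{1}{120k^4} - \frac{1}{252k^6}, \tag{10}$$

and

$$\Psi'(k) \approx \frac{1}{k} + \frac{1}{2k^2} + \frac{1}{6k^3} - \frac{1}{30k^5} + \frac{1}{42k^7}. \tag{11}$$

For $k < 8$, one can use the following recurrence relations to compute these functions:

$$\Psi(k) = \Psi(k + 1) - \frac{1}{k}, \tag{12}$$

$$\Psi'(k) = \Psi'(k + 1) + \frac{1}{k^2}. \tag{13}$$

The iterative process requires an initial estimate of $k$. A good starting value for $k$ in this iterative process is given by $k_0 = 1/(2M)$. Thom (1968) suggests the following approximation as an estimate of $k$:

$$\hat{k} \approx \frac{1}{4M}\left(1 + \sqrt{1 + \frac{4M}{3}}\right). \tag{14}$$

Bowman and Shenton (1988) suggested using $\hat{k}$ as given by equation (14) as a starting value of $k$ for an iterative procedure, calculating $\hat{k}_l$ at the $l$th iteration from the following formula:

$$\hat{k}_l = \frac{\hat{k}_{l-1}\left\{\log\hat{k}_{l-1} - \Psi(\hat{k}_{l-1})\right\}}{M}. \tag{15}$$

Both equations (9) and (15) have been used to compute the MLE of $k$.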
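
The Newton-Raphson iteration for the shape parameter can be sketched in a few lines. The code below is an illustrative implementation under stated assumptions, not the authors' program: the digamma and trigamma functions use the recurrence for $k < 8$ followed by the standard asymptotic series, the starting value is $k_0 = 1/(2M)$, and the function names, tolerance, and iteration cap are hypothetical choices. It is applied to the Example 1 data of Grice and Bain (1980).

```python
import math


def digamma(k):
    """Psi(k): shift k up to >= 8 by recurrence, then use the asymptotic series."""
    shift = 0.0
    while k < 8.0:
        shift -= 1.0 / k          # Psi(k) = Psi(k + 1) - 1/k
        k += 1.0
    k2 = k * k
    series = (math.log(k) - 1.0 / (2.0 * k) - 1.0 / (12.0 * k2)
              + 1.0 / (120.0 * k2 ** 2) - 1.0 / (252.0 * k2 ** 3))
    return series + shift


def trigamma(k):
    """Psi'(k): same strategy as digamma."""
    shift = 0.0
    while k < 8.0:
        shift += 1.0 / (k * k)    # Psi'(k) = Psi'(k + 1) + 1/k^2
        k += 1.0
    k2 = k * k
    series = (1.0 / k + 1.0 / (2.0 * k2) + 1.0 / (6.0 * k * k2)
              - 1.0 / (30.0 * k * k2 ** 2) + 1.0 / (42.0 * k * k2 ** 3))
    return series + shift


def log_mean_gap(data):
    """M = log(sample mean) - mean(log x); requires positive data."""
    n = len(data)
    return math.log(sum(data) / n) - sum(math.log(x) for x in data) / n


def thom_estimate(data):
    """Thom's closed-form approximation to the shape MLE."""
    m = log_mean_gap(data)
    return (1.0 + math.sqrt(1.0 + 4.0 * m / 3.0)) / (4.0 * m)


def gamma_shape_mle(data, tol=1e-10, max_iter=50):
    """Newton-Raphson solution of log(k) - Psi(k) = M for k."""
    m = log_mean_gap(data)
    if m <= 0.0:
        raise ValueError("data must be positive and not all equal")
    k = 1.0 / (2.0 * m)           # starting value k0 = 1/(2M)
    for _ in range(max_iter):
        step = (math.log(k) - digamma(k) - m) / (1.0 / k - trigamma(k))
        k_new = k - step
        if abs(k_new - k) < tol * k:
            return k_new
        k = k_new
    return k


data = [152, 152, 115, 109, 137, 88, 94, 77, 160, 165,
        125, 40, 128, 123, 136, 101, 62, 153, 83, 69]
k_hat = gamma_shape_mle(data)
theta_hat = (sum(data) / len(data)) / k_hat   # scale MLE from k * theta = mean
print(k_hat, theta_hat, thom_estimate(data))
```

For this data set Thom's approximation and the Newton-Raphson solution should agree closely, which is why the approximation is a convenient starting value.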
It is observed that the estimate, $\hat{k}$, based upon the Newton-Raphson method as given by equation (9) is in close agreement with that obtained using equation (15) with Thom's approximation as an initial estimate. Choi and Wette (1969) further concluded that the MLE of $k$, $\hat{k}$, is biased high. A bias corrected (Johnson, et al., 1994) estimate of $k$ is given by the following equation:

$$\hat{k}^* = \frac{(n - 3)\,\hat{k}}{n} + \frac{2}{3n}. \tag{16}$$

In (16), $\hat{k}$ is the MLE of $k$ obtained using either (9) or (15). Substitution of equation (16) in

equation (7) yields an estimate of the scale parameter, $\theta$, given as follows:

$$\hat{\theta}^* = \bar{x} / \hat{k}^*. \tag{17}$$

Next we provide an example illustrating the computations of the MLEs of $k$ and $\theta$. Consider the data set of Example 1. The sample mean, $\bar{x}$, is . The MLEs of the two parameters, $k$ and $\theta$, are obtained iteratively using the Newton-Raphson method (equation 9), and Bowman and Shenton's proposal as given by equation (15). The two sets of estimates are in agreement and are given by $\hat{k} = 8.799$ and $\hat{\theta}$ = . The corresponding bias-corrected estimates of $k$ and $\theta$, as given by equations (16) and (17), are $\hat{k}^*$ = and $\hat{\theta}^*$ = . Note that the bias-corrected MLE of the shape parameter, $k$ = , is quite high; consequently, the skewness of this data set is mild and its MLE = 0.73 (from equation (4)). Goodness-of-fit tests performed on this data set suggest that neither the hypothesis that the data are normal nor the hypothesis that they are lognormal can be rejected.

2.3 Computation of UCL of the Mean of a Gamma, $G(k, \theta)$, Distribution

In the statistical literature, even though procedures exist to compute a UCL of the mean of a gamma distribution (Grice and Bain, 1980, Wong, 1993), those procedures have not become popular due to their computational complexity. Those approximate and adjusted procedures depend upon the Chi-square distribution and an estimate of the shape parameter, $k$. As seen above, computation of an MLE of $k$ is quite involved, and this works as a deterrent to the use of a gamma distribution-based UCL of the mean. However, the computation of a gamma UCL should no longer be a problem due to the easy availability of personal computers.

Given a random sample, $x_1, x_2, \ldots, x_n$, of size $n$ from a gamma, $G(k, \theta)$, distribution, it can be shown that $2n\bar{x}/\theta$ follows a Chi-square distribution, $\chi^2_{2nk}$, with $2nk$ degrees of freedom (df). It is noted that $(2n\bar{x})/\theta = 2(X_1 + X_2 + \cdots + X_n)/\theta$. Using a simple transformation of variables, it is seen that each of the random variables $2X_i/\theta$, $i = 1, 2, \ldots, n$, follows a chi-square, $\chi^2_{2k}$, distribution. Also, those chi-square random variables are independently distributed. Since the sum of independently distributed chi-square random variables also follows a chi-square distribution, it is concluded that $(2n\bar{x})/\theta$ follows a chi-square, $\chi^2_{2nk}$, distribution with $2nk$ degrees of freedom.

When the shape parameter, $k$, is known, a uniformly most powerful test of size $\alpha$ of the null hypothesis, $H_0: \mu \ge C_s$, against the alternative hypothesis, $H_1: \mu < C_s$, is to reject $H_0$ if $\bar{x}/C_s < \chi^2_{2nk}(\alpha)/(2nk)$. The corresponding $(1 - \alpha)100\%$ uniformly most accurate UCL for the mean, $\mu$, is then given by the probability statement:

$$P\left(\frac{2nk\,\bar{x}}{\chi^2_{2nk}(\alpha)} \ge \mu\right) = 1 - \alpha, \tag{18}$$

where $\chi^2_{\nu}(\alpha)$ denotes the $\alpha$ cumulative percentage point of the Chi-square distribution with $\nu$ degrees of freedom. That is, if $Y$ follows $\chi^2_{\nu}$, then $P(Y \le \chi^2_{\nu}(\alpha)) = \alpha$. In practice, $k$ is not known and needs to be estimated from data. A reasonable procedure is to replace $k$ by its bias corrected estimate, $\hat{k}^*$, as given by equation (16). This results in the following approximate $(1 - \alpha)100\%$ UCL of the mean:

$$\text{Approximate UCL} = \frac{2n\hat{k}^*\,\bar{x}}{\chi^2_{2n\hat{k}^*}(\alpha)}. \tag{19}$$

It should be pointed out that the UCL given in (19) is an approximate UCL and there is no guarantee that the confidence level of $(1 - \alpha)$ will be achieved by this UCL. However, it does provide a way of computing a UCL of the mean of a gamma distribution. Simulation studies conducted in Section 5 suggest that an approximate gamma UCL thus obtained provides the specified coverage (95%) as the shape parameter, $k$, approaches 0.5. Thus when $k \ge 0.5$, one can use the approximate UCL given by (19). It should be observed that this approximation is good even for smaller (e.g., $n = 5$) sample sizes.

Grice and Bain (1980) computed an adjusted probability level, $\beta$, which can be used in (19) to achieve the specified confidence level of $(1 - \alpha)$. For $\alpha = 0.05$ (confidence coefficient of 0.95), $\alpha = 0.1$, and $\alpha = 0.01$, these adjusted probability levels are given below for some values of the sample size $n$ (Table 1). One can use linear interpolation to

obtain an adjusted $\beta$ for values of $n$ not covered in the table. The adjusted $(1 - \alpha)100\%$ UCL of the gamma mean, $\mu = k\theta$, is given by:

$$\text{Adjusted UCL} = \frac{2n\hat{k}^*\,\bar{x}}{\chi^2_{2n\hat{k}^*}(\beta)}, \tag{20}$$

where $\beta$ is given in Table 1 for $\alpha$ = 0.05, 0.1, and 0.01. Note that as the sample size, $n$, becomes large, the adjusted probability level, $\beta$, approaches $\alpha$. Except for the computation of the MLE of $k$, equations (19) and (20) provide simple Chi-square-distribution-based UCLs of the mean of a gamma distribution. It should also be noted that the UCLs as given by (19) and (20) depend only upon the estimate of the shape parameter, $k$, and are independent of the scale parameter, $\theta$, and its estimate. Consequently, as expected, it is observed that coverage probabilities for the mean associated with these UCLs do not depend upon the values of the scale parameter, $\theta$. This is further discussed in Section 5.

Table 1. Adjusted Critical Level, $\beta$, for Various Values of $\alpha$ and $n$

n | probability level $\beta$ ($\alpha$ = 0.05) | probability level $\beta$ ($\alpha$ = 0.1) | probability level $\beta$ ($\alpha$ = 0.01)

It is observed (Figures 5-7) that except for highly skewed ($k < 0.15$) data and samples of small size (e.g., < 10), the adjusted gamma UCL given by (20) provides the specified 95% coverage of the population mean. It is also noted that for highly skewed ($k < 0.15$) data sets of small sizes, except for the H-UCL, the coverage probability provided by the adjusted gamma UCL is the highest and is close to the specified level, 0.95. However, for these highly skewed data sets, the H-statistic results in unacceptably large values of the UCL. This is further illustrated in examples 2-4. For values of $k \ge 0.2$, the specified coverage of 0.95 is always approximately achieved by the adjusted gamma UCL given by equation (20), as shown in Figures .

Figure 5. Graphs of coverage probabilities by 95% UCLs of mean of G(k = 0.10, θ = 50).

Figure 6. Graphs of coverage probabilities by 95% UCLs of mean of G(k = 0.15, θ = 50).

Figure 7. Graphs of coverage probabilities by 95% UCLs of mean of G(k = 0.20, θ = 50).

Figure 8. Graphs of coverage probabilities by 95% UCLs of mean of G(k = 0.25, θ = 50).

Figure 9. Graphs of coverage probabilities by 95% UCLs of mean of G(k = 0.50, θ = 50).

Figure 10. Graphs of coverage probabilities by 95% UCLs of mean of G(k = 1.0, θ = 50).

Figure 11. Graphs of coverage probabilities by 95% UCLs of mean of G(k = 2.0, θ = 50).
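
Coverage probabilities of the kind plotted in these figures are Monte Carlo estimates: many samples are drawn from a known gamma distribution, a UCL is computed for each, and the fraction of UCLs at or above the true mean is recorded. The sketch below is an illustration only, not the simulation design used in the paper. It estimates the coverage achieved when the maximum observed value is used as the UCL; the function name, number of replications, and seed are hypothetical choices.

```python
import random


def coverage_of_maximum(k, theta, n, n_sim=5000, seed=12345):
    """Monte Carlo estimate of P(sample maximum >= true mean) for G(k, theta)."""
    rng = random.Random(seed)
    true_mean = k * theta                      # mean of G(k, theta)
    hits = 0
    for _ in range(n_sim):
        # random.gammavariate takes (shape, scale) in that order.
        sample_max = max(rng.gammavariate(k, theta) for _ in range(n))
        if sample_max >= true_mean:
            hits += 1
    return hits / n_sim


for n in (5, 10, 20):
    print(n, coverage_of_maximum(0.1, 50.0, n))
```

For a highly skewed distribution such as G(0.1, 50), the estimated coverage for n = 5 falls well short of 0.95, while for n = 20 it exceeds 0.95, in line with the remarks made earlier about using the maximum observed concentration as the EPC term.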

Figure 12. Graphs of coverage probabilities by 95% UCLs of mean of G(k = 4.0, θ = 50).

Figure 13. Graphs of coverage probabilities by 95% UCLs of mean of G(k = 6.0, θ = 50).

Figure 14. Graphs of coverage probabilities by 95% UCLs of mean of G(k = 10.0, θ = 50).
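
The approximate gamma UCL of equation (19) and the adjusted UCL of equation (20) differ only in the probability level passed to the Chi-square percentage point. The following is a minimal sketch under an explicit assumption: to stay within the Python standard library, the Chi-square percentage point is computed with the Wilson-Hilferty approximation, which is a substitute for exact tables and is reasonable only when the degrees of freedom are not small. For small $2n\hat{k}^*$, an exact quantile (for example `scipy.stats.chi2.ppf`) would be the safer choice. All function names are illustrative.

```python
from statistics import NormalDist


def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the p cumulative point of chi-square(df).

    This is an approximation swapped in for exact tables; it degrades for small df.
    """
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


def bias_corrected_shape(k_mle, n):
    """Bias-corrected shape estimate of equation (16)."""
    return (n - 3) * k_mle / n + 2.0 / (3.0 * n)


def gamma_ucl(mean, k_star, n, level):
    """UCL of the gamma mean: level = alpha gives (19), level = beta gives (20)."""
    df = 2.0 * n * k_star
    return df * mean / chi2_quantile_wh(level, df)


data = [152, 152, 115, 109, 137, 88, 94, 77, 160, 165,
        125, 40, 128, 123, 136, 101, 62, 153, 83, 69]
n = len(data)
mean = sum(data) / n
k_mle = 8.799                      # shape MLE reported in the text for Example 1
k_star = bias_corrected_shape(k_mle, n)
ucl_approx = gamma_ucl(mean, k_star, n, 0.05)
print(k_star, ucl_approx)
```

To obtain the adjusted UCL, the same function would be called with the adjusted level $\beta$ from Table 1 in place of 0.05.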

Example 1 (Continued)

The data set of size 20 and the associated MLEs of parameters $k$ and $\theta$ are given in Example 1. For $n = 20$ and $\alpha = 0.05$, the adjusted probability level, $\beta$ = (Table 1), and the adjusted df, $\nu^* = 2n\hat{k}^*$ = . The approximate 95% UCL of the mean obtained using equation (19) is given by UCL = , and the adjusted 95% UCL of the mean obtained using equation (20) is given by UCL = . As noted above, this data set passes both normality and lognormality tests. The associated Student's t-statistic based and H-statistic based UCLs are and , respectively. For this mildly skewed data set, one can use any of these four UCLs.

3. Other UCL Computation Methods

Several authors (Johnson, 1978, Kleijnen, Kloppenburg, and Meeuwsen, 1986, Chen, 1995, Sutton, 1993) have developed inference procedures for estimating the means of asymmetrical distributions. Also, several bootstrap procedures (Efron, 1982, Hall, 1988 and 1992, Manly, 1997) have been recommended for the computation of confidence intervals for means of skewed distributions. These are summarized below and are also included in the simulation experiments described in Section 5. Some examples have been included to illustrate these procedures.

3.1 UCL Based Upon Student's t-statistic

A $(1 - \alpha)100\%$ one-sided upper confidence limit for the mean based upon Student's t-statistic is given by the following equation:

$$\text{UCL} = \bar{x} + t_{\alpha, n-1}\, \frac{s_x}{\sqrt{n}}, \tag{21}$$

where $t_{\alpha, n-1}$ is the upper $\alpha$th percentile of the Student's t distribution with $n - 1$ degrees of freedom, and the sample variance is given by:

$$s_x^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2.$$

This UCL should be used when either the data follow a normal distribution, or when the data distribution is only mildly skewed and the sample size $n$ is large. For highly skewed data sets, the UCL based upon this method fails to provide the specified $(1 - \alpha)100\%$ coverage for the population mean, $\mu$.

3.2 UCL Based Upon Modified Student's t-Statistic for Asymmetric Distributions

Johnson (1978) and Sutton (1993) proposed the use of a modified t-statistic for testing the mean of a positively skewed distribution. An adjusted $(1 - \alpha)100\%$ UCL (Singh, Singh, and Engelhardt, 1999) of the mean, $\mu$, based upon the modified t-statistic is given as follows:

$$\text{UCL} = \bar{x} + \frac{\hat{\mu}_3}{6 s_x^2\, n} + t_{\alpha, n-1}\, \frac{s_x}{\sqrt{n}}, \tag{22}$$

where $\hat{\mu}_3$, an unbiased moment estimate (Kleijnen, Kloppenburg, and Meeuwsen, 1986) of the third central moment, $\mu_3$, is given as follows:

$$\hat{\mu}_3 = \frac{n \sum_{i=1}^{n} (x_i - \bar{x})^3}{(n - 1)(n - 2)}. \tag{23}$$

The simulation study conducted in Section 5 suggests that the UCL based upon the modified t-statistic also fails to provide the specified coverage (95% here) for skewed data sets from gamma distributions.

3.3 UCL of the Mean Based Upon the Adjusted Central Limit Theorem for Skewed Distributions

Given a random sample, $x_1, x_2, \ldots, x_n$, of size $n$ from a population with finite variance, $\sigma^2$, and mean, $\mu$, the Central Limit Theorem (CLT) states that the asymptotic distribution (as $n$ approaches infinity) of the sample mean, $\bar{x}_n$, is normal with mean $\mu$ and variance $\sigma^2/n$. An often cited rule of thumb for a minimum sample size satisfying the CLT is $n \ge 30$. However, this is not adequate if the population is highly skewed (Singh, Singh, and Engelhardt, 1999). A refinement of the CLT approach which makes an adjustment for skewness is discussed by Chen (1995). Specifically, the "adjusted CLT" UCL is given by:

$$\text{UCL} = \bar{x} + \left[z_{\alpha} + \frac{\hat{k}_3}{6\sqrt{n}}\left(1 + 2 z_{\alpha}^2\right)\right] \frac{s_x}{\sqrt{n}}, \tag{24}$$

where $z_{\alpha}$ is the upper $\alpha$th percentile of the standard normal distribution and $\hat{k}_3$, the coefficient of skewness, is given by $\hat{k}_3 = \hat{\mu}_3 / s_x^3$. The simulation study conducted in Section 5 suggests that even for larger samples, the adjustment made in the CLT-UCL method is not effective enough to provide the specified

(95%) coverage for skewed data sets. As skewness decreases, the coverage provided by the adjusted CLT-UCL approaches 95% for larger sample sizes, as can be seen in Figures .

3.4 UCL of the Mean of a Lognormal Distribution Based Upon Land's Method

In practice, a skewed data set can be modeled by both lognormal and gamma distributions. However, due to computational ease, the lognormal distribution is typically used to model such skewed data sets. A $(1 - \alpha)100\%$ UCL for the mean, $\mu$, of a lognormal distribution based upon Land's H-statistic (1971) is given as follows:

$$\text{UCL} = \exp\left(\bar{y} + 0.5\, s_y^2 + \frac{s_y H_{1-\alpha}}{\sqrt{n - 1}}\right), \tag{25}$$

where $\bar{y}$ and $s_y^2$ are the sample mean and variance of the log-transformed data. Tables of values denoted by $H_{1-\alpha}$ can be found in Gilbert (1987). From the simulation experiments discussed in Section 5, it is observed that the H-statistic based UCL grossly overestimates the 95% UCL and, consequently, the coverage provided by an H-UCL is always larger than the specified coverage of 95%. In Section 4, examples to illustrate this unreasonable behavior of the H-statistic based UCL are included. The practical merit of an H-UCL is doubtful as it results in unacceptably high UCL values. This is especially true for samples of small size (e.g., < 25) with values of $s_y$ exceeding . This is illustrated in examples .

3.5 UCL of the Mean Based Upon the Chebyshev Inequality

The Chebyshev inequality can be used to obtain a reasonably conservative but stable estimate of the UCL of the mean. The two-sided Chebyshev Theorem states that given a random variable $X$ with finite mean and standard deviation, $\mu$ and $\sigma$, we have:

$$P(-j\sigma \le X - \mu \le j\sigma) \ge 1 - \frac{1}{j^2}. \tag{26}$$

Here, $j$ is a positive real number. This result can be applied with the sample mean, $\bar{x}$, to obtain a conservative UCL for the population mean. Specifically, a $(1 - \alpha)100\%$ UCL of the mean, $\mu$, is given by: (27) Of course, this would require the user to know the value of $\sigma$.
The obvious modification would be to replace $\sigma$ with the sample standard deviation, $s_x$, but this is estimated from data, and therefore the result is no longer guaranteed to be conservative. In general, if $\mu$ is an unknown mean, $\hat{\mu}$ is an estimate, and $\hat{\sigma}(\hat{\mu})$ is an estimate of the standard error of $\hat{\mu}$, then the quantity UCL = $\hat{\mu} + j\,\hat{\sigma}(\hat{\mu})$, with the multiplier $j$ determined by the Chebyshev inequality at the 95% level, will provide a 95% UCL for $\mu$, which should tend to be conservative, but this is not assured. In this article we use equation (27) to compute a 95% UCL of the mean based upon the Chebyshev inequality. From the Monte Carlo results discussed in Section 5, it is observed that for highly skewed data sets (with $k < 0.5$), the coverage provided by the Chebyshev UCL is smaller than the specified coverage of 0.95. This is especially true when the sample size is smaller than 20. As expected, for larger sample sizes, the coverage provided by the Chebyshev UCL is at least 95%. This means that for larger samples, the Chebyshev UCL will result in a higher (but stable) UCL of the gamma, $G(k, \theta)$, mean.

Bootstrap Procedures

General methods for deriving estimates, such as the method of maximum likelihood, often result in estimates that are biased. Bootstrap procedures as discussed by Efron (1982) are nonparametric statistical techniques which can be used to reduce bias of point estimates and construct approximate confidence intervals for parameters such as the population mean. These procedures require no assumptions regarding the statistical distribution (e.g., normal, lognormal, gamma) for the underlying population, and can be applied to a variety of situations no matter how complicated. However, it should be pointed out that the use of a parametric statistical method (depending upon distributional assumptions), when appropriate, is more efficient than its nonparametric counterpart. In practice, parametric assumptions are often difficult to justify, especially in environmental applications.
In these cases, nonparametric methods provide valuable tools for obtaining reliable estimates of the parameters of interest. Use of these methods has been considered in environmental applications (Singh, Singh, and Engelhardt, 1997, 1999; Schulz and Griffin, 1999). Some of those methods are described as follows.

Let $x_1, x_2, \ldots, x_n$ be a random sample of size n from a population with an unknown parameter $\theta$ (e.g., $\theta = \mu$) and let $\hat{\theta}$ be an estimate of $\theta$ which is a function of all n observations. For example, the parameter $\theta$ could be the mean, and a reasonable choice for the estimate $\hat{\theta}$ might be the sample mean $\bar{x}$. In the bootstrap procedures, repeated samples of size n are drawn with replacement from the given set of observations. The process is repeated a large number of times (e.g., 1000), and each time an estimate, $\hat{\theta}_i$, of $\theta$ (the mean, here) is computed. The estimates thus obtained are used to compute an estimate of the standard error of $\hat{\theta}$. There exists in the statistical literature an extensive array of different bootstrap methods for constructing confidence intervals. In this article three of those methods are considered: 1) the standard bootstrap method, 2) the bootstrap t method (Efron, 1982; Hall, 1988), and 3) Hall's bootstrap method (Hall, 1992; Manly, 1997).

3.6 UCL of the Mean Based Upon the Standard Bootstrap Method

Step 1. Let $(x_{i1}, x_{i2}, \ldots, x_{in})$ represent the i-th sample of size n drawn with replacement from the original data set $(x_1, x_2, \ldots, x_n)$. Compute the sample mean $\bar{x}_i$ of the i-th sample.

Step 2. Repeat Step 1 independently N times (e.g., [...]), each time calculating a new estimate. Denote these estimates by $\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots, \bar{x}_N$. The bootstrap estimate of the population mean is the arithmetic mean, $\bar{x}_B$, of the N estimates $\bar{x}_i$. The bootstrap estimate of the standard error is given by:

$$\hat{\sigma}_B = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(\bar{x}_i - \bar{x}_B\right)^2} \qquad (28)$$

The general bootstrap estimate, denoted by $\bar{\theta}_B$, is the arithmetic mean of the N estimates. The difference, $\bar{\theta}_B - \hat{\theta}$, provides an estimate of the bias of the estimate, $\hat{\theta}$.
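Steps 1 and 2 above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the function names, the fixed random seed, and the data values are hypothetical and are not taken from this paper or from ProUCL. The standard error is computed as the standard deviation of the N resample means with divisor N - 1, which is an assumption about the form of equation (28).

```python
import random
import statistics


def bootstrap_means(data, n_boot=1000, seed=0):
    """Steps 1 and 2: draw n_boot resamples of size n with replacement
    and return the list of resample means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]


def bootstrap_summary(data, n_boot=1000, seed=0):
    """Return (bootstrap mean, bootstrap standard error, bias estimate)."""
    means = bootstrap_means(data, n_boot, seed)
    boot_mean = statistics.fmean(means)
    # Standard deviation of the N resample means (divisor N - 1).
    boot_se = statistics.stdev(means)
    # Bias estimate: bootstrap mean minus the original sample mean.
    bias = boot_mean - statistics.fmean(data)
    return boot_mean, boot_se, bias


# Hypothetical right-skewed concentrations (not data from this paper).
sample = [0.3, 1.4, 13.2, 158.0, 70.7, 25.1, 63.7, 62.5, 11.6, 4.2]
boot_mean, boot_se, bias = bootstrap_summary(sample)
print(f"bootstrap mean = {boot_mean:.2f}, SE = {boot_se:.2f}, bias = {bias:.2f}")
```

For the sample mean, the bias estimate is expected to be close to zero; the same resampling loop applies to other estimates by replacing the call to `statistics.fmean` inside `bootstrap_means`.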
The standard bootstrap confidence interval is derived from the following pivotal quantity, t:

$$t = \frac{\hat{\theta} - \theta}{\hat{\sigma}_B} \qquad (29)$$

A $(1-\alpha)100\%$ standard bootstrap UCL for $\theta$, which assumes that equation (29) is approximately normal, is given as follows:

$$\mathrm{UCL} = \hat{\theta} + z_{\alpha}\,\hat{\sigma}_B \qquad (30)$$

where $z_{\alpha}$ is the upper $\alpha$-th quantile of the standard normal distribution. It is observed that the standard bootstrap method does not adequately adjust for skewness, and the UCL given by equation (30) fails to provide the specified $(1-\alpha)100\%$ coverage of the population mean of skewed data distributions.

3.7 UCL of the Mean Based Upon the Bootstrap t Method

Another variation of the bootstrap method, called the "bootstrap t" by Efron (1982), is a nonparametric procedure which uses the bootstrap methodology to estimate quantiles of the t-statistic, given by (29), directly from data (Hall, 1988). In practice, for non-normal populations, the required t-quantiles may not be easily obtained, or may be impossible to derive exactly. In this method, as before in Steps 1 and 2 described above, $\bar{x}$ is the sample mean computed from the original data, and $\bar{x}_i$ and $s_{x,i}$ are the sample mean and sample standard deviation computed from the i-th resampling of the original data. The N quantities $t_i = \sqrt{n}\,(\bar{x}_i - \bar{x})/s_{x,i}$ are computed and sorted, yielding ordered quantities $t_{(1)} \le t_{(2)} \le \cdots \le t_{(N)}$. The estimate of the lower $\alpha$-th quantile of the pivotal quantity (29) is $t_{\alpha,B} = t_{(\alpha N)}$. For example, if N = 1000 bootstrap samples are generated, then the 50th ordered value, $t_{(50)}$, would be the bootstrap estimate of the lower 0.05th quantile of the t-statistic as given by (29). Then a $(1-\alpha)100\%$ UCL of the mean based upon the bootstrap t method is given as follows:

$$\mathrm{UCL} = \bar{x} - t_{(\alpha N)}\,\frac{s_x}{\sqrt{n}} \qquad (31)$$

3.8 UCL of the Mean Based Upon Hall's Bootstrap Method

Hall (1992) proposed a bootstrap method which adjusts for bias as well as skewness. In this method, $\bar{x}_i$, $s_{x,i}$, and $\hat{k}_{3i}$, that is, the sample mean, sample standard deviation, and sample skewness, respectively (as given in Section 3.3 above), are computed from the i-th resampling (i = 1, 2, ..., N) of the original data. Let $\bar{x}$ be the sample mean, $s_x$ be the sample standard deviation, and $\hat{k}_3$ be the sample skewness computed from the original data. The quantities $W_i$ and $Q_i$ given as follows are computed for each of the N bootstrap samples:

$$W_i = \frac{\bar{x}_i - \bar{x}}{s_{x,i}}, \qquad Q_i(W_i) = W_i + \frac{\hat{k}_{3i} W_i^2}{3} + \frac{\hat{k}_{3i}^2 W_i^3}{27} + \frac{\hat{k}_{3i}}{6n}$$

The quantities $Q_i(W_i)$ given above are arranged in ascending order. For a specified $(1-\alpha)$ confidence coefficient, compute the $(\alpha N)$-th ordered value, $q_{\alpha}$, of the quantities $Q_i(W_i)$. Next, compute $W(q_{\alpha})$ using the inverse function, which is given as follows:

$$W(q_{\alpha}) = \frac{3}{\hat{k}_3}\left[\left(1 + \hat{k}_3\left(q_{\alpha} - \frac{\hat{k}_3}{6n}\right)\right)^{1/3} - 1\right] \qquad (32)$$

Finally, the $(1-\alpha)100\%$ UCL of the population mean based upon Hall's bootstrap method (Manly, 1997) is given as follows:

$$\mathrm{UCL} = \bar{x} - W(q_{\alpha})\,s_x \qquad (33)$$

It is observed (Section 4) that the coverage probabilities provided by the bootstrap t and Hall's bootstrap methods are in close agreement. For larger samples these two methods approximately provide the specified 95% coverage to the population mean, $k\theta$. For smaller sample sizes, the coverage provided by these methods is only slightly lower than the specified level of 0.95. It is also noted that, for highly skewed (Figures 5-8) data sets (with $k \le 0.25$) of small size (e.g., n<10), the coverage probability provided by these two methods is higher than that provided by the Chebyshev UCL.

4. Examples

Several examples illustrating the computation of the various 95% UCLs of the population mean are included in this section. The software ProUCL (EPA 2002) has been used to compute some of the UCL values. Gamma UCLs are computed using the program Chi_test (2002). Examples are generated from the gamma distribution and the lognormal distribution, and UCLs are computed using all of the methods discussed in this paper. It is observed that for small data sets, it is not easy to distinguish between a gamma model and a lognormal model. It is further noted that use of a gamma distribution results in practical and reliable UCLs of the population mean.
Simulation results discussed in Section 5 suggest that the adjusted gamma UCL approximately provides the specified 95% coverage to the population mean for data sets with shape parameter, k, exceeding [...].

Simulated Examples from Gamma Distribution

Example 2

A data set of size 15 is generated from a gamma, G(0.2, 100), distribution with the true population mean = 20, and skewness = [...]. The data are: [...]. The data set consists of very small values as well as some large values. These types of data sets often occur in environmental applications. The sample mean is [...]. Using the Shapiro-Wilk's test, it is concluded that the data also follow a lognormal model. The standard deviation (sd) of the log-transformed data is quite large, 5.618; therefore, the H-statistic based UCL of the mean becomes impractically large. The bias-corrected MLEs of k and $\theta$ are [...] and [...], respectively. The adjusted (using the bias-corrected estimate of k) df, $\hat{\nu}^*$ = [...]. For $\alpha$ = 0.05 and n = 15, the critical probability level, $\beta$, to be used is [...] (from Table 1). The UCLs obtained using the various methods are summarized in the following table.

UCL Computation Method: 95% UCL of Mean
Approximate gamma UCL (equation (19)): [...]
Adjusted gamma UCL (equation (20)): [...]
UCL based upon t-statistic (equation (21)): [...]
UCL based upon modified t-statistic (equation (22)): [...]
UCL based upon adjusted CLT (equation (24)): [...]
UCL based upon H-statistic (equation (25)): 5.4E+13
UCL based upon Chebyshev (equation (27)): [...]
UCL based upon standard bootstrap (equation (30)): [...]
UCL based upon bootstrap t (equation (31)): [...]
Hall's bootstrap UCL (equation (33)): [...]

Note that the H-UCL becomes unacceptably large. Since the H-UCL exceeds the maximum observed value of [...], using the recommendation made in the EPA (1992) RAGS document, one would use that maximum value as an estimate of the EPC term. Simulation results summarized in the next section (Figures 6-7) suggest that for n=15 and an estimate of k = 0.165, the adjusted UCL based upon a gamma model provides the specified 95% coverage to the population mean. Therefore, for this data set, the use of the adjusted gamma UCL of [...] (equation (20)) is an appropriate choice for an estimate of the EPC term. The maximum observed value represents an overestimate of the EPC term.

Example 3

A data set of size 15 is generated from a gamma distribution with k = 0.5 and $\theta$ = 100, with mean $\mu = k\theta = 50$ and skewness = [...]. The data are: [...], [...], 0.33, 1.42, 13.17, [...], [...], 158.0, 70.65, 25.05, [...], 63.65, 62.50, 11.58, [...]. Using Shapiro-Wilk's test, it is concluded that these data cannot reject the hypothesis that the data also follow a lognormal distribution. The sample mean = [...]. The bias-corrected estimates of k and $\theta$ are [...] and [...], respectively. The adjusted df, $\nu^* = 2n\hat{k}^*$, for the Chi-square distribution = [...]. As before, for $\alpha$ = 0.05 and n = 15, the critical probability level, $\beta$ = [...]. The 95% UCLs of the mean obtained using the various methods described above are given below.
UCL Computation Method: 95% UCL of Mean
Approximate gamma UCL (equation (19)): [...]
Adjusted gamma UCL (equation (20)): [...]
UCL based upon t-statistic (equation (21)): [...]
UCL based upon modified t-statistic (equation (22)): [...]
UCL based upon adjusted CLT (equation (24)): [...]
UCL based upon H-statistic (equation (25)): [...]
UCL based upon Chebyshev (equation (27)): [...]
UCL based upon standard bootstrap (equation (30)): [...]
UCL based upon bootstrap t (equation (31)): [...]
Hall's bootstrap UCL (equation (33)): [...]

Again note that the H-UCL is [...], which is much higher than the UCLs obtained using any of the other methods. Simulation results suggest that, for n=15 and an MLE of k equal to [...], both the approximate and the adjusted UCLs based upon a gamma model provide the specified 95% coverage to the population mean (Figure 9). Also, note that the Chebyshev UCL is very close to the adjusted gamma UCL. Any of these three methods may be used to compute the UCL of the population mean.
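As a rough illustration of the Chebyshev UCL listed in the tables above, the following Python sketch assumes the form UCL = x̄ + sqrt(1/α - 1) · s_x / sqrt(n), that is, the Chebyshev bound with the unknown population standard deviation replaced by the sample standard deviation. The function name `chebyshev_ucl` and the data values are hypothetical and are not the data of Example 2 or Example 3.

```python
import math
import statistics


def chebyshev_ucl(data, alpha=0.05):
    """Chebyshev-type (1 - alpha) UCL of the mean, using the sample
    standard deviation in place of the unknown sigma."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)                 # divisor n - 1
    multiplier = math.sqrt(1.0 / alpha - 1.0)  # about 4.359 for alpha = 0.05
    return xbar + multiplier * s / math.sqrt(n)


# Hypothetical skewed data set, used only for illustration.
sample = [0.3, 1.4, 13.2, 158.0, 70.7, 25.1, 63.7, 62.5, 11.6, 4.2]
print(f"95% Chebyshev UCL = {chebyshev_ucl(sample):.2f}")
```

Because the multiplier depends only on α, the resulting UCL is stable in the sense that it changes only through the sample mean and standard deviation; as noted in the text, it is not guaranteed to be conservative once s_x replaces the true standard deviation.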

Example 4

A random sample of size n=10 is generated from a gamma (1,100) distribution with mean 100 and skewness = 2. The data are: [...]. At the 0.05 level of significance, these data cannot reject the hypothesis that the data follow a lognormal distribution. They also pass the Shapiro-Wilk's test for normality. The sample mean is [...]. The bias-corrected MLEs of k and $\theta$ are [...] and [...], respectively, and the associated df = [...]. For n=10 and $\alpha$ = 0.05, the critical probability level, $\beta$, to be used (to achieve a confidence coefficient of 0.95) is [...]. The UCLs obtained using the various methods are given as follows.

UCL Computation Method: 95% UCL of Mean
Approximate gamma UCL (equation (19)): [...]
Adjusted gamma UCL (equation (20)): [...]
UCL based upon t-statistic (equation (21)): [...]
UCL based upon modified t-statistic (equation (22)): [...]
UCL based upon adjusted CLT (equation (24)): [...]
UCL based upon H-statistic (equation (25)): [...]
UCL based upon Chebyshev (equation (27)): [...]
UCL based upon standard bootstrap (equation (30)): [...]
UCL based upon bootstrap t (equation (31)): [...]
Hall's bootstrap UCL (equation (33)): [...]

Once again, note that the H-UCL is [...], which is much higher than the UCLs obtained using any of the other methods. Simulation results summarized in the next section suggest that, for n=10 and an estimate of k equal to [...] (Figures 9-10), both the approximate and adjusted UCLs based upon the gamma model at least provide the specified 95% coverage to the population mean. The 95% Chebyshev UCL also provides the specified coverage to the population mean. For this combination of skewness and sample size, any of these three methods may be used to compute a 95% UCL of the population mean.

Example 5

A mildly skewed data set of size 10 was generated from a gamma distribution G(4,100) with mean 400 and skewness = 1. The data are: [...]. The sample mean = [...]. Based upon the Shapiro-Wilk's test, at the 0.05 level of significance, the data do not reject the hypotheses of normality as well as of lognormality.
The sd of the log-transformed data is [...]. The bias-corrected MLEs of k and $\theta$ are [...] and [...], respectively. The associated df = [...]. For n=10 and $\alpha$ = 0.05, the critical probability level, $\beta$, to be used (to achieve a confidence coefficient of 0.95) is [...]. The UCLs obtained using the various methods are given below. For this data set, the difference between the H-UCL and the other UCLs is small. Simulation results suggest that as the sample size increases, these differences in the UCLs will decrease. From these results (Figures 11-12), it is noted that for a sample of size 10 and an estimate of k=2.55, both the approximate gamma UCL and the adjusted gamma UCL at least provide the specified 95% coverage to the population mean. Either of the two methods can be used to compute a 95% UCL of the mean. The Chebyshev inequality results in an overestimate of the UCL.

UCL Computation Method: 95% UCL of Mean
Approximate gamma UCL (equation (19)): [...]
Adjusted gamma UCL (equation (20)): [...]
UCL based upon t-statistic (equation (21)): [...]
UCL based upon modified t-statistic (equation (22)): [...]
UCL based upon adjusted CLT (equation (24)): [...]
UCL based upon H-statistic (equation (25)): [...]
UCL based upon Chebyshev (equation (27)): [...]
UCL based upon standard bootstrap (equation (30)): [...]
UCL based upon bootstrap t (equation (31)): [...]
Hall's bootstrap UCL (equation (33)): [...]

Simulated Examples from Lognormal Distributions

Next we consider a couple of small data sets generated from lognormal distributions. It is observed that those data sets also follow gamma models.

Example 6

A sample of size n = 15 is generated from the lognormal distribution with parameters $\mu$ = 5, $\sigma$ = 2; the true mean of this distribution is 1096.6, the coefficient of variation (CV) is 7.32, and the skewness is [...]. The generated data are: 47.42, [...], [...], [...], 14.73, 7.67, 73.36, [...], [...], [...], 14.8, 37.32, 24.74, [...], [...]. A goodness-of-fit test showed the data distribution to be non-normal (P < 0.01) and also that the data pass the test of lognormality (P > 0.15). The software packages ExpertFit (2001) and GamGood (2002) were used to test the goodness-of-fit of the gamma distribution. The observed value of the Anderson-Darling test statistic is 1.094, and the approximate critical value for test size 0.05 is 2.492; hence an approximate gamma distribution can also be used to model the probability distribution of this data set. The Chi-square goodness-of-fit test with four equal intervals led to the same conclusion. The bias-adjusted estimates of shape, k, and scale, $\theta$, of the gamma distribution are [...] and [...], respectively. The 95% UCLs computed from the various methods are given below. Notice that the H-UCL is more than 5 times higher than the maximum concentration in the sample, and more than 10 times higher than all the other UCLs. All UCLs are larger than the true population mean (1096.6) for this data set.
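The population mean and CV quoted for this lognormal distribution can be checked from the standard lognormal moment formulas: mean = exp(μ + σ²/2), CV = sqrt(exp(σ²) - 1), and skewness = (exp(σ²) + 2) · sqrt(exp(σ²) - 1). The short Python sketch below is only an illustration; the function name `lognormal_moments` is hypothetical.

```python
import math


def lognormal_moments(mu, sigma):
    """Mean, coefficient of variation, and skewness of a lognormal
    distribution whose logarithm has mean mu and standard deviation sigma."""
    w = math.exp(sigma ** 2)
    mean = math.exp(mu + 0.5 * sigma ** 2)
    cv = math.sqrt(w - 1.0)
    skewness = (w + 2.0) * math.sqrt(w - 1.0)
    return mean, cv, skewness


mean, cv, skew = lognormal_moments(5.0, 2.0)
print(f"mean = {mean:.1f}, CV = {cv:.2f}")  # mean = 1096.6, CV = 7.32
```

The large CV indicates how heavy the right tail of this distribution is, which is consistent with the very large H-UCL noted for this example.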
From Figures 8 and 9, it is observed that for an estimate of k=0.321 and n=15, the adjusted gamma UCL = [...] provides the specified 95% coverage to the population mean.

Student's t: [...]
Adjusted CLT: [...]
Modified t: [...]
CLT: [...]
Standard Bootstrap: [...]
Bootstrap t: [...]
Hall's Bootstrap: [...]
Chebyshev (Mean, Std): [...]
H-UCL: [...]
Adjusted Gamma UCL: [...]

Continuing with this example, suppose that another sample is collected and it turns out to be below the detection limit (DL) of the instrument. Suppose further that DL = 10 and that, following EPA guidance documents, this value is replaced by DL/2 = 5. One would expect that this additional non-detect observation would result in a reduction of the UCL. The UCLs calculated from this sample of n = 16 observations are given below:

Student's t: 1884
Adjusted CLT: [...]
Modified t: [...]
CLT: [...]
Standard Bootstrap: [...]
Bootstrap t: [...]
Chebyshev: [...]
H-UCL: [...]
Gamma UCL: [...]
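The bootstrap t UCL listed in the tables above can be sketched along the lines of Section 3.7. The Python code below is a minimal illustration, not the ProUCL implementation: it assumes the UCL has the form x̄ - t_(αN) · s_x / sqrt(n), where t_(αN) is the (αN)-th ordered value of the bootstrapped quantities t_i = sqrt(n)(x̄_i - x̄)/s_{x,i}. The function name, the seed, and the data values are hypothetical.

```python
import math
import random
import statistics


def bootstrap_t_ucl(data, alpha=0.05, n_boot=1000, seed=0):
    """Bootstrap t (1 - alpha) UCL of the mean (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)
    t_values = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)
        s_i = statistics.stdev(resample)
        if s_i == 0.0:
            continue  # degenerate resample (all values equal); skip it
        t_i = math.sqrt(n) * (statistics.fmean(resample) - xbar) / s_i
        t_values.append(t_i)
    t_values.sort()
    # The (alpha * N)-th ordered value estimates the lower alpha quantile,
    # e.g. the 50th ordered value when N = 1000 and alpha = 0.05.
    index = max(int(alpha * len(t_values)) - 1, 0)
    t_alpha = t_values[index]
    return xbar - t_alpha * s / math.sqrt(n)


# Hypothetical right-skewed concentrations (not data from this paper).
sample = [0.3, 1.4, 13.2, 158.0, 70.7, 25.1, 63.7, 62.5, 11.6, 4.2]
print(f"95% bootstrap t UCL = {bootstrap_t_ucl(sample):.2f}")
```

For right-skewed data the lower quantile t_(αN) is typically a large negative number, so the bootstrap t UCL tends to sit further above the sample mean than a UCL based on a symmetric normal or Student's t quantile.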


More information

INSTITUTE AND FACULTY OF ACTUARIES. Curriculum 2019 SPECIMEN EXAMINATION

INSTITUTE AND FACULTY OF ACTUARIES. Curriculum 2019 SPECIMEN EXAMINATION INSTITUTE AND FACULTY OF ACTUARIES Curriculum 2019 SPECIMEN EXAMINATION Subject CS1A Actuarial Statistics Time allowed: Three hours and fifteen minutes INSTRUCTIONS TO THE CANDIDATE 1. Enter all the candidate

More information

1. You are given the following information about a stationary AR(2) model:

1. You are given the following information about a stationary AR(2) model: Fall 2003 Society of Actuaries **BEGINNING OF EXAMINATION** 1. You are given the following information about a stationary AR(2) model: (i) ρ 1 = 05. (ii) ρ 2 = 01. Determine φ 2. (A) 0.2 (B) 0.1 (C) 0.4

More information

Fitting parametric distributions using R: the fitdistrplus package

Fitting parametric distributions using R: the fitdistrplus package Fitting parametric distributions using R: the fitdistrplus package M. L. Delignette-Muller - CNRS UMR 5558 R. Pouillot J.-B. Denis - INRA MIAJ user! 2009,10/07/2009 Background Specifying the probability

More information

MATH 3200 Exam 3 Dr. Syring

MATH 3200 Exam 3 Dr. Syring . Suppose n eligible voters are polled (randomly sampled) from a population of size N. The poll asks voters whether they support or do not support increasing local taxes to fund public parks. Let M be

More information

Mean GMM. Standard error

Mean GMM. Standard error Table 1 Simple Wavelet Analysis for stocks in the S&P 500 Index as of December 31 st 1998 ^ Shapiro- GMM Normality 6 0.9664 0.00281 11.36 4.14 55 7 0.9790 0.00300 56.58 31.69 45 8 0.9689 0.00319 403.49

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Maximum Likelihood Estimation The likelihood and log-likelihood functions are the basis for deriving estimators for parameters, given data. While the shapes of these two functions are different, they have

More information

Paper Series of Risk Management in Financial Institutions

Paper Series of Risk Management in Financial Institutions - December, 007 Paper Series of Risk Management in Financial Institutions The Effect of the Choice of the Loss Severity Distribution and the Parameter Estimation Method on Operational Risk Measurement*

More information

Confidence Intervals Introduction

Confidence Intervals Introduction Confidence Intervals Introduction A point estimate provides no information about the precision and reliability of estimation. For example, the sample mean X is a point estimate of the population mean μ

More information

MM and ML for a sample of n = 30 from Gamma(3,2) ===============================================

MM and ML for a sample of n = 30 from Gamma(3,2) =============================================== and for a sample of n = 30 from Gamma(3,2) =============================================== Generate the sample with shape parameter α = 3 and scale parameter λ = 2 > x=rgamma(30,3,2) > x [1] 0.7390502

More information

Jackknife Empirical Likelihood Inferences for the Skewness and Kurtosis

Jackknife Empirical Likelihood Inferences for the Skewness and Kurtosis Georgia State University ScholarWorks @ Georgia State University Mathematics Theses Department of Mathematics and Statistics 5-10-2014 Jackknife Empirical Likelihood Inferences for the Skewness and Kurtosis

More information

Review: Population, sample, and sampling distributions

Review: Population, sample, and sampling distributions Review: Population, sample, and sampling distributions A population with mean µ and standard deviation σ For instance, µ = 0, σ = 1 0 1 Sample 1, N=30 Sample 2, N=30 Sample 100000000000 InterquartileRange

More information

Statistical Methodology. A note on a two-sample T test with one variance unknown

Statistical Methodology. A note on a two-sample T test with one variance unknown Statistical Methodology 8 (0) 58 534 Contents lists available at SciVerse ScienceDirect Statistical Methodology journal homepage: www.elsevier.com/locate/stamet A note on a two-sample T test with one variance

More information

SOCIETY OF ACTUARIES EXAM STAM SHORT-TERM ACTUARIAL MATHEMATICS EXAM STAM SAMPLE QUESTIONS

SOCIETY OF ACTUARIES EXAM STAM SHORT-TERM ACTUARIAL MATHEMATICS EXAM STAM SAMPLE QUESTIONS SOCIETY OF ACTUARIES EXAM STAM SHORT-TERM ACTUARIAL MATHEMATICS EXAM STAM SAMPLE QUESTIONS Questions 1-307 have been taken from the previous set of Exam C sample questions. Questions no longer relevant

More information

Tests for Two ROC Curves

Tests for Two ROC Curves Chapter 65 Tests for Two ROC Curves Introduction Receiver operating characteristic (ROC) curves are used to summarize the accuracy of diagnostic tests. The technique is used when a criterion variable is

More information

FINITE SAMPLE DISTRIBUTIONS OF RISK-RETURN RATIOS

FINITE SAMPLE DISTRIBUTIONS OF RISK-RETURN RATIOS Available Online at ESci Journals Journal of Business and Finance ISSN: 305-185 (Online), 308-7714 (Print) http://www.escijournals.net/jbf FINITE SAMPLE DISTRIBUTIONS OF RISK-RETURN RATIOS Reza Habibi*

More information

μ: ESTIMATES, CONFIDENCE INTERVALS, AND TESTS Business Statistics

μ: ESTIMATES, CONFIDENCE INTERVALS, AND TESTS Business Statistics μ: ESTIMATES, CONFIDENCE INTERVALS, AND TESTS Business Statistics CONTENTS Estimating parameters The sampling distribution Confidence intervals for μ Hypothesis tests for μ The t-distribution Comparison

More information

A Convenient Way of Generating Normal Random Variables Using Generalized Exponential Distribution

A Convenient Way of Generating Normal Random Variables Using Generalized Exponential Distribution A Convenient Way of Generating Normal Random Variables Using Generalized Exponential Distribution Debasis Kundu 1, Rameshwar D. Gupta 2 & Anubhav Manglick 1 Abstract In this paper we propose a very convenient

More information

SYLLABUS OF BASIC EDUCATION SPRING 2018 Construction and Evaluation of Actuarial Models Exam 4

SYLLABUS OF BASIC EDUCATION SPRING 2018 Construction and Evaluation of Actuarial Models Exam 4 The syllabus for this exam is defined in the form of learning objectives that set forth, usually in broad terms, what the candidate should be able to do in actual practice. Please check the Syllabus Updates

More information

Analysis of 2x2 Cross-Over Designs using T-Tests for Non-Inferiority

Analysis of 2x2 Cross-Over Designs using T-Tests for Non-Inferiority Chapter 235 Analysis of 2x2 Cross-Over Designs using -ests for Non-Inferiority Introduction his procedure analyzes data from a two-treatment, two-period (2x2) cross-over design where the goal is to demonstrate

More information

1. Covariance between two variables X and Y is denoted by Cov(X, Y) and defined by. Cov(X, Y ) = E(X E(X))(Y E(Y ))

1. Covariance between two variables X and Y is denoted by Cov(X, Y) and defined by. Cov(X, Y ) = E(X E(X))(Y E(Y )) Correlation & Estimation - Class 7 January 28, 2014 Debdeep Pati Association between two variables 1. Covariance between two variables X and Y is denoted by Cov(X, Y) and defined by Cov(X, Y ) = E(X E(X))(Y

More information

Practice Exam 1. Loss Amount Number of Losses

Practice Exam 1. Loss Amount Number of Losses Practice Exam 1 1. You are given the following data on loss sizes: An ogive is used as a model for loss sizes. Determine the fitted median. Loss Amount Number of Losses 0 1000 5 1000 5000 4 5000 10000

More information

Cambridge University Press Risk Modelling in General Insurance: From Principles to Practice Roger J. Gray and Susan M.

Cambridge University Press Risk Modelling in General Insurance: From Principles to Practice Roger J. Gray and Susan M. adjustment coefficient, 272 and Cramér Lundberg approximation, 302 existence, 279 and Lundberg s inequality, 272 numerical methods for, 303 properties, 272 and reinsurance (case study), 348 statistical

More information

Introduction Dickey-Fuller Test Option Pricing Bootstrapping. Simulation Methods. Chapter 13 of Chris Brook s Book.

Introduction Dickey-Fuller Test Option Pricing Bootstrapping. Simulation Methods. Chapter 13 of Chris Brook s Book. Simulation Methods Chapter 13 of Chris Brook s Book Christopher Ting http://www.mysmu.edu/faculty/christophert/ Christopher Ting : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036 April 26, 2017 Christopher

More information

Application of MCMC Algorithm in Interest Rate Modeling

Application of MCMC Algorithm in Interest Rate Modeling Application of MCMC Algorithm in Interest Rate Modeling Xiaoxia Feng and Dejun Xie Abstract Interest rate modeling is a challenging but important problem in financial econometrics. This work is concerned

More information

Presented at the 2012 SCEA/ISPA Joint Annual Conference and Training Workshop -

Presented at the 2012 SCEA/ISPA Joint Annual Conference and Training Workshop - Applying the Pareto Principle to Distribution Assignment in Cost Risk and Uncertainty Analysis James Glenn, Computer Sciences Corporation Christian Smart, Missile Defense Agency Hetal Patel, Missile Defense

More information

1) 3 points Which of the following is NOT a measure of central tendency? a) Median b) Mode c) Mean d) Range

1) 3 points Which of the following is NOT a measure of central tendency? a) Median b) Mode c) Mean d) Range February 19, 2004 EXAM 1 : Page 1 All sections : Geaghan Read Carefully. Give an answer in the form of a number or numeric expression where possible. Show all calculations. Use a value of 0.05 for any

More information

Chapter 7 Sampling Distributions and Point Estimation of Parameters

Chapter 7 Sampling Distributions and Point Estimation of Parameters Chapter 7 Sampling Distributions and Point Estimation of Parameters Part 1: Sampling Distributions, the Central Limit Theorem, Point Estimation & Estimators Sections 7-1 to 7-2 1 / 25 Statistical Inferences

More information

Statistical Analysis of Data from the Stock Markets. UiO-STK4510 Autumn 2015

Statistical Analysis of Data from the Stock Markets. UiO-STK4510 Autumn 2015 Statistical Analysis of Data from the Stock Markets UiO-STK4510 Autumn 2015 Sampling Conventions We observe the price process S of some stock (or stock index) at times ft i g i=0,...,n, we denote it by

More information

The Assumption(s) of Normality

The Assumption(s) of Normality The Assumption(s) of Normality Copyright 2000, 2011, 2016, J. Toby Mordkoff This is very complicated, so I ll provide two versions. At a minimum, you should know the short one. It would be great if you

More information

STA2601. Tutorial letter 105/2/2018. Applied Statistics II. Semester 2. Department of Statistics STA2601/105/2/2018 TRIAL EXAMINATION PAPER

STA2601. Tutorial letter 105/2/2018. Applied Statistics II. Semester 2. Department of Statistics STA2601/105/2/2018 TRIAL EXAMINATION PAPER STA2601/105/2/2018 Tutorial letter 105/2/2018 Applied Statistics II STA2601 Semester 2 Department of Statistics TRIAL EXAMINATION PAPER Define tomorrow. university of south africa Dear Student Congratulations

More information

MODELLING OF INCOME AND WAGE DISTRIBUTION USING THE METHOD OF L-MOMENTS OF PARAMETER ESTIMATION

MODELLING OF INCOME AND WAGE DISTRIBUTION USING THE METHOD OF L-MOMENTS OF PARAMETER ESTIMATION International Days of Statistics and Economics, Prague, September -3, MODELLING OF INCOME AND WAGE DISTRIBUTION USING THE METHOD OF L-MOMENTS OF PARAMETER ESTIMATION Diana Bílková Abstract Using L-moments

More information

χ 2 distributions and confidence intervals for population variance

χ 2 distributions and confidence intervals for population variance χ 2 distributions and confidence intervals for population variance Let Z be a standard Normal random variable, i.e., Z N(0, 1). Define Y = Z 2. Y is a non-negative random variable. Its distribution is

More information

A Markov Chain Monte Carlo Approach to Estimate the Risks of Extremely Large Insurance Claims

A Markov Chain Monte Carlo Approach to Estimate the Risks of Extremely Large Insurance Claims International Journal of Business and Economics, 007, Vol. 6, No. 3, 5-36 A Markov Chain Monte Carlo Approach to Estimate the Risks of Extremely Large Insurance Claims Wan-Kai Pang * Department of Applied

More information

A Robust Test for Normality

A Robust Test for Normality A Robust Test for Normality Liangjun Su Guanghua School of Management, Peking University Ye Chen Guanghua School of Management, Peking University Halbert White Department of Economics, UCSD March 11, 2006

More information

Market Risk Analysis Volume I

Market Risk Analysis Volume I Market Risk Analysis Volume I Quantitative Methods in Finance Carol Alexander John Wiley & Sons, Ltd List of Figures List of Tables List of Examples Foreword Preface to Volume I xiii xvi xvii xix xxiii

More information

A Test of the Normality Assumption in the Ordered Probit Model *

A Test of the Normality Assumption in the Ordered Probit Model * A Test of the Normality Assumption in the Ordered Probit Model * Paul A. Johnson Working Paper No. 34 March 1996 * Assistant Professor, Vassar College. I thank Jahyeong Koo, Jim Ziliak and an anonymous

More information

Statistical Intervals (One sample) (Chs )

Statistical Intervals (One sample) (Chs ) 7 Statistical Intervals (One sample) (Chs 8.1-8.3) Confidence Intervals The CLT tells us that as the sample size n increases, the sample mean X is close to normally distributed with expected value µ and

More information

One sample z-test and t-test

One sample z-test and t-test One sample z-test and t-test January 30, 2017 psych10.stanford.edu Announcements / Action Items Install ISI package (instructions in Getting Started with R) Assessment Problem Set #3 due Tu 1/31 at 7 PM

More information

An Improved Saddlepoint Approximation Based on the Negative Binomial Distribution for the General Birth Process

An Improved Saddlepoint Approximation Based on the Negative Binomial Distribution for the General Birth Process Computational Statistics 17 (March 2002), 17 28. An Improved Saddlepoint Approximation Based on the Negative Binomial Distribution for the General Birth Process Gordon K. Smyth and Heather M. Podlich Department

More information

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek HE DISRIBUION OF LOAN PORFOLIO VALUE * Oldrich Alfons Vasicek he amount of capital necessary to support a portfolio of debt securities depends on the probability distribution of the portfolio loss. Consider

More information

Contents. An Overview of Statistical Applications CHAPTER 1. Contents (ix) Preface... (vii)

Contents. An Overview of Statistical Applications CHAPTER 1. Contents (ix) Preface... (vii) Contents (ix) Contents Preface... (vii) CHAPTER 1 An Overview of Statistical Applications 1.1 Introduction... 1 1. Probability Functions and Statistics... 1..1 Discrete versus Continuous Functions... 1..

More information

Inferences on Correlation Coefficients of Bivariate Log-normal Distributions

Inferences on Correlation Coefficients of Bivariate Log-normal Distributions Inferences on Correlation Coefficients of Bivariate Log-normal Distributions Guoyi Zhang 1 and Zhongxue Chen 2 Abstract This article considers inference on correlation coefficients of bivariate log-normal

More information

Simulation Wrap-up, Statistics COS 323

Simulation Wrap-up, Statistics COS 323 Simulation Wrap-up, Statistics COS 323 Today Simulation Re-cap Statistics Variance and confidence intervals for simulations Simulation wrap-up FYI: No class or office hours Thursday Simulation wrap-up

More information

NCSS Statistical Software. Reference Intervals

NCSS Statistical Software. Reference Intervals Chapter 586 Introduction A reference interval contains the middle 95% of measurements of a substance from a healthy population. It is a type of prediction interval. This procedure calculates one-, and

More information

Tests for One Variance

Tests for One Variance Chapter 65 Introduction Occasionally, researchers are interested in the estimation of the variance (or standard deviation) rather than the mean. This module calculates the sample size and performs power

More information

A New Test for Correlation on Bivariate Nonnormal Distributions

A New Test for Correlation on Bivariate Nonnormal Distributions Journal of Modern Applied Statistical Methods Volume 5 Issue Article 8 --06 A New Test for Correlation on Bivariate Nonnormal Distributions Ping Wang Great Basin College, ping.wang@gbcnv.edu Ping Sa University

More information

Ideal Bootstrapping and Exact Recombination: Applications to Auction Experiments

Ideal Bootstrapping and Exact Recombination: Applications to Auction Experiments Ideal Bootstrapping and Exact Recombination: Applications to Auction Experiments Carl T. Bergstrom University of Washington, Seattle, WA Theodore C. Bergstrom University of California, Santa Barbara Rodney

More information

Resampling techniques to determine direction of effects in linear regression models

Resampling techniques to determine direction of effects in linear regression models Resampling techniques to determine direction of effects in linear regression models Wolfgang Wiedermann, Michael Hagmann, Michael Kossmeier, & Alexander von Eye University of Vienna, Department of Psychology

More information

Data Analysis. BCF106 Fundamentals of Cost Analysis

Data Analysis. BCF106 Fundamentals of Cost Analysis Data Analysis BCF106 Fundamentals of Cost Analysis June 009 Chapter 5 Data Analysis 5.0 Introduction... 3 5.1 Terminology... 3 5. Measures of Central Tendency... 5 5.3 Measures of Dispersion... 7 5.4 Frequency

More information