
Sociedad Española de Estadística e Investigación Operativa
Test, Volume 12, Number 1, June 2003

Power and Sample Size Calculation for 2x2 Tables under Multinomial Sampling with Random Loss

Kung-Jong Lui, Department of Mathematics and Statistics, San Diego State University.
William G. Cumberland, Department of Biostatistics, University of California, Los Angeles.

Abstract

Multinomial sampling, in which the total number of sampled subjects is fixed, is probably one of the most commonly used sampling schemes in categorical data analysis. When we apply multinomial sampling to collect subjects who are subject to random exclusion from the data analysis, the number of subjects falling into each comparison group is random and can be small with positive probability. Thus, applying the traditional statistics derived from large sample theory for testing equality between two independent proportions can sometimes be theoretically invalid. On the other hand, using Fisher's exact test always assures that the true type I error is less than or equal to a nominal α-level. We therefore discuss power and sample size calculation based on this exact test. For a desired power at a given α-level, we develop an exact sample size calculation procedure that accounts for a random loss of sampled subjects when testing equality between two independent proportions under multinomial sampling. Because the exact procedure requires intensive computation when the underlying required sample size is large, we also present an approximate sample size formula based on large sample theory. On the basis of Monte Carlo simulation, we note that the power achieved by this approximate sample size formula generally agrees well with the desired power under the exact test. Finally, we propose a trial-and-error procedure that uses the approximate sample size as an initial estimate together with Monte Carlo simulation to expedite the search for the minimum required sample size.

Key Words: Sample size determination, Fisher's exact test, multinomial sampling, power.

AMS subject classification: 62F03, 62A05.

Correspondence to: K.-J. Lui, Department of Mathematics and Statistics, San Diego State University, San Diego, CA 92182, USA. kjl@rohan.sdsu.edu

Received: November 2001; Accepted: May 2002

1 Introduction

Multinomial sampling, in which the total number of studied subjects is fixed, is probably one of the most commonly considered sampling designs in categorical data analysis (Bishop et al., 1975). Consider an epidemiological prevalence study, in which we take a random sample from a general population and want to compare the prevalence of disease between the subpopulations exposed and not exposed to a risk factor of interest. Or consider a clinical trial, in which we randomly assign each patient to receive treatment A or B with fixed probabilities and wish to compare the response rate between the two treatments. In either case, the number of subjects falling into the comparison groups is random. Furthermore, it is common that some sampled subjects must be excluded from the data because of missing information. As noted elsewhere (Skalski, 1992; Lui, 1994), sample size determination that fails to take the potential loss of subjects into account can result in studies with inadequate power.

Because the number of sampled subjects falling into the two comparison groups can be small with positive probability under multinomial sampling with a random loss of sampled subjects, traditional statistics based on large sample theory for testing equality between two independent proportions (Fleiss, 1981) can sometimes be theoretically invalid. However, Fisher's exact test (Fisher, 1935; Irwin, 1935; Yates, 1934; Fleiss, 1981) always assures that the true type I error is less than or equal to a nominal α-level regardless of the number of subjects in each subpopulation. This leads us to concentrate our discussion on power and sample size calculation based on the exact test. Numerous publications on the calculation of power and sample size based on the exact test under product binomial sampling appear elsewhere (Bennett and Hsu, 1960; Haseman, 1978; Gail and Gart, 1973; Casagrande et al., 1978a; Gordon, 1994). An excellent and systematic review of sample size determination for testing differences in proportions under the two-sample design also appears in Sahai and Khurshid (1996). However, none of these papers focuses on sample size calculation under multinomial sampling with random exclusion of sampled subjects from the data analysis, as is done here.

The purpose of this paper is to extend the sample size calculation procedure proposed elsewhere (Bennett and Hsu, 1960) to accommodate multinomial sampling with a random loss of sampled subjects. To provide readers with insight into the effects of the different parameters, this paper calculates the power based on the exact multinomial distribution in a variety of situations.

Because the exact sample size calculation procedure involves intensive computation when the required sample size is large, this paper also presents an approximate sample size formula based on large sample theory. Using Monte Carlo simulation, this paper finds that the power achieved by an approximate sample size formula derived from a method analogous to that proposed elsewhere (Casagrande et al., 1978b; Fleiss et al., 1980; Fleiss, 1981) can actually be quite accurate. Finally, this paper suggests a trial-and-error procedure that uses the approximate sample size as an initial estimate and Monte Carlo simulation to expedite the search for the minimum required sample size for a desired power at a nominal α-level.

2 Notations, Power, and Sample Size Determination

Suppose that we take a random sample of n subjects, each having probability p_e of falling into one comparison group and probability 1 − p_e of falling into the other comparison group. For example, p_e may denote the population proportion of exposure in an epidemiological prevalence study or the probability of assigning a subject to the experimental treatment in a clinical trial. Because information on the exposure status or the outcome can be missing in prevalence studies, or studied subjects can be lost to follow-up in clinical trials, we assume that each sampled subject has a positive probability p_m of being excluded from the data analysis. For simplicity, we focus on the situation where the exclusion of a sampled subject is independent of both the exposure (or the treatment assignment) and the outcome status. Because the following discussion applies generally to testing equality between the proportions of two comparison groups, we use the numbers 1 and 2 to designate these groups. Let N_1, N_2, and N_3 denote the random frequencies corresponding to groups 1 and 2 and to the group of subjects who will be excluded from the comparison. The random vector N = (N_1, N_2, N_3) then follows a trinomial distribution:

    f_N(n | p_e, p_m) = [n! / (n_1! n_2! n_3!)] π_1^{n_1} π_2^{n_2} π_3^{n_3},    (2.1)

where n = (n_1, n_2, n_3), π_1 = (1 − p_m) p_e, π_2 = (1 − p_m)(1 − p_e), π_3 = p_m, 0 ≤ n_i ≤ n, and Σ_i n_i = n.
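
As a quick numerical check on (2.1), the following minimal Python sketch evaluates the trinomial probability of a particular allocation; the parameter values and variable names are illustrative, not taken from the paper.

```python
# Minimal sketch of the trinomial model (2.1); parameter values are illustrative.
from scipy.stats import multinomial

n, p_e, p_m = 50, 0.30, 0.10
cell_probs = [(1 - p_m) * p_e,        # pi_1: retained and in group 1
              (1 - p_m) * (1 - p_e),  # pi_2: retained and in group 2
              p_m]                    # pi_3: excluded from the analysis

# Probability of observing (n1, n2, n3) = (15, 30, 5) under (2.1)
print(multinomial.pmf([15, 30, 5], n=n, p=cell_probs))
```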

Let p_1 and p_2 denote the probabilities that a randomly selected subject from group 1 and group 2, respectively, has the outcome of interest. We consider first the one-sided test. Suppose that we want to test the null hypothesis H_0: p_1 = p_2 against the alternative hypothesis H_a: p_1 > p_2. Let X_i denote the number of subjects with the outcome of interest among the N_i subjects from group i (i = 1, 2), and let T = X_1 + X_2 denote the total number of subjects with the outcome of interest in the sample. Then, given N_1 = n_1, N_2 = n_2, and T = t, the conditional distribution of X_1 is well known to be

    P(X_1 = x_1 | t, n_1, n_2, p_1, p_2) = (n_1 choose x_1)(n_2 choose t − x_1) φ^{x_1} / Σ_{x=a}^{b} (n_1 choose x)(n_2 choose t − x) φ^{x},    (2.2)

where a ≤ x_1 ≤ b, a = max(0, t − n_2), b = min(t, n_1), and φ = p_1(1 − p_2)/[(1 − p_1) p_2] is the odds ratio of possessing the outcome of interest between groups 1 and 2. When the null hypothesis H_0: p_1 = p_2 (i.e., φ = 1) is true, the conditional distribution (2.2) of X_1 reduces to the hypergeometric distribution:

    P(X_1 = x_1 | t, n_1, n_2, p_1 = p_2) = (n_1 choose x_1)(n_2 choose t − x_1) / (n_1 + n_2 choose t).    (2.3)

Under the alternative hypothesis H_a: p_1 > p_2, we expect the value of X_1 to be large. Thus, the critical region C(α) of a nominal α-level (one-sided test) consists of {X_1 : X_1 ≥ x_1*}, where x_1* is the smallest integer such that Σ_{x_1 ≥ x_1*} P(X_1 = x_1 | t, n_1, n_2, p_1 = p_2) ≤ α. The conditional power, given n_1, n_2, and t, is then

    q(α, p_1, p_2 | n_1, n_2, t) = Σ_{x_1 ∈ C(α)} P(X_1 = x_1 | t, n_1, n_2, p_1, p_2),    (2.4)

where P(X_1 = x_1 | t, n_1, n_2, p_1, p_2) is given by (2.2). Thus, the conditional power given n_1 and n_2 is

    q(α, p_1, p_2 | n_1, n_2) = Σ_{t=0}^{n_1+n_2} q(α, p_1, p_2 | n_1, n_2, t) f_T(t | n_1, n_2),    (2.5)

where f_T(t | n_1, n_2) = Σ_{x=a}^{b} (n_1 choose x)(n_2 choose t − x) p_1^x (1 − p_1)^{n_1 − x} p_2^{t − x} (1 − p_2)^{n_2 − (t − x)}, with a = max(0, t − n_2) and b = min(t, n_1). Bennett and Hsu (1960) base their sample size calculation on (2.5) for studies in which the number of studied subjects from each comparison group is fixed.
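
To make (2.2) through (2.5) concrete, here is a small, unoptimized Python sketch that computes the conditional power (2.5) of the one-sided exact test by direct enumeration. It is illustrative only (hypothetical function names), assuming n_1 and n_2 are small enough for enumeration to be feasible.

```python
# Sketch of the conditional power (2.5) for the one-sided exact test;
# direct enumeration, suitable only for modest n1 and n2.
from math import comb

def nc_hypergeom_pmf(x1, t, n1, n2, phi):
    """Conditional distribution (2.2) of X1 given T = t, with odds ratio phi."""
    a, b = max(0, t - n2), min(t, n1)
    denom = sum(comb(n1, x) * comb(n2, t - x) * phi**x for x in range(a, b + 1))
    return comb(n1, x1) * comb(n2, t - x1) * phi**x1 / denom

def conditional_power(alpha, p1, p2, n1, n2):
    """Expected power (2.5) of the exact test, conditional on group sizes n1, n2."""
    phi = p1 * (1 - p2) / ((1 - p1) * p2)   # odds ratio
    power = 0.0
    for t in range(n1 + n2 + 1):
        a, b = max(0, t - n2), min(t, n1)
        # f_T(t | n1, n2): distribution of the total number of outcomes of interest
        f_t = sum(comb(n1, x) * comb(n2, t - x)
                  * p1**x * (1 - p1)**(n1 - x)
                  * p2**(t - x) * (1 - p2)**(n2 - (t - x))
                  for x in range(a, b + 1))
        # x1_star: smallest value whose null (phi = 1) upper-tail probability <= alpha
        for x1_star in range(a, b + 2):
            tail = sum(nc_hypergeom_pmf(x, t, n1, n2, 1.0) for x in range(x1_star, b + 1))
            if tail <= alpha:
                break
        # conditional power (2.4) given t, weighted by f_T(t | n1, n2)
        power += f_t * sum(nc_hypergeom_pmf(x, t, n1, n2, phi)
                           for x in range(x1_star, b + 1))
    return power

print(conditional_power(0.05, p1=0.40, p2=0.10, n1=20, n2=40))
```

Calling this with fixed group sizes corresponds to the setting of Bennett and Hsu (1960); the development below averages this quantity over the random allocation.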

We cannot directly apply (2.5) to calculate power when n_1 and n_2 are random, nor when some n_3 subjects are randomly excluded from the data. Instead, we consider the expected power for a total sample size n with given probabilities p_e and p_m. This expected power is

    q(α, p_1, p_2, n, p_e, p_m) = Σ_n q(α, p_1, p_2 | n_1, n_2) f_N(n | p_e, p_m),    (2.6)

where the summation is over all possible vector values of n = (n_1, n_2, n_3), and f_N(n | p_e, p_m) is given in (2.1). For a desired power 1 − β, we may use a trial-and-error procedure to find the minimum required sample size n such that the expected power q(α, p_1, p_2, n, p_e, p_m) is at least 1 − β. However, calculating this expected power (2.6) can be very computationally intensive when the minimum required sample size n is large. Hence we need an approximate sample size formula for n.

If n were very large, we would expect each n_i (≈ n π_i) to be large as well, and the ratio n_2/n_1 between groups 2 and 1 to be approximately equal to r = π_2/π_1. Therefore, an approximation to the expected required sample size E(n_1) from group 1 for a desired power 1 − β of rejecting the null hypothesis H_0: p_1 = p_2 at α-level (one-sided test) when the alternative hypothesis H_a: p_1 > p_2 is true is given by (Fleiss et al., 1980; Fleiss, 1981, p. 45)

    n_1a = ceiling{ [Z_α sqrt(p̄(1 − p̄)(r + 1)) + Z_β sqrt(r p_1(1 − p_1) + p_2(1 − p_2))]^2 / [r (p_1 − p_2)^2] },    (2.7)

where Z_α is the upper 100αth percentile of the standard normal distribution, p̄ = (p_1 + r p_2)/(1 + r), and ceiling{x} denotes the least integer greater than or equal to x. Note that formula (2.7) does not incorporate a continuity correction, and hence using (2.7) tends to underestimate the expected required sample size from group 1 relative to the exact test (Casagrande et al., 1978b; Gordon, 1994). To alleviate this underestimation, we may apply the following adjustment, which incorporates the continuity correction into the sample size determination (Fleiss et al., 1980; Fleiss, 1981; Casagrande et al., 1978b):

    n′_1a = ceiling{ (n_1a / 4) [1 + sqrt(1 + 2(r + 1)/(n_1a r |p_1 − p_2|))]^2 }.    (2.8)
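
The large-sample formulas (2.7) and (2.8) translate directly into code. The sketch below uses hypothetical function names; note that the point at which one applies the ceiling operator (after (2.7) or only after (2.8)) can shift the result by a subject or so relative to published tables.

```python
# Sketch of the large-sample formulas (2.7) and (2.8); names are illustrative.
from math import ceil, sqrt
from scipy.stats import norm

def n1_uncorrected(alpha, beta, p1, p2, r):
    """Approximate group-1 sample size (2.7), without continuity correction."""
    z_a, z_b = norm.ppf(1 - alpha), norm.ppf(1 - beta)
    p_bar = (p1 + r * p2) / (1 + r)
    num = (z_a * sqrt(p_bar * (1 - p_bar) * (r + 1))
           + z_b * sqrt(r * p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (r * (p1 - p2) ** 2))

def n1_corrected(alpha, beta, p1, p2, r):
    """Continuity-corrected group-1 sample size (2.8)."""
    n1 = n1_uncorrected(alpha, beta, p1, p2, r)
    return ceil(n1 / 4 * (1 + sqrt(1 + 2 * (r + 1) / (n1 * r * abs(p1 - p2)))) ** 2)

# Illustration: one-sided test, alpha = 0.05, power 0.80, p_e = 0.30 so r = 0.7/0.3
print(n1_corrected(0.05, 0.20, p1=0.40, p2=0.10, r=0.7 / 0.3))
```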

These results suggest that an approximate minimum required total sample size n with the continuity correction is

    n_a = ceiling{ [n′_1a + ceiling{n′_1a · r}] / (1 − p_m) }.    (2.9)

The above discussion extends easily to a two-sided test. Consider testing the null hypothesis H_0: p_1 = p_2 versus the alternative hypothesis H_a: p_1 ≠ p_2. We reject the null hypothesis H_0 when X_1 is either too large or too small. Thus, a critical region C(α) of a nominal α-level (two-sided test) consists of {X_1 : X_1 ≥ x_1^U or X_1 ≤ x_1^L}, where x_1^U is the smallest integer such that Σ_{x_1 ≥ x_1^U} P(X_1 = x_1 | t, n_1, n_2, p_1 = p_2) ≤ α/2 and x_1^L is the largest integer such that Σ_{x_1 ≤ x_1^L} P(X_1 = x_1 | t, n_1, n_2, p_1 = p_2) ≤ α/2. With this critical region C(α), we can calculate the expected power q(α, p_1, p_2, n, p_e, p_m) (2.6) through use of (2.4) and (2.5), and we can find the minimum required sample size n for a desired power 1 − β at a nominal α-level of the two-sided test using (2.6). Similarly, we can substitute Z_{α/2} for Z_α in (2.7) and apply (2.8) for the continuity correction to obtain an approximate sample size n_a (2.9) for a two-sided test.

3 Power and Sample Size Calculation

To illustrate the use of formula (2.6), we first calculate the expected power for the situations in which p_e = 0.1, 0.30; p_m = 0.10, 0.20; p_1 = 0.40; p_2 = 0.1, 0.20, 0.30; and n = 20 to 100 by 10 and 120 to 200 by 20, at the 0.05 level for one-sided and two-sided tests using the exact multinomial distribution (2.1). For example, when p_e = 0.1, p_m = 0.10, p_1 = 0.40, p_2 = 0.1, and n = 180, the corresponding powers are … and … for the one-sided and two-sided tests, respectively (Table 1). As expected, the power increases as either the total sample size n or the difference between p_1 and p_2 increases, but decreases as the probability p_m of excluding a randomly selected subject increases.

When the minimum required sample size n, that is, the smallest n whose expected power q(α, p_1, p_2, n, p_e, p_m) (2.6) is greater than or equal to the desired power 1 − β at a nominal α-level, is large, the number of combinations of (n_1, n_2, n_3) under the multinomial distribution (2.1) can be very large, and searching for the minimum required number n of subjects by trial and error becomes extremely time consuming. To alleviate this problem, we propose using Monte Carlo simulation to generate 1000 repeated samples from the desired multinomial distribution (2.1).
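
Continuing the sketch above (and reusing its hypothetical n1_corrected helper), the overall sample size (2.9) adds the two group sizes and inflates the total for the expected loss p_m; substituting α/2 for α gives the two-sided version.

```python
# Sketch of the overall sample size (2.9); reuses n1_corrected from the previous sketch.
from math import ceil

def n_total(alpha, beta, p1, p2, p_e, p_m, two_sided=False):
    r = (1 - p_e) / p_e                       # expected ratio n2/n1 (= pi_2/pi_1)
    a = alpha / 2 if two_sided else alpha     # Z_{alpha/2} replaces Z_alpha for two-sided tests
    n1 = n1_corrected(a, beta, p1, p2, r)
    return ceil((n1 + ceil(n1 * r)) / (1 - p_m))

print(n_total(0.05, 0.20, p1=0.40, p2=0.10, p_e=0.30, p_m=0.10))
```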

[Table 1 appears here. Caption: The exact power for testing the null hypothesis H_0: p_1 = p_2 versus H_a: p_1 > p_2 (one-sided test) or H_a: p_1 ≠ p_2 (two-sided test) at the 5% level, where p_1 = 0.40 and p_2 = 0.1, 0.2, 0.30; the probability of subjects falling into group 1 is p_e = 0.10, 0.30; the probability of excluding a randomly selected subject is p_m = 0.1, 0.20; and the total sample size is n = 20 to 100 by 10 and 120 to 200 by 20. Table entries are not reproduced here.]

We then use the resulting empirical density for the random vector n, rather than (2.1), when calculating the expected power (2.6). To further expedite this search procedure, we use the approximate sample size n_a (2.9) as an initial estimate.
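
A minimal sketch of this Monte Carlo shortcut follows; it reuses the hypothetical conditional_power helper from the sketch of (2.5) above, and the replicate count and seed are illustrative. An efficient implementation would cache or vectorize the conditional power evaluations.

```python
# Sketch of the Monte Carlo approximation to the expected power (2.6): the full sum
# over (n1, n2, n3) is replaced by an average over simulated allocations.
import numpy as np

def expected_power_mc(alpha, p1, p2, n, p_e, p_m, reps=1000, seed=12345):
    rng = np.random.default_rng(seed)
    probs = [(1 - p_m) * p_e, (1 - p_m) * (1 - p_e), p_m]
    draws = rng.multinomial(n, probs, size=reps)           # simulated (n1, n2, n3)
    return np.mean([conditional_power(alpha, p1, p2, n1, n2)
                    for n1, n2, _ in draws])
```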

If the power corresponding to n_a (2.9) were less than the desired power, we would calculate powers for increasing sample sizes n{k} = n_a + k · max(int{n_a/100}, 1), where max(v_1, v_2) denotes the maximum of v_1 and v_2 and int{x} denotes the greatest integer less than or equal to x, for k = 1, 2, …, until we first observe a power greater than or equal to 1 − β; we denote this sample size by n{k*}. Similarly, if the power corresponding to n_a (2.9) were larger than the desired power, we would calculate powers for decreasing sample sizes n{k} = n_a − k · max(int{n_a/100}, 1) for k = 1, 2, …, until we obtain the first k* such that the observed power q(α, p_1, p_2, n{k* + 1}, p_e, p_m) < 1 − β. The minimum required sample size is then again set equal to n{k*}.

Tables 2 and 3 summarize the approximate required sample size n_a (2.9), its corresponding power, and the final minimum required sample size estimate n{k*} for one-sided and two-sided tests, respectively, for a desired power of 80% to reject the null hypothesis H_0: p_1 = p_2 at the 0.05 level in the situations in which p_1 = 0.20, 0.30, 0.40, 0.50; p_2 ranges from 0.10 up to values below p_1; p_e = 0.10, 0.30, 0.50, 0.70; and p_m = 0.10, … . As shown in Tables 2 and 3, the power attained with the approximate sample size formula n_a (2.9) agrees reasonably well with the desired power 0.80 in almost all the situations considered here. For example, consider one of the worst cases for the one-sided test, p_e = 0.10, p_m = 0.10, p_1 = 0.40, p_2 = 0.10, in Table 2. The power corresponding to the approximate sample size n_a = 167 subjects (2.9) at the 0.05 level (one-sided test) is 77.6%, which falls below the desired power of 80% by only 2.5%. In this case, the final estimate of the minimum required sample size n{k*} is 178 (Table 2). Similarly, for the same case with the two-sided test (Table 3), p_e = 0.10, p_m = 0.10, p_1 = 0.40, p_2 = 0.10, the approximate sample size is n_a = 200, while the final estimate n{k*} is … .

4 Discussion

This paper develops a sample size calculation procedure, based on the exact test, for comparing disease rates between subpopulations in sample surveys or response rates between treatments in clinical trials under multinomial sampling with a random loss of subjects. If the required sample size is not large (less than 200), we can calculate the exact power (2.6), as presented in Table 1, without any practical difficulty. These results not only provide insight into the effects of the different parameters on the power, but also allow us to find the minimum required sample size for a desired power of 80% at a nominal 0.05 level.
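
The search itself is a short loop. This sketch (hypothetical names, reusing expected_power_mc from the previous sketch) steps the total sample size up or down from the initial estimate n_a in increments of max(int{n_a/100}, 1), following the procedure described above.

```python
# Sketch of the trial-and-error search for the minimum required total sample size,
# starting from the approximate estimate n_a and stepping by max(int(n_a/100), 1).
def search_min_n(alpha, p1, p2, p_e, p_m, target=0.80, n_a=100):
    step = max(n_a // 100, 1)
    n = n_a
    if expected_power_mc(alpha, p1, p2, n, p_e, p_m) < target:
        # increase until the estimated power first reaches the target
        while expected_power_mc(alpha, p1, p2, n, p_e, p_m) < target:
            n += step
        return n
    # otherwise decrease while the next smaller size still reaches the target
    while n - step > 0 and expected_power_mc(alpha, p1, p2, n - step, p_e, p_m) >= target:
        n -= step
    return n

# e.g. search_min_n(0.05, p1=0.40, p2=0.10, p_e=0.30, p_m=0.10, n_a=83)
```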

[Table 2 appears here. Caption: The approximate sample size n_a (2.9), its corresponding power (in parentheses), and the final estimate of the minimum required sample size n{k*} for a desired power of 0.80 of rejecting the null hypothesis H_0: p_1 = p_2 at the 5% level (one-sided test). Table entries are not reproduced here.]

For example, for the case of p_e = 0.1, p_m = 0.10, p_1 = 0.40, and p_2 = 0.1, Table 1 shows that the required sample size for a desired power of 0.80 with the one-sided test, obtained by linear interpolation, is approximately 178 (= … (0.05/0.47)). This is identical to the minimum required sample size estimate n{k*} found using Monte Carlo simulation in Table 2. We also see that the estimated required sample size, whether n_a (2.9) or n{k*} (Table 2), tends to reach its minimum at p_e = 0.50. This is consistent with the well-known fact that, for a fixed total sample size, equal sample allocation is generally optimal for maximizing power in comparative studies.

[Table 3 appears here. Caption: The approximate sample size n_a (2.9), its corresponding power (in parentheses), and the final estimate of the minimum required sample size n{k*} for a desired power of 0.80 of rejecting the null hypothesis H_0: p_1 = p_2 at the 5% level (two-sided test). Table entries are not reproduced here.]

Tables 2 and 3 demonstrate that the approximate sample size formula n_a (2.9) agrees well with the minimum required sample size estimate n{k*} needed for a desired power in most situations. Thus, we can expedite the search for the minimum required sample size by using this approximate sample size n_a as an initial estimate and then applying the trial-and-error procedure.

In summary, this paper has developed a sample size calculation procedure for a desired power 1 − β at a given α-level for comparing two independent proportions under multinomial sampling in the presence of random loss. This paper has also presented an approximate sample size formula and found that this approximation can be quite accurate in almost all the situations considered here.

The results and discussion presented here should be of use to biostatisticians, epidemiologists, and clinicians who wish to employ multinomial sampling to collect subjects, each of whom is subject to random exclusion from the study.

Acknowledgements

The authors wish to thank the anonymous referee for many valuable comments that improved the clarity and scope of this paper, especially for suggesting the approximate sample size formula considered here, and Ms. Ying Ying Ma for computational assistance in estimating the required sample sizes.

References

Bennett, B. and Hsu, P. (1960). On the power function of the exact test for the 2x2 contingency table. Biometrika, 47.

Bishop, Y., Fienberg, S., and Holland, P. (1975). Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge.

Casagrande, J., Pike, M., and Smith, P. (1978a). The power function of the exact test for comparing two binomial distributions. Applied Statistics, 27.

Casagrande, J., Pike, M., and Smith, P. (1978b). An improved approximate formula for comparing two binomial distributions. Biometrics, 34.

Fisher, R. (1935). The logic of inductive inference. Journal of the Royal Statistical Society, Series A, 98.

Fleiss, J. (1981). Statistical Methods for Rates and Proportions, 2nd edn. Wiley and Sons, New York.

Fleiss, J. L., Tytun, A., and Ury, H. K. (1980). A simple approximation for calculating sample sizes for comparing independent proportions. Biometrics, 36.

Gail, M. and Gart, J. (1973). The determination of sample sizes for use with the exact conditional test in 2x2 comparative trials. Biometrics, 29.

Gordon, I. (1994). Sample size for two independent proportions: a review. Australian Journal of Statistics, 36.

Haseman, J. (1978). Exact sample sizes for use with the Fisher-Irwin test for 2x2 tables. Biometrics, 34.

Irwin, J. D. (1935). Test of significance for differences between percentages based on small numbers. Metron, 12.

Lui, K.-J. (1994). The effect of retaining probability variation on sample size calculations for normal variates. Biometrics, 50.

Sahai, H. and Khurshid, A. (1996). Formulas and tables for determination of sample sizes and power in clinical trials for testing differences in proportions for the two-sample design: a review. Statistics in Medicine, 15:1-21.

Skalski, J. (1992). Sample size calculations for normal variates under binomial censoring. Biometrics, 48.

Yates, F. (1934). Contingency tables involving small numbers and the χ² test. Journal of the Royal Statistical Society, Supplement 1.
