Tolerance Intervals for Any Data (Nonparametric)

Chapter 831
Tolerance Intervals for Any Data (Nonparametric)

Introduction

This routine calculates the sample size needed to obtain a specified coverage of a β-content tolerance interval at a stated confidence level for data without a specified distribution. These intervals are constructed so that they contain at least 100β% of the population with probability of at least 100(1 − α)%. For example, in water management, a drinking water standard might be that one is 95% confident that certain chemical concentrations are not exceeded more than 3% of the time.

Difference between a Confidence Interval and a Tolerance Interval

It is easy to confuse a confidence interval with a tolerance interval. Just remember that a confidence interval is usually a probability statement about the value of a distributional parameter, such as the mean or a proportion. A tolerance interval, on the other hand, is a probability statement about the proportion of the distribution (from which the sample is drawn) that lies between its limits.

Technical Details

This procedure is primarily based on results in Guenther (1977) and Hahn and Meeker (1991). A tolerance interval is constructed from a random sample so that a specified proportion of the population is contained within the interval. The interval is defined by two limits, L1 and L2, which are constructed using order statistics:

$$L_1 = X_{(i)}, \qquad L_2 = X_{(j)}$$

where X(i) is the ith order statistic, found by sorting the data in ascending order and selecting the ith sorted value.

Proportion of the Population Covered

An important concept is that of coverage. Coverage is the proportion of the population distribution that lies between the two limits. In the nonparametric case, these population limits are defined by quantiles of the distribution, and the coverage is the area under the (unknown) distribution between them.
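As a minimal sketch (Python, not PASS code), the limits can be read off a sorted sample using the 1-based indices of the text. The helper names `order_statistic` and `tolerance_limits` are hypothetical, and the sample values are made up for illustration.

```python
from typing import List, Sequence, Tuple


def order_statistic(data: Sequence[float], k: int) -> float:
    """Return X(k), the k-th smallest value, using 1-based indexing as in the text."""
    if not 1 <= k <= len(data):
        raise ValueError("order-statistic index must satisfy 1 <= k <= N")
    return sorted(data)[k - 1]


def tolerance_limits(data: Sequence[float], i: int, j: int) -> Tuple[float, float]:
    """Two-sided nonparametric limits L1 = X(i) and L2 = X(j), with 1 <= i < j <= N."""
    if not 1 <= i < j <= len(data):
        raise ValueError("indices must satisfy 1 <= i < j <= N")
    xs: List[float] = sorted(data)  # ascending order
    return xs[i - 1], xs[j - 1]


# Made-up sample of N = 7 values; sorted, it reads 0.6 1.7 2.8 3.3 4.2 4.8 5.9.
sample = [4.2, 1.7, 3.3, 5.9, 2.8, 0.6, 4.8]
print(tolerance_limits(sample, 2, 6))  # (1.7, 4.8)
```

Choosing which (i, j) to use for a given N is what the rest of this chapter is about; the sketch only shows how the limits are obtained once the indices are known.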

Solving for N

The tolerance limits are found by selecting the appropriate order statistics, X(i) and X(j), so that

$$\Pr\left[F\left(X_{(j)}\right) - F\left(X_{(i)}\right) \ge P\right] \ge 1 - \alpha$$

where F is the CDF of the population, so that F(X(j)) − F(X(i)) is the coverage of the interval. Guenther (1977) provides the following two inequalities, which can be solved simultaneously for i, j, and the minimum N in the two-sided case:

$$E(N - j + i + 1;\ N,\ 1 - P) \ge 1 - \alpha$$

$$E(N - j + i + 1;\ N,\ 1 - P - \delta) \le \alpha'$$

where E(r; n, p) is the upper tail of a binomial distribution (here X denotes a Binomial(n, p) random variable):

$$E(r; n, p) = \Pr(X \ge r) = \sum_{x=r}^{n} b(x; n, p) = \sum_{x=r}^{n} \binom{n}{x} p^{x} (1 - p)^{n - x}$$

It turns out that the solution determines only the minimum N and the difference N − j + i + 1. For example, if the solution to a particular problem turns out to be N = 38 and N − j + i + 1 = 5, then any of the index pairs (1, 35), (2, 36), (3, 37), and (4, 38) will work; that is, any pair for which j − i = 34. Here (2, 36) represents the order statistics X(2) and X(36). All four pairs satisfy the two inequalities. Usually, a reasonable choice is one of the central pairs. In the program, we arbitrarily pick (2, 36), but remember that this choice is not unique.

The solution is found using a smart searching algorithm that we developed for the one-proportion test, which solves a similar problem.

Procedure Options

This section describes the options that are specific to this procedure. These are located on the Design tab. For more information about the options of other tabs, go to the Procedure Window chapter.

Design Tab

The Design tab contains most of the parameters and options that you will be concerned with.

Solve For

Solve For
Select the parameter you want to solve for. The parameter you select here will be shown on the vertical axis in the plots.

Sample Size N
A search is conducted for the minimum sample size that satisfies the parameter values. Because the search is based on two inequalities, the specified values of the confidence level and α' may not be met exactly.
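The two Guenther inequalities and the search for the minimum N can be sketched in Python. This is an illustration under stated assumptions, not PASS's implementation: it uses exact binomial sums (no normal approximation), a plain linear search over N instead of the smart search mentioned above, and hypothetical helper names (`binom_pmf`, `upper_tail`, `largest_k`, `solve_n`, `index_pairs`). For each N it takes the largest k = N − j + i + 1 that satisfies the first inequality, because E(k; N, 1 − P − δ) decreases as k grows, so that k gives the second inequality its best chance.

```python
from math import exp, lgamma, log
from typing import List, Tuple


def binom_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, p), evaluated in log space so that large n does not overflow."""
    if p <= 0.0:
        return 1.0 if x == 0 else 0.0
    if p >= 1.0:
        return 1.0 if x == n else 0.0
    return exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
               + x * log(p) + (n - x) * log(1.0 - p))


def upper_tail(r: int, n: int, p: float) -> float:
    """E(r; n, p) = Pr(X >= r) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(x, n, p) for x in range(max(r, 0), n + 1))


def largest_k(n: int, P: float, conf: float) -> int:
    """Largest k = N - j + i + 1 with E(k; N, 1 - P) >= 1 - alpha (0 if none).

    E(k; N, q) = 1 - Pr(X <= k - 1), so the largest valid k equals the
    smallest m whose lower-tail probability Pr(X <= m) exceeds alpha.
    """
    alpha = 1.0 - conf
    cdf = 0.0
    for m in range(n + 1):
        cdf += binom_pmf(m, n, 1.0 - P)
        if cdf > alpha:
            return m
    return n  # fallback; Pr(X <= N) = 1 normally exceeds alpha before this


def solve_n(P: float, conf: float, delta: float, alpha_prime: float,
            n_max: int = 100_000) -> Tuple[int, int]:
    """Smallest N, and its k = N - j + i + 1, satisfying both inequalities.

    A plain linear search over N for illustration, not PASS's search algorithm.
    """
    for n in range(2, n_max + 1):
        k = largest_k(n, P, conf)
        if k < 2:  # two-sided limits need 1 <= i < j <= N, which forces k >= 2
            continue
        # E(k; N, 1 - P - delta) decreases in k, so the largest k is the best candidate.
        if upper_tail(k, n, 1.0 - P - delta) <= alpha_prime:
            return n, k
    raise ValueError("no solution found up to n_max")


def index_pairs(n: int, k: int) -> List[Tuple[int, int]]:
    """Every (i, j) with N - j + i + 1 = k, i.e. j - i = N - k + 1."""
    gap = n - k + 1
    return [(i, i + gap) for i in range(1, k)]


n, k = solve_n(P=0.8, conf=0.9, delta=0.15, alpha_prime=0.05)
print(n, k)               # 38 5
print(index_pairs(n, k))  # [(1, 35), (2, 36), (3, 37), (4, 38)]
```

With Guenther's settings (P = 0.8, 1 − α = 0.9, δ = 0.15, α' = 0.05) the sketch returns N = 38 and N − j + i + 1 = 5, whose index pairs are the four listed above. Because it evaluates every N in turn, it becomes slow when N runs into the thousands.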
Exceedance Margin δ
A search is conducted for the value of δ that meets the requirements of the confidence level and α' inequalities.

α' = Pr(p ≥ P + δ)
α' is the probability that the sample coverage p is at least P + δ. A search is conducted for the value of α' that meets the requirements of the other settings.

Coverage Probabilities Calculation Method

Method
Select the method used to calculate the coverage probabilities. When the sample size is reasonably large (i.e., greater than 100) and the coverage proportions are between 0.2 and 0.8, the two methods give similar results. For smaller sample sizes and more extreme proportions (less than 0.2 or greater than 0.8), the normal approximation is less accurate, so binomial enumeration is more appropriate. The choices are:

Binomial Enumeration
Each coverage probability is computed by exact binomial enumeration of all possible outcomes when N ≤ Max N for Binomial Enumeration (otherwise, the normal approximation is used). Enumerating all outcomes is possible because of the discrete nature of the data. This method is a little slower for larger N, but the answers are exact.

Normal Approximation
Approximate coverage probabilities are computed using the normal approximation to the binomial distribution. This method is faster, but less accurate.

Max N for Binomial Enumeration (Method = Binomial Enumeration)
When N is less than or equal to this value, probabilities are calculated using the binomial distribution and enumeration of all possible outcomes. This is possible because of the discrete nature of the data. When N is greater than this value, the normal approximation to the binomial is used. We have found that, with the speed of modern computers, this value can be set very large. The default is 100,000.

One-Sided Limit or Two-Sided Interval

Interval Type
Specify whether the tolerance interval is two-sided or one-sided. A one-sided interval is usually called a tolerance bound rather than a tolerance interval because it has only one limit.

Two-Sided Tolerance Interval
A two-sided tolerance interval, defined by two limits, will be used.

Upper Tolerance Limit
An upper tolerance bound will be used.

Lower Tolerance Limit
A lower tolerance bound will be used.
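The gap between the two calculation methods described under Method can be seen with a small Python comparison. This is a sketch, not PASS code: the normal approximation here includes a continuity correction, which is an assumption about how such an approximation is formed, and the function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def upper_tail_exact(r: int, n: int, p: float) -> float:
    """E(r; n, p) = Pr(X >= r) by enumerating every outcome x = r, ..., n.

    Converts the integer binomial coefficients to float, so keep n modest
    (roughly n < 1000) to avoid overflow in this simple version.
    """
    return sum(comb(n, x) * p**x * (1.0 - p)**(n - x) for x in range(r, n + 1))


def upper_tail_normal(r: int, n: int, p: float) -> float:
    """Normal approximation to Pr(X >= r), with a continuity correction (an assumption)."""
    mean, sd = n * p, sqrt(n * p * (1.0 - p))
    return 1.0 - NormalDist(mean, sd).cdf(r - 0.5)


# Small N with an extreme proportion (1 - P = 0.2) versus larger N with p = 0.5.
for r, n, p in [(5, 38, 0.2), (240, 500, 0.5)]:
    exact, approx = upper_tail_exact(r, n, p), upper_tail_normal(r, n, p)
    print(f"n={n:4d} p={p:.2f}  exact={exact:.4f}  normal={approx:.4f}")
```

For N = 38 and 1 − P = 0.2 the two values differ by about 0.006 (exact about 0.901, normal about 0.896), which is enough to decide whether a 0.90 confidence requirement is met; for N = 500 and p = 0.5 they agree closely.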
Sample Size

N (Sample Size)
Enter one or more values for the sample size. This is the number of individuals selected at random from the population to be in the study. You can enter a single value or a range of values.

Proportion of Population Covered

Proportion Covered (P)
Enter the proportion of the population that is covered by the tolerance interval. This is the desired coverage proportion between the tolerance limits: the probability (area) under the population distribution between the limits, that is, the proportion of the population that lies between them.

If a two-sided interval (L1, L2) is specified, this is the desired area of the probability distribution between L1 and L2. If F(x) is the CDF of the distribution, P = F(L2) − F(L1).

If a lower limit (L1) is specified, this is the desired area of the probability distribution above L1. If F(x) is the CDF of the distribution, P = 1 − F(L1).

If an upper limit (L2) is specified, this is the desired area of the probability distribution below L2. If F(x) is the CDF of the distribution, P = F(L2).

The possible range is 0 < P < 1 − δ. Usually, one of the values 0.80, 0.90, 0.95, or 0.99 is used. You can enter a single value such as 0.9, a series of values such as 0.8 0.9 0.99, or a range such as 0.8 to 0.98 by 0.02.

Confidence Level (1 − α)
Specify the proportion of tolerance intervals (constructed with these parameter settings) that would cover at least the proportion P of the population. The absolute range is 0 < 1 − α < 1. The typical range is 0.8 < 1 − α < 1. Usually, one of the values 0.90, 0.95, and 0.99 is used. You can enter a single value such as 0.95, a series of values such as 0.9 0.95 0.99, or a range such as 0.8 to 0.95 by 0.01.

Proportion Covered Exceedance

Proportion Covered Exceedance Margin (δ)
δ is added to P to set an upper bound of P' = P + δ on the coverage. Hence, δ represents a precision value for P. A coverage of at least P' is set to occur only with a low probability (such as 0.05 or 0.01). For example, if P = 0.9 and δ = 0.01, the parameters are set so that a coverage of at least 0.9 occurs with high probability, but a coverage of at least 0.91 occurs with low probability. The possible range is 0 < δ < 1 − P. Usually 0.1, 0.05, or 0.01 is used.
You can enter a single value such as 0.02, a series of values such as 0.01 0.02 0.05, or a range such as 0.01 to 0.05 by 0.01.

α' = Pr(p ≥ P + δ)
α' is the probability that the sample coverage p is at least P + δ. It is set to a small value such as 0.05 or 0.01. The range is 0.0 < α' < 0.5. Usually 0.1, 0.05, or 0.01 is used. You can enter a single value such as 0.05, a series of values such as 0.01 0.02 0.05, or a range of values such as 0.01 to 0.05 by 0.01.
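To make the meanings of P, the confidence level, and α' concrete, here is a small Monte Carlo sketch in Python. It is an illustration only, with its assumptions stated: the population is taken to be Exponential(1) (for a continuous population the coverage of order-statistic limits does not depend on which distribution it is), and the settings N = 38, (i, j) = (2, 36), P = 0.8, and P + δ = 0.95 are those of Guenther's example used in Example 3. The function name `simulate_coverage` is hypothetical.

```python
import math
import random
from typing import Tuple


def simulate_coverage(n: int, i: int, j: int, P: float, P_upper: float,
                      reps: int = 20_000, seed: int = 1) -> Tuple[float, float]:
    """Monte Carlo estimates of Pr(coverage >= P) and Pr(coverage >= P_upper).

    Samples come from an Exponential(1) population (an assumption of this sketch),
    whose CDF is F(x) = 1 - exp(-x), so the coverage of (X(i), X(j)) is
    F(X(j)) - F(X(i)) = exp(-X(i)) - exp(-X(j)).
    """
    rng = random.Random(seed)
    hits_p = hits_upper = 0
    for _ in range(reps):
        xs = sorted(rng.expovariate(1.0) for _ in range(n))
        coverage = math.exp(-xs[i - 1]) - math.exp(-xs[j - 1])
        hits_p += coverage >= P
        hits_upper += coverage >= P_upper
    return hits_p / reps, hits_upper / reps


conf_hat, alpha_prime_hat = simulate_coverage(n=38, i=2, j=36, P=0.8, P_upper=0.95)
print(f"Pr(coverage >= 0.80) ~ {conf_hat:.3f}")         # should be near 1 - alpha
print(f"Pr(coverage >= 0.95) ~ {alpha_prime_hat:.3f}")  # should be near alpha'
```

The first estimate should land close to 0.90 (the confidence level) and the second close to 0.04, below α' = 0.05. These are the two probabilities that Guenther's inequalities control.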

Example 1 - Calculating Sample Size

Suppose a study is planned to determine the sample size required to compute a two-sided 95% tolerance interval that covers 90% of the population without making specific assumptions about the data distribution. The researcher wants to investigate using a δ of 0.01, 0.025, or 0.05 with an α' of 0.05.

Setup

This section presents the values of each of the parameters needed to run this example. First, load the Tolerance Intervals for Any Data (Nonparametric) procedure window. You may then make the appropriate entries as listed below, or open Example 1 by going to the File menu and choosing Open Example Template.

Option ... Value
Design Tab
Solve For ........................................ Sample Size
Method ........................................... Binomial Enumeration
Max N for Binomial Enumeration ................... 100000
Interval Type .................................... Two-Sided Tolerance Interval
Proportion Covered (P) ........................... 0.9
Confidence Level (1 − α) ......................... 0.95
Coverage Proportion Exceedance (δ) ............... 0.01 0.025 0.05
α' = Pr(p ≥ P + δ) ............................... 0.05

Annotated Output

Click the Calculate button to perform the calculations and generate the following output.

Numeric Results

Numeric Results for Nonparametric Two-Sided Tolerance Interval
Columns: Confidence Level (1 − α), Sample Size (N), Proportion Covered (P), Proportion Covered Exceedance Margin (δ), P + δ, α' = Pr(p ≥ P + δ), and the Two-Sided Tolerance Interval.

1 − α    N       P       δ       P + δ   α'      Two-Sided Tolerance Interval
0.950    9309    0.900   0.010   0.910   0.050   X(441), X(8867)
0.950    1387    0.900   0.025   0.925   0.050   X(60), X(1327)
0.950    298     0.900   0.050   0.950   0.050   X(11), X(288)

References

Guenther, William C. 1972. 'Tolerance Intervals for Univariate Distributions.' Naval Research Logistics Quarterly, Vol. 19, No. 2, pages 309-333.
Guenther, William C. 1977. Sampling Inspection in Statistical Quality Control. Griffin's Statistical Monographs, Number 37. London.
Hahn, G. J. and Meeker, W. Q. 1991. Statistical Intervals. John Wiley & Sons. New York.
Krishnamoorthy, K. and Mathew, T. 2009. Statistical Tolerance Regions. John Wiley. New York.

Report Definitions

Confidence Level (1 − α) is the proportion of studies with the same settings that produce tolerance intervals with a proportion covered of at least P.
N is the number of subjects.
Proportion Covered P is the proportion of the population covered. It is the probability between the tolerance interval limits. It is valid for any distribution.

Proportion Covered Exceedance Margin δ is the value that is added to P to set an upper bound on the coverage at P + δ.
P + δ is the upper limit of the proportion covered P. It is a measure of the precision (closeness) of the actual coverage to P.
α' = Pr(p ≥ P + δ) is the probability that the coverage computed from a random sample (p) is at least P + δ. It is set to a small value such as 0.05 or 0.01.
X(i) and X(j) are the ith and jth sample order statistics (i < j). The values within the parentheses are the indices of the order statistics. For example, X(53) means the 53rd observation after the N values are sorted in ascending order. These two values form the limits of the tolerance interval.

Summary Statements

A two-sided nonparametric tolerance interval computed from a sample of 9309 observations has a target coverage of 0.900 at a 0.950 confidence level. The probability that the coverage exceeds the target value by at least 0.010 is 0.050. The order statistics that become the limits of the tolerance interval are X(441) and X(8867).

Dropout-Inflated Sample Size

Dropout Rate   Sample Size N   Dropout-Inflated Enrollment Sample Size N'   Expected Number of Dropouts D
20%            9309            11637                                        2328
20%            1387            1734                                         347
20%            298             373                                          75

Definitions

Dropout Rate (DR) is the percentage of subjects (or items) that are expected to be lost at random during the course of the study and for whom no response data will be collected (i.e., they will be treated as "missing").
N is the evaluable sample size at which the tolerance interval is computed. If N subjects are evaluated out of the N' subjects that are enrolled in the study, the design will achieve the stated tolerance interval.
N' is the total number of subjects that should be enrolled in the study in order to end up with N evaluable subjects, based on the assumed dropout rate. After solving for N, N' is calculated by inflating N using the formula N' = N / (1 − DR), with N' always rounded up. (See Julious, S.A. (2010) pages 52-53, or Chow, S.C., Shao, J., and Wang, H. (2008) pages 39-40.)
D is the expected number of dropouts: D = N' − N.

This report shows the calculated sample size for each of the scenarios.

Plots Section

This plot shows the sample size versus the three values of δ.
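The dropout inflation N' = N / (1 − DR), rounded up, with D = N' − N, can be checked against the Dropout-Inflated Sample Size table with a few lines of Python. The name `inflate_for_dropout` is hypothetical, and passing the rate as a decimal string is a design choice of this sketch.

```python
from fractions import Fraction
from math import ceil
from typing import Tuple


def inflate_for_dropout(n: int, dropout_rate: str) -> Tuple[int, int]:
    """Return (N', D) with N' = N / (1 - DR) rounded up and D = N' - N.

    The rate is parsed from a decimal string (e.g. "0.20") into an exact
    Fraction so that float round-off cannot push N' up by one.
    """
    dr = Fraction(dropout_rate)
    if not 0 <= dr < 1:
        raise ValueError("dropout rate must satisfy 0 <= DR < 1")
    n_prime = ceil(n / (1 - dr))
    return n_prime, n_prime - n


for n in (9309, 1387, 298):
    print(n, *inflate_for_dropout(n, "0.20"))
```

Run on the three sample sizes from Example 1 with DR = 20%, it reproduces the N' and D columns of that table: 11637 and 2328, 1734 and 347, 373 and 75. Exact fractions matter when N / (1 − DR) is a whole number, such as N = 80 with DR = 20%, where floating-point division could otherwise round N' up to 101.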

Example 2 - Calculating α'

Continuing Example 1, the researchers want to show the impact of various sample sizes on α'. They decide to determine the value of α' for values of N between 600 and 2200, keeping the other values the same except that they set δ to 0.025.

Setup

This section presents the values of each of the parameters needed to run this example. First, load the Tolerance Intervals for Any Data (Nonparametric) procedure window. You may then make the appropriate entries as listed below, or open Example 2 by going to the File menu and choosing Open Example Template.

Option ... Value
Design Tab
Solve For ........................................ α' = Pr(p ≥ P + δ)
Method ........................................... Binomial Enumeration
Max N for Binomial Enumeration ................... 100000
Interval Type .................................... Two-Sided Tolerance Interval
N (Sample Size) .................................. 600 to 2200 by 200
Proportion Covered (P) ........................... 0.9
Confidence Level (1 − α) ......................... 0.95
Coverage Proportion Exceedance (δ) ............... 0.025

Output

Click the Calculate button to perform the calculations and generate the following output.

Numeric Results

Numeric Results for Nonparametric Two-Sided Tolerance Interval
Columns: Confidence Level (1 − α), Sample Size (N), Proportion Covered (P), Proportion Covered Exceedance Margin (δ), P + δ, α' = Pr(p ≥ P + δ), and the Two-Sided Tolerance Interval.

1 − α    N       P       δ       P + δ   α'      Two-Sided Tolerance Interval
0.950    600     0.900   0.025   0.925   0.342   X(23), X(576)
0.950    800     0.900   0.025   0.925   0.228   X(33), X(768)
0.950    1000    0.900   0.025   0.925   0.128   X(42), X(958)
0.950    1200    0.900   0.025   0.925   0.087   X(51), X(1149)
0.950    1400    0.900   0.025   0.925   0.049   X(61), X(1340)
0.950    1600    0.900   0.025   0.925   0.034   X(69), X(1530)
0.950    1800    0.900   0.025   0.925   0.020   X(79), X(1721)
0.950    2000    0.900   0.025   0.925   0.011   X(89), X(1912)
0.950    2200    0.900   0.025   0.925   0.006   X(98), X(2102)

This report shows the impact of various sample sizes on α'. Since the order-statistic indices do not depend on α' or δ, this report also lets you find appropriate indices for use with sample data.
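Solving for α' with N fixed amounts to taking the largest k = N − j + i + 1 allowed by the first inequality and then evaluating E(k; N, 1 − P − δ). The Python sketch below does exactly that with exact binomial sums; it is an illustration, not PASS code, and `binom_pmf` and `alpha_prime_for_n` are hypothetical names.

```python
from math import exp, lgamma, log
from typing import Tuple


def binom_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, p), evaluated in log space so that large n does not overflow."""
    if p <= 0.0:
        return 1.0 if x == 0 else 0.0
    if p >= 1.0:
        return 1.0 if x == n else 0.0
    return exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
               + x * log(p) + (n - x) * log(1.0 - p))


def alpha_prime_for_n(n: int, P: float, conf: float, delta: float) -> Tuple[int, float]:
    """For a fixed N, return k = N - j + i + 1 and alpha' = E(k; N, 1 - P - delta).

    k is the largest value satisfying E(k; N, 1 - P) >= 1 - alpha, found as the
    smallest m with Pr(X <= m) > alpha for X ~ Binomial(N, 1 - P).
    """
    alpha = 1.0 - conf
    cdf, k = 0.0, n
    for m in range(n + 1):
        cdf += binom_pmf(m, n, 1.0 - P)
        if cdf > alpha:
            k = m
            break
    alpha_prime = sum(binom_pmf(x, n, 1.0 - P - delta) for x in range(k, n + 1))
    return k, alpha_prime


for n in range(600, 2201, 200):
    k, a = alpha_prime_for_n(n, P=0.9, conf=0.95, delta=0.025)
    print(f"N={n:5d}  N-j+i+1={k:4d}  alpha'={a:.3f}")
```

For N = 600 the sketch gives N − j + i + 1 = 48, consistent with the pair X(23), X(576) in the report, and an α' that rounds to 0.342. Note that k depends only on N, P, and 1 − α, which is why the indices in the report do not change with δ.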

Plots Section

This plot shows the sample size versus α'.

Example 3 - Validation using Guenther (1977)

Guenther (1977), page 161, gives an example in which P = 0.8, 1 − α = 0.9, P + δ = 0.95, and α' = 0.05. He obtains a sample size of 38 with lower limit X(2) and upper limit X(36).

Setup

This section presents the values of each of the parameters needed to run this example. First, load the Tolerance Intervals for Any Data (Nonparametric) procedure window. You may then make the appropriate entries as listed below, or open Example 3 by going to the File menu and choosing Open Example Template.

Option ... Value
Design Tab
Solve For ........................................ Sample Size
Method ........................................... Binomial Enumeration
Max N for Binomial Enumeration ................... 100000
Interval Type .................................... Two-Sided Tolerance Interval
Proportion Covered (P) ........................... 0.8
Confidence Level (1 − α) ......................... 0.9
Coverage Proportion Exceedance (δ) ............... 0.15
α' = Pr(p ≥ P + δ) ............................... 0.05

Output

Click the Calculate button to perform the calculations and generate the following output.

Numeric Results

Numeric Results for Nonparametric Two-Sided Tolerance Interval
Columns: Confidence Level (1 − α), Sample Size (N), Proportion Covered (P), Proportion Covered Exceedance Margin (δ), P + δ, α' = Pr(p ≥ P + δ), and the Two-Sided Tolerance Interval.

1 − α    N       P       δ       P + δ   α'      Two-Sided Tolerance Interval
0.900    38      0.800   0.150   0.950   0.050   X(2), X(36)

PASS also calculates a sample size of 38. The values of X(i) and X(j) match as well.
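As an independent check of Guenther's answer, the two inequalities can be evaluated exactly with rational arithmetic in Python. This sketch (with a hypothetical helper `upper_tail`) is not how PASS computes them; it simply confirms that the pair (2, 36) works for N = 38 and that no index pair works for N = 37.

```python
from fractions import Fraction
from math import comb


def upper_tail(r: int, n: int, p: Fraction) -> Fraction:
    """E(r; n, p) = Pr(X >= r) for X ~ Binomial(n, p), in exact rational arithmetic."""
    return sum((comb(n, x) * p**x * (1 - p)**(n - x) for x in range(r, n + 1)),
               Fraction(0))


P, conf = Fraction("0.8"), Fraction("0.9")
P_upper, alpha_prime = Fraction("0.95"), Fraction("0.05")

# Guenther's answer: N = 38 with limits X(2) and X(36).
N, i, j = 38, 2, 36
k = N - j + i + 1  # = 5
confidence = upper_tail(k, N, 1 - P)
exceedance = upper_tail(k, N, 1 - P_upper)
print(float(confidence), float(exceedance))
print(confidence >= conf, exceedance <= alpha_prime)  # True True

# With N = 37, no k = N - j + i + 1 in 2..37 satisfies both inequalities.
n37_ok = any(
    upper_tail(kk, 37, 1 - P) >= conf and upper_tail(kk, 37, 1 - P_upper) <= alpha_prime
    for kk in range(2, 38)
)
print(n37_ok)  # False
```

The first printed pair is roughly 0.901 and 0.040, so both inequalities hold at N = 38 with room to spare on α' but only just on the confidence level, which is why one fewer observation is not enough.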