Journal of Statistical Software

Journal of Statistical Software, June 2007, Volume 19, Issue 6. http://www.jstatsoft.org/

Rational Arithmetic Mathematica Functions to Evaluate the One-sided One-sample K-S Cumulative Sampling Distribution

J. Randall Brown, Kent State University
Milton E. Harvey, Kent State University

Abstract

One of the most widely used goodness-of-fit tests is the Kolmogorov-Smirnov (K-S) family of tests, which have been implemented by many computer statistical software packages. To calculate a p value (evaluate the cumulative sampling distribution), these packages use various methods including recursion formulae, limiting distributions, and approximations of unknown accuracy developed over thirty years ago. Based on an extensive literature search for the one-sided one-sample K-S test, this paper identifies two direct formulae and five recursion formulae that can be used to calculate a p value and then develops two additional direct formulae and four iterative versions of the direct formulae for a total of thirteen formulae. To ensure accurate calculation by avoiding catastrophic cancelation and eliminating rounding error, each formula is implemented in rational arithmetic. Linear search is used to calculate the inverse of the cumulative sampling distribution (find the confidence interval bandwidth). Extensive tables of bandwidths are presented for sample sizes up to 2,000. The results confirm the hypothesis that as the number of digits in the numerator and denominator integers of the rational number test statistic increases, the computation time also increases. In comparing the computational times of the thirteen formulae, the direct formulae are slightly faster than their iterative versions and much faster than all the recursion formulae. Computational times for the fastest formula are given for sample sizes up to fifty thousand.

Keywords: K-S sampling distributions, K-S one-sided one-sample probabilities, K-S confidence bands, rational arithmetic.

1. Introduction

The Kolmogorov-Smirnov (K-S) family of tests is one of the most widely used goodness-of-fit tests and is included in many nonparametric statistics texts (see recent texts Gibbons and Chakraborti (2003), Sprent and Smeeton (2001), Conover (1999), Daniel (1990)). These include the one-sided one-sample, two-sided one-sample, one-sided two-sample, and the two-sided two-sample tests. The K-S family of tests also includes restricted range tests (comparing distributions over a portion of their range) and ratio tests (the ratio of one distribution to another).

For sample size $n$, the most common K-S test is the two-sided one-sample test, which uses the maximum absolute distance $D_n$ between the hypothesized continuous cumulative distribution $F(x)$ and the empirical cumulative distribution $F_n(x)$, $D_n = \sup_{-\infty < x < \infty} |F(x) - F_n(x)|$, as the random variable. In a hypothesis testing application, computing the test statistic $d$ is relatively easier than evaluating the cumulative sampling distribution to determine the p value, $P[D_n \ge d]$. The cumulative sampling distribution is a piecewise polynomial that is different for each sample size $n$ and whose complexity grows so rapidly with increasing $n$ that it has not even been generated, let alone used, for $n > 31$ (see Ruben and Gambino (1982) and Drew, Glen, and Leemis (2000)). Consequently, the limiting distribution, various recursion formulae, and various approximations have been used to evaluate the cumulative sampling distribution. In addition, many computer statistical software packages such as SPSS, STATISTICA, R, Numerical Recipes, and IMSL include K-S tests.

Although a recursion formula will theoretically determine the p value $P[D_n \ge d]$ for a particular value $d$ of the test statistic, the complexity of the formula is such that roundoff error and catastrophic cancelation can greatly reduce the accuracy of the calculations. Since most procedures used today were developed on pre-1978 computers where only machine precision was available, the accuracy of their results is not known exactly. Consequently, recursion formulae have only been used to generate tables for sample sizes of $n \le 40$, and various approximations of unknown accuracy have been used for $n > 40$.
Using recursion formula and rational arithmetic, Brown and Harvey (2005) were able to compute p values for sample sizes up to two thousand, $n = 2{,}000$. In addition to the two-sided one-sample case (absolute difference between hypothesized and empirical), the one-sided one-sample (difference between hypothesized and empirical) cumulative sampling distribution is a complex series that can also be evaluated by recursion formulae. Indeed, many computer statistical software packages that implement both the two-sided and one-sided one-sample K-S test use different methods to calculate the p values.

Table 1 summarizes the strategies used by some of these commercial statistical packages to calculate two-sided and one-sided one-sample p values. Although these packages compute the K-S test statistic in the same way, there is considerable difference in the way they evaluate the cumulative sampling distribution (calculate p values). The Numerical Recipes statistical subroutines in Press, Teukolsky, Vetterling, and Flannery (1992) use an approximation by Stephens (1970) to generate p values for the two-sided one-sample K-S cumulative sampling distribution. SPSS 15.0 (SPSS Inc. 2006) does not state how the p value for the two-sided one-sample K-S test is calculated. However, in 2002, the manual for SPSS 11.0 stated that the statistical software package used a modification of the limiting distribution derived by Feller (1948) and used by Smirnov (1948). Assuming SPSS has not changed how the p value is calculated, the 2002 method is listed in Table 1. The STATISTICA (StatSoft, Inc. 2006) software package uses the critical values tabulated by Massey (1950) and Massey (1951) to generate their p values. IMSL (Visual Numerics 2006) computes both the one-sided and two-sided K-S test statistics and then gives p values for each test. Specifically, for the one-sided K-S test and sample sizes $n \le 80$, IMSL uses a recursion formula in Conover (1972) to compute the exact p values for the one-sided sampling distribution, but for large sample sizes $n > 80$, it uses the one-sided limiting distribution. IMSL then doubles the one-sided p values to get the corresponding two-sided p values.

Like IMSL, the R statistical software package computes both the one-sided and two-sided K-S test statistics but uses different methods to compute the p values. For large sample sizes $n > 100$, R Development Core Team (2006) states that the asymptotic distributions are used (presumably the formula by Feller (1948) for the one-sided K-S test and the formula by Kolmogorov (1933) for the two-sided case). For small sample sizes $n \le 100$, the one-sided K-S test uses the direct formula by Smirnov (1944) to calculate the p value and the two-sided K-S test uses the matrix formula of Durbin (1973) as implemented by Marsaglia, Tsang, and Wang (2003). In 2002, the R statistical software package instead computed the one-sided p values using the techniques in Conover (1972) and doubled the one-sided p value to get the two-sided p value. It is not clear whether, in 2002, R also used the one-sided limiting distribution for large sample sizes.

| Statistical software package | Type of one-sample K-S test | Method used to calculate p values |
|---|---|---|
| IMSL | One-sided | For $n \le 80$, recursion formula by Conover (1972). For $n > 80$, limiting distribution derived by Feller (1948). |
| IMSL | Two-sided | Double the corresponding one-sided p value. |
| Numerical Recipes | Two-sided | Approximation by Stephens (1970). |
| R | One-sided | For $n \le 100$, direct formula by Smirnov (1944). For $n > 100$, limiting distribution derived by Feller (1948). |
| R | Two-sided | For $n \le 100$, program by Marsaglia et al. (2003). For $n > 100$, limiting distribution derived by Kolmogorov (1933). |
| SPSS | Two-sided | Modification of limiting distribution derived by Feller (1948). |
| STATISTICA | Two-sided | Critical values tabulated by Massey (1950) and Massey (1951). |

Table 1: Statistical software packages and the one-sample K-S test.
In addition to the one-sample K-S cumulative sampling distributions, there are sampling distributions for many other K-S type statistics including two-sample (difference between two empirical distributions), restricted range (two distributions compared over a portion of their range), and ratios (the ratio of one distribution to another). For the two-sample K-S case with sample sizes $m$ and $n$, the one-sided cumulative sampling distribution is a simple formula for $m = n$ and a complex series for $m \ne n$. The two-sided two-sample K-S cumulative sampling distribution is a complex series for $m = n$ while a recursion formula is needed for $m \ne n$. There are many K-S one-sided one-sample restricted range and ratio cumulative sampling distributions whose formulae are known but are very complex expressions.

Given the advances in computing power and computational software in the past thirty years, it is time to reconsider the entire area and, if possible, devise techniques to accurately and quickly evaluate K-S cumulative sampling distributions. Since the entire K-S cumulative sampling distribution area is so large, the question is where we should begin a comprehensive evaluation of the alternate formulae for the various K-S cumulative sampling distributions. Because of the large number and complexity of the formulae, the one-sided one-sample K-S restricted range and ratio areas are not good starting points. Similarly, since two sample sizes require more computational work than one sample size, the two-sample area should be done after the one-sample area. The one-sample area contains two tests, the two-sided one-sample K-S test and the one-sided one-sample K-S test. Since Brown and Harvey (2005) have already investigated the two-sided one-sample case, this paper will investigate the one-sided one-sample case.

This paper reviews the one-sided one-sample K-S cumulative sampling distribution formulae, devises rational arithmetic implementations of each formula, verifies the validity of each implementation by determining if each implementation gets exactly the same p value over a broad range of examples, develops an efficient method to calculate the bandwidth (the inverse of the cumulative sampling distribution), and finally compares the computational times needed for each implementation to determine the fastest formula.

2. One-sided one-sample K-S sampling distribution formulae

There are two one-sided one-sample random variables: the one-sided upper random variable $D_n^+ = \sup_{-\infty < x < \infty} \{F_n(x) - F(x)\}$ and the one-sided lower random variable $D_n^- = \sup_{-\infty < x < \infty} \{F(x) - F_n(x)\}$. Since by symmetry $D_n^+$ and $D_n^-$ have the same cumulative sampling distribution, $D_n^+$ is used to represent both cases.
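As a small illustration of the two one-sided statistics, the following Python sketch computes them exactly with `fractions.Fraction`. It is not the paper's Mathematica code. The function name `ks_one_sided_statistics` is hypothetical, and the sketch assumes the usual convention that the upper statistic is the supremum of $F_n(x) - F(x)$, which is attained at the ordered sample points.

```python
from fractions import Fraction

def ks_one_sided_statistics(sample, cdf):
    """Return (d_plus, d_minus) for a sample and a hypothesized continuous CDF.

    Assumed convention (not confirmed by the source text):
      d_plus  = max_i ( i/n - F(x_(i)) )        # sup of F_n - F
      d_minus = max_i ( F(x_(i)) - (i-1)/n )    # sup of F - F_n
    """
    xs = sorted(sample)
    n = len(xs)
    # Hypothesized CDF evaluated at the ordered observations, kept exact.
    u = [Fraction(cdf(x)) for x in xs]
    d_plus = max(Fraction(i, n) - u[i - 1] for i in range(1, n + 1))
    d_minus = max(u[i - 1] - Fraction(i - 1, n) for i in range(1, n + 1))
    return d_plus, d_minus

# Example: three observations tested against the uniform distribution on [0, 1].
data = [Fraction(2, 5), Fraction(1, 10), Fraction(7, 10)]
print(ks_one_sided_statistics(data, lambda x: x))
```

Because the hypothesized CDF values here are rational, both statistics come out as exact rational numbers, which is the form the formulae in this section take as input.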
Based on an extensive literature search for the one-sided one-sample K-S test, this paper identifies two direct formulae and five recursion formulae that can be used to calculate a p value, $P[D_n^+ \ge d^+]$, and then develops two additional direct formulae and four iterative versions of the direct formulae for a total of thirteen formulae. Table 2 contains a summary of the thirteen formulae which are developed in this section.

| Formula reference | Type | Formula name |
|---|---|---|
| Smirnov (1944) | Direct | SmirnovD |
| Smirnov (1944) | Direct | SmirnovAltD |
| Smirnov (1944) | Iterative | SmirnovI |
| Smirnov (1944) | Iterative | SmirnovAltI |
| Dwass (1959) | Direct | DwassD |
| Dwass (1959) | Direct | DwassAltD |
| Dwass (1959) | Iterative | DwassI |
| Dwass (1959) | Iterative | DwassAltI |
| Daniels (1945) | Recursion | Daniels |
| Noe and Vandewiele (1968) | Recursion | Noe |
| Steck (1969) | Recursion | Steck |
| Conover (1972) | Recursion | Conover |
| Kotelnikov and Chmaladze (1983) | Recursion | Bolshev |

Table 2: Thirteen formulae to calculate a K-S one-sided one-sample p value.

2.1. Direct formulae

A closed form expression of the one-sided one-sample K-S cumulative sampling distribution was developed by Smirnov (1944) and verified by many scholars including Feller (1948) and Birnbaum and Tingey (1951). For $0 < d^+ \le 1$ and sample size $n$, Smirnov's formula, denoted by SmirnovD in this paper, is shown in the first row of Table 3, where $\lfloor n(1 - d^+) \rfloor$ is the greatest integer less than or equal to $n(1 - d^+)$. Dwass (1959) derived a different formula, denoted by DwassD, that is also shown in Table 3. Both the SmirnovD and DwassD formulae are also derived in Durbin (1973). A second form of the Smirnov distribution, denoted by SmirnovAltD, and a second form of the Dwass distribution, denoted by DwassAltD, are derived by factoring $1/n^{n-1}$ out of their respective formulae. The alternate forms, SmirnovAltD and DwassAltD, shown in Table 3 may be faster than the original formulations because the terms inside the summation are simpler rational numbers.

| Type | Formula to compute $P[D_n^+ \ge d^+]$ for $0 < d^+ \le 1$ |
|---|---|
| SmirnovD | $d^+ \sum_{j=0}^{\lfloor n(1-d^+) \rfloor} \binom{n}{j} \left(1 - d^+ - \frac{j}{n}\right)^{n-j} \left(d^+ + \frac{j}{n}\right)^{j-1}$ |
| DwassD | $1 - d^+ \sum_{j=0}^{\lfloor n d^+ \rfloor} \binom{n}{j} \left(1 - \frac{j}{n} + d^+\right)^{n-j-1} \left(\frac{j}{n} - d^+\right)^{j}$ |
| SmirnovAltD | $\frac{d^+}{n^{n-1}} \sum_{j=0}^{\lfloor n(1-d^+) \rfloor} \binom{n}{j} \left(n - n d^+ - j\right)^{n-j} \left(n d^+ + j\right)^{j-1}$ |
| DwassAltD | $1 - \frac{d^+}{n^{n-1}} \sum_{j=0}^{\lfloor n d^+ \rfloor} \binom{n}{j} \left(n - j + n d^+\right)^{n-j-1} \left(j - n d^+\right)^{j}$ |

$\lfloor n(1 - d^+) \rfloor$ is the greatest integer less than or equal to $n(1 - d^+)$.

Table 3: K-S one-sided one-sample direct formulae.

In most applications, the test statistic $d^+$ is less than 0.5, and usually much less than 0.5, which means the number of terms $\lfloor n(1 - d^+) \rfloor + 1$ in the SmirnovD and SmirnovAltD formulae can be close to the sample size $n$. In comparison, the number of terms $\lfloor n d^+ \rfloor + 1$ in the DwassD and DwassAltD formulae is much less than the number in the SmirnovD and SmirnovAltD formulae and should therefore take less computation time. On the other hand, all the terms in the SmirnovD and SmirnovAltD formulae are positive while the terms in the DwassD and DwassAltD formulae alternate signs. This means that the Dwass formulae are much more susceptible to error than the Smirnov formulae. Note that implementing the formulae in rational arithmetic removes the issue of computational error as all computations are exact. This will be discussed in detail in Section 3.1.

2.2. Iterative formulae

Each of the four formulae in Table 3 can be transformed into an iterative formula which might be faster than the original formula. The Smirnov formula will be used to illustrate the process. Let $\gamma_j$ be the value of the $j$th term in the series and let $x_j$ be the iterative factor that converts $\gamma_{j-1}$ to $\gamma_j$ so that $\gamma_j = x_j \gamma_{j-1}$. The following derives $x_j$ for the SmirnovD and SmirnovAltD formulae. Note that the $x_j$ must be the same for the SmirnovD and SmirnovAltD formulae since the SmirnovAltD formula is simply the SmirnovD formula with $1/n^{n-1}$ factored out.

$$\gamma_j = d^+ \binom{n}{j} \left(1 - d^+ - \frac{j}{n}\right)^{n-j} \left(d^+ + \frac{j}{n}\right)^{j-1}$$

$$\gamma_{j-1} = d^+ \binom{n}{j-1} \left(1 - d^+ - \frac{j-1}{n}\right)^{n-j+1} \left(d^+ + \frac{j-1}{n}\right)^{j-2}$$

$$x_j = \frac{\gamma_j}{\gamma_{j-1}} = \frac{d^+ \binom{n}{j} \left(1 - d^+ - \frac{j}{n}\right)^{n-j} \left(d^+ + \frac{j}{n}\right)^{j-1}}{d^+ \binom{n}{j-1} \left(1 - d^+ - \frac{j-1}{n}\right)^{n-j+1} \left(d^+ + \frac{j-1}{n}\right)^{j-2}}$$

$$x_j = \frac{(n - j + 1)(n d^+ + j)}{j\,(n - n d^+ - j)} \left[1 - \frac{1}{n - n d^+ - j + 1}\right]^{n-j+1} \left[1 + \frac{1}{n d^+ + j - 1}\right]^{j-2}$$

For the DwassD and DwassAltD formulae, let $y_j$ be the iterative factor that converts $\gamma_{j-1}$ to $\gamma_j$ so that $\gamma_j = y_j \gamma_{j-1}$. The following derives $y_j$ for the DwassD and DwassAltD formulae.

$$\gamma_j = d^+ \binom{n}{j} \left(1 - \frac{j}{n} + d^+\right)^{n-j-1} \left(\frac{j}{n} - d^+\right)^{j}$$

$$\gamma_{j-1} = d^+ \binom{n}{j-1} \left(1 - \frac{j-1}{n} + d^+\right)^{n-j} \left(\frac{j-1}{n} - d^+\right)^{j-1}$$

$$y_j = \frac{\gamma_j}{\gamma_{j-1}} = \frac{d^+ \binom{n}{j} \left(1 - \frac{j}{n} + d^+\right)^{n-j-1} \left(\frac{j}{n} - d^+\right)^{j}}{d^+ \binom{n}{j-1} \left(1 - \frac{j-1}{n} + d^+\right)^{n-j} \left(\frac{j-1}{n} - d^+\right)^{j-1}}$$

$$y_j = \frac{(n - j + 1)(j - n d^+)}{j\,(n - j + 1 + n d^+)} \left[1 - \frac{1}{n - j + 1 + n d^+}\right]^{n-j-1} \left[1 + \frac{1}{j - 1 - n d^+}\right]^{j-1}$$

Using the same process as above, the iterative formula for SmirnovD, denoted by SmirnovI, can be derived. Similarly, the iterative formulae for SmirnovAltD, DwassD, and DwassAltD can be derived and are denoted by SmirnovAltI, DwassI, and DwassAltI respectively. The results are shown in Table 4.
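To make the two direct series concrete, here is a minimal Python sketch of the Smirnov and Dwass sums in exact rational arithmetic, using the standard forms of those formulae. It is an illustration only, not the paper's Mathematica implementation. The function names `smirnov_direct` and `dwass_direct` are hypothetical, and the binomial coefficient is updated from term to term rather than recomputed.

```python
from fractions import Fraction
from math import floor

def smirnov_direct(n, d):
    """P[D_n^+ >= d] from the all-positive Smirnov sum, for rational 0 < d <= 1."""
    d = Fraction(d)
    total = Fraction(0)
    binom = 1  # C(n, 0)
    for j in range(floor(n * (1 - d)) + 1):
        a = 1 - d - Fraction(j, n)
        b = d + Fraction(j, n)
        total += binom * a ** (n - j) * b ** (j - 1)
        binom = binom * (n - j) // (j + 1)  # C(n, j+1) from C(n, j), exact
    return d * total

def dwass_direct(n, d):
    """P[D_n^+ >= d] from the alternating-sign Dwass sum, for rational 0 < d <= 1."""
    d = Fraction(d)
    total = Fraction(0)
    binom = 1  # C(n, 0)
    for j in range(floor(n * d) + 1):
        a = 1 - Fraction(j, n) + d
        b = Fraction(j, n) - d
        total += binom * a ** (n - j - 1) * b ** j
        binom = binom * (n - j) // (j + 1)
    return 1 - d * total

# The two series have different numbers of terms but give the same exact value.
print(smirnov_direct(10, Fraction(1, 4)) == dwass_direct(10, Fraction(1, 4)))
```

Since every operation is on exact fractions, the alternating signs in the Dwass sum cost nothing in accuracy here; in floating point the same sum would be exposed to cancellation.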

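Iterative factors:

$$\text{Smirnov:}\quad x_j = \frac{(n - j + 1)(n d^+ + j)}{j\,(n - n d^+ - j)} \left[1 - \frac{1}{n - n d^+ - j + 1}\right]^{n-j+1} \left[1 + \frac{1}{n d^+ + j - 1}\right]^{j-2}$$

$$\text{Dwass:}\quad y_j = \frac{(n - j + 1)(j - n d^+)}{j\,(n - j + 1 + n d^+)} \left[1 - \frac{1}{n - j + 1 + n d^+}\right]^{n-j-1} \left[1 + \frac{1}{j - 1 - n d^+}\right]^{j-1}$$

| Name | Initial value | Iteration | $P[D_n^+ \ge d^+]$ |
|---|---|---|---|
| SmirnovI | $\gamma_0 = (1 - d^+)^n$ | $\gamma_j = x_j \gamma_{j-1}$ | $\sum_{j=0}^{\lfloor n(1-d^+) \rfloor} \gamma_j$ |
| SmirnovAltI | $\gamma_0 = n^{n-1}(1 - d^+)^n$ | $\gamma_j = x_j \gamma_{j-1}$ | $\sum_{j=0}^{\lfloor n(1-d^+) \rfloor} \gamma_j \,/\, n^{n-1}$ |
| DwassI | $\gamma_0 = d^+ (1 + d^+)^{n-1}$ | $\gamma_j = y_j \gamma_{j-1}$ | $1 - \sum_{j=0}^{\lfloor n d^+ \rfloor} \gamma_j$ |
| DwassAltI | $\gamma_0 = d^+ (n + n d^+)^{n-1}$ | $\gamma_j = y_j \gamma_{j-1}$ | $1 - \sum_{j=0}^{\lfloor n d^+ \rfloor} \gamma_j \,/\, n^{n-1}$ |

$\lfloor n(1 - d^+) \rfloor$ is the greatest integer less than or equal to $n(1 - d^+)$.

Table 4: K-S one-sided one-sample iterative formulae.

In addition to the four direct formulae and four iterative formulae, five recursion formulae to compute the one-sided one-sample K-S p value have been derived and are presented in chronological order in the next five subsections.

2.3. Daniels recursion formula

Daniels (1945) derived a difference equation that was later restated by Noe and Vandewiele (1968). The form of Daniels recursion formula (referred to henceforth as Daniels) shown below is derived by solving the difference equation for $Q_i(1)$. The recursion formulae use the test statistic $d^+ = t/n$ or $t = n d^+$.

$$Q_0(1) = 1$$

$$Q_i(1) = -\sum_{k=0}^{i-1} \binom{i}{k} \left[\max\left(\frac{i - t}{n}, 0\right) - 1\right]^{i-k} Q_k(1) \quad \text{for } i = 1, 2, \ldots, n$$

$$P\left(D_n^+ \ge \frac{t}{n}\right) = 1 - Q_n(1)$$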
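A minimal Python sketch of a Daniels-type recursion follows. The exact form of the recursion in the source is hard to recover from the extracted text, so the version below is an assumption: it uses the boundaries $\max((i - t)/n, 0)$ and a leading minus sign on the sum, a combination that reproduces the closed-form values for small $n$. The function name `daniels_p_value` is hypothetical.

```python
from fractions import Fraction
from math import comb

def daniels_p_value(n, d):
    """P[D_n^+ >= d] via a Daniels-type recursion (assumed form, see lead-in)."""
    d = Fraction(d)
    t = n * d                      # t = n * d^+, kept as an exact rational
    q = [Fraction(1)]              # q[i] plays the role of Q_i(1); Q_0(1) = 1
    for i in range(1, n + 1):
        base = max((i - t) / n, Fraction(0)) - 1
        # The terms mix positive and negative values, so exact arithmetic matters.
        q.append(-sum(comb(i, k) * base ** (i - k) * q[k] for k in range(i)))
    return 1 - q[n]

print(daniels_p_value(3, Fraction(1, 2)))
```

Each `q[i]` depends on all earlier values, so the work grows roughly quadratically in $n$, in contrast to the single pass of the direct sums.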

2.4. Noe and Vandewiele recursion formula

Since the Daniels recursion formula has both positive and negative terms, Noe and Vandewiele (1968) derived an alternate recursion formula that has only non-negative terms. Noe (1972) later added a correction to this recursion formula. The particular form of the recursion formula (referred to henceforth as Noe) listed below, containing Noe's correction, is taken from Shorack and Wellner (1986), page 363, formulas (24) through (28).

$$Q_0(0) = 1$$

$$Q_m(m) = 0 \quad \text{for } 1 \le m \le n + 1$$

$$Q_i(m) = \sum_{k=0}^{i} \binom{i}{k} Q_k(m - 1) \left[\max\left(\frac{m - t}{n}, 0\right) - \max\left(\frac{m - t - 1}{n}, 0\right)\right]^{i-k} \quad \text{for } 0 \le i \le m - 1,\ 1 \le m \le n + 1$$

$$P\left(D_n^+ \ge \frac{t}{n}\right) = 1 - Q_n(n + 1)$$

2.5. Steck recursion formula

Steck (1969) derived the recursion formula (referred to henceforth as Steck) shown below that was later listed in Shorack and Wellner (1986).

$$b_j = \min\left(\frac{j - 1 + t}{n}, 1\right) \quad \text{for } j = 1, 2, \ldots, n$$

$$P_0 = 1, \qquad P_1 = b_1$$

$$P_i = b_i^{\,i} - \sum_{m=0}^{i-2} \binom{i}{m} \left[b_i - b_{m+1}\right]^{i-m} P_m \quad \text{for } i = 2, 3, \ldots, n$$

$$P\left(D_n^+ \ge \frac{t}{n}\right) = 1 - P_n$$

2.6. Conover recursion formula

Conover (1972) derived a recursion formula (referred to henceforth as Conover) that simplifies to the following for a hypothesized continuous cumulative distribution $F(x)$.

$$e_0 = 1$$

$$e_k = 1 - \sum_{j=0}^{k-1} \binom{k}{j} \left(1 - \frac{j + t}{n}\right)^{k-j} e_j \quad \text{for } k = 1, 2, \ldots, \lfloor n - t \rfloor$$

$$P\left(D_n^+ \ge \frac{t}{n}\right) = \sum_{j=0}^{\lfloor n - t \rfloor} \binom{n}{j} \left(1 - \frac{j + t}{n}\right)^{n-j} e_j$$

Although not stated explicitly, this appears to be the recursion formula used by the IMSL and R statistical software packages.
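The Steck and Conover recursions can be sketched in Python as below, again in exact rational arithmetic. These are illustrations under the assumption that the recursions take their standard forms for a continuous hypothesized distribution, with $b_j = \min((j - 1 + t)/n, 1)$ for Steck and the factor $1 - (j + t)/n$ for Conover. The function names are hypothetical and this is not the paper's Mathematica code.

```python
from fractions import Fraction
from math import comb, floor

def steck_p_value(n, d):
    """P[D_n^+ >= d] via a Steck-type recursion (assumed standard form)."""
    d = Fraction(d)
    t = n * d
    # b[j] for j = 1..n; index 0 is unused padding.
    b = [None] + [min((j - 1 + t) / n, Fraction(1)) for j in range(1, n + 1)]
    p = [Fraction(1)]  # P_0 = 1
    for i in range(1, n + 1):
        s = sum(comb(i, m) * (b[i] - b[m + 1]) ** (i - m) * p[m]
                for m in range(i - 1))   # empty for i = 1, so P_1 = b_1
        p.append(b[i] ** i - s)
    return 1 - p[n]

def conover_p_value(n, d):
    """P[D_n^+ >= d] via a Conover-type recursion (assumed standard form)."""
    d = Fraction(d)
    t = n * d
    top = floor(n - t)
    f = [1 - (j + t) / n for j in range(top + 1)]
    e = [Fraction(1)]  # e_0 = 1
    for k in range(1, top + 1):
        e.append(1 - sum(comb(k, j) * f[j] ** (k - j) * e[j] for j in range(k)))
    return sum(comb(n, j) * f[j] ** (n - j) * e[j] for j in range(top + 1))

# Both recursions give the same exact rational p value.
print(steck_p_value(8, Fraction(3, 10)) == conover_p_value(8, Fraction(3, 10)))
```

Agreement between independently coded recursions on exact fractions is the same kind of cross-check the paper describes for validating its implementations.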

2.7. Bolshev recursion formula

Kotelnikov and Chmaladze (1983) used the recursion formula (referred to henceforth as Bolshev) shown below that was later called the Bolshev recursion in Shorack and Wellner (1986).

$$b_j = \min\left(\frac{j - 1 + t}{n}, 1\right) \quad \text{for } j = 1, 2, \ldots, n$$

$$P_0 = 1$$

$$P_i = 1 - \sum_{m=1}^{i} \binom{i}{m} \left[1 - b_{i-m+1}\right]^{m} P_{i-m} \quad \text{for } i = 1, 2, \ldots, n$$

$$P\left(D_n^+ \ge \frac{t}{n}\right) = 1 - P_n$$

3. Computational and research issues

The thirteen formulae presented in the last section are complex. Consequently, implementing them raises certain computational issues that need to be studied. The following are the three major computational questions that need to be resolved before the thirteen formulae are implemented.

1. What type of computational arithmetic should be used?
2. What arithmetic form should be used to input the test statistic $d^+$ to the thirteen formulae?
3. What is the most efficient way to calculate the binomial coefficients?

These questions will be considered and answered in the order shown above. After these implementation questions have been resolved, the following four research questions will be answered.

1. What is the best way to calculate the bandwidth (the inverse of the cumulative sampling distribution)? The answer to this question will be used to compute and present detailed bandwidth tables.
2. What is the relationship between computation time and sample size?
3. Which formula is the fastest?
4. What is the relationship between the accuracy of the test statistic $d^+$ and the computation time?

3.1. Computational options

Using current computational software, the formulae can be implemented using either rational arithmetic, arbitrary precision arithmetic, or machine precision arithmetic. Rational arithmetic stores every number as a ratio of two integers (a rational number) where each integer can have as many decimal digits as needed to express the number exactly. Although the speed of rational arithmetic declines as the number of digits in the numerator/denominator integers increases, it has the advantage of no error as long as no irrational numbers are used.

Conversely, machine precision arithmetic specifies the number of decimal digits (usually less than twenty and determined by the computer hardware) to use in computations, so it is subject to roundoff error and catastrophic cancelation. Catastrophic cancelation occurs when one number is subtracted from another number of about the same value. For example, if 123.345689 is subtracted from 123.345699, both with nine decimal digits of precision, then the result is 0.000010 with two decimal digits of precision. Although machine precision is fast, it is possible to significantly degrade the accuracy and, even worse, not be aware that the accuracy has been reduced.

Arbitrary precision arithmetic is like machine precision except that the number of decimal digits of precision is not dependent on the computer hardware and the user can specify the number of decimal digits of precision. Although arbitrary precision is slower than machine precision, it is faster than rational arithmetic. In addition, the software system Mathematica, Wolfram (2003), keeps track of the resulting precision rp so that for the example above, Mathematica would also know that the result 0.000010 had a precision of rp = 2. The trick in using arbitrary precision arithmetic is specifying the precision to be used in internal calculations (internal precision ip) so that the final answer has a specified desired precision dp or greater. In other words, the user must specify both ip and dp so that the final answer has rp ≥ dp.

Since all the K-S formulae can be modified so that no irrational numbers are used, this paper will use rational arithmetic implementations as they produce exact p values with no error.
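The contrast between the three kinds of arithmetic can be illustrated with a short Python sketch using only the standard library (`float` for machine precision, `decimal` for user-specified precision, `fractions` for rational arithmetic). The specific numbers are chosen for illustration and are not taken from the paper, apart from the nine-digit subtraction, which mirrors the kind of example given above.

```python
from decimal import Decimal, getcontext
from fractions import Fraction

# Machine precision: the small addend is lost before the subtraction happens,
# and nothing signals that the result is wrong.
machine = (1.0 + 1e-16) - 1.0
print(machine)                       # the 1e-16 has vanished

# Rational arithmetic: the same computation is exact.
exact = (Fraction(1) + Fraction(1, 10**16)) - Fraction(1)
print(exact)

# User-specified precision: two nine-digit numbers of nearly equal value
# leave only two significant digits after subtraction.
getcontext().prec = 9
diff = Decimal("123.345699") - Decimal("123.345689")
print(diff, len(diff.as_tuple().digits))
```

Unlike Mathematica's tracked precision rp, Python's `decimal` module does not report how many digits of the result are trustworthy; counting the stored digits, as done here, only shows how few remain.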
Future research will develop machine precision and arbitrary precision implementations whose accuracy will then be verified by the rational arithmetic implementations in this paper.

In terms of accuracy, rational arithmetic gives the exact probability (no error) as long as the test statistic $d^+$ can be expressed exactly as a rational number; it cannot be an irrational number like $\pi/100$. The only way $d^+$ can be an irrational number is if the hypothesized distribution $F(x)$ for some $x$ is an irrational number, because by definition the empirical distribution $F_n(x)$ is a rational number ($i/n$ for $i = 0, 1, 2, \ldots, n$). In such cases $d^+$ can be approximated arbitrarily closely by rational numbers above and below $d^+$. These are then used to calculate the p value $P[D_n^+ \ge d^+]$ to any desired accuracy. Thus, rational arithmetic either provides the exact p value if $d^+$ is a rational number or can get as close as the user desires if $d^+$ is an irrational number.

3.2. Test statistic complexity

Using rational arithmetic, the sample size $n$ and the test statistic $d^+$ can produce a p value with many digits in the numerator and denominator integers. For example, when $n = 200$ and $d^+ = 13/200$, $d^+$ has two digits in the numerator and three digits in the denominator ([2/3] numerator/denominator digits) while the corresponding p value $P[D_n^+ \ge d^+]$ has [456/459] numerator/denominator digits. To show how the number of numerator/denominator digits can grow, consider another example with $n = 2{,}000$ and $d^+ = 83/2000$, where $d^+$ has [2/4] numerator/denominator digits while the corresponding p value $P[D_n^+ \ge d^+]$ has [6599/6602] numerator/denominator digits. The large number of numerator/denominator digits for the p values $P[D_n^+ \ge d^+]$ suggests that computational time might vary with the number of numerator/denominator digits in test statistic $d^+$. This hypothesis is tested in Section 8.
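The two ideas above, counting numerator/denominator digits and bracketing an irrational test statistic between rationals, can be sketched in Python as follows. The helper names `digit_counts` and `smirnov_direct` are hypothetical, the Smirnov sum is written in its standard form, and the bracketing values for $\pi/100$ are simply a seven-decimal truncation and the next value up. This is an illustration, not the paper's Mathematica code, and the digit counts it prints for the p value are whatever Python's reduced fraction yields.

```python
from fractions import Fraction
from math import floor

def smirnov_direct(n, d):
    """P[D_n^+ >= d] from the Smirnov sum in exact rational arithmetic."""
    d = Fraction(d)
    total, binom = Fraction(0), 1
    for j in range(floor(n * (1 - d)) + 1):
        total += (binom * (1 - d - Fraction(j, n)) ** (n - j)
                  * (d + Fraction(j, n)) ** (j - 1))
        binom = binom * (n - j) // (j + 1)   # next binomial coefficient
    return d * total

def digit_counts(q):
    """Return (numerator digits, denominator digits) of a reduced fraction."""
    q = Fraction(q)
    return len(str(abs(q.numerator))), len(str(q.denominator))

# Size of the exact p value for n = 200, d+ = 13/200.
d_plus = Fraction(13, 200)
p = smirnov_direct(200, d_plus)
print(digit_counts(d_plus), digit_counts(p))

# Bracketing an irrational statistic such as pi/100 = 0.0314159265...
lower, upper = Fraction(314159, 10**7), Fraction(314160, 10**7)
p_at_lower, p_at_upper = smirnov_direct(20, lower), smirnov_direct(20, upper)
# The tail probability is non-increasing in d, so the true p value lies between.
print(float(p_at_upper), float(p_at_lower))
```

Tightening the bracket (more decimal places in `lower` and `upper`) narrows the interval for the p value at the cost of larger integers in every term.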

A rational number implementation of each of the thirteen formulae has two inputs: the sample size $n$, which is by definition a rational number (an integer), and the test statistic $d^+$, which is a number between zero and one, $0 \le d^+ \le 1$. If $d^+$ is neither zero, one, nor an irrational number, then $d^+$ can be expressed as either a rational number or an arbitrary precision number. For example, with $n = 100$ and $d^+ = 0.183683$, $d^+$ can be used in the program as the rational number $d^+ = 183683/1000000$ or as the arbitrary precision number $d^+ = 0.183683$ while all the rest of the computations in the Mathematica program are in rational arithmetic. Since Mathematica treats rational numbers and arbitrary precision numbers differently, the same Mathematica program that implements the SmirnovD formula in rational arithmetic will yield different probabilities depending on whether $d^+$ is used as a rational number or an arbitrary precision number in the program.

When used as a rational number in the Mathematica program, $d^+ = 183683/1000000$ produces a rational number probability with 597 digits in the numerator and 600 digits in the denominator ([597/600] numerator/denominator digits) that, when converted to an arbitrary precision number with 20 decimal digits of accuracy, yields $P[D_{100}^+ \ge d^+] = 0.0010000109813850096033$. However, when used as an arbitrary precision number in the Mathematica program, $d^+ = 0.183683$ yields a different p value, $P[D_{100}^+ \ge d^+] = 29398345/29398022169 = 0.0010000109813850110101$, than that produced by the rational number $d^+$. Since the correct probability is the one produced by the rational number input, the input $d^+$ is always converted to a rational number before it is used in any Mathematica program.

3.3. Calculating a series of binomial coefficients

Since all thirteen formulae use the binomial coefficient in almost every term, an important consideration is how to calculate them. Although the notation varies in each formula, let $\binom{k}{j}$ represent the binomial coefficient.
The four iterative formulae iclude the biomial coefficiet i the iterative formulae for x j ad y j so that the biomial coefficiet is automatically icluded ad eed ot be calculated separately. However, every oe of the direct formulae ad the recursio formulae use the biomial coefficiet ( ) i each term of the summatios. I each of k these summatios, the biomial coefficiet for each succeedig term chages by addig j oe to j so that each formula uses a series of biomial coefficiets. The efficiecy of the method for calculatig the series of biomial coefficiets as ratioal umbers will effect the speed of the implemetatio for each formulae. The two followig competig methods ca be used to compute every biomial coefficiet i the series as a ratioal umber. Ratioal Number Biomial Fuctio (RNBF) Use the Mathematica Biomial fuctio to geerate each coefficiet as a ratioal umber. Ratioal Number ( ) Iterative ( Calculatio ) (RNIC) Use ratioal arithmetic to iteratively k k compute from by multiplyig it by k j + 1. j j 1 j Brow ad Harvey (2006) compared the RNBF method versus the RNIC method ad for a small umber of terms t foud little differece i the computatio times. However, for a large umber of terms, the RNIC method is faster tha the RNBF method ad the differece

Evaluating the One-sided One-sample K-S Test Sampling Distribution

Formula Name   Formula Type   Mathematica Function Name         Listed In Section
SmirnovD       Direct         SmirnovDKS1SidedRTRational        1
DwassD         Direct         DwassDKS1SidedRTRational          2
SmirnovAltD    Direct         SmirnovAltDKS1SidedRTRational     3
DwassAltD      Direct         DwassAltDKS1SidedRTRational       4
SmirnovI       Iterative      SmirnovIKS1SidedRTRational        5
DwassI         Iterative      DwassIKS1SidedRTRational          6
SmirnovAltI    Iterative      SmirnovAltIKS1SidedRTRational     7
DwassAltI      Iterative      DwassAltIKS1SidedRTRational       8
Daniels        Recursion      DanielsKS1SidedRTProbRational     9
Noe            Recursion      NoeKS1SidedRTProbRational         10
Steck          Recursion      SteckKS1SidedRTProbRational       11
Conover        Recursion      ConoverKS1SidedRTProbRational     12
Bolshev        Recursion      BolshevKS1SidedRTProbRational     13

Table 5: Mathematica function name for the thirteen formulae listed in file KS1SidedOneSampleRational.nb

in time increases as the sample size increases. This implies that as the number of terms grows, an RNBF implementation will eventually exceed the time needed by the corresponding RNIC implementation. In addition, RNBF uses the Mathematica Binomial function, so any code using the RNBF method must be implemented in Mathematica, while the RNIC method can be implemented using any rational arithmetic software. Thus, RNIC is more portable than RNBF, which is another reason for adopting RNIC. As a result, this paper will use the RNIC method exclusively to calculate the binomial coefficients for the thirteen formulae.

4. Implementations of the thirteen formulae

All the Mathematica code generated for this paper is contained in one Mathematica file named KS1SidedOneSampleRational.nb, which is divided into 23 sections. Each section contains one program and sample output. Table 5 contains a list of all the formulae, their type, the Mathematica function name implementing them, and the section number in file KS1SidedOneSampleRational.nb that contains the Mathematica code and sample output.

5. Calculating the one-sided bandwidth $d^+(n, \alpha, \rho)$

In addition to calculating the p value for hypothesis testing, the one-sided one-sample K-S cumulative sampling distribution can be used to construct a one-sided confidence band around the empirical distribution $F_n(x)$. The bandwidth of a one-sided confidence band with confidence coefficient $1 - \alpha$ and sample size $n$ is the value of the test statistic $d^+$ that satisfies

$P(D_n^+ \ge d^+) = \alpha$.

Determining a bandwidth $d^+$ for a particular sample size $n$ and confidence coefficient $1 - \alpha$ means evaluating the inverse of the cumulative sampling distribution, which can only be done by search techniques such as binary search. Unlike the p value, a bandwidth $d^+$ cannot in practice be determined exactly because the search technique may not converge to the exact value. For example, binary search with starting values of 0 and 1 would never find $d^+ = 1/3$ and would iterate forever. Thus, search techniques are designed to stop when a specified accuracy is reached. Let $d^+(n, \alpha, \rho)$ represent the bandwidth rounded to $\rho$ significant digits for sample size $n$ and confidence coefficient $1 - \alpha$. Note that bandwidth $d^+(n, \alpha, \rho)$ is also the hypothesis testing critical value for an $\alpha$ level of significance.

Finding the bandwidth $d^+(n, \alpha, \rho)$ is a three step process: (1) use an approximation to find an initial value close to $d^+(n, \alpha, \rho)$, (2) use the initial value to find upper and lower bounds on $d^+(n, \alpha, \rho)$, and (3) use a search procedure to determine $d^+(n, \alpha, \rho)$ between the lower and upper bounds.

The first step will use the approximation of Maag and Dicaire (1971) to find the initial value by solving $\alpha \approx \exp\left(-\frac{[6nd^+ + 1]^2}{18n}\right)$ for $d^+$, yielding $d^+ \approx \sqrt{\frac{-\ln(\alpha)}{2n}} - \frac{1}{6n}$. Since the initial value found by the approximation in the first step should be fairly close to the actual value, the second step gradually increases the distance away from the initial value until a lower and upper bound on the actual value is found. Although there are many search techniques that can be used in the third step to determine the bandwidth, this paper will consider the two most common techniques: binary search and linear search. Note that binary search takes the midpoint between the upper and lower bounds as the next value to test while linear search uses linear interpolation to find the next value. Preliminary computational experience showed that linear search was always faster than binary search, so the following linear search algorithm is used to find the bandwidth $d^+(n, \alpha, \rho)$.
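As a small illustration of the first step, the closed-form starting value can be sketched in Python. This is only a floating-point sketch: the function names `initial_bandwidth` and `approx_p` are hypothetical and do not come from the paper's Mathematica file, and the clipping to the interval from 0 to 1 mirrors the first step of the algorithm listed next.

```python
from math import exp, log, sqrt


def approx_p(n, d):
    """Approximate P[D_n^+ >= d] via exp(-(6nd + 1)^2 / (18n))."""
    return exp(-((6 * n * d + 1) ** 2) / (18 * n))


def initial_bandwidth(n, alpha):
    """Starting value for d+ obtained by solving approx_p(n, d) = alpha for d.

    The result is clipped to [0, 1] because d+ must lie in that interval.
    """
    d = sqrt(-log(alpha) / (2 * n)) - 1 / (6 * n)
    return min(max(d, 0.0), 1.0)


d0 = initial_bandwidth(100, 0.001)
# d0 is roughly 0.1842, close to the exact six-digit bandwidth .183683
# that the paper tabulates for n = 100 and alpha = 0.001.
print(d0, approx_p(100, d0))
```

Because the starting value is only an approximation, it may land on either side of the true bandwidth, which is why the second step has to decide whether it is a lower or an upper bound.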
Linear search algorithm for calculating the bandwidth $d^+(n, \alpha, \rho)$

Step 1 (Find Initial Value): Calculate $d^+ = \sqrt{-\ln(\alpha)/(2n)} - 1/(6n)$ to $\rho$ digits of precision. If $d^+ > 1$, set $d^+ = 1$. If $d^+ < 0$, set $d^+ = 0$. Go to Step 2.

Step 2 (Determine If Initial Value Is Lower Or Upper Bound): Calculate $p = P[D_n^+ \ge d^+]$. If $p > \alpha$, then $d^+$ is a lower bound, set $d_L^+ = d^+$, set $p_L = p$, and go to Step 3. Otherwise, $d^+$ is an upper bound, set $d_U^+ = d^+$, set $p_U = p$ (needed later by the interpolation in Step 9), and go to Step 6.

Step 3 (Determine an Upper Bound): Convert $d_L^+$ to a numerator integer $dnumerator_L^+$ and a denominator integer $ddenominator_L^+$ where $ddenominator_L^+$ is a power of ten and $d_L^+ = dnumerator_L^+ / ddenominator_L^+$. If the number of digits of precision does not exceed four, $\rho \le 4$, set the increment $inc = 1$. Otherwise $\rho > 4$ and set the increment $inc = 10^{\rho - 5}$. Go to Step 4.

Step 4 (Construct and Test a Possible Upper Bound): Set $dtry = dnumerator_L^+ + inc$, calculate $p = P[D_n^+ \ge dtry / ddenominator_L^+]$. If $p > \alpha$, then a new lower bound has been found and go to Step 5. Otherwise, the initial upper bound has been found, set $dnumerator_U^+ = dtry$, set $ddenominator_U^+ = ddenominator_L^+$, set $p_U = p$, and go to Step 9.

Step 5 (New Lower Bound Found): Set $dnumerator_L^+ = dtry$, set $p_L = p$, set $inc = inc \times 10$, and go to Step 4.

Step 6 (Determine a Lower Bound): Convert $d_U^+$ to a numerator integer $dnumerator_U^+$ and a denominator integer $ddenominator_U^+$ where $ddenominator_U^+$ is a power of ten and $d_U^+ = dnumerator_U^+ / ddenominator_U^+$. If the number of digits of precision does not exceed four, $\rho \le 4$, set the increment $inc = 1$. Otherwise $\rho > 4$ and set the increment $inc = 10^{\rho - 5}$. Go to Step 7.

Step 7 (Construct and Test a Possible Lower Bound): Set $dtry = dnumerator_U^+ - inc$, calculate $p = P[D_n^+ \ge dtry / ddenominator_U^+]$. If $p < \alpha$, then a new upper bound has been found and go to Step 8. Otherwise, the initial lower bound has been found, set $dnumerator_L^+ = dtry$, set $ddenominator_L^+ = ddenominator_U^+$, set $p_L = p$, and go to Step 9.

Step 8 (New Upper Bound Found): Set $dnumerator_U^+ = dtry$, set $p_U = p$, set $inc = inc \times 10$, and go to Step 7.

Step 9 (Linear Search Iteration): If $dnumerator_U^+ - dnumerator_L^+ \le 1$, go to Step 12. Set $dtry = dnumerator_L^+ + (dnumerator_U^+ - dnumerator_L^+)(p_L - \alpha)/(p_L - p_U)$. If $dtry \ge dnumerator_U^+$, set $dtry = dnumerator_U^+ - 1$. If $dtry \le dnumerator_L^+$, set $dtry = dnumerator_L^+ + 1$. Calculate $p = P[D_n^+ \ge dtry / ddenominator_L^+]$. If $p > \alpha$, then a new lower bound has been found and go to Step 10. Otherwise, a new upper bound has been found and go to Step 11.

Step 10 (Linear Search New Lower Bound): Set $dnumerator_L^+ = dtry$, $p_L = p$, and go to Step 9.

Step 11 (Linear Search New Upper Bound): Set $dnumerator_U^+ = dtry$, $p_U = p$, and go to Step 9.

Step 12 (Determine Whether to Use Lower or Upper Bound): Calculate $p = P[D_n^+ \ge (dnumerator_L^+ \times 10 + 5)/(ddenominator_L^+ \times 10)]$. If $p < \alpha$, then use the lower bound by setting $d^+ = dnumerator_L^+ / ddenominator_L^+$ and go to Step 13. Otherwise, use the upper bound by setting $d^+ = dnumerator_U^+ / ddenominator_U^+$ and go to Step 13.

Step 13 (Bandwidth Found): Terminate the algorithm with the bandwidth $d^+$.
The linear search algorithm is implemented using the direct formulae: SmirnovD, SmirnovAltD, DwassD, and DwassAltD. The section number in file KS1SidedOneSampleRational.nb containing the Mathematica function that implements the linear search algorithm for each direct formula is listed in Table 6. Tables 7 and 8 contain computational experience of the linear search algorithms for each direct formula to find the bandwidths. Since the DwassAltD linear search algorithm is faster than the other three direct formula implementations, it will be used in the remainder of the paper to find bandwidths.
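The bracketing and interpolation logic of the linear search can be sketched in Python with exact rational arithmetic. Several things here are assumptions rather than the paper's code: the p value is computed with the classical Smirnov sum $P[D_n^+ \ge d] = d \sum_{j=0}^{\lfloor n(1-d) \rfloor} \binom{n}{j} (1 - d - j/n)^{n-j} (d + j/n)^{j-1}$ (not the DwassAltD form the paper prefers), the grid is fixed by a number of decimal places instead of $\rho$ significant digits, and the names `smirnov_p` and `bandwidth` are hypothetical. The binomial coefficient is updated iteratively in the spirit of the RNIC method.

```python
from fractions import Fraction
from math import floor, log, sqrt


def smirnov_p(n, d):
    """Exact P[D_n^+ >= d] for rational d (assumed Smirnov form)."""
    d = Fraction(d)
    if d <= 0:
        return Fraction(1)
    total, binom = Fraction(0), 1              # binom starts at C(n, 0)
    for j in range(floor(n * (1 - d)) + 1):
        if j > 0:
            binom = binom * (n - j + 1) // j   # C(n, j) from C(n, j - 1)
        total += (binom * (1 - d - Fraction(j, n)) ** (n - j)
                  * (d + Fraction(j, n)) ** (j - 1))
    return d * total


def bandwidth(n, alpha, places=6):
    """Bandwidth on a grid of 10**-places, for 0 < alpha < 1."""
    alpha, den = Fraction(alpha), 10 ** places
    p_at = lambda num: smirnov_p(n, Fraction(num, den))
    # Initial value from the closed-form approximation, clipped to the grid.
    d0 = sqrt(-log(float(alpha)) / (2 * n)) - 1 / (6 * n)
    start = min(max(round(d0 * den), 0), den)
    p, inc = p_at(start), 1
    if p > alpha:                              # lower bound: step upward
        lo, p_lo = start, p
        while True:
            hi = min(lo + inc, den)
            p_hi = p_at(hi)
            if p_hi <= alpha:
                break
            lo, p_lo, inc = hi, p_hi, inc * 10
    else:                                      # upper bound: step downward
        hi, p_hi = start, p
        while True:
            lo = max(hi - inc, 0)
            p_lo = p_at(lo)
            if p_lo > alpha:
                break
            hi, p_hi, inc = lo, p_lo, inc * 10
    while hi - lo > 1:                         # linear interpolation
        t = round(lo + (hi - lo) * (p_lo - alpha) / (p_lo - p_hi))
        t = min(max(t, lo + 1), hi - 1)        # stay strictly inside
        p = p_at(t)
        if p > alpha:
            lo, p_lo = t, p
        else:
            hi, p_hi = t, p
    # Test the midpoint to decide which neighbour the bandwidth rounds to.
    mid = Fraction(10 * lo + 5, 10 * den)
    return Fraction(lo if smirnov_p(n, mid) < alpha else hi, den)


print(float(bandwidth(10, Fraction(1, 20))))
```

Working with integer numerators over a fixed power-of-ten denominator, as the algorithm does, keeps every trial value exactly representable, so the loop is guaranteed to end once the two bounds are adjacent grid points.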

Direct Formula  Mathematica Function Name                   Section Number
DwassD          KS1SidedBandwidthByLinearSearchDwassD       15
DwassAltD       KS1SidedBandwidthByLinearSearchDwassAltD    14
SmirnovD        KS1SidedBandwidthByLinearSearchSmirnovD     17
SmirnovAltD     KS1SidedBandwidthByLinearSearchSmirnovAltD  16

Table 6: Direct formula implementations of the linear search algorithm to find bandwidths (section numbers refer to Mathematica file KS1SidedOneSampleRational.nb)

The Mathematica function KS1SidedOneSampleBandwidthsToFile contained in Section 18 of the KS1SidedOneSampleRational.nb file finds bandwidths using linear search with DwassAltD and writes these bandwidths to a comma delimited file for input into Excel and a text file that can be used as the input into timing programs. The text file contains bandwidths where every digit in a bandwidth is output separately so the bandwidth can be reconstructed to any desired accuracy. These text files will be used as input files to produce the computational experience in Sections 6, 8, and 9. Tables 9, 10, and 11 contain the bandwidths to six digits of precision ($\rho = 6$) for $\alpha = 0.2, 0.1, 0.05, 0.02, 0.01, 0.001$ and representative sample sizes from $n = 2$ through $n = 2,000$.

6. Computational experience comparing all thirteen formulae

This section compares the computer time needed to calculate the same p value by all thirteen formulae with the objective of determining the fastest formula. Using the program in Section 5, bandwidths are generated for sample sizes $n = 1000, 2500, 5000$ and confidence coefficients $\alpha = 0.001, 0.01, 0.1, 0.25, 0.5, 0.9$. The resulting bandwidths with $\rho = 20$ digits of precision are put into data file KS1SidedOneSampleBandwidthsN1000to5000.dat and Excel file KS1SidedOneSampleBandwidthsN1000to5000.csv. The six confidence coefficients for $\alpha = 0.001, 0.01, 0.1, 0.25, 0.5, 0.9$ were chosen so that the entire range of $\alpha$'s from 0 to 1 had some representation but the range of greatest p value interest from 0.001 to 0.1 has the most representation. To illustrate the results, Table 12 contains the $\rho = 20$ bandwidths for both $\alpha = 0.001$ and $\alpha = 0.9$.
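The relationship between a stored high-precision bandwidth and a shorter one can be illustrated with a short Python sketch. It assumes that reducing a $\rho = 20$ value to $\rho$ significant digits is ordinary half-up rounding of the decimal string, which is a simplification of the midpoint test used by the search algorithm; the helper name `round_sig` is hypothetical. The input strings are the $\rho = 20$ bandwidths reported in Table 12.

```python
from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction


def round_sig(text, rho):
    """Round a decimal string to rho significant digits; return a Fraction."""
    x = Decimal(text)
    # adjusted() is the exponent of the leading digit, so this is the
    # exponent of the last digit that should be kept.
    last_kept = x.adjusted() - (rho - 1)
    rounded = x.quantize(Decimal(1).scaleb(last_kept), rounding=ROUND_HALF_UP)
    return Fraction(rounded)          # exact rational, no binary rounding


d_1000 = round_sig("0.058587291690890652166", 3)
print(d_1000)                         # exact rational form of 0.0586
```

Parsing the decimal text directly, rather than going through a binary floating-point number, is what keeps the resulting rational exact.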
In addition, Table 12 contains all the $\rho = 3$ bandwidths in both decimal and rational form.

6.1. Direct formulae computational experience

In comparing the computational times across various sample sizes, the question is what values of the test statistic $d^+$ should be used for comparison. The two alternatives are comparing the computational times for a fixed value of the test statistic $d^+$ or comparing the computational times needed to produce a specified p value $\alpha$. The difficulty with comparing the computation times for a fixed value of the test statistic $d^+$ is that the p value will vary with $n$. For example, $P[D_{1,000}^+ \ge 293/5000] \approx 0.001$ while $P[D_{5,000}^+ \ge 293/5000] \approx 1.14493 \times 10^{-15}$. A more useful

Time in seconds (Time) and number of iterations (Iters)

Sample  p value  Direct       ρ = 3           ρ = 6           ρ = 9           ρ = 12
Size    α        Formula      Time     Iters  Time     Iters  Time     Iters  Time     Iters
100     0.001    SmirnovD     0.063    4      0.14     7      0.281    9      0.454    11
100     0.001    SmirnovAltD  0.047    4      0.141    7      0.25     9      0.39     11
100     0.01     SmirnovD     0.046    4      0.157    7      0.25     8      0.468    10
100     0.01     SmirnovAltD  0.032    4      0.125    7      0.218    8      0.407    10
100     0.1      SmirnovD     0.063    4      0.141    6      0.203    6      0.313    7
100     0.1      SmirnovAltD  0.047    4      0.093    6      0.172    6      0.281    7
100     0.25     SmirnovD     0.078    5      0.156    6      0.266    7      0.406    8
100     0.25     SmirnovAltD  0.047    5      0.125    6      0.203    7      0.36     8
100     0.5      SmirnovD     0.063    4      0.156    6      0.265    7      0.407    8
100     0.5      SmirnovAltD  0.062    4      0.125    6      0.219    7      0.375    8
100     0.9      SmirnovD     0.093    5      0.157    6      0.281    7      0.453    8
100     0.9      SmirnovAltD  0.078    5      0.141    6      0.234    7      0.375    8
250     0.001    SmirnovD     0.344    4      0.906    6      2.406    9      4.578    11
250     0.001    SmirnovAltD  0.156    4      0.61     6      1.719    9      3.562    11
250     0.01     SmirnovD     0.407    4      1.078    6      2.422    8      4.047    9
250     0.01     SmirnovAltD  0.219    4      0.719    6      1.796    8      3.188    9
250     0.1      SmirnovD     0.421    4      0.954    5      1.812    6      2.734    6
250     0.1      SmirnovAltD  0.219    4      0.641    5      1.312    6      2.157    6
250     0.25     SmirnovD     0.516    5      1.094    6      2.265    7      3.204    7
250     0.25     SmirnovAltD  0.25     5      0.703    6      1.687    7      2.5      7
250     0.5      SmirnovD     0.406    4      1.172    6      2.25     7      3.359    7
250     0.5      SmirnovAltD  0.219    4      0.797    6      1.656    7      2.656    7
250     0.9      SmirnovD     0.469    4      1.219    6      2.39     7      4.328    9
250     0.9      SmirnovAltD  0.266    4      0.812    6      1.781    7      3.438    9
500     0.001    SmirnovD     2.016    4      6.703    7      14.734   8      26.297   9
500     0.001    SmirnovAltD  0.906    4      4.062    7      10.469   8      19.406   9
500     0.01     SmirnovD     2.047    4      6.485    6      15.406   8      27.219   9
500     0.01     SmirnovAltD  0.797    4      3.797    6      10.672   8      20.093   9
500     0.1      SmirnovD     2.078    4      6.375    6      10.078   5      18.282   6
500     0.1      SmirnovAltD  0.907    4      3.843    6      6.938    5      13.453   6
500     0.25     SmirnovD     2.609    5      5.906    5      13.203   6      17.922   6
500     0.25     SmirnovAltD  1.062    5      3.563    5      8.625    6      12.641   6
500     0.5      SmirnovD     2.719    5      5.719    5      10.015   5      17.938   6
500     0.5      SmirnovAltD  1.078    5      3.5      5      6.828    5      13.031   6
500     0.9      SmirnovD     2.891    4      8.656    6      16       7      27.281   8
500     0.9      SmirnovAltD  1.531    4      5.61     6      11.656   7      21.141   8

Note: All timings on a Pentium IV running at 2.4 GHz.

Table 7: SmirnovD and SmirnovAltD linear search algorithms calculating bandwidth $d^+(n, \alpha, \rho)$

Time in seconds (Time) and number of iterations (Iters)

Sample  p value  Direct       ρ = 3           ρ = 6           ρ = 9           ρ = 12
Size    α        Formula      Time     Iters  Time     Iters  Time     Iters  Time     Iters
1,000   0.001    DwassD       0.469    4      1.75     6      4.765    8      8.407    9
1,000   0.001    DwassAltD    0.172    4      0.922    6      3.172    8      6.093    9
1,000   0.01     DwassD       0.422    4      1.25     5      2.765    6      5.406    7
1,000   0.01     DwassAltD    0.157    4      0.734    5      1.813    6      3.921    7
1,000   0.1      DwassD       0.329    4      0.906    5      1.656    5      3.172    6
1,000   0.1      DwassAltD    0.141    4      0.531    5      1.11     5      2.281    6
1,000   0.25     DwassD       0.281    4      0.64     5      1.625    6      2.516    6
1,000   0.25     DwassAltD    0.125    4      0.375    5      1.093    6      1.828    6
1,000   0.5      DwassD       0.203    4      0.625    6      0.969    5      1.859    6
1,000   0.5      DwassAltD    0.079    4      0.39     6      0.672    5      1.375    6
1,000   0.9      DwassD       0.094    4      0.281    6      0.641    7      1.109    8
1,000   0.9      DwassAltD    0.047    4      0.203    6      0.469    7      0.906    8
2,500   0.001    DwassD       5.5      4      13.438   5      29       6      60.047   7
2,500   0.001    DwassAltD    1.734    4      5.735    5      14.906   6      36.75    7
2,500   0.01     DwassD       3.609    4      11.641   5      25.047   6      46.25    7
2,500   0.01     DwassAltD    0.157    4      0.734    5      1.813    6      3.921    7
2,500   0.1      DwassD       2.968    4      6.813    4      17.187   6      26.703   6
2,500   0.1      DwassAltD    0.875    4      3        4      9.109    6      15.969   6
2,500   0.25     DwassD       2.282    4      5.359    5      11.063   5      21.953   6
2,500   0.25     DwassAltD    0.687    4      2.172    5      5.891    5      13.406   6
2,500   0.5      DwassD       1.609    4      4.328    5      7.875    5      15.672   6
2,500   0.5      DwassAltD    0.516    4      1.844    5      4.265    5      9.703    6
2,500   0.9      DwassD       0.828    4      2.094    5      4.766    6      7.171    6
2,500   0.9      DwassAltD    0.328    4      1.094    5      2.891    6      4.765    6
5,000   0.001    DwassD       27.829   4      74.593   5      158.86   6      307.14   7
5,000   0.001    DwassAltD    7.188    4      27.812   5      76.157   6      174.437  7
5,000   0.01     DwassD       16.391   4      68.344   6      136.437  6      240.375  7
5,000   0.01     DwassAltD    2.625    4      25.25    6      66.422   6      135.438  7
5,000   0.1      DwassD       12.828   4      34.125   4      77.953   5      151.047  6
5,000   0.1      DwassAltD    2.671    4      12.766   4      37.969   5      86.437   6
5,000   0.25     DwassD       11.828   4      31.61    5      61.281   5      115.469  6
5,000   0.25     DwassAltD    3.047    4      11.797   5      29.984   5      66.25    6
5,000   0.5      DwassD       10.687   4      30.594   6      61.375   6      91.937   6
5,000   0.5      DwassAltD    3.141    4      12.5     6      32.281   6      54.532   6
5,000   0.9      DwassD       4        4      11.844   5      24.781   6      36.625   6
5,000   0.9      DwassAltD    1.343    4      5.422    5      13.797   6      22.594   6

Note: All timings on a Pentium IV running at 2.4 GHz.

Table 8: DwassD and DwassAltD linear search algorithms calculating bandwidth $d^+(n, \alpha, \rho)$

Bandwidth $d^+(n, \alpha, \rho = 6)$

Sample
Size  α = 0.2   α = 0.1   α = 0.05  α = 0.02  α = 0.01  α = 0.001
2     .552786   .683772   .776393   .858579   .900000   .968377
3     .472674   .564810   .636045   .728558   .784557   .900000
4     .412407   .492653   .565216   .640745   .688870   .822172
5     .370169   .446980   .509449   .579665   .627180   .750000
6     .340585   .410373   .467993   .534303   .577407   .695706
7     .317415   .381476   .436069   .497468   .538440   .650714
8     .298083   .358313   .409623   .467651   .506543   .613676
9     .281849   .339102   .387464   .442728   .479596   .582099
10    .268074   .322602   .368663   .421350   .456624   .555002
11    .256238   .308292   .352421   .402834   .436703   .531346
12    .245918   .295770   .338151   .386604   .419178   .510472
13    .236761   .284698   .325490   .372204   .403621   .491890
14    .228543   .274807   .314170   .359308   .389695   .475202
15    .221128   .265886   .303973   .347677   .377127   .460107
16    .214400   .257784   .294720   .337119   .365709   .446371
17    .208264   .250387   .286269   .327476   .355275   .433799
18    .202638   .243601   .278511   .318621   .345693   .422236
19    .197453   .237346   .271357   .310453   .336852   .411555
20    .192652   .231555   .264734   .302887   .328661   .401649
21    .188188   .226173   .258577   .295853   .321044   .392427
22    .184023   .221153   .252835   .289292   .313936   .383816
23    .180125   .216455   .247462   .283151   .307283   .375750
24    .176468   .212048   .242420   .277388   .301039   .368174
25    .173028   .207902   .237677   .271966   .295163   .361040
26    .169784   .203992   .233205   .266852   .289621   .354308
27    .166718   .200297   .228977   .262018   .284381   .347940
28    .163814   .196798   .224974   .257440   .279417   .341905
29    .161059   .193478   .221175   .253094   .274706   .336174
30    .158440   .190321   .217563   .248964   .270227   .330724
31    .155945   .187316   .214125   .245030   .265962   .325531
32    .153566   .184450   .210845   .241278   .261893   .320577
33    .151294   .181712   .207713   .237695   .258007   .315843
34    .149121   .179094   .204718   .234268   .254290   .311314
35    .147039   .176587   .201849   .230985   .250730   .306975
36    .145044   .174183   .199098   .227838   .247316   .302813
37    .143128   .171876   .196458   .224817   .244038   .298817
38    .141287   .169659   .193921   .221913   .240889   .294975
39    .139516   .167526   .191480   .219120   .237858   .291279
40    .137810   .165472   .189130   .216431   .234940   .287718
41    .136167   .163492   .186865   .213838   .232128   .284286
42    .134581   .161582   .184680   .211338   .229414   .280974
43    .133049   .159739   .182570   .208923   .226794   .277775
44    .131570   .157957   .180532   .206590   .224263   .274684
45    .130139   .156234   .178560   .204333   .221814   .271694
46    .128754   .154567   .176653   .202150   .219445   .268800
47    .127413   .152952   .174805   .200035   .217150   .265997
48    .126113   .151388   .173015   .197986   .214926   .263280
49    .124853   .149870   .171279   .195999   .212769   .260646

Table 9: Bandwidth $d^+(n, \alpha, \rho = 6)$ to six digits of precision for $n = 2$ to $n = 49$

Bandwidth $d^+(n, \alpha, \rho = 6)$

Sample
Size  α = 0.2   α = 0.1   α = 0.05  α = 0.02  α = 0.01  α = 0.001
50    .123630   .148398   .169594   .194070   .210677   .258089
60    .113108   .135735   .155106   .177484   .192675   .236081
70    .104898   .125858   .143806   .164548   .178632   .218900
80    .0982602  .117874   .134673   .154091   .167280   .205005
90    .0927478  .111245   .127091   .145411   .157855   .193466
100   .0880746  .105627   .120666   .138054   .149868   .183683
120   .0805279  .0965573  .110293   .126179   .136974   .167887
140   .0746462  .0894905  .102212   .116928   .126930   .155578
160   .0698946  .0837829  .0956867  .109458   .118818   .145637
180   .0659516  .0790476  .0902733  .103261   .112090   .137390
200   .0626109  .0750364  .0856880  .0980124  .106391   .130404
220   .0597330  .0715815  .0817390  .0934923  .101483   .124388
240   .0572200  .0685651  .0782913  .0895463  .0971989  .119135
260   .0550007  .0659015  .0752471  .0860622  .0934160  .114497
280   .0530219  .0635268  .0725333  .0829563  .0900438  .110363
300   .0512430  .0613923  .0700941  .0801648  .0870129  .106647
320   .0496325  .0594599  .0678861  .0776379  .0842694  .103283
340   .0481654  .0576997  .0658748  .0753362  .0817705  .100220
360   .0468214  .0560875  .0640327  .0732283  .0794819  .0974138
380   .0455844  .0546036  .0623373  .0712882  .0773755  .0948313
400   .0444408  .0532319  .0607700  .0694949  .0754286  .0924442
420   .0433794  .0519588  .0593156  .0678307  .0736217  .0902290
440   .0423907  .0507732  .0579611  .0662807  .0719391  .0881660
460   .0414669  .0496653  .0566954  .0648326  .0703669  .0862385
480   .0406012  .0486271  .0555094  .0634756  .0688936  .0844322
500   .0397876  .0476515  .0543950  .0622005  .0675094  .0827351
520   .0390212  .0467325  .0533452  .0609994  .0662054  .0811365
540   .0382975  .0458648  .0523540  .0598655  .0649743  .0796272
560   .0376128  .0450438  .0514163  .0587926  .0638096  .0781992
580   .0369636  .0442655  .0505272  .0577754  .0627054  .0768454
600   .0363470  .0435262  .0496829  .0568094  .0616567  .0755597
620   .0357603  .0428229  .0488795  .0558904  .0606590  .0743366
640   .0352012  .0421527  .0481140  .0550146  .0597082  .0731710
660   .0346676  .0415130  .0473834  .0541788  .0588009  .0720586
680   .0341576  .0409017  .0466852  .0533800  .0579338  .0709956
700   .0336695  .0403166  .0460170  .0526156  .0571040  .0699783
720   .0332018  .0397560  .0453767  .0518832  .0563088  .0690035
740   .0327531  .0392182  .0447625  .0511806  .0555461  .0680684
760   .0323222  .0387017  .0441726  .0505058  .0548136  .0671704
780   .0319079  .0382051  .0436055  .0498570  .0541093  .0663071
800   .0315091  .0377271  .0430597  .0492327  .0534316  .0654762
820   .0311250  .0372667  .0425339  .0486312  .0527787  .0646758
840   .0307546  .0368228  .0420269  .0480513  .0521492  .0639041
860   .0303971  .0363944  .0415377  .0474917  .0515417  .0631595
880   .0300519  .0359807  .0410652  .0469513  .0509550  .0624403
900   .0297182  .0355807  .0406085  .0464289  .0503880  .0617451

Table 10: Bandwidth $d^+(n, \alpha, \rho = 6)$ to six digits of precision for $n = 50$ to $n = 900$

Bandwidth $d^+(n, \alpha, \rho = 6)$

Sample
Size  α = 0.2   α = 0.1   α = 0.05  α = 0.02  α = 0.01  α = 0.001
920   .0293953  .0351939  .0401667  .0459235  .0498394  .0610727
940   .0290828  .0348194  .0397391  .0454344  .0493084  .0604217
960   .0287801  .0344566  .0393249  .0449606  .0487941  .0597912
980   .0284866  .0341050  .0389233  .0445013  .0482955  .0591801
1000  .0282020  .0337639  .0385338  .0440558  .0478120  .0585873
1050  .0275262  .0329541  .0376092  .0429982  .0466639  .0571800
1100  .0268969  .0322000  .0367481  .0420133  .0455949  .0558695
1150  .0263089  .0314955  .0359437  .0410933  .0445962  .0546453
1200  .0257579  .0308353  .0351899  .0402312  .0436604  .0534983
1250  .0252402  .0302151  .0344818  .0394212  .0427812  .0524206
1300  .0247525  .0296309  .0338147  .0386583  .0419532  .0514056
1350  .0242922  .0290793  .0331850  .0379381  .0411715  .0504474
1400  .0238566  .0285575  .0325893  .0372568  .0404319  .0495409
1450  .0234437  .0280629  .0320245  .0366109  .0397308  .0486816
1500  .0230515  .0275931  .0314882  .0359976  .0390651  .0478656
1550  .0226784  .0271462  .0309780  .0354140  .0384317  .0470893
1600  .0223229  .0267203  .0304918  .0348580  .0378282  .0463496
1650  .0219836  .0263139  .0300278  .0343275  .0372523  .0456437
1700  .0216594  .0259256  .0295845  .0338204  .0367020  .0449692
1750  .0213491  .0255539  .0291602  .0333352  .0361753  .0443237
1800  .0210518  .0251978  .0287536  .0328703  .0356708  .0437052
1850  .0207666  .0248562  .0283637  .0324244  .0351867  .0431120
1900  .0204927  .0245281  .0279892  .0319961  .0347219  .0425423
1950  .0202294  .0242128  .0276292  .0315844  .0342751  .0419946
2000  .0199760  .0239092  .0272827  .0311882  .0338450  .0414676

Table 11: Bandwidth $d^+(n, \alpha, \rho = 6)$ to six digits of precision for $n = 920$ to $n = 2,000$

approach is to compare the computation times needed to produce the same p value across various sample sizes. In order to do this, we need to calculate the value of the test statistic that yields a specified p value $\alpha$ for a sample size $n$. In other words, we will use the bandwidth $d^+(n, \alpha, \rho)$ where $P[D_n^+ \ge d^+(n, \alpha, \rho)] \approx \alpha$ (see Section 5).
For $\rho = 3$, the Mathematica function TimingKS1SidedOneSampleRationalDirectFormulae contained in Section 19 of the KS1SidedOneSampleRational.nb file inputs the test statistics listed in Table 12 and produces the timings in Table 13. The fastest direct formula in Table 13 was DwassAltD, followed in order by DwassD, SmirnovAltD, and SmirnovD. Since the number of terms in the SmirnovD and SmirnovAltD formulae for the same sample size increases with increasing $\alpha$, we would expect that the time would also increase with $\alpha$. This is the pattern followed by the times in Table 13 with two exceptions, which will be dealt with in Section 8: the time decreases for $n = 1,000$ going from $\alpha = 0.25$ to $\alpha = 0.5$ and the time also decreases for $n = 2,500$ going from $\alpha = 0.01$ to $\alpha = 0.1$.

6.2. Iterative formulae computational experience

For $\rho = 3$, the Mathematica function TimingKS1SidedDirectVersusIterFormulae contained in Section 20 of the KS1SidedOneSampleRational.nb file inputs the test statistics listed

Sample  Bandwidth $d^+(n, \alpha, \rho = 20)$
Size    α = 0.001                α = 0.9
1,000   0.058587291690890652166  0.0070941136544958142815
2,500   0.037098569056693520814  0.0045244445380207103595
5,000   0.026247865445378139343  0.0032128340598027961926

Sample  Bandwidth $d^+(n, \alpha, \rho = 3)$
Size    α = 0.001   α = 0.01    α = 0.1     α = 0.25    α = 0.5     α = 0.9
1,000   0.0586      0.0478      0.0338      0.0262      0.0185      0.00709
        293/5000    239/5000    169/5000    131/5000    37/2000     709/100000
2,500   0.0371      0.0303      0.0214      0.0166      0.0117      0.00452
        371/10000   303/10000   107/5000    83/5000     117/10000   113/25000
5,000   0.0262      0.0214      0.0151      0.0117      0.00829     0.00321
        131/5000    107/5000    151/10000   117/10000   829/100000  321/100000

Table 12: Bandwidth $d^+(n, \alpha, \rho)$ to produce $P[D_n^+ \ge d^+(n, \alpha, \rho)] \approx \alpha$

Time in seconds to calculate $P[D_n^+ \ge d^+(n, \alpha, 3)]$

Sample
Size    Formula      α = 0.001  α = 0.01  α = 0.1   α = 0.25  α = 0.5   α = 0.9
1,000   SmirnovD     2.609      2.641     2.656     2.703     1.891     4.188
1,000   SmirnovAltD  0.828      0.843     0.844     0.860     0.406     1.922
1,000   DwassD       0.125      0.093     0.063     0.047     0.047     0.031
1,000   DwassAltD    0.047      0.047     0.015     0.016     0.000     0.016
2,500   SmirnovD     31.875     32.046    30.282    30.500    32.593    46.469
2,500   SmirnovAltD  6.219      6.266     4.656     4.687     6.360     11.515
2,500   DwassD       0.922      0.734     0.500     0.390     0.282     0.157
2,500   DwassAltD    0.203      0.157     0.094     0.062     0.063     0.031
5,000   SmirnovD     203.969    204.594   225.344   225.906   364.453   371.469
5,000   SmirnovAltD  14.656     14.734    31.422    32.063    90.782    90.797
5,000   DwassD       4.234      3.438     2.547     1.969     2.218     0.875
5,000   DwassAltD    0.312      0.250     0.375     0.297     0.547     0.234

Note: All timings on a Pentium IV running at 2.4 GHz.

Table 13: Time in seconds for direct formulae to calculate $P[D_n^+ \ge d^+(n, \alpha, 3)]$ using rational arithmetic

Time in seconds to compute $P[D_n^+ \ge d^+(n, \alpha, 3)]$ for each formula

Sample  p value  Test Statistic         Smirnov           SmirnovAlt        Dwass             DwassAlt
Size    α        $d^+(n, \alpha, 3)$    Direct   Iter     Direct   Iter     Direct   Iter     Direct   Iter
1,000   0.001    0.0586                 2.672    4.266    0.844    1.828    0.125    0.219    0.046    0.079
1,000   0.01     0.0478                 2.640    4.313    0.843    1.844    0.110    0.187    0.031    0.063
1,000   0.1      0.0338                 2.687    4.344    0.859    1.860    0.062    0.125    0.016    0.047
1,000   0.25     0.0262                 2.719    4.359    0.859    1.875    0.063    0.109    0.016    0.031
1,000   0.5      0.0185                 1.922    3.437    0.422    1.282    0.031    0.047    0.015    0.016
1,000   0.9      0.00709                4.250    7.406    1.953    3.875    0.032    0.046    0.016    0.016
2,500   0.001    0.0371                 31.875   59.797   6.250    19.140   0.922    1.813    0.203    0.375
2,500   0.01     0.0303                 32.062   59.766   6.281    19.234   0.750    1.454    0.156    0.312
2,500   0.1      0.0214                 30.282   55.187   4.688    15.875   0.484    0.953    0.078    0.156
2,500   0.25     0.0166                 30.391   55.453   4.703    15.875   0.375    0.735    0.062    0.125
2,500   0.5      0.0117                 32.516   60.672   6.422    19.500   0.265    0.547    0.063    0.125
2,500   0.9      0.00452                46.500   81.265   11.516   27.484   0.156    0.282    0.047    0.062
5,000   0.001    0.0262                 203.609  361.234  14.532   74.937   4.235    8.109    0.312    0.688
5,000   0.01     0.0214                 204.343  362.703  14.563   74.969   3.453    6.640    0.250    0.547
5,000   0.1      0.0151                 224.938  428.328  31.047   112.187  2.578    5.094    0.375    0.735
5,000   0.25     0.0117                 225.578  429.797  31.093   112.141  1.969    3.937    0.297    0.547
5,000   0.5      0.00829                363.484  668.344  89.422   221.594  2.250    4.359    0.547    1.031
5,000   0.9      0.00321                364.781  668.907  89.703   221.906  0.875    1.672    0.234    0.438

Note: All timings on a Pentium IV running at 2.4 GHz.

Table 14: Comparing time in seconds for direct and iterative formulae using rational arithmetic

in Table 12 and produces the timings shown in Table 14 for all direct and iterative formulae. In this timing program, the p values produced by the direct and iterative formulae for the same sample size and test statistic $d^+$ are compared and an error message is written if they are not all exactly equal. For all the computational experience performed, no error message was ever generated, which is an indication that the Mathematica implementations of the direct and iterative formulae are correct. The results clearly show that the direct formulae are faster than their corresponding iterative formulae. Since the DwassAltD direct formula is the fastest of all the direct and iterative formulae, it will be used as a comparison in the computational experience of the recursion formulae.
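The same kind of exact-equality check can be sketched in Python for two direct sums. The expressions below are assumptions, since this section does not reproduce the paper's formulae: `smirnov_direct` uses the classical Smirnov sum over $j = 0, \ldots, \lfloor n(1-d) \rfloor$, and `dwass_direct` uses the complementary sum over the remaining indices, which follows from Abel's binomial identity because the full sum over $j = 0, \ldots, n$ equals one. All function names are hypothetical, and $0 < d \le 1$ is required.

```python
from fractions import Fraction
from math import comb, floor


def _term(n, d, j):
    """One summand C(n, j) (1 - d - j/n)^(n - j) (d + j/n)^(j - 1)."""
    return (comb(n, j) * (1 - d - Fraction(j, n)) ** (n - j)
            * (d + Fraction(j, n)) ** (j - 1))


def smirnov_direct(n, d):
    """P[D_n^+ >= d] from the sum over j = 0 .. floor(n(1 - d))."""
    d = Fraction(d)
    m = floor(n * (1 - d))
    return d * sum(_term(n, d, j) for j in range(m + 1))


def dwass_direct(n, d):
    """P[D_n^+ >= d] as one minus the sum over j = floor(n(1 - d)) + 1 .. n."""
    d = Fraction(d)
    m = floor(n * (1 - d))
    return 1 - d * sum(_term(n, d, j) for j in range(m + 1, n + 1))


# The decimal input is converted to an exact rational before any arithmetic.
d = Fraction("0.183683")
p_smirnov = smirnov_direct(100, d)
p_dwass = dwass_direct(100, d)
if p_smirnov != p_dwass:
    raise ValueError("direct formulae disagree")
print(float(p_smirnov))   # the paper reports 0.0010000109813850096033
```

Because both results are exact rationals, the comparison is a true equality test rather than a tolerance check, which is what makes agreement between independent formulae meaningful evidence of a correct implementation.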