Simulation of probability distributions commonly used in hydrological frequency analysis


HYDROLOGICAL PROCESSES
Hydrol. Process. 21, 51-60 (2007)
Published online May 2006 in Wiley InterScience. DOI: 2/hyp.676

Ke-Sheng Cheng,* Jie-Lun Chiang and Chieh-Wei Hsu
Department of Bioenvironmental Systems Engineering, National Taiwan University, Taipei, Taiwan, ROC

Abstract: Random variable simulation has been applied in hydrological modelling, flood risk analysis, environmental impact assessment, etc. However, computer codes for simulation of distributions commonly used in hydrological frequency analysis are not available in most software libraries. This paper presents a frequency-factor-based method for random number generation of five distributions (normal, log normal, extreme-value type I, Pearson type III and log-Pearson type III) commonly used in hydrological frequency analysis. The proposed method is shown to produce random numbers of the desired distributions through three means of validation: (1) graphical comparison of cumulative distribution functions (CDFs) and empirical CDFs derived from generated data; (2) properties of estimated parameters; (3) type I error of goodness-of-fit test. An advantage of the method is that it does not require CDF inversion, and the frequency factors of the five commonly used distributions involve only the standard normal deviate. Copyright 2006 John Wiley & Sons, Ltd.

KEY WORDS random number generation; random variable simulation; hydrological frequency analysis; goodness-of-fit test

Received 23 August 2004; Accepted August 2005

INTRODUCTION
In various statistical applications, particularly in simulation studies, it is often desired to generate random samples of specified random variables. Recent developments in hydrological modelling, flood risk analysis, environmental impact assessment, etc. have demonstrated the usefulness of random variable simulation (National Research Council, 2000).
Although computer codes for simulation of random variables with uniform and normal (or Gaussian) distributions are widely available, codes for simulation of other non-Gaussian continuous random variables frequently used in hydrological studies are less common. Computer simulation of random variables is the task of using computers to generate many random numbers that are independent and identically distributed. It is also known as random number generation (RNG). In fact, these computer-generated random numbers form a deterministic sequence, and the same list of numbers will be cycled repeatedly. This cycle can be made so long that the lack of true independence is unimportant (Larget, 2002). Therefore, such computer codes are often termed pseudo-random number generators (PRNGs). There exist mathematical transformation methods to obtain other distributions from uniform variates (Devroye, 1986; Hellekalek, 1997). For this reason, most PRNGs found in software libraries produce uniform random numbers in the unit interval (0, 1). However, transformations from a uniform distribution to the several distributions commonly used in hydrological frequency analysis are not easy, and computer codes for simulation of these distributions are not available in most software libraries. Therefore, the purpose of this paper is to present a method for RNG of five distributions (normal, log normal, extreme-value type I (EV), Pearson type III (PT3) and log-Pearson type III (LPT3)) commonly used in hydrological frequency analysis. The method utilizes the frequency factor, which is familiar to hydrologists, for transformation from uniform variates to the desired distributions.

* Correspondence to: Ke-Sheng Cheng, Department of Bioenvironmental Systems Engineering, National Taiwan University, Taipei, Taiwan, ROC. E-mail: rslab@ntu.edu.tw
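The deterministic nature of a PRNG noted above can be illustrated with a minimal sketch. It assumes Python's standard `random` module as the uniform generator; the function name `uniform_stream` and the seed value are hypothetical choices made for this illustration and are not part of the paper.

```python
import random


def uniform_stream(seed, count):
    """Return `count` pseudo-random uniform variates in [0, 1) from a seeded generator."""
    rng = random.Random(seed)  # independent generator with a fixed starting state
    return [rng.random() for _ in range(count)]


if __name__ == "__main__":
    first = uniform_stream(seed=2007, count=5)
    second = uniform_stream(seed=2007, count=5)
    # The same seed reproduces the same deterministic sequence.
    print(first == second)
```

Fixing the seed is what makes simulation experiments repeatable; a different seed simply starts the same deterministic cycle at a different point.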
GENERATING RANDOM NUMBERS: PROBABILITY INTEGRAL TRANSFORMATION AND REJECTION METHODS
Suppose that we are interested in generating n values of a random variable X that has a continuous cumulative distribution function (CDF) $F_X(\cdot)$. A commonly used method for such a purpose is the probability integral transform (PIT) method, or the inversion method (Mood et al., 1974). The PIT method is based on the property that a random variable X with CDF $F_X(\cdot)$ can be transformed into a random variable U with uniform distribution over the interval (0, 1) by defining

$$U = F_X(X) \tag{1}$$

Conversely, if U is uniformly distributed over the interval (0, 1), then $X = F_X^{-1}(U)$ has CDF $F_X(\cdot)$. Thus, to generate a value, say x, of a random variable X having a continuous CDF $F_X(\cdot)$, it suffices to generate a value, say u, of a random variable U that is uniformly distributed over the interval (0, 1). The value x is then obtained by

$$x = F_X^{-1}(u) \tag{2}$$

Another method for generating random deviates is the rejection method (Press et al., 1993), which does not require that the CDF be readily computable, much less the inverse of that function. The rejection method utilizes a comparison function $g(x)$ that lies everywhere above the desired probability density $f(x)$. The method first generates a uniform deviate between zero and A, where A is the total area under the comparison function, and uses it to get a corresponding x. Then a second uniform deviate y between zero and $g(x)$ is generated and used to decide whether to accept (if $f(x) \ge y$) or reject (if $f(x) < y$) that x. The non-rejected x values have the desired distribution $f(x)$.

GENERAL EQUATION FOR HYDROLOGICAL FREQUENCY ANALYSIS
Hydrological frequency analysis is the work of determining the magnitudes of hydrological variables corresponding to a given frequency or recurrence interval. The recurrence interval, also called the return period, is defined as the average interval, over a long period of time, between occurrences in which a given magnitude of a hydrological variable is equalled or exceeded. Rainfall and streamflow are two major types of hydrological variable that are used in frequency analysis. For example, annual maximum flow records are used to estimate the magnitude of the flood of 5-year return period, $Q_5$; i.e. on average, one year of every 5-year sequence is expected to experience a flood of at least $Q_5$. Two methods of hydrological frequency analysis are commonly applied: the plotting position method and the frequency factor method. The former is a straightforward plotting technique to obtain the CDF by use of certain plotting position formulas (Chow et al., 1988). The frequency factor method is described below. A random variable X has CDF $F_X(\cdot)$ with mean $\mu$ and standard deviation $\sigma$.
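The magnitude of X corresponding to return period T, denoted by $x_T$, is defined as

$$P(X \ge x_T) = 1/T \tag{3}$$

Chow (1951) proposed the following general equation for hydrological frequency analysis:

$$x_T = \mu + K_T \sigma \tag{4}$$

where $K_T$, the frequency factor, is a function of T and is distribution specific. Evidently, if X is normally distributed, then the frequency factor $K_T$ corresponds to the standard normal deviate with exceedance probability $1/T$. Frequency factors of commonly used distributions in hydrological frequency analysis have been developed (Kite, 1988). Table I shows the probability density functions and frequency factors of five distributions commonly used for hydrological frequency analysis.

Table I. Probability density functions and frequency factors of distributions commonly used for hydrological frequency analysis

Normal
  Probability density function: $f_X(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\dfrac{1}{2}\left(\dfrac{x-\mu}{\sigma}\right)^2\right]$, $-\infty < x < \infty$
  Frequency factor: $K_T = z$, the standard normal deviate with exceedance probability $1/T$

Log normal
  Probability density function: $f_X(x) = \dfrac{1}{\sqrt{2\pi}\,x\,\sigma_y} \exp\left[-\dfrac{1}{2}\left(\dfrac{\ln x-\mu_y}{\sigma_y}\right)^2\right]$, $0 < x < \infty$
  $\mu = e^{\mu_y + \sigma_y^2/2}$, $\sigma^2 = \mu^2\left(e^{\sigma_y^2} - 1\right)$; $\mu_y$ and $\sigma_y$ are respectively the mean and standard deviation of $Y = \ln X$
  Frequency factor: $K_T = \dfrac{\exp\left\{[\ln(1 + C_V^2)]^{1/2}\, z - [\ln(1 + C_V^2)]/2\right\} - 1}{C_V}$
  $C_V = \sigma/\mu$, coefficient of variation of X; z: standard normal deviate with exceedance probability $1/T$

EV
  Probability density function: $f_X(x) = \alpha \exp\left[-\alpha(x-\beta) - e^{-\alpha(x-\beta)}\right]$, $-\infty < x < \infty$
  $\alpha = \pi/(\sqrt{6}\,\sigma)$, $\beta = \mu - 0.5772/\alpha$; $\mu$ and $\sigma$ are respectively the mean and standard deviation of X
  Frequency factor: $K_T = -\dfrac{\sqrt{6}}{\pi}\left\{0.5772 + \ln\left[\ln\left(\dfrac{T}{T-1}\right)\right]\right\}$

PT3
  Probability density function: $f_X(x) = \dfrac{1}{\alpha\,\Gamma(\beta)}\left(\dfrac{x-\varepsilon}{\alpha}\right)^{\beta-1} e^{-(x-\varepsilon)/\alpha}$, $\varepsilon \le x < \infty$
  $\alpha = \sigma/\sqrt{\beta}$, $\beta = (2/\gamma)^2$, $\varepsilon = \mu - \sigma\sqrt{\beta}$; $\mu$, $\sigma$ and $\gamma$ are respectively the mean, standard deviation and skewness coefficient of X
  Frequency factor: $K_T \approx z + (z^2 - 1)\dfrac{\gamma}{6} + \dfrac{1}{3}(z^3 - 6z)\left(\dfrac{\gamma}{6}\right)^2 - (z^2 - 1)\left(\dfrac{\gamma}{6}\right)^3 + z\left(\dfrac{\gamma}{6}\right)^4 + \dfrac{1}{3}\left(\dfrac{\gamma}{6}\right)^5$
  z: standard normal deviate with exceedance probability $1/T$

LPT3
  Probability density function: $f_X(x) = \dfrac{1}{\alpha\,x\,\Gamma(\beta)}\left(\dfrac{\ln x-\varepsilon}{\alpha}\right)^{\beta-1} e^{-(\ln x-\varepsilon)/\alpha}$, $\varepsilon \le \ln x < \infty$
  $\alpha = \sigma_y/\sqrt{\beta}$, $\beta = (2/\gamma_y)^2$, $\varepsilon = \mu_y - \sigma_y\sqrt{\beta}$; $\mu_y$, $\sigma_y$ and $\gamma_y$ are respectively the mean, standard deviation and skewness coefficient of $Y = \ln X$
  Frequency factor: same as $K_T$ of the PT3 distribution ($K_T$ is to be substituted into $y_T = \ln x_T = \mu_y + K_T \sigma_y$)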
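The frequency factors listed in Table I can be evaluated with a few lines of code. The following is a minimal sketch, assuming the formulas as given in Table I and using only the Python standard library; the function names (`z_from_T`, `kt_normal`, `kt_lognormal`, `kt_ev`, `kt_pt3`) are hypothetical and chosen for this illustration only.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772  # rounded constant, as it appears in the EV row of Table I


def z_from_T(T):
    """Standard normal deviate z with exceedance probability 1/T."""
    return NormalDist().inv_cdf(1.0 - 1.0 / T)


def kt_normal(T):
    """Normal distribution: K_T is the standard normal deviate itself."""
    return z_from_T(T)


def kt_lognormal(T, cv):
    """Log normal distribution; cv is the coefficient of variation of X."""
    s2 = math.log(1.0 + cv ** 2)  # variance of Y = ln X
    return (math.exp(math.sqrt(s2) * z_from_T(T) - s2 / 2.0) - 1.0) / cv


def kt_ev(T):
    """Extreme-value type I distribution."""
    return -(math.sqrt(6.0) / math.pi) * (
        EULER_GAMMA + math.log(math.log(T / (T - 1.0)))
    )


def kt_pt3(T, skew):
    """Pearson type III distribution (approximate formula); also used for LPT3 in log space."""
    z = z_from_T(T)
    k = skew / 6.0
    return (z + (z ** 2 - 1.0) * k + (z ** 3 - 6.0 * z) * k ** 2 / 3.0
            - (z ** 2 - 1.0) * k ** 3 + z * k ** 4 + k ** 5 / 3.0)


if __name__ == "__main__":
    mu, sigma = 0.0, 1.0
    for T in (2, 10, 100):
        # Equation (4): x_T = mu + K_T * sigma, here for the EV distribution
        print(T, mu + kt_ev(T) * sigma)
```

For the LPT3 distribution the value returned by `kt_pt3` would be applied to the moments of $Y = \ln X$, and $x_T$ recovered as $\exp(y_T)$.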

Table II. Distribution parameters designated for generation of random numbers

Parameter                  Normal   Log normal   EV       PT3   LPT3
Mean                       0        0 (a)        0        0     0 (a)
Standard deviation         1        1 (a)        1        1     1 (a)
Coefficient of skewness    0        0 (a)        1.1396   0.5   0.5 (a)

(a) Parameters are assigned for $Y = \ln X$ when X is a random variable of log normal or LPT3 distribution.

Figure 1. Graphical comparison of ECDF (dots) and CDF (solid curves). (For log normal and log-Pearson distributions, values of $Y = \ln X$ are used on the X-axis.) Panels: normal, log normal (LN), EV, PT3 and LPT3, each at two sample sizes n.

Table III. Sample mean and standard deviation of the estimated parameters (mean, standard deviation and coefficient of skewness) with respect to sample size n ranging from 50 to 500 for normal distribution ($\mu = 0$, $\sigma = 1$). Columns correspond to the ten sample sizes n in increasing order.

Number of random samples N (first setting)
$\bar{x}$  Mean  .25 .53 .6 . .3 . .8 .2 .2 .
$\bar{x}$  SD    .346 .9 .82 .696 .655 .63 .525 .493 .479 .457
$s$  Mean  .9993 .9955 .9994 .9986 . .999 .9993 .9989 .9985 .9996
$s$  SD    .9 .683 .587 .56 .449 .423 .395 .354 .324 .33
$\hat{\gamma}$  Mean  .77 .2 .66 .8 .98 .9 .3 .9 . .3
$\hat{\gamma}$  SD    .494 .2637 .294 .785 .6 .495 .363 .252 .87 .4
Number of random samples N (second setting)
$\bar{x}$  Mean  .2 .3 . .6 .4 .5 .7 .9 . .4
$\bar{x}$  SD    .422 .2 .82 .74 .64 .574 .536 .5 .474 .443
$s$  Mean  .9944 .998 .9987 .9986 .9986 .9982 .9994 . .9989 .9994
$s$  SD    .5 .7 .577 .55 .448 .46 .375 .353 .33 .34
$\hat{\gamma}$  Mean  .57 .2 .22 . .24 .5 .4 .25 .8 .5
$\hat{\gamma}$  SD    .436 .2677 .233 .84 .62 .453 .364 .239 .78 .3

Table IV. Sample mean and standard deviation of the estimated parameters (mean, standard deviation and coefficient of skewness) with respect to sample size n ranging from 50 to 500 for log normal distribution ($\mu_y = 0$, $\sigma_y = 1$). Columns correspond to the ten sample sizes n in increasing order.

Number of random samples N (first setting)
$\bar{y}$  Mean  .2 .22 .28 .47 . .6 .4 .7 .3 .4
$\bar{y}$  SD    .43 .999 .8 .7 .67 .574 .544 .484 .482 .457
$s_y$  Mean  .9947 .999 .9976 .998 .9976 .999 .9993 .998 .9979 .9975
$s_y$  SD    .48 .7 .587 .494 .44 .49 .379 .36 .337 .33
$\hat{\gamma}_y$  Mean  .9 . .56 .54 .46 .5 .62 .54 .37 .42
$\hat{\gamma}_y$  SD    .3987 .2672 .252 .849 .637 .464 .322 .229 .98 .2
Number of random samples N (second setting)
$\bar{y}$  Mean  . . .4 . .2 . .2 .9 .5 .
$\bar{y}$  SD    .43 .999 .83 .79 .635 .574 .532 .496 .474 .448
$s_y$  Mean  .996 .9959 .9983 .9985 .9986 .9993 .9989 .9994 .9994 .9988
$s_y$  SD    . .78 .58 .496 .447 .47 .378 .356 .334 .37
$\hat{\gamma}_y$  Mean  .4 .2 .6 .6 .9 .5 . .3 . .8
$\hat{\gamma}_y$  SD    .4 .2664 .233 .82 .599 .448 .35 .247 .8 .2

$\bar{y}$, $s_y$ and $\hat{\gamma}_y$ are respectively the sample estimates of mean, standard deviation and skewness coefficient of $Y = \ln X$.

Table V. Sample mean and standard deviation of the estimated parameters (mean, standard deviation and coefficient of skewness) with respect to sample size n ranging from 50 to 500 for EV distribution ($\mu = 0$, $\sigma = 1$, $\gamma = 1.1396$). Columns correspond to the ten sample sizes n in increasing order.

Number of random samples N (first setting)
$\bar{x}$  Mean  .2 .2 .7 .7 .44 .3 .4 .34 .9 .28
$\bar{x}$  SD    .47 .976 .85 .73 .63 .565 .55 .5 .463 .46
$s$  Mean  .9897 .2 .9956 .9966 .9952 .9985 .997 .9958 .9994 .7
$s$  SD    .45 . .863 .734 .667 .595 .57 .54 .487 .47
$\hat{\gamma}$  Mean  .48 .729 .53 .394 .425 .393 .27 .344 .49 .42
$\hat{\gamma}$  SD    .6397 .4846 .4279 .3654 .336 .34 .2584 .2569 .242 .2364
Number of random samples N (second setting)
$\bar{x}$  Mean  .25 .8 .5 .2 .5 .5 .6 . . .
$\bar{x}$  SD    .49 .99 .82 .74 .63 .576 .532 .5 .472 .445
$s$  Mean  .9937 .9936 .9976 .9972 .9977 .9984 .9985 .9983 .9993 .999
$s$  SD    .455 .35 .85 .733 .665 .64 .554 .58 .49 .47
$\hat{\gamma}$  Mean  .766 .42 .48 .39 .434 .398 .345 .39 .429 .397
$\hat{\gamma}$  SD    .6342 .4778 .44 .3479 .326 .2998 .2687 .2575 .2557 .2353

Table VI. Sample mean and standard deviation of the estimated parameters (mean, standard deviation and coefficient of skewness) with respect to sample size n ranging from 50 to 500 for PT3 distribution ($\mu = 0$, $\sigma = 1$, $\gamma = 0.5$). Columns correspond to the ten sample sizes n in increasing order.

Number of random samples N (first setting)
$\bar{x}$  Mean  .25 .3 .22 .23 . .9 .2 .24 .8 .8
$\bar{x}$  SD    .392 .976 .77 .694 .62 .582 .54 .498 .475 .448
$s$  Mean  .983 .9942 .9927 .9945 .9952 .9954 .9989 .9994 .9978 .2
$s$  SD    .666 .4 .947 .825 .758 .72 .637 .568 .568 .529
$\hat{\gamma}$  Mean  .5675 .5277 .537 .5526 .524 .56 .5284 .5293 .598 .539
$\hat{\gamma}$  SD    .6938 .557 .439 .425 .3647 .347 .3283 .332 .2732 .2738
Number of random samples N (second setting)
$\bar{x}$  Mean  .6 . .4 .5 .5 .9 .8 . .8 .3
$\bar{x}$  SD    .42 .7 .86 .74 .626 .576 .54 .497 .47 .444
$s$  Mean  .9876 .992 .9947 .9977 .996 .9969 .997 .9982 .9986 .9976
$s$  SD    .62 .62 .957 .83 .743 .682 .636 .59 .56 .526
$\hat{\gamma}$  Mean  .569 .545 .535 .5364 .539 .5249 .5265 .5279 .5248 .525
$\hat{\gamma}$  SD    .656 .5257 .45 .45 .3699 .3382 .322 .333 .299 .2745

Table VII.
Sample mean and standard deviation of the estimated parameters (mean, standard deviation and coefficient of skewness) with respect to sample size n ranging from 50 to 500 for LPT3 distribution ($\mu_y = 0$, $\sigma_y = 1$, $\gamma_y = 0.5$). Columns correspond to the ten sample sizes n in increasing order.

Number of random samples N (first setting)
$\bar{y}$  Mean  .5 .32 .8 . .5 .2 .8 .4 .7 .
$\bar{y}$  SD    .358 .37 .8 .688 .624 .572 .53 .5 .46 .452
$s_y$  Mean  .982 .993 .9956 .9929 .9956 .9973 .9935 .9947 .998 .9993
$s_y$  SD    .68 .93 .965 .844 .74 .67 .632 .588 .575 .522
$\hat{\gamma}_y$  Mean  .5947 .5533 .5283 .525 .5289 .5222 .56 .5269 .5279 .5324
$\hat{\gamma}_y$  SD    .676 .4998 .4629 .395 .3899 .3265 .34 .2964 .294 .266
Number of random samples N (second setting)
$\bar{y}$  Mean  .8 . . .7 .9 .3 . .8 . .7
$\bar{y}$  SD    .4 .993 .85 .7 .64 .584 .534 .52 .475 .45
$s_y$  Mean  .9883 .9928 .9944 .996 .9968 .997 .9967 .9979 .9975 .9979
$s_y$  SD    .648 .68 .955 .83 .75 .685 .627 .593 .56 .535
$\hat{\gamma}_y$  Mean  .5849 .549 .5326 .5262 .5283 .5296 .532 .5285 .5299 .528
$\hat{\gamma}_y$  SD    .686 .56 .4423 .392 .3644 .3393 .322 .346 .2857 .2827

$\bar{y}$, $s_y$ and $\hat{\gamma}_y$ are respectively the sample estimates of mean, standard deviation and skewness coefficient of $Y = \ln X$.

Suppose that a random sample $\{x_1, x_2, \ldots, x_n\}$ of a hydrological variable X of known distribution type is available. The magnitude of X corresponding to return period T, $x_T$, can be estimated by:
1. Calculating the sample mean $\bar{x}$, sample variance $s^2$ and skewness coefficient $\hat{\gamma}$ from the random sample $\{x_1, x_2, \ldots, x_n\}$.
2. Determining the value of frequency factor $K_T$ using the appropriate distribution or equation in Table I. Readers are reminded that frequency factors of random variables of normal, log normal and PT3 distributions do not explicitly relate to return period T. The relation between $K_T$ and T is embedded in the value of the standard normal deviate z, which satisfies $P(Z \ge z) = 1/T$.
3. Estimating $x_T$ by $\hat{x}_T = \bar{x} + K_T s$.

GENERATING RANDOM NUMBERS USING FREQUENCY FACTORS
As shown in Equation (2), generation of random numbers by the PIT method requires inversion of the CDF $F_X(\cdot)$. It is not always easy to determine the inverse function of $F_X(\cdot)$. As for the rejection method, one needs to choose a comparison function whose indefinite integral is known analytically, and which is analytically invertible. In this section we propose an alternative method that can avoid CDF inversion by using the general equation of frequency analysis. The CDF of a continuous random variable is a nondecreasing function, and the relation between x and u is a one-to-one single-value relation. After a random number, say u, of a uniform distribution over the interval (0, 1) is generated, let us set

$$F_X(x_T) = 1 - \frac{1}{T} = u \tag{5}$$

This yields

$$P(X \ge x_T) = \frac{1}{T} = 1 - u = P(U \ge u) \tag{6}$$

with T determined by Equation (5). Frequency factor $K_T$ can be calculated using the appropriate distribution or equation in Table I. Finally, the magnitude of $x_T$ is calculated by $x_T = \mu + K_T \sigma$. Similarly, a set of random numbers $\{u_1, u_2, \ldots, u_n\}$ of a uniform distribution over the interval (0, 1) can be transformed to random numbers $\{x_1, x_2, \ldots, x_n\}$ of the desired distribution. Unlike the PIT method, no CDF inversion is involved in the above calculation, and the proposed method is hereafter referred to as the frequency factor transformation (FQFT) method. Another advantage of the FQFT approach is that, even though there are five types of random variable in Table I, determination of $K_T$ involves only the standard normal deviate z.

Figure 2. Uncertainty in estimation of mean reduces as sample size increases. (The centre line represents the mean of $\bar{x}$; the upper and lower lines are one standard deviation away from the centre line.) Panels: normal, log normal, EV, PT3 and LPT3, for each of the two values of N.

TEST AND VALIDATION
In order to demonstrate the applicability of the FQFT approach, random numbers of normal, log normal, EV, PT3 and LPT3 distributions are generated and tested. Specific distribution parameters designated for generating random numbers are shown in Table II. For each type of distribution, N random samples (each of size n) were generated and used in subsequent analysis. In this study, the sample size n was set to vary from 50 to 500 in increments of 50, and two values of the number of random samples N were used. We adopt three means to test the validity of the random numbers generated: (1) graphical comparison of the CDF and the empirical CDF (ECDF) derived from generated data; (2) properties of estimated parameters; (3) type I error of goodness-of-fit (GOF) test.

Figure 3. Uncertainty in estimation of standard deviation reduces as sample size n increases. (The centre line represents the mean of s; the upper and lower lines are one standard deviation away from the centre line.) Panels: normal, log normal, EV, PT3 and LPT3, for each of the two values of N.

Graphical comparison of CDF and ECDF
Figure 1 graphically illustrates the closeness of the CDF and ECDF with regard to sample sizes of 50 and 500. Each ECDF in Figure 1 is based on one single random sample of size 50 or 500, and it may change when another random sample is used. It can be seen that even at a sample size of 50 the ECDF is fairly close to the CDF of the designated distribution. At a sample size of 500, all ECDFs become almost indistinguishable from their corresponding CDFs.

Properties of parameter estimators
From each of the N random samples generated, the distribution parameters mean, standard deviation and coefficient of skewness can be estimated by

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{7}$$

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{8}$$

$$\hat{\gamma} = \frac{n \sum_{i=1}^{n} (x_i - \bar{x})^3}{(n-1)(n-2)\,s^3} \tag{9}$$

The coefficient of skewness is very sensitive to sample size n. Bobee and Robitaille (1977) suggested using the following sample-size-adjusted coefficient of skewness for PT3 and LPT3 distributions:

$$\hat{\gamma}^{*} = \hat{\gamma}\,\frac{[n(n-1)]^{1/2}}{n-2}\left(1 + \frac{8.5}{n}\right) \tag{10}$$

Furthermore, from a total of N random samples, the sample mean and standard deviation of the above estimated parameters were calculated, with respect to sample size n ranging from 50 to 500, and listed in Tables III-VII. Figures 2-4 demonstrate that, for both values of N, sample means (the centre line) of the estimated parameters (including mean, standard deviation and coefficient of skewness) are very close to the theoretical values designated for RNG. It is also seen clearly that standard deviations of all parameter estimators decrease as the sample size n increases. Together, these results indicate the unbiased nature of the estimators and the reduction of uncertainty in parameter estimation. Such characteristics of parameter estimators suggest that the random samples generated are indeed from the desired distributions.

Figure 4. Uncertainty in estimation of skewness coefficient reduces as sample size n increases. (The centre line represents the mean of $\hat{\gamma}$; the upper and lower lines are one standard deviation away from the centre line.) Panels: normal, log normal, EV, PT3 and LPT3, for each of the two values of N.

Type I error of GOF test
Each random sample of size n is generated from a theoretical distribution with designated parameters, and the GOF test can be applied to test whether the random sample is drawn from the theoretical distribution. The widely applied chi-square GOF test is adopted in this study. A random sample $x_1, x_2, \ldots, x_n$ consists of n observed values of a hypothesized distribution. These observed values fall into k mutually exclusive categories, and the following statistic

$$T = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \tag{11}$$

has a chi-square distribution, for large n, with $k - 1$ degrees of freedom. In Equation (11), $O_i$ and $E_i$ respectively represent the observed and theoretical expected frequencies falling in the ith category.

Figure 5. Type I error $\hat{\alpha}$ of chi-square GOF test with respect to sample size n. Panels: normal, LN, EV, PT3 and LPT3 with the parameters of Table II, for each of the two values of N and two levels of significance $\alpha$.

There are various criteria for determination of sample size n and number of categories k. It is usually felt that n should be large enough that no expected frequency is less than unity and no more than 20% of the expected frequencies are less than five (Milton and Arnold, 2003). Mann and Wald (1942) initiated a study of choice of categories and recommended that the categories be chosen to have equal probabilities under the hypothesized distribution. They found that, for a sample of size n (large) and significance level $\alpha$, the number of equiprobable categories should be approximately

$$k^{*} = 4\left[\frac{2n^2}{c(\alpha)^2}\right]^{1/5} \tag{12}$$

where $c(\alpha)$ is the standard normal deviate with exceedance probability $\alpha$. D'Agostino and Stephens (1986) further recommended that the number of equiprobable categories should fall between the $k^{*}$ value determined by Equation (12) for $\alpha = 0.05$ and half that value. Since the value of $k^{*}$ in Equation (12) increases slowly with $\alpha$ and it overstates the number of categories required, the value of $\alpha = 0.05$ for the $c(\alpha)$ calculation can be used for various levels of significance (D'Agostino and Stephens, 1986). To be more specific, $c(0.05) = 1.645$ and half the value of $k^{*}$ in Equation (12) is

$$k = 0.5k^{*} = 2\left[\frac{2n^2}{c(\alpha)^2}\right]^{1/5} = 1.88\,n^{2/5} \tag{13}$$

Therefore, $k \approx 2n^{2/5}$ is a convenient choice for the number of mutually exclusive, equiprobable categories for the chi-square GOF test and is adopted in this study. The null hypothesis of the chi-square GOF test assumes that the observed sample is drawn from the hypothesized distribution. The null hypothesis is rejected, at level of significance $\alpha$, if the value of the test statistic T calculated from the random sample $x_1, x_2, \ldots, x_n$ exceeds $\chi^2_{1-\alpha,\,k-1}$, the $(1-\alpha)$th quantile of the chi-square distribution with $k - 1$ degrees of freedom.
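The generation and testing steps described above can be sketched end to end for the EV distribution. This is a minimal illustration under stated assumptions, not the authors' code: it uses only the Python standard library, all function names are hypothetical, and the chi-square quantile is obtained from the Wilson-Hilferty approximation (a technique swapped in here to avoid an external statistics library, not something used in the paper).

```python
import math
import random
from statistics import NormalDist

EULER_GAMMA = 0.5772  # rounded constant, as in the EV row of Table I


def fqft_ev_sample(n, mu, sigma, rng):
    """Generate n EV variates with mean mu and standard deviation sigma via frequency factors."""
    sample = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # u = 0 gives T = 1, where ln(T/(T-1)) is undefined
            u = rng.random()
        T = 1.0 / (1.0 - u)  # Equation (5): u = 1 - 1/T
        # ln(T/(T-1)) written as -log1p(-1/T) for numerical stability at large T
        k_t = -(math.sqrt(6.0) / math.pi) * (
            EULER_GAMMA + math.log(-math.log1p(-1.0 / T))
        )
        sample.append(mu + k_t * sigma)  # x_T = mu + K_T * sigma
    return sample


def ev_cdf(x, mu, sigma):
    """CDF of the EV distribution in the parametrization of Table I."""
    a = math.pi / (math.sqrt(6.0) * sigma)
    b = mu - EULER_GAMMA / a
    return math.exp(-math.exp(-a * (x - b)))


def chi2_quantile(p, dof):
    """Approximate chi-square quantile (Wilson-Hilferty approximation, an assumption of this sketch)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3


def chi_square_gof(sample, cdf, alpha):
    """Chi-square GOF test with k ~ 2 n^(2/5) equiprobable categories; returns (T, reject)."""
    n = len(sample)
    k = max(2, round(2.0 * n ** 0.4))
    observed = [0] * k
    for x in sample:
        # Under the hypothesized CDF, category i covers probabilities [i/k, (i+1)/k)
        observed[min(int(k * cdf(x)), k - 1)] += 1
    expected = n / k
    stat = sum((o - expected) ** 2 / expected for o in observed)  # Equation (11)
    return stat, stat > chi2_quantile(1.0 - alpha, k - 1)


def type_one_error(n, n_samples, alpha, seed):
    """Fraction of FQFT-generated EV samples rejected by the GOF test."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_samples):
        sample = fqft_ev_sample(n, 0.0, 1.0, rng)
        _, reject = chi_square_gof(sample, lambda x: ev_cdf(x, 0.0, 1.0), alpha)
        rejected += reject
    return rejected / n_samples


if __name__ == "__main__":
    print(type_one_error(n=100, n_samples=500, alpha=0.05, seed=1))
```

The rejection fraction returned by `type_one_error` should lie near the chosen level of significance if the generated samples follow the hypothesized distribution. The other four distributions would follow the same pattern, with the frequency factor and CDF replaced accordingly.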
Although all random samples are generated based on a theoretical distribution, there is no guarantee that none of the random samples will be rejected by the GOF test, since a level of significance $\alpha$ is imposed. If a random sample is rejected by the GOF test using the theoretical distribution as the hypothesized distribution, then a type I error is committed. Theoretically, at level of significance $\alpha$, $100\alpha\%$ of the total number of random samples will be rejected by the GOF test. In practice, $N_r$ samples out of the total N are rejected and the probability of committing a type I error is estimated as

$$\hat{\alpha} = N_r / N \tag{14}$$

As N increases, $\hat{\alpha}$ should become increasingly close to the level of significance $\alpha$. Figure 5 demonstrates the type I error $\hat{\alpha}$ of the chi-square GOF test with respect to sample size n. With the smaller number of random samples, most values of $\hat{\alpha}$ fall between 0.04 and 0.06 for $\alpha = 0.05$, and between 0.09 and 0.11 for $\alpha = 0.1$. With the larger number of random samples, almost all values of $\hat{\alpha}$ are very close to the level of significance $\alpha$. Such results indicate that random samples generated by the FQFT method do comply with the desired distributions. It can also be seen in Figure 5 that $\hat{\alpha}$ fluctuates at a steady level with respect to sample size n, indicating that an increase in sample size does not help to stabilize the type I error $\hat{\alpha}$.

CONCLUSIONS
The proposed FQFT method is capable of generating random numbers of five distributions commonly used in hydrological frequency analysis. The ECDFs and the distribution parameters estimated from random samples are very close to, or indistinguishable from, theoretical values. Sample estimates of distribution parameters are unbiased, and estimation uncertainties reduce with increasing sample size. The type I error of the chi-square GOF test on the random samples generated fluctuates slightly around the level of significance. An advantage of the FQFT method is that it does not require CDF inversion, and the frequency factors of the five commonly used distributions involve only the standard normal deviate.

ACKNOWLEDGEMENTS
We are grateful to the Council of Agriculture (Taiwan, ROC) for funding a project that led to the initiation of this study.

REFERENCES
Bobee B, Robitaille R. 1977. The use of the Pearson type III and log Pearson type III distribution revisited. Water Resources Research 13(2).
Chow VT. 1951. A general formula for hydrologic frequency analysis. Transactions, American Geophysical Union 32.
Chow VT, Maidment DR, Mays LW. 1988. Applied Hydrology. McGraw-Hill: New York.
D'Agostino RB, Stephens MA. 1986. Goodness-of-Fit Techniques. Marcel Dekker: New York.
Devroye L. 1986. Non-Uniform Random Variate Generation. Springer-Verlag: New York.
Hellekalek P. 1997. A note on pseudorandom number generators. EUROSIM Simulation News Europe 2: 6 8.
Kite GW. 1988. Frequency and Risk Analysis in Hydrology. Water Resources Publications.
Larget B. 2002. Random number generation. larget/math496/random.html (accessed 3 March 2006).
Mann HB, Wald A. 1942. On the choice of the number of class intervals in the application of the chi-squared test. Annals of Mathematical Statistics 13.
Milton JS, Arnold JC. 2003. Introduction to Probability and Statistics. McGraw-Hill: New York.
Mood AM, Graybill FA, Boes DC. 1974. Introduction to the Theory of Statistics. McGraw-Hill: New York.
National Research Council. 2000. Risk Analysis and Uncertainty in Flood Damage Reduction Studies. National Academy Press.
Press WH, Flannery BP, Teukolsky SA, Vetterling WT. 1993. Numerical Recipes in C. The Art of Scientific Computing. Cambridge University Press.
