Applications of Good's Generalized Diversity Index

A. J. Baczkowski
Department of Statistics, University of Leeds, Leeds LS2 9JT, UK

Internal Report STAT 98/11, September 1998

Abstract

This report uses the moments of the generalized diversity index h(α, β) to obtain percentage points of the index, to suggest transformations to normality, and to propose tests of hypothesis for this diversity index.

1 Introduction

Good (1953) proposed a generalized diversity index which includes as special cases both Shannon's and Simpson's indices. This index can be further generalized as described in Baczkowski et al. (1997a, 1998). The first four moments of this generalized index are given in Baczkowski (1996). By calculating the skewness and kurtosis it is possible to suggest suitable approximating distributions for the index; see Baczkowski et al. (1997b). Given these approximating distributions it is possible to obtain percentage points for the sample index, to suggest transformations to normality, and to obtain simple tests of hypothesis for the diversity index.

Suppose that a population consists of s species having ordered relative abundances π = (π_1, π_2, ..., π_s). Good (1953) suggested measuring diversity using an index of the form

    H(α, β) = Σ_{i=1}^{s} π_i^α {−ln(π_i)}^β,

defined for non-negative integer values of α and β. This attempted to give a more general diversity measure which included as special cases both H(1, 1), Shannon's (1948) index, and H(2, 0), Simpson's (1949) index.

In practice a sample of size n is available, of which n_i are observed belonging to species i. The relative abundance of species i can be estimated using p_i = n_i/n, and the generalized diversity index estimated by

    h(α, β) = Σ_{i=1}^{s} p_i^α {−ln(p_i)}^β.
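The estimator h(α, β) is straightforward to compute from a vector of species counts. The following Python sketch (not part of the report, whose programs are in MINITAB and BASIC) illustrates it; species with zero observed count are skipped, and Python's convention 0**0 = 1 makes the β = 0 cases behave correctly.

```python
import math

def good_index(counts, alpha, beta):
    """Estimate Good's generalized diversity index h(alpha, beta)
    from a list of species counts (the n_i)."""
    n = sum(counts)
    total = 0.0
    for n_i in counts:
        if n_i == 0:
            continue  # unobserved species contribute nothing
        p = n_i / n
        total += p ** alpha * (-math.log(p)) ** beta
    return total

counts = [60, 30, 10]                 # a hypothetical sample, n = 100
shannon = good_index(counts, 1, 1)    # h(1, 1), Shannon's index
simpson = good_index(counts, 2, 0)    # h(2, 0), Simpson's index
```

Here (α, β) = (1, 1) and (2, 0) recover the two classical special cases noted above.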

Baczkowski et al. (1997a, 1998) further generalized Good's index so that (α, β) takes values in the real plane R². They determined the range of values of (α, β) for which H(α, β) satisfies two key properties of Pielou (1975, p.7), namely:

P1: for fixed s, the index increases as the relative abundances become more equal;

P2: if the relative abundances are equal then the index is an increasing function of s.

For 0 < α < 1 the valid region is given by 0 ≤ β ≤ 4α(1 − α), while for α ≥ 1 the valid region for β satisfies 0 ≤ β ≤ α^α.

In Baczkowski (1996) the mean µ_h = E[h(α, β)] and the central moments µ_r = E[{h(α, β) − µ_h}^r] for r = 2, 3, 4 are evaluated for large sample sizes n. The coefficients of skewness and kurtosis, given by √β1 = µ3/µ2^{3/2} and β2 = µ4/µ2² respectively, are also derived. In Baczkowski et al. (1997b) these moments are used to fit suitable distributions to h(α, β).

The moments for certain special cases are easily written down. For example, putting α = 1 and β = 1 gives, for Shannon's index,

    E[h(1, 1)] ≈ H(1, 1) − (s − 1)/(2n) + {1 − H(−1, 0)}/(12n²) + {H(−1, 0) − H(−2, 0)}/(12n³),

    µ2{h(1, 1)} ≈ {H(1, 2) − H(1, 1)²}/n + (s − 1)/(2n²) + {H(1, 1)H(−1, 0) − H(−1, 1) − H(−1, 0) + 1}/(6n³).

These two results agree with the expressions obtained by Bowman et al. (1971). The expressions for µ3{h(1, 1)} and µ4{h(1, 1)} were only derived by Bowman et al. (1971) to terms of order O(n⁻²), and can be shown to be, to terms of order O(n⁻³),

    µ3{h(1, 1)} ≈ {H(1, 3) − 3H(1, 2)H(1, 1) − 3H(1, 2) + 2H(1, 1)³ + 3H(1, 1)²}/n² − (s − 1)/n³,

    µ4{h(1, 1)} ≈ 3{H(1, 2) − H(1, 1)²}²/n² + O(n⁻³),

where the O(n⁻³) term of µ4{h(1, 1)}, involving H(1, 4), H(1, 3), H(1, 2), H(1, 1), and s, is given in full in Baczkowski (1996).

Putting α = 2 and β = 0 gives, for Simpson's index,

    E[h(2, 0)] = H(2, 0) + {1 − H(2, 0)}/n + O(n⁻⁴),

the terms in n⁻² and n⁻³ being exactly zero, and

    µ2{h(2, 0)} ≈ 4{H(3, 0) − H(2, 0)²}/n + 2{5H(2, 0)² + H(2, 0) − 6H(3, 0)}/n² − 2{H(2, 0)(3H(2, 0) + 1) − 4H(3, 0)}/n³.

    µ3{h(2, 0)} ≈ 8[H(2, 0){9H(3, 0) − 5H(2, 0)²} − 4H(4, 0)]/n² − 8{24H(4, 0) − 45H(3, 0)H(2, 0) − 4H(3, 0) + 22H(2, 0)³ + 3H(2, 0)²}/n³,

    µ4{h(2, 0)} ≈ 48[H(3, 0) − H(2, 0)²]²/n² + O(n⁻³),

where the O(n⁻³) term of µ4{h(2, 0)}, involving H(5, 0), H(4, 0), H(3, 0), and H(2, 0), is given in full in Baczkowski (1996).

In section 2 the moments of h(α, β) are determined for several different models and, in the case of h(1, 1), compared with the results from 5000 simulations. It is found that the theoretical results agree with those determined from the simulation studies. In section 3 the calculated moments are used to fit a suitable Pearson distribution to the diversity index. This allows the lower and upper percentage points to be calculated, as well as the minimum width confidence interval. A comparison is made with the confidence interval obtained using the best fitting Gaussian distribution. Section 4 considers using the moments of h(α, β) to obtain Edgeworth expansions of the diversity index; the use of such expansions in obtaining cumulative probabilities is discussed. These expansions are used in section 5 to give series transformations to normality of the diversity index. The merits of standardising the index to have zero mean and unit variance prior to obtaining the series expansion are discussed in section 5.1. The Cornish-Fisher expansion reviewed in section 5.2 seems to require that h(α, β) be suitably standardised before the series expansion is undertaken. Section 6 suggests that a functional transformation to normality might be used; in section 6.2 it is found that a logarithm transformation of h(α, β) does not give near-normality. The use of the moments of h(α, β) in one- and two-sample tests of hypothesis is discussed in section 7, and examples are given.
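The exactness of the mean of Simpson's index can be checked directly: under multinomial sampling E[n_i²] = nπ_i(1 − π_i) + (nπ_i)², so summing gives E[h(2, 0)] = H(2, 0) + {1 − H(2, 0)}/n with no higher-order correction. The following Python sketch (an illustration, not from the report) confirms this by brute-force enumeration of all multinomial outcomes for a small sample.

```python
import math

def expected_simpson(pi, n):
    """E[h(2,0)] from the text: H(2,0) + {1 - H(2,0)}/n."""
    H20 = sum(p * p for p in pi)
    return H20 + (1.0 - H20) / n

def brute_force_mean(pi, n):
    """E[sum (n_i/n)^2] by enumerating every multinomial outcome."""
    def outcomes(total, parts):
        if parts == 1:
            yield (total,)
            return
        for k in range(total + 1):
            for rest in outcomes(total - k, parts - 1):
                yield (k,) + rest
    mean = 0.0
    for ns in outcomes(n, len(pi)):
        coef = math.factorial(n)
        for k in ns:
            coef //= math.factorial(k)
        prob = coef * math.prod(p ** k for p, k in zip(pi, ns))
        mean += prob * sum((k / n) ** 2 for k in ns)
    return mean

pi = (11/18, 5/18, 2/18)   # broken-stick proportions for s = 3 (Table 1)
```

For these proportions and n = 6 both routines return 179/324 exactly (up to rounding).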

2 Moments for several different models

The program in appendix D evaluates the moments of h(α, β) in the general case. In the equiprobable case, so that π_i = 1/s for all i, it is necessary to evaluate the moments of h(α, β) using a different procedure; see Baczkowski (1996).

The moments of h(α, β) are evaluated for several different species abundance models. The two key models considered here are the broken-stick model of MacArthur (1957) and the equiprobable model. Table 1 below gives the population proportions π_i for the broken-stick model in the cases s = 3 and s = 10.

Table 1: Relative frequencies π_i for broken-stick model

           s = 3     s = 10
    π_1    0.6111    0.2929
    π_2    0.2778    0.1929
    π_3    0.1111    0.1429
    π_4              0.1096
    π_5              0.0846
    π_6              0.0646
    π_7              0.0479
    π_8              0.0336
    π_9              0.0211
    π_10             0.0100

Examples 1 to 4 below tabulate the moments of h(α, β) for a range of α and β values for the broken-stick model with s = 3 and s = 10, with sample sizes n = 100 and n = 1000. Also given for each example are the results of 5000 simulations of h(1, 1) produced by the MINITAB programs listed in Appendix A. For each set of simulations a dotplot is produced together with summary statistics, including the sample mean h̄, the sample moments m_r for r = 2, 3, 4, and the sample skewness √b1 and kurtosis b2. Example 5 repeats the above for the equiprobable model with s = 10 and n = 1000.
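The broken-stick proportions in Table 1 are determined by the model itself: the expected ranked proportions are π_i = (1/s) Σ_{j=i}^{s} 1/j. A minimal Python sketch (for illustration; the report computes these elsewhere):

```python
def broken_stick(s):
    """Expected ranked proportions of MacArthur's broken-stick model:
    pi_i = (1/s) * sum_{j=i}^{s} 1/j, for i = 1, ..., s."""
    return [sum(1.0 / j for j in range(i, s + 1)) / s
            for i in range(1, s + 1)]

props3 = broken_stick(3)    # [11/18, 5/18, 2/18], as in Table 1
props10 = broken_stick(10)  # ten proportions summing to 1
```

The proportions always sum to one by construction, and the smallest is π_s = 1/s².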

Example 1. Broken-stick model with s = 10 and n = 100.

[Table 2: Moments µ_h, µ2, µ3, µ4, √β1, β2 of h(α, β) for a range of (α, β); broken-stick model with s = 10 and n = 100.]

[Dotplot and summary statistics (mean, median, st. dev., min, max, quartiles; h̄, m2, m3, m4, √b1, b2) of 5000 simulated values of h(1, 1) for the broken-stick model with s = 10 and n = 100; each dot represents up to 20 points.]

Example 2. Broken-stick model with s = 10 and n = 1000.

[Table 3: Moments µ_h, µ2, µ3, µ4, √β1, β2 of h(α, β) for a range of (α, β); broken-stick model with s = 10 and n = 1000.]

[Dotplot and summary statistics (mean, median, st. dev., min, max, quartiles; h̄, m2, m3, m4, √b1, b2) of 5000 simulated values of h(1, 1) for the broken-stick model with s = 10 and n = 1000; each dot represents up to 22 points.]

Example 3. Broken-stick model with s = 3 and n = 100.

[Table 4: Moments µ_h, µ2, µ3, µ4, √β1, β2 of h(α, β) for a range of (α, β); broken-stick model with s = 3 and n = 100. Asterisked entries are not available.]

[Dotplot and summary statistics (mean, median, st. dev., min, max, quartiles; h̄, m2, m3, m4, √b1, b2) of 5000 simulated values of h(1, 1) for the broken-stick model with s = 3 and n = 100; each dot represents up to 27 points.]

Example 4. Broken-stick model with s = 3 and n = 1000.

[Table 5: Moments µ_h, µ2, µ3, µ4, √β1, β2 of h(α, β) for a range of (α, β); broken-stick model with s = 3 and n = 1000. Asterisked entries are not available.]

[Dotplot and summary statistics (mean, median, st. dev., min, max, quartiles; h̄, m2, m3, m4, √b1, b2) of 5000 simulated values of h(1, 1) for the broken-stick model with s = 3 and n = 1000; each dot represents up to 26 points.]

Example 5. Equiprobable model with s = 10 and n = 1000.

[Table 6: Values H(α, β) and moments µ_h, µ2, µ3, µ4, √β1, β2 of h(α, β) for a range of (α, β); equiprobable model with s = 10 and n = 1000.]

[Dotplot and summary statistics (mean, median, st. dev., min, max, quartiles; h̄, m2, m3, m4, √b1, b2) of 5000 simulated values of h(1, 1) for the equiprobable model with s = 10 and n = 1000; each dot represents up to 23 points.]

It can be seen in the above examples that the simulated results are close to the theoretical moments even for sample sizes n as small as 100.

An alternative model for species abundance is that due to Sugihara (1980); see Baczkowski (1997) for further discussion of this model. Table 7 below gives the population proportions π_i for the Sugihara model with s = 3 species.

[Table 7: Relative frequencies π_1, π_2, π_3 for the Sugihara model with s = 3.]

The theoretical moments for the Sugihara model with s = 3 and n = 100 are given in Example 6 below.

Example 6. Sugihara (1980) model with s = 3 and n = 100.

[Table 8: Moments µ_h, µ2, µ3, µ4, √β1, β2 of h(α, β) for a range of (α, β); Sugihara model with s = 3 and n = 100. Asterisked entries are not available.]

3 Fitting a Pearson distribution

The Pearson system of classifying distributions is based on the two parameters β1 and β2; see, for example, Pearson and Merrington (1951). In this section the effect of fitting a suitable member of this class of distributions to h(α, β) is considered. A review of properties of the beta and gamma distributions is given first, since it is found that the distribution of h(α, β) can be well approximated by these distributions.

3.1 Beta distribution of the first kind

A random variable X is said to have a beta distribution (of the first kind) if its probability density function satisfies

    f(x) = x^{p−1}(1 − x)^{q−1}/B(p, q),   0 ≤ x ≤ 1;  p, q > 0,

where B(p, q) is the beta function, satisfying

    B(p, q) = Γ(p)Γ(q)/Γ(p + q),

and Γ(·) denotes the gamma (factorial) function. If both p > 1 and q > 1 then X has a unique mode at (p − 1)/(p + q − 2).

The non-central moments of X satisfy

    µ′_r = E[X^r] = B(p + r, q)/B(p, q) = Γ(p + r)Γ(p + q)/{Γ(p)Γ(p + q + r)}  for r ≥ 1.

Thus

    µ′_r = {(p + r − 1)/(p + q + r − 1)} µ′_{r−1}  for r ≥ 2.

These give

    µ′ = E[X] = p/(p + q),
    µ′2 = p(p + 1)/{(p + q)(p + q + 1)},
    µ′3 = p(p + 1)(p + 2)/{(p + q)(p + q + 1)(p + q + 2)},
    µ′4 = p(p + 1)(p + 2)(p + 3)/{(p + q)(p + q + 1)(p + q + 2)(p + q + 3)}.

The central moments µ_r = E[(X − µ)^r] satisfy

    µ2 = pq/{(p + q)²(p + q + 1)},
    µ3 = 2pq(q − p)/{(p + q)³(p + q + 1)(p + q + 2)},
    µ4 = {3p²q²(p + q + 2) + 6pq(q − p)²}/{(p + q)⁴(p + q + 1)(p + q + 2)(p + q + 3)}.
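The central-moment formulas above translate directly into code. The following Python sketch (an illustration, not the report's BASIC program) evaluates them and the derived skewness and kurtosis:

```python
def beta_central_moments(p, q):
    """Closed-form central moments mu2, mu3, mu4 of the beta(p, q)
    distribution of the first kind, as quoted in section 3.1."""
    s = p + q
    mu2 = p * q / (s ** 2 * (s + 1))
    mu3 = 2 * p * q * (q - p) / (s ** 3 * (s + 1) * (s + 2))
    mu4 = (3 * p ** 2 * q ** 2 * (s + 2) + 6 * p * q * (q - p) ** 2) / (
        s ** 4 * (s + 1) * (s + 2) * (s + 3))
    return mu2, mu3, mu4

def beta_skew_kurt(p, q):
    """beta_1 = mu3^2 / mu2^3 and beta_2 = mu4 / mu2^2."""
    mu2, mu3, mu4 = beta_central_moments(p, q)
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2
```

As a check, the symmetric case p = q = 1 (the uniform distribution) gives β1 = 0 and β2 = 1.8, and the β2 identity of section 3.4 can be verified numerically for any (p, q).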

The skewness and kurtosis are thus respectively given by

    β1 = µ3²/µ2³ = 4(p + q + 1)(q − p)²/{pq(p + q + 2)²},

and

    β2 = µ4/µ2² = 3(p + q + 1)/(p + q + 3) + 3(p + q + 2)β1/{2(p + q + 3)}.

3.2 Gamma distribution

A random variable X has a gamma distribution with parameter λ and index r if its probability density function has the form

    f(x) = λ^r x^{r−1} e^{−λx}/Γ(r),   0 ≤ x < ∞;  λ, r > 0.

The mode is at x = (r − 1)/λ. The non-central moments satisfy

    µ′ = r/λ,  µ′2 = r(r + 1)/λ²,  µ′3 = r(r + 1)(r + 2)/λ³,  µ′4 = r(r + 1)(r + 2)(r + 3)/λ⁴.

The central moments, skewness, and kurtosis are given by

    µ2 = r/λ²,  µ3 = 2r/λ³,  µ4 = 3r(r + 2)/λ⁴,  β1 = 4/r,  β2 = 3(r + 2)/r.

The values of β1 and β2 thus lie on the line 2β2 − 3β1 − 6 = 0.

3.3 Beta distribution of the second kind

A random variable X has a beta distribution of the second kind if its probability density function satisfies

    f(x) = x^{p−1}/{B(p, q)(1 + x)^{p+q}},   0 ≤ x < ∞;  p, q > 0.

The rth moment exists only if r < q and is then given by

    µ′_r = B(p + r, q − r)/B(p, q) = Γ(p + r)Γ(q − r)/{Γ(p)Γ(q)}.

The transformation z = x/(1 + x), x = z/(1 − z), makes Z a beta distribution of the first kind with parameters p and q. The transformation z = qx/p, x = pz/q, makes Z an F_{2p,2q} variable.

3.4 Pearson system of curves

Karl Pearson proposed a system of frequency curves which arise as solutions of a simple differential equation. This leads to seven different types of Pearson curve, distinguishable by their skewness β1 and kurtosis β2. Of interest here are types I, III, and VI as suitable
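The line 2β2 − 3β1 − 6 = 0 separates the three types used in this report, as stated in section 3.4. A minimal Python sketch of that classification rule (a simplification for these three types only; the full Pearson system has further types):

```python
def pearson_type_for(beta1, beta2):
    """Classify (beta1, beta2) into the three Pearson types used here:
    type III on the line 2*beta2 - 3*beta1 - 6 = 0 (the gamma case),
    type I below the line, type VI above it."""
    crit = 2.0 * beta2 - 3.0 * beta1 - 6.0
    if abs(crit) < 1e-9:
        return "III"
    return "I" if crit < 0 else "VI"
```

For example, any gamma distribution (β1 = 4/r, β2 = 3(r + 2)/r) lands exactly on the line and is classified as type III.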

approximations to the distribution of h(α, β); see Baczkowski (1996). We denote Pearson random variables in this section by Y. Table 9 gives the form of the probability density function f(y) for these three types.

Table 9: Summary of Pearson curves of types I, III, and VI.

    Type   Equation                                   Origin for y     Limits for y
    I      f(y) = y0 (1 + y/a1)^{m1} (1 − y/a2)^{m2}  Mode             −a1 ≤ y ≤ a2
    III    f(y) = y0 e^{−λy} (1 + y/a)^{aλ}           Mode             −a ≤ y < ∞
    VI     f(y) = y0 (y − a)^{q2} y^{−q1}             At a, before     a ≤ y < ∞
                                                      start of curve

The type III distribution gives the gamma distribution with parameter λ and index r = 1 + aλ by using the transformation x = a + y. For the type III distribution the skewness and kurtosis lie on the line 2β2 − 3β1 − 6 = 0.

The type I distribution gives a beta distribution of the first kind by using the transformation

    x = (a1 + y)/(a1 + a2),

with p = m1 + 1 and q = m2 + 1. It can be shown that m1/a1 = m2/a2. For the type I distribution, values of (β1, β2) satisfy 2β2 − 3β1 − 6 < 0.

The type VI distribution gives a beta distribution of the first kind by using the transformation x = a/y, where q = q2 + 1 and p = q1 − q2 − 1. The transformation x = a/(y − a), y = a + (a/x), gives a beta distribution of the second kind, again with q = q2 + 1 and p = q1 − q2 − 1. The transformation

    y = a(1 + mx/n)

gives X ~ F_{m,n}, where m = 2(q2 + 1) and n = 2(q1 − q2 − 1). For the type VI distribution, values of (β1, β2) satisfy 2β2 − 3β1 − 6 > 0.

3.5 Fitting a Pearson distribution

Suppose the data values are denoted by y, with sample mean ȳ, sample moments m_r(y) for r = 2, 3, 4, sample skewness √b1(y), and kurtosis b2(y). Since the skewness and kurtosis are invariant to linear transformations, they are used to determine the appropriate Pearson distribution for the y-values. The specific parameters of this Pearson distribution can then be fitted using the method of moments. For further details see Pearson and Hartley (1954).

As an example, consider fitting a Pearson type I distribution. The y-values have minimum −a1 and maximum a2. The usual beta distribution parameterization X is obtained using

    x = (a1 + y)/(a1 + a2) = (a1 + y)/c,

where c = a1 + a2. This gives

    E[Y] = cE[X] − a1,  µ2(Y) = c²µ2(X),  µ3(Y) = c³µ3(X),  µ4(Y) = c⁴µ4(X),

where the moments of X are found from section 3.1. Once the pair {β1(Y), β2(Y)} has given the type of distribution, equating β1(Y) = β1(X) and β2(Y) = β2(X) gives two equations which can be solved for the powers m1 and m2 of the Pearson I distribution (and thus the parameters p and q of the beta distribution). The parameter c may then be obtained using any of

    c = {µ2(Y)/µ2(X)}^{1/2} ≈ {m2(y)/µ2(X)}^{1/2},
    c = {µ3(Y)/µ3(X)}^{1/3} ≈ {m3(y)/µ3(X)}^{1/3},
    c = {µ4(Y)/µ4(X)}^{1/4} ≈ {m4(y)/µ4(X)}^{1/4}.

The BASIC program in Appendix B estimates c using the mean of the three values above. Finally,

    a1 = cE[X] − E[Y] ≈ cp/(p + q) − ȳ,  a2 = c − a1.

It is not claimed that these estimates are optimal, but they give a suitable approximating distribution to that of y which is sufficient for the purpose of obtaining percentage points for the tail probabilities of the distribution.

For fitting a Pearson curve to data values y it would be possible to derive maximum likelihood estimates of the required parameters. However, to fit a suitable distribution to h(α, β) we use the calculated theoretical moments of section 2. It is not then possible to obtain a set of maximum likelihood parameter estimates. Furthermore, it is not necessary that a1 = 0 or a2 = s^{1−α}{ln(s)}^β, the minimum and maximum theoretical values of h(α, β), since we are only interested in approximating the exact distribution of h(α, β).

3.6 Newton method of deriving parameter estimates

Suppose that a Pearson I distribution is to be fitted.
To obtain the estimates of p and q for given (β1, β2), a two-dimensional Newton method may be used. Recall first the one-dimensional Newton method for obtaining the solution x = x0 of the equation y(x) = y0. Suppose

    y0 = y(x1 + h) = y(x1) + hy′(x1) + ...,

where h is a small increment about the trial solution x1 and ′ denotes differentiation with respect to x. This gives

    h ≈ {y0 − y(x1)}/y′(x1),

so that a better estimate of the solution x0 than x1 is

    x2 = x1 + h ≈ x1 + {y0 − y(x1)}/y′(x1).

This procedure is then iterated until (hopefully) convergence occurs.

In two dimensions we require the solution of the two equations

    y1(x) = y10,  y2(x) = y20,

where x = (x1, x2). Taylor expansions about a trial solution x1 give

    y10 = y1(x1 + h) ≈ y1(x1) + h1 ∂y1/∂x1 + h2 ∂y1/∂x2 + ...,
    y20 = y2(x1 + h) ≈ y2(x1) + h1 ∂y2/∂x1 + h2 ∂y2/∂x2 + ....

Write

    h = (h1, h2)ᵀ,  y = (y10 − y1(x1), y20 − y2(x1))ᵀ,

and

    A = ( a11  a12 )  =  ( ∂y1/∂x1  ∂y1/∂x2 )
        ( a21  a22 )     ( ∂y2/∂x1  ∂y2/∂x2 ).

The two equations can then be written y = Ah, whence h = A⁻¹y, where

    A⁻¹ = (1/det A) (  a22  −a12 )
                    ( −a21   a11 ),

and det A = a11 a22 − a12 a21. This gives

    h1 = [a22{y10 − y1(x1)} − a12{y20 − y2(x1)}]/det A,
    h2 = [a11{y20 − y2(x1)} − a21{y10 − y1(x1)}]/det A.

The new estimate for the required solution is then x2 = x1 + h.

For the Pearson type I distribution we have (x1, x2) = (p, q). The function y1 denotes the skewness β1 of the Pearson I curve written as a function of p and q, while y2 denotes the corresponding kurtosis β2. Also, y10 and y20 are the observed values of skewness and kurtosis respectively. These give

    a11 = ∂β1/∂p = 4(3p + q + 2)(p² − q²)(q + 1)/{p²q(p + q + 2)³},

    a12 = ∂β1/∂q = 4(p + 3q + 2)(q² − p²)(p + 1)/{pq²(p + q + 2)³},

    a21 = ∂β2/∂p = 6(4p³ + 2p²q + 12p² − 3pq² − 5pq + 6p − q³ − 5q² − 6q)(p + q)(q + 1)/{p²q(p + q + 2)²(p + q + 3)²},

    a22 = ∂β2/∂q = 6(4q³ + 2pq² + 12q² − 3p²q − 5pq + 6q − p³ − 5p² − 6p)(p + q)(p + 1)/{pq²(p + q + 2)²(p + q + 3)²},

so that

    det A = 24(p + 1)(q + 1)(p − q)(p + q)⁴(p + q + 1)(p + q + 2)/{p³q³(p + q + 2)⁵(p + q + 3)²}.

Appendix B gives a BASIC program for fitting a type I Pearson distribution to a given set of moments.

Example 1. Shannon's index with s = 3 and n = 1000 for the broken-stick model gives µ_h = 0.900, µ2 = , µ3 = , µ4 = , √β1 = , β2 = . The fitted Pearson type I distribution has p = , q = , c = , a1 = .

Example 2. Shannon's index with s = 3 and n = 100 for the broken-stick model gives µ_h = 0.891, µ2 = , µ3 = , µ4 = , √β1 = 0.400, β2 = . The fitted Pearson type I distribution has p = 93.57, q = 14.86, c = , a1 = .

Example 3. Shannon's index with s = 10 and n = 1000 for the broken-stick model gives µ_h = 1.966, µ2 = , µ3 = , µ4 = , √β1 = 0.098, β2 = . The fitted Pearson type I distribution has p = , q = , c = , a1 = . These were found using a grid search method, as the iterative method described above breaks down in this case.

Example 4. Shannon's index with s = 3 and n = 100 for the Sugihara model gives µ_h = 0.846, µ2 = , µ3 = , µ4 = , √β1 = 0.358, β2 = . The fitted Pearson type I distribution has p = , q = 17.76, c = , a1 = . As in example 3, these were found using a grid search method.
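The Newton iteration above can be sketched compactly in Python. This illustration estimates the Jacobian by forward differences rather than the analytic partial derivatives quoted in the report, which is simpler but otherwise follows the same scheme; note that β1 and β2 are symmetric in (p, q), so the iteration finds the root on the same side of p = q as its starting point.

```python
def beta1_of(p, q):
    return 4 * (p + q + 1) * (q - p) ** 2 / (p * q * (p + q + 2) ** 2)

def beta2_of(p, q):
    s = p + q
    return 3 * (s + 1) / (s + 3) + 3 * (s + 2) * beta1_of(p, q) / (2 * (s + 3))

def fit_pq(b1_target, b2_target, p, q, steps=100, eps=1e-7):
    """2-D Newton iteration for the beta parameters (p, q) matching a
    given (beta1, beta2), with a forward-difference Jacobian."""
    for _ in range(steps):
        f1 = beta1_of(p, q) - b1_target
        f2 = beta2_of(p, q) - b2_target
        a11 = (beta1_of(p + eps, q) - beta1_of(p, q)) / eps
        a12 = (beta1_of(p, q + eps) - beta1_of(p, q)) / eps
        a21 = (beta2_of(p + eps, q) - beta2_of(p, q)) / eps
        a22 = (beta2_of(p, q + eps) - beta2_of(p, q)) / eps
        det = a11 * a22 - a12 * a21
        if abs(det) < 1e-14:
            break  # Jacobian nearly singular: the method breaks down,
                   # as in example 3 above
        h1 = (-a22 * f1 + a12 * f2) / det
        h2 = (a21 * f1 - a11 * f2) / det
        p = max(p + h1, 1e-6)   # keep the parameters positive
        q = max(q + h2, 1e-6)
    return p, q
```

Starting from a trial solution reasonably close to the answer, a few iterations recover (p, q) to high accuracy.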

3.7 Obtaining percentage points of Pearson curves

Pearson and Hartley (1954) give tables (the Biometrika tables) of percentage points for Pearson curves with given β1 and β2. The tables give lower and upper percentage points of the standardised deviate z = (y − µ)/σ. These can be used to obtain approximate percentage points for any given diversity index h(α, β).

Example 1. Using the Biometrika tables. Suppose that β1 = 1, β2 = 4, and µ3 > 0. The relevant section of the tables of Pearson and Hartley (1954) is shown in Table 10.

[Table 10: Lower 5% and upper 5% points of the standardised deviate, indexed by β1 and β2.]

Since µ3 > 0, the lower and upper 5% points are −1.26 and 1.93 respectively. From symmetry considerations, if µ3 < 0 then the lower and upper 5% points are −1.93 and 1.26 respectively. The shape of the distribution is indicated in Table 11 below.

Table 11: Summary description of the standardised deviate (y − µ)/σ for β1 = 1, β2 = 4.

    Case µ3 < 0:  lower 5% point, mean, mode, upper 5% point  (mode > mean)
    Case µ3 > 0:  lower 5% point, mode, mean, upper 5% point  (mode < mean)

For arbitrary (β1, β2), linear interpolation in both β1 and β2 is used. Suppose that the values X00, X01, X11, and X10 are tabulated and it is desired to obtain the percentage point at the point dividing the region [0, 1] × [0, 1] at (θ, φ). The situation is shown in Table 12 below. Using linear interpolation gives

    X_θ0 ≈ θX10 + (1 − θ)X00,  X_0φ ≈ φX01 + (1 − φ)X00,
    X_1φ ≈ φX11 + (1 − φ)X10,  X_θ1 ≈ θX11 + (1 − θ)X01.

At the desired central location we would have the estimate

    X_θφ ≈ θ{φX11 + (1 − φ)X10} + (1 − θ){φX01 + (1 − φ)X00}
         = (1 − θ)(1 − φ)X00 + θ(1 − φ)X10 + θφX11 + (1 − θ)φX01.
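The two-way interpolation above is ordinary bilinear interpolation. A minimal Python sketch (the tabulated values here are made-up placeholders, since the Biometrika table entries are not reproduced in this report):

```python
def bilinear(x00, x10, x01, x11, theta, phi):
    """Two-way linear interpolation of tabulated percentage points,
    following Table 12: theta moves in the beta_1 direction and
    phi in the beta_2 direction."""
    return ((1 - theta) * (1 - phi) * x00 + theta * (1 - phi) * x10
            + theta * phi * x11 + (1 - theta) * phi * x01)

# hypothetical corner values for illustration only
point = bilinear(1.0, 2.0, 3.0, 4.0, theta=0.75, phi=0.5)
```

At the corners (θ, φ ∈ {0, 1}) the formula returns the tabulated values exactly, as it should.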

Table 12: Interpolation in two dimensions to obtain percentage points of Pearson curves.

                  β1 increasing →
    β2            X00    X_θ0   X10
    increasing    X_0φ   X_θφ   X_1φ
    ↓             X01    X_θ1   X11

For example, if β1 = 0.95 and β2 = 4.15 then φ = 1/2 and θ = 3/4, so each percentage point is obtained as

    (1/8)X00 + (3/8)X10 + (3/8)X11 + (1/8)X01,

evaluated for the tabulated lower and upper points in turn. The appropriate signs are then determined by the sign of µ3.

Example 2. Consider the case α = 2 and β = 0, corresponding to Simpson's index. For the broken-stick model with s = 10 and n = 1000 we obtain µ_h = 0.172, µ2 = σ_h² = , µ3 > 0, √β1 = , β2 = . The given values of β1 and β2 suggest fitting a type VI distribution. The percentage points can be obtained from Table 13.

[Table 13: Lower and upper 2.5% points of the standardised deviate (y − µ)/σ, indexed by β1 and β2.]

The lower and upper 2.5% points for {h(2, 0) − µ_h}/σ_h are and . Thus the lower and upper 2.5% points for the diversity index h(2, 0) are and respectively.

Example 3. Shannon's index with s = 3 and n = 1000 for the broken-stick model gives µ_h = 0.900, σ_h² = , µ3 = , µ4 = , √β1 = , β2 = . The fitted Pearson type I distribution has p = , q = , c = , a1 = .

The upper and lower 2.5% percentage points can be obtained from Table 14 below.

[Table 14: Lower and upper 2.5% points of the standardised deviate (y − µ)/σ, indexed by β1 and β2.]

With θ = and φ = 0.1, the lower and upper 2.5% points for {h(1, 1) − µ_h}/σ_h are and . Thus the lower and upper 2.5% points for h(1, 1) are and respectively. As a check, numerical integration of the fitted beta distribution gives the same values.

Example 4. Shannon's index with s = 3 and n = 100 for the broken-stick model gives µ_h = 0.891, σ_h² = , µ3 = , µ4 = , √β1 = 0.400, β2 = . The fitted Pearson type I distribution has p = 93.57, q = 14.86, c = , a1 = . The upper and lower 2.5% percentage points can be obtained from Table 15 below.

[Table 15: Lower and upper 2.5% points of the standardised deviate (y − µ)/σ, indexed by β1 and β2.]

The lower and upper 2.5% points for {h(1, 1) − µ_h}/σ_h are and respectively. Thus the lower and upper 2.5% points for h(1, 1) are and respectively. As a check, numerical integration gives the same values.

Example 5. Shannon's index with s = 10 and n = 1000 for the broken-stick model gives µ_h = 1.966, σ_h² = , µ3 = , µ4 = , √β1 = 0.098, β2 = . The values of β1 and β2 imply a Pearson type I distribution, but unfortunately it is so close to a normal distribution that the parameter estimates are unreliable. The upper and lower 2.5% percentage points can be obtained from Table 16 below.

[Table 16: Lower and upper 2.5% points of the standardised deviate (y − µ)/σ, indexed by β1 and β2.]

The lower and upper 2.5% points for {h(1, 1) − µ_h}/σ_h are −2.01 and . Thus the lower and upper 2.5% points for h(1, 1) are and respectively.

3.8 Confidence intervals for the diversity index

Suppose that it is required to obtain a 95% confidence interval for h(α, β). The percentage points for the fitted Pearson distribution can be used to obtain the approximate upper and lower 2.5% values for the distribution. Unfortunately, as the distribution of h(α, β) is not symmetric, this approximate confidence interval will not have minimum width. Numerical integration of the area under the fitted Pearson distribution allows the minimum width confidence interval to be derived. Appendix C gives a BASIC program which performs this calculation.

A simplistic assumption is that the diversity index h(α, β) is an approximately Gaussian variable with mean µ_h and variance σ_h². An approximate 95% confidence interval is then given by µ_h ± 1.96σ_h.

Example 1. Shannon's index with s = 3 and n = 1000 for the broken-stick model gives µ_h = 0.900, σ_h² = , √β1 = , β2 = . The fitted Pearson type I distribution gives lower and upper 2.5% points for h(1, 1) of and respectively. The minimum width 95% confidence interval is (0.864, 0.935). The 95% confidence interval using a Gaussian assumption is (0.864, 0.936).

Example 2. Shannon's index with s = 3 and n = 100 for the broken-stick model gives µ_h = 0.891, σ_h² = , √β1 = 0.400, β2 = . The fitted Pearson type I distribution gives lower and upper 2.5% points for h(1, 1) of and respectively. The minimum width 95% confidence interval is (0.775, 1.000). The 95% confidence interval using a Gaussian assumption is (0.777, 1.005).
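The minimum width interval can be found by numerical integration of the fitted density, as Appendix C does in BASIC. The following Python sketch (an independent illustration, not a translation of that program) discretizes a beta density on a grid and slides a window over the cumulative sums to find the shortest interval with the required coverage; the beta parameters here are hypothetical.

```python
import math

def beta_pdf(x, p, q):
    """Density of the beta(p, q) distribution of the first kind."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_b = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return math.exp((p - 1) * math.log(x) + (q - 1) * math.log(1 - x) - log_b)

def min_width_interval(p, q, coverage=0.95, n_grid=20000):
    """Shortest interval containing `coverage` probability of a
    beta(p, q) variable, via a sliding window over a trapezoid CDF."""
    xs = [i / n_grid for i in range(n_grid + 1)]
    pdf = [beta_pdf(x, p, q) for x in xs]
    cdf = [0.0] * (n_grid + 1)
    for i in range(1, n_grid + 1):
        cdf[i] = cdf[i - 1] + 0.5 * (pdf[i] + pdf[i - 1]) / n_grid
    total = cdf[-1]
    best = (2.0, 0.0, 1.0)   # (width, lower, upper)
    j = 0
    for i in range(n_grid + 1):
        while j <= n_grid and (cdf[j] - cdf[i]) / total < coverage:
            j += 1
        if j > n_grid:
            break
        if xs[j] - xs[i] < best[0]:
            best = (xs[j] - xs[i], xs[i], xs[j])
    return best[1], best[2]
```

For a symmetric density (p = q) the minimum width interval coincides with the equal-tail interval, which provides a simple check; for a skewed fitted curve, as for h(α, β), the two differ, which is exactly the point made above.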

Example 3. Shannon's index with s = 10 and n = 1000 for the broken-stick model gives µ_h = 1.966, σ_h² = , √β1 = 0.098, β2 = . For the corresponding Pearson type I distribution the lower and upper 2.5% points for h(1, 1) are and respectively. The 95% confidence interval using a Gaussian assumption is (1.921, 2.011). A minimum width confidence interval is not available, because precise parameter estimates for the Pearson type I distribution could not be found, due to the closeness to a Gaussian distribution in this case.

Example 4. Shannon's index with s = 3 and n = 100 for the Sugihara model gives µ_h = 0.846, σ_h² = , µ3 = , µ4 = , √β1 = 0.358, β2 = . For the corresponding Pearson type I distribution the lower and upper 2.5% points for h(1, 1) are and respectively. The minimum width 95% confidence interval is (0.717, 0.968). The 95% confidence interval using a Gaussian assumption is (0.719, 0.973).

4 Fitting a Gram-Charlier Type A series

Following Kendall and Stuart (1977), suppose that a random variable X has probability density function f(x) which can be expanded as a power series of orthogonal Chebyshev-Hermite polynomials. Thus

    f(x) = Σ_{r=0}^{∞} c_r H_r(x) α(x),

where

    c_r = (1/r!) ∫ f(x) H_r(x) dx,

    α(x) = (1/√(2π)) e^{−x²/2},

and H_r(x) denotes the Chebyshev-Hermite polynomial of degree r, with

    H0 = 1,  H1 = x,  H2 = x² − 1,  H3 = x³ − 3x,  H4 = x⁴ − 6x² + 3.

For arbitrary X the expansion gives

    f(x) ≈ α(x){1 + µH1 + (1/2)(µ′2 − 1)H2 + (1/6)(µ′3 − 3µ)H3 + (1/24)(µ′4 − 6µ′2 + 3)H4 + ...},

where the µ′_r are the non-central moments of X. For X standardised to have zero mean this gives

    f(x) ≈ α(x){1 + (1/2)(µ2 − 1)H2 + (1/6)µ3H3 + (1/24)(µ4 − 6µ2 + 3)H4 + ...}.

For standardised X, so having zero mean and variance unity, this gives

    f(x) ≈ α(x){1 + (1/6)µ3H3 + (1/24)(µ4 − 3)H4 + ...},

where the moments µ3 and µ4 are those of the standardised variable. These three expansions adjust for skewness and kurtosis. The latter series for the standardised variable may be written as

    f(x) ≈ α(x){1 + (1/6)κ3H3 + (1/24)κ4H4 + ...},

where κ3 and κ4 denote the cumulants of the standardised measure. This is the Edgeworth form of the Type A series. Recall the link between cumulants κ_r and moments µ_r:

    κ1 = µ,  κ2 = µ2,  κ3 = µ3,  κ4 = µ4 − 3µ2².

Thus, for the standardised variable, κ3 = √β1 and κ4 = β2 − 3.

4.1 Using the Edgeworth expansion

One use of the Edgeworth expansion is to derive approximate cumulative probabilities for the variable X. These are easily found, since

    f(x) ≈ α(x){1 + (1/6)κ3H3(x) + (1/24)κ4H4(x)}

gives, on integration,

    ∫_{−∞}^{x0} f(x) dx ≈ ∫_{−∞}^{x0} α(x) dx − α(x0){(1/6)κ3H2(x0) + (1/24)κ4H3(x0)},

since

    ∫_{−∞}^{x0} α(x)H_r(x) dx = −α(x0)H_{r−1}(x0).

Example 1. Shannon's index with s = 3 and n = 100 for the broken-stick model gives µ_h = 0.891, σ_h² = , √β1 = 0.400, β2 = . Table 17 below shows the cumulative probabilities evaluated at two values x0 which give cumulative probabilities and for the fitted Pearson type I distribution. It also gives the corresponding cumulative probabilities for the three Gram-Charlier series:

    Series 1: for the unstandardised variable.
    Series 2: for the variable standardised to have zero mean.
    Series 3: the Edgeworth expansion, with the variable standardised to have zero mean and variance unity.
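The integrated Edgeworth form above gives cumulative probabilities directly. A minimal Python sketch (an illustration, using the standard library's error function for the Gaussian integral):

```python
import math

def phi(x):
    """Standard normal density alpha(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def edgeworth_cdf(x0, kappa3, kappa4):
    """Approximate P(X <= x0) for a standardised variable with
    skewness cumulant kappa3 and excess kurtosis kappa4, using the
    integrated Edgeworth series quoted in section 4.1."""
    H2 = x0 * x0 - 1.0
    H3 = x0 ** 3 - 3.0 * x0
    return Phi(x0) - phi(x0) * (kappa3 * H2 / 6.0 + kappa4 * H3 / 24.0)
```

With κ3 = κ4 = 0 the approximation collapses to the Gaussian CDF, and a positive κ3 shifts probability mass as expected for a right-skewed variable.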

[Table 17: Cumulative probabilities at two values x0 for the fitted Pearson type I curve and for the three Gram-Charlier series.]

As can be seen from Table 17, only the standardised variable has a satisfactory series expansion. This is perhaps because the probability density function of h(α, β), being well represented by a beta type I distribution and thus by a high order polynomial, is not well represented by such a short series expansion.

5 Polynomial transformation to normality

Following Kendall and Stuart (1977), suppose that a random variable X has cumulants κ_r(x) for r = 1, 2, 3, 4, and it is desired to obtain a suitable polynomial transformation

    z = a0 + a1x + a2x² + a3x³ + ...

which makes the variable Z an approximately Gaussian variable. We suppose that κ_r(x) has order n^{1−r}, where n denotes the sample size. While Kendall and Stuart develop the theory to allow Z to have arbitrary mean and variance, suppose that we simply require Z to have the same mean and variance as X, so that µ_z = κ1(x) and σ_z² = κ2(x). Let

    l3 = κ3(x)/σ_z³,  l4 = κ4(x)/σ_z⁴.

Then l3 = O(n^{−1/2}) and l4 = O(n^{−1}). It can then be shown that the required transformation is, to order n^{−1},

    z = x − (1/6)l3(x² − 1) − (1/24)l4(x³ − 3x) + (1/36)l3²(4x³ − 7x).

Omission of the last two terms gives normality to order n^{−1/2} only. To obtain cumulative probabilities for X it is possible to write X as a polynomial series in the Gaussian random variable Z; see Kendall and Stuart (1977).

Examples 1 and 2 below compare the order n^{−1/2} and n^{−1} expansions for the variable X and its standardised form. Examples 3 and 4 compare the results of this series transformation to normality for Shannon's index h(1, 1) when using sample moments from simulations and when using the theoretical moments calculated in section 2. For examples 3 and 4 the statistic h(1, 1) is not standardised prior to transformation.
Section 5.1 considers the effect of standardising the diversity index prior to obtaining the series expansion.
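The order n^{-1} transformation above is easy to apply in practice. The following Python sketch (an illustration only; the report's simulations use MINITAB) applies it to a seeded sample of a deliberately skewed variable, a standardised mean of twenty exponentials, and measures how much skewness is removed.

```python
import math
import random

def normalising_transform(x, l3, l4):
    """Polynomial transformation to normality, to order 1/n,
    as quoted in section 5 (x is assumed standardised)."""
    return (x - l3 * (x * x - 1) / 6.0
              - l4 * (x ** 3 - 3 * x) / 24.0
              + l3 * l3 * (4 * x ** 3 - 7 * x) / 36.0)

def sample_skew_kurt(data):
    """Sample skewness sqrt(b1) and kurtosis b2."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((v - m) ** 2 for v in data) / n
    m3 = sum((v - m) ** 3 for v in data) / n
    m4 = sum((v - m) ** 4 for v in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

rng = random.Random(42)
raw = [sum(-math.log(1.0 - rng.random()) for _ in range(20))
       for _ in range(4000)]
m = sum(raw) / len(raw)
sd = (sum((v - m) ** 2 for v in raw) / len(raw)) ** 0.5
xs = [(v - m) / sd for v in raw]              # standardise first
g1, g2 = sample_skew_kurt(xs)
zs = [normalising_transform(x, g1, g2 - 3.0) for x in xs]
z1, z2 = sample_skew_kurt(zs)                 # skewness much reduced
```

The theoretical skewness of the untransformed variable is 2/√20 ≈ 0.447; after the transformation the sample skewness of the z-values is much closer to zero, in line with the conclusions of examples 1 and 2.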

Example 1. Suppose that U1 and U2 are independent uniform random variables on the interval [0, 1) and let X = U1 + U2. For three thousand independent simulations of the variable X, estimates of the mean m, standard deviation s, skewness √b1 and kurtosis b2 were obtained. These sample moments were used to derive the polynomial expansions to order n^{−1/2} and to order n^{−1}. Summary statistics for these expansions were obtained and are given in Table 18. The simulated x-values were then standardised, taking (x − m)/s, and the polynomial expansions again derived. The results are also shown in Table 18.

[Table 18: Mean m, st. dev. s, skewness √b1, and kurtosis b2 of X, of the O(n^{−1/2}) and O(n^{−1}) series for X, and of the corresponding series for the standardised values (x − m)/s.]

The order n^{−1/2} expansion, ignoring the κ4 term, has not altered the kurtosis very much. For this example, the best result would appear to be the order n^{−1} expansion for the standardised x-values, this giving skewness closest to zero and kurtosis closest to 3.0, the Gaussian case.

Example 2. Suppose that U1 and U2 are independent uniform random variables on the interval [0, 1) and let X = U1 + 2U2. Three thousand independent simulations of the variable X were done and the study of example 1 above repeated. The results are tabulated in Table 19 below. As for example 1, the best result would appear to be the order n^{−1} expansion for the standardised x-values. Unfortunately it would be unwise to generalize this conclusion; Kendall and Stuart advise examining each case individually.

Example 3. For 5000 simulations of h(1, 1) for the broken-stick model with s = 3 and n = 100, the sample moments m, s, √b1 and b2 were obtained. These were used to derive polynomial expansions to orders n^{−1/2} and n^{−1}. The summary statistics for these series expansions were obtained and are tabulated below. As a follow up, the population moments calculated in

26 Table 19: Comparing series expansions for example 2. Statistic Mean m St.dev. s Skewness b 1 Kurtosis b 2 X O(n 1 2 ) series O(n 1 ) series (X m)/s O(n 1 2 ) series O(n 1 ) series section 2 were then used to derive the series expansions. The summary statistics using these series expansions are also shown below. Table 20: Series expansions using sample and population moments for example 3. Statistic m s b1 b 2 h(1, 1) Using sample moments: O(n 1 2 ) series O(n 1 ) series Using population moments: O(n 1 2 ) series O(n 1 ) series It can be seen that using the moments obtained in section 2 and taking the order n 1 expansion gives a marginally better approximation to normality. Example 4 For 5000 simulations of h(1, 1) for the broken-stick model with s = 3 and n = 1000 summary statistics were obtained. These were used to derive series expansions to orders n 1 2 and n 1. The summary statistics for these series expansions were obtained and are tabulated below. As in example 3, the moments derived in section 2 were also used to derive the series expansions. The results are also shown below. It can be seen that using the moments obtained in section 1 does not here give a better approximation to normality. Furthermore, the order n 1 2 approximation seems to give results marginally closer to normality. 26
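Example 1 above can be sketched in a few lines. The following is an illustrative reconstruction in Python (the report gives no code), using the order n^{-1} normalising polynomial of section 5.1, z = y − (1/6)l₃(y² − 1) − (1/24)l₄(y³ − 3y) + (1/36)l₃²(4y³ − 7y):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_moments(x):
    """Return the mean m, st.dev. s, skewness (b1)^(1/2) and kurtosis b2."""
    m, s = x.mean(), x.std()
    y = (x - m) / s
    return m, s, (y**3).mean(), (y**4).mean()

def to_normality(x):
    """Apply the order n^{-1} polynomial to the standardised values of x."""
    m, s, skew, kurt = sample_moments(x)
    l3, l4 = skew, kurt - 3.0        # standardised third and fourth cumulants
    y = (x - m) / s
    return (y - l3 * (y**2 - 1) / 6
              - l4 * (y**3 - 3 * y) / 24
              + l3**2 * (4 * y**3 - 7 * y) / 36)

# X = U1 + U2 is triangular on [0, 2): symmetric, so l3 is near zero and the
# kurtosis correction carried by the l4 term does most of the work.
x = rng.uniform(size=3000) + rng.uniform(size=3000)
z = to_normality(x)
_, _, skew_z, kurt_z = sample_moments(z)
```

On this symmetric example the order n^{-1/2} series, which drops the l₄ term, barely moves the kurtosis, in line with the remark on Table 18; it is the l₄ term that pulls b₂ toward 3.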

Table 21: Series expansions using sample and population moments for example 4. (Same layout as Table 20. Numerical entries not recovered.)

Example 5 For 5000 simulations of h(1, 1) for the equiprobable model with s = 10 and n = 1000, summary statistics were obtained and used to derive series expansions to orders n^{-1/2} and n^{-1}. The procedure was then repeated using the population moments given in section 2 to obtain the series expansions. The summary statistics are shown in Table 22.

Table 22: Series expansions using sample and population moments for example 5. (Same layout as Table 20. Numerical entries not recovered.)

The conclusions are similar to those of example 4.

5.1 Standardising the variable before obtaining the polynomial transformation

The results of examples 1 and 2 in section 5 suggested that the transformation to normality was better for standardised variables. Section 5.2 also suggests that the inverse transformation of x in terms of z is valid if x is standardised first. Thus suppose that y = (x − µ_x)/σ_x, where µ_x = κ₁(x) and σ_x² = κ₂(x). Then y has cumulants κ₁(y) = 0, κ₂(y) = 1, κ₃(y) = κ₃(x)/σ_x³, and κ₄(y) = κ₄(x)/σ_x⁴.
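These cumulant relations are easy to verify numerically. The sketch below (illustrative only, using an arbitrary skewed test distribution) checks that the standardised cumulants l₃ = κ₃(x)/σ_x³ and l₄ = κ₄(x)/σ_x⁴ are unchanged by the standardisation itself:

```python
import numpy as np

def std_cumulants(x):
    """Standardised third and fourth cumulants (l3, l4) of a sample."""
    c = x - x.mean()
    s2 = (c**2).mean()
    k3 = (c**3).mean()                  # kappa_3(x) = mu_3(x)
    k4 = (c**4).mean() - 3 * s2**2      # kappa_4(x) = mu_4(x) - 3 sigma_x^4
    return k3 / s2**1.5, k4 / s2**2

rng = np.random.default_rng(1)
x = rng.exponential(size=10000)         # a conveniently skewed test case
y = (x - x.mean()) / x.std()            # y = (x - mu_x)/sigma_x
l3_x, l4_x = std_cumulants(x)
l3_y, l4_y = std_cumulants(y)           # equal: kappa_r(y) = kappa_r(x)/sigma_x^r
```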

Thus, if Z is to have zero mean and unit variance, then

z = y − (1/6) l₃ (y² − 1) − (1/24) l₄ (y³ − 3y) + (1/36) l₃² (4y³ − 7y),

where l₃ = κ₃(x)/σ_x³ = µ₃(x)/σ_x³ and l₄ = κ₄(x)/σ_x⁴ = {µ₄(x)/σ_x⁴} − 3.

Example 1 Consider now 5000 simulations of h(1, 1) for several different models. For each set of simulations the sample mean m, the sample variance s², and the third and fourth moments m₃ and m₄ were calculated. For each model the diversity index was standardised using the sample moments, taking {h(1, 1) − m}/s. The sample moments were then used to obtain the appropriate order n^{-1/2} and n^{-1} series expansions, and summary statistics for these transformations were obtained. Table 23 shows, for each model, the summary statistics for the standardised index and for the two series transformations to normality.

Table 23: Series expansions for standardised Shannon's index obtained using sample moments for example 1. (Columns: m, s, √b₁, b₂; rows give, for each model — broken-stick with s = 3 at two sample sizes, broken-stick with s = 10 at two sample sizes, and equiprobable with s = 10 — the standardised index and its O(n^{-1/2}) and O(n^{-1}) series. The n values and numerical entries are not recovered in this transcription.)

It can be seen that the order n^{-1} series gives a good approximation to normality for the broken-stick model. For the equiprobable model, however, the results are not so clear. Inspection of the dotplots for the simulated values and their transformed values suggests that the distribution

of the order n^{-1/2} transformed values appears to be lower truncated, while the distribution of the order n^{-1} transformed values has a very long left tail.

In practice only a single observed value of a diversity index based on s species and n observations will be available. A polynomial transformation to normality would then have to be based on the calculated population moments of section 2.

Example 2 For each of the models considered in example 1 above the population moments have been obtained in section 2. Thus the mean µ_h, the variance σ_h², and the higher moments µ₃ and µ₄ are known for Shannon's index h(1, 1). These values were used to standardise the index, calculating {h(1, 1) − µ_h}/σ_h. The population moments were also used to obtain the coefficients l₃ and l₄ in the series transformations to normality. For the same simulations of Shannon's index as in example 1, the sample moments of the standardised index and of the series transformations to normality were obtained and are shown in Table 24. Note that the standardised values do not necessarily have zero mean and unit variance.

Table 24: Series expansions for standardised Shannon's index obtained using calculated moments for example 2. (Same layout as Table 23. Numerical entries not recovered.)

It can be seen that the order n^{-1/2} expansions give similar results to those in example

1 when the diversity index was properly standardised, whereas the order n^{-1} expansion breaks down in example 2.

5.2 Cornish-Fisher transformation

In the last section a power series expansion of x was obtained which gives a closer approximation to normality than x itself. Inversion of this series gives x in terms of z. Suppose that X has cumulants κ_r(x) and it is desired to make Z have mean µ_z and variance σ_z². Let

l₁ = {κ₁(x) − µ_z}/σ_z,  l₂ = {κ₂(x) − σ_z²}/σ_z²,  l₃ = κ₃(x)/σ_z³,  l₄ = κ₄(x)/σ_z⁴.

Then the series expansion of x in powers of z, to terms of order n^{-1}, has the form

x = z + l₁ + (1/6) l₃ (z² − 1) + (1/2) l₂ z + (1/24) l₄ (z³ − 3z) − (1/36) l₃² (2z³ − 5z);

see Kendall and Stuart (1977). Omission of the last three terms gives an expansion up to order n^{-1/2}. Unfortunately this expression does NOT seem to give the Cornish-Fisher expansion of x in terms of a standardised variable z as used by, for example, Johnson (1978).

Instead consider the standardised variable y = (x − µ_x)/σ_x, having zero mean, unit variance, and cumulants κ₃(y) = µ₃(x)/σ_x³ and κ₄(y) = {µ₄(x) − 3σ_x⁴}/σ_x⁴. For Z a standard normal variable we have l₁ = l₂ = 0, l₃ = κ₃(y) and l₄ = κ₄(y), so that

y = z + {µ₃(x)/(6σ_x³)} (z² − 1) + [{µ₄(x) − 3σ_x⁴}/(24σ_x⁴)] (z³ − 3z) − [{µ₃(x)}²/(36σ_x⁶)] (2z³ − 5z).

Thus x can be written, to order n^{-1},

x = µ_x + σ_x z + {µ₃(x)/(6σ_x²)} (z² − 1) + [{µ₄(x) − 3σ_x⁴}/(24σ_x³)] (z³ − 3z) − [{µ₃(x)}²/(36σ_x⁵)] (2z³ − 5z).

Exclusion of the last two terms gives the expansion to order n^{-1/2}.

Example 1 Suppose that X has mean µ_x, variance σ_x², and third moment µ₃(x). Then the sample mean X̄ based on n independent observations has mean µ_x, variance σ_x²/n, and third moment µ₃(x̄) = µ₃(x)/n². If Z is a standard normal variable, then

x̄ ≈ µ_x + (σ_x/√n) z + {µ₃(x)/(6nσ_x²)} (z² − 1).

Johnson (1978) uses Cornish-Fisher transformations of both x̄ and the sample variance s² to give an improved t-statistic which allows for skewness of the variable X.
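The expansion of x in terms of z above can be read as a recipe for approximate percentage points of X given its first four moments. A minimal sketch (not from the report; the function name is an assumption of ours):

```python
from statistics import NormalDist

def cf_percentage_point(p, mu, sigma, mu3, mu4):
    """Order n^{-1} Cornish-Fisher approximation to the p-th quantile of X,
    given its mean mu, st.dev. sigma, and central moments mu3, mu4."""
    z = NormalDist().inv_cdf(p)          # standard normal quantile Phi^{-1}(p)
    l3 = mu3 / sigma**3                  # standardised third cumulant
    l4 = mu4 / sigma**4 - 3.0            # standardised fourth cumulant
    y = (z + l3 * (z**2 - 1) / 6
           + l4 * (z**3 - 3 * z) / 24
           - l3**2 * (2 * z**3 - 5 * z) / 36)
    return mu + sigma * y

# With mu3 = 0 and mu4 = 3 sigma^4 (l3 = l4 = 0) this reduces to the
# ordinary normal quantile mu + sigma * Phi^{-1}(p).
x95 = cf_percentage_point(0.95, mu=0.0, sigma=1.0, mu3=0.0, mu4=3.0)
```

Dropping the l₄ and l₃² terms inside the bracket gives the corresponding order n^{-1/2} percentage point.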
Careful choice of constants for the test statistic ensures that the term in z² vanishes.
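Returning to the practical point of section 5.1 — a single observed index, transformed using known population moments — the whole calculation fits in one function. All numeric moments below are illustrative placeholders, not the values computed in section 2 of the report:

```python
def z_score(h, mu_h, sigma_h, mu3, mu4):
    """Order n^{-1} transformation of a single observed index h to an
    approximately standard normal deviate, given population moments."""
    l3 = mu3 / sigma_h**3                # kappa_3(y) = mu_3 / sigma_h^3
    l4 = mu4 / sigma_h**4 - 3.0          # kappa_4(y) = mu_4/sigma_h^4 - 3
    y = (h - mu_h) / sigma_h             # standardised index
    return (y - l3 * (y**2 - 1) / 6
              - l4 * (y**3 - 3 * y) / 24
              + l3**2 * (4 * y**3 - 7 * y) / 36)

# Hypothetical moments for h(1,1); in use these would come from section 2.
z = z_score(h=0.95, mu_h=1.0, sigma_h=0.05, mu3=-1e-5, mu4=2e-5)
```

The resulting z can then be referred to standard normal tables, giving a simple test of hypothesis for the diversity index.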


More information

EX-POST VERIFICATION OF PREDICTION MODELS OF WAGE DISTRIBUTIONS

EX-POST VERIFICATION OF PREDICTION MODELS OF WAGE DISTRIBUTIONS EX-POST VERIFICATION OF PREDICTION MODELS OF WAGE DISTRIBUTIONS LUBOŠ MAREK, MICHAL VRABEC University of Economics, Prague, Faculty of Informatics and Statistics, Department of Statistics and Probability,

More information

STRESS-STRENGTH RELIABILITY ESTIMATION

STRESS-STRENGTH RELIABILITY ESTIMATION CHAPTER 5 STRESS-STRENGTH RELIABILITY ESTIMATION 5. Introduction There are appliances (every physical component possess an inherent strength) which survive due to their strength. These appliances receive

More information

Mathematical Annex 5 Models with Rational Expectations

Mathematical Annex 5 Models with Rational Expectations George Alogoskoufis, Dynamic Macroeconomic Theory, 2015 Mathematical Annex 5 Models with Rational Expectations In this mathematical annex we examine the properties and alternative solution methods for

More information

Exact Sampling of Jump-Diffusion Processes

Exact Sampling of Jump-Diffusion Processes 1 Exact Sampling of Jump-Diffusion Processes and Dmitry Smelov Management Science & Engineering Stanford University Exact Sampling of Jump-Diffusion Processes 2 Jump-Diffusion Processes Ubiquitous in finance

More information

Exam 2 Spring 2015 Statistics for Applications 4/9/2015

Exam 2 Spring 2015 Statistics for Applications 4/9/2015 18.443 Exam 2 Spring 2015 Statistics for Applications 4/9/2015 1. True or False (and state why). (a). The significance level of a statistical test is not equal to the probability that the null hypothesis

More information

Chapter 5 Univariate time-series analysis. () Chapter 5 Univariate time-series analysis 1 / 29

Chapter 5 Univariate time-series analysis. () Chapter 5 Univariate time-series analysis 1 / 29 Chapter 5 Univariate time-series analysis () Chapter 5 Univariate time-series analysis 1 / 29 Time-Series Time-series is a sequence fx 1, x 2,..., x T g or fx t g, t = 1,..., T, where t is an index denoting

More information

Probability Distributions II

Probability Distributions II Probability Distributions II Summer 2017 Summer Institutes 63 Multinomial Distribution - Motivation Suppose we modified assumption (1) of the binomial distribution to allow for more than two outcomes.

More information

Financial Econometrics Jeffrey R. Russell. Midterm 2014 Suggested Solutions. TA: B. B. Deng

Financial Econometrics Jeffrey R. Russell. Midterm 2014 Suggested Solutions. TA: B. B. Deng Financial Econometrics Jeffrey R. Russell Midterm 2014 Suggested Solutions TA: B. B. Deng Unless otherwise stated, e t is iid N(0,s 2 ) 1. (12 points) Consider the three series y1, y2, y3, and y4. Match

More information

Numerical Descriptions of Data

Numerical Descriptions of Data Numerical Descriptions of Data Measures of Center Mean x = x i n Excel: = average ( ) Weighted mean x = (x i w i ) w i x = data values x i = i th data value w i = weight of the i th data value Median =

More information

Some Characteristics of Data

Some Characteristics of Data Some Characteristics of Data Not all data is the same, and depending on some characteristics of a particular dataset, there are some limitations as to what can and cannot be done with that data. Some key

More information

Quantitative Methods for Economics, Finance and Management (A86050 F86050)

Quantitative Methods for Economics, Finance and Management (A86050 F86050) Quantitative Methods for Economics, Finance and Management (A86050 F86050) Matteo Manera matteo.manera@unimib.it Marzio Galeotti marzio.galeotti@unimi.it 1 This material is taken and adapted from Guy Judge

More information

MODELLING OF INCOME AND WAGE DISTRIBUTION USING THE METHOD OF L-MOMENTS OF PARAMETER ESTIMATION

MODELLING OF INCOME AND WAGE DISTRIBUTION USING THE METHOD OF L-MOMENTS OF PARAMETER ESTIMATION International Days of Statistics and Economics, Prague, September -3, MODELLING OF INCOME AND WAGE DISTRIBUTION USING THE METHOD OF L-MOMENTS OF PARAMETER ESTIMATION Diana Bílková Abstract Using L-moments

More information

On Some Statistics for Testing the Skewness in a Population: An. Empirical Study

On Some Statistics for Testing the Skewness in a Population: An. Empirical Study Available at http://pvamu.edu/aam Appl. Appl. Math. ISSN: 1932-9466 Vol. 12, Issue 2 (December 2017), pp. 726-752 Applications and Applied Mathematics: An International Journal (AAM) On Some Statistics

More information

Chapter 3. Descriptive Measures. Copyright 2016, 2012, 2008 Pearson Education, Inc. Chapter 3, Slide 1

Chapter 3. Descriptive Measures. Copyright 2016, 2012, 2008 Pearson Education, Inc. Chapter 3, Slide 1 Chapter 3 Descriptive Measures Copyright 2016, 2012, 2008 Pearson Education, Inc. Chapter 3, Slide 1 Chapter 3 Descriptive Measures Mean, Median and Mode Copyright 2016, 2012, 2008 Pearson Education, Inc.

More information

Fundamentals of Statistics

Fundamentals of Statistics CHAPTER 4 Fundamentals of Statistics Expected Outcomes Know the difference between a variable and an attribute. Perform mathematical calculations to the correct number of significant figures. Construct

More information

Modeling Obesity and S&P500 Using Normal Inverse Gaussian

Modeling Obesity and S&P500 Using Normal Inverse Gaussian Modeling Obesity and S&P500 Using Normal Inverse Gaussian Presented by Keith Resendes and Jorge Fernandes University of Massachusetts, Dartmouth August 16, 2012 Diabetes and Obesity Data Data obtained

More information

Maximum Likelihood Estimates for Alpha and Beta With Zero SAIDI Days

Maximum Likelihood Estimates for Alpha and Beta With Zero SAIDI Days Maximum Likelihood Estimates for Alpha and Beta With Zero SAIDI Days 1. Introduction Richard D. Christie Department of Electrical Engineering Box 35500 University of Washington Seattle, WA 98195-500 christie@ee.washington.edu

More information

Week 2 Quantitative Analysis of Financial Markets Hypothesis Testing and Confidence Intervals

Week 2 Quantitative Analysis of Financial Markets Hypothesis Testing and Confidence Intervals Week 2 Quantitative Analysis of Financial Markets Hypothesis Testing and Confidence Intervals Christopher Ting http://www.mysmu.edu/faculty/christophert/ Christopher Ting : christopherting@smu.edu.sg :

More information

2 of PU_2015_375 Which of the following measures is more flexible when compared to other measures?

2 of PU_2015_375 Which of the following measures is more flexible when compared to other measures? PU M Sc Statistics 1 of 100 194 PU_2015_375 The population census period in India is for every:- quarterly Quinqennial year biannual Decennial year 2 of 100 105 PU_2015_375 Which of the following measures

More information

5.3 Statistics and Their Distributions

5.3 Statistics and Their Distributions Chapter 5 Joint Probability Distributions and Random Samples Instructor: Lingsong Zhang 1 Statistics and Their Distributions 5.3 Statistics and Their Distributions Statistics and Their Distributions Consider

More information

Lecture 10: Point Estimation

Lecture 10: Point Estimation Lecture 10: Point Estimation MSU-STT-351-Sum-17B (P. Vellaisamy: MSU-STT-351-Sum-17B) Probability & Statistics for Engineers 1 / 31 Basic Concepts of Point Estimation A point estimate of a parameter θ,

More information

6. Genetics examples: Hardy-Weinberg Equilibrium

6. Genetics examples: Hardy-Weinberg Equilibrium PBCB 206 (Fall 2006) Instructor: Fei Zou email: fzou@bios.unc.edu office: 3107D McGavran-Greenberg Hall Lecture 4 Topics for Lecture 4 1. Parametric models and estimating parameters from data 2. Method

More information

KURTOSIS OF THE LOGISTIC-EXPONENTIAL SURVIVAL DISTRIBUTION

KURTOSIS OF THE LOGISTIC-EXPONENTIAL SURVIVAL DISTRIBUTION KURTOSIS OF THE LOGISTIC-EXPONENTIAL SURVIVAL DISTRIBUTION Paul J. van Staden Department of Statistics University of Pretoria Pretoria, 0002, South Africa paul.vanstaden@up.ac.za http://www.up.ac.za/pauljvanstaden

More information

Analyzing Oil Futures with a Dynamic Nelson-Siegel Model

Analyzing Oil Futures with a Dynamic Nelson-Siegel Model Analyzing Oil Futures with a Dynamic Nelson-Siegel Model NIELS STRANGE HANSEN & ASGER LUNDE DEPARTMENT OF ECONOMICS AND BUSINESS, BUSINESS AND SOCIAL SCIENCES, AARHUS UNIVERSITY AND CENTER FOR RESEARCH

More information

Monitoring Processes with Highly Censored Data

Monitoring Processes with Highly Censored Data Monitoring Processes with Highly Censored Data Stefan H. Steiner and R. Jock MacKay Dept. of Statistics and Actuarial Sciences University of Waterloo Waterloo, N2L 3G1 Canada The need for process monitoring

More information

Final Exam Suggested Solutions

Final Exam Suggested Solutions University of Washington Fall 003 Department of Economics Eric Zivot Economics 483 Final Exam Suggested Solutions This is a closed book and closed note exam. However, you are allowed one page of handwritten

More information

4-1. Chapter 4. Commonly Used Distributions by The McGraw-Hill Companies, Inc. All rights reserved.

4-1. Chapter 4. Commonly Used Distributions by The McGraw-Hill Companies, Inc. All rights reserved. 4-1 Chapter 4 Commonly Used Distributions 2014 by The Companies, Inc. All rights reserved. Section 4.1: The Bernoulli Distribution 4-2 We use the Bernoulli distribution when we have an experiment which

More information

ROM SIMULATION Exact Moment Simulation using Random Orthogonal Matrices

ROM SIMULATION Exact Moment Simulation using Random Orthogonal Matrices ROM SIMULATION Exact Moment Simulation using Random Orthogonal Matrices Bachelier Finance Society Meeting Toronto 2010 Henley Business School at Reading Contact Author : d.ledermann@icmacentre.ac.uk Alexander

More information

Business Statistics 41000: Probability 3

Business Statistics 41000: Probability 3 Business Statistics 41000: Probability 3 Drew D. Creal University of Chicago, Booth School of Business February 7 and 8, 2014 1 Class information Drew D. Creal Email: dcreal@chicagobooth.edu Office: 404

More information