A Note About Maximum Likelihood Estimator in Hypergeometric Distribution

Comunicaciones en Estadística, June 2009, Vol. 2, No. 1

Una nota sobre los estimadores de máxima verosimilitud en la distribución hipergeométrica

Hanwen Zhang (hanwenzhang@usta.edu.co)
Docente, CIEES, Universidad Santo Tomás

Abstract

The method of maximum likelihood estimation is one of the most important statistical techniques, and it is widely used by statisticians. However, for the hypergeometric distribution $Hg(n, R, N)$, the maximum likelihood estimators of $N$ and $R$ are not clearly derived in most statistical texts. This paper presents rigorous procedures for finding the maximum likelihood estimators of $N$ and $R$ in a hypergeometric distribution.

Key words: Hypergeometric distribution, maximum likelihood estimation.

1 Introduction

The method of maximum likelihood estimation is one of the most important statistical techniques, and it is one of the concepts that every student of statistics should be familiar with. Maximum likelihood estimation of the parameters of most probability distributions is easy to handle, but the problem of estimating the population size in the capture-recapture problem (see Ardilly & Tillé (2006)), that is, the problem of estimating $N$ in a hypergeometric distribution $Hg(n, R, N)$, is not treated clearly in statistical inference texts, since most of them do not consider it; see, for example, Mood, Graybill & Boes (1974), Bickel & Doksum (2001) and Casella & Berger (2002). Although the maximum likelihood estimator of $N$ is given by Shao (2003), the procedure is not quite clear. We may also need to estimate the parameter $R$, the size of a subpopulation, and the same situation arises for that estimation problem. The goal of this paper is to provide rigorous procedures for finding the maximum likelihood estimators of the parameters $N$ and $R$.

2 Estimating N

Suppose that $X \sim Hg(n, R, N)$ and that we need to estimate the population size $N$. The likelihood function is

$$L(N) = L(x, N) = \frac{\binom{R}{x}\binom{N-R}{n-x}}{\binom{N}{n}}, \tag{1}$$

if $\max(n - N + R, 0) \le x \le \min(R, n)$. The function $L(N)$ is a discrete function, so the way to find its maximum is not to take derivatives with respect to $N$, but to find where $L(N)$ is an increasing function of $N$ and where it is decreasing. To this end, we compute the ratio $D(N) = L(N)/L(N-1)$ and determine where $D(N) > 1$ and where $D(N) < 1$. With simple algebra, we have

$$D(N) = \frac{\binom{N-R}{n-x}\binom{N-1}{n}}{\binom{N}{n}\binom{N-R-1}{n-x}} > 1,$$

which is equivalent to

$$D(N) = \frac{(N-R)!\,(N-1)!\,(N-n)!\,(N-R-1-n+x)!}{(N-R-1)!\,N!\,(N-1-n)!\,(N-R-n+x)!} > 1.$$

Canceling all possible factorials, it can be shown that $D(N) > 1$ if and only if $(N-R)(N-n) > N(N-R-n+x)$. Expanding both sides and canceling the common terms leaves $nR > xN$, so the values of $N$ where $D(N) > 1$ are given by $N < nR/x$. Similarly, $D(N) < 1$ for $N > nR/x$.

Based on the former argument, some teachers mistakenly state that the maximum likelihood estimator of $N$ is $nR/X$. This statement is not correct since, according to the definition of the maximum likelihood estimator, the realization of the estimator must fall in the parameter space.
Clearly, the realization of the statistic $nR/X$ may be fractional, which immediately disqualifies it as the maximum likelihood estimator of $N$.
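The monotonicity argument above can be checked numerically. The following is a minimal sketch and not part of the original derivation: the function names `likelihood_N`, `ratio_D` and `argmax_N`, the search bound `N_max`, and the values $R = 10$, $n = 8$, $x = 3$ are assumptions chosen only for illustration. It uses exact fractions to confirm that $D(N) > 1$ exactly when $N < nR/x$, and it finds the integer maximizer of $L(N)$ by brute force.

```python
from fractions import Fraction
from math import comb

def likelihood_N(N, R, n, x):
    """L(N) from (1) as an exact fraction; 0 outside the support."""
    if N < n or N < R or x < max(n - N + R, 0) or x > min(R, n):
        return Fraction(0)
    return Fraction(comb(R, x) * comb(N - R, n - x), comb(N, n))

def ratio_D(N, R, n, x):
    """D(N) = L(N) / L(N - 1); assumes L(N - 1) > 0."""
    return likelihood_N(N, R, n, x) / likelihood_N(N - 1, R, n, x)

def argmax_N(R, n, x, N_max=200):
    """Brute-force integer maximizer of L(N) for R + n - x <= N <= N_max.

    N_max is an arbitrary search bound (an assumption of this sketch).
    """
    candidates = range(R + n - x, N_max + 1)
    return max(candidates, key=lambda N: likelihood_N(N, R, n, x))

# Illustrative values (assumed, not taken from the paper).
R, n, x = 10, 8, 3
threshold = Fraction(n * R, x)  # nR/x = 80/3, not an integer

# The smallest admissible N is R + n - x, so D(N) is defined from the next one.
for N in range(R + n - x + 1, 36):
    d = ratio_D(N, R, n, x)
    # D(N) > 1 exactly when N lies below nR/x.
    assert (d > 1) == (N < threshold)

print(argmax_N(R, n, x), float(threshold))
```

In this sketch the brute-force maximizer is an integer even though $nR/x$ is fractional, which is the point of the argument: the maximization has to be carried out over the integer parameter space.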

Shao (2003, p. 277) gives the correct maximum likelihood estimator of $N$, that is, $\hat{N}_{MV} = [nR/X]$, where $[a]$ denotes the integer part of $a$. Note that the increasing and decreasing behavior of $L(N)$ leaves two candidates for the maximum likelihood estimator of $N$: $[nR/X]$ and $[nR/X] + 1$. However, Shao draws the conclusion without further comment. To verify that the maximum likelihood estimator of $N$ is indeed $[nR/X]$, recall that $D(N) < 1$ if $N > nR/x$. Clearly $[nR/x] + 1 > nR/x$, so $D([nR/x] + 1) < 1$, that is, $L([nR/x] + 1)/L([nR/x]) < 1$, and we conclude that

$$L([nR/x] + 1) < L([nR/x]). \tag{2}$$

Furthermore, since equality can never hold in (2), $[nR/x] + 1$ is never a maximizer, and $[nR/x]$ is the unique maximum likelihood estimate whenever $nR/x$ is not an integer. If $nR/x$ is an integer and $nR/x - 1$ also lies in the parameter space, then $D(nR/x) = 1$, so $L(nR/x - 1) = L(nR/x)$ and the maximum is attained at both values.

Suppose that $R = 5$, $n = 4$ and $x = 3$; in this case, $[nR/x] = [6.67] = 6$ is the maximum likelihood estimate of $N$. Figure 1 illustrates the likelihood function $L(N)$, whose maximum is clearly located at $N = 6$.

Figure 1: Likelihood function with $R = 5$, $n = 4$ and $x = 3$ (horizontal axis: $N$; vertical axis: likelihood).

3 Estimating R

Suppose that the parameter of interest is $R$, the number of individuals with some specific characteristic in a population of size $N$, and that in a sample of size $n$, $x$ individuals with this characteristic are selected. We then consider the likelihood as a function of $R$, that is,

$$L(R) = \frac{\binom{R}{x}\binom{N-R}{n-x}}{\binom{N}{n}}, \tag{3}$$

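and, following the discussion in the previous section, we compute the ratio $D(R) = L(R)/L(R+1)$ and determine for which values of $R$ we have $D(R) > 1$ and for which values $D(R) < 1$. We have that

$$D(R) = \frac{\binom{R}{x}\binom{N-R}{n-x}}{\binom{R+1}{x}\binom{N-R-1}{n-x}} > 1 \tag{4}$$

is equivalent to

$$D(R) = \frac{(R+1-x)(N-R)}{(R+1)(N-R-n+x)} > 1, \tag{5}$$

which is equivalent to $(R+1-x)(N-R) > (R+1)(N-R-n+x)$. Subtracting $(R+1)(N-R)$ from both sides gives $(R+1)(n-x) > x(N-R)$, that is, $n(R+1) > x(N+1)$. This leads to $D(R) > 1$ if and only if $R > \frac{x(N+1)}{n} - 1$; that is, $L(R)$ is decreasing for integers larger than $\frac{x(N+1)}{n} - 1$ and increasing for integers smaller than $\frac{x(N+1)}{n} - 1$. Intuition suggests that $\hat{R}_{MV} = \frac{x(N+1)}{n} - 1$, but this holds only when $\frac{x(N+1)}{n}$ is an integer. In this case $D(R) = 1$ at $R = \frac{x(N+1)}{n} - 1$, so

$$L\left(\frac{x(N+1)}{n} - 1\right) = L\left(\frac{x(N+1)}{n} - 1 + 1\right) = L\left(\frac{x(N+1)}{n}\right),$$

and the maximum likelihood estimator may be either $\frac{x(N+1)}{n} - 1$ or $\frac{x(N+1)}{n}$, and is not unique.

If $\frac{x(N+1)}{n}$ is not an integer, the value of $R$ that maximizes $L(R)$ is either $\left[\frac{x(N+1)}{n} - 1\right]$ or $\left[\frac{x(N+1)}{n} - 1\right] + 1$, since the estimator of $R$ must take integer values. However, recalling the previous argument, we can state that

$$L\left(\left[\frac{x(N+1)}{n} - 1\right]\right) < L\left(\left[\frac{x(N+1)}{n} - 1\right] + 1\right),$$

since $\left[\frac{x(N+1)}{n} - 1\right] < \frac{x(N+1)}{n} - 1$, and in this case $\hat{R}_{MV} = \left[\frac{x(N+1)}{n} - 1\right] + 1 = \left[\frac{x(N+1)}{n}\right]$.

Summarizing, we have that

$$\hat{R}_{MV} = \begin{cases} \dfrac{x(N+1)}{n} - 1 \ \text{ or } \ \dfrac{x(N+1)}{n} & \text{if } \dfrac{x(N+1)}{n} \text{ is an integer,} \\[2ex] \left[\dfrac{x(N+1)}{n}\right] & \text{if } \dfrac{x(N+1)}{n} \text{ is not an integer.} \end{cases} \tag{6}$$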
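The rule in (6) can be compared against a direct search over all admissible values of $R$. The sketch below is illustrative only: the function names `likelihood_R`, `maximizers_R` and `mle_R_formula` are hypothetical, the parameter triples are assumed test values, and the sketch assumes $0 < x < n \le N$ so that both candidates in the integer case lie in $\{0, \dots, N\}$.

```python
from fractions import Fraction
from math import comb

def likelihood_R(R, N, n, x):
    """L(R) from (3) as an exact fraction; 0 outside the support."""
    if R < x or N - R < n - x:
        return Fraction(0)
    return Fraction(comb(R, x) * comb(N - R, n - x), comb(N, n))

def maximizers_R(N, n, x):
    """All integers R in [0, N] that attain the maximum of L(R)."""
    values = {R: likelihood_R(R, N, n, x) for R in range(N + 1)}
    top = max(values.values())
    return [R for R, v in values.items() if v == top]

def mle_R_formula(N, n, x):
    """Candidate(s) given by (6); assumes 0 < x < n <= N."""
    q = Fraction(x * (N + 1), n)
    if q.denominator == 1:
        # x(N+1)/n is an integer: two maximizers, q - 1 and q.
        return [int(q) - 1, int(q)]
    # Otherwise the integer part of x(N+1)/n.
    return [q.numerator // q.denominator]

# Assumed test values: one integer case and one non-integer case.
for N, n, x in [(19, 5, 2), (20, 5, 2)]:
    print((N, n, x), maximizers_R(N, n, x), mle_R_formula(N, n, x))
```

Exact fractions are used instead of floating point so that a tie between two values of $R$ is detected as an exact equality rather than depending on rounding.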

Suppose that $N = 19$, $n = 5$ and $x = 2$; in this case, $\frac{x(N+1)}{n} = 8$ is an integer, so the maximum likelihood estimate of $R$ may be 7 or 8. Figure 2 illustrates the likelihood function $L(R)$, whose maximum is clearly attained at both 7 and 8.

Figure 2: Likelihood function with $N = 19$, $n = 5$ and $x = 2$ (horizontal axis: $R$; vertical axis: likelihood).

Now suppose that $N = 20$, $n = 5$ and $x = 2$; then $\frac{x(N+1)}{n} = 8.4$, so the maximum likelihood estimate of $R$ is $[8.4] = 8$. Figure 3 illustrates the likelihood function $L(R)$, whose maximum is attained at 8.

Figure 3: Likelihood function with $N = 20$, $n = 5$ and $x = 2$ (horizontal axis: $R$; vertical axis: likelihood).

4 Conclusion

In this paper, procedures for maximum likelihood estimation in the hypergeometric distribution are presented. We point out that, because the parameters $N$ and $R$ are integers, the maximization of the likelihood function cannot rely on derivatives and requires a more detailed analysis of where the likelihood increases and decreases.

References

Ardilly, P. & Tillé, Y. (2006), Sampling Methods: Exercises and Solutions, Springer.

Bickel, P. & Doksum, K. (2001), Mathematical Statistics: Basic Ideas and Selected Topics, Vol. I, second edn, Prentice-Hall.

Casella, G. & Berger, R. (2002), Statistical Inference, second edn, Duxbury Press.

Mood, A. M., Graybill, F. A. & Boes, D. (1974), Introduction to the Theory of Statistics, international edn, McGraw-Hill.

Shao, J. (2003), Mathematical Statistics, second edn, Springer.