Discriminating Between The Log-normal and Gamma Distributions


Debasis Kundu & Anubhav Manglick

Department of Mathematics, Indian Institute of Technology Kanpur, Pin 208016, INDIA. E-mail: kundu@iitk.ac.in, corresponding author.
Faculty of Mathematics and Informatics, University of Passau, GERMANY.

Abstract

For a given data set, the problem of selecting either the log-normal or the gamma distribution with unknown shape and scale parameters is discussed. It is well known that both these distributions can be used quite effectively for analyzing skewed non-negative data sets. In this paper, we use the ratio of the maximized likelihoods in choosing between the log-normal and gamma distributions. We obtain asymptotic distributions of the ratio of the maximized likelihoods and use them to determine the minimum sample size required to discriminate between these two distributions for a user specified probability of correct selection and tolerance limit.

Key Words and Phrases: Asymptotic distribution; Kolmogorov-Smirnov distances; probability of correct selection; tolerance level.

1 Introduction

An important problem in statistics is to test whether some given observations follow one of two possible probability distributions. In this paper we consider the problem of selecting either the log-normal or the gamma distribution with unknown shape and scale parameters for a given data set. It is well known (Johnson, Kotz and Balakrishnan [13]) that both log-normal and gamma distributions can be used quite effectively in analyzing skewed positive data sets. Sources in the literature indicate that these two distributions are often interchangeable (Wiens [18]). Therefore, to analyze a skewed positive data set an experimenter might wish to select one of them. Although these two models may provide similar data fits for moderate sample sizes, it is still desirable to choose the correct or nearly correct model, since inference based on a particular model will often involve tail probabilities, where the effect of the model assumption is more crucial. Therefore, even if large sample sizes are not available, it is very important to make the best possible decision based on the given observations.

The problem of testing whether some given observations follow one of two probability distributions is quite old in the statistical literature. Atkinson [1, 2], Chen [5], Chambers and Cox [4], Cox [6, 7], Jackson [12] and Dyer [9] considered this problem in general for discriminating between two arbitrary distribution functions. Due to the increasing applications of lifetime distributions, special attention has been given to discriminating between some specific lifetime distribution functions. Pereira [15] developed two tests to discriminate between log-normal and Weibull distributions. Dumonceaux and Antle [8] also considered the problem of discriminating between log-normal and Weibull distributions; they proposed a test and provided its critical values. Fearn and Nebenzahl [10] used the maximum likelihood ratio method in discriminating between the Weibull and gamma distributions. Bain and Engelhardt [3] provided the probability of correct selection (PCS) of Weibull versus gamma distributions based on extensive computer simulations. Firth [11] and Wiens [18] discussed the problem of discriminating between the log-normal and gamma distributions.

In this paper we consider the problem of discriminating between the log-normal and gamma distribution functions. We use the ratio of maximized likelihoods (RML) in discriminating between these two distributions, which was originally proposed by Cox [6, 7] for discriminating between two separate models. We obtain the asymptotic distributions of the RML. An extensive simulation study shows that these asymptotic distributions work quite well for computing the PCS, even if the sample size is not very large. Using these asymptotic distributions and the distance between the two distribution functions, we compute the minimum sample size required to discriminate between the two distribution functions at a user specified protection level and tolerance limit.

The rest of the paper is organized as follows. We briefly discuss the RML in section 2. We obtain the asymptotic distributions of the RML in section 3. In section 4, we compute the minimum sample size required to discriminate between the two distribution functions. Some numerical experiments are performed in section 5 to observe how the asymptotic results behave for finite samples. A data analysis is performed in section 6 and finally we conclude the paper in section 7.

2 Ratio Of The Maximized Likelihoods

Suppose $X_1, \ldots, X_n$ are independent and identically distributed (i.i.d.) random variables from a gamma or from a log-normal distribution function. The density function of a log-normal random variable with scale parameter $\theta$ and shape parameter $\sigma$ is denoted by

$$f_{LN}(x; \theta, \sigma) = \frac{1}{\sqrt{2\pi}\, x \sigma} \exp\left(-\frac{\left(\ln\left(\frac{x}{\theta}\right)\right)^2}{2\sigma^2}\right); \quad x, \theta, \sigma > 0. \tag{1}$$

The density function of a gamma distribution with shape parameter $\alpha$ and scale parameter $\lambda$ will be denoted by

$$f_{GA}(x; \alpha, \lambda) = \frac{1}{\lambda \Gamma(\alpha)} \left(\frac{x}{\lambda}\right)^{\alpha - 1} e^{-\frac{x}{\lambda}}; \quad x, \alpha, \lambda > 0. \tag{2}$$

A log-normal distribution with shape parameter $\sigma$ and scale parameter $\theta$ will be denoted by LN($\sigma$, $\theta$), and similarly a gamma distribution with shape parameter $\alpha$ and scale parameter $\lambda$ will be denoted by GA($\alpha$, $\lambda$).

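The likelihood functions, assuming that the data come from GA($\alpha$, $\lambda$) or LN($\sigma$, $\theta$), are

$$L_{GA}(\alpha, \lambda) = \prod_{i=1}^{n} f_{GA}(X_i; \alpha, \lambda) \quad \text{and} \quad L_{LN}(\theta, \sigma) = \prod_{i=1}^{n} f_{LN}(X_i; \theta, \sigma)$$

respectively. The RML is defined as

$$L = \frac{L_{LN}(\hat\theta, \hat\sigma)}{L_{GA}(\hat\alpha, \hat\lambda)}, \tag{3}$$

where $(\hat\alpha, \hat\lambda)$ and $(\hat\theta, \hat\sigma)$ are the maximum likelihood estimators of $(\alpha, \lambda)$ and $(\theta, \sigma)$ respectively, based on the sample $\{X_1, \ldots, X_n\}$. The natural logarithm of the RML can be written as

$$T = n\left[\ln\left(\frac{\Gamma(\hat\alpha)}{\hat\sigma}\right) - \hat\alpha \ln\left(\frac{\tilde X}{\hat\lambda}\right) + \frac{\bar X}{\hat\lambda} - \frac{1}{2 n \hat\sigma^2}\sum_{i=1}^{n}\left(\ln\left(\frac{X_i}{\hat\theta}\right)\right)^2 - \frac{1}{2}\ln(2\pi)\right], \tag{4}$$

where $\bar X$ and $\tilde X$ are the arithmetic and geometric means of $\{X_1, \ldots, X_n\}$ respectively, i.e.

$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad \tilde X = \left(\prod_{i=1}^{n} X_i\right)^{\frac{1}{n}}. \tag{5}$$

Note that in the case of the log-normal distribution, $\hat\theta$ and $\hat\sigma$ have the following forms:

$$\hat\theta = \tilde X \quad \text{and} \quad \hat\sigma = \left(\frac{1}{n}\sum_{i=1}^{n}\left(\ln\left(\frac{X_i}{\hat\theta}\right)\right)^2\right)^{\frac{1}{2}}. \tag{6}$$

Also $\hat\alpha$ and $\hat\lambda$ satisfy the following relation:

$$\hat\alpha = \frac{\bar X}{\hat\lambda}. \tag{7}$$

The following procedure can be used to discriminate between gamma and log-normal distributions. Choose the log-normal distribution if $T > 0$; otherwise choose the gamma distribution as the preferred one. From the expression of $T$ given in (4), it is clear that if the data come from a log-normal distribution, then the distribution of $T$ is independent of $\theta$ and depends only on $\sigma$. Similarly, if the data come from a gamma distribution, then the distribution of $T$ depends only on $\alpha$ and is independent of $\lambda$.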
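The statistic in (4) and the decision rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' C program: the function names (`digamma`, `gamma_mle`, `log_rml`, `choose_model`) are hypothetical, the psi function is approximated by a recurrence plus an asymptotic series, and the gamma shape estimate is found by bisection on $\ln\alpha - \psi(\alpha) = \ln(\bar X / \tilde X)$, which is the standard gamma likelihood equation and is consistent with (7). The sample at the end is synthetic.

```python
import math
import random

def digamma(x):
    """psi(x) = d/dx ln Gamma(x) for x > 0 (recurrence plus asymptotic series)."""
    acc = 0.0
    while x < 6.0:                      # psi(x) = psi(x + 1) - 1/x
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return acc + math.log(x) - 0.5 / x - series

def gamma_mle(xs):
    """Gamma MLEs (alpha_hat, lambda_hat); lambda is the scale, so alpha*lambda = mean."""
    n = len(xs)
    arith = sum(xs) / n
    gap = math.log(arith) - sum(math.log(x) for x in xs) / n   # ln(arith/geo) > 0
    lo, hi = 1e-6, 1e6
    for _ in range(100):                # ln(a) - psi(a) is decreasing in a
        mid = math.sqrt(lo * hi)
        if math.log(mid) - digamma(mid) > gap:
            lo = mid
        else:
            hi = mid
    alpha = math.sqrt(lo * hi)
    return alpha, arith / alpha

def log_rml(xs):
    """Logarithm of the ratio of maximized likelihoods, T, following (4)-(7)."""
    n = len(xs)
    arith = sum(xs) / n
    geo = math.exp(sum(math.log(x) for x in xs) / n)
    theta_hat = geo                                        # equation (6)
    sq = sum(math.log(x / theta_hat) ** 2 for x in xs)
    sigma_hat = math.sqrt(sq / n)                          # equation (6)
    alpha_hat, lam_hat = gamma_mle(xs)                     # satisfies equation (7)
    return n * (math.lgamma(alpha_hat) - math.log(sigma_hat)
                - alpha_hat * math.log(geo / lam_hat) + arith / lam_hat
                - sq / (2 * n * sigma_hat ** 2) - 0.5 * math.log(2 * math.pi))

def choose_model(xs):
    """Decision rule: log-normal if T > 0, gamma otherwise."""
    return "log-normal" if log_rml(xs) > 0 else "gamma"

# Synthetic illustration (seeded so that the run is repeatable).
rng = random.Random(0)
sample = [rng.lognormvariate(0.0, 0.9) for _ in range(50)]
print(round(log_rml(sample), 4), choose_model(sample))
```

Because both likelihoods pick up the same factor when the data are rescaled, `log_rml(xs)` and `log_rml([c * x for x in xs])` should agree up to rounding for any `c > 0`, which mirrors the remark that the distribution of $T$ does not depend on the scale parameters.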

We estimate the PCS by using extensive computer simulations for different sample sizes and for different shape parameters. First we generate a sample of size $n$ from LN($\sigma$, 1) and compute $\hat\sigma$, $\hat\theta$, $\hat\alpha$ and $\hat\lambda$ from that sample. Based on that sample we compute $T$ and check whether $T > 0$ or $T < 0$. We replicate the process 10,000 times and obtain the percentage of times $T$ is positive. This provides an estimate of the PCS when the data come from a log-normal distribution. In exactly the same way we estimate the PCS when the data come from a gamma distribution. The results are reported in Tables 5 and 6 respectively.

Some points are quite clear from Tables 5 and 6. In both cases, for a fixed shape parameter, the PCS increases as the sample size increases, as expected. When the data come from a log-normal distribution, for a fixed sample size, the PCS increases as the shape parameter increases. When the data come from a gamma distribution, the PCS increases as the shape parameter decreases. From these simulation experiments, it is clear that the two distribution functions become closer as the shape parameter of the log-normal distribution decreases and the corresponding shape parameter of the gamma distribution increases.

3 Asymptotic Properties Of The RML

In this section we obtain the asymptotic distributions of the RML for two different cases. From now on we denote almost sure convergence by a.s.

Case 1: The data come from a log-normal distribution.

We assume that the data points $\{X_1, \ldots, X_n\}$ are from LN($\sigma$, $\theta$), and $\hat\alpha$, $\hat\lambda$, $\hat\theta$ and $\hat\sigma$ are the same as defined before. We use the following notation. For any Borel measurable function $h(\cdot)$, $E_{LN}(h(U))$ and $V_{LN}(h(U))$ denote the mean and variance of $h(U)$ under the assumption that $U$ follows LN($\sigma$, $\theta$). Similarly we define $E_{GA}(h(U))$ and $V_{GA}(h(U))$ as the mean and variance of $h(U)$ under the assumption that $U$ follows GA($\alpha$, $\lambda$). If $g(\cdot)$ and $h(\cdot)$ are two Borel measurable functions, we define along the same lines $\mathrm{cov}_{LN}(g(U), h(U)) = E_{LN}(g(U)h(U)) - E_{LN}(g(U))E_{LN}(h(U))$, and similarly $\mathrm{cov}_{GA}(g(U), h(U))$, where $U$ follows LN($\sigma$, $\theta$) and GA($\alpha$, $\lambda$) respectively. The following lemma is needed to prove the main result.

Lemma 1: Under the assumption that the data are from LN($\sigma$, $\theta$), as $n \to \infty$ we have

(i) $\hat\sigma \to \sigma$ a.s., $\hat\theta \to \theta$ a.s., where

$$E_{LN}\left[\ln(f_{LN}(X; \theta, \sigma))\right] = \max_{\bar\sigma, \bar\theta} E_{LN}\left[\ln(f_{LN}(X; \bar\theta, \bar\sigma))\right].$$

(ii) $\hat\alpha \to \tilde\alpha$ a.s., $\hat\lambda \to \tilde\lambda$ a.s., where

$$E_{LN}\left[\ln(f_{GA}(X; \tilde\alpha, \tilde\lambda))\right] = \max_{\alpha, \lambda} E_{LN}\left[\ln(f_{GA}(X; \alpha, \lambda))\right].$$

Note that $\tilde\alpha$ and $\tilde\lambda$ may depend on $\sigma$ and $\theta$, but we do not make this explicit for brevity. Let us denote

$$T_* = \ln\left(\frac{L_{LN}(\theta, \sigma)}{L_{GA}(\tilde\alpha, \tilde\lambda)}\right).$$

(iii) $n^{-\frac{1}{2}}\left[T - E_{LN}(T)\right]$ is asymptotically equivalent to $n^{-\frac{1}{2}}\left[T_* - E_{LN}(T_*)\right]$.

Proof of Lemma 1: The proof follows using an argument similar to that of White [17, Theorem 1] and is therefore omitted.

Now we can state the main result.

Theorem 1: Under the assumption that the data are from LN($\sigma$, $\theta$), $T$ is asymptotically normally distributed with mean $E_{LN}(T)$ and variance $V_{LN}(T) = V_{LN}(T_*)$.

Proof of Theorem 1: Using the central limit theorem and part (ii) of Lemma 1, it follows that $n^{-\frac{1}{2}}\left[T_* - E_{LN}(T_*)\right]$ is asymptotically normally distributed with mean zero and variance $\lim_{n\to\infty} V_{LN}(T_*)/n$. Therefore, using part (iii) of Lemma 1, the result follows immediately.

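Now we discuss how to obtain $\tilde\alpha$, $\tilde\lambda$, $E_{LN}(T)$ and $V_{LN}(T)$. Let us define

$$g(\alpha, \lambda) = E_{LN}\left[\ln(f_{GA}(X; \alpha, \lambda))\right] = E_{LN}\left[(\alpha - 1)\ln X - \frac{X}{\lambda} - \alpha \ln \lambda - \ln \Gamma(\alpha)\right] = (\alpha - 1)\ln\theta - \frac{\theta}{\lambda} e^{\frac{\sigma^2}{2}} - \alpha \ln\lambda - \ln\Gamma(\alpha).$$

Maximizing $g$ over $\alpha$ and $\lambda$, $\tilde\alpha$ and $\tilde\lambda$ satisfy the following relations:

$$\tilde\lambda = \frac{\theta}{\tilde\alpha} e^{\frac{\sigma^2}{2}} \tag{8}$$

and

$$\psi(\tilde\alpha) = \ln\tilde\alpha - \frac{\sigma^2}{2}. \tag{9}$$

Here $\psi(x) = \frac{d}{dx}\ln\Gamma(x)$ is the psi function. Therefore, $\tilde\alpha$ can be obtained by solving the non-linear equation (9), and clearly it is a function of $\sigma^2$ only. Once $\tilde\alpha$ is obtained, $\tilde\lambda$ can be obtained from (8). It is immediate that $\tilde\lambda/\theta$ is also a function of $\sigma^2$ only.

Now we provide the expressions for $E_{LN}(T)$ and $V_{LN}(T)$. Observe that $\lim_{n\to\infty} E_{LN}(T)/n$ and $\lim_{n\to\infty} V_{LN}(T)/n$ exist. We denote $\lim_{n\to\infty} E_{LN}(T)/n = AM_{LN}(\sigma)$ and $\lim_{n\to\infty} V_{LN}(T)/n = AV_{LN}(\sigma)$ respectively. Therefore, for large $n$,

$$\frac{E_{LN}(T)}{n} \approx AM_{LN}(\sigma) = E_{LN}\left[\ln(f_{LN}(X; 1, \sigma)) - \ln(f_{GA}(X; \tilde\alpha, \tilde\lambda))\right] = -\frac{1}{2}\ln(2\pi) - \ln\sigma - \frac{1}{2} + \frac{1}{\tilde\lambda} e^{\frac{\sigma^2}{2}} + \tilde\alpha \ln\tilde\lambda + \ln\Gamma(\tilde\alpha). \tag{10}$$

We also have

$$\frac{V_{LN}(T)}{n} \approx AV_{LN}(\sigma) = V_{LN}\left[\ln(f_{LN}(X; 1, \sigma)) - \ln(f_{GA}(X; \tilde\alpha, \tilde\lambda))\right] = V_{LN}\left[-\tilde\alpha \ln X + \frac{X}{\tilde\lambda} - \frac{1}{2\sigma^2}(\ln X)^2\right]$$

$$= \tilde\alpha^2\sigma^2 + \frac{1}{\tilde\lambda^2} e^{\sigma^2}\left(e^{\sigma^2} - 1\right) + \frac{1}{2} - \frac{2\tilde\alpha}{\tilde\lambda}\,\mathrm{cov}_{LN}(\ln X, X) - \frac{1}{\tilde\lambda\sigma^2}\,\mathrm{cov}_{LN}\left((\ln X)^2, X\right). \tag{11}$$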
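As a rough numerical illustration of (8), (9) and (10), the sketch below solves (9) for $\tilde\alpha$ by bisection, obtains $\tilde\lambda$ from (8), and evaluates $AM_{LN}(\sigma)$ from (10). The helper names (`digamma`, `pseudo_true_gamma`, `am_ln`) are hypothetical, the psi function is approximated by a recurrence plus an asymptotic series, and bisection is simply one convenient root finder; the paper does not say which solver was used.

```python
import math

def digamma(x):
    """psi(x) for x > 0 via psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return acc + math.log(x) - 0.5 / x - series

def pseudo_true_gamma(sigma, theta=1.0):
    """Return (alpha_tilde, lambda_tilde) for data from LN(sigma, theta)."""
    target = 0.5 * sigma ** 2
    lo, hi = 1e-6, 1e6
    for _ in range(100):                 # ln(a) - psi(a) decreases from +inf to 0
        mid = math.sqrt(lo * hi)
        if math.log(mid) - digamma(mid) > target:
            lo = mid
        else:
            hi = mid
    alpha = math.sqrt(lo * hi)           # root of equation (9)
    lam = theta * math.exp(target) / alpha   # equation (8)
    return alpha, lam

def am_ln(sigma):
    """Asymptotic mean of T/n under LN(sigma, 1), following equation (10)."""
    alpha, lam = pseudo_true_gamma(sigma)
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5
            + math.exp(0.5 * sigma ** 2) / lam
            + alpha * math.log(lam) + math.lgamma(alpha))

for sigma in (0.5, 0.7, 0.9):
    alpha, lam = pseudo_true_gamma(sigma)
    print(sigma, round(alpha, 4), round(lam, 4), round(am_ln(sigma), 4))
```

The values of $\tilde\alpha$ and $\tilde\lambda$ obtained this way can be compared with the corresponding columns of Table 1; the $AM_{LN}$ values should land close to the tabulated ones, up to rounding.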

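Now we consider the other case.

Case 2: The data come from a gamma distribution.

Let us assume that a sample $\{X_1, \ldots, X_n\}$ of size $n$ is obtained from GA($\alpha$, $\lambda$). In this case we have the following lemma.

Lemma 2: Under the assumption that the data are from a gamma distribution, as $n \to \infty$ we have

(i) $\hat\alpha \to \alpha$ a.s., $\hat\lambda \to \lambda$ a.s., where

$$E_{GA}\left[\ln(f_{GA}(X; \alpha, \lambda))\right] = \max_{\bar\alpha, \bar\lambda} E_{GA}\left[\ln(f_{GA}(X; \bar\alpha, \bar\lambda))\right].$$

(ii) $\hat\sigma \to \tilde\sigma$ a.s., $\hat\theta \to \tilde\theta$ a.s., where

$$E_{GA}\left[\ln(f_{LN}(X; \tilde\theta, \tilde\sigma))\right] = \max_{\sigma, \theta} E_{GA}\left[\ln(f_{LN}(X; \theta, \sigma))\right].$$

Note that here also $\tilde\sigma$ and $\tilde\theta$ may depend on $\alpha$ and $\lambda$, but we do not make this explicit for brevity. Let us denote

$$T^* = \ln\left(\frac{L_{LN}(\tilde\theta, \tilde\sigma)}{L_{GA}(\alpha, \lambda)}\right).$$

(iii) $n^{-\frac{1}{2}}\left[T - E_{GA}(T)\right]$ is asymptotically equivalent to $n^{-\frac{1}{2}}\left[T^* - E_{GA}(T^*)\right]$.

Theorem 2: Under the assumption that the data are from a gamma distribution, $T$ is approximately normally distributed with mean $E_{GA}(T)$ and variance $V_{GA}(T) = V_{GA}(T^*)$.

Now to obtain $\tilde\sigma$ and $\tilde\theta$, let us define

$$h(\theta, \sigma) = E_{GA}\left[\ln(f_{LN}(X; \theta, \sigma))\right] = E_{GA}\left[-\frac{1}{2}\ln(2\pi) - \ln\sigma - \ln X - \frac{(\ln X - \ln\theta)^2}{2\sigma^2}\right]$$

$$= -\frac{1}{2}\ln(2\pi) - \ln\sigma - \psi(\alpha) - \ln\lambda - \frac{1}{2\sigma^2}\left[\psi'(\alpha) + (\psi(\alpha))^2 + (\ln\lambda - \ln\theta)^2 + 2\psi(\alpha)(\ln\lambda - \ln\theta)\right].$$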

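Therefore, $\tilde\sigma$ and $\tilde\theta$ can be obtained as

$$\tilde\theta = \lambda e^{\psi(\alpha)} \quad \text{and} \quad \tilde\sigma = \left(\psi'(\alpha)\right)^{\frac{1}{2}}. \tag{12}$$

Here $\psi'(\alpha) = \frac{d}{d\alpha}\psi(\alpha)$. Now we provide the expressions for $E_{GA}(T)$ and $V_{GA}(T)$. As before, we observe that $\lim_{n\to\infty} E_{GA}(T)/n$ and $\lim_{n\to\infty} V_{GA}(T)/n$ exist. We denote $\lim_{n\to\infty} E_{GA}(T)/n = AM_{GA}(\alpha)$ and $\lim_{n\to\infty} V_{GA}(T)/n = AV_{GA}(\alpha)$ respectively. Then for large $n$,

$$\frac{E_{GA}(T)}{n} \approx AM_{GA}(\alpha) = E_{GA}\left[\ln(f_{LN}(X; \tilde\theta, \tilde\sigma)) - \ln(f_{GA}(X; \alpha, 1))\right]$$

$$= -\frac{1}{2}\ln(2\pi) - \ln\tilde\sigma - \frac{1}{2\tilde\sigma^2}\left[\psi'(\alpha) + (\psi(\alpha) - \ln\tilde\theta)^2\right] + \ln\Gamma(\alpha) + \alpha(1 - \psi(\alpha)),$$

$$\frac{V_{GA}(T)}{n} \approx AV_{GA}(\alpha) = V_{GA}\left[\ln(f_{LN}(X; \tilde\theta, \tilde\sigma)) - \ln(f_{GA}(X; \alpha, 1))\right] = V_{GA}\left[X - \alpha\ln X - \frac{1}{2\tilde\sigma^2}(\ln X - \ln\tilde\theta)^2\right]$$

$$= \alpha + \alpha^2\psi'(\alpha) - 2\alpha\left(\psi(\alpha + 1) - \psi(\alpha)\right)$$

$$\quad - \frac{1}{\tilde\sigma^2}\Big[\alpha\left(\psi'(\alpha + 1) + (\psi(\alpha + 1))^2\right) - 2\alpha(\ln\tilde\theta)\psi(\alpha + 1) - \alpha\left(\psi'(\alpha) + (\psi(\alpha))^2\right) + 2\alpha(\ln\tilde\theta)\psi(\alpha) - 2\alpha\psi(\alpha)\psi'(\alpha) - \alpha\psi''(\alpha) + 2\alpha(\ln\tilde\theta)\psi'(\alpha)\Big]$$

$$\quad + \frac{1}{4\tilde\sigma^4}\Big[\psi'''(\alpha) + 4\psi(\alpha)\psi''(\alpha) + 4\psi'(\alpha)(\psi(\alpha))^2 + 2(\psi'(\alpha))^2 - 4(\ln\tilde\theta)\psi''(\alpha) - 8\psi(\alpha)\psi'(\alpha)\ln\tilde\theta + 4\psi'(\alpha)(\ln\tilde\theta)^2\Big].$$

The three groups of terms are, in order, the variances of $X$ and $\alpha\ln X$ with their covariance, the covariances of $(\ln X - \ln\tilde\theta)^2$ with $X$ and with $\ln X$, and the variance of $(\ln X - \ln\tilde\theta)^2$, all taken under GA($\alpha$, 1).

Note that $\tilde\alpha$, $\tilde\lambda$, $AM_{LN}(\sigma)$, $AV_{LN}(\sigma)$, $\tilde\sigma$, $\tilde\theta$, $AM_{GA}(\alpha)$ and $AV_{GA}(\alpha)$ are quite difficult to compute numerically. We present $\tilde\alpha$, $\tilde\lambda$, $AM_{LN}(\sigma)$ and $AV_{LN}(\sigma)$ for different values of $\sigma$ in Table 1. We also present $\tilde\sigma$, $\tilde\theta$, $AM_{GA}(\alpha)$ and $AV_{GA}(\alpha)$ for different values of $\alpha$ in Table 2 for convenience.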
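Equation (12) is easy to evaluate once $\psi$ and $\psi'$ are available. The following sketch uses hypothetical helper names (`digamma`, `trigamma`, `pseudo_true_lognormal`) and approximates both functions by a recurrence plus an asymptotic series; it is meant only as a way to check the $\tilde\sigma$ and $\tilde\theta$ columns of Table 2, not as the authors' implementation.

```python
import math

def digamma(x):
    """psi(x) for x > 0 via psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return acc + math.log(x) - 0.5 / x - series

def trigamma(x):
    """psi'(x) for x > 0 via psi'(x) = psi'(x + 1) + 1/x^2 and an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    tail = inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))
    return acc + inv + 0.5 * inv2 + tail

def pseudo_true_lognormal(alpha, lam=1.0):
    """Return (sigma_tilde, theta_tilde) from equation (12) for data from GA(alpha, lam)."""
    theta = lam * math.exp(digamma(alpha))
    sigma = math.sqrt(trigamma(alpha))
    return sigma, theta

for alpha in (2.0, 4.0, 6.0, 8.0, 10.0, 12.0):
    sigma, theta = pseudo_true_lognormal(alpha)
    print(alpha, round(sigma, 4), round(theta, 4))
```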

4 Determination Of Sample Size

We propose a method to determine the minimum sample size required to discriminate between the log-normal and gamma distributions for a given user specified PCS. Before discriminating between two fitted distribution functions, it is important to know how close they are. There are several ways to measure the closeness or the distance between two distribution functions, for example the Kolmogorov-Smirnov (K-S) distance or the Hellinger distance. Naturally, if two distributions are very close, then a very large sample size is needed to discriminate between them for a given PCS. On the other hand, if the distance between two distribution functions is large, then one may not need a very large sample size to discriminate between them. It is also true that if the distance between two distribution functions is small, then one may not need to differentiate the two distributions from any practical point of view. Therefore, it is expected that the user will specify beforehand the PCS and also the tolerance limit in terms of the distance between two distribution functions. The tolerance limit simply indicates that the user does not want to make the distinction between two distribution functions if their distance is less than the tolerance limit. Based on the probability of correct selection and the tolerance limit, the required minimum sample size can be determined. Here we use the K-S distance to discriminate between two distribution functions, but a similar methodology can be developed using the Hellinger distance also, which is not pursued here.

We observed in section 3 that the RML statistic follows a normal distribution approximately for large $n$. This will now be used, with the help of the K-S distance, to determine the required sample size such that the PCS achieves a certain protection level $p$ for a given tolerance level $D$. We explain the procedure assuming case 1; case 2 follows along exactly the same lines.
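To make the idea concrete, here is a small sketch of the normal-approximation calculation. Under Theorems 1 and 2 the PCS is approximately $\Phi(\sqrt{n}\,|AM| / \sqrt{AV})$, so the smallest sample size reaching protection level $p$ is about $z_p^2\, AV / AM^2$. The function names are hypothetical, the inputs are the $\sigma = 0.7$ row of Table 1 and the $\alpha = 2.0$ row of Table 2, and rounding the result to the nearest integer is an assumption made here for comparison with Tables 3 and 4.

```python
from statistics import NormalDist

def asymptotic_pcs(n, am, av):
    """Normal-approximation PCS for sample size n: Phi(sqrt(n) * |AM| / sqrt(AV))."""
    return NormalDist().cdf((n ** 0.5) * abs(am) / av ** 0.5)

def min_sample_size(p, am, av):
    """Sample size at which the approximate PCS equals p: z_p^2 * AV / AM^2."""
    z = NormalDist().inv_cdf(p)
    return z * z * av / (am * am)

# Inputs copied from Table 1 (sigma = 0.7) and Table 2 (alpha = 2.0).
am_ln, av_ln = 0.0389, 0.0885
am_ga, av_ga = -0.0395, 0.0904

print(round(min_sample_size(0.9, am_ln, av_ln)))   # compare with Table 3, sigma = 0.7
print(round(min_sample_size(0.9, am_ga, av_ga)))   # compare with Table 4, alpha = 2.0
print(round(asymptotic_pcs(20, am_ln, av_ln), 2))  # compare with Table 5, sigma = 0.7, n = 20
```

Taking the absolute value of the asymptotic mean lets one function serve both cases: $T > 0$ is the correct decision under the log-normal model and $T < 0$ under the gamma model.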

Since $T$ is asymptotically normally distributed with mean $E_{LN}(T)$ and variance $V_{LN}(T)$, the probability of correct selection (PCS) is

$$PCS(\sigma) = P[T > 0 \mid \sigma] \approx 1 - \Phi\left(-\frac{E_{LN}(T)}{\sqrt{V_{LN}(T)}}\right) = 1 - \Phi\left(-\frac{\sqrt{n}\, AM_{LN}(\sigma)}{\sqrt{AV_{LN}(\sigma)}}\right). \tag{13}$$

Here $\Phi$ is the distribution function of the standard normal random variable, and $AM_{LN}(\sigma)$ and $AV_{LN}(\sigma)$ are the same as defined before. Now to determine the sample size needed to achieve at least a $p$ protection level, equate

$$\Phi\left(-\frac{\sqrt{n}\, AM_{LN}(\sigma)}{\sqrt{AV_{LN}(\sigma)}}\right) = 1 - p \tag{14}$$

and solve for $n$. This gives

$$n = \frac{z_p^2\, AV_{LN}(\sigma)}{(AM_{LN}(\sigma))^2}. \tag{15}$$

Here $z_p$ is the 100$p$ percentile point of the standard normal distribution. For $p = 0.9$ and for different $\sigma$, the values of $n$ are reported in Table 3. Similarly for case 2, we need

$$n = \frac{z_p^2\, AV_{GA}(\alpha)}{(AM_{GA}(\alpha))^2}. \tag{16}$$

Here $AM_{GA}(\alpha)$ and $AV_{GA}(\alpha)$ are the same as defined before. We report $n$, obtained with the help of Table 2, for different values of $\alpha$ when $p = 0.9$ in Table 4. From Table 3, it is clear that as $\sigma$ increases the required sample size decreases for a given PCS. From Table 4, as $\alpha$ increases the required sample size increases. Both findings are quite intuitive, in the sense that one needs large sample sizes to discriminate between the two distribution functions when they are very close.

If one knows the ranges of the shape parameters of the two distribution functions, then the minimum sample size can be obtained using (15) or (16) and the fact that $n$ is a monotone function of the shape parameter in both cases. Unfortunately, in practice the shape parameter may be completely unknown. Therefore, to have some idea of the shape parameter of the null distribution we make the following assumptions. It is assumed that the experimenter would like to choose the minimum sample size needed for a given protection level when the distance between the two distribution functions is greater than a pre-specified tolerance level. The distance between two distribution functions is defined by the K-S distance. The K-S distance between two distribution functions, say $F(x)$ and $G(x)$, is defined as

$$\sup_x |F(x) - G(x)|. \tag{17}$$

We report the K-S distance between LN($\sigma$, 1) and GA($\tilde\alpha$, $\tilde\lambda$) for different values of $\sigma$ in Table 3. Here $\tilde\alpha$ and $\tilde\lambda$ are the same as defined in Lemma 1 and they have been reported in Table 1. Similarly, the K-S distance between GA($\alpha$, 1) and LN($\tilde\sigma$, $\tilde\theta$) for different values of $\alpha$ is reported in Table 4. Here $\tilde\sigma$ and $\tilde\theta$ are the same as defined in Lemma 2 and they have been reported in Table 2.

Now we explain how to determine the minimum sample size required to discriminate between log-normal and gamma distribution functions for a user specified protection level and a given tolerance level. Suppose the protection level is $p = 0.9$ and the tolerance level is given in terms of the K-S distance as $D = 0.05$. Here the tolerance level $D = 0.05$ means that the practitioner wants to discriminate between a log-normal and a gamma distribution function only when their K-S distance is more than 0.05. From Table 3, the K-S distance is about 0.05 or more if $\sigma \geq 0.7$. Similarly, from Table 4, the K-S distance is about 0.05 or more if $\alpha \leq 2.0$. Therefore, if the data come from the log-normal distribution, then for the tolerance level $D = 0.05$ one needs at most $n = 96$ to meet the PCS $p = 0.9$. Similarly, if the data come from the gamma distribution, then one needs at most $n = 95$ to meet the protection level $p = 0.9$ for the same tolerance level $D = 0.05$. Therefore, for the given tolerance level 0.05, one needs $n = \max(95, 96) = 96$ to meet the protection level $p = 0.9$ simultaneously in both cases.

Table 1
Different values of $AM_{LN}(\sigma)$, $AV_{LN}(\sigma)$, $\tilde\alpha$ and $\tilde\lambda$ for different $\sigma$.

σ      AM_LN(σ)   AV_LN(σ)   α̃        λ̃
0.5    0.0207     0.0143     4.1594   0.2724
0.7    0.0389     0.0885     2.1930   0.5826
0.9    0.0608     0.1612     1.3774   1.0885
1.1    0.0861     0.2660     0.9588   1.8313
1.3    0.1131     0.4016     0.7133   3.2637
1.5    0.1409     0.5692     0.5556   5.5439

5 Numerical Experiments

In this section we perform some numerical experiments to observe how the asymptotic results derived in section 3 work for finite sample sizes. All computations are performed at the Indian Institute of Technology Kanpur, using a Pentium-IV processor. We use the random deviate generator of Press et al. [16] and all the programs are written in C. They can be obtained from the authors on request. We compute the probability of correct selection based on simulations and also based on the asymptotic results derived in section 3. We consider different sample sizes and different shape parameters; the details are explained below.

First we consider the case when the data come from a log-normal distribution. In this case we consider $n$ = 20, 40, 60, 80, 100 and $\sigma$ = 0.5, 0.7, 0.9, 1.1, 1.3 and 1.5. For fixed $\sigma$ and $n$ we generate a random sample of size $n$ from LN($\sigma$, 1), compute $T$ as defined in (4) and check whether $T$ is positive or negative. We replicate the process 10,000 times and obtain an estimate of the PCS. We also compute the PCS using the asymptotic results as given in (13). The results are reported in Table 5. Similarly, we obtain the results when the data are generated from a gamma distribution. In this case we consider the same set of $n$ and $\alpha$ = 2.0, 4.0, 6.0, 8.0, 10.0 and 12.0. The results are reported in Table 6. In each box the first row represents the results obtained by using Monte Carlo simulations and the second row represents the results obtained by using the asymptotic theory.
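The simulation procedure just described can be sketched as follows. This is not the authors' C code: it uses Python's standard `random` module instead of the generator of Press et al., far fewer replications than 10,000 by default, hypothetical function names, and a bisection solver for the gamma shape estimate. Here $T$ is computed directly as the difference of the two maximized log-likelihoods, which is the same quantity as the logarithm of the RML.

```python
import math
import random

def digamma(x):
    """psi(x) for x > 0 via psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return acc + math.log(x) - 0.5 / x - series

def log_rml(xs):
    """T = maximized log-normal log-likelihood minus maximized gamma log-likelihood."""
    n = len(xs)
    logs = [math.log(x) for x in xs]
    mean_x, mean_log = sum(xs) / n, sum(logs) / n
    var_log = sum((v - mean_log) ** 2 for v in logs) / n    # sigma_hat squared
    gap = math.log(mean_x) - mean_log                       # ln(arith/geo) > 0
    lo, hi = 1e-6, 1e6
    for _ in range(80):                 # solve ln(a) - psi(a) = gap by bisection
        mid = math.sqrt(lo * hi)
        if math.log(mid) - digamma(mid) > gap:
            lo = mid
        else:
            hi = mid
    alpha = math.sqrt(lo * hi)
    lam = mean_x / alpha
    ll_ln = -n * (0.5 * math.log(2 * math.pi) + 0.5 * math.log(var_log) + mean_log + 0.5)
    ll_ga = n * ((alpha - 1) * mean_log - alpha - alpha * math.log(lam) - math.lgamma(alpha))
    return ll_ln - ll_ga

def pcs_lognormal(sigma, n, reps=1000, seed=1):
    """Fraction of LN(sigma, 1) samples of size n for which T > 0."""
    rng = random.Random(seed)
    wins = sum(log_rml([rng.lognormvariate(0.0, sigma) for _ in range(n)]) > 0
               for _ in range(reps))
    return wins / reps

def pcs_gamma(alpha, n, reps=1000, seed=1):
    """Fraction of GA(alpha, 1) samples of size n for which T < 0."""
    rng = random.Random(seed)
    wins = sum(log_rml([rng.gammavariate(alpha, 1.0) for _ in range(n)]) < 0
               for _ in range(reps))
    return wins / reps

print(pcs_lognormal(0.7, 40), pcs_gamma(2.0, 40))
```

With only 1,000 replications the estimates carry a standard error of roughly 0.01 to 0.015, so they should be read as indicative when compared with the Monte Carlo rows of Tables 5 and 6.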

As the sample size increases, the PCS increases in both cases. It is also clear that the PCS increases when the shape parameter increases for the log-normal distribution, and when the shape parameter decreases for the gamma distribution. Even when the sample size is 20, the asymptotic results work quite well in both cases for all the parameter values considered. From the simulation study, it is recommended that the asymptotic results can be used quite effectively even when the sample size is as small as 20, for all the choices of the shape parameters considered.

Table 2
Different values of $AM_{GA}(\alpha)$, $AV_{GA}(\alpha)$, $\tilde\sigma$ and $\tilde\theta$ for different $\alpha$.

α      AM_GA(α)   AV_GA(α)   σ̃        θ̃
2.0    -0.0395    0.0904     0.8031   1.5262
4.0    -0.0207    0.0457     0.5328   3.5118
6.0    -0.0142    0.0305     0.4258   5.5075
8.0    -0.0109    0.0221     0.3649   7.5055
10.0   -0.0088    0.0180     0.3243   9.5044
12.0   -0.0074    0.0149     0.2948   11.5036

6 Data Analysis

In this section we analyze one data set and use our method to discriminate between the two populations.

Data Set 1: The data set is from Lawless [14, Page 228]. The data arose in tests on the endurance of deep groove ball bearings. The data are the number of million revolutions before failure for each of the 23 ball bearings in the life test, and they are: 17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.80, 51.84, 51.96, 54.12, 55.56, 67.80, 68.44, 68.64, 68.88, 84.12, 93.12, 98.64, 105.12, 105.84, 127.92, 128.04, 173.40.

When we use a log-normal distribution, the MLEs of the different parameters are $\hat\sigma$ = 0.5313, $\hat\theta$ = 63.5147 and $\ln(L_{LN}(\hat\sigma, \hat\theta))$ = -112.8552. The K-S distance between the empirical distribution function and the fitted log-normal distribution function is 0.09. Similarly, if we use a gamma distribution, the MLEs of the different parameters are $\hat\alpha$ = 4.0196, $\hat\lambda$ = 17.9856 and $\ln(L_{GA}(\hat\alpha, \hat\lambda))$ = -113.0274. In this case, the K-S distance between the empirical distribution function and the fitted gamma distribution function is 0.12. The K-S distance between the two fitted distributions is 0.034, so they are quite close to each other. In terms of the K-S distance, the log-normal distribution is closer to the empirical distribution function than the gamma distribution. The statistic $T$ = -112.8552 + 113.0274 = 0.1722 > 0 also suggests choosing the log-normal distribution rather than the gamma distribution.

Table 3
The minimum sample size $n = z_{0.90}^2\, AV_{LN}(\sigma) / (AM_{LN}(\sigma))^2$ for $p = 0.9$ when the data come from a log-normal distribution. The K-S distance between LN($\sigma$, 1) and GA($\tilde\alpha$, $\tilde\lambda$) for different values of $\sigma$ is also reported.

σ      0.5     0.7     0.9     1.1     1.3     1.5
n      159     96      72      59      52      47
K-S    0.033   0.049   0.064   0.076   0.097   0.113

Table 4
The minimum sample size $n = z_{0.90}^2\, AV_{GA}(\alpha) / (AM_{GA}(\alpha))^2$ for $p = 0.9$ when the data come from a gamma distribution. The K-S distance between GA($\alpha$, 1) and LN($\tilde\sigma$, $\tilde\theta$) for different values of $\alpha$ is also reported.

α      2.0     4.0     6.0     8.0     10.0    12.0
n      95      175     249     306     382     447
K-S    0.049   0.034   0.025   0.023   0.021   0.013

Assuming that the original distribution was log-normal with $\sigma$ = 0.5215 = $\hat\sigma$ and $\theta$ = 63.4784 = $\hat\theta$, we compute the PCS by computer simulations (based on 10,000 replications) as in section 5, and we obtain PCS = 0.6985, i.e. PCS ≈ 70%. On the other hand, if the choice of the log-normal distribution was wrong and the original distribution was gamma with shape parameter $\alpha$ = 4.0196 = $\hat\alpha$ and scale parameter $\lambda$ = 17.9856 = $\hat\lambda$, then as before, based on 10,000 replications, we obtain PCS = 0.6788, yielding an estimated risk of approximately 32% of choosing the wrong model.

Now we compute the PCS based on the large sample approximations. Assuming that the data come from LN(0.5313, 63.5147), we obtain $AM_{LN}(0.5215)$ = 0.0276 and $AV_{LN}(0.5215)$ = 0.0578, which with $n = 23$ implies $E_{LN}(T) \approx 0.6348$ and $V_{LN}(T) \approx 1.3294$. Therefore, assuming that the data are from LN(0.5313, 63.5147), $T$ is approximately normally distributed with mean 0.6348 and variance 1.3294, and PCS = $1 - \Phi(-0.5505) = \Phi(0.5505) \approx 0.71$, which is almost equal to the above simulation result. Similarly, assuming that the data come from a gamma distribution, we compute $AM_{GA}(4.0196)$ = -0.0198 and $AV_{GA}(4.0196)$ = 0.0424, and we have $E_{GA}(T) \approx -0.4554$ and $V_{GA}(T) \approx 0.9752$. Therefore, assuming that the data are from a gamma distribution, PCS = $\Phi(0.4612) \approx 0.68$, which is also very close to the simulated result. Therefore, based on the K-S distances and also on the RML statistic, we conclude that it is more likely that the data come from a log-normal distribution, and the probability of correct selection is about 70%.

Table 5
The probability of correct selection based on Monte Carlo simulations and also based on asymptotic results when the data come from a log-normal distribution. The element in the first row in each box represents the result based on Monte Carlo simulations (10,000 replications) and the number in brackets immediately below represents the result obtained by using asymptotic results.

σ \ n   20       40       60       80       100
0.5     0.66     0.73     0.78     0.82     0.85
        (0.68)   (0.74)   (0.79)   (0.82)   (0.85)
0.7     0.70     0.79     0.85     0.88     0.91
        (0.72)   (0.80)   (0.84)   (0.88)   (0.91)
0.9     0.73     0.84     0.89     0.93     0.95
        (0.75)   (0.83)   (0.88)   (0.92)   (0.94)
1.1     0.77     0.88     0.92     0.93     0.95
        (0.76)   (0.86)   (0.91)   (0.93)   (0.95)
1.3     0.78     0.88     0.92     0.95     0.96
        (0.79)   (0.87)   (0.92)   (0.95)   (0.96)
1.5     0.81     0.90     0.94     0.96     0.97
        (0.80)   (0.89)   (0.93)   (0.96)   (0.97)

7 Conclusions

In this paper we consider the problem of discriminating between two families of distribution functions, namely the log-normal and gamma families. We consider the statistic based on the ratio of the maximized likelihoods and obtain the asymptotic distributions of the test statistic under the null hypotheses. We compare the probability of correct selection obtained from Monte Carlo simulations with the asymptotic results, and it is observed that even when the sample size is very small the asymptotic results work quite well for a wide range of the parameter space. Therefore, the asymptotic results can be used to estimate the probability of correct selection. We use these asymptotic results to calculate the minimum sample size required for a user specified probability of correct selection. We use the concept of a tolerance level based on the distance between the two distribution functions. For a particular tolerance level $D$, the minimum sample size is obtained for a given user specified protection level. Two small tables are provided for the protection level 0.90, but the tables can easily be used for other protection levels as follows. For example, if we need the protection level $p = 0.8$, then all the entries in the row of $n$ are multiplied by $z_{0.8}^2 / z_{0.9}^2$, because of (15) and (16). Therefore, Tables 3 and 4 can be used for any given protection level.

Table 6

The probability of correct selection based on Monte Carlo simulations and also based on asymptotic results when the data are coming from a gamma distribution. The element in the first row in each box represents the result based on Monte Carlo simulations (10,000 replications), and the number in brackets immediately below represents the result obtained by using the asymptotic results.

| α ↓, n → | 20 | 40 | 60 | 80 | 100 |
|---|---|---|---|---|---|
| 2.0 | 0.71 | 0.80 | 0.86 | 0.88 | 0.91 |
| | (0.72) | (0.80) | (0.85) | (0.88) | (0.91) |
| 4.0 | 0.65 | 0.74 | 0.78 | 0.81 | 0.83 |
| | (0.67) | (0.73) | (0.77) | (0.81) | (0.83) |
| 6.0 | 0.63 | 0.71 | 0.74 | 0.77 | 0.79 |
| | (0.64) | (0.70) | (0.74) | (0.77) | (0.79) |
| 8.0 | 0.61 | 0.70 | 0.72 | 0.75 | 0.77 |
| | (0.63) | (0.69) | (0.72) | (0.75) | (0.77) |
| 10.0 | 0.60 | 0.67 | 0.70 | 0.73 | 0.75 |
| | (0.61) | (0.66) | (0.70) | (0.72) | (0.75) |
| 12.0 | 0.59 | 0.66 | 0.69 | 0.71 | 0.73 |
| | (0.61) | (0.65) | (0.68) | (0.71) | (0.73) |

References

[1] Atkinson, A. (1969), A test for discriminating between models, Biometrika, 56, 337-347.
[2] Atkinson, A. (1970), A method for discriminating between models (with discussions), Jour. Royal Stat. Soc. Ser. B, 32, 323-353.
[3] Bain, L.J. and Engelhardt, M. (1980), Probability of correct selection of Weibull versus gamma based on likelihood ratio, Communications in Statistics, Ser. A, vol. 9, 375-381.
[4] Chambers, E.A. and Cox, D.R. (1967), Discriminating between alternative binary response models, Biometrika, 54, 573-578.
[5] Chen, W.W. (1980), On the tests of separate families of hypotheses with small sample size, Jour. Stat. Comp. Simul., 2, 183-187.
[6] Cox, D.R. (1961), Tests of separate families of hypotheses, Proceedings of the Fourth Berkeley Symposium in Mathematical Statistics and Probability, Berkeley, University of California Press, 105-123.
[7] Cox, D.R. (1962), Further results on tests of separate families of hypotheses, Jour. of Royal Statistical Society, Ser. B, 24, 406-424.
[8] Dumonceaux, R. and Antle, C. (1973), Discriminating between the log-normal and the Weibull distributions, Technometrics, 15, 923-926.

[9] Dyer, A.R. (1973), Discrimination procedure for separate families of hypotheses, Jour. Amer. Stat. Assoc., 68, 970-974.
[10] Fearn, D.H. and Nebenzahl, E. (1991), On the maximum likelihood ratio method of deciding between the Weibull and gamma distributions, Communications in Statistics, Ser. A, 20, 2, 579-593.
[11] Firth, D. (1988), Multiplicative errors: log-normal or gamma?, Journal of the Royal Statistical Society, Ser. B, 2, 266-268.
[12] Jackson, O.A.Y. (1968), Some results on tests of separate families of hypotheses, Biometrika, 55, 355-363.
[13] Johnson, N., Kotz, S. and Balakrishnan, N. (1995), Continuous Univariate Distributions, 2nd Edition, Wiley, New York.
[14] Lawless, J.F. (1982), Statistical Models and Methods for Lifetime Data, Wiley, New York.
[15] Pereira, B. de B. (1977), A note on the consistency and on the finite sample comparisons of some tests of separate families of hypotheses, Biometrika, 64, 109-113.
[16] Press et al. (1993), Numerical Recipes in FORTRAN, Cambridge University Press, Cambridge.
[17] White, H. (1982), Regularity conditions for Cox's test of non-nested hypotheses, Journal of Econometrics, vol. 19, 301-318.
[18] Wiens, B.L. (1999), When log-normal and gamma models give different results: a case study, The American Statistician, 53, 2, 89-93.
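As a numerical cross-check, the asymptotic PCS values quoted in the data analysis can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' code. It assumes the normal approximation $T \sim N(n \cdot AM, n \cdot AV)$, a decision rule that selects the log-normal model when $T > 0$, and a sample size of $n = 23$, which is inferred here from the ratio $E_{LN}(T) / AM_{LN} = 0.6348 / 0.0276$ rather than read from the text. The function names are illustrative. The last function computes the factor $z_{0.8}^2 / z_{0.9}^2$ mentioned in the Conclusions, assuming $z_p$ denotes the $100p$-th percentile of the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def asymptotic_pcs(n, am, av, truth):
    """Approximate PCS, assuming T ~ N(n * am, n * av).

    Assumed decision rule: choose log-normal if T > 0, gamma otherwise.
    `truth` is "lognormal" or "gamma" (hypothetical labels for this sketch).
    """
    z = (n * am) / sqrt(n * av)  # standardized mean of T
    if truth == "lognormal":
        return STD_NORMAL.cdf(z)   # P(T > 0)
    if truth == "gamma":
        return STD_NORMAL.cdf(-z)  # P(T < 0)
    raise ValueError("truth must be 'lognormal' or 'gamma'")


def protection_rescale(p_new, p_table=0.90):
    """Factor z_{p_new}^2 / z_{p_table}^2 for rescaling tabulated sample sizes."""
    z_new = STD_NORMAL.inv_cdf(p_new)
    z_table = STD_NORMAL.inv_cdf(p_table)
    return (z_new / z_table) ** 2


N_ASSUMED = 23  # inferred from 0.6348 / 0.0276, not stated in this section

pcs_ln = asymptotic_pcs(N_ASSUMED, am=0.0276, av=0.0578, truth="lognormal")
pcs_ga = asymptotic_pcs(N_ASSUMED, am=-0.0198, av=0.0424, truth="gamma")
factor = protection_rescale(0.8)

print(f"PCS (log-normal true): {pcs_ln:.2f}")
print(f"PCS (gamma true):      {pcs_ga:.2f}")
print(f"rescale factor p=0.8:  {factor:.3f}")
```

Under these assumptions the two PCS values come out at about 0.71 and 0.68, in line with the figures reported in the text, and the rescaling factor for $p = 0.8$ is roughly 0.43.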

Figure 1: The two fitted distribution functions (fitted log-normal and fitted gamma) for the given data set.
