
Vidyodaya J. of Sci. (2009) Vol. 14, pp. 95-103

A new point estimator for the median of gamma distribution

B.M.S.G. Banneheka¹ and G.E.M.V.P.D. Ekanayake²

¹Department of Statistics and Computer Science, University of Sri Jayewardenepura, Nugegoda, Sri Lanka.
²Department of Census and Statistics, Prices and Wages Division, 104A, Kitulwatta Road, Colombo 8, Sri Lanka.

Received on: 27-03-2008
Accepted on: 23-12-2008

Abstract

In this paper, we consider the problem of estimating the median of a gamma distribution. We introduce a new point estimator based on an approximation that we derive for the median of a gamma distribution. We compare the new estimator with two conventional estimators, namely the sample median and the maximum likelihood estimator (mle). The comparison is based on the amount of computation required to calculate the estimates and on the root mean square errors of the estimators. The new estimator is shown to be 'optimum' with respect to these two criteria.

Keywords and phrases: gamma distribution, median, point estimate, maximum likelihood estimate, moment estimate.

1. Introduction

Estimation of the population 'average' or 'central tendency' is a common inferential problem. The population mean and the population median are the parameters commonly used to represent the population average. Most researchers use the mean to represent the average because inference concerning the mean is easy. The sample mean is an unbiased estimator of the population mean, and the central limit theorem can be used to derive confidence intervals and to test hypotheses when large samples are available. However, when the underlying distribution is skewed, the population mean tends to be larger (when positively skewed) or smaller (when negatively skewed) than the typical population 'average'. For example, consider the monthly income of households in a fixed area. The monthly incomes of most of the households are small to moderately large.
There may be a few households with very large monthly incomes. Then the distribution of household incomes is positively skewed, and the population mean can be significantly larger than the typical 'average' monthly household income. In such situations, the population median represents the population 'average' better than the population mean does.

When the population median is selected to represent the population average, the next problem is how to make inferences regarding the population median. The parametric approach is to select a suitable model for the distribution of the variable of interest and make inferences regarding the median of the selected model distribution. The gamma distribution is often used as a model for positively skewed distributions. Literature related to inference concerning the mean of a gamma distribution can be found in Anita S. et al. (2002) and references therein. However, we could not find any literature related to inference concerning the median of a gamma distribution. In this paper we consider the problem of estimating the median of a gamma distribution. We intend to present a way to construct confidence intervals for the median of a gamma distribution in another paper.

2. An Approximation for the Median of Gamma Distribution

If a random variable $X$ has a gamma distribution with shape parameter $\alpha$ ($>0$) and scale parameter $\beta$ ($>0$), it is denoted as $X \sim G(\alpha, \beta)$ (Anita S. et al., 2002). Its density function is given by

\[ f_X(x;\alpha,\beta) = \frac{x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}, \qquad x > 0,\ \alpha > 0,\ \beta > 0. \tag{1} \]

Using simple calculus, it is easy to see that

\[ \lim_{x \to 0} f_X(x;\alpha,\beta) = \begin{cases} \infty, & \alpha < 1 \\ 1/\beta, & \alpha = 1 \\ 0, & \alpha > 1. \end{cases} \tag{2} \]

Figure 1 shows the three different shapes arising from the above three cases.
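The three limiting cases in (2) can be checked numerically. The following is a minimal sketch, not part of the original paper; the function name `gamma_pdf` and the evaluation points are illustrative assumptions, and the parameter values mirror those of Figure 1.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Density (1) of G(alpha, beta), with beta the scale parameter."""
    return x ** (alpha - 1.0) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)

# Evaluate the density ever closer to zero for the three cases of (2).
for alpha in (0.5, 1.0, 2.0):
    values = [gamma_pdf(x, alpha, 2.0) for x in (1e-2, 1e-4, 1e-8)]
    print(f"alpha = {alpha}: {values}")
```

With $\beta = 2$, the printed values should grow without bound for $\alpha = 0.5$, settle near $1/\beta$ for $\alpha = 1$, and shrink towards zero for $\alpha = 2$.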

Figure 1: Densities of G(0.5,2), G(1,2) and G(2,2)

For the above distribution, mean ($\mu$) $= \alpha\beta$, standard deviation ($\sigma$) $= \sqrt{\alpha}\,\beta$ and skewness $= 2/\sqrt{\alpha}$ (Anita S. et al., 2002). The skewness depends only on the shape parameter. As $\alpha$ increases, skewness decreases, and consequently the gamma distribution approaches a normal distribution when $\alpha$ is large (e.g., $\alpha > 10$) (Anita S. et al., 2002).

Let $\nu$ be the median of the above gamma distribution. According to the definition, $\nu$ satisfies the equation

\[ \int_0^{\nu} f_X(x;\alpha,\beta)\,dx = 0.5. \tag{3} \]

It is not possible to write $\nu$ in terms of $\alpha$ and $\beta$ explicitly (http://en.wikipedia.org/wiki/gamma_distribution). However, the value of $\nu$ for given values of $\alpha$ and $\beta$ can be obtained using the 'INVCDF' function in the statistical package Minitab or the 'qgamma' function in the statistical package R (http://www.r-project.org/).

Here we derive an approximation for $\nu$ using two interesting features that we observed of the ratio $\mu/(\mu-\nu)$. The first is that $\mu/(\mu-\nu)$ is free of $\beta$. In order to see this, suppose $X \sim G(\alpha, \beta)$. Then, using the moment generating function technique (Mood A.M. et al., 2001, pg. 189), it can be shown that $X/\beta \sim G(\alpha, 1)$.
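As a numerical companion to equation (3) and to the scaling remark above, the sketch below solves (3) by bisection using only the Python standard library. It is a stand-in for R's `qgamma(0.5, shape = alpha, scale = beta)`, not the paper's own code; the helper names `gamma_cdf`, `gamma_median` and `ratio` are assumptions made for illustration, and the series used for the distribution function is chosen for brevity rather than for speed.

```python
import math

def gamma_cdf(x, alpha, beta=1.0):
    """P(X <= x) for X ~ G(alpha, beta), via the power series of the
    regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    z = x / beta
    term, total, k = 1.0, 1.0, 1
    while term > 1e-17 * total:
        term *= z / (alpha + k)
        total += term
        k += 1
    return total * math.exp(alpha * math.log(z) - z - math.lgamma(alpha + 1.0))

def gamma_median(alpha, beta=1.0):
    """Solve equation (3) for nu by bisection."""
    lo, hi = 0.0, alpha * beta
    while gamma_cdf(hi, alpha, beta) < 0.5:  # widen the bracket if needed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, alpha, beta) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ratio(alpha, beta):
    """mu / (mu - nu) for G(alpha, beta)."""
    mu = alpha * beta
    return mu / (mu - gamma_median(alpha, beta))

# The ratio should not change when only the scale parameter changes.
for beta in (0.5, 1.0, 5.0):
    print(beta, gamma_median(5.0, beta), ratio(5.0, beta))
```

For $\alpha = 1$ the distribution is exponential, so the computed median can be compared with the exact value $\beta \log 2$.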

If $\nu$ is the median of $X$, then $\Pr(X < \nu) = 0.5$. Hence, $\Pr(X/\beta < \nu/\beta) = 0.5$. This implies that the median of $X/\beta$ is $\nu/\beta$. In other words, $\nu = \beta \times$ the median of the $G(\alpha, 1)$ distribution. Therefore,

\[ \frac{\mu}{\mu-\nu} = \frac{\alpha\beta}{\alpha\beta - \beta \times \text{median of } G(\alpha,1)}. \]

This implies

\[ \frac{\mu}{\mu-\nu} = \frac{\alpha}{\alpha - \text{median of } G(\alpha,1)}. \tag{4} \]

From (4), it is clear that $\mu/(\mu-\nu)$ is free of $\beta$ and is a function of $\alpha$ only. Figure 2 shows the relationship between $\mu/(\mu-\nu)$ and $\alpha$.

Figure 2: $\mu/(\mu-\nu)$ versus $\alpha$ (panels (a) and (b); horizontal axis: alpha)

Figure 2 (a) is the plot of $\mu/(\mu-\nu)$ against $\alpha$ when $\alpha < 1$. Figure 2 (b) is the same when $\alpha \geq 1$. In order to produce these graphs, the medians of $G(\alpha, 1)$ distributions for different values of $\alpha$ were obtained using the function 'qgamma' of the statistical package R. When $\alpha < 1$, the relationship is non-linear. However, when $\alpha \geq 1$, $\mu/(\mu-\nu)$ is almost perfectly linear in $\alpha$. This is the second interesting feature.

When $\alpha \geq 1$, suitable values for the slope and intercept of the linear relationship can be obtained using the least squares method. Based on 100 equally spaced $\alpha$ values between 1 and 20 and the corresponding $\mu/(\mu-\nu)$ values, the least squares estimates for the intercept and slope are 0.2096 and 2.998 respectively. For simplicity, using 0.2 and 3 as the intercept and slope, we can write

\[ \frac{\mu}{\mu-\nu} \approx 0.2 + 3\alpha, \qquad \text{or equivalently} \qquad \nu \approx \mu\,\frac{3\alpha - 0.8}{3\alpha + 0.2}. \]

We denote this approximation as

\[ \nu_{BE} = \mu\,\frac{3\alpha - 0.8}{3\alpha + 0.2}. \tag{5} \]

Table 1 shows the absolute error of the approximation $\nu_{BE}$, calculated as a percentage of the actual median $\nu$, that is, $|\nu - \nu_{BE}|/\nu \times 100$.

Table 1: Absolute error of $\nu_{BE}$ as a percentage of the actual median ($\nu$ = actual median, $\nu_{BE}$ = approximation for $\nu$).

$\alpha$    $|\nu - \nu_{BE}|/\nu \times 100$
1           0.8147159
5           0.003077533
10          0.001650245
20          0.0005178544

These values show that our approximation (5) is very good when $\alpha \geq 1$. According to (2), the gamma distribution with $\alpha < 1$ is suitable only if the relative frequency of values near zero is very high. Such situations are rare in practice. The gamma distribution with $\alpha \geq 1$ fits most practical situations. Therefore, our approximation is suitable for most practical applications.

3. Conventional Estimators for the Median of Gamma Distribution

Let $\nu$ be the median of a gamma distribution with shape parameter $\alpha$ ($>0$) and scale parameter $\beta$ ($>0$). The sample median and the maximum likelihood estimator are two possible estimators for the median $\nu$.
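The least squares fit behind approximation (5) and the error measure of Table 1 can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the paper used R's 'qgamma', whereas here the median comes from a small standard-library bisection, and the grid is assumed to include both endpoints 1 and 20. All function names are hypothetical.

```python
import math

def gamma_cdf(x, alpha, beta=1.0):
    """P(X <= x) for X ~ G(alpha, beta), series for the incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    z = x / beta
    term, total, k = 1.0, 1.0, 1
    while term > 1e-17 * total:
        term *= z / (alpha + k)
        total += term
        k += 1
    return total * math.exp(alpha * math.log(z) - z - math.lgamma(alpha + 1.0))

def gamma_median(alpha, beta=1.0):
    """Median of G(alpha, beta) by bisection (stand-in for qgamma)."""
    lo, hi = 0.0, alpha * beta
    while gamma_cdf(hi, alpha, beta) < 0.5:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, alpha, beta) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ratio(alpha):
    """mu / (mu - nu) for G(alpha, 1); by (4) it does not depend on beta."""
    return alpha / (alpha - gamma_median(alpha))

# 100 equally spaced alpha values between 1 and 20 (endpoints included: an assumption).
alphas = [1.0 + 19.0 * i / 99.0 for i in range(100)]
ratios = [ratio(a) for a in alphas]

# Ordinary least squares fit of ratio = intercept + slope * alpha.
a_bar = sum(alphas) / len(alphas)
r_bar = sum(ratios) / len(ratios)
slope = (sum((a - a_bar) * (r - r_bar) for a, r in zip(alphas, ratios))
         / sum((a - a_bar) ** 2 for a in alphas))
intercept = r_bar - slope * a_bar

def median_approx(alpha, beta=1.0):
    """Approximation (5): nu_BE = mu * (3*alpha - 0.8) / (3*alpha + 0.2)."""
    return alpha * beta * (3.0 * alpha - 0.8) / (3.0 * alpha + 0.2)

def pct_error(alpha):
    """Absolute error of nu_BE as a percentage of the actual median."""
    nu = gamma_median(alpha)
    return abs(nu - median_approx(alpha)) / nu * 100.0

print("intercept, slope:", intercept, slope)
for a in (1.0, 5.0, 10.0, 20.0):
    print(a, pct_error(a))
```

The fitted coefficients can be compared with the values 0.2096 and 2.998 reported in the text, and the percentage errors with Table 1; small differences are possible if the original grid was laid out differently.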

The sample median

The sample median of a sample of size $n$ is calculated as follows:

\[ \text{Sample median} = \begin{cases} \left(\frac{n+1}{2}\right)\text{th ordered value}, & n \text{ odd} \\ \left[\left(\frac{n}{2}\right)\text{th ordered value} + \left(\frac{n}{2}+1\right)\text{st ordered value}\right]/2, & n \text{ even.} \end{cases} \]

We shall denote this estimator by $\hat{\nu}_{sm}$.

The maximum likelihood estimator

Since it is not possible to write $\nu$ in terms of $\alpha$ and $\beta$ explicitly, it is also not possible to obtain the maximum likelihood estimator of $\nu$ in a closed form. However, the maximum likelihood estimate of $\nu$ can be obtained using the invariance property of maximum likelihood estimators (Mood A.M. et al., 2001). This can be done by first deriving the maximum likelihood estimates $\hat{\alpha}_{mle}$ and $\hat{\beta}_{mle}$ of $\alpha$ and $\beta$ respectively, and then finding the $\hat{\nu}_{mle}$ that satisfies

\[ \int_0^{\hat{\nu}_{mle}} f_X(x;\hat{\alpha}_{mle},\hat{\beta}_{mle})\,dx = 0.5 \tag{6} \]

using the 'INVCDF' function in the statistical package Minitab or the 'qgamma' function in the statistical package R.

Anita S. et al. (2002) have discussed the maximum likelihood estimation of $\alpha$ and $\beta$. For the convenience of the reader, we reproduce some of their results in this paper. Let $x_1, x_2, \ldots, x_n$ be a random sample from a $G(\alpha, \beta)$ distribution. Then, the maximum likelihood estimator $\hat{\beta}_{mle}$ of $\beta$ is given by

\[ \hat{\beta}_{mle} = \frac{\bar{x}}{\hat{\alpha}_{mle}}. \tag{7} \]

It is not possible to obtain $\hat{\alpha}_{mle}$ in a closed form. The authors have provided the following iterative procedure to obtain $\hat{\alpha}_{mle}$:

\[ \hat{\alpha}_{l} = \hat{\alpha}_{l-1} - \frac{\log(\hat{\alpha}_{l-1}) - \Psi(\hat{\alpha}_{l-1}) - M}{1/\hat{\alpha}_{l-1} - \Psi'(\hat{\alpha}_{l-1})}. \tag{8} \]

In equation (8),

\[ M = \log(\bar{x}) - \frac{1}{n}\sum_{i=1}^{n}\log(x_i), \qquad \Psi(\alpha) = \frac{d}{d\alpha}\log\Gamma(\alpha), \qquad \Psi'(\alpha) = \frac{d}{d\alpha}\Psi(\alpha). \]

$\Psi(\alpha)$ is the digamma function and $\Psi'(\alpha)$ is the trigamma function. These functions are available in the R statistical software. The authors have suggested several starting values for $\hat{\alpha}_0$ in the iterative procedure (8). We found that the moment estimator

\[ \hat{\alpha}_{me} = \frac{\bar{x}^2}{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2} \tag{9} \]

of $\alpha$ (Wiens et al., 2003) also works well as the initial value $\hat{\alpha}_0$.

As can be seen from the above description, the derivation of the maximum likelihood estimate $\hat{\nu}_{mle}$ requires intensive computations. In the next section, we introduce a new estimator which requires fewer computations.

4. A New Estimator for the Median of Gamma Distribution

Based on our approximation (5), we propose the following new estimator for the median $\nu$ of a gamma distribution:

\[ \hat{\nu}_{BE} = \bar{x}\,\frac{3\hat{\alpha}_{me} - 0.8}{3\hat{\alpha}_{me} + 0.2}. \tag{10} \]

Here, $\hat{\alpha}_{me}$ is the moment estimate of $\alpha$ given by (9).
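To make the difference in computational effort concrete, the sketch below computes the three estimates for one small sample using only the Python standard library. It is illustrative, not the authors' implementation: the data are made up, all function names are hypothetical, the iteration is assumed to be a Newton-Raphson step on the likelihood equation $\log\alpha - \Psi(\alpha) = M$ started from the moment estimate (9), and the digamma and trigamma functions are approximated by a recurrence plus an asymptotic series rather than taken from a library.

```python
import math
import statistics

def digamma(x):
    """Psi(x): recurrence up to x >= 6, then an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

def trigamma(x):
    """Psi'(x): recurrence up to x >= 6, then an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + 1.0 / x + 0.5 * inv2 + (inv2 / x) * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))

def alpha_moment(xs):
    """Moment estimate (9) of the shape parameter."""
    xbar = sum(xs) / len(xs)
    return xbar * xbar / (sum(x * x for x in xs) / len(xs) - xbar * xbar)

def alpha_mle(xs, max_iter=100):
    """Newton-Raphson on log(a) - Psi(a) = M (assumed form of the iteration)."""
    n = len(xs)
    M = math.log(sum(xs) / n) - sum(math.log(x) for x in xs) / n
    a = alpha_moment(xs)
    for _ in range(max_iter):
        a_new = a - (math.log(a) - digamma(a) - M) / (1.0 / a - trigamma(a))
        if a_new <= 0.0:          # guard against overshooting below zero
            a_new = a / 2.0
        if abs(a_new - a) < 1e-12 * a:
            return a_new
        a = a_new
    return a

def gamma_cdf(x, alpha, beta=1.0):
    """P(X <= x) for X ~ G(alpha, beta), series for the incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    z = x / beta
    term, total, k = 1.0, 1.0, 1
    while term > 1e-17 * total:
        term *= z / (alpha + k)
        total += term
        k += 1
    return total * math.exp(alpha * math.log(z) - z - math.lgamma(alpha + 1.0))

def gamma_median(alpha, beta=1.0):
    """Median of G(alpha, beta) by bisection (stand-in for qgamma)."""
    lo, hi = 0.0, alpha * beta
    while gamma_cdf(hi, alpha, beta) < 0.5:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if gamma_cdf(mid, alpha, beta) < 0.5 else (lo, mid)
    return 0.5 * (lo + hi)

def nu_be(xs):
    """New estimator (10)."""
    a = alpha_moment(xs)
    return (sum(xs) / len(xs)) * (3.0 * a - 0.8) / (3.0 * a + 0.2)

xs = [1.2, 0.7, 3.4, 2.2, 0.9, 1.8, 2.9, 1.1]   # made-up sample for illustration
a_hat = alpha_mle(xs)
b_hat = (sum(xs) / len(xs)) / a_hat               # equation (7)
nu_sm = statistics.median(xs)                     # sample median
nu_mle = gamma_median(a_hat, b_hat)               # equation (6)
print(nu_sm, nu_mle, nu_be(xs))
```

The estimator (10) needs only the first two sample moments, whereas the maximum likelihood route needs special functions, an iteration for the shape parameter, and a numerical inversion of the distribution function.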

5. Comparison of Estimators

Table 2 shows the root mean square errors of the three estimators $\hat{\nu}_{sm}$, $\hat{\nu}_{mle}$ and $\hat{\nu}_{BE}$ as percentages of the actual median. We consider three values for $\alpha$. Under each value of $\alpha$, we consider three values of $\beta$. For each combination of $\alpha$ and $\beta$, we consider four sample sizes ($n$). For each combination of $\alpha$, $\beta$ and $n$, the root mean square errors were calculated based on 10000 Monte Carlo simulations.

Table 2: Root mean square errors of estimators as percentages of actual medians, $\mathrm{RMSE}(\hat{\nu})/\nu \times 100$.

$\alpha$   $\beta$   $\nu$    $n$    $\hat{\nu}_{sm}$   $\hat{\nu}_{mle}$   $\hat{\nu}_{BE}$
1          0.5       0.35     5      68                 54                  57
                              10     44                 36                  39
                              20     32                 26                  29
                              30     27                 21                  24
1          1         0.69     5      68                 54                  57
                              10     46                 37                  40
                              20     32                 25                  29
                              30     26                 21                  24
1          5         3.47     5      68                 54                  56
                              10     45                 37                  40
                              20     32                 25                  29
                              30     26                 20                  24
5          0.5       2.34     5      25                 21                  21
                              10     17                 15                  15
                              20     13                 10                  10
                              30     10                 8                   8
5          1         4.67     5      25                 21                  21
                              10     17                 15                  15
                              20     13                 10                  10
                              30     10                 8                   8
5          5         23.35    5      25                 21                  21
                              10     17                 15                  15
                              20     12                 10                  10
                              30     10                 8                   8
10         0.5       4.83     5      17                 14                  14
                              10     12                 10                  10
                              20     9                  7                   7
                              30     7                  6                   6
10         1         9.67     5      17                 14                  14
                              10     12                 10                  10
                              20     9                  7                   7
                              30     7                  6                   6
10         5         48.34    5      17                 14                  14
                              10     12                 10                  10
                              20     9                  7                   7
                              30     7                  6                   6
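A simulation of the kind summarised in Table 2 can be sketched as below for a single cell ($\alpha = 1$, $\beta = 1$, $n = 30$), where the true median is exactly $\log 2$. This is a simplified illustration under stated assumptions, not the authors' simulation code: the seed, the function names and the use of Python's `random.gammavariate` are choices made here, and the maximum likelihood estimator is left out to keep the example short.

```python
import math
import random
import statistics

def alpha_moment(xs):
    """Moment estimate (9) of the shape parameter."""
    xbar = sum(xs) / len(xs)
    return xbar * xbar / (sum(x * x for x in xs) / len(xs) - xbar * xbar)

def nu_be(xs):
    """New estimator (10)."""
    a = alpha_moment(xs)
    return (sum(xs) / len(xs)) * (3.0 * a - 0.8) / (3.0 * a + 0.2)

def simulate(estimators, alpha, beta, true_median, n, reps, seed):
    """RMSE of each estimator as a percentage of the true median.
    Every estimator sees the same simulated samples."""
    rng = random.Random(seed)
    sq = {name: 0.0 for name in estimators}
    for _ in range(reps):
        xs = [rng.gammavariate(alpha, beta) for _ in range(n)]  # beta is the scale
        for name, est in estimators.items():
            sq[name] += (est(xs) - true_median) ** 2
    return {name: 100.0 * math.sqrt(s / reps) / true_median for name, s in sq.items()}

estimators = {"sample median": statistics.median, "BE": nu_be}
result = simulate(estimators, alpha=1.0, beta=1.0, true_median=math.log(2.0),
                  n=30, reps=10000, seed=2008)
print(result)
```

For this cell Table 2 reports 26 for the sample median and 24 for the new estimator; a run of this sketch should land in the same neighbourhood, with differences due to Monte Carlo error and the random number generator.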

According to the values in Table 2, the sample median $\hat{\nu}_{sm}$ has the highest root mean square error. When $\alpha = 1$, the maximum likelihood estimator $\hat{\nu}_{mle}$ has the smallest root mean square error. When $\alpha > 1$, the estimators $\hat{\nu}_{BE}$ and $\hat{\nu}_{mle}$ have the same root mean square error.

6. Conclusion

The sample median $\hat{\nu}_{sm}$ is the easiest estimate to calculate. The maximum likelihood estimate $\hat{\nu}_{mle}$ is the most difficult to calculate; it requires intensive computations. Our estimator requires slightly more computation than the sample median and much less computation than the maximum likelihood estimate. The sample median has the highest root mean square error. The maximum likelihood estimator (mle) has the smallest root mean square error when $\alpha = 1$. The root mean square error of our estimator is slightly above that of the mle when $\alpha = 1$, but the same when $\alpha > 1$. Therefore, considering the required amount of computation and the root mean square error, our estimator can be considered an 'optimum' estimator for the population median when the gamma distribution with $\alpha \geq 1$ is a suitable model for the distribution of the variable of interest.

7. References

Anita Singh, Ashok K. Singh, and Ross J. Iaci. Estimation of the Exposure Point Concentration Term Using a Gamma Distribution, 2002. EPA Technology Support Center Issue, United States Environmental Protection Agency. Available at: http://www.hanford.gov/dgo/training/289cmb02.pdf

Mood, A.M., Graybill, F., Boes, D.C. Introduction to the Theory of Statistics (2001). Tata McGraw Hill Publishing Company Limited, New Delhi.

Wiens, D.P., Cheng, J., Beaulieu, N.C. A class of method of moments estimators for the two-parameter gamma family. Pakistan Journal of Statistics, 2003. Vol 19(1). pp. 129-141.