A New Hybrid Estimation Method for the Generalized Pareto Distribution

A New Hybrid Estimation Method for the Generalized Pareto Distribution Chunlin Wang Department of Mathematics and Statistics University of Calgary May 18, 2011 A New Hybrid Estimation Method for the GPD Chunlin Wang (UCalgary) May 18, 2011 1/32

1 Introduction
  The Generalized Pareto Distribution
  Application
2 Estimation of the GPD Parameters
  A Review of Literature
  The Maximum Likelihood Estimation
  The Maximum Goodness-of-Fit Estimation
  A New Hybrid Estimation Method
3 Simulation Study
  Bias and MSE Comparisons
4 An Example
  An Example: Bilbao waves data
5 Final Conclusions

The Generalized Pareto Distribution

The Generalized Pareto Distribution (GPD) is a two-parameter family of distributions first introduced by Pickands (1975), with distribution function (cdf)

$$F_{\sigma,k}(x) = \begin{cases} 1 - (1 - kx/\sigma)^{1/k}, & \text{if } k \neq 0, \\ 1 - e^{-x/\sigma}, & \text{if } k = 0, \end{cases} \qquad (1)$$

and probability density function (pdf)

$$f_{\sigma,k}(x) = \begin{cases} \sigma^{-1} (1 - kx/\sigma)^{1/k - 1}, & \text{if } k \neq 0, \\ \sigma^{-1} e^{-x/\sigma}, & \text{if } k = 0, \end{cases} \qquad (2)$$

where σ > 0 and −∞ < k < ∞ are the scale and shape parameters, and the domain of x is (0, ∞) when k ≤ 0 or (0, σ/k) when k > 0. We denote this distribution by GPD(σ, k).
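As a minimal sketch of equations (1) and (2), the cdf and pdf can be coded directly in this parameterization. The function names `gpd_cdf` and `gpd_pdf` are illustrative choices, not part of the talk, and values outside the support are handled by simple clamping.

```python
import math

def gpd_cdf(x, sigma, k):
    """CDF of GPD(sigma, k) as in equation (1)."""
    if x <= 0:
        return 0.0
    if k == 0:
        return 1.0 - math.exp(-x / sigma)
    if k > 0 and x >= sigma / k:
        return 1.0  # at or beyond the upper endpoint sigma/k
    return 1.0 - (1.0 - k * x / sigma) ** (1.0 / k)

def gpd_pdf(x, sigma, k):
    """PDF of GPD(sigma, k) as in equation (2); zero outside the support."""
    if x <= 0:
        return 0.0
    if k == 0:
        return math.exp(-x / sigma) / sigma
    if k > 0 and x >= sigma / k:
        return 0.0
    return (1.0 - k * x / sigma) ** (1.0 / k - 1.0) / sigma

# k = 1 gives the uniform distribution on [0, sigma]
print(gpd_cdf(0.5, 1.0, 1.0), gpd_pdf(0.5, 1.0, 1.0))
```

Note that the sign convention for k here follows the slides, where k > 0 means a bounded support.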

The Generalized Pareto Distribution

The GPD is important because of its versatility and flexibility. Its special cases are as follows: when k = 1, the GPD becomes the uniform distribution on [0, σ]; when k = 0, the GPD becomes the exponential distribution with mean σ, obtained as the limit k → 0; when k < 0, the GPD reduces to the Pareto distribution (PD).

The mean of the GPD is σ/(1 + k) and its variance is σ²/[(1 + k)²(1 + 2k)], but the mean and variance exist only if k > −1 and k > −1/2, respectively. In general, the rth central moment of the GPD exists only if k > −1/r.
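The moment formulas and their existence conditions can be checked with a short sketch. The helper names are hypothetical, and returning `None` when a moment does not exist is a design choice made here for illustration.

```python
def gpd_mean(sigma, k):
    """Mean sigma / (1 + k); exists only if k > -1."""
    if k <= -1:
        return None
    return sigma / (1.0 + k)

def gpd_variance(sigma, k):
    """Variance sigma^2 / [(1 + k)^2 (1 + 2k)]; exists only if k > -1/2."""
    if k <= -0.5:
        return None
    return sigma ** 2 / ((1.0 + k) ** 2 * (1.0 + 2.0 * k))

# Sanity check against the uniform special case (k = 1) on [0, sigma]:
# mean sigma / 2 and variance sigma^2 / 12.
sigma = 2.0
print(gpd_mean(sigma, 1.0), gpd_variance(sigma, 1.0))
```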

The Generalized Pareto Distribution: Graphing the GPD

Figure 1 shows the density functions of the GPD with σ = 1 fixed.

[Figure 1: The density functions of the GPD with different k. Left panel: σ = 1 fixed and k > 0 (k = 0.1, 0.5, 0.75, 1, 1.25). Right panel: σ = 1 fixed and k ≤ 0 (k = 0, −0.5, −2). Axes: f(x) against x.]

Application: Peaks Over Thresholds (POT)

In extreme value theory, there are generally two methods for modeling extremes. The classical approach is based on the limiting distribution of the maxima or minima of a sequence of i.i.d. random variables, which turns out to be the generalized extreme value distribution (GEVD). The GPD, in contrast, was introduced to model the exceedances X_i − t over a high threshold, where {X_i} are the sample observations and t is a given threshold; examples are flood levels of rivers, heights of waves, etc.

An attractive and useful feature of the GPD in this application is its threshold stability. It may easily be shown that if X follows a GPD(σ, k), then for any level t the conditional distribution of X − t given X > t is GPD(σ − kt, k).
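The stability property can be verified numerically from the survival function 1 − F in (1): the conditional survival probability P(X − t > y | X > t) should match the survival function of GPD(σ − kt, k). The sketch below uses illustrative parameter values and hypothetical function names.

```python
import math

def gpd_sf(x, sigma, k):
    """Survival function 1 - F(x) of GPD(sigma, k), from equation (1)."""
    if k == 0:
        return math.exp(-x / sigma)
    return max(0.0, 1.0 - k * x / sigma) ** (1.0 / k)

def conditional_sf(y, t, sigma, k):
    """P(X - t > y | X > t) for X ~ GPD(sigma, k)."""
    return gpd_sf(t + y, sigma, k) / gpd_sf(t, sigma, k)

sigma, k, t = 2.0, -0.3, 1.5   # illustrative values only
for y in (0.1, 1.0, 5.0):
    lhs = conditional_sf(y, t, sigma, k)
    rhs = gpd_sf(y, sigma - k * t, k)   # GPD(sigma - k t, k)
    print(y, lhs, rhs)
```

The two printed columns agree up to floating-point rounding, which is the identity (1 − k(t + y)/σ)/(1 − kt/σ) = 1 − ky/(σ − kt) raised to the power 1/k.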

A Review of Literature

Given a random sample from the GPD, most of the existing estimation methods for the GPD parameters σ and k suffer from some theoretical or computational problems.

The maximum likelihood (ML) method, the most classical and important method of estimation in statistics, has been considered by DuMouchel (1983), Davison (1984), Smith (1984, 1985), Grimshaw (1993), Choulakian and Stephens (2001), and the references therein. We will present the ML method in more detail in the next section.

Hosking and Wallis (1987) and Dupuis and Tsao (1998) studied alternative estimation methods, including the method of moments (MOM) and the probability-weighted moment (PWM) method.

A Review of Literature

Castillo and Hadi (1997) proposed an elemental percentile method (EPM), which makes full use of the order statistics by first equating the GPD distribution function to all pairs of order statistics and then using the medians of the resulting values as the overall estimates of σ and k.

Luceño (2006) introduced the maximum goodness-of-fit estimation (MGFE) method, based on the family of empirical distribution function (EDF) statistics. In fact, this method dates back to Wolfowitz (1953, 1957) under the more general name of minimum distance estimation. We will carefully investigate the MGFE method in the next section and borrow some of its ideas to develop our new hybrid estimation method.

A Review of Literature

Zhang (2007) suggested the likelihood moment estimation (LME) method for the GPD to overcome the computational problems faced by the ML method.

Zhang and Stephens (2009) provided a new efficient estimation method based on the likelihood and the empirical Bayesian method (EBM). However, as indicated in their paper, this method is quite sensitive to the choice of the shape of the prior distribution.

To improve the poor performance of the EBM estimators in the heavy-tailed cases, Zhang (2010) introduced a modified EBM (EBM*) that uses a more reliable and adaptive prior. The main conclusion of that paper was that the EBM* generally outperforms the other existing estimation procedures in the range −6 < k < 1/2, in terms of estimation bias and efficiency.

The Maximum Likelihood Estimation: The Estimating Equations

Given a random sample X = (X_1, X_2, ..., X_n) from the GPD with the cdf given in (1), the log-likelihood function is

$$l(\sigma, k; X) = -n \log \sigma - \left(1 - \frac{1}{k}\right) \sum_{i=1}^{n} \log\left(1 - \frac{kX_i}{\sigma}\right).$$

To find the maximum of the log-likelihood over the parameter space

$$A = \{k < 0, \sigma > 0\} \cup \{k > 0, \sigma/k > X_{(n)}\},$$

take the first derivatives of the GPD log-likelihood with respect to k and σ and set them to zero, which gives the estimating equations

$$\begin{cases} n(k - 1) = -\sum_{i=1}^{n} \log\left(1 - \dfrac{kX_i}{\sigma}\right) + (k - 1) \sum_{i=1}^{n} \left(1 - \dfrac{kX_i}{\sigma}\right)^{-1}, \\[2ex] k = -n^{-1} \sum_{i=1}^{n} \log\left(1 - \dfrac{kX_i}{\sigma}\right). \end{cases}$$

The Maximum Likelihood Estimation: The Estimating Equations

As pointed out by Davison (1984), the above bivariate maximization can be reduced to a one-dimensional search, because the two estimating equations depend on σ only through the ratio θ = k/σ (with θ < 1/X_(n)), and given a value of θ a closed-form expression for k is available. So it is natural and convenient to reparameterize (σ, k) as (θ, k).

Starting from the log-likelihood function of (θ, k) and substituting $k = -n^{-1} \sum_{i=1}^{n} \log(1 - \theta X_i)$, we obtain the profile log-likelihood function of θ:

$$l(\theta; X) = -n - \sum_{i=1}^{n} \log(1 - \theta X_i) - n \log\left[-\frac{1}{n\theta} \sum_{i=1}^{n} \log(1 - \theta X_i)\right]. \qquad (3)$$
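A small sketch can confirm that the profile log-likelihood (3) agrees with the full log-likelihood evaluated at σ = k/θ and k = −n⁻¹ Σ log(1 − θX_i). The function names and the toy data are assumptions for illustration; the θ = 0 branch uses the exponential limit, which is an added convenience not discussed on the slide.

```python
import math

def loglik(sigma, k, x):
    """Full GPD log-likelihood l(sigma, k; X) for k != 0."""
    n = len(x)
    s = sum(math.log1p(-k * xi / sigma) for xi in x)
    return -n * math.log(sigma) - (1.0 - 1.0 / k) * s

def profile_loglik(theta, x):
    """Profile log-likelihood (3); requires theta < 1 / max(x)."""
    n = len(x)
    if abs(theta) < 1e-12:
        # exponential limit: k -> 0 and sigma -> sample mean
        return -n - n * math.log(sum(x) / n)
    s = sum(math.log1p(-theta * xi) for xi in x)   # equals -n k
    return -n - s - n * math.log(-s / (n * theta))

x = [0.5, 1.0, 2.0]        # toy data, for illustration only
theta = 0.2                # must satisfy theta < 1 / max(x) = 0.5
k = -sum(math.log1p(-theta * xi) for xi in x) / len(x)
sigma = k / theta
print(profile_loglik(theta, x), loglik(sigma, k, x))
```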

The Maximum Likelihood Estimation: Computing the MLE

Suppose a local maximum of (3) can be found numerically at $\hat{\theta}_{MLE}$ over the parameter space B = {θ < 1/X_(n)}. Then the MLEs of σ and k are given by

$$\hat{k}_{MLE} = -n^{-1} \sum_{i=1}^{n} \log(1 - \hat{\theta}_{MLE} X_i) \quad \text{and} \quad \hat{\sigma}_{MLE} = \hat{k}_{MLE} / \hat{\theta}_{MLE}. \qquad (4)$$

However, finding $\hat{\theta}_{MLE}$ numerically can be difficult: the first derivative of (3) may have more than one root, and convergence problems may occur as θ approaches its boundary, so the constraint θ < 1/X_(n) must be handled with care. An algorithm for computing the MLE of the GPD parameters was designed by Grimshaw (1993).
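The one-dimensional search can be sketched with a coarse grid followed by a golden-section refinement. This is not Grimshaw's (1993) algorithm; the grid size, the ad hoc lower search limit of −10/X_(n), and the small margin kept from the boundary 1/X_(n) are all assumptions made for this illustration, and the routine simply returns the best interior grid neighbourhood as a local maximum.

```python
import math
import random

def profile_loglik(theta, x):
    """Profile log-likelihood (3) of theta = k / sigma."""
    n = len(x)
    if abs(theta) < 1e-12:
        return -n - n * math.log(sum(x) / n)      # exponential limit
    s = sum(math.log1p(-theta * xi) for xi in x)
    return -n - s - n * math.log(-s / (n * theta))

def gpd_mle(x, grid_size=400):
    """Return (sigma_hat, k_hat) from a grid plus golden-section search."""
    xmax = max(x)
    upper = (1.0 - 1e-6) / xmax      # stay strictly inside theta < 1/X_(n)
    lower = -10.0 / xmax             # assumed, ad hoc lower search limit
    step = (upper - lower) / grid_size
    grid = [lower + j * step for j in range(grid_size + 1)]
    best = max(range(1, grid_size), key=lambda j: profile_loglik(grid[j], x))
    a, b = grid[best - 1], grid[best + 1]
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(60):              # golden-section maximization on [a, b]
        c, d = b - g * (b - a), a + g * (b - a)
        if profile_loglik(c, x) > profile_loglik(d, x):
            b = d
        else:
            a = c
    theta = 0.5 * (a + b)
    if abs(theta) < 1e-12:
        return sum(x) / len(x), 0.0
    k = -sum(math.log1p(-theta * xi) for xi in x) / len(x)
    return k / theta, k              # equation (4)

# Simulated sample from GPD(1, 0.2) by inverting the cdf in (1)
random.seed(1)
sample = [(1.0 - (1.0 - random.random()) ** 0.2) / 0.2 for _ in range(2000)]
print(gpd_mle(sample))
```

For small samples or k near or above 1/2 the profile likelihood can rise again close to the boundary, which is exactly the difficulty described above, so a sketch like this should not be relied on there.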

The Maximum Likelihood Estimation: Computing the MLE

When k < 1/2, Smith (1984) proved that, under proper regularity conditions, the ML estimators given in (4) are asymptotically normally distributed with asymptotic variances achieving the Cramér-Rao lower bound. Specifically, we have

$$\begin{bmatrix} \hat{\sigma}_{MLE} \\ \hat{k}_{MLE} \end{bmatrix} \sim N\left( \begin{bmatrix} \sigma \\ k \end{bmatrix},\; n^{-1} \begin{bmatrix} 2\sigma^2 (1 - k) & \sigma (1 - k) \\ \sigma (1 - k) & (1 - k)^2 \end{bmatrix} \right), \quad k < 1/2.$$

When k ≥ 1/2, Smith (1984) identified the problem as non-regular, since the regularity conditions fail to hold, and convergence problems may occur in this case.

When k > 1, the MLE does not exist, because the density tends to infinity as x approaches the endpoint σ/k, so the likelihood is unbounded.
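The asymptotic covariance matrix translates directly into approximate standard errors. The sketch below, with a hypothetical function name, reads the diagonal and off-diagonal entries off the matrix above and refuses to answer outside the regular case k < 1/2.

```python
import math

def mle_asymptotic_se(sigma, k, n):
    """Approximate (se_sigma, se_k, cov) of the MLE for k < 1/2."""
    if k >= 0.5:
        raise ValueError("asymptotic normality is stated only for k < 1/2")
    var_sigma = 2.0 * sigma ** 2 * (1.0 - k) / n
    var_k = (1.0 - k) ** 2 / n
    cov = sigma * (1.0 - k) / n
    return math.sqrt(var_sigma), math.sqrt(var_k), cov

# Exponential case (k = 0) with sigma = 1 and n = 100
print(mle_asymptotic_se(1.0, 0.0, 100))
```

In practice the unknown σ and k would be replaced by their estimates, which is an additional approximation.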

The Maximum Goodness-of-Fit Estimation: EDF Statistics

Given a random sample X = (X_1, X_2, ..., X_n) from a continuous distribution function F(x; θ), let F_n(x) denote the empirical distribution function (EDF), that is,

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{X_i}(x),$$

where $I_{X_i}(x) = 1$ if $X_i \le x$, and $I_{X_i}(x) = 0$ if $X_i > x$.

Any statistic that measures the discrepancy between F_n(x) and F(x; θ) is called an EDF statistic. Such statistics were originally used to test the goodness-of-fit (GOF) of a continuous probability distribution to sample data.
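The EDF definition amounts to counting the observations that do not exceed x. A minimal sketch, with an illustrative function name and toy data:

```python
def edf(sample, x):
    """Empirical distribution function F_n(x): share of X_i with X_i <= x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)

data = [3.0, 1.0, 2.0, 2.0]   # toy data, for illustration only
print(edf(data, 0.5), edf(data, 2.0), edf(data, 3.0))
```

Because F_n is a step function that jumps at each order statistic, integrals against it reduce to finite sums over the ordered sample.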

The Maximum Goodness-of-Fit Estimation: EDF Statistics

There are two main classes of EDF statistics: the supremum EDF statistics, which include the Kolmogorov-Smirnov (KS) statistic and the Kuiper statistic; and the integral EDF statistics, which include the Cramér-von Mises (CM) statistic and the Anderson-Darling (AD) statistic, among others.

In Luceño (2006), the idea of GOF was borrowed for the purpose of estimating the GPD parameters. The proposed maximum goodness-of-fit estimator (MGFE) is obtained by minimizing any of the EDF statistics with respect to the unknown parameters σ and k. We will focus only on the MGFE based on the AD statistic.

The Maximum Goodness-of-Fit Estimation: Computing the MGFE

For the GPD with cdf F(x; σ, k), the AD statistic A²(σ, k) is defined as

$$A^2(\sigma, k) = n \int_{-\infty}^{\infty} \{F_n(x) - F(x; \sigma, k)\}^2 \{F(x; \sigma, k)(1 - F(x; \sigma, k))\}^{-1} \, dF(x; \sigma, k).$$

For computational purposes, the AD statistic can be expressed in an alternative form, since F_n(x) is a step function with a jump at each order statistic. Applying the probability integral transformation to the ordered sample, we denote $z_i = F(x_{(i)}; \sigma, k)$, i = 1, ..., n. Then the AD statistic A²(σ, k) can be written as

$$A^2(\sigma, k) = -n - \frac{1}{n} \sum_{i=1}^{n} \{(2i - 1) \ln z_i + (2n + 1 - 2i) \ln(1 - z_i)\}. \qquad (5)$$
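The computational form (5) is straightforward to code. The following sketch assumes that (σ, k) lies in the parameter space A, so that every z_i is strictly between 0 and 1; the function names are illustrative.

```python
import math

def gpd_cdf(x, sigma, k):
    """CDF of GPD(sigma, k) from equation (1), for x inside the support."""
    if k == 0:
        return 1.0 - math.exp(-x / sigma)
    return 1.0 - (1.0 - k * x / sigma) ** (1.0 / k)

def anderson_darling(x, sigma, k):
    """AD statistic A^2(sigma, k) in the computational form (5)."""
    xs = sorted(x)
    n = len(xs)
    total = 0.0
    for i, xi in enumerate(xs, start=1):
        z = gpd_cdf(xi, sigma, k)           # z_i = F(x_(i); sigma, k)
        total += (2 * i - 1) * math.log(z) + (2 * n + 1 - 2 * i) * math.log1p(-z)
    return -n - total / n

# One observation at the median of Exp(1): A^2 = 2 ln 2 - 1
print(anderson_darling([math.log(2.0)], 1.0, 0.0))
```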

The Maximum Goodness-of-Fit Estimation: Computing the MGFE

The final estimates $\hat{\sigma}_{MGFE}$ and $\hat{k}_{MGFE}$ of the GPD parameters are obtained by minimizing the AD statistic A²(σ, k; X) given in (5) with respect to the unknown parameters σ and k. The minimization should be performed carefully over the parameter space A = {k < 0, σ > 0} ∪ {k > 0, σ/k > X_(n)}.

In general, the MGFE technique has been shown to handle GPD parameter estimation when the MLE and other methods fail, even in the context of generalized linear models. However, the two-dimensional numerical optimization can be complex and relatively time-consuming, and a well-specified starting point (σ^(0), k^(0)) is useful.

A New Hybrid Estimation Method: Motivation

As we have discussed, the MLE has high large-sample efficiency whenever it exists, which is only in a restricted part of the parameter space, while the MGFE has small bias and can always be found provided the initial point is well chosen.

To take advantage of both the MGFE and the MLE, we propose a new hybrid estimation method, which relies primarily on the MGFE to maintain the small bias and then improves the efficiency by incorporating the useful maximum likelihood information. At the same time, the computational effort is greatly reduced.

A New Hybrid Estimation Method: Computing the New Hybrid Estimates

Under the reparameterization θ = k/σ for the GPD, the MLEs of k and θ must satisfy $k = -n^{-1} \sum_{i=1}^{n} \log(1 - \theta X_i)$.

For the MGFE based on the AD statistic A²(σ, k; X), we can consider the reparameterized version and substitute the above maximum likelihood relationship into it to obtain a simplified univariate minimization problem. Specifically, we minimize a target function G, with the maximum likelihood relationship acting as a constraint:

$$\min_{\theta \in B} G(\theta; X) = \min_{(\sigma, k) \in A} A^2(\sigma, k; X) \quad \text{subject to} \quad \theta = k/\sigma, \;\; k = -n^{-1} \sum_{i=1}^{n} \log(1 - \theta X_i).$$

A New Hybrid Estimation Method: Computing the New Hybrid Estimates

The target function G based on the AD statistic can be written in a simple computational form, where X_(1) ≤ ... ≤ X_(n) denote the ordered sample as in (5):

$$G(\theta; X) = -n - \frac{1}{n} \sum_{i=1}^{n} \left\{ (2i - 1) \log\left[1 - (1 - \theta X_{(i)})^{-n / \sum_{j} \log(1 - \theta X_j)}\right] - n (2n + 1 - 2i) \frac{\log(1 - \theta X_{(i)})}{\sum_{j} \log(1 - \theta X_j)} \right\}. \qquad (6)$$

In POT applications the sample size is usually small. To reduce the bias in such cases, our extensive simulations suggest an effective adjustment to G(θ; X): replace the leading factor n of the last term by (n − 0.5). The adjustment becomes negligible as n gets larger.
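As a sketch of the target function in (6), the code below works on the ordered sample and uses 1/k = −n / Σ_j log(1 − θX_j). The flag `adjust` is a hypothetical switch for the small-sample replacement of the leading n by (n − 0.5); the sketch assumes strictly positive observations and θ ≠ 0.

```python
import math

def hybrid_target(theta, x, adjust=True):
    """Target G(theta; X) of equation (6); needs theta < 1/max(x), theta != 0."""
    xs = sorted(x)
    n = len(xs)
    logs = [math.log1p(-theta * xi) for xi in xs]
    s = sum(logs)                           # sum_j log(1 - theta X_j) = -n k
    lead = n - 0.5 if adjust else n         # small-sample adjustment
    total = 0.0
    for i, li in enumerate(logs, start=1):
        log_surv = -n * li / s              # log(1 - z_i), always negative
        log_z = math.log(-math.expm1(log_surv))
        total += (2 * i - 1) * log_z - lead * (2 * n + 1 - 2 * i) * li / s
    return -n - total / n

x = [0.3, 1.2, 0.7, 2.5]                    # toy data, for illustration only
print(hybrid_target(0.1, x), hybrid_target(0.1, x, adjust=False))
```

With `adjust=False` the value coincides with the AD statistic (5) evaluated at σ = k/θ and k = −n⁻¹ Σ log(1 − θX_i), which gives a convenient consistency check.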

A New Hybrid Estimation Method: Computing the New Hybrid Estimates

Our new hybrid estimator $\hat{\theta}_{NEW}$ of θ is defined as the value of θ at which G(θ; X) is minimized subject to the boundary condition θ < 1/X_(n). Finally, the new hybrid estimators $\hat{\sigma}_{NEW}$ and $\hat{k}_{NEW}$ can be calculated as

$$\hat{k}_{NEW} = -n^{-1} \sum_{i=1}^{n} \log(1 - \hat{\theta}_{NEW} X_i) \quad \text{and} \quad \hat{\sigma}_{NEW} = \hat{k}_{NEW} / \hat{\theta}_{NEW}. \qquad (7)$$

It is easy to see that the new hybrid estimators $\hat{k}_{NEW}$ and $\hat{\sigma}_{NEW}$ always give valid estimates, since any θ < 1/X_(n) yields a pair (σ, k) inside the parameter space A.
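The univariate minimization and equation (7) can be sketched as follows. The talk only says that standard algorithms can be used, so the grid plus golden-section search here, the grid size, the lower search limit of −10/X_(n), and the margin kept from the boundary are assumptions for illustration, not the author's implementation.

```python
import math
import random

def hybrid_target(theta, x_sorted):
    """Adjusted target G(theta; X) of (6) on an already sorted sample."""
    if abs(theta) < 1e-10:
        theta = 1e-10                       # avoid 0/0 at theta = 0
    n = len(x_sorted)
    logs = [math.log1p(-theta * xi) for xi in x_sorted]
    s = sum(logs)
    total = 0.0
    for i, li in enumerate(logs, start=1):
        log_surv = -n * li / s
        total += ((2 * i - 1) * math.log(-math.expm1(log_surv))
                  - (n - 0.5) * (2 * n + 1 - 2 * i) * li / s)
    return -n - total / n

def gpd_hybrid(x, grid_size=400):
    """Return (sigma_hat, k_hat) by minimizing G over theta < 1/X_(n)."""
    xs = sorted(x)
    upper = (1.0 - 1e-6) / xs[-1]
    lower = -10.0 / xs[-1]                  # assumed, ad hoc lower search limit
    step = (upper - lower) / grid_size
    grid = [lower + j * step for j in range(grid_size + 1)]
    best = min(range(1, grid_size), key=lambda j: hybrid_target(grid[j], xs))
    a, b = grid[best - 1], grid[best + 1]
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(60):                     # golden-section minimization
        c, d = b - g * (b - a), a + g * (b - a)
        if hybrid_target(c, xs) < hybrid_target(d, xs):
            b = d
        else:
            a = c
    theta = 0.5 * (a + b)
    if abs(theta) < 1e-10:
        theta = 1e-10
    k = -sum(math.log1p(-theta * xi) for xi in xs) / len(xs)
    return k / theta, k                     # equation (7)

# Simulated sample from GPD(1, 0.2) by inverting the cdf in (1)
random.seed(2)
sample = [(1.0 - (1.0 - random.random()) ** 0.2) / 0.2 for _ in range(500)]
print(gpd_hybrid(sample))
```

Whatever θ the search returns, it satisfies θ < 1/X_(n), so the resulting pair (σ, k) respects the endpoint constraint by construction.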

A New Hybrid Estimation Method: Inference

Because the new hybrid method combines the maximum goodness-of-fit and the maximum likelihood methods, it is not easy to derive the asymptotic variances of the new estimators.

Fortunately, the bootstrap resampling method introduced by Efron (1977) provides an alternative way to approximate the distributions of the new hybrid estimators, and the standard errors of the new estimators can be calculated from the bootstrap samples.

The use of the bootstrap to find standard errors for other estimators of the GPD parameters has already been suggested by many authors. One reason for preferring the bootstrap is that the resulting confidence intervals for the parameters always make sense, because they satisfy the endpoint constraints.
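A parametric bootstrap for standard errors can be sketched generically: simulate B samples from the fitted GPD, re-estimate on each, and take the standard deviation of the replicates. To keep the block short, a method-of-moments estimator is used as a stand-in for the hybrid estimator; that substitution, the function names, and the choice B = 200 are assumptions for illustration.

```python
import math
import random
import statistics

def rgpd(n, sigma, k, rng):
    """Draw n values from GPD(sigma, k) by inverting the cdf in (1)."""
    out = []
    for _ in range(n):
        u = 1.0 - rng.random()              # uniform on (0, 1]
        out.append(-sigma * math.log(u) if k == 0 else sigma * (1.0 - u ** k) / k)
    return out

def mom_estimator(x):
    """Method-of-moments stand-in (not the hybrid estimator)."""
    m = statistics.fmean(x)
    v = statistics.variance(x)
    k = 0.5 * (m * m / v - 1.0)             # from mean^2 / variance = 1 + 2k
    return m * (1.0 + k), k                 # (sigma_hat, k_hat)

def bootstrap_se(estimator, sigma_hat, k_hat, n, B=1000, seed=0):
    """Parametric bootstrap standard errors of (sigma_hat, k_hat)."""
    rng = random.Random(seed)
    draws = [estimator(rgpd(n, sigma_hat, k_hat, rng)) for _ in range(B)]
    se_sigma = statistics.stdev([d[0] for d in draws])
    se_k = statistics.stdev([d[1] for d in draws])
    return se_sigma, se_k

print(bootstrap_se(mom_estimator, 1.626, 0.620, n=154, B=200, seed=1))
```

Any estimator with the same `(sample) -> (sigma_hat, k_hat)` signature can be passed in place of `mom_estimator`; percentile confidence intervals could be read off the sorted replicates in the same way.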

Bias and MSE Comparisons: Finite Sample Simulation

We include only the classical MLE, the MGFE based on the AD statistic, and the improved EBM* in the finite-sample comparisons.

The range of k considered is −6 < k < 2, which covers all the ranges used previously in the literature, including the commonly used range −1 < k < 1/2, the non-regular range k > 1/2 where the MLE has trouble, and the range k < −1/2 where the GPD has infinite variance.

It is already known that the MLE has severe problems when k > 1/2. To deal with this unusual behavior of the MLE as k approaches 1/2 in the simulation, we employ the quasi-maximum likelihood (QML) method used in Luceño (2006), which replaces the MLE $(\hat{\sigma}_{MLE}, \hat{k}_{MLE})$ by

$$k_{QML} = -(n - 1)^{-1} \sum_{i=1}^{n-1} \log\left(1 - \frac{X_{(i)}}{X_{(n)}}\right) \quad \text{and} \quad \sigma_{QML} = k_{QML} X_{(n)}.$$
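The QML fallback is a closed-form expression in the order statistics and can be sketched in a few lines. The function name is illustrative, and the sketch assumes the sample maximum is unique so that every logarithm is finite.

```python
import math

def gpd_qml(x):
    """Quasi-maximum likelihood estimates (sigma_QML, k_QML)."""
    xs = sorted(x)
    n = len(xs)
    xmax = xs[-1]                           # X_(n), assumed to be unique
    k = -sum(math.log(1.0 - xi / xmax) for xi in xs[:-1]) / (n - 1)
    return k * xmax, k

sigma_q, k_q = gpd_qml([1.0, 2.0, 3.0, 4.0])   # toy data
print(sigma_q, k_q, sigma_q / k_q)             # the ratio equals X_(n)
```

The fitted upper endpoint σ_QML / k_QML coincides with the largest observation, which corresponds to setting θ = 1/X_(n).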

Bias and MSE Comparisons: Bias Comparison

Without loss of generality, the scale parameter σ is taken to be 1, because the estimates for the GPD are invariant with respect to the value of σ.

Bias is a widely accepted criterion for measuring the accuracy of an estimator. The estimation biases are calculated for the finite sample size n = 50 based on 10,000 random samples. The biases of the different estimators of σ and k are plotted against k in Figure 2.

We see that our new hybrid estimators significantly reduce the estimation biases for σ and k, especially when compared with the MGFE and the MLE, which supply the original ideas behind the method.
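A Monte Carlo bias computation of this kind can be sketched as below. The method-of-moments estimator is again only a stand-in for the estimators compared in the talk, and the number of replications is reduced for speed; both are assumptions of this illustration.

```python
import math
import random
import statistics

def rgpd(n, sigma, k, rng):
    """Draw n values from GPD(sigma, k) by inverting the cdf in (1)."""
    out = []
    for _ in range(n):
        u = 1.0 - rng.random()              # uniform on (0, 1]
        out.append(-sigma * math.log(u) if k == 0 else sigma * (1.0 - u ** k) / k)
    return out

def mom_estimator(x):
    """Method-of-moments stand-in returning (sigma_hat, k_hat)."""
    m = statistics.fmean(x)
    v = statistics.variance(x)
    k = 0.5 * (m * m / v - 1.0)
    return m * (1.0 + k), k

def mc_bias(estimator, sigma, k, n=50, reps=1000, seed=0):
    """Monte Carlo estimate of (bias of sigma_hat, bias of k_hat)."""
    rng = random.Random(seed)
    est = [estimator(rgpd(n, sigma, k, rng)) for _ in range(reps)]
    bias_sigma = statistics.fmean([e[0] for e in est]) - sigma
    bias_k = statistics.fmean([e[1] for e in est]) - k
    return bias_sigma, bias_k

print(mc_bias(mom_estimator, 1.0, 0.2, seed=3))
print(mc_bias(mom_estimator, 2.0, 0.2, seed=3))   # same seed, rescaled data
```

With the same seed, doubling σ simply doubles every simulated observation, so the bias of the shape estimate is unchanged and the bias of the scale estimate doubles, which illustrates why fixing σ = 1 loses no generality for a scale-equivariant estimator.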

[Figure 2: Bias of the estimators of the scale σ (left panel, "Bias for scale, n=50") and of the shape k (right panel, "Bias for shape, n=50"), plotted against k, for NEW(AD), MGFE(AD), MLE and EBM*.]

Bias and MSE Comparisons: MSE Comparison

Mean square error (MSE) is a widely accepted criterion for measuring the overall quality of an estimator. The estimation MSEs are calculated for the finite sample size n = 50 based on 10,000 random samples. The MSEs of the different estimators of σ and k are plotted against k in Figure 3.

From the figure, we see that our new hybrid estimators always have comparable MSEs, and that they improve over the MLE for estimating the scale σ and over the MGFE for estimating the shape k.

[Figure 3: MSE of the estimators of the scale σ (left panel, "MSE for scale, n=50") and of the shape k (right panel, "MSE for shape, n=50"), plotted against k, for NEW(AD), MGFE(AD), MLE and EBM*.]

An Example: Bilbao waves data

To illustrate the advantages of the new hybrid estimation procedure, we present a real-world example originally analyzed in Castillo and Hadi (1997), which consists of the zero-crossing hourly mean periods (in seconds) of the sea waves measured in the Bilbao bay, Spain. This data set was later revisited in Luceño (2006) and in Zhang and Stephens (2009).

Only the 197 observations with periods above 7 seconds were taken into consideration. Following the above-mentioned authors, we model these data by the GPD using the threshold t = 7.5.

The table below gives the estimated GPD parameters for the Bilbao waves data using the different estimators.

              Estimates of σ                     Estimates of k
t     m       MLE    EBM*   MGFE   Hybrid       MLE    EBM*   MGFE   Hybrid
7.5   154     1.860  1.722  1.632  1.626        0.768  0.686  0.614  0.620

An Example: Bilbao waves data

To check graphically whether the minimum of the target function G defined in (6) is reached at $\hat{\theta}_{NEW} = \hat{k}_{NEW} / \hat{\sigma}_{NEW} = 0.3812$, G(θ; X) and its first derivative are plotted for the Bilbao waves data at t = 7.5. The boundary condition for this data set is θ < 1/X_(n) = 1/2.4 = 0.4167.

[Figure: Plot of G(θ) (left panel) and of its first derivative dG/dθ (right panel) against θ for the Bilbao waves data.]

An Example: Bilbao waves data

The following figure shows the histograms of B = 1000 parametric bootstrap samples of $\hat{\sigma}_{NEW}$ and $\hat{k}_{NEW}$ for the Bilbao waves data.

The parametric bootstrap standard errors for the hybrid estimates are $se(\hat{\sigma}_{NEW}) = 0.167$ and $se(\hat{k}_{NEW}) = 0.090$, and the corresponding 95% bootstrap confidence intervals for σ and k are (1.288, 1.949) and (0.413, 0.771).

[Figure: Histograms of the 1000 parametric bootstrap samples of σ (left panel) and of k (right panel), with frequency on the vertical axis.]

Final Comments

A new hybrid estimating procedure has been introduced for the GPD parameters, and it has several advantages.

First, the new hybrid estimates are easily obtained by optimizing a function of a single parameter using standard algorithms, and the existence and feasibility of the hybrid estimates can even be verified graphically.

Second, unlike some other existing methods, the new hybrid estimates can always be found over the entire parameter space.

Third, the standard errors and confidence intervals can be easily calculated by the bootstrap method.

Finally, the simulation study of bias and MSE showed that the proposed hybrid estimators greatly improve over the MLE and the MGFE, and compare well with the other existing methods.

Acknowledgements

Thank you!