An Introduction to Statistical Extreme Value Theory


Uli Schneider, Geophysical Statistics Project, NCAR. January 26, 2004.

Outline. Part I: two basic approaches to extreme value theory (block maxima, threshold models). Part II: uncertainty, dependence, seasonality, trends.

Fundamentals. In classical statistics: model the AVERAGE behavior of a process. In extreme value theory: model the EXTREME behavior (the tail of a distribution). Usually deal with very small data sets!

Different Approaches. Block maxima (GEV); r-th order statistics; threshold approach (GPD); point processes.

Block Maxima Approach. Model extreme daily rainfall in Boulder. Take the block maximum, here the maximum daily precipitation for each year: $M_n = \max\{X_1, \ldots, X_{365}\}$. This gives 54 annual records (data points for $M_n$). [Figure: Annual maximum of daily rainfall for Boulder (1948-2001); maximum daily precipitation in 1/100 in against year.]
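Extracting block maxima can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the slides: the function name `block_maxima` and the toy records are hypothetical, and the values are not the Boulder data.

```python
def block_maxima(records):
    """Return {block: maximum value} for an iterable of (block, value) pairs.

    Here a block is a year and a value is a daily rainfall total.
    """
    maxima = {}
    for block, value in records:
        if block not in maxima or value > maxima[block]:
            maxima[block] = value
    return maxima


# Hypothetical daily totals in 1/100 in (illustrative only).
daily = [(1948, 12), (1948, 140), (1948, 0), (1949, 95), (1949, 210), (1950, 33)]
annual_max = block_maxima(daily)
print(annual_max)
```

Each year contributes exactly one data point, which is why 54 years of daily data reduce to 54 observations.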

Block Maxima Approach. The distribution of $M_n = \max\{X_1, \ldots, X_n\}$ converges (as $n \to \infty$) to $G(x) = \exp\left\{-\left[1 + \xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}$. $G(x)$ is called the Generalized Extreme Value (GEV) distribution and has 3 parameters: shape parameter $\xi$, location parameter $\mu$, and scale parameter $\sigma$.
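The GEV distribution function can be evaluated directly from this formula. The sketch below is one possible implementation under stated assumptions: the name `gev_cdf` is hypothetical, the case $\xi = 0$ is handled through the Gumbel limit $\exp\{-\exp[-(x-\mu)/\sigma]\}$, and points outside the support are mapped to 0 or 1.

```python
import math


def gev_cdf(x, xi, mu, sigma):
    """GEV distribution function G(x) with shape xi, location mu, scale sigma > 0."""
    s = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel limit of the GEV as xi -> 0.
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside the support: below the lower endpoint when xi > 0,
        # above the upper endpoint when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


# At x = mu the bracket equals 1, so G(mu) = exp(-1) for any shape.
p_at_mu = gev_cdf(133.85, 0.09, 133.85, 50.16)
```

The parameter values in the last line are only there to exercise the function.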

Fitting a GEV: Estimating Parameters. Use the 54 annual records to fit the GEV distribution. Estimate the 3 parameters $\xi$, $\mu$ and $\sigma$ by maximum likelihood (MLE) using statistical software (R). Get a GEV distribution with $\xi = 0.09$, $\mu = 133.85$, and $\sigma = 50.16$. [Figure: fitted GEV density.]

Fitting a GEV: Return Levels. Often of interest: the return level $z_m$, defined by $P(M > z_m) = 1/m$. Expect one in every $m$ observations, on average, to exceed the level $z_m$. Or: in any given block, there is a probability of $1/m$ of exceeding the level $z_m$. Can be computed easily once the parameters are known. E.g. for $m = 100$, $z_{100} = 420$, i.e. expect the annual maximum of daily rainfall in Boulder to exceed 4.2 inches once every 100 years on average.
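Inverting $G(z_m) = 1 - 1/m$ gives $z_m = \mu - \frac{\sigma}{\xi}\left[1 - \{-\log(1 - 1/m)\}^{-\xi}\right]$. A minimal sketch of that computation follows. The function name `gev_return_level` is hypothetical, and the inputs are the Boulder estimates $\xi = 0.09$, location $\mu = 133.85$ and scale $\sigma = 50.16$, with location and scale as they are reported alongside their confidence intervals.

```python
import math


def gev_return_level(m, xi, mu, sigma):
    """Level z_m with P(M > z_m) = 1/m under a GEV(xi, mu, sigma) model."""
    if m <= 1:
        raise ValueError("m must be greater than 1")
    y = -math.log(1.0 - 1.0 / m)
    if abs(xi) < 1e-12:
        # Gumbel limit as xi -> 0.
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


z100 = gev_return_level(100, xi=0.09, mu=133.85, sigma=50.16)
print(round(z100))  # about 420 (1/100 in), i.e. roughly 4.2 inches
```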

Fitting a GEV: Return Levels. [Figure: Return levels for Boulder; m-year return level against m (years), for m from 0 to 100.]

Fitting a GEV: Assumptions. We did not need to know the underlying distribution of each $X_i$, i.e. of the daily total rainfall. Underlying assumption: the observations are iid (independent and identically distributed).

Threshold Models. Model exceedances over a high threshold $u$: $X - u \mid X > u$. Example: daily total rainfall for Boulder exceeding 80 (1/100 in). This makes more efficient use of the data than block maxima. [Figure: Daily total rainfall for Boulder (1948-2001), in 1/100 in against year, shown alongside the annual maxima of daily rainfall.]

Threshold Models. The distribution of $Y := X - u \mid X > u$ converges (as $u \to \infty$) to $H(y) = 1 - \left(1 + \xi \frac{y}{\tilde{\sigma}}\right)^{-1/\xi}$. $H(y)$ is called the Generalized Pareto distribution (GPD) and has 2 parameters: shape parameter $\xi$ and scale parameter $\tilde{\sigma}$ (written $\sigma$ on the following slides). The shape parameter $\xi$ is the same parameter as in the GEV distribution.
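Both steps, forming the excesses $X - u$ for $X > u$ and evaluating $H(y)$, are easy to write down. The sketch below is illustrative: the names `excesses` and `gpd_cdf` are hypothetical, the data are toy values rather than the Boulder record, and $\xi = 0$ is handled through the exponential limit $1 - \exp(-y/\sigma)$.

```python
import math


def excesses(data, u):
    """Amounts by which observations exceed the threshold u."""
    return [x - u for x in data if x > u]


def gpd_cdf(y, xi, sigma):
    """Generalized Pareto distribution function H(y) for an excess y, scale sigma > 0."""
    if y <= 0.0:
        return 0.0
    if abs(xi) < 1e-12:
        # Exponential limit as xi -> 0.
        return 1.0 - math.exp(-y / sigma)
    t = 1.0 + xi * y / sigma
    if t <= 0.0:
        # Only possible for xi < 0: y lies above the upper endpoint -sigma/xi.
        return 1.0
    return 1.0 - t ** (-1.0 / xi)


# Toy daily totals in 1/100 in, threshold u = 80.
toy_excesses = excesses([10, 95, 80, 130], 80)
```

Note that an observation equal to the threshold is not counted as an exceedance here; that convention is a choice made for this sketch.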

Fitting a GPD: Estimating Parameters. Use the 184 exceedances over the threshold $u = 80$ to fit the GPD. Estimate the 2 parameters $\xi$ and $\sigma$ by maximum likelihood using statistical software (R). Get a GPD with $\xi = 0.22$ and $\sigma = 51.46$. [Figure: fitted GPD density.]
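Maximum likelihood chooses the pair $(\xi, \sigma)$ that minimizes the negative log-likelihood of the excesses, $n \log\sigma + (1 + 1/\xi)\sum_i \log(1 + \xi y_i/\sigma)$. The sketch below only evaluates that objective; it is not the fitting routine used for the slides (those were fitted in R), the name `gpd_nll` is hypothetical, and the sample excesses are made up. In practice the function would be handed to a numerical optimizer.

```python
import math


def gpd_nll(ys, xi, sigma):
    """Negative log-likelihood of excesses ys under a GPD(xi, sigma)."""
    if sigma <= 0.0:
        return math.inf
    n = len(ys)
    if abs(xi) < 1e-12:
        # Exponential limit as xi -> 0.
        return n * math.log(sigma) + sum(ys) / sigma
    total = 0.0
    for y in ys:
        t = xi * y / sigma
        if 1.0 + t <= 0.0:
            # Parameter pair is incompatible with this observation.
            return math.inf
        total += math.log1p(t)
    return n * math.log(sigma) + (1.0 + 1.0 / xi) * total


# Made-up excesses: with xi fixed at 0 the minimum is at sigma = sample mean.
ys = [1.0, 2.0, 3.0, 4.0]
best = gpd_nll(ys, 0.0, 2.5)
```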

Fitting a GPD: Choosing a Threshold. Diagnostics: is the mean excess function linear? [Figure: mean excess against threshold u.]
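The empirical mean excess at a threshold $u$ is the average of $x - u$ over all observations with $x > u$. For a GPD with $\xi < 1$ the theoretical mean excess is linear in $u$, which is what the plot checks. A minimal sketch, with the hypothetical name `mean_excess` and toy data:

```python
def mean_excess(data, u):
    """Average excess over the threshold u, or None if nothing exceeds u."""
    exc = [x - u for x in data if x > u]
    if not exc:
        return None
    return sum(exc) / len(exc)


# Toy data; evaluate the mean excess on a small grid of thresholds.
data = [1, 2, 3, 4, 5]
curve = [(u, mean_excess(data, u)) for u in (0, 2, 3, 5)]
print(curve)
```

Plotting such pairs against $u$ and looking for the point beyond which the curve is roughly straight is the diagnostic described on this slide.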

Fitting a GPD: Choosing a Threshold. Diagnostics: are the shape and modified scale estimates constant above the threshold? [Figure: modified scale and shape estimates against threshold.]

Fitting a GPD: Choosing a Threshold. Alternatively, choose the threshold $u$ so that a certain percentage of the data lies above it (robust and automatic, but is the approximation valid?).

Fitting a GPD: Return Levels. Compute the 100-year return level for daily rainfall totals using the threshold approach: $z_{36500} = 429$, i.e. expect the daily total to exceed 4.29 inches once every 100 years (36500 days) on average. [Figure: Return levels for Boulder; m-year return level against m (years).]
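Under the threshold model, the level exceeded on average once every $m$ observations is commonly computed as $z_m = u + \frac{\sigma}{\xi}\left[(m\,\zeta_u)^{\xi} - 1\right]$, where $\zeta_u = P(X > u)$ is estimated by the fraction of observations above $u$. The sketch below implements that formula. The function name `gpd_return_level` and the shape and scale values are illustrative assumptions, not the Boulder fit; only the counts (184 exceedances in 54 years of daily data, taken here as 365 days per year) come from the slides.

```python
import math


def gpd_return_level(m, u, xi, sigma, zeta_u):
    """Level exceeded on average once every m observations (requires m * zeta_u > 1)."""
    if m * zeta_u <= 1.0:
        raise ValueError("m * zeta_u must exceed 1 for the level to lie above u")
    if abs(xi) < 1e-12:
        # Exponential limit as xi -> 0.
        return u + sigma * math.log(m * zeta_u)
    return u + (sigma / xi) * ((m * zeta_u) ** xi - 1.0)


# Exceedance rate from the counts quoted on the slides (assumes 365 days per year).
zeta_u = 184 / (54 * 365)
# Illustrative shape and scale, chosen only to exercise the function.
z = gpd_return_level(36500, u=80.0, xi=0.1, sigma=50.0, zeta_u=zeta_u)
```

A quick sanity check is that the modeled exceedance probability at the returned level, $\zeta_u \, [1 + \xi (z - u)/\sigma]^{-1/\xi}$, equals $1/m$.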

Uncertainty (GEV). Essentially, the maximum likelihood approach yields standard errors for the estimates and therefore confidence bounds on the parameters. From the GEV (block maxima) fit for the yearly maximum of daily precipitation for Boulder: $\xi = 0.09$, 95% confidence interval (-0.10, 0.28); $\sigma = 50.16$, 95% confidence interval (38.77, 61.54); $\mu = 133.85$, 95% confidence interval (118.58, 149.12).
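One common way to turn a standard error into confidence bounds is the normal-approximation (Wald) interval, estimate $\pm\, z_{0.975} \times$ standard error. The slides do not say how their intervals were computed, so treating them as Wald intervals is an assumption here; the function names are hypothetical. The sketch backs out the standard error implied by the interval for $\sigma$ and rebuilds the interval from it.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, level=0.95):
    """Symmetric normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error


def implied_std_error(lower, upper, level=0.95):
    """Standard error implied by a symmetric interval (lower, upper)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (upper - lower) / (2.0 * z)


# Scale parameter of the Boulder GEV fit: 50.16 with interval (38.77, 61.54).
se_sigma = implied_std_error(38.77, 61.54)
lo, hi = wald_interval(50.16, se_sigma)
```

The reported estimates sit at the midpoints of their intervals, which is consistent with, though not proof of, this construction.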

Uncertainty (GEV). The standard errors from the maximum likelihood fit can be propagated to the return levels. [Figure: return level against return period, with the return period on a log scale from 0.1 to 1000.]

Uncertainty (GPD). More data means less uncertainty. From the GPD (threshold model) fit for daily precipitation in Boulder: $\xi = 0.22$, 95% confidence interval (-0.12, 0.16); $\sigma = 51.46$, 95% confidence interval (40.70, 62.21).

Uncertainty (GPD). [Figure: return level against return period in years (log scale from 0.1 to 1000), from the GPD (threshold model) fit for daily precipitation in Boulder.]

Dependence: Declustering. For the GEV and GPD approximations to be valid, we assume that the data are independent. If the data are dependent, we can use declustering to make them approximately independent. E.g. keep only one point (the maximum) from each cluster of values that exceed the threshold.
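One simple version of this idea is runs declustering: a cluster is a stretch of exceedances that ends once `r` consecutive values fall at or below the threshold, and only the cluster maximum is kept. The sketch below is an illustration under that assumption; the slides do not specify a clustering rule, and the name `decluster`, the run length `r`, and the series are hypothetical.

```python
def decluster(series, u, r=1):
    """Return the maximum of each cluster of exceedances of u.

    A cluster ends after r consecutive values at or below u.
    """
    peaks = []
    current_max = None  # maximum of the cluster currently open, if any
    gap = 0             # consecutive non-exceedances since the last exceedance
    for x in series:
        if x > u:
            current_max = x if current_max is None else max(current_max, x)
            gap = 0
        elif current_max is not None:
            gap += 1
            if gap >= r:
                peaks.append(current_max)
                current_max = None
    if current_max is not None:
        peaks.append(current_max)  # close a cluster still open at the end
    return peaks


series = [1, 9, 12, 2, 1, 8, 1, 15, 11, 0]
print(decluster(series, u=5, r=2))  # [12, 15]
print(decluster(series, u=5, r=1))  # [12, 8, 15]
```

A larger `r` merges nearby exceedances into the same cluster, trading sample size for independence.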

Dependence: Declustering. Assume we want to make inference about hourly precipitation in Boulder. To decluster, instead of using all 24 hourly values for each day, we keep only the daily maximum of the 1-hour records and fit the GPD model to these. [Figure: daily maximum of 1-hour precipitation against time, 1950-2000.]

Dependence (fitting the GPD). Choosing a threshold, with the mean excess function as a diagnostic. [Figure: mean excess against threshold u.]

Dependence (fitting the GPD). [Figure: modified scale and shape estimates against threshold.]

Dependence (fitting the GPD). According to the diagnostics, $u = 75$ seems to be a good threshold, but it leaves only 28 data points above the threshold. Use $u = 35$ instead (with 108 data points above the threshold) to get the following estimates: $\xi = 0.05$, 95% confidence interval (-0.27, 0.15); $\sigma = 27.98$, 95% confidence interval (19.94, 36.02). The 100-year return level is $z_m = 185$, i.e. expect the hourly rainfall to exceed 1.85 inches once every 100 years on average (the 10-year level is 1.36 inches).

Dependence (fitting the GPD). Use $u = 35$ (with 108 data points above the threshold) to fit a GPD model. [Figure: Return levels for Boulder; m-year (hourly) return level against m (years).]

Seasonality. [Figure: daily maximum of 1-hour precipitation against fraction of the year.]

Seasonality. To incorporate seasonality, link the scale parameter to covariates that describe the seasonal cycle. Use the covariates $X_1(t) = \sin(2\pi f(t))$ and $X_2(t) = \cos(2\pi f(t))$, where $f(t)$ is the fraction of the year for each day $t$. Use an exponential link function to link the covariates to the scale parameter: $\sigma(t) = \exp(\beta_0 + \beta_1 X_1(t) + \beta_2 X_2(t))$. Then fit a GPD with this time-varying scale: $\mathrm{GPD}(\xi, \sigma(t))$. [Figure: the two covariates against fraction of the year.]
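The seasonal scale is straightforward to evaluate once the coefficients are known. The sketch below only computes $\sigma(t)$ from a fraction of the year; the name `seasonal_scale` and the coefficient values are hypothetical and are not estimates from the Boulder data.

```python
import math


def seasonal_scale(frac, b0, b1, b2):
    """GPD scale sigma(t) = exp(b0 + b1*sin(2*pi*f) + b2*cos(2*pi*f)), f = fraction of the year."""
    x1 = math.sin(2.0 * math.pi * frac)
    x2 = math.cos(2.0 * math.pi * frac)
    return math.exp(b0 + b1 * x1 + b2 * x2)


# Hypothetical coefficients, for illustration only.
b0, b1, b2 = 3.0, 0.2, -0.4
mid_year = seasonal_scale(0.5, b0, b1, b2)
```

The exponential link keeps $\sigma(t)$ positive for any coefficient values, and the sine and cosine pair makes it periodic with a period of one year, so the three coefficients $\beta_0, \beta_1, \beta_2$ can be estimated by maximum likelihood together with $\xi$ without constraints.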