Joseph O. Marker, Marker Actuarial Services, LLC and University of Michigan. CLRS 2011 Meeting. J. Marker, LSMWP, CLRS

Loss Simulation Model Testing and Enhancement


Expected vs. Actual Distribution
Test distributions of:
  Number of claims (frequency)
  Size of ultimate loss (severity)
Sources of significant difference between actual and expected amounts:
  Programming or communication errors
  Not understanding how the statistical language (e.g., R) works
  Errors or misleading results in R

Display Raw Simulator Output

Claims file
Simulation No  Occurrence No  Claim No  Accident Date  Report Date  Line  Type
1              1              1         20000104       20000227     1     1
1              2              1         20000105       20000818     1     1
...

Transactions file
Simulation No  Occurrence No  Claim No  Date      Transaction  Case Reserve  Payment
1              1              1         20000227  REP          2000          0
1              1              1         20000413  RES          89412         0
1              1              1         20000417  CLS          -91412        141531
...

Another Use for Testing Information
Create Ultimate Loss File for Analysis
Layout: Simulation No, Occurrence No, Claim No, Accident Date, Report Date, Line, Type, Case Reserve, Payment
Idea: another use for this section of the paper.
If an insurer can summarize its own claim data to this format, then it can use the tests we will discuss to parameterize the Simulator using its data.
We have included in this paper all the R code used in testing.
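The summarization into the ultimate loss layout can be sketched in R as follows. This is a minimal illustration, not the paper's code: it assumes that the transactions file records incremental case-reserve changes and payments (as the three sample rows suggest), and the data frames `claims` and `trans` and their column names are hypothetical stand-ins for the Simulator's files.

```r
# Hypothetical miniature versions of the two Simulator output files.
claims <- data.frame(
  SimNo = c(1, 1), OccNo = c(1, 2), ClaimNo = c(1, 1),
  AccidentDate = c(20000104, 20000105),
  ReportDate   = c(20000227, 20000818),
  Line = c(1, 1), Type = c(1, 1)
)
trans <- data.frame(
  SimNo = 1, OccNo = 1, ClaimNo = 1,
  Date = c(20000227, 20000413, 20000417),
  Transaction = c("REP", "RES", "CLS"),
  CaseReserve = c(2000, 89412, -91412),
  Payment     = c(0, 0, 141531)
)

# Sum the reserve changes and payments for each claim.
totals <- aggregate(cbind(CaseReserve, Payment) ~ SimNo + OccNo + ClaimNo,
                    data = trans, FUN = sum)

# Attach the totals to the claim attributes (one row per claim).
# merge() keeps only claims that have transactions in this toy extract.
ultloss <- merge(claims, totals, by = c("SimNo", "OccNo", "ClaimNo"))
print(ultloss)
```

For the sample claim, the reserve changes net to zero at closing and the ultimate loss equals the single payment.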

Emphasis in the Paper
Document the R code used in performing various tests.
Provide references for those who want to explore the modeling more deeply.
Provide visual as well as formal tests: QQ plots, histograms, densities, etc.

Test 1: Frequency, Zero-Modification, Trend
Model parameters:
  Number of occurrences ~ Poisson (mean = 120 per year)
  1,000 simulations
  One claim per occurrence
  Frequency trend 2% per year, three accident years
  Pr[Claim is Type 1] = 75%; Pr[Type 2] = 25%
  Pr[CNP ("Closed No Payment")] = 40%
  Type and Status are independent.
Status is a categorical variable indicating whether a claim is closed with payment (CWP) or closed with no payment (CNP).
Test the output to see if its distribution is consistent with the assumptions.

Test 1: Classical Chi-square
Contingency table

Actual counts
        Type 1    Type 2    Margin
CNP     111,066   37,007    0.398906
CWP     167,268   55,857    0.601094
Margin  0.749826  0.250174  371,198

Expected counts
        Type 1     Type 2    Margin
CNP     111,029.0  37,044.0  0.398906
CWP     167,305.0  55,820.0  0.601094
Margin  0.749826   0.250174  371,198

$$\chi^2 = \sum_i \sum_j \frac{(\text{Actual}_{ij} - \text{Expected}_{ij})^2}{\text{Expected}_{ij}} = 0.0819$$

Pr[χ² > 0.0819] = 0.775. The independence of Type and Status is supported.
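The contingency-table test can be reproduced from the four actual counts alone. The sketch below uses base R's `chisq.test` rather than the `xtabs` route used in the paper; the matrix name `actual` is an illustrative choice.

```r
# Actual counts from the slide: rows = Status (CNP, CWP), columns = Type.
actual <- matrix(c(111066, 37007,
                   167268, 55857),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Status = c("CNP", "CWP"),
                                 Type   = c("Type 1", "Type 2")))

# Pearson chi-square test of independence, without the continuity
# correction so that the statistic follows the Pearson formula exactly.
res <- chisq.test(actual, correct = FALSE)

round(res$expected, 1)     # expected counts under independence
unname(res$statistic)      # about 0.0819
res$p.value                # about 0.775
```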

Test 1: Regression approach
The previous result can be obtained using the xtabs command in R.
The result can also be obtained using a Poisson GLM.
Full model:
  model6x <- glm(count ~ Type + Status + Type*Status, data = temp.datacc.stack, family = poisson, x = TRUE)
Reduced model:
  model5x <- glm(count ~ Type + Status, data = temp.datacc.stack, family = poisson, x = TRUE)
Independence holds if the interaction term Type*Status is not significant.

Test 1: Analysis of variance
  anova(model5x, model6x, test = "Chisq")

Analysis of Deviance Table
Response: count
   Terms                          Resid. Df  Resid. Dev  Test          Df  Deviance      Pr(Chi)
1  Type + Status                  143997     160969.366
2  Type + Status + Type * Status  143996     160969.284  +Type:Status  1   0.0819088429  0.774727081

The result matches the previous χ² test.
Not shown here are the model coefficients, which produce the expected frequency for each combination of Type and Status.
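A compact version of the GLM comparison is sketched below. It is fitted to the four aggregated cell counts, because the presenter's stacked data set `temp.datacc.stack` is not reproduced in the slides; the data frame `cells` is a hypothetical stand-in. Under the assumption that each cell contributes the same number of stacked rows, the interaction deviance should agree with the value reported in the table.

```r
# Aggregated counts from the contingency table (one row per cell).
cells <- data.frame(
  Type   = factor(c("1", "2", "1", "2")),
  Status = factor(c("CNP", "CNP", "CWP", "CWP")),
  count  = c(111066, 37007, 167268, 55857)
)

reduced <- glm(count ~ Type + Status, data = cells, family = poisson)
full    <- glm(count ~ Type * Status, data = cells, family = poisson)

# Likelihood-ratio (deviance) test for the interaction term.
dev.diff <- deviance(reduced) - deviance(full)
p.value  <- pchisq(dev.diff, df = 1, lower.tail = FALSE)
c(deviance = dev.diff, p.value = p.value)

# Fitted values of the reduced model are the expected counts
# under independence of Type and Status.
round(fitted(reduced), 1)
```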

Test 2: Univariate size of loss
Model parameters:
  Three lines, no correlation in frequency by line
  Number of claims for each line ~ Poisson (mean = 600 per year)
  Two accident years, 100 simulations
  Size-of-loss distributions:
    Line 1: lognormal
    Line 2: Pareto
    Line 3: Weibull
  Zero trend in frequency and size of loss
Expected count = 600 (frequency) x 100 (simulations) x 3 (lines) x 2 (years) = 360,000.
Actual number of claims: 359,819.
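A quick plausibility check on the claim count, which is not part of the slides, is to compare the shortfall with the Poisson standard deviation. The sketch below assumes the counts are independent across lines, years, and simulations, as the stated parameters imply.

```r
expected <- 600 * 100 * 3 * 2      # frequency x simulations x lines x years
actual   <- 359819

# A sum of independent Poisson counts is Poisson, so the standard
# deviation of the total is sqrt(expected).
z <- (actual - expected) / sqrt(expected)
z                                  # about -0.30
p <- 2 * pnorm(-abs(z))            # two-sided normal-approximation p-value
p
```

A difference of about 0.3 standard deviations gives no reason to doubt the frequency mechanics.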

Size of loss: testing strategy
The person doing the testing is not the person running the simulation.
Test all three distributions on each line's output.
Produce plots to get a feel for the distributions.
Fit using maximum likelihood estimation.
Produce QQ (quantile-quantile) plots.
Run formal goodness-of-fit tests.

Size of loss: Histograms and p.d.f.

Size of loss: Histograms and p.d.f. (continued)

Size of loss
The plots above compare:
  Histogram of the empirical distribution
  Density of the theoretical distribution with m.l.e. parameters
The plots show that both Weibull and Pareto fit Lines 2 and 3 well.
QQ plots offer another perspective.
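The fit-then-overlay step can be sketched as follows. The slides do not show the fitting code, so this example uses `optim` from base R on a simulated Weibull sample; the data vector `loss`, the true parameters, and the helper `negloglik` are all hypothetical.

```r
set.seed(2011)
# Stand-in for one line's simulated ultimate losses (hypothetical parameters).
loss <- rweibull(5000, shape = 1.5, scale = 1000)

# Negative log-likelihood, parameterized on the log scale so that the
# optimizer can search over all real numbers.
negloglik <- function(logpar, x) {
  -sum(dweibull(x, shape = exp(logpar[1]), scale = exp(logpar[2]), log = TRUE))
}

fit <- optim(par = log(c(1, mean(loss))), fn = negloglik, x = loss)
mle <- exp(fit$par)
names(mle) <- c("shape", "scale")
mle

# Histogram of the data with the fitted density overlaid.
hist(loss, breaks = 50, freq = FALSE, main = "Weibull fit", xlab = "Size of loss")
curve(dweibull(x, shape = mle["shape"], scale = mle["scale"]),
      add = TRUE, col = "red")
```

The same pattern applies to the lognormal and Pareto candidates by swapping the density function inside `negloglik`.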

Size of loss: QQ plots
Example of R code to produce a QQ plot:
  # generate a random sample of the same size n2 as the empirical data
  thqua.w2 <- rweibull(n2, shape = fit.w2$estimate[1], scale = fit.w2$estimate[2])
  # ultloss2 is the empirical data, thqua.w2 is the generated sample
  qqplot(ultloss2, thqua.w2, xlab = "Sample Quantiles", ylab = "Theoretical Quantiles", main = "Line 2, Weibull")
  abline(0, 1, col = "red")
One can also replace the sample with the quantiles of the theoretical Weibull c.d.f.
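The alternative mentioned last, using theoretical quantiles instead of a generated sample, might look like the sketch below. The data vector `ultloss` and the parameter values `shape.hat` and `scale.hat` are hypothetical stand-ins for the empirical losses and the fitted estimates.

```r
set.seed(1)
# Hypothetical stand-ins: empirical losses and fitted Weibull parameters.
ultloss   <- rweibull(2000, shape = 1.5, scale = 1000)
shape.hat <- 1.5
scale.hat <- 1000

n <- length(ultloss)
# Theoretical quantiles at the plotting positions (i - 0.5)/n, matched
# against the sorted sample.
th.q <- qweibull(ppoints(n), shape = shape.hat, scale = scale.hat)

plot(sort(ultloss), th.q,
     xlab = "Sample Quantiles", ylab = "Theoretical Quantiles",
     main = "Weibull QQ plot")
abline(0, 1, col = "red")
```

Using the quantile function removes the sampling noise that a second random sample adds to the plot.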

Size of Loss: QQ Plot, Line 1

Size of Loss: QQ Plot, Line 2

Size of Loss: QQ Plot, Line 3

Size of Loss: Fitted distributions
From the QQ plots, it appears that lognormal fits Line 1, Pareto fits Line 2, and Weibull fits Line 3.
Chi-square is a formal goodness-of-fit test.
Section 6 discusses setting up the test for Pareto on Line 2.
Appendix B contains R code for all the chi-square tests.
The Kolmogorov-Smirnov test was also applied, but too late to include results in this presentation.

Size of Loss: Chi-square g.o.f. test
Setting up the bins and the expected and actual number of claims by bin is not easy in R.
Define break points and bins:
  s = sqrt(var(ultloss2))
  ult2.cut <- cut(ultloss2.0,   ## binning data
    breaks = c(0, m-s/2, m, m+s/4, m+s/2, m+s, m+2*s, 2*max(ultloss2)))
Note: ultloss2.0 is the vector of loss sizes; m is the mean.
The table of expected and observed values by bin:
  #            E.2    O.2      x.sq.2
  #[1,]  43993.890  44087  0.19705959
  #[2,]  35651.989  35680  0.02200752
  #[3,]  10493.758  10323  2.77864169
  #[4,]   7240.583   7269  0.11152721
  #[5,]   9277.383   9164  1.38570182
  #[6,]   8063.576   8176  1.56743997
  #[7,]   5289.820   5312  0.09299630
Notes: E.2 is the expected number, O.2 is the actual number, and x.sq.2 is each bin's contribution to the chi-square statistic.

Size of Loss: Chi-square g.o.f. test
Execute the chi-square test:
  df = length(E.2) - 1 - 2      ## degrees of freedom; result = 4
  chi.sq.2 <- sum(x.sq.2)       ## test statistic; result = 6.155374
  qchisq(.95, df)               ## critical value; result = 9.487729
  1 - pchisq(chi.sq.2, df)      ## p-value; result = 0.1878414
Important: degrees of freedom = 4, not 6, because the two parameters of the expected distribution were determined by m.l.e. on the data rather than from a predetermined distribution.
Using the chi-squared test in R directly would produce a wrong p-value:
  chisq.test(O.2, p = E.2/n2.0)
This test uses degrees of freedom = 6.
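The effect of the degrees-of-freedom adjustment can be checked from the binned table alone. The sketch below copies the expected and observed counts from the slide; the names `df.fitted`, `p.fitted`, and `naive` are illustrative, and the probabilities passed to `chisq.test` are normalized by `sum(E.2)` because `n2.0` is not given in the slides.

```r
# Expected and observed counts by bin, copied from the table on the slide.
E.2 <- c(43993.890, 35651.989, 10493.758, 7240.583, 9277.383, 8063.576, 5289.820)
O.2 <- c(44087, 35680, 10323, 7269, 9164, 8176, 5312)

x.sq.2   <- (O.2 - E.2)^2 / E.2
chi.sq.2 <- sum(x.sq.2)                      # about 6.155

df.fitted <- length(E.2) - 1 - 2             # two parameters estimated: 4
p.fitted  <- pchisq(chi.sq.2, df.fitted, lower.tail = FALSE)   # about 0.188

# chisq.test() does not know that parameters were estimated, so it uses
# length(E.2) - 1 = 6 degrees of freedom and reports a larger p-value.
naive <- chisq.test(O.2, p = E.2 / sum(E.2))
c(p.fitted = p.fitted, p.naive = naive$p.value)
```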

Correlation
The model allows correlated variables in two ways:
  Frequencies among lines
  Report lag and size of loss
We tested the correlation feature for frequency by line.
To do this, first specify the parameters for the Poisson or negative binomial frequency by line.
Then specify the correlation matrix and the copula that links the univariate frequency distributions to the multivariate distribution.
The correlation testing helped the programmer determine how the copula statements from R actually work in the model.

Correlation: simulation parameters
The Simulator was run 7/20/2010 with parameters:
  Three lines
  Annual frequency by line is Poisson with mean 96
  One accident year
  1,000 simulations
  Gaussian (normal) copula
Frequency correlation matrix:
  Correlation  Line 1  Line 2  Line 3
  Line 1       1       0       0.99
  Line 2       0       1       -0.01
  Line 3       0.99    -0.01   1
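To illustrate what these parameters imply, the sketch below builds correlated Poisson frequencies with a Gaussian copula using only base R. It is not the Simulator's code, which relies on copula statements from R; the variable names and the seed are arbitrary.

```r
set.seed(20100720)
n.sim <- 1000
R <- matrix(c(1,     0,    0.99,
              0,     1,   -0.01,
              0.99, -0.01, 1), nrow = 3, byrow = TRUE)

# Correlated standard normals: rows of z have correlation matrix R,
# because chol(R) returns U with t(U) %*% U = R.
z <- matrix(rnorm(n.sim * 3), ncol = 3) %*% chol(R)

# Map to uniforms (the Gaussian copula), then to Poisson(96) counts.
u      <- pnorm(z)
counts <- matrix(qpois(u, lambda = 96), ncol = 3)
colnames(counts) <- paste("Line", 1:3)

round(cor(counts), 3)
```

The sample correlation of the counts for Lines 1 and 3 comes out slightly below the copula parameter, since discretizing to counts loses some dependence.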

Correlation: data used
The annual numbers of claims were summarized by simulation and line to a file, D:/LSMWP/byyear.csv.
Visualize this data:
  Row (simulation)  Line 1  Line 2  Line 3
  1                 114     95      117
  2                 89      85      90
  ...
  99                103     78      101
  100               96      106     99

Correlation: Fitting data
Details of the statistical testing for correlation are in Section 6.2.3 and Appendix B of the paper.
The data were fit to a normal copula using both m.l.e. and inversion of Kendall's tau, using all 1,000 observations, and then goodness-of-fit tests were applied to each pair of lines.
[Figure: scatter plot of Line 1 and Line 3 data; axes Line.1 and Line.3, each running from 0.0 to 1.0]
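Inversion of Kendall's tau rests on the relation between tau and the correlation parameter of a normal copula, tau = (2/pi) asin(rho). The sketch below applies it to a simulated bivariate normal sample with base R; the sample and the value `rho.true` are hypothetical, and the paper's own fit used dedicated copula-fitting statements rather than this hand calculation.

```r
set.seed(42)
rho.true <- 0.5
n <- 1000

# Bivariate normal sample with correlation rho.true (hypothetical data).
x <- rnorm(n)
y <- rho.true * x + sqrt(1 - rho.true^2) * rnorm(n)

# For a normal copula, tau = (2 / pi) * asin(rho), so rho = sin(pi * tau / 2).
tau     <- cor(x, y, method = "kendall")
rho.hat <- sin(pi * tau / 2)
c(tau = tau, rho.hat = rho.hat)
```

With claim counts rather than continuous data, ties affect the sample tau, so the inverted estimate should be read as approximate.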

Correlation: estimated correlation from data
Details of the maximum likelihood estimate of the correlations:
                    Estimate      Std. Error   z value        Pr(>|z|)
  Rho(line 1 & 2)   -0.002112605  0.031977597  -0.06606516    0.9473259
  Rho(line 1 & 3)   0.979258746   0.000921392  1062.80366235  0.0000000
  Rho(line 2 & 3)   -0.010486832  0.031974114  -0.32797880    0.7429277
Example of statements used for the first rho above:
  normal2.cop <- normalCopula(c(0), dim = 2, dispstr = "un")
  gofCopula(normal2.cop, x12, N = 100, method = "mpl")
Note: x12 is a dataset without Line 3 observations.

Correlation: goodness of fit
The empirical copula and the hypothesized copula are compared under the null hypothesis that they are from the same copula.
The Cramér-von Mises (CvM) statistic S_n is used.
The goodness-of-fit test runs very slowly, so each pair of lines was compared using only the first 100 simulations.
The two-sample Kolmogorov-Smirnov test was also performed. It compared the empirical distribution with a random sample from the hypothesized distribution.

Correlation: g.o.f. results
  Lines 1 & 2: parameter estimate -0.002100962; Cramér-von Mises statistic 0.0203318, p-value 0.4009901
  Lines 1 & 3: parameter estimate 0.97926; Cramér-von Mises statistic 0.007494245, p-value 0.3811881
  Lines 2 & 3: parameter estimate -0.01049841; Cramér-von Mises statistic 0.01614539, p-value 0.5891089

Final Thoughts on Testing
Initial tests were simple because we were also checking the mechanics of the model.
There are many more features of the model to explore and to test.
The testing statements can also be applied to parameterize the model using an insurer's data.
The tests described only test ultimate distributions, not the loss development patterns.