Quantile Regression
By Luyang Fu, Ph.D., FCAS, State Auto Insurance Company
Cheng-sheng Peter Wu, FCAS, ASA, MAAA, Deloitte Consulting

Agenda
- Overview of Predictive Modeling for P&C Applications
- Quantile Regression: Background and Theory
- Two Commercial Line Case Studies: Claim Severity Modeling and Loss Ratio Modeling
- Q&A

Overview of Predictive Modeling for P&C Applications

Overview of PM for P&C Applications
Modeling Techniques:
- Ordinary Least Squares and Linear Regression:
  - Normal distribution assumption
  - Linear relationship between target and covariates
  - Predicts the mean of the target against the covariates
- GLM:
  - Expands the distribution assumptions to the exponential family: frequency - Poisson, severity - Gamma, pure premium and loss ratio - Tweedie
  - Linear relationship between the mean of the target and the covariates through the link function
- Minimum Bias/General Iteration Algorithm:
  - Iterative algorithm
  - Essentially derives the same results as GLM
  - Categorical predictive variables only
  - Linear relationship between target and covariates
- Neural Networks:
  - Non-linear regression: a series of logit functions to approximate the nonlinear relationship between target and covariates
  - Originated in the data mining field; a curve fitting technique; non-parametric assumptions
  - Different cost or error functions can be selected for minimization, for example, a least square error term or an absolute error function
  - One popular algorithm is the backward propagation of errors

Overview of PM for P&C Applications
Modeling Techniques:
- MARS:
  - Non-linear regression using a series of hockey stick functions to approximate the nonlinear relationship between the target variable and the covariates
  - Can test interaction between covariates
- CART:
  - A recursive partitioning technique: regression trees for continuous target variables and classification trees for categorical target variables
  - Originated in the data mining field
  - Different error functions can be used, such as the least square error function, Gini, entropy, etc.
  - Strong for categorical variables, but weak for continuous variables with a linear relationship with the target
Underlying Algorithms for Different Techniques:
- Statistical modeling techniques, such as GLM and OLS, attempt to maximize the likelihood function for the underlying distribution
- Non-statistical modeling techniques, such as Nnet, CART, and MARS, attempt to minimize a pre-determined error function, which can be a least square error function, an absolute error function, a Gini function, etc.
- Different optimization techniques are deployed to find the solutions

Overview of PM for P&C Applications
Types of Model for P&C Applications:
- Binary target models for retention, cross sale, or marketing:
  - Logistic regression
  - Target variable is a binary variable modeled through a logit link
- Frequency and severity models:
  - Popular for class plan optimization
  - Poisson for frequency and Gamma for severity
  - Becoming the standard approach for personal line pricing
  - Used when data and exposure information are in good condition
- Loss cost and pure premium models:
  - Long history with the minimum bias technique
  - One-step modeling instead of two-step frequency and severity modeling
  - Tweedie distribution assumption can be applied, but may not fit the data well
  - Software support for the Tweedie assumption is not yet widespread

Overview of PM for P&C Applications
Types of Model for P&C Applications:
- Loss ratio modeling:
  - More popular for commercial line applications
  - Tweedie assumption can be applied as well, but may not fit the data well
  - Applied when data challenges are significant for the frequency or pure premium models, for example, when exposure is not accurate or not homogeneous
  - May focus not on the predicted value of the target but on its ranked value, for example, the worst X% of policies for cancellation, or the best Y% for the most preferred company placement
  - Application is not for setting class plan factors or rates, but for underwriting, such as tiering, company placement, credit-debit assignment, new business rejection/acceptance, or renewal business cancellation

Quantile Regression Background and Theory

Quantile Regression Background
- Originated in econometrics with Roger Koenker and Gilbert Bassett of the University of Illinois: Roger Koenker and Gilbert Bassett, "Regression Quantiles," Econometrica (1978)
- Traditional modeling, such as OLS and GLM, models the conditional mean of the target variable against the covariates, while Quantile Regression models conditional percentiles of the target variable against the covariates. For example, driver gender may not impact the mean of claim severity, but may have a significant impact on the 95th percentile of the severity
- The technique has been used in other industries and research areas, such as ecology, healthcare, and financial economics, where data is volatile and extremes are important

Quantile Regression Advantages
- No distribution assumptions:
  - Severity: Gamma, Lognormal, or Pareto?
  - Pure premium or loss ratio: Tweedie?
- Robust:
  - Unlike OLS or GLM, it is robust in handling extreme value points and outliers for the target
  - Insensitive (equivariant) to any monotonic transformation of the target variable
  - The regression coefficients do not vary with capping of the target variable for most of the percentiles
- Comprehensive:
  - A more complete picture of the relationship between the target and the covariates
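The robustness points above can be checked on a handful of numbers. The sketch below is only an illustration under stated assumptions: the severity figures, the 50,000 cap, and the helper `sample_quantile` are all hypothetical and not taken from the case studies. It shows that the median survives a log transform and a cap unchanged, while the mean does not.

```python
import math


def sample_quantile(values, tau):
    """Empirical tau-quantile: smallest observation with at least a tau share of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(tau * len(ordered)))
    return ordered[rank - 1]


# Hypothetical claim severities with one very large loss.
severities = [1200, 2500, 4000, 7500, 15000, 40000, 250000]

# Monotonic transformation: the median of log(Y) is the log of the median of Y.
median = sample_quantile(severities, 0.5)
log_median = sample_quantile([math.log(y) for y in severities], 0.5)

# Capping above the quantile of interest leaves that quantile untouched.
capped = [min(y, 50000) for y in severities]
capped_median = sample_quantile(capped, 0.5)

# The mean has neither property.
mean = sum(severities) / len(severities)
capped_mean = sum(capped) / len(capped)
log_mean = sum(math.log(y) for y in severities) / len(severities)

print(median, capped_median, log_median == math.log(median))
print(round(mean), round(capped_mean), round(math.exp(log_mean)))
```

The cap only matters for percentiles at or above the capping point, which is consistent with the "for most of the percentiles" qualifier on the slide.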

Quantile Regression Theory
- OLS with a least square error function: the parameter estimates give the relationship between the mean value of the target and the covariates
- For OLS, the parameter estimates obtained by minimizing the least square function are equivalent, asymptotically, to those obtained by maximizing the normal likelihood function
- In general, GLM-type models predict the mean of the target variable given the covariates

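Quantile Regression Theory
- Quantile regression predicts the $\tau$-th percentile, instead of the mean, of the target variable against the covariates.
- The $\tau$-th percentile of a random variable $Y$ with distribution function $F_Y$ is defined as:
    $Q_Y(\tau) = F_Y^{-1}(\tau) = \inf\{y : F_Y(y) \ge \tau\}, \quad 0 < \tau < 1$
- Conditional quantile function of $Y$ given covariates $X$:
    $Q_Y(\tau \mid X = x) = x^\top \beta(\tau)$
- Let's start by predicting the median, the 50th percentile. Instead of minimizing the least square error term, we minimize the absolute error function (also known as L1 regression):
    $\min_{\beta} \sum_{i=1}^{n} |y_i - x_i^\top \beta|$
- To conduct the $\tau$-th quantile regression, we minimize the following error function:
    $\min_{\beta} \sum_{i=1}^{n} \rho_\tau(y_i - x_i^\top \beta), \quad \rho_\tau(u) = u\,(\tau - 1\{u < 0\})$
  For $\tau = 0.5$ this is half the absolute error function, so it yields the same median fit.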
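To make the error function concrete, here is a minimal sketch of the standard Koenker-Bassett check (pinball) loss, which charges a weight of tau to positive residuals and 1 - tau to negative ones. The data and the function names (`check_loss`, `total_check_loss`, `best_constant`) are hypothetical, and the model has no covariates: it fits a single constant, so the minimizer should be the sample quantile.

```python
def check_loss(residual, tau):
    """Check loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def total_check_loss(candidate, values, tau):
    """Summed check loss when every observation is predicted by one constant."""
    return sum(check_loss(y - candidate, tau) for y in values)


def best_constant(values, tau):
    # The objective is convex and piecewise linear, so a minimizer
    # can always be found among the observed values themselves.
    return min(sorted(values), key=lambda c: total_check_loss(c, values, tau))


claims = [1, 2, 3, 4, 100]  # hypothetical losses with one outlier

mean_fit = sum(claims) / len(claims)     # 22.0, pulled up by the outlier
median_fit = best_constant(claims, 0.5)  # 3
q75_fit = best_constant(claims, 0.75)    # 4

print(mean_fit, median_fit, q75_fit)
```

Replacing the outlier 100 with any larger value changes the mean but leaves both quantile fits where they are, which is the sense in which the absolute error and quantile error functions are robust.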

Quantile Regression Theory
Algorithms to Solve Quantile Regression:
- The error function for minimization can be transformed into a standard Linear Programming problem with its paired (dual) minimization and maximization forms. Linear programming algorithms can then be applied to solve for the Quantile Regression parameters:
  - Simplex method: classical, less efficient, stable
  - Interior point method: fast, may not converge
  - Smoothing algorithm: fast, may not converge
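One common way to write the linear program splits each residual into a positive part u+ and a negative part u-, then minimizes tau * sum(u+) + (1 - tau) * sum(u-) subject to y = X*beta + u+ - u- with u+, u- >= 0. An optimal solution of such a program sits at a vertex, which for a line with an intercept means a fit passing exactly through two observations. The sketch below exploits that fact by brute force. It is not the simplex, interior point, or smoothing algorithm, and it is practical only for tiny data. The data and the name `fit_line_by_vertices` are hypothetical.

```python
from itertools import combinations


def check_loss(residual, tau):
    """Check loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def fit_line_by_vertices(xs, ys, tau):
    """Return (intercept, slope) minimizing the summed check loss.

    Brute force over LP vertices: every line through two observations
    with distinct x values is a candidate.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a regression line
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(
            check_loss(y - (intercept + slope * x), tau)
            for x, y in zip(xs, ys)
        )
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Hypothetical data: five points on y = 2x plus one large loss at x = 6.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 60]

median_line = fit_line_by_vertices(xs, ys, 0.5)  # ignores the large loss
upper_line = fit_line_by_vertices(xs, ys, 0.9)   # reaches up to the large loss

print(median_line, upper_line)
```

The number of candidate vertices grows combinatorially with observations and covariates, which is why production software relies on the algorithms listed above rather than enumeration.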

Quantile Regression Theory
Confidence Interval Calculation for Quantile Regression:
- Since it is a non-parametric approach, no distribution function can be used to calculate the confidence interval
- Three alternative algorithms to estimate the confidence interval:
  - Sparsity function: direct, fast, but not robust if data is not i.i.d.
  - Inversion of rank tests: computationally intensive because it uses the simplex algorithm
  - Markov chain marginal bootstrap: unstable for small data sets
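None of the three methods listed above is reproduced here. As a simpler stand-in that conveys the resampling idea, the sketch below uses a plain percentile bootstrap (resampling observations with replacement) for a quantile without covariates. This is an assumption made for illustration, not the Markov chain marginal bootstrap. The data, the seed, and the names `sample_quantile` and `bootstrap_interval` are hypothetical.

```python
import math
import random


def sample_quantile(values, tau):
    """Empirical tau-quantile: smallest observation with at least a tau share of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(tau * len(ordered)))
    return ordered[rank - 1]


def bootstrap_interval(values, tau, n_boot=2000, level=0.90, seed=0):
    """Percentile bootstrap interval for the tau-quantile of a sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    # Recompute the quantile on n_boot resamples drawn with replacement.
    estimates = sorted(
        sample_quantile([rng.choice(values) for _ in range(n)], tau)
        for _ in range(n_boot)
    )
    # Read the interval endpoints off the sorted bootstrap estimates.
    alpha = (1.0 - level) / 2.0
    lower = estimates[math.floor(alpha * (n_boot - 1))]
    upper = estimates[math.ceil((1.0 - alpha) * (n_boot - 1))]
    return lower, upper


# Hypothetical claim severities.
severities = [1200, 2500, 4000, 7500, 15000, 40000, 250000]

lower, upper = bootstrap_interval(severities, 0.5)
print(sample_quantile(severities, 0.5), lower, upper)
```

With only seven observations the interval is wide and its endpoints are themselves observed values, which echoes the slide's caution about small data sets.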

Quantile Regression Background and Theory
Summary of Quantile Regression:
- By definition, the target variable needs to be a continuous variable, not a categorical or mixed variable
- Minimizes a quantile error function
- Gives a more complete understanding of the impact of covariates on the dependent variable across the whole distribution, not just on its mean
- Uses Linear Programming algorithms to estimate the regression parameters
- Confidence intervals cannot be estimated with known distribution functions. Instead, several different algorithms can be used to estimate them
- Widely applied in other research areas
- Several statistical software packages support quantile regression, such as SAS (Proc Quantreg), R, Stata, and Matlab

Case Study #1 BI Claim Severity Model

Case Study #2 Loss Ratio Modeling

Conclusions
- For the two case studies, the technique proves to be useful:
  - More complete understanding of the impact of covariates on the target, especially toward the extreme ends of the distribution
  - Yields more stable, robust, and stronger results for loss ratio modeling compared to OLS and GLM
- Can be another tool added to our modeling tool set
- Potential insurance applications in severity modeling, loss ratio modeling, reinsurance pricing, value-at-risk analysis, and capital allocation

Q&A