And The Winner Is? How to Pick a Better Model
Part 1: Introduction to GLM and Model Lift
Hernan L. Medina, CPCU, API, AU, AIM, ARC

Antitrust Notice
The Casualty Actuarial Society is committed to adhering strictly to the letter and spirit of the antitrust laws. Seminars conducted under the auspices of the CAS are designed solely to provide a forum for the expression of various points of view on topics described in the programs or agendas for such meetings.
Under no circumstances shall CAS seminars be used as a means for competing companies or firms to reach any understanding, expressed or implied, that restricts competition or in any way impairs the ability of members to exercise independent business judgment regarding matters affecting competition.
It is the responsibility of all seminar participants to be aware of antitrust regulations, to prevent any written or verbal discussions that appear to violate these laws, and to adhere in every respect to the CAS antitrust compliance policy.

Motivation
Models that appear to be strong may have weaknesses
  Fit may not be good
  Model may be overfit
  Wrong distribution may have been chosen
  Results may not be stable across data subsets or over time
  Results may be highly influenced by several records
  Model may underperform the status quo
Model development is a major investment, and before implementing a model, we should fully understand its strengths and weaknesses
Some Models Used by Actuaries
Exponential Regression: ln(y) = a + bx + ε
Linear Regression: Y = a + bx + ε
Minimum Bias Procedures
Generalized Linear Models: g(y) = a + X^T B + ε

Understanding & Validating a Model
Model Lift
  o How well does the model differentiate between best and worst risks?
  o Does the model help prevent adverse selection?
  o Is the model better than the current rating plan?
Goodness of Fit
  o What kind of model statistics are available, and how do you interpret them?
  o What kind of residual plots should you consider, and how do you interpret them?
  o What are some considerations regarding actual versus predicted plots?
Internal Stability
  o How well does the model perform on other data?
  o How will the model perform over time?
  o How reliable are the model's parameter estimates?

Model Lift
Ability to differentiate between low and high cost policyholders
  o Sometimes called the economic value of the model
Tools for measuring and illustrating model lift
  o Simple quantile plots
  o Double lift charts
  o Gini index
  o Loss ratio charts
Model Lift: Simple Quantile Plots
Creating a quantile plot
  o Use holdout sample.
  o Sort data based on predicted value (frequency, severity, loss cost).
  o Subdivide sorted data into quantiles (quartiles, quintiles, deciles) with equal weight (exposure, claim count).
  o Calculate average actual value and predicted value for each quantile and index to overall average.
[Figure: quantile plot of indexed values by quantile. Series: Actual, Model.]

Model Lift: Simple Quantile Plots
[Figure: two quantile plots. Panel "Sorted by Loss Costs Underlying Current Rates" with series Actual and Current. Panel "Sorted by Model's Predicted Loss Costs" with series Actual and Model.]

Model Lift: Double Lift Charts
Creating a double lift chart
  o Sort data by ratio of model prediction to current premium.
  o Subdivide sorted data into quantiles with equal exposure.
  o For each quantile, calculate average actual loss cost, average model predicted loss cost, and the average loss cost underlying the current manual premium.
  o Index the quantile averages to the overall averages.
[Figure: double lift chart. Series: Actual, Model, Current.]
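The double lift steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the presenter's implementation: the record layout (keys exposure, actual, model, current, each dollar field holding the total for the record), the function name indexed_quantiles, and the four sample records are all hypothetical.

```python
def indexed_quantiles(records, sort_key, n_quantiles=5):
    """Bin records into equal-exposure quantiles and index averages to overall.

    Hypothetical record layout (an assumption for this sketch): a dict with
    'exposure' plus total dollars for 'actual' losses, 'model' predicted
    losses, and 'current' losses underlying the current manual premium.
    """
    fields = ("actual", "model", "current")
    ordered = sorted(records, key=sort_key)
    total_exposure = sum(r["exposure"] for r in ordered)
    # Overall average loss cost per unit of exposure, used as the index base.
    overall = {f: sum(r[f] for r in ordered) / total_exposure for f in fields}

    sums = [{"exposure": 0.0, **{f: 0.0 for f in fields}}
            for _ in range(n_quantiles)]
    seen = 0.0
    for r in ordered:
        # Assign the record by the midpoint of its cumulative exposure share.
        midpoint = (seen + r["exposure"] / 2) / total_exposure
        q = min(int(midpoint * n_quantiles), n_quantiles - 1)
        sums[q]["exposure"] += r["exposure"]
        for f in fields:
            sums[q][f] += r[f]
        seen += r["exposure"]

    # Quantile average divided by overall average, skipping empty quantiles.
    return [
        {f: (s[f] / s["exposure"]) / overall[f] for f in fields}
        for s in sums if s["exposure"] > 0
    ]


# Made-up holdout records, for illustration only.
records = [
    {"exposure": 1.0, "actual": 50.0, "model": 60.0, "current": 100.0},
    {"exposure": 1.0, "actual": 150.0, "model": 140.0, "current": 100.0},
    {"exposure": 1.0, "actual": 90.0, "model": 80.0, "current": 100.0},
    {"exposure": 1.0, "actual": 110.0, "model": 120.0, "current": 100.0},
]

# Double lift: sort by the ratio of model prediction to current premium basis.
double_lift = indexed_quantiles(
    records, sort_key=lambda r: r["model"] / r["current"], n_quantiles=2
)
for i, row in enumerate(double_lift, start=1):
    print(i, {k: round(v, 3) for k, v in row.items()})
```

Passing a sort key of model prediction per unit of exposure, rather than the model-to-current ratio, would give the table behind a simple quantile plot. The midpoint rule for assigning records to quantiles is one of several reasonable choices when exposures do not divide evenly.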
Economics: The Gini Index
Gini coefficient or Gini ratio
  o Named after Corrado Gini
Measure of income inequality
  o Horizontal axis = percentage of country's population
  o Vertical axis = percentage of country's income
  o A = Area between line of equality and Lorenz Curve
  o B = Area beneath Lorenz Curve
  o Gini index = A / (A + B)
[Figure: Line of Equality and Lorenz Curve on axes running from 0 to 100, with areas A and B marked.]

Model Lift: Simple Gini Index
Adapting to car insurance
  o Assume claim frequency = 5%
The perfect model
  o Prediction = actual loss, which is $0 for 95% of exposures
  o Sort holdout data set by model prediction.
  o Horizontal axis = percentage of total car years
  o Vertical axis = percentage of total losses
  o Gini index = A / (A + B) is very high
[Figure: Line of Equality and Lorenz Curve for the perfect model on axes running from 0 to 100, with areas A and B marked.]

Model Lift: Simple Gini Index
A real model
  o Prediction = expected loss cost > 0 for each policyholder
  o Sort holdout data set by prediction.
  o Horizontal axis = percentage of total car years
  o Vertical axis = percentage of total losses
  o Gini index = A / (A + B)
A high Gini index implies
  o There is loss cost inequality
  o The model reflects it well
[Figure: Line of Equality and Lorenz Curve for a real model on axes running from 0 to 100, with areas A and B marked.]
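The Gini index calculation described above can be sketched as follows. This is a rough illustration, assuming the Lorenz curve is built from holdout records sorted in ascending order of prediction and that area B is measured with the trapezoid rule. The function name gini_index and the sample portfolio (20 car years with a single $1,000 claim, matching the 5% frequency assumption) are hypothetical.

```python
def gini_index(exposure, losses, prediction):
    """Gini index = A / (A + B) for losses ordered by model prediction.

    A = area between the line of equality and the Lorenz curve
    B = area beneath the Lorenz curve (trapezoid rule)
    Ties in the prediction keep their input order here; that tie rule is an
    assumption of this sketch.
    """
    order = sorted(range(len(prediction)), key=lambda i: prediction[i])
    total_exposure = sum(exposure)
    total_losses = sum(losses)

    cum_exposure = cum_losses = 0.0
    x_prev = y_prev = 0.0
    area_b = 0.0
    for i in order:
        cum_exposure += exposure[i]
        cum_losses += losses[i]
        x = cum_exposure / total_exposure   # share of total car years
        y = cum_losses / total_losses       # share of total losses
        area_b += (x - x_prev) * (y + y_prev) / 2
        x_prev, y_prev = x, y

    area_a = 0.5 - area_b  # the line of equality has area 0.5 beneath it
    return area_a / (area_a + area_b)


# Perfect model: prediction equals actual loss, $0 for 95% of exposures.
exposure = [1.0] * 20
losses = [0.0] * 19 + [1000.0]
perfect_gini = gini_index(exposure, losses, prediction=losses)
print(round(perfect_gini, 4))  # about 0.95
```

Because the area beneath the line of equality is 0.5, A / (A + B) reduces to 1 - 2B, which is why only B needs to be accumulated.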
Model Lift: Simple Gini Index
Gini index measures how well the model classifies policyholders
Exercise
  o Assume Model X prediction = expected loss cost
  o Assume Model Y prediction = 0.5 × (Model X prediction)
  o Assume Model Z prediction = 2.0 × (Model X prediction)
  o Which model has the highest Gini index?
  o Hint: sort holdout data set by prediction.
  o Gini index summarizes lift in one number, but is not a goodness of fit measure.
Model A has a Gini index of 15.9 and Model B has a Gini index of 15.4
  o Is that difference significant, or is it just a quirk of the holdout data?

Model Lift: Loss Ratio Charts
Lift charts and Gini index
  o May be unfamiliar to some stakeholders
Loss ratios
  o Widely used in the industry
Ranking by predicted loss cost
  o Rank data into quantiles by predicted model loss cost
  o Calculate loss ratio for each quantile
[Figure: loss ratio chart. Vertical axis: Actual Loss Ratio. Horizontal axis: Predicted Loss Cost Decile.]

Model Lift: Summary
Simple quantile plots
  o Illustrate how well the model helps prevent adverse selection
  o Compare against current rating plan or other model
Double lift charts
  o Compare competing models
  o Compare new model against current rating plan
Simple Gini index
  o Summarizes model lift into one number
Loss ratio charts
  o Put lift in a context most people in the insurance industry can understand
  o Can be distorted by redundancy or inadequacy of current rating plan
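The loss ratio chart described above can be sketched as follows. This is a minimal illustration, assuming equal-exposure quantiles and a loss ratio defined as total actual losses divided by total premium within each quantile. The record keys (exposure, predicted_loss_cost, premium, losses), the function name loss_ratio_by_quantile, and the sample policies are hypothetical.

```python
def loss_ratio_by_quantile(records, n_quantiles=10):
    """Rank records by predicted loss cost and return one loss ratio per quantile.

    Hypothetical record layout (an assumption for this sketch): a dict with
    'exposure', 'predicted_loss_cost' (per unit of exposure), 'premium', and
    'losses' (actual, total dollars).
    """
    ordered = sorted(records, key=lambda r: r["predicted_loss_cost"])
    total_exposure = sum(r["exposure"] for r in ordered)

    losses = [0.0] * n_quantiles
    premium = [0.0] * n_quantiles
    seen = 0.0
    for r in ordered:
        # Assign the record by the midpoint of its cumulative exposure share.
        midpoint = (seen + r["exposure"] / 2) / total_exposure
        q = min(int(midpoint * n_quantiles), n_quantiles - 1)
        losses[q] += r["losses"]
        premium[q] += r["premium"]
        seen += r["exposure"]

    # Actual loss ratio per quantile, skipping quantiles with no premium.
    return [l / p for l, p in zip(losses, premium) if p > 0]


# Made-up holdout policies, for illustration only.
policies = [
    {"exposure": 1.0, "predicted_loss_cost": 50.0, "premium": 100.0, "losses": 40.0},
    {"exposure": 1.0, "predicted_loss_cost": 200.0, "premium": 100.0, "losses": 150.0},
    {"exposure": 1.0, "predicted_loss_cost": 80.0, "premium": 100.0, "losses": 60.0},
    {"exposure": 1.0, "predicted_loss_cost": 120.0, "premium": 100.0, "losses": 90.0},
]

ratios = loss_ratio_by_quantile(policies, n_quantiles=2)
print([round(r, 3) for r in ratios])
```

In this made-up data every policy pays the same premium, so the rising loss ratios come entirely from the model's ranking. When current premiums are redundant or inadequate overall, every quantile's loss ratio shifts with them, which is the distortion noted in the summary.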
References
Anderson, Duncan, et al., A Practitioner's Guide to Generalized Linear Models, CAS Discussion Paper Program, 2004, pp. 1-116.
Bailey, Robert A. and LeRoy J. Simon, Two Studies in Automobile Insurance Ratemaking, PCAS XLVII, pp. 1-19.
Bailey, Robert A., Insurance Rates with Minimum Bias, PCAS L, pp. 4-13.
Brown, Robert L., Minimum Bias with Generalized Linear Models, PCAS LXXV, pp. 187-217.
Feldblum, Sholom and Eric Brosius, The Minimum Bias Procedure: A Practitioner's Guide, CAS Forum, Fall 2002, pp. 591-684.
Medina, Hernán L., Towards Multivariate Ratemaking: Claim Frequency Analysis Examples, CAS E-Forum, Winter 2011, Vol. 2.
Mildenhall, Stephen, A Systematic Relationship Between Minimum Bias and Generalized Linear Models, PCAS LXXXVI, pp. 393-487.
Venter, Gary G., Discussion of Minimum Bias with Generalized Linear Models, PCAS LXXVII, pp. 337-349.
Werner, Geoff and Claudine Modlin, Basic Ratemaking, Casualty Actuarial Society, Fourth Edition, October 2010.