Session 5. Predictive Modeling in Life Insurance

SOA Predictive Analytics Seminar, Hong Kong, 29 Aug. 2018
Jingyi Zhang, Ph.D.

Jingyi Zhang, PhD
Scientist, Global Research & Analytics

Agenda
- Overview of Predictive Modelling Techniques
  o What actuaries already know - OLS
  o Generalized Linear Model
  o Decision Tree Model
  o Clustering
- Sharing - Predictive Modelling Projects
- Introduction - RGA Science Team

What Is Predictive Modeling & Analytics?
1. High-quality data
2. Modeling: statistical model
3. Prediction: business decisions
Predictive modeling & analytics is about driving business outcomes.

What actuaries already know
Are you familiar with the following terms?
- Linear regression
- Ordinary Least Squares (OLS)

Linear regression model
  $Y = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + \varepsilon$
- $Y$: target variable
- $X_i$: predictor variables
- $\varepsilon$: error term/noise
- $\beta_i$: parameters to be estimated

Underlying assumptions for a valid linear regression model
- Normality: $\varepsilon \sim N(0, \sigma^2)$
- Homogeneity: Y representative of the population
- Independence between observations
- Linearity

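Ordinary Least Squares (OLS)
- For n observations and p predictors:
  $\hat\beta = \arg\min_\beta \sum_i \varepsilon_i^2 = \arg\min_\beta \sum_i (y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_p x_{ip})^2$
- For a simple regression $y = \beta_0 + \beta_1 x + \varepsilon$:
  $\hat\beta_1 = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2}, \quad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$
- Identical to the maximum likelihood estimator, a more robust and consistent approach:
  $\hat\beta = \arg\max_\beta L(\beta, \sigma; y) = \arg\min_\beta [-\ln L(\beta, \sigma; y)] = \arg\min_\beta \sum_i \varepsilon_i^2$ if the errors are normally distributed
- Use adjusted $R^2$ to compare the fit of models
  o $TSS = ESS + RSS$, where ESS is the portion that has been explained by the OLS model and RSS is the portion of TSS due to the error
  o Define $R^2 = 1 - RSS/TSS$, but it is biased
  o Adjusted $R^2 = 1 - \frac{RSS/(n - p - 1)}{TSS/(n - 1)}$

Why actuaries did not use OLS
- Linearity: processes are inherently linear, or can be well approximated by a linear model
- Effectiveness & completeness
  o OLS makes very efficient use of the data; good results with relatively small datasets
  o Identical to maximum likelihood estimation
- Easy to understand and communicate
  o Theory is well understood; results are easy to communicate
Great! But wait: there are several issues with OLS
- Validation of assumptions: normal with constant $\sigma^2$, independent, homogeneous
- OLS assumes unbounded data, while insurance quantities are non-negative
How about insurance applications?
- Distribution of data, variance structure
  o Binomial for rates (mortality/lapse/UW, etc.): $\sigma^2 \sim r(1 - r)$
  o Poisson for claim counts: $\sigma^2 \sim$ mean
OLS is not applicable in insurance, but you already know lots about modeling.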
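As a minimal sketch of the OLS fit and the R-squared comparison described above, the following Python fits a regression by least squares and reports both R-squared and adjusted R-squared. The function name `fit_ols` and the data are made up for illustration; they are not from the seminar material.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y ~ intercept + X by least squares; return (beta, r2, adj_r2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # treat a 1-D input as one predictor
    n, p = X.shape                          # n observations, p predictors
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    rss = float(resid @ resid)              # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())  # total sum of squares
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))
    return beta, r2, adj_r2

# Illustrative, made-up data (not from the presentation)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
beta, r2, adj_r2 = fit_ols(x, y)
print("intercept, slope:", beta)
print("R2:", r2, "adjusted R2:", adj_r2)
```

Adjusted R-squared penalizes extra predictors, so it never exceeds the plain R-squared whenever the model has at least one predictor.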

What actuaries may not know
Machine Learning & Statistical Techniques
- Random Forest, XGBoost, Gradient Boosting Machine, Ada Boosting, Support Vector Machine, Ensemble methods
- Decision Trees (CART), Neural Networks / Deep Learning, Cluster Analysis (e.g. K-Means), Classification/Association
- Survey Analysis, Feature Engineering, Non-Parametric Analysis, Bayesian Analysis, Analysis of Variance, Mixed Models, Survival Analysis

PM terminology
Supervised vs. unsupervised learning
- Supervised: estimate the expected value of Y given values of X
  o GLM, Cox, CART, MARS, Random Forests, SVM, NN, etc.
- Unsupervised: find interesting patterns amongst X; no target variable Y
  o Clustering, correlation / principal components / factor analysis
Classification vs. regression
- Classification: segment observations into 2 or more categories
  o Fraud vs. legitimate, lapsed vs. retained, UW class
- Regression: predict a continuous amount
  o Dollars of loss for a policy, ultimate size of claim
Parametric vs. non-parametric
- Parametric statistics: probabilistic model of the data
  o Poisson regression (claim count), Gamma (claim amount)
- Non-parametric statistics: no probability model specified
  o Classification trees, NN

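Generalized Linear Model
Generalized Linear Model (GLM)
- Major focus of PM in the insurance industry
  o Includes most distributions related to insurance
  o Great flexibility in variance structure
  o The OLS model is a special case of GLM
- (Relatively) easy to understand and communicate
  o A multiplicative model is intuitive & consistent with insurance practice
- 3 components
  o Random component
  o Systematic component
  o Link function

Generalized Linear Model
Random component
- Observations $Y_1, \dots, Y_n$ are independent with density from the exponential family:
  $f(y; \theta, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}$
- From maximum likelihood theory: $E[Y] = \mu = b'(\theta)$, $\mathrm{Var}(Y) = a(\phi)\, b''(\theta)$
- Each distribution is specified in terms of mean & variance
- Variance is a function of the mean

  Distribution       Range                  $b(\theta)$            $\mu(\theta) = b'(\theta)$    $V(\mu)$
  Normal             $(-\infty, +\infty)$   $\theta^2/2$           $\theta$                      $1$
  Poisson            $(0, +\infty)$         $e^\theta$             $e^\theta$                    $\mu$
  Binomial           $(0, 1)$               $\ln(1 + e^\theta)$    $e^\theta/(1 + e^\theta)$     $\mu(1 - \mu)$
  Gamma              $(0, +\infty)$         $-\ln(-\theta)$        $-1/\theta$                   $\mu^2$
  Inverse Gaussian   $(0, +\infty)$         $-(-2\theta)^{1/2}$    $(-2\theta)^{-1/2}$           $\mu^3$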
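As a worked example of the random component, the Poisson distribution can be written in exponential-family form. The derivation below assumes the standard GLM notation for that family (stated in the comment) and shows why the Poisson variance equals its mean.

```latex
% Assumed exponential-family form (standard GLM notation):
%   f(y;\theta,\phi) = \exp\{ [y\theta - b(\theta)]/a(\phi) + c(y,\phi) \}
\begin{align*}
f(y;\mu) &= \frac{e^{-\mu}\mu^{y}}{y!}
          = \exp\{\, y\ln\mu - \mu - \ln y! \,\} \\
\theta &= \ln\mu, \quad b(\theta) = e^{\theta}, \quad a(\phi) = 1, \quad c(y,\phi) = -\ln y! \\
E[Y] &= b'(\theta) = e^{\theta} = \mu \\
\operatorname{Var}(Y) &= a(\phi)\, b''(\theta) = e^{\theta} = \mu
\end{align*}
```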

Why distribution will affect results
[Figure: GLM fits through points A, B and C under different distributions (Normal, Poisson, Gamma)]
Variance of different distributions
- Gaussian: constant
- Poisson: ~ mean
- Gamma: ~ mean^2

Generalized Linear Model
Systematic component
- A linear predictor for observation i: $\eta_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}$
Link function
- Random & systematic components are connected by a smooth & invertible function: $g(\mu_i) = \eta_i$

  Link         $g(\mu)$
  Identity     $\mu$
  Log          $\ln \mu$
  Logit        $\ln(\mu/(1 - \mu))$
  Reciprocal   $1/\mu$

- The log link is unique in insurance applications: all parameters are multiplicative
  $\mu_i = \exp(\eta_i) = \exp(\beta_0)\, \exp(\beta_1 x_{i1}) \cdots \exp(\beta_p x_{ip})$
  o Consistent with most insurance practices
  o Intuitively easy to understand and communicate
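To make the multiplicative reading of the log link concrete, here is a minimal sketch of a Poisson GLM with log link fitted by iteratively reweighted least squares (IRLS), one common iterative maximum likelihood technique. The function name `fit_poisson_log_glm`, the two binary rating factors, and the claim counts are hypothetical; in practice a statistical package would normally be used instead.

```python
import numpy as np

def fit_poisson_log_glm(X, y, n_iter=25, tol=1e-10):
    """Fit a Poisson GLM with log link by IRLS; return coefficients (intercept first)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(design.shape[1])
    beta[0] = np.log(y.mean())            # start from the overall mean
    for _ in range(n_iter):
        eta = design @ beta               # linear predictor
        mu = np.exp(eta)                  # inverse of the log link
        z = eta + (y - mu) / mu           # working response for the log link
        w = mu                            # working weights (Poisson variance = mean)
        xtw = design.T * w
        beta_new = np.linalg.solve(xtw @ design, xtw @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Hypothetical cells: columns are two made-up binary rating factors
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
claims = np.array([2.0, 6.0, 3.0, 9.0])   # made-up claim counts
beta = fit_poisson_log_glm(X, claims)
relativities = np.exp(beta)               # base level and multiplicative factors
print(relativities)
```

Because the made-up counts are exactly multiplicative (base 2, factors 3 and 1.5), the exponentiated coefficients recover those three numbers, which is the sense in which log-link parameters act as multiplicative relativities.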

Generalized Linear Model
Solve for parameters ($\beta$) by maximum likelihood
- Closed form for small data and simple models
- Iterative numerical techniques for large data sets & complex models
- Use a statistical analysis application, such as R
Compare OLS and GLM (random component, systematic component, link)
- OLS: Normal distribution only
- GLM: various distributions
- Great flexibility
  o Various distributions, variance structure
  o Prior weight and the credibility of data

Decision Tree Model
Decision Tree - Classification And Regression Tree (CART)
- Handles both classification and regression
- Non-parametric approach (no insight into data structure)
- A CART tree is generated by repeated partitioning of the data set
  o The data set is split into two partitions (binary partition)
  o Consider all possible values of all variables
  o Select the variable/value ($X = t_1$) that produces the greatest separation in the target
  o Partitions can also be split into sub-partitions (recursive)
  o Continue until the data in each end node (leaf) is homogeneous (more or less)
- Results are very intuitive
  o Identify specific groups that deviate in the target variable
- Yet the algorithm is very sophisticated
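The partitioning step for a regression tree can be sketched as an exhaustive search over variables and split values. This sketch assumes that "separation" is measured as the reduction in the sum of squared errors, and the function names and data are made up for illustration.

```python
import numpy as np

def sse(y):
    """Sum of squared errors around the mean (0 for an empty node)."""
    return float(((y - y.mean()) ** 2).sum()) if len(y) else 0.0

def best_split(X, y):
    """Return (feature_index, threshold, sse_reduction) of the best binary split X[:, j] <= t."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    parent = sse(y)
    best = (None, None, 0.0)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values
        for t in (values[:-1] + values[1:]) / 2.0:
            left = y[X[:, j] <= t]
            right = y[X[:, j] > t]
            gain = parent - (sse(left) + sse(right))
            if gain > best[2]:
                best = (j, float(t), gain)
    return best

# Made-up data: column 0 is an age-like variable, column 1 is an uninformative flag
X = np.array([[25, 0], [35, 1], [45, 0], [55, 1], [65, 0], [75, 1]])
y = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
feature, threshold, gain = best_split(X, y)
print(feature, threshold, gain)
```

A full tree would apply `best_split` recursively to each resulting partition until a stopping rule is met, for example a minimum leaf size.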

Decision Tree Model
Splitting Point
- Separation is defined in many ways; it differs for regression & classification
- Regression trees: use the sum of squared errors (SSE)
  o Select $X = t_1$ such that $\max_{t_1} \left[ SSE_{parent} - (SSE_{left} + SSE_{right}) \right]$
- Classification trees: use measures of purity/impurity
  o Intuition: an ideal tree model would produce nodes with only either class A or class B, i.e. completely pure nodes
  o Gini index (purity of a node; 0 for a completely pure node): $1 - \sum_i p_i^2$, where $p_i$ = probability of class i
  o Entropy (information index): $-\sum_i p_i \log p_i$

Clustering
Clustering algorithm
- Find similarities in data according to features in the data & group similar objects into clusters
- Unsupervised (no pre-defined classes), classification, non-parametric
- How to measure similarities/dissimilarities, e.g. distance
  o Numeric, categorical, and ordinal variables
- Partitioning (k-means), hierarchical, etc.
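Returning to the purity measures for classification trees, the Gini index and entropy of a node can be computed directly from its class labels. This is a minimal sketch; the function names are made up, and the base-2 logarithm for entropy is an assumption (the slides do not state a base).

```python
import math
from collections import Counter

def class_probabilities(labels):
    """Observed class proportions in a node; labels must be non-empty."""
    if not labels:
        raise ValueError("labels must be non-empty")
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini(labels):
    """Gini index: 1 - sum(p_i^2); 0 for a completely pure node."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))

def entropy(labels):
    """Entropy: -sum(p_i * log2(p_i)); base 2 is an assumption made here."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))

pure_node = ["lapsed"] * 4
mixed_node = ["lapsed", "retained", "lapsed", "retained"]
print(gini(pure_node), entropy(pure_node))
print(gini(mixed_node), entropy(mixed_node))
```

Both measures are zero for a pure node and reach their maximum for an even class mix, so a split is attractive when it lowers the weighted impurity of the child nodes.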

Clustering Algorithm
- Partitioning algorithms: k-means / k-medoids
  o Maintain k clusters with k known; place points into their nearest cluster
- Hierarchical
  o Objects are more related to nearby objects than to objects farther away; objects are connected by distance; the question is how to define a nearby object
K-Means Algorithm
1. Select k points as initial centroids, for a given k
2. Repeat
3.   Form k clusters by assigning each point to its nearest centroid
4.   Re-compute the centroid of each cluster
5. Until the centroids do not change

Clustering
Standardization / normalization
- Values of variables may have different units
- A variable with high variability/range will dominate the metric & lead to bias
How to determine k
- Business reasons could dictate k
- Try different k, looking at the change in the average distance to centroid as k increases; the error falls rapidly until the right k, then changes little
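The k-means steps, together with standardization and the average distance to centroid used for choosing k, can be sketched as follows. The function names, the random initialization scheme, and the data are assumptions made for illustration only.

```python
import numpy as np

def standardize(X):
    """Scale each column to mean 0 and standard deviation 1."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0                       # leave constant columns unscaled
    return (X - X.mean(axis=0)) / std

def assign(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for each point."""
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)

def k_means(X, k, n_iter=100, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: select k distinct points as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):                   # Step 2: repeat
        labels = assign(X, centroids)         # Step 3: form k clusters
        new_centroids = np.array([            # Step 4: re-compute centroids
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break                             # Step 5: centroids did not change
        centroids = new_centroids
    return assign(X, centroids), centroids

def average_distance(X, labels, centroids):
    """Average distance of points to their assigned centroid."""
    return float(np.linalg.norm(X - centroids[labels], axis=1).mean())

# Made-up data with very different units in the two columns
raw = np.array([[1.0, 100.0], [1.2, 110.0], [0.8, 90.0],
                [5.0, 1000.0], [5.2, 1010.0], [4.8, 990.0]])
Z = standardize(raw)
for k in (1, 2, 3):
    labels, centroids = k_means(Z, k)
    print(k, average_distance(Z, labels, centroids))
```

In this made-up example the average distance drops sharply from k = 1 to k = 2 and changes little afterwards, which is the pattern the slide describes for picking k.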

Clustering
Comments on K-Means
- Strength: simple, very efficient & fast
- Weaknesses
  o Applicable only when the mean is defined (categorical data?)
  o Need to know k in advance
  o Unable to handle noisy data; sensitive to outliers
  o May be sensitive to initialization
Hierarchical clustering
- Bottom up or top down; produces a dendrogram
- Important questions: how to represent a cluster of more than one point, & how to determine the nearness of clusters?
  o Single link: smallest distance between points
  o Complete link: largest distance between points
  o Average link: average distance between points
  o Centroid: distance between centroids

Conclusion
Advantages of actuaries
- Industry knowledge: domain knowledge is key in the modeling process
- Expertise in data processing: data is always the #1 issue in data-driven applications
- Unique position in data analytics
Opportunity
- Solid foundation in statistics
- Educational experience in modeling (OLS)
- Need to pick up new skills & thinking through education, training, and experience
Actuaries cannot miss it
- Analytics is here to stay; it is changing the insurance industry and will fundamentally change how we run the insurance business
- Actuaries could and should be on top of it and lead the change

Sharing - Predictive Modelling Projects
Considerations
Business Goals
- Objective is to support profitable growth of the business
- Resources available & strong support from executives
Data
- Sufficient quantity & high quality to support analytics
- Satisfactory data depth & width
- Able to obtain the data & capable of understanding / cleaning it
Environment
- Regulatory & privacy laws allow such data analytics
- Distribution channel can support data-driven solutions

Across the Value Chain
As long as there is data, there is potential to capitalize on it.
Stages: Pre-sale, Underwriting, In-force management, Claims
Use cases by level of client demand
- High: Predictive underwriting; Preferred risk selection; Cross-sell/upsell; Fraud/nondisclosure; Multivariate analysis; New rating factors
- Medium: Propensity to apply & triggers; Distributor quality control; Propensity to complete purchase; Underwriting triage; Determine underwriting ratings; Proactive lapse management
- Low: Competitive pricing strategy; Customer lifetime value; Claims triage

Customer Risk Scoring - China
The client would like to build a customer risk score for their cancer product that can predict the claim risk of each customer.
Objectives
- To predict the claim risk of customers
- To improve customer experience for the best risks with a reduced UW & sales process
- To improve the claim experience of existing customers by identifying high risks
Modeling & Lift Plot
- Two data sources combined
  o Policy data
  o Claim data
- Modelled claim risk using a wide range of rating factors & compared it to the pricing assumption
- 6 statistically significant variables in the model
- Claim risk of the best group is less than half of the pricing assumption; the risk of the worst group is about double the pricing assumption
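The comparison behind a lift plot can be sketched as an actual-to-expected table: policies are ranked by model score, cut into equal-sized groups, and each group's actual claims are compared with the claims expected under the pricing assumption. Everything below is hypothetical, including the function name `ae_by_group`, the scores, the claim indicators, and the flat pricing assumption; none of it is the client's data or the presenter's method.

```python
import numpy as np

def ae_by_group(score, claims, expected, n_groups=5):
    """Rank policies by score (best first) and compute actual/expected claims per group."""
    score = np.asarray(score, dtype=float)
    claims = np.asarray(claims, dtype=float)
    expected = np.asarray(expected, dtype=float)
    order = np.argsort(score, kind="stable")      # lowest predicted risk first
    rows = []
    for g, idx in enumerate(np.array_split(order, n_groups), start=1):
        rows.append({
            "group": g,
            "n": len(idx),
            "actual": float(claims[idx].sum()),
            "expected": float(expected[idx].sum()),
            "actual_to_expected": float(claims[idx].sum() / expected[idx].sum()),
        })
    return rows

# Hypothetical inputs: 10 policies, model score, claim indicator, pricing assumption
score = np.arange(1, 11) / 100.0
claims = np.array([0, 0, 0, 0, 1, 0, 0, 1, 1, 1])
expected = np.full(10, 0.4)                       # made-up flat pricing assumption
rows = ae_by_group(score, claims, expected)
for row in rows:
    print(row)
```

A ratio well below 1 in the best group and well above 1 in the worst group is the kind of separation a lift plot is meant to show.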

Bancassurance Predictive Underwriting - SEA
A bank with a large customer base expressed a strong desire to increase sales penetration of their life product while streamlining the underwriting process.
Objectives
- Simplified underwriting and sales process with high take-up for the best risks
- Reduce acquisition costs
- Increase protection sales and product penetration
Two data sources combined:
  o Bank customer information at time of issue
  o Underwriting decision
About 80 variables available for modeling:
  o Demographic data, bank and insurance product data, banking transaction data, etc.
Business Application and Lift Plot
- 11 statistically significant variables in the model:
  o Branch, AUM, customer segment, credit card
- GIO for the best 20% risks; SIO for the next best 20%

Introduction - RGA Science Team

RGA Science Team
Global Presence, Local Focus
Global Research & Analytics
- The DS team collaborates with regional/local offices to focus on regional initiatives and local market projects
- We leverage local market knowledge to maximize data value & drive business outcomes
[Diagram: Research, Experience Analytics, Strategy, Science; Regional Strategy; Local Office]

RGA Science Team
Who Are We?
- The Science team includes data scientists, actuaries and IT experts
- More than 50% of the team have a Ph.D. and the rest have master's degrees
- Work closely with UW, actuarial, admin and IT
- Global (15), Asia (6)

Thank You!