Making the Link between Actuaries and Data Science
Simon Lee, Cecilia Chow, Thibault Imbert (AXA Asia)
2nd ASHK General Insurance & Data Analytics Seminar, Friday 7 October 2016
Agenda
- Data Driving Insurers
- Actuaries' Action on Data
Data Driving Insurers
Data-driven becomes the key priority of insurance companies:
- BUILDING technological platforms using Big Data
- SUPPORTING business functions with Big Data projects
- EXPLORING innovative opportunities to transform insurance

Plenty of innovation projects involving data:
- Platforms: Telematics, Connected Homes, Connected Health, Data Lake, Permanent Connection, Partnerships
- Customers: Acquisition, Value Maximization, Retention, Cross-sell/Upsell (focus area for 2016)
- Analytics-Driven Operations: Claims Cost Control, Underwriting & Pricing
Data Innovation Lab at a glance:
- An international talent pool
- A team of selected experts
- Specific methodologies
- Platforms & tools
- Data!
Data-Driven Insurer: People & Organization
- Governance: Data Board, IT Board, Data Privacy Office
- Offices: Chief Data Office and Chief Information Office
- Key officers: Data Privacy Officer, Chief Data Officer, Information Security Officer
- Business roles: Business Transformer, Data Owner, Data Scientist, Data Steward, Data Architect, Data Engineer, Actuary, Big Data & Analytics Project Manager, Software Engineer

Actuaries are strongly competitive in these areas.
Actuaries are highly familiar with the insurance data cycle:
- Scoping: allocate the right amount of work necessary to complete a project successfully
- Extraction & preparation of data: collection and preparation of data for analytic work, including data modelling and data quality control
- Analysis: deriving information and insights from structured and processed data
- Deployment: launch the project, visualize insights for stakeholders throughout the operation cycle, monitor execution and its compliance, and measure KPIs
Actuaries' Action on Data
Application of new techniques in the world of insurance
Use the same language (R) to build statistical models and to create compelling visualizations and interactive tools to display your results.
Work in collaborative teams with mixed profiles and multiple skills: coders and actuaries in business pricing teams working closely together.
- Strong programmers: know how data is stored, updated and transmitted; technology, apps, building APIs
- Analytical thinking: insurance knowledge & market dynamics; communicating insights to the business
Pricing Hackathon
LARGE CLAIMS DETECTION
Fire is a major source of home insurance claims, driven mainly by large claims:
- Fire: 6% of the number of claims but 29% of claims cost (all other perils: 94% / 71%)
- Within fire: severe fires are 3% of fire claims by number but 67% of fire claims cost (severe claims: amount > 75k)
- Overall, 0.2% of claims explain 19.4% of total claims losses!

Any high-risk profiles? Exposure and frequency were compared across segments: House x Owner, House x Tenant, Apartment x Owner, Apartment x Tenant.
Context
- Need for analysis: 0.2% of claims explain 19.4% of total claims losses
Task
- Predict the probability of a customer having a large fire claim within one year
- Use classical pricing variables and external data
- Use feature engineering and model fine-tuning
KPIs
- Gini index: find the best Gini index available
- 1% lift: predict the riskiest 1% of customers
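Both KPIs can be computed directly from predicted scores and observed outcomes. Below is a minimal pure-Python sketch (illustrative only, not the team's actual code; the toy data and the 20% lift fraction are made up to keep the example small):

```python
def gini(actual, pred):
    """Unnormalized Gini: how well `pred` orders the observed outcomes."""
    n = len(actual)
    order = sorted(range(n), key=lambda i: -pred[i])  # highest score first
    total = sum(actual)
    cum, g = 0.0, 0.0
    for i in order:
        cum += actual[i]
        g += cum
    return (g / total - (n + 1) / 2.0) / n

def gini_normalized(actual, pred):
    """Scale by the Gini of a perfect ranking, so 1.0 = perfect ordering."""
    return gini(actual, pred) / gini(actual, actual)

def lift(actual, pred, frac=0.01):
    """Share of positives captured in the top `frac` of scores vs. random."""
    n = len(actual)
    k = max(1, int(n * frac))
    top = sorted(range(n), key=lambda i: -pred[i])[:k]
    captured = sum(actual[i] for i in top) / sum(actual)
    return captured / frac

# Toy example: the two actual claims receive the two highest scores
actual = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
pred   = [0.9, 0.1, 0.2, 0.8, 0.3, 0.0, 0.1, 0.2, 0.0, 0.1]
```

Here `gini_normalized(actual, pred)` is 1.0 (the ranking is perfect) and `lift(actual, pred, 0.2)` is 5.0 (the top 20% captures all claims, five times the random rate).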
Initial Plan: Two-Build Approach
Data → Model → Collaboration → Result, with two modelling builds:
- Frequency and propensity modelled separately
- Direct severe claims modelling
Development
Frequency and propensity approach:
- Frequency model of all fire claims: simple modelling techniques used (GLMs in R), a practical method well known in the industry
- Propensity model for fire claims: very little data for severe claims, so more advanced techniques are required to ensure adequate model accuracy; initial GLM in R, with PCA used to identify the external data to be included

Direct severe claims modelling approach:
- Data for severe claims is sparse
- Boosting model built in R with xgboost: binary logistic stumps with a learning rate of 0.05
- Minimum child weight of 333, which helps avoid overfitting when the event rate is low
- Missing values assigned 999, which helps manage them after conversion
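The ingredients named above (logistic loss, depth-1 stumps, a small learning rate, a minimum-child-size guard) can be illustrated with a pure-Python sketch of gradient boosting. This is not xgboost itself; the data, the `min_child` value, and the leaf-value rule (plain gradient means rather than xgboost's Newton step) are simplifying assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, grad, min_child):
    """Least-squares stump on the gradients, respecting a minimum child size."""
    best = None
    n, m = len(X), len(X[0])
    for j in range(m):
        for t in sorted(set(row[j] for row in X))[:-1]:
            left  = [grad[i] for i in range(n) if X[i][j] <= t]
            right = [grad[i] for i in range(n) if X[i][j] >  t]
            if len(left) < min_child or len(right) < min_child:
                continue  # analogous to xgboost's minimum-child constraint
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((g - lm) ** 2 for g in left)
                   + sum((g - rm) ** 2 for g in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best

def boost(X, y, rounds=300, lr=0.05, min_child=1):
    """Gradient boosting for binary logistic loss with stump base learners."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1.0 - p))  # log-odds intercept
    F = [f0] * len(y)
    stumps = []
    for _ in range(rounds):
        grad = [y[i] - sigmoid(F[i]) for i in range(len(y))]  # negative gradient
        s = fit_stump(X, grad, min_child)
        if s is None:
            break
        _, j, t, lm, rm = s
        stumps.append((j, t, lm, rm))
        for i in range(len(y)):
            F[i] += lr * (lm if X[i][j] <= t else rm)
    return f0, stumps

def predict_proba(x, f0, stumps, lr=0.05):
    F = f0 + sum(lr * (lm if x[j] <= t else rm) for j, t, lm, rm in stumps)
    return sigmoid(F)
```

On a toy dataset where a single feature separates the classes, a few hundred rounds at learning rate 0.05 push the predicted probabilities towards 1 and 0 on the two sides of the split.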
External Data (Propensity Model)
Weather variables:
- Data aggregated before performing the analysis
- PCA determined that moisture and temperature should be tested in the new model
Geographical variables:
- Ranks and proportions derived for factors
- Provide no additional improvement to the model once weather data is added
- Do not improve the model as much as the external weather data when added before it
Frequency and Propensity Model Results (validation Gini coefficients)
- Initial internal propensity model: 0.12
- Propensity model with weather data: 0.14
- Internal frequency model: 0.27
- Combined model: 0.18
Direct Severe Claims Model: Boosting Model Including External Data
Results:
- Gini coefficient: 0.31
- Policies with fire claims: 204 (206 claims), of which 15 (15 claims) had large fire claims, a large-claim rate of 7.2%
- Fire claim loss: 4.3M, of which large fire claims account for 3.3M (76%)
For comparison, the original distribution of fire claims: severe fires (amount > 75k) are 3% of fire claims by number but 67% of claims cost.
FROM GBM TO GLM
WHY IS THIS TOPIC VALUABLE FOR PRICING INNOVATION?

GLM = Generalized Linear Model: g(E[Y|X]) = β0 + β1·X1 + … + βm·Xm
Advantages:
- "All things being equal" interpretation
- Familiar tool for actuaries
Drawbacks:
- Does not fit non-linear relations
- Does not automatically include interactions
- Does not automatically select features

GBM = (Tree-based) Gradient Boosting Machine: g(E[Y|X]) = Σ_{t=1}^{T} Tree(X1, …, Xm; Θt)
Advantages:
- Detects interactions
- Feature selection through variable importance
- Able to fit non-linear relations
Drawbacks:
- No "all things being equal" interpretation
- Requires fine-tuning of the model
Can we use the outputs of GBM to improve GLM models?
- Feature selection: variable importance
- Detection of interactions: interaction plots
- Feature transformation: tree/GLM/GAM inversion
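The first idea, feature selection via variable importance, can be sketched in a few lines of pure Python: boost regression stumps on residuals and count how often each feature is chosen for a split. This is a simplified stand-in (real GBM implementations usually weight each split by the loss reduction it achieves, not a bare count, and the toy data is invented):

```python
def fit_stump(X, resid):
    """Best single split (feature, threshold) for squared error on residuals."""
    best = None
    n, m = len(X), len(X[0])
    for j in range(m):
        for t in sorted(set(row[j] for row in X))[:-1]:
            left  = [resid[i] for i in range(n) if X[i][j] <= t]
            right = [resid[i] for i in range(n) if X[i][j] >  t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best

def split_frequency_importance(X, y, rounds=10, lr=0.3):
    """Boost stumps on residuals; count how often each feature is split on."""
    n, m = len(y), len(X[0])
    F = [sum(y) / n] * n
    counts = [0] * m
    for _ in range(rounds):
        resid = [y[i] - F[i] for i in range(n)]
        _, j, t, lm, rm = fit_stump(X, resid)
        counts[j] += 1
        for i in range(n):
            F[i] += lr * (lm if X[i][j] <= t else rm)
    return counts  # frequently split features are candidates for the GLM

# Toy data: y depends only on feature 0; feature 1 is pseudo-noise
X = [[float(i), float((i * 7) % 5)] for i in range(10)]
y = [2.0 * i for i in range(10)]
counts = split_frequency_importance(X, y)
```

Features that the ensemble splits on most often are the ones worth carrying into the GLM, possibly after the tree-derived transformations listed above.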