Using Advanced Analytics to Identify Fraud in Property and Casualty Insurance


AN EXL WHITE PAPER

Using Advanced Analytics to Identify Fraud in Property and Casualty Insurance

Neeraj Sibal, Senior Manager, Analytics
lookdeeper@exlservice.com

The Coalition Against Insurance Fraud (CAIF) estimates that insurance fraud costs insurers $80B annually. In the Property & Casualty (P&C) sector alone, fraud-related losses are estimated at $30B annually. While fraud constantly evolves and affects all types of insurance, the most common types in terms of frequency and average cost are automobile insurance, workers' compensation, and health and medical insurance.

With advancements in computing, analytics now plays a greater role in helping insurance companies identify fraudulent claims, in some cases flagging areas of concern even before the fraud occurs. The role data plays in today's market varies for each insurer, as each weighs the cost of improving information systems against the losses caused by fraud. Using advanced analytics, it is possible to implement efficient fraud detection strategies. This paper demonstrates how and why advanced analytics can assist in identifying and decreasing the number of fraudulent claims.

Advanced Analytics in Fraud Detection

Insurance fraud poses a major threat to insurers and policyholders alike. Because fraud creates significant financial cost, insurers pass this cost on to policyholders in the form of increased premiums. As a result, some consumers and businesses who could previously afford some form of insurance coverage are driven out of the market.

Who is committing fraud? Anyone, including applicants, policyholders, third-party claimants, and professionals who provide services and equipment to claimants. Fraudulent claims come in many forms.
These include:

- Claims for injuries or damage that never occurred, services never rendered, or equipment never delivered
- Inflated claims
- Staged accidents
- Misrepresentation of facts on an insurance application

With fraudsters using sophisticated techniques that make it difficult for insurers to tell fraudulent claims from honest ones, there has been a rapid upsurge in fraudulent activity. Per a survey released in September 2013 by FICO, 35 percent of insurers estimated that insurance fraud represents 5-10 percent of their total claims cost, while 31 percent said the

cost is as high as 20 percent. Ever-changing economic circumstances pose yet another challenge for insurers trying to deal with fraud. Businesses and individuals are showing an increased investment appetite, and some of this money is falling into the hands of fraudsters. To combat fraud, it is imperative for companies to invest in fraud monitoring programs. As a result, insurers are relying heavily on data analytics for fraud prevention, detection, and management.

Traditionally, fraud detection strategies focused on identifying fraud after the claim was paid; however, it is easier to reduce losses if the fraud is identified before the claim is paid. The first step towards fraud prevention is to employ techniques such as predictive modeling and claims scoring that uncover fraud before a payment is made. Since insurers hold large amounts of data, it makes sense to use advanced analytics techniques to evaluate internal and external data and identify claims with a higher propensity of being fraudulent. By detecting patterns and anomalies in a large database, analytical tools can determine the probable characteristics of a fraudster and whether a claim warrants further investigation.

Advanced analytics techniques like logistic regression and the Gradient Boosting Model (GBM) can often identify potential cases of fraud at First Notice of Loss (FNOL), instead of waiting weeks or months for an adjuster to review a claim. This is just one of the ways they can be used. Since advanced analytics can assess an individual's propensity to engage in fraud before it occurs, it can also be applied in the sales and underwriting process, before a policy is sold. The use of advanced analytics has reportedly decreased fraud losses for a few carriers by 20 to 50 percent.

One of the biggest challenges in limiting fraud loss has been the significant resources needed to examine the high volume of claims.
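When review capacity is scarce, model scores can determine which claims get pulled. The sketch below shows the basic triage step, ranking claims by a model's fraud score and sending only the top slice for review; the scores and the 10 percent review rate are hypothetical:

```python
def select_claims_for_review(scores, review_rate=0.10):
    """Return the indices of the top `review_rate` share of claims,
    ranked by model fraud score (riskiest first)."""
    n_review = max(1, round(len(scores) * review_rate))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i],
                    reverse=True)
    return ranked[:n_review]

# Hypothetical fraud scores for 20 claims
scores = [0.02, 0.91, 0.10, 0.05, 0.33, 0.87, 0.01, 0.04, 0.15, 0.02,
          0.76, 0.03, 0.08, 0.22, 0.05, 0.11, 0.02, 0.64, 0.09, 0.07]
print(select_claims_for_review(scores))  # [1, 5]
```

With a well-calibrated model, the 5 to 10 percent of claims an insurer can afford to investigate are then the riskiest ones rather than an arbitrary sample.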
In our experience, insurers can typically review only 5 to 10 percent of all claims in a financial year, and those claims are often pulled based on less-than-selective criteria. But what if you were able to pull the right claims? That's where advanced analytics comes in.

Analytics Framework for Detection of Fraud Claims

Data analytics can add incremental value in identifying fraud at each stage of the claim process. These stages include: First Notice of Loss (FNOL) is the stage at which the claimant first notifies the insurer that a loss has occurred.

First Contact (FC) is the stage at which the insurer contacts the claimant, after FNOL, to ask for more information about the loss. Ongoing (OG) is the continuous back-and-forth of information between claimant and insurer after FC, until the claim is closed.

This flow of information makes it most valuable to carry out robust identification of potential fraud at the first stage; the next two stages can then be used to steer investigations in a particular direction. To demonstrate how this works, we present a case study of a U.S.-based insurer's plausible-fraud prediction project, along with the stair-step approach to data analytics that we proposed. Note that the insurer data for the case study spans 2005 to mid-2013 and pertains to property in the commercial and personal lines of business.

Step 1: Collating the Right Data

Regardless of the insurance product, there are multiple facets in play, making it important to find the right data. To uncover the factors indicating fraudulent behavior, an exhaustive data sourcing exercise was undertaken, considering both internal and external data. Internal data comprised information on claims, policies, customers, claimants, and similar attributes, while external data, i.e., data not captured by the insurer, was gathered from different sources. This included regional demographics, industry-standard scores, information on the weather conditions that prevailed when the loss occurred, and information on catastrophes that may have occurred during the time period of interest.

In this particular case, there were 21 internal data tables and 13 external data tables covering more than 1,500 variables. These data tables were woven together at a claim level to create a master dataset. The variables were classified according to their availability at each claim stage and made available accordingly for the three stages of model building.
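The "weaving together at a claim level" step is, mechanically, a series of claim-keyed joins. A miniature sketch with pandas, using hypothetical table and column names (the real exercise joined 21 internal and 13 external tables):

```python
import pandas as pd

# Hypothetical miniature versions of the source tables
claims = pd.DataFrame({
    "claim_id": [101, 102, 103],
    "policy_id": [1, 1, 2],
    "loss_amount": [5000, 1200, 800],
})
policies = pd.DataFrame({            # internal: policy attributes
    "policy_id": [1, 2],
    "policy_age_days": [45, 900],
})
weather = pd.DataFrame({             # external: conditions at loss date
    "claim_id": [101, 102, 103],
    "storm_flag": [1, 0, 0],
})

# Join everything at claim level to form the master modeling dataset
master = (claims
          .merge(policies, on="policy_id", how="left")
          .merge(weather, on="claim_id", how="left"))
print(master.shape)  # (3, 5)
```

Left joins keep every claim in the master dataset even when an external source has no matching record, which is the behavior you want for a claim-level modeling table.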
Of the 1,500 variables, nearly 100 were created using text-mining techniques. Text mining can be a cumbersome process, as it works on volumes of data held in free-text fields. Text-mining tools convert this unstructured data into structured fields, which can then be used alongside conventional data fields. A 3-gram text-mining approach, which uses three-word phrases, was used to create the variables, because word phrases have higher predictive power than individual words. Notably, every variable that showed up as significant in the First Contact (second) stage of the model-building exercise had its origins in unstructured data.
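A minimal sketch of the 3-gram extraction step, on hypothetical loss descriptions, turning free text into countable three-word phrases that can become structured indicator fields:

```python
from collections import Counter

def three_grams(text):
    """Extract three-word phrases (3-grams) from a loss description."""
    words = text.lower().split()
    return [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]

# Hypothetical loss descriptions
descriptions = [
    "laptop lost in mysterious disappearance from vehicle",
    "pipe burst caused water damage in kitchen",
]
# Each distinct 3-gram becomes a candidate structured indicator field
counts = Counter(g for d in descriptions for g in three_grams(d))
print("mysterious disappearance from" in counts)  # True
```

In practice the resulting phrase counts are pruned heavily (by frequency and predictive power) before any of them enter the modeling dataset.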

Step 2: Applying Analytics Techniques

Once variables were identified, we chose two techniques, logistic regression and GBM, to identify fraudulent claims. While both had their pros and cons, each provided valuable insights into the data. Eventually, we compared the two to conclude which worked better for the project at hand.

A. Logistic Regression

Logistic regression is a statistical method for analyzing a dataset in which one or more independent variables determine an outcome. This predictive analytics technique produces an outcome measured with a dichotomous variable (one with only two possible outcomes). Plausible fraudulent claims are a rare event, accounting for less than 1 percent of all claims. Because logistic regression underestimates the probability score for rare events, an oversampled dataset with an event rate of at least 5 percent was created to ensure unbiased results.

Since the flow of information comes in three stages (FNOL, FC, and OG), a residual modeling technique was employed for the logistic regression: the logistic score from one stage appears as an offset variable in the subsequent stage. The advantage of this method is that the information gains made in one stage are carried forward to the next. The insurer's heads of claims, underwriting, and customer service count on achieving a certain percentage of straight-through processing (STP), decreasing the time from FNOL to claim resolution. Claims that don't pass the FNOL stage can be flagged as risky and routed for further assessment; as claims move across the stages, we gain more information on whether they are fraudulent or genuine.

B. Gradient Boosting Model (GBM)

GBM is a machine learning technique that aims to improve on a single model by fitting many models and combining them for prediction.
With GBM, the need to create an oversampled dataset does not arise, and the modeling exercise can be performed by gradient boosting of classification trees. Since GBM doesn't support sequential modeling, a parallel model development approach was followed at each of the three stages: FNOL, FC, and OG.
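Section A's residual modeling, where the logistic score of one stage enters the next stage as an offset, can be sketched in plain numpy. The data and variable counts are hypothetical, and a simple gradient-descent fit stands in for the SAS logistic procedure:

```python
import numpy as np

def fit_logistic(X, y, offset=None, lr=0.1, steps=2000):
    """Plain-numpy logistic regression via gradient descent.
    `offset` is a fixed term added to the linear predictor, which is
    how the previous stage's score is carried forward."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    off = np.zeros(n) if offset is None else offset
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b + off)))
        g = p - y                      # gradient of the log-loss
        w -= lr * (X.T @ g) / n
        b -= lr * g.mean()
    return w, b

rng = np.random.default_rng(0)
n = 600
X_fnol = rng.normal(size=(n, 3))       # variables known at FNOL
X_fc = rng.normal(size=(n, 2))         # variables first seen at FC
y = (X_fnol[:, 0] + X_fc[:, 0] + rng.normal(scale=0.5, size=n) > 2).astype(int)

# Stage 1 (FNOL): ordinary logistic fit on FNOL variables
w1, b1 = fit_logistic(X_fnol, y)
fnol_score = X_fnol @ w1 + b1

# Stage 2 (FC): the FNOL score enters as an offset, so the fit only
# estimates the incremental information in the new FC variables
w2, b2 = fit_logistic(X_fc, y, offset=fnol_score)
fc_prob = 1.0 / (1.0 + np.exp(-(X_fc @ w2 + b2 + fnol_score)))
print(fc_prob.shape)  # (600,)
```

Because the offset fixes the stage-1 contribution, the stage-2 coefficients measure only what the new variables add, which is the information-carry-forward property described above.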

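The parallel, per-stage GBM development described in section B might look like the following sketch, with scikit-learn's GradientBoostingClassifier standing in for the GBM and purely illustrative data and variable counts:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 500
# Hypothetical labels with a ~5% fraud rate; note that, unlike the
# logistic approach, no oversampling is applied before fitting
y = (rng.random(n) < 0.05).astype(int)

# Each stage sees a wider variable set than the last, so a separate
# model is developed in parallel at FNOL, FC and OG
stage_features = {
    "FNOL": rng.normal(size=(n, 5)),
    "FC": rng.normal(size=(n, 12)),
    "OG": rng.normal(size=(n, 20)),
}

models = {
    stage: GradientBoostingClassifier(n_estimators=50, max_depth=3,
                                      random_state=0).fit(X, y)
    for stage, X in stage_features.items()
}
scores = {stage: m.predict_proba(stage_features[stage])[:, 1]
          for stage, m in models.items()}
print(sorted(scores))  # ['FC', 'FNOL', 'OG']
```

Because the three models are independent, the later-stage scores do not automatically inherit the earlier stages' information the way the offset-chained logistic models do; each model simply sees a richer feature set.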
Step 3: Running the Analysis and Analyzing Results

Before running the logistic regression analysis, a standard variable selection approach was carried out: elimination of variables based on fill rates, then correlation and clustering analysis, followed by stepwise selection in the logistic procedure using SAS. Further shortlisting was done to remove any multi-collinearity. For GBM, no such treatment was required. The table below shows a comparative analysis of the two techniques:

Stage 1: First Notice of Loss
              Test                  Validation
              Logistic    GBM       Logistic    GBM
Lift @ 10%    61.47%      81.90%    68.08%      81.30%
Lift @ 20%    81.35%      88.50%    84.36%      89.80%
Precision     67.98%      18.90%    20.66%      16.60%
AUC           89.80%      92.90%    89.78%      92.70%
K-S           64.78%      72.40%    65.00%      72.20%

Stage 2: First Contact
              Test                  Validation
              Logistic    GBM       Logistic    GBM
Lift @ 10%    82.96%      92.50%    87.85%      92.30%
Lift @ 20%    93.44%      95.60%    94.46%      95.20%
Precision     74.47%      25.30%    29.37%      24.90%
AUC           95.75%      96.90%    95.57%      96.80%
K-S           79.13%      84.20%    78.34%      83.90%

Stage 3: Ongoing
              Test                  Validation
              Logistic    GBM       Logistic    GBM
Lift @ 10%    89.03%      97.90%    91.99%      98.20%
Lift @ 20%    95.43%      99.10%    95.97%      99.10%
Precision     79.73%      40.80%    31.45%      39.60%
AUC           97.04%      99.30%    96.88%      99.30%
K-S           83.29%      92.90%    82.77%      92.80%

The three stage-wise logistic models for FNOL, FC, and OG underperformed GBM in terms of lift and K-S statistics. However, the precision of all three logistic models was better than GBM's. Higher precision (fewer false positives) takes priority over the other performance measures in evaluating claim fraud, as it helps insurance companies optimize fraud detection, decrease costs, and increase recovery. Therefore, in this particular case, logistic regression scores over GBM.

Which Fraud Identification Approach Is Better: Logistic Regression or GBM?

The two techniques run different algorithms in the background. Logistic regression involves human intervention at multiple stages. GBM, on the other hand, is purely a machine learning algorithm and requires minimal human intervention. Owing to this, a logistic regression analysis can be controlled to a fair degree, but it has to satisfy certain assumptions.
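The lift and K-S measures in the table can be computed as below. The paper does not spell out its exact formulas, so treat these definitions (capture-rate lift; K-S as the maximum TPR-FPR gap over the score ranking) as an assumption consistent with common usage, shown here on toy data:

```python
def lift_at(y_true, scores, frac=0.10):
    """Share of all fraud captured in the top `frac` of claims by score."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i],
                    reverse=True)
    k = max(1, int(len(scores) * frac))
    return sum(y_true[i] for i in ranked[:k]) / sum(y_true)

def ks_statistic(y_true, scores):
    """Kolmogorov-Smirnov: maximum gap between the cumulative score
    distributions of fraud and non-fraud claims."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i],
                    reverse=True)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    tpr = fpr = best = 0.0
    for i in ranked:
        if y_true[i]:
            tpr += 1.0 / n_pos
        else:
            fpr += 1.0 / n_neg
        best = max(best, tpr - fpr)
    return best

# Toy example: 10 claims, 2 frauds, a model that ranks both frauds on top
y = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
s = [0.9, 0.2, 0.1, 0.8, 0.3, 0.2, 0.1, 0.1, 0.2, 0.1]
print(lift_at(y, s, 0.10))   # 0.5 (top claim holds 1 of the 2 frauds)
print(ks_statistic(y, s))    # 1.0 (perfect separation on this toy data)
```

The remaining columns are standard: scikit-learn's precision_score and roc_auc_score cover precision and AUC directly.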

In terms of output, GBM produces a scored dataset: a probability value for every observation in the training/modeling dataset, together with a variable importance list (sorted in descending order, with the most important variable on top). GBM reveals nothing about the direction or magnitude of the effect of any predictor variable on the predicted variable. However, the technique can handle both non-linearity and non-monotonicity, which are quite common in fraud identification problems.

Logistic regression delivers a mathematical equation as output. This equation can explain the effect of every predictor variable on the predicted variable, which is an advantage of logistic regression. On the flip side, regression drops observations with missing variable values from the modeling exercise, or requires those values to be imputed. GBM requires no such treatment, and it also handles multi-collinearity on its own, unlike logistic regression, which needs multi-collinearity to be identified and dealt with by introducing interactions or removing certain variables. Interaction terms and variable transformations for logistic regression are subject to the discretion of the person building the model, whereas GBM introduces and tests interactions and variable transformations itself. GBM performs variable selection for fraud identification at run time, which takes considerably more processing time than logistic regression. Logistic regression, on the other hand, requires an extensive variable selection/elimination process involving steps such as elimination on the basis of correlation value, variable clustering, information value, and a stepwise procedure.

It is apparent that GBM holds an edge over logistic regression, as linearity is not expected in fraud detection data. That said, businesses rely heavily on insights from the mathematical equation generated by logistic regression.
To implement any changes in the way a firm carries out its business, logistic regression offers flexibility through a clear understanding of the cause-and-effect relationship between dependent and independent variables. But this doesn't imply that logistic regression fares better than GBM in every situation. Based on what a business aims to achieve from an analytics exercise, the appropriate technique can be chosen, as both have their own merits and limitations.
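One way to use the two techniques in tandem, sketched below on hypothetical data, is to stack a logistic model on top of a GBM score: the GBM captures non-linear structure, and the final logistic equation remains interpretable. The feature construction and model settings here are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 6))
# Hypothetical labels carrying signal in the first variable
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 1.5).astype(int)

# Step 1: the GBM captures non-linearities and interactions
gbm = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)
gbm_score = gbm.predict_proba(X)[:, 1].reshape(-1, 1)

# Step 2: stack a logistic model on top, with the GBM score added as a
# predictor, keeping an interpretable final equation
X_stacked = np.hstack([X, gbm_score])
stacked = LogisticRegression(max_iter=1000).fit(X_stacked, y)
print(stacked.coef_.shape)  # (1, 7)
```

In a production setting the GBM score would be generated out-of-fold to avoid leaking training labels into the stacked model.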

Ways to Manage Fraudulent Claims in Your Organization

Organizations should not rely on predefined fraud variable lists based on a historical understanding of the insurance industry; they should instead leverage predictive modeling to identify the fraudulent claims specific to their organization. In the case discussed above, logistic regression revealed some new variables influencing fraud claims in both internal and external databases. In general, in the P&C industry, a fraudulent claim exhibits the following characteristics:

- A large number of contents damaged or missing
- Missing or stolen contents that typically include electronic items like mobile phones, laptops, and cameras
- Issues pertaining to coverage of the reported loss
- Proximity of the claimant to a service center or claim vendor
- Increased likelihood of a claim arising from a new policy compared with an old policy
- Presence of keywords like "mysterious", "disappearance", "lost", "unknown", "strange" or "undetermined" in the loss description field
- Past arrests of the claimant for any criminal activity

The logistic-based analysis confirmed the above factors to be drivers of fraud claims. Beyond these, several other factors emerged as significant in the model we developed.
These new fraud indicator factors were:

- The number of utility cards a claimant holds, which shows a negative impact on the propensity to file a fraudulent claim
- The presence of terms like "divorced", "separated" or "domestic dispute", which has a very strong positive relation with a claim being fraudulent
- A loss description containing words like "water leak" or "pipe burst", which significantly lowers the chances of the claim being fraudulent
- The presence of terms such as "theft", "broken" or "police report", which tends to increase the chances of the claim being fraudulent

Conclusion

An effective and efficient notice-of-loss organization, combined with fast-track claim handling during FNOL, can allow insurers to realign claim resources toward more complex claim-handling activities. In this whitepaper we explored two approaches, logistic regression and GBM, separately, to understand their effectiveness and interactions with the data at hand. But it isn't necessary to use these techniques independently; stacking logistic regression on top of GBM might beat the results of using GBM alone on the dataset. Use of logistic regression and GBM is not mutually exclusive; analysts

can use both of them in tandem. Logistic regression can help by identifying trends and uncovering causal relationships; machine learning models like GBM can be used to extract greater predictive performance from a particular dataset. Various other approaches can be considered, which are beyond the scope of this whitepaper.

In the end, the client experienced results in line with the improvement shown by the model, which indicated a lift at 10 percent for FNOL of 61.47 percent. This means the model can identify 61.47 percent of all fraud claims while investigating only the top 10 percent of claims ranked by score. Exploring such varied advanced approaches based on business objectives can, and in this case did, enable greater success in claim fraud identification and likely saved the company from loss. Not only did the insurer stop payment of fraudulent claims, it avoided the additional cost of litigation needed to recover lost funds.

The roadmap for successful transformation is clear: insurers must move on from today's outdated fraud strategies to advanced data analytics.

References

1. http://www.naic.org/cipr_topics/topic_insurance_fraud.htm
2. http://www.iii.org/issue-update/insurance-fraud
3. http://www.insurancenexus.com/fraud/role-data-and-analytics-insurance-fraud-detection
4. http://www2.deloitte.com/content/dam/Deloitte/us/Documents/financial-services/us-fsi-a-call-to-action-080912.pdf
5. http://www.irs.gov/uac/examples-of-money-Laundering-Investigations-Fiscal-Year-2013
6. http://www.port.ac.uk/media/contacts-and-departments/icjs/ccfs/profile-insurance-fraudster.pdf
7. http://www.bentley.edu/centers/sites/www.bentley.edu.centers/files/centers/cwb/olinsky.pdf
8. http://www.medicine.mcgill.ca/epidemiology/joseph/courses/epib-621/logfit.pdf
9. https://home.kpmg.com/uk/en/home/insights/2017/01/uk-fraud-value-reaches-1bn-first-time-five-years.html
10. http://deloitte.wsj.com/cio/2012/10/02/its-role-in-fighting-insurance-fraud/

EXL (NASDAQ: EXLS) is a leading operations management and analytics company that designs and enables agile, customer-centric operating models to help clients improve their revenue growth and profitability. Our delivery model provides market-leading business outcomes using EXL s proprietary Business EXLerator Framework, cutting-edge analytics, digital transformation and domain expertise. At EXL, we look deeper to help companies improve global operations, enhance data-driven insights, increase customer satisfaction, and manage risk and compliance. EXL serves the insurance, healthcare, banking and financial services, utilities, travel, transportation and logistics industries. Headquartered in New York, New York, EXL has more than 26,000 professionals in locations throughout the United States, Europe, Asia (primarily India and Philippines), South America, Australia and South Africa. 2017 ExlService Holdings, Inc. All Rights Reserved. For more information, see www.exlservice.com/legal-disclaimer Email us: lookdeeper@exlservice.com On the web: EXLservice.com GLOBAL HEADQUARTERS 280 Park Avenue, 38th Floor, New York, NY 10017 T: +1.212.277.7100 F: +1.212.277.7111 United States United Kingdom Czech Republic Romania Bulgaria India Philippines Colombia South Africa