Session 3. Life/Health Insurance technical session

SOA Big Data Seminar 13 Nov. 2018 Jakarta, Indonesia Session 3 Life/Health Insurance technical session Anilraj Pazhety

Life/Health Technical Session
Anilraj Pazhety, MS (Business Analytics), MBA, BE (CS)
Data Innovation Specialist, Asian Markets

Agenda
- Big Data in Life Insurance
- Natural Language Processing (NLP)
- Convert text to machine-readable format
- Model framework for a text classifier
- NLP applications in Life and Health Insurance

Big Data in Life Insurance
- Life insurers are lagging behind when it comes to embracing the benefits of big data.
- Life insurers collect a substantial amount of data during the application process (insurer, customers, application form).
- Post-issuance: limited customer interaction and limited data during the policy life cycle.
- Dynamics are changing due to an increase in customer touch points: CRM, IoT, social networks.

Data sources being sought and used within advanced analytics applications
- Desirable attributes: relevant, granular, voluminous, accurate, timely, permissioned.
- Potential data sources: credit bureau, bank transactions, credit card transactions, marketing, non-life, wearable devices, social media, retail transactions, mobile usage, life assurance.

Natural Language Processing (NLP) in Life / Health Insurance
- NLP aims to develop algorithms that process human language, written or oral.
- Raw text data is available from a wide range of sources in life / health insurance: medical records, websites, prescriptions, customer care, chatbots, agent notes.
- There is big potential for insurers to leverage data from these sources to derive information that can drive intelligent data analysis and improved decision making.
- To achieve any level of artificial intelligence, it is imperative that machines can process text data.

Text Pre-processing
Text is the most unstructured form of data and hence needs pre-processing to transform it into an intelligible format.
Pipeline: raw text -> remove stop words and punctuation -> tokenization -> lemmatization / stemming -> clean text

Stop word categories
- Language: the, of, was, are, is
- Location: Hong Kong, Jakarta
- Time / numeral: weekdays, year
- Domain specific

Tokenization
- Breaking a sentence into single words (tokens).
- Example: "Suggested Paracetamol three times a day" -> Suggested | Paracetamol | Three | Times | A | Day

Lemmatization / stemming
- The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form.
- Stemming: car, cars, car's -> car
- Lemmatization: am, are, is -> be
Source: https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html
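The pre-processing pipeline above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the presenter's implementation: the stop-word list is a hypothetical, deliberately tiny one, and `naive_stem` is a toy plural-stripping rule rather than a real stemmer such as Porter's. The sketch also tokenizes before removing stop words, which is simply the easier order to code.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real projects use a fuller,
# domain-specific list.
STOP_WORDS = {"the", "of", "was", "are", "is", "a", "an", "and", "to"}

def tokenize(text):
    """Lower-case the text, drop possessive 's, and split into alphabetic tokens."""
    text = re.sub(r"'s\b", "", text.lower())
    return re.findall(r"[a-z]+", text)

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop-word list."""
    return [t for t in tokens if t not in stop_words]

def naive_stem(token):
    """Toy stemmer: strip a single trailing plural 's' from longer words."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def preprocess(text):
    """Raw text -> tokens -> stop words removed -> stemmed tokens."""
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]

clean = preprocess("Suggested Paracetamol three times a day.")
print(clean)
print(preprocess("Car, Cars, Car's"))
```

A lemmatizer (mapping "am, are, is" to "be") needs a dictionary of word forms and is therefore left out of this sketch.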

Document Term Matrix (DTM)
- The most basic component in text analytics.
- Machines understand only numbers; a DTM is a numeric representation of a given text after tokenization.
- Layout on the slide: rows Doc 1 to Doc 5, columns Term 1 to Term 5.
- A two-dimensional matrix whose rows represent documents and whose columns represent terms. Hence, each entry (i, j) corresponds to term j in document i.

Term Frequency (TF) matrix
- A simple technique to identify the relevance of a word in a given document.
- The more frequent a word is, the more relevance it holds in the document.

Inverse Document Frequency (IDF) matrix
- Based on the principle that less frequent words are more meaningful.
- IDF(t) = log(N / n_t), where N is the total number of documents and n_t is the number of documents containing term t.
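As a rough illustration of the matrices described above, the following sketch builds a small document term matrix and the corresponding IDF values with the standard library only. The three documents are made up for the example, and the base-10 logarithm is an assumption, since the slide does not state a base. Multiplying each count by the term's IDF gives the TF-IDF weighting.

```python
import math
from collections import Counter

# Made-up, already tokenized documents (illustrative data only).
docs = {
    "Doc 1": ["exercise", "daily", "yoga"],
    "Doc 2": ["exercise", "codeine", "daily"],
    "Doc 3": ["codeine", "codeine", "ativan"],
}

# Vocabulary: every distinct term, in a fixed (alphabetical) column order.
vocab = sorted({term for tokens in docs.values() for term in tokens})

# DTM: one row per document, one column per term, entries are raw counts (TF).
dtm = {name: [Counter(tokens)[term] for term in vocab]
       for name, tokens in docs.items()}

# IDF(t) = log10(N / n_t); the base 10 is an assumption of this sketch.
idf_values = {
    term: math.log10(len(docs) / sum(1 for tokens in docs.values() if term in tokens))
    for term in vocab
}

# TF-IDF: each count multiplied by the IDF of its term.
tfidf = {name: [count * idf_values[term] for count, term in zip(row, vocab)]
         for name, row in dtm.items()}

print(vocab)
for name, row in dtm.items():
    print(name, row)
```

Terms that occur in every document get an IDF of zero under this formula, which is why very common words drop out of the weighted matrix.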

TF-IDF matrix
- The product of TF and IDF.
- If a word appears multiple times in a document, it should be more meaningful than other words.
- BUT if a word appears many times in a document and also in many other documents, it may be a stop word or a frequent word in that particular domain.

N-grams
- An n-gram is just a sequence of N words.
- Unigram: "Exercise"; bigram: "Exercises daily"; trigram: "Yoga exercises daily".
- Information increases with N.

Word Embedding: Vector Space Models
- The underlying idea is to represent documents as matrices or arrays.
- This makes it possible to represent documents geometrically and to compare them mathematically.

                  Ativan   Codeine
Prescription 1       7        9
Prescription 2      10        2
Prescription 3       9        6

Plotted as points: P1 = (7, 9), P2 = (10, 2), P3 = (9, 6).
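The n-grams mentioned above are straightforward to generate from a token list. The helper below is a minimal sketch; the function name `ngrams` is chosen for illustration and is not tied to any particular library.

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["yoga", "exercises", "daily"]
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(unigrams)
print(bigrams)
print(trigrams)
```

Longer n-grams carry more context but occur less often, so the resulting matrices grow wider and sparser as N increases.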

Comparing Documents
Definition of cosine similarity: cos(θ) = (A · B) / (||A|| × ||B||)
With P1 = (7, 9), P2 = (10, 2) and P3 = (9, 6):
- Cosine Similarity (P1, P2) = ((7 × 10) + (9 × 2)) / (√(7² + 9²) × √(10² + 2²)) ≈ 0.757
- Cosine Similarity (P1, P3) = ((7 × 9) + (9 × 6)) / (√(7² + 9²) × √(9² + 6²)) ≈ 0.949
Prescription 1 is therefore closer to Prescription 3 than to Prescription 2.

Statistical models using NLP
- Text clustering: given a set of text, the model creates clusters of similar words.
- Topic modeling: given a set of documents, identifies the different topics within each document and across documents.
- Text summarization: given a long sequence of paragraphs, returns a short summary consisting of key points.
- Sentiment analysis: identifies sentiments based on the context and meaning of words.
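The cosine similarity figures for the three prescriptions can be reproduced with a short function. This is a plain standard-library sketch; the vectors are the (Ativan, Codeine) counts from the prescription table.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# (Ativan, Codeine) counts for the three prescriptions.
p1, p2, p3 = (7, 9), (10, 2), (9, 6)

sim_12 = cosine_similarity(p1, p2)
sim_13 = cosine_similarity(p1, p3)
print(round(sim_12, 3), round(sim_13, 3))  # 0.757 0.949
```

Because cosine similarity depends only on the angle between the vectors, a prescription with twice the quantities of both drugs would score 1.0 against the original.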

Model Development Overview: Text Classification
- A technique to classify a document into one or more categories.
- It can be used to detect the presence of certain words, filter documents based on keywords, etc.
- Natural language classifier pipeline: input text (patient records, prescriptions, user reviews, etc.) -> text preprocessing (tokenization) -> feature extraction -> model training with machine learning algorithm(s) -> label

NLP Applications in Life / Health Insurance
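To make the classifier framework concrete, here is a minimal sketch of one possible machine learning algorithm for it: a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. The slides do not say which algorithm was used, so this choice is an assumption, and the training notes and labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train(labelled_docs):
    """Count documents per label and term occurrences per label."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labelled_docs:
        class_counts[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab

def predict(tokens, model):
    """Return the label with the highest log-probability for the tokens."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_terms = sum(term_counts[label].values())
        for t in tokens:
            if t in vocab:  # terms never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((term_counts[label][t] + 1)
                                  / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented, pre-tokenized training notes with hypothetical labels.
training = [
    (["chest", "pain", "ecg"], "cardiology"),
    (["heart", "murmur", "ecg"], "cardiology"),
    (["skin", "rash", "cream"], "dermatology"),
    (["itchy", "rash", "ointment"], "dermatology"),
]
model = train(training)
print(predict(["rash", "cream"], model))
print(predict(["ecg", "pain"], model))
```

In practice the token counts would typically be replaced by TF-IDF or embedding features, and the model would be evaluated on held-out documents before use.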

NLP for Document Digitization
- NLP techniques are being used to speed up the digitization of medical records.
- Digitization is crucial for businesses to advance into the modern age.
- Flow: application forms -> scan -> OCR -> named entity tagging -> remove irrelevant text -> UW systems
- Reduces errors and omissions introduced by manual data entry.
- Increases accessibility, communication and collaboration, frees up a lot of space and, more importantly, saves money.

Word Embedding Applications
- NLP techniques are used to understand biological sequences such as DNA and RNA.
- Protein structures are similar to human language in terms of composition. Hence, researchers are treating protein sequences as text and using existing NLP techniques to study them.
- These techniques are similar to the approaches used in NLP to identify relationships between words in a given sentence or between sentences in a given document.
- Word embeddings (vectors) are used to represent biological sequences over a large set of sequences, and to establish physical and chemical interpretations for such representations.
Citation: Asgari E, Mofrad MRK (2015). Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics. PLoS ONE 10(11): e0141287. https://doi.org/10.1371/journal.pone.0141287

Word Embedding Applications
- These algorithms accept whole protein structures (structure alignment) as text and parse the sequence to search for corresponding patterns (sequence alignment).
- The results of these alignments are traditionally presented as color-coded, one-dimensional sequential information. Each row represents a unique protein structure.
Source: https://www.ncbi.nlm.nih.gov/pmc/articles/pmc5333176/

NLP for Claims Processing
- NLP techniques can be used in real time to optimize claims processing.
- The underlying concept is similar to that used by virtual assistants such as Apple's Siri and Amazon's Echo.
- Claims notification:
  - Train a model based on the data fields required for claim processing.
  - Identify the client's speech and fill out the relevant fields on the claim application.
  - Scan email text for relevant content and fill out the claim application form.
- Improve customer service levels and enhance customer satisfaction.
- Reduce the time required for claims processing by speeding up the gathering and analysis of information from different sources.
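The step of scanning email text and filling out a claim form can be illustrated with a simple rule-based sketch. The slides describe a trained model; the regular expressions below are a much simpler stand-in, and the field names, policy number format, and currency codes are all hypothetical.

```python
import re

# Hypothetical field patterns; real formats depend on the insurer's systems.
FIELD_PATTERNS = {
    # Assumed format: two capital letters followed by six digits.
    "policy_number": re.compile(r"\b([A-Z]{2}\d{6})\b"),
    # Assumed ISO-style date, e.g. 2018-11-02.
    "incident_date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    # Assumed currency code followed by an amount.
    "claim_amount": re.compile(r"\b(?:USD|IDR)\s?([\d,]+(?:\.\d{2})?)"),
}

def extract_claim_fields(email_text):
    """Return one value per claim-form field, or None when nothing matches."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(email_text)
        fields[name] = match.group(1) if match else None
    return fields

email = ("Dear team, I was admitted to hospital on 2018-11-02. "
         "My policy number is AB123456 and the bill came to USD 1,250.00.")
claim_form = extract_claim_fields(email)
print(claim_form)
```

Fields that come back as None are natural candidates for a follow-up question to the customer or for manual review.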

NLP for Fraud Detection
- NLP techniques can be used to detect fraudulent claims.
- Identify common phrases and/or descriptions of incidents from multiple claimants.
- Unstructured data sources such as claim forms, applications and notes can be used to flag claims with suspicious text or patterns.
- It might be difficult even for a trained human eye to spot such patterns after going through tons of claim applications.
- An NLP-based model would help to eliminate inconsistency and subjectivity and reduce the time required to flag potentially fraudulent claims.

NLP for Underwriting
- NLP techniques for extracting medical information relevant to underwriting.
- Unstructured data: physician notes, clinical observations, medical history, lab results.
- Help automate clinical decisions by taking into account text from various sources.
- Identify patients with higher risk at a faster rate.
- Enable physicians to derive effective treatment methods based on comprehensive patient data.
- Understand context and grammar, and automate decision making, taking into account medical jargon, custom abbreviations, tone, etc.
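The fraud detection idea of spotting common phrases across multiple claimants can be sketched by indexing word n-grams per claimant. This is an illustrative standard-library example, not a production fraud model: the claim texts are invented, and the phrase length of four words and the threshold of two claimants are arbitrary assumptions.

```python
from collections import defaultdict

def shared_phrases(claims, n=4, min_claimants=2):
    """Map each n-word phrase to the claimants who used it,
    keeping only phrases used by at least min_claimants claimants."""
    seen = defaultdict(set)
    for claimant, text in claims.items():
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            seen[" ".join(words[i:i + n])].add(claimant)
    return {phrase: who for phrase, who in seen.items()
            if len(who) >= min_claimants}

# Invented claim descriptions keyed by hypothetical claimant IDs.
claims = {
    "claimant_a": "slipped on wet floor near the hospital entrance",
    "claimant_b": "I slipped on wet floor near my office",
    "claimant_c": "car accident on the highway at night",
}
flagged = shared_phrases(claims)
for phrase, who in sorted(flagged.items()):
    print(phrase, sorted(who))
```

A shared phrase is only a signal for review, since honest claimants can describe similar incidents in similar words.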

Final Thoughts

Future view on data science in life insurance
Factors for growth
- Increasing volumes of quality data and data products available
- Global demand for personalized offerings and ease of transactions
- Growth of direct-to-consumer offerings
- Monetization of data assets
Headwinds
- Major financial successes yet to be demonstrated
- Effort in data cleaning, manipulation, modelling
- More onerous data protection legislation (explicit consent, profiling)
- Cyber risk: risks of data being lost, corrupted or stolen

Questions?