HEALTH ACTUARIES AND BIG DATA


What is Big Data?

The term Big Data does not refer only to very large datasets. It is typically understood to mean high volumes of data, requiring high velocity of ingestion and processing, and involving high degrees of variability in data structures (Gartner, 2016). Given this, Big Data involves large volumes of both structured and unstructured data (the latter referring to free text, images, sound files and so on), from many different sources, requiring advanced hardware and software tools to link, process and analyse such data at great speed.

[Ian's comment: The term "big data" is misleading and I don't like using it. After all, in healthcare we are used to using very large datasets, which I sense are bigger than those used by life, pension or GI actuaries. Some other practice areas refer to "big" data sets that are relatively small by healthcare standards. For this reason I prefer structured vs. unstructured. The second issue that arises is the tension between those who (like me) are practitioners of traditional statistical approaches and models, and those who practise machine learning. I suspect more actuaries fall into the first camp than the second. A visitor from Stanford who gave a seminar here last year described the statistical approach as: propose a hypothesis; search for data; test the hypothesis; depending on the results, refine the hypothesis. The machine learning approach: find data; hook up the machine; spin through the data; develop a hypothesis to explain the findings. The problem with the second approach is its replicability.

Emile's comment: it seems as if definitions are getting clearer in the literature, and I'd be reluctant to confine the term to unstructured data only. My understanding: it is the combination of a much higher volume of structured and unstructured data from multiple sources, linked in a database that can handle queries at high speed, to a much greater extent than what was possible even a few years ago.
The definition of machine learning is actually a lot less clear, in my experience. Some people classify normal linear regression as a form of machine learning, although that is probably not what most people understand by it! I agree that there are some machine learning applications that are opaque, and where results are hard to explain (leading to the temptation to develop a hypothesis to explain the findings), but this is by no means, in my experience, the only way to apply the technology. Some Big Data techniques involve traditional statistical approaches, but with much more unstructured data incorporated, and with rapid and multiple iterations of model fitting with complete transparency on why you get the answers you're getting.]

Data is rapidly increasing in volume and velocity because of developments in technology, in particular the proliferation of sensors that constantly generate streams of data. Examples include fitness or wellness tracking devices, car tracking devices and medical equipment. People also contribute to the rapid expansion of data, primarily through social media and online interactions. Furthermore, IBM estimates that at least 80% of the world's data is unstructured (Watson, 2016), in the form of text, images, videos and audio. This data may contain valuable and unique insights for an organisation, enabling it to meet customers' needs more effectively and answer queries in real time, among other applications. However, such a data environment is very different from the sets of structured data in tables that actuaries and other analysts are used to analysing, and it requires investments in hardware, software and skills.
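A minimal sketch of one step in handling unstructured data is turning free text into the kind of structured, numeric table that actuaries are used to analysing. The sketch below uses only the Python standard library; the note text and the bag-of-words representation are invented purely for illustration, and real text-analytics pipelines are far more sophisticated.

```python
from collections import Counter
import re

def bag_of_words(text):
    """Tokenise free text into lowercase words and count occurrences."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

# Two invented clinical-style notes (purely illustrative).
note_1 = "Patient reports chest pain; chest X-ray ordered."
note_2 = "No chest pain today. Follow-up X-ray clear."

counts_1 = bag_of_words(note_1)
counts_2 = bag_of_words(note_2)

# A shared vocabulary turns each note into a fixed-length numeric row,
# i.e. the structured representation a statistical model can consume.
vocab = sorted(set(counts_1) | set(counts_2))
row_1 = [counts_1[w] for w in vocab]
row_2 = [counts_2[w] for w in vocab]
```

Each note becomes one row of equal length, so a collection of notes becomes an ordinary rectangular dataset, which is precisely the bridge from unstructured to structured data described above.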

[Ian's comment: The volume of sensor-based data creates major problems for modellers in terms of distinguishing signal from noise. I suppose that, as actuaries, we have training (and experience) to rely on that helps us to do this with traditional data (we are able to distinguish, for example, when conditions warrant changing pricing or reserving assumptions). The problem with the streams of data is that we lack the types of algorithms to organise granular data into something that is reportable or understandable. A good example: in the 1990s modellers developed grouper algorithms to group the 15,000 diagnosis codes into more manageable and useful condition categories. We lack this type of grouping for much of the clinical data that is generated, as well as models to determine when a trend in a clinical observation is becoming critical (as opposed to the achievement of a particular value).

Emile's comment: this is rapidly changing. I've seen what appear to be very powerful tools to interpret, analyse and understand unstructured clinical data (e.g. doctors' notes, as well as pathology and radiology reports). This may have been an issue a while ago, but it is no longer the case, in my view.]

The increases in volume, velocity and variability have increased the demand for processing power, and Big Data typically cannot be stored and analysed in traditional systems. To handle Big Data, organisations typically have to introduce large-scale parallel processing systems. These allow organisations to store vast amounts of data of all types on low-cost commodity hardware, and to query and analyse the data in near real time by parallelising operations that were previously done on a single processor. In addition, the software required for this is often open source and freely available, for example Apache Hadoop, Apache Spark and Cloudera. This reduces the cost of storing large volumes of data and lowers the barriers to entry from a direct cost perspective.
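The programming model that Hadoop popularised for this kind of parallel work is MapReduce: a map phase that each node runs on its own chunk of data, a shuffle that groups intermediate results by key, and a reduce phase that aggregates each group. A toy word count, run serially here purely to show the shape of the pattern (in Hadoop or Spark the map and reduce calls would be distributed across machines):

```python
from collections import defaultdict
from itertools import chain

# Map phase: each "node" turns its chunk of text into (word, 1) pairs.
def map_chunk(chunk):
    return [(word, 1) for word in chunk.split()]

# Shuffle phase: group pairs by key so each key lands on one reducer.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: each reducer sums the counts for its keys.
def reduce_counts(groups):
    return {key: sum(values) for key, values in groups.items()}

chunks = ["big data big value", "value from data"]  # two "nodes"
mapped = chain.from_iterable(map_chunk(c) for c in chunks)
counts = reduce_counts(shuffle(mapped))
# counts == {"big": 2, "data": 2, "value": 2, "from": 1}
```

Because the map calls are independent of one another, and the reduce calls are independent per key, each phase can be spread across many commodity machines, which is the source of the near-real-time query performance described above.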
Distributed parallel computing allows a single task to be performed on multiple computers, which reduces the processing time, as shown in the following diagram.

(Code Project, 2009)

If an organisation implements systems that enable it to access and store large quantities of data, that is, however, only the first step. According to Gary King, "Big Data is not about the data" (King, 2016): while the data may be plentiful, the real value and complexity emerge from the analysis of this data and, beyond that, from a responsive operational environment that allows analytical insights to be applied. At the same time, an analyst cannot ignore the complexities of the data: how it was generated, how it is coded, what types of coding errors and missing values it includes, and how to address any data problems. Understanding the data itself, with its sources and limitations, remains critical to understanding the outputs of any modelling exercise.

Actuaries' role in Big Data

The ability to analyse and interpret unstructured data requires advanced analytical and programming skills. The term "data scientist" refers to an individual possessing specific skills in analysing and delivering actionable insights from Big Data. In particular, Drew Conway defines data scientists as people with skills in statistics, machine learning algorithms and programming, who also have domain knowledge in the field (Drew, 2013). Machine learning "automates analytical model building by using algorithms that iteratively learn from data. This allows computers to find hidden insights without being explicitly programmed where to look" (SAS, 2016).

Actuaries have a rich grounding in traditional statistics and its correct application to the evaluation of insurance and other financial risks. Actuaries also have deep knowledge of the insurance and financial services environment. These two skills, coupled with the ability to solve a variety of problems, have earned actuaries a niche role in modelling and analysing data in insurance. However, for actuaries to enter and compete in the world of Big Data, they require new programming and non-traditional analytical skills and techniques, beyond the traditional areas of survival models, regression, GLMs and data mining. Actuaries will therefore need either to develop these skills themselves, or to become familiar with the tools and their applications and work in multidisciplinary teams where their domain knowledge can be applied alongside the most advanced data science tools. Either way, some familiarity with the power of new data-handling technologies (particularly in respect of unstructured data) will help actuaries to understand and identify the opportunities that Big Data provides.

Why is Big Data particularly relevant to healthcare actuaries?

Actuaries within the healthcare industry have access to many potential sources of data which could provide insight into risks and opportunities, much of which was not available before. These new sources of data, in addition to claims and demographic data, include data generated by fitness devices, wellness devices and medical equipment (including diagnostic devices), as well as social media.
This data may be generated by policyholders, patients and health providers (e.g. doctors' notes written in an Electronic Health Record), or by diagnostic or other medical equipment (e.g. X-rays, MRIs, blood test results). Some sources of data did not exist before, such as the mapped genomes of patients in the context of personalised medicine. This data can have a variety of applications in health insurance, but it also raises many questions about the way in which insights flowing from it are applied, and about the risks posed by its mere existence (Feldman, Martin, & Skotnes, 2012).
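Much of the value described here comes from linking these sources together at member level. A minimal sketch of such a linkage, using only the standard library (the member IDs, field names and values below are invented for illustration; in practice this join would run inside a database or distributed platform over millions of records):

```python
# Invented sample records, keyed by a hypothetical member_id.
claims = {
    "M001": {"claims_cost": 12500.0},
    "M002": {"claims_cost": 800.0},
}
wellness = {
    "M001": {"weekly_exercise_hours": 1.0},
    "M002": {"weekly_exercise_hours": 5.5},
}

# Inner join on member_id: keep only members present in both sources,
# merging the two records into one linked view per member.
linked = {
    member_id: {**claims[member_id], **wellness[member_id]}
    for member_id in claims.keys() & wellness.keys()
}
```

The linked view puts claims experience and wellness behaviour side by side, which is the raw material for the behavioural analyses discussed in the next section.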

Healthcare actuaries are closely involved in the management of healthcare risks. Historically, healthcare actuaries have managed this risk through a combination of underwriting, pricing, benefit design and contracting with providers. However, through the use of Big Data, actuaries are starting to develop unique insights into how behavioural factors affect healthcare outcomes. For example, the success rate of a particular treatment may depend on the genetic profile of a patient and their level of fitness. The personalisation of medicine requires new data to enter electronic health records, with the aim of choosing far more appropriate treatments for individual patients, and hence potentially significantly improving health outcomes and therefore mortality and morbidity. (insert reference to our Personalised Medicine paper when available) For instance, knowledge of an individual's genome allows doctors to better match the most effective cancer drug to the individual patient (Garman, Nevins, & Potti, 2007). This may lead to considerable savings in the healthcare industry and reduce wastage on incorrect treatment.

In some environments, health insurers are the custodians of electronic health records. To the extent that the information mentioned above enters the health record, it would, in theory, be available to health insurers. If so, it could be applied in very effective ways to make relevant information available to treating doctors, and hence improve health outcomes. On the other hand, such data is of course very sensitive, and privacy considerations are very important. However, to the extent that new sources of medical data are not available to insurers, either because insurers are legally prevented from requesting them or because, even when requested, they are withheld by potential policyholders, there are clear risks of adverse selection in the purchase of health or life insurance.
In some jurisdictions it is not clear that insurers would have any right to access genetic information, or other health record information that may be relevant to underwriting, and this may create significant risk.

It is also relevant that much of this data can be used to drive behaviour change in the interest of better health outcomes. For instance, capturing more data on clinical outcomes, and augmenting it with geolocation data on the insured and the provider, allows high-quality provider networks to be created, and insured patients may be incentivised or directed to use healthcare providers who deliver higher-quality treatment. At a member level, any data on wellness activities (whether in the form of preventative screenings, exercise or nutrition) may be used to incentivise and reward wellness engagement, which in turn reduces healthcare costs for those who respond to such incentives. Determining the optimum level of rewards and wellness activity is an actuarial problem which can be solved if multiple sources of wellness and health data are shared with an insurer.

Text mining doctors' notes on claims or health records can also provide additional information, over and above the procedure and ICD codes that would typically be obtained from the claim. This provides additional information on the complexity of the procedure and the stage of the disease, which assists in analysing the success rate of the treatment provided. It may also be used to determine the case mix of patients visiting a provider, which is useful in the context of provider profiling, and which in turn gives insights into the quality and efficiency of the treatments provided. Big Data can also be used to provide insight into the incidence and spread of disease within a population, perhaps even before individuals access healthcare facilities.
For example, Google has used the number and type of searches to produce current estimates of flu and dengue fever activity in a particular area (Google Flu Trends, 2016), although with varying rates of success. The initial model built by Google failed to account for shifts in people's search behaviour and therefore became a poor predictor over time. Further work by Samuel Kou allows the model to self-correct for changes in how people search, and this has led to more accurate results (Mole, 2015). This data can provide an understanding of the spread of disease within a population, which can potentially serve as an early warning of an increase in claims and demand for healthcare resources before it occurs.

Healthcare actuaries have unique domain knowledge, which means that they are in a position to apply these non-traditional data sources practically, to solve problems and pursue opportunities. Big Data has the potential to enhance the healthcare industry by enabling wellness programmes to operate effectively, personalising treatments, and improving the allocation of healthcare resources to reduce wastage in the system. Actuaries also tend to have a better understanding of financial risk than other professionals, and this understanding is critical to finding the correct application of Big Data tools in insurance.

There are many concerns about privacy, data security and the ways in which data is used that must be addressed before data is applied in practice. Patient and doctor permission, depersonalisation of data for analytical purposes, failsafe access control to sensitive data, and an ethics and governance framework for evaluating the application of insights to practical problems must all be in place. Health actuaries need to evaluate the regulatory requirements and the ethics of Big Data applications. At the same time, actuaries should also consider the risk implications of their organisations not having access to data that exists, and how these risks can be managed.

So what should healthcare actuaries do?

Healthcare actuaries need to identify the importance and value of Big Data within their organisations and invest in the appropriate technology infrastructure, analytical tools and skills.
Investing in the data may include purchasing data from external providers, systems development to extract and collect the data that an organisation already has access to, and classifying the data within the system so that it can be used in analysis. The technology required to process and analyse this data includes both a parallel processing hardware system and the software required to operate it. Most of the software required is open source and thus freely available; however, the organisation will likely not have the necessary skills to set up the system and will therefore require the use of an external provider. The organisation will also need to invest in the skills required to interpret this data, either by encouraging actuaries to develop the skills, or by employing multidisciplinary teams involving data scientists.

With improvements in technology and techniques to store, process and extract value from Big Data, it is clear that Big Data is very relevant to healthcare actuaries, whether such data is available to their organisations or not. The many ethical and legal questions that this environment gives rise to will also have major implications for actuarial risks, and actuaries should therefore be active participants in debates and in finding solutions to the complex issues arising from it.
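As a concrete (and heavily simplified) illustration of the text-mining application discussed earlier, a first pass over a doctor's note might extract diagnosis codes and disease-stage mentions with pattern matching. The note text, the ICD-10-style pattern and the stage keyword below are invented examples; production clinical text mining relies on far richer natural-language-processing tools.

```python
import re

# A hypothetical free-text note; codes and keywords are invented examples.
note = ("Stage II carcinoma confirmed (ICD-10 C50.9). "
        "Procedure more complex than usual due to comorbidities.")

# Pull out ICD-10-style codes: a letter, two digits, optional .suffix.
icd_codes = re.findall(r"\b[A-Z]\d{2}(?:\.\d+)?\b", note)

# Flag disease-stage mentions such as "Stage II" via Roman numerals.
stage = re.search(r"\bStage\s+(IV|I{1,3})\b", note)
stage_label = stage.group(1) if stage else None
```

Even this crude extraction yields structured fields (code, stage) that can be joined to the claim record, supplementing the procedure and ICD codes submitted with the claim.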

References

Code Project. (2009, April 19). Retrieved from http://www.codeproject.com/articles/35671/distributed-and-parallel-processing-using-wcf

Drew, C. (2013, March 26). Retrieved from http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Feldman, B., Martin, E., & Skotnes, T. (2012, October). Retrieved from https://www.scribd.com/document/107279699/big-data-in-healthcare-hype-and-hope

Garman, K., Nevins, J., & Potti, A. (2007). Genomic strategies for personalized cancer therapy. Human Molecular Genetics, 226-232.

Gartner. (2016, October 18). Retrieved from http://www.gartner.com/it-glossary/big-data/

Google Flu Trends. (2016, October 18). Retrieved from https://www.google.org/flutrends/about/

King, G. (2016). Retrieved from http://gking.harvard.edu/files/gking/files/prefaceorbigdataisnotaboutthedata_1.pdf

Mole, B. (2015, September 11). Retrieved from http://arstechnica.com/science/2015/11/new-flu-tracker-uses-google-search-data-better-than-google/

SAS. (2016, October 25). Retrieved from http://www.sas.com/en_us/insights/analytics/machine-learning.html

Watson. (2016, May 25). Retrieved from https://www.ibm.com/blogs/watson/2016/05/biggest-data-challenges-might-not-even-know/