MODELLING HEALTH MAINTENANCE ORGANIZATIONS PAYMENTS UNDER THE NATIONAL HEALTH INSURANCE SCHEME IN NIGERIA

*Akinyemi M.I. 1, Adeleke I. 2, Adedoyin C. 3
1 Department of Mathematics, University of Lagos, Akoka, Lagos, Nigeria. makinyemi@unilag.edu.ng
2 Department of Business Administration, University of Lagos, Akoka, Lagos. adeleke22000@gmail.com
3 Stamford University, Birmingham, Alabama, USA. christsonade@gmail.com

ISSN: 2408-1906

Abstract: The Nigerian National Health Insurance Scheme (NHIS) was set up to ensure equitable payment of health care bills by pooling and prudently distributing the cost burden, so that residents are protected against high health care costs. Health maintenance organisations (HMOs) are limited liability companies, which may be established by private, public or individual entities, whose main aim is to participate in the scheme. This paper explores logistic regression (LR), linear discriminant analysis (LDA) and random forests (RF) to identify the factors that determine whether an HMO will cover all or part of an individual's healthcare bill. The results do not show a significant difference in the classification accuracies of the three methods. We infer that the largest share of the Nigerian residents who use the NHIS fall within the 31-40 years age bracket and that, largely, ailment classification and the insured's age are the key determinants of whether an HMO covers all or part of the bill.

Keywords: Random forests, Linear discriminant analysis, Logistic regression, Confusion matrix, Health care insurance

1. INTRODUCTION

Increasing health care expenditures have given rise to numerous studies on the determinants of health care expenditures, health care policies, health care insurance, health care financing and other aspects of health care. According to WHO (2006), public spending per capita on health in Nigeria was less than $5, dropping below $2 in some poorer parts of the nation.

The Nigerian government therefore inaugurated a committee in the National Council on Health, which recommended the introduction of health insurance in Nigeria. To extend the reach of the healthcare scheme, the NHIS put together different programmes aimed at accommodating diverse facets of the Nigerian sociocultural makeup. Modelling health insurance remains an active area of research, in a bid to deliver free, fair and accessible health care to the masses in the world's economies.

2. LITERATURE REVIEW

Karanikolos et al. (2013) study the effects that the recent economic crisis in Europe, and the corresponding responses of governments, have had on health systems. Ifanti, Argyriou, Kalofonou, & Kalofonos (2013) explore data on the effect of the financial crisis and austerity measures on health care, social services and health promotion policies in Greece. Adeleke, Hamadu, & Ibiwoye (2012) evaluate the NHIS capitation regime. Hamadu & Adeleke (2012) built a model-assisted credibility rating for health insurance claims in Nigeria. Ibiwoye & Adeleke (2008) apply logistic regression to assess the level of employee participation and observe that spatial awareness of the scheme is a key factor affecting participation.

However, health care organizations often do not cover the entire cost of treatment, and the insured person frequently still has to foot part of the medical bill. This study explores three predictive models, namely logistic regression (LR), linear discriminant analysis (LDA) and random forests (RF), to identify the factors that affect whether an HMO will cover all or only a fraction of a person's healthcare bill. The study is useful because it gives insight into the key factors that contribute to the kind of insurance coverage an individual can access.

The rest of this paper is organised as follows. Section 3 gives a brief overview of the methods employed in the research. Section 4 describes the data and presents the results of the empirical analysis. Section 5 discusses the findings and offers recommendations, and Section 6 concludes.

3. METHODOLOGY

3.1 Logistic Regression

The response variable in logistic regression is usually dichotomous (Pampel, 2000); that is, it takes the value 1 with probability π or the value 0 with probability 1 - π. The model is given as:

$$P(Y) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)}}$$

where $P(Y)$ is the probability that $Y$ will occur, $\beta_0$ is a constant, $X_1, X_2, \dots, X_k$ are the predictor variables and $\beta_1, \beta_2, \dots, \beta_k$ are the coefficients or weights attached to each predictor.

3.2 Random Forests

The random forest (RF) technique developed by Breiman (2001) is based on the use of classification and regression trees. The RF classification procedure was carried out with the randomForest package in R; see Liaw & Wiener (2002).

3.3 Linear discriminant analysis

In LDA, a linear combination of auxiliary variables is identified which maximises the separation between the categorical response groups (Hastie, Tibshirani, & Friedman, 2009):

$$D = w_1 X_1 + w_2 X_2 + \dots + w_k X_k$$

The weights $w_1, w_2, \dots, w_k$ are chosen to maximise the separation between groups. For LDA the covariates are assumed to follow a multivariate normal distribution.
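
To make the logistic model of Section 3.1 concrete, the following Python sketch evaluates the model probability for a single case, that is, it passes a constant plus a weighted sum of predictors through the logistic function. It is an illustration under stated assumptions only: the function name, the two predictors, the coefficient values and the 0.5 cut-off are hypothetical, and it is not the R code or the fitted estimates used in this study.

```python
import math


def logistic_probability(intercept, coefficients, predictors):
    """Return P(Y = 1) for one case under a logistic regression model.

    intercept    -- the constant term of the linear predictor
    coefficients -- weights attached to the predictors
    predictors   -- predictor values for the case, in the same order
    """
    if len(coefficients) != len(predictors):
        raise ValueError("coefficients and predictors must have equal length")

    # Linear predictor: constant plus the weighted sum of the predictors.
    eta = intercept + sum(b * x for b, x in zip(coefficients, predictors))

    # Logistic function, written in two branches so that large |eta|
    # cannot overflow math.exp.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    exp_eta = math.exp(eta)
    return exp_eta / (1.0 + exp_eta)


# Hypothetical example: an age-band code of 3 and an ailment class code of 1,
# with made-up coefficients (NOT the estimates reported in this paper).
p = logistic_probability(-1.0, [0.05, 0.5], [3, 1])

# Hypothetical decision rule: predict "bill paid in full" (1) when p >= 0.5.
predicted_class = 1 if p >= 0.5 else 0
print(f"P(paid in full) = {p:.4f}, predicted class = {predicted_class}")
```

In this sketch the case is assigned to class 1 only when the probability reaches the cut-off; lowering the cut-off would trade specificity for sensitivity.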
3.4 Confusion matrix

The performance of the classification system is evaluated using a confusion matrix (Kohavi & Provost, 1998). The matrix tabulates the actual against the predicted classifications. Table 1 gives the confusion matrix for a two-class classification.

Table 1: Confusion Matrix

                   Predicted Negative   Predicted Positive
Actual Negative    n_11                 n_12
Actual Positive    n_21                 n_22

In the context of our study, the cell entries in Table 1 mean the following:

$n_{11}$ is the number of correct predictions that an instance is negative, i.e. the number of true negatives (TN). It gives the specificity (true negative rate), calculated as:

$$\text{Specificity} = \frac{n_{11}}{n_{11} + n_{12}}$$

$n_{12}$ is the number of incorrect predictions that an instance is positive, i.e. the number of false positives (FP). The false positive rate is calculated as:

$$\text{FP rate} = \frac{n_{12}}{n_{11} + n_{12}}$$

$n_{21}$ is the number of incorrect predictions that an instance is negative, i.e. the number of false negatives (FN). The false negative rate is calculated as:

$$\text{FN rate} = \frac{n_{21}}{n_{21} + n_{22}}$$

$n_{22}$ is the number of correct predictions that an instance is positive, i.e. the number of true positives (TP). It gives the sensitivity (true positive rate), calculated as:

$$\text{Sensitivity} = \frac{n_{22}}{n_{21} + n_{22}}$$

The overall accuracy (AC), calculated as:

$$AC = \frac{n_{11} + n_{22}}{n_{11} + n_{12} + n_{21} + n_{22}}$$

is the proportion of the total number of predictions that were correct.

4. RESULTS

4.1 Data collection and data management

The data were collected from the Nigerian National Health Insurance Scheme. They consist of over 55,000 cases spanning a 2-year period and include the variables age, classification of illness, kind of ailment, medical bill and HMO payment. The difference between the medical bill and the HMO payment was taken and then coded as follows: Yes, full cost of treatment catered for = 1; No, full cost of treatment not catered for = 0. Table 2 shows the distribution of patients who had their medical bills reimbursed in full or not.

Table 2: Frequency distribution of medical bill cover

Medical bill cover                      Frequency   Percentage (%)
Medical bill not paid in full by HMO    38717       65.05
Medical bill paid in full by HMO        20801       34.95

Most of the insured (65.05%) did not have their medical bill paid in full.

4.2 Logistic Regression

The logistic regression (Table 3) shows that both age and the classification of ailment are useful in determining whether an insured person will have their medical bill paid in full or not.

Table 3: Logistic regression coefficients

Parameter        Estimate      Std. Error   P-value
(Intercept)      -1.721e+03    4.853e+01    <2e-16***
Age bands        4.851e-02     6.338e-03    1.95e-14***
Classification   4.591e-01     1.274e-01    0.000315***
Year             8.549e-01     2.411e-02    <2e-16***

(*** indicates significance at the 99.9% confidence level)

Furthermore, we computed the AUC of our model to be 61.23%, i.e. given a random patient for whom the HMO pays in full and a random

patient for whom the HMO pays in part, our logistic model will correctly identify which is which about 61% of the time.

4.3 Confusion matrix

Table 4 presents the sensitivity, specificity, overall accuracy and AUC of the models.

Table 4: Accuracy, sensitivity, specificity and AUC

Model                           Sensitivity (%)   Specificity (%)   Accuracy (%)   AUC (%)
Logistic Regression (t = 0.5)   16.59             92.7              66.12          61.23
Random Forest                   5.52              93.21             66.3           58.59
LDA                             17.2              92.1              54.65          61.23

Table 4 indicates that all three models have similar specificity; that is, they all correctly classify the cases where the HMO will not pay the bill in full over 90% of the time.

5. DISCUSSION & RECOMMENDATIONS

The analysis revealed that, apart from general and unspecified ailments (41.66%), ailments related to pregnancy (9.14%), the respiratory system (12.54%) and the digestive system (9.94%) account for the bulk of the insured people, while ailments related to blood, ear, male genital, neurological, psychological and social problems each account for less than 1% of the insured. Only the urological ailment classification has more people with their bills fully paid by the HMO than not (0.83% paid as against 0.79% not paid); for all other ailments, bills are predominantly not fully paid by the HMO. In addition, we observed that the largest share of the insured people belong to the 31-40 age group (41.04%), followed by children under 10 years of age (27.06%). These groups correspond to the people who are most likely to be in paid "white collar" employment and their children, who are likely to have access to the HMO services that their company offers. Furthermore, we found that in 2013 there was an almost even split between those who had their medical bills fully paid and those who did not (12.60% vs. 12.44%). We also observed that there was a rise in users of the health insurance scheme (i.e.
from 1.11% to 73.84%); this figure dipped significantly in 2013 (25.04%). Although the sensitivity of all the models is very low, the LR and LDA models have better sensitivity than the RF model: they correctly classify the cases where the HMO will pay the medical bill in full about 16.59% and 17.2% of the time respectively, while the random forest model does so only about 5.52% of the time. On the other hand, the overall accuracies of the LR and random forest models are close, at around 66%, while that of the LDA model is just over 50%.

It is recommended that the government generate more awareness of the NHIS among other clusters of the population, especially the unemployed and underemployed members of society. Furthermore, the older dependent members of the population should be encouraged to take advantage of the scheme.

6. CONCLUSION

The distribution of the medical bill cover of the people insured under the Nigerian NHIS system was studied. The instance of medical bill payment by the HMOs was modelled using a logistic regression model with age bands, ailment classification and year of claim as the variables. A comparison was made between the logistic regression model, random forests and linear discriminant analysis. It was found that both age and ailment classification are useful in predicting whether an insured person's medical bill will be paid in part or in full. It was also found that most of the beneficiaries of the NHIS system are the working class of the population and their children. Furthermore, the model comparison shows that the logistic regression and the random forest both have relatively better accuracy than the linear discriminant analysis model. All three models have similarly strong specificity; however, the random forest has the lowest sensitivity and the linear discriminant model has the highest sensitivity.

REFERENCES

Adeleke, I., Hamadu, D., & Ibiwoye, A. (2012). Evaluation of the capitation regime of Nigeria Health Insurance Scheme. International Journal of Academic Research Part A, 4(5).

Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32.

Hamadu, D., & Adeleke, I. (2012). Model-assisted credibility rating for health Insurance claims. Journal of Mathematics and Technology, 3(2), 32-37.

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer Series in Statistics. Springer.

Ibiwoye, A., & Adeleke, I. A. (2008). Does National Health Insurance Promote Access to Quality Health Care? Evidence from Nigeria. The Geneva Papers on Risk and Insurance - Issues and Practice, 33(2), 219-233.

Ifanti, A. A., Argyriou, A. A., Kalofonou, F. H., & Kalofonos, H. P. (2013). Financial crisis and austerity measures in Greece: Their impact on health promotion policies and public health care. Health Policy, 113(1-2), 8-12.

Karanikolos, M., Mladovsky, P., Cylus, J., Thomson, S., Basu, S., Stuckler, D., ... McKee, M. (2013). Financial crisis, austerity, and health in Europe. The Lancet, 381(9874), 1323-1331.

Kohavi, R., & Provost, F. (1998). On Applied Research in Machine Learning. In Editorial for the Special Issue on Applications of Machine Learning and the Knowledge Discovery Process, 30.

Liaw, A., & Wiener, M. (2002). Classification and Regression by randomForest. R News, 2(3), 18-22. Retrieved from http://cran.r-project.org/doc/rnews/

Pampel, F. (2000). Logistic regression: a primer. Sage university papers series: Quantitative applications in the social sciences. Sage Publications.