Paper PH100

Relationship between Total Charges and Reimbursements in Outpatient Visits Using SAS GLIMMIX

Chakib Battioui, University of Louisville, Louisville, KY

ABSTRACT

The purpose of this paper is to examine the use of different statistical tools to investigate data concerning outpatient visits in a hospital setting. The specific objective is to examine the relationship between the total charges billed by a hospital and the payments received for outpatient care. The data were derived from the 2002 Medical Expenditure Panel Survey (MEPS), containing 20,488 outpatient event records. Kernel density estimation was used to examine the distribution of the total charges compared to reimbursements. Results show that the two distributions are not normal. There is also a shift of cost between the two, a shift that must be paid by the healthcare provider instead of the insurer. The Generalized Linear Mixed Model (GLMM), fitted with PROC GLIMMIX, was used to investigate the shift between charges and reimbursements that must be absorbed by the hospital. PROC GLIMMIX is appropriate when the response is not normally distributed or its exact distribution is unknown, and it is used in place of PROC GENMOD when there are random effects in the data. The different linear models are compared and contrasted in this paper.

INTRODUCTION

The purpose of this study is to analyze the difference between the total charges billed by a hospital and the reimbursements received by the hospital for outpatient care. The data used in this project were obtained from the 2002 Medical Expenditure Panel Survey (MEPS). MEPS provides nationally representative estimates of health care use, expenditures, sources of payment, and insurance coverage for the U.S. civilian non-institutionalized population. MEPS is cosponsored by the Agency for Healthcare Research and Quality (AHRQ) and the National Center for Health Statistics (NCHS) [1].
The data used in this study give detailed information on outpatient visits covering calendar year 2002. MEPS is divided into three parts. The Household Component (HC) is the core survey and forms the basis for the Medical Provider Component (MPC) and part of the Insurance Component (IC). The HC contains data about demographic characteristics, health conditions, health status, use of medical care services, charges and payments, access to care, satisfaction with care, health insurance coverage, income, and employment. The MPC includes detailed information reported by the HC respondents about physicians, hospitals, pharmacies and health agencies. The IC contains data about health insurance plans obtained through private and public-sector employers.

Healthcare costs have many outliers. We hypothesize that costs are not normally distributed, and we examine whether healthcare providers lose money if they are paid by average cost.

METHOD

Kernel density estimation was used to show that the distributions of the total charges and of the reimbursements for hospital outpatient services are not normal. Kernel density estimation provides normal, triangular and quadratic kernel density estimators. The general form of a kernel estimator is

\[ \hat{f}(x) = \frac{1}{n\lambda} \sum_{i=1}^{n} K_0\!\left(\frac{x - x_i}{\lambda}\right) \]

where λ is the bandwidth and controls the level of smoothing of the estimator, n is the sample size, x_1, ..., x_n are the observations, and K_0 is a known density function. Kernel density estimation is a useful way to examine the shape of an entire distribution, and it is commonly used to judge whether a sample follows a normal distribution. PROC KDE is the SAS/STAT procedure that performs kernel density estimation. It uses only the standard normal density as the kernel but allows different methods to estimate the bandwidth.

The Generalized Linear Mixed Model (GLMM) was used to investigate the shift between charges and reimbursements. The Generalized Linear Model (GLM) is a statistical model that applies when the data are not normal.
A generalized linear model is a model of the form Y = g(b'X),

where Y is a vector of dependent variables, X is a column vector of independent variables, b' is a row vector of parameters and g(.) is a possibly nonlinear function that relates the linear predictor b'X to the response; the source calls it a link function [2]. The Generalized Linear Mixed Model (GLMM) extends the GLM when there are random effects in the data. Equivalently, the GLMM generalizes the Linear Mixed Model to responses that have a non-normal distribution. The GLIMMIX procedure is a new procedure in SAS/STAT software. It is used to perform estimation and statistical inference for GLMMs. PROC GLIMMIX must be downloaded from http://support.sas.com/news/feature/04sep/statdownloads.html.

RESULTS

Using a SAS editor, a program was written to estimate the density of total charges (optc02x) and reimbursements (opxp02x), using the following code:

    /* Kernel density estimates for charges and reimbursements */
    proc kde data=sasuser.h67f method=srot;
        univar optc02x / out=meps1;
        univar opxp02x / out=meps2;
    run;

    /* Stack the two density estimates into one data set */
    data meps3;
        length VAR $ 32 VALUE 8 DENSITY 8 COUNT 8;
        set meps1 meps2;
        keep VAR VALUE DENSITY COUNT;
    run;

    /* Order by variable name so the two curves can be compared */
    proc sort data=meps3 out=meps;
        by var;
    run;

Note that in the above program, the BY statement in PROC SORT orders the stacked estimates by variable so that total charges can be compared with reimbursements. METHOD=SROT was added to specify the bandwidth method. The graph is a line plot with VALUE as the x-variable and DENSITY as the y-variable. The result is shown in Figure 1.

FIGURE 1
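The paper does not list the plotting step. A minimal sketch of it, assuming the sorted data set meps produced by the program above (with the variables VAR, VALUE and DENSITY) and a SAS release that includes PROC SGPLOT; the axis labels are illustrative and are not taken from the paper:

```sas
/* Sketch only: overlay the two kernel density estimates in one line plot. */
/* Assumes data set MEPS from the preceding program; labels are hypothetical. */
proc sgplot data=meps;
    series x=value y=density / group=var;   /* one curve per variable */
    xaxis label="Amount (dollars)";
    yaxis label="Estimated density";
run;
```

Grouping on VAR draws one curve for optc02x and one for opxp02x on shared axes, which is what makes a shift between the two distributions visible.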

Note that the two distributions are not normal, which violates the assumptions of the general linear model. Figure 1 alone, however, does not let us determine the difference between the two distributions. Code was therefore added to transform the data using a log function:

    data meps;
        set sasuser.h67f;
        optc02x = log(optc02x);   /* log of total charges */
        opxp02x = log(opxp02x);   /* log of reimbursements */
    run;

The resulting graph is given in Figure 2. Clearly, there is a shift of cost between the charges and reimbursements for hospital outpatient services. Notice that if the charge value is very small (dashed line or smaller), then the probability of the reimbursement value is very high. It is clear that there is a shift between the amount charged by the hospital and the amount the hospital is reimbursed.

FIGURE 2

The General Linear Model was used in order to estimate the amount of the reimbursement. The program that fits the linear model is given below:

    proc glm data=sasuser.h67f;
        model opxp02x = optc02x;
    run;

Results are shown in Tables 1 and 2. The overall F-test is significant (p < 0.0001), indicating that the model as a whole accounts for a significant amount of the variation in the response (opxp02x in the MODEL statement above). The R² in Table 2 indicates that the model accounts for about 69% of that variation (R-Square = 0.6857).

TABLE 1

    Source     DF   Sum of Squares   Mean Square   F Value   Pr > F
    Optc02x     1      17886978939   17886978939   42406.2   <.0001

TABLE 2

    R-Square   Coeff Var   Root MSE   Optc02x Mean
    0.685727    129.8287   649.4619       500.2450

Using PROC GLM, and aiming for better results, we added the ICD-9 variable (opicd1x) to our program. ICD-9 is an abbreviation for the International Classification of Diseases, Ninth Revision; it is used to classify diseases and health conditions on health care claims and is the basis for prospective payment to hospitals, other health care facilities and

health care providers [3]. Results show a marginal improvement in the R² (from about 69% to 72%). This suggests that sicker patients who require more care are reimbursed at a rate similar to patients who are not as sick.

In order to make sure that the results of PROC GLM are not simply due to the large data set used in the study (about 20,000 observations), we created a new file by randomly choosing 50 observations from our original data set, and then ran PROC GLM on this sample. The results, given in Tables 3 and 4, show that the model is still statistically significant with essentially the same R² value (0.688).

TABLE 3

    Source     DF   Sum of Squares   Mean Square   F Value   Pr > F
    Optc02x     1      29249062.82   29249062.82    103.64   <.0001

TABLE 4

    R-Square   Coeff Var   Root MSE   Optc02x Mean
    0.687991    104.5268   531.2503       508.2433

In the previous model, the normality assumption was violated. PROC GLM requires that the random error have a normal distribution, which was not the case for the reimbursement (opxp02x), as we showed above. The Generalized Linear Mixed Model (PROC GLIMMIX) was needed instead of the generalized linear model (PROC GENMOD) because the ICD-9 code (opicd1x) is treated as a random effect, since there is a lot of discretion in how the codes are assigned. PROC GLIMMIX is used in place of PROC GENMOD when there are random effects in the data. Since our data set is very large and the number of distinct ICD-9 codes (opicd1x) is large, we created a new file by randomly choosing 50 observations from our original data set, and then ran PROC GLIMMIX on the new data set. The code is given below:

    proc glimmix data=sasuser.file50;
        class opicd1x;
        model opxp02x = optc02x / solution dist=gamma link=log;
        random opicd1x;
    run;

The PROC GLIMMIX statement invokes the procedure. The CLASS statement instructs the procedure to treat the variable opicd1x (ICD-9) as a classification variable. The MODEL statement specifies the response variable (opxp02x) and the fixed effect (optc02x). The SOLUTION option in the MODEL statement requests a listing of the fixed-effects parameter estimates.
The distribution of the dependent variable is Gamma, with a log link function. The RANDOM statement specifies that the model contains the ICD-9 variable as a random effect. The model information for this analysis is shown in Table 5.

TABLE 5

    The GLIMMIX Procedure: Model Information

    Data Set                      SASUSER.File50
    Response Variable             OPXP02X
    Response Distribution         Gamma
    Link Function                 Log
    Variance Function             Default
    Variance Matrix               Not blocked
    Estimation Technique          Residual PL
    Degrees of Freedom Method     Containment

Table 6 lists the size of relevant matrices, Table 7 provides information about the methods and size of the optimization problem, while Table 8 shows information about the fitted model.

TABLE 6

    Dimensions

    G-side Cov. Parameters         1
    R-side Cov. Parameters         1
    Columns in X                   2
    Columns in Z                  10
    Subjects (Blocks in V)         1
    Max Obs per Subject           43

TABLE 7

    Optimization Information

    Optimization Technique        Dual Quasi-Newton
    Parameters in Optimization    1
    Lower Boundaries              1
    Upper Boundaries              0
    Fixed Effects                 Profiled
    Residual Variance             Profiled
    Starting From                 Data

TABLE 8

    Fit Statistics

    -2 Res Log Pseudo-Likelihood    130.87
    Generalized Chi-Square           26.57
    Gener. Chi-Square / DF            0.65

The ratio of the generalized chi-square statistic to its degrees of freedom in Table 8 is 0.65, which is reasonably close to 1. This indicates that the variability in these data has been properly modeled and that there is very little residual overdispersion.

Table 9 below lists the covariance parameter estimates. The variance of the random ICD-9 effect is estimated as 0.5063, with a standard error of 0.3381. The estimate is about 1.5 times its standard error, which suggests some variation across ICD-9 codes, although this ratio alone does not establish statistical significance.

TABLE 9

    Covariance Parameter Estimates

    Cov Parm    Estimate   Standard Error
    OPICD1X       0.5063           0.3381
    Residual      0.6481           0.1557

The solutions and the Type III tests of fixed effects are given in Tables 10 and 11, respectively.

TABLE 10

    Solutions for Fixed Effects

    Effect       Estimate   Standard Error   DF   t Value   Pr > |t|
    Intercept      4.7236           0.3017    9     15.66     <.0001
    OPTC02X      0.000569         0.000067   32      8.47     <.0001
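Because of the log link, the estimates in Table 10 are on the log scale. As a reading aid, and assuming the usual random-intercept formulation for the RANDOM opicd1x statement (the paper does not write the model out), the fitted model can be sketched as follows. Here x is the total charge (optc02x), \mu is the mean reimbursement (opxp02x), and u_j is the effect of ICD-9 code j; these symbols are introduced only for illustration.

```latex
% Assumed model form (random intercept per ICD-9 code j)
\log \mu_{ij} = \beta_0 + \beta_1 x_{ij} + u_j, \qquad u_j \sim N(0, \sigma_u^2)

% Fitted mean for a typical code (u_j = 0), using the Table 10 estimates
\hat{\mu}(x) = \exp(4.7236 + 0.000569\,x)

% Worked example at a charge of x = 1000 dollars
\hat{\mu}(1000) = \exp(4.7236 + 0.569) = \exp(5.2926) \approx 199

% Multiplicative effect of an additional 1000 dollars in charges
\exp(0.000569 \times 1000) = \exp(0.569) \approx 1.77
```

Under these assumptions, each additional $1,000 in charges multiplies the expected reimbursement by roughly 1.77 rather than adding a fixed dollar amount, which is the main practical difference from the PROC GLM fit.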

TABLE 11

    Type III Tests of Fixed Effects

    Effect     Num DF   Den DF   F Value   Pr > F
    OPTC02X         1       32     71.68   <.0001

CONCLUSION

The distributions of the total charges and the reimbursements are not normal, and there is a shift of cost between the two. The results of the GLM procedure show that this model is significant even when we ignore the normality assumption, but the value of R² indicates that the model is not perfect. The results of the GLIMMIX procedure show that this model is also significant and that the random effect (ICD-9) shows appreciable variation, which might explain the shift between the total charge and the reimbursement. This shift must be paid by the healthcare provider instead of by the insurer. Even though the results of both procedures, GLM and GLIMMIX, were significant, the use of PROC GLIMMIX was necessary to understand the shift of cost between the charges and reimbursements for hospital outpatient services.

REFERENCES

[1] http://www.meps.ahrq.gov/PUFFiles/H67G/H67Gdoc.pdf
[2] http://economics.about.com/library/glossary/bldef-generalized-linear-model.htm
[3] National Committee on Vital and Health Statistics (NCVHS), http://www.ncvhs.hhs.gov/031105a1.htm

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Chakib Battioui
University of Louisville
2610 Whitehall Ter # 120
Louisville, KY 40220
(502) 439-4367
c0batt01@louisville.edu