Session 178 TS, Stats for Health Actuaries. Moderator: Ian G. Duncan, FSA, FCA, FCIA, FIA, MAAA. Presenter: Joan C. Barrett, FSA, MAAA



2 Session 178 Statistics for Health Actuaries October 14, 2015 Presented by: Joan C. Barrett, FSA, MAAA Ian Duncan, FSA, FIA, FCIA, FCA, MAAA

3 Today's Agenda: Basic Statistics; A Quick Look at Regression Analysis Page 2

4 Basic Statistics

5 The Statistical Triad: Estimation, Prediction, Hypothesis Testing Page 4

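6 A Few Basic Formulas
Mean: symbol $\mu = E(X)$; formula $\sum_i x_i f(x_i)$; Excel formula AVERAGE
Variance: symbol $\sigma^2 = \mathrm{Var}(X)$; formula $\sum_i (x_i - \mu)^2 f(x_i) = E(X^2) - E^2(X)$; Excel formula VAR.P
Standard deviation: symbol $\sigma$; formula $\sqrt{\mathrm{Var}(X)}$; Excel formula STDEV.P
Page 5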

7 Claims Frequencies
Distribution: Bernoulli | Binomial (N = 1,000)
Intuitive concept: Flip a coin once | Flip a coin N times
Mean ($\mu$): $p$, the probability of success | $Np$
Variance ($\sigma^2$): $p(1 - p)$ | $Np(1 - p)$
Bernoulli examples:
Hospital admits: mean 6.0%, variance 5.6%
Any claims: mean 30%, variance 21%
Page 6

8 Sample Calculations (Bernoulli)
Step: Success | Failure | Combined
Value of x: 1 | 0 | N/A
Probability of x: 6.0% | 94.0% | 100%
Mean = $\mu$ = weighted average: 6.0% | 0.0% | 6.0%
$x - \mu$: 94.0% | -6.0% | N/A
$(x - \mu)^2$: 88.4% | 0.4% | N/A
Variance = $\sigma^2$ = probability-weighted sum of squares: 5.3% | 0.3% | 5.6%
Page 7
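The Bernoulli calculation above can be reproduced in a few lines. This is a minimal sketch rather than anything from the presentation: the function names `bernoulli_moments` and `binomial_moments` are made up for illustration, and the 6% admit probability is the one used on the slide.

```python
def bernoulli_moments(p):
    """Mean and variance of one trial with success probability p."""
    outcomes = [(1, p), (0, 1 - p)]  # (value of x, probability of x)
    mean = sum(x * prob for x, prob in outcomes)
    variance = sum((x - mean) ** 2 * prob for x, prob in outcomes)
    return mean, variance


def binomial_moments(n, p):
    """Mean and variance of the number of successes in n independent trials."""
    mean, variance = bernoulli_moments(p)
    return n * mean, n * variance


admit_mean, admit_var = bernoulli_moments(0.06)
print(f"Bernoulli: mean = {admit_mean:.1%}, variance = {admit_var:.1%}")

total_mean, total_var = binomial_moments(1000, 0.06)
print(f"Binomial (N = 1,000): mean = {total_mean:.1f}, variance = {total_var:.1f}")
```

The binomial figures follow from summing N independent Bernoulli trials, which is why both the mean and the variance scale by N.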

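9 Additional Formulas for Weighted Averages
Group: Weight = c | % Admits = E(X) | Var(X) | $c^2\,\mathrm{Var}(X)$
Children: 33.0% | 3.0% | 2.9% | 0.3%
Women < [...]: [...] | 5.0% | 4.8% | 0.2%
Women [...]: [...] | 10.0% | 9.0% | 0.4%
Men < [...]: [...] | 5.0% | 4.8% | 0.1%
Men [...]: [...] | 10.0% | 9.0% | 0.2%
Combined: weight 100.0%, % admits 6.0%, $c^2\,\mathrm{Var}(X)$ 1.1%
Key Formulas
$E(X + Y) = E(X) + E(Y)$
$E(cX) = c\,E(X)$
$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$
$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$, if X and Y are independent
$\mathrm{Var}(cX) = c^2\,\mathrm{Var}(X)$
Page 8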

10 Normal Approximation to Binomial
A binomial distribution is approximately normal if $N > 30$, $Np > 5$, and $Np(1 - p) > 5$
Mean = $Np$
Variance = $Np(1 - p)$
Page 9

11 The Standard Normal Curve
[Figure: probability distribution f(x) and cumulative function F(x) of the standard normal]
Standard normal: mean = 0, variance = 1
To convert any normal distribution to standard normal: $Z = (x - \mu)/\sigma$
Slide 10

12 Excel Functions for Normal Distribution
NORMDIST(x, mean, stddev, logical): logical = TRUE | logical = FALSE
Curve: Cumulative | Bell-shaped
Any normal distribution: Yes | Yes
Input: x-axis | x-axis
Returns: y-axis | y-axis
Returns for x = 0: 50.0% | 39.9%
Returns for x = 2: 97.7% | 5.4%
Use logical = FALSE to graph bell-shaped curves
NORM.INV(probability, mean, stdev)
Input: y-axis (cumulative probability)
Returns: x-axis
Returns for probability = 50.0%: 0.0
Returns for probability = 97.7%: 2.0
Use this to determine confidence limits
The example values assume a standard normal (mean 0, standard deviation 1)
Slide 11
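For readers working outside Excel, the same three lookups are available in Python's standard library. This is an illustrative sketch, not part of the presentation; the variable names are arbitrary and the mapping to the Excel functions is noted in the comments.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Counterpart of NORMDIST(x, 0, 1, FALSE): height of the bell-shaped curve
density_at_0 = std_normal.pdf(0.0)
density_at_2 = std_normal.pdf(2.0)

# Counterpart of NORMDIST(x, 0, 1, TRUE): cumulative probability up to x
cumulative_at_0 = std_normal.cdf(0.0)
cumulative_at_2 = std_normal.cdf(2.0)

# Counterpart of NORM.INV(probability, 0, 1): x for a given cumulative probability
x_at_half = std_normal.inv_cdf(0.5)
x_at_977 = std_normal.inv_cdf(0.977)

print(f"density:    {density_at_0:.1%} at x = 0, {density_at_2:.1%} at x = 2")
print(f"cumulative: {cumulative_at_0:.1%} at x = 0, {cumulative_at_2:.1%} at x = 2")
print(f"inverse:    x = {x_at_half:.1f} at 50.0%, x = {x_at_977:.1f} at 97.7%")
```

Passing a different `mu` and `sigma` to `NormalDist` covers any other normal distribution, in the same way the Excel functions take a mean and standard deviation.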

13 Sensitivity Analysis Same shape, just shifted Same center, different shape Slide 12

14 Key Numbers To Remember
[Figure: standard normal probability distribution, marked at -1 and +1 standard deviation]
The standard normal curve is symmetrical
We tend to look at variation around the mean
Range around the mean: Probability
+/- 1 standard deviation: 68.3%
+/- 2 standard deviations: 95.4%
+/- 3 standard deviations: 99.7%
+/- 1.96 standard deviations: 95.0%
+/- 2.58 standard deviations: 99.0%
Slide 13
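These key numbers can be checked in both directions: from a number of standard deviations to a probability, and from a confidence level back to a multiplier. The sketch below uses hypothetical helper names (`coverage`, `multiplier`) that are not from the slides.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def coverage(k):
    """Probability of landing within +/- k standard deviations of the mean."""
    return STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k)


def multiplier(confidence):
    """Number of standard deviations whose two-sided coverage equals `confidence`."""
    return STD_NORMAL.inv_cdf(0.5 + confidence / 2)


for k in (1, 2, 3):
    print(f"+/- {k} standard deviations: {coverage(k):.1%}")
for level in (0.95, 0.99):
    print(f"{level:.0%} confidence: +/- {multiplier(level):.3f} standard deviations")
```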

15 Central Limit Theorem
Suppose we have a sample of n independent draws, $X_1, X_2, \dots, X_n$, from any distribution with mean $\mu$ and finite variance $\sigma^2$
Then we can define a new random variable, the sample mean: $\bar{x} = (X_1 + X_2 + \dots + X_n)/n$
For large n, the sample mean is approximately normally distributed with mean = $\mu$ (the population mean) and variance $\sigma^2/n$
Which means $Z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$ is approximately a standard normal variable
Slide 14
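A small simulation makes the theorem concrete. The sketch below is an assumption-laden illustration, not part of the presentation: it draws from an exponential distribution with mean 1 and variance 1 (chosen only because it is clearly not normal), and the seed, sample size, and number of samples are arbitrary.

```python
import random
import statistics

random.seed(178)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50    # n draws per sample
NUM_SAMPLES = 5000  # number of sample means to simulate


def one_sample_mean(n):
    """Mean of n independent draws from an exponential distribution (mean 1, variance 1)."""
    draws = [random.expovariate(1.0) for _ in range(n)]
    return sum(draws) / n


sample_means = [one_sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.pvariance(sample_means)

print(f"mean of sample means:     {mean_of_means:.3f} (theory: 1.000)")
print(f"variance of sample means: {variance_of_means:.4f} (theory: {1 / SAMPLE_SIZE:.4f})")
```

Even though individual draws are heavily skewed, the sample means cluster around the population mean with variance close to $\sigma^2/n$.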

16 Our major concern
We want to be here. Does our data say we are here?
Will we be here next year?
Are results stable year over year?
Do I need a margin? How much?
Slide 15

17 The trick is to narrow the curve
[Figure: distribution of the sample mean for n = 10 and n = 50]
The variance of the sample mean is $\sigma^2/n$
Major trade-off: The more homogeneous the group, the smaller the sigma and the smaller the n needed
Slide 16

18 Confidence Interval for IP Admits
Members: N = 1,000 | N = 100,000 | N = 1,000,000
Probability of admit: 6.0% | 6.0% | 6.0%
Expected admits: 60 | 6,000 | 60,000
Variance = $Np(1 - p) = \sigma^2$: 56.4 | 5,640 | 56,400
Standard deviation = $\sigma$: 7.5 | 75.1 | 237.5
Multiplier at 95% confidence level: +/-1.96 | +/-1.96 | +/-1.96
Confidence interval: +/-14.7 | +/-147.2 | +/-465.5
Confidence limit as % of mean: +/-24.5% | +/-2.5% | +/-0.8%
If a population has 1,000,000 members, then there is a 95% chance that observed admits will be within +/-0.8% of the expected number
Slide 17
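The rows of this table follow mechanically from N and p. The sketch below recomputes them under the normal approximation to the binomial; the function name `admit_confidence_interval` is hypothetical, and the 6% admit probability and 95% level are the ones on the slide.

```python
from math import sqrt
from statistics import NormalDist


def admit_confidence_interval(members, p_admit, confidence=0.95):
    """Expected admits, standard deviation, and two-sided half-width of the interval."""
    expected = members * p_admit
    std_dev = sqrt(members * p_admit * (1 - p_admit))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 at 95%
    half_width = z * std_dev
    return expected, std_dev, half_width, half_width / expected


for members in (1_000, 100_000, 1_000_000):
    expected, std_dev, half_width, relative = admit_confidence_interval(members, 0.06)
    print(
        f"N = {members:>9,}: expected {expected:>8,.0f}, sd {std_dev:6.1f}, "
        f"interval +/-{half_width:6.1f} (+/-{relative:.1%} of mean)"
    )
```

The relative half-width shrinks with the square root of N, which is why a 1,000-fold increase in membership narrows the interval by a factor of about 32.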

19 The Standard for Full Credibility
The standard can be expressed in terms of a confidence interval.
Example: How many observations do I need to be 95% sure that my data is within +/-1% of the true mean?
Use the logic on the previous slide, but solve for N: $N = z^2 (1 - p)/(p\,r^2)$ with $z = 1.96$, $p = 6\%$, $r = 1\%$
In our example, that works out to about 600,000 members (about 36,000 admits); 1 million members (60,000 admits) meets the standard with room to spare at +/-0.8%
Slide 18
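Solving the confidence-interval relationship for N gives a closed form. The sketch below assumes the normal approximation to the binomial used earlier in the session; the function name `members_for_full_credibility` and its parameters are illustrative, not from the slides.

```python
from math import ceil
from statistics import NormalDist


def members_for_full_credibility(p_admit, tolerance, confidence=0.95):
    """Smallest N with z * sqrt(N p (1 - p)) <= tolerance * N p."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Rearranging z * sqrt(N p (1 - p)) = tolerance * N p for N:
    n = z ** 2 * (1 - p_admit) / (p_admit * tolerance ** 2)
    return ceil(n)


for tolerance in (0.01, 0.008):
    members = members_for_full_credibility(0.06, tolerance)
    print(f"+/-{tolerance:.1%}: about {members:,} members, {members * 0.06:,.0f} admits")
```

Because the tolerance enters squared, tightening the target from +/-1% to +/-0.8% raises the required membership by more than half.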

20 Hypothesis Testing Overview
Description: Null Hypothesis | Alternate Hypothesis
Mathematical: $\mu = \mu^*$ | $\mu \neq \mu^*$
In words: The population mean is $\mu^*$; our current assumption, $\mu^*$, is still correct | The population mean is not $\mu^*$; we need to change our assumptions
Hypothesis: Accept the hypothesis | Reject the hypothesis
True: Correct | Type I error
False: Type II error | Correct
Standard practice is to err on the side of avoiding Type I error: accept the hypothesis unless there is a clear indication to the contrary
Slide 19

21 Hypothesis Testing: where does your test statistic fall relative to the confidence interval?
[Figure: bell curve with a 95.0% "Accept" region in the center and a 2.5% "Reject" region in each tail]
The curve represents what the distribution will look like if the null hypothesis is true
Where does your test statistic fall?
Slide 20

22 Hypothesis Testing
Members: N = 1,000 | N = 100,000 | N = 1,000,000
$\mu^*$ = Expected admits: 60 | 6,000 | 60,000
$\sigma$ = Standard deviation: 7.5 | 75.1 | 237.5
X = Actual admits (3% higher than expected): 61.8 | 6,180 | 61,800
Z = Test statistic = $(X - \mu^*)/\sigma$: 0.24 | 2.40 | 7.58
Critical value at 95% confidence level: +/-1.96 | +/-1.96 | +/-1.96
Accept/Reject: Accept | Reject | Reject
In each case, the actual admits are 3% higher than expected, but we accept the null hypothesis if we only have 1,000 members and reject it if we have 100,000 or more.
Slide 21
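The accept/reject decisions can be reproduced directly from the test statistic. This is a sketch under the same normal approximation as the confidence-interval slide; `z_test_admits` is a hypothetical helper name, and "Accept" is used in the slide's sense of not rejecting the null hypothesis.

```python
from math import sqrt
from statistics import NormalDist


def z_test_admits(members, p_expected, actual_admits, confidence=0.95):
    """Return the test statistic and the two-sided accept/reject decision."""
    mu_star = members * p_expected                           # expected admits
    sigma = sqrt(members * p_expected * (1 - p_expected))    # binomial standard deviation
    z = (actual_admits - mu_star) / sigma
    critical = NormalDist().inv_cdf(0.5 + confidence / 2)    # about 1.96 at 95%
    decision = "Accept" if abs(z) <= critical else "Reject"
    return z, decision


for members in (1_000, 100_000, 1_000_000):
    actual = members * 0.06 * 1.03  # admits 3% higher than expected
    z, decision = z_test_admits(members, 0.06, actual)
    print(f"N = {members:>9,}: Z = {z:5.2f} -> {decision}")
```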

23 p-value
The p-value is the probability, assuming the null hypothesis is true, that a sample would show a difference at least as extreme as the one actually observed. It is compared against the acceptable probability of a Type I error.
The lower the p-value, the stronger the evidence against the null hypothesis; for a statistically significant difference it should be less than 1 - confidence level (5% at the 95% confidence level)
Considered the gold standard for statistically significant differences
But the p-value is based on one sample from one population:
What if the next sample shows there is no difference?
What happens if you use a similar but not identical population that shows no difference?
This has been controversial since the early days of statistics
Recommendation: Routinely check the p-value using Z.TEST in Excel (which returns a one-tailed probability). Reconcile differences.
Slide 22
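A p-value for a normal test statistic is a tail area, so it can be computed from the cumulative function. The sketch below is illustrative only; the helper name `p_value` is an assumption, and the Z values are the rounded test statistics from the hypothesis-testing example (3% excess admits at 6% expected frequency).

```python
from statistics import NormalDist


def p_value(z, two_sided=True):
    """Probability of a test statistic at least as extreme as z when the null is true."""
    # Tail area beyond |z|, in the direction of the observed difference
    tail = 1 - NormalDist().cdf(abs(z))
    return 2 * tail if two_sided else tail


for z in (0.24, 1.96, 2.40):
    print(
        f"Z = {z:.2f}: two-sided p = {p_value(z):.3f}, "
        f"one-sided p = {p_value(z, two_sided=False):.3f}"
    )
```

A one-tailed result, such as the one Z.TEST returns, is half the two-sided figure, so it should be compared against 2.5% rather than 5% when the test is two-sided at the 95% level.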

24 But how do we know what µ and σ are?
We are going to have to estimate µ and σ, but we need some criteria first:
Consistent estimator: tends to converge on the true value as the sample size becomes larger
Maximum likelihood estimator: the parameter value under which the probability of observing the sample we actually drew is maximized
Unbiased estimator: expected value is equal to the true value
Slide 23

25 Some rules of thumb
Use the sample mean to estimate the population mean: $\bar{x} = \sum x_i / n$
If n > 30, use the sample variance: $s^2 = \sum (x_i - \bar{x})^2/(n - 1)$
In Excel:
STDEV.P returns the population standard deviation (divides by n)
STDEV.S returns the sample standard deviation (divides by n-1)
Slide 24
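The difference between dividing by n and by n-1 is easy to see on a small data set. The values in `claims` below are hypothetical, invented for illustration; the `statistics` functions named in the comments play the same roles as the Excel functions on the slide.

```python
import statistics

claims = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, not from the slides

sample_mean = statistics.mean(claims)

population_variance = statistics.pvariance(claims)  # divides by n, like VAR.P
population_std_dev = statistics.pstdev(claims)      # like STDEV.P

sample_variance = statistics.variance(claims)       # divides by n - 1, like VAR.S
sample_std_dev = statistics.stdev(claims)           # like STDEV.S

print(f"mean = {sample_mean}")
print(f"population variance = {population_variance:.3f}, std dev = {population_std_dev:.3f}")
print(f"sample variance     = {sample_variance:.3f}, std dev = {sample_std_dev:.3f}")
```

The two variances differ by the factor n/(n-1), which matters for small samples and becomes negligible as n grows.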

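26 Chi-Square Distribution: Sampling
$\chi^2_n = Z_1^2 + Z_2^2 + \dots + Z_n^2$, where $Z_1, Z_2$, etc. are independent standard normal variables
A sum of n such terms has n degrees of freedom, with $E(\chi^2_n) = n$ and $\mathrm{Var}(\chi^2_n) = 2n$
In sampling, one degree of freedom is used up estimating the mean, leaving n - 1 degrees of freedom
$\sum (A - E)^2/E$ is approximately chi-square with $n - k - 1$ degrees of freedom, where k = number of parameters to be estimated
Slide 25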

27 The t Distribution for small samples
Define a new distribution $T = Z/\sqrt{Y/n}$, where
Z is a standard normal variable
Y is a chi-square variable with n degrees of freedom, independent of Z
Example: $t = (\bar{x} - \mu^*)/(s/\sqrt{n})$
Note 1: $\mu^*$ is the hypothetical population mean, usually the current assumption
Note 2: t has n-1 degrees of freedom
Slide 26
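The example statistic is straightforward to compute by hand. The sketch below uses hypothetical data and a hypothetical current assumption of 4.0; the function name `t_statistic` is illustrative. Python's standard library has no t distribution, so the resulting value would still need to be compared against a t table or a function such as Excel's T.DIST with n-1 degrees of freedom.

```python
import statistics
from math import sqrt


def t_statistic(sample, mu_star):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (x_bar - mu_star) / (s / sqrt(n)), n - 1


lengths_of_stay = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, not from the slides
t, degrees_of_freedom = t_statistic(lengths_of_stay, mu_star=4.0)
print(f"t = {t:.3f} with {degrees_of_freedom} degrees of freedom")
```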

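28 t-distribution Examples
T.DIST(x, df, logical) returns the cumulative probability when logical is TRUE and the height of the curve when logical is FALSE
T.INV(probability, df) returns x
[Table of example values by x and degrees of freedom; most cells were lost. Surviving FALSE-column values: 1.1%, 38.9%, 1.1% for one set of degrees of freedom and 0.8%, 39.4%, 0.8% for another]
Slide 27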

29 Other Uses of Chi-Square Distributions
$\sum (A - E)^2/E$ is approximately chi-square with $n - k - 1$ degrees of freedom, where k = number of parameters to be estimated
The chi-square test can be used to test independence between two distributions
Can run a hypothesis test to indicate whether there is a real difference between two distributions
Slide 28

30 Sample Probability Distribution: IP Length of Stay
Columns: Range | i | ALOS $x_i$ | Probability $f(x_i)$ | Cumulative $F(x_i)$ | Variance $(x_i - \mu)^2$
Surviving values, shown as Range: $F(x_i)$ | $(x_i - \mu)^2$
Exactly 1 day: 20.9% | 9.0
Exactly 2 days: 50.8% | 4.0
Exactly 3 days: 69.8% | 1.0
Exactly 4 days: 80.0% | 0.0
Exactly 5 days: 85.4% | 1.0
Exactly 6 days: 88.7% | 4.0
Exactly 7 days: 91.0% | 9.0
Exactly 8 days: 92.7% | 16.0
Exactly 9 days: 93.8% | [...]
10+ days: 100.0% | [...]
Sum/Sumproduct: variance 24.1
Slide 29

31 Chi-Square Test
Columns: Range | i | Expected LOS | Expected Distribution | Expected Admits | Actual LOS | Actual Admits | $\chi^2$
Rows: Exactly 1 day through Exactly 9 days, 10+ days, Sum/Sumproduct [cell values lost]
Chi-square statistic: 0.95
CHISQ.TEST(actual range, expected range) = 99.95%
Slide 30
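The chi-square statistic itself is a one-line sum over the ranges. Since the slide's admit counts did not survive, the sketch below uses made-up expected and actual counts purely to show the mechanics; `chi_square_statistic` is a hypothetical helper name. Turning the statistic into a probability requires the chi-square distribution, which Excel's CHISQ.TEST handles and Python's standard library does not.

```python
def chi_square_statistic(actual, expected):
    """Sum of (A - E)^2 / E across ranges; actual and expected must align by position."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same number of ranges")
    return sum((a - e) ** 2 / e for a, e in zip(actual, expected))


# Hypothetical admit counts by length-of-stay range (not the values from the slide)
expected_admits = [50, 30, 20]
actual_admits = [45, 35, 20]

statistic = chi_square_statistic(actual_admits, expected_admits)
print(f"chi-square statistic = {statistic:.3f}")
```

A small statistic, like the 0.95 on the slide, means actual experience tracks the expected distribution closely, which is why the reported test probability is so high.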

32 Regression Analysis

33 Residual: Observed Value - Predicted Value
Columns: Observation Number i | Independent Variable x | Predicted Value $\hat{y}_i$ | Observed Value $y_i$ | Residual $e_i$
[Per-observation values lost; surviving residual entries: (1.1), (1.9)]
There is a curve which represents the true underlying values
Residuals are values of a random variable $\epsilon_i$
The residual is basically the difference between the dot and the line
Slide 32

34 Underlying Assumptions
$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$
$x_1, x_2, \dots, x_n$ are non-stochastic variables
$E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma^2$
The $\epsilon_i$'s are independent random variables
Note: $\beta_0$, $\beta_1$ and $\sigma^2$ are the true unknown values. We are going to have to estimate these values based on a specific data set
Slide 33

35 Analysis of Variance (ANOVA): Total Sum of Squares (TSS)
Columns: Observation Number i | Observed Value $y_i$ | Overall Mean $\bar{y}$ | $\Delta = y_i - \bar{y}$ | $\Delta^2$
[Per-observation values lost; surviving Δ entries: (1.0), (2.0), (1.0)]
Total: 16.0
The purpose of ANOVA is to understand how much of the variance is accounted for by the curve
The starting point is calculating the total variance from the overall mean (the red line)
Slide 34

36 Regression Sum of Squares (RSS): Predicted Value - Overall Mean
Columns: Observation Number i | Predicted Value $\hat{y}_i$ | Overall Mean $\bar{y}$ | $\Delta = \hat{y}_i - \bar{y}$ | $\Delta^2$
[Per-observation values lost; surviving Δ entries: (1.8), (0.9)]
Total: 8.1
How much of the variance is explained by the fact that we have a curve?
Look at the difference between the blue line and the red line
Slide 35

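37 Error Sum of Squares (ERSS): Residuals
[Figure: A Simple Regression Example, y values plotted against x values]
Columns: Observation Number i | Observed Value $y_i$ | Predicted Value $\hat{y}_i$ | $\Delta = y_i - \hat{y}_i = e_i$ | $\Delta^2$
[Per-observation values lost; surviving Δ entries: (1.1), (1.9)]
Total: 7.9
Measures unexplained variance
Slide 36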

38 TSS = RSS + ERSS
Sum of Squares: Abbreviation | Description of Δ | Value
Total (Total Variance): TSS | Observed Values - Overall Mean | 16.0
Regression (Explained Variance): RSS | Predicted Values - Overall Mean | 8.1
Error (Unexplained Variance): ERSS | Observed Values - Predicted Values | 7.9
Total Variance = Explained Variance + Unexplained Variance
$R^2$ = Explained Variance/Total Variance = % of Total Variance Explained by Regression
$R^2$ = 8.1/16 = 51%
Slide 37
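The decomposition can be verified numerically on any fitted line. Because the slide's observations were not preserved, the sketch below uses a small hypothetical data set and fits the least squares line itself; the names `fit_line` and `sums_of_squares` are illustrative only.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for a simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope


def sums_of_squares(xs, ys):
    """Return (TSS, RSS, ERSS) using the slide's naming."""
    intercept, slope = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    predicted = [intercept + slope * x for x in xs]
    tss = sum((y - y_bar) ** 2 for y in ys)                    # observed - overall mean
    rss = sum((p - y_bar) ** 2 for p in predicted)             # predicted - overall mean
    erss = sum((y - p) ** 2 for y, p in zip(ys, predicted))    # observed - predicted
    return tss, rss, erss


# Hypothetical observations (the slide's data set did not survive)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

tss, rss, erss = sums_of_squares(xs, ys)
print(f"TSS = {tss:.1f}, RSS = {rss:.1f}, ERSS = {erss:.1f}")
print(f"R^2 = RSS / TSS = {rss / tss:.0%}")
```

The identity TSS = RSS + ERSS holds exactly only when the line is the least squares fit with an intercept; for an arbitrary line the cross term does not vanish.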

39 Why are non-stochastic values important?
Stochastic Variable | Non-Stochastic Variable
Member lives in Zip 999 | Area-adjusted PMPM
Member is a 42-year-old male | Age-sex adjusted PMPM
Member took health risk assessment | % taking health risk assessment
Stochastic variables introduce variance not accounted for in standard analysis of variance
May be ignoring factors important in determining the value
Incentive for taking health risk assessment may not be the same for each group
Is this variance always material?
Slide 38

40 The Bad News Health care costs are not normally distributed, so generalized linear models may have to be used Excel does not handle generalized linear models Can still use other methods, but be careful about disclaimers Slide 39

41 Criteria for estimating β 0, β 1 and σ
We are going to use the same criteria that we used to estimate µ and σ in Stats 101
Consistent estimator: tends to converge on the true value as the sample size becomes larger
Maximum likelihood estimator: the parameter value under which the probability of observing the sample we actually drew is maximized
Unbiased estimator: expected value is equal to the true value
Slide 40

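42 Least Squares Estimate
Basic premise: Find the values $b_0$ and $b_1$ which minimize the sum of the squared vertical distances from each data point to the fitted line: $\sum_i (y_i - (b_0 + b_1 x_i))^2$
Take the first derivatives with respect to $b_0$ and $b_1$, set them to zero, and solve for the values
Results are consistent and unbiased estimators, and are the maximum likelihood estimators when the errors are normally distributed
Slide 41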
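For reference, the derivative step can be written out as follows. This is the standard textbook derivation for simple linear regression, shown here as a sketch; the symbol $S$ for the sum of squares is introduced only for this illustration.

```latex
S(b_0, b_1) = \sum_{i=1}^{n} \bigl(y_i - (b_0 + b_1 x_i)\bigr)^2

\frac{\partial S}{\partial b_0} = -2 \sum_{i=1}^{n} \bigl(y_i - b_0 - b_1 x_i\bigr) = 0
\qquad
\frac{\partial S}{\partial b_1} = -2 \sum_{i=1}^{n} x_i \bigl(y_i - b_0 - b_1 x_i\bigr) = 0

b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```

The first condition forces the residuals to sum to zero, which is why the fitted line always passes through the point $(\bar{x}, \bar{y})$.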

43 A Good Candidate for Simple Linear Regression
Statistic: Value
Intercept: 4.2
Slope: 1.1
$R^2$: 73%
Variance appears to be normal: ~2/3 of points fall within 1 standard deviation of the mean, ~1/3 fall between 1 and 2 standard deviations
Line is not too flat
High $R^2$
Slide 42

44 Excel Formulas For Key Values
Input values: columns x and y
Intercept: =INTERCEPT(known y's, known x's)
Slope: =SLOPE(known y's, known x's)
$R^2$: =RSQ(known y's, known x's)
Slide 43

45 Excel has a data analysis add-in: Data > Data Analysis. Requires a one-time set-up Slide 44

46 Data Analysis has several options Choose regression option Slide 45

47 Minimum input: Known y's, known x's, output placement Slide 46

48 Sample Output
[Excel regression SUMMARY OUTPUT; numeric values lost]
Regression Statistics (key values): Multiple R, R Square, Adjusted R Square, Standard Error, Observations (20)
ANOVA: df, SS, MS, F, Significance F for the Regression, Residual, and Total rows
Coefficients (test data): Coefficients, Standard Error, t Stat, P-value, Lower 95%, Upper 95% for the Intercept and X Variable rows
Additional output is available if requested in the dialog box
Slide 47

49 Anscombe's Quartet: Data Set 1 vs Data Set 2 [Figure: Data Set 1 and Data Set 2] What is the expected difference in slope, intercept and $R^2$? Would you rely on the curve for data set 1? For data set 2? Slide 48

50 Anscombe's Quartet: Data Set 1 vs Data Set 3 [Figure: Data Set 1 and Data Set 3] Would you rely on the curve for data set 3? Slide 49

51 Anscombe's Quartet: Data Set 1 vs Data Set 4 [Figure: Data Set 1 and Data Set 4] Would you rely on the curve for data set 4? Slide 50

52 Example: How well does risk score predict cost?
[Figure: scatter plot of area-adjusted PMPM ($0 to $1,200) against risk score]
Function: Value
INTERCEPT: $35.18
SLOPE: $[...]
CORREL: 62%
AVERAGE: [...]
Used retro risk score
Random sample of males aged 42 from a large group
N = 28
Divided raw PMPM by area factor
Slide 51

53 What is the Value of $R^2$? Slide 52

54 The Basics
$y_i = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n + \epsilon_i$
$\epsilon_i$ is a value of the residual random variable described earlier
Slide 53

55 Why Multiple Linear Regression
Shape of the curve/polynomial: $y_i = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon_i$
Additional explanatory variables: Age + Gender probably explains costs better than age alone
Control for confounding factors: "all other things being equal". Example: control for area
Slide 54

56 Are your independent variables dependent on each other?
In most cases, find the independent variable that best explains overall variance
Test each independent variable one at a time
Also test combinations of variables
Test interdependence by doing analytics comparing just the variables in question:
How well does age-gender predict risk score?
How well does risk score predict age-gender?
Slide 55

57 Categorical Variables
Categorical: Separate into groups even if the variable is not numeric per se
Gender: 1 = female, 0 = male
Alternately:
Gender1 = 1 if female, 0 if male
Gender2 = 0 if female, 1 if male
Think in terms of the marginal impact of each variable: The expected value of $y_i$ goes up by $\beta_i$ with each unit change in $x_i$
Slide 56

58 Where Do You Go From Here?
Pick a resource
Barron's Business Statistics
Anything by Jed Frees
Practice, practice, practice
Start with basics (chi-square, confidence limits, hypothesis testing)
Move to regression analysis: use adjusted PMPMs etc. to get yourself started
Make sure you can analyze and explain results
Move to advanced analytics
GLM for health care costs
Trend methods
Evaluation (probit, propensity analysis, etc.)
Disease progression (survival models)
Slide 57

59 Q&A and Wrap-Up
