Session 178 TS, Stats for Health Actuaries. Moderator: Ian G. Duncan, FSA, FCA, FCIA, FIA, MAAA. Presenter: Joan C. Barrett, FSA, MAAA
Transcription
1 Session 178 TS, Stats for Health Actuaries Moderator: Ian G. Duncan, FSA, FCA, FCIA, FIA, MAAA Presenter: Joan C. Barrett, FSA, MAAA
2 Session 178 Statistics for Health Actuaries October 14, 2015 Presented by: Joan C. Barrett, FSA, MAAA Ian Duncan, FSA, FIA, FCIA, FCA, MAAA
3 Today's Agenda: Basic Statistics; A Quick Look at Regression Analysis Page 2
4 Basic Statistics
5 The Statistical Triad: Estimation, Prediction, Hypothesis Testing Page 4
6 A Few Basic Formulas
Mean: symbol $\mu = E(X)$; formula $\sum_i x_i f(x_i)$; Excel formula AVERAGE
Variance: symbol $\sigma^2 = \mathrm{Var}(X)$; formula $\sum_i (x_i - \mu)^2 f(x_i) = E(X^2) - E^2(X)$; Excel formula VAR.P
Standard Deviation: symbol $\sigma$; Excel formula STDEV.P
Page 5
7 Claims Frequencies
Columns: Item | Bernoulli | Binomial (N = 1,000)
Intuitive Concept | Flip a coin once | Flip a coin N times
Mean (µ) | p, the probability of success | N x p
Variance ($\sigma^2$) | p x (1 - p) | N x p x (1 - p)
Bernoulli examples:
Hospital admits: mean 6.0%, variance 5.6%
Any claims: mean 30%, variance 21%
Page 6
8 Sample Calculations (Bernoulli)
Columns: Step | Success | Failure | Combined
Value of x | 1 | 0 | N/A
Probability of x | 6.0% | 94.0% | 100%
Mean = µ = weighted average | 6.0% | 0.0% | 6%
x - µ | 94.0% | -6.0% | N/A
$(x - \mu)^2$ | 88.4% | 0.4% | N/A
Variance = $\sigma^2$ = sum of squares | 5.3% | 0.3% | 5.6%
Page 7
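The Bernoulli calculation on this slide can be reproduced in a few lines. This is a minimal sketch; the function name `bernoulli_moments` is an illustrative choice, not something from the presentation.

```python
def bernoulli_moments(p):
    """Mean and variance of a Bernoulli variable (x = 1 with probability p, else 0)."""
    outcomes = [(1, p), (0, 1 - p)]  # (value of x, probability of x)
    mean = sum(x * prob for x, prob in outcomes)  # weighted average
    variance = sum((x - mean) ** 2 * prob for x, prob in outcomes)  # weighted squared deviations
    return mean, variance


mean, variance = bernoulli_moments(0.06)  # 6.0% probability of a hospital admit
print(f"mean = {mean:.1%}, variance = {variance:.1%}")
```

The variance equals p x (1 - p), so the same function gives 21% for the 30% "any claims" example.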
9 Additional Formulas for Weighted Averages
Columns: Group | Weight = c | % Admits = E(X) | Var(X) | $c^2$ Var(X)
Children | 33.0% | 3.0% | 2.9% | 0.3%
Women < | % | 5.0% | 4.8% | 0.2%
Women | % | 10.0% | 9.0% | 0.4%
Men < | % | 5.0% | 4.8% | 0.1%
Men | % | 10.0% | 9.0% | 0.2%
Combined | 100.0% | 6.0% | | 1.1%
Key Formulas
E(X + Y) = E(X) + E(Y)
E(cX) = c E(X)
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
Var(X + Y) = Var(X) + Var(Y), if X and Y are independent
Var(cX) = $c^2$ Var(X)
Page 8
10 Normal Approximation to Binomial
A binomial distribution is approximately normal if:
N > 30
N x p > 5
N x p x (1 - p) > 5
Mean = N x p
Variance = N x p x (1 - p)
Page 9
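The three conditions on this slide can be checked mechanically. The sketch below applies them exactly as stated; other texts use slightly different thresholds, and the function name is hypothetical.

```python
def normal_approximation(n, p):
    """Apply the slide's rule of thumb and return (ok, mean, variance) for Binomial(n, p)."""
    mean = n * p
    variance = n * p * (1 - p)
    ok = n > 30 and mean > 5 and variance > 5
    return ok, mean, variance


# 1,000 members with a 6% admit probability pass all three conditions.
print(normal_approximation(1_000, 0.06))
# 50 members fail because N x p is only 3.
print(normal_approximation(50, 0.06))
```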
11 The Standard Normal Curve
[Figure: the standard normal probability distribution f(x) and cumulative function F(x)]
Standard normal: mean = 0, variance = 1
To convert any normal distribution to standard normal: $Z = (x - \mu)/\sigma$
Slide 10
12 Excel Functions for Normal Distribution
NORMDIST(x, mean, stddev, logical)
Columns: Description | logical = TRUE | logical = FALSE
Curve | Cumulative | Bell-shaped
Any normal distribution | Yes | Yes
Input | x-axis | x-axis
Returns | y-axis | y-axis
Returns for x = 0 | 50.0% | 39.9%
Returns for x = 2 | 97.7% | 5.4%
NORM.INV(probability, mean, stdev)
Input | y-axis | N/A
Returns | x-axis | N/A
Returns for probability = 50.0% | 0.0 | N/A
Returns for probability = 97.7% | 2.0 | N/A
Use the bell-shaped form to graph bell-shaped curves
Use NORM.INV to determine confidence limits
Example values assume standard normal
Slide 11
13 Sensitivity Analysis
[Figure panels: "Same shape, just shifted" and "Same center, different shape"]
Slide 12
14 Key Numbers To Remember
[Figure: standard normal probability distribution with -1 and +1 standard deviation marked]
The standard normal curve is symmetrical
We tend to look at variation around the mean
Columns: Range around the Mean | Probability
+/- 1 standard deviation | 68.3%
+/- 2 standard deviations | 95.4%
+/- 3 standard deviations | 99.7%
+/- 1.96 standard deviations | 95.0%
+/- 2.58 standard deviations | 99.0%
Slide 13
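The probabilities in this table follow from the standard normal curve and can be checked with the error function in Python's standard library. This is a sketch; `prob_within` is an illustrative name.

```python
import math


def prob_within(k):
    """Probability that a normal variable falls within +/- k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3, 1.96, 2.576):
    print(f"+/- {k} standard deviations: {prob_within(k):.1%}")
```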
15 Central Limit Theorem
Suppose we have a sample of n independent draws, $X_1, X_2, \dots, X_n$, from any distribution with mean $\mu$ and variance $\sigma^2$
Then we can define a new random variable, the sample mean: $\bar{X} = (X_1 + X_2 + \dots + X_n)/n$
For large n, the sample mean is approximately normally distributed with mean = $\mu$ (the population mean) and variance $\sigma^2/n$
Which means Z is approximately a standard normal variable, where $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$
Slide 14
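A small simulation illustrates the theorem. The exponential distribution (mean 1, variance 1), the sample size of 50, and the seed are all assumptions chosen for illustration; they do not come from the presentation.

```python
import random
import statistics


def simulate_sample_means(n, trials, seed=0):
    """Return `trials` sample means, each from n draws of an exponential(1) distribution."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]


means = simulate_sample_means(n=50, trials=2000)
print(f"average of sample means:  {statistics.fmean(means):.3f} (theory: 1.000)")
print(f"variance of sample means: {statistics.pvariance(means):.4f} (theory: {1 / 50:.4f})")
```

Although the underlying distribution is heavily skewed, the sample means cluster around the population mean with variance close to $\sigma^2/n$.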
16 Our major concern
[Figure annotations: Where do we want to be? Does our data say we are here? Will we be here next year?]
Are results stable year over year?
Do I need a margin? How much?
Slide 15
17 The trick is to narrow the curve
[Figure: distributions of the sample mean for n = 10 and n = 50]
The variance of the sample mean is $\sigma^2/n$
Major trade-off: the more homogeneous the group, the smaller the sigma and the smaller the n
Slide 16
18 Confidence Interval for IP Admits
Columns: Members | N = 1,000 | N = 100,000 | N = 1,000,000
Probability of Admit | 6.0% | 6.0% | 6.0%
Expected Admits | 60 | 6,000 | 60,000
Variance = N x p x (1 - p) = $\sigma^2$ | 56.4 | 5,640 | 56,400
Standard Deviation = $\sigma$ | 7.5 | 75.1 | 237.5
Multiplier at 95% Confidence Level | +/-1.96 | +/-1.96 | +/-1.96
Confidence Interval | +/-14.7 | +/-147.2 | +/-465.5
Confidence Limit as % of Mean | +/-24.5% | +/-2.5% | +/-0.8%
With 1,000,000 members, there is a 95% chance that observed admits will fall within +/-0.8% of the expected number
Slide 17
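The rows of this table can be regenerated from N and p alone. The sketch below assumes a binomial admit count with the normal approximation at the 95% level; the function name is hypothetical.

```python
import math


def admit_confidence_interval(members, p, z=1.96):
    """Return (expected admits, standard deviation, CI half-width, half-width as share of mean)."""
    expected = members * p
    std_dev = math.sqrt(members * p * (1 - p))
    half_width = z * std_dev
    return expected, std_dev, half_width, half_width / expected


for members in (1_000, 100_000, 1_000_000):
    expected, std_dev, half_width, relative = admit_confidence_interval(members, 0.06)
    print(f"N = {members:>9,}: expected = {expected:,.0f}, sd = {std_dev:.1f}, "
          f"CI = +/-{half_width:.1f} (+/-{relative:.1%})")
```

The relative width shrinks with the square root of N, which is why a hundredfold increase in members narrows the limit only tenfold.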
19 The Standard for Full Credibility The standard can be expressed in terms of a confidence interval. Example: How many observations do I need to be 95% sure that my data is within +/- 1% of the true mean? In our example, full credibility requires roughly 1 million members or 60,000 admits Use logic on previous slide, but solve for N Slide 18
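"Solve for N" can be written out explicitly. Setting z x sqrt(N p (1 - p)) / (N p) equal to the tolerance k gives N = z^2 (1 - p) / (p k^2). The sketch below assumes the same binomial model as the confidence interval slide; the function name is illustrative.

```python
import math


def members_for_full_credibility(p, tolerance, z=1.96):
    """Smallest N for which the confidence limit is within +/- tolerance of the mean."""
    return math.ceil(z ** 2 * (1 - p) / (p * tolerance ** 2))


n = members_for_full_credibility(p=0.06, tolerance=0.01)
print(f"members needed: {n:,}; expected admits: {n * 0.06:,.0f}")
```

Under these assumptions the exact answer is about 600,000 members (about 36,000 admits). The slide's round figure of 1 million members meets the standard with room to spare, since its limit is +/-0.8%.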
20 Hypothesis Testing Overview
Columns: Description | Null Hypothesis | Alternate Hypothesis
Mathematical | $\mu = \mu^*$ | $\mu \neq \mu^*$
In Words | The population mean is µ*; our current assumption, µ*, is still correct | The population mean is not µ*; we need to change our assumptions
Columns: Hypothesis | Accept the Hypothesis | Reject the Hypothesis
True | Correct | Type I error
False | Type II error | Correct
Standard practice is to err on the side of avoiding Type I error: accept the hypothesis unless there is a clear indication to the contrary
Slide 19
21 Hypothesis Testing: where does your test statistic fall?
[Figure: bell curve with regions Reject (2.5%), Accept (95.0%), Reject (2.5%)]
The curve represents what the distribution will look like if the null hypothesis is true
Slide 20
22 Hypothesis Testing
Columns: Members | N = 1,000 | N = 100,000 | N = 1,000,000
µ* = Expected Admits | 60 | 6,000 | 60,000
$\sigma$ = Standard Deviation | 7.5 | 75.1 | 237.5
X = Actual Admits (3% higher than expected) | 61.8 | 6,180 | 61,800
Z = Test Statistic = (X - µ*)/$\sigma$ | 0.24 | 2.40 | 7.58
Critical value at 95% | +/-1.96 | +/-1.96 | +/-1.96
Accept/Reject | Accept | Reject | Reject
In each case, the actual admits are 3% higher than expected, but we accept the null hypothesis if we only have 1,000 members and reject it if we have 100,000 or more.
Slide 21
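The accept/reject decisions on this slide can be reproduced as follows. This is a sketch assuming the binomial standard deviation from the confidence interval slide and a two-sided test at the 95% level; `admit_z_test` is a hypothetical name.

```python
import math


def admit_z_test(actual, members, p, critical=1.96):
    """Return the z statistic and the accept/reject decision for observed admits."""
    expected = members * p
    std_dev = math.sqrt(members * p * (1 - p))
    z = (actual - expected) / std_dev
    decision = "Reject" if abs(z) > critical else "Accept"
    return z, decision


for members in (1_000, 100_000, 1_000_000):
    actual = members * 0.06 * 1.03  # actual admits 3% higher than expected
    z, decision = admit_z_test(actual, members, 0.06)
    print(f"N = {members:>9,}: Z = {z:.2f} -> {decision}")
```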
23 p-value
Basically, the probability, assuming the null hypothesis is true, of seeing a result at least as extreme as the one in your sample
Rejecting only when the p-value is below 1 - confidence level (5% at the 95% confidence level) caps the probability of a Type I error at that level
The lower the p-value, the stronger the evidence against the null hypothesis
Considered the gold standard for statistically significant differences
But the p-value is based on one sample from one population:
What if the next sample shows there is no difference?
What happens if you use a similar but not identical population that shows no difference?
This has been controversial since the early days of statistics
Recommendation:
Routinely check p-value using Z.TEST in Excel
Reconcile differences
Slide 22
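For a z statistic, a two-sided p-value can be computed directly from the normal tail. This sketch uses only the standard library and a hypothetical function name; note that Excel's Z.TEST is documented as returning a one-tailed value, so the two figures will differ unless that is reconciled.

```python
import math


def two_sided_p_value(z):
    """Probability, under the null hypothesis, of a standard normal value at least as extreme as z."""
    return math.erfc(abs(z) / math.sqrt(2))


for z in (0.24, 1.96, 2.40):
    print(f"Z = {z:.2f}: p-value = {two_sided_p_value(z):.3f}")
```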
24 But how do we know what µ and σ are?
We are going to have to estimate µ and σ, but we need some criteria first:
Consistent estimator: tends to converge on the true value as the sample size becomes larger
Maximum likelihood estimator: if the unknown parameter had this value, the probability of observing the sample we actually observed would be maximized
Unbiased estimator: expected value is equal to the true value
Slide 23
25 Some rules of thumb
Use the sample mean to estimate the population mean: $\bar{x} = \sum x_i / n$
If n > 30, use the sample variance: $s^2 = \sum (x_i - \bar{x})^2/(n-1)$
In Excel:
STDEV.P returns the population standard deviation (divides by n)
STDEV.S returns the sample standard deviation (divides by n-1)
Slide 24
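Python's standard `statistics` module makes the same distinction as STDEV.P and STDEV.S. The data below are made up purely for illustration.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical observations

sample_mean = statistics.mean(data)
population_sd = statistics.pstdev(data)  # divides by n, like STDEV.P
sample_sd = statistics.stdev(data)       # divides by n - 1, like STDEV.S

print(f"mean = {sample_mean}, population sd = {population_sd:.3f}, sample sd = {sample_sd:.3f}")
```

The sample version is always slightly larger, and the gap closes as n grows.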
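26 Chi-Square Distribution: Sampling
$\chi^2 = Z_1^2 + Z_2^2 + \dots + Z_n^2$
Where $Z_1, Z_2$, etc. are independent standard normal variables
A sum of n independent terms has n degrees of freedom, with $E(\chi^2_n) = n$ and $\mathrm{Var}(\chi^2_n) = 2n$
In sampling, where deviations are measured from the sample mean, one degree of freedom is lost, leaving n - 1
$\sum (A - E)^2/E$ is approximately Chi-square with n - k - 1 degrees of freedom, where k = number of parameters to be estimated
Slide 25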
27 The t Distribution for small samples
Define a new distribution $T = Z/\sqrt{Y/n}$, where
Z is a standard normal variable
Y is an independent Chi-square variable with n degrees of freedom
Example: $t = (\bar{x} - \mu^*)/(s/\sqrt{n})$
Note 1: µ* is the hypothetical population mean, usually the current assumption
Note 2: t has n-1 degrees of freedom
Slide 26
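The example statistic on this slide can be computed as below. This is a sketch with made-up data and a hypothetical function name; looking up the resulting t against the t distribution (T.DIST in Excel) is left out.

```python
import math
import statistics


def t_statistic(sample, mu_star):
    """Return (t, degrees of freedom) for testing whether the population mean equals mu_star."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (x_bar - mu_star) / (s / math.sqrt(n))
    return t, n - 1


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical observations
t, df = t_statistic(sample, mu_star=4.0)
print(f"t = {t:.3f} with {df} degrees of freedom")
```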
28 t-distribution Examples
T.DIST(x, df, logical) returns
Columns: x | Degrees of Freedom | Logical = True | Logical = False
% 1.1%
% 38.9%
% 1.1%
% 0.8%
% 39.4%
% 0.8%
T.INV(probability, df) returns
Columns: Probability | Degrees of Freedom | x
0.7% % % % % % 20 3
Slide 27
29 Other Uses of Chi-Square Distributions
$\sum (A - E)^2/E$ is approximately Chi-square with n - k - 1 degrees of freedom, where k = number of parameters to be estimated
The chi-square test can be used to test independence between two distributions
Can run a hypothesis test to indicate whether there is a real difference between two distributions
Slide 28
30 Sample Probability Distribution: IP Length of Stay
Columns: Range | i | ALOS $x_i$ | Probability $f(x_i)$ | Cumulative $F(x_i)$ | Variance $(x_i - \mu)^2$
Exactly 1 day | | | | 20.9% | 9.0
Exactly 2 days | | | | 50.8% | 4.0
Exactly 3 days | | | | 69.8% | 1.0
Exactly 4 days | | | | 80.0% | 0.0
Exactly 5 days | | | | 85.4% | 1.0
Exactly 6 days | | | | 88.7% | 4.0
Exactly 7 days | | | | 91.0% | 9.0
Exactly 8 days | | | | 92.7% | 16.0
Exactly 9 days | | | | 93.8% |
Days | | | | 100.0% |
Sum/Sumproduct | | | | | 24.1
Slide 29
31 Chi-Square Test
Columns: Range | i | Expected LOS | Expected Distribution | Expected Admits | Actual LOS | Actual Admits | $\chi^2$
Rows: Exactly 1 day through Exactly 9 days, Days, Sum/Sumproduct
Chi-square Statistic 0.95
chisq.test(actual range, expected range) = 99.95%
Slide 30
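The chi-square statistic itself is a one-line sum. Because the slide's table values did not survive, the counts below are made up for illustration, and the function names are hypothetical. Converting the statistic to a probability (as CHISQ.TEST does in Excel) needs the chi-square distribution, which this standard-library sketch omits.

```python
def chi_square_statistic(actual, expected):
    """Sum of (A - E)^2 / E over all categories."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same number of categories")
    return sum((a - e) ** 2 / e for a, e in zip(actual, expected))


def degrees_of_freedom(n_categories, n_estimated_parameters=0):
    """n - k - 1, where k is the number of parameters estimated from the data."""
    return n_categories - n_estimated_parameters - 1


actual_admits = [48, 35, 17]    # hypothetical actual admits by length-of-stay range
expected_admits = [50, 30, 20]  # hypothetical expected admits for the same ranges
stat = chi_square_statistic(actual_admits, expected_admits)
print(f"chi-square = {stat:.3f}, degrees of freedom = {degrees_of_freedom(len(actual_admits))}")
```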
32 Regression Analysis
33 Residual: Observed Value - Predicted Value
Columns: Observation Number i | Independent Variable x | Predicted Value $\hat{y}$ | Observed Value $y_i$ | Residual $e_i$
Residuals shown include (1.1) and (1.9)
There is a curve that represents the true underlying values
Residuals are values of a random variable $\epsilon_i$
The residual is basically the difference between the dot and the line
Slide 32
34 Underlying Assumptions
$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$
$x_1, x_2, \dots, x_n$ are non-stochastic variables
$E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma^2$
The $\epsilon_i$ are independent random variables
Note: $\beta_0$, $\beta_1$ and $\sigma^2$ are the true unknown values. We are going to have to estimate these values based on a specific data set
Slide 33
35 Analysis of Variance (ANOVA): Total Sum of Squares (TSS)
Columns: Observation Number i | Observed Value $y_i$ | Overall Mean $\bar{y}$ | Δ = $y_i - \bar{y}$ | Δ²
Δ values shown include (1.0), (2.0), (1.0); Total 16.0
The purpose of ANOVA is to understand how much of the variance is accounted for by the curve
The starting point is calculating the total variance from the overall mean (the red line)
Slide 34
36 Regression Sum of Squares (RSS): Predicted Value - Overall Mean
Columns: Observation Number i | Predicted Value $\hat{y}$ | Overall Mean $\bar{y}$ | Δ = $\hat{y} - \bar{y}$ | Δ²
Δ values shown include (1.8), (0.9); Total 8.1
How much of the variance is explained by the fact that we have a curve?
Looking at the difference between the blue line and the red line
Slide 35
37 Error Sum of Squares (ERSS): Residuals
[Figure: "A Simple Regression Example", y values plotted against x values]
Columns: Observation Number i | Observed Value $y_i$ | Predicted Value $\hat{y}$ | Δ = $e_i$ | Δ²
Δ values shown include (1.1), (1.9); Total 7.9
Measures unexplained variance
Slide 36
38 TSS = RSS + ERSS
Columns: Sum of Squares | Abbreviation | Description of Δ | Value
Total (Total Variance) | TSS | Observed Values - Overall Mean | 16.0
Regression (Explained Variance) | RSS | Predicted Values - Overall Mean | 8.1
Error (Unexplained Variance) | ERSS | Observed Values - Predicted Values | 7.9
Total Variance = Explained Variance + Unexplained Variance
$R^2$ = Explained Variance / Total Variance = % of Total Variance Explained by Regression
$R^2$ = 8.1/16 = 51%
Slide 37
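The decomposition can be verified numerically. The sketch below uses made-up observed values and the predictions of their least-squares line (the identity TSS = RSS + ERSS holds for a least-squares fit with an intercept); the function name is illustrative.

```python
def anova_decomposition(observed, predicted):
    """Return (TSS, RSS, ERSS, R squared) for observed values and fitted predictions."""
    overall_mean = sum(observed) / len(observed)
    tss = sum((y - overall_mean) ** 2 for y in observed)                 # observed - overall mean
    rss = sum((y_hat - overall_mean) ** 2 for y_hat in predicted)        # predicted - overall mean
    erss = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))  # observed - predicted
    return tss, rss, erss, rss / tss


observed = [2.0, 4.0, 5.0, 4.0, 5.0]   # hypothetical y values for x = 1..5
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]  # least-squares line 2.2 + 0.6x for those points
tss, rss, erss, r_squared = anova_decomposition(observed, predicted)
print(f"TSS = {tss:.1f}, RSS = {rss:.1f}, ERSS = {erss:.1f}, R^2 = {r_squared:.0%}")
```

Applied to the slide's own totals, the same ratio is 8.1 / 16.0, or about 51%.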
39 Why are non-stochastic values important?
Columns: Stochastic Variable | Non-Stochastic Variable
Member lives in Zip 999 | Area-adjusted PMPM
Member is 42 Male | Age-sex adjusted PMPM
Member took health risk assessment | % taking health risk assessment
Stochastic variables introduce variance not accounted for in standard analysis of variance
May be ignoring factors important in determining the value
Incentive for taking health risk assessment may not be the same for each group
Is this variance always material?
Slide 38
40 The Bad News Health care costs are not normally distributed, so generalized linear models may have to be used Excel does not handle generalized linear models Can still use other methods, but be careful about disclaimers Slide 39
41 Criteria for estimating $\beta_0$, $\beta_1$ and $\sigma$
We are going to use the same criteria that we used to estimate µ and σ in Stats 101:
Consistent estimator: tends to converge on the true value as the sample size becomes larger
Maximum likelihood estimator: if the unknown parameter had this value, the probability of observing the sample we actually observed would be maximized
Unbiased estimator: expected value is equal to the true value
Slide 40
42 Least Squares Estimate
Basic premise: find the values $b_0$ and $b_1$ which minimize the sum of the squares of the distances from each data point to the theoretical line, $\sum_i (y_i - (b_0 + b_1 x_i))^2$
Take the first derivative with respect to $b_0$ and $b_1$, set each to zero, and solve for the values
Results are consistent, maximum likelihood and unbiased estimators
Slide 41
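Setting the two derivatives to zero gives the familiar closed form: $b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $b_0 = \bar{y} - b_1 \bar{x}$. A minimal sketch, with made-up data and a hypothetical function name:

```python
def least_squares_fit(xs, ys):
    """Return (intercept b0, slope b1) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical independent variable
ys = [2.0, 4.0, 5.0, 4.0, 5.0]  # hypothetical observed values
b0, b1 = least_squares_fit(xs, ys)
print(f"intercept = {b0:.1f}, slope = {b1:.1f}")
```

These are the same quantities Excel returns from INTERCEPT and SLOPE.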
43 A Good Candidate for Simple Linear Regression
Columns: Statistic | Value
Intercept | 4.2
Slope | 1.1
$R^2$ | 73%
Variance appears to be normal:
~2/3 fall within 1 standard deviation of the mean
~1/3 fall between 1 and 2 standard deviations
Line is not too flat
High $R^2$
Slide 42
44 Excel Formulas For Key Values
Input values: columns x and y
Intercept: =INTERCEPT(known y's, known x's)
Slope: =SLOPE(known y's, known x's)
$R^2$: =RSQ(known y's, known x's)
Slide 43
45 Excel has data analysis add-in Data Data Analysis Requires a one-time set-up Slide 44
46 Data Analysis has several options Choose regression option Slide 45
47 Minimum input: known y's, known x's, output placement Slide 46
48 Sample Output
SUMMARY OUTPUT
Regression Statistics (key values): Multiple R, R Square, Adjusted R Square, Standard Error, Observations (20)
ANOVA: columns df, SS, MS, F, Significance F; rows Regression, Residual, Total
Coefficients (test data): columns Coefficients, Standard Error, t Stat, P-value, Lower 95%, Upper 95%; rows Intercept, X Variable
Additional output available if requested in dialogue box
Slide 47
49 Anscombe's Quartet: Data Set 1 vs Data Set 2
[Figure panels: Data Set 1, Data Set 2]
What is the expected difference in slope, intercept and $R^2$? Would you rely on the curve for data set 1? For data set 2?
Slide 48
50 Anscombe's Quartet: Data Set 1 vs Data Set 3
[Figure panels: Data Set 1, Data Set 3]
Would you rely on the curve for data set 3?
Slide 49
51 Anscombe's Quartet: Data Set 1 vs Data Set 4
[Figure panels: Data Set 1, Data Set 4]
Would you rely on the curve for data set 4?
Slide 50
52 Example: How well does risk score predict cost?
[Figure: scatter plot of Area Adjusted PMPM ($0 to $1,200) against Risk Score]
Columns: Function | Value
INTERCEPT | $35.18
SLOPE | $
CORREL | 62%
AVERAGE |
Used retro risk score
Random sample of males aged 42 from a large group
N = 28
Divided raw PMPM by area factor
Slide 51
53 What is the Value of R 2? Slide 52
54 The Basics
$y_i = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n + \epsilon_i$
$\epsilon_i$ is a value of the residual random variable described earlier
Slide 53
55 Why Multiple Linear Regression
Shape of the curve/polynomial: $y_i = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon_i$
Additional explanatory variables: Age + Gender probably explains costs better than age alone
Control for confounding factors: all other things being equal. Example: control for area
Slide 54
56 Are your independent variables dependent on each other? In most cases, find the independent variable that best explains overall variance Test each independent variable one at a time Also, combinations of variables Test interdependence by doing analytics comparing just the variables in question How well does age-gender predict risk score? How well does risk score predict age-gender? Slide 55
57 Categorical Variables
Categorical: separate into groups even if the variable is not numeric per se
Gender: 1 = female, 0 = male
Alternately:
Gender1 = 1 if female, 0 if male
Gender2 = 0 if female, 1 if male
Think in terms of the marginal impact of each variable: the expected value of $y_i$ goes up by $\beta_i$ with each unit change in $x_i$
Slide 56
58 Where Do You Go From Here?
Pick a resource:
Barron's Business Statistics
Anything by Jed Frees
Practice, practice, practice:
Start with basics (chi-square, confidence limits, hypothesis testing)
Move to regression analysis: use adjusted PMPMs etc. to get yourself started
Make sure you can analyze and explain results
Move to advanced analytics:
GLM for health care costs
Trend methods
Evaluation (probit, propensity analysis, etc.)
Disease progression (survival models)
Slide 57
59 Q&A and Wrap-Up
More informationThe Normal Probability Distribution
1 The Normal Probability Distribution Key Definitions Probability Density Function: An equation used to compute probabilities for continuous random variables where the output value is greater than zero
More informationMidTerm 1) Find the following (round off to one decimal place):
MidTerm 1) 68 49 21 55 57 61 70 42 59 50 66 99 Find the following (round off to one decimal place): Mean = 58:083, round off to 58.1 Median = 58 Range = max min = 99 21 = 78 St. Deviation = s = 8:535,
More informationEconometric Methods for Valuation Analysis
Econometric Methods for Valuation Analysis Margarita Genius Dept of Economics M. Genius (Univ. of Crete) Econometric Methods for Valuation Analysis Cagliari, 2017 1 / 25 Outline We will consider econometric
More informationMAKING SENSE OF DATA Essentials series
MAKING SENSE OF DATA Essentials series THE NORMAL DISTRIBUTION Copyright by City of Bradford MDC Prerequisites Descriptive statistics Charts and graphs The normal distribution Surveys and sampling Correlation
More informationExam 2 Spring 2015 Statistics for Applications 4/9/2015
18.443 Exam 2 Spring 2015 Statistics for Applications 4/9/2015 1. True or False (and state why). (a). The significance level of a statistical test is not equal to the probability that the null hypothesis
More informationNORMAL RANDOM VARIABLES (Normal or gaussian distribution)
NORMAL RANDOM VARIABLES (Normal or gaussian distribution) Many variables, as pregnancy lengths, foot sizes etc.. exhibit a normal distribution. The shape of the distribution is a symmetric bell shape.
More informationSTAT Chapter 6: Sampling Distributions
STAT 515 -- Chapter 6: Sampling Distributions Definition: Parameter = a number that characterizes a population (example: population mean ) it s typically unknown. Statistic = a number that characterizes
More informationData that can be any numerical value are called continuous. These are usually things that are measured, such as height, length, time, speed, etc.
Chapter 8 Measures of Center Data that can be any numerical value are called continuous. These are usually things that are measured, such as height, length, time, speed, etc. Data that can only be integer
More informationPresented at the 2003 SCEA-ISPA Joint Annual Conference and Training Workshop -
Predicting Final CPI Estimating the EAC based on current performance has traditionally been a point estimate or, at best, a range based on different EAC calculations (CPI, SPI, CPI*SPI, etc.). NAVAIR is
More informationAP STATISTICS FALL SEMESTSER FINAL EXAM STUDY GUIDE
AP STATISTICS Name: FALL SEMESTSER FINAL EXAM STUDY GUIDE Period: *Go over Vocabulary Notecards! *This is not a comprehensive review you still should look over your past notes, homework/practice, Quizzes,
More informationContents. An Overview of Statistical Applications CHAPTER 1. Contents (ix) Preface... (vii)
Contents (ix) Contents Preface... (vii) CHAPTER 1 An Overview of Statistical Applications 1.1 Introduction... 1 1. Probability Functions and Statistics... 1..1 Discrete versus Continuous Functions... 1..
More informationStatistics for Business and Economics
Statistics for Business and Economics Chapter 7 Estimation: Single Population Copyright 010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 7-1 Confidence Intervals Contents of this chapter: Confidence
More informationTests for One Variance
Chapter 65 Introduction Occasionally, researchers are interested in the estimation of the variance (or standard deviation) rather than the mean. This module calculates the sample size and performs power
More informationSTA2601. Tutorial letter 105/2/2018. Applied Statistics II. Semester 2. Department of Statistics STA2601/105/2/2018 TRIAL EXAMINATION PAPER
STA2601/105/2/2018 Tutorial letter 105/2/2018 Applied Statistics II STA2601 Semester 2 Department of Statistics TRIAL EXAMINATION PAPER Define tomorrow. university of south africa Dear Student Congratulations
More informationOn one of the feet? 1 2. On red? 1 4. Within 1 of the vertical black line at the top?( 1 to 1 2
Continuous Random Variable If I spin a spinner, what is the probability the pointer lands... On one of the feet? 1 2. On red? 1 4. Within 1 of the vertical black line at the top?( 1 to 1 2 )? 360 = 1 180.
More informationSimulation Wrap-up, Statistics COS 323
Simulation Wrap-up, Statistics COS 323 Today Simulation Re-cap Statistics Variance and confidence intervals for simulations Simulation wrap-up FYI: No class or office hours Thursday Simulation wrap-up
More informationChapter 7. Inferences about Population Variances
Chapter 7. Inferences about Population Variances Introduction () The variability of a population s values is as important as the population mean. Hypothetical distribution of E. coli concentrations from
More information1 Inferential Statistic
1 Inferential Statistic Population versus Sample, parameter versus statistic A population is the set of all individuals the researcher intends to learn about. A sample is a subset of the population and
More informationChapter 3 - Lecture 5 The Binomial Probability Distribution
Chapter 3 - Lecture 5 The Binomial Probability October 12th, 2009 Experiment Examples Moments and moment generating function of a Binomial Random Variable Outline Experiment Examples A binomial experiment
More informationCH 5 Normal Probability Distributions Properties of the Normal Distribution
Properties of the Normal Distribution Example A friend that is always late. Let X represent the amount of minutes that pass from the moment you are suppose to meet your friend until the moment your friend
More informationStatistical Intervals (One sample) (Chs )
7 Statistical Intervals (One sample) (Chs 8.1-8.3) Confidence Intervals The CLT tells us that as the sample size n increases, the sample mean X is close to normally distributed with expected value µ and
More informationStatistical Intervals. Chapter 7 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage
7 Statistical Intervals Chapter 7 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage Confidence Intervals The CLT tells us that as the sample size n increases, the sample mean X is close to
More informationChapter 5 Basic Probability
Chapter 5 Basic Probability Probability is determining the probability that a particular event will occur. Probability of occurrence = / T where = the number of ways in which a particular event occurs
More informationChapter 5. Continuous Random Variables and Probability Distributions. 5.1 Continuous Random Variables
Chapter 5 Continuous Random Variables and Probability Distributions 5.1 Continuous Random Variables 1 2CHAPTER 5. CONTINUOUS RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS Probability Distributions Probability
More informationMaximum Likelihood Estimation
Maximum Likelihood Estimation EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #6 EPSY 905: Maximum Likelihood In This Lecture The basics of maximum likelihood estimation Ø The engine that
More informationCase Study: Heavy-Tailed Distribution and Reinsurance Rate-making
Case Study: Heavy-Tailed Distribution and Reinsurance Rate-making May 30, 2016 The purpose of this case study is to give a brief introduction to a heavy-tailed distribution and its distinct behaviors in
More informationWeek 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics.
Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics. Convergent validity: the degree to which results/evidence from different tests/sources, converge on the same conclusion.
More informationFinal Exam - section 1. Thursday, December hours, 30 minutes
Econometrics, ECON312 San Francisco State University Michael Bar Fall 2013 Final Exam - section 1 Thursday, December 19 1 hours, 30 minutes Name: Instructions 1. This is closed book, closed notes exam.
More informationDiploma Part 2. Quantitative Methods. Examiner s Suggested Answers
Diploma Part 2 Quantitative Methods Examiner s Suggested Answers Question 1 (a) The binomial distribution may be used in an experiment in which there are only two defined outcomes in any particular trial
More informationChapter 7 Sampling Distributions and Point Estimation of Parameters
Chapter 7 Sampling Distributions and Point Estimation of Parameters Part 1: Sampling Distributions, the Central Limit Theorem, Point Estimation & Estimators Sections 7-1 to 7-2 1 / 25 Statistical Inferences
More informationMATH 143: Introduction to Probability and Statistics Worksheet for Tues., Dec. 7: What procedure?
MATH 143: Introduction to Probability and Statistics Worksheet for Tues., Dec. 7: What procedure? For each numbered problem, identify (if possible) the following: (a) the variable(s) and variable type(s)
More informationStat 139 Homework 2 Solutions, Fall 2016
Stat 139 Homework 2 Solutions, Fall 2016 Problem 1. The sum of squares of a sample of data is minimized when the sample mean, X = Xi /n, is used as the basis of the calculation. Define g(c) as a function
More informationSTA 103: Final Exam. Print clearly on this exam. Only correct solutions that can be read will be given credit.
STA 103: Final Exam June 26, 2008 Name: } {{ } by writing my name i swear by the honor code Read all of the following information before starting the exam: Print clearly on this exam. Only correct solutions
More informationMixed models in R using the lme4 package Part 3: Inference based on profiled deviance
Mixed models in R using the lme4 package Part 3: Inference based on profiled deviance Douglas Bates Department of Statistics University of Wisconsin - Madison Madison January 11, 2011
More informationGETTING STARTED. To OPEN MINITAB: Click Start>Programs>Minitab14>Minitab14 or Click Minitab 14 on your Desktop
Minitab 14 1 GETTING STARTED To OPEN MINITAB: Click Start>Programs>Minitab14>Minitab14 or Click Minitab 14 on your Desktop The Minitab session will come up like this 2 To SAVE FILE 1. Click File>Save Project
More informationCHAPTER 6 DATA ANALYSIS AND INTERPRETATION
208 CHAPTER 6 DATA ANALYSIS AND INTERPRETATION Sr. No. Content Page No. 6.1 Introduction 212 6.2 Reliability and Normality of Data 212 6.3 Descriptive Analysis 213 6.4 Cross Tabulation 218 6.5 Chi Square
More informationCS134: Networks Spring Random Variables and Independence. 1.2 Probability Distribution Function (PDF) Number of heads Probability 2 0.
CS134: Networks Spring 2017 Prof. Yaron Singer Section 0 1 Probability 1.1 Random Variables and Independence A real-valued random variable is a variable that can take each of a set of possible values in
More informationChapter 7: SAMPLING DISTRIBUTIONS & POINT ESTIMATION OF PARAMETERS
Chapter 7: SAMPLING DISTRIBUTIONS & POINT ESTIMATION OF PARAMETERS Part 1: Introduction Sampling Distributions & the Central Limit Theorem Point Estimation & Estimators Sections 7-1 to 7-2 Sample data
More information