BIOSTATS 540. Unit 6. Estimation


BIOSTATS 540 Fall 2015

Unit 6. Estimation

"Use at least twelve observations in constructing a confidence interval" - Gerald van Belle

What is the mean of the blood pressures of all the students at the Amherst Regional High School? It's too much work to measure the blood pressure of every individual student in this population. So we will make a guess based on the blood pressures of a sample of students. This is estimation. Estimation involves using a statistic calculated for our sample as our estimate of the population quantity of interest. In this example, where the population mean is of interest, possible choices might be (1) the sample mean blood pressure; (2) the sample median blood pressure; (3) the average of the smallest and largest blood pressure; and so on. The point is, there is no one choice for estimation. What we mean by a "good" choice of estimator is one focus of this unit.

The other focus of this unit is confidence interval estimation. A confidence interval is a single estimate together with a "safety net", which can also be thought of as a margin of error.

Table of Contents

1. Unit Roadmap
2. Learning Objectives
3. Introduction
   a. Goals of Estimation
   b. Notation and Definitions
   c. How to Interpret a Confidence Interval
4. Preliminaries: Some Useful Probability Distributions
   a. Introduction to the Student t-Distribution
   b. Introduction to the Chi Square Distribution
   c. Introduction to the F-Distribution
   d. Sums and Differences of Independent Normal Random Variables
5. Normal Distribution: One Group
   a. Confidence Interval for µ, σ² Known
   b. Confidence Interval for µ, σ² Unknown
   c. Confidence Interval for σ²
6. Normal Distribution: Paired
   a. Confidence Interval for µ_DIFFERENCE
   b. Confidence Interval for σ²_DIFFERENCE
7. Normal Distribution: Two Independent Groups
   a. Confidence Interval for [µ1 - µ2]
   b. Confidence Interval for σ1²/σ2²
8. Binomial Distribution: One Group
   a. Confidence Interval for π
9. Binomial Distribution: Two Independent Groups
   a. Confidence Interval for [π1 - π2]
Appendices
   i. Derivation of Confidence Interval for µ, Single Normal with σ² Known
   ii. Derivation of Confidence Interval for σ², Single Normal
   iii. SE of a Binomial Proportion

1. Unit Roadmap

[Figure: course roadmap highlighting Unit 6. Estimation (Populations / Relationships)]

Recall that numbers that are calculated from the entirety of a population are called population parameters. They are represented by Greek letters such as µ (population mean) and σ² (population variance). Often, it is not feasible to calculate the value of a population parameter. Numbers that we calculate from a sample are called statistics. They are represented by Roman letters such as X̄ (sample mean) and S² (sample variance).

In our introduction to sampling distributions, we learned that a sample statistic such as X̄ is a random variable in its own right. This can be understood by imagining that there are infinitely many replications of our study so that there are infinitely many X̄.

Putting this together, if a sample statistic such as X̄ is to be used as an estimate of a population parameter such as µ, then the incorporation of a measure of its variability (in this case the standard error of X̄) allows us to construct a margin of error about X̄ as our estimate of µ. The result is a confidence interval estimate.

2. Learning Objectives

When you have finished this unit, you should be able to:

- Explain that there is more than one way to estimate a population parameter based on data in a sample.
- Explain the criteria of unbiased and minimum variance in the selection of a "good" estimator.
- Define the Student t, chi square, and F probability distribution models.
- Explain that the sum and difference of independent random variables that are distributed Normal are also distributed Normal.
- Interpret a confidence interval.
- Calculate point and confidence interval estimates of the mean and variance of a single Normal distribution.
- Calculate point and confidence interval estimates of the mean and variance of a single Normal distribution in the paired data setting.
- Calculate point and confidence interval estimates of the difference between the means of two independent Normal distributions.
- Calculate point and confidence interval estimates of the ratio of the variances of two independent Normal distributions.
- Calculate point and confidence interval estimates of the π (event probability) parameter of a single binomial distribution.
- Calculate point and confidence interval estimates of the difference between the π (event probability) parameters of two independent binomial distributions.

3. Introduction

Recall - Biostatistics is the application of probability models and associated tools to observed phenomena for the purposes of learning about a population and gauging the relative plausibility of alternative explanations.

- Description - Information in a sample is used to summarize the sample itself. It is also used to make guesses of the characteristics of the source population.
- Hypothesis Testing - Information in a sample is used to help us compare different explanations for what we have observed.

Unit 6 is about using information in a sample to make estimates of the characteristics (parameters) of the population that gave rise to the sample.

Recall the distinction between statistics and parameters: statistics are estimators of population parameters.

             Statistic (estimator)    Population parameter
  mean       X̄                        µ
  variance   S²                       σ²

What does it mean to say we know X̄ from a sample but we don't know the population mean µ?

Suppose we have a simple random sample of n observations X1, ..., Xn from some population. We have calculated the sample average X̄. What population gave rise to our sample? In theory, there are infinitely many possible populations. For simplicity here, suppose there are just 3 possibilities, schematically shown below.

[Figure: three candidate normal populations with means µ_I, µ_II and µ_III, with the location of our X̄ marked on the horizontal axis]

Okay, sorry. Here, I'm imagining four possibilities instead of three. Look at this page from the bottom up. Around our X̄, I've constructed a confidence interval. Notice the dashed lines extending upwards into the 4 normal distributions. µ_I and µ_IV are outside the interval around X̄. µ_II and µ_III are inside the interval.

[Figure: four normal distributions (I to IV) with means µ_I, µ_II, µ_III and µ_IV, stacked above a confidence interval drawn around X̄]

We are "confident" that µ could be either µ_II or µ_III.

3a. Goals of Estimation

Whether an estimator is good or not good depends on what criteria we use to define "good". There are potentially lots of criteria. Here, we'll use one set of two criteria: unbiased and minimum variance.

Conventional Criteria for a Good Estimator
1. "In the long run, correct" (unbiased)
2. "In the short run, in error by as little as possible" (minimum variance)

1. Unbiased - "In the Long Run Correct"

Tip: Recall the introduction to statistical expectation and the meaning of unbiased (see Unit 4, Bernoulli & Binomial, pp 7-10).

"In the long run correct": Imagine replicating the study over and over again, infinitely many times. Each time, calculate your statistic of interest so as to produce the sampling distribution of that statistic. Now calculate the mean of the sampling distribution for your statistic of interest. Is it the same as the population parameter value that you are trying to estimate? If so, then that statistic is an unbiased estimate of the population parameter that is being estimated.

Example - Under normality and simple random sampling, S² is an unbiased estimate of σ². "In the long run correct" means that the statistical expectation of S², computed over the sampling distribution of S², is equal to its target σ²:

\[ \frac{\sum_{\text{all possible samples } i} S_i^2}{\#\,\text{samples in the sampling distribution}} = \sigma^2 \]

Recall that we use the notation E[ ] to refer to statistical expectation. Here it is E[S²] = σ².
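To get a feel for "in the long run correct", here is a minimal simulation sketch. It is only an illustration, not part of the original notes: the sample size, the value of σ, the number of replications and the function name `mean_sample_variance` are all assumptions chosen for the demonstration. It averages S² over many simulated normal samples and, for contrast, does the same for the version that divides by n instead of n-1.

```python
import random
import statistics


def mean_sample_variance(n, sigma, replications, seed=0):
    """Average S^2 (divisor n-1) and the divisor-n version over many samples."""
    rng = random.Random(seed)
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(replications):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)    # sample variance, divisor n - 1
        total_unbiased += s2
        total_biased += s2 * (n - 1) / n    # same sum of squares, divisor n
    return total_unbiased / replications, total_biased / replications


# Hypothetical settings: n = 5 observations per sample, true sigma = 2.
avg_s2, avg_biased = mean_sample_variance(n=5, sigma=2.0, replications=20000)
print(f"true variance sigma^2      = {2.0 ** 2:.3f}")
print(f"average of S^2 (n - 1)     = {avg_s2:.3f}")
print(f"average with divisor n     = {avg_biased:.3f}")
```

The average of S² should land close to σ², while the divisor-n version should settle near σ²(n-1)/n, which is too small.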

2. Minimum Variance - "In Error by as Little as Possible"

"In error by as little as possible": We would like that our estimates not vary wildly from sample to sample; in fact, we'd like these to vary as little as possible. This is the idea of precision. When the estimates vary by as little as possible, we have minimum variance.

Putting together the two criteria ("long run correct" and "in error by as little as possible")

Suppose we want to identify the minimum variance unbiased estimator of µ in the setting of a simple random sample from a normal distribution. Candidate estimators might include the sample mean X̄ or the sample median X̃ as estimators of the population mean µ. Which would be a better choice according to the criteria "in the long run correct" and "in the short run in error by as little as possible"?

Step 1 - First, identify the unbiased estimators.
Step 2 - From among the pool of unbiased estimators, choose the one with minimum variance.

Illustration for data from a normal distribution
1. The unbiased estimators are the sample mean X̄ and the sample median X̃.
2. variance[X̄] < variance[X̃]

Choose the sample mean X̄. It is the minimum variance unbiased estimator. For a random sample of data from a normal probability distribution, X̄ is the minimum variance unbiased estimator of the population mean µ.

Take home message: Here, we will be using the criteria of "minimum variance unbiased". However, other criteria are possible.
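The two steps can be checked by simulation. The sketch below is an illustration under assumed settings (standard normal data, n = 25, 10,000 replications; the function name `compare_mean_and_median` is made up for this example). It builds the sampling distributions of the sample mean and the sample median and compares their centers and variances.

```python
import random
import statistics


def compare_mean_and_median(n, replications, seed=1):
    """Simulate the sampling distributions of the mean and median of normal data."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(replications):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true mu = 0
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return {
        "mean_of_means": statistics.fmean(means),
        "mean_of_medians": statistics.fmean(medians),
        "var_of_means": statistics.variance(means),
        "var_of_medians": statistics.variance(medians),
    }


result = compare_mean_and_median(n=25, replications=10000)
for label, value in result.items():
    print(f"{label:16s} = {value:.4f}")
```

Both estimators should center on the true µ = 0 (Step 1), but the sample mean should show the smaller variance (Step 2). For normal data in large samples, the variance of the median is roughly π/2 times that of the mean.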

3b. Notation and Definitions

Estimation, Estimator, Estimate - Estimation is the computation of a statistic from sample data, often yielding a value that is an approximation (guess) of its target, an unknown true population parameter value. The statistic itself is called an estimator and can be of two types - point or interval. The value or values that the estimator assumes are called estimates.

Point versus Interval Estimators - An estimator that represents a "single best guess" is called a point estimator. When the estimate is of the form of a "range of plausible values", it is called an interval estimator. Thus,

A point estimate is of the form: [ Value ]
An interval estimate is of the form: [ lower limit, upper limit ]

Example - The sample mean X̄, calculated using data in a sample of size n, is a point estimator of the population mean µ. If X̄ = 10, the value 10 is called a point estimate of the population mean µ.

Sampling Distribution - Recall the idea of a sampling distribution. It is a theoretical entity obtained by imagining that we repeat, over and over infinitely many times, the drawing of a simple random sample and the calculation of something from that sample, such as the sample mean X̄ based on a sample of size n. The resulting collection of all possible sample means is what we call the sampling distribution of X̄.

Recall - The sampling distribution of X̄ plays a fundamental role in the central limit theorem.

Unbiased Estimator - A statistic is said to be an unbiased estimator of the corresponding population parameter if its mean or expected value, taken over its sampling distribution, is equal to the population parameter value. Intuitively, this is saying that the "long run" average of the statistic, taken over all the possibilities in the sampling distribution, has value equal to the value of its target population parameter.

Confidence Interval, Confidence Coefficient - A confidence interval is a particular type of interval estimator. Interval estimates defined as confidence intervals provide not only several point estimates, but also a feeling for the precision of the estimates. This is because they are constructed using two ingredients: 1) a point estimate, and 2) the standard error of the point estimate.

Many Confidence Interval Estimators are of a Specific Form:

lower limit = (point estimate) - (confidence coefficient multiplier)(standard error)
upper limit = (point estimate) + (confidence coefficient multiplier)(standard error)

The "multiple" in these expressions is related to the precision of the interval estimate; the multiple has a special name - confidence coefficient.

A wide interval suggests imprecision of estimation. Narrow confidence interval widths reflect large sample size or low variability or both.

Exceptions to this generic structure of a confidence interval are those for a variance parameter and those for a ratio of variance parameters.

Take care when computing and interpreting a confidence interval!! A common mistake is to calculate a confidence interval but then use it incorrectly by focusing only on its midpoint.

3c. How to Interpret a Confidence Interval

A confidence interval is a safety net.

Tip: In this section, the focus is on the idea of a confidence interval. For now, don't worry about the details.

Example - Suppose we want to estimate the average income from wages for a population of 5000 workers, X1, ..., X5000. The average income that we want to estimate is the population mean µ:

\[ \mu = \frac{\sum_{i=1}^{5000} X_i}{5000} \]

For purposes of this illustration, suppose we actually know the population σ = $12,573. In real life, we wouldn't have such luxury!

Suppose the unknown µ = $19,987. Note - I'm only telling you this so that we can see how well this illustration performs!

We'll construct two confidence interval estimates of µ to illustrate the importance of sample size in confidence interval estimation: (1) from a sample size of n=10, versus (2) from a sample size of n=100.

(1) Carol uses a sample size n=10. Carol's data are X1, ..., X10.

\[ \bar{X}_{n=10} = 19{,}887 \qquad \sigma = 12{,}573 \qquad SE(\bar{X}_{n=10}) = \frac{\sigma}{\sqrt{10}} = 3{,}976 \]

(2) Ed uses a sample size n=100. Ed's data are X1, ..., X100.

\[ \bar{X}_{n=100} = 19{,}813 \qquad \sigma = 12{,}573 \qquad SE(\bar{X}_{n=100}) = \frac{\sigma}{\sqrt{100}} = 1{,}257 \]

Compare the two SE, one based on n=10 and the other based on n=100.

The variability of an average of 100 is less than the variability of an average of 10. It seems reasonable that, all other things being equal, we should have more confidence (smaller safety net) in our sample mean as a guess of the population mean when it is based on a larger sample size (100 versus 10).

Taking this one step further, we ought to have complete (100%) confidence (no safety net required at all) if we interviewed the entire population! This makes sense since we would obtain the correct answer of $19,987 every time.

Definition Confidence Interval (Informal): A confidence interval is a guess (point estimate) together with a "safety net" (interval) of guesses of a population characteristic.

In most instances, it is easy to see the 3 components of a confidence interval:
1) A point estimate (e.g. the sample mean X̄)
2) The standard error of the point estimate (e.g. SE(X̄) = σ/√n)
3) A confidence coefficient (conf. coeff)

In most instances (means, differences of means, regression parameters, etc), the structure of a confidence interval is calculated as follows:

Lower limit = (point estimate) - (confidence coefficient)(SE)
Upper limit = (point estimate) + (confidence coefficient)(SE)

In other instances (as you'll see in the next pages), the structure of a confidence interval looks different, as for confidence intervals for:
- Population variance
- Population standard deviation
- Ratio of two population variances
- Relative risk
- Odds ratio

Example: Carol samples n = 10 workers.

Sample mean X̄ = $19,887
Standard error of sample mean, SE(X̄) = σ/√n = $3,976 for n=10
Confidence coefficient for 95% confidence interval = 1.96

Lower limit = (point estimate) - (confidence coefficient)(SE) = $19,887 - (1.96)($3,976) = $12,094
Upper limit = (point estimate) + (confidence coefficient)(SE) = $19,887 + (1.96)($3,976) = $27,680
Width = ($27,680 - $12,094) = $15,586

Example: Ed samples n = 100 workers.

Sample mean X̄ = $19,813
Standard error of sample mean, SE(X̄) = σ/√n = $1,257 for n=100
Confidence coefficient for 95% confidence interval = 1.96

Lower limit = (point estimate) - (confidence coefficient)(SE) = $19,813 - (1.96)($1,257) = $17,349
Upper limit = (point estimate) + (confidence coefficient)(SE) = $19,813 + (1.96)($1,257) = $22,277
Width = ($22,277 - $17,349) = $4,928

           n      Estimate   95% Confidence Interval
  Carol    10     $19,887    ($12,094, $27,680)        Wide
  Ed       100    $19,813    ($17,349, $22,277)        Narrow
  Truth    5000   $19,987    $19,987                   No safety net

Definition 95% Confidence Interval: If all possible random samples (an infinite number) of a given sample size (e.g. 10 or 100) were obtained and if each were used to obtain its own confidence interval, then 95% of all such confidence intervals would contain the unknown µ; the remaining 5% would not.
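The arithmetic for Carol and Ed can be reproduced with a few lines of code. This is a sketch only: the helper name `z_interval` is made up for illustration, and it takes σ = $12,573, the value consistent with the stated standard error of $3,976 at n = 10.

```python
import math

Z_95 = 1.96  # confidence coefficient for a 95% interval when sigma is known


def z_interval(xbar, sigma, n, multiplier=Z_95):
    """Return (lower, upper, se) for point estimate +/- multiplier * SE."""
    se = sigma / math.sqrt(n)
    return xbar - multiplier * se, xbar + multiplier * se, se


SIGMA = 12573.0  # known population standard deviation used in the example

intervals = {}
for name, n, xbar in [("Carol", 10, 19887.0), ("Ed", 100, 19813.0)]:
    lower, upper, se = z_interval(xbar, SIGMA, n)
    intervals[name] = (lower, upper)
    print(f"{name:5s} n={n:3d}  SE={se:8,.0f}  "
          f"95% CI=({lower:,.0f}, {upper:,.0f})  width={upper - lower:,.0f}")
```

Ten times the sample size cuts the standard error, and therefore the width of the interval, by a factor of √10, about 3.16.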

But Carol and Ed Each Have Only ONE Interval: So now what?! The definition above doesn't seem to help us. What can we say?

Carol says: "With 95% confidence, the interval $12,094 to $27,680 contains the unknown true mean µ."

Ed says: "With 95% confidence, the interval $17,349 to $22,277 contains the unknown true mean µ."

Caution on the use of Confidence Intervals:
1) It is incorrect to say "The probability that a given 95% confidence interval contains µ is 95%". A given interval either contains µ or it does not.
2) The confidence coefficient (recall this is the multiplier we attach to the SE) for a 95% confidence interval is the number needed to ensure 95% coverage in the long run (in probability).

Here is a picture of a lot of confidence intervals, each based on a sample of size n=10.

[Figure: many confidence intervals, each from a different sample of size n=10, plotted against the true µ]

Notice
(1) Any one confidence interval either contains µ or it does not. This illustrates that it is incorrect to say "There is a 95% probability that the confidence interval contains µ".
(2) For a given sample size (here, n=10), the width of all the confidence intervals is the same.

Here is a picture to get a feel for the ideas of confidence interval, safety net, and precision.

[Figure: confidence intervals for n = 10, n = 100 and n = 1000]

Now you can also see
(3) As the sample size increases, the confidence intervals are more narrow (more precise).
(4) As n → infinity (in the limit of measuring the entire population), the interval shrinks onto µ itself, so µ is in the interval every time.

Some additional remarks on the interpretation of a confidence interval might be helpful.

Each sample gives rise to its own point estimate and confidence interval estimate built around the point estimate. The idea is to construct our intervals so that:

IF all possible samples of a given sample size (an infinite #!) were drawn from the underlying distribution and each sample gave rise to its own interval estimate, THEN 95% of all such confidence intervals would include the unknown µ while 5% would not.

Another Illustration - It is NOT CORRECT to say: "The probability that the interval (1.3, 9.5) contains µ is 0.95". Why? Because either µ is in (1.3, 9.5) or it is not. For example, if µ=5.3 then µ is in (1.3, 9.5) with probability = 1. If µ=1.0 then µ is in (1.3, 9.5) with probability = 0.

I toss a fair coin, but don't look at the result. The probability of heads is 1/2. I am 50% confident that the result of the toss is heads. In other words, I will guess "heads" with 50% confidence. Either the coin shows heads or it shows tails. I am either right or wrong on this particular toss. In the long run, if I were to do this, I should be right about 50% of the time - hence 50% confidence. But for this particular toss, I'm either right or wrong.

In most experiments or research studies we can't look to see if we are right or wrong - but we define a confidence interval in a way that we know in the long run 95% of such intervals will get it right.
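The "long run" statement can be made concrete by simulation. The sketch below rests on stated assumptions: incomes are drawn from a Normal distribution with µ = 19,987 and σ = 12,573 (the notes do not say the incomes are Normal; this is a convenience for the simulation), and the function name `coverage` and the number of replications are made up for illustration. Each replication draws one sample, builds one 95% interval, and records whether that interval caught µ.

```python
import math
import random


def coverage(mu, sigma, n, replications, multiplier=1.96, seed=2):
    """Fraction of simulated intervals xbar +/- multiplier*SE that contain mu."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)  # sigma treated as known, as in the example
    hits = 0
    for _ in range(replications):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # Any single interval either contains mu or it does not.
        if xbar - multiplier * se <= mu <= xbar + multiplier * se:
            hits += 1
    return hits / replications


cov_10 = coverage(mu=19987.0, sigma=12573.0, n=10, replications=10000)
cov_100 = coverage(mu=19987.0, sigma=12573.0, n=100, replications=10000)
print(f"n = 10 : share of intervals containing mu = {cov_10:.3f}")
print(f"n = 100: share of intervals containing mu = {cov_100:.3f}")
```

Both shares should come out near 0.95. The larger sample does not raise the coverage; what it buys is a narrower interval at the same 95% coverage.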

4. Preliminaries: Some Useful Probability Distributions

4a. Introduction to the Student t-Distribution

Looking ahead. Percentiles of the Student t-distribution are used in confidence intervals for means when the population variance is NOT known.

There are a variety of definitions of a Student t random variable. A particularly useful one for us here is the following. It appeals to our understanding of the z-score.

A Definition of a Student's t Random Variable

Consider a simple random sample X1, ..., Xn from a Normal(µ, σ²) distribution. Calculate X̄ and S² in the usual way:

\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \qquad \text{and} \qquad S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} \]

A Student's t distributed random variable results if we construct a t-score instead of a z-score:

\[ \text{t-score} = t_{DF=n-1} = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]

is distributed Student's t with degrees of freedom = (n-1).

Note - The abbreviation "df" is often used to refer to degrees of freedom.

The features of the Student's t-distribution are similar, but not identical, to those of a Normal distribution.

[Figure: density curve of a Student's t-distribution plotted against x]

- Bell shaped
- Symmetric about zero
- Flatter than the Normal(0,1). This means
  (i) The variability of a Student t variable is greater than that of a standard Normal(0,1)
  (ii) Thus, there is more area under the tails and less at the center
  (iii) Because variability is greater, resulting confidence intervals will be wider.

The relatively greater variability of a Student's t-distribution (compared to a Normal) makes sense. We have added uncertainty in our confidence interval because we are using an estimate of the standard error rather than the actual value of the standard error.
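A small simulation shows the heavier tails directly. This is an illustrative sketch with assumed settings (standard normal data, n = 5 so that df = 4, 20,000 replications; the function name `tail_fractions` is hypothetical). For each sample it computes a z-score using the true σ and a t-score using the estimate S, and counts how often each falls outside ±1.96.

```python
import math
import random
import statistics


def tail_fractions(n, replications, cutoff=1.96, seed=3):
    """Share of z-scores and t-scores beyond +/- cutoff for normal samples."""
    rng = random.Random(seed)
    z_exceed = 0
    t_exceed = 0
    for _ in range(replications):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # mu = 0, sigma = 1
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)           # estimate of sigma, divisor n - 1
        z = xbar / (1.0 / math.sqrt(n))        # z-score: uses the true sigma
        t = xbar / (s / math.sqrt(n))          # t-score: uses the estimate s
        z_exceed += abs(z) > cutoff
        t_exceed += abs(t) > cutoff
    return z_exceed / replications, t_exceed / replications


z_frac, t_frac = tail_fractions(n=5, replications=20000)
print(f"share of |z| > 1.96: {z_frac:.3f}")
print(f"share of |t| > 1.96: {t_frac:.3f}   (df = 4)")
```

The z-scores should fall outside ±1.96 about 5% of the time, while the t-scores should do so noticeably more often. That extra tail area is why the confidence coefficient taken from a Student t-distribution is larger than 1.96.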

Each degree of freedom (df) defines a separate Student's t-distribution. As the degrees of freedom gets larger, the Student's t-distribution looks more and more like the standard normal distribution with mean=0 and variance=1.

[Figure: Normal(0,1) density overlaid with Student's t densities for a smaller and a larger number of degrees of freedom]

How to Use the Student t Distribution Calculator Provided by SurfStat

Source: http://surfstat.anu.edu.au/surfstat-home/tables/t.php

- From the pictures, choose between: left tail, right tail, between, or two tailed.
- In the box "d.f.", enter the degrees of freedom.
- To obtain a probability, enter your t-statistic in the box "t value".
- To obtain a percentile, enter your cumulative probability in the box "probability".

Example - Solution for a probability: Probability [ Student t DF=1 < 3.078 ] = .90

Example - Solution for a percentile value: The 97.5th Percentile of a Student t DF=9 = 2.262
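If the online calculator is not at hand, the same left tail probabilities and percentiles can be approximated numerically. The sketch below is not the SurfStat method; it is one possible approach, integrating the standard Student t density with Simpson's rule and inverting by bisection. The function names `t_pdf`, `t_cdf` and `t_percentile`, the step count and the search bounds are all choices made for this illustration.

```python
import math


def t_pdf(t, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P[T < t], using symmetry about zero and Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def t_percentile(p, df, lo=-100.0, hi=100.0):
    """Value t with P[T < t] = p, found by bisection on the CDF."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


prob = t_cdf(3.078, df=1)
pct = t_percentile(0.975, df=9)
print(f"P[t(df=1) < 3.078]           = {prob:.4f}")
print(f"97.5th percentile of t(df=9) = {pct:.3f}")
```

The first line should come out at about .90 and the second at about 2.26. In practice a statistical package would be used instead; this sketch only shows what the calculator is computing.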

4b. Introduction to the Chi Square Distribution

Looking ahead. Percentiles of the chi square distribution are used in confidence intervals for a single population variance or single population standard deviation.

Suppose we have a simple random sample from a Normal distribution. We want to calculate a confidence interval estimate of the normal distribution variance parameter, σ². To do this, we work with a new random variable Y that is defined as follows:

\[ Y = \frac{(n-1)S^2}{\sigma^2} \]

In this formula, S² is the sample variance. Under simple random sampling from a Normal(µ, σ²),

\[ Y = \frac{(n-1)S^2}{\sigma^2} \]

is distributed Chi Square with degrees of freedom = (n-1).

Mathematical Definition Chi Square Distribution

The above can be stated more formally.

(1) If the random variable X follows a normal probability distribution with mean µ and variance σ², then the random variable V defined:

\[ V = \frac{(X-\mu)^2}{\sigma^2} \]

is distributed chi square with degree of freedom = 1.

(2) If each of the random variables V1, ..., Vk is distributed chi square with degree of freedom = 1, and if these are independent, then their sum, defined:

\[ V_1 + \dots + V_k \]

is distributed chi square with degrees of freedom = k.
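One way to convince yourself of this is to simulate Y and compare its behavior with two standard facts about a chi square variable: its mean equals its degrees of freedom and its variance equals twice its degrees of freedom. The settings below (n = 10, µ = 50, σ = 3, 20,000 replications) and the function name `simulate_y` are assumptions made for illustration.

```python
import random
import statistics


def simulate_y(n, mu, sigma, replications, seed=4):
    """Simulate Y = (n-1) S^2 / sigma^2 for normal samples of size n."""
    rng = random.Random(seed)
    ys = []
    for _ in range(replications):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)          # S^2 with divisor n - 1
        ys.append((n - 1) * s2 / sigma ** 2)
    return ys


ys = simulate_y(n=10, mu=50.0, sigma=3.0, replications=20000)
y_mean = statistics.fmean(ys)
y_var = statistics.variance(ys)
print(f"mean of Y     = {y_mean:.2f}   (chi square, df = 9, has mean 9)")
print(f"variance of Y = {y_var:.2f}   (chi square, df = 9, has variance 18)")
print(f"smallest Y    = {min(ys):.3f}   (never negative)")
```

Changing µ or σ in the call should leave these summaries essentially unchanged, because Y is standardized by σ² and does not depend on µ.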

NOTE: For this course, it is not necessary to know the probability density function for the chi square distribution.

Features of the Chi Square Distribution:

(1) When data are a random sample of independent observations from a normal probability distribution and interest is in the behavior of the random variable defined as the sample variance S², the assumptions of the chi square probability distribution hold.

(2) The first mathematical definition of the chi square distribution says that it is defined as the square of a standard normal random variable.

(3) A chi square random variable cannot be negative. Because the chi square distribution is obtained by the squaring of a random variable, this means that a chi square random variable can assume only non-negative values. That is, the probability density function has domain [0, ∞) and is not defined for outcome values less than zero. Thus, the chi square distribution is NOT symmetric. Here is a picture.

Two Pictures of the Chi Square Distribution:

[Figure: chi square density with the right tail area shaded. Source: www.slideshare.net]

Often, online calculators for the chi square distribution are right tail only. Tip! 1 = left tail area + right tail area

[Figure: chi square density with left and right tail areas marked. Source: cmaps.cmappers.net]

Because this distribution is NOT symmetric about 0, remember - you will need to solve for two percentile values when using the chi square distribution in confidence intervals.

Features of the Chi Square Distribution - continued:

(4) The fact that the chi square distribution is NOT symmetric about zero means that for Y=y where y>0:

Pr[Y > y] is NOT EQUAL to Pr[Y < -y]

However, because the total area under a probability distribution is 1, it is still true that

1 = Pr[Y < y] + Pr[Y > y]

(5) The chi square distribution is less skewed as the number of degrees of freedom increases. See below.

[Figure: chi square densities for several degrees of freedom. Source: web.mstate.edu]

(6) Like the degrees of freedom for the Student's t-distribution, the degrees of freedom associated with a chi square distribution is an index of the extent of independent information available for estimating population parameter values. Thus, the chi square distributions with small associated degrees of freedom are relatively flat to reflect the imprecision of estimates based on small sample sizes. Similarly, chi square distributions with relatively large degrees of freedom are more concentrated near their expected value.

How to Use the Chi Square Distribution Calculator Provided by SurfStat

Source: http://surfstat.anu.edu.au/surfstat-home/tables/chi.php

- From the pictures, choose between: left tail, right tail, between, or two tailed.
- In the box "d.f.", enter the degrees of freedom.
- To obtain a probability, enter your chi square statistic in the value box.
- To obtain a percentile, enter your cumulative probability in the box "probability".

Example - Solution for a probability: Probability [ Chi-Square DF=4 > 6.2 ] = .1847

Example - Solution for a percentile value: The 97.5th Percentile of a Chi-Square DF=9 = 19.0
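The two calculator examples can be checked without the web page. The sketch below is one possible approach, not the SurfStat implementation: it evaluates the chi square left tail probability through the standard power series for the regularized lower incomplete gamma function and finds percentiles by bisection. The function names `chi2_cdf` and `chi2_percentile`, the number of series terms and the search bounds are assumptions for this illustration, suitable for moderate degrees of freedom.

```python
import math


def chi2_cdf(x, df, terms=500):
    """P[Y < x] for chi square(df), via P(a, z) with a = df/2 and z = x/2."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    # P(a, z) = z^a * e^(-z) * sum_{k >= 0} z^k / Gamma(a + k + 1)
    term = 1.0 / math.gamma(a + 1)
    total = term
    for k in range(1, terms):
        term *= z / (a + k)             # next term from Gamma(a+k+1) = (a+k) Gamma(a+k)
        total += term
    return total * math.exp(a * math.log(z) - z)


def chi2_percentile(p, df):
    """Value y with P[Y < y] = p, found by bisection on the CDF."""
    lo, hi = 0.0, 10.0 * df + 50.0      # generous upper bound for moderate df
    for _ in range(80):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


right_tail = 1.0 - chi2_cdf(6.2, df=4)  # 1 = left tail area + right tail area
upper = chi2_percentile(0.975, df=9)
lower = chi2_percentile(0.025, df=9)
print(f"P[chi square(df=4) > 6.2]  = {right_tail:.4f}")
print(f"97.5th percentile, df = 9 = {upper:.2f}")
print(f" 2.5th percentile, df = 9 = {lower:.2f}")
```

The first two lines should agree with the .1847 and 19.0 quoted above. The third line is a reminder that, because the distribution is not symmetric, the lower percentile has to be solved for separately rather than obtained by flipping a sign.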

4c. Introduction to the F Distribution

Looking ahead. Percentiles of the F distribution are used in confidence intervals for the ratio of two independent variances.

Suppose we are Interested in Comparing Two Independent Variances

Unlike the approach used to compare two means in the continuous variable setting (where we will look at their difference), the comparison of two variances is accomplished by looking at their ratio. Ratio values close to one are evidence of similarity.

Of interest will be a confidence interval estimate of the ratio of two variances in the setting where data are comprised of two independent samples of data, each from a separate Normal distribution.

Examples -
- I have a new measurement procedure. Are the results more variable than those obtained using the standard procedure?
- I am doing a preliminary analysis to determine whether or not it is appropriate to compute a pooled variance estimate, when the goal is comparing the mean levels of two groups.

When comparing two independent variances, we will use a RATIO rather than a difference. Specifically, we will look at ratios of variances of the form:

$$s_x^2 / s_y^2$$

If the value of the ratio is close to 1, this suggests that the population variances are similar. If the value of the ratio is very different from 1, this suggests that the population variances are not the same.

We use percentiles from the F distribution to construct a confidence interval for $\sigma_X^2 / \sigma_Y^2$.

A Definition of the F-Distribution

Suppose $X_1, \ldots, X_{n_x}$ are independent and a simple random sample from a normal distribution with mean $\mu_X$ and variance $\sigma_X^2$. Suppose further that $Y_1, \ldots, Y_{n_y}$ are independent and a simple random sample from a normal distribution with mean $\mu_Y$ and variance $\sigma_Y^2$.

If the two sample variances are calculated in the usual way

$$S_X^2 = \frac{\sum_{i=1}^{n_x} (x_i - \bar{x})^2}{n_x - 1} \quad \text{and} \quad S_Y^2 = \frac{\sum_{i=1}^{n_y} (y_i - \bar{y})^2}{n_y - 1}$$

Then

$$F_{n_x-1,\,n_y-1} = \frac{S_X^2 / \sigma_X^2}{S_Y^2 / \sigma_Y^2}$$

is distributed F with two degree of freedom specifications:
Numerator degrees of freedom = $n_x - 1$
Denominator degrees of freedom = $n_y - 1$

For the advanced reader (this can be skipped if you are using an online calculator)

There is a relationship between the values of percentiles for pairs of F distributions that is defined as follows:

$$F_{d_1,d_2;\,\alpha/2} = \frac{1}{F_{d_2,d_1;\,(1-\alpha/2)}}$$

Notice that (1) the degrees of freedom are in opposite order, and (2) the solution for a left tail percentile is expressed in terms of a right tail percentile. This is useful when the published table does not list the required percentile value; usually the missing percentiles are the ones in the left tail.
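For readers who want to see where the reciprocal relationship comes from, here is a short sketch of the argument. It assumes only the standard fact that if $W$ is distributed $F_{d_1,d_2}$ then $1/W$ is distributed $F_{d_2,d_1}$ (the numerator and denominator trade places); the symbol $W$ is introduced here purely for illustration.

```latex
\begin{align*}
\alpha/2 &= \Pr\left[\, W \le F_{d_1,d_2;\,\alpha/2} \,\right]
  && W \sim F_{d_1,d_2} \\
 &= \Pr\left[\, \frac{1}{W} \ge \frac{1}{F_{d_1,d_2;\,\alpha/2}} \,\right]
  && \frac{1}{W} \sim F_{d_2,d_1} \\
\Rightarrow\quad 1-\alpha/2 &= \Pr\left[\, \frac{1}{W} \le \frac{1}{F_{d_1,d_2;\,\alpha/2}} \,\right] \\
\Rightarrow\quad F_{d_2,d_1;\,(1-\alpha/2)} &= \frac{1}{F_{d_1,d_2;\,\alpha/2}}
\end{align*}
```

Rearranging the last line gives the relationship stated above.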

How to Use the F Distribution Calculator Provided by Danielsoper.com
Source: http://www.danielsoper.com/statcalc3/default.aspx

Tip: This is a right tail calculator ONLY!!

You will need to scroll down to get to the F distribution calculator. The drop down menu gives you choices.

Example Solution for a 2.5th and 97.5th percentile value:
The 2.5th and 97.5th percentiles of an F-distribution with numerator df=4 and denominator df=23 are 0.12 and 3.41, respectively.
- For the 2.5th percentile, right tail area = .975
- For the 97.5th percentile, right tail area = .025

4d. Sums and Differences of Independent Normal Random Variables

This is review. See again course notes, 5. The Normal Distribution, pp 3-4

Looking ahead. We will be calculating confidence intervals of such things as the difference between two independent means (eg control versus intervention in a randomized controlled trial).

Suppose we have two independent random samples, from two independent normal distributions (eg a randomized controlled trial of placebo versus treatment groups). Suppose we want to compute a confidence interval estimate of the difference of the means.

Point Estimator: How do we obtain a point estimate of the difference $[\mu_{Group\,1} - \mu_{Group\,2}]$? A good point estimator of the difference between population means is the difference between sample means, $[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}]$.

Standard Error of the Point Estimator: We need the standard error of $[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}]$.

Definitions

IF
(for group 1): $X_{11}, X_{12}, \ldots, X_{1n_1}$ is a simple random sample from a Normal($\mu_1$, $\sigma_1^2$)
(for group 2): $X_{21}, X_{22}, \ldots, X_{2n_2}$ is a simple random sample from a Normal($\mu_2$, $\sigma_2^2$)

THEN
This is great! We already know the sampling distribution of each sample mean:
$\bar{X}_{Group\,1}$ is distributed Normal($\mu_1$, $\sigma_1^2/n_1$)
$\bar{X}_{Group\,2}$ is distributed Normal($\mu_2$, $\sigma_2^2/n_2$)

$[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}]$ is also distributed Normal with

Mean = $[\mu_{Group\,1} - \mu_{Group\,2}]$

Variance = $\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$

Be careful!! The standard error of the difference is NOT the sum of the two separate standard errors. Notice: You must first sum the variances and then take the square root of the sum.

$$SE\left[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}\right] = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

A General Result. Handy!

If random variables X and Y are independent with
E[X] = $\mu_X$ and Var[X] = $\sigma_X^2$
E[Y] = $\mu_Y$ and Var[Y] = $\sigma_Y^2$

Then
E[aX + bY] = $a\mu_X + b\mu_Y$
Var[aX + bY] = $a^2\sigma_X^2 + b^2\sigma_Y^2$ and
Var[aX - bY] = $a^2\sigma_X^2 + b^2\sigma_Y^2$

Tip on variances: This result ALSO says that, when X and Y are independent, the variance of their difference is equal to the variance of their sum. This makes sense if it is recalled that variance is defined using squared deviations, which are never negative.
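To make the "Be careful!!" warning concrete, here is a small sketch that compares the correct standard error of a difference with the (incorrect) sum of the two separate standard errors. The function name `se_of_difference` and the numeric inputs are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
import math

def se_of_difference(var1, n1, var2, n2):
    """SE of (mean of group 1 - mean of group 2) for independent groups:
    sum the variances of the two sample means, then take the square root."""
    return math.sqrt(var1 / n1 + var2 / n2)

# Hypothetical population variances and sample sizes (illustration only)
var1, n1 = 4.0, 16
var2, n2 = 9.0, 36

se1 = math.sqrt(var1 / n1)   # SE of the group 1 sample mean
se2 = math.sqrt(var2 / n2)   # SE of the group 2 sample mean
se_diff = se_of_difference(var1, n1, var2, n2)

print("sum of separate SEs (incorrect):", se1 + se2)
print("SE of the difference (correct): ", se_diff)
```

With these inputs each separate standard error is 0.5, so the naive sum is 1.0, while the correct standard error of the difference is the square root of 0.5, which is smaller.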

5. Normal Distribution: One Group

5a. Confidence Interval for µ ($\sigma^2$ Known)

Introduction and where we are going

Hopefully, you will see that the logic and mechanics of confidence interval construction are very similar across a variety of settings. In this unit, we consider the setting of data from a normal distribution (or two normal distributions) and the setting of data from a binomial distribution (or two binomial distributions).

We want to compute a confidence interval estimate of µ for a population distribution that is normal with $\sigma^2$ known. Available are data from a random sample of size = n.

These pages show you how to construct a confidence interval. Appendix 1 gives the statistical theory underlying this calculation.

1. The Point Estimate of µ is the Mean $\bar{X}$

Recall that, for a sample of size = n, the sample mean is calculated as

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Features:
1. Under simple random sampling, the sample mean ($\bar{X}$) is an unbiased estimator of the population mean parameter µ, regardless of the underlying probability distribution.
2. When the underlying probability distribution is normal, the sample mean $\bar{X}$ also satisfies the criterion of being minimum variance unbiased (See page 5).

2. The Standard Error of $\bar{X}$ is $\sigma/\sqrt{n}$

The precision of $\bar{X}$ as an estimate of the unknown population mean parameter µ is reflected in its standard error. Recall:

$$SE(\bar{X}) = \sqrt{\text{variance}(\bar{X})} = \frac{\sigma}{\sqrt{n}}$$

- SE is smaller for smaller σ (measurement error)
- SE is smaller for larger n (study design)

3. The Confidence Coefficient

The confidence coefficient for a 95% confidence interval is the number needed to ensure 95% coverage in the long run (in probability). See again, page 18.

- 95% coverage leaves 5% in the tails. This is split evenly in the tails. Thus, for a 95% confidence interval, (5% in tails total / 2 tails) = 2.5% in each tail.
- For a 95% confidence interval, this number will be the 97.5th percentile of the Normal(0,1) distribution = 1.96.
- For a (1-α)100% confidence interval, this number will be the (1-α/2)100th percentile of the Normal(0,1) distribution.
- The table below gives some of these values in the setting of constructing a confidence interval estimate of µ when data are from a Normal distribution with $\sigma^2$ known.

Confidence Level    Percentile        Confidence Coefficient = Percentile Value from Normal(0,1)
.50                 75th              0.674
.75                 87.5th            1.15
.80                 90th              1.28
.90                 95th              1.645
.95                 97.5th            1.96
.99                 99.5th            2.576
(1-α)               (1-α/2)100th      -

Example - For a 50% CI, .50 = (1-α) says α = .50 and says (1-α/2) = .75. Thus, use the 75th percentile of N(0,1) = 0.674.
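The values in the table can be reproduced with the standard normal percentile function. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `confidence_coefficient` is hypothetical and exists only for this illustration.

```python
from statistics import NormalDist

def confidence_coefficient(level):
    """(1 - alpha/2)100th percentile of Normal(0,1) for a given
    confidence level (1 - alpha)."""
    alpha = 1.0 - level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.50, 0.75, 0.80, 0.90, 0.95, 0.99):
    print(level, round(confidence_coefficient(level), 3))
```

Each printed row pairs a confidence level with its multiplier, so the output can be compared line by line with the table.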

Example - The following data are the weights (micrograms) of drug inside each of 30 capsules, after subtracting the capsule weight. Suppose it is known that $\sigma^2$ = 0.25. Under the assumption of normality, calculate a 95% confidence interval estimate of µ.

 0.6   0.3   0.1   0.3   0.3
 0.2   0.6   1.4   0.1   0.0
 0.4   0.5   0.6   0.7   0.6
 0.0   0.0   0.2   1.6  -0.2
 1.6   0.0   0.7   0.2   1.4
 1.0   0.2   0.6   1.0   0.3

- The data are a simple random sample of size n=30 from a Normal distribution with mean = µ and variance = $\sigma^2$.
- The population variance is known and has value $\sigma^2$ = 0.25.

Remark: In real life, we will rarely know $\sigma^2$! This example is for illustration only.

The solution for the confidence interval is point estimate ± safety net:

Lower limit = ( point estimate ) - ( multiple ) (SE of point estimate)
Upper limit = ( point estimate ) + ( multiple ) (SE of point estimate)

Point Estimate of µ is the Mean $\bar{X}_{n=30}$

$$\bar{X}_{n=30} = \frac{\sum_{i=1}^{n=30} X_i}{n=30} = 0.51$$

The Standard Error of $\bar{X}_n$ is $\sigma/\sqrt{n}$

$$SE(\bar{X}_{n=30}) = \sqrt{\text{variance}(\bar{X}_{n=30})} = \sqrt{\frac{\sigma^2}{n}} = \sqrt{\frac{0.25}{30}} = 0.0913$$

The Confidence Coefficient

For a 95% confidence interval, this number will be the 97.5th percentile of the Normal(0,1) distribution. From the table on page 34 (or a Normal(0,1) calculator on the web), obtain the value 1.96.

Desired Confidence Level    Value of Confidence Coefficient
.95                         1.96

For a 95% CI, .95 = (1-α) says α = .05 and says (1-α/2) = .975. Thus, use the 97.5th percentile of N(0,1) = 1.96.

Putting this all together

Lower limit = ( point estimate ) - ( multiple ) (SE of point estimate)
            = 0.51 - ( 1.96 ) ( 0.0913 )
            = 0.33

Upper limit = ( point estimate ) + ( multiple ) (SE of point estimate)
            = 0.51 + ( 1.96 ) ( 0.0913 )
            = 0.69

Thus, we have the following general formula for a (1-α)100% confidence interval:

$$\bar{X}_n \pm \left[\,(1-\alpha/2)100^{th}\ \text{percentile of Normal}(0,1)\,\right] SE(\bar{X}_n)$$
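The capsule example can be reproduced in a few lines. This is a sketch using only the Python standard library; the variable names (`weights`, `sigma2`, and so on) are illustrative choices, and the data are the 30 capsule weights from the example with $\sigma^2$ = 0.25 treated as known.

```python
import math
from statistics import NormalDist, fmean

# The 30 capsule weights (micrograms) from the example
weights = [0.6, 0.3, 0.1, 0.3, 0.3, 0.2, 0.6, 1.4, 0.1, 0.0,
           0.4, 0.5, 0.6, 0.7, 0.6, 0.0, 0.0, 0.2, 1.6, -0.2,
           1.6, 0.0, 0.7, 0.2, 1.4, 1.0, 0.2, 0.6, 1.0, 0.3]
sigma2 = 0.25                                # known population variance

n = len(weights)
point_estimate = fmean(weights)              # sample mean
se = math.sqrt(sigma2 / n)                   # sigma / sqrt(n)
multiplier = NormalDist().inv_cdf(0.975)     # 97.5th percentile of N(0,1)

lower = point_estimate - multiplier * se
upper = point_estimate + multiplier * se
print(round(point_estimate, 2), round(se, 4), round(lower, 2), round(upper, 2))
```

The printed values are the point estimate, its standard error, and the lower and upper limits of the 95% interval, in that order.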

How to Calculate the Proportion of Means in a Given Interval (Use the idea of standardization)

We learned how to do this in Unit 5. The Normal Distribution.

Example
A sample of size n=100 from a normal distribution with unknown mean yields a sample mean $\bar{X}_{n=100}$ = 267.43. The population variance of the normal distribution is known to be equal to $\sigma^2$ = 36,764.23. What proportion of means of size n=100 will lie in the interval [200, 300] if it is known that µ = 250?

Answer = 99.1%

Solution:

Step 1 - The random variable that we need to standardize is $\bar{X}_{n=100}$.

Mean = 250

$$SE = \frac{\sigma}{\sqrt{100}} = \frac{\sqrt{36{,}764.23}}{\sqrt{100}} = 19.174$$

Step 2 - Probability [ 200 < $\bar{X}_{n=100}$ < 300 ] by the standardization formula is

$$\Pr\left[\frac{200 - 250}{19.174} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{300 - 250}{19.174}\right] = \Pr\left[-2.608 < \text{Z-score} < +2.608\right] = 0.9910$$
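The same proportion can be obtained directly from the sampling distribution of the mean, without standardizing by hand. The sketch below is one way to do this with `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu = 250.0            # population mean, taken as known in the example
sigma2 = 36764.23     # known population variance
n = 100

se = math.sqrt(sigma2 / n)                      # SE of the sample mean
sampling_dist = NormalDist(mu=mu, sigma=se)     # distribution of the mean of n=100

# Pr[200 < sample mean < 300]
proportion = sampling_dist.cdf(300.0) - sampling_dist.cdf(200.0)
print(round(se, 3), round(proportion, 3))
```

Because `NormalDist` is given the mean and standard error of the sample mean, the difference of the two cumulative probabilities is the proportion of sample means that fall in the interval.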

5b. Confidence Interval for µ ($\sigma^2$ NOT known)

- In section 5a, we assumed that $\sigma^2$ is known and obtained a confidence interval for µ of the form

lower limit = $\bar{X} - z_{(1-\alpha/2)100}\,(\sigma/\sqrt{n})$
upper limit = $\bar{X} + z_{(1-\alpha/2)100}\,(\sigma/\sqrt{n})$

- The required confidence coefficient ($z_{1-\alpha/2}$) was obtained as a percentile from the standard normal, N(0,1), distribution (e.g. for a 95% CI, we used the 97.5th percentile).

- More realistically, however, $\sigma^2$ will not be known. Now what? Reasonably, we might replace σ with s. Recall that s is the sample standard deviation and we get it as follows:

$$s = \sqrt{s^2} \quad \text{where} \quad s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

- So far so good. But there is a problem. Whereas

$\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ IS distributed Normal(0,1),

$\dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ is NOT distributed Normal(0,1).

- We have to modify our machinery (specifically the SE piece of our machinery) to accommodate the unknown-ness of $\sigma^2$.

Whereas we previously used, when $\sigma^2$ was known:    z-score, percentile from Normal(0,1)
With $\sigma^2$ unknown we now use:                        t-score, percentile from Student's t

Under simple random sampling from a normal distribution, the confidence interval for an unknown mean µ will be of the following form:

lower limit = $\bar{X} - t_{DF;\,(1-\alpha/2)100}\,(s/\sqrt{n})$
upper limit = $\bar{X} + t_{DF;\,(1-\alpha/2)100}\,(s/\sqrt{n})$

When $\sigma^2$ is not known, the computation of a confidence interval for the mean µ is not altered much.

- We simply replace the confidence coefficient from the N(0,1) with one from the appropriate Student's t-distribution (the one with df = n-1).
- We replace the (now unknown) standard error with its estimate. The latter looks nearly identical except that it utilizes s in place of σ.
- Recall

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$

Thus,

Confidence Interval for µ in two settings of a sample from a Normal Distribution

$\sigma^2$ is KNOWN                              $\sigma^2$ is NOT Known
$\bar{X} \pm (z_{1-\alpha/2})(\sigma/\sqrt{n})$          $\bar{X} \pm (t_{n-1;\,1-\alpha/2})(s/\sqrt{n})$

Example
A random sample of size n=20 durations (minutes) of cardiac bypass surgeries has a mean duration of $\bar{X}$ = 267 minutes, and variance $s^2$ = 36,700 minutes$^2$. Assuming the underlying distribution is normal with unknown variance, construct a 90% CI estimate of the unknown true mean, µ.

Answer: (192.9, 341.1) minutes

Solution:

Step 1 - Point Estimate of µ is the Mean $\bar{X}$

$$\bar{X}_{n=20} = \frac{\sum_{i=1}^{n=20} X_i}{n=20} = 267 \text{ minutes}$$

Step 2 - The Estimated Standard Error of $\bar{X}$ is $s/\sqrt{n}$

$$\widehat{SE}(\bar{X}_{n=20}) = \sqrt{\widehat{\text{variance}}(\bar{X}_{n=20})} = \sqrt{\frac{S^2}{n}} = \sqrt{\frac{36{,}700}{20}} = 42.84 \text{ minutes}$$

Step 3 - The Confidence Coefficient
For a 90% confidence interval, this number will be the 95th percentile of the Student's t-distribution that has degrees of freedom = (n-1) = 19. This value is 1.729.

Putting this all together

Lower limit = ( point estimate ) - ( confidence coefficient ) (SE of point estimate)
            = 267 - ( 1.729 ) ( 42.84 )
            = 192.93

Upper limit = ( point estimate ) + ( confidence coefficient ) (SE of point estimate)
            = 267 + ( 1.729 ) ( 42.84 )
            = 341.07

Thus, a 90% confidence interval for the true mean duration of surgery is (192.9, 341.1) minutes.
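The arithmetic of the surgery example can be sketched as a small function. The Python standard library has no Student's t percentile function, so this sketch assumes the multiplier (1.729 for df=19 and 90% confidence) is read from a table or an online calculator and passed in; the function name `t_interval` is hypothetical. The standard error is carried without intermediate rounding, which gives about 42.84 minutes.

```python
import math

def t_interval(mean, variance, n, t_multiplier):
    """Confidence interval for a mean when the variance is estimated:
    mean +/- t_multiplier * sqrt(variance / n).
    t_multiplier must be looked up for df = n - 1 by the caller."""
    se = math.sqrt(variance / n)
    margin = t_multiplier * se
    return mean - margin, mean + margin, se

# n=20 surgeries, mean 267 minutes, sample variance 36,700;
# 1.729 is the 95th percentile of Student's t with df=19 (table value)
lower, upper, se = t_interval(267.0, 36700.0, 20, 1.729)
print(round(se, 2), round(lower, 1), round(upper, 1))
```

Passing the multiplier as an argument keeps the function usable for any confidence level, as long as the percentile comes from the t-distribution with n-1 degrees of freedom.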

5c. Confidence Interval for $\sigma^2$

A confidence interval for $\sigma^2$ is calculated using percentiles from the chi square distribution.

The following are some settings where our interest lies in estimation of the variance, $\sigma^2$:
- Standardization of equipment: repeated measurement of a standard should have small variability
- Evaluation of technicians: are the results from a particular technician too variable?
- Comparison of measurement techniques: are the measurements obtained using a new technique too variable compared to the precision of the old technique?

We have a point estimator of $\sigma^2$. It is $S^2$. How do we get a confidence interval? The answer will utilize a new standardized variable, based on the way in which $S^2$ is computed. It is a chi square random variable.

The definition of the chi square distribution gives us what we need to construct a confidence interval estimate of $\sigma^2$ when data are a simple random sample from a normal probability distribution. The approach here is similar to that for estimating the mean µ.

The table below shows how to construct a confidence interval. For the interested reader, Appendix 2 is the derivation behind the calculation.

(1-α)100% Confidence Interval for $\sigma^2$
Setting: Normal Distribution

$$\text{Lower limit} = \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}} \qquad\qquad \text{Upper limit} = \frac{(n-1)S^2}{\chi^2_{\alpha/2}}$$
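As a brief sketch of why the larger chi square percentile ends up in the lower limit, the steps below start from the standard result that $(n-1)S^2/\sigma^2$ is distributed chi square with $n-1$ degrees of freedom under normality. This is an outline under that assumption, not a substitute for the full derivation in the appendix.

```latex
\begin{align*}
1-\alpha
 &= \Pr\left[\, \chi^2_{\alpha/2} \;\le\; \frac{(n-1)S^2}{\sigma^2} \;\le\; \chi^2_{1-\alpha/2} \,\right] \\
 &= \Pr\left[\, \frac{1}{\chi^2_{1-\alpha/2}} \;\le\; \frac{\sigma^2}{(n-1)S^2} \;\le\; \frac{1}{\chi^2_{\alpha/2}} \,\right]
  && \text{(taking reciprocals reverses the inequalities)} \\
 &= \Pr\left[\, \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}} \;\le\; \sigma^2 \;\le\; \frac{(n-1)S^2}{\chi^2_{\alpha/2}} \,\right]
\end{align*}
```

Here both percentiles are taken from the chi square distribution with $n-1$ degrees of freedom.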

Example
A precision instrument is guaranteed to read accurately to within ± 2 units. A sample of 4 readings on the same object yield 353, 351, 351, and 355. Find a 95% confidence interval estimate of the population variance $\sigma^2$ and also for the population standard deviation σ.

Answer: (1.18, 51.02) units squared

Solution:

1. Obtain the point estimate of $\sigma^2$. It is the sample variance $S^2$.
To get the sample variance $S^2$, we will need to compute the sample mean first.

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 352.5 \quad \text{and} \quad S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = 3.67$$

2. Determine the correct chi square distribution to use.
It has degrees of freedom, df = (4-1) = 3.

3. Obtain the correct multipliers.
Because the desired confidence level is 0.95, we set 0.95 = (1-α). Thus α = .05.
For a 95% confidence level, the percentiles we want are
(i) (α/2)100th = 2.5th percentile
(ii) (1-α/2)100th = 97.5th percentile

Obtain percentiles for the chi square distribution with degrees of freedom = 3
http://www.stat.tamu.edu/~west/applets/chisqdemo.html
(i) $\chi^2_{df=3;\,.025}$ = 0.2158
(ii) $\chi^2_{df=3;\,.975}$ = 9.348

4. Thus,

(i) Lower limit = $\dfrac{(n-1)S^2}{\chi^2_{1-\alpha/2}} = \dfrac{(3)(3.67)}{9.348} = 1.178$

(ii) Upper limit = $\dfrac{(n-1)S^2}{\chi^2_{\alpha/2}} = \dfrac{(3)(3.67)}{0.2158} = 51.02$

Obtain a Confidence Interval for the Population Standard Deviation σ

Answer: (1.09, 7.14) units

Solution:

Step 1 - Obtain a confidence interval for $\sigma^2$: (1.178, 51.02)

Step 2 - The associated confidence interval for σ is obtained by taking the square root of each of the lower and upper limits
- 95% Confidence Interval = ($\sqrt{1.178}$, $\sqrt{51.02}$) = (1.09, 7.14)
- Point estimate = $\sqrt{3.67}$ = 1.92

Remarks on the Confidence Interval for $\sigma^2$
- It is NOT symmetric about the point estimate; the safety net on each side of the point estimate is of different lengths.
- These intervals tend to be wide. Thus, large sample sizes are required to obtain reasonably narrow confidence interval estimates for the variance and standard deviation parameters.
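The instrument example can be sketched in code as follows. The chi square percentiles for df=3 (0.2158 and 9.348) are taken from the example and passed in as inputs rather than computed, and the function name `variance_interval` is hypothetical. Because the code carries $S^2$ = 3.6667 without rounding it to 3.67, its limits can differ from the hand calculation in the second or third decimal place.

```python
import math
from statistics import variance

def variance_interval(data, chi2_lower_pct, chi2_upper_pct):
    """Confidence interval for a normal variance.
    chi2_lower_pct: (alpha/2)100th chi square percentile, df = n - 1
    chi2_upper_pct: (1 - alpha/2)100th chi square percentile, df = n - 1"""
    n = len(data)
    s2 = variance(data)                       # sample variance, divisor n - 1
    lower = (n - 1) * s2 / chi2_upper_pct     # larger percentile gives lower limit
    upper = (n - 1) * s2 / chi2_lower_pct     # smaller percentile gives upper limit
    return s2, lower, upper

readings = [353.0, 351.0, 351.0, 355.0]
s2, lower, upper = variance_interval(readings, 0.2158, 9.348)

# Interval for the standard deviation: square root of each limit
sd_interval = (math.sqrt(lower), math.sqrt(upper))
print(round(s2, 2), round(lower, 2), round(upper, 2))
print(tuple(round(v, 2) for v in sd_interval))
```

Note how the percentiles swap roles: dividing by the larger percentile produces the lower limit, which is also why the interval is not symmetric about the point estimate.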

6. Normal Distribution: Paired Data

Introduction to Paired Data

Paired data occur when each individual (more specifically, each unit of measurement) in a sample is measured twice. Paired data are familiar: "pre/post", "before/after", "right/left", "parent/child", etc.

Examples -
1) Blood pressure prior to and following treatment,
2) Number of cigarettes smoked per week measured prior to and following participation in a smoking cessation program,
3) Number of sex partners in the month prior to and in the month following an HIV education campaign.

In each of these examples, the two occasions of measurement are linked by virtue of the two measurements being made on the same individual.

We are interested in comparing the two paired outcomes. When the paired data are continuous, the comparison focuses on the difference between the two paired measurements.

Note: We'll see later that when the data are discrete, an analysis of paired data might focus on the ratio (eg. relative risk) of the two measurements rather than on the difference.

Examples:
1) Blood pressure prior to and following treatment. Interest is d = pre-post. Large differences are evidence of blood pressure lowering associated with treatment.
2) Number of cigarettes smoked per week measured prior to and following participation in a smoking cessation program. Interest is d = pre-post. Large differences d are evidence of smoking reduction.
3) Number of sex partners in the month prior to and in the month following an HIV education campaign. Interest is d = pre-post. Large differences are evidence of safer sex behaviors.

6a. Confidence Interval for $\mu_{DIFFERENCE}$

Suppose two paired measurements are made of the same phenomenon (eg. blood pressure, # cigarettes/week, etc) on each individual in a sample. Call them X and Y. If X and Y are each normally distributed, then their difference is also distributed normal (see again, section 4d).

Thus, the setting is our focus on the difference D and the following assumptions:
(1) D = (X - Y) is distributed Normal with
(2) Mean of D = $\mu_{difference}$. Let's write this as $\mu_d$
(3) Variance of D = $\sigma^2_{DIFFERENCE}$. Let's write this as $\sigma^2_d$

In this setting, estimation for paired data is a special case of selected methods already presented. Attention is restricted to the single random variable defined as the difference between the two measurements. The methods already presented that we can use here are
(1) Confidence Interval for $\mu_d$ - Normal Distribution, $\sigma^2_d$ unknown
(2) Confidence Interval for $\sigma^2_d$ - Normal Distribution

Example
source: Anderson TW and Sclove SL. Introductory Statistical Analysis. Boston: Houghton Mifflin, 1974. page 339

A researcher is interested in assessing the improvement in reading skills upon completion of the second grade (Y) in comparison to those prior to the second grade (X). For each child, his or her improvement is measured using the difference d, which is defined as d = Y - X. A sample of n=30 children is studied. The data are shown on the next page.

ID    PRE(X)    POST(Y)    d=(Y-X)
 1     1.1       1.7        0.6
 2     1.5       1.7        0.2
 3     1.5       1.9        0.4
 4     2.0       2.0        0.0
 5     1.9       3.5        1.6
 6     1.4       2.4        1.0
 7     1.5       1.8        0.3
 8     1.4       2.0        0.6
 9     1.8       2.3        0.5
10     1.7       1.7        0.0
11     1.2       1.2        0.0
12     1.5       1.7        0.2
13     1.6       1.7        0.1
14     1.7       3.1        1.4
15     1.2       1.8        0.6
16     1.5       1.7        0.2
17     1.0       1.7        0.7
18     2.3       2.9        0.6
19     1.3       1.6        0.3
20     1.5       1.6        0.1
21     1.8       2.5        0.7
22     1.4       3.0        1.6
23     1.6       1.8        0.2
24     1.6       2.6        1.0
25     1.1       1.4        0.3
26     1.4       1.4        0.0
27     1.4       2.0        0.6
28     1.5       1.3       -0.2
29     1.7       3.1        1.4
30     1.6       1.9        0.3

Calculate
(1) A 99% confidence interval for $\mu_d$
(2) An 80% confidence interval for $\sigma^2_d$

Solution for a 99% Confidence Interval for $\mu_d$

Step 1 - Point Estimate of $\mu_d$ is the Mean $\bar{d}_{n=30}$

$$\bar{d}_{n=30} = \frac{\sum_{i=1}^{n=30} d_i}{n=30} = 0.51$$

Step 2 - The Estimated Standard Error of $\bar{d}$ is $S_d/\sqrt{n}$

Calculate the sample variance of the individual differences:

$$S_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1} = 0.2416$$

The estimated variance of the sample mean of the differences is therefore:

$$\widehat{\text{variance}}(\bar{d}_{n=30}) = \frac{S_d^2}{n} = \frac{0.2416}{30}$$

Thus,

$$\widehat{SE}(\bar{d}_{n=30}) = \sqrt{\widehat{\text{variance}}(\bar{d}_{n=30})} = \sqrt{\frac{S_d^2}{n}} = \sqrt{\frac{0.2416}{30}} = 0.0897$$

Step 3 - The Confidence Coefficient
For a 99% confidence interval, this number will be the 99.5th percentile of the Student's t-distribution that has degrees of freedom = (n-1) = 29. This value is 2.756.

Step 4 - Substitute into the formula for a confidence interval

Lower limit = ( point estimate ) - ( confidence coefficient ) (SE of point estimate)
            = 0.51 - ( 2.756 ) ( 0.0897 )
            = 0.2627

Upper limit = ( point estimate ) + ( confidence coefficient ) (SE of point estimate)
            = 0.51 + ( 2.756 ) ( 0.0897 )

            = 0.7573

6b. Confidence Interval for $\sigma^2_{DIFFERENCE}$

Solution for an 80% Confidence Interval for $\sigma^2_d$.

Step 1 - Obtain the point estimate of $\sigma^2_d$.

$$S_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1} = 0.2416$$

Step 2 - Determine the correct chi square distribution to use.
It has df = (30-1) = 29.

Step 3 - Obtain the correct multipliers.
Because the desired confidence level is 0.80, set 0.80 = (1-α). Thus α = .20.
For an 80% confidence level, α = .20 and α/2 = .10, so we want:
(i) (α/2)100th = 10th percentile
(ii) (1-α/2)100th = 90th percentile

Using a chi square distribution calculator, set degrees of freedom, df = 29
(i) $\chi^2_{df=29;\,.10}$ = 19.77
(ii) $\chi^2_{df=29;\,.90}$ = 39.09

Step 4 - Substitute into the formula for the confidence interval

(i) Lower limit = $\dfrac{(n-1)S_d^2}{\chi^2_{1-\alpha/2}} = \dfrac{(29)(0.2416)}{39.09} = 0.1792$

(ii) Upper limit = $\dfrac{(n-1)S_d^2}{\chi^2_{\alpha/2}} = \dfrac{(29)(0.2416)}{19.77} = 0.3544$
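Both paired-data intervals can be checked from the 30 differences d = Y - X in the reading skills example. The sketch below uses only the Python standard library; the t and chi square percentiles (2.756, 19.77, 39.09) are the table values quoted in the solution and are entered by hand, and all variable names are illustrative. Small differences in the last decimal place are possible because the code does not round intermediate results.

```python
import math
from statistics import fmean, variance

# The 30 paired differences d = POST(Y) - PRE(X), in ID order
d = [0.6, 0.2, 0.4, 0.0, 1.6, 1.0, 0.3, 0.6, 0.5, 0.0,
     0.0, 0.2, 0.1, 1.4, 0.6, 0.2, 0.7, 0.6, 0.3, 0.1,
     0.7, 1.6, 0.2, 1.0, 0.3, 0.0, 0.6, -0.2, 1.4, 0.3]

n = len(d)
d_bar = fmean(d)                 # point estimate of mu_d
s2_d = variance(d)               # point estimate of sigma^2_d (divisor n - 1)
se = math.sqrt(s2_d / n)         # estimated SE of the mean difference

# Table values entered by hand (df = 29)
t_995 = 2.756                    # 99.5th percentile of Student's t
chi2_10, chi2_90 = 19.77, 39.09  # 10th and 90th chi square percentiles

mean_ci = (d_bar - t_995 * se, d_bar + t_995 * se)                 # 99% CI for mu_d
var_ci = ((n - 1) * s2_d / chi2_90, (n - 1) * s2_d / chi2_10)      # 80% CI for sigma^2_d

print(round(d_bar, 2), round(s2_d, 4), round(se, 4))
print(tuple(round(v, 4) for v in mean_ci))
print(tuple(round(v, 4) for v in var_ci))
```

Once the differences are formed, the paired analysis is exactly the one-group analysis applied to the single variable d.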

7. Normal Distribution: Two Independent Groups

Illustration of the Setting of Two Independent Groups

Example - A researcher performs a drug trial involving two independent groups.
o A control group is treated with a placebo while, separately;
o An independent intervention group is treated with an active agent.
o Interest is in a comparison of the mean control response with the mean intervention response under the assumption that the responses are independent.
o The tools of confidence interval construction described for paired data are not appropriate.

7a. Confidence Interval for [ $\mu_{GROUP1} - \mu_{GROUP2}$ ]

Suppose we want to compare the mean response in one group with the mean response in a separate group. Suppose further that the two groups are independent.

Examples -
1) Is mean blood pressure the same for males and females?
2) Is body mass index (BMI) similar for breast cancer cases versus non-cancer patients?
3) Is length of stay (LOS) for patients in hospital "A" the same as that for similar patients in hospital "B"?

For continuous data, the comparison of two independent groups focuses on the difference between the means of the two groups. Similarity of the two groups is reflected in a difference between means that is near zero.

Focus is on [ $\mu_{Group\,1} - \mu_{Group\,2}$ ]

Point Estimator: We want a point estimate of the difference [ $\mu_{Group\,1} - \mu_{Group\,2}$ ]. Our point estimator will be the difference between sample means, $[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}]$.

Standard Error of the Point Estimator: We need the value (or an estimate) of the standard error of $[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}]$. We can use what we learned in section 4d (see pp 31-32).

Since $X_{11}, X_{12}, \ldots, X_{1n_1}$ is a simple random sample from a Normal($\mu_1$, $\sigma_1^2$)
And since $X_{21}, X_{22}, \ldots, X_{2n_2}$ is a simple random sample from a Normal($\mu_2$, $\sigma_2^2$)

We have
$\bar{X}_{Group\,1}$ is distributed Normal($\mu_1$, $\sigma_1^2/n_1$)
$\bar{X}_{Group\,2}$ is distributed Normal($\mu_2$, $\sigma_2^2/n_2$)

And thus, $[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}]$ is also distributed Normal with

Mean = $[\mu_{Group\,1} - \mu_{Group\,2}]$

Variance = $\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$

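How to Estimate the Standard Error

The correct solution depends on $\sigma_1^2$ and $\sigma_2^2$.

Solution 1 - $\sigma_1^2$ and $\sigma_2^2$ are both known

$$SE\left[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}\right] = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Solution 2 - $\sigma_1^2$ and $\sigma_2^2$ are both NOT known but are assumed EQUAL

$$\widehat{SE}\left[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}\right] = \sqrt{\frac{S^2_{pool}}{n_1} + \frac{S^2_{pool}}{n_2}}$$

$S^2_{pool}$ is a weighted average of the two separate sample variances, with weights equal to the associated degrees of freedom contributions.

$$S^2_{pool} = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{(n_1-1) + (n_2-1)}$$

Solution 3 - $\sigma_1^2$ and $\sigma_2^2$ are both NOT known and NOT EQUAL

$$\widehat{SE}\left[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}\right] = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$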

Confidence Coefficient ("Multiplier")

Again, the correct solution depends on $\sigma_1^2$ and $\sigma_2^2$.

Solution 1 - $\sigma_1^2$ and $\sigma_2^2$ are both known
Use percentile of Normal(0,1)

Solution 2 - $\sigma_1^2$ and $\sigma_2^2$ are both NOT known but are assumed EQUAL
Use percentile of Student's t
Degrees of freedom = $(n_1 - 1) + (n_2 - 1)$

Solution 3 - $\sigma_1^2$ and $\sigma_2^2$ are both NOT known and NOT EQUAL
Use percentile of Student's t
Degrees of freedom = f, where f is given by the formula (Satterthwaite)

$$f = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{\left(S_1^2/n_1\right)^2}{n_1-1} + \dfrac{\left(S_2^2/n_2\right)^2}{n_2-1}}$$

Horrible, isn't it!
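The Satterthwaite formula is less horrible once it is written as code. The sketch below implements the standard error and degrees of freedom for Solution 2 and Solution 3; the function names and the numeric inputs are hypothetical, chosen only for illustration. The t percentile itself would still be looked up separately using the resulting degrees of freedom.

```python
import math

def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Weighted average of the two sample variances (Solution 2)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / ((n1 - 1) + (n2 - 1))

def se_equal_variances(s1_sq, n1, s2_sq, n2):
    """Estimated SE of the difference in means, variances assumed equal."""
    sp_sq = pooled_variance(s1_sq, n1, s2_sq, n2)
    return math.sqrt(sp_sq / n1 + sp_sq / n2)

def se_unequal_variances(s1_sq, n1, s2_sq, n2):
    """Estimated SE of the difference in means, variances not assumed equal."""
    return math.sqrt(s1_sq / n1 + s2_sq / n2)

def satterthwaite_df(s1_sq, n1, s2_sq, n2):
    """Approximate degrees of freedom f for Solution 3."""
    v1, v2 = s1_sq / n1, s2_sq / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Hypothetical sample variances and sample sizes (illustration only)
s1_sq, n1 = 4.0, 10
s2_sq, n2 = 9.0, 15

print("pooled variance:     ", pooled_variance(s1_sq, n1, s2_sq, n2))
print("SE, equal variances: ", se_equal_variances(s1_sq, n1, s2_sq, n2))
print("df, equal variances: ", (n1 - 1) + (n2 - 1))
print("SE, unequal variances:", se_unequal_variances(s1_sq, n1, s2_sq, n2))
print("df, Satterthwaite:   ", satterthwaite_df(s1_sq, n1, s2_sq, n2))
```

A useful sanity check is that when the two sample variances and the two sample sizes are equal, the Satterthwaite degrees of freedom reduce to $(n_1 - 1) + (n_2 - 1)$, the same value used in Solution 2.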

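Summary
Normal Distribution: Confidence Interval for [ $\mu_1 - \mu_2$ ] (Two Independent Groups)

CI = [point estimate] ± (conf.coeff) SE[point estimate]

Scenario 1: $\sigma_1^2$ and $\sigma_2^2$ are both known
  Estimate: $\bar{X}_{Group\,1} - \bar{X}_{Group\,2}$
  SE to use: $SE\left[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}\right] = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
  Confidence coefficient: use percentiles from Normal
  Degrees of freedom: not applicable

Scenario 2: $\sigma_1^2$ and $\sigma_2^2$ are both NOT known but are assumed EQUAL
  Estimate: $\bar{X}_{Group\,1} - \bar{X}_{Group\,2}$
  SE to use: $\widehat{SE}\left[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}\right] = \sqrt{\dfrac{S^2_{pool}}{n_1} + \dfrac{S^2_{pool}}{n_2}}$, where you have already obtained $S^2_{pool} = \dfrac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{(n_1-1) + (n_2-1)}$
  Confidence coefficient: use percentiles from Student's t
  Degrees of freedom: $(n_1 - 1) + (n_2 - 1)$

Scenario 3: $\sigma_1^2$ and $\sigma_2^2$ are both NOT known and NOT equal
  Estimate: $\bar{X}_{Group\,1} - \bar{X}_{Group\,2}$
  SE to use: $\widehat{SE}\left[\bar{X}_{Group\,1} - \bar{X}_{Group\,2}\right] = \sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$
  Confidence coefficient: use percentiles from Student's t
  Degrees of freedom: $f = \dfrac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{\left(S_1^2/n_1\right)^2}{n_1-1} + \dfrac{\left(S_2^2/n_2\right)^2}{n_2-1}}$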