E509A: Principle of Biostatistics. GY Zou


1 E509A: Principle of Biostatistics (Week 2: Probability and Distributions) GY Zou

2 Reporting of continuous data. If approximately symmetric, use mean (SD), e.g., "Antibody titers ranged from 25 to 347 ng/ml and had a mean (SD) of 110 ng/ml (43 ng/ml)." If markedly asymmetric, use median (interquartile range), e.g., "Antibody titers ranged from 25 to 347 ng/ml, with a median (interquartile range) of 110 ng/ml (61 to 159 ng/ml)." It's better not to use mean ± SD or mean ± s.e. Why?

3 The ultimate goal of statistical inference is to use a sample to draw conclusions about the population of interest. This boils down to using sample statistics to estimate (unknown) population parameters. It is therefore logical to first understand what goes on in the population when the population parameters are known. Probability theory is the key to accomplishing this.

4 Probability: definitions. (1) A vehicle to convey an opinion: maybe, might be, perhaps, ... (2) Relative frequency in the action-outcome scenario, i.e., the ratio of the number of outcomes in the event to the number of all possible outcomes (the sample space). Example: flip a coin. The outcome "the coin will fall" happens every time (deterministic); the outcome "head or tail" is not known before the experiment is done (random). Probability of a random event (getting a head): if we flip the coin n times, we know the relative frequency of getting a head, which estimates the probability of getting a head; in the long run (many flips) of a fair coin, the relative frequency settles at 50%, or 0.5.
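The long-run relative frequency can be illustrated with a small simulation. This is only a sketch: it assumes a simulated fair coin, and the function name and the fixed seed are illustrative choices, not part of the lecture.

```python
import random

def relative_frequency_of_heads(n_flips, seed=0):
    """Flip a simulated fair coin n_flips times; return the share of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The relative frequency wanders for small n and settles near 0.5 for large n.
for n in (10, 100, 10_000, 200_000):
    print(n, relative_frequency_of_heads(n))
```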

5 Event: before talking about probability, we must define the event of interest. Example 3.1 (p. 89): in a population of N = 6 people with gender and age information, we could be interested in female: Pr = 3/6 = 0.5; male: Pr = 3/6 = 0.5; subject over 65: Pr = 2/6 ≈ 0.33. Odds is not probability! odds $= \Pr(A) / (1 - \Pr(A))$. The probability of a complex event is what makes people dizzy, yet we cannot always avoid it, e.g., female over 65, male younger than 65, ...
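To see how far odds can sit from probability, here is a short sketch using the probabilities of Example 3.1. The helper name `odds` is an illustrative choice.

```python
from fractions import Fraction

def odds(p):
    """Odds in favour of an event with probability p (requires p < 1)."""
    return p / (1 - p)

p_female = Fraction(3, 6)    # Pr(female) in Example 3.1
p_over_65 = Fraction(2, 6)   # Pr(over 65) in Example 3.1

print(odds(p_female))    # 1, i.e. odds of 1 to 1
print(odds(p_over_65))   # 1/2, i.e. odds of 1 to 2
```

A probability of 1/3 corresponds to odds of 1/2, so the two scales agree only approximately, and only when the probability is small.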

6 Relationships between two simple events. The complement of an event consists of all outcomes in the sample space that are not in the event, e.g., male is the complement of female when every person in the population is classified as one or the other, but not both. $\Pr(A) = 1 - \Pr(A^c)$. Mutually exclusive: when one happens the other cannot happen, e.g., a throw of a die may show a one or a two, but not both; the two events have nothing in common. Union: either, or; denoted $A \cup B$. Intersection: and; denoted $A \cap B$.

7 Addition rule (Venn diagram: sample space containing overlapping events A and B): $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$
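The addition rule can be checked by direct counting. The sketch below assumes one throw of a fair die and two illustrative events chosen for this example (they are not from the lecture); Python sets stand in for the Venn diagram.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one throw of a fair die

def pr(event):
    """Probability of an event (a set of outcomes), all outcomes equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}   # even number
B = {4, 5, 6}   # more than 3

lhs = pr(A | B)                    # Pr(A or B), counted directly from the union
rhs = pr(A) + pr(B) - pr(A & B)    # addition rule: subtract the overlap once
print(lhs, rhs)                    # 2/3 2/3
```

Without subtracting the intersection, the outcomes 4 and 6 would be counted twice.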

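8 Example 3.2A (p. 93). Educational level of pts who went to an allergy clinic (contingency table of gender by years of education, with row and column totals). Randomly select a patient. What is the probability that the pt is male: 78/286 ≈ 0.27, since there were 78 males and 208 females; male or had at most 12 yrs: add to the 78 males the number of females (among the 208) who had at most 12 yrs of ed, and divide by 286; male and had at most 12 yrs: the number of males who had at most 12 yrs of ed, divided by 286.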

9 Conditional probability (very hard to apply). The probability of one event (A), given that the other event (B) has already happened: $\Pr(A \mid B) = ?$ Can we define $\Pr(B \mid A) = ?$ Is $\Pr(A \mid B) = \Pr(B \mid A)$? Let A denote AIDS, and B a positive test. The probability that a person has AIDS, given that s/he tested positive: $\Pr(A \mid B)$. The probability that a person tested positive, given that s/he has AIDS: $\Pr(B \mid A)$.

10 Development of conditional probability (change the space, or denominator). Venn diagram: sample space containing A, B and $A \cap B$. Conditional: new space. $\Pr(A \mid B) = \Pr(A \cap B) / \Pr(B)$, $\Pr(B \mid A) = \Pr(A \cap B) / \Pr(A)$
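The change of denominator can be made concrete with a die. This is a sketch with made-up events (A = "a six", B = "an even number"), chosen only because the two conditional probabilities come out clearly different.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one throw of a fair die

def pr(event):
    """Probability of an event, all outcomes equally likely."""
    return Fraction(len(event), len(sample_space))

def pr_given(event, condition):
    """Pr(event | condition) = Pr(event and condition) / Pr(condition)."""
    return pr(event & condition) / pr(condition)

A = {6}          # a six
B = {2, 4, 6}    # an even number

print(pr_given(A, B))   # 1/3: the new space is B, which has three outcomes
print(pr_given(B, A))   # 1:   the new space is A, which has one outcome
```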

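11 Example 3.2A (p. 93). Educational level of pts who went to an allergy clinic (contingency table of gender by years of education, with row and column totals). Randomly select a patient. $\Pr(< 9 \mid \text{Male})$? That is, suppose the patient is a male; what is the probability that he has less than 9 yrs of ed? $\Pr(< 9 \mid \text{Male}) = \Pr(< 9 \cap \text{Male}) / \Pr(\text{Male}) = (15/286) / (78/286) = 15/78 \approx 0.19$. $\Pr(\text{Male} \mid < 9)$? That is, suppose the patient had less than 9 yrs of ed; what is the probability that the patient is a male? $\Pr(\text{Male} \mid < 9) = \Pr(\text{Male} \cap < 9) / \Pr(< 9)$, where the denominator now comes from the column total for less than 9 yrs of ed rather than from the 78 males.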

12 Again, $\Pr(A \mid B) \neq \Pr(B \mid A)$. Failure to recognize this has become the tragedy of modern statistics.

13 In the development of a diagnostic test, one must evaluate the accuracy of the test from two aspects: 1) if a subject has the targeted disease, what is the probability of the test being positive, i.e., $\Pr(T^+ \mid D^+)$; and 2) if a subject does not have the disease, what is the probability of the test being negative, i.e., $\Pr(T^- \mid D^-)$. Now, a subject has a positive test result ($T^+$); what is the probability that s/he has the disease ($D^+$), i.e., $\Pr(D^+ \mid T^+)$? Another piece of info is $\Pr(D^+)$, some rough idea before ordering the test. $\Pr(D^+ \mid T^+) = \Pr(D^+ \cap T^+) / \Pr(T^+) = \Pr(D^+ \mid T^+)\Pr(T^+) / \Pr(T^+) = \Pr(D^+ \mid T^+)$. Did not achieve anything? What if we write $\Pr(D^+ \cap T^+) = \Pr(T^+ \mid D^+)\Pr(D^+)$? I.e., $\Pr(D^+ \mid T^+) = \Pr(T^+ \mid D^+)\Pr(D^+) / \Pr(T^+)$.

14 Venn diagram: sample space containing $D^+$ and $T^+$. We have $\Pr(D^+ \mid T^+) = \Pr(T^+ \mid D^+)\Pr(D^+) / \Pr(T^+)$. $T^+$ may be regarded as the union of two parts, $T^+ \cap D^+$ and $T^+ \cap D^-$: $\Pr(T^+) = \Pr(T^+ \cap D^+) + \Pr(T^+ \cap D^-) = \Pr(T^+ \mid D^+)\Pr(D^+) + \Pr(T^+ \mid D^-)\Pr(D^-) = \Pr(T^+ \mid D^+)\Pr(D^+) + [1 - \Pr(T^- \mid D^-)][1 - \Pr(D^+)]$

15 Now, a subject has a negative test result ($T^-$); what is the probability that s/he does not have the disease ($D^-$), i.e., $\Pr(D^- \mid T^-)$? A similar argument can be used to calculate this. From diagnostic accuracy to accurate diagnosis (Zou 2004, Medical Decision Making 24). From sensitivity $\Pr(T^+ \mid D^+)$, specificity $\Pr(T^- \mid D^-)$ and the pre-test probability $\Pr(D^+)$ to the positive predictive value $\Pr(D^+ \mid T^+)$, and from the same three quantities to the negative predictive value $\Pr(D^- \mid T^-)$.
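The step from sensitivity and specificity to predictive values can be sketched in a few lines. The function name, the parameter names and the numbers below (sensitivity 0.90, specificity 0.95, pre-test probability 0.01) are hypothetical, chosen only for illustration; they do not come from the lecture or the cited paper.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) via Bayes' theorem.

    sensitivity = Pr(T+ | D+), specificity = Pr(T- | D-), prevalence = Pr(D+).
    """
    # Pr(T+) = Pr(T+|D+)Pr(D+) + [1 - Pr(T-|D-)][1 - Pr(D+)]
    p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    ppv = sensitivity * prevalence / p_pos          # Pr(D+ | T+)
    p_neg = 1 - p_pos                               # Pr(T-)
    npv = specificity * (1 - prevalence) / p_neg    # Pr(D- | T-)
    return ppv, npv

# Hypothetical test applied where the disease is rare.
ppv, npv = predictive_values(sensitivity=0.90, specificity=0.95, prevalence=0.01)
print(round(ppv, 3), round(npv, 3))   # 0.154 0.999
```

Even with a seemingly accurate test, Pr(D+ | T+) is far below Pr(T+ | D+) when the disease is rare, which is the point of the warning about confusing the two.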

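16 A and B being independent (no association) means $\Pr(A \mid B) = \Pr(A)$ and $\Pr(B \mid A) = \Pr(B)$. Since $\Pr(A \mid B) = \Pr(A \cap B) / \Pr(B)$, we have $\Pr(A \mid B)\Pr(B) = \Pr(A \cap B)$, and under independence $\Pr(A \cap B) = \Pr(A)\Pr(B)$.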
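The product criterion can be tested by enumeration. The sketch below assumes two fair dice and three illustrative events that are not part of the lecture; it checks whether Pr(A and B) equals Pr(A) Pr(B).

```python
from fractions import Fraction
from itertools import product

outcomes = set(product(range(1, 7), repeat=2))   # 36 equally likely pairs

def pr(event):
    """Probability of an event (a set of pairs), all pairs equally likely."""
    return Fraction(len(event), len(outcomes))

def independent(e1, e2):
    """True when Pr(e1 and e2) = Pr(e1) * Pr(e2)."""
    return pr(e1 & e2) == pr(e1) * pr(e2)

A = {o for o in outcomes if o[0] % 2 == 0}       # first die is even
B = {o for o in outcomes if o[0] + o[1] == 7}    # the sum is 7
C = {o for o in outcomes if o[0] + o[1] >= 10}   # the sum is at least 10

print(independent(A, B))   # True
print(independent(A, C))   # False
```

With exact fractions the check is an equality; with real data the two sides would differ by random error even under independence.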

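17 Example 3.2A (p. 93). Educational level of pts who went to an allergy clinic (contingency table of gender by education, $\le 12$ yrs versus $> 12$ yrs, with totals). Are education and gender independent? Compute $\Pr(\text{male})$, $\Pr(\le 12)$ and $\Pr(\text{male} \cap (\le 12))$ from the table, then compare $\Pr(\text{male} \cap (\le 12))$ with $\Pr(\text{male})\Pr(\le 12)$. Note these calculations do not take random error into account.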

18 Random variable and its distribution. Random variable: a characteristic that can take multiple values with given probabilities, e.g., the outcome (X) of flipping a coin can be either head (= 1) or tail (= 0), each with probability 0.5. The set of probabilities makes up a (probability) distribution. The distribution of flipping a coin is $f(x) = 0.5$ for $X = 1$ and $f(x) = 0.5$ for $X = 0$, or $\Pr(X = i) = 0.5$, $i = 0, 1$. This distribution is known as the Bernoulli distribution.

19 Let's flip two coins at once. Four possible outcomes with equal probability of 1/4: 1) Head, head 2) Head, tail 3) Tail, head 4) Tail, tail. Let's define the event of interest to be the total number of heads, X. X could be 0, 1, 2. Pr(X = 0) = 1/4, Pr(X = 1) = 1/4 + 1/4 = 1/2, Pr(X = 2) = 1/4

20 Binomial distribution. We can extend the flipping of two coins to n coins, and define the event of interest as the total number of heads, X. A key assumption here is independence. We could enumerate the probabilities for X = 0, 1, ..., n as we did for n = 2 coins, but that would take a while; instead we will use the formula presented in the book (p. 104): $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x} = \frac{n!}{x!(n - x)!} p^x (1 - p)^{n - x}$, where p is the probability of a head showing. For an unbiased coin, p = 0.5. Note that the binomial distribution is determined by n and p, which we call the parameters of the binomial, B(n, p). A B(n, p) variable is a sum of n independent Bernoulli(p) variables.

21 Let n = 2; we get $\Pr(X = 0) = \frac{2!}{0!(2 - 0)!} p^0 (1 - p)^{2 - 0} = (1 - p)^2$. Similar calculations give Pr(X = 1) and Pr(X = 2).

22 Ex 3.5 (p. 105). Application of the binomial distribution: antibiotic with p = 0.7. If the antibiotic is given to 5 unrelated pts, what is the probability that it will be effective in exactly 3? Binomial (n = 5, p = 0.7): $\Pr(X = 3) = \frac{5!}{3!(5 - 3)!} (0.7)^3 (1 - 0.7)^{5 - 3} = 10 \times 0.343 \times 0.09 = 0.3087$. The probability that the antibiotic will be effective in none of the 5 pts: $\Pr(X = 0) = \frac{5!}{0!(5 - 0)!} (0.7)^0 (1 - 0.7)^{5 - 0} = (0.3)^5 = 0.00243$
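The binomial formula is easy to evaluate directly. A minimal sketch, using only the Python standard library (the function name `binomial_pmf` is an illustrative choice), reproduces the two-coin case and Ex 3.5:

```python
from math import comb

def binomial_pmf(x, n, p):
    """Pr(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Two fair coins: Pr(X = 0), Pr(X = 1), Pr(X = 2)
print([binomial_pmf(x, 2, 0.5) for x in range(3)])   # [0.25, 0.5, 0.25]

# Ex 3.5: antibiotic effective in exactly 3, and in none, of 5 patients
print(round(binomial_pmf(3, 5, 0.7), 4))   # 0.3087
print(round(binomial_pmf(0, 5, 0.7), 5))   # 0.00243
```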

23 Population mean and variance. Mean, also called expectation (E(X)): the sum of (each possible value multiplied by its probability). For Bernoulli(p): $E(X) = 0 \cdot (1 - p) + 1 \cdot p = p$. Variance ($\sigma^2$): average of the squared deviations from the mean. For Bernoulli(p): $\sigma^2 = (0 - p)^2 \Pr(X = 0) + (1 - p)^2 \Pr(X = 1) = p^2 (1 - p) + (1 - p)^2 p = p - p^2 = p(1 - p)$

24 Properties of mean and variance. Let X be a random variable and c a constant. $E(X \pm c) = E(X) \pm c$ and $\text{Variance}(X \pm c) = \text{Variance}(X) = \sigma^2$. Measure the average height of a population, and denote it as $E(X) = \mu$. Now we add a constant to height by measuring the height of people standing on a box of height c, so the mean will be $\mu + c$. We can also subtract a constant by measuring the height of people standing in a hole of depth c, so the mean will be $\mu - c$. Whether people stand on a box or in a hole, the variability has not changed, so the variance will still be $\sigma^2$.

25 What happens to cX? $E(cX) = c\,E(X) = c\mu$. $\text{Variance}(cX) = \frac{\sum (cX_i - c\mu)^2}{N - 1} = \frac{c^2 \sum (X_i - \mu)^2}{N - 1} = c^2 \sigma^2$

26 How about $X \pm Y$, where X and Y are independent variables? Imagine measuring the height (X) of people standing on boxes of random height (Y). The mean should be $E(X) + E(Y)$. Since some short people could be on small boxes while tall people could be on large boxes, the variability is certainly increased. In fact, $\text{Variance}(X + Y) = \sigma_X^2 + \sigma_Y^2$. Now imagine measuring the heights above ground level of people standing in holes of random depth. The mean above ground would be $E(X) - E(Y)$. Since some short people could be in deep holes while some tall people are in shallow holes, the variability of heights above ground would also be increased. In fact, $\text{Variance}(X - Y) = \sigma_X^2 + \sigma_Y^2$
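That the variances add for both the sum and the difference can be checked by simulation. The sketch below assumes normally distributed heights (mean 170, SD 10) and box heights or hole depths (mean 30, SD 5); these numbers, the seed and the helper name are made up for illustration.

```python
import random

def variance(values):
    """Population-style variance: mean squared deviation from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

rng = random.Random(1)   # fixed seed so the run is repeatable
n = 100_000
heights = [rng.gauss(170, 10) for _ in range(n)]   # X: variance 10^2 = 100
offsets = [rng.gauss(30, 5) for _ in range(n)]     # Y: variance 5^2 = 25, independent of X

var_sum = variance([h + c for h, c in zip(heights, offsets)])    # people on boxes
var_diff = variance([h - c for h, c in zip(heights, offsets)])   # people in holes
print(var_sum, var_diff)   # both should be close to 100 + 25 = 125
```

Subtracting Y does not subtract its variance, because the minus sign is squared away.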

27 Mean and variance of the binomial distribution (n, p). Recall that the binomial is the sum of n Bernoulli(p) variables. The mean for Bernoulli(p) is p, so the mean for the binomial must be np. The variance for Bernoulli(p) is $p(1 - p)$, so the variance for Binomial(n, p) must be $np(1 - p)$. Compare to p. 108
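The results np and np(1 - p) can be confirmed by applying the definitions of mean and variance to the binomial probabilities. A sketch, reusing the antibiotic setting n = 5, p = 0.7 as the test case (function names are illustrative):

```python
from math import comb

def binomial_pmf(x, n, p):
    """Pr(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def mean_and_variance(n, p):
    """Mean and variance from the definition: sums over all possible values."""
    probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
    mean = sum(x * pr for x, pr in enumerate(probs))
    var = sum((x - mean) ** 2 * pr for x, pr in enumerate(probs))
    return mean, var

mean, var = mean_and_variance(5, 0.7)
print(mean, 5 * 0.7)        # enumeration versus np
print(var, 5 * 0.7 * 0.3)   # enumeration versus np(1 - p)
```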

28 Continuous distribution. For discrete variables in general, and the binomial in particular, we obtain the distributions by simply counting outcomes. This is tedious, but the outcomes are still countable. Imagine what would happen if we tried to assign probabilities to the numbers in the interval [0, 1]. We don't even know how many possible numbers are in there!

29 How about we assign probability not to a single number, but to an interval!

30 Normal distribution. One essential continuous distribution is the normal distribution, or Gaussian distribution, after Carl Friedrich Gauss (1777-1855). $\Pr(a < X < b) = \int_a^b f(x)\,dx$, where $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, $-\infty < x < \infty$. f(x) is called the density function. $e^x = y \Leftrightarrow x = \ln y = \log y$. e = ? π = ?

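31 The empirical rule comes from $N(\mu, \sigma^2)$: $\Pr(\mu - 1\sigma < X < \mu + 1\sigma) \approx .68$, $\Pr(\mu - 2\sigma < X < \mu + 2\sigma) \approx .95$, $\Pr(\mu - 3\sigma < X < \mu + 3\sigma) \approx .997$. Mean = Median. This implies $\Pr(X < \mu) = \Pr(X > \mu) = .50$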

32 Example 3.7 (p. 110). $\mu = 70$ in and $\sigma = 2$ in. Heights for a specific age and gender are approximately normally distributed. This is an assumption we have to make; otherwise we cannot make something out of nothing. $X \sim N(70, 2^2)$. Suppose a male aged 25 is selected at random.

33 Ex 3.7 (cont). a) What is the probability that his height is more than 70 in? $\Pr(X > 70) = \Pr(X > \mu) = 0.50$. b) What is the probability that his height is between 70 and 72 in? $\Pr(70 < X < 72) = \Pr(\mu < X < \mu + \sigma) = \frac{1}{2}\Pr(\mu - 1\sigma < X < \mu + 1\sigma) = .68/2 = .34$. c) What is the probability that his height exceeds 72 in? $\Pr(X > 72) = \Pr(X > \mu + \sigma) = \Pr(X > \mu) - \Pr(\mu < X < \mu + \sigma) = 0.50 - 0.34 = 0.16$
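The empirical rule and the answers to Ex 3.7 can be checked numerically. This sketch relies on `statistics.NormalDist` from the Python standard library (available from Python 3.8), used here as a stand-in for the normal table.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal, N(0, 1)

# Empirical rule: probability within k standard deviations of the mean
for k in (1, 2, 3):
    print(k, round(z.cdf(k) - z.cdf(-k), 4))   # 0.6827, 0.9545, 0.9973

# Ex 3.7: heights X ~ N(70, 2^2)
height = NormalDist(mu=70, sigma=2)
p_a = 1 - height.cdf(70)                 # Pr(X > 70)
p_b = height.cdf(72) - height.cdf(70)    # Pr(70 < X < 72)
p_c = 1 - height.cdf(72)                 # Pr(X > 72)
print(round(p_a, 4), round(p_b, 4), round(p_c, 4))   # 0.5 0.3413 0.1587
```

The slide's .34 and 0.16 are the rounded versions of 0.3413 and 0.1587.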

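34 From the normal density $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, $-\infty < x < \infty$, we can see a family of functions, determined by $\mu$ and $\sigma$. Denote the normal distribution as $N(\mu, \sigma^2)$. It is rather hard to evaluate $\int_a^b \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}\,dx$ by hand. E.g., suppose we want to know the probability of $X \in (1, 2)$, where $X \sim N(2, 3^2)$: $\Pr(1 < X < 2) = \int_1^2 \frac{1}{3\sqrt{2\pi}} e^{-\frac{(x - 2)^2}{2 \cdot 3^2}}\,dx = ?$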

35 Use tables! But we need to standardize the variable first: subtract the population mean $\mu$ from X, giving $X - \mu$; then standardize by the standard deviation $\sigma$, giving $(X - \mu)/\sigma$; call this Z (it has nothing to do with Zou!). It is really $Z \sim N(0, 1)$. $Z = \frac{X - \mu}{\sigma}$

36 The normal law of errors stands out in the endeavors of mankind as one of the broadest generalizations of natural philosophy. It serves as the guiding instrument for research in the physical and social sciences and in medicine, agriculture and the biological world in general. It is a tool important for the analyses and interpretations of the basic information obtained by observation and by experimental design to produce fundamental statistical inferences.

37 Example 3.8 (p. 114): BP X is assumed to be $N(108, 14^2)$ in a population. A person is randomly sampled from this population. a) What is the probability that this person's BP is below 112? $\Pr(X < 112) = \Pr(Z < \frac{112 - 108}{14}) = \Pr(Z < .29) = .6141$. Interpretation based on a random sample: the probability of a person's BP being below 112 is .6141. Interpretation based on the population: 61.41% of the population have BP less than 112. b), c), ..., g) are left for your reading.
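The table lookup in Example 3.8 can be mimicked in code. A sketch using `statistics.NormalDist` (standard library, Python 3.8+); rounding z to two decimals imitates reading a printed table and is the only reason the last line differs slightly.

```python
from statistics import NormalDist

mu, sigma = 108, 14
z_score = (112 - mu) / sigma   # standardize: Z = (X - mu) / sigma

print(round(z_score, 2))                               # 0.29
print(round(NormalDist().cdf(round(z_score, 2)), 4))   # 0.6141, the table value
print(NormalDist(mu, sigma).cdf(112))                  # same probability without rounding z
```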

38 Numbers to remember. Table columns: Percentile, Z, Notation.

39 Table B.2 on pp You should remember and % %

40 Flip an unbiased coin 5 times. Probability of all HEADS: $(1/2)^5 = 3.125\%$. Probability of all TAILS: $(1/2)^5 = 3.125\%$. The sum is 6.25% ≈ 5%

41 Example 3.9 (p. 120). BP $X \sim N(108, 14^2)$. a) Find the 5th percentile of BP: $X = 108 + (-1.645)(14) = 84.97$. b) Find the 90th percentile of BP: $X = 108 + (1.282)(14) \approx 125.9$
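Percentiles reverse the table lookup: start from a probability and recover the value. A sketch with `statistics.NormalDist.inv_cdf` (standard library, Python 3.8+), offered as a check on the hand calculation rather than as the book's method:

```python
from statistics import NormalDist

bp = NormalDist(mu=108, sigma=14)   # BP X ~ N(108, 14^2)

p5 = bp.inv_cdf(0.05)    # 5th percentile:  108 + z_0.05 * 14
p90 = bp.inv_cdf(0.90)   # 90th percentile: 108 + z_0.90 * 14
print(round(p5, 1), round(p90, 1))   # 85.0 125.9

# The z values behind the two percentiles
print(round(NormalDist().inv_cdf(0.05), 3), round(NormalDist().inv_cdf(0.90), 3))   # -1.645 1.282
```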

42 Normal to approximate binomial. Example 3.10 (p. 122). Antibiotic with p = 70% = .7 success rate given to 25 people. a) What is the probability that the antibiotic is effective in more than 15 patients, Pr(X > 15)? $\Pr(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$ with n = 25, p = 0.7. $\Pr(X > 15) = \Pr(X = 16) + \Pr(X = 17) + \cdots + \Pr(X = 25) = \binom{25}{16}(0.7)^{16}(1 - 0.7)^{9} + \cdots + \binom{25}{25}(0.7)^{25} = 0.8106$

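43 $\mu = np = 25 \times 0.7 = 17.5$, $\sigma^2 = npq = 25 \times 0.7 \times 0.3 = 5.25$. Standardize as $Z \approx \frac{X - \mu}{\sigma} = \frac{X - np}{\sqrt{npq}}$. For X = 15, $Z = \frac{15 - 17.5}{\sqrt{5.25}} = -2.5/\sqrt{5.25} = -1.09$. $\Pr(Z > -1.09) = 1 - \Pr(Z < -1.09) = 1 - 0.1379 = 0.8621$. Recall that the exact calculation result is 0.8106.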

44 To make the approximation more accurate, we use the continuity correction $Z = \frac{X \pm 0.5 - np}{\sqrt{npq}}$. Why? See my chicken scratch on board. $\Pr(X > 15) \approx \Pr(Z > \frac{15.5 - 17.5}{\sqrt{5.25}}) = \Pr(Z > -0.87) = 0.8078$. Suppose we want to know the probability that the antibiotic is effective in fewer than 12 people, i.e., $\Pr(X < 12)$, not $\Pr(X \le 12)$ (see that in the book).

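45 $\Pr(X < 12) \approx \Pr(Z < \frac{11.5 - 17.5}{\sqrt{5.25}}) = \Pr(Z < -2.62) = 0.0044$. The calculation with no adjustment results in $\Pr(Z < -2.40) = 0.0082$. The exact calculation result is 0.0060.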
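The effect of the 0.5 adjustment can be seen by putting the exact binomial sum next to both normal approximations. A sketch using the standard library only; the variable names are illustrative, and unrounded z values are used, so the printed numbers differ slightly from table lookups.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 25, 0.7
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 17.5 and sqrt(5.25)
z = NormalDist()                            # standard normal

def binomial_pmf(x):
    """Pr(X = x) for X ~ Binomial(25, 0.7)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Pr(X > 15): exact, plain normal approximation, continuity-corrected
exact_gt_15 = sum(binomial_pmf(x) for x in range(16, n + 1))
plain_gt_15 = 1 - z.cdf((15 - mu) / sigma)
corrected_gt_15 = 1 - z.cdf((15.5 - mu) / sigma)

# Pr(X < 12) = Pr(X <= 11): exact, plain, continuity-corrected
exact_lt_12 = sum(binomial_pmf(x) for x in range(0, 12))
plain_lt_12 = z.cdf((12 - mu) / sigma)
corrected_lt_12 = z.cdf((11.5 - mu) / sigma)

print(exact_gt_15, plain_gt_15, corrected_gt_15)
print(exact_lt_12, plain_lt_12, corrected_lt_12)
```

In both cases the corrected value lands closer to the exact binomial probability than the uncorrected one.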

Chapter 5. Sampling Distributions

Lecture notes, Lang Wu, UBC 1 Chapter 5. Sampling Distributions 5.1. Introduction In statistical inference, we attempt to estimate an unknown population characteristic, such as the population mean, µ,