Epidemiology 9509 Principle of Biostatistics
Chapter 5 Probability Distributions (continued)
John Koval
Department of Epidemiology and Biostatistics
University of Western Ontario
What was covered previously
1. probability
   P(A)
   sets: P(A and B); P(A or B)
2. probability distributions
   2.1 discrete
       2.1.1 equiprobable
       2.1.2 Bernoulli
       2.1.3 binomial
       2.1.4 Poisson
   2.2 continuous
       2.2.1 uniform
       2.2.2 normal
3. calculating probabilities
   3.1 discrete: Pr(X = x)
   3.2 continuous intervals: Pr(X < a), Pr(a < X < b)
What is being covered now
Using SAS to
1. calculate probabilities
2. calculate and plot probability distributions
Calculating probabilities
SAS function PDF

    title "calculate binomial probability";
    data binom1;
      /* Pr(X = 4) for X ~ Bin(n = 10, p = 0.4) */
      prob = pdf("binomial", 4, 0.4, 10);
      output;
    run;
    proc print data=binom1;
    run;
binomial probability

    calculate binomial probability
    Obs     prob
      1     0.25082

Does this agree with previous calculations?
0.251, Lecture Chapter 5, page 8
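The same check can be made outside SAS by evaluating the binomial formula directly. The sketch below is an illustration in Python, which is not part of the course material; the function name `binomial_pmf` is hypothetical, and its argument order is chosen only to mirror the SAS call above.

```python
from math import comb

def binomial_pmf(x, p, n):
    """Pr(X = x) for X ~ Bin(n, p), computed from C(n, x) * p^x * (1 - p)^(n - x).

    The argument order (x, p, n) is an assumption made to mirror the SAS PDF call.
    """
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Pr(X = 4) for X ~ Bin(10, 0.4)
prob = binomial_pmf(4, 0.4, 10)
print(round(prob, 5))  # 0.25082
```

Rounded to three decimals this gives 0.251, the value quoted from the earlier lecture.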
Calculating probability distribution

    title "calculate binomial probability distribution";
    data binom2;
      /* Pr(X = x), x = 0, ..., 10, for X ~ Bin(10, 0.4) */
      do x = 0 to 10 by 1;
        prob = pdf("binomial", x, 0.4, 10);
        output;
      end;
    run;
    proc print data=binom2;
    run;
    proc gplot data=binom2;
      plot prob*x;
    run;
binomial probability distribution

    calculate binomial probability distribution
    Obs     x     prob
      1     0     0.00605
      2     1     0.04031
      3     2     0.12093
      4     3     0.21499
      5     4     0.25082
      6     5     0.20066
      7     6     0.11148
      8     7     0.04247
      9     8     0.01062
     10     9     0.00157
     11    10     0.00010
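Two quick sanity checks on a table like this are that the probabilities sum to 1 and that the mean equals n times p. The following Python sketch (not part of the course material; variable names are illustrative) rebuilds the Bin(10, 0.4) distribution from the formula and performs both checks.

```python
from math import comb

n, p = 10, 0.4

# Pr(X = x) for x = 0, ..., n, from the binomial formula
probs = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

total = sum(probs)                               # should be 1 up to rounding error
mean = sum(x * pr for x, pr in enumerate(probs)) # should be n * p = 4
mode = max(range(n + 1), key=lambda x: probs[x]) # most probable value of x

for x, pr in enumerate(probs):
    print(f"{x:2d}  {pr:.5f}")
print("sum =", round(total, 10), " mean =", round(mean, 10), " mode =", mode)
```

The mode comes out at x = 4, matching the largest entry (0.25082) in the SAS listing.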
GPLOT of pdf
Calculating cumulative probabilities
values up to and including x, i.e. Pr(X <= x)
SAS function CDF

    title "calculate cumulative binomial probability";
    data binom3;
      /* Pr(X <= 7) for X ~ Bin(n = 20, p = 0.4) */
      prob = cdf("binomial", 7, 0.4, 20);
      output;
    run;
    proc print data=binom3;
    run;
binomial cumulative distribution calculation

    calculate cumulative binomial probability
    Obs     prob
      1     0.41589

Does this agree with previous calculations?
0.4159, using R, Lecture Chapter 5, page 30
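A cumulative binomial probability is just the sum of the individual probabilities up to and including the cut-off. The Python sketch below (an illustration outside the course's SAS material; `binomial_cdf` is a hypothetical name) makes that sum explicit for Bin(20, 0.4).

```python
from math import comb

def binomial_cdf(x, p, n):
    """Pr(X <= x) for X ~ Bin(n, p), as the sum of Pr(X = k) for k = 0, ..., x."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

# Pr(X <= 7) for X ~ Bin(20, 0.4)
cum_prob = binomial_cdf(7, 0.4, 20)
print(round(cum_prob, 5))  # 0.41589
```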
cumulative continuous probabilities

$$\Pr(X < b) = \Pr\left(Z_N < \frac{b - \mu}{\sigma}\right) = \Phi\left(\frac{b - \mu}{\sigma}\right)$$

$\Phi(\cdot)$ given by SAS function PROBNORM
example
Recall normal approximation to binomial
want

$$\Pr(X_{norm} < 7.5) = \Pr\left(Z_N < \frac{7.5 - 8}{2.19}\right) = \Phi(-0.228)$$

    title "calculate Normal probability";
    data norm1;
      /* Phi(-0.228): standard normal probability below -0.228 */
      prob = probnorm(-0.228);
      output;
    run;
    proc print data=norm1;
    run;
binomial cumulative distribution calculation

    calculate Normal probability
    Obs     prob
      1     0.40982

Does this agree with previous calculations?
0.4098 by linear interpolation, see lecture Chapter 5, page 30
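The standard normal CDF can be written with the error function as Φ(z) = (1 + erf(z/√2))/2, which allows a cross-check without tables. The Python sketch below is not part of the course material, and the names `phi`, `prob_rounded_z` and `prob_full_z` are illustrative. It evaluates Φ at the rounded value -0.228 used above and at the unrounded z computed from n = 20 and p = 0.4.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, p = 20, 0.4
mu = n * p                    # mean of Bin(20, 0.4): 8
sd = sqrt(n * p * (1 - p))    # standard deviation: about 2.19
z = (7.5 - mu) / sd           # continuity-corrected z: about -0.228

prob_rounded_z = phi(-0.228)  # z rounded to three decimals, as on the slide
prob_full_z = phi(z)          # z carried at full precision

print(round(prob_rounded_z, 5))  # 0.40982
print(round(prob_full_z, 5))
```

Both values sit slightly below the exact binomial probability 0.41589 reported earlier, which gives a sense of the size of the approximation error.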
Probability of interval

$$\Pr(17 < X < 22) = \Pr\left(\frac{17 - 20}{5} < Z_N < \frac{22 - 20}{5}\right) = \Pr(-0.6 < Z_N < 0.4) = \Phi(0.4) - \Phi(-0.6)$$

    title "calculate Normal probability for interval";
    data norm2;
      a = -0.6;
      b = 0.4;
      proba = probnorm(a);       /* Phi(-0.6) */
      probb = probnorm(b);       /* Phi(0.4)  */
      probint = probb - proba;   /* Pr(a < Z < b) */
      output;
    run;
    proc print data=norm2;
    run;
Normal interval probability calculation

    calculate Normal probability for interval
    Obs      a      b     proba      probb     probint
      1   -0.6    0.4    0.27425    0.65542    0.38117

Does this agree with previous calculations?
0.3809, see lecture Chapter 5, page 26
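Standardizing first and working on the original scale should give the same interval probability. The Python sketch below (outside the course's SAS material) uses `statistics.NormalDist` from the standard library, with the mean 20 and standard deviation 5 read off the standardization above; the variable names are illustrative.

```python
from statistics import NormalDist

# Original scale: X ~ N(mean 20, sd 5)
x_dist = NormalDist(mu=20, sigma=5)
prob_interval = x_dist.cdf(22) - x_dist.cdf(17)

# Standardized scale: Z ~ N(0, 1), limits -0.6 and 0.4
z_dist = NormalDist()
prob_standardized = z_dist.cdf(0.4) - z_dist.cdf(-0.6)

print(round(prob_interval, 5))      # 0.38117
print(round(prob_standardized, 5))  # 0.38117
```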
Plotting normal density function
not usually done in practice

    data norm3;
      /* density of a normal with mean 4 and standard deviation 1.55 */
      do x = 0 to 10 by 0.05;
        density = pdf("normal", x, 4, 1.55);
        output;
      end;
    run;
    proc gplot data=norm3;
      symbol interpol=join;
      plot density*x;
    run;
GPLOT of pdf of Normal N(4,2.4) (mean 4, variance 2.4, standard deviation 1.55)
normal approximation to binomial

    title "Normal approximation to binomial";
    data normbinom;
      n = 20;
      pi = 0.4;
      mu = n*pi;
      var = n*pi*(1-pi);
      sd = sqrt(var);
      do i = 0 to 20.975 by 0.025;
        /* step function: binomial probability of the integer part of i */
        binompdf = pdf("binomial", floor(i), pi, n);
        /* shift by 0.5 so the step for count k spans k-0.5 to k+0.5 */
        x = i - 0.5;
        normpdf = pdf("normal", x, mu, sd);
        output normbinom;
      end;
    run;
normal approximation to binomial (continued)

    proc gplot data=normbinom;
      symbol interpol=join;
      plot binompdf*x normpdf*x /
           haxis=-1 to 21 by 1
           vaxis=0 to 0.2 by 0.05
           overlay;
    run;
GPLOT of normal approximation to Bin(20,0.4)
another normal approximation to binomial
non-symmetric distribution Bin(10,.2)

    data normbinom2;
      n = 10;
      pi = 0.2;
      mu = n*pi;
      var = n*pi*(1-pi);
      sd = sqrt(var);
      /* upper limit n + 0.975, as in the Bin(20,0.4) example */
      do i = 0 to 10.975 by 0.025;
        binompdf = pdf("binomial", floor(i), pi, n);
        x = i - 0.5;
        normpdf = pdf("normal", x, mu, sd);
        output normbinom2;
      end;
    run;
non-symmetric distribution (continued)

    proc gplot data=normbinom2;
      symbol interpol=join;
      plot binompdf*x normpdf*x /
           haxis=-1 to 11 by 1
           vaxis=0 to 0.5 by 0.05
           overlay;
    run;
Normal approximation to Bin(10,0.2)
original distribution is asymmetric
not a good fit to the normal
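The visual impression from the two plots can be put into rough numbers. The Python sketch below (not part of the course material) compares each binomial probability with the matching normal density at the same integer and reports the largest absolute gap. This gap is only a crude illustrative measure chosen here, not a standard goodness-of-fit statistic, and the function names are hypothetical.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(x, p, n):
    """Pr(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    return exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * sqrt(2.0 * pi))

def max_gap(n, p):
    """Largest |binomial probability - normal density| over x = 0, ..., n.

    The normal uses the matching mean n*p and variance n*p*(1-p).
    """
    mu = n * p
    sd = sqrt(n * p * (1 - p))
    return max(abs(binomial_pmf(k, p, n) - normal_pdf(k, mu, sd))
               for k in range(n + 1))

gap_20_04 = max_gap(20, 0.4)  # the nearly symmetric example
gap_10_02 = max_gap(10, 0.2)  # the asymmetric example

print("Bin(20,0.4): largest gap =", round(gap_20_04, 4))
print("Bin(10,0.2): largest gap =", round(gap_10_02, 4))
```

The gap for Bin(10,0.2) is several times larger than for Bin(20,0.4), in line with the poorer fit seen in the plot.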