Chapter 4: Probability and Probability Distributions (Sections 4.6-4.10)
Sec 4.6 - Variables
Variable: takes on different values (or attributes)
Random variable: a variable whose value cannot be predicted with certainty
Random variables:
- Qualitative: e.g. political affiliation, color preference, gender
- Quantitative: measurable, numeric outcomes
  - Discrete: e.g. # heads tossed, enrollment
  - Continuous: e.g. age at marriage, income tax return amounts, height
Recall: We want to know the probability of observing a particular sample.
4.7 Probability Distributions for Discrete RVs
Discrete random variable: a quantitative random variable that can only assume a countable number of values.
What is the probability associated with each value of the variable, y?
Probability distribution of y: the theoretical relative frequencies obtained from the probabilities for each value of y.
The probability distribution for a discrete r.v., y, displays the probability P(y) associated with each value of y.
Probability Distributions: Discrete RVs
Example. Consider the tossing of 2 coins, and define the variable, y, to be the number of heads observed.
Possible values of y: 0, 1, 2.
Suppose that empirical sampling yields the following:
y    freq
0    129
1    242
2    129
Empirical probability distribution of y:
y    freq    rel. freq
0    129     0.258
1    242     0.484
2    129     0.258
Theoretical probability distribution of y:
y    P(y)
0    0.25
1    0.5
2    0.25
[Figure: theoretical and empirical probability distributions]
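The comparison between empirical and theoretical distributions in the two-coin example can be sketched in a few lines of Python. This is only an illustration: the function name `empirical_distribution`, the seed, and the choice of 500 simulated experiments (matching the total of the frequencies 129 + 242 + 129) are assumptions made here, not part of the slides.

```python
import random
from collections import Counter

# Theoretical distribution of y = number of heads in 2 fair coin tosses.
theoretical = {0: 0.25, 1: 0.50, 2: 0.25}

# Frequencies from the example (129, 242, 129), turned into relative frequencies.
observed_freq = {0: 129, 1: 242, 2: 129}
total = sum(observed_freq.values())
observed_rel = {y: f / total for y, f in observed_freq.items()}


def empirical_distribution(n_experiments, seed=0):
    """Simulate tossing 2 fair coins n_experiments times (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(
        rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_experiments)
    )
    return {y: counts.get(y, 0) / n_experiments for y in (0, 1, 2)}


simulated = empirical_distribution(500)

for y in (0, 1, 2):
    print(y, theoretical[y], round(observed_rel[y], 3), round(simulated[y], 3))
```

Each printed row shows a value of y, its theoretical probability, the relative frequency from the example, and a simulated relative frequency. The empirical columns should sit close to the theoretical one, and get closer as the number of experiments grows.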
4.9 Probability Distributions for Continuous RVs
Continuous random variable: a quantitative variable that assumes values on an interval, with uncountably many possible values.
Example. Consider the random variable, y, that is the height (in feet) of an 18-year-old male in the US. The following is sample data collected from 400 individuals:
5.4959 5.507 5.559 5.5698 5.5446 5.4464 5.884 5.837 5.4901 5.4569 5.18 5.6931 4.93 5.9798 5.0576 6.478 6.1558 6.6181 6.0048 6.1135 5.1775 6.184 6.378 6.0983 6.0165 6.1591 5.4195 5.5411 5.7411 5.6197 5.341 5.8045 5.665 6.033 5.8679 5.9166 6.0485 5.1919 5.8154 5.0156 5.55 5.781 5.355 5.6197 5.341 6.1074 5.6618 5.8685 5.848 5.4685 5.758 5.683 5.7863 5.4616 5.718 5.854 5.8888 5.6631 6.4617 5.8419 5.5149 5.76 5.4401 6.809 5.834 6.0809 4.9667 5.941 6.718 5.5195 5.5634 5.1731 6.311 5.7405 5.7851 5.514 6.07 5.0959 5.5863 5.55 5.8677 5.3949 5.8159 5.3006 5.7134 5.6737 6.084 5.656 6.316 6.0855 6.1686 5.436 5.4665 6.5448 5.9669 5.7581 5.806 6.0079 5.3411 5.9654 6.0338 6.063 5.0646 6.3141 6.059 5.6471 5.764 6.345 5.3717 5.19 5.9169 5.944 5.4851 5.47 5.6306 5.716 5.7367 5.748 6.66 5.1307 5.7611 5.196 5.847 5.718 5.9569 5.4853 5.0979 5.8701 5.687 5.6347 5.158 5.8158 5.1913 5.8076 4.9118 5.847 5.6585 5.4951 5.814 5.6896 6.0666 5.5501 5.5753 6.0568 5.084 5.9461 6.066 5.177 4.9793 5.618 5.4857 6.163 5.6608 6.1057 5.619 5.551 5.7406 5.758 5.4758 5.438 5.445 6.0701 5.469 5.855 5.5485 6.0436 5.806 6.656 6.0661 5.743 5.8049 6.104 5.651 5.635 5.7107 5.130 5.95 6.1118 5.903 5.3639 6.0563 5.581 5.443 6.666 5.661 5.6967 5.847 5.4449 5.5194 5.6584 6.1407 5.941 6.1833 4.8951 5.785 5.5433 5.857 5.9 6.0596 5.954 6.0389 5.849 5.531 6.1674 5.8486 5.88 5.6159 5.665 6.085 5.445 5.764 4.9846 5.148 6.4544 5.8351 6.3308 6.109 5.6398 5.6678 5.5356 5.8694 5.6393 5.5884 6.0101 6.01 6.048 5.7914 5.877 6.1343 5.7689 5.7496 5.9386 5.5588 5.88 6.054 6.193 5.4785 5.8039 5.7008 6.4147 5.8676 6.0046 5.740 5.7745 5.8013 6.1333 4.8571 4.9746 5.9478 5.7179 5.79 6.17 5.8119 5.799 5.7891 5.6666
6.1177 5.9385 5.5016 5.9354 5.657 6.1379 6.3875 5.785 6.071 5.8701 5.7518 5.597 5.975 5.8168 6.018 5.7141 5.7858 5.734 5.1043 5.7719 6.1106 5.4786 5.7649 5.8087 5.5939 4.88 6.117 5.1014 5.087 5.496 5.986 6.0805 5.816 5.95 5.5037 6.0471 5.3983 5.817 5.8639 5.4055 5.7776 6.4469 5.5847 5.936 6.0166 5.3819 5.5075 5.6116 6.183 5.5771 6.01 5.9787 5.9914 5.7378 6.136 6.947 5.593 6.155 5.4893 5.0933 5.576 5.1963 5.989 6.3131 5.5738 6.0115 6.1356 5.8364 6.63 6.1083 6.147 5.613 5.9585 5.561 5.931 6.116 6.0367 5.0873 6.0336 5.97 6.0865 5.113 5.6348 5.9155 5.8398 5.831 5.765 5.9536 5.8978 5.9475 6.014 5.8874 6.0786 5.7364 5.7579 5.813 6.0458 5.8416 5.8506 5.436 5.6194 6.434 5.794 4.8988 5.6871 5.87 5.968 6.3543 6.086 5.4783 6.0511 5.0799 5.888 5.4756 5.764 5.457 6.1518 5.734 5.8335 5.863 5.691 5.3864 5.5351 6.3403
Probability Distribution for Continuous RV
Example (ctd). The variable values have to be binned, which gives a relative frequency histogram.
The interval lengths and number of bins can be refined.
[Figures: relative frequency histograms of the sample, one with 18 bins and one with 40 bins]
With more data and finer binning, the histogram outline will approach a smooth curve.
[Figure: histogram of 1000 data points] A smooth curve outline appears to be emerging.
The smooth curve is the probability distribution associated with the variable y, the height of an 18-year-old male in the US.
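The binning step can be sketched as follows. Because the original sample is not reproduced here, the sketch draws 1000 synthetic heights from a normal distribution with mean 5.75 and standard deviation 0.4; those parameters, the seed, and the helper name `relative_frequency_histogram` are assumptions made purely for illustration.

```python
import random


def relative_frequency_histogram(values, n_bins):
    """Split [min, max] into n_bins equal intervals and return (edges, rel_freq)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    rel_freq = [c / len(values) for c in counts]
    return edges, rel_freq


# Synthetic stand-in for the height sample (assumed parameters, not the slide data).
rng = random.Random(1)
heights = [rng.gauss(5.75, 0.4) for _ in range(1000)]

for n_bins in (18, 40):
    edges, rel_freq = relative_frequency_histogram(heights, n_bins)
    print(n_bins, "bins: tallest bar =", round(max(rel_freq), 3),
          "| sum of bar heights =", round(sum(rel_freq), 3))
```

Whatever the number of bins, the relative frequencies add up to 1. Finer bins give lower individual bars but a more detailed outline.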
Discrete and Continuous Probability Distributions
Probability distributions provide a means of quantifying the probability of obtaining a certain sample outcome.
Note: Probabilities are equal to the fraction of the total histogram area corresponding to the values of interest.
Discrete case:
1. Probability of observing two heads when a coin is tossed two times is 0.25.
2. Probability of observing at least one head is 0.5 + 0.25 = 0.75.
3. Probability of observing either no heads or two heads is 0.25 + 0.25 = 0.5.
Discrete and Continuous Probability Distributions
Continuous case:
1. Does it make sense to ask what is the probability that an 18 y.o. male is exactly 5'10"? NO. For a continuous variable, the probability of any single exact value is 0.
2. Note: The distribution plot was created using relative frequencies, so the total area under the plot is 1.
3. We compute the probability of a value falling in a certain range of values by computing the area that lies under the distribution plot over that range.
The probability that an 18 y.o. male has a height that lies between 5.7 and 5.8 feet is approx. 0.1.
Half-way Summary
So far:
1. How to create probability distributions from empirical/theoretical discrete and continuous random variables.
2. How to determine probabilities of a variable attaining a certain value (discrete) or attaining a value that lies within a certain range (continuous).
3. Why is this useful? (Q: what is the probability of obtaining a particular sample?)
4. Some common known distributions: binomial (discrete), normal (continuous), t-distribution (continuous), chi-squared (continuous).
5. Can make assumptions about the type of distribution associated with particular populations of interest: one of the known distributions.
6. Can determine features of the underlying distributions by simulation or other empirical observations.
The Binomial Distribution - Discrete
Binomial distribution properties:
1. The experiment has n identical trials.
2. Each trial is either a success or a failure (2 possible outcomes).
3. P(success) = π for every trial, fixed.
4. Trials are independent: the outcome of one trial does not affect the outcome of any other(s).
5. The variable, y = # of successes in the n trials.
Examples.
1. y = # heads when a coin is tossed n times (success = heads).
2. y = # light bulbs that fail inspection when n selected from a batch are tested (success = failed inspection).
3. y = # of people who test positive for a bacterial infection out of n who have been exposed to the bacteria (success = positive test result).
The Binomial Distribution (ctd)
P(y) = probability of obtaining y successes in n trials of a binomial experiment
Example (Computing P(y)). Suppose there is a 25% chance that a pregnancy test fails. What is the probability that out of a sample of 5 tests, all 5 fail? I.e., what is P(5)?
P(5) = P(the 1st test fails and the 2nd test fails and the 3rd test fails and ... and the 5th test fails)
P(5) = (0.25) * (0.25) * ... * (0.25) = (0.25)^5 = 0.000977
Now, what is P(2)?
The Binomial Distribution (ctd)
What is P(2)?
P(2) = P(1st fails and 2nd fails and rest don't OR 1st fails and 3rd fails and rest don't OR ...)
P(2) = (0.25)(0.25)(0.75)(0.75)(0.75) + (0.25)(0.75)(0.25)(0.75)(0.75) + ... + (0.75)(0.75)(0.75)(0.25)(0.25)
$P(2) = \binom{5}{2}(0.25)^2(0.75)^3 = \frac{5!}{3!\,2!}(0.25)^2(0.75)^3 = 0.2637$
P(2) = (# ways to select 2 failing tests out of 5) * (probability of 2 tests failing) * (probability of 3 tests not failing) = 5C2 * 0.25^2 * 0.75^3
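The counting argument for 5 tests, each failing with probability 0.25, can be checked by brute force: list all 2^5 fail/pass sequences, add up the probabilities of those with exactly 2 failures, and compare with the 5C2 shortcut. The helper name `prob_exactly` is made up for this sketch.

```python
from itertools import product
from math import comb

p_fail = 0.25   # probability that a single test fails
n_tests = 5


def prob_exactly(k):
    """Sum the probabilities of every fail/pass sequence with exactly k failures."""
    total = 0.0
    for outcome in product([True, False], repeat=n_tests):
        if sum(outcome) == k:
            prob = 1.0
            for failed in outcome:
                prob *= p_fail if failed else (1 - p_fail)
            total += prob
    return total


p2_enumerated = prob_exactly(2)
p2_formula = comb(5, 2) * p_fail**2 * (1 - p_fail) ** 3
p5 = prob_exactly(5)

print(round(p2_enumerated, 4), round(p2_formula, 4), round(p5, 6))
# prints: 0.2637 0.2637 0.000977
```

There are comb(5, 2) = 10 sequences with exactly two failures, and each has the same probability 0.25^2 * 0.75^3, which is why the sum collapses to a single product.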
The Binomial Distribution (ctd)
Probability of y successes in n trials of a binomial experiment:
$P(y) = \frac{n!}{y!\,(n-y)!}\,\pi^{y}(1-\pi)^{n-y}$
y = # successes in n trials
n = # trials
π = probability of success on a single trial
Mean and Standard Deviation of the Binomial Distribution:
Mean: $\mu = n\pi$
Standard deviation: $\sigma = \sqrt{n\pi(1-\pi)}$
The Binomial Distribution (ctd)
Example. What is the probability that 6 out of 20 tests fail, if the probability that any one test fails is 25%?
Success = test fails. So, π = 0.25, n = 20, y = 6.
$P(6) = \frac{20!}{6!\,14!}(0.25)^{6}(0.75)^{14} = \frac{20 \cdot 19 \cdot 18 \cdot 17 \cdot 16 \cdot 15}{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}(0.25)^{6}(0.75)^{14} = 0.1686$
What are the mean and standard deviation of this distribution?
$\mu = 20 \cdot 0.25 = 5$
$\sigma = \sqrt{20 \cdot 0.25\,(0.75)} = 1.94$
Note: P(y ≥ 7) = P(7) + P(8) + P(9) + ... + P(20) = 1 - P(y ≤ 6)
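A short sketch of the same calculation, for n = 20 tests with failure probability 0.25. The function name `binomial_pmf` and the variable names are illustrative choices; only `math.comb` and `math.sqrt` from the standard library are used.

```python
from math import comb, sqrt


def binomial_pmf(y, n, pi_success):
    """P(y) = n! / (y! (n - y)!) * pi^y * (1 - pi)^(n - y)."""
    return comb(n, y) * pi_success**y * (1 - pi_success) ** (n - y)


n, pi_success = 20, 0.25

p6 = binomial_pmf(6, n, pi_success)
mean = n * pi_success
sd = sqrt(n * pi_success * (1 - pi_success))

# Complement rule: P(y >= 7) = 1 - P(y <= 6).
p_at_most_6 = sum(binomial_pmf(y, n, pi_success) for y in range(0, 7))
p_at_least_7 = 1 - p_at_most_6
# Direct sum P(7) + ... + P(20), as a cross-check.
p_direct = sum(binomial_pmf(y, n, pi_success) for y in range(7, n + 1))

print(round(p6, 4), mean, round(sd, 2))
print(round(p_at_least_7, 4), round(p_direct, 4))
```

The complement form needs only 7 terms instead of 14, which is why it is the convenient route when working by hand or from a table.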
The Normal Distribution - Continuous
- Bell-shaped curve, symmetric about the mean
- Numerous continuous random variables have a normal distribution, e.g. test scores, weight, 100m sprint times
- The normal curve is defined by μ and σ
- Empirical rule holds: approx. 68% of the population lies within ±1σ of μ
P(y1 ≤ y < y2) = area under the normal curve between y = y1 and y = y2
Normal curve, f(y):
$f(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(y-\mu)^{2}}{2\sigma^{2}}}$
(Here π is the constant 3.14159..., not the binomial success probability.)
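The Normal Distribution
Computing probabilities for normally distributed populations:
$f(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(y-\mu)^{2}}{2\sigma^{2}}}$
$P(y_1 \le y < y_2) = \int_{y_1}^{y_2} f(y)\,dy = \int_{y_1}^{y_2} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(y-\mu)^{2}}{2\sigma^{2}}}\,dy$
Example: P(5.5 ≤ y < 5.7) = 0.1844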
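The area under the normal curve has no closed form, but it can be approximated numerically. The sketch below uses the trapezoid rule. The slide does not state the μ and σ behind the 0.1844 figure, so the values μ = 5.75 and σ = 0.4 (taken from the percentile example later in this chapter) are an assumption here; under that assumption the computed area comes out close to the quoted value.

```python
from math import exp, pi, sqrt


def normal_pdf(y, mu, sigma):
    """Normal curve f(y) with mean mu and standard deviation sigma."""
    return exp(-((y - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


def area_under_curve(y1, y2, mu, sigma, n_steps=10_000):
    """Trapezoid-rule approximation of the integral of f(y) from y1 to y2."""
    h = (y2 - y1) / n_steps
    total = 0.5 * (normal_pdf(y1, mu, sigma) + normal_pdf(y2, mu, sigma))
    for i in range(1, n_steps):
        total += normal_pdf(y1 + i * h, mu, sigma)
    return total * h


mu, sigma = 5.75, 0.4  # assumed parameters for the height distribution

p_range = area_under_curve(5.5, 5.7, mu, sigma)
p_one_sigma = area_under_curve(mu - sigma, mu + sigma, mu, sigma)

print(round(p_range, 3))      # close to the 0.1844 quoted in the example
print(round(p_one_sigma, 3))  # empirical rule: roughly 0.68
```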
The Normal Distribution: Standard Normal
Computing probabilities (ctd):
- Normal curves vary by variable values (x-axis) and depend on μ and σ, but are identical in shape
- Standard normal distribution: μ = 0 and σ = 1
- Tables exist for areas under this graph (Table 1, Appendix of text)
- In a standard normal distribution, the variable values are known as z-values
Values between z = 0.5 and z = 1.1 are measurements that lie between 0.5 and 1.1 standard deviations away from the mean of 0.
The Normal Distribution: Reading from the table
Table 1 contains areas under the standard normal curve that lie to the left of a particular z-value.
I.e., reading the entry corresponding to z1, we obtain P(z < z1).
So P(0.5 ≤ z < 1.1) = P(z < 1.1) - P(z < 0.5) = 0.8643 - 0.6915 = 0.1728
[Figures: standard normal curves over z-values with shaded areas P(z < 0.5), P(z < 1.1), and P(0.5 ≤ z < 1.1)]
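Where the printed table is not at hand, the left-tail areas can be computed from the error function, using the identity Φ(z) = 0.5 * (1 + erf(z / √2)). The function name `standard_normal_cdf` is an illustrative choice; `math.erf` is part of the Python standard library.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


left_of_1_1 = round(standard_normal_cdf(1.1), 4)   # table-style entry, 4 decimals
left_of_0_5 = round(standard_normal_cdf(0.5), 4)

# Subtracting the two rounded entries mimics the table calculation.
p_between = round(left_of_1_1 - left_of_0_5, 4)

print(left_of_1_1, left_of_0_5, p_between)
```

Subtracting unrounded areas can differ from the table-based answer in the fourth decimal place, because each table entry is itself rounded.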
The Normal Distribution: Z-scores
We can use Table 1 for arbitrary normal distributions, as long as μ and σ are known. This is done by standardizing the measurement values, y, to standard normal values known as z-scores:
$z = \frac{y - \mu}{\sigma}$
Example. Consider a normal distribution with μ = 25 and σ = 3.5. Compute the probability that the value of a measurement lies between 27 and 30.
$P(27 \le y \le 30) = P\left(\frac{27 - 25}{3.5} \le z \le \frac{30 - 25}{3.5}\right) = P(0.5714 \le z \le 1.4286) = P(z \le 1.4286) - P(z \le 0.5714) = 0.9236 - 0.7157 = 0.2079$
There is a 20.79% probability that y takes a value between 27 and 30.
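The standardization step can be reproduced with `statistics.NormalDist` from the Python standard library (available from Python 3.8). The sketch computes the probability two ways: once with z-scores rounded to two decimals, as a table lookup would require, and once without rounding. Variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 25, 3.5

# Standardize the endpoints: z = (y - mu) / sigma.
z1 = (27 - mu) / sigma
z2 = (30 - mu) / sigma

standard_normal = NormalDist()  # mu = 0, sigma = 1

# Table-style answer: round z to two decimals before looking up the area.
p_table_style = standard_normal.cdf(round(z2, 2)) - standard_normal.cdf(round(z1, 2))

# Same probability without rounding, computed directly on the original scale.
original = NormalDist(mu, sigma)
p_unrounded = original.cdf(30) - original.cdf(27)

print(round(z1, 4), round(z2, 4))
print(round(p_table_style, 3), round(p_unrounded, 3))
```

The two answers agree to about three decimal places. The small gap comes only from rounding the z-scores to the table's resolution.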
The Normal Distribution: Percentiles
Def: The 100p-th percentile of a distribution is the value y_p such that 100p% of the population values lie below y_p and 100(1-p)% lie above y_p.
To find percentiles of the standard normal distribution, do a reverse lookup of Table 1.
Example. Find the 33rd percentile of the standard normal distribution.
Need to find z_p such that 100p% of values lie below z_p. I.e., find z_p such that P(z ≤ z_p) = 33%.
From Table 1: z_p = -0.44
So, the 33rd percentile is -0.44.
The Normal Distribution: Percentiles
To apply this idea to general normal distributions, we do a reverse standardizing:
The 100p-th percentile is y_p such that 100p% of measurements lie below y_p. I.e., P(y ≤ y_p) = 100p%.
We can find the z-score associated with 100p%, and convert it back to y-values using:
$y_p = \mu + z_p\,\sigma$
Example. For the normal distribution with μ = 5.75 and σ = 0.4, find the 40th percentile.
From Table 1, z_p = -0.25
y_p = 5.75 + (-0.25)*0.4 = 5.65
The 40th percentile of this distribution is 5.65.
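The reverse lookup can be sketched with `NormalDist.inv_cdf` from the Python standard library, which returns the value below which a given fraction of the distribution lies. Rounding the z-score to two decimals mimics reading it from Table 1. Variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mu = 0, sigma = 1

# 33rd percentile of the standard normal distribution.
z_33 = standard_normal.inv_cdf(0.33)

# 40th percentile of the normal distribution with mu = 5.75, sigma = 0.4.
mu, sigma = 5.75, 0.4
z_40 = standard_normal.inv_cdf(0.40)
y_40_table_style = mu + round(z_40, 2) * sigma   # z rounded as in a table lookup
y_40_unrounded = mu + z_40 * sigma               # reverse standardizing, no rounding

print(round(z_33, 2), round(z_40, 2))
print(round(y_40_table_style, 2), round(y_40_unrounded, 4))
```

`NormalDist(mu, sigma).inv_cdf(0.40)` gives the same percentile in one step. The two-step version is shown because it mirrors the table-based procedure.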