II. Random Variables. Variable Types. Variables Map Outcomes to Numbers

Random variables operate in much the same way as the outcomes or events in some arbitrary sample space; the distinction is that random variables are simply outcomes that are represented numerically. As with events, we denote a random variable with a capital letter, such as X, Y, or Z. Realizations of a random variable are denoted using lower case. For instance, x may be used to represent a possible value of the variable X.

Variable Types

Discrete/Categorical (e.g., ethnic group, gender)
Continuous (e.g., age, height, weight)

Variables Map Outcomes to Numbers

[Figure: experimental outcomes $O_1, O_2, O_3, O_4$ mapped to the possible numeric values $x_1, x_2, x_3, x_4$ of a random variable.]

Example II.A

Consider an experiment where we flip a coin: the sample space is S = {H, T}. We can define a random variable X such that X = 1 if the flip turns up heads and X = 0 if the flip is tails.

Example II.B

Consider an experiment where we sample an adult and ask for his/her age in years: we might view the sample space as something like S = {18, 19, 20, ...}, up to some maximum age. Suppose we let Y = the person's age in years. Then in this case the mapping of outcomes in S to numeric values is straightforward.
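The mapping in Example II.A can be sketched in a few lines of Python. This is only an illustration: the function name `X`, the seed, and the number of flips are assumptions made here, not part of the slides.

```python
import random

# Sample space for a single coin flip (Example II.A).
SAMPLE_SPACE = ["H", "T"]

def X(outcome: str) -> int:
    """Random variable: map an experimental outcome to a number."""
    return 1 if outcome == "H" else 0

rng = random.Random(0)  # arbitrary seed, chosen only for reproducibility
flips = [rng.choice(SAMPLE_SPACE) for _ in range(10)]  # experimental outcomes
realizations = [X(outcome) for outcome in flips]       # realizations x of X

print(flips)
print(realizations)
```

The outcomes themselves are labels ("H", "T"); the random variable is the rule that turns each label into a number.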

Categorical Random Variables

Possible values of a categorical random variable have probabilities determined by a probability mass function, or pmf. Suppose that $x_1, x_2, \ldots, x_N$ are the possible outcomes of a random variable X. The pmf for X, denoted by $f(x)$ or $P(X = x)$, must have the following properties:

1. $0 \le P(X = x_i) \le 1$, for $i = 1, \ldots, N$, and
2. $\sum_{i=1}^{N} P(X = x_i) = 1$.

Example II.C

Let X be the sum of two fair six-sided dice. Then the pmf for X is given by

  x_i    P(X = x_i)    P(X <= x_i)
   2       1/36           1/36
   3       2/36           3/36
   4       3/36           6/36
   5       4/36          10/36
   6       5/36          15/36
   7       6/36          21/36
   8       5/36          26/36
   9       4/36          30/36
  10       3/36          33/36
  11       2/36          35/36
  12       1/36           1

Example II.C (cont'd)

The third column in the previous table contains the cumulative distribution function (cdf), or $P(X \le x_i)$. Note that the pmf determines the cdf, and vice versa. The pmf and cdf are plotted below:

[Figure: two panels. The first (titled "PDF") plots $P(X = x)$ against x; the second (titled "CDF") plots $P(X \le x)$ against x.]

Continuous Random Variables

As opposed to a probability mass function, a continuous random variable has a probability density function, or pdf. Suppose a continuous random variable X can take any value between l and u. We denote the pdf of X by $f(x)$, for $l < x < u$. However, note that $f(x) \ne P(X = x)$. Why must this be so? How then do we interpret the pdf for a continuous random variable?
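The pmf and cdf of Example II.C can be rebuilt by enumerating the 36 equally likely rolls. The sketch below uses only the Python standard library; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# Count how many of the 36 equally likely (die1, die2) pairs give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
values = sorted(counts)                                   # 2, 3, ..., 12
cum_counts = list(accumulate(counts[x] for x in values))  # running totals

pmf = {x: Fraction(counts[x], 36) for x in values}
cdf = {x: Fraction(c, 36) for x, c in zip(values, cum_counts)}

# Print the table in thirty-sixths, matching the layout of the example.
for x, c in zip(values, cum_counts):
    print(f"{x:2d}   {counts[x]}/36   {c}/36")

# Both pmf properties hold: each value lies in [0, 1] and they sum to 1.
assert all(0 <= p <= 1 for p in pmf.values())
assert sum(pmf.values()) == 1
```

Because the cdf is a running sum of the pmf, either one can be recovered from the other, which is the point made in the example.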

Interpreting the Probability Density Function

It doesn't make sense with a continuous variable X to talk about $P(X = x)$. However, we can use the pdf $f(x)$, for $l < x < u$, to compute the probability that we observe a value for X that is within an interval, for example $P(a \le X \le b)$.

Note: the probability $P(a \le X \le b)$ is given by the area under the curve $f(x)$ between the points a and b, where $a \le b$ and a and b are possible values of X.

Question: how does one find the area under a continuous curve between two points?

Properties of the Probability Density Function

Given a pdf $f(x)$, we can compute the probability over an interval by

$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$

This probability has two interpretations:

1. It represents the probability that a randomly selected individual from the underlying population will have a value of X that falls within the interval (a, b).
2. It represents the proportion of individuals in the underlying population who have values for X that fall within the interval (a, b).

Hence, a pdf $f(x)$ must have the following properties:

1. $f(x) \ge 0$, for $l \le x \le u$.
2. $\int_l^u f(x)\,dx = 1$.

In other words, a pdf $f(x)$ must be nonnegative over its domain (l, u), and the area under the curve over that domain must be equal to 1.

Example II.D

Victims of a particular type of cancer have post-diagnosis survival times that follow the pdf plotted on the following slide. If X represents the survival time (in months) of a given cancer patient, then the pdf for X is

$$f(x) = \frac{1}{15}\, e^{-x/15}, \quad x \ge 0.$$

Verify that this is a legitimate pdf.

On the plot, diagram the area that represents the probability that a given cancer patient survives no longer than […]0 months.

Diagram the area that represents the proportion who survive between […] and 8 months.

Diagram the area representing the proportion who survive longer than 8 months.
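One way to explore the "area under the curve" idea numerically is a simple midpoint rule. The sketch below takes the pdf of Example II.D to be $f(x) = \frac{1}{15} e^{-x/15}$ (a mean of 15 months, consistent with the later examples). The interval endpoints 6 and 12, the cutoff of 600 months standing in for infinity, and the helper name `integrate` are assumptions made for illustration only.

```python
import math

THETA = 15.0  # scale of the assumed pdf f(x) = (1/15) * exp(-x/15), in months

def pdf(x: float) -> float:
    return math.exp(-x / THETA) / THETA

def integrate(f, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Property 2: total area should be 1 (600 months is a stand-in for infinity).
total_area = integrate(pdf, 0.0, 600.0)

# P(a <= X <= b) for illustrative endpoints a = 6, b = 12 (not from the slides).
a, b = 6.0, 12.0
p_numeric = integrate(pdf, a, b)
p_exact = math.exp(-a / THETA) - math.exp(-b / THETA)  # from the antiderivative

print(f"total area    ~ {total_area:.6f}")
print(f"P({a} <= X <= {b}) ~ {p_numeric:.6f} (closed form {p_exact:.6f})")
```

The closed form comes from the antiderivative $-e^{-x/15}$, so the numerical and exact answers should agree to several decimal places.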

Example II.D, cont'd

[Figure: probability density function for survival time of cancer patients; $f(x)$ plotted against survival time after diagnosis, x (months).]

Compute the probabilities that you diagrammed on the plot of the pdf.

Find and plot the cumulative distribution function. Use the cdf to compute the probabilities that you diagrammed on the plot of the pdf.

What is the median survival time? What is the 75th percentile of survival?

The Mean or Expectation of a Random Variable

Based on Example II.C, suppose that I propose the following game: you pay me $7.50, then roll the dice. I pay you the sum of the dice in dollars. From a purely economic standpoint, should you play?

In order to answer that question, you need to know something about the mean of X, or the average sum of the dice.

Definition of Expectation

By definition, the average or expected value of a categorical random variable X with pmf at a value $x_i$ given by $P(X = x_i)$, for $i = 1, \ldots, N$, is computed as

$$E(X) = \sum_{i=1}^{N} x_i\, P(X = x_i).$$

The expected value of a continuous random variable X with pdf given by $f(x)$, for $l \le x \le u$, is computed as

$$E(X) = \int_l^u x f(x)\,dx.$$

In either case, we can view the expectation as a weighted average.
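For the survival distribution, the cdf has a closed form that can be inverted to read off percentiles. The sketch below again assumes the pdf $f(x) = \frac{1}{15} e^{-x/15}$ for Example II.D, which gives $F(x) = 1 - e^{-x/15}$; the function names are illustrative.

```python
import math

THETA = 15.0  # scale of the assumed pdf f(x) = (1/15) * exp(-x/15), in months

def cdf(x: float) -> float:
    """F(x) = P(X <= x), obtained by integrating the pdf from 0 to x."""
    return 1.0 - math.exp(-x / THETA) if x >= 0 else 0.0

def quantile(p: float) -> float:
    """Solve F(x) = p for x, valid for 0 <= p < 1."""
    return -THETA * math.log(1.0 - p)

median = quantile(0.50)  # 15 * ln(2)
q75 = quantile(0.75)     # 15 * ln(4)

print(f"median survival  ~ {median:.2f} months")
print(f"75th percentile  ~ {q75:.2f} months")
```

Interval probabilities follow from differences of the cdf, for example `cdf(b) - cdf(a)` for $P(a \le X \le b)$.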

Example II.E

The expected value of the distribution in Example II.C is given by

$$E(X) = 2(1/36) + 3(2/36) + \cdots + 12(1/36) = 7.$$

Playing the game described on the previous slide probably wouldn't be your smartest move. That is, what would be your average net gain (or loss)?

In general, how do we interpret expectation? How would you describe expectation to the average person? (Hint: recall our discussion of long-term relative frequency.)

Example II.F

What is the average survival time for the cancer patients in Example II.D? The expectation in this case is computed by

$$E(X) = \int_0^\infty x f(x)\,dx = \int_0^\infty \frac{x}{15}\, e^{-x/15}\,dx = \left\{ -x e^{-x/15} - 15 e^{-x/15} \right\}_0^\infty$$

$$= \lim_{x \to \infty} \left\{ -x e^{-x/15} - 15 e^{-x/15} \right\} - (0 - 15) = 0 - 0 + 15 = 15.$$

Hence, the average life expectancy after diagnosis is 15 months. How do the mean and median compare? Why are they different?

The Variance of a Random Variable

The mean is used to describe the center of a distribution. The variance is used to describe the spread of a distribution about its mean. As shown in the plot below, two distributions with the same mean can have very different degrees of variability about the mean:

[Figure: two distributions with the same mean and different spread.]

Interpreting the Variance

The variance is actually the average squared deviation of the possible values of a random variable about its mean. In some sense, you can think of variance probabilistically:

(1) If you sample one observation at random from a distribution with high variance, then there is a relatively high probability that the observation will lie far away from the mean.
(2) If you sample an observation from a distribution with low variance, then there is a relatively small probability that the observation will lie far from the mean.
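The weighted-average view of expectation, and its long-run-frequency interpretation, can be illustrated with the dice game. This is a sketch: the $7.50 entry fee is read from the game description, while the seed and the number of simulated games are arbitrary choices made here.

```python
import random
from fractions import Fraction
from itertools import product

# Exact expectation: average the sum over the 36 equally likely rolls.
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]
expected_sum = Fraction(sum(outcomes), len(outcomes))

# Average net gain per game when paying a 7.50 entry fee.
entry_fee = Fraction(15, 2)
expected_net = expected_sum - entry_fee

# Long-run relative frequency: the average payout over many simulated games.
rng = random.Random(1)  # arbitrary seed
n_games = 100_000
simulated_avg = sum(
    rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_games)
) / n_games

print("E(X) =", expected_sum)
print("expected net gain per game =", float(expected_net))
print("simulated average payout  ~", round(simulated_avg, 3))
```

The simulated average settles near the exact expectation as the number of games grows, which is one way to describe expectation to a non-specialist.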

Definition of the Variance

By definition, the variance of a categorical random variable X with pmf at a value $x_i$ given by $P(X = x_i)$, for $i = 1, \ldots, N$, is computed as

$$\mathrm{Var}(X) = \sum_{i=1}^{N} [x_i - E(X)]^2\, P(X = x_i) = \sum_{i=1}^{N} x_i^2\, P(X = x_i) - [E(X)]^2.$$

The variance of a continuous random variable X with pdf given by $f(x)$, for $l \le x \le u$, is computed as

$$\mathrm{Var}(X) = \int_l^u \{x - E(X)\}^2 f(x)\,dx = \int_l^u x^2 f(x)\,dx - \{E(X)\}^2.$$

Note that the last part of each equation above results from the fact that $\mathrm{Var}(X) = E(X^2) - \{E(X)\}^2$. This is typically much easier to compute by hand.

Example II.G

For the distribution in Example II.C,

$$\mathrm{Var}(X) = 2^2(1/36) + 3^2(2/36) + \cdots + 12^2(1/36) - 7^2 = 54.83 - 49 = 5.83.$$

Hence, the standard deviation is approximately 2.4.

Example II.H

For the distribution of survival times in Example II.D,

$$E(X^2) = \int_0^\infty x^2 f(x)\,dx = \int_0^\infty \frac{x^2}{15}\, e^{-x/15}\,dx = \left\{ -x^2 e^{-x/15} - 2(15)\,x e^{-x/15} - 2(225)\, e^{-x/15} \right\}_0^\infty$$

$$= \lim_{x \to \infty} \left\{ -x^2 e^{-x/15} - 2(15)\,x e^{-x/15} - 2(225)\, e^{-x/15} \right\} - \{0 - 0 - 2(225)\} = 0 - 0 - 0 + 2(225) = 2(225).$$

The variance then is $\mathrm{Var}(X) = E(X^2) - \{E(X)\}^2 = 2(225) - 225 = 225$ (months$^2$). Hence, the standard deviation is 15 months.

Some Additional Properties of the Mean and Variance

In practical research settings, we are almost always interested in the distribution of sums of random variables. In particular, as we shall see within a few weeks, we need to know something about the mean and variance of the sample mean

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i,$$

where the values $X_1, \ldots, X_n$ represent a sample of observations from some population (e.g., we sample […]0 USU students and measure their individual heights).

Question: if each subject in the sample has a mean $E(X_i) = \mu$ and a variance $\mathrm{Var}(X_i) = \sigma^2$, then what are the mean and variance of $\bar{X}$?
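The two routes to the variance (the definition and the shortcut $E(X^2) - \{E(X)\}^2$) can be checked against each other for the dice distribution of Example II.C. The sketch below uses exact fractions; the variable names are illustrative.

```python
import math
from fractions import Fraction
from itertools import product

# The 36 equally likely sums of two fair dice.
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]
n = len(outcomes)

mean = Fraction(sum(outcomes), n)                    # E(X)
mean_of_squares = Fraction(sum(x * x for x in outcomes), n)  # E(X^2)

# Definition: average squared deviation about the mean.
var_definition = sum((Fraction(x) - mean) ** 2 for x in outcomes) / n

# Shortcut: E(X^2) - {E(X)}^2.
var_shortcut = mean_of_squares - mean ** 2

sd = math.sqrt(float(var_definition))

print("E(X)   =", mean)
print("E(X^2) =", float(mean_of_squares))
print("Var(X) =", var_definition, "=", float(var_definition))
print("SD(X)  ~", round(sd, 3))
```

Exact arithmetic makes it clear that the two formulas give the same fraction, not merely the same rounded decimal.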

Mean and Variance of a Linear Function

Let X be a random variable such that $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$. Suppose we're interested in a random variable Y defined as $Y = a + bX$, where a and b are constants. In other words, Y is a linear function of X. Then

$$E(Y) = E(a + bX) = E(a) + bE(X) = a + b\mu,$$

and

$$\mathrm{Var}(Y) = \mathrm{Var}(a + bX) = \mathrm{Var}(a) + b^2\,\mathrm{Var}(X) = 0 + b^2\sigma^2 = b^2\sigma^2.$$

Example II.I

The average high temperature during the month of September in Logan, UT, is 67.6 °F, with a standard deviation of 4.[…] °F. That is, if X is a variable that represents the high temperature in Logan on a randomly selected September day, then $E(X) = 67.6$ and $\mathrm{Var}(X) = (4.[…])^2$.

What is the average daily temperature in degrees Celsius?

What is the standard deviation of daily temperature in degrees Celsius?

Mean and Variance of a Linear Combination of Independent Random Variables

Let $X_1$ be a random variable such that $E(X_1) = \mu_1$ and $\mathrm{Var}(X_1) = \sigma_1^2$, and $X_2$ be a random variable such that $E(X_2) = \mu_2$ and $\mathrm{Var}(X_2) = \sigma_2^2$, where $X_1$ and $X_2$ are independent. Suppose we're interested in a random variable Y defined as $Y = c_1 X_1 + c_2 X_2$, where $c_1$ and $c_2$ are constants. In other words, Y is a linear combination of $X_1$ and $X_2$. Then

$$E(Y) = E(c_1 X_1 + c_2 X_2) = c_1 E(X_1) + c_2 E(X_2) = c_1\mu_1 + c_2\mu_2,$$

and

$$\mathrm{Var}(Y) = \mathrm{Var}(c_1 X_1 + c_2 X_2) = c_1^2\,\mathrm{Var}(X_1) + c_2^2\,\mathrm{Var}(X_2) = c_1^2\sigma_1^2 + c_2^2\sigma_2^2.$$

Note that the expectation of Y given above holds regardless of the dependence of $X_1$ and $X_2$. However, $X_1$ and $X_2$ must be independent for the variance formula to hold.

Example II.J

Consider the cancer patients whose survival follows the distribution in Example II.D. Suppose that we randomly sample […] individuals with this cancer, and measure their survival after diagnosis.

What is the expected combined survival of these individuals, in months? In years?

What is the standard deviation of their combined survival, in months? In years?
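Both sets of rules can be checked exactly on a small case. The sketch below uses fair six-sided dice as the underlying variables rather than the temperature or survival data; the constants `a`, `b`, `c1`, `c2` and the helper names are hypothetical values chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # values of one fair six-sided die, equally likely

def mean(values):
    """Mean of a list of equally likely values, as an exact fraction."""
    values = list(values)
    return Fraction(sum(values), len(values))

def variance(values):
    """Average squared deviation about the mean, as an exact fraction."""
    values = list(values)
    m = mean(values)
    return sum((Fraction(v) - m) ** 2 for v in values) / len(values)

mu, sigma2 = mean(faces), variance(faces)  # 7/2 and 35/12 for one die

# Linear function Y = a + b*X (a and b are illustrative constants).
a, b = 10, 3
y_values = [a + b * x for x in faces]
assert mean(y_values) == a + b * mu
assert variance(y_values) == b ** 2 * sigma2

# Linear combination Y = c1*X1 + c2*X2 of two independent dice:
# enumerating all 36 equally likely pairs encodes the independence.
c1, c2 = 2, -3
combo = [c1 * x1 + c2 * x2 for x1, x2 in product(faces, repeat=2)]
assert mean(combo) == c1 * mu + c2 * mu
assert variance(combo) == c1 ** 2 * sigma2 + c2 ** 2 * sigma2

print("E(Y) =", mean(combo), " Var(Y) =", variance(combo))
```

Note that the negative coefficient `c2` lowers the mean of the combination but still adds to its variance, because the coefficients enter the variance formula squared.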

The Mean and Variance of a Sample Mean

It might sound odd to talk about the mean and variance of a mean, but recall how a sample mean (which is just a simple average) is computed: we randomly sample n subjects $X_1, \ldots, X_n$ from a population with an underlying expected value of $\mu$ and variance of $\sigma^2$. That is, $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$, for $i = 1, \ldots, n$. The sample mean is defined as

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

In other words, the sample mean is just a linear combination of the independent random variables $X_1, \ldots, X_n$. Hence, the sample mean is also a random variable, and so it has a distribution of its own, with a mean and a variance.

Note the distinction between (1) a sample mean and (2) the expectation and variance of the underlying distribution (we can't emphasize this enough):

  $\bar{X}$                 Sample Mean                 Random Variable
  $\mu$ and $\sigma^2$      Expectation and Variance    Fixed Parameters (Constants)

Having sampled the n subjects $X_1, \ldots, X_n$ at random from a population with an underlying expected value of $\mu$ and variance of $\sigma^2$, we can assume that these subjects are independent. Then, using the rules given four slides previous, the expectation of the distribution of the sample mean $\bar{X}$ is $\mu$, and the variance of $\bar{X}$ is $\sigma^2/n$. Can you demonstrate why?

Example II.K

Consider again the cancer patients whose survival follows the distribution in Example II.D. Suppose that we randomly select […]0 individuals with this cancer, and compute the average survival time $\bar{X}$ for the sample.

What is the expected value of $\bar{X}$?

What is the variance of $\bar{X}$?

By what factor do we need to increase our sample size to divide the standard deviation of $\bar{X}$ in half?
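That the sample mean is itself a random variable can be seen by drawing many samples and looking at how their means vary. The simulation below is a rough sketch: it assumes the survival pdf of Example II.D is exponential with mean 15 months (so $\sigma^2 = 225$), and the sample size `N = 10`, the number of replications, and the seed are hypothetical choices, not values taken from the slides.

```python
import random
import statistics

THETA = 15.0   # assumed population mean in months; population variance is THETA**2
N = 10         # hypothetical sample size
REPS = 20_000  # number of repeated samples

rng = random.Random(2024)  # arbitrary seed

def sample_mean(n: int) -> float:
    """Draw n independent survival times and return their average."""
    return statistics.fmean(rng.expovariate(1.0 / THETA) for _ in range(n))

means = [sample_mean(N) for _ in range(REPS)]

mean_of_means = statistics.fmean(means)    # should be near mu
var_of_means = statistics.pvariance(means) # should be near sigma^2 / n

print(f"average of sample means  ~ {mean_of_means:.2f} (theory: {THETA})")
print(f"variance of sample means ~ {var_of_means:.2f} (theory: {THETA**2 / N})")
```

Each run of `sample_mean` gives a different value of $\bar{X}$, while $\mu$ and $\sigma^2$ stay fixed, which is the distinction the table above draws.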