Random Variables

Generally the object of an investigator's interest is not necessarily the action in the sample space but rather some function of it. Technically, a real valued function or mapping whose domain is the sample space is called a Random Variable; it is these that are usually the object of an investigator's attention. If the mapping is onto a finite (or countably infinite) set of points on the real line, the random variable is said to be discrete. Otherwise, if the mapping is onto an uncountably infinite set of points, the random variable is continuous. This distinction is a nuisance because the nature of the thing which describes the probabilistic behaviour of the random variable, called a Probability Density Function (denoted here as $f(x)$ and referred to as a p.d.f.), will differ according to whether the variable is discrete or continuous.

In the case of a discrete random variable $X$, with typical outcome $x_i$ (it shall be assumed for convenience that the $x_i$'s are ordered with $i$ from smallest to largest), the probability density function $f(x_i)$ is simply the sum of the probabilities of outcomes in the sample space which result in the random variable taking on the value $x_i$. Basically the p.d.f. for a Discrete Random Variable obeys 2 rules:

1. $f(x_i) \ge 0$
2. $\sum_{\text{all possible } x_i} f(x_i) = 1$

In the discrete case $f(x_i) = P(X = x_i)$. In the continuous case it is not possible to interpret the p.d.f. in the same way; indeed, since $X$ can take on any one of an uncountably infinite set of values, we cannot give it a subscript $i$. When the random variable $X$ is continuous with typical value $x$, the probability density function $f(x)$ is a function that, when integrated over the range $(a, b)$, will yield the probability that a sample is realized such that the resultant $x$ would fall in that range (much as the probability function for continuous sample spaces was defined above). In this case the p.d.f. will obey three basic rules:

1. $f(x) \ge 0$
2. $\int_a^b f(x)\,dx = P(a \le x \le b)$
3. $\int_{-\infty}^{\infty} f(x)\,dx = 1$

Associated with these densities are Cumulative Distribution Functions $F(x)$ which in each case yield the probability that the random variable $X$ is less than or equal to some value $x$. Algebraically these may be expressed as:

$$P(X \le x) = F(x) = \sum_{i=1}^{k} f(x_i), \quad \text{where } x_k \le x < x_{k+1}$$

$$P(X \le x) = F(x) = \int_{-\infty}^{x} f(z)\,dz$$

Note that in the case of discrete random variables $F(x)$ is defined over the whole range of the random variable, and in the case of continuous distributions $dF(x)/dx = f(x)$, i.e. the derivative of the c.d.f. of $x$ gives us the p.d.f. of $x$.

Expected Values and Variances

The Expected Value of a function $g(x)$ of a random variable is a measure of its location and is defined for discrete and continuous random variables respectively as follows:

$$E(g(X)) = \sum_{\text{all possible } x_i} g(x_i) f(x_i)$$

$$E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$

The expectations operator $E(\cdot)$ is simply another mathematical operator. Just as the operator $d/dx$ in front of a function $g(x)$ tells us to take the derivative of the function $g(x)$ with respect to $x$ according to a well specified set of rules, so $E(g(X))$ tells us to perform one of the above calculations dependent upon whether $X$ is continuous or discrete. Like the derivative and integral operators, the expectations operator is a linear operator, so that the expected value of a linear function of random variables is the same linear function of the expected values of those random variables. This property will be useful shortly.

Aside from the general applicability of the above formulae there are many special types of $g(\cdot)$ function of interest to statisticians, each generating particular details of the nature of the random variable in question: things like moment generating functions and characteristic functions that are the material of a more advanced text, and things like the Expected Value function and Variance function that are of interest to us here.

Two $g(\cdot)$ functions of special interest:

1. $g(X) = X$. Obviously this yields $E(X)$, the expected value of the random variable itself (frequently referred to as the mean and represented by the character $\mu$), which is a constant providing
a measure of where the centre of the distribution is located. The metric here is the same as that of the random variable so that, if $f(x)$ is an income distribution measured in US dollars, then its location will be in terms of a US dollar value. The usefulness of the linearity property of the expectations operator can be seen by letting $g(x) = a + bx$ where $a$ and $b$ are fixed constants. Taking expectations of this $g(x)$ yields the expected value of a linear function of $X$ which, following the respective rules of summation and integration, can be shown to be the same linear function of the expected value of $X$ as follows:

$$E(a + bX) = \sum_{\text{all possible } x_i} (a + b x_i) f(x_i) = a \sum_{\text{all possible } x_i} f(x_i) + b \sum_{\text{all possible } x_i} x_i f(x_i) = a + bE(X)$$

$$E(a + bX) = \int_{-\infty}^{\infty} (a + bx) f(x)\,dx = a \int_{-\infty}^{\infty} f(x)\,dx + b \int_{-\infty}^{\infty} x f(x)\,dx = a + bE(X)$$

What the above demonstrates is two basic rules for expectations operators regardless of whether random variables are discrete or continuous:

1. $E(a) = a$. The expected value of a constant is a constant.
2. $E(bX) = bE(X)$. The expected value of a constant times a random variable is equal to the constant times the expected value of the random variable.

2. $g(X) = (X - E(X))^i$, $i = 2, 3, \ldots$ Functions of this form yield $i$'th moments about the mean. The metric here is the $i$'th power of that of the original distribution, so that if $f(x)$ relates to incomes measured in US dollars then the $i$'th moment about the mean is measured in US dollars raised to the power $i$. Sometimes the $i$'th root of $E(g(X))$ is employed since it yields a measure of the appropriate characteristic in terms of the original units of the distribution. Furthermore, to make distributions measured under different metrics comparable, the function $g(x)$ deflated by the appropriate power of $E(X)$ (provided it is not 0) is considered, providing a metric free comparator. The second moment ($i = 2$) is of particular interest since, as the variance (frequently represented as $\sigma^2$ or $V(X)$; its square root is referred to as the standard deviation), it provides a measure of how spread out a distribution is.
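The rules for expectations and moments about the mean can be checked numerically. The following is a minimal sketch, assuming a small made-up discrete distribution; the outcomes, the probabilities, the constants `a` and `b`, and the helper names `expect` and `central_moment` are all illustrative assumptions rather than anything defined in these notes. Exact fractions are used so that the equalities hold without rounding error.

```python
from fractions import Fraction

# Hypothetical discrete distribution: outcomes x_i with probabilities f(x_i).
# These values are illustrative assumptions, not data from the notes.
xs = [1, 2, 3, 4]
fs = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]


def expect(g, xs, fs):
    """E(g(X)): sum of g(x_i) * f(x_i) over all possible x_i."""
    return sum(g(x) * f for x, f in zip(xs, fs))


def central_moment(i, xs, fs):
    """i'th moment about the mean: E((X - E(X))**i)."""
    mu = expect(lambda x: x, xs, fs)
    return expect(lambda x: (x - mu) ** i, xs, fs)


a, b = 5, 2                                   # arbitrary fixed constants
mean = expect(lambda x: x, xs, fs)            # E(X)
lhs = expect(lambda x: a + b * x, xs, fs)     # E(a + bX) computed directly
rhs = a + b * mean                            # a + bE(X) from the linearity rule
variance = central_moment(2, xs, fs)          # second moment about the mean

print("E(X) =", mean)
print("E(a + bX) =", lhs, "and a + bE(X) =", rhs)
print("V(X) =", variance)
```

The two routes to $E(a + bX)$ agree because the probabilities sum to one, which is exactly the step used in the derivation above.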
Of particular interest here is the Coefficient of Variation (CV) given by:

$$CV = \frac{\sqrt{E\left((X - E(X))^2\right)}}{E(X)} = \frac{\sigma}{\mu}$$

which may be interpreted as a metric free measure of dispersion. The third moment provides a measure of skewness, or how asymmetric a distribution is, and the fourth a measure of kurtosis, or how peaked a distribution is, but these will not be considered here.

Returning to the variance, regardless of whether the variable is discrete or continuous, notice that:

$$V(X) = E\left((X - E(X))^2\right) = E\left(X^2 - 2E(X)X + (E(X))^2\right) = E(X^2) - 2E(X)E(X) + (E(X))^2 = E(X^2) - (E(X))^2$$

So that the variance is equal to the expected value of $X^2$ less the square of the expected value of $X$ (these are not the same thing). Furthermore, again regardless of whether the random variable is discrete or continuous, notice that for $Y = a + bX$:

$$V(Y) = E\left((Y - E(Y))^2\right) = E\left((a + bX - (a + bE(X)))^2\right) = E\left((bX - bE(X))^2\right) = b^2 E\left((X - E(X))^2\right) = b^2 V(X)$$

So that the variance of a constant is zero and the variance of a constant times a random variable is the square of the constant times the variance of the random variable. The variance of a linear function of several random variables is a little more complicated, depending as it does on the relationships between the random variables; it will be dealt with when multivariate analysis is considered later on.

There are numerous discrete and continuous probability density functions to suit all kinds of purposes; one of the arts in practicing statistics is that of choosing the one most appropriate for a particular problem. Here we shall consider two examples of each to get an idea of what they are like.

Examples of Discrete Probability Density Functions

The Binomial Distribution

The Binomial Distribution is founded upon a process in which the same experiment is independently repeated $n$ times under identical conditions. The Sample Space for the experiment contains two possible events $A$ and $A^c$, and the issue at hand is how many times in $n$ repetitions $A$ occurs. Suppose that $P(A) = p$ (and consequently $P(A^c) = 1 - p$) and let $X_i = 1$ when $A$ happens in the $i$'th repetition of the experiment and $X_i = 0$ otherwise, so that $X_i$ is a random variable such that $P(X_i = 1) = p$ and $P(X_i = 0) = 1 - p$ for $i = 1, 2, \ldots, n$. Letting $x_i$ be the outcome of the $i$'th experiment, $f(x_i)$, the p.d.f. for the $i$'th experiment, is given by:

$$f(x_i) = p^{x_i} (1 - p)^{1 - x_i}$$
So that when $X_i$ returns a 1 the p.d.f. is $p$ and when it returns a zero the p.d.f. is $1 - p$. Using the notions of Expected Values and Variances it can be shown that $E(X_i) = p$ and $V(X_i) = p(1 - p)$ as follows:

$$E(X_i) = \sum_{\text{all possible } x_i} x_i f(x_i) = 0 \cdot p^0 (1 - p)^1 + 1 \cdot p^1 (1 - p)^0 = p$$

$$V(X_i) = \sum_{\text{all possible } x_i} (x_i - E(X_i))^2 f(x_i) = (0 - p)^2 p^0 (1 - p)^1 + (1 - p)^2 p^1 (1 - p)^0$$

$$= p^2 (1 - p) + (1 - p)^2 p = p^2 - p^3 + p - 2p^2 + p^3 = p - p^2 = p(1 - p)$$

Since the repetitions of the experiment are independent, the joint probability of the $n$ repetitions is:

$$f(x_1) f(x_2) \cdots f(x_n) = p^{x_1} (1 - p)^{1 - x_1} \cdots p^{x_n} (1 - p)^{1 - x_n} = p^{\sum_{i=1}^{n} x_i} (1 - p)^{n - \sum_{i=1}^{n} x_i}$$

If $A$ occurs $k$ times the sum of the $x_i$'s will equal $k$, and this formula may be written as $p^k (1 - p)^{n - k}$; it corresponds to the probability of getting a particular sequence of experiments where $A$ occurred $k$ times. The number of ways that $A$ could happen $k$ times in $n$ experiments is $n!/(k!(n - k)!)$, so that the probability of $k$ occurrences in $n$ experiments is given by:

$$\frac{n!}{k!(n - k)!} p^k (1 - p)^{n - k}, \quad k = 0, 1, 2, \ldots, n$$

Which is the Binomial Distribution.

The Poisson Distribution

The Poisson Distribution is employed in situations where the object of interest is the number of times a given event occurs in a given amount of space or period of time. Thus, for example, it could be used to study the number of crashes that take place at a particular spot over a period of a week, or it could be used to investigate the number of faults in a fixed length of steel. The presumption in this model is that successive weeks, or successive lengths of steel, are independent of one another and that the same probability model is applicable in each successive observation. This is much the same i.i.d. assumption we made in the case of the Binomial Distribution. Letting $x$ be the number of occurrences of the event, the p.d.f. is given by:

$$f(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad \lambda > 0, \quad x = 0, 1, 2, \ldots$$

The unknown parameter $\lambda$ is such that $E(x) = \lambda$ and $V(x) = \lambda$.
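Both discrete densities are simple to evaluate directly. The sketch below is illustrative only: the function names `binomial_pmf` and `poisson_pmf` and the parameter values `n = 10`, `p = 0.3` and `lam = 2.5` are assumptions chosen for the example. It checks that the binomial probabilities sum to one, that the mean number of occurrences is $np$ (which follows from $E(X_i) = p$ and the linearity of expectations), and that the Poisson mean and variance both come out at $\lambda$ once the infinite sum is truncated far into the tail.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of k occurrences in n trials: n!/(k!(n-k)!) * p**k * (1-p)**(n-k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(x, lam):
    """Probability of x occurrences: exp(-lam) * lam**x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)


# Binomial check with assumed illustrative parameters.
n, p = 10, 0.3
binom = [binomial_pmf(k, n, p) for k in range(n + 1)]
binom_total = sum(binom)                               # should be 1
binom_mean = sum(k * f for k, f in enumerate(binom))   # should be n * p

# Poisson check with an assumed rate; the support is infinite, so the sums
# are truncated at 100 terms, where the remaining tail mass is negligible.
lam = 2.5
pois = [poisson_pmf(x, lam) for x in range(100)]
pois_mean = sum(x * f for x, f in enumerate(pois))
pois_var = sum((x - pois_mean) ** 2 * f for x, f in enumerate(pois))

print("binomial total:", binom_total, "mean:", binom_mean)
print("poisson mean:", pois_mean, "variance:", pois_var)
```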
Examples of Continuous Probability Density Functions

The Uniform or Rectangular Distribution

This is perhaps the simplest distribution available. It describes the behaviour of a continuous random variable $X$ that exists in the interval $[a, b]$ whose probability of being in an interval $[c, d]$ lying within $[a, b]$ is given by $(d - c)/(b - a)$. That is to say, the probability of it lying in any interval in its range is equal to the proportionate size of the interval within the range. Furthermore, $X$ will have the same probability of landing in any one of a collection of equal sized intervals in the range (hence the name uniform). This distribution with $a = 0$ and $b = 1$ is frequently used as the basis for random number generators in software packages and computer games, largely because it is relatively easy to generate other more complex random variables from it. Formally:

$$f(x) = \begin{cases} \dfrac{1}{b - a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

$$F(x) = \begin{cases} 0 & \text{for } x < a \\ \dfrac{x - a}{b - a} & \text{for } a \le x \le b \\ 1 & \text{for } x > b \end{cases}$$

The Normal Distribution

The normal distribution is probably the most frequently employed distribution in statistics, with good reason: there are sound theoretical reasons why it can be employed in a wide range of circumstances where averages are used. Its p.d.f. is of the form:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

The parameters $\mu$ and $\sigma^2$ respectively correspond to the mean and variance of the random variable $x$. The fact that $X$ is normally distributed with a mean $\mu$ and a variance $\sigma^2$ is often denoted by $X \sim N(\mu, \sigma^2)$. The normal distribution does not have a closed form representation for the cumulative density $F(X)$ (i.e. we cannot write down an algebraic expression for it), but this will not present any difficulties since it is tabulated and most statistical software packages are capable of performing the appropriate calculations. The distribution is symmetric about the mean, bell shaped (hence the terminology "are you going to bell the mark sir?") with extremely thin tails, to the extent that more than 99% of the distribution lies within $\mu \pm 3\sigma$.
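The two continuous densities can be explored with a few lines of code. This is a rough sketch under stated assumptions: the function names, the interval endpoints and the values `mu = 10.0` and `sigma = 2.0` are made up for illustration, and the integral of the normal p.d.f. is approximated with a plain trapezoid rule rather than any library routine. It shows the uniform probability of an interval matching $(d - c)/(b - a)$ and the normal probability mass within three standard deviations of the mean.

```python
import math


def uniform_pdf(x, a, b):
    """Uniform density: 1/(b-a) on [a, b], zero otherwise."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x, a, b):
    """Uniform c.d.f.: 0 below a, (x-a)/(b-a) on [a, b], 1 above b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


def integrate(f, lo, hi, steps=10_000):
    """Composite trapezoid rule approximation to the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for j in range(1, steps):
        total += f(lo + j * h)
    return total * h


# Uniform on [0, 1]: probability of landing in [c, d] is (d - c)/(b - a).
a, b, c, d = 0.0, 1.0, 0.2, 0.5
prob_cd = uniform_cdf(d, a, b) - uniform_cdf(c, a, b)

# Normal with assumed parameters: mass within mu +/- 3 sigma.
mu, sigma = 10.0, 2.0
within_3_sigma = integrate(lambda x: normal_pdf(x, mu, sigma), mu - 3 * sigma, mu + 3 * sigma)

print("P(c <= X <= d) for the uniform:", prob_cd)
print("P(mu - 3 sigma <= X <= mu + 3 sigma) for the normal:", within_3_sigma)
```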
Normal random variables possess the very useful property that linear functions of them are also normal. Hence if $X$ is normal then $Z = a + bX$ is also normal and, using our rules for expectations, $E(Z) = a + bE(X)$ and $V(Z) = b^2 V(X)$. Letting $a = -\mu/\sigma$ and $b = 1/\sigma$, $Z \sim N(0, 1)$, which is referred to as a Standard Normal Random Variable (indeed the standard normal variable is frequently referred to with the letter z, hence the term z score). This is most useful since $N(0, 1)$ is the distribution that is tabulated in textbooks and programmed in software packages. Suppose we need to calculate $P(X < x_0)$ where $x_0$ is some known value and $X \sim N(\mu, \sigma^2)$ where the mean and variance are known values. Then since:

$$P(X < x_0) = P(X - \mu < x_0 - \mu) = P\left(\frac{X - \mu}{\sigma} < \frac{x_0 - \mu}{\sigma}\right)$$

And since $Z = (X - \mu)/\sigma$:

$$P(X < x_0) = P\left(Z < \frac{x_0 - \mu}{\sigma}\right)$$

So all that is needed is to calculate the value $(x_0 - \mu)/\sigma$ and employ the standard normal tables or a software package to evaluate the probability.
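As an illustration of the software route, the standardisation step can be sketched as follows. The function names and the example values (a mean of 100 and a standard deviation of 15) are assumptions made for this example. The standard normal c.d.f. is evaluated here through the error function, using the identity $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$, so that Python's `math.erf` plays the role of the printed tables.

```python
import math


def standard_normal_cdf(z):
    """P(Z < z) for Z ~ N(0, 1), evaluated via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_prob_below(x0, mu, sigma):
    """P(X < x0) for X ~ N(mu, sigma**2): standardise, then use the N(0, 1) c.d.f."""
    z = (x0 - mu) / sigma          # the z score of x0
    return standard_normal_cdf(z)


# Assumed illustrative parameters: mean 100, standard deviation 15.
mu, sigma = 100.0, 15.0
p_at_mean = normal_prob_below(100.0, mu, sigma)    # z = 0, half the mass lies below the mean
p_upper = normal_prob_below(129.4, mu, sigma)      # z = 1.96

print("P(X < 100.0) =", p_at_mean)
print("P(X < 129.4) =", p_upper)
```

Note that the function takes the standard deviation $\sigma$, not the variance $\sigma^2$, as its third argument; this is a design choice of the sketch, made so that the z score can be computed without a square root.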