Probability and statistics

Basic definitions

Statistics is a mathematical discipline that allows us to understand phenomena shaped by many events that we cannot keep track of. Since we lack the information needed to predict the individual events, we consider them random. However, the collective effect of many such events can usually be understood and predicted. Precisely this problem is the subject of statistical mechanics, which applies mathematical statistics to describe physical systems with many degrees of freedom whose dynamics appears random to us due to complexity.

Events or outcomes are mathematically represented by values that a random variable $x$ can take in a measurement. Most generally, if the random variable can have any value from a set $X$, then an event is defined by a subset $A \subseteq X$. If one measures $x$ and finds $x \in A$, then the event $A$ occurred. The most elementary event $\{x\}$ is specified by a single value that $x$ can take.

Ensemble and probability: One assumes that every measurement is done in exactly the same conditions and independently of all other measurements. A way to facilitate such a complete equality of all measurements is to prepare an ensemble of $N$ identical systems and perform one measurement of $x$ on each. If one obtains the measurement outcome $x$ a total of $N_x$ times, then the objective probability $p(x)$ of this outcome is defined as:

$$p(x) \equiv p(\{x\}) = \lim_{N\to\infty} \frac{N_x}{N}$$

For non-elementary events:

$$p(A) = \sum_{x \in A} p(\{x\}) = \lim_{N\to\infty} \frac{N_{x \in A}}{N}$$

where $N_{x \in A}$ is the number of events $A$ in $N$ measurements (the number of times it was found that $x \in A$). One must often estimate an objective probability using fundamental principles, as in statistical mechanics; such estimates can be considered subjective probabilities.

In general, probability has the following properties (which trivially follow from the definition):

$$0 \le p(x) \le 1 \quad (\forall x \in X), \qquad \sum_{x \in X} p(x) = 1$$

An event represented by the union $A \cup B$ of two sets $A$ and $B$ occurs if either one of the events $A$ or $B$ occurs. It is possible for both events $A$ and $B$ to occur at the same time if their intersection $A \cap B$ is not an empty set. However, we consider $A$ and $B$ as outcomes of a single measurement on a scalar random variable.
The events $A$ and $B$ are not independent, since the occurrence of one can exclude the other when their intersection is empty. Under these assumptions, the following can be easily deduced from the definition of probability:

$$p(A \cup B) = p(A) + p(B) - p(A \cap B)$$

Two events $A$ and $B$ are independent if they correspond to the measurements of two uncorrelated random variables $a \in X_a$ and $b \in X_b$. The event $A$ occurs if $a \in A$ and the event $B$ occurs if $b \in B$. If we label by $AB$ or $A \wedge B$ the event in which both $A$ and $B$ happen (mathematically, $(a, b) \in A \times B$), then it can be easily shown from the definition of probability that:

$$p(AB) = p(A)\,p(B)$$

Conditional probability. If an event $B$ occurs and one observes it, then the conditional probability $p(A|B)$ that an event $A$ would occur as well at the same time is:

$$p(A|B) = \frac{p(AB)}{p(B)}$$

If $A$ and $B$ are independent, then the probability of $A$ has nothing to do with $B$ and $p(A|B) = p(A)$.
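The rules above can be checked on a small example. The following is a minimal sketch, assuming a fair six-sided die as the random variable; the events `A` ("even") and `B` ("greater than 3") and the helper `prob` are illustrative choices, not part of the notes. For the independence check, the joint distribution of two dice is built as a product of single-die probabilities, which is exactly the assumption of uncorrelated variables.

```python
from fractions import Fraction

# Assumed example: a fair die, so every elementary event has p({x}) = 1/6.
X = {1, 2, 3, 4, 5, 6}
p = {x: Fraction(1, 6) for x in X}

def prob(event):
    """p(A) as the sum of elementary probabilities over x in A."""
    return sum(p[x] for x in event)

A = {2, 4, 6}   # "even"
B = {4, 5, 6}   # "greater than 3"

# Union rule for two events of a single measurement.
union_rule = prob(A) + prob(B) - prob(A & B)

# Conditional probability; for a single variable, "A and B" is the intersection.
cond_A_given_B = prob(A & B) / prob(B)

# Two uncorrelated dice a and b: joint probabilities are products by assumption.
joint = {(a, b): p[a] * p[b] for a in X for b in X}
p_AB = sum(q for (a, b), q in joint.items() if a in A and b in B)

print(prob(A | B), union_rule)    # 2/3 2/3
print(cond_A_given_B)             # 2/3
print(p_AB, prob(A) * prob(B))    # 1/4 1/4
```

Exact rational arithmetic (`Fraction`) is used so that the identities hold exactly rather than up to rounding.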

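If the random variable $x$ takes values from a continuous set $X$ (such as real numbers), then one can sensibly define probability only for a range of values, e.g.:

$$p(x_1, x_2) = \lim_{N\to\infty} \frac{N_{x_1 \le x < x_2}}{N}$$

where $N_{x_1 \le x < x_2}$ is the number of outcomes $x_1 \le x < x_2$ among $N$ experiments. Cumulative probability function: $P(x) = p(-\infty, x)$ is the probability that a measurement outcome will be smaller than $x$. The most direct substitute of the probability function itself is the probability distribution function (PDF), or probability density:

$$f(x) = \frac{dP(x)}{dx}$$

The probability that a measurement outcome will be in the interval $(x, x + dx)$ is $f(x)\,dx$. The PDF is normalized:

$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

The average, or expectation value, of the random variable $x \in X$ is labeled as $\langle x \rangle$ or $\bar{x}$, and defined by:

$$\langle x \rangle = \sum_{x \in X} x\,p(x) \qquad \text{or} \qquad \langle x \rangle = \int_{-\infty}^{\infty} dx\, x f(x)$$

for discrete and continuous distributions respectively. It is the single value that best represents the entire distribution of random outcomes. If the distribution scatters little about the most probable outcome $x_{\rm most}$, then $\langle x \rangle \approx x_{\rm most}$ (but they are generally not equal). If the distribution scatters symmetrically about $x_0$ (which need not be equal to $x_{\rm most}$), then $\langle x \rangle = x_0$. The average value is especially useful for learning about the statistics of the sum of random variables. One similarly defines the expectation value of any function of $x$:

$$\langle g(x) \rangle = \sum_{x \in X} g(x)\,p(x) \qquad \text{or} \qquad \langle g(x) \rangle = \int_{-\infty}^{\infty} dx\, g(x) f(x)$$

The variance of a random distribution is defined by:

$$\mathrm{Var}(x) = \left\langle (x - \langle x \rangle)^2 \right\rangle = \sum_{x \in X} (x - \langle x \rangle)^2\, p(x)$$

It is usually more convenient to calculate it as:

$$\left\langle (x - \langle x \rangle)^2 \right\rangle = \left\langle x^2 - 2x\langle x \rangle + \langle x \rangle^2 \right\rangle = \langle x^2 \rangle - 2\langle x \rangle \langle x \rangle + \langle x \rangle^2 = \langle x^2 \rangle - \langle x \rangle^2$$

Standard deviation

$$\sigma = \sqrt{\mathrm{Var}(x)} = \sqrt{\langle x^2 \rangle - \langle x \rangle^2}$$

is the measure of how much the random outcomes scatter about the average value of the distribution (the average of the deviation from the mean; this is squared inside the average in order to avoid the cancellation of scatters in opposite directions, but the square root eventually undoes this squaring and makes $\sigma$ directly comparable to the outcomes of $x$). A sharp distribution that scatters little has a small $\sigma$.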
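As an illustration of these definitions, the sketch below computes the mean, the variance (both from the definition and from the shortcut $\langle x^2 \rangle - \langle x \rangle^2$), and the standard deviation of a discrete distribution. The fair die and the helper names `expectation` and `variance` are assumptions made for the example only.

```python
from fractions import Fraction
import math

def expectation(p, g=lambda x: x):
    """<g(x)> = sum over x of g(x) p(x) for a discrete distribution p."""
    return sum(g(x) * px for x, px in p.items())

def variance(p):
    """Var(x) = <(x - <x>)^2>, computed directly from the definition."""
    mean = expectation(p)
    return expectation(p, lambda x: (x - mean) ** 2)

# Assumed example distribution: a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

mean = expectation(die)
var_def = variance(die)
var_short = expectation(die, lambda x: x * x) - mean ** 2   # <x^2> - <x>^2
sigma = math.sqrt(var_def)

print(mean, var_def, var_short)   # 7/2 35/12 35/12
print(sigma)                      # roughly 1.71, comparable to the outcomes of x
```

Both routes to the variance agree exactly, as the algebra above requires.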

Uniform distribution

The random variable $x$ whose possible outcomes are all equally likely belongs to the uniform probability distribution $U(X)$. If the set $X$ of all possible outcomes is finite and has $N$ elements, then:

$$p(x) = \frac{1}{N} \quad (\forall x \in X)$$

Otherwise, if $X = (a, b)$ is a continuous range of real numbers between $a$ and $b$, then:

$$f(x) = \frac{1}{b - a}, \qquad \langle x \rangle = \int_a^b dx\, x f(x) = \frac{1}{b - a} \int_a^b dx\, x = \frac{1}{b - a}\, \frac{b^2 - a^2}{2} = \frac{b + a}{2}$$

Binomial distribution

The random variable

$$n = \sum_{i=1}^{N} x_i$$

where

$$x_i = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } 1 - p \end{cases}$$

belongs to the binomial distribution: $n \in B(N, p)$. This distribution describes the statistics of the number $n$ of random events $x_i = 1$, which independently occur with probability $p$. Example applications are tossing a coin, throwing a die (how many times one gets a particular number on the die, with $p = 1/6$, in $N$ attempts), random walk (how far does a random walker reach after $N$ steps if the probability of a forward step is $p$), etc.

Let $W(n)$ be the probability that $n$ events will occur in $N$ attempts. In order to determine $W(n)$ experimentally, one must carry out $M$ times a sequence of $N$ measurements of the random variables $x_i$, and count how many times $M_n$ there were $n$ positive ($x_i = 1$) outcomes in a sequence. We will determine this probability analytically.

Consider a single sequence. The probability that the sequence will be, for example, $100\ldots10$ is equal to the probability that $x_1 = 1$ (which is $p$) and $x_2 = 0$ (which is $1 - p$) and $x_3 = 0$ (which is $1 - p$) and ... $x_{N-1} = 1$ (which is $p$) and $x_N = 0$ (which is $1 - p$). Since all individual events are independent, the probability of the whole sequence is the product of individual event probabilities:

$$W(100\ldots10) = p\,(1 - p)\,(1 - p) \cdots p\,(1 - p)$$

The probability of a particular long sequence can be very small because the number of possible sequences is very large. If there were $n$ outcomes $x_i = 1$ in the sequence, and hence $N - n$ outcomes $x_i = 0$, then:

$$W(\text{sequence}) = p^n (1 - p)^{N - n}$$

There are $\binom{N}{n}$ different sequences that have the same number $n$ of positive outcomes. They are also mutually exclusive (non-overlapping), so their probabilities add up:

$$W(n) = \binom{N}{n}\, p^n (1 - p)^{N - n}$$

where the binomial coefficient

$$\binom{N}{n} = \frac{N!}{n!\,(N - n)!} = \frac{N (N - 1)(N - 2) \cdots (N - n + 1)}{n!}$$

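is the number of sequences that have $n$ positive outcomes, i.e. the number of ways one can select $n$ elements from a set of $N$ elements (there are $N$ ways to select the 1st element, then $N - 1$ ways to select the 2nd element, down to $N - n + 1$ ways to select the last, $n$th element from the remaining ones; the numerator on the right is the number of possible selections, but among them each group of $n$ selected elements is assembled in $n!$ different orders and we do not care about the order).

Normalization:

$$\sum_{n=0}^{N} W(n) = \sum_{n=0}^{N} \binom{N}{n}\, p^n (1 - p)^{N - n} = \left[ p + (1 - p) \right]^N = 1$$

To see this, just consider brute-force expanding the square brackets above without disassembling the brackets for $(1 - p)$. There are $N$ copies of $[p + (1 - p)]$ that are multiplied together and generate the sum of all possible products of $N$ factors that can be either $p$ or $(1 - p)$.

The expectation value of the binomial distribution $B(N, p)$ is $\langle n \rangle = Np$, which can be easily understood given that it represents the number of positive outcomes of probability $p$ out of $N$ attempts (recall the very definition of objective probability). Formally:

$$\langle n \rangle = \sum_{n=0}^{N} n\, W(n) = \sum_{n=0}^{N} n\, \frac{N!}{n!\,(N - n)!}\, p^n (1 - p)^{N - n}$$

$$= Np \sum_{n=1}^{N} \frac{(N - 1)!}{(n - 1)!\,[(N - 1) - (n - 1)]!}\, p^{n - 1} (1 - p)^{(N - 1) - (n - 1)}$$

$$= Np \sum_{n'=0}^{N - 1} \frac{(N - 1)!}{n'!\,(N - 1 - n')!}\, p^{n'} (1 - p)^{N - 1 - n'} = Np$$

The last line follows from the normalization of the binomial distribution $B(N - 1, p)$. The variance of $B(N, p)$ is $Np(1 - p)$. We find this from:

$$\langle n^2 \rangle = \langle n(n - 1) \rangle + \langle n \rangle$$

and

$$\langle n(n - 1) \rangle = \sum_{n=0}^{N} n(n - 1)\, \frac{N!}{n!\,(N - n)!}\, p^n (1 - p)^{N - n} = N(N - 1)\,p^2 \sum_{n=2}^{N} \frac{(N - 2)!}{(n - 2)!\,[(N - 2) - (n - 2)]!}\, p^{n - 2} (1 - p)^{(N - 2) - (n - 2)} = N(N - 1)\,p^2$$

$$\mathrm{Var}(n) = \langle n^2 \rangle - \langle n \rangle^2 = N(N - 1)\,p^2 + Np - N^2 p^2 = Np(1 - p)$$

Poisson distribution

Consider a physical system that experiences random uncorrelated events of a certain kind with an average rate of $\lambda$ events per unit time (for example, a sample of radioactive atoms whose lifetime is $\tau = \lambda^{-1}$ will experience decay events with rate $\lambda$). Since the events are uncorrelated, their number $n_t$ in a time interval $t$ does not depend on the instant $t_0$ at which one begins to count them. The statistics of the random variable $n_t$ is described by the Poisson distribution, $n_t \in \mathrm{Pois}(\lambda, t)$.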
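Before the binomial distribution is used further, its normalization, mean and variance obtained above can be verified numerically. This is a minimal sketch with exact rational arithmetic; the function name `binomial_pmf` and the parameters (12 throws of a die, $p = 1/6$) are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(N, p):
    """List of W(n) = C(N, n) p^n (1 - p)^(N - n) for n = 0..N."""
    return [comb(N, n) * p**n * (1 - p) ** (N - n) for n in range(N + 1)]

# Assumed example: number of sixes in N = 12 throws of a fair die.
N, p = 12, Fraction(1, 6)
W = binomial_pmf(N, p)

norm = sum(W)                                          # should equal 1
mean = sum(n * w for n, w in enumerate(W))             # should equal N p
var = sum(n * n * w for n, w in enumerate(W)) - mean**2  # should equal N p (1 - p)

print(norm, mean, var)   # 1 2 5/3
```

Because every quantity is a `Fraction`, the comparison with $Np$ and $Np(1-p)$ is exact rather than approximate.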

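The probability $W_{\lambda,t}(n)$ to observe $n$ events in a time interval $t$ can be obtained from the binomial distribution. Divide the time interval $t$ into $N \to \infty$ infinitesimal intervals $\delta t = t/N$. If $p$ is the probability of a single event to occur in a time interval $\delta t$, then the average number of events that occur in $\delta t$ is:

$$\langle n_{\delta t} \rangle = 0 \cdot (1 - p) + 1 \cdot p = p = \lambda\, \delta t = \frac{\lambda t}{N}$$

We computed this first as a statistical expectation value of the random variable $n_{\delta t}$ that counts how many events occurred in the interval $\delta t$. In doing so, we neglected the possibility that more than one event could have happened in such a small interval $\delta t$, and treated $n_{\delta t}$ as a binary variable that takes only values 0 or 1. Then, we used the definition of rate $\lambda$ to express $\langle n_{\delta t} \rangle$ in terms of $\delta t$. Now, since $n_{\delta t}$ is binary, we can regard the total number of events in the period $t$, $n_t = \sum_i n_{\delta t, i}$, as a random variable from the binomial distribution $B(N, p)$ in the limit $N \to \infty$, $p = \lambda t / N \to 0$. Each individual measurement $i$ in the equivalent $B(N, p)$ distribution refers to whether an event occurred (with probability $p$) in an infinitesimal interval $(i - 1)\,\delta t \le t' < i\,\delta t$. Therefore,

$$W_{\lambda,t}(n) = \lim_{N\to\infty} \binom{N}{n}\, p^n (1 - p)^{N - n} = \lim_{N\to\infty} \frac{N!}{n!\,(N - n)!} \left( \frac{\lambda t}{N} \right)^n \left( 1 - \frac{\lambda t}{N} \right)^{N - n}$$

$$= \lim_{N\to\infty} \frac{N (N - 1) \cdots (N - n + 1)}{n!} \left( \frac{\lambda t}{N} \right)^n \frac{\left( 1 - \frac{\lambda t}{N} \right)^N}{\left( 1 - \frac{\lambda t}{N} \right)^n}$$

$$= \lim_{N\to\infty} \frac{N^n}{n!} \left( \frac{\lambda t}{N} \right)^n \frac{\left( 1 - \frac{\lambda t}{N} \right)^N}{\left( 1 - \frac{\lambda t}{N} \right)^n}$$

$$= \frac{(\lambda t)^n}{n!} \lim_{N\to\infty} \left( 1 - \frac{\lambda t}{N} \right)^N = \frac{(\lambda t)^n}{n!}\, e^{-\lambda t}$$

In the 2nd line we just canceled out the common factors in $N!$ and $(N - n)!$ and reorganized the terms involving $\lambda$, without making any approximations. In the 3rd line we neglected all appearances of $n$ next to $N$, since $n$ is finite and $N \to \infty$. Next, we pulled the factor of $N^n$ into the $(\lambda t / N)^n$ term, and then approximated $1 - \lambda t / N \approx 1$ in the denominator. In the last step, we used the definition of the exponential function $\exp(x) = \lim_{N\to\infty} (1 + x/N)^N$.

The expectation value and variance of $\mathrm{Pois}(\lambda, t)$ can be computed brute-force using the obtained probability $W_{\lambda,t}(n)$, or elegantly from the binomial distribution results:

$$\langle n_t \rangle = \lim_{N\to\infty} Np = \lim_{N\to\infty} N\, \frac{\lambda t}{N} = \lambda t$$

$$\mathrm{Var}(n_t) = \lim_{N\to\infty} Np(1 - p) = \lim_{N\to\infty} N\, \frac{\lambda t}{N} \left( 1 - \frac{\lambda t}{N} \right) = \lambda t$$

We see that the average has the expected value from the definition of the rate $\lambda$ as the average number of events per unit time, and the variance is equal to the average.

Exponential distribution

The time interval $\tau > 0$ between two successive events that occur with an average rate $\lambda$ is a random variable belonging to the exponential distribution $\mathrm{Exp}(\lambda)$. This distribution is continuous.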
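Since the exponential distribution is built on the Poisson probabilities, it may help to first check the Poisson limit of the binomial distribution numerically. The sketch below compares $B(N, \lambda t / N)$ with $(\lambda t)^n e^{-\lambda t} / n!$ for growing $N$, and also sums the Poisson mean and variance directly. The values $\lambda = 2$, $t = 1.5$, the cutoffs on $n$, and all function names are assumptions made for the example.

```python
from math import comb, exp, factorial

def binomial_pmf(n, N, p):
    """W(n) for the binomial distribution B(N, p)."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)

def poisson_pmf(n, rate, t):
    """W(n) = (rate t)^n exp(-rate t) / n! for Pois(rate, t)."""
    return (rate * t) ** n * exp(-rate * t) / factorial(n)

def max_gap(N, rate, t, n_max=20):
    """Largest |binomial - Poisson| over n = 0..n_max, with p = rate t / N."""
    p = rate * t / N
    return max(abs(binomial_pmf(n, N, p) - poisson_pmf(n, rate, t))
               for n in range(n_max + 1))

rate, t = 2.0, 1.5          # assumed example values, so rate * t = 3
for N in (30, 300, 3000):
    print(N, max_gap(N, rate, t))   # the gap shrinks as N grows

# Brute-force mean and variance; the tail beyond n = 60 is negligible here.
ns = range(61)
mean = sum(n * poisson_pmf(n, rate, t) for n in ns)
var = sum(n * n * poisson_pmf(n, rate, t) for n in ns) - mean**2
print(mean, var)            # both close to rate * t
```

The truncation at `n_max = 20` and `n = 60` is a practical shortcut: for $\lambda t = 3$ the neglected probabilities are far below floating-point resolution of the sums.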

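If $f(\tau)$ is the exponential PDF, then $f(\tau)\,d\tau$ is the probability that an event will occur in the time interval $\tau \le t < \tau + d\tau$ after the previous event. We can obtain this probability as the product of Poisson probabilities for two independent outcomes: (1) no event occurs in $0 \le t < \tau$, and (2) one event occurs in $\tau \le t < \tau + d\tau$:

$$f(\tau)\,d\tau = W_{\lambda,\tau}(0)\, W_{\lambda,d\tau}(1) = \frac{(\lambda \tau)^0}{0!}\, e^{-\lambda \tau} \cdot \frac{(\lambda\, d\tau)^1}{1!}\, e^{-\lambda\, d\tau}$$

In the limit $d\tau \to 0$, we find:

$$f(\tau) = \lambda e^{-\lambda \tau}$$

Verify by normalization (using the change of variables $x = \lambda \tau$):

$$\int_0^{\infty} d\tau\, f(\tau) = \lambda \int_0^{\infty} d\tau\, e^{-\lambda \tau} = \int_0^{\infty} dx\, e^{-x} = 1$$

The average time interval between two successive Poisson events is (as expected from the definition of rate $\lambda$):

$$\langle \tau \rangle = \int_0^{\infty} d\tau\, \tau f(\tau) = \lambda \int_0^{\infty} d\tau\, \tau e^{-\lambda \tau} = \frac{1}{\lambda} \int_0^{\infty} dx\, x e^{-x} = \frac{1}{\lambda}$$

The variance is:

$$\mathrm{Var}(\tau) = \langle \tau^2 \rangle - \langle \tau \rangle^2 = \lambda \int_0^{\infty} d\tau\, \tau^2 e^{-\lambda \tau} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2} \int_0^{\infty} dx\, x^2 e^{-x} - \frac{1}{\lambda^2} = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$$

so the standard deviation is equal to the mean. The last integral is solved using integration by parts, and is known as the Gamma function:

$$\Gamma(n + 1) = \int_0^{\infty} dx\, x^n e^{-x} = -x^n e^{-x} \Big|_0^{\infty} + n \int_0^{\infty} dx\, x^{n - 1} e^{-x} = \cdots = n! \int_0^{\infty} dx\, e^{-x} = n!$$

Gaussian distribution

The Gaussian distribution is of fundamental importance in mathematics and statistical mechanics because of the central limit theorem (proved later): the average of many independent random variables, drawn from a distribution with a finite variance, converges to a Gaussian distribution. Many quantities of interest in statistical mechanics will be averages over many degrees of freedom in macroscopic systems (e.g. the average velocity of a particle in a gas), so their statistics will be described by Gaussian distributions.

The Gaussian or normal distribution $\mathcal{N}(\mu, \sigma)$ is a continuous probability distribution defined on $x \in (-\infty, \infty)$ by the PDF:

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$

The mean and standard deviation are $\mu$ and $\sigma$ respectively. The coefficient $1/(\sigma \sqrt{2\pi})$ is required by normalization:

$$\int_{-\infty}^{\infty} dx\, f(x) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) = \frac{\sqrt{2\sigma^2}}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} d\xi\, e^{-\xi^2} = \frac{\sqrt{2\sigma^2}}{\sigma \sqrt{2\pi}}\, \sqrt{\pi} = 1$$

Here, we changed variables to $\xi = (x - \mu)/\sqrt{2\sigma^2}$ and used the well known integral:

$$I = \int_{-\infty}^{\infty} d\xi\, e^{-\xi^2} = \sqrt{\pi}$$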

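which can be derived from:

$$I^2 = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, e^{-(x^2 + y^2)} = \int_0^{2\pi} d\theta \int_0^{\infty} r\, dr\, e^{-r^2} = 2\pi \cdot \frac{1}{2} \int_0^{\infty} d(r^2)\, e^{-r^2} = \pi$$

by switching from Cartesian $(x, y)$ to polar $(r, \theta)$ coordinates. The mean is:

$$\langle x \rangle = \int_{-\infty}^{\infty} dx\, x f(x) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, (x - \mu + \mu) \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$

$$= \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, (x - \mu) \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) + \mu \int_{-\infty}^{\infty} dx\, f(x) = \mu$$

Using the trick $x = (x - \mu) + \mu$ we separated the initial integral into two. The first one vanishes because it is an integral of an odd function $(x - \mu) \exp(\cdots)$ over an interval symmetric about $\mu$. The second integral is simply $\mu$ multiplied by the normalization of the PDF. The variance is:

$$\mathrm{Var}(x) = \left\langle (x - \langle x \rangle)^2 \right\rangle = \int_{-\infty}^{\infty} dx\, (x - \mu)^2 f(x) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, (x - \mu)^2 \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$

$$= \frac{(2\sigma^2)^{3/2}}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} d\xi\, \xi^2 e^{-\xi^2} = \frac{(2\sigma^2)^{3/2}}{\sigma \sqrt{2\pi}} \left( -\frac{\xi e^{-\xi^2}}{2} \Big|_{-\infty}^{\infty} + \frac{1}{2} \int_{-\infty}^{\infty} d\xi\, e^{-\xi^2} \right) = \frac{(2\sigma^2)^{3/2}}{\sigma \sqrt{2\pi}}\, \frac{\sqrt{\pi}}{2} = \sigma^2$$

so that the standard deviation is $\sigma$.

Central limit theorem

The $n$th moment of the probability distribution of a random variable $x$ is defined by:

$$\mu_n = \langle x^n \rangle$$

The zeroth moment is always 1, the first moment is the mean of the distribution, etc. All moments can be obtained by taking derivatives of the generating function $G(t)$ at $t = 0$:

$$G(t) = \langle e^{tx} \rangle, \qquad \mu_n = \frac{d^n G}{dt^n} \bigg|_{t=0}$$

This follows from the Taylor expansion of the exponential function:

$$G(t) = \langle e^{tx} \rangle = \left\langle \sum_{m=0}^{\infty} \frac{(xt)^m}{m!} \right\rangle = \sum_{m=0}^{\infty} \frac{\langle x^m \rangle\, t^m}{m!}$$

When we take the $n$th derivative term-by-term, using

$$\frac{d}{dt} t^m = m t^{m-1} \;\; (\text{for } m \ge 1), \qquad \frac{d^2}{dt^2} t^m = \frac{d}{dt}\, m t^{m-1} = m(m - 1)\, t^{m-2} \;\; (\text{for } m \ge 2)$$

etc.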

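we get a sum of various non-negative powers of $t$:

$$\frac{d^n G}{dt^n} = \sum_{m=0}^{\infty} \frac{\langle x^m \rangle}{m!}\, \frac{d^n t^m}{dt^n} = \sum_{m=n}^{\infty} \frac{\langle x^m \rangle}{m!}\, m(m - 1)(m - 2) \cdots (m - n + 1)\, t^{m - n}$$

Then, evaluating this $n$th derivative at $t = 0$ kills all terms with non-zero powers of $t$, leaving behind just the first term $\propto t^0$:

$$\frac{d^n G}{dt^n} \bigg|_{t=0} = \left[ \frac{\langle x^n \rangle}{n!}\, n!\, t^0 + \frac{\langle x^{n+1} \rangle}{(n + 1)!}\, \frac{(n + 1)!}{1!}\, t^1 + \frac{\langle x^{n+2} \rangle}{(n + 2)!}\, \frac{(n + 2)!}{2!}\, t^2 + \cdots \right]_{t=0} = \langle x^n \rangle = \mu_n$$

Cumulants $\kappa_n$ of the probability distribution of a random variable $x$ are defined by the generating function $K(t)$:

$$K(t) = \log \langle e^{tx} \rangle, \qquad \kappa_n = \frac{d^n K}{dt^n} \bigg|_{t=0}$$

The first few cumulants are:

$$\kappa_0 = K(0) = \log \langle e^{0 \cdot x} \rangle = \log 1 = 0$$

$$\kappa_1 = \frac{dK}{dt} \bigg|_{t=0} = \frac{d}{dt} \log G(t) \bigg|_{t=0} = \frac{1}{G} \frac{dG}{dt} \bigg|_{t=0} = \mu_1 = \langle x \rangle$$

$$\kappa_2 = \frac{d^2 K}{dt^2} \bigg|_{t=0} = \frac{d}{dt} \left[ \frac{1}{G} \frac{dG}{dt} \right]_{t=0} = \left[ \frac{1}{G} \frac{d^2 G}{dt^2} - \frac{1}{G^2} \left( \frac{dG}{dt} \right)^2 \right]_{t=0} = \mu_2 - \mu_1^2 = \mathrm{Var}(x)$$

$$\kappa_3 = \frac{d^3 K}{dt^3} \bigg|_{t=0} = \frac{d}{dt} \left\{ \frac{1}{G} \frac{d^2 G}{dt^2} - \frac{1}{G^2} \left( \frac{dG}{dt} \right)^2 \right\}_{t=0} = \mu_3 - 3\mu_2 \mu_1 + 2\mu_1^3$$

We see that the zeroth cumulant vanishes, the first cumulant is the mean, and the second cumulant is the variance.

Homogeneity of moments and cumulants: If $\mu_n^{(x)}$ and $\kappa_n^{(x)}$ are moments and cumulants of the random variable $x$ respectively, then for any constant $c$:

$$\mu_n^{(cx)} = c^n \mu_n^{(x)}, \qquad \kappa_n^{(cx)} = c^n \kappa_n^{(x)}$$

Proof: For moments:

$$\mu_n^{(cx)} = \langle (cx)^n \rangle = c^n \langle x^n \rangle = c^n \mu_n^{(x)}$$

For cumulants: The cumulant $\kappa_n^{(x)}$ is generated by the $n$th derivative of the generating function $K(t)$ at $t = 0$. When expanded in terms of the generating function $G(t)$ for moments, $\kappa_n^{(x)}$ becomes a sum of various combinations of $G$ derivatives which always have the same total power $n$ of $dt$ in the denominators (see the example for $\kappa_2$ above). Therefore, each such term becomes a product of several moments, e.g. $\mu_{n_1}^{(x)} \mu_{n_2}^{(x)} \cdots \mu_{n_k}^{(x)}$ (after substituting $t = 0$, which makes $G(t) = 1$), but always with $n_1 + n_2 + \cdots + n_k = n$. Since $\mu_n^{(cx)} = c^n \mu_n^{(x)}$, then, with numerical coefficients $C_k$ and the sum running over all such products:

$$\kappa_n^{(cx)} = \sum C_k\, \mu_{n_1}^{(cx)} \mu_{n_2}^{(cx)} \cdots \mu_{n_k}^{(cx)} = \sum C_k\, c^{n_1 + \cdots + n_k}\, \mu_{n_1}^{(x)} \mu_{n_2}^{(x)} \cdots \mu_{n_k}^{(x)} = c^n \sum C_k\, \mu_{n_1}^{(x)} \mu_{n_2}^{(x)} \cdots \mu_{n_k}^{(x)} = c^n \kappa_n^{(x)}$$

If one knows all moments of a probability distribution, or alternatively all cumulants, one can reconstruct the PDF.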

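Proof: If all moments $\langle x^m \rangle$, $m \in \{0, 1, 2, \ldots\}$ are known, then one can construct an auxiliary function $g(t) = G(it)$ using the Taylor expansion of the generating function $G(t)$:

$$g(t) = G(it) = \langle e^{itx} \rangle = \sum_{m=0}^{\infty} \frac{\langle x^m \rangle\, i^m t^m}{m!}$$

Here, $i$ is the imaginary unit. The function $g(t)$ is useful because it is related to the Fourier transform of the Dirac delta function $\delta(x)$:

$$\int_{-\infty}^{\infty} dt\, e^{itx} = 2\pi\, \delta(x)$$

where $\delta(x)$ is defined by:

$$\delta(x) = \begin{cases} 0, & x \ne 0 \\ \infty, & x = 0 \end{cases}, \qquad \int_{-\infty}^{\infty} dx\, \delta(x) = 1$$

The above integral representation of $\delta(x)$ can be proven by calculating a well-defined Gaussian integral $D(x; \alpha) = \int_{-\infty}^{\infty} dt\, \exp(itx - \alpha t^2)$ with a finite positive $\alpha$, and then showing that $D(x; \alpha)$ converges to $2\pi\, \delta(x)$ in the $\alpha \to 0$ limit. The Dirac delta function has the following essential property for any function $f(x)$:

$$\int_{-\infty}^{\infty} dx\, \delta(x - x_0)\, f(x) = f(x_0)$$

which follows directly from its definition. Applied to the PDF $f(x)$, it allows us to calculate $f(x)$ at any desired point $x_0$:

$$f(x_0) = \langle \delta(x - x_0) \rangle = \frac{1}{2\pi} \int_{-\infty}^{\infty} dt\, \langle e^{it(x - x_0)} \rangle = \frac{1}{2\pi} \int_{-\infty}^{\infty} dt\, \langle e^{itx} \rangle\, e^{-itx_0} = \frac{1}{2\pi} \int_{-\infty}^{\infty} dt\, g(t)\, e^{-itx_0}$$

if we can obtain $g(t)$, i.e. if we know all moments. This completes the proof. Since $G(t) = e^{K(t)}$ and $K(t)$ is fully determined by the knowledge of all cumulants, we can similarly reconstruct the PDF from the cumulants.

The Gaussian distribution $\mathcal{N}(\mu, \sigma)$ has cumulants $\kappa_1 = \mu$, $\kappa_2 = \sigma^2$, and all other cumulants equal to zero.

Proof: First, calculate the generating function for moments:

$$G(t) = \langle e^{tx} \rangle = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} + tx \right)$$

$$= \frac{\sqrt{2\sigma^2}}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} d\xi\, e^{-\xi^2 + t\left( \xi \sqrt{2\sigma^2} + \mu \right)} = \frac{e^{\mu t}}{\sqrt{\pi}} \int_{-\infty}^{\infty} d\xi\, e^{-\xi^2 + t \xi \sqrt{2\sigma^2}}$$

$$= \frac{e^{\mu t}}{\sqrt{\pi}} \int_{-\infty}^{\infty} d\xi\, e^{-\left( \xi - \frac{t}{2} \sqrt{2\sigma^2} \right)^2 + \frac{1}{4} t^2 (2\sigma^2)} = e^{\mu t}\, e^{\frac{1}{2} \sigma^2 t^2}$$

We changed variables from $x$ to $\xi = (x - \mu)/\sqrt{2\sigma^2}$ in the second line. Then we completed the square in the exponent in the third line: the exponents of $e$ are equal in the third line and the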

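last integral of the second line, but the third line becomes a constant $\exp\left( \frac{1}{2} \sigma^2 t^2 \right)$ times a pure Gaussian integral that evaluates to $\sqrt{\pi}$ as we saw before. Now that we know $G(t)$, we immediately know the generating function for cumulants:

$$K(t) = \log G(t) = \mu t + \frac{1}{2} \sigma^2 t^2$$

The derivatives of $K(t)$ generate the cumulants and we clearly reproduce the stated result: all cumulants are zero except $\kappa_1 = \mu$ and $\kappa_2 = \sigma^2$.

Central limit theorem: Let $x_i$, $i = 1, \ldots, N$ be independent random variables from the same probability distribution with mean $\mu$ and variance $\sigma^2$. Let $X$ be the random variable defined by:

$$X = \frac{1}{\sigma \sqrt{N}} \sum_{i=1}^{N} (x_i - \mu)$$

In the limit $N \to \infty$, the statistics of $X$ converges to the Gaussian distribution $\mathcal{N}(0, 1)$, with mean 0 and variance 1, regardless of the distribution of $x_i$.

Proof: Construct the generating function for the cumulants of $X$:

$$K_X(t) = \log \langle e^{tX} \rangle = \log \left\langle \exp\left( \frac{t}{\sigma \sqrt{N}} \sum_{i=1}^{N} (x_i - \mu) \right) \right\rangle = \log \left\langle \prod_{i=1}^{N} \exp\left( \frac{t (x_i - \mu)}{\sigma \sqrt{N}} \right) \right\rangle$$

$$= \log \prod_{i=1}^{N} \left\langle \exp\left( \frac{t (x_i - \mu)}{\sigma \sqrt{N}} \right) \right\rangle = \sum_{i=1}^{N} \log \left[ \left\langle \exp\left( \frac{t x_i}{\sigma \sqrt{N}} \right) \right\rangle \exp\left( -\frac{t \mu}{\sigma \sqrt{N}} \right) \right]$$

$$= \sum_{i=1}^{N} \left[ K_i\left( \frac{t}{\sigma \sqrt{N}} \right) - \frac{t \mu}{\sigma \sqrt{N}} \right] = N \left[ K_x\left( \frac{t}{\sigma \sqrt{N}} \right) - \frac{t \mu}{\sigma \sqrt{N}} \right]$$

We are able to pull the product symbol outside of the averaging brackets at the beginning of the second line only because the random variables $x_i$ are independent: the average of a product of independent random values equals the product of their averages. Going through to the end, we find that the cumulant generating function $K_X$ for $X$ equals a shifted sum of analogous generating functions $K_x$ for individual random variables $x_i$, although evaluated at the rescaled parameter $t/(\sigma \sqrt{N})$. Since all $x_i$ have the same distribution, the last expression is simply proportional to $N$. Now we can calculate the cumulants of $X$ (substituting $\tau = t/(\sigma \sqrt{N})$):

$$\kappa_n^{(X)} = \frac{d^n K_X(t)}{dt^n} \bigg|_{t=0} = N \left[ \frac{d^n K_x\left( t/(\sigma \sqrt{N}) \right)}{dt^n} \bigg|_{t=0} - \frac{\mu\, \delta_{n,1}}{\sigma \sqrt{N}} \right] = N \left[ \frac{1}{(\sigma \sqrt{N})^n}\, \frac{d^n K_x(\tau)}{d\tau^n} \bigg|_{\tau=0} - \frac{\mu\, \delta_{n,1}}{\sigma \sqrt{N}} \right] = N \left[ \frac{\kappa_n^{(x)}}{\sigma^n N^{n/2}} - \frac{\mu\, \delta_{n,1}}{\sigma \sqrt{N}} \right]$$

The zeroth cumulant always vanishes. The first cumulant is the mean:

$$\kappa_1^{(X)} = N \left[ \frac{\kappa_1^{(x)}}{\sigma \sqrt{N}} - \frac{\mu}{\sigma \sqrt{N}} \right] = 0$$

The second cumulant is the variance:

$$\kappa_2^{(X)} = N\, \frac{\kappa_2^{(x)}}{\sigma^2 N} = 1$$

The higher order cumulants become negligible when $N \to \infty$:

$$\kappa_{n > 2}^{(X)} = \frac{\kappa_n^{(x)}}{\sigma^n}\, N^{1 - n/2} \to 0$$

We showed earlier that the probability distribution of $X$ is completely determined by the cumulants, and the distribution in which only the first and second cumulants can be non-zero is Gaussian. This proves the central limit theorem.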
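The convergence stated by the theorem can be illustrated numerically. The following is a minimal sketch, assuming the $x_i$ are throws of a fair die: it builds the exact distribution of the sum by repeated convolution, rescales it to $X$, and measures the largest gap between the cumulative probability of $X$ and that of $\mathcal{N}(0, 1)$ at the attainable values of $X$. The function names `sum_pmf` and `max_cdf_gap` and the choice of a die are illustrative assumptions, not part of the proof.

```python
import math

def sum_pmf(single, n):
    """Exact PMF of the sum of n independent copies of a discrete variable.

    `single` maps each integer outcome to its probability.
    """
    pmf = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for s, ps in pmf.items():
            for x, px in single.items():
                nxt[s + x] = nxt.get(s + x, 0.0) + ps * px
        pmf = nxt
    return pmf

def max_cdf_gap(single, n):
    """Largest |P_X - Phi| over the attainable values of X for n variables."""
    mu = sum(x * p for x, p in single.items())
    var = sum((x - mu) ** 2 * p for x, p in single.items())
    scale = math.sqrt(var * n)              # sigma * sqrt(N)
    gap, cdf = 0.0, 0.0
    for s, ps in sorted(sum_pmf(single, n).items()):
        cdf += ps                           # cumulative probability up to s
        X = (s - n * mu) / scale            # rescaled variable from the theorem
        phi = 0.5 * (1.0 + math.erf(X / math.sqrt(2.0)))  # standard normal CDF
        gap = max(gap, abs(cdf - phi))
    return gap

# Assumed example distribution: a fair six-sided die.
die = {k: 1.0 / 6.0 for k in range(1, 7)}
for n in (1, 10, 100):
    print(n, round(max_cdf_gap(die, n), 4))   # the gap shrinks as n grows
```

The remaining gap at finite $N$ comes largely from the discreteness of the die: the cumulative probability of $X$ is a staircase, so it cannot match a smooth curve more closely than about half a step.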