Module 8: Probability and Statistical Methods in Water Resources Engineering
Bob Pitt
University of Alabama, Tuscaloosa, AL

Flow data are available from numerous USGS-operated flow recording stations, usually on a real-time basis. Besides the current flow conditions, summaries of extreme flows are also tabulated and available. These data can be used to predict the frequency of the extreme flows of most interest for design.

Probability and Statistics

Design storms are frequently used to size hydrologic structures. The main question is: what is the probability of a storm larger than the one our culvert or bridge is barely capable of handling (that is, what is the likelihood of failure)? We must therefore be able to evaluate the probability of hydrologic events. A frequent statistical goal is to fit a standard probability distribution to observed precipitation and runoff data.

Probability and Return Periods

Concepts of probability and statistics are closely associated, particularly when dealing with a set of data. The probability of occurrence of a particular event is simply the chance that the event will occur. If there is a finite number of possible outcomes, a (not necessarily equal) probability may be assigned to each. If the possible outcomes cover a continuous range of values, the probability can only be expressed by a mathematical function.
What is the probability of rolling a 1 with a single roll of a six-faced die? Since there are six equally likely outcomes, the probability of obtaining a 1 is 1/6. What is the probability of rolling either a 1 or a 4? There are two qualifying outcomes, so the probability of obtaining either a 1 or a 4 is 2/6.

The probability of equaling or exceeding a particular value is a cumulative probability. Our interest centers on the probability that an event will be equaled or exceeded within a given time frame. If $p$ is the probability that an event of a specific magnitude is equaled or exceeded in any one year, then $1 - p$ is the probability that the event is not equaled or exceeded in that year.

The return period expresses the average interval between events, but it does not give specific information concerning the likelihood of occurrence during the design life of the project. What is the probability that the discharge of 5,000 cfs with a return period of 50 years will occur during the life of a dam? Will it occur at all? Can it occur more than once?

The probability that the discharge is not equaled or exceeded in two years is $(1-p)(1-p)$, or, in general, $(1-p)^n$, where $n$ is the period of interest in years. The probability $J$ that the event will occur at least once during $n$ years is:

$$J = 1 - (1 - p)^n$$

If a discharge of 5,000 cfs has a probability $p = 0.02$, then there is a 2 percent probability that this discharge will be equaled or exceeded in any one year (and 98% that it will not). The reciprocal of $p$ is the return period, or recurrence interval, $t_p$. This is the average time interval between discharges that are equal to, or greater than, a specified discharge:

$$t_p = \frac{1}{p}$$

(Prasuhn 1987)
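The risk formula and the return period are easy to check numerically. Below is a minimal Python sketch (the function names are illustrative, not from the source) that applies $p = 1/t_p$ and $J = 1 - (1 - p)^n$ to a 50-year flow over a 50-year design life and a 5-year construction period, the dam example worked in these notes.

```python
def exceedance_probability(t_p: float) -> float:
    """Annual probability p that the t_p-year event is equaled or exceeded (p = 1/t_p)."""
    return 1.0 / t_p


def risk_of_occurrence(p: float, n: int) -> float:
    """Probability J that an event with annual exceedance probability p
    is equaled or exceeded at least once in n years: J = 1 - (1 - p)**n."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = exceedance_probability(50.0)  # 50-year flow -> p = 0.02
    print(f"50-year design life: J = {risk_of_occurrence(p, 50):.3f}")
    print(f"5-year construction: J = {risk_of_occurrence(p, 5):.3f}")
    # The exercise in the notes: a 100-year storm over 100, 50, and 1 years
    for years in (100, 50, 1):
        print(years, round(risk_of_occurrence(0.01, years), 3))
```

The first two lines print 0.636 and 0.096, matching the 63.6% and 9.6% quoted for the dam.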
If a dam has a design life of 50 years ($n = 50$ years), the 5,000 cfs flow (having a recurrence interval $t_p$ of 50 years) will be equaled or exceeded with a probability of 63.6 percent (not 100%!). What is the probability that this flow will be equaled or exceeded during the 5-year construction period ($n = 5$ years)? (Answer: 9.6%.) What is the probability that a 100-year storm will occur at least once during a 100-year period? During a 50-year period? During 1 year?

Uniform distributions are of little interest in hydrology, but they are a simple place to start and are similar to the die problem.

(Figure: uniform distribution, probability distribution function and cumulative distribution function.)

Any value between $a$ and $b$ has an equal likelihood of occurring; $a$ and $b$ are the lower and upper limits of $x$. The probability that an outcome will be equal to, or greater than, a particular value of $x$ is:

$$p = 1 - F(x) = \int_x^{\infty} f(u)\,du$$

For the uniform case this reduces to $p = (b - x)/(b - a)$ for $a \le x \le b$. (Prasuhn 1987)

Probability Distributions

A probability density function (PDF) is a continuous mathematical expression from which the probability of a specified range of outcomes is determined. The distribution that best fits a set of data is expected to give the best estimate of the probability of an event that has not been observed. Actual discharges or precipitation values over a period of years form a continuous function, because any value within a broad range is possible. We will only examine a few possible probability distributions.

Normal distributions are the familiar bell-shaped curves:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \bar{x})^2}{2\sigma^2}\right]$$

(Prasuhn 1987)
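As a sketch of the two distributions just described, the Python below (function names are hypothetical) evaluates the exceedance probability $p = 1 - F(x)$ for a uniform distribution and the normal PDF $f(x)$. A crude midpoint-rule integration checks that the normal density encloses an area close to 1.

```python
import math


def uniform_exceedance(x: float, a: float, b: float) -> float:
    """p = 1 - F(x) for a uniform distribution on [a, b]."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def normal_pdf(x: float, mean: float, sigma: float) -> float:
    """Normal density f(x) with the given mean and standard deviation sigma."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    print(uniform_exceedance(7.5, 5.0, 10.0))  # 0.5: halfway between the limits
    # Midpoint-rule integral of the density over mean +/- 8 sigma (should be close to 1)
    mean, sigma, steps = 10_000.0, 3_000.0, 4_000
    width = 16.0 * sigma / steps
    start = mean - 8.0 * sigma
    area = sum(normal_pdf(start + (i + 0.5) * width, mean, sigma) for i in range(steps)) * width
    print(round(area, 6))
```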
The normal distribution PDF is defined by two distribution parameters, the mean and the standard deviation. The mean is the average of all of the observations and is at the center of the distribution:

$$\bar{x} = \frac{\sum x}{n}$$

The standard deviation describes the width of the distribution (the spread of the data):

$$\sigma = \left[\frac{\sum (x - \bar{x})^2}{n - 1}\right]^{1/2}$$

About 68% of the data lie within ±1 standard deviation of the mean, while about 99.7% lie within ±3 standard deviations. (Prasuhn 1987)

The normal distribution does not provide a satisfactory fit to flood discharges and other hydrologic data. The distribution extends from negative to positive infinity and therefore assigns a probability to negative flows.

A specific event $x$ can be related to the probability of exceedance $p$ by:

$$x = \bar{x} + K\sigma$$

where $K$ is the frequency factor (given in the following table for specific values of $p$). Using actual data, $K$ can be calculated as the number of standard deviations the data point is located from the mean:

$$K = \frac{x - \bar{x}}{\sigma}$$

If actual hydrologic data are to be expressed with respect to $p$, the probability of exceedance in a year, the data set will often be based on the single peak value observed for each year (the annual series).

Example 5-1 (Prasuhn 1987)

Determine the probability that a discharge of 20,000 cfs will be equaled or exceeded in any one year, if the mean of the annual series of river discharges is 10,000 cfs and the standard deviation is (1) 3,000 cfs, or (2) 6,000 cfs. What is the return period in each case?

Solution:

(1) $K = \dfrac{20{,}000 - 10{,}000}{3{,}000} = 3.33$

Looking up this value of $K$ in the previous table yields a $p$ of about 0.0005, and the corresponding recurrence interval:

$$t_p = \frac{1}{p} = \frac{1}{0.0005} = 2000 \text{ years}$$
(2) $K = \dfrac{20{,}000 - 10{,}000}{6{,}000} = 1.67$

The corresponding $p$ value in the table is about 0.05, so:

$$t_p = \frac{1}{p} = \frac{1}{0.05} = 20 \text{ years}$$

Therefore, the same discharge can have vastly different recurrence intervals, even with the same average flow rate, as the standard deviation changes.

Log-Normal Distributions

The log-normal distribution is closely related to the normal distribution. The values are transformed by taking the log10 of the data; the distribution assumes that the logarithms of the discharges are normally distributed. This distribution is much more useful than the basic normal distribution because no negative values are allowed, while large positive values are acceptable. The following plot shows this distorted distribution in real (non-transformed) space:

(Figure: log-normal distribution plotted in real space.)

The prior equations describing the mean and standard deviation can be used if the following transformation is applied:

$$y = \log x$$

The mean of the logarithms themselves can be expressed directly:

$$\overline{\log x} = \frac{\sum \log x}{n}$$

As an alternative, the mean can be found by taking the logarithm of the geometric mean of the set of values:

$$\overline{\log x} = \log\left[(x_1 x_2 x_3 \cdots x_n)^{1/n}\right]$$

The standard deviation can also be calculated directly, where the values are based on the logarithms of the actual data:

$$\sigma_{\log x} = \left[\frac{\sum (\log x - \overline{\log x})^2}{n - 1}\right]^{1/2}$$

The probability of exceedance can be related to the occurrence using:

$$\log x = \overline{\log x} + K\sigma_{\log x}$$

The frequency factors may be determined from the prior table for normal distributions.
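Instead of reading $p$ from the printed table, the standard normal distribution can be evaluated directly. The Python sketch below uses `statistics.NormalDist` from the standard library (the function names are hypothetical) to repeat Example 5-1. Because the table values in the example are rounded, the computed probabilities differ slightly: about 0.00043 rather than 0.0005 for case 1 and about 0.048 rather than 0.05 for case 2, giving return periods of roughly 2,300 and 21 years.

```python
from statistics import NormalDist


def frequency_factor_from_value(x: float, mean: float, sigma: float) -> float:
    """K = (x - mean) / sigma: standard deviations between x and the mean."""
    return (x - mean) / sigma


def normal_exceedance(k: float) -> float:
    """p = P(Z >= K) for a standard normal variable (zero skew).

    Uses the symmetry P(Z >= K) = P(Z <= -K), which avoids the
    cancellation error of computing 1 - cdf(K) for large K."""
    return NormalDist().cdf(-k)


if __name__ == "__main__":
    x, mean = 20_000.0, 10_000.0  # cfs, Example 5-1
    for sigma in (3_000.0, 6_000.0):
        k = frequency_factor_from_value(x, mean, sigma)
        p = normal_exceedance(k)
        print(f"sigma = {sigma:,.0f} cfs: K = {k:.2f}, p = {p:.5f}, t_p = {1.0 / p:,.0f} years")
```

For the log-normal case the same functions apply if $\log x$, $\overline{\log x}$, and $\sigma_{\log x}$ are passed in place of $x$, $\bar{x}$, and $\sigma$.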
Log-Pearson Type III Distribution

The problem with most hydrologic data is that the data are not spread equally above and below the mean. The lower side is limited to the range from the mean to zero, while there is no limit to the upper range. This results in a skewed distribution that the normal distribution cannot represent. The coefficient of skew, $a$, is defined by:

$$a = \frac{n \sum (x - \bar{x})^3}{(n - 1)(n - 2)\sigma^3}$$

The use of log values in the log-normal distribution tends to reduce the distribution distortion. However, some skewness usually remains. (Prasuhn 1987)

To determine the skewness when using log values, the following can be used:

$$a = \frac{n \sum (\log x - \overline{\log x})^3}{(n - 1)(n - 2)\sigma_{\log x}^3}$$

The normal and log-normal distributions assume zero skew. If some skew exists in the data, these distributions result in errors. The log-Pearson Type III distribution was adopted for flood-frequency analysis in 1967 to improve the fit of hydrologic data. This method uses a third parameter, the skew coefficient, in addition to the mean and standard deviation. The following table gives the frequency factors for this distribution. The zero-skew values correspond to the normal distribution.
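The log-space statistics and the skew coefficient above are straightforward to compute. This Python sketch (the function name `log_moments` is hypothetical) applies the sample formulas as written, with $n - 1$ in the standard deviation and $n/[(n-1)(n-2)]$ in the skew, to a list of annual peak flows. It also confirms that the mean of the logarithms equals the logarithm of the geometric mean. The flows in the usage example are made up for illustration.

```python
import math


def log_moments(flows):
    """Return (mean, standard deviation, skew coefficient) of log10(flows).

    sigma = [sum((y - mean)**2) / (n - 1)]**0.5
    a     = n * sum((y - mean)**3) / ((n - 1) * (n - 2) * sigma**3)
    """
    logs = [math.log10(q) for q in flows]
    n = len(logs)
    if n < 3:
        raise ValueError("at least three flows are needed for a skew coefficient")
    mean = sum(logs) / n
    sigma = math.sqrt(sum((y - mean) ** 2 for y in logs) / (n - 1))
    skew = n * sum((y - mean) ** 3 for y in logs) / ((n - 1) * (n - 2) * sigma ** 3)
    return mean, sigma, skew


if __name__ == "__main__":
    peaks = [4_200.0, 6_900.0, 8_100.0, 11_500.0, 15_800.0, 27_000.0]  # hypothetical cfs values
    mean, sigma, skew = log_moments(peaks)
    geometric_mean = math.prod(peaks) ** (1.0 / len(peaks))
    print(f"mean of logs = {mean:.4f}, log of geometric mean = {math.log10(geometric_mean):.4f}")
    print(f"sigma = {sigma:.4f}, skew = {skew:.3f}")
```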
Example 5-2 (Prasuhn 1987)

The mean of the logarithms of the annual series of river discharges is 2.700 (which corresponds to a geometric mean for the peak flows of 501 m³/sec). The standard deviation of the same values is 0.65. Determine the discharge with a 100-year return period if the coefficient of skew is (1) -0.4, (2) 0, and (3) +0.4.

Solution:

$$\log Q = 2.7 + 0.65K$$

(1) $K = 2.029$; $\log Q = 4.019$; $Q = 10{,}400$ m³/sec
(2) $K = 2.326$; $\log Q = 4.212$; $Q = 16{,}300$ m³/sec (log-normal)
(3) $K = 2.615$; $\log Q = 4.400$; $Q = 25{,}100$ m³/sec

(Prasuhn 1987)

Statistical Analysis

An annual series uses the maximum value for each year. A drawback to the annual series is that some years may have several large peaks, while other years may have peaks that are much lower. However, only one value per year can be used, discarding some potentially valuable information. A partial duration series uses all values above a selected value. For rare events, the results are similar, and the annual series is recommended because it is easier to obtain. However, if more frequent flows are of interest (such as recurrence intervals of less than one year), the partial duration series should be used.

The following table shows all annual peak flows, plus all others greater than 9,000 cfs. These data can therefore be used for either a partial duration series or an annual series analysis.

The following table shows the last segment of an annual series analysis of these data. Included are the ranks for the smallest annual peak flows, the corresponding flows, and several columns of summary statistics for these flows. This analysis can be easily conducted in a spreadsheet program. Each flow has its recurrence interval calculated for $n$ years of record (53 years), where $m$ is the rank:

$$t_p = \frac{n + 1}{m}$$

The probability of exceedance is the reciprocal of the return period:

$$p = \frac{m}{n + 1}$$

A rank of 1 in a period of 10 years leads to a probability of $p = 1/11 = 0.0909$. Using $n + 1$ (rather than $n$) results in the best estimate for limited data sets. In a partial duration series, $n$ still refers to the duration of the record (in years) and is different from the number of observations.
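The ranking procedure can be sketched in Python as follows (the function name and the ten flows are hypothetical). Peaks are ranked from largest ($m = 1$) to smallest, and each receives $t_p = (n + 1)/m$ and $p = m/(n + 1)$. For a partial duration series, `n_years` should be set to the record length rather than the number of observations, as noted above.

```python
def plotting_positions(peaks, n_years=None):
    """Rank flows from largest (m = 1) to smallest and return
    (m, flow, t_p, p) tuples with t_p = (n + 1) / m and p = m / (n + 1).

    n defaults to the number of flows (annual series); for a partial
    duration series, pass the length of record in years as n_years.
    """
    n = len(peaks) if n_years is None else n_years
    ranked = sorted(peaks, reverse=True)
    return [(m, q, (n + 1) / m, m / (n + 1)) for m, q in enumerate(ranked, start=1)]


if __name__ == "__main__":
    # Ten hypothetical annual peaks (cfs)
    peaks = [8_200, 15_400, 6_100, 22_700, 9_900, 12_300, 4_800, 18_100, 7_400, 11_000]
    for m, q, t_p, p in plotting_positions(peaks):
        print(f"m = {m:2d}  Q = {q:6,d} cfs  t_p = {t_p:5.2f} yr  p = {p:.4f}")
```

The largest flow gets $t_p = 11$ years and $p = 0.0909$, the same rank-1, 10-year case described in the text.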
(Both tables: Prasuhn 1987)

It is now possible to plot the data and the fitted equations for the different distributions on normal and log-normal probability paper to determine the best-fitting distribution, and to use the plots to determine flows for different recurrence intervals. Probability paper can be easily downloaded from several web sites, including: http://www.weibull.com/gpaper/

Probability paper gives visual clues as to the best distribution: usually the best-fitting one plots as a straight line (for normal and log-normal plots), or follows the curved plotted line (for log-Pearson Type III plots).

The first step is to create the annual series (or partial series) data and rank the observations, usually from the largest to the smallest. Then calculate the probability for each observation, using $p = m/(n + 1)$. Finally, plot the flow values against the calculated $p$ values. The following plots are examples using the Sioux River data.

Do an in-class example to plot the following 9 observations: 5 5 78 56 3 7 3 88
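Probability paper works by spacing the probability axis according to the standard normal variable. A minimal Python sketch of that idea follows (the function name and the nine values are hypothetical, not the in-class data). Each ranked observation gets $p = m/(n + 1)$ and a coordinate $K$, the standard normal value exceeded with probability $p$. Plotting the flow, or $\log_{10}$ of the flow, against $K$ on ordinary linear axes then mimics normal or log-normal probability paper, so a straight-line pattern suggests a good fit.

```python
import math
from statistics import NormalDist


def probability_paper_points(flows, log_scale=False):
    """Return (p, K, y) for each observation, ranked largest first.

    p = m / (n + 1) is the plotting position, K is the standard normal
    value with exceedance probability p, and y is the flow (or log10 flow).
    """
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    std_normal = NormalDist()
    points = []
    for m, q in enumerate(ranked, start=1):
        p = m / (n + 1)
        k = std_normal.inv_cdf(1.0 - p)  # exceeded with probability p
        y = math.log10(q) if log_scale else q
        points.append((p, k, y))
    return points


if __name__ == "__main__":
    sample = [120, 85, 64, 240, 51, 97, 150, 73, 310]  # nine hypothetical observations
    for p, k, y in probability_paper_points(sample, log_scale=True):
        print(f"p = {p:.2f}  K = {k:+.3f}  log10(Q) = {y:.3f}")
```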
The following plot uses the Big Sioux River data on a normal plot. For this plot, the flow values are plotted on an arithmetic scale and the probabilities are plotted on a scale that is distorted so that a normal distribution would plot as a straight line. Note that this is not a log scale. Besides the data points (which do not fall along a straight line, an indication that this is not a suitable distribution), a straight line corresponding to the best fit for these data is also plotted. The equation for this line is based on the data characteristics:

$$x = \bar{x} + K\sigma = 13{,}844 + 14{,}505K$$

To plot the straight line, values of $K$ are obtained from the prior table of $K$ values for the normal distribution for selected $p$ values. The flow values are then calculated for these $p$ values and plotted to form the line.

(Figure: Big Sioux River annual peak flows on normal probability paper, with the fitted normal line; Prasuhn 1987.)

Not a very good fit, so the following plot for the log-normal distribution is attempted, using the same data but on log-normal probability paper. The flow values are plotted on the log scale, and the same probability vs. flow procedure is used to plot the straight line. The actual equation for the straight line is:

$$\log x = \overline{\log x} + K\sigma_{\log x} = 3.949 + 0.4380K$$

This plot also shows the log-Pearson Type III curve, using the calculated skew.

(Figure: Big Sioux River annual peak flows on log-normal probability paper, with the fitted log-normal line and log-Pearson Type III curve; Prasuhn 1987.)

The curved line for the log-Pearson Type III distribution is obtained the same way, except that the calculated skew value is used to obtain the $K$ parameter for the equation. In this example, the skew coefficient $a$ is -0.368. Neither of these fitted lines is a perfect fit to the observed data, and they lead to very different results when used to extrapolate to large recurrence interval flows. The log-Pearson curve fits the overall data range better, but the log-normal line fits the 3 largest values (usually of most interest) better. The best choice is therefore sometimes difficult to determine.

What is the expected discharge having a 200-year recurrence interval ($t_p = 200$ years, $p = 0.005$)? The calculated lines could be extended to this value, or the equations can be used directly.
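Using the equations directly amounts to evaluating $\log x = \overline{\log x} + K\sigma_{\log x}$ at the required $p$. The Python sketch below does this (function names are hypothetical). For zero skew, $K$ is the standard normal value. For nonzero skew it substitutes the Wilson-Hilferty approximation, $K \approx (2/a)\{[1 + az/6 - a^2/36]^3 - 1\}$ with $z$ the standard normal value, in place of the tabulated log-Pearson Type III frequency factors. That approximation is a swapped-in technique, not the table used in these notes, and it drifts from the table at large skews; for the skews in these examples it agrees with the table to within about 0.01 in $K$.

```python
from statistics import NormalDist


def frequency_factor(p: float, skew: float = 0.0) -> float:
    """Frequency factor K for annual exceedance probability p.

    skew == 0: standard normal value exceeded with probability p.
    skew != 0: Wilson-Hilferty approximation to the Pearson Type III
    factor (an approximation, not the published table values).
    """
    z = NormalDist().inv_cdf(1.0 - p)
    if abs(skew) < 1e-6:
        return z
    return (2.0 / skew) * ((1.0 + skew * z / 6.0 - skew ** 2 / 36.0) ** 3 - 1.0)


def design_flow(mean_log: float, sigma_log: float, t_p: float, skew: float = 0.0) -> float:
    """Flow with return period t_p: Q = 10**(mean_log + K * sigma_log)."""
    return 10.0 ** (mean_log + frequency_factor(1.0 / t_p, skew) * sigma_log)


if __name__ == "__main__":
    # Big Sioux River log statistics from the notes: mean 3.949, sigma 0.4380, skew -0.368
    for label, a in (("log-normal", 0.0), ("log-Pearson III", -0.368)):
        q200 = design_flow(3.949, 0.4380, 200.0, a)
        print(f"{label:16s} 200-year flow ~ {q200:,.0f} cfs")
```

Both printed values fall within about 1% of the table-based results that follow, and the same function reproduces Example 5-2 when called with a mean of 2.7 and a standard deviation of 0.65.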
Log-normal: $p = 0.005$ and $a = 0$, so $K = 2.576$ from the table. Therefore:

$$\log x = 3.949 + (0.4380)(2.576) = 5.0773 \quad\text{and}\quad x = 10^{5.0773} \approx 119{,}500 \text{ cfs}$$

Log-Pearson Type III: $p = 0.005$ and $a = -0.368$, so $K = 2.231$ from the table. Therefore:

$$\log x = 3.949 + (0.4380)(2.231) = 4.9262 \quad\text{and}\quad x = 10^{4.9262} \approx 84{,}400 \text{ cfs}$$

(Figure: Extreme flow events are usually well outside of the normal channel, and measurement accuracy suffers. Chin 2000)

There is considerable discrepancy between these two predicted values. The larger value is more conservative and is more consistent with the larger observed discharges. However, the lower value is probably the better estimate, as it fits the complete data set better. Measurements of the large actual discharges are subject to considerable error, as these flows likely occurred during flood-stage conditions when the flow measurement station may have been submerged, or beyond its calibration depths, requiring crude estimates of actual flows based on physical evidence. Also, the largest flow was not likely associated with an exact 54-year event. There is no way of knowing what size event it was; it could have been associated with a much rarer event that just happened to occur during the shorter period of record.

Normally, the log-Pearson distribution is recommended because it considers the skew parameter, but caution is needed, as unrealistic and excessive values of skew may occur for a particular river. Regional skew values should also be examined.

Example 5-4 (Prasuhn 1987)

Use the three distribution methods to predict the 50-year flood on the Big Sioux River at Akron, Iowa.

Solution:

(1) Normal distribution: $t_p = 50$ yrs, $p = 0.02$. Therefore $K = 2.054$ and:

$$x = 13{,}844 + (2.054)(14{,}505) \approx 43{,}600 \text{ cfs}$$

(2) Log-normal distribution: $a = 0$, $K = 2.054$:

$$\log x = 3.949 + (2.054)(0.4380) = 4.849 \quad\text{and}\quad x = 10^{4.849} \approx 70{,}600 \text{ cfs}$$
(3) Log-Pearson Type III: $a = -0.368$, $K = 1.852$:

$$\log x = 3.949 + (1.852)(0.4380) = 4.760 \quad\text{and}\quad x = 10^{4.760} \approx 57{,}600 \text{ cfs}$$

Obviously, the application of statistical methods is not an exact science. The methods are extremely helpful in the interpretation of hydrologic data and as prediction tools for design, but the engineer must be aware of the limitations involved.

Not only are flood conditions important; drought conditions can also be evaluated using the same methods. When dealing with rainfall, the intensity of the precipitation as well as the overall quantity is important.

References

Chin, David A. Water-Resources Engineering. Prentice Hall. 2000.
Prasuhn, Alan L. Fundamentals of Hydraulic Engineering. Holt, Rinehart and Winston. 1987.

Homework Problem: Repeat the Big Sioux River analysis, but only use data from the last 10 years of record (1972 to 1981). Predict the 50- and 200-year flows using the log-Pearson Type III distribution. What are the limitations of using a short period of observations?