0. Key Statistical Measures of Data

Four principal features characterize a set of observations of a random variable: (i) the central tendency, or the value around which all other values are bunched; (ii) the spread of the sample data around the mean; (iii) the asymmetry or skewness of the spread of the data; and (iv) the peakedness of the data. These characteristics are expressed in terms of statistical properties which are estimated from the sample data.

0.. Measures of Central Tendency

In statistics, various measures of central tendency are employed. Three important measures are the following.

(i) Arithmetic Mean: If $x_1, x_2, \dots, x_n$ represent a series of n observations, the mean of this series is:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (0.5)$$

where $\bar{x}$ represents the sample mean; the mean of the population is generally denoted by $\mu$.

(ii) Mode: It is the value which occurs most frequently, i.e., the peak value of the PDF. A data set may have more than one peak.

(iii) Median: It is the middle value of the ranked observations of a data set. The median divides the distribution into two equal parts.

0.. Measures of Dispersion or Variation

Three statistical measures of variation of data are commonly used.

(i) Variance: It represents the scatter of the data about the mean. Variance is computed by:

$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 \qquad (0.)$$

A small value of variance implies that the values are bunched close to the mean.
(ii) Standard Deviation (SD): The standard deviation (s) is computed as the square root of the variance:

$$s = \left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{0.5} \qquad (0.7)$$

When n < 30, the unbiased estimate of s is found by replacing n by n - 1 in the denominator. The Greek letter $\sigma$ is used to denote the standard deviation of the population.

(iii) Coefficient of Variation (CV): It is a dimensionless parameter obtained by dividing the standard deviation by the mean:

$$C_v = s/\bar{x} \qquad (0.8)$$

When the mean of the data is zero, $C_v$ is not defined. This coefficient is useful for comparing different populations: given two samples of data, the one with the larger $C_v$ has more spread of values relative to its mean.

Example 0.: Average annual flows (in cumec) at a river gauging site are given in the table below. Compute the mean, variance, standard deviation, and coefficient of variation of the flows.

Year          1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980
Flow (cumec)  5.5 45.4 48. 4.7 05. 0. 0. 4.4 7..8.0

Year          1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
Flow (cumec)  .4 40. 45...5 0.0 0..8 80.5. 4.

Solution: There are 21 values in total. The mean of the flows can be computed as

Mean = (5.5 + 45.4 + 48. + ... + 80.5 + . + 4.)/21 = 7.8 cumec.

The variance can be computed by eq. (0.):

Variance s^2 = [(5.5 - 7.8)^2 + (45.4 - 7.8)^2 + (48. - 7.8)^2 + ... + (. - 7.8)^2 + (4. - 7.8)^2]/21 = .4 (cumec)^2

SD s = (.4)^0.5 = 8. cumec

CV = 8./7.8 = 0.47

0.. Measures of Symmetry

Hydrologic data are usually not distributed symmetrically around the mean. If the data to the
right of the mean are more spread out than those on the left then, by convention, the asymmetry is positive, and vice versa for negative asymmetry (see Fig. 0.4). If the data are symmetrically placed around the mean, the measure of symmetry is zero. The third moment of the data about the mean is used to indicate symmetry and is given by:

$$M_3 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3 \qquad (0.)$$

It is easy to see that this moment is zero if the data are symmetrical; otherwise $M_3$ takes a positive or negative value. Because the third central moment has dimensions equal to the cube of those of the data, it is not useful for comparing different data sets. Being non-dimensional, the coefficient of skewness does not have this disadvantage and is preferred.

Coefficient of Skewness: A non-dimensional measure of the asymmetry of the distribution of the data is helpful when various data sets are to be compared, and the coefficient of skewness is one such measure. The coefficient of skewness ($C_s$) is given by:

$$C_s = \frac{n\sum_{i=1}^{n}(x_i - \bar{x})^3}{(n-1)(n-2)\,s^3} \qquad (0.0)$$

Symmetrical frequency distributions have a very small or negligible value of the skewness coefficient $C_s$, while asymmetrical frequency distributions have either positive or negative coefficients. A small value of $C_s$ indicates that the probability distribution may be approximated by the normal distribution, since $C_s = 0$ for this distribution. The symmetrical and skewed distributions are shown in Fig. 0.4.

[Figure: f(x) versus x in three panels. Negative skew: mean < median < mode. Zero skew: mean = median = mode. Positive skew: mode < median < mean.]
Fig. 0.4 Symmetrical and asymmetrical (+ve and -ve) skewed distributions.

0..4 Measures of Peakedness or Flatness

The measure used to denote the peakedness or flatness of the frequency distribution near its centre is known as the kurtosis coefficient. This coefficient is computed by:

$$C_k = \frac{n^2\sum_{i=1}^{n}(x_i - \bar{x})^4}{(n-1)(n-2)(n-3)\,s^4} \qquad (0.)$$

The normal distribution has a kurtosis of 3. If a data set has a relatively greater concentration near the mean than the normal distribution, the kurtosis will be greater than 3. Conversely, if the data have a relatively smaller concentration near the mean than the normal distribution, the kurtosis will be less than 3.

Example 0.: Compute the coefficient of skewness and the coefficient of kurtosis of the data of Example 0..

Solution: The coefficient of skewness can be computed by eq. (0.0):

C_s = [21/(20*19*8.^3)]*[(5.5 - 7.8)^3 + (45.4 - 7.8)^3 + ... + (. - 7.8)^3 + (4. - 7.8)^3] = .8

A positive value of $C_s$ implies that the probability distribution of the data has a heavy tail to the right.

Kurtosis can be computed with the formula for $C_k$ given above:

C_k = [21*21/(20*19*18*8.^4)]*[(5.5 - 7.8)^4 + (45.4 - 7.8)^4 + ... + (. - 7.8)^4 + (4. - 7.8)^4] = .45

Since the kurtosis is less than 3, the data values are less concentrated around the mean than in the normal distribution; that is, the peak of the distribution is flatter than that of the
normal distribution.

0. Graphical Presentation of Data

Graphical presentation gives good insight into the behavior and variation of the data. To present the data graphically in the form of histograms, a frequency table is prepared. For this purpose the range of the data is divided into a number of intervals of convenient size, and the frequency of values occurring in each interval is entered alongside. The appearance of a frequency histogram depends upon the selection of the class interval. If the class intervals are very large, the table is compact but details may be lost; if the intervals are too small, the table may be too bulky. The following guidelines may be considered while choosing the class interval:

(a) Brooks and Carruthers rough guide:

$$\text{Number of classes} \le 5\log_{10}(\text{sample size}) \qquad (0.)$$

(b) Charlier's rule of thumb:

w = (maximum value - minimum value)/0   (0.)

where w is the size of the class interval. In general the number of classes varies between and 5.

To prepare the frequency table, the steps given below can be followed:
(i) Arrange the variable ($X_i$) in increasing or decreasing order of magnitude.
(ii) Decide the number of class intervals (NC) and the size of the class interval $\Delta X$.
(iii) Divide the ordered observations $X_i$ into NC intervals.
(iv) Determine the absolute frequency $n_j$ as the number of observations that fall in the j-th class interval, j = 1, ..., NC.
(v) Compute the relative frequencies of the various classes as $n_j/n$, j = 1, ..., NC, where n is the number of observations.
(vi) Compute the cumulative relative frequencies $F_j$, j = 1, ..., NC.
(vii) Plot the relative frequencies as well as the cumulative relative frequencies, with the group interval as abscissa and the relative frequency or cumulative relative frequency as ordinate.

Example 0.: The annual flow of the Sabarmati River at Dharoi is plotted in Fig. 0.5 for the
period 1898-1995. Plot the histogram and the cumulative histogram.

Fig. 0.5 Plot of the annual flow of the river.

Solution: After examining the data, which have 98 values, the class interval was chosen as 100 MCM. Table 0. shows the mid-values of the classes into which the data have been divided, the frequency of values in each class and the cumulative frequencies. There are 17 classes.

Table 0. Mid-values and frequencies of various classes of example data.
Mid-value of class (MCM):  150  250  350  450  550  650  750  850  950  1050  1150  1250
Frequency:  5
Cumulative frequency:  5  5  44  5  7  78  84  85  0
Mid-value of class (MCM):  1350  1450  1550  1650  1750
Frequency and cumulative frequency:  0  0  5  5  7  7  8

The histogram and the cumulative histogram of the annual flow of the Sabarmati River at Dharoi for the period 1898-1995 (98 years) are plotted in Fig. 0. and Fig. 0.7, respectively.

Fig. 0. Histogram of the annual river flows.
Fig. 0.7 Cumulative histogram of annual flows of the river.
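The measures of central tendency and dispersion defined at the start of this section can be checked numerically. The Python sketch below is a minimal illustration under stated assumptions: the function names and the six sample values are hypothetical (they are not the flows of the worked example), the variance uses divisor n as in the variance formula, and `unbiased=True` switches to the n - 1 divisor described for small samples (n < 30).

```python
import math
from collections import Counter
from typing import List, Sequence


def sample_mean(x: Sequence[float]) -> float:
    """Arithmetic mean: sum of the n observations divided by n."""
    return sum(x) / len(x)


def sample_median(x: Sequence[float]) -> float:
    """Middle value of the ranked observations (mean of the two middle values when n is even)."""
    ranked = sorted(x)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return float(ranked[mid])
    return (ranked[mid - 1] + ranked[mid]) / 2


def sample_modes(x: Sequence[float]) -> List[float]:
    """Most frequently occurring value(s); a data set may have more than one mode."""
    counts = Counter(x)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def sample_variance(x: Sequence[float], unbiased: bool = False) -> float:
    """Squared deviations about the mean, divided by n (or by n - 1 when unbiased=True)."""
    n = len(x)
    xbar = sample_mean(x)
    ss = sum((xi - xbar) ** 2 for xi in x)
    return ss / (n - 1 if unbiased else n)


def sample_sd(x: Sequence[float], unbiased: bool = False) -> float:
    """Standard deviation: square root of the variance."""
    return math.sqrt(sample_variance(x, unbiased))


def coeff_of_variation(x: Sequence[float], unbiased: bool = False) -> float:
    """C_v = s / mean; not defined when the mean is zero."""
    xbar = sample_mean(x)
    if xbar == 0:
        raise ValueError("C_v is not defined when the mean is zero")
    return sample_sd(x, unbiased) / xbar


# Hypothetical values, for illustration only
flows = [12.0, 15.0, 15.0, 18.0, 20.0, 22.0]
print("mean     :", sample_mean(flows))
print("median   :", sample_median(flows))
print("mode(s)  :", sample_modes(flows))
print("variance :", sample_variance(flows))
print("SD       :", sample_sd(flows))
print("CV       :", coeff_of_variation(flows))
```

Python's standard `statistics` module offers comparable functions (`statistics.mean`, `statistics.median`, `statistics.multimode`, `statistics.pvariance` with divisor n, `statistics.variance` with divisor n - 1); hand-written versions are used here only so that each line mirrors a formula of the text.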
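The coefficients of skewness and kurtosis from the section on symmetry and peakedness can be coded the same way. This is a hedged sketch that transcribes the $C_s$ and $C_k$ formulas literally. Following the worked example, where the variance was divided by n = 21, s is computed with divisor n by default; this is an assumption, and `unbiased=True` gives the n - 1 form instead. The function names and data sets are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _deviation_sums(x: Sequence[float], unbiased: bool) -> Tuple[int, float, float, float]:
    """Return n, the sum of cubed deviations, the sum of fourth-power deviations, and s."""
    n = len(x)
    xbar = sum(x) / n
    dev = [xi - xbar for xi in x]
    ss = sum(d ** 2 for d in dev)
    s = math.sqrt(ss / (n - 1 if unbiased else n))
    if s == 0:
        raise ValueError("all observations are equal, so s = 0")
    return n, sum(d ** 3 for d in dev), sum(d ** 4 for d in dev), s


def coeff_skewness(x: Sequence[float], unbiased: bool = False) -> float:
    """C_s = n * sum((x_i - xbar)^3) / ((n - 1)(n - 2) s^3)."""
    if len(x) < 3:
        raise ValueError("C_s needs at least 3 observations")
    n, sum3, _, s = _deviation_sums(x, unbiased)
    return n * sum3 / ((n - 1) * (n - 2) * s ** 3)


def coeff_kurtosis(x: Sequence[float], unbiased: bool = False) -> float:
    """C_k = n^2 * sum((x_i - xbar)^4) / ((n - 1)(n - 2)(n - 3) s^4)."""
    if len(x) < 4:
        raise ValueError("C_k needs at least 4 observations")
    n, _, sum4, s = _deviation_sums(x, unbiased)
    return n ** 2 * sum4 / ((n - 1) * (n - 2) * (n - 3) * s ** 4)


symmetric = [1, 2, 3, 4, 5]       # hypothetical data, symmetric about the mean
right_tailed = [1, 1, 1, 2, 10]   # hypothetical data with a long right tail
print(coeff_skewness(symmetric))          # 0.0
print(coeff_skewness(right_tailed) > 0)   # True
print(coeff_kurtosis(symmetric))
```

The symmetric set gives zero skewness because the cubed deviations cancel pairwise, while the single large value in the second set produces a positive coefficient, i.e., a heavy right tail.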
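Steps (i) to (vi) for preparing the frequency table can be sketched in Python as follows, with the number of classes capped by the Brooks and Carruthers guide. This is a minimal sketch under assumptions: the function names are hypothetical, each class is taken as closed on the left and open on the right, the ten flow values are invented for illustration (they are not the Dharoi data), and step (vii), the plotting, is left to any plotting library.

```python
import math
from typing import List, Sequence, Tuple


def brooks_carruthers_max_classes(n: int) -> int:
    """Brooks and Carruthers rough guide: number of classes <= 5 log10(n), rounded down."""
    return math.floor(5 * math.log10(n))


def frequency_table(
    x: Sequence[float], lower: float, width: float, nc: int
) -> List[Tuple[float, int, float, float]]:
    """Return rows of (mid-value, n_j, n_j / n, F_j) for nc classes of equal width.

    Assumed convention: class j covers [lower + j*width, lower + (j+1)*width).
    """
    n = len(x)
    ordered = sorted(x)                  # step (i): arrange in increasing order
    counts = [0] * nc                    # step (ii): nc classes of size `width`, chosen by the caller
    for v in ordered:                    # steps (iii)-(iv): count observations in each class
        j = int((v - lower) // width)
        if not 0 <= j < nc:
            raise ValueError(f"value {v} lies outside the chosen classes")
        counts[j] += 1
    rows = []
    running = 0
    for j, nj in enumerate(counts):
        running += nj                    # step (vi): cumulative count
        mid = lower + (j + 0.5) * width  # mid-value of class j
        rows.append((mid, nj, nj / n, running / n))  # step (v): relative frequency
    return rows


# Hypothetical annual flows in MCM, for illustration only
flows_mcm = [120, 180, 230, 260, 310, 340, 390, 450, 470, 520]
nc = brooks_carruthers_max_classes(len(flows_mcm))  # 5 classes for 10 values
table = frequency_table(flows_mcm, lower=100, width=100, nc=nc)
for mid, nj, rel, cum in table:
    print(f"{mid:7.1f} {nj:3d} {rel:5.2f} {cum:5.2f}")
```

The left-closed convention means a value lying exactly on the upper edge of the last class raises an error; in practice the lower bound and class width would be chosen so that the whole range of the data is covered.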