Section-2. Data Analysis


Kelley Simpson
6 years ago

Short Questions:

Question 1: What is data?
Answer: Data is the substrate of the decision-making process. Data is a measure of some observable characteristic of a set of objects of interest. Statistics is a vast area of applied mathematics in which data are collected, classified, presented and analyzed for a specific purpose.

Question 2: What is the role of statistics in business decisions?
Answer: Statistics plays an important role in business because it provides the quantitative basis for arriving at decisions in all matters connected with business operations. Statistics helps a business plan production according to the tastes of consumers. It can also serve as a management tool for evaluating the performance of machines and personnel. It also enables the businessman to judge the efficiency of new production methods by studying the relationship between costs and methods of production.

Question 3: Define frequency table.
Answer: Frequency is the number of occurrences of a data item. A table that summarizes the number of cases against a column of interest is called a frequency table.

Question 4: What is central tendency?
Answer: In a series of statistical data, the parameter that reflects a central value of the series is called the central tendency. Central tendency refers to a single value that represents the whole set of data.

Question 5: Define average and discuss the various types of averages.
Answer: An average can be defined as a central value around which the other values of a series tend to cluster. An average is computed to give a concise picture of a large group: by using an average, complex groups of large numbers are presented in a few significant words or figures. Averages also help in obtaining a picture of the universe (population) from a sample. Although the sample and the universe differ in size, their averages may be very nearly identical.
Averages may be classified into three broad types:
1) Mathematical averages: a) arithmetic mean, b) geometric mean, c) harmonic mean.
2) Positional averages: a) mode, b) median.
3) Commercial averages: a) moving average, b) progressive average, c) quadratic average.

Question 6: What do you understand by the term range in statistics?
Answer: The range of a data set is the difference between the largest value and the smallest value. For example, for the runs scored by two batsmen A and B, we had some idea of the variability of the scores from the minimum and maximum runs in each series. To express this as a single number, we take the difference between the maximum and minimum values of each series; this difference is called the range of the data. For batsman A the range is 117 and for batsman B it is 14. Clearly, range of A > range of B. Therefore, A's scores are scattered or dispersed, while B's are close to each other. Thus, range of a series = maximum value - minimum value.

Question 7: Define mean deviation.
Answer: Mean deviation, also known as average deviation, is the mean of the absolute amounts by which the individual items deviate from the mean. The following procedure is usually applied:
1) Calculate the absolute deviation of each item from the mean, ignoring negative signs.
2) Add all the deviations.
3) Divide the sum of the deviations by the total number of items.
Symbolically, for a sample of size n, the mean deviation is
$MD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
where $\bar{x}$ is the arithmetic mean of the variable x.

Question 8: What is skewness?
Answer: Skewness is a measure of the lack of symmetry, or the degree of departure from symmetry, of a distribution. A normal distribution is perfectly symmetric and has zero skewness.
Negative skew: The left tail is longer; the mass of the distribution is concentrated on the right of the figure. It has a few relatively low values. The distribution is said to be left-skewed. In such a distribution the mean is lower than the median, which in turn is lower than the mode (i.e., mean < median < mode), and the skewness coefficient is less than zero.
Example (observations): 1, 1000, 1001, 1002, 1003
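The mean-versus-median comparison described above can be tried on this example and on the positive-skew example 1, 2, 3, 4, 100 given in the text. The following minimal Python sketch does that. The helper name `skew_direction` is illustrative and does not come from the source. Comparing the mean with the median is only a rule of thumb, not a formal skewness coefficient.

```python
from statistics import mean, median

def skew_direction(data):
    """Rule of thumb from the text: mean < median suggests negative (left) skew,
    mean > median suggests positive (right) skew."""
    m, md = mean(data), median(data)
    if m < md:
        return "negative (left) skew"
    if m > md:
        return "positive (right) skew"
    return "no skew indicated (mean = median)"

left_example = [1, 1000, 1001, 1002, 1003]   # negative-skew example above
right_example = [1, 2, 3, 4, 100]            # positive-skew example in the text

for data in (left_example, right_example):
    # Print the data, its mean, its median and the suggested direction of skew
    print(data, mean(data), median(data), skew_direction(data))
```

For the first set, the single low value 1 drags the mean (801.4) well below the median (1001). The single high value 100 does the opposite in the second set.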


Positive skew: The right tail is longer; the mass of the distribution is concentrated on the left of the figure. It has a few relatively high values. The distribution is said to be right-skewed. In such a distribution the mean is greater than the median, which in turn is greater than the mode (i.e., mean > median > mode), and the skewness coefficient is greater than zero.
Example (observations): 1, 2, 3, 4, 100
In a skewed (unbalanced, lopsided) distribution, the mean lies farther out in the long tail than the median does. If there is no skewness, i.e., the distribution is symmetric like the bell-shaped normal curve, then mean = median = mode.

Question 9: Discuss the merits and demerits of standard deviation.
Answer:
Merits
(1) The standard deviation is the best measure of variation because of its mathematical characteristics. It is based on every item of the distribution, it is amenable to algebraic treatment, and it is less affected by sampling fluctuations than most other measures of dispersion.
(2) It is possible to calculate the combined standard deviation of two or more groups. This is not possible with any other measure.
(3) For comparing the variability of two or more distributions, the coefficient of variation, which is based on the mean and the standard deviation, is considered the most appropriate measure.
(4) The standard deviation is the measure most prominently used in further statistical work.
Limitations
(1) Compared with other measures, it is difficult to compute. However, this does not reduce its importance, because of the high degree of accuracy of the results it gives.
(2) It gives more weight to extreme items and less to those near the mean, because the squares of large deviations are proportionately greater than the squares of comparatively small deviations.

Question 10: Calculate the arithmetic mean for the following data:

Serial number | Height of student
Answer: Calculation of arithmetic mean (n = 5):
Mean $\bar{X} = \frac{\sum X}{n}$

Question 11: Find the mean of the first n natural numbers.
Answer: Since $\bar{X} = \frac{\sum x_i}{n}$ and the sum of the first n natural numbers is $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$, we get
$\bar{X} = \frac{n(n+1)/2}{n} = \frac{n+1}{2}$

Question 12: Find the arithmetic mean of the given $x_i$ and frequencies.
x_i | f
Answer: Calculation of arithmetic mean

x_i | f | fx
$\sum x_i = 69$, $\sum fx = 1478$
A.M. $= \frac{\sum fx}{\sum f}$

Question 13: Find the mode of the given data.
Family size | No. of families
Answer: Here l = 3, h = 2, $f_0 = 7$ and $f_2 = 2$, and $f_1$ is the frequency of the modal class. Then
Mode $= l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h = 3 + \left[\frac{f_1 - 7}{2f_1 - 7 - 2}\right] \times 2 = 3.28$ (answer)

Question 14: Find the mode of the given data.
Age x | No. of people f
Answer: Calculation of mode (columns: age x, no. of people f, cumulative frequency cf). Here l = 35, h = 10, $f_0 = 21$, $f_1 = 23$ and $f_2 = 14$, so
Mode $= 35 + \frac{23 - 21}{2(23) - 21 - 14} \times 10 = 35 + \frac{2}{11} \times 10 \approx 36.82$

Question 15: Find the mean deviation about the mean for the given data: 6, 7, 10, 12, 13, 4, 8, 12.
Answer: $\sum x_i = 72$ and n = 8, so $\bar{X} = \frac{72}{8} = 9$.
The absolute deviations $|x_i - \bar{X}|$ are 3, 2, 1, 3, 4, 5, 1, 3, so $\sum |x_i - \bar{X}| = 22$.
M.D. $= \frac{\sum |x_i - \bar{X}|}{n} = \frac{22}{8} = 2.75$

Question 16: Why study dispersion?
Answer: A measure of location, such as the mean or the median, describes only the center of the data. It tells us nothing about the spread of the data. For example, if your nature guide told you that the river ahead averaged 3 feet in depth, would you want to wade across on foot without additional information? Probably not. You would want to know something about the variation in the depth. A second reason for studying the dispersion of a set of data is to compare the spread of two or more distributions.

Question 17: Write a short note on the properties of the median.
Answer:
1. There is a unique median for each data set.
2. It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.
3. It can be computed for ratio-level, interval-level and ordinal-level data.
4. It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.

Question 18: Discuss the merits and demerits of the arithmetic mean.
Answer:
Merits: The arithmetic mean is widely used in practice for the following reasons:
1. It is the simplest to understand and the easiest to compute. Neither arranging the data, as required for the median, nor grouping the data, as required for the mode, is needed.
2. It is affected by the value of every item in the series.
3. It is defined by a rigid mathematical formula, so everyone who computes the average gets the same answer.
Demerits:
1. The arithmetic mean is not always a good measure of central tendency, for instance in extremely asymmetrical distributions.
2. Since the value of the mean depends upon each and every item of the series, extreme items (very small and very large items) unduly affect the value of the average.
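The arithmetic of Question 15 and Demerit 2 of the arithmetic mean can both be checked with a minimal Python sketch. The function names are illustrative. The extra item 100 is a hypothetical value, added only to show how one extreme item moves the mean much more than the median.

```python
from statistics import median

def arith_mean(values):
    """Arithmetic mean: sum of the items divided by their number."""
    return sum(values) / len(values)

def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

def mean_deviation(values):
    """Mean of the absolute deviations of the items from their arithmetic mean."""
    m = arith_mean(values)
    return sum(abs(x - m) for x in values) / len(values)

scores = [6, 7, 10, 12, 13, 4, 8, 12]          # data of Question 15
print(arith_mean(scores), mean_deviation(scores), data_range(scores))  # 9.0 2.75 9

# One hypothetical extreme item pulls the mean much more than the median.
with_extreme = scores + [100]
print(round(arith_mean(with_extreme), 2), median(with_extreme))  # 19.11 10
```

Adding the single item 100 roughly doubles the mean (9 to about 19.1), while the median only moves from 9 to 10.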

Question 19: Discuss the merits and demerits of the median.
Answer:
Merits
1. It is especially useful for open-end classes, since only the position of the items, not their values, must be known.
2. In a markedly skewed distribution, such as an income or price distribution, where the arithmetic mean would be distorted by extreme values, the median is especially useful.
3. The value of the median can be determined graphically, whereas the value of the mean cannot be ascertained graphically.
Demerits
1. To calculate the median it is necessary to arrange the data; other averages do not need any arrangement.
2. The value of the median is affected more by sampling fluctuations than the value of the arithmetic mean.

Question 20: Discuss the merits and demerits of the mode.
Answer:
Merits
1. It is not affected by extremely large or small items.
2. The value of the mode can also be determined graphically, whereas the value of the mean cannot be ascertained graphically.
Demerits
1. The value of the mode is not based on each and every item of the series.
2. It is not a rigidly defined measure. There are several formulae for calculating the mode, and they usually give somewhat different answers.

Long Questions:

Question 1: Write short notes on the following: a. Arithmetic mean b. Weighted average c. Geometric mean d. Harmonic mean
Answer:

a. Arithmetic mean: The arithmetic mean or simple mean (represented by putting a bar above the variable name) is the quantity obtained by dividing the sum of the values of the items ($\sum X$) of a variable by their number n, i.e., the number of items:
$\bar{X} = \frac{\sum X}{n}$
b. Weighted average: In calculating the simple arithmetic mean, it is assumed that all items are equal in importance. This may not always be the case. When items vary in importance, they should be assigned weights in proportion to their relative importance. To calculate the weighted arithmetic mean, the value of each item is multiplied by its weight, the products are summed, and the total is divided by the sum of the weights (not by the number of items). The result is the weighted arithmetic average:
$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum wX}{\sum w}$
Here $w_1, w_2, w_3, \dots$ stand for the respective weights of the items.
c. Geometric mean: The geometric mean is defined as the positive nth root of the product of the n items of a series. If there are two items, we take the square root; if there are three items, we take the cube root; and so on:
$G.M. = \sqrt[n]{X_1 \times X_2 \times \dots \times X_n}$
d. Harmonic mean: The harmonic mean is based on the reciprocals of the numbers averaged. It is defined as the reciprocal of the arithmetic mean of the reciprocals of the individual observations:
$H.M. = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$

Question 2: Write short notes on the following: a. Mean b. Median c. Mode
Answer:
a. Mean: The arithmetic mean or simple mean (represented by putting a bar above the variable name) is the quantity obtained by dividing the sum of the values of the items ($\sum X$) of a variable by their number n, i.e., the number of items: $\bar{X} = \frac{\sum X}{n}$.
b. Median: The median is the value of the item that divides the data set into two equal parts. One part consists of all the values less than it and the other of all the values greater than it.

Defined another way, the median is the measure of central tendency that divides the total frequency into two halves. When n is odd, the median is the value at the middle position $\frac{n+1}{2}$. When n is even, it is the average of the values at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.
c. Mode: A third type of central value, or centre of the distribution, is the value of greatest frequency or, more precisely, of greatest frequency density. Graphically, it is the value on the X-axis below the peak, or highest point, of the frequency curve. This is called the mode. For grouped data,
$Mode = L_1 + \frac{d_1}{d_1 + d_2} \times C$
where $L_1$ = lower boundary of the class containing the largest frequency (the modal class), $d_1$ = difference between the largest frequency and the frequency of the preceding class, $d_2$ = difference between the largest frequency and the frequency of the next class, and C = class interval.

Question 3: Write short notes on the following: a. Quartiles b. Deciles c. Percentiles d. Moving average e. Quadratic average
Answer:
a. Quartiles: Quartiles are another set of positional measures. Just as the median divides the data into two equal parts, the quartiles divide the entire data set into four equal parts. Three quartiles, $Q_1$, $Q_2$ and $Q_3$, are therefore possible in a data set ($Q_2$ is the median). The general idea remains the same: the data values are first arranged in either ascending or descending order.

b. Deciles: In a manner similar to the median and quartiles, the data set, when arranged in ascending or descending order, can be divided into 10 equal parts. Each point of division is called a decile. Thus there are nine deciles, represented as $D_1, D_2, D_3, \dots, D_9$. The interpretation of a decile is similar to that of the median and the quartiles.
c. Percentiles: The data set can also be divided into 100 equal parts, and each point of division is called a percentile. The 99 percentiles are represented by $P_1, P_2, P_3, \dots, P_{99}$.
A general formula for all the positional measures of a grouped frequency distribution is
$T_i = L_{T_i} + \frac{\frac{iN}{k} - cf}{f} \times h$
In this formula, $T_i$ is the ith positional measure (k = 4 for quartiles, 10 for deciles, 100 for percentiles) and $L_{T_i}$ is the lower boundary of the class containing $T_i$. N is the total frequency, cf is the cumulative frequency of the classes preceding that class, f is the frequency of that class and h is its width.
d. Moving average: The moving average is an arithmetic average of data over a period. It is updated regularly by dropping the first item in the average and adding each new item as it comes in. It is useful for eliminating irregularities in a time series and is generally computed to study the trend.
Example: Suppose the prices for 12 months are given and a three-month moving average is to be computed. The first item of the 3-month moving average is $(a_1 + a_2 + a_3)/3$, and the second is the average of the next three months, $(a_2 + a_3 + a_4)/3$, and so on. The last item is $(a_{10} + a_{11} + a_{12})/3$. When the next month's figure $a_{13}$ comes in, $a_{10}$ is dropped and $a_{13}$ is added, giving $(a_{11} + a_{12} + a_{13})/3$, and so on.
e. Quadratic average: The quadratic mean is obtained by taking the square root of the average of the squares of the items of a series:
$Q_m = \sqrt{\frac{a^2 + b^2 + c^2 + \dots}{n}}$
where $Q_m$ is the quadratic mean, $a^2, b^2, c^2, \dots$ are the squares of the different values and n is the number of values.

The quadratic average is useful when some items have negative values and others positive values, because in such cases the arithmetic mean is not very representative. It is also used to average deviations, rather than original values, when the standard deviation is computed.

Question 4: Write short notes on the following: a. Standard deviation b. Variance c. Coefficient of variation d. Quartile deviation
Answer:
a. Standard deviation: The standard deviation (SD) of a sample is similar to the mean deviation in that it considers the deviation of each X value from the mean. However, instead of using the absolute values of the deviations, it uses their squares. These are added, divided by n, and the square root is extracted:
$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$
b. Variance: The variance is the square of the SD:
$V = SD^2 = \frac{\sum (X - \bar{X})^2}{n}$
c. Coefficient of variation: To get an indication of the variation relative to the mean, we divide the standard deviation by the mean to get the coefficient of variation. This makes it easier to compare two groups that have different standard deviations and means:
$CV = \frac{SD}{\bar{X}} \times 100$
d. Quartile deviation: Half of the interquartile range is called the quartile deviation or semi-interquartile range. Symbolically,
$Q.D. = \frac{Q_3 - Q_1}{2}$
The value of Q.D. gives the average magnitude by which the two quartiles deviate from the median. If the distribution is approximately symmetrical, then $M_d \pm Q.D.$ covers about 50% of the observations, and thus we can write $Q_1 = M_d - Q.D.$ and $Q_3 = M_d + Q.D.$

Question 5: Write short notes on measures of skewness and kurtosis, and on kurtosis vs. skewness.
Answer:
Definition of skewness: For univariate data $Y_1, Y_2, \dots, Y_N$, the formula for skewness is
$\text{skewness} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^3 / N}{s^3}$
where $\bar{Y}$ is the mean, s is the standard deviation, and N is the number of data points. The skewness of a normal distribution is zero, and any symmetric data should have a skewness near zero. Negative values of skewness indicate data that are skewed left, and positive values indicate data that are skewed right. By skewed left, we mean that the left tail is long relative to the right tail. Similarly, skewed right means that the right tail is long relative to the left tail. Some measurements have a lower bound and are skewed right. For example, in reliability studies, failure times cannot be negative.
Definition of kurtosis: For univariate data $Y_1, Y_2, \dots, Y_N$, the formula for kurtosis is
$\text{kurtosis} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^4 / N}{s^4}$
where $\bar{Y}$ is the mean, s is the standard deviation, and N is the number of data points. The kurtosis of a standard normal distribution is three. For this reason, some sources use the following definition of kurtosis (often called excess kurtosis):
$\text{kurtosis} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^4 / N}{s^4} - 3$
This definition is used so that the standard normal distribution has a kurtosis of zero. In addition, with the second definition, positive kurtosis indicates a "peaked" distribution and negative kurtosis indicates a "flat" distribution. Which definition of kurtosis is used is a matter of convention. When using software to compute the sample kurtosis, you need to be aware of which convention is being followed.
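The two definitions above can be sketched in Python. The sketch assumes s is computed with N in the denominator, matching the formulas. Some software divides by N - 1 or applies further bias corrections, so its results may differ slightly. The function name `moment_skew_kurt` is illustrative. The first two data sets are the skewness examples from Question 8 of the short questions. The third is a hypothetical evenly spaced set.

```python
import math

def moment_skew_kurt(data):
    """Skewness, kurtosis and excess kurtosis from the moment formulas above.

    s is the standard deviation with N (not N - 1) in the denominator.
    """
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((y - mean) ** 2 for y in data) / n)
    skew = sum((y - mean) ** 3 for y in data) / n / s ** 3
    kurt = sum((y - mean) ** 4 for y in data) / n / s ** 4
    return skew, kurt, kurt - 3   # kurt - 3 is the "excess kurtosis" convention

for sample in ([1, 2, 3, 4, 100], [1, 1000, 1001, 1002, 1003], [1, 2, 3, 4, 5]):
    skew, kurt, excess = moment_skew_kurt(sample)
    print(sample, round(skew, 3), round(kurt, 3), round(excess, 3))
```

The right-skewed set gives a positive skewness and the left-skewed set a negative one. The evenly spaced set 1 to 5 has zero skewness and a kurtosis of 1.7 (excess -1.3), consistent with the description of flat, uniform-like data as having low kurtosis.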

[Figure: histograms for 10,000 random numbers generated from a normal, a double exponential, a Cauchy, and a Weibull distribution]

Skewness and kurtosis: A fundamental task in many statistical analyses is to characterize the location and variability of a data set. A further characterization of the data includes skewness and kurtosis. Skewness is a measure of symmetry or, more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. Data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak. A uniform distribution would be the extreme case. The histogram is an effective graphical technique for showing both the skewness and the kurtosis of a data set.

Question 6: Find the A.M. of the given distribution by (1) the assumed-mean method and (2) the step-deviation method.
Wages x | No. f
Answer: Calculation of A.M. (columns: wages x, no. f, d = x - A, fd, u = d/20, fu). Let A = 900.
Method (1): A.M. $= A + \frac{\sum fd}{N}$
Method (2): A.M. $= A + \frac{\sum fu}{N} \times i$, with i = 20

Question 6: Find the quartiles of the data given below.
Length c | Leaves f
Answer: Calculation of quartiles (columns: length c, leaves f, cumulative frequency C).
Each quartile is interpolated within its class as
$Q_k = l_1 + \frac{\frac{kN}{4} - C}{f} \times (l_2 - l_1)$
where $l_1$ and $l_2$ are the lower and upper boundaries of the quartile class, f is its frequency, N is the total frequency and C is the cumulative frequency of the preceding classes.
For $Q_2$: $l_1 = 144.5$, f = 12, N = 40, C = 17.
For $Q_1$: f = 9, N = 40, C = 17, giving $Q_1 = 137.5$.
For $Q_3$: f = 5, N = 40, C = 29, giving $Q_3 = 155.3$.
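The general positional formula $T_i$ from the answer to Question 3 covers quartiles, deciles and percentiles. It can be sketched as one Python function that locates the class containing the ith k-tile and interpolates within it. The function name and the grouped distribution below are hypothetical illustration values, not taken from any question in this section.

```python
def grouped_quantile(boundaries, freqs, i, k):
    """i-th k-tile of grouped data: T_i = L + ((i*N/k - cf) / f) * h.

    boundaries -- class boundaries [b0, b1, ..., bm] for m classes
    freqs      -- the m class frequencies
    k = 4 gives quartiles, k = 10 deciles, k = 100 percentiles.
    """
    n = sum(freqs)
    target = i * n / k          # position of the required measure
    cf = 0                      # cumulative frequency of preceding classes
    for j, f in enumerate(freqs):
        if f > 0 and cf + f >= target:
            lower = boundaries[j]                   # L: lower boundary
            h = boundaries[j + 1] - boundaries[j]   # class width
            return lower + (target - cf) / f * h
        cf += f
    raise ValueError("position lies beyond the total frequency")

# Hypothetical grouped distribution (N = 40), for illustration only
bounds = [0, 10, 20, 30, 40, 50]
freqs = [5, 8, 12, 10, 5]
q1, q2, q3 = (grouped_quantile(bounds, freqs, i, 4) for i in (1, 2, 3))
d1 = grouped_quantile(bounds, freqs, 1, 10)
p90 = grouped_quantile(bounds, freqs, 90, 100)
print(q1, round(q2, 2), q3, d1, p90)  # 16.25 25.83 35.0 8.0 42.0
```

For example, $Q_1$ needs position 40/4 = 10. The first class holds only 5 items, so $Q_1$ falls in the class 10 to 20 (cf = 5, f = 8), giving 10 + (10 - 5)/8 × 10 = 16.25.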

Question 7: Find the M.D. about the mean for the given data.
x_i | f_i
Answer: Calculation of M.D. (columns: $x_i$, $f_i$, $f_i x_i$, $|x_i - \bar{x}|$, $f_i |x_i - \bar{x}|$).
M.D.$(\bar{x}) = \frac{\sum f_i |x_i - \bar{x}|}{N}$, with N = 40 and $\bar{x} = \frac{\sum f_i x_i}{N} = 7.5$
M.D. = 2.3

Question 8: Find the median of the given data.
Height in cm (less than ...) | Number of students
Answer: Calculation of median (columns: class interval, f, cumulative frequency cf). Since n = 51 is odd, the median is the 26th observation, which lies in the median class.
Median $= l + \frac{\frac{n}{2} - cf}{f} \times h$
where l = lower limit of the median class, f = frequency of the median class, cf = cumulative frequency of the preceding class and h = class size (here h = 5).

Question 9: Find the M.D. about the median for the following data.
x_i | f_i
Answer: Calculation of M.D. (columns: $x_i$, $f_i$, cf). Since N = 30, which is even, the median is the A.M. of the 15th and 16th observations: Median = 13.

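Columns: $x_i$, $f_i$, $|x_i - M|$, $f_i |x_i - M|$, with $\sum f_i |x_i - M| = 149$.
M.D.$(M) = \frac{\sum f_i |x_i - M|}{N} = \frac{149}{30} \approx 4.97$

Question 10: Find the M.D. about the mean for the following data.
Marks obtained | No. of students
Answer: Calculation of M.D. (columns: marks obtained, $f_i$, $x_i$, $f_i x_i$, $|x_i - \bar{x}|$, $f_i |x_i - \bar{x}|$).
$\sum f_i = 40$, $\sum f_i x_i = 1800$, $\sum f_i |x_i - \bar{x}| = 400$
$\bar{x} = \frac{1800}{40} = 45$
M.D. $= \frac{\sum f_i |x_i - \bar{x}|}{\sum f_i} = \frac{400}{40} = 10$ (answer)

Question 11: Calculate Karl Pearson's coefficient of skewness for the following distribution.
Monthly salary (in Rs.), classes of the form "400 but less than ..." | Number of salesmen
Answer: Calculation of Karl Pearson's coefficient of skewness (columns: salary, mid-point m, f, d = (m - 900)/200, fd, fd²). N = 50, $\sum fd = 5$, $\sum fd^2 = 63$.
Coeff. of Sk. $= \frac{\bar{X} - \text{Mode}}{\sigma}$
Mean: $\bar{X} = A + \frac{\sum fd}{N} \times i$, with A = 900, $\sum fd = 5$, N = 50 and i = 200, so $\bar{X} = 900 + \frac{5}{50} \times 200 = 920$.
Mode: Mode $= L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$. The mode lies in the class whose lower limit is L = 800, giving Mode = 912.5.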

S.D.: $\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i = \sqrt{\frac{63}{50} - \left(\frac{5}{50}\right)^2} \times 200 = \sqrt{1.25} \times 200 \approx 223.61$
Coeff. of Sk. $= \frac{920 - 912.5}{223.61} \approx 0.034$

Question 12: The median of the following data is 525. Find the values of x and y if the total frequency is 100.
Class interval | Frequency (two of the frequencies are the unknowns x and y)
Answer: Columns: class interval, frequency f, cumulative frequency cf; the cumulative frequencies include 7 + x and 56 + x + y. It is given that n = 100, so
76 + x + y = 100, i.e., x + y = 24 ... (1)
The median, 525, lies in the median class. Using the formula Median $= l + \frac{\frac{n}{2} - cf}{f} \times h$, we get
525 = 500 + (14 - x) × 5
25 = 70 - 5x
5x = 70 - 25 = 45
So x = 9. Therefore, from (1), 9 + y = 24, so y = 24 - 9 = 15.

Correlation Analysis

Short Questions:

Question 1: What is correlation analysis?
Answer: Correlation is a measure of the degree of association between two (or more) variables in a data set. Thus, if two variables are known to be highly correlated, one can predict the value of one variable from the value of the other. Two variables, say X and Y, are said to be correlated if:
a. Both increase and decrease together. In this case the variables are said to be positively correlated.
b. When one increases, the other decreases. In this case the variables are said to be negatively correlated.

Question 2: What is a scatter diagram?
Answer: The simplest device for determining the relationship between two variables is a special type of dot chart called a scatter diagram. When this method is used, the given data are plotted on graph paper in the form of dots: for each pair of X and Y values we put a dot, and thus obtain as many points as there are observations. By looking at the scatter of the various points, we can form an idea of whether the variables are related or not. The more the plotted points scatter over the chart, the weaker the relationship between the two variables. The more nearly the points fall on a line, the higher the degree of relationship. If all the points lie on a straight line rising from the lower left-hand corner to the upper right-hand corner, correlation is said to be perfectly positive. On the other hand, if all the points lie on a straight line falling from the upper left-hand corner to the lower right-hand corner of the diagram, correlation is said to be perfectly negative.

Question 3: State Karl Pearson's coefficient of linear correlation.
Answer: We observed that the larger the covariance, the greater the correlation between the two variables. Therefore, covariance can be treated as a measure of correlation between two variables. However, the magnitude of the covariance depends on the units of measurement. The following expression is derived from the covariance but does not depend on the units of measurement. It is called Karl Pearson's coefficient of linear correlation, or simply the coefficient of correlation, and is denoted by r:
$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$
where $x = (X - \bar{X})$, $y = (Y - \bar{Y})$ and N = number of paired observations.

Question 4: Write the properties of the coefficient of correlation.
Answer: Karl Pearson's coefficient of linear correlation possesses a number of very interesting properties, described below.
1. The coefficient of linear correlation always lies between -1 and +1 inclusive: $-1 \le r \le 1$. The value +1 indicates perfect positive linear correlation, while -1 implies perfect negative linear correlation. A value of 0 indicates that no linear correlation exists between the variables.
2. The coefficient of correlation is not affected by linear transformation of the variables. Let $r_{XY}$ be the correlation between variables X and Y, and $r_{AB}$ the correlation between A and B, where A = aX + b and B = cY + d. Then $r_{AB} = r_{XY}$, provided a and c have the same sign (if their signs differ, only the sign of r is reversed).
3. If two variables are not related, then they are also not correlated. However, if they are uncorrelated, they may still be related. This follows directly from the fact that the coefficient of correlation measures the strength of a linear relationship. If the variables are related but not linearly, the coefficient of correlation may turn out to be 0 even though they are related otherwise.

Question 5: What is regression analysis?
Answer: Regression is the statistical tool with whose help we can estimate (or predict) the unknown values of one variable from the known values of another variable. With the help of regression analysis, we can find the average probable change in one variable given a certain amount of change in another.

Question 6: State Spearman's rank correlation.
Answer: This measure is especially useful when quantitative measures for certain factors (such as in the evaluation of leadership ability or the judgment of female beauty) cannot be fixed, but the individuals in the group can be arranged in order. This gives each individual a number indicating his (her) rank in the group. In any event, the rank correlation coefficient is applied to a set of ordinal rank numbers, running from 1 for the individual ranked first in quantity or quality to n for the individual ranked last in a group of n individuals (or n pairs of individuals). Spearman's rank correlation coefficient is defined as:
$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$
where R denotes the rank coefficient of correlation, D refers to the difference of ranks between paired items in the two series, and N is the number of pairs.

Question 7: What is the difference between regression and correlation?
Answer: The following are the points of difference between correlation and regression:
1. The correlation coefficient is a measure of the degree of covariability between X and Y. The objective of regression analysis, by contrast, is to study the nature of the relationship between the variables so that we may be able to predict the value of one on the basis of the other. The variable used as the basis of prediction is called the independent variable, and the variable that is to be predicted is referred to as the dependent variable.
2. The cause-and-effect relation is more clearly indicated through regression analysis than through correlation. Correlation is merely a tool for ascertaining the degree of relationship between two variables, and therefore we cannot say that one variable is the cause and the other the effect.

Question 8: What is the relationship between regression and correlation?
Answer: The two coefficients of regression are related to the coefficient of correlation in the following way:
$b_{xy} \times b_{yx} = r\frac{\sigma_x}{\sigma_y} \times r\frac{\sigma_y}{\sigma_x} = r^2$, or $r = \sqrt{b_{xy} \times b_{yx}}$
Hence, the coefficient of correlation is the geometric mean of the two coefficients of regression (r takes the common sign of the two coefficients).

Question 9: What are partial and multiple correlation?
Answer: When three or more variables are studied, it is a problem of either multiple or partial correlation. In multiple correlation, three or more variables are studied simultaneously. For example, when we study the relationship between the yield of rice per acre and both the amount of rainfall and the amount of fertilizer used, it is a problem of multiple correlation. In partial correlation, we study the relationship between two of the variables while the effect of the other variables is held constant (for example, yield and fertilizer with rainfall held constant).

Long Questions:

Question 1: Define the following: a. Positive and negative correlation b. Linear and non-linear correlation
Answer:
a. Positive and negative correlation: Whether correlation is positive (direct) or negative (inverse) depends upon the direction of change of the variables. Correlation is said to be positive if both variables vary in the same direction, i.e., as one variable increases the other, on average, also increases (or as one decreases the other also decreases). If, on the other hand, the variables vary in opposite directions, i.e., as one variable increases the other decreases, or vice versa, correlation is said to be negative.
b. Linear and non-linear correlation: The distinction between linear and non-linear correlation is based upon the constancy of the ratio of change between the variables. If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, then the correlation is said to be linear. Correlation is called non-linear or curvilinear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.
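Karl Pearson's r, as defined in Short Question 3 of this part, can be used to illustrate the distinction just drawn between linear and non-linear correlation. The following short Python sketch does so with hypothetical data. The last case is an exact but curvilinear (parabolic) relation, for which r comes out as 0, as Property 3 of the coefficient warns.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r = sum(xy) / sqrt(sum(x^2) * sum(y^2)),
    with x and y taken as deviations from the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

xs = [-2, -1, 0, 1, 2]                          # hypothetical X values
print(pearson_r(xs, [2 * x + 1 for x in xs]))   # 1.0: perfect positive linear
print(pearson_r(xs, [-3 * x for x in xs]))      # -1.0: perfect negative linear
print(pearson_r(xs, [x * x for x in xs]))       # 0.0: exact curvilinear relation
```

Deviations are taken from the actual means, which is the condition under which this form of the formula applies.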

[Figure: scatter diagrams illustrating linear correlation and non-linear correlation]

Question 2: Write a short note on Karl Pearson's coefficient of correlation.
Answer: Of the several mathematical methods of measuring correlation, Karl Pearson's method, popularly known as the Pearsonian coefficient of correlation, is the most widely used in practice. The Pearsonian coefficient of correlation is denoted by the symbol r. It is one of the very few symbols used universally for describing the degree of correlation between two series. The formula for computing Pearsonian r is:
$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$
where $x = (X - \bar{X})$ and $y = (Y - \bar{Y})$. This form is to be applied only when the deviations of the items are taken from the actual means and not from assumed means. The value of the coefficient of correlation obtained by the above formula always lies between -1 and +1. When r = +1, there is perfect positive correlation between the variables. When r = -1, there is perfect negative correlation between the variables. When r = 0, there is no linear relationship between the variables.

Question 3: Two judges in a beauty competition rank the 12 entries as follows:
X:
Y:

What degree of agreement is there between the judgments of the two judges?
Answer: Calculation of the rank correlation coefficient (columns: X, $R_1$, Y, $R_2$, $D = R_1 - R_2$, $D^2$).
$\sum D^2 = 416$, N = 12
$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 416}{12 \times 143} = 1 - \frac{2496}{1716} \approx -0.45$
The negative value indicates that the two judges' rankings tend to disagree rather than agree.

Question 4: Write a short note on regression lines.
Answer: If we take the case of two variables X and Y, we shall have two regression lines: the regression of X on Y and the regression of Y on X. The regression line of Y on X gives the most probable values of Y for given values of X, and the regression line of X on Y gives the most probable values of X for given values of Y. However, when there is either perfect positive or perfect negative correlation between the two variables, the two regression lines coincide, i.e., we have only one line. The farther apart the two regression lines are, the lower the degree of correlation; the nearer they are to each other, the higher the degree of correlation. If the variables are independent, r is zero and the lines of regression are at right angles, i.e., parallel to OX and OY. Note that the regression lines intersect at the point given by the means of X and Y. A perpendicular drawn from the point of intersection to the X-axis gives the mean value of X, and a horizontal line drawn from that point to the Y-axis gives the mean value of Y.
Regression equation of Y on X

The regression equation of Y on X is expressed as follows:
$Y_c = a + bX$
To determine the values of a and b, the following two normal equations are solved simultaneously:
$\sum Y = Na + b\sum X$
$\sum XY = a\sum X + b\sum X^2$
Regression equation of X on Y
The regression equation of X on Y is expressed as follows:
$X_c = a + bY$
To determine the values of a and b, the following two normal equations are solved simultaneously:
$\sum X = Na + b\sum Y$
$\sum XY = a\sum Y + b\sum Y^2$

Question 5: From the following data, obtain the regression equation of X on Y and also that of Y on X.
X | Y
Answer: Calculation of regression equations (columns: X, x = X - 6, x², Y, y = Y - 8, y², xy).
$\sum X = 30$, $\sum x = 0$, $\sum x^2 = 40$, $\sum Y = 40$, $\sum y = 0$, $\sum y^2 = 20$, $\sum xy = -26$
Regression equation of X on Y:
$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$, where $r\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2} = \frac{-26}{20} = -1.3$
$X - 6 = -1.3(Y - 8)$, or $X = 16.4 - 1.3Y$
Regression equation of Y on X:
$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$, where $r\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2} = \frac{-26}{40} = -0.65$
$Y - 8 = -0.65(X - 6)$, or $Y = 11.9 - 0.65X$

Question 6: Calculate Karl Pearson's coefficient of correlation from the following data:
X | Y
Answer: Columns: X, x = X - 18, x², Y, y = Y - 19, y², xy.
$\sum X = 162$, $\sum x = 0$, $\sum x^2 = 598$, $\sum Y = 171$, $\sum y = 0$, $\sum y^2 = 338$, $\sum xy = 431$
$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{431}{\sqrt{598 \times 338}} = \frac{431}{\sqrt{202124}} \approx 0.96$

Question 7: What is the utility of the study of correlation?
Answer: The study of correlation is of immense use in practical life for the following reasons:
1. Most variables show some kind of relationship. For example, there is a relationship between price and supply, income and expenditure, etc. With the help of correlation analysis we can measure in one figure the degree of relationship existing between the variables.
2. Once we know that two variables are closely related, we can estimate the value of one variable given the value of the other.
3. Correlation analysis contributes to the understanding of economic behavior and aids in locating the critically important variables on which others depend. It may reveal to the economist the connections by which disturbances spread and suggest the paths through which stabilizing forces become effective. In business, correlation analysis enables the executive to estimate costs, sales, prices and other variables on the basis of some other series with which these costs, sales or prices may be functionally related. Some guesswork can be removed from decisions when the relationship between a variable to be estimated and the one or more other variables on which it depends is close and reasonably invariant. However, it should be noted that the coefficient of correlation is one of the most widely used, and also one of the most widely abused, statistical measures. It is abused in the sense that one sometimes overlooks the fact that r measures nothing but the strength of the linear relationship and does not necessarily imply a cause-and-effect relationship.
4. Progressive development in the methods of science and philosophy has been characterized by an increase in the knowledge of relationships or correlations. Nature has been found to be a multiplicity of interrelated forces.
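The numerical answers to Questions 3, 5 and 6 above can be re-derived from the column totals quoted in their solutions. The following is a minimal Python sketch of that check. The function names are illustrative, and the Spearman formula as written assumes there are no tied ranks.

```python
import math

def spearman_from_d2(sum_d2, n):
    """Spearman's R = 1 - 6*sum(D^2) / (N*(N^2 - 1)); assumes no tied ranks."""
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

def regression_coefficients(sxy, sxx, syy):
    """Return (b_yx, b_xy) from deviation totals taken about the actual means."""
    return sxy / sxx, sxy / syy

def pearson_from_totals(sxy, sxx, syy):
    """Karl Pearson's r = sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    return sxy / math.sqrt(sxx * syy)

# Question 3: sum(D^2) = 416 for N = 12 entries
r_rank = spearman_from_d2(416, 12)
print(round(r_rank, 4))  # -0.4545

# Question 5: sum(x^2) = 40, sum(y^2) = 20, sum(xy) = -26, means X = 6, Y = 8
b_yx, b_xy = regression_coefficients(-26, 40, 20)
print(b_yx, b_xy)  # -0.65 -1.3
# Intercepts of Y = a + b_yx*X and X = a' + b_xy*Y (both lines pass through the means)
print(round(8 - b_yx * 6, 2), round(6 - b_xy * 8, 2))  # 11.9 16.4
# r is the geometric mean of the two coefficients, with their common sign
r_q5 = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Question 6: sum(x^2) = 598, sum(y^2) = 338, sum(xy) = 431
r_q6 = pearson_from_totals(431, 598, 338)
print(round(r_q5, 3), round(r_q6, 3))
```

The Question 5 check also confirms the relationship between regression and correlation: $b_{yx} \times b_{xy} = 0.845 = r^2$, so r is about -0.92.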
