MEASURES OF DISPERSION, RELATIVE STANDING AND SHAPE. Dr. Bijaya Bhusan Nanda,


CONTENTS
What are measures of dispersion?
Why measure dispersion?
How are measures of dispersion calculated? Range; quartile deviation (semi-interquartile range); mean deviation; standard deviation.
Methods for detecting outliers
Measures of relative standing
Measures of shape

LEARNING OBJECTIVES Learners will be able to: describe the homogeneity or heterogeneity of a distribution; judge the reliability of the mean; compare distributions with respect to their variability; and describe the relative standing of observations and the shape of a distribution.

What are measures of dispersion? (Definition) Measures of central tendency do not reveal the variability present in the data. Dispersion is the scatter of a data series around its average, that is, the extent to which the values in a distribution differ from the average of the distribution.

Why measure dispersion? (Significance) Measures of dispersion are used to: determine the reliability of an average; serve as a basis for controlling variability; compare the variability of two or more series; and facilitate the use of other statistical measures.

Dispersion: Example
Number of minutes 20 clients waited to see a consulting doctor (10 clients per doctor):

Doctor X:  05  15  12  03  04  19  37  11  06  34
Doctor Y:  15  16  12  18  15  14  13  17  11  15

X: mean waiting time 14.6 minutes; high variability, less consistency.
Y: mean waiting time 14.6 minutes; low variability, more consistency.
The means are identical, so what is the difference between the two series?
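The contrast between the two doctors can be checked numerically. The sketch below assumes the split of the 20 values shown in the code (10 per doctor, the split that reproduces both stated means of 14.6) and uses the population standard deviation from Python's `statistics` module; the helper name `summarize` is illustrative only.

```python
from statistics import mean, pstdev

# Waiting times in minutes; the 10/10 split per doctor is an assumption
# about how the flattened table was laid out.
doctor_x = [5, 15, 12, 3, 4, 19, 37, 11, 6, 34]
doctor_y = [15, 16, 12, 18, 15, 14, 13, 17, 11, 15]

def summarize(values):
    """Return (mean, range, population standard deviation)."""
    return mean(values), max(values) - min(values), pstdev(values)

mean_x, range_x, sd_x = summarize(doctor_x)
mean_y, range_y, sd_y = summarize(doctor_y)

print(f"X: mean={mean_x:.1f}, range={range_x}, sd={sd_x:.2f}")
print(f"Y: mean={mean_y:.1f}, range={range_y}, sd={sd_y:.2f}")
```

Both series share the same mean, but the range and standard deviation of X are several times those of Y, which is what "high variability, less consistency" refers to.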

[Figure: frequency curves of the distributions of three data sets, A, B and C]

Characteristics of an Ideal Measure of Dispersion
1. It should be rigidly defined.
2. It should be easy to understand and easy to calculate.
3. It should be based on all the observations of the data.
4. It should be easily subjected to further mathematical treatment.
5. It should be least affected by sampling fluctuation.
6. It should not be unduly affected by extreme values.

How is dispersion measured? Measures of dispersion are of two kinds. Absolute: measures the dispersion in the original unit of the data. The variability of two or more distributions can be compared only if they are expressed in the same unit and have the same average. Relative: a measure that is free from the unit of measurement of the data. It is the ratio of a measure of absolute dispersion to the average from which the absolute deviations are measured, and is called a coefficient of dispersion.

How is dispersion measured? (Contd.) The following measures of dispersion are used to study variation: the range; the interquartile range and quartile deviation; the mean deviation (or average deviation); and the standard deviation.

How is dispersion measured? (Contd.) Range: the difference between the values of the two extreme items of a series. Example: the ages of a sample of 10 subjects from a population of 169 subjects are:

X1  X2  X3  X4  X5  X6  X7  X8  X9  X10
42  28  28  61  31  23  50  34  32  37

The youngest subject in the sample is 23 years old and the oldest is 61 years old. The range is $R = X_L - X_S = 61 - 23 = 38$, where $X_L$ is the largest and $X_S$ the smallest value.

Coefficient of range: $(X_L - X_S) / (X_L + X_S) = (61 - 23) / (61 + 23) = 38 / 84 = 0.452$. Characteristics of the range: It is the simplest and crudest measure of dispersion. It is not based on all the observations. It is unduly affected by extreme values and by fluctuations of sampling. As more observations are added to a set, the range may increase but cannot decrease. It gives a very quick idea of the variability.
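The range and its coefficient for the age sample can be reproduced in a few lines. This is a minimal sketch; the function names are illustrative and not part of the original slides.

```python
# Ages of the 10 sampled subjects from the range example.
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

def value_range(values):
    """Absolute measure: largest value minus smallest value."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Relative measure: (X_L - X_S) / (X_L + X_S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

r = value_range(ages)
coef = coefficient_of_range(ages)
print(r, round(coef, 3))
```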

Percentiles, Quartiles (Measures of Relative Standing) and Interquartile Range. Descriptive measures that locate the position of an observation relative to the other observations are called measures of relative standing. They are quartiles, deciles and percentiles. The quartiles (including the median) divide the array into four equal parts, deciles into ten equal groups, and percentiles into one hundred equal groups. Given a set of n observations $X_1, X_2, \ldots, X_n$, the $p$th percentile $P$ is the value of $X$ such that $p$ per cent of the observations are less than $P$ and $(100 - p)$ per cent of the observations are greater than $P$. 25th percentile = 1st quartile, i.e. $Q_1$; 50th percentile = 2nd quartile, i.e. $Q_2$ (the median); 75th percentile = 3rd quartile, i.e. $Q_3$.

[Figure 8.1: Location of the lower quartile ($Q_L$), median ($M$) and upper quartile ($Q_U$)]

Percentiles, Quartiles and Interquartile Range (Contd.)
$Q_1$ = the $\frac{n+1}{4}$th ordered observation
$Q_2$ = the $\frac{2(n+1)}{4}$th ordered observation
$Q_3$ = the $\frac{3(n+1)}{4}$th ordered observation
Interquartile range (IQR): the difference between the 3rd and 1st quartiles, $\text{IQR} = Q_3 - Q_1$.
Semi-interquartile range (quartile deviation): $(Q_3 - Q_1) / 2$.
Coefficient of quartile deviation: $(Q_3 - Q_1) / (Q_3 + Q_1)$.
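The positional formulas above can be sketched as follows for the age sample used in the range example. The slides do not say what to do when $(n+1)/4$ is not a whole number; linear interpolation between the two neighbouring ordered observations is assumed here, and the function names are illustrative.

```python
def ordered_value(sorted_values, position):
    """Value at a 1-based, possibly fractional, position.

    Fractional positions are resolved by linear interpolation
    (an assumption; the slides only give the position formula).
    """
    lower = int(position)            # whole part of the position
    fraction = position - lower      # fractional part
    if lower < 1:
        return sorted_values[0]
    if lower >= len(sorted_values):
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)

def quartiles(values):
    """Return (Q1, Q2, Q3) using the k(n+1)/4-th ordered observation."""
    data = sorted(values)
    n = len(data)
    return tuple(ordered_value(data, k * (n + 1) / 4) for k in (1, 2, 3))

ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]
q1, q2, q3 = quartiles(ages)
iqr = q3 - q1                         # interquartile range
semi_iqr = iqr / 2                    # quartile deviation
coef_qd = (q3 - q1) / (q3 + q1)       # coefficient of quartile deviation
print(q1, q2, q3, iqr, semi_iqr, round(coef_qd, 3))
```

Other interpolation conventions exist and give slightly different quartiles for small samples, so results from statistical software may not match exactly.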

Interquartile Range. Merits: It is superior to the range as a measure of dispersion. It is especially useful for measuring variation in open-ended distributions, or in distributions where the data can be ranked but not measured quantitatively. It is useful for erratic or badly skewed distributions. The quartile deviation is not affected by the presence of extreme values. Limitations: Because the value of the quartile deviation does not depend on every item of the series, it cannot be regarded as a good method of measuring dispersion. It is not capable of mathematical manipulation. Its value is very much affected by sampling fluctuation.

Another measure of relative standing is the z-score (or standard score) of an observation. It describes how far an individual item in a distribution departs from the mean of the distribution: the standard score gives the number of standard deviations by which a particular observation lies below or above the mean. The standard score is defined as follows.
For a population: $z = \frac{X - \mu}{\sigma}$, where $X$ = the observation from the population, $\mu$ = the population mean, and $\sigma$ = the population s.d.
For a sample: $z = \frac{X - \bar{X}}{s}$, where $X$ = the observation from the sample, $\bar{X}$ = the sample mean, and $s$ = the sample s.d.
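As an illustration, the sample z-scores of the ages from the range example can be computed as below. The slides do not state which denominator the sample s.d. uses; this sketch assumes the $n - 1$ form provided by `statistics.stdev`.

```python
from statistics import mean, stdev

ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

def z_scores(sample):
    """Standard score of every observation in a sample."""
    centre = mean(sample)
    spread = stdev(sample)  # sample s.d. with n - 1 denominator (assumption)
    return [(x - centre) / spread for x in sample]

z = z_scores(ages)
z_oldest = z[ages.index(61)]  # how many s.d. the oldest subject lies above the mean
print(round(z_oldest, 2))
```

A positive z-score means the observation lies above the mean and a negative one below it; the z-scores of a data set always sum to zero.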

Mean Absolute Deviation (MAD) or Mean Deviation (MD): the arithmetic mean of the absolute differences (signs ignored) between the values of the items and some average of the series. Notes:
1. MD is based on all values and hence cannot be calculated for open-ended distributions.
2. It uses an average but ignores signs, and hence appears unmethodical.
3. MD can be calculated from the mean as well as from the median, for ungrouped data using the direct method and for continuous distributions using the assumed-mean and short-cut methods.
4. The average used is either the arithmetic mean or the median.

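Computation of Mean Absolute Deviation
For an individual series $X_1, X_2, \ldots, X_n$: $\text{M.A.D.} = \frac{\sum |X_i - \bar{X}|}{n}$
For a discrete series $X_1, X_2, \ldots, X_n$ with corresponding frequencies $f_1, f_2, \ldots, f_n$: $\text{M.A.D.} = \frac{\sum f_i |X_i - \bar{X}|}{\sum f_i}$
Here $\bar{X}$ is the mean of the data series.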

Computation of Mean Absolute Deviation (Contd.)
For continuous grouped data, where $m_1, m_2, \ldots, m_n$ are the class midpoints with corresponding class frequencies $f_1, f_2, \ldots, f_n$: $\text{M.A.D.} = \frac{\sum f_i |m_i - \bar{X}|}{\sum f_i}$, where $\bar{X}$ is the mean of the data series.
Coefficient of MAD = MAD / Average, where the average is the one from which the deviations are calculated. It is a relative measure of dispersion and is comparable with the same measure for other series.

Problem: Find the MAD of weight and the coefficient of MAD for 470 infants born in a hospital in one year, from the following table.

Weight (kg)    No. of infants
2.0-2.4         17
2.5-2.9         97
3.0-3.4        187
3.5-3.9        135
4.0-4.4         28
4.5+             6
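One way to work this problem is sketched below using the grouped-data formula with class midpoints. The last class (4.5+) is open-ended, so it has no midpoint; purely for illustration it is closed at 4.9 kg to match the width of the other classes. That closure is an assumption, and a different choice changes the answer slightly.

```python
# (lower limit, upper limit, frequency). The open-ended "4.5+" class is
# closed at 4.9 as an illustrative assumption, not a value from the table.
classes = [
    (2.0, 2.4, 17),
    (2.5, 2.9, 97),
    (3.0, 3.4, 187),
    (3.5, 3.9, 135),
    (4.0, 4.4, 28),
    (4.5, 4.9, 6),
]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]
total = sum(freqs)

# Grouped mean: sum(f * m) / sum(f)
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / total

# MAD about the mean: sum(f * |m - mean|) / sum(f)
mad = sum(f * abs(m - grouped_mean) for f, m in zip(freqs, midpoints)) / total

# Coefficient of MAD: MAD divided by the average it was measured from
coef_mad = mad / grouped_mean

print(f"mean={grouped_mean:.3f} kg, MAD={mad:.3f} kg, coefficient={coef_mad:.3f}")
```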

Merits and Limitations of MAD. Merits: It is simple to understand and easy to compute. It is based on all observations. It is less affected by extreme items than the standard deviation. Limitations: Its greatest drawback is that the algebraic signs are ignored. It is not amenable to further mathematical treatment. MAD gives the best result when deviations are taken from the median, but the median is not a satisfactory average when variability in the data is large. If MAD is computed from the mode, the value of the mode cannot always be determined.

Standard Deviation (σ): the positive square root of the average of the squared deviations of the observations from the mean. It is also called the root mean squared deviation.
For an individual series $x_1, x_2, \ldots, x_n$:
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$, or equivalently $\sigma = \sqrt{\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2}$
For a discrete series $x_1, x_2, \ldots, x_n$ with corresponding frequencies $f_1, f_2, \ldots, f_n$:
$\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}}$, or equivalently $\sigma = \sqrt{\frac{\sum f_i x_i^2}{\sum f_i} - \left(\frac{\sum f_i x_i}{\sum f_i}\right)^2}$

Standard Deviation (σ) (Contd.)
For a continuous grouped series with class midpoints $m_1, m_2, \ldots, m_n$ and corresponding frequencies $f_1, f_2, \ldots, f_n$:
$\sigma = \sqrt{\frac{\sum f_i (m_i - \bar{x})^2}{\sum f_i}}$, or equivalently $\sigma = \sqrt{\frac{\sum f_i m_i^2}{\sum f_i} - \left(\frac{\sum f_i m_i}{\sum f_i}\right)^2}$
Variance: the square of the s.d.
Coefficient of Variation (CV): the corresponding relative measure of dispersion, $\text{CV} = \frac{\sigma}{\bar{x}} \times 100$.
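The two forms of the formula for an individual series give the same result, which the following sketch checks on the age sample from the range example. Both use the $n$ denominator, as in the formulas above; the function names are illustrative.

```python
from math import sqrt

ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

def sd_from_deviations(values):
    """sigma = sqrt( sum((x - mean)^2) / n )"""
    n = len(values)
    centre = sum(values) / n
    return sqrt(sum((x - centre) ** 2 for x in values) / n)

def sd_from_raw_sums(values):
    """sigma = sqrt( sum(x^2)/n - (sum(x)/n)^2 )"""
    n = len(values)
    return sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)

sigma = sd_from_deviations(ages)
sigma_alt = sd_from_raw_sums(ages)
variance = sigma ** 2
cv = sigma / (sum(ages) / len(ages)) * 100  # coefficient of variation, in per cent

print(f"sigma={sigma:.3f}, variance={variance:.2f}, CV={cv:.1f}%")
```

The raw-sums form needs only one pass over the data, but it subtracts two large numbers and can lose precision when the mean is large relative to the spread.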

Characteristics of Standard Deviation: SD is a very satisfactory and the most widely used measure of dispersion. It is amenable to mathematical manipulation. It is independent of origin, but not of scale. If the SD is small, values are likely to lie close to the mean; if it is large, values tend to lie farther from the mean. It does not ignore the algebraic signs, and it is less affected by fluctuations of sampling. SD can be calculated by the direct method, the assumed-mean method, or the step-deviation method.

The standard deviation summarises the typical distance of the observed values from the mean of a data set (strictly, the square root of the mean squared distance). Basic rule: more spread yields a larger SD. Uses of the standard deviation: the standard deviation enables us to determine, with a great deal of accuracy, where the values of a frequency distribution are located in relation to the mean. Chebyshev's Theorem: for any data set with mean µ and standard deviation σ, at least 75% of the values fall within 2σ of the mean and at least 89% of the values fall within 3σ of the mean.
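The 75% and 89% figures come from the general form of Chebyshev's bound, $1 - 1/k^2$ for $k > 1$ standard deviations. The sketch below evaluates that bound and compares it with the observed share of values for Doctor X's waiting times (assuming the 10-value split used in the code); the function names are illustrative.

```python
from statistics import mean, pstdev

def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    """Observed share of values within k population s.d. of the mean."""
    mu, sigma = mean(values), pstdev(values)
    inside = [x for x in values if abs(x - mu) <= k * sigma]
    return len(inside) / len(values)

# Doctor X's waiting times (minutes); the split per doctor is an assumption.
waiting_x = [5, 15, 12, 3, 4, 19, 37, 11, 6, 34]

for k in (2, 3):
    bound = chebyshev_lower_bound(k)
    observed = fraction_within(waiting_x, k)
    print(f"k={k}: at least {bound:.1%} guaranteed, {observed:.1%} observed")
```

The bound holds for any distribution, so it is deliberately conservative; real data usually show a much larger share inside each interval.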

TABLE: Calculation of the standard deviation (σ). Weights of 265 male students at the University of Washington.

Class interval (weight)     f     d     fd    fd²
 90-99                      1    -5     -5     25
100-109                     1    -4     -4     16
110-119                     9    -3    -27     81
120-129                    30    -2    -60    120
130-139                    42    -1    -42     42
140-149                    66     0      0      0
150-159                    47     1     47     47
160-169                    39     2     78    156
170-179                    15     3     45    135
180-189                    11     4     44    176
190-199                     1     5      5     25
200-209                     3     6     18    108
Total                   n = 265       Σfd = 99   Σfd² = 931

Here $d = (X_i - A)/i$, $n = \sum f_i$, $A = 144.5$ (the assumed mean) and $i = 10$ (the class width).

$\sigma = i \sqrt{\frac{\sum f d^2}{n} - \left(\frac{\sum f d}{n}\right)^2} = 10 \sqrt{\frac{931}{265} - \left(\frac{99}{265}\right)^2} = 10 \sqrt{3.5132 - 0.1396} = 10 \times 1.8367 = 18.37 \approx 18.4$
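The step-deviation calculation in the table can be reproduced as follows. This is a sketch: the (d, f) pairs are read from the table, A = 144.5 and i = 10 are as stated there, and the direct calculation from class midpoints is added only as a cross-check.

```python
from math import sqrt

A = 144.5     # assumed mean (midpoint of the 140-149 class)
width = 10    # class width i

# (step deviation d, frequency f) for each class, read from the table
table = [(-5, 1), (-4, 1), (-3, 9), (-2, 30), (-1, 42), (0, 66),
         (1, 47), (2, 39), (3, 15), (4, 11), (5, 1), (6, 3)]

n = sum(f for _, f in table)
sum_fd = sum(f * d for d, f in table)
sum_fd2 = sum(f * d * d for d, f in table)

# Step-deviation method: sigma = i * sqrt( sum(fd^2)/n - (sum(fd)/n)^2 )
sigma = width * sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

# Cross-check: direct method on the class midpoints m = A + i * d
mids = [(A + width * d, f) for d, f in table]
mean_mid = sum(f * m for m, f in mids) / n
sigma_direct = sqrt(sum(f * (m - mean_mid) ** 2 for m, f in mids) / n)

print(n, sum_fd, sum_fd2, round(sigma, 2), round(sigma_direct, 2))
```

Working in units of d keeps the arithmetic small, which is why the method was popular for hand calculation; the scaling by i at the end restores the original unit.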

Means, standard deviations and coefficients of variation of the age distributions of four groups of mothers who gave birth to one or more children in the city of Minneapolis, 1931 to 1935. Interpret the data.

Classification              Mean    σ     CV
Resident married            28.2   6.0   21.3
Non-resident married        29.5   6.0   20.3
Resident unmarried          23.4   5.8   24.8
Non-resident unmarried      21.7   3.7   17.1

Example: Suppose that each day laboratory technician A completes 40 analyses with a standard deviation of 5, and technician B completes 160 analyses per day with a standard deviation of 15. Which employee shows less variability?
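Because the two technicians work at very different average levels, the comparison is made on the coefficient of variation rather than on the raw standard deviations. A minimal sketch, which also recomputes the CV column of the table from its means and standard deviations:

```python
def coefficient_of_variation(sd, mean):
    """CV = (sd / mean) * 100, expressed in per cent."""
    return sd / mean * 100

# (mean age, s.d.) for the four groups of mothers, from the table
groups = {
    "Resident married": (28.2, 6.0),
    "Non-resident married": (29.5, 6.0),
    "Resident unmarried": (23.4, 5.8),
    "Non-resident unmarried": (21.7, 3.7),
}
cv_table = {name: round(coefficient_of_variation(sd, mean), 1)
            for name, (mean, sd) in groups.items()}

# Laboratory technicians: mean analyses per day and s.d.
cv_a = coefficient_of_variation(5, 40)
cv_b = coefficient_of_variation(15, 160)

print(cv_table)
print(f"Technician A: {cv_a}%  Technician B: {cv_b}%")
```

Technician B has the larger standard deviation in absolute terms but the smaller CV, so B shows less variability relative to the amount of work done.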

Uses of the standard deviation: The standard deviation enables us to determine, with a great deal of accuracy, where the values of a frequency distribution are located in relation to the mean. We can do this by means of a theorem devised by the Russian mathematician P.L. Chebyshev (1821-1894).

Skewness and Kurtosis: Measures of Shape. To understand a distribution properly, two further characteristics, skewness and kurtosis, are needed alongside the measures of central tendency and variability. Two distributions may have the same mean and standard deviation but differ widely in their overall appearance, as seen in the following figures.

[Figure: Measure of shape. Positively (right-) skewed distribution and negatively (left-) skewed distribution]

Definition of Skewness. "When a series is not symmetrical it is said to be asymmetrical or skewed." (Croxton and Cowden) "Skewness refers to asymmetry or lack of symmetry in the shape of a frequency distribution." (Morris Hamburg) "Measures of skewness tell us the direction and the extent of skewness. In a symmetrical distribution the mean, median and mode are identical. The more the mean moves away from the mode, the larger is the asymmetry." (Simpson and Kafka)

Symmetrical distribution: the values of the mean, median and mode coincide. The spread of the frequencies is the same on both sides of the centre point of the frequency curve. Asymmetrical distribution: a distribution which is not symmetrical is called a skewed or asymmetrical distribution. Such a distribution can be either positively or negatively skewed. Positively skewed distribution: the value of the mean is the largest and that of the mode the smallest; the median lies between the two. Negatively skewed distribution: the value of the mode is the largest and that of the mean the smallest; the median lies between the two.

In a positively skewed distribution the frequencies are spread over a greater range of values at the high-value end (right side) of the curve than at the low-value end. In a negatively skewed distribution the frequencies are spread over a greater range of values at the low-value end (left side) of the curve than at the high-value end. In moderately asymmetrical distributions the interval between the mean and the median is approximately one third of the interval between the mean and the mode. Difference between dispersion and skewness: dispersion is concerned with the amount of variation rather than with its direction. Skewness tells us about the direction of the variation, or the departure from symmetry. In fact, measures of skewness depend upon the amount of dispersion.

Tests of skewness. A distribution is skewed if any of the following hold: The values of the mean, median and mode do not coincide. When the frequencies are plotted on a graph, the frequency curve or histogram does not have the normal bell-shaped form. The sum of the positive deviations from the median is not equal to the sum of the negative deviations. The quartiles are not equidistant from the median. Frequencies are not equally distributed at points of equal deviation from the mode.

Absolute Measures of Skewness. Skewness can be measured in absolute terms by taking the difference between the mean and the mode: Absolute Sk = Mean - Mode. If the mean is greater than the mode, skewness is positive; if the mode is greater than the mean, skewness is negative. The measure is expressed in the unit of the distribution, and therefore cannot be compared with that of another data set expressed in different units. Distributions also vary greatly: the difference between the mean and the mode in absolute terms might be considerable in one series and small in another, even though the frequency curves of the two distributions are similarly skewed. For a direct comparison of the skewness of two data sets we therefore need a relative measure of skewness.

Relative Measures of Skewness. There are four important measures of relative skewness: Karl Pearson's coefficient of skewness; Bowley's coefficient of skewness; Kelly's coefficient of skewness; and the measure of skewness based on moments. These measures should mainly be used for comparing two or more distributions. A good measure of skewness should have the following three properties: it should be a pure number, in the sense that its value is independent of the units of the series and of the degree of variation in the series; it should be zero when the distribution is symmetrical; and it should have a meaningful scale, so that the measured value is easy to interpret.

Karl Pearson's coefficient of skewness is based upon the difference between the mean and the mode. This difference is divided by the standard deviation to give a relative measure. The formula thus becomes $Sk_p = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$, where $Sk_p$ = Karl Pearson's coefficient of skewness. In theory there is no limit to this measure, which is a slight drawback. In practice, however, the value given by this formula is rarely very high and usually lies between -1 and +1. When a distribution is symmetrical, the values of the mean, median and mode coincide and the coefficient of skewness is therefore zero. When a distribution is positively skewed the coefficient has a plus sign, and when it is negatively skewed the coefficient has a minus sign.

The above method of measuring skewness cannot be used where the mode is ill-defined. However, in a moderately skewed distribution the averages have the following relationship: Mode = 3 Median - 2 Mean. If this value of the mode is substituted in the above formula, we arrive at another formula for skewness: $Sk_p = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$. Theoretically, the value of this coefficient varies between ±3; in practice, however, it is rare for the coefficient of skewness obtained by this method to exceed ±1.
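The median-based form of Pearson's coefficient is easy to compute because it avoids the mode altogether. The sketch below applies it to the age sample from the range example, using the population standard deviation ($n$ denominator) to match the σ formulas given earlier; that choice of denominator is an assumption, and the function name is illustrative.

```python
from statistics import mean, median, pstdev

def pearson_skewness_median(values):
    """Sk_p = 3 * (mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / pstdev(values)

ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]
sk = pearson_skewness_median(ages)
print(round(sk, 3))

# A perfectly symmetrical series has mean == median, so the coefficient is zero.
symmetric = [1, 2, 3, 4, 5]
print(pearson_skewness_median(symmetric))
```

For the ages the mean exceeds the median, so the coefficient is positive: the few older subjects pull the mean towards the high end.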

Kurtosis: Kurtosis characterizes the relative peakedness or flatness of a distribution compared with the bell-shaped (normal) distribution. The kurtosis of a sample data set is calculated by the formula:

$\text{Kurtosis} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$

where $s$ is the sample standard deviation. Positive kurtosis indicates a relatively peaked distribution. Negative kurtosis indicates a relatively flat distribution.
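The sample kurtosis formula can be sketched as below. It is assumed here that $s$ is the sample standard deviation with $n - 1$ denominator, which is the convention this form of the formula is normally used with; the two small data sets are made up for illustration only.

```python
from statistics import mean, stdev

def sample_excess_kurtosis(values):
    """Sample kurtosis relative to the normal distribution (normal = 0)."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four observations are required")
    centre = mean(values)
    s = stdev(values)  # sample s.d., n - 1 denominator (assumption)
    fourth_power_sum = sum(((x - centre) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth_power_sum - correction

# Illustrative data, not taken from the slides.
flat = [1, 2, 3, 4, 5]              # evenly spread values
peaked = [0] * 8 + [-10, 10]        # most values at the centre, two far out

k_flat = sample_excess_kurtosis(flat)
k_peaked = sample_excess_kurtosis(peaked)
print(round(k_flat, 2), round(k_peaked, 2))
```

The evenly spread series gives a negative value (relatively flat) and the series concentrated at its centre gives a positive value (relatively peaked), in line with the interpretation above.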

[Figure: distributions with positive, negative and zero kurtosis] On this scale the normal distribution has zero kurtosis.

