Lectures delivered by Prof. K. K. Achary, YRC

A data set is symmetric about a central value if the observations are distributed symmetrically about that value. In a symmetrically distributed data set, the frequency is highest at the central value and decreases in the same pattern on either side of it. Usually, the central value is the mean. A symmetric distribution is one where the left-hand and right-hand sides of the distribution are roughly equally balanced around the mean.

Symmetry can be observed in a stem-and-leaf chart, a frequency table, or a histogram. In a symmetric data set, the mean and median are equal or lie very close together, and the median is equidistant from the two quartiles. The histogram below shows a typical symmetric distribution.

Data sets that depart from symmetry are asymmetric, or skewed. In a skewed distribution or data set, the frequency curve has a long tail. Skewness is right-tailed, or positive, if the tail extends to the right, i.e. towards larger values. Skewness is left-tailed, or negative, if the tail extends to the left, i.e. towards smaller values. The following frequency curve shows positive skewness. Exercise: draw the frequency curve of a negatively skewed distribution.

In a positively skewed distribution, the mean is typically greater than the median, and the median is closer to the first quartile. In a negatively skewed distribution, the mean is typically smaller than the median, and the median is closer to the third quartile. An important comment: the relative positions of the mean, median and mode in skewed distributions are often given as follows. For a +ve skewed distribution, mean > median > mode. For -ve skewness, mean < median < mode. This relationship is not always true!

Is the following data set symmetric, skewed right or skewed left? 27; 28; 30; 32; 34; 38; 41; 42; 43; 44; 46; 53; 56; 62. Answer: The statistics of the data set are mean: 41.14; first quartile: 31.75; median: 41.5; third quartile: 47.75.

We can conclude that the data set is left-skewed (negatively skewed) for two reasons. First, the mean is less than the median. The difference between the mean and median is very small, however, so this is not a very strong reason. A better reason is that the median is closer to the upper quartile than to the lower quartile.
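The two checks above can be reproduced with a short Python sketch. The quartiles are entered as quoted in the text rather than recomputed, because quartile conventions differ between textbooks and software and may give slightly different values; the variable names are illustrative only.

```python
from statistics import mean, median

data = [27, 28, 30, 32, 34, 38, 41, 42, 43, 44, 46, 53, 56, 62]

# Quartiles as quoted in the text (assumption: not recomputed here,
# since different quartile conventions give slightly different values).
q1, q3 = 31.75, 47.75

m = mean(data)
med = median(data)

# Check 1: mean below the median suggests a left (negative) skew.
mean_below_median = m < med

# Check 2: compare the distance from the median to each quartile.
lower_gap = med - q1   # median to first quartile
upper_gap = q3 - med   # median to third quartile
median_nearer_q3 = upper_gap < lower_gap

print(f"mean = {m:.2f}, median = {med}")
print(f"median - Q1 = {lower_gap}, Q3 - median = {upper_gap}")
print("left-skewed by both checks:", mean_below_median and median_nearer_q3)
```

Both checks point the same way here, although the gap between mean and median (about 0.36) is small compared with the gap between the two quartile distances (9.75 against 6.25).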

Consider the following data set: 11.2; 5; 9.4; 14.9; 4.4; 18.8; -0.4; 10.5; 8.3; 17.8. The statistics of the data set are mean: 9.99; first quartile: 6.65; median: 9.95; third quartile: 13.05.

Note that the different ways of determining whether the data are skewed right or left give contradictory indications. The mean is slightly greater than the median, which would indicate that the data set is skewed right. The median is slightly closer to the third quartile than to the first quartile, which would indicate that the data set is skewed left. Since these differences are so small, and since they contradict each other, we conclude that the data set is symmetric.

Karl Pearson's coefficient of skewness $= (\text{mean} - \text{mode}) / \text{S.D.}$ If it is +ve, then the data are positively skewed. Bowley's coefficient $= (Q_3 + Q_1 - 2M) / (Q_3 - Q_1)$, where $M$ is the median. How to interpret this? In a moderately skewed distribution, $(\text{mean} - \text{mode}) \approx 3(\text{mean} - \text{median})$. Both measures are free from the unit of measurement. One more measure, $\beta_1$, based on the third central moment and the variance, is also used. HW: Find the nature of skewness of the systolic BP data for three groups of individuals.
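As a minimal sketch, the two coefficients can be written as small Python functions and applied to the summary statistics quoted for the two example data sets. The function names are hypothetical, and the choice of the sample standard deviation (`statistics.stdev`) in the last step is an assumption, since the slides do not say which form of S.D. is meant.

```python
from statistics import mean, median, stdev

def pearson_skewness(mean_value, mode_value, sd):
    """Karl Pearson's coefficient: (mean - mode) / S.D."""
    return (mean_value - mode_value) / sd

def pearson_skewness_from_median(mean_value, median_value, sd):
    """Variant using the approximation (mean - mode) ~ 3 (mean - median)."""
    return 3.0 * (mean_value - median_value) / sd

def bowley_skewness(q1, med, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2M) / (Q3 - Q1)."""
    return (q3 + q1 - 2.0 * med) / (q3 - q1)

# Quartiles and medians exactly as quoted in the text for the two examples.
first = bowley_skewness(q1=31.75, med=41.5, q3=47.75)
second = bowley_skewness(q1=6.65, med=9.95, q3=13.05)
print(f"Bowley, first data set:  {first:.2f}")
print(f"Bowley, second data set: {second:.2f}")

# The first data set has no repeated value, so the mode is not usable;
# the median-based variant is applied instead (sample S.D. is an assumption).
data = [27, 28, 30, 32, 34, 38, 41, 42, 43, 44, 46, 53, 56, 62]
approx = pearson_skewness_from_median(mean(data), median(data), stdev(data))
print(f"Pearson (median form), first data set: {approx:.2f}")
```

Bowley's coefficient is positive when Q3 lies farther from the median than Q1 does, negative in the opposite case, and zero when the median is equidistant from the quartiles. It always lies between -1 and +1. For the two examples it comes out at about -0.22 and -0.03, which agrees with the earlier conclusions of a mild left skew and near symmetry.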

Thumb rules to interpret skewness: If the skewness measure is between -1/2 and +1/2, the data set is approximately symmetric. If skewness is between -1 and -1/2 or between +1/2 and +1, the distribution is moderately skewed. If skewness is less than -1 or greater than +1, the distribution is highly skewed.
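The thumb rule amounts to a three-way split on the absolute value of the skewness measure: up to 1/2 is approximately symmetric, from 1/2 to 1 is moderately skewed, and above 1 is highly skewed, with the sign giving the direction. A small Python sketch follows. The function name is hypothetical, and assigning the boundary values 1/2 and 1 to the lower category is an assumption, since the rule does not say which side they belong to.

```python
def describe_skewness(value):
    """Classify a skewness measure using the thumb rule.

    Assumption: the boundary values 0.5 and 1.0 fall in the lower category.
    """
    size = abs(value)
    if size <= 0.5:
        return "approximately symmetric"
    strength = "moderately skewed" if size <= 1.0 else "highly skewed"
    direction = "positive (right)" if value > 0 else "negative (left)"
    return f"{strength}, {direction}"

for s in (0.2, -0.7, 1.8):
    print(s, "->", describe_skewness(s))
```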

Caution: This is an interpretation of the data you actually have. When you have data for the whole population, that's fine. But when you have a sample, the sample skewness doesn't necessarily apply to the whole population. In that case the question is: from the sample skewness, can you conclude anything about the population skewness? (Inference is difficult!)

A histogram gives a fairly good idea of the nature of skewness in your data. A stem-and-leaf plot also helps. It is important to understand the nature of skewness in your data, because inference techniques differ for skewed data and normal (symmetric) data. Parametric inference relies largely on the assumption of normality in your data. The presence of asymmetry is an indication of non-normality.

Central tendency, variability and shape are the important characteristics of a data set. The shape of a distribution is described by skewness and kurtosis. While skewness describes asymmetry in shape, kurtosis typically describes the peakedness of a data set or distribution.

Based on the extent of peakedness, kurtosis is categorised into three types. A mesokurtic distribution is the ideal or benchmark distribution, i.e. the normal distribution; the peakedness of other distributions is compared with it. A leptokurtic distribution is more peaked than a mesokurtic one. A platykurtic distribution is flatter than a mesokurtic one.

The ratio of the fourth central moment to the square of the variance is used as a coefficient of kurtosis, denoted $\beta_2 = \mu_4 / \mu_2^2$. For a mesokurtic distribution, $\beta_2 = 3$. For a leptokurtic distribution, $\beta_2 > 3$. For a platykurtic distribution, $\beta_2 < 3$. A normal distribution is mesokurtic. The following graph shows the three curves.
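A minimal Python sketch of this coefficient for a raw data set is given below, writing it as beta2 (the fourth central moment divided by the squared variance) and comparing it with the normal benchmark of 3. The function names are hypothetical, and dividing by n (population moments) rather than n - 1 is an assumption. The two small data sets are made up for illustration.

```python
def central_moment(data, k):
    """k-th central moment, dividing by n (assumption: population form)."""
    n = len(data)
    centre = sum(data) / n
    return sum((x - centre) ** k for x in data) / n

def beta2(data):
    """Coefficient of kurtosis: fourth central moment / variance squared.

    Not defined for constant data, where the variance is zero.
    """
    return central_moment(data, 4) / central_moment(data, 2) ** 2

def kurtosis_type(b2, tol=1e-9):
    """Compare beta2 with the normal (mesokurtic) benchmark of 3."""
    if abs(b2 - 3.0) <= tol:
        return "mesokurtic"
    return "leptokurtic" if b2 > 3.0 else "platykurtic"

flat = [1, 2, 3, 4, 5]                  # evenly spread values
heavy = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]  # most values at the centre, two far out

for name, sample in (("flat", flat), ("heavy", heavy)):
    b2 = beta2(sample)
    print(f"{name}: beta2 = {b2:.2f} -> {kurtosis_type(b2)}")
```

The evenly spread sample gives a value below 3, while the sample with most of its values at the centre and a few far from it gives a value above 3.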

In a leptokurtic distribution, more observations cluster around the mean and the spread may be less. In a platykurtic distribution, the observations are less concentrated around the mean and hence the spread may be more. Some remarks: describing kurtosis in terms of peakedness alone is not correct. The tails of the distribution should also be taken into consideration.

If we consider the graphs of three symmetric curves with a common mean (= 0) and variances of 2, 0.5 and 1.0, the curve with variance 0.5 looks more peaked, and the curve with variance 2 looks less peaked, than the curve with variance 1. But all three curves represent normal distributions, and hence all are mesokurtic. We have to be very careful when comparing the kurtosis of distributions with different variances.
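This can be checked numerically. The sketch below approximates the second and fourth central moments of a zero-mean normal density by the trapezoidal rule and forms their ratio (fourth moment over squared variance) for the three variances mentioned. The integration range of 12 standard deviations and the step count are arbitrary assumptions, and the function names are illustrative.

```python
import math

def normal_pdf(x, var):
    """Density of a normal distribution with mean 0 and variance var."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def normal_moment(k, var, half_width=12.0, steps=20000):
    """Approximate E[X^k] for N(0, var) with the trapezoidal rule."""
    sd = math.sqrt(var)
    a, b = -half_width * sd, half_width * sd
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # end points count half
        total += weight * (x ** k) * normal_pdf(x, var)
    return total * h

results = {}
for var in (2.0, 0.5, 1.0):
    mu2 = normal_moment(2, var)
    mu4 = normal_moment(4, var)
    results[var] = {"peak": normal_pdf(0.0, var), "kurtosis": mu4 / mu2 ** 2}
    print(f"variance {var}: peak height = {results[var]['peak']:.3f}, "
          f"kurtosis coefficient = {results[var]['kurtosis']:.4f}")
```

The peak heights differ with the variance, but the kurtosis coefficient is 3 in every case, because the ratio is unchanged when the variable is rescaled.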

Kurtosis as a descriptive measure of data is usually not discussed much in research applications. Since the focus in most data analysis is on the normality assumption, researchers tend to ignore kurtosis. But skewness and kurtosis are both very important for understanding departure from normality. The kurtosis of any distribution is studied in relation to a normal distribution.