Percentiles, STATA, Box Plots, Standardizing, and Other Transformations


Lecture 3
Reading: Sections 5.7 54
Remember: when you finish a chapter, make sure not to miss the last couple of boxes, "What Can Go Wrong?" and "Ethics in Action".

Measures of Relative Standing: Percentiles
[Figure: histogram of the inflation rate, 2011 (vertical axis: fraction of countries). World Bank data again, n = 174 countries, bin width = 5.]
What is the approximate median (50th percentile)? The 20th percentile? The 85.64th percentile?

Reading STATA Output
. su inflation_2011, detail

                        inflation_2011
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -2.517798      -4.895247
 5%      .922363      -2.517798
10%            …              …       Obs                 174
25%            …              …       Sum of Wgt.         174

50%     4.977675                      Mean           6.646499
                        Largest       Std. Dev.       6.77998
75%            …              …
90%            …              …       Variance       45.96813
95%     17.71178              …       Skewness              …
99%            …              …       Kurtosis       22.85972

Questions: What is the median? The range? The sample size?

  Trips   Freq.   Percent     Cum.        Trips   Freq.   Percent     Cum.
      0     294     35.85    35.85           19       1      0.12    95.85
      1      76      9.27    45.12           20       3      0.37    96.22
      2      66      8.05    53.17           21       2      0.24    96.46
      3      58      7.07    60.24           22       4      0.49    96.95
      4      47      5.73    65.98           23       1      0.12    97.07
      5      47      5.73    71.71           24       4      0.49    97.56
      6      36      4.39    76.10           25       2      0.24    97.80
      7      30      3.66    79.76           26       4      0.49    98.29
      8      28      3.41    83.17           27       2      0.24    98.54
      9      15      1.83    85.00           28       3      0.37    98.90
     10       9      1.10    86.10           30       1      0.12    99.02
     11      16      1.95    88.05           34       1      0.12    99.15
     12      25      3.05    91.10           35       1      0.12    99.27
     13       9      1.10    92.20           36       1      0.12    99.39
     14       5      0.61    92.80           41       1      0.12    99.51
     15       9      1.10    93.90           43       1      0.12    99.63
     16       5      0.61    94.51           44       1      0.12    99.76
     17       6      0.73    95.24           45       1      0.12    99.88
     18       4      0.49    95.73           50       1      0.12   100.00
                                          Total     820    100.00

Questions: What is the median? What is the 75th percentile?

Discrete Histogram (bin width = 1)
[Figure: discrete histogram of the number of fishing trips, 0 to 50.]

Reading STATA Output
. summarize Number_of_Trips, detail

                       Number_of_Trips
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                 820
25%            0              0       Sum of Wgt.         820

50%            2                      Mean            4.52439
                        Largest       Std. Dev.      6.684273
75%            6             43
90%           12             44       Variance        44.6795
95%           17             45       Skewness       2.717188
99%           30             50       Kurtosis          13.01

How can the 10th percentile and the 25th percentile both be zero?
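The percentiles in this output can be reproduced from the frequency table. The sketch below is a minimal illustration, assuming the unweighted percentile rule described in Stata's manual for `summarize, detail` (with P = n·p/100, average the P-th and (P+1)-th order statistics when P is a whole number, otherwise take the ceil(P)-th); the helper names `expand` and `stata_percentile` are hypothetical. It also answers the last question: 294 of the 820 observations (35.85%) are zero, so every percentile up to the 35th is zero.

```python
import math
import statistics
from fractions import Fraction

# Number of fishing trips -> frequency, copied from the tabulation above (n = 820).
TRIPS = {
    0: 294, 1: 76, 2: 66, 3: 58, 4: 47, 5: 47, 6: 36, 7: 30, 8: 28, 9: 15,
    10: 9, 11: 16, 12: 25, 13: 9, 14: 5, 15: 9, 16: 5, 17: 6, 18: 4, 19: 1,
    20: 3, 21: 2, 22: 4, 23: 1, 24: 4, 25: 2, 26: 4, 27: 2, 28: 3, 30: 1,
    34: 1, 35: 1, 36: 1, 41: 1, 43: 1, 44: 1, 45: 1, 50: 1,
}


def expand(freq):
    """Hypothetical helper: turn {value: count} into a sorted list of observations."""
    return sorted(value for value, count in freq.items() for _ in range(count))


def stata_percentile(sorted_x, p):
    """p-th percentile (integer p, 0 < p < 100) under the assumed unweighted rule:
    with P = n*p/100, average the P-th and (P+1)-th order statistics if P is a
    whole number; otherwise take the ceil(P)-th order statistic (1-based)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    n = len(sorted_x)
    position = Fraction(n * p, 100)  # exact arithmetic avoids float rounding
    if position.denominator == 1:
        i = int(position)
        return (sorted_x[i - 1] + sorted_x[i]) / 2
    return sorted_x[math.ceil(position) - 1]


x = expand(TRIPS)
n = len(x)
mean = statistics.mean(x)
sd = statistics.stdev(x)  # sample s.d., divides by n - 1
pct = {p: stata_percentile(x, p) for p in (1, 5, 10, 25, 50, 75, 90, 95, 99)}

print(f"n = {n}, mean = {mean:.5f}, sd = {sd:.6f}")
for p, value in pct.items():
    print(f"{p:>2}th percentile: {value:g}")
print("share of zeros:", TRIPS[0] / n)
```

With 820 observations, the median is the average of the 410th and 411th values; the cumulative counts reach 370 at one trip and 436 at two trips, so both are 2.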

One Popular Use of Percentiles
Quartiles:
  1st quartile: observations between the 0th and 25th percentiles
  2nd quartile: observations between the 25th and 50th percentiles
  3rd quartile: observations between the 50th and 75th percentiles
  4th quartile: observations between the 75th and 100th percentiles
Quintiles: divide a variable into fifths; e.g., the top quintile includes observations between the 80th and 100th percentiles.
Deciles: divide a variable into tenths; e.g., the bottom decile includes observations between the 0th and 10th percentiles.
Note: You are responsible for knowing the meaning of these terms if they appear on a test, exam, etc.

Practice Reading and Interpreting
Alesina et al. (2001), "Why Doesn't the United States Have a European-Style Welfare State?"
What do these numbers mean? How should they be interpreted?

Interquartile Range (IQR)
Interquartile range: 75th percentile minus 25th percentile.
What does it measure? The spread of the middle observations (the middle 50% of the data).
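Assigning observations to quartile or decile groups amounts to comparing each value with the relevant cut points, and the IQR is the distance between two of those cut points. A minimal sketch, assuming Python's `statistics.quantiles` (default "exclusive" method) for the cut points, which is not necessarily the percentile definition Stata uses; the data (the integers 1 to 20) and the helper `quantile_group` are hypothetical, and a value equal to a cut point is placed in the lower group.

```python
import bisect
import statistics
from collections import Counter


def quantile_group(value, cuts):
    """Hypothetical helper: 1-based group number (1 = bottom) given sorted cut points.
    A value equal to a cut point is placed in the lower group."""
    return bisect.bisect_left(cuts, value) + 1


data = list(range(1, 21))  # hypothetical data: 20 observations

quartile_cuts = statistics.quantiles(data, n=4)   # 25th, 50th, 75th percentiles
decile_cuts = statistics.quantiles(data, n=10)    # 10th, 20th, ..., 90th percentiles

quartile_counts = Counter(quantile_group(v, quartile_cuts) for v in data)
decile_counts = Counter(quantile_group(v, decile_cuts) for v in data)
iqr = quartile_cuts[2] - quartile_cuts[0]         # 75th minus 25th percentile

print("quartile cut points:", quartile_cuts)
print("observations per quartile:", dict(sorted(quartile_counts.items())))
print("observations per decile:", dict(sorted(decile_counts.items())))
print("IQR:", iqr)
```

Each quartile group holds a quarter of the observations and each decile group a tenth; the IQR covers the middle two quartile groups.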

Box Plots
[Figure: box plot of the inflation distribution, 2011, n = 174 countries. Labeled parts: 25th percentile, median, 75th percentile, whiskers, lower adjacent value (LAV), upper adjacent value (UAV), and outside values.]
The UAV marks the biggest observation within 1.5 IQRs of the 75th percentile; the LAV similarly marks the smallest observation within 1.5 IQRs of the 25th percentile. Observations beyond the adjacent values are plotted individually as outside values.

[Figure: histograms paired with box plots for three variables, x1, x2, and x3.]
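The whisker ends can be computed directly from the quartiles. The sketch below is a minimal illustration, assuming the 1.5 × IQR rule for the UAV and its mirror image for the LAV; the sample is hypothetical, and the quartiles come from `statistics.quantiles(..., method="inclusive")`, which may differ slightly from the percentile definition used by Stata's box plots.

```python
import statistics


def box_plot_parts(data):
    """Quartiles, adjacent values, and outside values for a box plot whose
    whiskers reach the most extreme observations within 1.5 IQRs of the
    25th and 75th percentiles."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lav": min(inside),  # lower adjacent value
        "uav": max(inside),  # upper adjacent value
        "outside": sorted(v for v in data if v < lower_fence or v > upper_fence),
    }


# Hypothetical right-skewed sample with two large values.
sample = [1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 30, 45]
parts = box_plot_parts(sample)
print(parts)
```

Here the fences are 2.75 - 5.25 = -2.5 and 6.25 + 5.25 = 11.5, so the whiskers stop at 1 and 7 while 30 and 45 are drawn as outside values.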

[Figure: additional histogram and box plot panels for x1, x2, and x3.]

Discrete Histogram (bin width = 1)
[Figure: discrete histogram of the number of fishing trips, 0 to 50.]
How would the box plot of the Wisconsin fishing trip data be unusual?

Outliers
Outliers: extremely large or small values that differ from the bulk of the data.
Robust: not sensitive to outliers.
Is the sample mean a robust measure of central tendency? Is the sample median robust?
However, the mean retains more information from the sample and has useful statistical properties.
Is the IQR robust? The variance?
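One way to answer the robustness questions is to replace a single observation with an extreme value and see which statistics move. A minimal sketch with a hypothetical sample; the quartiles again use `statistics.quantiles(..., method="inclusive")`.

```python
import statistics


def summary(data):
    """Mean, median, IQR, and sample variance of a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "iqr": q3 - q1,
        "variance": statistics.variance(data),
    }


clean = [2, 3, 3, 4, 5, 5, 6, 7]       # hypothetical sample
with_outlier = clean[:-1] + [700]      # largest value replaced by an outlier

before, after = summary(clean), summary(with_outlier)
for stat in before:
    print(f"{stat:>8}: {before[stat]:10.3f} -> {after[stat]:10.3f}")
```

The median and IQR do not change at all, while the mean jumps from 4.375 to 91 and the variance grows by a factor of more than 1,000.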

Charitable Donors: Statistics Canada
http://www5.statcan.gc.ca/cansim/a5?lang=eng&id=1112&pattern=1112&searchtypebyvalue=1&p2=35
Donors and donations, 2011
  Number of taxfilers: about 24.8 million
  Number of donors: between 5 and 6 million
  Percentage of donors aged 0 to 24 years: 3
  Percentage of donors aged 25 to 34 years: 12
  Percentage of donors aged 35 to 44 years: 17
  Percentage of donors aged 45 to 54 years: 23
  Percentage of donors aged 55 to 64 years: 21
  Percentage of donors aged 65 years and over: 25
Note: A charitable donor is defined as a taxfiler reporting a charitable donation amount on line 340 of the personal income tax form.

Average Age of Donors?
Section 5.7, Grouped Data, tells how to approximate the mean and s.d. with grouped data. Each group is represented by a single age (in brackets):
  aged 0 to 24 [21]: 3%
  aged 25 to 34 [29.5]: 12%
  aged 35 to 44 [39.5]: 17%
  aged 45 to 54 [49.5]: 23%
  aged 55 to 64 [59.5]: 21%
  aged 65 and over [70]: 25%

$$\bar{X} \approx 0.03(21) + 0.12(29.5) + 0.17(39.5) + 0.23(49.5) + 0.21(59.5) + 0.25(70) \approx 52 \text{ years}$$

What if we use 75 years old for the last category? Then the mean is about 53.5. What if we use 12 years old for the first category? Then the mean is about 52.0.

Logic of Calculation: Smaller Example
Survey a random sample of 40 A&S students, asking "How many courses are you currently taking?" A tabulation:

num_courses |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |          3        7.50        7.50
          4 |          7       17.50       25.00
          5 |         28       70.00       95.00
          6 |          2        5.00      100.00
------------+-----------------------------------
      Total |         40      100.00

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum_{i=1}^{3} 2 + \sum_{i=1}^{7} 4 + \sum_{i=1}^{28} 5 + \sum_{i=1}^{2} 6}{40} = \frac{3(2) + 7(4) + 28(5) + 2(6)}{40} = 0.075(2) + 0.175(4) + 0.70(5) + 0.05(6) = 4.65$$
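Both grouped-data calculations reduce to a weighted average of representative values. A minimal sketch, assuming each group is represented by a single value (the bracketed ages for donors); `grouped_mean` is a hypothetical helper name, and it divides by the total of the weights rather than assuming they sum to exactly 100.

```python
def grouped_mean(values, freqs):
    """Approximate mean from grouped data: sum(f * v) / sum(f), where each
    group is represented by a single value (e.g. its midpoint)."""
    total = sum(freqs)
    return sum(f * v for v, f in zip(values, freqs)) / total


# Course-load example: exact, because each group is a single value.
courses = grouped_mean([2, 4, 5, 6], [3, 7, 28, 2])

# Donor ages: representative ages and the published percentages.
ages = [21, 29.5, 39.5, 49.5, 59.5, 70]
shares = [3, 12, 17, 23, 21, 25]
donor_mean = grouped_mean(ages, shares)
donor_mean_75 = grouped_mean(ages[:-1] + [75], shares)  # 75 for "65 and over"

print(f"mean courses: {courses:.2f}")
print(f"mean donor age: {donor_mean:.1f}; with 75 for the top group: {donor_mean_75:.1f}")
```

The published donor percentages are rounded and sum to 101, so dividing by their total gives about 51.7 years (about 53.0 with 75 for the top group), slightly below the figures of about 52 and 53.5 obtained by multiplying the percentages directly by the representative ages.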

Similarly for the Standard Deviation
Using the same num_courses tabulation (n = 40, mean 4.65):

$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{X})^2}{n-1}} = \sqrt{\frac{3(2-4.65)^2 + 7(4-4.65)^2 + 28(5-4.65)^2 + 2(6-4.65)^2}{39}}$$
$$= \sqrt{\left[0.075(2-4.65)^2 + 0.175(4-4.65)^2 + 0.70(5-4.65)^2 + 0.05(6-4.65)^2\right]\frac{40}{39}} \approx 0.89$$

If you ignore the factor 40/39, you get 0.88, very close to the right answer.

Standard Deviation of Age of Donors?
Using the same age groups, representative ages, and percentages as for the mean (mean of about 52 years):

$$s^2 \approx 0.03(21-52)^2 + 0.12(29.5-52)^2 + 0.17(39.5-52)^2 + 0.23(49.5-52)^2 + 0.21(59.5-52)^2 + 0.25(70-52)^2 \approx 210.4 \text{ years}^2$$
$$s \approx \sqrt{210.4} \approx 14.5 \text{ years}$$

Standardization (z-scores)
Standardize: $z = \dfrac{x - \bar{X}}{s_x}$
z tells how many standard deviations a value is from the mean (+ if above, - if below).
z has a mean of 0, a standard deviation of 1, and no units.
Example: mean inflation is 6.64 and the s.d. is 6.78; Canada's inflation rate is 2.91, so $z = (2.91 - 6.64)/6.78 = -0.55$. What does -0.55 mean?
[Figure: histograms of the inflation rate, 2011 (n = 174 countries), in original units and standardized (z-scores).]
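The grouped standard deviation and the z-score can be checked numerically. A minimal sketch; `grouped_sd` and `z_score` are hypothetical helper names, the `ddof` parameter is an assumption borrowed from common numerical-library usage (1 divides by n - 1, 0 divides by n), and the five-value sample used to show that standardized data have mean 0 and s.d. 1 is hypothetical.

```python
import math
import statistics


def grouped_sd(values, freqs, ddof=1):
    """Standard deviation from grouped data. ddof=1 divides by n - 1 (sample s.d.);
    ddof=0 divides by n, which corresponds to ignoring the n/(n - 1) factor."""
    n = sum(freqs)
    mean = sum(f * v for v, f in zip(values, freqs)) / n
    ss = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs))
    return math.sqrt(ss / (n - ddof))


def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


s = grouped_sd([2, 4, 5, 6], [3, 7, 28, 2])                        # about 0.89
s_no_correction = grouped_sd([2, 4, 5, 6], [3, 7, 28, 2], ddof=0)  # about 0.88
z_canada = z_score(2.91, 6.64, 6.78)                               # about -0.55

# Standardizing a whole (hypothetical) sample gives mean 0 and s.d. 1.
sample = [1.5, 2.91, 4.2, 6.0, 18.3]
m, sd = statistics.mean(sample), statistics.stdev(sample)
z = [z_score(v, m, sd) for v in sample]

print(round(s, 2), round(s_no_correction, 2), round(z_canada, 2))
print(statistics.mean(z), statistics.stdev(z))
```

The last line prints values equal to 0 and 1 up to floating-point rounding, whatever sample is used.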

Linear Transformations
A linear transformation can be written as $Y = a + bX$, where a and b are constants.
Which of these are linear transformations of X?
  Y = 2X
  Y = X^2 - 1 = (X - 1)(X + 1)
  Y = (X - 1)/2
Linear transformations change the scale of a variable but not the shape of its distribution.
Standardization is a linear transformation, with $a = -\bar{X}/s_x$ and $b = 1/s_x$.

[Figure: histograms of central government debt (% of GDP) in 2010 and 2005 and of the change from 2005 to 2010; World Bank data again, n = 48 countries. Each panel reports its mean, median, and s.d.]
Change = Debt10 - Debt05, and for the means: 5.3 = 58.7 - 53.4.
Linear combinations have a simple effect on the mean, but this does not work (at all) for the median or the s.d.

[Figure: histograms of GDP per capita in US dollars (mean = 14,955, s.d. = 16,243), in thousands of dollars (mean = 14.955, s.d. = 16.243), and of ln(GDP per capita) (mean = 8.972). CIA data again, US$, PPP, 2012 est., n = 185 countries.]
Non-linear transformations (the natural log is very popular) can often make skewed data more symmetric. Linear transformations (such as changing units) do not affect the shape at all.
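The three claims on this slide (linear transformations move the mean and s.d. predictably and keep the shape, differences of variables only behave simply for the mean, and logs reduce right skew) can be checked on small data sets. A minimal sketch; all data and constants below are hypothetical, and `skewness` is a helper using the moment-based definition m3 / m2^1.5.

```python
import math
import statistics


def skewness(data):
    """Moment-based skewness m3 / m2**1.5 (both moments divide by n)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    return m3 / m2 ** 1.5


x = [1, 2, 2, 3, 4, 5, 8, 13, 21, 55]  # hypothetical right-skewed data
a, b = 32.0, 1.8                       # hypothetical constants, b > 0
y = [a + b * v for v in x]

# Linear transformation: mean and s.d. follow a + b*mean and |b|*sd; skewness is unchanged.
print(statistics.mean(y), a + b * statistics.mean(x))
print(statistics.stdev(y), abs(b) * statistics.stdev(x))
print(skewness(x), skewness(y))

# Non-linear transformation: the natural log pulls in the long right tail.
log_x = [math.log(v) for v in x]
print(skewness(log_x))

# Linear combination of two variables (hypothetical debt ratios, 2005 and 2010).
debt05 = [10, 20, 30]
debt10 = [30, 10, 50]
change = [d10 - d05 for d05, d10 in zip(debt05, debt10)]
print(statistics.mean(change), statistics.mean(debt10) - statistics.mean(debt05))      # equal
print(statistics.median(change), statistics.median(debt10) - statistics.median(debt05))  # differ
print(statistics.stdev(change), statistics.stdev(debt10) - statistics.stdev(debt05))    # differ
```

The mean of the differences always equals the difference of the means, but in this example the median change is 20 while the difference of medians is 10.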