Introduction to Descriptive Statistics

Introduction to Descriptive Statistics 17.871

Types of Variables
  Nominal (qualitative): categorical
  Non-nominal (quantitative): ordinal; interval or ratio

Describing data

  Property   Moment                          Non-mean-based measure
  Center     Mean                            Mode, median
  Spread     Variance (standard deviation)   Range, interquartile range
  Skew       Skewness                        --
  Peaked     Kurtosis                        --

Population vs. Sample Notation

  Population   Sample
  Greeks       Romans
  μ, σ, β      s, b

Mean

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Variance, Standard Deviation

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}, \qquad \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$

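Variance, S.D. of a Sample

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

The denominator n - 1 is the degrees of freedom.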
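The difference between the n and n - 1 denominators on the two variance slides can be checked numerically. The following is a minimal sketch in Python (used here only for illustration; the course itself uses Stata). The function names and the data values are made up for the example.

```python
import math


def population_variance(xs):
    """Sum of squared deviations from the mean, divided by n."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, mean = 5

pop_var = population_variance(data)   # 32 / 8
samp_var = sample_variance(data)      # 32 / 7
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

print(pop_var, pop_sd)
print(samp_var, samp_sd)
```

The sample versions are always slightly larger than the population versions, and the gap shrinks as n grows.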

Binary data

$$\bar{x} = \Pr(X = 1) = \text{proportion of time } x = 1$$

$$s_x^2 = \bar{x}(1 - \bar{x}), \qquad s_x = \sqrt{\bar{x}(1 - \bar{x})}$$
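The binary-data shortcut can be verified on a small made-up 0/1 variable. This sketch assumes the variance is computed with n in the denominator, in which case the shortcut is exact; with n - 1 it holds only approximately. The variable names are hypothetical.

```python
def mean_and_variance(xs):
    """Return the mean and the variance (n in the denominator)."""
    n = len(xs)
    xbar = sum(xs) / n
    var = sum((x - xbar) ** 2 for x in xs) / n
    return xbar, var


votes = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # hypothetical 0/1 data, six ones

xbar, var = mean_and_variance(votes)
shortcut = xbar * (1 - xbar)  # the slide's formula for binary data

print(xbar, var, shortcut)
```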

Example: symmetrical distribution (no skew, zero skew)
  IQ, SAT, height
[Figure: frequency plotted against value]

Skewness (asymmetrical distribution): negative skew, or left skew
  GPA of MIT students
[Figure: frequency plotted against value]

Skewness (asymmetrical distribution): positive skew, or right skew
  Income
  Contribution to candidates
  Populations of countries
  Residual vote rates
[Figure: frequency plotted against value]

Skewness
[Figure: frequency plotted against value]

Kurtosis
  k > 3: leptokurtic
  k = 3: mesokurtic
  k < 3: platykurtic
[Figure: frequency plotted against value]
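Skewness and kurtosis can be computed as standardized third and fourth central moments. The sketch below uses the simple moment definitions with n in the denominator, which is an assumption; statistical packages may apply small-sample corrections. The data are made up.

```python
def central_moment(xs, k):
    """k-th central moment with n in the denominator."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n


def skewness(xs):
    """Third central moment divided by the variance to the power 3/2."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def kurtosis(xs):
    """Fourth central moment divided by the squared variance."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


symmetric = [1, 2, 3, 4, 5]            # hypothetical, no skew
right_skewed = [1, 1, 2, 2, 3, 10]     # hypothetical, long right tail
left_skewed = [-x for x in right_skewed]

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed), skewness(left_skewed))
```

On this scale a kurtosis below 3 is platykurtic, so the flat five-point example comes out platykurtic.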

Normal distribution
  Skewness = 0
  Kurtosis = 3

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}$$
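The normal density can be evaluated directly from its formula. This is a small sketch, with a hypothetical function name, that also checks numerically that the area under the curve is close to 1 using a crude Riemann sum.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and s.d. sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


peak = normal_pdf(0.0)  # height of the standard normal curve at its mean

# Riemann sum over [-8, 8] with step 0.001 for the standard normal curve.
step = 0.001
area = sum(normal_pdf(-8.0 + i * step) * step for i in range(16000))

print(peak, area)
```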

The z-score or the standardized score

$$z = \frac{x - \bar{x}}{\sigma_x}$$
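Standardizing a variable amounts to subtracting its mean and dividing by its standard deviation. The sketch below assumes the standard deviation with n in the denominator (`statistics.pstdev`); swapping in `statistics.stdev` would give the n - 1 version. The data and names are made up.

```python
import statistics


def z_scores(xs):
    """Return (x - mean) / sd for every x, using the n-denominator sd."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mean) / sd for x in xs]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, mean 5, sd 2
z = z_scores(data)

# Share of observations within two standard deviations of the mean.
share_within_2sd = sum(1 for value in z if abs(value) <= 2) / len(z)

print(z)
print(share_within_2sd)
```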

More words about the normal curve

Commands in Stata for getting univariate statistics
  summarize varname
  summarize varname, detail
  histogram varname, bin() start() width() density/fraction/frequency normal
  graph box varnames
  tabulate [NB: compare to table]

Example of Sophomore Test Scores
High School and Beyond, 1980: A Longitudinal Survey of Students in the United States (ICPSR Study 7896)
  totalscore = % of questions answered correctly minus penalty for guessing
  recodedtype = (1 = public school, 2 = religious private, 3 = non-sectarian private)

Explore totalscore some more

  . table recodedtype, c(mean totalscore)

  --------------------------
  recodedtype | mean(totals~e)
  ------------+---------------
            1 |   .3729735
            2 |   .4475548
            3 |    .589883
  --------------------------
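The table of means by school type is a group-by-and-average operation. The following Python sketch shows the idea on made-up (group, score) pairs; it does not use the High School and Beyond data, and the function name is hypothetical.

```python
from collections import defaultdict


def mean_by_group(rows):
    """Average the second element of each row within groups of the first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in rows:
        totals[group] += value
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in sorted(totals)}


# Hypothetical (recodedtype, totalscore) pairs, not the survey data.
rows = [(1, 0.30), (1, 0.40), (2, 0.45), (2, 0.55), (3, 0.60)]
means = mean_by_group(rows)

for group, mean in means.items():
    print(group, round(mean, 3))
```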

Graph totalscore

  . hist totalscore

[Figure: histogram of totalscore; x-axis totalscore, y-axis Density]

Divide into bins so that each bar represents 1% correct

  . hist totalscore, width(.01)
  (bin=124, start=-.24209334, width=.01)

[Figure: histogram of totalscore; x-axis totalscore, y-axis Density]
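Choosing a bin width fixes which bar each observation falls into. The sketch below illustrates fixed-width binning and the conversion from counts to densities on made-up data. It is an illustration of the general idea, not a reproduction of how Stata assigns bins, and the names are hypothetical.

```python
import math
from collections import Counter


def bin_counts(values, start, width):
    """Count observations per bin; bin k covers [start + k*width, start + (k+1)*width)."""
    counts = Counter()
    for v in values:
        counts[math.floor((v - start) / width)] += 1
    return dict(sorted(counts.items()))


scores = [0.0, 0.1, 0.2, 0.3, 0.6, 0.9]  # hypothetical data
width = 0.25
counts = bin_counts(scores, start=0.0, width=width)

# Density = fraction of observations in the bin divided by the bin width,
# so that the bar areas sum to 1.
n = len(scores)
densities = {k: c / (n * width) for k, c in counts.items()}

print(counts)
print(densities)
```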

Add ticks at each 10% mark

  . histogram totalscore, width(.01) xlabel(-.2 (.1) 1)
  (bin=124, start=-.24209334, width=.01)

[Figure: histogram of totalscore; x-axis totalscore labeled from -.2 to 1 in steps of .1, y-axis Density]

Superimpose the normal curve (with the same mean and s.d. as the empirical distribution)

  . histogram totalscore, width(.01) xlabel(-.2 (.1) 1) normal
  (bin=124, start=-.24209334, width=.01)

[Figure: histogram of totalscore with a normal curve overlaid; x-axis totalscore labeled from -.2 to 1 in steps of .1, y-axis Density]

Histograms by category

  . histogram totalscore, width(.01) xlabel(-.2 (.1) 1) by(recodedtype)
  (bin=124, start=-.24209334, width=.01)

[Figure: three histograms of totalscore, one panel per recodedtype (1, 2, 3); x-axis totalscore labeled from -.2 to 1 in steps of .1, y-axis Density; note "Graphs by recodedtype"]

Brief exercise: red versus blue states?
  Open CCES.dta from the Examples folder in the course locker
  Is America polarized?
    Create a histogram of partisan identification (pid7) by state
    Necessary commands: tab (with option nolabel), recode, collapse, histogram
  Do most states fall within two standard deviations?
    Create z-scores and use tabulate

Main issues with histograms
  Proper level of aggregation
  Non-regular data categories

A note about histograms with unnatural categories
From the Current Population Survey (2000), Voter and Registration Survey
How long (have you/has name) lived at this address?
  -9  No Response
  -3  Refused
  -2  Don't know
  -1  Not in universe
   1  Less than 1 month
   2  1-6 months
   3  7-11 months
   4  1-2 years
   5  3-4 years
   6  5 years or longer

Solution, Step 1: Map artificial category onto natural midpoint

  Code  Category             Midpoint (years)
  -9    No Response          missing
  -3    Refused              missing
  -2    Don't know           missing
  -1    Not in universe      missing
   1    Less than 1 month    1/24 = 0.042
   2    1-6 months           3.5/12 = 0.29
   3    7-11 months          9/12 = 0.75
   4    1-2 years            1.5
   5    3-4 years            3.5
   6    5 years or longer    10 (arbitrary)
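The recoding step maps each survey code to a midpoint in years and treats the negative codes as missing. A minimal Python sketch of that mapping follows; the dictionary and function names are hypothetical, and `None` stands in for a missing value.

```python
# Midpoints in years, taken from the table above; 10 for the open-ended
# top category is arbitrary, as the slide notes.
MIDPOINT_YEARS = {
    1: 1 / 24,    # less than 1 month
    2: 3.5 / 12,  # 1-6 months
    3: 9 / 12,    # 7-11 months
    4: 1.5,       # 1-2 years
    5: 3.5,       # 3-4 years
    6: 10.0,      # 5 years or longer
}


def recode_longevity(code):
    """Return the midpoint in years, or None for the missing codes (-9, -3, -2, -1)."""
    return MIDPOINT_YEARS.get(code)


responses = [6, 4, -9, 1, 2, 6, -1]  # hypothetical survey codes
longevity = [recode_longevity(code) for code in responses]
observed = [value for value in longevity if value is not None]

print(longevity)
print(len(observed))
```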

Graph of recoded data

  . histogram longevity, fraction

[Figure: histogram of longevity; x-axis longevity from 0 to 10, y-axis Fraction from 0 to .557134]

Density plot of data
  Total area of last bar = .557
  Width of bar = 11 (arbitrary)
  Solve for h: a = w × h, or .557 = 11h => h = .051

[Figure: density plot of longevity; x-axis longevity from 0 to 15]

Density plot template

  Category   Fraction   X-min   X-max   X-length   Height (density)
  < 1 mo.    .0156      0       1/12    .082       .19*
  1-6 mo.    .0909      1/12    1/2     .417       .22
  7-11 mo.   .0430      1/2     1       .500       .09
  1-2 yr.    .1529      1       2       1          .15
  3-4 yr.    .1404      2       4       2          .07
  5+ yr.     .5571      4       15      11         .05

  * = .0156/.082
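Each height in the template is the category's fraction divided by the width of its interval, so that the bar's area equals the fraction. The sketch below recomputes the heights from the fractions and interval endpoints in the table; the variable and function names are hypothetical.

```python
# (label, fraction, x_min, x_max) with endpoints in years, from the table above.
categories = [
    ("< 1 mo.", 0.0156, 0.0, 1 / 12),
    ("1-6 mo.", 0.0909, 1 / 12, 0.5),
    ("7-11 mo.", 0.0430, 0.5, 1.0),
    ("1-2 yr.", 0.1529, 1.0, 2.0),
    ("3-4 yr.", 0.1404, 2.0, 4.0),
    ("5+ yr.", 0.5571, 4.0, 15.0),  # upper limit of 15 is arbitrary
]


def density_height(fraction, x_min, x_max):
    """Bar height such that height * width equals the category's fraction."""
    return fraction / (x_max - x_min)


heights = [density_height(f, lo, hi) for _, f, lo, hi in categories]
rounded = [round(h, 2) for h in heights]

# The bar areas add back up to the total fraction of valid responses.
total_area = sum(h * (hi - lo) for h, (_, _, lo, hi) in zip(heights, categories))

print(rounded)
print(round(total_area, 4))
```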

Draw the previous graph with a box plot

  . graph box totalscore

[Figure: box plot of totalscore, annotated with the upper quartile, median, lower quartile, the inter-quartile range (IQR), and whiskers of length 1.5 x IQR]
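The quantities annotated on the box plot can be computed by hand. This sketch uses `statistics.quantiles` with the inclusive method, which is one of several quartile conventions and may differ slightly from what a given package reports. It assumes the common convention that whiskers stop at the most extreme observations within 1.5 x IQR of the box. The data are made up.

```python
import statistics


def box_plot_summary(xs):
    """Quartiles, IQR, whisker ends, and points beyond 1.5 x IQR from the box."""
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical values
summary = box_plot_summary(data)
print(summary)
```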

Draw the box plots for the different types of schools

  . graph box totalscore, by(recodedtype)

[Figure: box plots of totalscore in separate panels for recodedtype 1, 2, and 3; note "Graphs by recodedtype"]

Draw the box plots for the different types of schools using the over option

  . graph box totalscore, over(recodedtype)

[Figure: box plots of totalscore for recodedtype 1, 2, and 3, side by side on a single axis]

Three words about pie charts: don't use them

So, what's wrong with them?
  For non-time-series data, it is hard to get a comparison among groups; the eye is very bad at judging the relative size of circle slices
  For time-series data, it is hard to grasp cross-time comparisons

Some Words about Graphical Presentation
Aspects of graphical integrity (following Edward Tufte, Visual Display of Quantitative Information)
  Represent numbers in direct proportion to the numerical quantities presented
  Write clear labels on the graph
  Show data variation, not design variation
  Deflate and standardize money in time series