Lecture Week 4 Inspecting Data: Distributions


Lecture Week 4. Inspecting Data: Distributions
Introduction to Research Methods & Statistics, 2013-2014
Hemmo Smit

So next week
- No lecture & workgroups
- But: practice test online (BB)
- Enter data for your own research
- Practice SPSS skills with your own data

Overview
- Descriptive research
- Describing and presenting data
- Frequency distributions
- Graphical displays (1)
- Measures of central tendency and variability
- Graphical displays (2): boxplots
Read: Leary, Chapter 6; Howell, Chapter 2

Types of descriptive research
- Survey research: attitudes, lifestyles, behaviors, problems
- Demographic research: patterns of basic life events (birth, marriage, migration, death)
- Epidemiological research: occurrence of disease and death

3 types of surveys
- Cross-sectional: one-shot cross-section of the population
- Successive independent samples: changes over time. Different respondents each time! Are the samples comparable?
- Longitudinal (panel survey design): changes over time. Same respondents more than once! Risk of drop-out.

Describing and presenting data
3 criteria for a good description:
1) Accurate
2) Concise
3) Comprehensible
Trade-off between these criteria:
- Loss of information
- Possible distortion
Data can be presented in numerical and graphical format.
TIP: Always start with graphs.
Beware: scale of measurement?!

How to describe a distribution?
A) Overall pattern
1) Shape
- number of peaks (uni-, bi- or multi-modal)?
- symmetrical or skewed?
2) Central tendency / location: midpoint
3) Spread: a little or a lot?
B) Deviations from the pattern
- Outliers: observations that lie far from the majority
- Tails: thick or thin?

Frequency distributions: Example
How do children recall stories?
- Respondents: 25 children
- Task: tell the researcher about a movie
- Dependent variable: number of "and then" statements
(see Howell, Exercise 2.1, p. 55)

Raw data and frequency distributions

Table 1. Number of "and then" statements (raw data)
18  17  16  18  15
15  18  16  20  18
22  20  17  21  17
19  17  21  20  19
18  12  23  20  20

Table 2. Number of "and then" statements (frequency distribution)
Score   f    P
12      1    0.04
15      2    0.08
16      2    0.08
17      4    0.16
18      5    0.20
19      2    0.08
20      5    0.20
21      2    0.08
22      1    0.04
23      1    0.04
Total   25   1.00

Absolute and relative frequencies
- Absolute frequency (f) = number of respondents with a given score. Disadvantage: hard to interpret or compare across samples of different size.
- Relative frequency (P) = proportion of the total with a given score (P = f / n). Advantage: easy to interpret.
Note: 0 ≤ P ≤ 1, and P × 100 gives the percentage.
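As a rough illustration outside SPSS, the frequency table for the 25 scores can be rebuilt in a few lines of Python. This is only a sketch: the function name `frequency_table` is made up for this example, and the scores are typed in from Table 1.

```python
from collections import Counter

# Raw scores from Table 1 (number of "and then" statements per child).
scores = [18, 17, 16, 18, 15, 15, 18, 16, 20, 18, 22, 20, 17,
          21, 17, 19, 17, 21, 20, 19, 18, 12, 23, 20, 20]


def frequency_table(values):
    """Return (score, f, P) rows sorted by score.

    f is the absolute frequency, P = f / n the relative frequency.
    (Hypothetical helper, not part of the lecture material.)
    """
    n = len(values)
    counts = Counter(values)
    return [(score, f, f / n) for score, f in sorted(counts.items())]


table = frequency_table(scores)

print("Score   f     P")
for score, f, p in table:
    print(f"{score:>5} {f:>3}  {p:.2f}")
print(f"Total {sum(f for _, f, _ in table):>3}  {sum(p for _, _, p in table):.2f}")
```

The relative frequencies add up to 1 because every respondent is counted in exactly one row.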

SPSS: Frequencies - Menu
Analyze > Descriptive Statistics > Frequencies

SPSS: Frequencies Dialog box

SPSS: Frequencies - Output

Grouped frequency distribution (1)
Simple frequency distributions are unclear in case of:
- a small number of participants in each category, and/or
- variables with many categories
Solution: a grouped frequency table. Distribute the raw data over K class intervals and make a new frequency distribution.
Make sure all intervals are:
- exhaustive and mutually exclusive
- of equal width

Grouped frequency distribution (2)
Rule 1: number of classes (K) = √n
Rule 2: class interval width (I) = range / number of classes
(Range (R) = highest score - lowest score)

In our example:
- Number of intervals = √25 = 5
- Range = 23 - 12 = 11
- Interval width = 11 / 5 = 2.2, so use 2 or 3

Score   f    P
12-14   1    0.04
15-17   8    0.32
18-20   12   0.48
21-23   4    0.16
Total   25   1.00
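The two rules of thumb can be traced step by step in Python. This is a sketch under stated assumptions: the width is rounded up to 3 (the slide allows 2 or 3, and its table uses 3), the first interval starts at the lowest score, and `grouped_frequencies` is a name invented for this example.

```python
import math

# Raw scores from Table 1.
scores = [18, 17, 16, 18, 15, 15, 18, 16, 20, 18, 22, 20, 17,
          21, 17, 19, 17, 21, 20, 19, 18, 12, 23, 20, 20]

n = len(scores)
k = round(math.sqrt(n))                   # Rule 1: K = sqrt(25) = 5
score_range = max(scores) - min(scores)   # R = 23 - 12 = 11
width = math.ceil(score_range / k)        # Rule 2: 11 / 5 = 2.2, rounded up to 3


def grouped_frequencies(values, start, width):
    """Return (lower, upper, f, P) rows for equal-width integer classes.

    Classes are exhaustive and mutually exclusive: each score falls in
    exactly one interval [lower, upper]. (Hypothetical helper.)
    """
    rows = []
    lower = start
    while lower <= max(values):
        upper = lower + width - 1
        f = sum(lower <= v <= upper for v in values)
        rows.append((lower, upper, f, f / len(values)))
        lower += width
    return rows


groups = grouped_frequencies(scores, min(scores), width)
for lower, upper, f, p in groups:
    print(f"{lower}-{upper}  {f:>3}  {p:.2f}")
```

Note that a width of 3 covers the range in four classes rather than five; the rules give a starting point, not a fixed answer.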

SPSS: Grouped frequency distribution (1)

SPSS: Grouped frequency distribution (2)

SPSS: Grouped frequency distribution (3)

SPSS: Grouped frequency distribution (4)

SPSS: Grouped frequency distribution (5)

Cumulative frequency distributions (1)

Class interval   Real lower limit   Real upper limit   Midpoint   f   P   F
12-14            11.5               14.5               13
15-17            14.5               17.5               16
18-20            17.5               20.5               19
21-23            20.5               23.5               22
Total

Real lower limit = lower limit - 0.5
Real upper limit = upper limit + 0.5
Midpoint = (upper limit + lower limit) / 2

Cumulative frequency distributions (2)

Class interval   Real lower limit   Real upper limit   Midpoint   f    P      F
12-14            11.5               14.5               13         1    0.04
15-17            14.5               17.5               16         8    0.32
18-20            17.5               20.5               19         12   0.48
21-23            20.5               23.5               22         4    0.16
Total                                                             25   1.00

F = cumulative relative frequency (CRF): the proportion for this class plus the proportions of all previous (lower) classes.

Cumulative frequency distributions (3)

Class interval   Real lower limit   Real upper limit   Midpoint   f    P      F
12-14            11.5               14.5               13         1    0.04   0.04
15-17            14.5               17.5               16         8    0.32   0.36
18-20            17.5               20.5               19         12   0.48   0.84
21-23            20.5               23.5               22         4    0.16   1.00
Total                                                             25   1.00

NB. Also possible: cumulative absolute frequency.
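The real limits, midpoints and cumulative relative frequencies in the table can be checked with a short Python sketch. The class intervals and frequencies are copied from the slide; the dictionary keys are names chosen for this example only.

```python
from itertools import accumulate

# (lower limit, upper limit, f) for each class interval, as in the table.
classes = [(12, 14, 1), (15, 17, 8), (18, 20, 12), (21, 23, 4)]
n = sum(f for _, _, f in classes)

# Cumulative absolute frequency: running total of f.
cumulative_f = list(accumulate(f for _, _, f in classes))

rows = []
for (lower, upper, f), cum in zip(classes, cumulative_f):
    rows.append({
        "interval": f"{lower}-{upper}",
        "real_lower": lower - 0.5,        # real lower limit
        "real_upper": upper + 0.5,        # real upper limit
        "midpoint": (lower + upper) / 2,  # note the brackets around the sum
        "f": f,
        "P": f / n,                       # relative frequency
        "F": cum / n,                     # cumulative relative frequency
    })

for r in rows:
    print(f'{r["interval"]:>6} {r["real_lower"]:>5} {r["real_upper"]:>5} '
          f'{r["midpoint"]:>5} {r["f"]:>3} {r["P"]:.2f} {r["F"]:.2f}')
```

The last value of F is always 1.00, since every respondent scores at or below the highest real upper limit.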

Cumulative frequency distributions (4)
The cumulative relative frequency polygon plots, for each score X, the proportion of respondents with a score of X or lower.

Graphical displays: Nominal / Ordinal
[Figure: bar charts (y-axis: count, x-axis: score) and pie charts, each shown for raw data (scores 2 to 9) and for grouped data (classes 2-3, 4-5, 6-7, 8-9)]

Graphical displays: Interval
- Histograms
- Stem & Leaf display

Freq.   Stem & Leaf
1,00    Extremes (=<12,0)
2,00    15 . 00
2,00    16 . 00
4,00    17 . 0000
5,00    18 . 00000
2,00    19 . 00
5,00    20 . 00000
2,00    21 . 00
1,00    22 . 0
1,00    23 . 0

Stem width: 1
Each leaf: 1 case(s)

Histogram: symmetrical or skewed?
[Figure: three histograms labelled symmetrical, negatively skewed, and positively skewed]

SPSS: Graphs Chart Builder / Legacy Dialogs

SPSS: Graphs > Legacy Dialogs

SPSS: Graphs > Chart Builder

Measures of central tendency
1. Mode (Mo) = most common score
2. Median (Mdn) = middle score (50th percentile)
   Median location = (N + 1) / 2
3. Mean (M) = average
   \[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} \quad \text{or} \quad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
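Applied to the 25 "and then" scores from the frequency example, the three measures can be computed with Python's standard `statistics` module. This is a sketch for checking hand calculations, not the SPSS procedure used in the lecture.

```python
import statistics

# Raw scores from Table 1 (number of "and then" statements per child).
scores = [18, 17, 16, 18, 15, 15, 18, 16, 20, 18, 22, 20, 17,
          21, 17, 19, 17, 21, 20, 19, 18, 12, 23, 20, 20]

n = len(scores)

# Mode: most common score. multimode() returns every score that ties
# for the highest frequency, so a bimodal variable shows both peaks.
modes = statistics.multimode(scores)

# Median: the score at location (N + 1) / 2 in the sorted data.
median_location = (n + 1) / 2
median = statistics.median(scores)

# Mean: sum of all scores divided by n.
mean = sum(scores) / n

print("Mode(s):", modes)
print("Median location:", median_location)
print("Median:", median)
print("Mean:", round(mean, 2))
```

In these data the scores 18 and 20 both occur five times, so there are two modes, while the median (18) and the mean (18.28) lie close together.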

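Central tendency and skewness

Shape           Mode   Median   Mean
Positive skew   A      B        C
Symmetrical     A      A        A
Negative skew   C      B        A

(A, B and C mark positions on the score axis from low to high: with positive skew the mean is pulled above the median and the mode; with negative skew the order is reversed.)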

Measures of variability
1. Range (R) = highest score - lowest score
2. Interquartile range (IQR) = Q3 - Q1
3. Standard deviation (s or σ) = spread around the mean
4. Variance (s² or σ²) = spread around the mean

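Variance and standard deviation

Score   Deviation         Squared deviation
x_1     x_1 - \bar{x}     (x_1 - \bar{x})^2
x_2     x_2 - \bar{x}     (x_2 - \bar{x})^2
x_3     x_3 - \bar{x}     (x_3 - \bar{x})^2
...     ...               ...
x_n     x_n - \bar{x}     (x_n - \bar{x})^2
Sum     0                 \sum_{i=1}^{n} (x_i - \bar{x})^2

Variance:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]

Standard deviation:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]

The standard deviation and variance are:
- only suitable as measures of spread around the mean
- not robust against outliers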
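The three columns of the table (score, deviation, squared deviation) translate directly into code. The sketch below uses the 25 "and then" scores from the frequency example and the sample formula with n - 1 in the denominator; the variable names are chosen for this example.

```python
import math

# Raw scores from Table 1 (number of "and then" statements per child).
scores = [18, 17, 16, 18, 15, 15, 18, 16, 20, 18, 22, 20, 17,
          21, 17, 19, 17, 21, 20, 19, 18, 12, 23, 20, 20]

n = len(scores)
mean = sum(scores) / n

# Column 2: deviation of each score from the mean (these sum to zero).
deviations = [x - mean for x in scores]

# Column 3: squared deviations (always non-negative).
squared_deviations = [d ** 2 for d in deviations]

# Sample variance: sum of squared deviations divided by n - 1.
variance = sum(squared_deviations) / (n - 1)

# Standard deviation: square root of the variance, in the original units.
sd = math.sqrt(variance)

print("Sum of deviations:", round(sum(deviations), 10))
print("Sum of squared deviations:", round(sum(squared_deviations), 2))
print("Variance:", round(variance, 2))
print("Standard deviation:", round(sd, 2))
```

Because the raw deviations always sum to zero, they have to be squared before they can say anything about spread.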

Five-number summary and boxplot
The five-number summary consists of:
- Minimum = lowest (non-outlying) score
- Q1 = 25th percentile (25% lower, 75% higher)
- Median (= Q2) = 50th percentile
- Q3 = 75th percentile
- Maximum = highest (non-outlying) score
Graphical display: boxplot

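Boxplot - Example
Data: 3 13 17 19 22 24 25 28 35 39 44 45 83 86 93

Numerical (five-number summary):
- Max = 93
- Q3 = 45
- Median = 28
- Q1 = 19
- Min = 3
IQR = 45 - 19 = 26

Rule of thumb: an outlier is an observation that lies more than 1.5 × IQR above Q3 or below Q1.
- Q1 - 1.5 × IQR = 19 - 39 = -20
- Q3 + 1.5 × IQR = 45 + 39 = 84
By this rule, 86 and 93 lie above 84 and are flagged as outliers; no score lies below -20.

Graphical: boxplot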
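The numbers in this example can be reproduced in Python. One caveat: textbooks and software define quartiles in slightly different ways. The sketch assumes the "exclusive" method of `statistics.quantiles`, which happens to give the same Q1 and Q3 as the slide for these 15 scores; other conventions may give slightly different values.

```python
import statistics

# Example data from the boxplot slide (already sorted).
data = [3, 13, 17, 19, 22, 24, 25, 28, 35, 39, 44, 45, 83, 86, 93]

# Quartiles. Assumption: the "exclusive" method, which matches the
# slide's Q1 = 19 and Q3 = 45 for this data set.
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1

# Rule of thumb: scores more than 1.5 x IQR outside the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Whiskers end at the lowest and highest non-outlying scores.
non_outlying = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(non_outlying), max(non_outlying)

print("Min, Q1, Median, Q3, Max:", min(data), q1, median, q3, max(data))
print("IQR:", iqr)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
print("Whiskers:", whisker_low, whisker_high)
```

The score 83 sits just inside the upper fence of 84, so it is where the upper whisker ends, while 86 and 93 would be drawn as separate points.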

Overview

Scale of measurement    Graphical                            Central tendency   Spread
Nominal                 Bar chart (pie chart)                Mode               none
Ordinal                 Boxplot                              Median             Range, IQR
Interval (and higher)   Histogram (stem & leaf display)      Mean               Standard deviation, variance

What have you learned today?
- What are the various ways to represent distributions numerically?
- What are the various ways to represent distributions graphically?
- How to describe a distribution
- How to create and evaluate various numerical and graphical representations of distributions
- How to determine which numerical and graphical representation is suitable for a variable

Next week
- No lecture and workgroups
- Practice test on Blackboard
- Enter your own data
In two weeks
- Normal distribution and standard scores
- Read: Howell, Chapter 3