Exploring Data and Graphics

Exploring Data and Graphics
Rick White, Department of Statistics, UBC
Graduate Pathways to Success, Graduate & Postdoctoral Studies
November 13, 2013

Outline
- Summarizing Data
- Types of Data
- Visualizing Data
- Questions

Moments
Quantitative measures of the shape of a set of points.
- 1st raw moment: mean (center)
- 2nd central moment: variance (width); the square root of the variance is the standard deviation
- 3rd central moment (standardized): skewness (lopsidedness)
- 4th central moment (standardized): kurtosis (how heavy the tails of the data are)
- Mixed moments are co-moments (covariance); covariance is standardized to correlation
- Moments are susceptible to outliers
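As a rough illustration of the last two points, the sketch below standardizes a covariance into a correlation and shows how a single outlier can change the result. The function names and the toy data are assumptions made for this example and are not from the talk.

```python
import math

def covariance(xs, ys):
    """Mixed second central moment of two equal-length samples (divides by n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: y is exactly 2 * x, so the correlation is 1.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_clean = correlation(xs, ys)

# Replacing one value with an outlier flips the sign of the correlation.
ys_outlier = [2, 4, 6, 8, -40]
r_outlier = correlation(xs, ys_outlier)

print(r_clean, r_outlier)
```

Only one of the five points changed, yet the summary now suggests a negative relationship, which is why it helps to look at the data before trusting a moment-based summary.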

Computing Moments
Computing the kth raw moment:
- Raise each observation to the kth power
- Sum the resulting values
- Divide by the number of observations
Computing the kth central moment:
- Compute the mean of the data
- Subtract the mean from each observation
- Proceed as for a raw moment
Standardized kth moment (k > 2):
- Divide the kth central moment by the standard deviation raised to the kth power (the standard deviation is the square root of the variance)
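The steps above can be sketched in a few lines of Python. This is a minimal illustration that divides by the number of observations, as the slide describes, rather than applying any small-sample correction; the function names and data are assumptions for the example.

```python
import math

def raw_moment(data, k):
    """kth raw moment: mean of the observations raised to the kth power."""
    return sum(x ** k for x in data) / len(data)

def central_moment(data, k):
    """kth central moment: a raw moment of the mean-centred data."""
    mean = raw_moment(data, 1)
    return sum((x - mean) ** k for x in data) / len(data)

def standardized_moment(data, k):
    """kth central moment divided by the standard deviation to the kth power."""
    sd = math.sqrt(central_moment(data, 2))
    return central_moment(data, k) / sd ** k

# Hypothetical sample with one large value on the right.
data = [1, 2, 3, 4, 10]
mean = raw_moment(data, 1)
variance = central_moment(data, 2)
skewness = standardized_moment(data, 3)
# Subtracting 3 gives excess kurtosis, which is 0 for a normal distribution.
excess_kurtosis = standardized_moment(data, 4) - 3

print(mean, variance, skewness, excess_kurtosis)
```

The positive skewness reflects the long right tail created by the value 10.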

The different means
- The arithmetic mean (or simply "mean") of a sample is the sum of the sampled values divided by the number of items in the sample.
- The geometric mean is an average that is useful for sets of positive numbers.
  - Take the nth root of the product of the n data points.
  - Or log the data, compute the arithmetic mean, and exponentiate the result.
- There are many other types of means.
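A small sketch, with assumed function names and toy values, showing that the two routes to the geometric mean agree:

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean_product(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

def geometric_mean_logs(values):
    """Log the data, take the arithmetic mean, exponentiate the result."""
    return math.exp(arithmetic_mean([math.log(v) for v in values]))

# Hypothetical positive data.
values = [1, 4, 16]
am = arithmetic_mean(values)
gm_product = geometric_mean_product(values)
gm_logs = geometric_mean_logs(values)

print(am, gm_product, gm_logs)
```

The log route is usually the safer one for long series, since the running product can overflow or underflow.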

Quantiles
Points that split the ordered data into equal-sized groups.
- 2 groups: median (50%)
- 3 groups: terciles (33%)
- 4 groups: quartiles (25%)
- 5 groups: quintiles (20%)
- 10 groups: deciles (10%)
- 100 groups: percentiles (1%)
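As one possible illustration, Python's standard `statistics.quantiles` function returns the cut points for a chosen number of groups. It offers two estimation methods, and on a small sample they give different quartiles. The data below are made up for the example.

```python
from statistics import median, quantiles

# Hypothetical sample: the integers 1 through 11.
data = list(range(1, 12))

# n=4 asks for the three cut points that split the data into four groups.
quartiles_exclusive = quantiles(data, n=4, method="exclusive")
quartiles_inclusive = quantiles(data, n=4, method="inclusive")

# n=10 would give the nine decile cut points in the same way.
deciles = quantiles(data, n=10)

print(median(data), quartiles_exclusive, quartiles_inclusive)
```

Both methods agree on the median here, but not on the first and third quartiles, so it is worth stating which estimator was used when reporting quantiles.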

Wikipedia
A good source of information.
Website: http://en.wikipedia.org/wiki
- Moment (mathematics)
- Mean
- Variance
- Quantiles
  - Many ways to estimate

Example Distributions

Distribution        Mean  Variance  Skewness  Kurtosis
Normal               3.0       9.0      0.00      0.00
Negative Binomial    3.0       9.0      1.67      4.12
Gamma                3.0       9.0      2.00      6.00

Distribution           1%    25%    50%    75%    90%     99%
Normal              -3.98   0.98   3.00   5.02   6.84    9.98
Negative Binomial    0.00   1.00   2.00   4.00   7.00   13.00
Gamma                0.03   0.86   2.07   4.16   6.91   13.81
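The normal and gamma rows can be roughly reproduced with the standard library. This sketch assumes the normal has mean 3 and standard deviation 3, and uses the fact that a gamma distribution with mean 3 and variance 9 has shape 1 and scale 3, which makes it an exponential distribution with a closed-form quantile function. How the original table values were computed is not stated, so small differences in the last digit are to be expected.

```python
import math
from statistics import NormalDist

probs = [0.01, 0.25, 0.50, 0.75, 0.90, 0.99]

# Normal with mean 3 and variance 9 (standard deviation 3).
normal = NormalDist(mu=3.0, sigma=3.0)
normal_q = [normal.inv_cdf(p) for p in probs]

def exponential_quantile(p, mean):
    """Quantile of an exponential distribution, i.e. a gamma with shape 1."""
    return -mean * math.log(1 - p)

gamma_q = [exponential_quantile(p, 3.0) for p in probs]

# Values copied from the table above, for comparison.
table_normal = [-3.98, 0.98, 3.00, 5.02, 6.84, 9.98]
table_gamma = [0.03, 0.86, 2.07, 4.16, 6.91, 13.81]

for p, nq, gq in zip(probs, normal_q, gamma_q):
    print(f"{p:>5.0%}  normal {nq:6.2f}  gamma {gq:6.2f}")
```

Although the two distributions share a mean and variance, their quantiles differ noticeably in the tails, which is the point of the comparison.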

Tabulating Data
- Only useful if data takes on relatively few different values
- Data can be binned to reduce the number of distinct values
- Count the number of items in each bin
- To express as a percentage: divide the number of items in each bin by the total number of items and multiply by 100
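A minimal sketch of binning and tabulating, assuming equal-width bins labelled by their lower edge. The function name and the ages are hypothetical.

```python
from collections import Counter

def tabulate(values, bin_width):
    """Return {bin start: (count, percentage)} using equal-width bins."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    total = len(values)
    return {
        start: (count, 100 * count / total)
        for start, count in sorted(counts.items())
    }

# Hypothetical ages, binned by decade.
ages = [21, 25, 29, 33, 34, 38, 41, 47, 52, 58]
table = tabulate(ages, 10)

for start, (count, percent) in table.items():
    print(f"{start}-{start + 9}: {count} ({percent:.0f}%)")
```

Ten distinct values collapse into four bins, which is small enough to read as a table.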

Types of Data
- Nominal scale: differentiated by label but no logical order
- Ordinal scale: differentiated by rank order; data can be sorted
- Interval scale: numeric data with an arbitrarily defined zero
- Ratio scale: numeric data with a meaningful zero

Nominal scale
Examples: gender, ethnicity, species, genre
- Moments are undefined
- Arithmetic is meaningless (+, -, *, /)
- Quantiles are undefined
- Only logical operation is equality
- Central tendency: mode (most common value)
  - Generally not of much value
- Data can be tabulated

Ordinal scale
Examples: Likert scales, descriptive size
- Moments are undefined
- Arithmetic is meaningless (+, -, *, /)
- Quantiles are defined
- Logical operations: =, <, >=, etc.
- Central tendency: median, mode
  - Generally not of much value
- Data can be tabulated

Interval scale
Examples: Celsius temperature, date of event
- Moments and quantiles are defined
- Central tendency: mean (arithmetic), median
- Dispersion: standard deviation, inter-quartile range
- Data usually cannot be tabulated without some manipulation
- Ratio of observations is meaningless, but ratio of differences is interpretable
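The last point can be checked with a short sketch: converting the same three temperatures from Celsius to Fahrenheit changes the ratio of observations but leaves the ratio of differences unchanged. The temperatures and variable names are assumptions for the example.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

# Hypothetical temperatures in degrees Celsius.
a_c, b_c, c_c = 10.0, 20.0, 40.0
a_f, b_f, c_f = (celsius_to_fahrenheit(t) for t in (a_c, b_c, c_c))

# Ratio of observations depends on where zero was placed.
ratio_obs_c = b_c / a_c   # 20 / 10
ratio_obs_f = b_f / a_f   # 68 / 50

# Ratio of differences does not depend on the zero point or the unit.
ratio_diff_c = (c_c - b_c) / (b_c - a_c)
ratio_diff_f = (c_f - b_f) / (b_f - a_f)

print(ratio_obs_c, ratio_obs_f, ratio_diff_c, ratio_diff_f)
```

So "20 degrees is twice as hot as 10 degrees" is not a meaningful statement, while "the second gap is twice the first" is.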

Ratio scale
Examples: height, length, duration
- Moments and quantiles are defined
- Central tendency: arithmetic mean, geometric mean, median
- Dispersion: standard deviation, inter-quartile range, coefficient of variation
- Data usually cannot be tabulated without some manipulation
- Ratio of observations is meaningful
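The coefficient of variation is the standard deviation divided by the mean, which only makes sense when zero is meaningful. A brief sketch with made-up heights shows that it does not depend on the unit of measurement:

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unit-free)."""
    return stdev(values) / mean(values)

# Hypothetical heights, recorded in centimetres and then in metres.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
heights_m = [h / 100 for h in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print(cv_cm, cv_m)
```

The standard deviation itself changes by a factor of 100 between the two units, while the coefficient of variation stays the same.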

Graphics
- The purpose of a graphic is to communicate information in a clear and effective manner.
- A graphic is more effective at conveying information than a table of numbers.
- A graphic need not be complex to be effective.
- If something is present on a graphic, it should serve a purpose.

One Variable
Categorical variables (ordinal or nominal)
- Pie charts (not recommended)
  - The eye is good at judging linear measures and bad at judging relative areas
- Barcharts or dotplots
Numeric variables (interval or ratio)
- Barchart with error bars (not recommended)
  - Too much information is lost
- Boxplots, histograms, density plots

Pie Charts: Are any the same?

Barcharts: Are any the same?

Barchart with errors
- Barchart for numeric data
- Bar height represents the mean
- Whisker represents the SD
- Is the data the same?

Box and Whiskers plot
Boxplots:
- Center line: median
- Box: Q1 to Q3
- Whiskers: max and min
Each has the same mean and SD, but is the data the same?
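To make the question concrete, the sketch below builds two tiny made-up datasets with the same mean and the same (population) standard deviation and computes the five numbers a boxplot of this kind draws. The datasets and the choice of quartile estimator are assumptions for illustration, not the data behind the slide.

```python
import math
from statistics import mean, pstdev, quantiles

def five_number_summary(data):
    """Min, Q1, median, Q3, max: the values drawn by a min/max boxplot."""
    ordered = sorted(data)
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, q2, q3, ordered[-1])

# Two hypothetical datasets, both with mean 0 and population SD 1.
a = [-1.0, -1.0, 1.0, 1.0]
b = [-math.sqrt(2), 0.0, 0.0, math.sqrt(2)]

summary_a = five_number_summary(a)
summary_b = five_number_summary(b)

print(mean(a), pstdev(a), summary_a)
print(mean(b), pstdev(b), summary_b)
```

A barchart with error bars would draw these two datasets identically, while their boxes and whiskers differ.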

Histogram and Density Plots

Two variables
Both categorical
- Stacked barchart (not recommended)
- Side by side barchart
One categorical, one numeric
- Side by side boxplots
- Histogram or density plot within each category
Both numeric
- Scatterplot

Stacked Barchart
Compare the relative size of each colour category within each bar and between the different bars. Are the different sizes obvious?

Side by side barchart
Can you tell the difference if we make it a side by side bar chart?

Side by side boxplot

Adding more information

Same Scale Histogram/Density

Scatterplots
Scatterplots show the type of relationship that exists between two numeric variables. This is an example of a linear relationship.

Other Relationships

Other Relationships

Outlier Detection

Overplotting
Scatterplots with too many points can appear as a big blob of ink.

Overplotting option

Demonstration in R
- R is a free software programming language and a software environment for statistical computing and graphics. www.r-project.org
- RStudio IDE is a powerful and productive user interface for R. It's free and open source, and works great on Windows, Mac, and Linux. www.rstudio.com/
- www.stat.ubc.ca/~rickw/rw2013-11-13.html

Questions