Parametric Statistics: Exploring Assumptions.


Parametric Statistics: Exploring Assumptions http://www.pelagicos.net/classes_biometry_fa17.htm

Reading - Field: Chapter 5

R Packages Used in This Chapter
For this chapter, you will use the following packages. Start Rcmdr, then install and load them:
> install.packages("car"); install.packages("ggplot2"); install.packages("pastecs"); install.packages("psych")
> library(car); library(ggplot2); library(pastecs); library(psych)
NOTE: red font indicates RCmdr dependencies

Exploring Assumptions
Assumptions of parametric tests based on the normal distribution
Aim of this chapter:
Quantify the assumption of normality
o Graphical displays
o Skew
o Kurtosis
o Normality tests
Quantify the homogeneity of variances (when dealing with 2 or more samples): Levene's test

Assessing Normality
We cannot sample the entire biological population, so we test the observed data.
1) Central Limit Theorem: if N < 25, the sampling distribution is rarely normal
2) Graphical displays: histogram, Q-Q plot
3) Skewness / kurtosis (point estimate +/- SE): do they overlap with 0, the value expected for a normal distribution?
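
The Central Limit Theorem point can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is an arbitrary right-skewed (exponential) one, and the sample sizes, seed, and helper names (sample_means, skew) are hypothetical choices, not part of the lecture.

```r
# Sketch: sampling distribution of the mean for a skewed (exponential) population.
set.seed(1)  # arbitrary seed, only for reproducibility

# Draw `reps` samples of size n and return their means.
sample_means <- function(n, reps = 5000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

# Moment-based skewness (0 for a symmetric distribution).
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

means_small <- sample_means(5)    # small samples
means_large <- sample_means(100)  # larger samples

# The skew of the sample means shrinks toward 0 as n grows.
skew_small <- skew(means_small)
skew_large <- skew(means_large)
print(c(n5 = skew_small, n100 = skew_large))
```

With small samples the means inherit much of the skew of the population; with larger samples their distribution is close to symmetric.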

Assessing Normality
4) Performing statistical tests
Shapiro-Wilk test: tests whether the data differ from a normal distribution
  Significant = non-normal data
  Non-significant = no evidence against normality (data treated as normal)
Levene's test (comparing 2 or more samples): tests whether the data distributions have equal variances
  Significant = different variances
  Non-significant = no evidence of different variances (variances treated as equal)
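
Levene's test can be sketched in base R as a one-way ANOVA on the absolute deviations of each observation from its group center. The data below are simulated, the variable names are hypothetical, and the group median is used as the center (the Brown-Forsythe variant, which is also the default center in car::leveneTest).

```r
# Sketch: Levene's test computed by hand on simulated data.
set.seed(2)  # arbitrary seed
scores <- c(rnorm(40, mean = 10, sd = 1),   # group A: small spread
            rnorm(40, mean = 10, sd = 4))   # group B: large spread
group <- factor(rep(c("A", "B"), each = 40))

# Absolute deviation of each observation from its own group median.
abs_dev <- abs(scores - ave(scores, group, FUN = median))

# One-way ANOVA on the absolute deviations: a significant F suggests
# that the groups differ in variance.
levene_fit <- anova(lm(abs_dev ~ group))
levene_p <- levene_fit[["Pr(>F)"]][1]
print(levene_fit)
```

With the car package loaded, leveneTest(scores, group) should give the same F test for these data.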

Assessing Normality - Graphically Characteristics of Normal Distributions Unimodal, Symmetrical, Bell-shaped

Assessing Normality - Graphically Comparing observations against a cumulative normal distribution (same mean and S.D.)

Assessing Normality - Graphically
[Figure: two Q-Q plots; x-axis: theoretical quantiles, y-axis: sample quantiles]
The percentile of a value is the percentage of cases (observations) that fall below that value. Each observed percentile is compared to the percentile that the value would have in a normal distribution.
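
The comparison behind a Q-Q plot can be reproduced step by step. This is a minimal sketch on simulated data (the sample x and its parameters are assumptions for illustration): each sorted observation is paired with the standard normal quantile of its plotting position.

```r
# Sketch: building Q-Q plot coordinates by hand.
set.seed(3)  # arbitrary seed
x <- rnorm(50, mean = 2, sd = 0.5)   # simulated sample

probs <- ppoints(length(x))   # plotting position (percentile) of each rank
theoretical <- qnorm(probs)   # quantiles expected under a standard normal
observed <- sort(x)           # sample quantiles

# qqnorm() computes the same pairs; plot.it = FALSE returns them without drawing.
qq <- qqnorm(x, plot.it = FALSE)
same_as_qqnorm <- isTRUE(all.equal(sort(qq$x), theoretical))

# For normal data the points fall close to a straight line.
qq_cor <- cor(theoretical, observed)
print(c(same_as_qqnorm = same_as_qqnorm, correlation = qq_cor))
```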

Example: Festival Data Set
A biologist is worried about the potential health effects of music festivals and measured the hygiene of 810 concert-goers over the three days of a music festival.
Hygiene was measured using a standardized index (from 0 to 4): 0 = you smell terrible, 4 = you smell beautiful.
Import the Download Festival data (DownloadFestival.xlsx). For ease of use, rename the data set Festival:
> Festival <- DownloadFestival

Explore Data Graphically: RCmdr
[Figure: histograms with density curves for day1, day2 and day3]
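
Outside the Rcmdr menus, a histogram with a density overlay can be drawn in base R. The sketch below uses simulated right-skewed values as a stand-in for a hygiene score; the variable name hygiene and the gamma parameters are assumptions, not the Festival data.

```r
# Sketch: histogram on the density scale with two overlaid curves.
set.seed(4)  # arbitrary seed
hygiene <- rgamma(300, shape = 2, rate = 2)   # simulated, right-skewed scores

# freq = FALSE puts the bars on the density scale so curves can be overlaid.
h <- hist(hygiene, freq = FALSE, breaks = 20,
          main = "Simulated hygiene scores", xlab = "Hygiene index")

# Kernel density estimate of the observed data (solid line).
lines(density(hygiene))

# Normal curve with the same mean and SD as the data (dashed line).
curve(dnorm(x, mean = mean(hygiene), sd = sd(hygiene)), add = TRUE, lty = 2)
```

A gap between the solid and dashed curves is a first visual hint of a departure from normality.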

Graphs in Rcmdr: Quantiles
Graphically compares an observed (empirical) distribution (points) with a chosen theoretical expectation (line)
Normal distribution is the default
Identify points: automatically or interactively (identifies max / min as default)

Graphs in Rcmdr: Quantiles (day1)
The solid red line is the expected pattern for a normal distribution with the same mean and SD as the sampled data. Points outside the dashed-line envelope suggest significant deviations from normality.

Graphs in Rcmdr: Quantiles (day2, day3)
Note: The straight line represents the expected pattern for a normal distribution

Explore Festival Data Set
We can also explore the summary statistics describing the three datasets (day1, day2, day3) using RCmdr:
> numSummary(Festival[,c("day1", "day2", "day3"), drop=FALSE], statistics=c("mean", "sd", "IQR", "quantiles", "skewness", "kurtosis"), quantiles=c(0,.25,.5,.75,1), type="2")
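
A comparable per-column summary can be sketched in base R, which also makes the handling of missing values explicit. The toy data frame and the helper summarise_column are hypothetical; they only mimic the shape of the Festival data, where later days have many missing scores.

```r
# Sketch: per-column summary statistics with explicit NA handling.
set.seed(5)  # arbitrary seed
toy <- data.frame(
  day1 = runif(10, 0, 4),
  day2 = c(runif(6, 0, 4), rep(NA, 4)),   # 4 missing scores
  day3 = c(runif(3, 0, 4), rep(NA, 7))    # 7 missing scores
)

# Summary of one numeric column, ignoring missing values.
summarise_column <- function(x) {
  c(mean      = mean(x, na.rm = TRUE),
    sd        = sd(x, na.rm = TRUE),
    IQR       = IQR(x, na.rm = TRUE),
    median    = median(x, na.rm = TRUE),
    n         = sum(!is.na(x)),
    n_missing = sum(is.na(x)))
}

# One row per column of the data frame, one column per statistic.
summary_table <- t(sapply(toy, summarise_column))
print(round(summary_table, 2))
```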

Explore Festival Data Set
We can also explore the summary statistics describing the three datasets (day1, day2, day3) using RCmdr.
NOTE: multiple datasets can be analyzed at once
What statistics would you use to assess data normality?

Explore Festival Data Set
Exploring the summary statistics describing the three datasets (day1, day2, day3) using RCmdr:
> numSummary(Festival[,c("day1", "day2", "day3"), drop=FALSE], statistics=c("mean", "quantiles", "skewness", "kurtosis"), quantiles=c(.5), type="2")
          mean  skewness     kurtosis   50%    n   NA
day1 1.7933580  8.865312  170.4502658  1.79  810    0
day2 0.9609091  1.095226    0.8222057  0.79  264  546
day3 0.9765041  1.032868    0.7315003  0.76  123  687

Further Explore Festival Data Set
Exploring the dataset using other functions: describe() function in psych package
> describe(Festival$day1)
vars    n  mean    sd  median  skew  kurtosis
   1  810  1.79  0.94    1.79  8.83    168.97
trimmed  mad   min    max  range    se
   1.77  0.7  0.02  20.02     20  0.03

Further Explore Festival Data Set
Exploring the dataset using other functions: stat.desc() function in pastecs package
> stat.desc(Festival$day1, basic = FALSE, norm = TRUE)
basic argument: basic statistics included if TRUE (Note: TRUE is the default)
norm argument: statistics relating to the normal distribution included if TRUE (Note: FALSE is the default)

Further Explore Festival Data Set
> stat.desc(Festival$day1, basic = FALSE, norm = TRUE)
median         1.790000e+00
mean           1.793358e+00
SE.mean        3.318617e-02
CI.mean.0.95   6.514115e-02
var            8.920705e-01
std.dev        9.444949e-01
coef.var       5.266627e-01

Further Explore Festival Data Set
> stat.desc(Festival$day1, basic = FALSE, norm = TRUE)
skewness   8.832504e+00
skew.2SE   5.140707e+01
kurtosis   1.689671e+02
kurt.2SE   4.923139e+02
skew.2SE: skew divided by 2 SE
kurt.2SE: kurtosis divided by 2 SE
How can we interpret these results?
Z = (observed value - theoretical value) / (SE of value)
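
The Z formula can be worked through with the day1 values reported above. This is a sketch: the standard-error formulas below are common large-sample expressions that happen to reproduce the reported 2 SE ratios for n = 810, so treat them as an assumption about what stat.desc computes, and the function names se_skew and se_kurt as hypothetical.

```r
# Sketch: Z scores for skew and kurtosis from the reported day1 values.

# Standard error of skewness for a sample of size n (assumed formula).
se_skew <- function(n) sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

# Standard error of kurtosis for a sample of size n (assumed formula).
se_kurt <- function(n) {
  sqrt(24 * n * (n - 1)^2 / ((n - 3) * (n - 2) * (n + 3) * (n + 5)))
}

n    <- 810        # day1 sample size
skew <- 8.832504   # observed skewness (from the output above)
kurt <- 168.9671   # observed kurtosis (from the output above)

# Z = (observed value - theoretical value) / SE, with theoretical value 0.
z_skew <- (skew - 0) / se_skew(n)
z_kurt <- (kurt - 0) / se_kurt(n)

# The ratio to 2 SE is simply half of the Z score.
skew_2se <- z_skew / 2
kurt_2se <- z_kurt / 2
print(c(z_skew = z_skew, skew_2se = skew_2se, z_kurt = z_kurt, kurt_2se = kurt_2se))
```

Because the reported ratios divide by 2 SE, a ratio above about 0.98 in absolute value corresponds to |Z| > 1.96.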

Further Explore Festival Data Set
skewness   8.832504e+00
skew.2SE   5.140707e+01
kurtosis   1.689671e+02
kurt.2SE   4.923139e+02
skew.2SE: skew divided by 2 SE
kurt.2SE: kurtosis divided by 2 SE
What values are needed for skew / kurtosis to be significant (different from 0)?

Further Explore Festival Data Set
skew.2SE = 51.4 = (observed skew) / (2 SE)
kurt.2SE = 492 = (observed kurtosis) / (2 SE)
Are skew / kurtosis significant (different from 0)? YES
Rules of thumb to assess significance:
ABS(skew.2SE) or ABS(kurt.2SE)   P value
> 0.98                           < 0.05
> 1                              < 0.05 (Z > 2, P about 0.046)
> 1.29                           < 0.01
> 1.65                           < 0.001
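
The rule-of-thumb thresholds follow from the normal distribution: a ratio to 2 SE equals Z / 2, so the two-tailed P value is that of Z = 2 x ratio. A short sketch (the helper name p_from_2se is hypothetical):

```r
# Sketch: two-tailed P value implied by a skew / kurtosis ratio to 2 SE.
p_from_2se <- function(stat_2se) 2 * pnorm(-abs(2 * stat_2se))

thresholds <- c(0.98, 1, 1.29, 1.65)   # cut-offs listed in the rule of thumb
p_values <- p_from_2se(thresholds)

print(data.frame(ratio_2se = thresholds,
                 z = 2 * thresholds,
                 p = signif(p_values, 3)))
```

The same function can be applied to any reported ratio, for example p_from_2se(3.612).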

Testing Data Normality
> stat.desc(Festival$day1, basic = FALSE, norm = TRUE)
NOTE: Because the norm argument is set to TRUE, stat.desc provides a normality test
normtest.W   6.539142e-01   (test statistic)
normtest.p   1.545986e-37   (P value)
Is this distribution different from a normal distribution? YES
How do I know that? P < 0.05
NOTE: The null hypothesis is that the data are normal

Testing Data Normality
> shapiro.test(Festival$day1)
Shapiro-Wilk normality test
data: Festival$day1
W = 0.65391, p-value < 2.2e-16
Is this distribution different from a normal distribution? YES
How do I know that? P < 0.05
NOTE: The null hypothesis is that the data are normal

Testing Data Normality
Shapiro-Wilk normality test
data: Festival$day2
W = 0.90832, p-value = 1.282e-11
Shapiro-Wilk normality test
data: Festival$day3
W = 0.90775, p-value = 3.804e-07
Is day2 different from a normal distribution? YES (P < 0.05)
Is day3 different from a normal distribution? YES (P < 0.05)
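
To see how the Shapiro-Wilk test behaves, it can be run on simulated samples whose shape is known. This is a sketch with assumed data (one normal sample, one right-skewed exponential sample), not the Festival scores.

```r
# Sketch: Shapiro-Wilk test on a normal and a right-skewed simulated sample.
set.seed(6)  # arbitrary seed
normal_like  <- rnorm(200, mean = 1.8, sd = 0.7)   # drawn from a normal distribution
right_skewed <- rexp(200, rate = 1)                # clearly right-skewed

sw_normal <- shapiro.test(normal_like)
sw_skewed <- shapiro.test(right_skewed)

# W close to 1 is consistent with normality; a small P value rejects the
# null hypothesis that the sample came from a normal distribution.
print(rbind(
  normal_like  = c(W = unname(sw_normal$statistic), p = sw_normal$p.value),
  right_skewed = c(W = unname(sw_skewed$statistic), p = sw_skewed$p.value)
))
```

Note that shapiro.test() accepts between 3 and 5000 non-missing values.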

Graphical Data Exploration: RCmdr
[Figure: histograms with density curves for day2 and day3]
Diagnostics:
Lack of symmetry
Long tails
Mean > Median
Positive skew
Positive kurtosis

Summary Statistics & Quantiles
          mean  skewness   kurtosis   50%    n
day2 0.9609091  1.095226  0.8222057  0.79  264
day3 0.9765041  1.032868  0.7315003  0.76  123
[Figure: quantile plots for day2 and day3]

Rule of Thumb (Z scores)
      skew.2SE  kurt.2SE
day2     3.612     1.265
day3     2.309     0.686
Significant results: absolute values above 0.98 (P < 0.05)
[Figure: panels for day2 and day3]

Summary: Normality
Indicators of a normal (Gaussian) distribution:
A. Mean = Median = Mode
B. Skewness: measures the asymmetry of the distribution. A value of zero indicates symmetry, which a normal distribution requires. The larger the absolute value, the more skewed the distribution.
C. Kurtosis: measures how the mass of the distribution is spread between its tails and its center. A value of zero indicates a normal distribution. The larger the absolute value, the more the distribution departs from the normal shape.

1. Assess Normality Graphically Note: The straight line represents the expected pattern for a normal distribution

2. Assess Skew / Kurtosis
Calculate the probability of the observed skew / kurtosis, compared to the expectation for a normal distribution.
Use the rule of thumb:
ABS(skew.2SE) or ABS(kurt.2SE)   P value
> 0.98                           < 0.05
> 1                              < 0.05 (Z > 2, P about 0.046)
> 1.29                           < 0.01
> 1.65                           < 0.001

3. Use Shapiro-Wilk (S-W) Test
Specific test developed to test the null hypothesis that a given sample (x_1, ..., x_n) came from a normally distributed population.
Significant = non-normal data
Non-significant = normal data
Shapiro, SS, Wilk, MB. 1965. An analysis of variance test for normality (complete samples). Biometrika 52: 591-611.

Summary
Parametric tests are based on normal distributions.
3 ways of checking the assumption of normality:
Graphical displays: Q-Q plots
Skew & kurtosis: Z scores
Normality test: S-W
Next Lecture: When and how to correct problems in the distribution of the data
Data transformations
Pitfalls and alternatives