Descriptive Analysis


Descriptive Analysis HERTANTO WAHYU SUBAGIO

Univariate Analysis
Univariate analysis examines one variable at a time across cases. A single variable has three major characteristics: distribution, central tendency, and dispersion (variability).

Distribution

A distribution is a summary of the frequency of individual values, or ranges of values, for a variable. It can be presented as a frequency distribution table or as a frequency distribution bar chart.
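As a rough illustration of a frequency distribution table, the sketch below tallies the eight test scores that this deck uses in its later examples. The variable names are illustrative only, and the layout is just one possible way to print such a table.

```python
from collections import Counter

# Test scores used in the worked examples of this deck.
scores = [15, 20, 21, 20, 36, 15, 25, 15]

# Count how often each distinct value occurs.
frequency = Counter(scores)

# Print a simple frequency distribution table, sorted by value.
print("Value  Frequency  Percent")
for value in sorted(frequency):
    count = frequency[value]
    percent = 100 * count / len(scores)
    print(f"{value:>5}  {count:>9}  {percent:>6.1f}%")
```

For variables with many distinct values, the same idea would be applied to ranges of values (bins) rather than to individual values.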

[Figure: histogram of normally distributed data]

Normal or bell-shaped distribution:
approximately 68% of the scores in the sample fall within one standard deviation of the mean;
approximately 95% of the scores in the sample fall within two standard deviations of the mean;
approximately 99.7% of the scores in the sample fall within three standard deviations of the mean.
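The theoretical proportions behind this rule can be checked with a short sketch. It assumes an ideal normal distribution (real samples will only approximate these shares) and uses `NormalDist` from the Python standard library; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```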

Skewness of a distribution
Negatively skewed (skewed to the left): the longer tail extends toward low values.
Positively skewed (skewed to the right): the longer tail extends toward high values.

Central tendency

Central tendency is an estimate of the "center" of a distribution of values. There are three major types of estimates of central tendency: mean, median, and mode.

Mean
The mean, or average, is probably the most commonly used method of describing central tendency. Add up all the values and divide by the number of values. For example, consider the test score values:
15, 20, 21, 20, 36, 15, 25, 15
The sum of these 8 values is 167, so the mean is 167 / 8 = 20.875.
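The same arithmetic can be reproduced in a few lines of Python. This is only a sketch of the calculation described above; the variable names are illustrative.

```python
# Test scores from the example above.
scores = [15, 20, 21, 20, 36, 15, 25, 15]

# Add up all the values, then divide by the number of values.
total = sum(scores)
mean = total / len(scores)

print(total, mean)  # 167 20.875
```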

Median
The median is the score found at the exact middle of the set of values. If we order the 8 scores shown above, we get:
15, 15, 15, 20, 20, 21, 25, 36
There are 8 scores, so scores #4 and #5 represent the halfway point. Since both of these scores are 20, the median is 20. If the two middle scores had different values, you would have to interpolate to determine the median, usually by averaging them.
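A minimal sketch of this procedure follows. It assumes the common convention of averaging the two middle scores when the number of scores is even; the function name `median` is simply chosen for this example.

```python
def median(values):
    """Middle value of the sorted data; the mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    # Even count: interpolate halfway between the two middle scores.
    return (ordered[middle - 1] + ordered[middle]) / 2

scores = [15, 20, 21, 20, 36, 15, 25, 15]
print(median(scores))  # 20.0
```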

Mode
The mode is the most frequently occurring value in the set of scores:
15, 15, 15, 20, 20, 21, 25, 36
In our example, the value 15 occurs three times and is the mode. In some distributions there is more than one modal value. For instance, in a bimodal distribution there are two values that occur most frequently.
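The sketch below finds the mode of the example scores and of a small bimodal data set. The bimodal list is invented purely for illustration; `multimode` is part of the Python standard library (version 3.8 and later) and returns every value tied for the highest frequency.

```python
from statistics import multimode

scores = [15, 20, 21, 20, 36, 15, 25, 15]
print(multimode(scores))   # [15]

# Hypothetical bimodal data: 15 and 20 both occur twice.
bimodal = [15, 15, 20, 20, 21]
print(multimode(bimodal))  # [15, 20]
```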

Positively skewed distribution: mode < median < mean

Negatively skewed distribution: mean < median < mode

Variability

Variability
Measures of dispersion: range, variance, standard deviation, and coefficient of variation. Symbols will be defined in class.
Population variance (a parameter):
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Sample variance (a statistic, used when µ is unknown):
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Standard deviation: the square root of the variance.
Coefficient of variation (CV): SD / mean.

Range
The range is the highest value minus the lowest value. For the scores
15, 20, 21, 20, 36, 15, 25, 15
the high value is 36 and the low is 15, so the range is 36 - 15 = 21.
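In code, the range is a one-line calculation. The name `value_range` is used here only to avoid shadowing Python's built-in `range`.

```python
scores = [15, 20, 21, 20, 36, 15, 25, 15]

# Range = highest value minus lowest value.
value_range = max(scores) - min(scores)
print(value_range)  # 21
```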

Standard Deviation

Computing SD
15, 20, 21, 20, 36, 15, 25, 15
To compute the standard deviation, we first find the distance between each value and the mean (20.875). The differences from the mean are:
15 - 20.875 = -5.875
20 - 20.875 = -0.875
21 - 20.875 = +0.125
20 - 20.875 = -0.875
36 - 20.875 = +15.125
15 - 20.875 = -5.875
25 - 20.875 = +4.125
15 - 20.875 = -5.875
Notice that values below the mean have negative discrepancies and values above it have positive ones.

Next, we square each discrepancy:
-5.875 * -5.875 = 34.515625
-0.875 * -0.875 = 0.765625
+0.125 * +0.125 = 0.015625
-0.875 * -0.875 = 0.765625
+15.125 * +15.125 = 228.765625
-5.875 * -5.875 = 34.515625
+4.125 * +4.125 = 17.015625
-5.875 * -5.875 = 34.515625
Now, we sum these squares to get the Sum of Squares (SS) value. Here, the sum is 350.875. Next, we divide this sum by the number of scores minus 1. Here, the result is 350.875 / 7 = 50.125. This value is known as the variance. To get the standard deviation, we take the square root of the variance: SQRT(50.125) = 7.079901129253.
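The steps above (deviations from the mean, squares, sum of squares, variance, square root) can be sketched as follows. The variable names are illustrative; the population variance is included only to contrast the two divisors and is not part of the worked example.

```python
import math

scores = [15, 20, 21, 20, 36, 15, 25, 15]
mean = sum(scores) / len(scores)

# Step 1: distance of each value from the mean.
deviations = [x - mean for x in scores]

# Step 2: square each deviation and sum them (Sum of Squares, SS).
sum_of_squares = sum(d ** 2 for d in deviations)

# Step 3: divide by n - 1 for the sample variance.
sample_variance = sum_of_squares / (len(scores) - 1)

# Step 4: the standard deviation is the square root of the variance.
sample_sd = math.sqrt(sample_variance)

# For comparison: dividing by N instead gives the population variance.
population_variance = sum_of_squares / len(scores)

print(sum_of_squares, sample_variance, sample_sd)
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n - 1 divisor, so they can serve as a cross-check.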

Coefficient of variation
The coefficient of variation of a distribution is the ratio of the standard deviation to the mean. It is useful for comparing the spread (variability) of distributions.
Sample coefficient of variation:
$$cv = \frac{s}{\bar{x}}$$
Population coefficient of variation:
$$CV = \frac{\sigma}{\mu}$$
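As a brief illustration, the sample coefficient of variation for the test scores used in this deck could be computed like this. The sketch relies on `statistics.stdev`, which uses the n - 1 divisor of the sample standard deviation.

```python
from statistics import mean, stdev

scores = [15, 20, 21, 20, 36, 15, 25, 15]

# Sample CV: sample standard deviation divided by the sample mean.
cv = stdev(scores) / mean(scores)

print(f"cv = {cv:.3f} ({cv:.1%} of the mean)")
```

Because the CV is a unitless ratio, it can be compared across variables measured on different scales, provided the means are positive.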

To Obtain Frequencies and Statistics
From the menus choose: Analyze > Descriptive Statistics > Frequencies...
Select one or more categorical or quantitative variables.
Optionally, you can:
Click Statistics for descriptive statistics for quantitative variables.
Click Charts for bar charts, pie charts, and histograms.
Click Format for the order in which results are displayed.

Descriptive Statistics (SPSS PC 10.1)

Statistics: NSEM
N Valid                   235
N Missing                 2
Mean                      51.6915
Median                    51.8333
Mode                      50.83(a)
Std. Deviation            5.1982
Variance                  27.0209
Skewness                  -.152
Std. Error of Skewness    .159
Kurtosis                  .012
Std. Error of Kurtosis    .316
Range                     30.33
Percentile 10             45.1000
Percentile 90             58.6667
a. Multiple modes exist. The smallest value is shown.

Measures of shape
Coefficient of skewness: a measure of symmetry. A symmetric distribution has a coefficient of skewness of 0.
Coefficient of kurtosis: a measure of the peakedness of a distribution. The normal distribution has a coefficient of kurtosis of 0 (on the scale reported by SPSS, which subtracts 3).
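A minimal sketch of both coefficients is given below. It assumes the simple moment-based (population) definitions, with 3 subtracted from the kurtosis so that a normal distribution scores 0. SPSS applies small-sample bias adjustments, so its values differ slightly from these, especially when n is small. The function names are made up for this example.

```python
def central_moment(values, order):
    """Average of (x - mean) raised to the given power."""
    m = sum(values) / len(values)
    return sum((x - m) ** order for x in values) / len(values)

def skewness(values):
    """Third central moment scaled by the standard deviation cubed."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth central moment scaled by the variance squared, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
scores = [15, 20, 21, 20, 36, 15, 25, 15]

print(skewness(symmetric))  # symmetric data: skewness of 0
print(skewness(scores))     # positive: the high score of 36 stretches the right tail
print(excess_kurtosis(symmetric))
```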

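Normal distribution
Ways to check whether data are normally distributed:
Index of kurtosis and index of skewness: values between -2 and +2 indicate a normal distribution.
Normal Q-Q plot and detrended normal Q-Q plot.
Kolmogorov-Smirnov or Shapiro-Wilk test: p > 0.05 means there is no significant departure from normality, so the data are treated as normally distributed.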
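The slides do not define the "index" of skewness and kurtosis. The sketch below assumes a common convention: the statistic divided by its standard error. It uses N = 235, skewness -.152, and kurtosis .012 from the SPSS output for NSEM shown earlier, and the standard-error formulas reproduce the .159 and .316 reported there. The function and variable names are illustrative.

```python
import math

def se_skewness(n):
    """Standard error of skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

# Values taken from the SPSS output for NSEM.
n, skew, kurt = 235, -0.152, 0.012

# Assumed definition of the index: statistic / standard error.
skew_ratio = skew / se_skewness(n)
kurt_ratio = kurt / se_kurtosis(n)

# Rule of thumb from the slide: both indices between -2 and +2.
looks_normal = abs(skew_ratio) < 2 and abs(kurt_ratio) < 2

print(round(se_skewness(n), 3), round(se_kurtosis(n), 3))
print(round(skew_ratio, 2), round(kurt_ratio, 2), looks_normal)
```

If the author intended the raw coefficients rather than these ratios, the same -2 to +2 comparison would be applied directly to `skew` and `kurt`.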

[Figure: Normal Q-Q plot of NSEM, an example of normally distributed data. x-axis: Observed Value; y-axis: Expected Normal.]

[Figure: Detrended normal Q-Q plot of NSEM, an example of normally distributed data. x-axis: Observed Value; y-axis: Dev from Normal.]

Normality test: Kolmogorov-Smirnov

Tests of Normality
        Kolmogorov-Smirnov(a)
        Statistic    df     Sig.
NSEM    .035         235    .200*
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Testing normality of data
From the menus choose: Analyze > Descriptive Statistics > Explore > Plots > Normality plots with tests

[Figure: boxplots of Asupan protein (protein intake, N = 70) and Pdpt per capita (income per capita, N = 70), with outlying cases flagged.]