CSC 223 - Advanced Scientific Programming, Spring 2018 Descriptive Statistics

Overview

Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions.
Data consists of information coming from observations, counts, measurements, or responses.
A population is the collection of all outcomes, responses, measurements, or counts that are of interest.
A sample is a subset of the population.
A parameter is a numerical description of a population characteristic.
A statistic is a numerical description of a sample characteristic.

Branches of Statistics

Descriptive statistics is the branch of statistics that involves the organization, summarization, and display of data.
Inferential statistics is the branch of statistics that involves using a sample to draw conclusions about a population. A basic tool in the study of inferential statistics is probability.

Data Classification

Types of data:
Qualitative data consist of attributes, labels, or nonnumerical entries.
Quantitative data consist of numerical measurements or counts.

Levels of measurement:
Nominal: categorized using names, labels, or qualities.
Ordinal: can be arranged in order or ranked.
Interval: can be ordered and meaningful differences between entries can be calculated.
Ratio: similar to interval, but there is a zero entry that is an inherent zero (implies none).

Frequency Distributions

A frequency distribution is a table that shows classes or intervals of data entries with a count of the number of entries in each class. The frequency f of a class is the number of data entries in the class.

Constructing a frequency distribution:
1. Decide on the number of classes.
2. Find the class width: the range of the data divided by the number of classes, rounded up to a convenient number.
3. Find the class limits.
4. Count the number of data entries that fall within the class boundaries to determine the frequency f for each class.

Frequency Distribution Example

Number of classes: 5
Data set: 7, 39, 13, 9, 25, 8, 22, 0, 2, 18, 2, 30, 7, 35, 12, 15, 8, 6, 5, 29, 0, 11, 39, 16, 15
Range = 39 - 0 = 39
Class width = 39 / 5 = 7.8, rounded up to 8

Frequency distribution:

Class     f
0-7       8
8-15      8
16-23     3
24-31     3
32-39     3
          $\sum f = 25$
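The construction steps can be sketched in Python using the example data set. This is a minimal sketch, not a definitive implementation: the function name `frequency_distribution` is hypothetical, the data are assumed to be integers, and "rounded up to a convenient number" is interpreted here as "the next whole number above range / classes", which is one reasonable choice among several.

```python
data = [7, 39, 13, 9, 25, 8, 22, 0, 2, 18, 2, 30, 7, 35, 12,
        15, 8, 6, 5, 29, 0, 11, 39, 16, 15]


def frequency_distribution(values, num_classes):
    """Return a list of (lower_limit, upper_limit, frequency) for integer data."""
    data_range = max(values) - min(values)
    # Assumption: take the next whole number above range / num_classes,
    # so the classes always cover the maximum entry.
    width = data_range // num_classes + 1
    lower = min(values)
    classes = []
    for _ in range(num_classes):
        upper = lower + width - 1          # upper class limit (inclusive)
        f = sum(1 for v in values if lower <= v <= upper)
        classes.append((lower, upper, f))
        lower += width                     # next lower class limit
    return classes


for lower, upper, f in frequency_distribution(data, 5):
    print(f"{lower}-{upper}: {f}")
```

For the example data this reproduces the five classes 0-7 through 32-39 with frequencies 8, 8, 3, 3, 3.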

Frequency Distributions

The midpoint of a class is the sum of the lower and upper limits of the class divided by two. The midpoint is sometimes called the class mark.
The relative frequency of a class is the portion or percentage of the data that falls in that class. To find the relative frequency of a class, divide the frequency f by the sample size n.
The cumulative frequency of a class is the sum of the frequency for that class and all previous classes. The cumulative frequency of the last class is equal to the sample size n.

Frequency Distribution Example

Number of classes: 5
Data set: 7, 39, 13, 9, 25, 8, 22, 0, 2, 18, 2, 30, 7, 35, 12, 15, 8, 6, 5, 29, 0, 11, 39, 16, 15

Frequency distribution:

Class     f     Midpoint     Relative frequency     Cumulative frequency
0-7       8     3.5          0.32                   8
8-15      8     11.5         0.32                   16
16-23     3     19.5         0.12                   19
24-31     3     27.5         0.12                   22
32-39     3     35.5         0.12                   25
          $\sum f = 25$      $\sum \frac{f}{n} = 1$
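The extra columns in this table follow directly from the class limits and frequencies. A short sketch, assuming the classes are stored as `(lower, upper, f)` tuples (a representation chosen here for illustration, not taken from the slides):

```python
from itertools import accumulate

# (lower limit, upper limit, frequency) for each class in the example
classes = [(0, 7, 8), (8, 15, 8), (16, 23, 3), (24, 31, 3), (32, 39, 3)]

n = sum(f for _, _, f in classes)                        # sample size
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]     # class marks
relative = [f / n for _, _, f in classes]                # f / n
cumulative = list(accumulate(f for _, _, f in classes))  # running total of f

for (lo, hi, f), m, r, c in zip(classes, midpoints, relative, cumulative):
    print(f"{lo}-{hi}\t{f}\t{m}\t{r:.2f}\t{c}")
```

The last cumulative frequency equals n, and the relative frequencies sum to 1 (up to floating-point rounding).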

Measures of Central Tendency

The mean of a data set is the sum of the data entries divided by the number of entries.
Population mean: $\mu = \frac{\sum x}{N}$
Sample mean: $\bar{x} = \frac{\sum x}{n}$
The median of a data set is the value that lies in the middle of the data when the data is in sorted order.
The mode of a data set is the data entry that occurs with the greatest frequency.
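These three measures are available in Python's standard `statistics` module. The sketch below applies them to the data set from the frequency distribution examples, treated here as a sample. Because several entries tie for the greatest frequency, it uses `statistics.multimode` (Python 3.8 or later), which returns every mode.

```python
import statistics

data = [7, 39, 13, 9, 25, 8, 22, 0, 2, 18, 2, 30, 7, 35, 12,
        15, 8, 6, 5, 29, 0, 11, 39, 16, 15]

mean = statistics.mean(data)        # sum of entries / number of entries
median = statistics.median(data)    # middle value of the sorted data
modes = statistics.multimode(data)  # all entries with the greatest frequency

print("mean:", mean)
print("median:", median)
print("modes:", sorted(modes))
```

With 25 entries, the median is the 13th value in sorted order.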

Measures of Central Tendency

An outlier is a data entry that is far removed from the other entries in the data set.
A weighted mean is the mean of a data set whose entries have varying weights. A weighted mean is given by:
$\bar{x} = \frac{\sum (x \cdot w)}{\sum w}$
where $w$ is the weight of each entry $x$.
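A minimal sketch of the weighted mean formula follows. The function name `weighted_mean` and the scores and weights are made up for illustration and do not come from the slides.

```python
def weighted_mean(values, weights):
    """Return sum(x * w) / sum(w) for paired entries and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


# Hypothetical example: two scores, the second weighted three times as heavily
scores = [80, 90]
weights = [1, 3]
print(weighted_mean(scores, weights))  # (80*1 + 90*3) / (1 + 3) = 87.5
```

When all weights are equal, the result reduces to the ordinary mean.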

Measures of Central Tendency

The mean of a frequency distribution for a sample is approximated by
$\bar{x} = \frac{\sum (x \cdot f)}{\sum f}$
where $f$ is the frequency and $x$ is the midpoint of each class.

Frequency Distribution Example

Number of classes: 5
Data set: 7, 39, 13, 9, 25, 8, 22, 0, 2, 18, 2, 30, 7, 35, 12, 15, 8, 6, 5, 29, 0, 11, 39, 16, 15

Frequency distribution:

Class     f     Midpoint, x     x · f
0-7       8     3.5             28
8-15      8     11.5            92
16-23     3     19.5            58.5
24-31     3     27.5            82.5
32-39     3     35.5            106.5
          $\sum f = 25$         $\sum (x \cdot f) = 367.5$

Mean of frequency distribution: 367.5 / 25 = 14.7
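The grouped mean is a weighted mean in which the class midpoints are weighted by the class frequencies. A short sketch of the calculation, with the hypothetical helper name `grouped_mean`:

```python
midpoints = [3.5, 11.5, 19.5, 27.5, 35.5]
frequencies = [8, 8, 3, 3, 3]


def grouped_mean(xs, fs):
    """Approximate the sample mean as sum(x * f) / sum(f)."""
    return sum(x * f for x, f in zip(xs, fs)) / sum(fs)


print(grouped_mean(midpoints, frequencies))
```

The result, 14.7, is only an approximation: the mean of the 25 raw entries is 373 / 25 = 14.92, because grouping replaces every entry in a class with the class midpoint.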

Measures of Variation

The range of a data set is the difference between the maximum and minimum data entries in the set.
The deviation of an entry $x$ in a population data set is the difference between the entry and the mean $\mu$ of the data set.
Deviation of $x$ = $x - \mu$
The population variance of a population data set of $N$ entries is
Population variance = $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
where the symbol $\sigma$ is the lowercase Greek letter sigma.

Measures of Variation

The population standard deviation of a population data set of $N$ entries is the square root of the population variance:
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

Finding Population Variance and Standard Deviation

1. Find the mean of the population data set. $\mu = \frac{\sum x}{N}$
2. Find the deviation of each entry. $x - \mu$
3. Square each deviation. $(x - \mu)^2$
4. Add to get the sum of squares. $SS_x = \sum (x - \mu)^2$
5. Divide by $N$ to get the population variance. $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
6. Find the square root of the variance to get the population standard deviation. $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
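The six steps translate almost line for line into Python. This is a sketch under stated assumptions: the function name `population_variance_and_std` is hypothetical, and the small data set is made up for illustration so that the arithmetic is easy to check by hand.

```python
import math


def population_variance_and_std(values):
    """Follow the six steps and return (variance, standard deviation)."""
    N = len(values)
    mu = sum(values) / N                     # 1. mean of the population
    deviations = [x - mu for x in values]    # 2. deviation of each entry
    squared = [d ** 2 for d in deviations]   # 3. square each deviation
    ss_x = sum(squared)                      # 4. sum of squares
    variance = ss_x / N                      # 5. population variance
    return variance, math.sqrt(variance)     # 6. population standard deviation


# Hypothetical population: mean 5, sum of squares 32, N = 8
population = [2, 4, 4, 4, 5, 5, 7, 9]
variance, std = population_variance_and_std(population)
print(variance, std)  # 32 / 8 = 4.0, and its square root 2.0
```

The standard library functions `statistics.pvariance` and `statistics.pstdev` compute the same population quantities.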

Measures of Variation

The sample variance and sample standard deviation of a sample data set of $n$ entries are
Sample variance = $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample standard deviation = $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
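The only change from the population formulas is the divisor, $n - 1$ instead of $N$. The sketch below computes the sample variance by hand and compares it with `statistics.variance` and `statistics.stdev`, which use the same $n - 1$ divisor. The function name `sample_variance` and the data are made up for illustration.

```python
import math
import statistics


def sample_variance(values):
    """Return sum((x - x_bar)^2) / (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
s2 = sample_variance(sample)        # 32 / 7
s = math.sqrt(s2)

print(s2, statistics.variance(sample))
print(s, statistics.stdev(sample))
```

For the same entries, the sample variance (32 / 7) is larger than the population variance (32 / 8), since the sum of squares is divided by a smaller number.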

Measures of Variation

Symbols                Population              Sample
Variance               $\sigma^2$              $s^2$
Standard deviation     $\sigma$                $s$
Mean                   $\mu$                   $\bar{x}$
Number of entries      $N$                     $n$
Deviation              $x - \mu$               $x - \bar{x}$
Sum of squares         $\sum (x - \mu)^2$      $\sum (x - \bar{x})^2$

Measures of Variation

The standard deviation of a frequency distribution is:
$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$
where $n = \sum f$.
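As a sketch, this formula can be applied to the midpoints and frequencies from the earlier frequency distribution example. The function name `grouped_sample_std` is hypothetical, and $\bar{x}$ is taken to be the grouped mean $\sum (x \cdot f) / \sum f$.

```python
import math

midpoints = [3.5, 11.5, 19.5, 27.5, 35.5]
frequencies = [8, 8, 3, 3, 3]


def grouped_sample_std(xs, fs):
    """Return sqrt(sum((x - x_bar)^2 * f) / (n - 1)) with n = sum(f)."""
    n = sum(fs)
    x_bar = sum(x * f for x, f in zip(xs, fs)) / n          # grouped mean
    ss = sum((x - x_bar) ** 2 * f for x, f in zip(xs, fs))  # weighted sum of squares
    return math.sqrt(ss / (n - 1))


s = grouped_sample_std(midpoints, frequencies)
print(round(s, 2))
```

Here the weighted sum of squares is 2944 and n - 1 = 24, so s is about 11.08. As with the grouped mean, this approximates the standard deviation of the raw data.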