Computing Statistics ID1050 Quantitative & Qualitative Reasoning

Single-variable Statistics

We will be considering six statistics of a data set:

Three measures of the middle: mean, median, and mode.
Two measures of spread: variance and standard deviation.
One measure of symmetry: skewness.

We can compute these values for either discrete or continuous data.

Mean or Average

The mean is defined as the sum of the data divided by the number of data values. The variable often used is $\mu$ (the Greek letter mu) or $\bar{x}$. Often $\mu$ is associated with a population and $\bar{x}$ is associated with a sample. Symbolically,

$$\bar{x} = \frac{\sum x}{n},$$

where $\sum x = x_1 + x_2 + \dots + x_n$ and $n$ is the number of data values. (The capital letter sigma, $\Sigma$, represents summation.)

Example: Data is (1, 2, 3, 4, 5). The sum is 1 + 2 + 3 + 4 + 5 = 15. There are 5 data values, so the average is 15/5 = 3.

Many calculators have a statistics mode. The way the manufacturer chooses to implement statistical calculation varies widely. There are tutorials for this course's standard calculator, the TI-30Xa, for entering data and computing statistics. If you have a different brand or model, consult your calculator's user manual or website for details on how to work with statistics.
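The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `mean` is chosen here and is not part of the course material.

```python
def mean(data):
    """Return the sum of the data divided by the number of data values."""
    return sum(data) / len(data)


data = [1, 2, 3, 4, 5]
print(sum(data))   # 15
print(mean(data))  # 15 / 5 = 3.0
```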

Median

The median is the middle number when the data is listed in order. If there is an even number of data points, the median is the average of the two middle values.

Example: Data is (1, 2, 3, 4, 5). The median is 3.
Example: Data is (1, 2, 3, 4, 5, 6). The median is (3 + 4)/2 = 3.5.

Why is this quantity useful? The median ignores outlying values. What if our data had been (1, 2, 3, 4, 1000)? The mean is 202, which is not characteristic of any of the actual values. The median is 3, which is more typical of most of the values.

The median is helpful when looking for a house to buy. The median house price is the typical price you'd pay, even though the millionaire's house at the corner of the block raises the mean of the house prices above the value most people paid for theirs.
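A minimal Python sketch of the median rule, covering both the odd and the even case and the outlier comparison from the text. The function name `median` is an illustrative choice, not something defined by the course.

```python
def median(data):
    """Return the middle value of the sorted data.

    With an even number of values, return the average of the two middle ones.
    """
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 2, 3, 4, 5]))     # 3
print(median([1, 2, 3, 4, 5, 6]))  # 3.5

# The outlier 1000 pulls the mean far from the typical values,
# while the median stays at 3.
outlier_data = [1, 2, 3, 4, 1000]
print(sum(outlier_data) / len(outlier_data))  # 202.0
print(median(outlier_data))                   # 3
```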

Mode

The mode represents the most populated class, or the group with the most members. This is yet another reasonable way of finding the middle of the data. Determining the mode is different for discrete data than it is for continuous data.

For discrete data, the mode is simply the number that appears the most times. Example: Data is (1, 1, 2, 3, 4, 4, 5, 5, 5). The mode is 5.

For continuous data, the mode is the center of the range of the class that has the most members in it. Example: Data is (1.1, 1.2, 1.3, 1.8, 2.0, 2.6, 3.1, 4.6, 4.8, 5.1). The class from 1 to 2 has the most members. The center of this range is 1.5, so the mode is 1.5. (Note: 1.5 does not even appear in the data.)

In both cases, the mode can be quickly determined from the graph. The mode is the x-value that is at the center of the tallest bar in either the bar graph (discrete data) or histogram (continuous data). Data can have two modes (bi-modal), but if there are more, we usually say it is amodal (no distinct mode).
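The two ways of finding the mode might be sketched as follows. The function names are hypothetical. The sketch assumes classes of width 1 with edges at whole numbers, and it counts a value that falls exactly on an edge (such as 2.0) in the upper class; the slides do not state how edge values are assigned. When several values or classes tie, it simply returns the first one counted, so it does not detect bi-modal or amodal data.

```python
import math
from collections import Counter


def discrete_mode(data):
    """Return the value that appears the most times."""
    value, _count = Counter(data).most_common(1)[0]
    return value


def binned_mode(data, width=1.0):
    """Return the center of the class (bin) with the most members.

    Assumption: classes are [k*width, (k+1)*width) for whole numbers k.
    """
    class_counts = Counter(math.floor(x / width) for x in data)
    k, _count = class_counts.most_common(1)[0]
    return (k + 0.5) * width


print(discrete_mode([1, 1, 2, 3, 4, 4, 5, 5, 5]))  # 5

continuous = [1.1, 1.2, 1.3, 1.8, 2.0, 2.6, 3.1, 4.6, 4.8, 5.1]
print(binned_mode(continuous))  # 1.5
```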

Variance

Variance (var. or $\sigma^2$ or $s^2$) is a measure of the spread of data about the average. We don't care which direction the difference is, so we will be ignoring the sign of the difference; squaring each difference removes its sign. In words, the variance is the sum of the squares of the differences divided by one less than the number of data values. The equation is

$$\text{var.} = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Example: Data is (1, 2, 3, 4, 5) and the mean ($\bar{x}$) is 3.

$x$    $\bar{x}$    $x - \bar{x}$    $(x - \bar{x})^2$
1      3            -2               4
2      3            -1               1
3      3             0               0
4      3             1               1
5      3             2               4
                     Sum:            10

Variance is 10/(5 - 1) = 2.5.

If you are using a calculator, it is most likely that the calculator will compute the standard deviation ($\sigma$) instead. To get the variance from the standard deviation, simply find the square of the standard deviation: $\text{var.} = \sigma^2$.
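The tabular method can be mirrored in Python, one row per data value. The helper names `deviation_table` and `sample_variance` are illustrative only; the divisor n - 1 follows the equation given for the variance.

```python
def deviation_table(data):
    """Return one row (x, mean, x - mean, (x - mean)^2) per data value."""
    x_bar = sum(data) / len(data)
    return [(x, x_bar, x - x_bar, (x - x_bar) ** 2) for x in data]


def sample_variance(data):
    """Sum the squared differences and divide by one less than the count."""
    rows = deviation_table(data)
    sum_of_squares = sum(squared for _x, _mean, _diff, squared in rows)
    return sum_of_squares / (len(data) - 1)


data = [1, 2, 3, 4, 5]
for x, x_bar, diff, squared in deviation_table(data):
    print(x, x_bar, diff, squared)

print(sample_variance(data))  # 10 / 4 = 2.5
```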

Standard Deviation

Standard deviation (std. dev. or $\sigma$ or $s$) is a measure of the spread of data about the average. We don't care which direction the difference is, so we will be ignoring the sign of the difference. In words, the standard deviation is the square root of (the sum of the squares of the differences divided by one less than the number of data values). The equation is

$$\text{std. dev.} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\text{var.}}$$

Example (from previous slide): Data is (1, 2, 3, 4, 5), the mean ($\bar{x}$) is 3, and we previously found that the variance is var. = 2.5. Since the standard deviation is the square root of the variance, the standard deviation is $\sigma = \sqrt{2.5} = 1.58$.

If you are using a calculator, it is most likely that the calculator will compute the standard deviation ($\sigma$) as part of its normal statistical function. There is a tutorial for using this course's standard calculator, the TI-30Xa, to calculate standard deviation.

Question: Since standard deviation and variance differ by one keystroke, why do we need both? The units of standard deviation are the same as the data. Variance has other direct uses (e.g. Analysis of Variance) and is also more easily computed.
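As a rough check of the example, the standard deviation can be computed as the square root of the variance and compared with Python's standard library. `statistics.stdev` also divides by n - 1, so the two results should agree. The variable names are chosen here for illustration.

```python
import math
import statistics

data = [1, 2, 3, 4, 5]
n = len(data)
x_bar = sum(data) / n

# Variance: sum of squared differences divided by one less than the count.
variance = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)
print(round(std_dev, 2))  # 1.58

# Cross-check against the standard library's sample standard deviation.
library_std_dev = statistics.stdev(data)
print(round(library_std_dev, 2))  # 1.58
```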

Skewness

The distribution of a set of data may have symmetry about the mean, or it may have a longer tail to one side or the other. Imagine draping a sheet over the graph of the data. The side of the sheet that is least steep is the side that has the longer tail. If the tail points to the right (toward positive x values), the skewness will be a positive number. If the tail points to the left, skewness will be negative. Zero skewness indicates symmetric tails to both sides.

It is sometimes difficult to estimate from the graph what the skewness will be, but there is a formula for calculating skewness in all cases:

$$\text{Skewness} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}}$$

Example: Data is (1.1, 1.2, 1.3, 1.8, 2.0, 2.6, 3.1, 4.6, 4.8, 5.1). Mean is 2.76, mode is 1.5, and std. dev. is 1.56.

$$\text{Skewness} = \frac{2.76 - 1.5}{1.56} = 0.81 \quad \text{(tail to the right)}$$
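The skewness formula is a one-line calculation once the mean, mode, and standard deviation are known. A small sketch, with the hypothetical function name `mode_skewness`, using the rounded values from the example:

```python
def mode_skewness(mean, mode, std_dev):
    """Return (mean - mode) / standard deviation."""
    return (mean - mode) / std_dev


skewness = mode_skewness(mean=2.76, mode=1.5, std_dev=1.56)
print(round(skewness, 2))  # 0.81, positive: tail to the right

# When the mean and the mode coincide, the formula gives zero skewness.
print(mode_skewness(mean=3.0, mode=3.0, std_dev=1.0))  # 0.0
```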

Example: Discrete Data

Data: 1, 1, 2, 3, 3, 4, 4, 4, 5
N: 9
[Graph: bar graph of the data]

Mean: 3
Median: 3
Mode: 4
Variance: 2
Standard Deviation: 1.41
Skewness: -0.71

$x$    $\bar{x}$    $x - \bar{x}$    $(x - \bar{x})^2$
1      3            -2               4
1      3            -2               4
2      3            -1               1
3      3             0               0
3      3             0               0
4      3             1               1
4      3             1               1
4      3             1               1
5      3             2               4
Sum: 27              Sum:            16

Example: Continuous Data

Data: 1.5, 1.7, 2.4, 2.5, 2.7, 3.5, 3.8, 4.7, 5.1, 5.1
N: 10
[Graph: histogram of the data]

Mean: 3.3
Median: 3.1
Mode: 2.5
Variance: 1.82
Standard Deviation: 1.35
Skewness: 0.6

$x$    $\bar{x}$    $x - \bar{x}$    $(x - \bar{x})^2$
1.5    3.3          -1.8             3.24
1.7    3.3          -1.6             2.56
2.4    3.3          -0.9             0.81
2.5    3.3          -0.8             0.64
2.7    3.3          -0.6             0.36
3.5    3.3           0.2             0.04
3.8    3.3           0.5             0.25
4.7    3.3           1.4             1.96
5.1    3.3           1.8             3.24
5.1    3.3           1.8             3.24
Sum: 33              Sum:            16.34
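Both worked examples can be reproduced with one short routine. This is a sketch under stated assumptions, not the course's own method: the function name `summarize` and its parameters are invented for illustration, continuous data is grouped into classes of width 1 with edges at whole numbers, and ties for the mode are resolved by taking the first value or class counted. `statistics.variance` from the standard library divides by n - 1, matching the variance equation used in these slides.

```python
import math
import statistics
from collections import Counter


def summarize(data, continuous=False, width=1.0):
    """Return the six statistics discussed above as a dictionary."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if continuous:
        # Mode of continuous data: center of the most populated class.
        class_counts = Counter(math.floor(x / width) for x in data)
        k = class_counts.most_common(1)[0][0]
        mode = (k + 0.5) * width
    else:
        # Mode of discrete data: the value that appears the most times.
        mode = Counter(data).most_common(1)[0][0]
    variance = statistics.variance(data)  # divides by n - 1
    std_dev = math.sqrt(variance)
    skewness = (mean - mode) / std_dev
    return {
        "mean": mean,
        "median": median,
        "mode": mode,
        "variance": variance,
        "std_dev": std_dev,
        "skewness": skewness,
    }


discrete = summarize([1, 1, 2, 3, 3, 4, 4, 4, 5])
continuous = summarize(
    [1.5, 1.7, 2.4, 2.5, 2.7, 3.5, 3.8, 4.7, 5.1, 5.1], continuous=True
)

for name, result in (("discrete", discrete), ("continuous", continuous)):
    print(name)
    for key, value in result.items():
        print(f"  {key}: {value:.2f}")
```

The printed values should agree with the figures listed in the two examples to within rounding in the last digit.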

Conclusion

We can answer a great many statistical questions by examining the graph and six standard statistical variables for the data:

Bar graph or histogram

Measures of the middle
Mean (can be done on a calculator)
Median (obtained from the sorted list of data)
Mode (obtained from the graph)

Measures of the spread
Variance (calculated using a tabular method, or as the square of the std. dev.)
Standard Deviation (obtained from the calculator's statistics mode, or as the square root of the variance)

Measure of symmetry
Skewness (calculated from the above values: mean, mode, and std. dev.)