
Part 4 Measures of Spread: IQR and Deviation

In an earlier part we learned how the three measures of center offer different ways of providing us with a single representative value for a data set. However, consider the following situation:

Example 1: Below are the scores for two exams:
Exam X: 68, 64, 7, 66, 7, 76, 7, 74, 7
Exam Y: 5, 4, 7, 4, 7,, 7,, 9
For exam X, calculate the mean, median and mode. For exam Y, calculate the mean, median and mode.

From Example 1 we see that we need some way of summarizing a data set other than finding its center. To that end, the range of a data set is the quantity (greatest value) − (least value).

Example 2: Calculate the range for each data set in Example 1.
For exam X, calculate the range and midrange.
For exam Y, calculate the range and midrange.

Note that the range considers only two values and tells us essentially nothing about the spread of the intermediate values. We could use the midrange as previously defined, but this would not tell us anything about how the data is distributed either, since the midrange can be greatly affected by a single data value. Moreover, the range is directly and drastically affected by what are called outliers (informally, a small number of values which differ greatly from the majority of values). So, although the range is a simple measure of spread to calculate, for these reasons it is often not a practical one.

One way to maintain much of the simplicity of the range yet avoid the effects of outliers is to calculate the inter-quartile range, or IQR, which is defined as $Q_3 - Q_1$. Note that this gives the range of the middle 50% of the data.

Example 3: Calculate the IQR for each data set in Example 1.
For exam X, calculate $Q_3 - Q_1 = \text{IQR}$.
For exam Y, calculate $Q_3 - Q_1 = \text{IQR}$.
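The range, midrange and IQR can be computed directly. The sketch below assumes the "median of halves" convention for quartiles, where the overall median is left out of both halves when n is odd. This is only one of several conventions in use, so your textbook or calculator may give slightly different $Q_1$ and $Q_3$. The scores are hypothetical; they are not the Example 1 data.

```python
from statistics import median


def data_range(values):
    """Range = (greatest value) - (least value)."""
    return max(values) - min(values)


def midrange(values):
    """Midrange = average of the greatest and least values."""
    return (max(values) + min(values)) / 2


def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves convention.

    Assumption: for odd n the overall median is excluded from both
    halves; other conventions exist. Requires at least two values.
    """
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]                 # values below the median position
    upper = xs[half + len(xs) % 2:]   # skip the middle value when n is odd
    return median(lower), median(upper)


def iqr(values):
    """Inter-quartile range = Q3 - Q1, the spread of the middle 50%."""
    q1, q3 = quartiles(values)
    return q3 - q1


# Hypothetical scores (not the exam data from Example 1).
scores = [55, 61, 64, 68, 70, 72, 75, 80, 98]
print(data_range(scores), midrange(scores), quartiles(scores), iqr(scores))
```

Try replacing the largest score, 98, with 81. The range and midrange both change, but the IQR stays at 15.0. This is the resistance to extreme values that the notes describe.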

The IQR also allows us to give a formal definition of an outlier: any value which is more than $1.5 \times \text{IQR}$ away from the nearest quartile. Using this definition, box-and-whisker diagrams are usually drawn somewhat differently for samples that contain outliers. In this case, the whiskers extend to the furthest values which are not outliers, and outliers are then indicated with a dot. (Sometimes so-called extreme outliers, values which are more than $3 \times \text{IQR}$ away from the nearest quartile, are indicated with an asterisk.) Because outliers can cause major difficulties in making statistical inferences, we will pay special attention to them later in the course.

Example 4: Redraw the box-and-whisker diagram from Example 6 in an earlier part, this time accounting for outliers.
5, 6,,, 6, 7,,,,,, 4, 5, 8, 8, 4, 4, 66

Although the IQR is resistant to outliers, it still ignores many values, just as the range does. For the remaining measures of spread, our approach is as follows. Suppose the values $x_1, x_2, \ldots, x_n$ have mean $\bar{x}$. How good a representative value is $\bar{x}$? That is, how much do the other values differ from it, on average?

Note: Literally, we might try calculating the quantity
$$\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x}).$$
However, this quantity will always come out to zero, as you can check. Instead, we might try distances rather than differences. Suppose the values $x_1, x_2, \ldots, x_n$ have mean $\bar{x}$. We define the mean deviation as
$$\text{mean deviation} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert.$$

Example 5: Calculate the mean deviation for each set in Example 1.
Set X: $x$, $\lvert x - \bar{x} \rvert$    Set Y: $y$, $\lvert y - \bar{y} \rvert$
68 5 64 4 66 4 76 74 7 9

A (perhaps non-obvious) drawback of the mean deviation is that absolute values don't work very well with the tools of calculus, which are essential for much of statistical analysis. Another way to make the differences $(x_i - \bar{x})$ non-negative is to square them. Squaring has the side effect of producing a quantity that is both large and in square units, so we take the square root after finding the mean.
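The outlier rule and the mean deviation above can be sketched as follows. The quartiles again use the median-of-halves convention, which is an assumption. The function name `classify_outliers` and the sample values are hypothetical illustrations. The last lines use exact fractions to check that the signed deviations sum to exactly zero, as the note above claims.

```python
from fractions import Fraction
from statistics import mean, median


def quartiles(values):
    """(Q1, Q3) by the median-of-halves convention (an assumption; other
    conventions exist). Requires at least two values."""
    xs = sorted(values)
    half = len(xs) // 2
    return median(xs[:half]), median(xs[half + len(xs) % 2:])


def classify_outliers(values):
    """Return (mild_outliers, extreme_outliers): values more than 1.5*IQR
    (but at most 3*IQR) from the nearest quartile, and values more than 3*IQR."""
    q1, q3 = quartiles(values)
    spread = q3 - q1
    mild, extreme = [], []
    for x in values:
        # distance from the nearest quartile (0 if x lies between Q1 and Q3)
        distance = max(q1 - x, x - q3, 0)
        if distance > 3 * spread:
            extreme.append(x)
        elif distance > 1.5 * spread:
            mild.append(x)
    return mild, extreme


def mean_deviation(values):
    """Average distance |x - mean| of the values from their mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)


sample = [12, 15, 16, 17, 18, 19, 20, 22, 32, 45]   # hypothetical data
print(classify_outliers(sample))                    # Q1 = 16, Q3 = 22, IQR = 6
print(mean_deviation([2, 4, 4, 4, 5, 5, 7, 9]))

# Signed differences from the mean always cancel (exact arithmetic).
exact = [Fraction(x) for x in sample]
xbar = mean(exact)
signed_sum = sum(x - xbar for x in exact)
print(signed_sum)
```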

And finally, it will now matter whether we're dealing with a sample or a population. For a given sample of size $n$, say $x_1, x_2, \ldots, x_n$ with sample mean $\bar{x}$, the sample standard deviation is
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}},$$
and the sample variance is
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}.$$
Given a population $x_1, x_2, \ldots, x_N$ with population mean $\mu$, the population standard deviation is
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}},$$
and the population variance is
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}.$$

A natural question regarding the definition of $s$ is: why $n - 1$? Recall that the purpose of a statistic is to estimate its corresponding parameter. The definition of $\sigma$ is chosen freely (and reasonably), but once this is done, we must choose a definition for $s$ that makes it a good estimator for $\sigma$. Although it seems strange, this requires the term $n - 1$ in its denominator rather than $n$. Dividing by $n - 1$ is what makes $s^2$ an unbiased estimator of $\sigma^2$.

Note also that $s$ is not equal to the mean deviation. However, the two values will usually be reasonably close to each other, and this gives us a way to check approximately what the value of $s$ should be for a given data set.

Example 6: Calculate the sample standard deviation and sample variance for each set in Example 1.
Set X: $x$, $(x - \bar{x})^2$    Set Y: $y$, $(y - \bar{y})^2$
68 5 64 4 66 4 76 74 7 9

So, which measure of spread should we use? The answer is analogous to deciding between measures of center. $s$ is more useful mathematically, so we'll usually use it. However, outliers can strongly affect $s$, so when there are outliers the IQR should probably be used instead. As before, some statisticians advocate this even for skewed distributions.
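The sample and population formulas differ only in their denominators. The sketch below implements both and cross-checks them against Python's `statistics.stdev` and `statistics.pstdev`. It then illustrates the "why $n - 1$" answer on a tiny hypothetical population, {1, 2, 6}. Averaged over every ordered sample of size 2 drawn with replacement, $s^2$ equals $\sigma^2$ exactly, while dividing by $n$ falls short.

```python
import math
import statistics
from fractions import Fraction
from itertools import product


def sample_variance(values):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """sigma^2: sum of squared deviations from mu, divided by N."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


data = [2, 4, 4, 4, 5, 5, 7, 9]                   # illustrative values
s = math.sqrt(sample_variance(data))
sigma = math.sqrt(population_variance(data))
assert math.isclose(s, statistics.stdev(data))     # library cross-checks
assert math.isclose(sigma, statistics.pstdev(data))

# Why n - 1: average over every ordered sample of size 2 drawn with
# replacement from a small hypothetical population (exact fractions).
population = [Fraction(v) for v in (1, 2, 6)]
samples = [list(p) for p in product(population, repeat=2)]
avg_s2 = sum(sample_variance(p) for p in samples) / len(samples)
avg_div_n = sum(population_variance(p) for p in samples) / len(samples)
print(avg_s2, avg_div_n, population_variance(population))
```

Here the average of $s^2$ matches $\sigma^2 = 14/3$. The divide-by-$n$ version averages only half of that, because the sample size is 2.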

As with the mean, we'll consider a few properties of the standard deviation. Suppose $x_1, x_2, \ldots, x_n$ have sample standard deviation $s$.
(1) For any constant $k$, the values $x_1 + k, x_2 + k, \ldots, x_n + k$ have sample standard deviation $s$.
(2) For any constant $k$, the values $kx_1, kx_2, \ldots, kx_n$ have sample standard deviation $\lvert k \rvert s$.
(Note: The same properties hold true for the population standard deviation $\sigma$.)

Example 7: Suppose the exam scores for this class have a standard deviation of … points. Find the resulting standard deviation and variance if I
(a) give everyone an additional 5 points;
(b) multiply everyone's score by a factor of ….

Using some algebra, we can rearrange the formula for $s$ as
$$s = \sqrt{\frac{n\sum x^2 - \left(\sum x\right)^2}{n(n-1)}}.$$
The trade-off here is that we don't have to calculate $\bar{x}$ or the values $x - \bar{x}$, but the values $x^2$ are often much larger than the values $(x - \bar{x})^2$.

As we did in an earlier part, consider a frequency distribution with outcomes $x_1, x_2, \ldots, x_k$ occurring with frequencies $f_1, f_2, \ldots, f_k$, where $n = \sum_{i=1}^{k} f_i$ is the total number of observations. For such a distribution we can deduce
$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n-1}} = \sqrt{\frac{n\sum f x^2 - \left(\sum f x\right)^2}{n(n-1)}}.$$
Recall that for grouped data, we approximate the values in each class using the class mark (midpoint).
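Here is a minimal sketch of the rearranged ("shortcut") formula and its frequency-distribution version, using illustrative data. The last line checks properties (1) and (2): shifting every value by 5 leaves $s$ unchanged, and multiplying by $-3$ multiplies $s$ by $\lvert -3 \rvert = 3$.

```python
import math


def sd_shortcut(values):
    """s = sqrt((n*sum(x^2) - (sum x)^2) / (n(n-1)))."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


def sd_from_frequencies(outcomes, freqs):
    """Same formula for a frequency distribution, with n = sum of the frequencies."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(outcomes, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(outcomes, freqs))
    return math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))


data = [2, 4, 4, 4, 5, 5, 7, 9]                 # illustrative values
print(sd_shortcut(data))
# The same data written as outcomes with frequencies:
print(sd_from_frequencies([2, 4, 5, 7, 9], [1, 3, 2, 1, 1]))
# Properties (1) and (2): shift by 5, then scale by -3.
print(sd_shortcut([x + 5 for x in data]), sd_shortcut([-3 * x for x in data]))
```

With integer data the subtraction inside the square root is exact. With large floating-point values, however, $n\sum x^2$ and $(\sum x)^2$ can be huge and nearly equal, so rounding error may creep in. That is the computational side of the trade-off mentioned above.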

Example 8: Below is a frequency distribution for a survey of U.S. households determining the number of people per household. Use this to compute the sample standard deviation for the number of people per household.
Number of people, Number of households:
7 9 4 5 5 4 6 7

Example 9: The scores of a quiz are recorded in the distribution below. Use this to estimate the sample standard deviation for the quiz scores.
Score       Frequency
[8, 10)     …
[10, 12)    …
[12, 14)    8
[14, 16)    …
[16, 18)    7
[18, 20)    4

What is important to understand about the standard deviation, or the average deviation? The standard deviation is a measure of how tightly packed the data is about the mean. In other words, the smaller the standard deviation (see also cvar below), the closer the data is to the mean, or average. The data below will help us visually grasp how this works. There are 5 data sets listed below, each with a mean of 50. As the data values begin to congregate about the mean, notice that the deviation becomes smaller. The related histograms are displayed after the data. I have used the average deviation here; as an exercise, you can verify the same pattern using the standard deviation.
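For grouped data such as Example 9, each class is replaced by its class mark before applying the frequency formula. The sketch below assumes classes are given as hypothetical `(lower, upper, frequency)` tuples. The sample input is made up for illustration and is not the quiz data.

```python
import math


def grouped_sample_sd(classes):
    """Estimate s from (lower, upper, frequency) classes using class marks."""
    marks = [(lo + hi) / 2 for lo, hi, _ in classes]   # class midpoints
    freqs = [f for _, _, f in classes]
    n = sum(freqs)                                     # total observations
    xbar = sum(f * m for m, f in zip(marks, freqs)) / n
    ss = sum(f * (m - xbar) ** 2 for m, f in zip(marks, freqs))
    return math.sqrt(ss / (n - 1))


# Hypothetical classes [0, 2), [2, 4), [4, 6) with frequencies 1, 2, 1.
print(grouped_sample_sd([(0, 2, 1), (2, 4, 2), (4, 6, 1)]))
```

The result is only an estimate, because every value in a class is treated as if it sat at the class midpoint.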

[Table: data Sets 1 to 5, each with mean 50, listing each set's data values and its average deviation (Dev).]

[Figure: histograms of Sets 1 to 5, each panel labeled with the set's average deviation.]

Notice what happens to the graph of the distribution as the mean, median and mode become equal and the deviation becomes small: the curve takes on a roughly normal shape. Remember, in the example above we used the average deviation for s. In practice, we will use the standard deviation for s.

When using the standard deviation as a measure of how tightly packed the data is about its mean, it is also key to compute the coefficient of variation. The coefficient of variation is the ratio of the standard deviation to the mean, and we abbreviate it as cvar:
$$\text{cvar} = \frac{s}{\bar{x}}.$$
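As a small sketch, cvar is the sample standard deviation divided by the mean. The two hypothetical sets below have exactly the same $s$ (10) but very different means, so their cvar values differ by a factor of ten. The guard against a zero mean is a choice made here: the ratio is undefined in that case, and cvar is generally only meaningful for positive-valued data.

```python
import statistics


def cvar(values):
    """Coefficient of variation: sample standard deviation / mean."""
    m = statistics.mean(values)
    if m == 0:
        raise ValueError("cvar is undefined when the mean is 0")
    return statistics.stdev(values) / m


low_mean = [40, 50, 60]       # hypothetical: mean 50, s = 10
high_mean = [490, 500, 510]   # hypothetical: mean 500, s = 10
print(cvar(low_mean), cvar(high_mean))
```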

If we use only the value of $s$ to measure variation in the data, we can be misled. For example, if the mean salary at a company is USD 50K with a standard deviation of USD 20K, then cvar is 0.40. But if the mean salary is USD 500K and the standard deviation is again USD 20K, then cvar is 0.04. In the first case, the standard deviation is 40% of the mean, very large in comparison; in the second case, it is 4% of the mean, very small in comparison. I think you would be much happier with your salary deviating 4% rather than 40%. So, we want to know both the standard deviation and the coefficient of variation. Usually, cvar will be fairly intuitive, as in the comparison above.

Example 10: Now look at our comparison above of Set 1 to Set 5, and compute the cvar for each set using the true standard deviation $s$, not the average deviation.
Set 1: s = ____ ; cvar = ____
Set 2: s = ____ ; cvar = ____
Set 3: s = ____ ; cvar = ____
Set 4: s = ____ ; cvar = ____
Set 5: s = ____ ; cvar = ____

Finally, as one last note on standard deviations, we'll mention what is known as Chebyshev's Rule. We know that the IQR tells us the range of the middle 50% of a data set, and something similar can be said about standard deviations. If a distribution is symmetric and approximately normal, as in our Set 5 above, then (by what is often called the Empirical Rule):
1) approximately 68% of the data lies within one standard deviation of the mean, i.e., within $(\bar{x} - s, \bar{x} + s)$ or $(\mu - \sigma, \mu + \sigma)$;
2) approximately 95% of the data lies within two standard deviations of the mean, i.e., within $(\bar{x} - 2s, \bar{x} + 2s)$ or $(\mu - 2\sigma, \mu + 2\sigma)$;
3) approximately 99.7% of the data lies within three standard deviations of the mean, i.e., within $(\bar{x} - 3s, \bar{x} + 3s)$ or $(\mu - 3\sigma, \mu + 3\sigma)$.
For a general distribution, which might not be normal or symmetric, Chebyshev's Rule states that the percent of the data within $K$ standard deviations of the mean is at least
$$\left(1 - \frac{1}{K^2}\right) \times 100\%.$$
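The sketch below compares the two rules using Python's `statistics.NormalDist`. It computes the exact normal fraction within $k$ standard deviations (the source of 68%, 95% and 99.7%) alongside Chebyshev's guaranteed minimum $1 - 1/k^2$. The helper `fraction_within` counts how much of an actual data set falls inside $(\bar{x} - ks, \bar{x} + ks)$. The data passed to it here is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def chebyshev_bound(k):
    """Minimum fraction within k standard deviations (informative for k > 1)."""
    return 1 - 1 / k ** 2


def normal_within(k):
    """Exact fraction of a normal distribution within k standard deviations."""
    z = NormalDist()                     # standard normal: mu = 0, sigma = 1
    return z.cdf(k) - z.cdf(-k)


def fraction_within(values, k):
    """Observed fraction of the data inside (xbar - k*s, xbar + k*s)."""
    xbar, s = mean(values), stdev(values)
    return sum(xbar - k * s < x < xbar + k * s for x in values) / len(values)


for k in (1, 2, 3):
    print(k, round(normal_within(k), 4), round(chebyshev_bound(k), 4))
print(fraction_within([2, 4, 4, 4, 5, 5, 7, 9], 2))   # hypothetical data
```

The normal percentages are much larger than Chebyshev's bounds. That is the price of a rule that must hold for every distribution, not just bell-shaped ones.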

According to Chebyshev's Rule, then, what would be the percentages for parts 1) to 3)?
1) at least ____% of the data lies within one standard deviation of the mean, i.e., within $(\bar{x} - s, \bar{x} + s)$ or $(\mu - \sigma, \mu + \sigma)$;
2) at least ____% of the data lies within two standard deviations of the mean, i.e., within $(\bar{x} - 2s, \bar{x} + 2s)$ or $(\mu - 2\sigma, \mu + 2\sigma)$;
3) at least ____% of the data lies within three standard deviations of the mean, i.e., within $(\bar{x} - 3s, \bar{x} + 3s)$ or $(\mu - 3\sigma, \mu + 3\sigma)$.

We have talked about what it means to be normal. A population or sample is (approximately) normally distributed if it is symmetric and bell-shaped and adheres to the percentages 68, 95 and 99.7 stated above.

You may ask, "What do I normally have for breakfast?" Well, I usually eat fruit for breakfast. What does "usually" mean? That I eat fruit 6 out of 7 days a week? Wai Chi and Allen were usually late to class last semester. How often is "usually"? In statistics, we define an event as usual if it falls within 2 standard deviations of the mean. That means that, for a normally distributed population, about 95% of the time an event will occur that is considered usual, and about 5% of the time an event will occur that is considered unusual.

Example 11: Suppose that from a large sample, the mean age of a female at the time of divorce from her partner is 37, with a standard deviation of 8 years. Is a woman who is divorced at age 6 considered unusual? The question here is: is 6 within 2 standard deviations of the mean? We create the interval $(37 - 2 \cdot 8,\ 37 + 2 \cdot 8) = (21, 53)$. Since 6 is not within the interval from 21 to 53, we would consider 6 an unusual age for a woman to divorce.

Example 12: Suppose exam scores are normally distributed with a mean of … and a standard deviation of 5 points. If I wanted 84% of the class to pass the exam, what would be the lowest passing score? To answer this question, let us start by drawing a general normal curve, labeling the mean and the regions according to their percentages.
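Both questions above can be sketched in a few lines. The "usual" test checks whether a value lies within $k = 2$ standard deviations of the mean. The passing cutoff uses `NormalDist.inv_cdf` to find the score that the given fraction of a normal population exceeds. The function names and all numbers below (mean 100 and standard deviation 15, mean 70 and standard deviation 10) are hypothetical, not the data from Examples 11 and 12.

```python
from statistics import NormalDist


def usual_interval(mean, sd, k=2):
    """Values within k standard deviations of the mean count as 'usual'."""
    return mean - k * sd, mean + k * sd


def is_usual(x, mean, sd, k=2):
    lo, hi = usual_interval(mean, sd, k)
    return lo <= x <= hi


def passing_cutoff(mean, sd, pass_fraction):
    """Lowest score that the top pass_fraction of a normal population reaches."""
    return NormalDist(mean, sd).inv_cdf(1 - pass_fraction)


print(usual_interval(100, 15), is_usual(65, 100, 15), is_usual(120, 100, 15))
print(passing_cutoff(70, 10, 0.84))
```

The cutoff comes out very close to one standard deviation below the mean. This matches the reasoning from the 68% rule: 84% is about 50% + 34%, so everyone above $\mu - \sigma$ passes.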