OCR Statistics 1 Working with data. Section 2: Measures of location


Notes and Examples
13/11/13 MEI

These notes have sub-sections on:
- The median
- Estimating the median from grouped data
- The mean
- Estimating the mean from grouped data
- Coding data
- The mode
- Comparison of measures of location

The median

When data is arranged in order, the median is the item of data in the middle. However, when there is an even number of data items, the middle lies between two values, and we use the mean of these two values for the median.

For example, this dataset has 9 items:

1 1 3 4 6 7 7 9 10

There are 4 data items below the 5th and 4 items above, so the middle item is the 5th, which is 6.

If another item of data is added to give 10 items, the middle items are the 5th and 6th:

1 1 3 4 6 7 7 9 10 12

so the median is the mean average of the 5th and 6th items, i.e. $\frac{6+7}{2} = 6.5$.

Example 1
Find the median of the data displayed in this stem and leaf diagram.

16 | 5 5 6 7 8
17 | 0 0 1 3 3 7 8 9
18 | 2 2 2 5 5 8
19 | 0

n = 20
Key: 17 | 3 represents 1.73

Counting from the lowest item (1.65), the 10th is 1.73 and the 11th is 1.77. The median is therefore $\frac{1.73 + 1.77}{2} = 1.75$.
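The rule for the median can be sketched in Python. These notes contain no code, so the language is an assumed choice. The helpers `median` and `stem_and_leaf_values` are hypothetical names used only for illustration. Python's standard `statistics.median` applies the same odd/even rule.

```python
def median(values):
    """Median of a list of numbers (hypothetical helper).

    Sort the data; if n is odd, take the middle item;
    if n is even, take the mean of the two middle items.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                     # n = 9: index 4 is the 5th item
    return (data[mid - 1] + data[mid]) / 2   # n = 10: mean of the 5th and 6th items


def stem_and_leaf_values(rows, divisor):
    """Turn {stem: [leaves]} into data values (hypothetical helper).

    Each value is (10 * stem + leaf) / divisor; for the key
    17 | 3 = 1.73, the divisor is 100.
    """
    return [(10 * stem + leaf) / divisor
            for stem, leaves in rows.items() for leaf in leaves]


print(median([1, 1, 3, 4, 6, 7, 7, 9, 10]))      # 6
print(median([1, 1, 3, 4, 6, 7, 7, 9, 10, 12]))  # 6.5

heights = stem_and_leaf_values(
    {16: [5, 5, 6, 7, 8], 17: [0, 0, 1, 3, 3, 7, 8, 9],
     18: [2, 2, 2, 5, 5, 8], 19: [0]},
    divisor=100,
)
print(len(heights), round(median(heights), 2))   # 20 1.75
```

The helper sorts a copy of the data, so the stem and leaf values give the same result whether or not they are already in order.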

When you want to find the median of a data set presented in a frequency table, one useful point is that the data is already ordered.

| x | f |
|---|---|
| 1 | 3 |
| 2 | 5 |
| 3 | 2 |
| 4 | 3 |
| 5 | 4 |
| 6 | 3 |
| Total | 20 |

For this data set there are 20 data items, so the median is the mean of the 10th and 11th items. For this small set of data, it is easy to see that the 10th data item is 3 and the 11th is 4. The median is therefore 3.5.

However, for a larger set of data it may be more difficult to identify the middle item or items. One way to make this a little easier is to use a cumulative frequency table.

| x | f | Cum. freq. |
|---|---|---|
| 1 | 3 | 3 |
| 2 | 5 | 8 |
| 3 | 2 | 10 |
| 4 | 3 | 13 |
| 5 | 4 | 17 |
| 6 | 3 | 20 |

The third column gives the cumulative frequency. This is the total of the frequencies so far. You can find each cumulative frequency by adding each frequency to the previous cumulative frequency. For example, for x = 4, the cumulative frequency is 10 + 3 = 13. The final value of the cumulative frequency (in this case 20) tells you the total of the frequencies.

The cumulative frequencies show that the 10th item is 3 (the cumulative frequency reaches 10 at x = 3) and the 11th item is 4 (items 11 to 13 all have x = 4). So the median is 3.5.

Estimating the median from grouped data

Cumulative frequency curves are useful for estimating the median of a large data set, as shown in the next example.

Example 2
Estimate the median of the following dataset, which gives the mass of 100 eggs:

| Mass, m (g) | Frequency |
|---|---|
| 40 ≤ m < 45 | 4 |
| 45 ≤ m < 50 | 15 |
| 50 ≤ m < 55 | 15 |
| 55 ≤ m < 60 | 22 |
| 60 ≤ m < 65 | 17 |
| 65 ≤ m < 70 | 16 |
| 70 ≤ m < 75 | 11 |
| 75 ≤ m < 80 | 0 |

| Mass, m (g) | Cumulative frequency |
|---|---|
| m < 40 | 0 |
| m < 45 | 4 |
| m < 50 | 19 |
| m < 55 | 34 |
| m < 60 | 56 |
| m < 65 | 73 |
| m < 70 | 89 |
| m < 75 | 100 |

The cumulative frequency curve plots each cumulative frequency against the corresponding upper bound (m < 45, m < 50, and so on).

[Figure: cumulative frequency curve for the egg data, with cumulative frequency (c.f.) on the vertical axis and mass (g) on the horizontal axis; a red line marks the median.]

Median ≈ 58 g (read from the curve). 50 of the eggs lie below the median, shown by the red line.
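Both cumulative-frequency ideas can be sketched in Python, assuming the data are passed as plain lists. The function names are hypothetical. For grouped data the sketch does not read from a smooth curve. Instead it uses straight-line interpolation inside the class that contains the (n/2)th item, a related technique. Its estimate for the eggs, about 58.6 g, is therefore slightly different from the value of about 58 g read from the curve.

```python
from itertools import accumulate


def median_from_frequency_table(xs, fs):
    """Median of ungrouped data given as values xs with frequencies fs.

    Cumulative frequencies locate the middle item(s) without
    writing out every data item.
    """
    n = sum(fs)
    cum = list(accumulate(fs))          # e.g. [3, 8, 10, 13, 17, 20]

    def item(k):
        """Return the k-th item (counting from 1) in order."""
        for x, c in zip(xs, cum):
            if c >= k:
                return x
        raise IndexError(k)

    if n % 2 == 1:
        return item((n + 1) // 2)
    return (item(n // 2) + item(n // 2 + 1)) / 2


def grouped_median_estimate(bounds, fs):
    """Estimate the median of grouped data by linear interpolation.

    bounds are the class boundaries [b0, b1, ..., bk] and fs the k class
    frequencies. The median position is taken as n/2, as when reading
    across from a cumulative frequency curve.
    """
    target = sum(fs) / 2
    below = 0                           # cumulative frequency so far
    for lower, upper, f in zip(bounds, bounds[1:], fs):
        if f > 0 and below + f >= target:
            return lower + (target - below) / f * (upper - lower)
        below += f
    raise ValueError("median position not reached")


print(median_from_frequency_table([1, 2, 3, 4, 5, 6], [3, 5, 2, 3, 4, 3]))  # 3.5

egg_bounds = [40, 45, 50, 55, 60, 65, 70, 75]
egg_freqs = [4, 15, 15, 22, 17, 16, 11]
print(round(grouped_median_estimate(egg_bounds, egg_freqs), 1))  # 58.6
```

For the eggs, the 50th item falls in the class 55 ≤ m < 60, which has 34 eggs below it. The estimate is therefore $55 + \frac{50 - 34}{22} \times 5 \approx 58.6$.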

The mean

When people talk about the average, it is usually the mean they mean! This is the sum of the data divided by the number of items of data. We can express this using mathematical notation as follows.

$\bar{x}$ denotes the mean value of x. For the data set $x_1, x_2, x_3, x_4, \ldots, x_n$,

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$\Sigma$ is the Greek letter sigma and stands for "the sum of". The whole expression is saying: the mean ($\bar{x}$) is equal to the sum of all the data items ($x_i$ for i = 1 to n) divided by the number of data items (n).

Example 3 shows a very simple calculation set out using this formal notation.

Example 3
Find the mean of the data set {6, 7, 8, 8, 9}.

$x_1 = 6$, $x_2 = 7$, $x_3 = 8$, $x_4 = 8$, $x_5 = 9$, n = 5

$$\bar{x} = \frac{1}{5}\sum_{i=1}^{5} x_i = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5} = \frac{6 + 7 + 8 + 8 + 9}{5} = \frac{38}{5} = 7.6$$

When calculating the mean from a frequency table, you need to be careful to use the correct totals.

| x | f |
|---|---|
| 1 | 3 |
| 2 | 5 |
| 3 | 2 |
| 4 | 3 |
| 5 | 4 |
| 6 | 3 |
| Total | 20 |

The mean of the data shown in the frequency table above can be written as

$$\bar{x} = \frac{1+1+1+2+2+2+2+2+3+3+4+4+4+5+5+5+5+6+6+6}{20} = \frac{69}{20} = 3.45$$

An alternative way of writing this is

$$\bar{x} = \frac{3 \times 1 + 5 \times 2 + 2 \times 3 + 3 \times 4 + 4 \times 5 + 3 \times 6}{3 + 5 + 2 + 3 + 4 + 3} = \frac{69}{20} = 3.45$$

This can be expressed more formally as

$$\bar{x} = \frac{\sum_{i=1}^{6} x_i f_i}{\sum_{i=1}^{6} f_i}$$

In the numerator, each value of x is multiplied by its frequency, and then the results are added together. In the denominator, the frequencies are added to find the total number of data items.

It is helpful to add another column to the frequency table, for the product fx.

| x | f | fx |
|---|---|---|
| 1 | 3 | 3 |
| 2 | 5 | 10 |
| 3 | 2 | 6 |
| 4 | 3 | 12 |
| 5 | 4 | 20 |
| 6 | 3 | 18 |
| Total | Σf = 20 | Σfx = 69 |

Then you can simply add up the two columns and use the totals to calculate the mean:

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{69}{20} = 3.45$$

In general, when the data is given using frequencies, the formula for the mean is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i}$$

where the sums run over the n distinct values of x.

Estimating the mean from grouped data

When the data is grouped into classes, you can still estimate the mean by using the mid-point of each class (the mid-interval value). This means that you assume that the values in each class interval are evenly spread about the mid-point. You can show most of the calculations in a table, as shown in the following example.

Example 4
Estimate the mean weight for the following data:

| Weight, w (kg) | Frequency |
|---|---|
| 50 ≤ w < 60 | 3 |
| 60 ≤ w < 70 | 5 |
| 70 ≤ w < 80 | 7 |
| 80 ≤ w < 90 | 3 |
| 90 ≤ w < 100 | 2 |
| Total | 20 |

The mid-interval value is the mean of the upper and lower bounds of each class.

| Weight, w (kg) | Mid-interval value, x | Frequency, f | fx |
|---|---|---|---|
| 50 ≤ w < 60 | 55 | 3 | 165 |
| 60 ≤ w < 70 | 65 | 5 | 325 |
| 70 ≤ w < 80 | 75 | 7 | 525 |
| 80 ≤ w < 90 | 85 | 3 | 255 |
| 90 ≤ w < 100 | 95 | 2 | 190 |
| Total | | Σf = 20 | Σfx = 1460 |

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1460}{20} = 73$$

The mean weight is estimated to be 73 kg.

To find mid-interval values, you need to think carefully about the upper and lower bounds of each interval. In the example above, it is clear what these bounds are. However, if the intervals had been expressed as 50-59, 60-69 and so on, then it is clear that the original weights had been rounded to the nearest kilogram, and the intervals were actually 49.5 ≤ w < 59.5, 59.5 ≤ w < 69.5, etc. So in that case the mid-interval values would be 54.5, 64.5 and so on.

Coding data

It is sometimes possible to simplify the calculation of the mean by coding the data. You can transform the data using a linear coding:

$$y = a + bx$$

You can undo this coding:

$$x = \frac{y - a}{b}$$

Since each data item has been transformed using this coding, the mean of the data undergoes the same transformation. So the mean of the coded data, $\bar{y}$, is related to the mean of the original data, $\bar{x}$, by the equation

$$\bar{y} = a + b\bar{x}$$

For example, the data set {30, 50, 20, 70, 40, 20, 30, 60} could be simplified by dividing all the data by 10. This means using the coding $y = \frac{x}{10}$, which gives the new data set {3, 5, 2, 7, 4, 2, 3, 6}. You can find the mean $\bar{y}$ of this new data set. Then, since x = 10y, you can find the mean of the original data using the equation $\bar{x} = 10\bar{y}$.

Alternatively, the numbers could be made smaller by subtracting 20 before dividing by 10. This is the coding $y = \frac{x - 20}{10}$, which gives the new data set {1, 3, 0, 5, 2, 0, 1, 4}. You can find the mean $\bar{y}$ of this new data set. Then, since x = 10y + 20, you can find the mean of the original data using the equation $\bar{x} = 10\bar{y} + 20$.

Coding is especially useful when dealing with grouped data, since in these cases you are dealing with mid-interval values which follow a fixed pattern. For example, if you were dealing with heights grouped as 100-109, 110-119 etc., you would be working with mid-interval values of 104.5, 114.5, 124.5 etc. By using the coding $y = \frac{x - 104.5}{10}$, you would be working with y values of 0, 1, 2, etc.

You might feel that since you can use a calculator, simplifying the numbers is of little value. However, the calculations involved can be quite long-winded, and it is easy to make a mistake in entering the numbers. If the numbers are simpler then you are less likely to make a mistake. In addition, you may be required in an examination question to show that you understand this method.

Example 5
Use linear coding to calculate the mean of the following data:

| Weight, w (grams) | Frequency, f |
|---|---|
| 0 ≤ w < 10 | 4 |
| 10 ≤ w < 20 | 6 |
| 20 ≤ w < 30 | 9 |
| 30 ≤ w < 40 | 7 |
| 40 ≤ w < 50 | 4 |

The mid-interval values (denoted by x) are 5, 15, 25, etc. A convenient coding is

$$y = \frac{x - 5}{10}$$

The corresponding y values become 0, 1, 2, 3, 4.

| x | y | f | fy |
|---|---|---|---|
| 5 | 0 | 4 | 0 |
| 15 | 1 | 6 | 6 |
| 25 | 2 | 9 | 18 |
| 35 | 3 | 7 | 21 |
| 45 | 4 | 4 | 16 |
| Total | | Σf = 30 | Σfy = 61 |

$$\bar{y} = \frac{\sum fy}{\sum f} = \frac{61}{30} = 2.0333\ldots$$

Since $y = \frac{x - 5}{10}$, we have $x = 10y + 5$, so

$$\bar{x} = 10\bar{y} + 5 = 10 \times \frac{61}{30} + 5 = 25.33 \text{ (to 2 d.p.)}$$

The mode

The mode is the most common or frequent item of data; in other words, the item with the highest frequency. So for the data set {6, 7, 8, 8, 9} the mode is 8, as this appears twice. There may be more than one mode, if more than one item has the highest frequency.

Identifying the mode is easy when data are given in a frequency table.

| x | f |
|---|---|
| 1 | 3 |
| 2 | 5 |
| 3 | 2 |
| 4 | 3 |
| 5 | 4 |
| 6 | 3 |
| Total | 20 |

The highest frequency is for x = 2. So the mode is 2.

Comparison of measures of location

The mean includes all the data in the average, and takes account of the numerical value of all the data. So exceptionally large or small items of data can have a large effect on the mean: it is susceptible to outliers.

The median is less sensitive to high and low values (outliers), as it is simply the middle value in order of size. If the numerical value of each item of data is relevant to the average, then the mean is a better measure; if not, then use the median.

The mode picks out the commonest data item. This is only significant if there are relatively high frequencies involved. It takes no account at all of the numerical values of the data.

Suppose you are negotiating a salary increase for employees at a small firm. The salaries are currently as follows:

6000, 12000, 14000, 14000, 15000, 15000, 15000, 15000, 16000, 16000, 18000, 18000, 18000, 20000, 100000

The 6000 is a part-time worker who works only two days a week.
The 100000 is the managing director.

The mean salary is 20800.
The median salary is 15000.
The modal salary is also 15000.

Which is the most appropriate measure?

If you were the managing director, you might quote the mean of 20800; but of the current employees, she is the only one who earns more than this amount. If you were the union representative, you would quote the median or the mode (15000), as these give the lowest averages. This is certainly more typical of the majority of workers.

There is no right answer to the question of which average to take: it depends on the purpose to which it is put. However, it is clear that:
- The mean takes account of the numerical value of all the data, and is higher due to the effect of the 100000 salary, which is an outlier.
- The median and mode are not affected by the outliers (100000 and 6000).
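Python's standard `statistics` module provides `mean`, `median` and `multimode`, so the salary figures above can be reproduced directly. The tool is an assumed choice; nothing in these notes depends on it. The last two lines are an added illustration, not part of the original example. They show that removing the 100000 salary pulls the mean down to about 15143, while the median stays at 15000.

```python
from statistics import mean, median, multimode

# Salaries from the example above.
salaries = [6000, 12000, 14000, 14000, 15000, 15000, 15000, 15000,
            16000, 16000, 18000, 18000, 18000, 20000, 100000]

average = mean(salaries)
print(average)              # 20800
print(median(salaries))     # 15000
print(multimode(salaries))  # [15000]

# Count how many salaries lie above the mean: only the managing director's.
above_mean = sum(1 for s in salaries if s > average)
print(above_mean)           # 1

# Illustration only: remove the 100000 outlier and recompute.
without_outlier = [s for s in salaries if s != 100000]
print(round(mean(without_outlier), 2), median(without_outlier))  # 15142.86 15000.0
```

`multimode` returns a list because, as noted under "The mode", a data set may have more than one mode.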

Example 6
Shanice receives the following marks for her end-of-term exams:

| Subject | Mark (%) |
|---|---|
| Maths | 30 |
| English | 80 |
| Physics | 45 |
| Chemistry | 47 |
| French | 47 |
| History | 50 |
| Biology | 46 |
| Religious Education | 55 |

Calculate the mean, median and mode. Comment on which is the most appropriate measure of average for this data.

$$\text{Mean} = \frac{30 + 80 + 45 + 47 + 47 + 50 + 46 + 55}{8} = \frac{400}{8} = 50$$

In numerical order, the results are: 30, 45, 46, 47, 47, 50, 55, 80. The median is the mean of the 4th and 5th marks, $\frac{47 + 47}{2} = 47$.

The mode is 47, as there are two of these and only one each of the other marks.

The mode is not suitable: there is no significance in getting two scores of 47. The median or the mean could be used. The mean is higher since it takes more account of the high English result. The median is perhaps the most representative, and she got 4 scores in the range 45-47; but Shanice would no doubt use the mean to make more of her good English result!
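The calculations in these notes can be checked with a short Python sketch. It makes a few assumptions. The helper names are hypothetical, data are passed as plain lists, and `fractions.Fraction` keeps the grouped and coded means exact. The sketch reproduces four results: the frequency-table mean (3.45), Example 4 (73 kg), the coded mean of Example 5 (25.33) and the three averages of Example 6.

```python
from fractions import Fraction
from statistics import mean, median, multimode


def mean_from_frequencies(xs, fs):
    """Mean of values xs with frequencies fs: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mid_interval_values(bounds):
    """Mid-points of the classes with boundaries [b0, b1, ..., bk].

    Fractions keep the arithmetic exact (e.g. 109/2 = 54.5 for 49.5 <= w < 59.5).
    """
    return [(Fraction(lo) + Fraction(hi)) / 2 for lo, hi in zip(bounds, bounds[1:])]


def coded_mean(xs, fs, offset, width):
    """Mean via the linear coding y = (x - offset) / width.

    The coded mean y-bar is found from the frequency table, then the
    coding is undone: x-bar = width * y-bar + offset.
    """
    ys = [(x - offset) / width for x in xs]    # e.g. 5, 15, 25 -> 0, 1, 2
    y_bar = mean_from_frequencies(ys, fs)
    return width * y_bar + offset


# Frequency table from "The mean": x = 1..6 with f = 3, 5, 2, 3, 4, 3.
print(mean_from_frequencies([1, 2, 3, 4, 5, 6], [3, 5, 2, 3, 4, 3]))  # 3.45

# Example 4: classes 50 <= w < 60, ..., 90 <= w < 100 (mid-points 55, ..., 95).
weights = mid_interval_values([50, 60, 70, 80, 90, 100])
print(float(mean_from_frequencies(weights, [3, 5, 7, 3, 2])))         # 73.0

# Example 5: classes 0 <= w < 10, ..., 40 <= w < 50, coded with y = (x - 5) / 10.
grams = mid_interval_values([0, 10, 20, 30, 40, 50])
x_bar = coded_mean(grams, [4, 6, 9, 7, 4], offset=5, width=10)
print(x_bar, round(float(x_bar), 2))                                  # 76/3 25.33

# Example 6: mean, median and mode of Shanice's marks.
marks = [30, 80, 45, 47, 47, 50, 46, 55]
print(mean(marks), median(marks), multimode(marks))                   # 50 47.0 [47]
```

Computing the Example 5 mean directly from the mid-interval values gives the same 76/3. Coding only simplifies the arithmetic and does not change the result.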