Chapter 3 Descriptive Statistics: Numerical Measures Part B

Slides Prepared by JOHN S. LOUCKS, St. Edward's University
Slide 1

Chapter 3 Descriptive Statistics: Numerical Measures, Part B
- Measures of Distribution Shape, Relative Location, and Detecting Outliers
- Exploratory Data Analysis
- Measures of Association Between Two Variables
- The Weighted Mean and Working with Grouped Data
Slide 2

Measures of Distribution Shape, Relative Location, and Detecting Outliers
- Distribution Shape
- z-scores
- Chebyshev's Theorem
- Empirical Rule
- Detecting Outliers
Slide 3

Distribution Shape: Skewness
An important measure of the shape of a distribution is called skewness. The formula for computing skewness for a data set is somewhat complex. Skewness can be easily computed using statistical software.
Slide 4

Distribution Shape: Skewness
Symmetric (not skewed): Skewness is zero. Mean and median are equal.
[Histogram: relative frequency on the vertical axis, 0 to .35. Skewness = 0]
Slide 5

Distribution Shape: Skewness
Moderately Skewed Left: Skewness is negative. Mean will usually be less than the median.
[Histogram: relative frequency on the vertical axis, 0 to .35. Skewness = -.31]
Slide 6

Distribution Shape: Skewness
Moderately Skewed Right: Skewness is positive. Mean will usually be more than the median.
[Histogram: relative frequency on the vertical axis, 0 to .35. Skewness = .31]
Slide 7

Distribution Shape: Skewness
Highly Skewed Right: Skewness is positive (often above 1.0). Mean will usually be more than the median.
[Histogram: relative frequency on the vertical axis, 0 to .35. Skewness = 1.25]
Slide 8

Distribution Shape: Skewness
Example: Apartment Rents
Seventy efficiency apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed in ascending order on the next slide.
Slide 9
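For the apartment rent sample introduced above, skewness can be computed in a few lines. The slides do not state which skewness formula they use, so the sketch below assumes the adjusted sample skewness found in common spreadsheet software, n / ((n - 1)(n - 2)) times the sum of ((x_i - x̄)/s)^3. The function name sample_skewness and the value-to-count encoding of the 70 rents are illustrative choices, not part of the slides.

```python
from statistics import mean, stdev

# The 70 monthly rents, stored as value: number of occurrences.
RENT_COUNTS = {
    425: 1, 430: 2, 435: 5, 440: 5, 445: 5, 450: 7, 460: 3, 465: 3, 470: 2,
    472: 1, 475: 3, 480: 4, 485: 1, 490: 3, 500: 4, 510: 2, 515: 1, 525: 3,
    535: 1, 549: 1, 550: 1, 570: 2, 575: 2, 580: 1, 590: 1, 600: 4, 615: 2,
}
rents = [value for value, count in RENT_COUNTS.items() for _ in range(count)]


def sample_skewness(data):
    """Adjusted sample skewness (assumed formula, see lead-in)."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation, n - 1 in the denominator
    cubed = sum(((x - x_bar) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed


skew = sample_skewness(rents)
print(f"n = {len(rents)}, skewness = {skew:.2f}")
```

A positive result indicates a right-skewed sample, in line with the mean lying above the median for these rents.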

Distribution Shape: Skewness
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Slide 10

Distribution Shape: Skewness
[Histogram of the apartment rents: relative frequency on the vertical axis, 0 to .35. Skewness = .92]
Slide 11

z-scores
The z-score is often called the standardized value. It denotes the number of standard deviations a data value $x_i$ is from the mean.
$z_i = \frac{x_i - \bar{x}}{s}$
Slide 12
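The z-score definition above can be applied to the whole rent sample at once. This is a minimal sketch using Python's standard statistics module; the function name z_scores and the value-to-count encoding of the data are illustrative assumptions.

```python
from statistics import mean, stdev

# The 70 monthly rents, stored as value: number of occurrences.
RENT_COUNTS = {
    425: 1, 430: 2, 435: 5, 440: 5, 445: 5, 450: 7, 460: 3, 465: 3, 470: 2,
    472: 1, 475: 3, 480: 4, 485: 1, 490: 3, 500: 4, 510: 2, 515: 1, 525: 3,
    535: 1, 549: 1, 550: 1, 570: 2, 575: 2, 580: 1, 590: 1, 600: 4, 615: 2,
}
rents = [value for value, count in RENT_COUNTS.items() for _ in range(count)]


def z_scores(data):
    """Return (x_i - mean) / s for every value, with s the sample std. dev."""
    x_bar = mean(data)
    s = stdev(data)
    return [(x - x_bar) / s for x in data]


x_bar = mean(rents)
s = stdev(rents)
z = z_scores(rents)
print(f"mean = {x_bar:.2f}, s = {s:.2f}")
print(f"z of smallest rent = {z[0]:.2f}, z of largest rent = {z[-1]:.2f}")
```

Because the list is built in ascending order, z[0] and z[-1] are the most extreme standardized values.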

z-scores
An observation's z-score is a measure of the relative location of the observation in a data set.
- A data value less than the sample mean will have a z-score less than zero.
- A data value greater than the sample mean will have a z-score greater than zero.
- A data value equal to the sample mean will have a z-score of zero.
Slide 13

z-scores
z-score of Smallest Value (425):
$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$

Standardized Values for Apartment Rents
-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01  0.17  0.17  0.17  0.17  0.35
 0.35  0.44  0.62  0.62  0.62  0.81  1.06  1.08  1.45  1.45
 1.54  1.54  1.63  1.81  1.99  1.99  1.99  1.99  2.27  2.27
Slide 14

Chebyshev's Theorem
At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.
Slide 15
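Chebyshev's bound can be tabulated for several values of z and compared with what the rent sample actually shows. The sketch below is illustrative only: the function names chebyshev_bound and share_within are hypothetical, and the data are stored as value-to-count pairs for brevity.

```python
from statistics import mean, stdev

# The 70 monthly rents, stored as value: number of occurrences.
RENT_COUNTS = {
    425: 1, 430: 2, 435: 5, 440: 5, 445: 5, 450: 7, 460: 3, 465: 3, 470: 2,
    472: 1, 475: 3, 480: 4, 485: 1, 490: 3, 500: 4, 510: 2, 515: 1, 525: 3,
    535: 1, 549: 1, 550: 1, 570: 2, 575: 2, 580: 1, 590: 1, 600: 4, 615: 2,
}
rents = [value for value, count in RENT_COUNTS.items() for _ in range(count)]


def chebyshev_bound(z):
    """Minimum share of values within z standard deviations (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


def share_within(data, z):
    """Observed share of values within z sample std. devs. of the mean."""
    x_bar, s = mean(data), stdev(data)
    inside = [x for x in data if abs(x - x_bar) <= z * s]
    return len(inside) / len(data)


for z in (1.5, 2, 3, 4):
    print(f"z = {z}: bound {chebyshev_bound(z):.2%}, "
          f"observed {share_within(rents, z):.2%}")

actual_share = share_within(rents, 1.5)
```

The bound holds for any data set, so the observed share is usually well above it.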

Chebyshev's Theorem
- At least 75% of the data values must be within z = 2 standard deviations of the mean.
- At least 89% of the data values must be within z = 3 standard deviations of the mean.
- At least 94% of the data values must be within z = 4 standard deviations of the mean.
Slide 16

Chebyshev's Theorem
For example: Let z = 1.5 with $\bar{x}$ = 490.80 and s = 54.74.
At least $1 - 1/(1.5)^2 = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between
$\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$
and
$\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$
(Actually, 86% of the rent values are between 409 and 573.)
Slide 17

Empirical Rule
For data having a bell-shaped distribution:
- 68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
- 95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
- 99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
Slide 18
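The empirical rule percentages can be cross-checked against the standard normal distribution. The sketch below is an illustration rather than part of the slides: it relies on the identity P(|Z| < k) = erf(k / sqrt(2)) and on Python's math.erf, and the helper name normal_share_within is hypothetical. The computed values are 68.27%, 95.45% and 99.73%, each 0.01 above the slide figures; a gap of that size is what one would expect if the slide values were taken from a rounded normal table, although that origin is an assumption.

```python
import math


def normal_share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within +/- {k} standard deviation(s): {normal_share_within(k):.2%}")
```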

Empirical Rule
[Figure: bell curve with the horizontal axis marked at µ - 3σ, µ - 2σ, µ - 1σ, µ, µ + 1σ, µ + 2σ, µ + 3σ; 68.26% of the area lies within 1σ of µ, 95.44% within 2σ, and 99.72% within 3σ.]
Slide 19

Detecting Outliers
An outlier is an unusually small or unusually large value in a data set. A data value with a z-score less than -3 or greater than +3 might be considered an outlier. It might be:
- an incorrectly recorded data value
- a data value that was incorrectly included in the data set
- a correctly recorded data value that belongs in the data set
Slide 20

Detecting Outliers
The most extreme z-scores are -1.20 and 2.27. Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.

Standardized Values for Apartment Rents
-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01  0.17  0.17  0.17  0.17  0.35
 0.35  0.44  0.62  0.62  0.62  0.81  1.06  1.08  1.45  1.45
 1.54  1.54  1.63  1.81  1.99  1.99  1.99  1.99  2.27  2.27
Slide 21
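The |z| > 3 screening rule can be expressed as a small filter. In this sketch the function name z_score_outliers and its threshold parameter are hypothetical; the extra rent of 1000 in the second call is made-up data used only to show the filter catching something.

```python
from statistics import mean, stdev

# The 70 monthly rents, stored as value: number of occurrences.
RENT_COUNTS = {
    425: 1, 430: 2, 435: 5, 440: 5, 445: 5, 450: 7, 460: 3, 465: 3, 470: 2,
    472: 1, 475: 3, 480: 4, 485: 1, 490: 3, 500: 4, 510: 2, 515: 1, 525: 3,
    535: 1, 549: 1, 550: 1, 570: 2, 575: 2, 580: 1, 590: 1, 600: 4, 615: 2,
}
rents = [value for value, count in RENT_COUNTS.items() for _ in range(count)]


def z_score_outliers(data, threshold=3.0):
    """Return the values whose z-score exceeds the threshold in absolute value."""
    x_bar, s = mean(data), stdev(data)
    return [x for x in data if abs((x - x_bar) / s) > threshold]


print(z_score_outliers(rents))            # the original sample
print(z_score_outliers(rents + [1000]))   # sample plus one made-up extreme rent
```

A flagged value is only a candidate: as the slide notes, it may still be a correctly recorded observation that belongs in the data set.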

Exploratory Data Analysis
- Five-Number Summary
- Box Plot
Slide 22

Five-Number Summary
1. Smallest Value
2. First Quartile
3. Median
4. Third Quartile
5. Largest Value
Slide 23

Five-Number Summary
Lowest Value = 425, First Quartile = 445, Median = 475, Third Quartile = 525, Largest Value = 615

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Slide 24
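The five-number summary above can be reproduced programmatically, with one caveat: the slides do not say which quartile rule they use. The sketch below assumes a common textbook rule (compute i = (p/100)n; if i is not an integer, round up and take that position; if it is an integer, average positions i and i + 1), which does reproduce the slide values. Other conventions, such as interpolation, can give slightly different quartiles. The names percentile and five_number_summary are illustrative.

```python
# The 70 monthly rents, stored as value: number of occurrences.
RENT_COUNTS = {
    425: 1, 430: 2, 435: 5, 440: 5, 445: 5, 450: 7, 460: 3, 465: 3, 470: 2,
    472: 1, 475: 3, 480: 4, 485: 1, 490: 3, 500: 4, 510: 2, 515: 1, 525: 3,
    535: 1, 549: 1, 550: 1, 570: 2, 575: 2, 580: 1, 590: 1, 600: 4, 615: 2,
}
rents = [value for value, count in RENT_COUNTS.items() for _ in range(count)]


def percentile(data, p):
    """p-th percentile (0 < p < 100) under the assumed textbook rule."""
    ordered = sorted(data)
    # i = (p / 100) * n, kept in integer arithmetic to avoid rounding issues.
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:
        # i is not an integer: round up, i.e. 1-based position whole + 1.
        return ordered[whole]
    # i is an integer: average the values in positions i and i + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2


def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest)."""
    return (min(data), percentile(data, 25), percentile(data, 50),
            percentile(data, 75), max(data))


summary = five_number_summary(rents)
print(summary)
```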

Box Plot
A box is drawn with its ends located at the first and third quartiles. A vertical line is drawn in the box at the location of the median (second quartile).
[Box plot on an axis from 375 to 625 in steps of 25: Q1 = 445, Q2 = 475, Q3 = 525]
Slide 25

Box Plot
Limits are located (not drawn) using the interquartile range (IQR). Data outside these limits are considered outliers. The location of each outlier is shown with the symbol *.
(continued)
Slide 26

Box Plot
The interquartile range is IQR = Q3 - Q1 = 525 - 445 = 80.
The lower limit is located 1.5(IQR) below Q1.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
The upper limit is located 1.5(IQR) above Q3.
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
Slide 27
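The box plot limits follow directly from the quartiles. The sketch below takes Q1 = 445 and Q3 = 525 from the five-number summary, so IQR = Q3 - Q1 = 80; the function name box_plot_limits and the multiplier parameter k are illustrative, with k = 1.5 as in the slides.

```python
# The 70 monthly rents, stored as value: number of occurrences.
RENT_COUNTS = {
    425: 1, 430: 2, 435: 5, 440: 5, 445: 5, 450: 7, 460: 3, 465: 3, 470: 2,
    472: 1, 475: 3, 480: 4, 485: 1, 490: 3, 500: 4, 510: 2, 515: 1, 525: 3,
    535: 1, 549: 1, 550: 1, 570: 2, 575: 2, 580: 1, 590: 1, 600: 4, 615: 2,
}
rents = [value for value, count in RENT_COUNTS.items() for _ in range(count)]


def box_plot_limits(q1, q3, k=1.5):
    """Return (lower limit, upper limit) located k * IQR outside the box."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = box_plot_limits(445, 525)
outliers = [x for x in rents if x < lower or x > upper]
inside = [x for x in rents if lower <= x <= upper]
whisker_ends = (min(inside), max(inside))  # where the whiskers stop

print(f"limits: {lower} to {upper}")
print(f"outliers: {outliers}")
print(f"whiskers end at {whisker_ends[0]} and {whisker_ends[1]}")
```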

Box Plot
Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.
[Box plot on an axis from 375 to 625 in steps of 25: smallest value inside limits = 425, largest value inside limits = 615]
Slide 28

Measures of Association Between Two Variables
- Covariance
- Correlation Coefficient
Slide 29

Covariance
The covariance is a measure of the linear association between two variables.
- Positive values indicate a positive relationship.
- Negative values indicate a negative relationship.
Slide 30

Covariance
The covariance is computed as follows:
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$ for samples
$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$ for populations
Slide 31

Correlation Coefficient
The coefficient can take on values between -1 and +1.
- Values near -1 indicate a strong negative linear relationship.
- Values near +1 indicate a strong positive linear relationship.
Slide 32

Correlation Coefficient
The correlation coefficient is computed as follows:
$r_{xy} = \frac{s_{xy}}{s_x s_y}$ for samples
$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$ for populations
Slide 33

Correlation Coefficient
Correlation is a measure of linear association and not necessarily causation. Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.
Slide 34

Covariance and Correlation Coefficient
A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

Average Driving Distance (yds.)   Average 18-Hole Score
277.6                             69
259.5                             71
269.1                             70
267.0                             70
255.6                             71
272.9                             69
Slide 35

Covariance and Correlation Coefficient

x          y      (x_i - x̄)   (y_i - ȳ)   (x_i - x̄)(y_i - ȳ)
277.6      69      10.65       -1.0        -10.65
259.5      71      -7.45        1.0         -7.45
269.1      70       2.15        0            0
267.0      70       0.05        0            0
255.6      71     -11.35        1.0        -11.35
272.9      69       5.95       -1.0         -5.95
Average    266.95  70.0                    Total -35.40
Std. Dev.  8.2192  0.8944
Slide 36
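The golfer example can be checked by applying the sample covariance and correlation formulas directly. This is a minimal sketch; the function names sample_covariance and sample_correlation are illustrative, and the formulas are written out by hand rather than taken from a library so that each term matches the table above.

```python
from statistics import mean, stdev

driving_distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]  # x, in yards
score = [69, 71, 70, 70, 71, 69]                               # y, 18-hole score


def sample_covariance(x, y):
    """s_xy = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    x_bar, y_bar = mean(x), mean(y)
    products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]
    return sum(products) / (len(x) - 1)


def sample_correlation(x, y):
    """r_xy = s_xy / (s_x * s_y)."""
    return sample_covariance(x, y) / (stdev(x) * stdev(y))


s_xy = sample_covariance(driving_distance, score)
r_xy = sample_correlation(driving_distance, score)
print(f"s_xy = {s_xy:.2f}, r_xy = {r_xy:.4f}")
```

The strongly negative correlation says that longer average drives go with lower scores in this sample; it does not by itself establish a cause.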

Covariance and Correlation Coefficient
Sample Covariance
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$
Sample Correlation Coefficient
$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(0.8944)} = -0.9631$
Slide 37

The Weighted Mean and Working with Grouped Data
- Weighted Mean
- Mean for Grouped Data
- Variance for Grouped Data
- Standard Deviation for Grouped Data
Slide 38

Weighted Mean
When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean. In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade. When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.
Slide 39
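The GPA case mentioned above makes a convenient illustration of the weighted mean, computed as the sum of weight times value divided by the sum of the weights. The grade points and credit hours below are made-up numbers, and the function name weighted_mean is hypothetical.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical transcript: grade points and credit hours earned at each grade.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [6, 3, 1]

gpa = weighted_mean(grade_points, credit_hours)
print(f"GPA = {gpa:.2f}")
```

With equal weights the function reduces to the ordinary mean, which is a quick way to sanity-check it.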

Weighted Mean
$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$
where:
$x_i$ = value of observation i
$w_i$ = weight for observation i
Slide 40

Grouped Data
The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for the grouped data. To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class. We compute a weighted mean of the class midpoints using the class frequencies as weights. Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.
Slide 41

Mean for Grouped Data
Sample Data: $\bar{x} = \frac{\sum f_i M_i}{n}$
Population Data: $\mu = \frac{\sum f_i M_i}{N}$
where:
$f_i$ = frequency of class i
$M_i$ = midpoint of class i
Slide 42

Sample Mean for Grouped Data
Given below is the previous sample of monthly rents for 70 efficiency apartments, presented here as grouped data in the form of a frequency distribution.

Rent ($)   Frequency
420-439     8
440-459    17
460-479    12
480-499     8
500-519     7
520-539     4
540-559     2
560-579     4
580-599     2
600-619     6
Slide 43

Sample Mean for Grouped Data

Rent ($)   f_i   M_i     f_i M_i
420-439     8    429.5    3436.0
440-459    17    449.5    7641.5
460-479    12    469.5    5634.0
480-499     8    489.5    3916.0
500-519     7    509.5    3566.5
520-539     4    529.5    2118.0
540-559     2    549.5    1099.0
560-579     4    569.5    2278.0
580-599     2    589.5    1179.0
600-619     6    609.5    3657.0
Total      70            34525.0

$\bar{x} = \frac{34,525}{70} = 493.21$
This approximation differs by $2.41 from the actual sample mean of $490.80.
Slide 44

Variance for Grouped Data
For sample data: $s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$
For population data: $\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$
Slide 45
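The grouped-data formulas above can be applied to the rent frequency distribution in a few lines. This sketch treats each class midpoint as the value of every item in the class, as the slides describe; the variable and function names are illustrative. It carries the mean at full precision, so intermediate sums can differ in the last digits from a hand calculation that rounds the mean first.

```python
import math

# (lower class limit, upper class limit, frequency) for the 70 rents.
RENT_CLASSES = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]


def grouped_sample_stats(classes):
    """Return (mean, variance, std. dev.) approximated from class midpoints."""
    midpoints = [(low + high) / 2 for low, high, _ in classes]
    freqs = [f for _, _, f in classes]
    n = sum(freqs)
    x_bar = sum(f * m for f, m in zip(freqs, midpoints)) / n
    squared = sum(f * (m - x_bar) ** 2 for f, m in zip(freqs, midpoints))
    variance = squared / (n - 1)  # sample variance, n - 1 in the denominator
    return x_bar, variance, math.sqrt(variance)


grouped_mean, grouped_var, grouped_sd = grouped_sample_stats(RENT_CLASSES)
print(f"mean = {grouped_mean:.2f}, variance = {grouped_var:.2f}, "
      f"std. dev. = {grouped_sd:.2f}")
```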

Sample Variance for Grouped Data

Rent ($)   f_i   M_i     M_i - x̄   (M_i - x̄)^2   f_i (M_i - x̄)^2
420-439     8    429.5    -63.7      4058.96        32471.71
440-459    17    449.5    -43.7      1910.56        32479.59
460-479    12    469.5    -23.7       562.16         6745.97
480-499     8    489.5     -3.7        13.76          110.11
500-519     7    509.5     16.3       265.36         1857.55
520-539     4    529.5     36.3      1316.96         5267.86
540-559     2    549.5     56.3      3168.56         6337.13
560-579     4    569.5     76.3      5820.16        23280.66
580-599     2    589.5     96.3      9271.76        18543.53
600-619     6    609.5    116.3     13523.36        81140.18
Total      70                                      208234.29
(continued)
Slide 46

Sample Variance for Grouped Data
Sample Variance
$s^2 = 208,234.29 / (70 - 1) = 3,017.89$
Sample Standard Deviation
$s = \sqrt{3,017.89} = 54.94$
This approximation differs by only $.20 from the actual standard deviation of $54.74.
Slide 47

End of Chapter 3, Part B
Slide 48