
Chapter 11 Robust statistical methods

Much of what appears here comes from ideas presented in the book:

Huber, Peter J. (1981), Robust Statistics, John Wiley & Sons (New York; Chichester).

There are many definitions in the literature as to what a robust statistical method is. Huber (1981) defines a robust statistical procedure as a method that does not exhibit sensitivity to small deviations from the assumptions.

A common motivation comes from introductory statistics. We commonly use the one-sample t-test to test for plausible values of the population mean parameter, µ. The null t distribution (with n − 1 degrees of freedom) for the t-statistic,

t = (x̄ − µ) / (s / √n),

comes from assuming that the sample values x_1, ..., x_n are independent draws from a normal population with mean µ and finite variance σ². In this motivating example, we can think of this normal population as the assumed distribution.

The question we will ask in this set of notes is what happens to the sampling distribution (or some features of the distribution) of an estimator or test statistic when the actual population distribution is slightly different from what we assumed.
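As a small first illustration of this question (a sketch, not part of the original notes; the sample size, contamination settings, and variable names are illustrative choices), we can simulate the t-statistic when a fraction of the observations come from a wider normal distribution and compare its rejection rate with the nominal 5% level.

```r
## Illustrative sketch (assumed settings): behavior of the one-sample
## t-statistic under 10% scale contamination; the true mean is 0.
set.seed(3)
n <- 20
reps <- 10000
tstat <- replicate(reps, {
  z <- rbinom(n, size = 1, prob = 0.1)                 # 1 = "bad" draw
  x <- ifelse(z == 1, rnorm(n, 0, 3), rnorm(n, 0, 1))  # contaminated sample
  mean(x) / (sd(x) / sqrt(n))                          # t-statistic for mu = 0
})
## Empirical two-sided rejection rate at the nominal 5% level.
mean(abs(tstat) > qt(0.975, df = n - 1))
```

Comparing this empirical rate with 0.05 gives a first, rough sense of how sensitive the t-test is to this particular deviation from normality.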

Stat 673, Autumn 2008

A secondary question is to identify statistics that are distributionally robust (Huber 1981). Such statistics should have good performance under the assumed model, and under small deviations from this model their performance should be impaired only slightly. Along the way we have to decide how we are going to measure deviations in the sampling distribution of these statistics when the assumed distribution is not correct. We also have to define what we mean by "under small deviations of the assumed model". We will begin with the latter.

We do all our computing in this chapter in R. While we solve the exercises we will learn some more complicated ways to use R. We also revisit Monte Carlo experiments and resampling methods.

11.1 The ɛ-contaminated statistical model

Suppose that F_0(·) is the cumulative distribution function (cdf) for the assumed distribution. In the ɛ-contaminated model we assume that we observe independent random draws not just from F_0: some of the time we observe draws from some other distribution, denoted by H. For each x value the cdf of the contaminated population is given by

F(x) = (1 − ɛ) F_0(x) + ɛ H(x).

Here ɛ is the probability of drawing a "bad" observation from the distribution described by the cdf H, and (1 − ɛ) is the probability of drawing an observation from the assumed ("good") distribution F_0. Suppose that probability density/mass functions exist for all the cdfs (denoted by f, f_0, and h respectively). Then we can write the density function as

f(x) = (1 − ɛ) f_0(x) + ɛ h(x).

To simulate from the ɛ-contaminated model we draw a Bernoulli random variable, Z, with success probability ɛ. When Z = 1 we draw an observation x from a population with cdf H; otherwise (when Z = 0) we draw x from F_0.
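The Bernoulli mixing step just described can be sketched in R as follows (a minimal sketch; the function name rcontam and its arguments are illustrative choices, not part of the notes).

```r
## Sketch of simulating from the epsilon-contaminated model.
## rF0 and rH are functions that simulate n draws from F_0 and H.
rcontam <- function(n, rF0, rH, epsilon) {
  z <- rbinom(n, size = 1, prob = epsilon)  # Z_i = 1 flags a "bad" draw
  ifelse(z == 1, rH(n), rF0(n))             # pick from H or F_0 elementwise
}

## Example in the setting of Exercise 1: mu = 0, sigma^2 = 1, epsilon = 0.05.
set.seed(1)
x <- rcontam(1000,
             rF0 = function(n) rnorm(n, 0, 1),
             rH  = function(n) rnorm(n, 0, 3),
             epsilon = 0.05)
hist(x, breaks = 50, main = "epsilon-contaminated normal sample")
```

Note that ifelse() draws a full vector from each distribution and selects elementwise; this is wasteful but distributionally correct, since all draws are independent.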

Peter F. Craigmile

Exercise 1: For some parameter µ and σ² > 0, suppose that F_0 is the cdf of a normal distribution with mean µ and variance σ², and H is the cdf of a normal distribution with mean µ and variance 9σ². Describe what the contaminated density function looks like in this case.

Exercise 2: Let µ_0 and σ_0² > 0 be the mean and variance of the assumed distribution F_0, and let µ_H and σ_H² > 0 be the mean and variance of the distribution H. Assume all the parameters are finite. Remembering that the draws from F_0 and H are independent of one another, show that the mean of the ɛ-contaminated distribution F is given by

µ_F = (1 − ɛ) µ_0 + ɛ µ_H,

and the variance is

σ_F² = (1 − ɛ) σ_0² + ɛ σ_H² + ɛ (1 − ɛ) (µ_H − µ_0)².

Calculate the mean and variance for the special case of Exercise 1.

Exercise 3: Suppose that you have written two R functions to simulate from F_0 and H, respectively. Using the idea that we can pass functions as arguments to a function, write a function that simulates n independent observations from the contaminated distribution. Produce graphical summaries of this contaminated distribution in the case of Exercise 1, with n = 1,000, µ = 0, σ² = 1, and ɛ = 0.05. Compare this distribution to the case that ɛ = 0.10.

11.2 Comparing estimators

We start with an example involving estimating the center of a distribution. We know that the sample mean is sensitive to outliers and extreme values: it is not a robust measure of the center of a distribution. A more robust measure of the center of the distribution is the median. The key trouble with the median, however, is that for many assumed distributions (including the normal), the median is not an efficient estimator of the center of the distribution.

We start by carefully defining our notion of efficiency. For some integer n, suppose we have a random sample of values {x_1, ..., x_n} drawn from some population distribution F (which does not have to be the ɛ-contaminated distribution), which depends on some unknown parameter θ. The bias of an estimator θ̂_n of θ is defined to be

bias(θ̂_n) = E(θ̂_n − θ) = E(θ̂_n) − θ,

where θ is the true value of the parameter. We say that θ̂_n is an unbiased estimator of θ when bias(θ̂_n) = 0.

We use the mean squared error as a way to compare different estimators, usually picking the estimator that has the smallest mean squared error. The mean squared error of the estimator θ̂_n is

MSE(θ̂_n) = E[(θ̂_n − θ)²] = var(θ̂_n) + [bias(θ̂_n)]²,

where var(θ̂_n) is the variance of the estimator. When θ̂_n is an unbiased estimator of θ, the mean squared error is equal to the variance. (This is the situation we are most used to.)

Now suppose that we have two estimators of θ, θ̂_n and θ̃_n. One definition of the relative efficiency (RE) of θ̃_n relative to θ̂_n is

RE(θ̃_n, θ̂_n) = MSE(θ̂_n) / MSE(θ̃_n).

Often this ratio is expressed as a percentage. Note that the relative efficiency simplifies to a ratio of variances when both estimators are unbiased for θ.

For many population distributions F we cannot calculate the relative efficiency at a fixed sample size n. Instead it is common to consider the asymptotic relative efficiency (ARE), in which we compare ratios of functions of the mean squared errors as n → ∞. This needs careful mathematical argument (see, e.g., Chapter 10, and in particular Section 10.1.2, of Casella and Berger, 2002). At fixed sample sizes all is not lost, however: we can write Monte Carlo experiments that allow us to approximate these quantities, as the next exercise illustrates.
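A Monte Carlo approximation of relative efficiency can be sketched as follows (a sketch in the spirit of the exercise that follows, not its solution; the fixed setting ɛ = 0.1 and the variable names are illustrative choices).

```r
## Approximate RE of the median relative to the mean under the
## epsilon-contaminated normal of Exercise 1, at epsilon = 0.1.
set.seed(4)
n <- 100; reps <- 5000; eps <- 0.1
est <- replicate(reps, {
  z <- rbinom(n, size = 1, prob = eps)
  x <- ifelse(z == 1, rnorm(n, 0, 3), rnorm(n, 0, 1))
  c(mean = mean(x), median = median(x))
})
## The true center is 0 and the contamination is symmetric, so both
## estimators are unbiased and MSE can be approximated by E(estimate^2).
mse <- rowMeans(est^2)
re <- mse["mean"] / mse["median"]   # RE(median, mean) = MSE(mean)/MSE(median)
re
```

Repeating this calculation over a grid of ɛ values and plotting re against ɛ gives the kind of summary graph the exercise asks for.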

Exercise 4: Suppose we want to write a Monte Carlo experiment to compare the relative performance of the sample mean and sample median as estimates of the population mean for the ɛ-contaminated distribution defined in Exercise 1, when n = 100, µ = 0, and σ = 1. We want to investigate the relationship as we vary ɛ from 0 to 0.1 in steps of 0.01.

(i) How do you know that both estimators are unbiased estimators of the population mean of the contaminated distribution?

(ii) Write R code to compare the relative efficiency of these two estimators. Produce a graph that summarizes the relative efficiency as a function of ɛ.

(iii) How can you be sure that the relationship between ɛ and the relative efficiency that you observe in the Monte Carlo experiment is significant? Write any additional R code to answer this part of the exercise.

11.3 M-estimators for estimating the center

More efficient, robust measures of the center, µ, can be obtained by using an M-estimator. Suppose we have a random sample {x_i : i = 1, ..., n} drawn from a population with pdf f(x − µ). We have already shown in this class that the maximum likelihood estimate of µ is given by

max_µ ∏_{i=1}^n f(x_i − µ) = max_µ Σ_{i=1}^n log f(x_i − µ),

or equivalently

min_µ −Σ_{i=1}^n log f(x_i − µ).

We obtain an M-estimator by replacing −log f(·) by some other function ρ(·). If we choose ρ(x) = x², then the M-estimator is the mean, and for ρ(x) = |x| we obtain the median. For some constant c, we obtain the metric-trimming M-estimator with the function

ρ(x) = x² if |x| < c,
ρ(x) = 0 otherwise.
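An M-estimate of the center can be computed numerically by minimizing Σ ρ(x_i − µ) over µ. A minimal sketch (the function name m_estimate is an illustrative choice, not a function from the notes): with ρ(x) = x² this recovers the sample mean, and with ρ(x) = |x| the sample median.

```r
## Sketch: compute an M-estimate of the center by direct minimization.
m_estimate <- function(x, rho) {
  obj <- function(mu) sum(rho(x - mu))          # objective in mu
  optimize(obj, interval = range(x))$minimum    # 1-d numerical minimizer
}

x <- c(2.1, 3.5, 0.7, 4.2, 1.9)
m_estimate(x, function(u) u^2)     # rho(x) = x^2 recovers mean(x)
m_estimate(x, function(u) abs(u))  # rho(x) = |x| recovers median(x)
```

For a non-convex choice such as the metric-trimming ρ above, the objective can have local minima, so a one-dimensional search like this needs a good starting interval (or a robust starting value such as the median).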

The metric Winsorizing M-estimator due to Huber is obtained with the function

ρ(x) = x² if |x| < c,
ρ(x) = c(2|x| − c) otherwise.

As c goes to zero we obtain the median, and as c goes to infinity we obtain the mean. To choose c in general we need to consider estimates of the spread as well as the center.

11.4 Estimating the spread of a distribution

For the spread, we know that the variance and the standard deviation are not resistant to outliers. A commonly used resistant estimator is the interquartile range, calculated as IQR = Q_3 − Q_1.

Exercise 5: For the standard normal distribution, show that the IQR is 1.348980 (to 6 decimal places). Hence deduce that for a N(µ, σ²) distribution the IQR is 1.348980 σ.

Using this result we see that an estimate of σ for a normal distribution is

σ̂ = IQR / 1.349.

Another estimate of spread is the median absolute deviation (MAD) estimator, defined by

MAD = median_{i=1,...,n} {|x_i − M|},

where M is the sample median of the data. We need to multiply the MAD estimate by 1.4826 to get an approximately unbiased estimate of σ for a N(µ, σ²) distribution. The asymptotic relative efficiency of the MAD estimator compared to the sample standard deviation is 37%, but the estimator is very resistant to outliers. We use the mad function in R to calculate the MAD estimate of spread. The MAD and IQR estimators are less affected by heavy tails.

Exercise 6: Verify that the MAD estimator for a normal distribution is approximately equal to σ.
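The normal-theory constants quoted above can be checked directly in R (a quick numerical check, not part of the notes; note that R's mad() applies the 1.4826 factor by default).

```r
## The population IQR of a N(0, 1) distribution (Exercise 5).
iqr_const <- qnorm(0.75) - qnorm(0.25)
iqr_const            # approximately 1.348980

## The MAD consistency constant is 1 / qnorm(0.75).
1 / qnorm(0.75)      # approximately 1.4826

## mad() includes the 1.4826 factor, so for normal data it estimates
## sigma directly (here sigma = 2).
set.seed(6)
x <- rnorm(1e5, mean = 0, sd = 2)
mad(x)               # close to 2
sd(x)                # close to 2
```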

Exercise 7: Think about how you could write a simulation to compare the efficiency of the MAD estimator with the sample standard deviation, for the ɛ-contaminated distribution defined in Exercise 1, with n = 100, µ = 0, and σ = 1. (The true spread of this distribution is given in Exercise 2.)

11.5 M-estimators for the center and scale

An M-estimator for the center based on the rescaled data is given by

min_µ Σ_{i=1}^n ρ((x_i − µ) / s).

One common choice for s is the MAD estimator. For the metric-trimming and metric Winsorizing functions ρ(·), we then express c in terms of the number of units of s; for example, c = 1.5s.

The following code calculates the MAD estimator and then the metric Winsorizing M-estimator of the center for data stored in the vector x. In the first line we load the MASS library.

library(MASS)
huber(x)

Since the MAD estimate is not very efficient, we can also use a metric Winsorizing M-estimate for the spread; this gives an M-estimator for both the center and the spread:

hubers(x)

For more details on robust statistics (in particular, how to calculate standard errors for these estimators) see Huber (1981).

References

Casella, G. and R. L. Berger (2002). Statistical Inference. Duxbury Press.

Huber, P. J. (1981). Robust Statistics. John Wiley & Sons.