Section 2.4. Properties of point estimators

The fact that S² is an unbiased estimator of σ² for any population distribution is one of the most compelling reasons to use the n − 1 in the denominator of the definition of S². This result does not imply, however, that E[S] = σ.

The previous two examples have illustrated unbiased estimates of parameters. It is helpful to quantify the bias in order to have a measure of the expected distance between a point estimator and its target value.

Definition 2.2 Let θ̂ denote a statistic that is calculated from the sample X1, X2, ..., Xn. The bias associated with using θ̂ as an estimator of θ is

    B(θ̂, θ) = E[θ̂] − θ.

There is a subset of the biased estimators that is of interest. The classification is a bit of a consolation prize for biased estimators. Their redeeming feature is that although they are biased estimators for finite sample sizes n, they are unbiased as n → ∞. These estimators are known as asymptotically unbiased estimators and are defined formally below.

Definition 2.3 Let θ̂ denote a statistic that is calculated from the sample X1, X2, ..., Xn. If

    lim_{n→∞} B(θ̂, θ) = 0,

then θ̂ is an asymptotically unbiased estimator of θ.

All unbiased estimators are necessarily asymptotically unbiased. But only some of the biased estimators are asymptotically unbiased. To this end, we subdivide the biased portion of the Venn diagram from Figure 2.12 to include asymptotically unbiased estimators in Figure 2.14.

Figure 2.14: Venn diagram of unbiased, biased, and asymptotically unbiased point estimators.
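To see Definitions 2.2 and 2.3 at work numerically, the bias of an estimator can be approximated by Monte Carlo simulation. The R sketch below is illustrative rather than part of the example: the normal population, σ² = 4, the sample sizes, and the number of replications are arbitrary choices. It approximates the bias of the variance estimator that divides by n rather than n − 1, whose exact bias is −σ²/n; the bias vanishes as n → ∞, so this estimator is biased but asymptotically unbiased.

set.seed(1)
sigma2 = 4                          # true population variance
nrep = 20000                        # Monte Carlo replications
for (n in c(5, 50, 500)) {
  vhat = replicate(nrep, {
    x = rnorm(n, 0, sqrt(sigma2))
    mean((x - mean(x)) ^ 2)         # variance estimator with denominator n
  })
  cat("n =", n, "estimated bias =", mean(vhat) - sigma2, "\n")
}

The estimated biases fall near −σ²/n = −0.8, −0.08, and −0.008, shrinking toward zero as the sample size grows.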

Example 2.17 Let X1, X2, ..., Xn denote a random sample from a U(0, θ) population, where θ is a positive unknown parameter. Classify the following point estimators of θ into the categories given in Figure 2.14 and select the best estimator: 2X̄, 3X̄, X(n), (n + 1)X(n)/n, (n + 1)X(1), 17, where X(1) = min{X1, X2, ..., Xn} and X(n) = max{X1, X2, ..., Xn}. When faced with a real data set, we oftentimes have to choose a point estimator from a set of potential point estimators such as this. The purpose of this example is to investigate the properties of these six point estimators.

The first point estimator, 2X̄, is the method of moments estimator. The derivation was given in Example 2.2. Since the population mean of the U(0, θ) distribution is θ/2 and

    E[2X̄] = 2E[X̄] = 2(θ/2) = θ,

via Example 2.16, the method of moments estimator is classified as an unbiased estimator.

The point estimator 3X̄ is classified as a biased estimator because

    E[3X̄] = 3E[X̄] = 3(θ/2) = 3θ/2.

This estimator overestimates the population parameter θ on average. The positive bias is

    B(θ̂, θ) = B(3X̄, θ) = E[3X̄] − θ = 3θ/2 − θ = θ/2.

The point estimator X(n) is the maximum likelihood estimator. The derivation, after some minor manipulation of the objective function or the support of the population distribution, was given in Example 2.9. Using an order statistic result, or the APPL code

X := UniformRV(0, theta);
Y := OrderStat(X, n, n);
Mean(Y);

the expected value of X(n) is

    E[X(n)] = nθ/(n + 1).

The maximum likelihood estimator misses low, on average, because E[X(n)] is less than θ. Since the expected value is not equal to θ for finite values of n, this estimator is biased. The bias is

    B(θ̂, θ) = B(X(n), θ) = E[X(n)] − θ = nθ/(n + 1) − θ = −θ/(n + 1).

This estimator should be classified as asymptotically unbiased, however, because

    lim_{n→∞} B(θ̂, θ) = lim_{n→∞} (−θ/(n + 1)) = 0.
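For readers without access to APPL, the expected value and bias of X(n) can be checked by simulation. In this R sketch, n = 5 and θ = 10 are arbitrary illustrative values (they match the settings used later in this example).

set.seed(2)
n = 5
theta = 10
nrep = 100000
xmax = replicate(nrep, max(runif(n, 0, theta)))   # simulated values of X(n)
mean(xmax)            # close to n * theta / (n + 1) = 8.333
mean(xmax) - theta    # close to the bias -theta / (n + 1) = -1.667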

The point estimator (n + 1)X(n)/n was presented in Example 2.9 as a modification of the maximum likelihood estimator that included an unbiasing constant. The expected value of (n + 1)X(n)/n is

    E[(n + 1)X(n)/n] = ((n + 1)/n)(nθ/(n + 1)) = θ,

so this point estimator is classified as an unbiased estimator.

The point estimator (n + 1)X(1) is also an unbiased estimator. This can be seen by invoking an order statistic result and computing the expected value, or by the APPL code

X := UniformRV(0, theta);
Y := OrderStat(X, n, 1);
Mean(Y);

The point estimator θ̂ = 17 is quite bizarre. The statistician simply ignores the data values X1, X2, ..., Xn and pulls 17 out of thin air as the estimate of θ. The expected value of θ̂ is

    E[θ̂] = E[17] = 17,

which is not θ (unless θ just happens to be 17), so this estimator is classified as a biased estimator.

We now know that three of the six suggested point estimators are unbiased. The results of our analysis are summarized in Figure 2.15.

Figure 2.15: Venn diagram of the point estimators for θ for a U(0, θ) population (unbiased: 2X̄, (n + 1)X(n)/n, (n + 1)X(1); biased but asymptotically unbiased: X(n); biased: 3X̄, 17).

Now to the more difficult question: which is the best of the six estimators? This is a purposefully vague question at this point, so it will be addressed from several different angles. The choice between the point estimators boils down to which point estimator will perform best for a higher fraction of data sets than the others. This does not imply, of course, that the estimator selected will be the best for every data set.

We begin by plotting the sampling distributions of the three unbiased estimators to gain some additional insight. This can only be done for specific values of n and θ, so let's arbitrarily choose n = 5 and θ = 10. For this choice, the probability density functions of 2X̄, 6X(5)/5, and 6X(1) are plotted in Figure 2.16. APPL was used to calculate the probability density functions.

Figure 2.16: Sampling distributions of 2X̄, 6X(5)/5, and 6X(1) when n = 5 and θ = 10.

The sampling distributions of 2X̄, 6X(5)/5, and 6X(1) reveal vastly different shapes. The probability density function of 2X̄ is bell shaped (via the central limit theorem) and symmetric about θ = 10; the probability density functions of 6X(5)/5 and 6X(1) are skewed distributions. Since the support of the population is (0, 10), the support of 2X̄ is (0, 20), the support of 6X(5)/5 is (0, 12), and the support of 6X(1) is (0, 60). Figure 2.16 reveals that 6X(1) has a significantly larger variance than the other two estimators, so it is probably the weakest candidate of the three.
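Figure 2.16 was produced with APPL, which computes the exact probability density functions. The comparison can be approximated in R by simulating each estimator many times and plotting kernel density estimates; the sketch below is an approximation of Figure 2.16 under the same settings n = 5 and θ = 10, not an exact reproduction.

set.seed(3)
n = 5
theta = 10
nrep = 100000
est1 = est2 = est3 = numeric(nrep)
for (i in 1:nrep) {
  x = runif(n, 0, theta)
  est1[i] = 2 * mean(x)             # 2 * Xbar
  est2[i] = (n + 1) * max(x) / n    # 6 * X(5) / 5
  est3[i] = (n + 1) * min(x)        # 6 * X(1)
}
plot(density(est2), xlim = c(0, 20), main = "", xlab = "x")  # tall and left skewed
lines(density(est1), lty = 2)                                # bell shaped about 10
lines(density(est3), lty = 3)                                # long right tail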

Since the variance of the estimators played a role in analyzing the sampling distributions of the three unbiased estimators, perhaps it is worthwhile calculating the population means and population variances of all six of the estimators. The values are summarized in Table 2.5. Notice that the three unbiased estimators, 2X̄, (n + 1)X(n)/n, and (n + 1)X(1), all collapse to the same estimator when n = 1; the point estimator is just double the single observation.

Point estimate θ̂    E[θ̂]          V[θ̂]                    Categorization
2X̄                  θ             θ²/(3n)                  unbiased
3X̄                  3θ/2          3θ²/(4n)                 biased
X(n)                 nθ/(n + 1)    nθ²/((n + 2)(n + 1)²)    asymptotically unbiased
(n + 1)X(n)/n        θ             θ²/(n(n + 2))            unbiased
(n + 1)X(1)          θ             nθ²/(n + 2)              unbiased
17                   17            0                        biased

Table 2.5: Population means and variances of the six point estimators for θ.
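Any entry in Table 2.5 can be spot-checked by simulation. The R sketch below (again using the illustrative values n = 5 and θ = 10) compares the simulated mean and variance of (n + 1)X(n)/n with the tabled expressions θ and θ²/(n(n + 2)).

set.seed(4)
n = 5
theta = 10
nrep = 100000
est = replicate(nrep, (n + 1) * max(runif(n, 0, theta)) / n)
mean(est)    # close to theta = 10
var(est)     # close to theta^2 / (n * (n + 2)) = 2.857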

Choosing the point estimator with the smallest variance is not appropriate here because this would result in choosing the strange point estimate θ̂ = 17. Instead, it is advantageous to choose the unbiased estimator with the smallest variance. Using this criterion, (n + 1)X(n)/n has the smallest variance of the three unbiased estimators for samples of n = 2 or more observations. But the smallest variance is not the only criterion that can be used to select the preferred estimator.

The R code below simulates 10,000 random samples of size n = 5 from a U(0, θ) population when θ = 10. All six point estimators are calculated for each sample, and the point estimator that lies closest to θ is identified and tabulated. Finally, the fraction of times that each estimator is closest to θ = 10 is printed.

set.seed(8)
n = 5
theta = 10
nrep = 10000
theta.hat = numeric(6)
count = numeric(6)
for (i in 1:nrep) {
  x = runif(n, 0, theta)                      # one random sample
  theta.hat[1] = 2 * mean(x)
  theta.hat[2] = 3 * mean(x)
  theta.hat[3] = max(x)
  theta.hat[4] = (n + 1) * max(x) / n
  theta.hat[5] = (n + 1) * min(x)
  theta.hat[6] = 17
  index = which.min(abs(theta.hat - theta))   # estimator closest to theta
  count[index] = count[index] + 1
}
print(count / nrep)

The results of the Monte Carlo simulation are given in Table 2.6 for sample sizes n = 5, n = 50, and n = 500. The entries give the fractions of the simulations in which each estimator was the closest to the true parameter value θ = 10. As expected, the column sums of the entries in the table equal 1. When n = 5, even the maligned θ̂ = 17 is the closest to θ = 10 for two of the 10,000 random samples. The reader is encouraged to imagine what type of data set would lead to this awful estimator outdoing the other estimators.

Point estimate θ̂    n = 5     n = 50    n = 500
2X̄                  0.1765    0.0912    0.0328
3X̄                  0.1323    0.0000    0.0000
X(n)                 0.3178    0.3749    0.3905
(n + 1)X(n)/n        0.3262    0.5275    0.5762
(n + 1)X(1)          0.0470    0.0064    0.0005
17                   0.0002    0.0000    0.0000

Table 2.6: Monte Carlo simulation results for a U(0, 10) population.
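To produce the n = 50 and n = 500 columns of Table 2.6, the simulation can be wrapped in a function of the sample size, as in the sketch below. The function name closest.fraction is a hypothetical choice, and because the random number stream differs from the run above, the printed fractions will only approximate the entries in Table 2.6.

closest.fraction = function(n, theta = 10, nrep = 10000) {
  count = numeric(6)
  for (i in 1:nrep) {
    x = runif(n, 0, theta)
    theta.hat = c(2 * mean(x), 3 * mean(x), max(x),
                  (n + 1) * max(x) / n, (n + 1) * min(x), 17)
    index = which.min(abs(theta.hat - theta))
    count[index] = count[index] + 1
  }
  count / nrep            # fraction of samples in which each estimator is closest
}
set.seed(8)
closest.fraction(50)      # compare with the n = 50 column of Table 2.6
closest.fraction(500)     # compare with the n = 500 column of Table 2.6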

Table 2.6 shows that, by a somewhat narrow margin, the estimator (n + 1)X(n)/n dominates the other estimators for the sample sizes considered here.

In summary, based on θ̂ = (n + 1)X(n)/n being (a) an unbiased estimate, (b) the unbiased estimate with the smallest variance, and (c) the estimate that is most likely to be the closest to the population value of θ for several sample sizes in a Monte Carlo experiment, we conclude that θ̂ = (n + 1)X(n)/n is the best of the six point estimators. It carries the additional bonus that all of the data values are necessarily less than θ̂, which is a desirable property for this particular population distribution.

This example has brought up three issues concerning point estimators that will be addressed in the paragraphs that follow. The first issue is motivated by the Monte Carlo simulation experiment. The objective of the experiment was to find the point estimator that was most likely to be closest to the true parameter value. The distance between the estimator θ̂ and the true parameter value θ is an important quantity known as the error of estimation, which is formally defined next.

Definition 2.4 Let θ̂ denote a statistic that is calculated from the sample X1, X2, ..., Xn that is used to estimate the population parameter θ. The error of estimation is

    R(θ̂, θ) = |θ̂ − θ|.

The second issue concerns the comparison of the three unbiased estimators and the three biased estimators. Could there ever be circumstances in which one would choose a biased estimator over an unbiased estimator? Consider the generic and idealized presentation of the sampling distributions of two point estimators θ̂1 and θ̂2 in Figure 2.17 for a fixed sample size n. The sampling distribution of θ̂1 is centered over the true parameter value θ, so θ̂1 is an unbiased estimator of θ, that is, E[θ̂1] = θ. The sampling distribution of θ̂2, however, is not centered over the true parameter value θ, so θ̂2 is a biased estimator of θ, that is, E[θ̂2] ≠ θ. But the decision between the two is complicated by the fact that the variance of the second estimator is much smaller than the variance of the first estimator, that is, V[θ̂2] < V[θ̂1]. The choice between the unbiased estimator with the larger variance and the biased estimator with the smaller variance is a difficult one.

Figure 2.17: Two sampling distributions.
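The error of estimation from Definition 2.4 offers one way to make this choice concrete. The R sketch below approximates the expected error of estimation E[R(θ̂, θ)] for one estimator of each type from Example 2.17: the unbiased 2X̄, which has the larger variance, and the biased X(n), which has the smaller variance. The settings n = 5 and θ = 10 are my illustrative choices, not the configuration behind Figure 2.17; here the biased estimator comes out ahead.

set.seed(5)
n = 5
theta = 10
nrep = 100000
err.unbiased = numeric(nrep)
err.biased = numeric(nrep)
for (i in 1:nrep) {
  x = runif(n, 0, theta)
  err.unbiased[i] = abs(2 * mean(x) - theta)   # 2 * Xbar: unbiased, larger variance
  err.biased[i] = abs(max(x) - theta)          # X(n): biased, smaller variance
}
mean(err.unbiased)   # roughly 2.1
mean(err.biased)     # roughly 1.7, smaller on average despite the bias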