Statistical estimation

Statistical estimation
Statistical modelling: theory and practice
Gilles Guillot (gigu@dtu.dk)
September 3, 2013

Outline
1 Introductory example
2 Principles of estimation
3 Likelihood theory
4 Reading
5 Exercises

Introductory example

A batch of 1000 electronic components contains some faulty items. One takes a sample of size 100 with replacement, of which 3 are faulty. What is the proportion of faulty items in the batch?

Arriving in a new city, you see a tram passing in the street with the number 16. How many tram lines are there in this city?

For a set of measurements y_1, ..., y_n of temperatures at dates t_1, ..., t_n observed at a certain location, we want to fit a line y = at + b. What are the values of a and b that best fit the data?

Introductory example: A common set-up

We have some data. There is a mechanism that generates the data. This mechanism depends on an unknown parameter that we want to estimate.

Introductory example: The Statistics way

We relate the unknown parameter to the data by means of a probability distribution.
Proportion of faulty items: the number of faulty items in the sample can be assumed to follow a binomial distribution B(n, p).
Tram lines: the number observed can be assumed to follow a uniform distribution U{1, ..., N}.

Principles of estimation: Estimator, estimate

Denoting generically θ the unknown parameter value, an estimator is a rule (or algorithm) allowing us to guess θ from the data. From a mathematical point of view, it is a function

$$ \mathbb{R}^n \to \mathbb{R}^d, \qquad (x_1, \ldots, x_n) \mapsto \hat\theta $$

Principles of estimation (cont.)

d is the dimension of the parameter space (often d = 1 for us). Since we assume that the data are random, we will often stress this by denoting them (X_1, ..., X_n). The number $\hat\theta$ obtained from observed data is an estimate of θ; viewed as a function of the random sample, it is a random variable, sometimes denoted $\hat\theta(X_1, \ldots, X_n)$.

Principles of estimation: Bias of an estimator

Definition: bias. The bias of an estimator is the average discrepancy between the estimate and the true parameter value:

$$ \mathrm{Bias}(\hat\theta) = E[\hat\theta - \theta] = E[\hat\theta] - \theta $$

An estimator is said to be unbiased if $\mathrm{Bias}(\hat\theta) = 0$.

Principles of estimation: Precision and accuracy

Precision and accuracy are two concepts from science and engineering, usually explained by the classic target-shooting figure (not reproduced in this transcript): accuracy is how close the shots are to the bullseye on average, while precision is how tightly they cluster together. In statistics, we have two related concepts: variance and mean square error.

Principles of estimation: Variance and mean square error

Definition: variance of an estimator.

$$ V[\hat\theta] = E\big[(\hat\theta - E[\hat\theta])^2\big] $$

$V[\hat\theta]$ is a measure of how much $\hat\theta$ is scattered around its own mean (which may differ from the true value θ).

Definition: mean square error of an estimator.

$$ \mathrm{MSE}[\hat\theta] = E\big[(\hat\theta - \theta)^2\big] $$

is a measure of how much $\hat\theta$ is scattered around the true value θ. In general, $\mathrm{MSE}[\hat\theta] = V[\hat\theta] + \mathrm{Bias}(\hat\theta)^2$; in particular, when $\hat\theta$ is unbiased, $E[\hat\theta] = \theta$, hence $V[\hat\theta] = \mathrm{MSE}[\hat\theta]$.
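Both quantities are easy to approximate by Monte Carlo. Below is a minimal numpy sketch (not part of the original slides; the distribution, sample size and replication count are arbitrary illustrative choices) comparing the 1/n and 1/(n-1) variance estimators and checking the decomposition MSE = V + Bias².

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_var, n_rep = 10, 4.0, 100_000

# n_rep samples of size n from N(0, 2^2); the parameter to estimate is sigma^2 = 4.
x = rng.normal(0.0, 2.0, size=(n_rep, n))

var_n = x.var(axis=1, ddof=0)    # divide by n     (biased)
var_n1 = x.var(axis=1, ddof=1)   # divide by n - 1 (unbiased)

for name, est in [("1/n", var_n), ("1/(n-1)", var_n1)]:
    bias = est.mean() - true_var
    var = est.var()
    mse = ((est - true_var) ** 2).mean()
    print(f"{name:8s} bias={bias:+.3f}  var={var:.3f}  mse={mse:.3f}")
    # In each row, mse is approximately var + bias**2.
```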

Principles of estimation: Confidence interval

Definition: confidence interval. A confidence interval at level $1 - \alpha$, with $\alpha \in [0, 1]$, is an interval, computed from the data, that contains the true unknown parameter value θ with probability $1 - \alpha$.
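A standard concrete instance (not spelled out on the slide): for a proportion estimated by $\hat p = x/n$, the central limit theorem gives the approximate level-$(1-\alpha)$ interval

$$ \hat p \pm z_{1-\alpha/2} \sqrt{\hat p (1 - \hat p)/n}, $$

where $z_{1-\alpha/2}$ is the corresponding standard normal quantile. With x = 3, n = 100 and α = 0.05 this is $0.03 \pm 1.96 \times 0.017$, roughly (0, 0.063) after truncation at 0; the normal approximation is crude for such small counts.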

Principles of estimation: Example, estimation of a proportion

We have a sample of n objects of which x are faulty. We estimate the unknown proportion p by $\hat p = x/n$. Exercise: give the bias and variance of $\hat p$.
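A quick simulation (a sketch, not from the slides; parameters echo the faulty-components example) lets you check your answer to the exercise empirically:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, n_rep = 100, 0.03, 200_000

# x ~ B(n, p) in each replication; the estimator is p_hat = x / n.
p_hat = rng.binomial(n, p, size=n_rep) / n

print("empirical mean of p_hat:    ", p_hat.mean())  # compare with p
print("empirical variance of p_hat:", p_hat.var())   # compare with your formula
```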

Likelihood theory: A new look at the probability of the data

We consider again the problem of estimating a proportion with binomial sampling. The probability of obtaining x faulty objects is $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$. If we consider $p^x (1-p)^{n-x}$ as a function of p (the binomial coefficient does not depend on p), it can be interpreted as the likelihood of the unknown parameter p. To acknowledge the dependence on p, we denote $L(x; p) = p^x (1-p)^{n-x}$, or L(p) for short.

[Figure: L(p) plotted against p on [0, 1]; the y-axis is labelled "Probability of data".]

Likelihood theory: The maximum likelihood principle

The above suggests a method to estimate the unknown parameter p:

$$ \hat p = \mathrm{Argmax}_p \; p^x (1-p)^{n-x} $$

$\hat p$ is the parameter value that makes our data most probable. It is known as the maximum likelihood estimate of p and denoted $\hat p_{ML}$. Note that defining $\hat p$ as $\mathrm{Argmax}_p \binom{n}{x} p^x (1-p)^{n-x}$ would lead to the same result, since $\binom{n}{x}$ does not depend on p.
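The Argmax is easy to locate numerically on a grid; a minimal sketch (not in the slides), using the numbers of the introductory faulty-components example:

```python
import numpy as np

n, x = 100, 3                                  # 3 faulty items out of 100
p_grid = np.linspace(1e-6, 1 - 1e-6, 100_001)  # fine grid over (0, 1)

lik = p_grid**x * (1 - p_grid)**(n - x)        # L(p) = p^x (1 - p)^(n - x)
print(p_grid[np.argmax(lik)])                  # ≈ 0.03, i.e. x / n
```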

Likelihood theory: Likelihood in a general statistical model

Definition: likelihood function. We consider a dataset consisting of n observations $(x_1, \ldots, x_n)$. We assume that the probability density function or probability mass function of $(x_1, \ldots, x_n)$, denoted $f_\theta(x_1, \ldots, x_n)$, is known up to an unknown parameter θ. The likelihood function L is defined as

$$ L(x_1, \ldots, x_n; \theta) = f_\theta(x_1, \ldots, x_n) $$

Likelihood theory: Examples of likelihood functions, Poisson counts

We observe the numbers of phone calls at various call centres over a given period and denote them by $(x_1, \ldots, x_n)$. We assume that the $x_i$ are independent realizations of a Poisson random variable $X_i$ with parameter λ, i.e.

$$ P(X_i = x) = e^{-\lambda} \lambda^x / x! $$

NB: $x \in \mathbb{N}$ and $\lambda \in \mathbb{R}_+$; $E[X_i] = \lambda$ and $V[X_i] = \lambda$.

Likelihood theory: Likelihood for i.i.d. Poisson observations

Remember: "likelihood = probability of the data for a given parameter value".

$$ L(x_1, \ldots, x_n; \lambda) = \prod_{i=1}^n e^{-\lambda} \lambda^{x_i} / x_i! = e^{-n\lambda} \frac{\lambda^{\sum_i x_i}}{\prod_i x_i!} \propto e^{-n\lambda} \lambda^{\sum_i x_i} $$
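A sketch (with made-up counts, scipy assumed available) checking this expression against scipy.stats: the simplified form differs from the exact log-likelihood only by the dropped constant $-\sum_i \ln x_i!$:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import poisson

x = np.array([4, 2, 5, 3, 3])   # hypothetical call counts
lam = 3.2                       # one candidate value of lambda

full = poisson.logpmf(x, lam).sum()                 # exact log-likelihood
simplified = -len(x) * lam + x.sum() * np.log(lam)  # slide formula, constant dropped
constant = -gammaln(x + 1).sum()                    # -sum(ln x_i!), via ln Gamma(x+1)

print(full, simplified + constant)                  # equal up to rounding
```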

Likelihood theory: Likelihood for i.i.d. Normal observations

"Likelihood = probability of the data for a given parameter value". Parameter: $\theta = (\mu, \sigma) \in \mathbb{R} \times \mathbb{R}_+$.

$$ L(x_1, \ldots, x_n; \mu, \sigma) = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\frac{(x_i-\mu)^2}{\sigma^2}\right] \propto \prod_{i=1}^n \frac{1}{\sigma} \exp\left[-\frac{1}{2}\frac{(x_i-\mu)^2}{\sigma^2}\right] $$

Likelihood theory: General maximum likelihood principle

Maximum likelihood estimator. We consider a dataset consisting of n observations $(x_1, \ldots, x_n)$ and assume that we know the likelihood function $L(x_1, \ldots, x_n; \theta) = f_\theta(x_1, \ldots, x_n)$. The maximum likelihood estimator of θ is defined as

$$ \hat\theta_{ML}(x_1, \ldots, x_n) = \mathrm{Argmax}_\theta \; L(x_1, \ldots, x_n; \theta) $$

Likelihood theory: Deriving $\hat p$ explicitly for the previous binomial sampling

We want to maximize $L(p) = p^x (1-p)^{n-x}$. We could work on L(p) directly in this case, but let us denote $l(p) = \ln L(p)$:

$$ l(p) = \ln[p^x (1-p)^{n-x}] = x \ln p + (n-x) \ln(1-p) $$

$$ l'(p) = x/p - (n-x)/(1-p) \qquad (1) $$

and $l'(p) = 0$ if $p = x/n$. Thus $\hat p = x/n$ is the estimate of p; for a generic sample with random outcome X, $\hat p = X/n$ is the estimator of p, a random variable.

Likelihood theory: Maximum likelihood estimator for i.i.d. Poisson observations

Omitting the term that does not depend on λ, we have

$$ l(x_1, \ldots, x_n; \lambda) = \ln L(x_1, \ldots, x_n; \lambda) = \ln\left[e^{-n\lambda} \lambda^{\sum_i x_i}\right] = -n\lambda + \sum_i x_i \ln\lambda $$

Hence

$$ \frac{d}{d\lambda} l(x_1, \ldots, x_n; \lambda) = -n + \sum_i x_i / \lambda $$

and

$$ \frac{d}{d\lambda} l(x_1, \ldots, x_n; \lambda) = 0 \iff \lambda = \sum_i x_i / n $$

The MLE of λ is $\hat\lambda_{ML} = \sum_i x_i / n = \bar{x}$.
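A one-dimensional optimizer recovers the same answer numerically; a minimal sketch (same made-up counts as in the earlier Poisson block):

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([4, 2, 5, 3, 3])

# Negative log-likelihood, with the constant sum(ln x_i!) omitted.
def nll(lam):
    return len(x) * lam - x.sum() * np.log(lam)

res = minimize_scalar(nll, bounds=(1e-6, 20.0), method="bounded")
print(res.x, x.mean())   # both ≈ 3.4
```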

Likelihood theory: Log-likelihood for i.i.d. Normal observations

Starting from the likelihood of the previous Normal slide and dropping the constant factor $(2\pi)^{-n/2}$, the log-likelihood is

$$ l(x_1, \ldots, x_n; \mu, \sigma) = \sum_{i=1}^n \left[-\ln\sigma - \frac{1}{2}\frac{(x_i-\mu)^2}{\sigma^2}\right] = -n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2 $$

Likelihood theory: Maximizing the Normal log-likelihood

$$ \frac{\partial}{\partial\mu} l(x_1, \ldots, x_n; \mu, \sigma) = \frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu), \qquad \frac{\partial}{\partial\sigma} l(x_1, \ldots, x_n; \mu, \sigma) = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^n (x_i - \mu)^2 $$

Setting $\partial l/\partial\mu = 0$ and $\partial l/\partial\sigma = 0$ gives

$$ \sum_{i=1}^n (x_i - \mu) = 0 \qquad \text{and} \qquad -n\sigma^2 + \sum_{i=1}^n (x_i - \mu)^2 = 0, $$

hence

$$ \hat\theta_{ML} = (\hat\mu, \hat\sigma)_{ML} = \left(\frac{1}{n}\sum_{i=1}^n x_i, \; \sqrt{\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2}\right) $$
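In code the closed-form MLE is one line per component; a minimal sketch on simulated data (note the 1/n, i.e. ddof=0, in the variance part):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(10.0, 3.0, size=50)   # simulated sample, true (mu, sigma) = (10, 3)

mu_hat = x.mean()                                # (1/n) sum x_i
sigma_hat = np.sqrt(((x - mu_hat) ** 2).mean())  # sqrt((1/n) sum (x_i - xbar)^2)

print(mu_hat, sigma_hat)                         # sigma_hat equals x.std(ddof=0)
```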

Likelihood theory: Remarks on the MLE

The rule "likelihood = product of marginal densities" applies only when the observations are independent. Taking the log linearizes the product into a sum; this greatly simplifies the mathematical expressions (e.g. when the density is proportional to $\exp(ax^b)x^c$) and avoids numerical instabilities, since products of many small densities underflow in numerical computation.

Likelihood theory: Remarks on the MLE (cont.)

Deriving the MLE in closed form is often impossible in real-life problems; one has to resort to numerical optimization, hence the importance of optimization methods in statistics. If the parameter θ belongs to a discrete set, differentiating l(θ) is meaningless; one has to resort to discrete optimization methods. The likelihood $L(x_1, \ldots, x_n; \theta)$ is sometimes denoted $L(\theta \mid x_1, \ldots, x_n)$. This notation is misleading and mathematically wrong, since in likelihood theory θ is not a random variable.
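The generic numerical route minimizes the negative log-likelihood with an off-the-shelf optimizer. A sketch using scipy.optimize on the Normal model above, where the closed form lets us check the optimizer; reparametrizing by ln σ keeps σ positive:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = rng.normal(10.0, 3.0, size=50)

def nll(theta):
    mu, log_sigma = theta            # optimize ln(sigma) so that sigma stays > 0
    sigma = np.exp(log_sigma)
    return len(x) * np.log(sigma) + ((x - mu) ** 2).sum() / (2 * sigma**2)

res = minimize(nll, x0=[0.0, 0.0])   # quasi-Newton (BFGS) by default
print(res.x[0], np.exp(res.x[1]))    # ≈ x.mean() and x.std(ddof=0)
```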

Reading

To go beyond these slides, you can read the first two chapters of In All Likelihood, Yudi Pawitan, Oxford Science Publications, 2001. This book is not in the DTU digital library, but it is almost completely available on Google Books.

Exercises

1 We have recorded the life durations of n light bulbs, denoted x_1, ..., x_n, and assume that they are n i.i.d. replicates of an exponential E(α) distribution. Derive analytically the expression of the MLE of α.

2 Derive analytically the MLE of a for a dataset consisting of n i.i.d. replicates of a U[0, a] distribution. Evaluate the bias of this estimator. What is the limit of the bias as n tends to +∞?

3 For a distribution f_θ, the expectation of X under f_θ can be expressed as a function φ(θ). The moment method consists in equating φ(θ) with the empirical mean. Apply this principle to the case above and discuss the resulting estimator in terms of bias, variance, and any other relevant properties.