Parametric Density Estimation: Maximum Likelihood Estimation



Today: introduction to density estimation; Maximum Likelihood Estimation.

Introduction
Bayesian Decision Theory, covered in previous lectures, tells us how to design an optimal classifier if we knew: P(c_i) (the priors) and p(x | c_i) (the class-conditional densities). Unfortunately, we rarely have this complete information!

Probability Density Methods
Parametric methods: assume we know the shape of the distribution, but not the parameters. Two types of parameter estimation: Maximum Likelihood Estimation and Bayesian Estimation.
Nonparametric methods: the form of the density is entirely determined by the data, without any model.

Independence Across Classes
We have training data for each class (e.g. samples labeled salmon and samples labeled sea bass). When estimating parameters for one class, we will only use the data collected for that class: a reasonable assumption, since data from class c_i gives no information about the distribution of class c_j. So we estimate the parameters of the salmon distribution from the salmon samples, and the parameters of the sea bass distribution from the sea bass samples.

Independence Across Classes
For each class c_i we have a proposed density p_i(x | c_i) with unknown parameters θ_i which we need to estimate. Since we assumed independence of data across the classes, estimation is an identical procedure for all classes. To simplify notation, we drop the sub-indexes and say that we need to estimate parameters θ for a density p(x); that we need to do so for each class, on the training data that came from that class, is implied.

Maximum Likelihood Parameter Estimation
Parameters θ are unknown but fixed (i.e. not random variables). Given the training data, choose the parameter value θ that makes the data most probable (i.e., maximizes the probability of obtaining the sample that has actually been observed).

Maximum Likelihood Parameter Estimation
We have a density p(x) which is completely specified by parameters θ = [θ_1, …, θ_k]. If p(x) is N(μ, σ²) then θ = [μ, σ²]. To highlight that p(x) depends on the parameters θ we will write p(x | θ). Note the overloaded notation: p(x | θ) is not a conditional density. Let D = {x_1, x_2, …, x_n} be the n independent training samples in our data. If p(x) is N(μ, σ²) then x_1, …, x_n are iid samples from N(μ, σ²).

Maximum Likelihood Parameter Estimation
Consider the following function, called the likelihood of θ with respect to the set of samples D:
  F(θ) = p(D | θ) = ∏_{k=1}^n p(x_k | θ)
The maximum likelihood estimate (abbreviated MLE) of θ is the value of θ that maximizes the likelihood function p(D | θ):
  θ̂ = argmax_θ p(D | θ)

ML Parameter Estimation vs. ML Classifier
Recall the ML classifier: for fixed data x, decide the class c_i which maximizes p(x | c_i). Compare with ML parameter estimation: for fixed data D, choose the θ that maximizes p(D | θ). The ML classifier and ML parameter estimation use the same principle, applied to different problems.

Maximum Likelihood Estimation (MLE)
Instead of maximizing p(D | θ), it is usually easier to maximize ln p(D | θ). Since log is monotonic,
  θ̂ = argmax_θ p(D | θ) = argmax_θ ln p(D | θ)
To simplify notation, write L(θ) = ln p(D | θ). Then
  θ̂ = argmax_θ L(θ) = argmax_θ ln ∏_{k=1}^n p(x_k | θ) = argmax_θ Σ_{k=1}^n ln p(x_k | θ)
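When no closed form is available, the argmax over θ can be found numerically. A minimal sketch (the Gaussian data, the grid range, and the helper name `log_likelihood` are illustrative assumptions, not from the slides): maximize L(μ) over a grid of candidate means for a Gaussian with known σ, and compare with the sample mean.

```python
import math
import random

def log_likelihood(mu, data, sigma=1.0):
    """L(mu) = sum of ln p(x_k | mu) for a Gaussian with known sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu)**2 / (2 * sigma**2) for x in data)

random.seed(0)
data = [random.gauss(5.0, 1.0) for _ in range(200)]

# Brute-force maximization of L(mu) over a grid of candidate values.
grid = [i * 0.01 for i in range(1001)]  # mu in [0, 10], step 0.01
mu_hat = max(grid, key=lambda mu: log_likelihood(mu, data))

sample_mean = sum(data) / len(data)
print(mu_hat, sample_mean)  # grid maximizer lands next to the sample mean
```

The grid maximizer agrees with the sample mean up to the grid spacing, previewing the closed-form result derived on the following slides.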

MLE: Maximization Methods
Maximizing L(θ) can be done using standard methods from calculus. Let θ = (θ_1, θ_2, …, θ_p)^t and let ∇_θ = [∂/∂θ_1, ∂/∂θ_2, …, ∂/∂θ_p]^t be the gradient operator. The set of necessary conditions for an optimum is:
  ∇_θ L = 0
We also have to check that a θ satisfying this condition is a maximum, not a minimum or a saddle point, and also check the boundary of the range of θ.
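These necessary conditions can be checked numerically. A sketch (the data, seed, and function name `L` are assumed for illustration): for a Gaussian log-likelihood with known σ, a finite-difference gradient vanishes at the sample mean and changes sign from positive to negative around it, confirming a maximum rather than a minimum or saddle point.

```python
import math
import random

def L(mu, data, sigma=1.0):
    """Log-likelihood of a Gaussian with known sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu)**2 / (2 * sigma**2) for x in data)

random.seed(6)
data = [random.gauss(3.0, 1.0) for _ in range(100)]
mu_hat = sum(data) / len(data)  # the sample mean, the candidate optimum

# Central finite differences approximate dL/dmu at and around mu_hat.
h = 1e-6
grad_at_hat = (L(mu_hat + h, data) - L(mu_hat - h, data)) / (2 * h)
grad_left = (L(mu_hat - 0.5 + h, data) - L(mu_hat - 0.5 - h, data)) / (2 * h)
grad_right = (L(mu_hat + 0.5 + h, data) - L(mu_hat + 0.5 - h, data)) / (2 * h)

# Gradient ~0 at mu_hat, positive to its left, negative to its right:
# the necessary condition holds and the point is a maximum.
print(abs(grad_at_hat) < 1e-3, grad_left > 0, grad_right < 0)
```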

MLE Example: Gaussian with unknown μ
Fortunately for us, the ML estimates of most densities we would care about have already been computed. Let's go through an example anyway. Let p(x) be N(μ, σ²) where σ² is known but μ is unknown and needs to be estimated, so θ = μ.
  μ̂ = argmax_μ L(μ) = argmax_μ Σ_{k=1}^n ln p(x_k | μ)
    = argmax_μ Σ_{k=1}^n ln [ (1/√(2πσ²)) exp(−(x_k − μ)² / (2σ²)) ]
    = argmax_μ Σ_{k=1}^n [ −ln √(2πσ²) − (x_k − μ)² / (2σ²) ]

MLE Example: Gaussian with unknown μ
Differentiating and setting the derivative to zero:
  d/dμ L(μ) = Σ_{k=1}^n (x_k − μ) / σ² = 0
  ⟹ Σ_{k=1}^n x_k − nμ = 0  ⟹  μ̂ = (1/n) Σ_{k=1}^n x_k
Thus the ML estimate of the mean is just the average value of the training data, which is very intuitive: the average of the training data would be our guess for the mean even if we didn't know about ML estimates.

MLE for Gaussian with unknown μ, σ²
Similarly it can be shown that if p(x | μ, σ²) is N(μ, σ²), that is both mean and variance are unknown, then (again a very intuitive result):
  μ̂ = (1/n) Σ_{k=1}^n x_k,  σ̂² = (1/n) Σ_{k=1}^n (x_k − μ̂)²
Likewise, if p(x | μ, Σ) is N(μ, Σ), a multivariate Gaussian with both mean vector and covariance matrix unknown, then
  μ̂ = (1/n) Σ_{k=1}^n x_k,  Σ̂ = (1/n) Σ_{k=1}^n (x_k − μ̂)(x_k − μ̂)^t
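The one-dimensional formulas translate directly into code. A minimal sketch (function name and test data are illustrative): note that the ML variance divides by n, matching `statistics.pvariance` (the "population" variance) rather than `statistics.variance` (which divides by n − 1).

```python
import random
import statistics

def gaussian_mle(data):
    """ML estimates for a 1-D Gaussian: sample mean and (biased) variance.

    mu_hat = (1/n) * sum(x_k); sigma2_hat = (1/n) * sum((x_k - mu_hat)**2)
    """
    n = len(data)
    mu_hat = sum(data) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, sigma2_hat

random.seed(1)
data = [random.gauss(2.0, 3.0) for _ in range(10_000)]
mu_hat, sigma2_hat = gaussian_mle(data)

# With 10,000 samples the estimates land close to the true mu = 2, sigma^2 = 9.
print(round(mu_hat, 2), round(sigma2_hat, 2))
```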

How to Measure Performance of MLE?
How good is an ML estimate θ̂, or indeed any other estimate of a parameter θ? The natural measure of error would be |θ̂ − θ|. But θ̂ − θ is random; we cannot compute it before we carry out experiments. We want to say something meaningful about our estimate as a function of θ. A way around this difficulty is to average the error, i.e. compute the mean absolute error:
  E[ |θ̂ − θ| ] = ∫ |θ̂(x_1, …, x_n) − θ| p(x_1, …, x_n) dx_1 … dx_n

How to Measure Performance of MLE?
It is usually much easier to compute an almost equivalent measure of performance, the mean squared error E[(θ̂ − θ)²]. Doing a little algebra, and using Var(X) = E[X²] − (E[X])²:
  E[(θ̂ − θ)²] = Var(θ̂) + (E[θ̂] − θ)²
The first term is the variance: the estimator should have low variance. The second is the squared bias: the expectation of the estimator should be close to the true θ.
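The decomposition can be verified by simulation. A sketch (the choice of the sample-mean estimator and all constants are illustrative): estimate the MSE, the variance, and the squared bias of θ̂ over many repeated datasets; the identity holds exactly for the empirical averages as well.

```python
import random
import statistics

random.seed(2)
mu_true, sigma, n, trials = 0.0, 1.0, 5, 20_000

# Repeat the experiment many times to approximate expectations over datasets.
estimates = []
for _ in range(trials):
    sample = [random.gauss(mu_true, sigma) for _ in range(n)]
    estimates.append(sum(sample) / n)  # theta_hat = sample mean

mse = statistics.fmean((t - mu_true) ** 2 for t in estimates)
var = statistics.pvariance(estimates)
bias_sq = (statistics.fmean(estimates) - mu_true) ** 2

# MSE = variance + bias^2, up to floating-point rounding.
print(round(mse, 4), round(var + bias_sq, 4))
```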

How to Measure Performance of MLE?
  E[(θ̂ − θ)²] = Var(θ̂) + (E[θ̂] − θ)²  (variance + bias²)
[Figure: three sketches of the sampling distribution p(θ̂). Ideal case: no bias, low variance, p(θ̂) tightly concentrated around the true θ. Bad case: large bias, low variance, tightly concentrated around E[θ̂] ≠ θ. Bad case: no bias, high variance, centered on θ but widely spread.]

Bias and Variance for the MLE of the Mean
Let's compute the bias of the ML estimate of the mean:
  E[μ̂] = E[ (1/n) Σ_{k=1}^n x_k ] = (1/n) Σ_{k=1}^n E[x_k] = (1/n) · nμ = μ
Thus this estimate is unbiased! How about the variance of the ML estimate of the mean?
  Var(μ̂) = E[(μ̂ − μ)²] = E[ ((1/n) Σ_i (x_i − μ))² ]
          = (1/n²) Σ_i Σ_j E[(x_i − μ)(x_j − μ)] = (1/n²) · nσ² = σ²/n
(the cross terms vanish because the samples are independent). Thus the variance is very small for a large number of samples: the more samples, the smaller the variance. The MLE of the mean is a very good estimator.

Bias and Variance for the MLE of the Mean
Suppose someone claims they have a new, great estimator for the mean: just take the first sample! That is, μ̃ = x_1. This estimator is unbiased: E[μ̃] = E[x_1] = μ. However, its variance is
  Var(μ̃) = E[(x_1 − μ)²] = σ²
The variance can thus be very large and does not improve as we increase the number of samples: no bias, but high variance.
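The two estimators can be compared empirically. A sketch (all constants are assumed for illustration): over repeated experiments, the sample mean's variance sits near σ²/n while the first-sample estimator's stays near σ², no matter how many samples each dataset has.

```python
import random
import statistics

random.seed(3)
mu, sigma, n, trials = 0.0, 1.0, 25, 20_000

mean_estimates, first_sample_estimates = [], []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    mean_estimates.append(sum(sample) / n)    # MLE: sample mean
    first_sample_estimates.append(sample[0])  # "just take the first sample"

# Both are unbiased, but the sample mean's variance shrinks as sigma^2 / n
# while the first-sample estimator's variance stays at sigma^2.
print(round(statistics.pvariance(mean_estimates), 3))          # ~ 1/25 = 0.04
print(round(statistics.pvariance(first_sample_estimates), 3))  # ~ 1.0
```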

MLE Bias for the Variance
How about the ML estimate of the variance?
  E[σ̂²] = E[ (1/n) Σ_{k=1}^n (x_k − μ̂)² ] = ((n − 1)/n) σ² ≠ σ²
Thus this estimate is biased! This is because we used μ̂ instead of the true μ. The bias goes to 0 as n goes to infinity, so the estimate is asymptotically unbiased. The unbiased estimate is
  s² = (1/(n − 1)) Σ_{k=1}^n (x_k − μ̂)²
The variance of the ML estimate of the variance can be shown to go to 0 as n goes to infinity.
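The bias is easy to exhibit by simulation. A sketch (all constants are illustrative): averaging the ML variance estimate over many small datasets gives roughly ((n − 1)/n) · σ², systematically below the true σ².

```python
import random
import statistics

random.seed(4)
sigma2_true, n, trials = 4.0, 5, 40_000

ml_vars = []
for _ in range(trials):
    sample = [random.gauss(0.0, 2.0) for _ in range(n)]
    mu_hat = sum(sample) / n
    ml_vars.append(sum((x - mu_hat) ** 2 for x in sample) / n)  # divide by n

# The average ML variance over many datasets approaches ((n-1)/n) * sigma^2,
# not sigma^2 itself: the estimate is biased low.
avg = statistics.fmean(ml_vars)
print(round(avg, 2), (n - 1) / n * sigma2_true)
```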

MLE for the Uniform Distribution U[0, θ]
X is U[0, θ] if its density is p(x | θ) = 1/θ for x in [0, θ] and 0 otherwise (the uniform distribution on [0, θ]). The likelihood is
  p(D | θ) = ∏_{k=1}^n p(x_k | θ) = 1/θⁿ  if θ ≥ max{x_1, …, x_n},  and 0  if θ < max{x_1, …, x_n}
Since 1/θⁿ is decreasing in θ, the likelihood is maximized by the smallest admissible θ:
  θ̂ = argmax_θ ∏_{k=1}^n p(x_k | θ) = max{x_1, …, x_n}
This is not very pleasing, since surely θ should be larger than any observed x!
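A sketch of the uniform MLE (the seed and sample size are illustrative choices): the estimate is simply the largest observation, which necessarily sits at or below the true θ.

```python
import random

random.seed(5)
theta_true = 7.0
data = [random.uniform(0.0, theta_true) for _ in range(50)]

# The likelihood (1/theta)^n is decreasing in theta but is zero for any
# theta below the largest observation, so the maximizer is the sample max.
theta_hat = max(data)

print(theta_hat <= theta_true)  # the MLE never exceeds the true theta
```

A standard remedy, not covered in these slides, is to scale the maximum by (n + 1)/n; since E[max] = (n/(n + 1)) θ, the scaled estimate is unbiased.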