EE641 Digital Image Processing II: Purdue University VISE - October 29, 2004


The EM Algorithm

1. Sufficient Statistics and Exponential Distributions

Let p(y|θ) be a family of density functions parameterized by θ ∈ Ω, and let Y be a random object with a density function from this family.

Definition: A statistic is any function T(Y) of the data Y.

Definition: We say that a statistic T(Y) is a sufficient statistic for θ if there exist functions g(·,·) and h(·) such that

    p(y|θ) = h(y) g(T(y), θ)                                          (1)

for all y and all θ ∈ Ω.

If T(Y) is a sufficient statistic for θ, where θ parameterizes the distribution of Y, then the ML estimator of θ must be a function of T(Y). To see this, notice that

    θ̂_ML = arg max_θ p(y|θ)
          = arg max_θ log p(y|θ)
          = arg max_θ { log h(y) + log g(T(y), θ) }
          = arg max_θ log g(T(y), θ)
          = f(T(y))

for some function f(·).

Example 1: Let {Y_n}_{n=1}^N be i.i.d. random variables with distribution N(µ, 1). Define the following statistic corresponding to the sample mean of the random variables:

    t = (1/N) Σ_{n=1}^N y_n .

By writing the density function for the sequence Y as

    p(y|µ) = Π_{n=1}^N (1/√(2π)) exp{ -(1/2)(y_n - µ)² }
           = (2π)^{-N/2} exp{ -(1/2) Σ_{n=1}^N y_n² + (N/2) t² } exp{ -(N/2)(t - µ)² } ,

we can see that it has the form of equation (1), with h(y) given by all factors except the last exponential and g(t, µ) = exp{ -(N/2)(t - µ)² }. Therefore, t is a sufficient statistic for the parameter µ. Computing the ML estimate yields

    µ̂_ML = arg max_µ log p(y|µ)
          = arg max_µ { -(N/2)(t - µ)² }
          = arg min_µ (t - µ)²
          = t .
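The conclusion of Example 1 is easy to check numerically. Below is a minimal sketch, not part of the original notes, that simulates N(µ, 1) data, computes the sufficient statistic t, and confirms by a coarse grid search that t maximizes the log-likelihood; the sample size, seed, and variable names are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    mu_true, N = 1.5, 1000
    y = rng.normal(mu_true, 1.0, size=N)

    # Sufficient statistic: the sample mean t = (1/N) * sum(y_n)
    t = y.mean()

    # Log-likelihood of N(mu, 1); only the term -(N/2)(t - mu)^2 depends on mu
    def log_likelihood(mu):
        return -0.5 * np.sum((y - mu) ** 2) - 0.5 * N * np.log(2 * np.pi)

    # Grid search over candidate means; the maximizer agrees with t
    grid = np.linspace(t - 1.0, t + 1.0, 2001)
    mu_hat = grid[np.argmax([log_likelihood(m) for m in grid])]
    print(t, mu_hat)   # equal up to the grid spacing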

Many commonly used distributions, such as the Gaussian, exponential, Poisson, Bernoulli, and binomial distributions, have a structure which makes them particularly useful. These distributions are known as exponential families and have the following special property.

Definition: A family of density functions p(y|θ) for θ ∈ Ω is said to be a k-parameter exponential family if there exist functions g(θ) ∈ ℝ^k, s(y), d(θ), and a statistic T(y) ∈ ℝ^k such that

    p(y|θ) = exp{ ⟨g(θ), T(y)⟩ + d(θ) + s(y) }                        (2)

for all y and all θ ∈ Ω, where ⟨·,·⟩ denotes the inner product. We refer to T(y) as the natural sufficient statistic, or natural statistic, for the exponential family.

Example 2: Let {Y_n}_{n=1}^N be i.i.d. random variables with distribution N(µ, σ²). Define the following statistics, corresponding to the sample mean and variance of the random variables:

    t_1 = (1/N) Σ_{n=1}^N y_n
    t_2 = (1/N) Σ_{n=1}^N y_n² .

Then we may write the density function for Y in the following form:

    p(y|µ, σ²) = Π_{n=1}^N (1/√(2πσ²)) exp{ -(1/(2σ²))(y_n - µ)² }
               = (2πσ²)^{-N/2} exp{ -(1/(2σ²)) Σ_{n=1}^N (y_n² - 2 y_n µ + µ²) }
               = (2πσ²)^{-N/2} exp{ (Nµ/σ²) t_1 - (N/(2σ²)) t_2 - Nµ²/(2σ²) }
               = exp{ ⟨ N [µ/σ², -1/(2σ²)], [t_1, t_2] ⟩ - Nµ²/(2σ²) - (N/2) log(2πσ²) } .

Using the definitions

    g(θ) = N [ µ/σ², -1/(2σ²) ]
    T(y) = [ t_1, t_2 ]
    d(θ) = -Nµ²/(2σ²) - (N/2) log(2πσ²)
    s(y) = 0 ,

we can see that p(y|µ, σ²) has the form of equation (2) with natural sufficient statistic T(y). With some calculation it is easily shown that the ML estimates of µ and σ² are given by

    µ̂_ML = t_1
    σ̂²_ML = t_2 - (t_1)² .
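As a numerical sanity check, not part of the original notes, the sketch below computes the natural statistics t_1 and t_2 from simulated N(µ, σ²) data and forms the ML estimates µ̂ = t_1 and σ̂² = t_2 - t_1²; the parameter values and variable names are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    mu_true, sigma_true, N = 2.0, 0.5, 10_000
    y = rng.normal(mu_true, sigma_true, size=N)

    # Natural sufficient statistics of the Gaussian family
    t1 = np.mean(y)         # (1/N) * sum(y_n)
    t2 = np.mean(y ** 2)    # (1/N) * sum(y_n^2)

    # ML estimates expressed through the natural statistics
    mu_hat = t1
    var_hat = t2 - t1 ** 2  # matches the 1/N (biased) sample variance np.var(y)

    print(mu_hat, var_hat, np.isclose(var_hat, np.var(y)))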

2. General Formulation of EM Update

One reason that the EM algorithm is so useful is that in many practical situations the distributions are exponential, and in this case the EM updates have a particularly simple form. Let Y be the observed or incomplete data, let X be the unobserved data, and assume that the joint density of (Y, X) is from an exponential family with parameter vector θ. Then we know that

    p(y, x|θ) = exp{ ⟨g(θ), T(y, x)⟩ + d(θ) + s(y, x) }

for some natural sufficient statistic T(y, x). Assuming the ML estimate of θ exists, it is given by

    θ̂_ML = arg max_θ { ⟨g(θ), T(y, x)⟩ + d(θ) }                       (3)
          = f(T(y, x))                                                (4)

where f(·) is some function of the k-dimensional sufficient statistic for the exponential density.

Recalling the form of the Q function, we have

    Q(θ'; θ) = E[ log p(y, X|θ') | Y = y, θ ]

where Y is the observed data and X is the unknown data. Since our objective is to maximize Q with respect to θ', we only need to know the function Q to within a constant that does not depend on θ'. Therefore, we have

    Q(θ'; θ) = E[ log p(y, X|θ') | Y = y, θ ]
             = E[ ⟨g(θ'), T(y, X)⟩ + d(θ') + s(y, X) | Y = y, θ ]
             = ⟨g(θ'), T̄⟩ + d(θ') + constant ,

where T̄ = E[ T(y, X) | Y = y, θ ] is the conditional expectation of the sufficient statistic T(y, x). A single update of the EM algorithm is then given by the recursion

    θ ← arg max_{θ' ∈ Ω} Q(θ'; θ)                                     (5)
      = arg max_{θ' ∈ Ω} { ⟨g(θ'), T̄⟩ + d(θ') }
      = f(T̄) .
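The structure of equation (5) is simple to express in code: the E-step computes the expected natural statistic T̄, and the M-step applies the same map f(·) used for the complete-data ML estimate. The following minimal skeleton is not from the original notes; the function names and the split into two callbacks are illustrative assumptions, and a concrete model (such as the mixture in Example 3 below) must supply the two callbacks.

    from typing import Any, Callable

    def em_step(theta: Any, y: Any,
                expected_stats: Callable[[Any, Any], Any],
                ml_from_stats: Callable[[Any], Any]) -> Any:
        """One EM update for an exponential-family complete-data model.

        expected_stats(theta, y) returns T_bar = E[T(y, X) | Y = y, theta]   (E-step).
        ml_from_stats(T_bar) is the same map f(.) that computes the complete-data
        ML estimate, applied to the expected statistic instead of T(y, x)    (M-step).
        """
        T_bar = expected_stats(theta, y)
        return ml_from_stats(T_bar)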

Intuitively, we see that the EM update has the same form as the computation of the ML estimate, but with the expected value of the statistic replacing the actual statistic.

Example 3: Let {X_n}_{n=1}^N be i.i.d. random variables with P{X_n = 0} = π_0 and P{X_n = 1} = π_1 = 1 - π_0. Let {Y_n}_{n=1}^N be conditionally i.i.d. random variables given X, and let the conditional distribution of Y_n given X_n be Gaussian N(µ_{X_n}, σ²_{X_n}), where µ_0, µ_1, σ²_0, and σ²_1 are parameters of the distribution. Then the complete set of parameters for the density of (Y, X) is given by θ = [µ_0, µ_1, σ²_0, σ²_1, π_0]. Define the statistics

    N_k = Σ_{n=1}^N δ(x_n - k)
    t_{1,k} = Σ_{n=1}^N y_n δ(x_n - k)
    t_{2,k} = Σ_{n=1}^N y_n² δ(x_n - k)

where k ∈ {0, 1} and δ(·) is a Kronecker delta function. We know that if both Y and X are known, then the ML estimates are given by

    µ̂_k = t_{1,k} / N_k                                               (6)
    σ̂²_k = t_{2,k} / N_k - ( t_{1,k} / N_k )²                         (7)
    π̂_k = N_k / N .                                                   (8)

We can express the density function p(y|x, θ) by starting with the expressions derived in Example 2 for each of the two classes corresponding to X_n = 0 and X_n = 1:

    p(y|x, θ) = Π_{k=0}^{1} exp{ ⟨ [µ_k/σ²_k, -1/(2σ²_k)], [t_{1,k}, t_{2,k}] ⟩ - N_k ( µ²_k/(2σ²_k) + (1/2) log(2πσ²_k) ) } .

The distribution of X also has exponential form, with

    p(x|θ) = π_0^{N_0} π_1^{N_1} = exp{ Σ_{k=0}^{1} N_k log π_k } .

This yields a joint density for (Y, X) of the following form:

    p(y, x|θ) = p(y|x, θ) p(x|θ)
              = Π_{k=0}^{1} exp{ ⟨ [µ_k/σ²_k, -1/(2σ²_k)], [t_{1,k}, t_{2,k}] ⟩ + N_k ( log π_k - µ²_k/(2σ²_k) - (1/2) log(2πσ²_k) ) } .

Therefore, we can see that (Y, X) has an exponential density function with natural sufficient statistic T(Y, X) = [N_0, N_1, t_{1,0}, t_{1,1}, t_{2,0}, t_{2,1}]. Using the result of equation (5), we know that the EM update must have the form of equations (6), (7), and (8), where the statistic T(Y, X) is replaced by its conditional expectation:

    µ̂_k = t̄_{1,k} / N̄_k                                              (9)
    σ̂²_k = t̄_{2,k} / N̄_k - ( t̄_{1,k} / N̄_k )²                        (10)
    π̂_k = N̄_k / N                                                     (11)

where

    N̄_k = Σ_{n=1}^N P{X_n = k | Y = y, θ}
    t̄_{1,k} = Σ_{n=1}^N y_n P{X_n = k | Y = y, θ}
    t̄_{2,k} = Σ_{n=1}^N y_n² P{X_n = k | Y = y, θ} .
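A compact implementation of the updates (9), (10), and (11) might look like the sketch below. It is not part of the original notes; the initialization, iteration count, and synthetic test data are illustrative choices. The E-step computes the posterior probabilities P{X_n = k | Y = y, θ}, from which the expected statistics N̄_k, t̄_{1,k}, and t̄_{2,k} follow, and the M-step reuses the complete-data formulas (6)-(8).

    import numpy as np

    def em_two_gaussians(y, n_iter=100):
        """EM for the two-class Gaussian model of Example 3, equations (9)-(11)."""
        y = np.asarray(y, dtype=float)
        N = y.size

        # Illustrative initialization
        mu = np.array([np.percentile(y, 25), np.percentile(y, 75)])
        var = np.array([np.var(y), np.var(y)])
        pi = np.array([0.5, 0.5])

        for _ in range(n_iter):
            # E-step: posterior probabilities P{X_n = k | Y = y, theta}, shape (N, 2)
            lik = pi * np.exp(-0.5 * (y[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
            post = lik / lik.sum(axis=1, keepdims=True)

            # Expected statistics N_bar_k, t1_bar_k, t2_bar_k for k = 0, 1
            N_bar = post.sum(axis=0)
            t1_bar = (y[:, None] * post).sum(axis=0)
            t2_bar = ((y ** 2)[:, None] * post).sum(axis=0)

            # M-step: complete-data ML formulas applied to the expected statistics
            mu = t1_bar / N_bar                          # equation (9)
            var = t2_bar / N_bar - (t1_bar / N_bar)**2   # equation (10)
            pi = N_bar / N                               # equation (11)

        return mu, var, pi

    # Usage on synthetic data with hidden labels X_n and P{X_n = 1} = 0.3
    rng = np.random.default_rng(2)
    x = rng.random(5000) < 0.3
    y = np.where(x, rng.normal(3.0, 1.0, 5000), rng.normal(0.0, 0.5, 5000))
    print(em_two_gaussians(y))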