The EM algorithm for HMMs


Michael Collins
February 22, 2012

Maximum-Likelihood Estimation for Fully Observed Data (Recap from earlier)

We have fully observed data, x_{i,1} ... x_{i,m}, s_{i,1} ... s_{i,m} for i = 1 ... n. The log-likelihood function is

L(θ) = \sum_{i=1}^{n} \log p(x_{i,1} ... x_{i,m}, s_{i,1} ... s_{i,m}; θ)

Maximum-likelihood estimates of the transition probabilities are

t(s' | s) = \frac{\sum_{i=1}^{n} count(i, s → s')}{\sum_{i=1}^{n} \sum_{s''} count(i, s → s'')}

Maximum-likelihood estimates of the emission probabilities are

e(x | s) = \frac{\sum_{i=1}^{n} count(i, s → x)}{\sum_{i=1}^{n} \sum_{x'} count(i, s → x')}
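As a concrete sketch of the fully observed case, the estimates reduce to counting transitions and state-emission pairs and normalizing. The toy corpus below is a made-up assumption, not data from the notes:

```python
from collections import Counter

# Hypothetical fully observed corpus: (emission sequence, state sequence) pairs.
data = [
    (["a", "x"], [1, 2]),
    (["a", "y"], [1, 2]),
    (["b", "x"], [1, 2]),
]

trans_counts = Counter()  # count(i, s -> s'), summed over examples i
emit_counts = Counter()   # count(i, s -> x), summed over examples i

for xs, ss in data:
    for s, s_next in zip(ss, ss[1:]):
        trans_counts[(s, s_next)] += 1
    for s, x in zip(ss, xs):
        emit_counts[(s, x)] += 1

def t_mle(s_next, s):
    """Maximum-likelihood estimate of the transition probability t(s_next | s)."""
    total = sum(c for (p, _), c in trans_counts.items() if p == s)
    return trans_counts[(s, s_next)] / total

def e_mle(x, s):
    """Maximum-likelihood estimate of the emission probability e(x | s)."""
    total = sum(c for (p, _), c in emit_counts.items() if p == s)
    return emit_counts[(s, x)] / total
```

For instance, state 1 emits a twice and b once in this corpus, so e_mle("a", 1) is 2/3.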

Maximum-Likelihood Estimation for Partially Observed Data

We have partially observed data, x_{i,1} ... x_{i,m} for i = 1 ... n. Note that we do not have the state sequences. The log-likelihood function is

L(θ) = \sum_{i=1}^{n} \log \sum_{s_1 ... s_m} p(x_{i,1} ... x_{i,m}, s_1 ... s_m; θ)

We can maximize this function using EM (the algorithm will converge to a local maximum of the likelihood function).

An Example

Suppose we have an HMM with two states (k = 2) and four possible emissions (a, b, x, y), and our (partially observed) training data consists of the following counts of four different sequences (no other sequences are seen):

a x (100 times)
a y (100 times)
b x (100 times)
b y (100 times)

What are the maximum-likelihood estimates for the HMM?

Forward and Backward Probabilities

Define α[j, s] to be the sum of probabilities of all paths ending in state s at position j in the sequence, for j = 1 ... m and s ∈ {1 ... k}. More formally:

α[j, s] = \sum_{s_1 ... s_{j-1}} \left[ t(s_1) e(x_1 | s_1) \left( \prod_{l=2}^{j-1} t(s_l | s_{l-1}) e(x_l | s_l) \right) t(s | s_{j-1}) e(x_j | s) \right]

Define β[j, s] for s ∈ {1 ... k} and j ∈ {1 ... (m-1)} to be the sum of probabilities of all paths starting with state s at position j and going to the end of the sequence. More formally:

β[j, s] = \sum_{s_{j+1} ... s_m} t(s_{j+1} | s) e(x_{j+1} | s_{j+1}) \prod_{l=j+2}^{m} t(s_l | s_{l-1}) e(x_l | s_l)

Recursive Definitions of the Forward Probabilities

Initialization: for s = 1 ... k,

α[1, s] = t(s) e(x_1 | s)

For j = 2 ... m:

α[j, s] = \sum_{s' ∈ \{1 ... k\}} α[j-1, s'] × t(s | s') × e(x_j | s)
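A minimal sketch of the forward recursion in Python. The parameter layout (t_init[s] = t(s), t[s][s2] = t(s2 | s), e[s][x] = e(x | s)) and the toy numbers are illustrative assumptions:

```python
def forward(xs, k, t_init, t, e):
    """Forward probabilities: alpha[j][s] sums over all paths ending in
    state s at position j. States are 0 ... k-1."""
    m = len(xs)
    alpha = [[0.0] * k for _ in range(m)]
    # Initialization: alpha[0][s] = t(s) * e(x_1 | s)
    for s in range(k):
        alpha[0][s] = t_init[s] * e[s][xs[0]]
    # Recursion: alpha[j][s] = sum_{s'} alpha[j-1][s'] * t(s | s') * e(x_j | s)
    for j in range(1, m):
        for s in range(k):
            alpha[j][s] = sum(alpha[j - 1][s2] * t[s2][s] for s2 in range(k)) * e[s][xs[j]]
    return alpha

# Toy two-state HMM (made-up numbers, for illustration only).
t_init = [0.5, 0.5]
t = [[0.9, 0.1], [0.2, 0.8]]
e = [{"a": 0.7, "b": 0.3}, {"a": 0.1, "b": 0.9}]
alpha = forward(["a", "b"], 2, t_init, t, e)
```

Summing the final column, sum(alpha[m-1]), gives the total sequence probability p(x_1 ... x_m; θ).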

Recursive Definitions of the Backward Probabilities

Initialization: for s = 1 ... k,

β[m, s] = 1

For j = (m-1) ... 1:

β[j, s] = \sum_{s' ∈ \{1 ... k\}} β[j+1, s'] × t(s' | s) × e(x_{j+1} | s')
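The backward recursion can be sketched the same way; the parameter layout (t[s][s2] = t(s2 | s), e[s][x] = e(x | s)) and toy numbers are assumptions for illustration:

```python
def backward(xs, k, t, e):
    """Backward probabilities: beta[j][s] sums over all paths from state s
    at position j to the end of the sequence. States are 0 ... k-1."""
    m = len(xs)
    beta = [[0.0] * k for _ in range(m)]
    # Initialization: beta[m-1][s] = 1 for all s
    for s in range(k):
        beta[m - 1][s] = 1.0
    # Recursion: beta[j][s] = sum_{s'} beta[j+1][s'] * t(s' | s) * e(x_{j+1} | s')
    for j in range(m - 2, -1, -1):
        for s in range(k):
            beta[j][s] = sum(beta[j + 1][s2] * t[s][s2] * e[s2][xs[j + 1]] for s2 in range(k))
    return beta

# Toy two-state HMM (made-up numbers, for illustration only).
t = [[0.9, 0.1], [0.2, 0.8]]
e = [{"a": 0.7, "b": 0.3}, {"a": 0.1, "b": 0.9}]
beta = backward(["a", "b"], 2, t, e)
```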

The Forward-Backward Algorithm

Given these definitions:

p(x_1 ... x_m, S_j = s; θ) = \sum_{s_1 ... s_m : s_j = s} p(x_1 ... x_m, s_1 ... s_m; θ) = α[j, s] × β[j, s]

Note: we'll assume the special definition that β[m, s] = 1 for all s.

The Forward-Backward Algorithm

Given these definitions:

p(x_1 ... x_m, S_j = s, S_{j+1} = s'; θ) = \sum_{s_1 ... s_m : s_j = s, s_{j+1} = s'} p(x_1 ... x_m, s_1 ... s_m; θ) = α[j, s] × t(s' | s) × e(x_{j+1} | s') × β[j+1, s']

Note: we'll assume the special definition that β[m, s] = 1 for all s.

Things we can Compute Using Forward-Backward Probabilities

The probability of any sequence:

p(x_1 ... x_m; θ) = \sum_{s_1 ... s_m} p(x_1 ... x_m, s_1 ... s_m; θ) = \sum_{s} α[m, s]

The probability of any state transition:

p(x_1 ... x_m, S_j = s, S_{j+1} = s'; θ) = \sum_{s_1 ... s_m : s_j = s, s_{j+1} = s'} p(x_1 ... x_m, s_1 ... s_m; θ) = α[j, s] × t(s' | s) × e(x_{j+1} | s') × β[j+1, s']

Things we can Compute Using Forward-Backward Probabilities (continued)

The conditional probability of any state transition:

p(S_j = s, S_{j+1} = s' | x_1 ... x_m; θ) = \frac{α[j, s] × t(s' | s) × e(x_{j+1} | s') × β[j+1, s']}{\sum_{s''} α[m, s'']}

The conditional probability of any state at any position:

p(S_j = s | x_1 ... x_m; θ) = \frac{α[j, s] × β[j, s]}{\sum_{s''} α[m, s'']}
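Putting the two recursions together, the conditional state probabilities are just α[j, s] β[j, s] divided by the sequence probability. A self-contained sketch (the toy parameters are again made-up numbers):

```python
def posteriors(xs, k, t_init, t, e):
    """Compute gamma[j][s] = p(S_j = s | x_1 ... x_m; theta) as
    alpha[j][s] * beta[j][s] / sum_s alpha[m-1][s]. States are 0 ... k-1."""
    m = len(xs)
    alpha = [[0.0] * k for _ in range(m)]
    beta = [[0.0] * k for _ in range(m)]
    for s in range(k):
        alpha[0][s] = t_init[s] * e[s][xs[0]]
        beta[m - 1][s] = 1.0
    for j in range(1, m):
        for s in range(k):
            alpha[j][s] = sum(alpha[j - 1][s2] * t[s2][s] for s2 in range(k)) * e[s][xs[j]]
    for j in range(m - 2, -1, -1):
        for s in range(k):
            beta[j][s] = sum(beta[j + 1][s2] * t[s][s2] * e[s2][xs[j + 1]] for s2 in range(k))
    z = sum(alpha[m - 1])  # p(x_1 ... x_m; theta)
    return [[alpha[j][s] * beta[j][s] / z for s in range(k)] for j in range(m)]

# Toy two-state HMM (made-up numbers, for illustration only).
t_init = [0.5, 0.5]
t = [[0.9, 0.1], [0.2, 0.8]]
e = [{"a": 0.7, "b": 0.3}, {"a": 0.1, "b": 0.9}]
gamma = posteriors(["a", "b"], 2, t_init, t, e)
```

Since the posteriors at each position form a distribution over states, each row of gamma sums to 1.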

Things we can Compute Using Forward-Backward Probabilities (continued)

Define count(i, s → s'; θ) to be the expected number of times the transition s → s' is seen in the training example x_{i,1}, x_{i,2}, ..., x_{i,m}, for parameters θ. Then

count(i, s → s'; θ) = \sum_{j=1}^{m-1} p(S_j = s, S_{j+1} = s' | x_{i,1} ... x_{i,m}; θ)

(We can compute p(S_j = s, S_{j+1} = s' | x_{i,1} ... x_{i,m}; θ) using the forward-backward probabilities; see the previous slide.)

Things we can Compute Using Forward-Backward Probabilities (continued)

For completeness, a formal definition of count(i, s → s'; θ):

count(i, s → s'; θ) = \sum_{s_1 ... s_m} p(s_1 ... s_m | x_{i,1} ... x_{i,m}; θ) × count(s → s', s_1 ... s_m)

where count(s → s', s_1 ... s_m) is the number of times the transition s → s' is seen in the sequence s_1 ... s_m.

Things we can Compute Using Forward-Backward Probabilities (continued)

Define count(i, s → z; θ) to be the expected number of times the state s is paired with the emission z in the training sequence x_{i,1}, x_{i,2}, ..., x_{i,m}, for parameters θ. Then

count(i, s → z; θ) = \sum_{j=1}^{m} p(S_j = s | x_{i,1} ... x_{i,m}; θ) [[x_{i,j} = z]]

(We can compute p(S_j = s | x_{i,1} ... x_{i,m}; θ) using the forward-backward probabilities; see the previous slides.)

The EM Algorithm for HMMs

Initialization: set the initial parameters θ^0 to some value.

For t = 1 ... T:
- Use the forward-backward algorithm to compute all expected counts of the form count(i, s → s'; θ^{t-1}) or count(i, s → z; θ^{t-1})
- Update the parameters based on the expected counts:

t^t(s' | s) = \frac{\sum_{i=1}^{n} count(i, s → s'; θ^{t-1})}{\sum_{i=1}^{n} \sum_{s''} count(i, s → s''; θ^{t-1})}

e^t(x | s) = \frac{\sum_{i=1}^{n} count(i, s → x; θ^{t-1})}{\sum_{i=1}^{n} \sum_{x'} count(i, s → x'; θ^{t-1})}
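The whole iteration can be sketched compactly: run forward-backward on each training sequence, accumulate the expected counts, then normalize. This is a bare-bones sketch, assuming the parameter layout from the earlier snippets, with no smoothing or log-space handling; the starting parameters below are arbitrary:

```python
from collections import defaultdict

def em_step(seqs, k, emissions, t_init, t, e):
    """One EM iteration for an HMM with states 0 ... k-1.
    t_init[s] = t(s), t[s][s2] = t(s2 | s), e[s][x] = e(x | s).
    Returns updated (t_init, t, e); the initial-state update is included."""
    init_c = [0.0] * k
    trans_c = [[0.0] * k for _ in range(k)]
    emit_c = [defaultdict(float) for _ in range(k)]
    for xs in seqs:
        m = len(xs)
        # Forward and backward passes for this sequence.
        alpha = [[0.0] * k for _ in range(m)]
        beta = [[0.0] * k for _ in range(m)]
        for s in range(k):
            alpha[0][s] = t_init[s] * e[s][xs[0]]
            beta[m - 1][s] = 1.0
        for j in range(1, m):
            for s in range(k):
                alpha[j][s] = sum(alpha[j - 1][s2] * t[s2][s] for s2 in range(k)) * e[s][xs[j]]
        for j in range(m - 2, -1, -1):
            for s in range(k):
                beta[j][s] = sum(beta[j + 1][s2] * t[s][s2] * e[s2][xs[j + 1]] for s2 in range(k))
        z = sum(alpha[m - 1])  # p(xs; theta)
        # Accumulate expected counts.
        for s in range(k):
            init_c[s] += alpha[0][s] * beta[0][s] / z
            for j in range(m):
                emit_c[s][xs[j]] += alpha[j][s] * beta[j][s] / z
            for j in range(m - 1):
                for s2 in range(k):
                    trans_c[s][s2] += alpha[j][s] * t[s][s2] * e[s2][xs[j + 1]] * beta[j + 1][s2] / z
    # M-step: normalize the expected counts.
    new_init = [c / len(seqs) for c in init_c]
    new_t = [[trans_c[s][s2] / sum(trans_c[s]) for s2 in range(k)] for s in range(k)]
    new_e = [{x: emit_c[s][x] / sum(emit_c[s].values()) for x in emissions} for s in range(k)]
    return new_init, new_t, new_e

# The four sequences from the example slide, with arbitrary starting parameters.
seqs = [["a", "x"], ["a", "y"], ["b", "x"], ["b", "y"]]
t_init = [0.6, 0.4]
t = [[0.3, 0.7], [0.6, 0.4]]
e = [{"a": 0.3, "b": 0.2, "x": 0.3, "y": 0.2},
     {"a": 0.2, "b": 0.3, "x": 0.2, "y": 0.3}]
for _ in range(10):
    t_init, t, e = em_step(seqs, 2, ["a", "b", "x", "y"], t_init, t, e)
```

After each iteration the updated parameters remain proper distributions: the initial-state probabilities, each transition row, and each emission distribution all sum to 1.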

The Initial State Probabilities

For simplicity I've omitted the estimates for the initial state parameters t(s), but these are simple to derive, in a similar way to the transition and emission parameters. For completeness, the expected counts are

count(i, s; θ^{t-1}) = \frac{α[1, s] × β[1, s]}{\sum_{s''} α[m, s'']}

(the expected number of times state s is seen as the initial state). The parameter updates are then

t^t(s) = \frac{\sum_{i=1}^{n} count(i, s; θ^{t-1})}{n}