Introduction to Sequential Monte Carlo Methods


Arnaud Doucet, NCSU, October 2008

Preliminary Remarks
Sequential Monte Carlo (SMC) methods are a set of methods allowing us to approximate virtually any sequence of probability distributions. SMC methods are very popular in physics, where they are used to compute eigenvalues of positive operators, solve PDEs/integral equations, or simulate polymers. For pedagogical reasons, we focus here on applications of SMC to Hidden Markov Models (HMMs). In the HMM framework, SMC methods are also widely known as particle filtering/smoothing methods.

Markov Models
We model the stochastic process of interest as a discrete-time Markov process $\{X_k\}_{k \geq 1}$. The process $\{X_k\}_{k \geq 1}$ is characterized by its initial density and its transition density:
$$X_1 \sim \mu(\cdot), \qquad X_k \mid (X_{k-1} = x_{k-1}) \sim f(\cdot \mid x_{k-1}).$$
We introduce the notation $x_{i:j} = (x_i, x_{i+1}, \ldots, x_j)$ for $i \leq j$. By definition we have
$$p(x_{1:n}) = p(x_1) \prod_{k=2}^{n} p(x_k \mid x_{1:k-1}) = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1}).$$

Observation Model
We do not observe $\{X_k\}_{k \geq 1}$; the process is hidden. We only have access to another related process $\{Y_k\}_{k \geq 1}$. We assume that, conditional on $\{X_k\}_{k \geq 1}$, the observations $\{Y_k\}_{k \geq 1}$ are independent and marginally distributed according to
$$Y_k \mid (X_k = x_k) \sim g(\cdot \mid x_k).$$
Formally, this means that
$$p(y_{1:n} \mid x_{1:n}) = \prod_{k=1}^{n} g(y_k \mid x_k).$$

Figure: Graphical model representation of the HMM.

Tracking Example
Assume you want to track a target in the XY plane; then you can consider the 4-dimensional state $X_k = (X_{k,1}, V_{k,1}, X_{k,2}, V_{k,2})^{\mathsf{T}}$. The so-called constant velocity model states that
$$X_k = A X_{k-1} + W_k, \qquad W_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \Sigma),$$
with
$$A = \begin{pmatrix} A_{CV} & 0 \\ 0 & A_{CV} \end{pmatrix}, \quad A_{CV} = \begin{pmatrix} 1 & T \\ 0 & 1 \end{pmatrix}, \quad \Sigma = \sigma^2 \begin{pmatrix} \Sigma_{CV} & 0 \\ 0 & \Sigma_{CV} \end{pmatrix}, \quad \Sigma_{CV} = \begin{pmatrix} T^3/3 & T^2/2 \\ T^2/2 & T \end{pmatrix}.$$
We obtain $f(x_k \mid x_{k-1}) = \mathcal{N}(x_k; A x_{k-1}, \Sigma)$.

Tracking Example (cont.)
The observation equation depends on the sensor.
Simple case:
$$Y_k = C X_k + D E_k, \qquad E_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \Sigma_e),$$
so $g(y_k \mid x_k) = \mathcal{N}(y_k; C x_k, \Sigma_e)$.
Complex, realistic case (bearings-only tracking):
$$Y_k = \tan^{-1}\!\left(\frac{X_{k,2}}{X_{k,1}}\right) + E_k, \qquad E_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2),$$
so $g(y_k \mid x_k) = \mathcal{N}\!\left(y_k; \tan^{-1}(x_{k,2}/x_{k,1}), \sigma^2\right)$.
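To make the tracking example concrete, here is a minimal Python sketch (added for illustration, not part of the original slides) that simulates the constant velocity model with a simple linear Gaussian position sensor; the sampling period T, the noise scales, and the observation matrix C are illustrative assumptions.

```python
import numpy as np

def simulate_cv(n=100, T=1.0, sigma=0.5, obs_std=2.0, seed=0):
    """Simulate the constant velocity model with linear Gaussian observations.
    State: (x-position, x-velocity, y-position, y-velocity)."""
    rng = np.random.default_rng(seed)
    A_cv = np.array([[1.0, T], [0.0, 1.0]])
    S_cv = np.array([[T**3 / 3, T**2 / 2], [T**2 / 2, T]])
    A = np.kron(np.eye(2), A_cv)                     # block-diagonal transition matrix
    Sigma = sigma**2 * np.kron(np.eye(2), S_cv)      # block-diagonal process noise
    C = np.array([[1.0, 0.0, 0.0, 0.0],              # observe positions only (assumption)
                  [0.0, 0.0, 1.0, 0.0]])
    L = np.linalg.cholesky(Sigma)
    x = np.zeros(4)
    xs, ys = [], []
    for _ in range(n):
        x = A @ x + L @ rng.standard_normal(4)        # X_k = A X_{k-1} + W_k
        y = C @ x + obs_std * rng.standard_normal(2)  # Y_k = C X_k + noise
        xs.append(x.copy()); ys.append(y)
    return np.array(xs), np.array(ys)

states, obs = simulate_cv()
```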

Stochastic Volatility
We have the following standard model
$$X_k = \phi X_{k-1} + V_k, \qquad V_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2),$$
so that $f(x_k \mid x_{k-1}) = \mathcal{N}(x_k; \phi x_{k-1}, \sigma^2)$. We observe
$$Y_k = \beta \exp(X_k / 2)\, W_k, \qquad W_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1),$$
so that $g(y_k \mid x_k) = \mathcal{N}(y_k; 0, \beta^2 \exp(x_k))$.
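A short sketch simulating this stochastic volatility model; the parameter values (phi = 0.95, sigma = 0.3, beta = 0.7) and the stationary initialization are illustrative assumptions, not values from the talk.

```python
import numpy as np

def simulate_sv(n=500, phi=0.95, sigma=0.3, beta=0.7, seed=1):
    """Simulate the stochastic volatility model (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma / np.sqrt(1 - phi**2))  # start from the stationary law (assumption)
    xs = np.empty(n); ys = np.empty(n)
    for k in range(n):
        x = phi * x + sigma * rng.standard_normal()               # X_k = phi X_{k-1} + V_k
        xs[k] = x
        ys[k] = beta * np.exp(x / 2) * rng.standard_normal()      # Y_k = beta exp(X_k/2) W_k
    return xs, ys

x_true, y_obs = simulate_sv()
```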

Inference in HMM
Given a realization of the observations $Y_{1:n} = y_{1:n}$, we are interested in inferring the states $X_{1:n}$. We are in a Bayesian framework where
Prior: $p(x_{1:n}) = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1})$,
Likelihood: $p(y_{1:n} \mid x_{1:n}) = \prod_{k=1}^{n} g(y_k \mid x_k)$.
Using Bayes' rule, we obtain
$$p(x_{1:n} \mid y_{1:n}) = \frac{p(y_{1:n} \mid x_{1:n})\, p(x_{1:n})}{p(y_{1:n})},$$
where the marginal likelihood is given by
$$p(y_{1:n}) = \int p(y_{1:n} \mid x_{1:n})\, p(x_{1:n})\, dx_{1:n}.$$

Sequential Inference in HMM
In particular, we will focus here on the sequential estimation of $p(x_{1:n} \mid y_{1:n})$ and $p(y_{1:n})$; that is, at each time $n$ we want to update our knowledge of the hidden process in light of the new observation $y_n$. There is a simple recursion relating $p(x_{1:n-1} \mid y_{1:n-1})$ to $p(x_{1:n} \mid y_{1:n})$, given by
$$p(x_{1:n} \mid y_{1:n}) = p(x_{1:n-1} \mid y_{1:n-1})\, \frac{f(x_n \mid x_{n-1})\, g(y_n \mid x_n)}{p(y_n \mid y_{1:n-1})},$$
where
$$p(y_n \mid y_{1:n-1}) = \int g(y_n \mid x_n)\, f(x_n \mid x_{n-1})\, p(x_{n-1} \mid y_{1:n-1})\, dx_{n-1:n}.$$
We will also simply write
$$p(x_{1:n} \mid y_{1:n}) \propto p(x_{1:n-1} \mid y_{1:n-1})\, f(x_n \mid x_{n-1})\, g(y_n \mid x_n).$$

In many papers/books in the literature, you will find the following two-step prediction-updating recursion for the marginal, so-called filtering, distributions $p(x_n \mid y_{1:n})$, which is a direct consequence of the previous recursion.
Prediction Step:
$$p(x_n \mid y_{1:n-1}) = \int p(x_{n-1:n} \mid y_{1:n-1})\, dx_{n-1} = \int p(x_n \mid x_{n-1}, y_{1:n-1})\, p(x_{n-1} \mid y_{1:n-1})\, dx_{n-1} = \int f(x_n \mid x_{n-1})\, p(x_{n-1} \mid y_{1:n-1})\, dx_{n-1}.$$
Updating Step:
$$p(x_n \mid y_{1:n}) = \frac{g(y_n \mid x_n)\, p(x_n \mid y_{1:n-1})}{p(y_n \mid y_{1:n-1})}.$$

(Marginal) Likelihood Evaluation
We have seen that
$$p(y_{1:n}) = \int p(y_{1:n} \mid x_{1:n})\, p(x_{1:n})\, dx_{1:n}.$$
We also have the following decomposition
$$p(y_{1:n}) = p(y_1) \prod_{k=2}^{n} p(y_k \mid y_{1:k-1}),$$
where
$$p(y_k \mid y_{1:k-1}) = \int p(y_k, x_k \mid y_{1:k-1})\, dx_k = \int g(y_k \mid x_k)\, p(x_k \mid y_{1:k-1})\, dx_k = \int g(y_k \mid x_k)\, f(x_k \mid x_{k-1})\, p(x_{k-1} \mid y_{1:k-1})\, dx_{k-1:k}.$$
We have "broken" a high-dimensional integral into a product of lower-dimensional integrals.

Closed-form Inference in HMM
We have closed-form solutions for:
Finite state-space HMMs, i.e. $E = \{e_1, \ldots, e_p\}$, as all integrals become finite sums.
Linear Gaussian models, where all the posterior distributions are Gaussian; e.g. the celebrated Kalman filter.
A whole reverse-engineering literature exists for closed-form solutions in alternative cases...
In many cases of interest, it is impossible to compute the solution in closed form and we need approximations.

Standard Approximations for Filtering Distributions
Gaussian approximations: Extended Kalman filter, Unscented Kalman filter.
Gaussian sum approximations.
Projection filters, variational approximations.
Simple discretization of the state space.
Analytical methods work in simple cases but are not reliable, and it is difficult to diagnose when they fail.
Standard discretization of the space is expensive and difficult to implement in high-dimensional scenarios.

Breakthrough
At the beginning of the 90s, the optimal filtering area was considered virtually dead; there had not been any significant progress for years. Then:
Gordon, N.J., Salmond, D.J. and Smith, A.F.M., "Novel approach to nonlinear/non-Gaussian Bayesian state estimation", IEE Proceedings F: Radar and Signal Processing, vol. 140, no. 2, pp. 107-113, 1993.
This article introduces a simple method which relies neither on a functional approximation nor on a deterministic grid. This paper was ignored by most researchers for a few years...

Monte Carlo Sampling.
Importance Sampling.
Sequential Importance Sampling.
Sequential Importance Sampling with Resampling.

Monte Carlo Sampling
Assume for the time being that you are interested in estimating the high-dimensional probability density
$$p(x_{1:n} \mid y_{1:n}) = \frac{p(x_{1:n}, y_{1:n})}{p(y_{1:n})} \propto p(x_{1:n}, y_{1:n}),$$
where $n$ is fixed. A Monte Carlo approximation consists of sampling a large number $N$ of i.i.d. random variables $X_{1:n}^{(i)} \overset{\text{i.i.d.}}{\sim} p(x_{1:n} \mid y_{1:n})$ and building the following approximation
$$\widehat{p}(x_{1:n} \mid y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_{1:n}^{(i)}}(x_{1:n}),$$
where $\delta_a(x_{1:n})$ is the Dirac delta mass, which is such that
$$\int_A \delta_a(x_{1:n})\, dx_{1:n} = \begin{cases} 1 & \text{if } a \in A, \\ 0 & \text{otherwise.} \end{cases}$$
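As a toy illustration of this empirical-measure idea (unrelated to any particular HMM), the following sketch approximates an expectation under a density we can sample from; the target and test function are arbitrary choices made here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo approximation of E[phi(X)] for X ~ N(0, 1), with phi(x) = x**2.
# The empirical measure (1/N) * sum_i delta_{X^(i)} stands in for the target density.
N = 100_000
samples = rng.standard_normal(N)
phi = samples**2
print(phi.mean())               # close to the exact value 1.0
print(phi.std() / np.sqrt(N))   # Monte Carlo standard error, O(N^{-1/2})
```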

Issues with Standard Monte Carlo Sampling
There are standard methods to sample from classical distributions such as the Beta, Gamma, Normal, Poisson, etc. We will not detail them here, although we will rely on them.
Problem 1: For most problems of interest, we cannot sample from $p(x_{1:n} \mid y_{1:n})$.
Problem 2: Even if we could sample exactly from $p(x_{1:n} \mid y_{1:n})$, the computational complexity of the algorithm would most likely increase with $n$; we want an algorithm of fixed computational complexity at each time step.
To summarize, we cannot use standard MC sampling in our case and, even if we could, it would not solve our problem...

Importance Sampling
Importance Sampling (IS). We have
$$p(x_{1:n} \mid y_{1:n}) = \frac{p(y_{1:n} \mid x_{1:n})\, p(x_{1:n})}{p(y_{1:n})}, \qquad p(y_{1:n}) = \int p(y_{1:n} \mid x_{1:n})\, p(x_{1:n})\, dx_{1:n}.$$
Generally speaking, for a so-called importance distribution $q(x_{1:n} \mid y_{1:n})$ selected such that
$$p(x_{1:n} \mid y_{1:n}) > 0 \;\Rightarrow\; q(x_{1:n} \mid y_{1:n}) > 0,$$
we have
$$p(x_{1:n} \mid y_{1:n}) = \frac{w(x_{1:n}, y_{1:n})\, q(x_{1:n} \mid y_{1:n})}{p(y_{1:n})}, \qquad p(y_{1:n}) = \int w(x_{1:n}, y_{1:n})\, q(x_{1:n} \mid y_{1:n})\, dx_{1:n},$$
where the unnormalized importance weight is
$$w(x_{1:n}, y_{1:n}) = \frac{p(x_{1:n}, y_{1:n})}{q(x_{1:n} \mid y_{1:n})} \propto \frac{p(x_{1:n} \mid y_{1:n})}{q(x_{1:n} \mid y_{1:n})}.$$

Monte Carlo IS Estimates
It is easy to sample from $p(x_{1:n})$, so we can build the standard MC approximation
$$\widehat{p}(x_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_{1:n}^{(i)}}(x_{1:n}), \quad \text{where } X_{1:n}^{(i)} \overset{\text{i.i.d.}}{\sim} p(x_{1:n}).$$
We plug this approximation into the IS identities to obtain
$$p(y_{1:n}) = \int p(y_{1:n} \mid x_{1:n})\, p(x_{1:n})\, dx_{1:n} \;\Rightarrow\; \widehat{p}(y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} p\big(y_{1:n} \mid X_{1:n}^{(i)}\big).$$
$\widehat{p}(y_{1:n})$ is an unbiased estimate of $p(y_{1:n})$ with relative variance
$$\frac{1}{N}\left(\int \frac{p^2(y_{1:n} \mid x_{1:n})\, p(x_{1:n})}{p^2(y_{1:n})}\, dx_{1:n} - 1\right).$$

We also get an approximation of the posterior using
$$p(x_{1:n} \mid y_{1:n}) = \frac{p(y_{1:n} \mid x_{1:n})\, p(x_{1:n})}{\int p(y_{1:n} \mid x_{1:n})\, p(x_{1:n})\, dx_{1:n}},$$
$$\widehat{p}(x_{1:n} \mid y_{1:n}) = \frac{p(y_{1:n} \mid x_{1:n})\, \widehat{p}(x_{1:n})}{\int p(y_{1:n} \mid x_{1:n})\, \widehat{p}(x_{1:n})\, dx_{1:n}} = \frac{\frac{1}{N}\sum_{i=1}^{N} p\big(y_{1:n} \mid X_{1:n}^{(i)}\big)\, \delta_{X_{1:n}^{(i)}}(x_{1:n})}{\frac{1}{N}\sum_{j=1}^{N} p\big(y_{1:n} \mid X_{1:n}^{(j)}\big)} = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_{1:n}^{(i)}}(x_{1:n}),$$
where the normalized importance weights are
$$W_n^{(i)} = \frac{p\big(y_{1:n} \mid X_{1:n}^{(i)}\big)}{\sum_{j=1}^{N} p\big(y_{1:n} \mid X_{1:n}^{(j)}\big)}.$$
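A minimal sketch of this prior-as-proposal importance sampler for the stochastic volatility model introduced earlier; the helper name prior_is and the parameter defaults are illustrative, and working with log-weights is a standard numerical device rather than something from the slides.

```python
import numpy as np
from scipy.stats import norm

def prior_is(y, N=5000, phi=0.95, sigma=0.3, beta=0.7, seed=2):
    """Importance sampling with the prior p(x_{1:n}) as proposal for the SV model.
    Returns the marginal likelihood estimate, normalized weights, and the paths."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Sample N paths X_{1:n}^{(i)} from the prior, vectorized over particles.
    x = np.zeros((N, n))
    x[:, 0] = rng.normal(0.0, sigma / np.sqrt(1 - phi**2), size=N)
    for k in range(1, n):
        x[:, k] = phi * x[:, k - 1] + sigma * rng.standard_normal(N)
    # log p(y_{1:n} | X_{1:n}^{(i)}) = sum_k log N(y_k; 0, beta^2 exp(x_k))
    log_lik = norm.logpdf(y, loc=0.0, scale=beta * np.exp(x / 2)).sum(axis=1)
    m = log_lik.max()
    w = np.exp(log_lik - m)          # stabilized unnormalized weights
    p_hat = np.exp(m) * w.mean()     # \hat p(y_{1:n}), unbiased
    W = w / w.sum()                  # normalized importance weights W_n^{(i)}
    return p_hat, W, x

# Self-normalized IS estimate of E[X_n | y_{1:n}] on a short series:
# p_hat, W, x = prior_is(y_obs[:20]); print((W * x[:, -1]).sum())
```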

Assume we are interested in computing $\mathbb{E}_{p(x_{1:n} \mid y_{1:n})}(\varphi)$; then we can use the estimate
$$\mathbb{E}_{\widehat{p}(x_{1:n} \mid y_{1:n})}(\varphi) = \sum_{i=1}^{N} W_n^{(i)}\, \varphi\big(X_{1:n}^{(i)}\big).$$
This estimate is biased for finite $N$ but is asymptotically consistent, with
$$\lim_{N \to \infty} N\left(\mathbb{E}_{\widehat{p}(x_{1:n} \mid y_{1:n})}(\varphi) - \mathbb{E}_{p(x_{1:n} \mid y_{1:n})}(\varphi)\right) = -\int \frac{p^2(x_{1:n} \mid y_{1:n})}{p(x_{1:n})}\left(\varphi(x_{1:n}) - \mathbb{E}_{p(x_{1:n} \mid y_{1:n})}(\varphi)\right) dx_{1:n}$$
and
$$\sqrt{N}\left(\mathbb{E}_{\widehat{p}(x_{1:n} \mid y_{1:n})}(\varphi) - \mathbb{E}_{p(x_{1:n} \mid y_{1:n})}(\varphi)\right) \Rightarrow \mathcal{N}\left(0,\; \int \frac{p^2(x_{1:n} \mid y_{1:n})}{p(x_{1:n})}\left(\varphi(x_{1:n}) - \mathbb{E}_{p(x_{1:n} \mid y_{1:n})}(\varphi)\right)^2 dx_{1:n}\right),$$
so
$$\mathrm{MSE} = \underbrace{\mathrm{bias}^2}_{O(N^{-2})} + \underbrace{\mathrm{variance}}_{O(N^{-1})}$$
and the asymptotic bias is irrelevant.

Summary of Our Progress
Problem 1: For most problems of interest, we cannot sample from $p(x_{1:n} \mid y_{1:n})$.
Problem 1 solved: we use an IS approximation of $p(x_{1:n} \mid y_{1:n})$ that relies on the prior $p(x_{1:n})$ as importance distribution.
Problem 2: Even if we could sample exactly from $p(x_{1:n} \mid y_{1:n})$, the computational complexity of the algorithm would most likely increase with $n$; we want an algorithm of fixed computational complexity at each time step.
Problem 2 not solved yet: if at each time step $n$ we need to obtain new samples from $p(x_{1:n})$, then the computational complexity of the algorithm will increase at each time step.

Sequential Importance Sampling (SIS)
To avoid having the computational effort increase over time, we use the fact that
$$\underbrace{p(x_{1:n})}_{\text{IS distribution at time } n} = \underbrace{p(x_{1:n-1})}_{\text{IS distribution at time } n-1}\; \underbrace{f(x_n \mid x_{n-1})}_{\text{new sampled component}} = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1}).$$
In practical terms, this means that at time $n-1$ we have already sampled $X_{1:n-1}^{(i)} \sim p(x_{1:n-1})$, and that to obtain at time $n$ samples/particles $X_{1:n}^{(i)} \sim p(x_{1:n})$ we just need to sample
$$X_n^{(i)} \sim f\big(x_n \mid X_{n-1}^{(i)}\big)$$
and set
$$X_{1:n}^{(i)} = \big(\underbrace{X_{1:n-1}^{(i)}}_{\text{previously sampled path}},\; \underbrace{X_n^{(i)}}_{\text{new sampled component}}\big).$$

Now, whatever $n$ is, we have only one component $X_n$ to sample! However, can we compute our IS estimates of $p(y_{1:n})$ and of the target $p(x_{1:n} \mid y_{1:n})$ recursively? Remember that
$$\widehat{p}(y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} p\big(y_{1:n} \mid X_{1:n}^{(i)}\big), \qquad \widehat{p}(x_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_{1:n}^{(i)}}(x_{1:n}),$$
where $W_n^{(i)} \propto p\big(y_{1:n} \mid X_{1:n}^{(i)}\big)$ and $\sum_{i=1}^{N} W_n^{(i)} = 1$. We have
$$p(y_{1:n} \mid x_{1:n}) = p(y_{1:n-1} \mid x_{1:n-1})\, g(y_n \mid x_n).$$

Sequential Importance Sampling Algorithm
At time 1:
Sample $N$ particles $X_1^{(i)} \sim \mu(x_1)$ and compute $W_1^{(i)} \propto g\big(y_1 \mid X_1^{(i)}\big)$.
At time $n$, $n \geq 2$:
Sample $N$ particles $X_n^{(i)} \sim f\big(x_n \mid X_{n-1}^{(i)}\big)$ and compute $W_n^{(i)} \propto W_{n-1}^{(i)}\, g\big(y_n \mid X_n^{(i)}\big)$.
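A minimal Python sketch of this SIS algorithm for the stochastic volatility model, proposing from the prior transition; the parameter defaults are the same illustrative values as before, and the weights are propagated on the log scale for numerical stability (a practical detail, not from the slides).

```python
import numpy as np
from scipy.stats import norm

def sis_sv(y, N=1000, phi=0.95, sigma=0.3, beta=0.7, seed=3):
    """Sequential importance sampling (no resampling) for the SV model.
    Returns the final particles and their normalized weights."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma / np.sqrt(1 - phi**2), size=N)           # X_1^{(i)} ~ mu
    logw = norm.logpdf(y[0], loc=0.0, scale=beta * np.exp(x / 2))      # W_1 ∝ g(y_1 | X_1)
    for y_n in y[1:]:
        x = phi * x + sigma * rng.standard_normal(N)                   # X_n ~ f(. | X_{n-1})
        logw += norm.logpdf(y_n, loc=0.0, scale=beta * np.exp(x / 2))  # W_n ∝ W_{n-1} g(y_n | X_n)
    W = np.exp(logw - logw.max())
    return x, W / W.sum()

# x_n, W = sis_sv(y_obs[:50]); print((W * x_n).sum())   # filtering mean estimate at time n
```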

Practical Issues
The algorithm can be easily parallelized.
The computational complexity does not increase over time.
It is not necessary to store the paths $\big\{X_{1:n}^{(i)}\big\}$ if we are only interested in approximating $p(x_n \mid y_{1:n})$, as the weights only depend on $\big\{X_n^{(i)}\big\}$!

Example of Applications
Consider the following model
$$X_k = 0.5\, X_{k-1} + 25\, \frac{X_{k-1}}{1 + X_{k-1}^2} + 8 \cos(1.2\, k) + V_k = \varphi(X_{k-1}) + V_k, \qquad Y_k = \frac{X_k^2}{20} + W_k,$$
where $X_1 \sim \mathcal{N}(0, 1)$, $V_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 2.5^2)$ and $W_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)$.
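A short sketch simulating this benchmark model, so the algorithms above can be run on synthetic data; the horizon and seed are illustrative choices.

```python
import numpy as np

def simulate_benchmark(n=100, seed=4):
    """Simulate the benchmark nonlinear model from the example slide."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0)                              # X_1 ~ N(0, 1)
    xs = np.empty(n); ys = np.empty(n)
    for k in range(1, n + 1):
        if k > 1:
            x = (0.5 * x + 25 * x / (1 + x**2)
                 + 8 * np.cos(1.2 * k) + 2.5 * rng.standard_normal())
        xs[k - 1] = x
        ys[k - 1] = x**2 / 20 + rng.standard_normal()     # Y_k = X_k^2 / 20 + W_k
    return xs, ys

x_bench, y_bench = simulate_benchmark()
```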

Figure: Histogram of the log importance weights $\log p\big(y_{1:100} \mid X_{1:100}^{(i)}\big)$. The approximation is dominated by one single particle.

Summary
SIS is an attractive idea: it is sequential and parallelizable, and it reduces the design of a high-dimensional proposal to the design of a sequence of low-dimensional proposals.
However, SIS can only work for moderate-size problems.
Is there a way to partially fix this problem?

Resampling
Problem: As $n$ increases, the variance of $p\big(y_{1:n} \mid X_{1:n}^{(i)}\big)$ increases and all the mass concentrates on a few random samples/particles: $W_n^{(i_0)} \approx 1$ and $W_n^{(i)} \approx 0$ for $i \neq i_0$, so that
$$\widehat{p}(x_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_{1:n}^{(i)}}(x_{1:n}) \approx \delta_{X_{1:n}^{(i_0)}}(x_{1:n}).$$
Intuitive KEY idea: kill, in a principled way, the particles with low weights $W_n^{(i)}$ (relative to $1/N$) and multiply the particles with high weights $W_n^{(i)}$ (relative to $1/N$).
Rationale: If a particle at time $n$ has a low weight, then typically it will still have a low weight at time $n+1$ (though I can easily give you a counterexample), and you want to focus your computational efforts on the promising parts of the space.

At time $n$, IS provides the following approximation of $p(x_{1:n} \mid y_{1:n})$:
$$\widehat{p}(x_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_{1:n}^{(i)}}(x_{1:n}).$$
The simplest resampling scheme consists of sampling $N$ times $\widetilde{X}_{1:n}^{(i)} \sim \widehat{p}(x_{1:n} \mid y_{1:n})$ to build the new approximation
$$\widetilde{p}(x_{1:n} \mid y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\widetilde{X}_{1:n}^{(i)}}(x_{1:n}).$$
The new resampled particles $\big\{\widetilde{X}_{1:n}^{(i)}\big\}$ are approximately distributed according to $p(x_{1:n} \mid y_{1:n})$ but are statistically dependent, which makes them theoretically much more difficult to study.
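A minimal sketch of this multinomial resampling step; the function signature is an illustrative choice.

```python
import numpy as np

def multinomial_resample(particles, weights, rng):
    """Draw N ancestor indices with probabilities given by the normalized
    weights; return the resampled particles with uniform weights 1/N."""
    N = len(weights)
    idx = rng.choice(N, size=N, p=weights)   # sample N times from \hat p
    return particles[idx], np.full(N, 1.0 / N)
```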

Sequential Importance Sampling Resampling Algorithm
At time 1:
Sample $N$ particles $X_1^{(i)} \sim \mu(x_1)$ and compute $W_1^{(i)} \propto g\big(y_1 \mid X_1^{(i)}\big)$.
Resample $\big\{X_1^{(i)}, W_1^{(i)}\big\}$ to obtain new particles, also denoted $\big\{X_1^{(i)}\big\}$.
At time $n$, $n \geq 2$:
Sample $N$ particles $X_n^{(i)} \sim f\big(x_n \mid X_{n-1}^{(i)}\big)$ and compute $W_n^{(i)} \propto g\big(y_n \mid X_n^{(i)}\big)$.
Resample $\big\{X_{1:n}^{(i)}, W_n^{(i)}\big\}$ to obtain new particles, also denoted $\big\{X_{1:n}^{(i)}\big\}$.
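Putting the pieces together, here is a minimal bootstrap particle filter sketch for the stochastic volatility model: prior proposal, weighting by g, and multinomial resampling at every step. It also accumulates the log of the marginal likelihood estimate discussed on the next slide. The parameter defaults remain the illustrative values used earlier.

```python
import numpy as np
from scipy.stats import norm

def bootstrap_pf_sv(y, N=1000, phi=0.95, sigma=0.3, beta=0.7, seed=5):
    """Bootstrap particle filter (SIS + resampling) for the SV model.
    Returns filtering mean estimates E[X_n | y_{1:n}] and log \hat p(y_{1:n})."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma / np.sqrt(1 - phi**2), size=N)
    filt_means = []
    log_evidence = 0.0
    for n, y_n in enumerate(y):
        if n > 0:
            x = phi * x + sigma * rng.standard_normal(N)                 # propagate: X_n ~ f(. | X_{n-1})
        logw = norm.logpdf(y_n, loc=0.0, scale=beta * np.exp(x / 2))     # weight: g(y_n | X_n)
        m = logw.max()
        w = np.exp(logw - m)
        log_evidence += m + np.log(w.mean())   # log \hat p(y_n | y_{1:n-1}) = log of mean weight
        W = w / w.sum()
        filt_means.append((W * x).sum())       # estimate of E[X_n | y_{1:n}]
        x = x[rng.choice(N, size=N, p=W)]      # multinomial resampling
    return np.array(filt_means), log_evidence

# means, logZ = bootstrap_pf_sv(y_obs); print(logZ)
```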

We also have
$$p(y_n \mid y_{1:n-1}) = \int g(y_n \mid x_n)\, f(x_n \mid x_{n-1})\, p(x_{n-1} \mid y_{1:n-1})\, dx_{n-1:n},$$
so
$$\widehat{p}(y_n \mid y_{1:n-1}) = \frac{1}{N} \sum_{i=1}^{N} g\big(y_n \mid X_n^{(i)}\big).$$
Perhaps surprisingly, it can be shown that if we define
$$\widehat{p}(y_{1:n}) = \widehat{p}(y_1) \prod_{k=2}^{n} \widehat{p}(y_k \mid y_{1:k-1}),$$
then $\mathbb{E}\left[\widehat{p}(y_{1:n})\right] = p(y_{1:n})$.

Example (cont.)
Consider again the following model
$$X_k = 0.5\, X_{k-1} + 25\, \frac{X_{k-1}}{1 + X_{k-1}^2} + 8 \cos(1.2\, k) + V_k, \qquad Y_k = \frac{X_k^2}{20} + W_k,$$
where $X_1 \sim \mathcal{N}(0, 1)$, $V_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 2.5^2)$ and $W_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)$.

Advanced SMC Methods
I have presented the most basic algorithm. In practice, practitioners often select an IS distribution $q(x_n \mid y_n, x_{n-1}) \neq f(x_n \mid x_{n-1})$. In such cases, the weights become
$$W_n^{(i)} \propto \frac{f\big(X_n^{(i)} \mid X_{n-1}^{(i)}\big)\, g\big(y_n \mid X_n^{(i)}\big)}{q\big(X_n^{(i)} \mid y_n, X_{n-1}^{(i)}\big)}.$$
Better resampling steps have been developed.
Variance reduction techniques can also be developed.
SMC methods can be used to sample from virtually any sequence of distributions.