Exact Inference: Factor Graphs through the Max-Sum Algorithm. Figures from Bishop, PRML, Sec. 8.3/8.4.


Exact Inference: Factor Graphs through the Max-Sum Algorithm
Geoffrey Roeder (roeder@cs.toronto.edu), 8 February 2018
Figures from Bishop, PRML, Sec. 8.3/8.4

Building Blocks: UGMs, Cliques, Factor Graphs

Markov Random Fields / UGMs. Parameterization: maximal cliques. (Figure: an undirected graph over $x_1, x_2, x_3, x_4$.)
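For reference (not spelled out on the slide, but standard; cf. Bishop Sec. 8.3), the maximal-clique parameterization writes the joint as a normalized product of non-negative potential functions, one per maximal clique $C$:

$$p(\mathbf{x}) = \frac{1}{Z} \prod_{C} \psi_C(\mathbf{x}_C), \qquad Z = \sum_{\mathbf{x}} \prod_{C} \psi_C(\mathbf{x}_C).$$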

Example: Equivalent DGM and UGM. (Figure: the chain $x_1 - x_2 - \cdots - x_{N-1} - x_N$ drawn as a directed graph and as an undirected graph.)

$$p(\mathbf{x}) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_2) \cdots p(x_N \mid x_{N-1}).$$

Converting this to an undirected graph representation,

$$p(\mathbf{x}) = \frac{1}{Z}\,\psi_{1,2}(x_1, x_2)\,\psi_{2,3}(x_2, x_3) \cdots \psi_{N-1,N}(x_{N-1}, x_N),$$

where
$$\psi_{1,2}(x_1, x_2) = p(x_1)\,p(x_2 \mid x_1), \qquad \psi_{2,3}(x_2, x_3) = p(x_3 \mid x_2), \qquad \ldots, \qquad \psi_{N-1,N}(x_{N-1}, x_N) = p(x_N \mid x_{N-1}).$$

Conversion: Moralization (Marry the Parents of Every Child). (Figure: (a) a directed graph in which $x_4$ has parents $x_1, x_2, x_3$; (b) the moralized undirected graph.)

$$p(\mathbf{x}) = p(x_1)\,p(x_2)\,p(x_3)\,p(x_4 \mid x_1, x_2, x_3).$$
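A minimal sketch of the moralization step itself (the function name and dictionary format are our own; the node names follow the slide's example): connect every pair of parents of each child, then drop the edge directions.

```python
def moralize(parents):
    """Moralize a DAG: 'marry' all parents of each child, then drop edge
    directions. `parents` maps each node to a list of its parents.
    Returns the undirected edge set of the moral graph."""
    edges = set()
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))      # original edges, undirected
        for i, a in enumerate(ps):                # marry every pair of parents
            for b in ps[i + 1:]:
                edges.add(frozenset((a, b)))
    return edges

# The slide's example: x4 has parents x1, x2, x3.
print(moralize({"x1": [], "x2": [], "x3": [], "x4": ["x1", "x2", "x3"]}))
```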

DGMs and UGMs can represent distinct sets of distributions. (Figure: example graphs over $A, B, C, D$, and a Venn diagram showing the set $D$ of distributions representable by directed graphs and the set $U$ representable by undirected graphs, both inside the set $P$ of all distributions.)

Motivations: Exact Inference in a Chain

Query the marginal probability of node $x_n$: $p(x_n)$

(Figure: the chain $x_1 - x_2 - \cdots - x_{N-1} - x_N$.)

$$p(\mathbf{x}) = \frac{1}{Z}\,\psi_{1,2}(x_1, x_2)\,\psi_{2,3}(x_2, x_3) \cdots \psi_{N-1,N}(x_{N-1}, x_N).$$

$$p(x_n) = \sum_{x_1} \cdots \sum_{x_{n-1}} \sum_{x_{n+1}} \cdots \sum_{x_N} p(\mathbf{x}).$$

Naively, with $N$ variables and $K$ states per variable, this sum has $K^{N-1}$ terms, so the cost of evaluating the marginal grows exponentially with the length of the chain, $O(K^N)$.

But we ignored the conditional independence structure! Notice that only the final potential depends on $x_N$, so the innermost summation $\sum_{x_N} \psi_{N-1,N}(x_{N-1}, x_N)$ can be carried out first, leaving a function of $x_{N-1}$ alone.

Be clever about the order of computation:

$$p(x_n) = \frac{1}{Z}\,\underbrace{\Bigg[\sum_{x_{n-1}} \psi_{n-1,n}(x_{n-1}, x_n) \cdots \Big[\sum_{x_2} \psi_{2,3}(x_2, x_3) \Big[\sum_{x_1} \psi_{1,2}(x_1, x_2)\Big]\Big] \cdots \Bigg]}_{\mu_\alpha(x_n)}\,\underbrace{\Bigg[\sum_{x_{n+1}} \psi_{n,n+1}(x_n, x_{n+1}) \cdots \Big[\sum_{x_N} \psi_{N-1,N}(x_{N-1}, x_N)\Big] \cdots \Bigg]}_{\mu_\beta(x_n)} \qquad (8.52)$$

$$p(x_n) = \frac{1}{Z}\,\mu_\alpha(x_n)\,\mu_\beta(x_n)$$

Be clever about the order of computation:

(Figure: the chain $x_1, \ldots, x_{n-1}, x_n, x_{n+1}, \ldots, x_N$ with forward messages $\mu_\alpha(x_{n-1}) \to \mu_\alpha(x_n)$ passed from left to right and backward messages $\mu_\beta(x_{n+1}) \to \mu_\beta(x_n)$ passed from right to left.)

$$p(x_n) = \frac{1}{Z}\,\mu_\alpha(x_n)\,\mu_\beta(x_n)$$
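To make the recursion concrete, here is a minimal NumPy sketch (not from the slides; the function name, calling convention, and toy potentials are illustrative assumptions) that computes every node marginal of a pairwise chain with one forward and one backward pass, at $O(NK^2)$ cost rather than $O(K^N)$.

```python
import numpy as np

def chain_marginals(psis):
    """All node marginals of p(x) = (1/Z) * prod_n psi_{n,n+1}(x_n, x_{n+1}).

    psis: list of N-1 arrays; psis[n][i, j] = psi_{n,n+1}(x_n = i, x_{n+1} = j).
    Returns a list of N normalized marginal distributions p(x_n).
    """
    N = len(psis) + 1
    mu_alpha = [np.ones(psis[0].shape[0])]        # forward messages mu_alpha(x_n)
    for n in range(N - 1):
        mu_alpha.append(psis[n].T @ mu_alpha[n])  # sum out x_n
    mu_beta = [None] * N                          # backward messages mu_beta(x_n)
    mu_beta[N - 1] = np.ones(psis[-1].shape[1])
    for n in range(N - 2, -1, -1):
        mu_beta[n] = psis[n] @ mu_beta[n + 1]     # sum out x_{n+1}
    marginals = []
    for n in range(N):
        unnorm = mu_alpha[n] * mu_beta[n]         # p(x_n) is proportional to mu_alpha * mu_beta
        marginals.append(unnorm / unnorm.sum())   # normalizing plays the role of 1/Z
    return marginals

# Toy example: four binary variables with arbitrary positive pairwise potentials.
rng = np.random.default_rng(0)
psis = [rng.random((2, 2)) + 0.1 for _ in range(3)]
print(chain_marginals(psis))
```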

We get joint marginals over neighbouring variables, too:

(Figure: the same chain with messages $\mu_\alpha(x_{n-1}), \mu_\alpha(x_n)$ and $\mu_\beta(x_n), \mu_\beta(x_{n+1})$.)

$$p(x_{n-1}, x_n) = \frac{1}{Z}\,\mu_\alpha(x_{n-1})\,\psi_{n-1,n}(x_{n-1}, x_n)\,\mu_\beta(x_n).$$

With one forward and one backward pass we obtain the joint distributions over all of the sets of variables appearing in each potential.

Factor Graph Review

(Figure: a factor graph with variable nodes $x_1, x_2, x_3$ and factor nodes $f_a, f_b, f_c, f_d$.)

$$p(\mathbf{x}) = \prod_{s} f_s(\mathbf{x}_s)$$

$$p(\mathbf{x}) = f_a(x_1, x_2)\,f_b(x_1, x_2)\,f_c(x_2, x_3)\,f_d(x_3)$$
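As a deliberately naive baseline for this example (the tables and names below are illustrative assumptions), each factor can be stored as an array over binary variables and a marginal such as $p(x_2)$ recovered by brute-force summation; the sum-product algorithm developed next obtains the same answer without enumerating every joint configuration.

```python
import numpy as np
from itertools import product

K = 2                                   # binary variables x1, x2, x3
rng = np.random.default_rng(0)
fa = rng.random((K, K))                 # f_a(x1, x2)
fb = rng.random((K, K))                 # f_b(x1, x2)
fc = rng.random((K, K))                 # f_c(x2, x3)
fd = rng.random(K)                      # f_d(x3)

# Unnormalized joint f_a * f_b * f_c * f_d, marginalized onto x2 by brute force.
p_x2 = np.zeros(K)
for x1, x2, x3 in product(range(K), repeat=3):
    p_x2[x2] += fa[x1, x2] * fb[x1, x2] * fc[x2, x3] * fd[x3]
p_x2 /= p_x2.sum()                      # normalizing divides out Z
print(p_x2)
```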

(Figure: (a) a directed graph over $x_1, x_2, x_3$; (b) a factor graph with a single factor $f$; (c) a factor graph with factors $f_a, f_b, f_c$.)

(a) $p(x_1)\,p(x_2)\,p(x_3 \mid x_1, x_2)$
(b) $f(x_1, x_2, x_3) = p(x_1)\,p(x_2)\,p(x_3 \mid x_1, x_2)$
(c) $f_a(x_1) = p(x_1), \quad f_b(x_2) = p(x_2), \quad f_c(x_1, x_2, x_3) = p(x_3 \mid x_1, x_2)$

(Figure: three graphs over $x_1, x_2, x_3$: (a) the original graph, (b) a factor graph with a single factor $f(x_1, x_2, x_3)$, and (c) a factor graph with factors $f_a, f_b, f_c$.)

Sum-Product Algorithm: Generalizing Exact Inference in Chains to Tree-Structured PGMs

Problem setup: notation

(Figure: a fragment of a factor graph showing factor node $f_s$, variable node $x$, the message $\mu_{f_s \to x}(x)$, and the subtree $F_s(x, X_s)$ hanging off $f_s$.)

$$p(x) = \sum_{\mathbf{x} \setminus x} p(\mathbf{x}), \qquad p(\mathbf{x}) = \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s),$$

$$p(x) = \prod_{s \in \mathrm{ne}(x)} \Bigg[\sum_{X_s} F_s(x, X_s)\Bigg] = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x).$$

- $\mathrm{ne}(x)$: the set of factor nodes that are neighbours of $x$
- $X_s$: the set of all variables in the subtree connected to the variable node $x$ via factor node $f_s$
- $F_s(x, X_s)$: the product of all the factors in the group associated with factor $f_s$

We evaluate the marginal $p(x)$ as a product of messages from the surrounding factors!

Factor-to-variable messages: decomposition

$$p(x) = \prod_{s \in \mathrm{ne}(x)} \Bigg[\sum_{X_s} F_s(x, X_s)\Bigg] = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x).$$

Each factor is itself described by a factor sub-graph, so we can decompose:

$$F_s(x, X_s) = f_s(x, x_1, \ldots, x_M)\,G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM})$$

(The set of variables associated with $f_s$ is $\{x, x_1, \ldots, x_M\}$.)

Rewriting the factor-to-variable message:

$$\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \Bigg[\sum_{X_{sm}} G_m(x_m, X_{sm})\Bigg] = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m), \qquad (8.66)$$

where $\mu_{x_m \to f_s}(x_m) \equiv \sum_{X_{sm}} G_m(x_m, X_{sm})$.
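A minimal NumPy sketch of the factor-to-variable update in Eq. (8.66) (the function name and calling convention are assumptions, not from the slides): multiply the factor table by every incoming variable-to-factor message and sum out all variables except the target.

```python
import numpy as np

def factor_to_variable_message(factor, incoming, target_axis):
    """Compute mu_{f -> x}(x): multiply `factor` by each incoming
    variable-to-factor message (keyed by the axis of its variable),
    then sum out every axis except `target_axis`."""
    result = factor.astype(float).copy()
    for axis, msg in incoming.items():
        shape = [1] * result.ndim
        shape[axis] = len(msg)
        result = result * msg.reshape(shape)        # attach the message to its axis
    other_axes = tuple(a for a in range(result.ndim) if a != target_axis)
    return result.sum(axis=other_axes)

# Toy usage: f_s(x, x1, x2) over binary variables, with messages from x1 and x2.
rng = np.random.default_rng(0)
f_s = rng.random((2, 2, 2))
mu_x1, mu_x2 = np.array([0.4, 0.6]), np.array([0.9, 0.1])
print(factor_to_variable_message(f_s, {1: mu_x1, 2: mu_x2}, target_axis=0))
```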

Variable-to-factor messages: decomposition

(Figure: factor node $f_s$ with neighbouring variable nodes $x_m, \ldots, x_M$, incoming messages such as $\mu_{x_M \to f_s}(x_M)$, subtrees $G_m(x_m, X_{sm})$ hanging off each $x_m$, and the outgoing message $\mu_{f_s \to x}(x)$.)

$$\mu_{x_m \to f_s}(x_m) \equiv \sum_{X_{sm}} G_m(x_m, X_{sm}).$$

Factor-to-variable messages: one step back towards the leaves

(Figure: variable node $x_m$ with neighbouring factor nodes $f_s, f_l, \ldots, f_L$ and subtrees $F_l(x_m, X_{ml})$.)

$$F_s(x, X_s) = f_s(x, x_1, \ldots, x_M)\,G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM})$$

$$G_m(x_m, X_{sm}) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} F_l(x_m, X_{ml})$$

Factor-to-variable messages: one step back towards the leaves (continued)

$$\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \Bigg[\sum_{X_{ml}} F_l(x_m, X_{ml})\Bigg] = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)$$
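The resulting variable-to-factor update is just an elementwise product of the incoming factor-to-variable messages from the other neighbouring factors; a minimal sketch (names are ours):

```python
import numpy as np

def variable_to_factor_message(incoming, num_states):
    """Compute mu_{x_m -> f_s}(x_m): the elementwise product of the messages
    mu_{f_l -> x_m} from every neighbouring factor f_l except f_s.
    A leaf variable (no other neighbours) sends the all-ones message."""
    msg = np.ones(num_states)
    for m in incoming:
        msg = msg * m
    return msg

# Example: x_m has two other neighbouring factors sending these messages.
print(variable_to_factor_message([np.array([0.2, 0.8]), np.array([0.5, 0.5])], num_states=2))
```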

Sum-Product Initialization at Leaves

$$\mu_{x \to f}(x) = 1, \qquad \mu_{f \to x}(x) = f(x)$$

(Figure: a leaf variable node $x$ sending its message to factor node $f$, and a leaf factor node $f$ sending its message to variable node $x$.)

Sum-Product: marginal distribution over $x$

$$p(x) = \prod_{s \in \mathrm{ne}(x)} \Bigg[\sum_{X_s} F_s(x, X_s)\Bigg] = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x).$$

See Bishop p. 409 for a fully worked, simple example!
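Putting the pieces together, here is a minimal end-to-end sketch on a small tree with the same structure as Bishop's worked example, $p(\mathbf{x}) \propto f_a(x_1, x_2)\,f_b(x_2, x_3)\,f_c(x_2, x_4)$ (the numerical tables are randomly generated here, not Bishop's): initialize at the leaves, pass messages inward to $x_2$, and check the marginal against brute-force summation.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
K = 2                             # binary variables x1, x2, x3, x4
fa = rng.random((K, K)) + 0.1     # f_a(x1, x2)
fb = rng.random((K, K)) + 0.1     # f_b(x2, x3)
fc = rng.random((K, K)) + 0.1     # f_c(x2, x4)

# Leaf variables send the all-ones message to their only neighbouring factor.
mu_x1_fa = np.ones(K)
mu_x3_fb = np.ones(K)
mu_x4_fc = np.ones(K)

# Factor-to-variable messages into the root x2: multiply by the incoming
# variable message and sum out the non-target variable.
mu_fa_x2 = (fa * mu_x1_fa[:, None]).sum(axis=0)   # sum over x1
mu_fb_x2 = (fb * mu_x3_fb[None, :]).sum(axis=1)   # sum over x3
mu_fc_x2 = (fc * mu_x4_fc[None, :]).sum(axis=1)   # sum over x4

# Marginal at x2: product of all incoming factor messages, then normalize.
p_x2 = mu_fa_x2 * mu_fb_x2 * mu_fc_x2
p_x2 /= p_x2.sum()

# Brute-force check: sum the unnormalized joint over x1, x3, x4.
brute = np.zeros(K)
for x1, x2, x3, x4 in product(range(K), repeat=4):
    brute[x2] += fa[x1, x2] * fb[x2, x3] * fc[x2, x4]
brute /= brute.sum()

assert np.allclose(p_x2, brute)
print(p_x2)
```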