Exact Inference: Factor Graphs through Max-Sum Algorithm
Geoffrey Roeder, roeder@cs.toronto.edu
8 February 2018
Figures from Bishop PRML Sec. 8.3/8.4
Building Blocks: UGMs, Cliques, Factor Graphs
Markov Random Fields / UGMs
Parameterization: maximal cliques
[Figure: undirected graph over $x_1, x_2, x_3, x_4$.]
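Example: Equivalent DGM and UGM
[Figure: a directed chain $x_1 \to x_2 \to \cdots \to x_{N-1} \to x_N$ and the undirected chain over the same nodes.]
$$p(\mathbf{x}) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_2) \cdots p(x_N \mid x_{N-1})$$
Convert this to an undirected graph representation:
$$p(\mathbf{x}) = \frac{1}{Z}\,\psi_{1,2}(x_1, x_2)\,\psi_{2,3}(x_2, x_3) \cdots \psi_{N-1,N}(x_{N-1}, x_N)$$
$$\begin{aligned}
\psi_{1,2}(x_1, x_2) &= p(x_1)\,p(x_2 \mid x_1) \\
\psi_{2,3}(x_2, x_3) &= p(x_3 \mid x_2) \\
&\;\;\vdots \\
\psi_{N-1,N}(x_{N-1}, x_N) &= p(x_N \mid x_{N-1})
\end{aligned}$$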
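As a small numerical check of this conversion, the sketch below builds a directed chain with random conditional tables, absorbs $p(x_1)$ into the first potential, and confirms that the product of potentials already sums to one. The sizes `K` and `N` and the helper names are illustrative assumptions, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 3, 4  # hypothetical sizes: K states per variable, N variables


def random_conditional(rows, cols):
    """Random table whose rows each sum to 1, read as p(column | row)."""
    t = rng.random((rows, cols))
    return t / t.sum(axis=1, keepdims=True)


p_x1 = random_conditional(1, K)[0]                       # p(x_1)
cond = [random_conditional(K, K) for _ in range(N - 1)]  # cond[i][a, b] = p(x_{i+2}=b | x_{i+1}=a)

# Absorb p(x_1) into the first potential; the remaining potentials are the conditionals.
psi = [c.copy() for c in cond]
psi[0] = p_x1[:, None] * cond[0]


def chain_product(tables):
    """Multiply pairwise tables into one N-dimensional array."""
    out = tables[0]
    for t in tables[1:]:
        out = out[..., None] * t  # new last axis; t lines up with the last two axes
    return out


joint_from_psi = chain_product(psi)
Z = joint_from_psi.sum()  # equals 1 up to rounding, since the potentials are probabilities
```

Because every potential here is built from normalized conditionals, the partition function needs no separate computation in this particular example.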
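Conversion: Moralization (Marry the Parents of Every Child)
[Figure: (a) directed graph in which $x_1, x_2, x_3$ are parents of $x_4$; (b) the corresponding moralized undirected graph.]
$$p(\mathbf{x}) = p(x_1)\,p(x_2)\,p(x_3)\,p(x_4 \mid x_1, x_2, x_3)$$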
DGMs and UGMs represent distinct distributions
[Figures: a directed graph over A, B, C; an undirected graph over A, B, C, D; a Venn diagram of the sets P, D, and U.]
Motivations: Exact Inference in a Chain
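Query probability of a configuration for node X_n: p(x_n)
[Figure: the directed and undirected chains over $x_1, x_2, \ldots, x_{N-1}, x_N$.]
$$p(\mathbf{x}) = \frac{1}{Z}\,\psi_{1,2}(x_1, x_2)\,\psi_{2,3}(x_2, x_3) \cdots \psi_{N-1,N}(x_{N-1}, x_N)$$
$$p(x_n) = \sum_{x_1} \cdots \sum_{x_{n-1}} \sum_{x_{n+1}} \cdots \sum_{x_N} p(\mathbf{x})$$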
Query probability of a configuration for node X_n: p(x_n)
Naively: with N variables and K states per variable, what is the computational complexity of
$$p(x_n) = \sum_{x_1} \cdots \sum_{x_{n-1}} \sum_{x_{n+1}} \cdots \sum_{x_N} p(\mathbf{x})\,?$$
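To make the naive cost concrete, the following sketch materializes the full joint table, which has $K^N$ entries, and sums out every axis except the query node, so the work grows exponentially in $N$. The sizes and the random potentials are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, n = 3, 6, 2  # hypothetical sizes; n is the 0-based index of the query node

# psi[i] couples node i and node i+1 (0-based); random positive tables for illustration.
psi = [rng.random((K, K)) for _ in range(N - 1)]

# Build the full unnormalized joint: one axis per variable, K**N entries in total.
joint = psi[0]
for t in psi[1:]:
    joint = joint[..., None] * t

# Sum out every variable except the query node, then normalize.
other_axes = tuple(a for a in range(N) if a != n)
unnormalized = joint.sum(axis=other_axes)
p_n_naive = unnormalized / unnormalized.sum()

print(joint.size)  # 729 = K**N
```

Adding one more variable multiplies the table size by `K`, which is what makes this approach impractical for long chains.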
Query probability of a configuration for node X_n: p(x_n)
We ignored the conditional independence! Notice that $x_N$ appears only in the last potential, so its summation involves a single term:
$$\sum_{x_N} \psi_{N-1,N}(x_{N-1}, x_N)$$
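Be clever about order of computation:
$$\begin{aligned}
p(x_n) = \frac{1}{Z}\;
&\underbrace{\left[\sum_{x_{n-1}} \psi_{n-1,n}(x_{n-1}, x_n) \cdots \left[\sum_{x_2} \psi_{2,3}(x_2, x_3) \left[\sum_{x_1} \psi_{1,2}(x_1, x_2)\right]\right] \cdots \right]}_{\mu_\alpha(x_n)} \\
&\underbrace{\left[\sum_{x_{n+1}} \psi_{n,n+1}(x_n, x_{n+1}) \cdots \left[\sum_{x_N} \psi_{N-1,N}(x_{N-1}, x_N)\right] \cdots \right]}_{\mu_\beta(x_n)}
\end{aligned} \qquad (8.52)$$
$$p(x_n) = \frac{1}{Z}\,\mu_\alpha(x_n)\,\mu_\beta(x_n)$$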
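Be clever about order of computation:
[Figure: chain $x_1, \ldots, x_{n-1}, x_n, x_{n+1}, \ldots, x_N$ with forward messages $\mu_\alpha(x_{n-1})$, $\mu_\alpha(x_n)$ and backward messages $\mu_\beta(x_n)$, $\mu_\beta(x_{n+1})$.]
$$p(x_n) = \frac{1}{Z}\,\mu_\alpha(x_n)\,\mu_\beta(x_n)$$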
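The forward and backward recursions can be sketched in a few lines of NumPy. This is a minimal illustration, assuming every variable has the same number of states and that `psi[i]` is a table indexed by (node `i`, node `i+1`) with 0-based nodes; the function names are hypothetical.

```python
import numpy as np


def chain_messages(psi):
    """Return (alpha, beta), where alpha[i] and beta[i] play the role of
    mu_alpha and mu_beta at 0-based node i."""
    N = len(psi) + 1
    K = psi[0].shape[0]
    alpha = [np.ones(K) for _ in range(N)]  # alpha[0] stays all ones
    beta = [np.ones(K) for _ in range(N)]   # beta[N-1] stays all ones
    for i in range(1, N):
        alpha[i] = psi[i - 1].T @ alpha[i - 1]  # sums over the previous node
    for i in range(N - 2, -1, -1):
        beta[i] = psi[i] @ beta[i + 1]          # sums over the next node
    return alpha, beta


def chain_marginal(psi, n):
    """Marginal of node n as the normalized product of its two messages."""
    alpha, beta = chain_messages(psi)
    unnormalized = alpha[n] * beta[n]
    return unnormalized / unnormalized.sum()  # the sum is the partition function Z
```

Each message update is one matrix-vector product, so a full pass costs on the order of $N K^2$ operations, and $Z$ is obtained by summing the unnormalized product at any single node.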
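We get joint marginals over variables, too:
[Figure: chain $x_1, \ldots, x_{n-1}, x_n, x_{n+1}, \ldots, x_N$ with forward messages $\mu_\alpha(x_{n-1})$, $\mu_\alpha(x_n)$ and backward messages $\mu_\beta(x_n)$, $\mu_\beta(x_{n+1})$.]
$$p(x_{n-1}, x_n) = \frac{1}{Z}\,\mu_\alpha(x_{n-1})\,\psi_{n-1,n}(x_{n-1}, x_n)\,\mu_\beta(x_n)$$
... obtain the joint distributions over all of the sets of variables ...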
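The pairwise formula can be checked with a short sketch. It assumes equal state counts for all variables and the indexing convention that `psi[i]` couples 0-based nodes `i` and `i+1`; `pairwise_marginal` is a hypothetical helper name.

```python
import numpy as np


def pairwise_marginal(psi, n):
    """Joint marginal of 0-based nodes (n-1, n), for n >= 1, computed as
    mu_alpha(x_{n-1}) * psi_{n-1,n}(x_{n-1}, x_n) * mu_beta(x_n) / Z."""
    K = psi[0].shape[0]
    alpha = np.ones(K)
    for t in psi[: n - 1]:        # forward pass up to node n-1
        alpha = t.T @ alpha
    beta = np.ones(K)
    for t in reversed(psi[n:]):   # backward pass down to node n
        beta = t @ beta
    unnormalized = alpha[:, None] * psi[n - 1] * beta[None, :]
    return unnormalized / unnormalized.sum()
```

Summing the returned table over its first axis recovers the single-node marginal of node `n`, which gives a quick consistency check.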
Factor Graph Review
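[Figure: factor graph with variable nodes $x_1, x_2, x_3$ and factor nodes $f_a, f_b, f_c, f_d$.]
$$p(\mathbf{x}) = \prod_s f_s(\mathbf{x}_s)$$
$$p(\mathbf{x}) = f_a(x_1, x_2)\,f_b(x_1, x_2)\,f_c(x_2, x_3)\,f_d(x_3)$$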
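[Figure: three panels over $x_1, x_2, x_3$: (a) a directed graph, (b) a factor graph with a single factor $f$, (c) a factor graph with factors $f_a, f_b, f_c$.]
(a) $p(x_1)\,p(x_2)\,p(x_3 \mid x_1, x_2)$
(b) $f(x_1, x_2, x_3) = p(x_1)\,p(x_2)\,p(x_3 \mid x_1, x_2)$
(c) $f_a(x_1) = p(x_1)$, $f_b(x_2) = p(x_2)$, $f_c(x_1, x_2, x_3) = p(x_3 \mid x_1, x_2)$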
[Figure: three panels (a), (b), (c), each over $x_1, x_2, x_3$; the panel labels include $f(x_1, x_2, x_3)$ and $f_a, f_b, f_c$.]
Sum-Product Algorithm: Generalize Exact Inference in Chains to Tree-Structured PGMs
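Problem setup: notation
[Figure: fragment of a factor graph in which the subtree $F_s(x, X_s)$ connects to the variable node $x$ through the factor node $f_s$, sending the message $\mu_{f_s \to x}(x)$.]
$$p(x) = \sum_{\mathbf{x} \setminus x} p(\mathbf{x})$$
$$p(\mathbf{x}) = \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s)$$
$\mathrm{ne}(x)$: set of factor nodes that are neighbours of $x$
$X_s$: set of all variables in the subtree connected to the variable node $x$ via factor node $f_s$
$F_s(x, X_s)$: the product of all the factors in the group associated with factor $f_s$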
Problem setup: notation
$$p(x) = \prod_{s \in \mathrm{ne}(x)} \left[\sum_{X_s} F_s(x, X_s)\right] = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x)$$
Problem setup: notation
We evaluate the marginal $p(x)$ as the product of the messages from the surrounding factors:
$$p(x) = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x)$$
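Factor messages: decomposition
$$p(x) = \prod_{s \in \mathrm{ne}(x)} \left[\sum_{X_s} F_s(x, X_s)\right] = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x)$$
Each factor is itself described by a factor sub-graph, so we can decompose:
$$F_s(x, X_s) = f_s(x, x_1, \ldots, x_M)\,G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM})$$
(The variables associated with $f_s$ are $\{x, x_1, \ldots, x_M\}$.)
Rewriting the factor-to-variable message:
$$\begin{aligned}
\mu_{f_s \to x}(x) &= \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \left[\sum_{X_{sm}} G_m(x_m, X_{sm})\right] \\
&= \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)
\end{aligned} \qquad (8.66)$$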
Factor-to-variable messages: decomposition
The messages from the neighbouring variable nodes to the factor node $f_s$ are defined as
$$\mu_{x_m \to f_s}(x_m) \equiv \sum_{X_{sm}} G_m(x_m, X_{sm})$$
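Variable-to-factor messages: decomposition
[Figure: factor node $f_s$ with neighbouring variable nodes $x$, $x_m$, and $x_M$; the subtree $G_m(x_m, X_{sm})$ hangs off $x_m$; incoming messages $\mu_{x_m \to f_s}(x_m)$ and $\mu_{x_M \to f_s}(x_M)$; outgoing message $\mu_{f_s \to x}(x)$.]
$$\mu_{x_m \to f_s}(x_m) \equiv \sum_{X_{sm}} G_m(x_m, X_{sm})$$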
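Factor-to-variable messages: one step back towards the leaves
[Figure: variable node $x_m$ with neighbouring factor nodes $f_s$, $f_l$, and $f_L$; the subtree $F_l(x_m, X_{ml})$ hangs off $f_l$.]
$$F_s(x, X_s) = f_s(x, x_1, \ldots, x_M)\,G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM})$$
$$G_m(x_m, X_{sm}) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} F_l(x_m, X_{ml})$$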
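Factor-to-variable messages: one step back towards the leaves
[Figure: variable node $x_m$ with neighbouring factor nodes $f_s$, $f_l$, and $f_L$; the subtree $F_l(x_m, X_{ml})$ hangs off $f_l$.]
$$\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \left[\sum_{X_{ml}} F_l(x_m, X_{ml})\right] = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)$$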
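Sum-Product Initialization at Leaves
Leaf variable node: $\mu_{x \to f}(x) = 1$
Leaf factor node: $\mu_{f \to x}(x) = f(x)$
Sum-Product: Marginal distribution over x
$$p(x) = \prod_{s \in \mathrm{ne}(x)} \left[\sum_{X_s} F_s(x, X_s)\right] = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x)$$
See Bishop p. 409 for a fully worked, simple example!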
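The two message types and the leaf initializations can be put together in a short recursive sketch. It is an illustration under stated assumptions rather than a reference implementation: the factor graph must be a tree (the recursion does not terminate on graphs with loops), factors are stored as a dictionary from a name to a (scope, table) pair, and all names below are hypothetical. Messages are recomputed on every call instead of being cached.

```python
import numpy as np


def sum_product_marginal(factors, card, query):
    """Marginal of `query` on a tree-structured factor graph.

    factors: dict name -> (tuple of variable names, table with one axis per variable)
    card:    dict variable name -> number of states
    """

    def var_to_factor(x, f):
        # Product of messages from all neighbouring factors other than f.
        # A leaf variable has no other neighbours, so the message is all ones.
        msg = np.ones(card[x])
        for g, (scope, _) in factors.items():
            if g != f and x in scope:
                msg = msg * factor_to_var(g, x)
        return msg

    def factor_to_var(f, x):
        # Multiply the factor by the incoming variable messages,
        # then sum out every variable except x.
        scope, table = factors[f]
        result = table
        for axis, v in enumerate(scope):
            if v != x:
                shape = [1] * len(scope)
                shape[axis] = card[v]
                result = result * var_to_factor(v, f).reshape(shape)
        other = tuple(a for a in range(len(scope)) if a != scope.index(x))
        # A leaf factor f(x) has nothing to sum out, so the message is f(x) itself.
        return result.sum(axis=other) if other else result

    # Passing f=None multiplies the messages from all neighbouring factors.
    unnormalized = var_to_factor(query, None)
    return unnormalized / unnormalized.sum()


# Small tree with random positive tables: fa(x1, x2), fb(x2, x3), fc(x2, x4).
rng = np.random.default_rng(0)
card = {"x1": 2, "x2": 3, "x3": 2, "x4": 4}
factors = {
    "fa": (("x1", "x2"), rng.random((2, 3))),
    "fb": (("x2", "x3"), rng.random((3, 2))),
    "fc": (("x2", "x4"), rng.random((3, 4))),
}
p_x2 = sum_product_marginal(factors, card, "x2")
```

A practical implementation would usually schedule messages from the leaves to a root and back, storing each one, so that all marginals are available after two passes.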