Introduction to PGMs: Discrete Variables. Sargur Srihari

Introduction to PGMs: Discrete Variables
Sargur Srihari, srihari@cedar.buffalo.edu

Topics
1. What are graphical models (or PGMs)
2. Use of PGMs in Engineering and AI
3. Directionality in graphs
4. Bayesian Networks
5. Generative Models and Sampling
6. Using PGMs with fully Bayesian Models
7. Discrete Case
8. Complexity Issues

Discrete Variables
Graphical models are useful when constructing more complex probability distributions from simpler (exponential family) distributions.
Graphical models have nice properties when each parent-child pair is conjugate.
Two cases are of interest:
Parent and child both correspond to discrete variables.
Parent and child both correspond to Gaussian variables.

Discrete Case for a Single Discrete Variable
Probability distribution for a single discrete variable x having K states, using the 1-of-K representation.
For K = 6, when $x_3 = 1$, x is represented as $x = (0,0,1,0,0,0)^T$.
Note that $\sum_{k=1}^{K} x_k = 1$.
If the probability of $x_k = 1$ is given by the parameter $\mu_k$, then
$p(x \mid \mu) = \prod_{k=1}^{K} \mu_k^{x_k}$, where $\mu = (\mu_1, \ldots, \mu_K)^T$.
The distribution is normalized: $\sum_{k=1}^{K} \mu_k = 1$.
Only K-1 independent values of $\mu_k$ are needed to define the distribution.
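As a minimal sketch of the 1-of-K representation, the following Python evaluates $p(x \mid \mu) = \prod_k \mu_k^{x_k}$ for a one-hot vector. The function names and the parameter values are illustrative assumptions, not part of the slides.

```python
def one_hot(k, K):
    """Return the 1-of-K vector with a 1 in position k (1-indexed)."""
    return [1 if j == k else 0 for j in range(1, K + 1)]


def categorical_prob(x, mu):
    """Evaluate p(x | mu) = prod_k mu_k ** x_k for a 1-of-K vector x."""
    assert sum(x) == 1, "x must have exactly one component equal to 1"
    assert abs(sum(mu) - 1.0) < 1e-12, "mu must sum to 1"
    p = 1.0
    for x_k, mu_k in zip(x, mu):
        p *= mu_k ** x_k  # factors with x_k = 0 contribute 1
    return p


# Hypothetical parameter values for K = 6 (not from the slides).
mu = [0.1, 0.2, 0.3, 0.15, 0.15, 0.1]
x = one_hot(3, 6)            # the state x_3 = 1
p = categorical_prob(x, mu)  # the product picks out mu_3
```

Because exactly one component of x is 1, the product simply selects the corresponding $\mu_k$.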

Two Discrete Variables
$x_1$ and $x_2$ each have K states.
Denote the probability of both $x_{1k} = 1$ and $x_{2l} = 1$ by $\mu_{kl}$, where $x_{1k}$ denotes the k-th component of $x_1$.
The joint distribution is
$p(x_1, x_2 \mid \mu) = \prod_{k=1}^{K} \prod_{l=1}^{K} \mu_{kl}^{x_{1k} x_{2l}}$
Since the parameters are subject to the constraint $\sum_k \sum_l \mu_{kl} = 1$, there are $K^2 - 1$ parameters.
For an arbitrary distribution over M variables there are $K^M - 1$ parameters.

Graphical Models for Two Discrete Variables
Using the product rule, the joint distribution $p(x_1, x_2)$ is factored as $p(x_2 \mid x_1)\, p(x_1)$.
This corresponds to a two-node graph with a link from $x_1$ to $x_2$.
The marginal distribution $p(x_1)$ has K-1 parameters.
The conditional distribution $p(x_2 \mid x_1)$ also requires K-1 parameters for each of the K values of $x_1$.
The total number of parameters is $(K-1) + K(K-1) = K^2 - 1$, as before.
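The factorization can be checked numerically. The sketch below assumes a made-up 3 x 3 joint table, splits it into a marginal and a conditional, and confirms that the product rule rebuilds the joint with $K^2 - 1$ free parameters. All numbers are hypothetical.

```python
K = 3
# Hypothetical joint table mu[k][l] = p(x1 = k, x2 = l); entries sum to 1.
mu = [[0.10, 0.05, 0.05],
      [0.20, 0.10, 0.10],
      [0.05, 0.05, 0.30]]

# Marginal p(x1): sum each row over the states of x2.
p_x1 = [sum(row) for row in mu]

# Conditional p(x2 | x1): normalize each row by its marginal.
p_x2_given_x1 = [[mu[k][l] / p_x1[k] for l in range(K)] for k in range(K)]

# Product rule: p(x1, x2) = p(x2 | x1) p(x1) recovers the joint table.
rebuilt = [[p_x2_given_x1[k][l] * p_x1[k] for l in range(K)] for k in range(K)]

# Free parameters: (K - 1) for the marginal plus K (K - 1) for the conditional.
n_params = (K - 1) + K * (K - 1)
```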

Two Independent Discrete Variables
$x_1$ and $x_2$ are independent, so the graphical model has two nodes and no link.
Each variable is described by a separate multinomial distribution.
The total number of parameters is 2(K-1).
For M variables the number of parameters is M(K-1).
The number of parameters is reduced by dropping links in the graph, and it grows linearly with the number of variables.

Fully Connected Graph Has High Complexity
General case of M discrete variables $x_1, \ldots, x_M$.
If the BN is fully connected: a completely general distribution with $K^M - 1$ parameters.
If there are no links: the joint distribution factorizes into a product of marginals, and the total number of parameters is M(K-1).
Graphs with intermediate levels of connectivity: more general distributions than fully factorized ones, requiring fewer parameters than the general joint distribution.
Example: chain of nodes.

Special Case: Chain of Nodes
The marginal distribution $p(x_1)$ requires K-1 parameters.
Each of the M-1 conditional distributions $p(x_i \mid x_{i-1})$, for i = 2, ..., M, requires K(K-1) parameters.
The total parameter count is $K - 1 + (M-1)K(K-1)$, which is quadratic in K and grows linearly (not exponentially) with the length of the chain.
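To make the growth rates concrete, this small sketch tabulates the three parameter counts quoted on the slides (fully connected, no links, chain). The function names and the choice K = 3 are assumptions for illustration.

```python
def n_params_full(K, M):
    """Fully connected graph: general joint over M variables with K states."""
    return K ** M - 1


def n_params_independent(K, M):
    """No links: product of M separate marginals."""
    return M * (K - 1)


def n_params_chain(K, M):
    """Chain: one marginal plus M - 1 conditional tables."""
    return (K - 1) + (M - 1) * K * (K - 1)


# Illustrative setting: K = 3 states per variable.
for M in (2, 5, 10):
    print(M, n_params_full(3, M), n_params_independent(3, M), n_params_chain(3, M))
```

For M = 2 the chain and the fully connected graph coincide, so both counts equal $K^2 - 1$. For larger M the fully connected count grows exponentially while the chain count grows linearly.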

Alternative: Sharing Parameters
Reduce the number of parameters by sharing (or tying) parameters.
In the chain above, all conditional distributions $p(x_i \mid x_{i-1})$, for i = 2, ..., M, share the same set of K(K-1) parameters.
Together with the K-1 parameters governing the distribution of $x_1$, a total of $K^2 - 1$ parameters is needed to specify the distribution.
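A chain with tied parameters can be sampled ancestrally using one initial distribution and one shared transition table. The sketch below assumes made-up values for both. The helper name `sample_chain` is hypothetical.

```python
import random


def sample_chain(initial, transition, M, rng):
    """Ancestral sampling: draw x_1 from `initial`, then draw each x_i from
    the row of the shared table `transition` selected by x_{i-1}."""
    states = list(range(len(initial)))
    x = rng.choices(states, weights=initial)[0]
    sample = [x]
    for _ in range(M - 1):
        x = rng.choices(states, weights=transition[x])[0]
        sample.append(x)
    return sample


# Hypothetical parameters for K = 3 states (not from the slides).
initial = [0.5, 0.3, 0.2]
transition = [[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]]
K = len(initial)
n_shared_params = (K - 1) + K * (K - 1)  # K^2 - 1, independent of M

rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = sample_chain(initial, transition, M=10, rng=rng)
```

The parameter count does not depend on the chain length M, because every step reuses the same table.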

Conversion into Bayesian Model
Given a graph over discrete variables, we can turn it into a Bayesian model by introducing Dirichlet priors for the parameters.
Each discrete node acquires an additional parent representing the Dirichlet prior over its parameters.
We can also tie the parameters governing the conditional distributions $p(x_i \mid x_{i-1})$.
[Figures: chain of nodes with priors; chain of nodes with shared parameters]

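Binomial: Beta Prior
Bernoulli: $p(x = 1 \mid \mu) = \mu$
Likelihood of Bernoulli with $D = \{x_1, \ldots, x_N\}$:
$p(D \mid \mu) = \prod_{n=1}^{N} \mu^{x_n} (1-\mu)^{1-x_n}$
Binomial:
$\mathrm{Bin}(m \mid N, \mu) = \binom{N}{m} \mu^{m} (1-\mu)^{N-m}$
Conjugate prior:
$\mathrm{Beta}(\mu \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \mu^{a-1} (1-\mu)^{b-1}$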
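Conjugacy means the posterior over $\mu$ is again a Beta distribution: multiplying the Bernoulli likelihood by $\mathrm{Beta}(\mu \mid a, b)$ gives $\mathrm{Beta}(\mu \mid a + m, b + N - m)$, where m is the number of observations with $x_n = 1$. A minimal sketch of that update, assuming made-up data and prior values:

```python
def bernoulli_likelihood(data, mu):
    """p(D | mu) = prod_n mu ** x_n * (1 - mu) ** (1 - x_n)."""
    p = 1.0
    for x_n in data:
        p *= mu ** x_n * (1.0 - mu) ** (1 - x_n)
    return p


def beta_posterior(a, b, data):
    """Return the parameters of the Beta posterior after observing `data`."""
    m = sum(data)  # number of ones
    N = len(data)
    return a + m, b + (N - m)


# Hypothetical observations and prior hyperparameters (not from the slides).
data = [1, 0, 1, 1, 0, 1, 1, 1]
a_post, b_post = beta_posterior(2.0, 2.0, data)
posterior_mean = a_post / (a_post + b_post)  # mean of Beta(a, b) is a / (a + b)
```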

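Multinomial: Dirichlet Prior
Generalized Bernoulli (1-of-K), e.g. $x = (0,0,1,0,0,0)^T$ for K = 6:
$p(x \mid \mu) = \prod_{k=1}^{K} \mu_k^{x_k}$, where $\mu = (\mu_1, \ldots, \mu_K)^T$. K = 2 is the Bernoulli.
Multinomial:
$\mathrm{Mult}(m_1, m_2, \ldots, m_K \mid \mu, N) = \binom{N}{m_1\, m_2 \ldots m_K} \prod_{k=1}^{K} \mu_k^{m_k}$. K = 2 is the Binomial.
The normalization coefficient is the number of ways of partitioning N objects into K groups of sizes $m_1, m_2, \ldots, m_K$:
$\binom{N}{m_1\, m_2 \ldots m_K} = \frac{N!}{m_1!\, m_2! \cdots m_K!}$
Conjugate prior distribution for the parameters $\mu_k$:
$p(\mu \mid \alpha) \propto \prod_{k=1}^{K} \mu_k^{\alpha_k - 1}$, where $0 \le \mu_k \le 1$ and $\sum_k \mu_k = 1$.
The normalized form is
$\mathrm{Dir}(\mu \mid \alpha) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_K)} \prod_{k=1}^{K} \mu_k^{\alpha_k - 1}$, where $\alpha_0 = \sum_{k=1}^{K} \alpha_k$.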
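The multinomial coefficient and the Dirichlet update can be sketched in a few lines. The posterior rule used here, $\alpha_k \to \alpha_k + m_k$, is the standard conjugate update and is stated as background, since the slide gives only the prior. Function names and numbers are illustrative assumptions.

```python
from math import factorial


def multinomial_coefficient(counts):
    """N! / (m_1! m_2! ... m_K!) with N = sum of the counts."""
    coef = factorial(sum(counts))
    for m in counts:
        coef //= factorial(m)  # each intermediate quotient is an integer
    return coef


def multinomial_pmf(counts, mu):
    """Mult(m_1, ..., m_K | mu, N) for the given counts."""
    p = float(multinomial_coefficient(counts))
    for m_k, mu_k in zip(counts, mu):
        p *= mu_k ** m_k
    return p


def dirichlet_posterior(alpha, counts):
    """Conjugate update: add the observed counts to the prior parameters."""
    return [a_k + m_k for a_k, m_k in zip(alpha, counts)]


# Hypothetical counts and prior (not from the slides).
counts = [2, 1, 1]
coef = multinomial_coefficient(counts)
alpha_post = dirichlet_posterior([1.0, 1.0, 1.0], counts)
```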

Controlling the Number of Parameters in Models: Parameterized Conditional Distributions
Control the exponential growth of parameters in models of discrete variables.
Use parameterized models for conditional distributions instead of complete tables of conditional probability values.

Parameterized Conditional Distributions
Consider a graph with binary variables.
Each parent variable $x_i$ is governed by a single parameter $\mu_i$ representing the probability $p(x_i = 1)$, giving M parameters in total for the parent nodes.
The conditional distribution $p(y \mid x_1, \ldots, x_M)$ requires $2^M$ parameters, representing the probability $p(y = 1)$ for each of the $2^M$ settings of the parent variables, from 00...0 to 11...1.

Conditional Distribution Using the Logistic Sigmoid
A parsimonious form of the conditional distribution: a logistic sigmoid acting on a linear combination of the parent variables.
$p(y = 1 \mid x_1, \ldots, x_M) = \sigma\left(w_0 + \sum_{i=1}^{M} w_i x_i\right) = \sigma(w^T x)$
where $\sigma(a) = (1 + \exp(-a))^{-1}$ is the logistic sigmoid and $x = (x_0, x_1, \ldots, x_M)^T$ is the vector of parent states, with $x_0 = 1$.
The number of parameters grows linearly with M.
This is analogous to the choice of a restrictive form of covariance matrix in a multivariate Gaussian.
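The saving can be illustrated by generating the full conditional table from a handful of weights. This is a sketch under assumed weights. The names `p_y1` and `w` are hypothetical.

```python
from itertools import product
from math import exp


def sigmoid(a):
    """Logistic sigmoid sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + exp(-a))


def p_y1(parents, w):
    """p(y = 1 | x_1..x_M) = sigma(w^T x), with x_0 = 1 prepended for the bias."""
    x = [1] + list(parents)
    return sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)))


M = 3
w = [-1.0, 2.0, 0.5, -0.5]  # hypothetical weights w_0..w_M: M + 1 parameters

# The full table has 2^M rows, but every row is generated from the M + 1 weights.
table = {bits: p_y1(bits, w) for bits in product((0, 1), repeat=M)}
```

The table still has $2^M$ entries, but they are no longer free: only the M + 1 weights are parameters.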

Linear Gaussian Models
Expressing a multivariate Gaussian as a directed graph, corresponding to a linear Gaussian model over the component variables.
The mean of a conditional distribution is a linear function of the conditioning variable.
This allows expressing interesting structure in the distribution.
The general Gaussian case and the diagonal covariance case represent opposite extremes.

Graph with Continuous Random Variables
Arbitrary acyclic graph over D variables.
Node i represents a single continuous random variable $x_i$ having a Gaussian distribution.
The mean of the distribution is a linear combination of the states of the parent nodes $\mathrm{pa}_i$ of node i:
$p(x_i \mid \mathrm{pa}_i) = \mathcal{N}\left(x_i \,\middle|\, \sum_{j \in \mathrm{pa}_i} w_{ij} x_j + b_i,\; v_i\right)$
where $w_{ij}$ and $b_i$ are parameters governing the mean, and $v_i$ is the variance of the conditional distribution for $x_i$.

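Joint Distribution
Log of the joint distribution:
$\ln p(x) = \sum_{i=1}^{D} \ln p(x_i \mid \mathrm{pa}_i) = -\sum_{i=1}^{D} \frac{1}{2 v_i} \left(x_i - \sum_{j \in \mathrm{pa}_i} w_{ij} x_j - b_i\right)^2 + \text{const}$
where $x = (x_1, \ldots, x_D)^T$ and const denotes terms independent of x.
This is a quadratic function of x, hence the joint distribution p(x) is a multivariate Gaussian.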

Mean and Covariance of Joint Distribution: Recursive Formulation
Since each variable $x_i$ has, conditional on the states of its parents, a Gaussian distribution, we can write
$x_i = \sum_{j \in \mathrm{pa}_i} w_{ij} x_j + b_i + \sqrt{v_i}\, \epsilon_i$
where $\epsilon_i$ is a zero-mean, unit-variance Gaussian random variable satisfying $E[\epsilon_i] = 0$ and $E[\epsilon_i \epsilon_j] = I_{ij}$, and $I_{ij}$ is the i,j element of the identity matrix.
Taking the expectation:
$E[x_i] = \sum_{j \in \mathrm{pa}_i} w_{ij} E[x_j] + b_i$
Thus we can find the components of $E[x] = (E[x_1], \ldots, E[x_D])^T$ by starting at the lowest numbered node and working recursively through the graph.
Similarly, the elements of the covariance matrix satisfy
$\mathrm{cov}[x_i, x_j] = \sum_{k \in \mathrm{pa}_j} w_{jk}\, \mathrm{cov}[x_i, x_k] + I_{ij} v_j$
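The two recursions translate directly into a loop over nodes in topological order. The sketch below assumes nodes are numbered from 0 so that every parent has a lower index than its child, and stores the weights as one dictionary per node. The data layout, the function name and the example numbers are illustrative assumptions.

```python
def linear_gaussian_moments(parents, b, v):
    """Mean vector and covariance matrix of a linear Gaussian network.

    parents[j] maps each parent index k < j to the weight w_jk;
    b[j] and v[j] are the bias and conditional variance of node j.
    """
    D = len(b)
    mean = [0.0] * D
    cov = [[0.0] * D for _ in range(D)]
    for j in range(D):  # lowest numbered node first
        mean[j] = b[j] + sum(w * mean[k] for k, w in parents[j].items())
        for i in range(j + 1):
            # cov[x_i, x_j] = sum_k w_jk cov[x_i, x_k] + I_ij v_j
            c = sum(w * cov[i][k] for k, w in parents[j].items())
            if i == j:
                c += v[j]
            cov[i][j] = cov[j][i] = c
    return mean, cov


# Hypothetical three-node chain x_1 -> x_2 -> x_3 (indices 0, 1, 2 in code).
parents = [{}, {0: 0.5}, {1: 2.0}]
b = [1.0, 2.0, 3.0]
v = [1.0, 2.0, 3.0]
mean, cov = linear_gaussian_moments(parents, b, v)
```

Filling each column from the top row down guarantees that every covariance entry needed on the right-hand side has already been computed.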

Three Cases for Number of Parameters
No links in the graph: 2D parameters.
Fully connected graph: D(D+1)/2 parameters.
Graphs with an intermediate level of complexity, e.g. a chain.
[Figure: example graph over nodes $x_1, \ldots, x_6$]

Extreme Case with No Links
D isolated nodes.
There are no parameters $w_{ij}$, only D parameters $b_i$ and D parameters $v_i$.
The mean of p(x) is given by $(b_1, \ldots, b_D)^T$.
The covariance matrix is diagonal, of the form $\mathrm{diag}(v_1, \ldots, v_D)$.
The joint distribution has a total of 2D parameters.
It represents a set of D independent univariate Gaussian distributions.

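Extreme Case with All Links
Fully connected graph: each node has all lower numbered nodes as parents.
The matrix $w_{ij}$ has i-1 entries on the i-th row and hence is a lower triangular matrix (with no entries on the leading diagonal).
To count the parameters $w_{ij}$, take the $D^2$ elements of a D x D matrix, subtract D to account for the diagonal, and divide by 2 to account for elements only below the diagonal, giving D(D-1)/2.
Adding the D variances $v_i$ gives D(D+1)/2 parameters, the number of independent entries in a general symmetric covariance matrix.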

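Graph with Intermediate Complexity
The link between variables $x_1$ and $x_3$ is missing.
The mean and covariance of the joint distribution are
$\mu = (b_1,\; b_2 + w_{21} b_1,\; b_3 + w_{32} b_2 + w_{32} w_{21} b_1)^T$
$\Sigma = \begin{pmatrix} v_1 & w_{21} v_1 & w_{32} w_{21} v_1 \\ w_{21} v_1 & v_2 + w_{21}^2 v_1 & w_{32}(v_2 + w_{21}^2 v_1) \\ w_{32} w_{21} v_1 & w_{32}(v_2 + w_{21}^2 v_1) & v_3 + w_{32}^2 (v_2 + w_{21}^2 v_1) \end{pmatrix}$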
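For the chain $x_1 \to x_2 \to x_3$, with the link between $x_1$ and $x_3$ missing, each entry of the mean and covariance can be obtained by applying the recursions $E[x_i] = \sum_{j \in \mathrm{pa}_i} w_{ij} E[x_j] + b_i$ and $\mathrm{cov}[x_i, x_j] = \sum_{k \in \mathrm{pa}_j} w_{jk}\, \mathrm{cov}[x_i, x_k] + I_{ij} v_j$ node by node. A worked sketch:

```latex
\begin{align*}
E[x_1] &= b_1 \\
E[x_2] &= w_{21} E[x_1] + b_2 = b_2 + w_{21} b_1 \\
E[x_3] &= w_{32} E[x_2] + b_3 = b_3 + w_{32} b_2 + w_{32} w_{21} b_1 \\
\mathrm{cov}[x_1, x_1] &= v_1 \\
\mathrm{cov}[x_1, x_2] &= w_{21}\, \mathrm{cov}[x_1, x_1] = w_{21} v_1 \\
\mathrm{cov}[x_2, x_2] &= w_{21}\, \mathrm{cov}[x_2, x_1] + v_2 = v_2 + w_{21}^2 v_1 \\
\mathrm{cov}[x_1, x_3] &= w_{32}\, \mathrm{cov}[x_1, x_2] = w_{32} w_{21} v_1 \\
\mathrm{cov}[x_2, x_3] &= w_{32}\, \mathrm{cov}[x_2, x_2] = w_{32} (v_2 + w_{21}^2 v_1) \\
\mathrm{cov}[x_3, x_3] &= w_{32}\, \mathrm{cov}[x_3, x_2] + v_3 = v_3 + w_{32}^2 (v_2 + w_{21}^2 v_1)
\end{align*}
```

The remaining entries follow from the symmetry of the covariance matrix.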

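Extension to Multivariate Gaussian Variables
Nodes in the graph represent multivariate Gaussian variables.
Write the conditional distribution for node i in the form
$p(x_i \mid \mathrm{pa}_i) = \mathcal{N}\left(x_i \,\middle|\, \sum_{j \in \mathrm{pa}_i} W_{ij} x_j + b_i,\; \Sigma_i\right)$
where $W_{ij}$ is a matrix (non-square if $x_i$ and $x_j$ have different dimensionalities) and $\Sigma_i$ is the covariance of the conditional distribution.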

Summary
1. PGMs allow visualizing probabilistic models. Joint distributions are represented as directed or undirected PGMs.
2. PGMs can be used to generate samples. Ancestral sampling with directed PGMs is simple.
3. PGMs are useful for Bayesian statistics. A discrete-variable PGM is represented using Dirichlet priors.
4. Parameter explosion is controlled by tying parameters.
5. A multivariate Gaussian can be expressed as a PGM. The graph is a linear Gaussian model over the components.