Bayesian belief networks


CS 2750 Machine Learning
Lecture 12: Bayesian belief networks
Milos Hauskrecht
milos@cs.pitt.edu
5329 Sennott Square

Density estimation

Data: $D = \{D_1, D_2, \ldots, D_n\}$, where $D_i = \mathbf{x}_i$ is a vector of attribute values.

Attributes are modeled by random variables $\mathbf{X} = \{X_1, X_2, \ldots, X_d\}$ with:
- continuous values
- discrete values

E.g. blood pressure with numerical values, or chest pain with discrete values [no-pain, mild, moderate, strong].

Underlying true probability distribution: $p(\mathbf{X})$

Density estimation

Data: $D = \{D_1, D_2, \ldots, D_n\}$, where $D_i = \mathbf{x}_i$ is a vector of attribute values.

Objective: estimate the underlying true probability distribution over the variables $\mathbf{X}$, $p(\mathbf{X})$, using the examples in $D$.

true distribution $p(\mathbf{X})$ -> $n$ samples $D = \{D_1, D_2, \ldots, D_n\}$ -> estimate $\hat{p}(\mathbf{X})$

Standard (iid) assumptions: samples
- are independent of each other
- come from the same (identical) distribution (fixed $p(\mathbf{X})$)

Learning via parameter estimation

In this lecture we consider parametric density estimation.

Basic settings:
- A set of random variables $\mathbf{X} = \{X_1, X_2, \ldots, X_d\}$
- A model of the distribution over the variables in $\mathbf{X}$ with parameters $\Theta$: $\hat{p}(\mathbf{X} \mid \Theta)$
- Data $D = \{D_1, D_2, \ldots, D_n\}$

Objective: find the description of the parameters $\Theta$ so that they fit the observed data.

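Parameter estimation

Maximum likelihood (ML)
- maximize $p(D \mid \Theta, \xi)$
- yields one set of parameters $\Theta_{ML}$
- the target distribution is approximated as $\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid \Theta_{ML})$

Bayesian parameter estimation
- uses the posterior distribution over possible parameters:

$$p(\Theta \mid D, \xi) = \frac{p(D \mid \Theta, \xi)\, p(\Theta \mid \xi)}{p(D \mid \xi)}$$

- yields all possible settings of $\Theta$ (and their weights)
- the target distribution is approximated as:

$$\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid D) = \int_{\Theta} p(\mathbf{X} \mid \Theta)\, p(\Theta \mid D, \xi)\, d\Theta$$

Parameter estimation. Other possible criteria:

Maximum a posteriori probability (MAP)
- maximize $p(\Theta \mid D, \xi)$ (mode of the posterior)
- yields one set of parameters $\Theta_{MAP}$
- approximation: $\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid \Theta_{MAP})$

Expected value of the parameter
- $\hat{\Theta} = E(\Theta)$ (mean of the posterior)
- the expectation is taken with regard to the posterior $p(\Theta \mid D, \xi)$
- yields one set of parameters
- approximation: $\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid \hat{\Theta})$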
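The three point estimates above can be compared on the simplest possible model. The following is a minimal sketch, assuming a single Bernoulli variable with a conjugate Beta prior; the data, the prior hyperparameters `alpha` and `beta`, and all variable names are illustrative assumptions, not values from the lecture.

```python
# Hypothetical binary observations (1 = event occurred); not from the lecture.
data = [1, 0, 1, 1, 0, 1, 1, 1]
n1 = sum(data)           # number of ones
n0 = len(data) - n1      # number of zeros

# Assumed Beta(alpha, beta) prior on theta = P(x = 1).
alpha, beta = 2.0, 2.0

# Maximum likelihood: the theta that maximizes p(D | theta).
theta_ml = n1 / (n1 + n0)

# By conjugacy the posterior p(theta | D) is Beta(alpha + n1, beta + n0).
a_post, b_post = alpha + n1, beta + n0

# MAP: mode of the posterior (this formula needs a_post > 1 and b_post > 1).
theta_map = (a_post - 1) / (a_post + b_post - 2)

# Expected value of the parameter: mean of the posterior.
theta_mean = a_post / (a_post + b_post)

print(theta_ml, theta_map, theta_mean)
```

With few samples the prior pulls the MAP and posterior-mean estimates away from the ML estimate; as the number of samples grows the three estimates move closer together.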

Density estimation

So far we have covered density estimation for simple distribution models:
- Bernoulli
- Binomial
- Multinomial
- Gaussian
- Poisson

But what if:
- The dimension of $\mathbf{X} = \{X_1, X_2, \ldots, X_d\}$ is large. Example: patient data.
- Compact parametric distributions do not seem to fit the data. E.g. a multivariate Gaussian may not fit.
- We have only a small number of examples from which to make accurate parameter estimates.

How to learn complex distributions

How do we learn complex multivariate distributions $\hat{p}(\mathbf{X})$ with a large number of variables?

One solution:
- Decompose the distribution along conditional independence relations.
- Decompose the parameter estimation problem into a set of smaller parameter estimation tasks.

Decomposition of distributions under conditional independence assumptions is the main idea behind Bayesian belief networks.

Example

Problem description:
- Disease: pneumonia
- Patient symptoms (findings, lab tests): Fever, Cough, Paleness, WBC (white blood cells) count, Chest pain, etc.

Representation of a patient case:
- Symptoms and disease are represented as random variables.

Our objectives:
- Describe a multivariate distribution representing the relations between symptoms and disease.
- Design inference and learning procedures for the multivariate model.

Joint probability distribution

A joint probability distribution (for a set of variables) defines probabilities for all possible assignments of values to the variables in the set.

$P(\mathit{Pneumonia}, \mathit{WBCcount})$, a 2 x 3 table:

                      WBCcount
Pneumonia      high      normal    low       P(Pneumonia)
True           0.0008    0.0001    0.0001    0.001
False          0.0042    0.9929    0.0019    0.999
P(WBCcount)    0.005     0.993     0.002

Marginalization (summing of rows or columns): summing out variables.
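Marginalization of the 2 x 3 table can be sketched in a few lines. The dictionary layout and the helper name `marginal` are illustrative choices; the probabilities are the ones in the table.

```python
# Joint table P(Pneumonia, WBCcount), keyed by (pneumonia, wbc_count).
joint = {
    (True, "high"): 0.0008, (True, "normal"): 0.0001, (True, "low"): 0.0001,
    (False, "high"): 0.0042, (False, "normal"): 0.9929, (False, "low"): 0.0019,
}

def marginal(table, axis):
    """Sum out every variable except the one at position `axis` of the key."""
    out = {}
    for assignment, p in table.items():
        key = assignment[axis]
        out[key] = out.get(key, 0.0) + p
    return out

p_pneumonia = marginal(joint, 0)   # sums each row
p_wbc = marginal(joint, 1)         # sums each column

print(p_pneumonia)
print(p_wbc)
```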

Variable independence

The joint distribution over a subset of variables can always be computed from the joint distribution through marginalization. Not the other way around!

Only exception: when the variables are independent, $P(A, B) = P(A)\, P(B)$.

$P(\mathit{Pneumonia}, \mathit{WBCcount})$:

                      WBCcount
Pneumonia      high      normal    low       P(Pneumonia)
True           0.0008    0.0001    0.0001    0.001
False          0.0042    0.9929    0.0019    0.999
P(WBCcount)    0.005     0.993     0.002

Conditional probability

Conditional probability: the probability of $A$ given $B$,

$$P(A \mid B) = \frac{P(A, B)}{P(B)}$$

Conditional probability is defined in terms of joint probabilities. Joint probabilities can in turn be expressed in terms of conditional probabilities:

$$P(A, B) = P(A \mid B)\, P(B) \quad \text{(product rule)}$$

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1}) \quad \text{(chain rule)}$$

Conditional probability is useful for various probabilistic inferences, e.g.

$$P(\mathit{Pneumonia} = \mathit{True} \mid \mathit{Fever} = \mathit{True}, \mathit{WBCcount} = \mathit{high}, \mathit{Cough} = \mathit{True})$$
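Both ideas can be checked on the pneumonia table. This is a small sketch with illustrative variable names: it computes one conditional probability from the definition and tests whether the joint equals the product of its marginals.

```python
# Joint table P(Pneumonia, WBCcount), keyed by (pneumonia, wbc_count).
joint = {
    (True, "high"): 0.0008, (True, "normal"): 0.0001, (True, "low"): 0.0001,
    (False, "high"): 0.0042, (False, "normal"): 0.9929, (False, "low"): 0.0019,
}

# Marginals obtained by summing out the other variable.
p_pn = {v: sum(p for (pn, _), p in joint.items() if pn == v) for v in (True, False)}
p_wbc = {w: sum(p for (_, wb), p in joint.items() if wb == w)
         for w in ("high", "normal", "low")}

# Definition of the conditional: P(A | B) = P(A, B) / P(B).
p_pn_given_high = joint[(True, "high")] / p_wbc["high"]

# Independence would require P(A, B) = P(A) P(B) for every cell of the table.
independent = all(abs(p - p_pn[pn] * p_wbc[w]) < 1e-9 for (pn, w), p in joint.items())

print(p_pn_given_high, independent)
```

Here the conditional is much larger than the prior $P(\mathit{Pneumonia} = \mathit{True}) = 0.001$, and the independence test fails, so this joint cannot be rebuilt from its marginals alone.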

Modeling uncertainty with probabilities

Full joint distribution: the joint distribution over all random variables defining the domain. It is sufficient to do any type of probabilistic inference.

Inference

Any query can be computed from the full joint distribution!

The joint over a subset of variables is obtained through marginalization:

$$P(A = a, C = c) = \sum_{i} \sum_{j} P(A = a, B = b_i, C = c, D = d_j)$$

The conditional probability over a set of variables, given the values of other variables, is obtained through marginalization and the definition of conditionals:

$$P(D = d \mid A = a, C = c) = \frac{P(A = a, C = c, D = d)}{P(A = a, C = c)} = \frac{\sum_{i} P(A = a, B = b_i, C = c, D = d)}{\sum_{i} \sum_{j} P(A = a, B = b_i, C = c, D = d_j)}$$

Inference

Any query can be computed from the full joint distribution!

Any joint probability can be expressed as a product of conditionals via the chain rule:

$$P(X_1, X_2, \ldots, X_n) = P(X_n \mid X_1, \ldots, X_{n-1})\, P(X_1, \ldots, X_{n-1})$$
$$= P(X_n \mid X_1, \ldots, X_{n-1})\, P(X_{n-1} \mid X_1, \ldots, X_{n-2})\, P(X_1, \ldots, X_{n-2})$$
$$= \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$$

It is often easier to define the distribution in terms of conditional probabilities, e.g.

$$P(\mathit{Fever} \mid \mathit{Pneumonia} = T), \qquad P(\mathit{Fever} \mid \mathit{Pneumonia} = F)$$

Modeling uncertainty with probabilities

Full joint distribution: the joint distribution over all random variables defining the domain. It is sufficient to represent the complete domain and to do any type of probabilistic inference.

Problems:
- Space complexity. Storing the full joint distribution requires remembering $O(d^n)$ numbers, where $n$ is the number of random variables and $d$ is the number of values per variable.
- Inference complexity. Computing some queries requires $O(d^n)$ steps.
- Acquisition problem. Who is going to define all of the probability entries?

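Pneumonia example. Complexities.

Space complexity.
- Pneumonia (2 values: T, F), Fever (2: T, F), Cough (2: T, F), WBCcount (3: high, normal, low), Paleness (2: T, F)
- Number of assignments: 2 * 2 * 2 * 3 * 2 = 48
- We need to define at least 47 probabilities.

Time complexity.
- Assume we need to compute the probability of Pneumonia = T from the full joint:

$$P(\mathit{Pneumonia} = T) = \sum_{i \in \{T, F\}} \sum_{j \in \{T, F\}} \sum_{k \in \{h, n, l\}} \sum_{u \in \{T, F\}} P(\mathit{Fever} = i, \mathit{Cough} = j, \mathit{WBCcount} = k, \mathit{Pale} = u, \mathit{Pneumonia} = T)$$

- Sum over 2 * 2 * 3 * 2 = 24 combinations.

Bayesian belief networks (BBNs)

Bayesian belief networks:
- Represent the full joint distribution over the variables more compactly, with a smaller number of parameters.
- Take advantage of conditional and marginal independences among random variables.

$A$ and $B$ are independent:

$$P(A, B) = P(A)\, P(B)$$

$A$ and $B$ are conditionally independent given $C$:

$$P(A, B \mid C) = P(A \mid C)\, P(B \mid C)$$
$$P(A \mid C, B) = P(A \mid C)$$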
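The counts on the complexity slide can be reproduced by enumerating the assignments directly. A minimal sketch; the dictionary `domains` and the variable names are illustrative.

```python
from itertools import product

# Value sets of the five variables in the pneumonia example.
domains = {
    "Pneumonia": ["T", "F"],
    "Fever": ["T", "F"],
    "Cough": ["T", "F"],
    "WBCcount": ["high", "normal", "low"],
    "Pale": ["T", "F"],
}

# Every full assignment is one entry of the full joint table.
n_assignments = len(list(product(*domains.values())))

# The entries must sum to 1, so one of them is determined by the others.
n_free = n_assignments - 1

# Computing P(Pneumonia = T) sums over all values of the other variables.
others = [values for name, values in domains.items() if name != "Pneumonia"]
n_terms = len(list(product(*others)))

print(n_assignments, n_free, n_terms)
```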

Alarm system example

Assume your house has an alarm system against burglary. You live in a seismically active area, and the alarm system can occasionally be set off by an earthquake. You have two neighbors, Mary and John, who do not know each other. If they hear the alarm they call you, but this is not guaranteed.

We want to represent the probability distribution of the events: Burglary, Earthquake, Alarm, Mary calls and John calls.

Causal relations:
- Burglary -> Alarm
- Earthquake -> Alarm
- Alarm -> JohnCalls
- Alarm -> MaryCalls

Bayesian belief network

1. Directed acyclic graph
- Nodes = random variables: Burglary, Earthquake, Alarm, Mary calls and John calls
- Links = direct (causal) dependencies between variables. The chance of the alarm going off is influenced by Earthquake; the chance of John calling is affected by the Alarm.

Distributions attached to the nodes: $P(B)$ for Burglary, $P(E)$ for Earthquake, $P(A \mid B, E)$ for Alarm, $P(J \mid A)$ for JohnCalls, $P(M \mid A)$ for MaryCalls.

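Bayesian belief network

2. Local conditional distributions relate variables and their parents: $P(B)$ for Burglary, $P(E)$ for Earthquake, $P(A \mid B, E)$ for Alarm, $P(J \mid A)$ for JohnCalls, $P(M \mid A)$ for MaryCalls.

Bayesian belief network: local conditional distributions for the alarm example

P(B):           B = T     B = F
                0.001     0.999

P(E):           E = T     E = F
                0.002     0.998

P(A | B, E):    B    E    A = T     A = F
                T    T    0.95      0.05
                T    F    0.94      0.06
                F    T    0.29      0.71
                F    F    0.001     0.999

P(J | A):       A    J = T     J = F
                T    0.90      0.10
                F    0.05      0.95

P(M | A):       A    M = T     M = F
                T    0.70      0.30
                F    0.01      0.99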

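Bayesian belief networks (general)

Two components: $B = (S, \Theta_S)$

Directed acyclic graph $S$
- Nodes correspond to random variables.
- (Missing) links encode independences.

Parameters $\Theta_S$
- Local conditional probability distributions for every variable-parent configuration, $P(X_i \mid pa(X_i))$, where $pa(X_i)$ stands for the parents of $X_i$.

Example, $P(A \mid B, E)$:

B    E    A = T     A = F
T    T    0.95      0.05
T    F    0.94      0.06
F    T    0.29      0.71
F    F    0.001     0.999

Full joint distribution in BBNs

The full joint distribution is defined in terms of the local conditional distributions (obtained via the chain rule):

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid pa(X_i))$$

Example: assume the following assignment of values to the random variables: $B = b, E = e, A = a, J = j, M = m$. Then its probability is:

$$P(B = b, E = e, A = a, J = j, M = m) = P(B = b)\, P(E = e)\, P(A = a \mid B = b, E = e)\, P(J = j \mid A = a)\, P(M = m \mid A = a)$$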
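The product formula can be evaluated directly for the alarm network. This is a minimal sketch: the probabilities are the ones listed for the alarm example, but their assignment to parent configurations is assumed to follow the usual T/T, T/F, F/T, F/F row order, and the chosen assignment (B=T, E=T, A=T, J=T, M=F) and all function names are illustrative.

```python
from itertools import product

# Each table stores the probability that the child is True; the False case
# is the complement. Row-to-configuration mapping is an assumption.
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # keyed by (B, E)
P_J = {True: 0.90, False: 0.05}                      # keyed by A
P_M = {True: 0.70, False: 0.01}                      # keyed by A

def bern(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """P(B, E, A, J, M) as the product of the local conditionals."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

# One illustrative assignment: B=T, E=T, A=T, J=T, M=F.
p_example = joint(True, True, True, True, False)

# Sanity check: the 32 products form a proper distribution.
total = sum(joint(*values) for values in product([True, False], repeat=5))

print(p_example, total)
```

For this assignment the product is 0.001 * 0.002 * 0.95 * 0.90 * 0.30.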

Bayesian belief networks (BBNs)

Bayesian belief networks:
- Represent the full joint distribution over the variables more compactly, using the product of local conditionals.
- But how did we get to local parameterizations?

Answer: the graphical structure encodes conditional and marginal independences among random variables.

$A$ and $B$ are independent:

$$P(A, B) = P(A)\, P(B)$$

$A$ and $B$ are conditionally independent given $C$:

$$P(A, B \mid C) = P(A \mid C)\, P(B \mid C)$$
$$P(A \mid C, B) = P(A \mid C)$$

The graph structure implies the decomposition!

Independences in BBNs

3 basic independence structures:
1. Burglary -> Alarm -> JohnCalls
2. Burglary -> Alarm <- Earthquake
3. MaryCalls <- Alarm -> JohnCalls

Independences in BBNs

1. JohnCalls is independent of Burglary given Alarm (structure: Burglary -> Alarm -> JohnCalls):

$$P(J \mid A, B) = P(J \mid A)$$
$$P(J, B \mid A) = P(J \mid A)\, P(B \mid A)$$

2. Burglary is independent of Earthquake when Alarm is not known (structure: Burglary -> Alarm <- Earthquake). Burglary and Earthquake become dependent given Alarm!

$$P(B, E) = P(B)\, P(E)$$
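These two statements can be checked numerically by brute-force summation over the full joint of the alarm network. A minimal sketch: the probabilities come from the alarm example with the usual row order assumed, and the helper `prob` and the lowercase variable names are illustrative.

```python
from itertools import product

# P(child = True | parents); row-to-configuration mapping is an assumption.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # keyed by (B, E)
P_J = {True: 0.90, False: 0.05}                      # keyed by A
P_M = {True: 0.70, False: 0.01}                      # keyed by A
NAMES = ("b", "e", "a", "j", "m")

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

def prob(query, evidence=None):
    """P(query | evidence) by enumeration; both are dicts name -> bool."""
    evidence = evidence or {}
    num = den = 0.0
    for values in product([True, False], repeat=5):
        world = dict(zip(NAMES, values))
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(**world)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den

# 1. JohnCalls does not depend on Burglary once Alarm is known.
p_j_given_a = prob({"j": True}, {"a": True})
p_j_given_ab = prob({"j": True}, {"a": True, "b": True})

# 2. Burglary and Earthquake: independent a priori, dependent given Alarm.
p_be = prob({"b": True, "e": True})
p_b_given_a = prob({"b": True}, {"a": True})
p_b_given_ae = prob({"b": True}, {"a": True, "e": True})

print(p_j_given_a, p_j_given_ab)
print(p_be, p_b_given_a, p_b_given_ae)
```

Learning that an earthquake occurred lowers the probability of a burglary once the alarm is known to be on, which is the dependence that structure 2 introduces.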

Independences in BBNs

3. MaryCalls is independent of JohnCalls given Alarm (structure: MaryCalls <- Alarm -> JohnCalls):

$$P(J \mid A, M) = P(J \mid A)$$
$$P(J, M \mid A) = P(J \mid A)\, P(M \mid A)$$

Independences in BBN

A BBN distribution models many conditional independence relations relating distant variables and sets of variables. These are defined in terms of the graphical criterion called d-separation.

D-separation in the graph:
- Let $X$, $Y$ and $Z$ be three sets of nodes.
- If $X$ and $Y$ are d-separated by $Z$, then $X$ and $Y$ are conditionally independent given $Z$.

D-separation: $A$ is d-separated from $B$ given $C$ if every undirected path between them is blocked.

Path blocking: 3 cases that expand on the three basic independence structures.

Undirected path blocking

A path is blocked if it contains any of the following:
1. A linear substructure X -> Z -> Y, with Z in C
2. A wedge substructure X <- Z -> Y, with Z in C
3. A vee substructure X -> Z <- Y, with neither Z nor any of its descendants in C

Independences in BBNs

Network nodes: Burglary, Earthquake, Alarm, RadioReport, JohnCalls, MaryCalls.

Which of the following hold?
- Are Earthquake and Burglary independent given MaryCalls?
- Are Burglary and MaryCalls independent (not knowing Alarm)?
- Are Burglary and RadioReport independent given Earthquake?
- Are Burglary and RadioReport independent given MaryCalls?
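The four questions can be answered mechanically. The sketch below does not enumerate paths as in the blocking rules above; it swaps in the equivalent moralized-ancestral-graph test (keep only ancestors of the nodes involved, connect co-parents, drop edge directions, delete the conditioning nodes, then test connectivity). The edge Earthquake -> RadioReport is an assumption, since the figure with the graph did not survive; the function names are illustrative.

```python
# Parents of each node. The RadioReport edge is an assumed reading of the figure.
PARENTS = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "RadioReport": ["Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}

def ancestors(nodes):
    """The given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(PARENTS[n])
    return seen

def d_separated(x, y, given=()):
    """True if single nodes x and y are d-separated by the set `given`."""
    keep = ancestors({x, y, *given})
    # Moralize the ancestral subgraph: undirected parent-child links,
    # plus a link between every pair of parents sharing a child.
    adj = {n: set() for n in keep}
    for child in keep:
        ps = PARENTS[child]
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # Remove the conditioning nodes and search for a path from x to y.
    blocked, seen, stack = set(given), set(), [x]
    while stack:
        n = stack.pop()
        if n in seen or n in blocked:
            continue
        seen.add(n)
        stack.extend(adj[n])
    return y not in seen

answers = [
    d_separated("Earthquake", "Burglary", ["MaryCalls"]),
    d_separated("Burglary", "MaryCalls"),
    d_separated("Burglary", "RadioReport", ["Earthquake"]),
    d_separated("Burglary", "RadioReport", ["MaryCalls"]),
]
print(answers)
```

Under the assumed graph only the third statement holds: observing MaryCalls, a descendant of Alarm, activates the vee structure at Alarm, while observing Earthquake blocks the only route from Burglary to RadioReport.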

Full joint distribution in BBNs

Rewrite the full joint probability using the product rule, then simplify each conditional using the independences encoded in the graph:

$$P(B, E, A, J, M) = P(J \mid B, E, A, M)\, P(B, E, A, M) = P(J \mid A)\, P(B, E, A, M)$$
$$P(B, E, A, M) = P(M \mid B, E, A)\, P(B, E, A) = P(M \mid A)\, P(B, E, A)$$
$$P(B, E, A) = P(A \mid B, E)\, P(B, E)$$
$$P(B, E) = P(B)\, P(E)$$

Putting the factors together:

$$P(B, E, A, J, M) = P(J \mid A)\, P(M \mid A)\, P(A \mid B, E)\, P(B)\, P(E)$$

Parameter complexity problem

In the BBN the full joint distribution is expressed as a product of conditionals (of smaller complexity):

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid pa(X_i))$$

For the alarm network (Burglary, Earthquake, Alarm, JohnCalls, MaryCalls):

Parameters:
- full joint: $2^5 = 32$
- BBN: $2^3 + 2(2^2) + 2(2) = 20$

Parameters to be defined:
- full joint: $2^5 - 1 = 31$
- BBN: $2^2 + 2(2) + 2(1) = 10$
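The counts above follow from the graph structure alone and can be recomputed as a check. A minimal sketch, assuming all five variables are binary as in the alarm example; the single-letter node names and helper functions are illustrative.

```python
from math import prod

# Parents of each node in the alarm network; every variable is binary.
PARENTS = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
ARITY = {v: 2 for v in PARENTS}

def cpt_entries(var):
    """Number of entries in the local table P(var | parents)."""
    return ARITY[var] * prod(ARITY[p] for p in PARENTS[var])

def cpt_free(var):
    """Entries that must be specified: each row sums to 1, so one value per row is implied."""
    return (ARITY[var] - 1) * prod(ARITY[p] for p in PARENTS[var])

bbn_entries = sum(cpt_entries(v) for v in PARENTS)
bbn_free = sum(cpt_free(v) for v in PARENTS)
full_entries = prod(ARITY.values())
full_free = full_entries - 1

print(full_entries, bbn_entries, full_free, bbn_free)
```

The gap widens quickly with more variables: the full joint grows exponentially in the number of nodes, while the network's size is driven by the largest parent set.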