CS 2750 Machine Learning
Lecture 12: Bayesian belief networks
Milos Hauskrecht
milos@cs.pitt.edu
5329 Sennott Square

Density estimation
Data: $D = \{D_1, D_2, \dots, D_n\}$, where $D_i = \mathbf{x}_i$ is a vector of attribute values.
Attributes: modeled by random variables $\mathbf{X} = \{X_1, X_2, \dots, X_d\}$ with continuous or discrete values, e.g. blood pressure with numerical values, or chest pain with discrete values [no-pain, mild, moderate, strong].
Underlying true probability distribution: $p(\mathbf{X})$.
Density estimation
Data: $D = \{D_1, D_2, \dots, D_n\}$, where $D_i = \mathbf{x}_i$ is a vector of attribute values.
Objective: estimate the underlying true probability distribution over the variables $\mathbf{X}$, $p(\mathbf{X})$, using the examples in $D$:
true distribution $p(\mathbf{X})$ → $n$ samples $D = \{D_1, D_2, \dots, D_n\}$ → estimate $\hat{p}(\mathbf{X})$.
Standard iid assumptions: samples are independent of each other and come from the same (identical) distribution (fixed $p(\mathbf{X})$).

Learning via parameter estimation
In this lecture we consider parametric density estimation. Basic settings:
A set of random variables $\mathbf{X} = \{X_1, X_2, \dots, X_d\}$.
A model of the distribution over the variables in $\mathbf{X}$ with parameters $\Theta$: $\hat{p}(\mathbf{X} \mid \Theta)$.
Data $D = \{D_1, D_2, \dots, D_n\}$.
Objective: find the parameters $\Theta$ so that they fit the observed data.
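Parameter estimation
Maximum likelihood (ML): maximize $p(D \mid \Theta, \xi)$.
Yields one set of parameters $\Theta_{ML}$; the target distribution is approximated as $\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid \Theta_{ML})$.
Bayesian parameter estimation uses the posterior distribution over possible parameters:
$p(\Theta \mid D, \xi) = \frac{p(D \mid \Theta, \xi)\, p(\Theta \mid \xi)}{p(D \mid \xi)}$
Yields all possible settings of $\Theta$ (and their weights); the target distribution is approximated as
$\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid D) = \int_{\Theta} p(\mathbf{X} \mid \Theta)\, p(\Theta \mid D, \xi)\, d\Theta$

Parameter estimation: other possible criteria
Maximum a posteriori probability (MAP): maximize $p(\Theta \mid D, \xi)$ (the mode of the posterior).
Yields one set of parameters $\Theta_{MAP}$; approximation: $\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid \Theta_{MAP})$.
Expected value of the parameter: $\hat{\Theta} = E(\Theta)$ (the mean of the posterior), with the expectation taken with respect to the posterior $p(\Theta \mid D, \xi)$.
Yields one set of parameters; approximation: $\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid \hat{\Theta})$.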
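As a minimal sketch of how the three point estimates differ, assume a single Bernoulli variable with parameter $\theta = P(X = 1)$ and a Beta(a, b) prior. The conjugate Beta prior is a standard choice that the slide does not specify, and the counts and hyperparameters below are hypothetical. Exact fractions keep the closed forms readable.

```python
from fractions import Fraction

def bernoulli_estimates(n1, n0, a, b):
    """Point estimates of theta = P(X = 1) from n1 ones and n0 zeros under an
    assumed Beta(a, b) prior (a, b > 1 so the posterior mode is interior)."""
    n = n1 + n0
    theta_ml = Fraction(n1, n)                       # argmax of p(D | theta)
    theta_map = Fraction(n1 + a - 1, n + a + b - 2)  # mode of Beta(n1 + a, n0 + b)
    theta_mean = Fraction(n1 + a, n + a + b)         # mean of Beta(n1 + a, n0 + b)
    return theta_ml, theta_map, theta_mean

# Hypothetical data: 7 ones and 3 zeros, prior Beta(2, 2).
ml, map_, mean = bernoulli_estimates(7, 3, 2, 2)
print(ml, map_, mean)  # 7/10 2/3 9/14
```

For the Bernoulli model, the full Bayesian prediction $p(X = 1 \mid D) = \int \theta\, p(\theta \mid D, \xi)\, d\theta$ equals the posterior mean. In this special case, the expected-value approximation therefore loses nothing.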
Density estimation
So far we have covered density estimation for "simple" distribution models: Bernoulli, binomial, multinomial, Gaussian, Poisson.
But what if:
the dimension of $\mathbf{X} = \{X_1, X_2, \dots, X_d\}$ is large (example: patient data);
compact parametric distributions do not seem to fit the data (e.g. a multivariate Gaussian may not fit);
we have only a "small" number of examples to do accurate parameter estimates?

How to learn complex distributions
How do we learn complex multivariate distributions $\hat{p}(\mathbf{X})$ with a large number of variables?
One solution: decompose the distribution along conditional independence relations, and decompose the parameter estimation problem into a set of smaller parameter estimation tasks.
Decomposition of distributions under conditional independence assumptions is the main idea behind Bayesian belief networks.
Example
Problem description:
Disease: pneumonia.
Patient symptoms (findings, lab tests): fever, cough, paleness, WBC (white blood cells) count, chest pain, etc.
Representation of a patient case: symptoms and disease are represented as random variables.
Our objectives:
describe a multivariate distribution representing the relations between symptoms and disease;
design inference and learning procedures for the multivariate model.

Joint probability distribution
A joint probability distribution for a set of variables defines probabilities for all possible assignments of values to the variables in the set. Example: $P(\text{Pneumonia}, \text{WBCcount})$ is a 2 × 3 table:

Pneumonia \ WBCcount   high     normal   low      P(Pneumonia)
True                   0.0008   0.0001   0.0001   0.001
False                  0.0042   0.9929   0.0019   0.999
P(WBCcount)            0.005    0.993    0.002

Marginalization (summing of rows or columns): summing out variables.
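Marginalization of the 2 × 3 table can be sketched in Python as follows. The dictionary keyed by (Pneumonia, WBCcount) tuples is an illustrative encoding and is not part of the lecture.

```python
# Joint distribution P(Pneumonia, WBCcount) from the 2 x 3 table.
joint = {
    (True, "high"): 0.0008, (True, "normal"): 0.0001, (True, "low"): 0.0001,
    (False, "high"): 0.0042, (False, "normal"): 0.9929, (False, "low"): 0.0019,
}

def marginal(joint, axis):
    """Keep the variable at position `axis` and sum out the other one."""
    result = {}
    for assignment, p in joint.items():
        value = assignment[axis]
        result[value] = result.get(value, 0.0) + p
    return result

p_pneumonia = marginal(joint, 0)  # row sums
p_wbc = marginal(joint, 1)        # column sums
print({k: round(v, 4) for k, v in p_pneumonia.items()})  # {True: 0.001, False: 0.999}
print({k: round(v, 4) for k, v in p_wbc.items()})  # {'high': 0.005, 'normal': 0.993, 'low': 0.002}
```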
Variable independence
The joint distribution over a subset of variables can always be computed from the joint distribution through marginalization. Not the other way around!!! The only exception is when the variables are independent:
$P(A, B) = P(A)\,P(B)$

Conditional probability
Conditional probability: probability of $A$ given $B$:
$P(A \mid B) = \frac{P(A, B)}{P(B)}$
Conditional probability is defined in terms of joint probabilities. Conversely, joint probabilities can be expressed in terms of conditional probabilities:
$P(A, B) = P(A \mid B)\,P(B)$ (product rule)
$P(X_1, X_2, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})$ (chain rule)
Conditional probability is useful for various probabilistic inferences, e.g.
$P(\text{Pneumonia} = \text{True} \mid \text{Fever} = \text{True}, \text{WBCcount} = \text{high}, \text{Cough} = \text{True})$
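Using the same pneumonia table, this short sketch shows the definition of conditional probability and the independence test $P(A, B) = P(A)P(B)$. The helper names are hypothetical. The marginals are the row and column sums of the table.

```python
# The table P(Pneumonia, WBCcount) and its marginals (row and column sums).
joint = {
    (True, "high"): 0.0008, (True, "normal"): 0.0001, (True, "low"): 0.0001,
    (False, "high"): 0.0042, (False, "normal"): 0.9929, (False, "low"): 0.0019,
}
p_pneumonia = {True: 0.001, False: 0.999}
p_wbc = {"high": 0.005, "normal": 0.993, "low": 0.002}

def pneumonia_given_wbc(wbc):
    """P(Pneumonia | WBCcount = wbc) = P(Pneumonia, wbc) / P(wbc)."""
    return {pn: joint[(pn, wbc)] / p_wbc[wbc] for pn in (True, False)}

def is_independent(joint, p_a, p_b, tol=1e-9):
    """True if P(a, b) = P(a) P(b) holds for every cell (up to `tol`)."""
    return all(abs(p - p_a[a] * p_b[b]) <= tol for (a, b), p in joint.items())

posterior = pneumonia_given_wbc("high")
print(round(posterior[True], 4))                  # 0.16
print(is_independent(joint, p_pneumonia, p_wbc))  # False
```

A high WBC count raises the probability of pneumonia from 0.001 to 0.16. The test also confirms that the two variables are not independent, so this joint could not be rebuilt from its marginals alone.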
Modeling uncertainty with probabilities
Full joint distribution: the joint distribution over all random variables defining the domain. It is sufficient to do any type of probabilistic inference.

Inference
Any query can be computed from the full joint distribution!!!
The joint over a subset of variables is obtained through marginalization:
$P(A = a, C = c) = \sum_i \sum_j P(A = a, B = b_i, C = c, D = d_j)$
A conditional probability over a set of variables, given the values of other variables, is obtained through marginalization and the definition of conditionals:
$P(D = d \mid A = a, C = c) = \frac{P(A = a, C = c, D = d)}{P(A = a, C = c)} = \frac{\sum_i P(A = a, B = b_i, C = c, D = d)}{\sum_i \sum_j P(A = a, B = b_i, C = c, D = d_j)}$
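Inference by enumeration from a full joint can be sketched as below. The four binary variables A, B, C, D and their joint probabilities are hypothetical: arbitrary positive weights normalized to 1. They serve only to show that both formulas reduce to sums over table entries.

```python
from itertools import product

VARS = ("A", "B", "C", "D")

# Hypothetical full joint over four binary variables (values 0/1):
# arbitrary positive weights normalized to sum to 1.
weights = {x: 1 + x[0] + 2 * x[1] + 3 * x[2] * x[3]
           for x in product((0, 1), repeat=len(VARS))}
total = sum(weights.values())
joint = {x: w / total for x, w in weights.items()}

def prob(joint, **fixed):
    """Probability of a partial assignment: sum all joint entries that
    agree with it, i.e. sum out the variables not mentioned."""
    pos = {v: i for i, v in enumerate(VARS)}
    return sum(p for x, p in joint.items()
               if all(x[pos[v]] == val for v, val in fixed.items()))

def conditional(joint, query, evidence):
    """P(query | evidence) = P(query, evidence) / P(evidence)."""
    return prob(joint, **query, **evidence) / prob(joint, **evidence)

print(prob(joint, A=1, C=1))                           # sums over B and D
print(conditional(joint, {"D": 1}, {"A": 1, "C": 1}))  # ratio of two sums
```

Every query scans all entries of the table, so the cost grows with the size of the full joint.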
Inference
Any query can be computed from the full joint distribution!!!
Any joint probability can be expressed as a product of conditionals via the chain rule:
$P(X_1, X_2, \dots, X_n) = P(X_n \mid X_1, \dots, X_{n-1})\, P(X_1, \dots, X_{n-1})$
$= P(X_n \mid X_1, \dots, X_{n-1})\, P(X_{n-1} \mid X_1, \dots, X_{n-2})\, P(X_1, \dots, X_{n-2})$
$= \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})$
It is often easier to define the distribution in terms of conditional probabilities, e.g. $P(\text{Fever} \mid \text{Pneumonia} = \text{True})$ and $P(\text{Fever} \mid \text{Pneumonia} = \text{False})$.

Modeling uncertainty with probabilities
Full joint distribution: the joint distribution over all random variables defining the domain. It is sufficient to represent the complete domain and to do any type of probabilistic inference.
Problems:
Space complexity. Storing the full joint distribution requires remembering $O(d^n)$ numbers ($n$ = number of random variables, $d$ = number of values).
Inference complexity. Computing some queries requires $O(d^n)$ steps.
Acquisition problem. Who is going to define all of the probability entries?
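This is a minimal check, on a hypothetical joint over three binary variables, that the chain-rule product telescopes back to every joint entry. Each conditional $P(x_i \mid x_1, \dots, x_{i-1})$ is computed as a ratio of marginals of the prefixes $X_1, \dots, X_i$ and $X_1, \dots, X_{i-1}$.

```python
from itertools import product

# Hypothetical joint over three binary variables X1, X2, X3 (values 0/1).
weights = {x: 1 + x[0] + 2 * x[1] * x[2] for x in product((0, 1), repeat=3)}
total = sum(weights.values())
joint = {x: w / total for x, w in weights.items()}

def prefix_prob(joint, prefix):
    """P(X1..Xk = prefix): sum out the remaining variables (1 for k = 0)."""
    k = len(prefix)
    return sum(p for x, p in joint.items() if x[:k] == prefix)

def chain_rule(joint, x):
    """P(x1, ..., xn) = prod_i P(x_i | x_1, ..., x_{i-1})."""
    result = 1.0
    for i in range(len(x)):
        # P(x_i | x_1..x_{i-1}) = P(x_1..x_i) / P(x_1..x_{i-1})
        result *= prefix_prob(joint, x[:i + 1]) / prefix_prob(joint, x[:i])
    return result

for x in joint:
    assert abs(chain_rule(joint, x) - joint[x]) < 1e-12
print("chain rule reproduces every joint entry")
```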
Pneumonia example: complexities
Space complexity.
Pneumonia (2 values: T, F), Fever (2: T, F), Cough (2: T, F), WBCcount (3: high, normal, low), Paleness (2: T, F).
Number of assignments: 2 × 2 × 2 × 3 × 2 = 48.
We need to define at least 47 probabilities.
Time complexity.
Assume we need to compute the probability of Pneumonia = T from the full joint:
$P(\text{Pneumonia} = T) = \sum_{i \in \{T,F\}} \sum_{j \in \{T,F\}} \sum_{k \in \{h,n,l\}} \sum_{u \in \{T,F\}} P(\text{Fever} = i, \text{Cough} = j, \text{WBCcount} = k, \text{Pale} = u, \text{Pneumonia} = T)$
Sum over 2 × 2 × 3 × 2 = 24 combinations.

Bayesian belief networks (BBNs)
Bayesian belief networks represent the full joint distribution over the variables more compactly, with a smaller number of parameters. They take advantage of conditional and marginal independences among random variables:
$A$ and $B$ are independent: $P(A, B) = P(A)\,P(B)$
$A$ and $B$ are conditionally independent given $C$: $P(A, B \mid C) = P(A \mid C)\,P(B \mid C)$ and $P(A \mid C, B) = P(A \mid C)$
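The counts above follow directly from the variable cardinalities. This sketch simply multiplies them out; the dictionary of cardinalities is an illustrative encoding.

```python
from itertools import product
from math import prod

# Number of values of each variable in the pneumonia example.
cardinality = {"Pneumonia": 2, "Fever": 2, "Cough": 2, "WBCcount": 3, "Paleness": 2}

n_assignments = prod(cardinality.values())  # entries of the full joint table
n_free = n_assignments - 1                  # the entries must sum to 1

# Computing P(Pneumonia = T) by brute force enumerates every setting
# of the other four variables.
others = [range(c) for v, c in cardinality.items() if v != "Pneumonia"]
n_terms = sum(1 for _ in product(*others))

print(n_assignments, n_free, n_terms)  # 48 47 24
```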
Alarm system example
Assume your house has an alarm system against burglary. You live in a seismically active area, and the alarm system can occasionally be set off by an earthquake. You have two neighbors, Mary and John, who do not know each other. If they hear the alarm they call you, but this is not guaranteed.
We want to represent the probability distribution of the events: Burglary, Earthquake, Alarm, Mary calls and John calls.
Causal relations: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls.

Bayesian belief network
1. Directed acyclic graph
Nodes = random variables: Burglary, Earthquake, Alarm, Mary calls and John calls.
Links = direct (causal) dependencies between variables. The chance of Alarm being on is influenced by Earthquake; the chance of John calling is affected by the Alarm.
Distributions attached to the nodes: P(Burglary), P(Earthquake), P(Alarm | Burglary, Earthquake), P(JohnCalls | Alarm), P(MaryCalls | Alarm).
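One possible encoding of the DAG maps each node to its list of parents; this representation is an assumption, not something the lecture prescribes. The sketch below uses depth-first search to produce a topological order (parents before children). Such an order exists exactly when the graph is acyclic, and the chain-rule derivation of the network factorization uses an ordering of this kind.

```python
# Alarm network structure: each node maps to its list of parents.
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

def topological_order(parents):
    """Return the nodes so that every parent precedes its children;
    raise ValueError if the graph has a directed cycle (i.e. is not a DAG)."""
    order, state = [], {}  # state: 1 = on the current DFS path, 2 = finished

    def visit(node):
        if state.get(node) == 2:
            return
        if state.get(node) == 1:
            raise ValueError(f"cycle through {node}")
        state[node] = 1
        for p in parents[node]:
            visit(p)
        state[node] = 2
        order.append(node)

    for node in parents:
        visit(node)
    return order

print(topological_order(parents))
# ['Burglary', 'Earthquake', 'Alarm', 'JohnCalls', 'MaryCalls']
```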
Bayesian belief network
2. Local conditional distributions relate variables and their parents (B = Burglary, E = Earthquake, A = Alarm, J = JohnCalls, M = MaryCalls): $P(B)$, $P(E)$, $P(A \mid B, E)$, $P(J \mid A)$, $P(M \mid A)$.

Bayesian belief network: local conditional distributions
P(B):  B = T 0.001,  B = F 0.999
P(E):  E = T 0.002,  E = F 0.998
P(A | B, E):
B   E   A = T   A = F
T   T   0.95    0.05
T   F   0.94    0.06
F   T   0.29    0.71
F   F   0.001   0.999
P(J | A):
A   J = T   J = F
T   0.90    0.10
F   0.05    0.95
P(M | A):
A   M = T   M = F
T   0.70    0.30
F   0.01    0.99
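Bayesian belief networks (general)
Two components: $B = (S, \Theta_S)$
Directed acyclic graph $S$: nodes correspond to random variables; (missing) links encode independences.
Parameters $\Theta_S$: local conditional probability distributions for every variable-parent configuration, $P(X_i \mid pa(X_i))$, where $pa(X_i)$ stands for the parents of $X_i$. Example, $P(A \mid B, E)$ in the alarm network:
B   E   A = T   A = F
T   T   0.95    0.05
T   F   0.94    0.06
F   T   0.29    0.71
F   F   0.001   0.999

Full joint distribution in BBNs
The full joint distribution is defined in terms of local conditional distributions (obtained via the chain rule):
$P(X_1, X_2, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid pa(X_i))$
Example: assume the following assignment of values to the random variables: $B = b$, $E = e$, $A = a$, $J = j$, $M = m$. Then its probability is:
$P(B = b, E = e, A = a, J = j, M = m) = P(B = b)\, P(E = e)\, P(A = a \mid B = b, E = e)\, P(J = j \mid A = a)\, P(M = m \mid A = a)$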
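Using the alarm-network CPT values listed for this example, here is a sketch of the factorization $P(X_1, \dots, X_n) = \prod_i P(X_i \mid pa(X_i))$. Each CPT stores the probability of True given the parent values. The product over the 5 nodes sums to 1 over all 32 assignments. The assignment evaluated at the end (B = E = A = J = T, M = F) is chosen only for illustration.

```python
from itertools import product

# Alarm-network CPTs: probability that the variable is True given its parents.
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=T | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=T | A)

def bern(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as the product of the local conditionals."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

total = sum(joint(*x) for x in product((True, False), repeat=5))
print(round(total, 12))                       # 1.0
print(joint(True, True, True, True, False))   # 0.001 * 0.002 * 0.95 * 0.90 * 0.30
```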
Bayesian belief networks (BBNs)
Bayesian belief networks represent the full joint distribution over the variables more compactly, using the product of local conditionals.
But how did we get to the local parameterizations?
Answer: the graphical structure encodes conditional and marginal independences among random variables:
$A$ and $B$ are independent: $P(A, B) = P(A)\,P(B)$
$A$ and $B$ are conditionally independent given $C$: $P(A \mid C, B) = P(A \mid C)$ and $P(A, B \mid C) = P(A \mid C)\,P(B \mid C)$
The graph structure implies the decomposition!!!

Independences in BBNs
3 basic independence structures:
1. Burglary → Alarm → JohnCalls
2. Burglary → Alarm ← Earthquake
3. JohnCalls ← Alarm → MaryCalls
Independences in BBNs
1. JohnCalls (J) is independent of Burglary (B) given Alarm (A):
$P(J \mid A, B) = P(J \mid A)$
$P(J, B \mid A) = P(J \mid A)\,P(B \mid A)$

Independences in BBNs
2. Burglary (B) is independent of Earthquake (E) when Alarm is not known:
$P(B, E) = P(B)\,P(E)$
Burglary and Earthquake become dependent given Alarm!!
Independences in BBNs
3. MaryCalls (M) is independent of JohnCalls (J) given Alarm (A):
$P(J \mid A, M) = P(J \mid A)$
$P(J, M \mid A) = P(J \mid A)\,P(M \mid A)$

Independences in BBNs
A BBN distribution models many conditional independence relations among distant variables and sets of variables. These are defined in terms of a graphical criterion called d-separation.
D-separation in the graph: let $X$, $Y$ and $Z$ be three sets of nodes. If $X$ and $Y$ are d-separated by $Z$, then $X$ and $Y$ are conditionally independent given $Z$.
D-separation: $A$ is d-separated from $B$ given $C$ if every undirected path between them is blocked.
Path blocking: 3 cases that expand on the three basic independence structures.
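The three basic structures can be checked numerically by brute-force enumeration over the 32 assignments of the alarm network, using the CPT values from the slides. This is a sketch for illustration only: helper names such as `cond` are hypothetical, and floating-point comparisons need small tolerances.

```python
from itertools import product

# Alarm-network CPTs from the slides: probability of True given the parents.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}
NAMES = ("B", "E", "A", "J", "M")

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Product of the local conditionals P(x_i | pa(X_i))."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

def prob(**fixed):
    """Marginal probability of a partial assignment, by enumeration."""
    total = 0.0
    for values in product((True, False), repeat=len(NAMES)):
        assignment = dict(zip(NAMES, values))
        if all(assignment[k] == v for k, v in fixed.items()):
            total += joint(*values)
    return total

def cond(query, given):
    """P(query | given) as a ratio of two marginals."""
    return prob(**query, **given) / prob(**given)

# 1. Linear: J is independent of B given A.
print(cond({"J": True}, {"A": True, "B": True}), cond({"J": True}, {"A": True}))
# 2. Vee: B and E are independent, but become dependent once A is observed.
print(prob(B=True, E=True), prob(B=True) * prob(E=True))
print(cond({"B": True}, {"A": True}), cond({"B": True}, {"A": True, "E": True}))
# 3. Wedge: J and M are independent given A, but dependent without A.
print(cond({"J": True}, {"A": True, "M": True}), cond({"J": True}, {"A": True}))
print(cond({"J": True}, {"M": True}), prob(J=True))
```

In the vee case, learning that an earthquake occurred lowers the probability of a burglary given the alarm from about 0.37 to about 0.003. The earthquake "explains away" the alarm.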
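Undirected path blocking
A path between $X$ and $Y$ is blocked given $C$ if it passes through a node $Z$ in one of these configurations:
1. Linear substructure $X \to Z \to Y$: $Z$ is in $C$.
2. Wedge substructure $X \leftarrow Z \to Y$: $Z$ is in $C$.
3. Vee substructure $X \to Z \leftarrow Y$: neither $Z$ nor any of its descendants is in $C$.

Independences in BBNs
Network: Burglary → Alarm ← Earthquake, Earthquake → RadioReport, Alarm → JohnCalls, Alarm → MaryCalls.
Are Earthquake and Burglary independent given MaryCalls? No: MaryCalls is a descendant of the vee node Alarm, so observing it unblocks the path Burglary → Alarm ← Earthquake.
Are Burglary and MaryCalls independent, not knowing Alarm? No: the linear path Burglary → Alarm → MaryCalls is not blocked.
Are Burglary and RadioReport independent given Earthquake? Yes: the only path, Burglary → Alarm ← Earthquake → RadioReport, is blocked at the observed wedge node Earthquake (and also at the vee node Alarm, which has no observed descendants).
Are Burglary and RadioReport independent given MaryCalls? No: observing MaryCalls unblocks the vee at Alarm, and the wedge node Earthquake is not observed.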
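The four independence claims above can be checked mechanically. This sketch swaps in a different but equivalent test for d-separation: $X$ and $Y$ are d-separated by $Z$ exactly when no path connects them in the moralized graph of the ancestors of $X \cup Y \cup Z$ after the nodes of $Z$ are removed. It is not the path-blocking procedure from the slide, but it gives the same answers. The parent-list encoding of the graph is an assumption.

```python
from collections import deque

# Alarm network with RadioReport as a child of Earthquake: node -> parents.
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "RadioReport": ["Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

def ancestral_set(parents, nodes):
    """The given nodes together with all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(parents, xs, ys, zs):
    """Moralized-ancestral-graph test for d-separation of xs and ys by zs."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestral_set(parents, xs | ys | zs)
    adj = {v: set() for v in keep}
    for child in keep:
        ps = parents[child]
        for p in ps:                # drop edge directions
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(ps):  # "marry" co-parents of the child
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # Breadth-first search from xs that never enters an observed node.
    seen = xs - zs
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v in ys:
            return False
        for w in adj[v]:
            if w not in seen and w not in zs:
                seen.add(w)
                queue.append(w)
    return True

questions = [
    ({"Earthquake"}, {"Burglary"}, {"MaryCalls"}),
    ({"Burglary"}, {"MaryCalls"}, set()),
    ({"Burglary"}, {"RadioReport"}, {"Earthquake"}),
    ({"Burglary"}, {"RadioReport"}, {"MaryCalls"}),
]
for xs, ys, zs in questions:
    print(xs, ys, zs, d_separated(parents, xs, ys, zs))
```

Only the third pair (Burglary and RadioReport given Earthquake) comes out d-separated, so that is the only one of the four independences the graph guarantees.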
Full joint distribution in BBNs
Rewrite the full joint probability using the product rule, then simplify each conditional with the independences encoded by the graph. Write $P(b, e, a, j, m)$ for $P(B = b, E = e, A = a, J = j, M = m)$:
$P(b, e, a, j, m) = P(j \mid b, e, a, m)\, P(b, e, a, m) = P(j \mid a)\, P(b, e, a, m)$
$P(b, e, a, m) = P(m \mid b, e, a)\, P(b, e, a) = P(m \mid a)\, P(b, e, a)$
$P(b, e, a) = P(a \mid b, e)\, P(b, e)$
$P(b, e) = P(b)\, P(e)$
Hence $P(b, e, a, j, m) = P(j \mid a)\, P(m \mid a)\, P(a \mid b, e)\, P(b)\, P(e)$.
Parameter complexity problem
In the BBN, the full joint distribution is expressed as a product of conditionals of smaller complexity:
$P(X_1, X_2, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid pa(X_i))$
Parameters (alarm network: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls):
full joint: $2^5 = 32$
BBN: $2^3 + 2(2^2) + 2(2) = 20$
Parameters to be defined:
full joint: $2^5 - 1 = 31$
BBN: $2^2 + 2(2) + 2(1) = 10$
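This sketch counts parameters for any discrete network. The full joint needs one entry per joint assignment. Each CPT needs one row per parent configuration, and each row has one free number fewer than the variable's number of values. The dictionary encoding of the network is illustrative.

```python
from math import prod

# Alarm network: node -> (number of values, list of parents).
network = {
    "Burglary": (2, []),
    "Earthquake": (2, []),
    "Alarm": (2, ["Burglary", "Earthquake"]),
    "JohnCalls": (2, ["Alarm"]),
    "MaryCalls": (2, ["Alarm"]),
}

def parameter_counts(network):
    """Return (full joint entries, full joint free parameters,
    BBN table entries, BBN free parameters)."""
    card = {v: c for v, (c, _) in network.items()}
    joint_entries = prod(card.values())
    bbn_entries = bbn_free = 0
    for v, (c, pa) in network.items():
        configs = prod(card[p] for p in pa)  # parent configurations (1 if none)
        bbn_entries += c * configs
        bbn_free += (c - 1) * configs        # each CPT row sums to 1
    return joint_entries, joint_entries - 1, bbn_entries, bbn_free

print(parameter_counts(network))  # (32, 31, 20, 10)
```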