Active Sensing


Shipeng Yu, Balaji Krishnapuram, Romer Rosales, R. Bharat Rao
CAD and Knowledge Solutions, Siemens Medical Solutions USA, Inc.

Appearing in Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS) 2009, Clearwater Beach, Florida, USA. Volume 5 of JMLR: W&CP 5. Copyright 2009 by the authors.

Abstract

Labels are often expensive to get, and this motivates active learning, which chooses the most informative samples for label acquisition. In this paper we study active sensing in a multi-view setting, motivated by the many problems where grouped features are also expensive to obtain and need to be acquired (or sensed) actively (e.g., in cancer diagnosis each patient might go through many tests such as CT, Ultrasound and MRI to get valuable features). The strength of this model is that one actively sensed (sample, view) pair would improve the joint multi-view classification on all the samples. For this purpose we extend the Bayesian co-training framework such that it can handle missing views in a principled way, and introduce two criteria for view acquisition. Experiments on one toy data set and two real-world medical problems show the effectiveness of this model.

Figure 1: Settings for active learning (left) and active sensing with views (right). The L column denotes the labels (outputs). Light blue blocks denote observed data, and red blocks denote missing data.

1 Introduction

Labeled data can be expensive to obtain in a variety of machine learning problems. Active learning addresses the problem of efficiently choosing data samples to be labeled in order to improve overall learning performance. From a cancer diagnosis perspective, this is equivalent to choosing patients to do a biopsy such that the tumor is correctly diagnosed (benign/malignant). In this paper we consider a related but different problem, now motivated by the fact that features may also be expensive to obtain (much in the same way as labels). More generally we consider subsets of features; we refer to them as views.
In cancer diagnosis, features could come from different imaging modalities such as CT, Ultrasound and MRI. In problems where there exist different views of the data, some of these views could be missing for certain samples (due to, e.g., high cost or limited budget). We call active sensing the process of efficiently choosing which views and samples to additionally acquire to improve the overall learning performance (cf. Fig. 1).

Examples of the active sensing setting described above are abundant. For land mine detection in a sensor network, we may have different types of sensors (as different views) deployed at one location, but some sensors may not be available for all locations due to high cost. So the interest is to decide which location and which type of sensor we should additionally consider to achieve better detection accuracy. In the medical diagnosis scenario, our motivating application, specialists rely on different sets of medical factors, such as demographics, imaging, and bio-markers, to make clinical decisions. A patient does not undergo all possible tests at once (due to various side effects such as radiation and contrast), but these tests are selected based on the evidence collected up to a particular point.¹

It is seen that standard active learning would not work in this setting. When some views are missing, one solution is to learn a model using those samples for which all the views are available. Another solution is to impute the missing features using the observed features. However, as the main motivating factor for multi-view learning approaches, the information provided by the combined set of views (taken at once) is in general larger than that provided by any algorithm that considers the views separately (e.g., views can reinforce each other).

We provide two approaches for efficiently choosing the (sample, view) pair, based on the mutual information (involving various random variables) and on the predictive uncertainty, respectively. We formalize these within the recently proposed Bayesian co-training framework (Yu et al., 2008), with an important extension to account for data with missing view information. This provides an undirected graphical model representation of the active sensing problem. We also provide methods for addressing the density modeling and approximate inference sub-problems arising in this probabilistic setting. Empirical studies using one toy data set and two real-world medical problems clearly show the effectiveness of this model.

The rest of the paper is organized as follows. We survey the related literature in Section 2. The Bayesian co-training model is extended to handle missing views in Section 3. Section 4 describes two methods for active sensing, i.e., deciding which incomplete samples should be further characterized, and which sensors should be deployed on them. Experimental results are provided in Section 5. We conclude with a brief discussion and future work in Section 6.

2 Related Work

Active sensing provides a new scenario in active data acquisition. The present formulation benefits from previous work in experiment design (Lindley, 1956; Fedorov, 1972), active learning (MacKay, 1992; Seung et al., 1992), and sensor placement (Krause et al., 2008).
While there exists a considerable body of work on the general notion of active data acquisition, to the best of our knowledge, this is the first paper to focus on this notion of active sensing, i.e., feature acquisition specifically for improving multi-view learning jointly for all unlabeled samples. Feature acquisition was addressed in, e.g., (Melville et al., 2004; Bilgic and Getoor, 2007), but there is a clear difference to active sensing. Previous feature acquisition only considers one sample at a time, i.e., when one sample is under consideration, the other samples are not affected. But in active sensing, one actively acquired (sample, view) pair will improve the classification performance of all the unlabeled samples via a co-training setting. A related yet different problem was considered to identify the optimal spatial locations for placing a single type of sensor to model spatially varying phenomena (Krause et al., 2008); however, this work addressed the use of a single type of sensor, and did not consider the scenario of multiple views.

¹ This is normally referred to as differential diagnosis.

Co-training (Blum and Mitchell, 1998) is based on the idea that the error rate on unseen test samples can be upper bounded by the disagreement between the classification decisions obtained from independent characterizations (views) of the data (Dasgupta et al., 2001). Recently, Bayesian co-training (Yu et al., 2008) was proposed, which defines an undirected graphical model for co-training and provides a principled solution to multi-view learning. However, it can only handle data without missing views, and in this paper we extend it such that it provides the basis for active sensing.

One of our criteria for active sensing is to choose the (sample, view) pair which provides the maximum mutual information (MI) (Cover and Thomas, 1991) about the non-parametric classification function. In order to accomplish this, we use the D-optimality criterion, while other choices such as A-optimality and E-optimality are also available (Flaherty et al., 2006).
Apart from MI maximization, other objective criteria for active learning include uncertainty sampling (Lewis and Gale, 1994; Cohn et al., 1996) and performance optimization (e.g., (Roy and McCallum, 2001)).

In this paper we make two principal contributions. First, we extend Bayesian co-training to allow for missing views, accommodating incompletely characterized objects. Deploying additional sensors to characterize an object would naturally help improve classification accuracy. Our second contribution is to identify which objects should be characterized using additional sensors in order to improve the classification of all the unlabeled data. This is significantly different from previous feature acquisition work.

3 Bayesian Co-Training with Missing Views

Bayesian co-training defines an undirected graphical model for semi-supervised multi-view learning (Yu et al., 2008). The original work assumes that the input data are complete, i.e., all the views are observed for every data sample, but for active sensing we need to define a co-training strategy for data with incomplete or missing views. In this section we extend Bayesian co-training to the case where there are missing (sample, view) pairs in the input data. The same notations as in (Yu et al., 2008) are preserved unless otherwise mentioned.

Figure 2: Bayesian co-training factor graphs for (a) one-view and (b) two-view problems, with missing views. Observed variables are marked as dark/bold, and unobserved ones are marked as red/non-bold, including functions $f_1$, $f_2$, $f_c$ (blue/non-bold). Unobserved variables in a dotted box (such as $x_j^{(1)}$) are potential observations for active sensing. All labels $y$ are denoted as observed in the graph, but this is not required.

Suppose we have $m$ different views of a set of $n$ data samples. Let $x_i^{(j)} \in \mathbb{R}^{d_j}$ be the features for the $i$th sample obtained using the $j$th view, where $d_j$ is the dimensionality of the input space for view $j$. Let each view $j$ be observed for a subset of $n_j$ samples, and let $I_j$ denote the indices of these samples in the whole sample set. Finally let $y = (y_1, \ldots, y_n)$ denote the labels for these samples. In this paper we consider a binary classification scenario where each $y_i \in \{-1, +1\}$.

In Bayesian co-training, let $f_j$ denote the latent function for the $j$th view, and $f_j \sim \mathcal{GP}(0, \kappa_j)$ be its GP prior in view $j$. The consensus function $f_c$ is defined to ensure conditional independence between the output $y$ and the $m$ latent functions $\{f_j\}$ (Yu et al., 2008) (cf. Fig. 2 for the factor graph).
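
To make the notation concrete, the following is a minimal sketch of how the index sets $I_j$ and the per-view covariance matrices $K_j$ could be built from data with missing views. The NaN encoding of a missing (sample, view) pair, the helper names `view_index_sets` and `gaussian_kernel`, and the particular Gaussian kernel parameterization are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def view_index_sets(views):
    """Return I_j for every view: indices of the samples observed in that view.

    `views` is a list of (n, d_j) arrays; a row containing NaN marks a
    missing (sample, view) pair (an encoding assumed for this sketch).
    """
    return [np.flatnonzero(~np.isnan(X).any(axis=1)) for X in views]

def gaussian_kernel(X, width=0.5):
    """Gaussian kernel matrix exp(-||x - x'||^2 / (2 * width^2)) (assumed form)."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * width ** 2))

# Toy data: n = 4 samples, m = 2 views with d_1 = 1 and d_2 = 2.
nan = np.nan
X1 = np.array([[0.0], [1.0], [nan], [3.0]])                        # sample 2 missing
X2 = np.array([[0.5, 0.1], [nan, nan], [2.0, 0.3], [2.5, 0.9]])    # sample 1 missing
views = [X1, X2]

I = view_index_sets(views)                                  # I_1, I_2
K = [gaussian_kernel(X[idx]) for X, idx in zip(views, I)]   # K_j is n_j x n_j

for j, (idx, K_j) in enumerate(zip(I, K), start=1):
    print(f"view {j}: I_j = {idx.tolist()}, K_j shape = {K_j.shape}")
```

Each $K_j$ is computed only on the $n_j$ samples observed in view $j$, which matches the statement that $f_j$ is realized on a subset of the samples.
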
The undirected graphical model leads to the following joint probability:

$$p(y, f_c, f_1, \ldots, f_m) = \frac{1}{Z} \prod_{i=1}^{n} \psi(y_i, f_c(x_i)) \prod_{j=1}^{m} \psi(f_j)\, \psi(f_j, f_c), \qquad (1)$$

where $f_c = \{f_c(x_i)\}_{i=1}^{n}$ and $f_j = \{f_j(x_i^{(j)})\}_{i \in I_j}$ are column vectors of length $n$ and $n_j$, respectively. (Note that subscripts index the data sample, and superscripts with round brackets index the view.) Note that unlike in (Yu et al., 2008), $f_j$ is only realized on a subset of samples (as denoted in $I_j$) and is of length $n_j$ (instead of $n$). The within-view potential $\psi(f_j)$ is defined via the GP prior, $\psi(f_j) = \exp(-\frac{1}{2} f_j^\top K_j^{-1} f_j)$, where $K_j \in \mathbb{R}^{n_j \times n_j}$ is the covariance matrix for view $j$; the consensus potential $\psi(f_j, f_c)$ describes how each latent function $f_j$ is related to the consensus function $f_c$, which we define as follows:

$$\psi(f_j, f_c) = \exp\left( -\frac{\| f_j - f_c(I_j) \|^2}{2\sigma_j^2} \right). \qquad (2)$$

Note that $f_c(I_j)$ takes the length-$n_j$ subset of vector $f_c$ with indices given by $I_j$. The idea here is to define the consensus potential for view $j$ using only the data samples observed in view $j$. As in (Yu et al., 2008), $\sigma_j > 0$ quantifies how far the latent function $f_j$ is apart from $f_c$, and the output potential $\psi(y_i, f_c(x_i))$ is defined as $\lambda(y_i f_c(x_i))$ with logistic function $\lambda(z) = (1 + \exp(-z))^{-1}$.

3.1 Co-Training Kernel with Missing Views

As in (Yu et al., 2008), we can also derive a co-training kernel $K_c$ by integrating out all the latent functions $\{f_j\}$ in (1). It is calculated as $K_c = \Lambda_c^{-1}$, $\Lambda_c = \sum_{j=1}^{m} A_j$, where each $A_j$ is an $n \times n$ matrix with

$$A_j(I_j, I_j) = (K_j + \sigma_j^2 I)^{-1}, \text{ and } 0 \text{ otherwise.} \qquad (3)$$

That is, $A_j$ is an expansion of the one-view information matrix $(K_j + \sigma_j^2 I)^{-1}$ to the full size $n \times n$, with the other (unindexed) entries filled with 0. It is easily seen that such a kernel $K_c$ is indeed positive definite as long as each one-view kernel $K_j$ is positive definite and every sample is observed in at least one view. Very importantly, we note that one additional observation of a (sample, view) pair will affect all the elements of the co-training kernel. This is exactly the property we would like to have in active sensing.

3.2 Co-Regularization with Missing Views

To be complete we also give the marginalization result for co-regularization. Ignoring the output $y$ for the moment, integrating out the consensus view $f_c$ leads to the following joint prior:

$$p(f_1, \ldots, f_m) = \frac{1}{Z} \exp\left\{ -\frac{1}{2} \sum_{j=1}^{m} f_j^\top K_j^{-1} f_j - \frac{1}{2} \sum_{j<k} \sum_{x \in I_j \cap I_k} \frac{[f_j(x) - f_k(x)]^2}{\sigma_j^2 \sigma_k^2} \Big/ \sum_{l:\, x \in I_l} \frac{1}{\sigma_l^2} \right\}.$$

The first part regularizes the functional space of each view, and the second part constrains every pair of views to agree on the outputs for co-observed samples (inversely weighted by view variances and the sum of precisions of the views in which the sample is observed).

4 Active Sensing

In active sensing, we are interested in selecting the best unobserved (sample, view) pair for sensing, or for view acquisition, which will improve the overall classification performance. In this section we mainly discuss an approach based on the mutual information framework, which measures the expected information gain after observing an additional (sample, view) pair. Another approach based on the predictive uncertainty is also briefly discussed in Section 4.5. In the following let $D_O$ and $D_U$ denote the observed and unobserved (sample, view) pairs, respectively.

4.1 Laplace Approximation

To calculate the mutual information we need to calculate the differential entropy of the consensus view function $f_c$. With the co-training kernel and the logistic regression loss, the Laplace approximation can be applied to approximate the posterior distribution of $f_c$ as a Gaussian distribution. In particular, let the prior of the consensus view take the GP prior with co-training kernel, i.e., $f_c \sim \mathcal{N}(0, K_c)$.
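
The co-training kernel of Eq. (3), which serves as the prior covariance here, can be sketched numerically as follows. This is a minimal illustration and not the authors' implementation: the function names, the one-dimensional toy inputs, the kernel width, and the values of $\sigma_j$ are all assumptions, and the sketch requires every sample to be observed in at least one view so that $\Lambda_c$ is invertible.

```python
import numpy as np

def cotraining_kernel(view_kernels, view_indices, sigmas, n):
    """Co-training kernel K_c = (sum_j A_j)^{-1} following Eq. (3).

    view_kernels[j]: (n_j, n_j) kernel K_j on the samples observed in view j
    view_indices[j]: index set I_j (positions of those samples among the n samples)
    sigmas[j]:       sigma_j > 0
    Assumes every sample appears in at least one I_j.
    """
    Lambda_c = np.zeros((n, n))
    for K_j, I_j, sigma_j in zip(view_kernels, view_indices, sigmas):
        I_j = np.asarray(I_j)
        # One-view information matrix (K_j + sigma_j^2 I)^{-1} ...
        A_block = np.linalg.inv(K_j + sigma_j ** 2 * np.eye(len(I_j)))
        # ... expanded to n x n: only the (I_j, I_j) entries are non-zero.
        Lambda_c[np.ix_(I_j, I_j)] += A_block
    return np.linalg.inv(Lambda_c)

def gaussian_kernel(x, width=1.0):
    """Gaussian kernel on 1-D inputs (assumed form, for illustration)."""
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2.0 * width ** 2))

# Toy setup: 5 samples, view 1 fully observed, view 2 observed for samples 0-2.
x1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
x2 = np.array([0.2, 0.1, 0.9, 1.4, 2.2])
I1, I2_old, I2_new = [0, 1, 2, 3, 4], [0, 1, 2], [0, 1, 2, 3]
sigmas = [0.1, 0.1]

K_old = cotraining_kernel(
    [gaussian_kernel(x1[I1]), gaussian_kernel(x2[I2_old])], [I1, I2_old], sigmas, n=5)
# Acquire the pair (sample 3, view 2) and recompute the kernel.
K_new = cotraining_kernel(
    [gaussian_kernel(x1[I1]), gaussian_kernel(x2[I2_new])], [I1, I2_new], sigmas, n=5)

print(np.round(K_new - K_old, 4))  # change in K_c caused by one new observation
```

Printing the difference shows how a single acquired pair changes the kernel; the change is a rank-one update of the inverse, so it is generally not confined to the row and column of the sensed sample.
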
With the logistic regression loss, the a posteriori distribution of $f_c$, $p(f_c \mid D_O, y)$, is approximately

$$\mathcal{N}\left(\hat{f}_c, (\Delta_{\text{post}})^{-1}\right), \qquad (4)$$

where $\hat{f}_c$ is the maximum a posteriori (MAP) estimate of $f_c$, and the posterior precision matrix is $\Delta_{\text{post}} = K_c^{-1} + \Phi$, with $\Phi$ the Hessian of the negative log-likelihood. It turns out that $\Phi$ is a diagonal matrix, with $\Phi(i, i) = \eta_i (1 - \eta_i)$ where $\eta_i = \lambda(\hat{f}_c(x_i))$. The differential entropy of $f_c$ under this Laplace approximation is

$$H(f_c) = \frac{n}{2} \log(2\pi e) - \frac{1}{2} \log\det(\Delta_{\text{post}}),$$

where $\det(\cdot)$ denotes the matrix determinant.

4.2 Mutual Information for Active Sensing

Recall that $x_i^{(j)}$ denotes the features in the $j$th view for the $i$th sample. In active sensing, the mutual information (MI) between the consensus view function $f_c$ and the unobserved (sample, view) pair $x_i^{(j)} \in D_U$ is the expected decrease in entropy of $f_c$ when $x_i^{(j)}$ is observed,

$$I(f_c, x_i^{(j)}) = E[H(f_c)] - E[H(f_c \mid x_i^{(j)})] = -\frac{1}{2} \log\det(\Delta_{\text{post}}) + \frac{1}{2} E\left[\log\det(\Delta_{\text{post}}^{x(i,j)})\right],$$

where the expectation is with respect to $p(x_i^{(j)} \mid D_O, y)$, the distribution of the unobserved (sample, view) pair given all the observed pairs and available outputs. $\Delta_{\text{post}}^{x(i,j)}$ is the a posteriori precision matrix, derived as in Section 4.1, after one pair $x_i^{(j)}$ is observed. The maximum MI criterion has been used before to identify the best unlabeled sample in active learning (MacKay, 1992). Here we adopt this criterion and choose the unobserved pair which maximizes MI:

$$(i, j)^* = \arg\max_{x_i^{(j)} \in D_U} I(f_c, x_i^{(j)}) = \arg\max_{x_i^{(j)} \in D_U} E\left[\log\det(\Delta_{\text{post}}^{x(i,j)})\right]. \qquad (5)$$

4.3 Density Modeling

In order to calculate the expectation in (5), we need a conditional density model for the unobserved pairs, i.e., $p(x_i^{(j)} \mid D_O, y)$. This of course depends on the type of the features in each view, and in this paper we use a special Gaussian mixture model (GMM). Let the joint input density be $p(x^{(1)}, \ldots, x^{(m)}) = p(y = +1)\, p(x^{(1)}, \ldots, x^{(m)} \mid y = +1) + p(y = -1)\, p(x^{(1)}, \ldots, x^{(m)} \mid y = -1)$, and let each conditional density take a component-wise factorized GMM form, e.g., for the positive class,

$$p(x^{(1)}, \ldots, x^{(m)} \mid y = +1) = \sum_{c} \pi_c^{+} \prod_{j} \mathcal{N}\left(x^{(j)} \mid \mu_c^{+(j)}, \Sigma_c^{+(j)}\right).$$
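
As a rough numerical illustration of the Laplace approximation (4) and the entropy expression of Section 4.1, on which the MI criterion (5) rests, the sketch below finds the MAP estimate by Newton iterations and evaluates the Gaussian entropy. It is a sketch under stated assumptions rather than the authors' implementation: the function names, the toy kernel, and the convention that unlabeled samples are encoded as `y = 0` and contribute no likelihood term (so their diagonal entry of $\Phi$ is zero) are all choices made here for illustration.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def laplace_posterior(K_c, y, n_iter=50, tol=1e-10):
    """Laplace approximation of p(f_c | D_O, y) under the prior N(0, K_c).

    y holds +1 / -1 for labeled samples and 0 for unlabeled ones (assumed
    encoding). Returns the MAP estimate and the posterior precision
    Delta_post = K_c^{-1} + Phi.
    """
    labeled = y != 0
    t = (y + 1) / 2.0                      # targets in {0, 1} for labeled samples
    K_inv = np.linalg.inv(K_c)
    f = np.zeros(len(y))
    for _ in range(n_iter):
        eta = logistic(f)
        grad = np.where(labeled, t - eta, 0.0)           # gradient of the log-likelihood
        phi = np.where(labeled, eta * (1.0 - eta), 0.0)  # diagonal of Phi
        # Newton step: f_new = (K_c^{-1} + Phi)^{-1} (Phi f + grad)
        f_new = np.linalg.solve(K_inv + np.diag(phi), phi * f + grad)
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    eta = logistic(f)
    phi = np.where(labeled, eta * (1.0 - eta), 0.0)
    return f, K_inv + np.diag(phi)

def gaussian_entropy(precision):
    """H = n/2 log(2 pi e) - 1/2 log det(precision)."""
    n = precision.shape[0]
    _, logdet = np.linalg.slogdet(precision)
    return 0.5 * n * np.log(2.0 * np.pi * np.e) - 0.5 * logdet

# Toy prior covariance standing in for the co-training kernel K_c.
x = np.linspace(-2.0, 2.0, 6)
K_c = np.exp(-(x[:, None] - x[None, :]) ** 2 / 2.0) + 0.1 * np.eye(6)
y = np.array([-1, -1, 0, 0, +1, +1])       # two unlabeled samples in the middle

f_hat, delta_post = laplace_posterior(K_c, y)
H_prior = gaussian_entropy(np.linalg.inv(K_c))
H_post = gaussian_entropy(delta_post)
print("MAP estimate:", np.round(f_hat, 3))
print("entropy reduction from the labels:", H_prior - H_post)
```

Because the posterior precision adds a non-negative diagonal term to the prior precision, the entropy can only decrease once labels are taken into account; the MI criterion compares analogous log-determinants before and after a candidate pair is observed.
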

In this factorized mixture, $\mu_c^{+(j)}$ and $\Sigma_c^{+(j)}$ are the mean and covariance for view $j$ in component $c$, and $\pi_c^{+} > 0$, $\sum_c \pi_c^{+} = 1$, are the mixture weights for the positive class. Note that although the conditional density for each mixture component is decoupled for different views, the joint conditional density is not.³ Under this model, the joint density $p(x^{(1)}, \ldots, x^{(m)})$ is also a GMM, and any marginal density (conditioned on $y$ or not) is still a GMM, e.g., $p(x^{(j)} \mid y = +1) = \sum_c \pi_c^{+} \mathcal{N}(x^{(j)} \mid \mu_c^{+(j)}, \Sigma_c^{+(j)})$.

³ A straightforward EM algorithm can be derived to estimate all these parameters. When labels are only available for a very limited number of samples, one might assume a full generative GMM model neglecting the dependency on labels (instead of a conditional GMM model).

Now it is easy to calculate $p(x_i^{(j)} \mid D_O, y)$. Let $x_i^{(O)}$ be the set of observed views for $x_i$; we need to distinguish two different settings. When the label $y_i$ is available, e.g., $y_i = +1$, we have

$$p(x_i^{(j)} \mid D_O, y) = p(x_i^{(j)} \mid x_i^{(O)}, y_i = +1) = \sum_c \pi_c^{+(j)}(x_i^{(O)})\, \mathcal{N}\left(x_i^{(j)} \mid \mu_c^{+(j)}, \Sigma_c^{+(j)}\right), \qquad (6)$$

which is again a GMM, with the mixing weights

$$\pi_c^{+(j)}(x_i^{(O)}) = \frac{\pi_c^{+} \prod_{k \in O} \mathcal{N}\left(x_i^{(k)} \mid \mu_c^{+(k)}, \Sigma_c^{+(k)}\right)}{p(x_i^{(O)} \mid y_i = +1)}.$$

When the label $y_i$ is not available, we need to integrate out the labeling uncertainty and compute

$$p(x_i^{(j)} \mid D_O, y) = p(x_i^{(j)} \mid x_i^{(O)}) = p(y_i = +1)\, p(x_i^{(j)} \mid x_i^{(O)}, y_i = +1) + p(y_i = -1)\, p(x_i^{(j)} \mid x_i^{(O)}, y_i = -1),$$

which is a GMM as well, as seen from (6).

4.4 Expectation Calculation

We are now ready to compute the expectation in (5). The a posteriori precision matrix after one (sample, view) pair $x_i^{(j)}$ is observed, $\Delta_{\text{post}}^{x(i,j)}$, is calculated as

$$\Delta_{\text{post}}^{x(i,j)} = \Phi + \left(K_c^{x(i,j)}\right)^{-1} = \Phi + A_j^{x(i,j)} + \sum_{k \neq j} A_k, \qquad (7)$$

where $K_c^{x(i,j)}$ and $A_j^{x(i,j)}$ are the new $K_c$ and $A_j$ matrices after the new pair is observed. Based on (3), to calculate $K_c^{x(i,j)}$ and $A_j^{x(i,j)}$ we need to recalculate the kernel for the $j$th view, $K_j$, after an additional pair $x_i^{(j)}$ is observed. This is simply done by adding one more row and column to the old $K_j$ as

$$K_j^{x(i,j)} = \begin{bmatrix} K_j & b_j \\ b_j^\top & a_j \end{bmatrix},$$

where $a_j = \kappa_j(x_i^{(j)}, x_i^{(j)}) \in \mathbb{R}$, and $b_j \in \mathbb{R}^{n_j}$ has the $l$th entry $\kappa_j(x_l^{(j)}, x_i^{(j)})$. Then from (3), the non-zero part of $A_j^{x(i,j)}$ is calculated as

$$\left(K_j^{x(i,j)} + \sigma_j^2 I\right)^{-1} = \begin{bmatrix} K_j + \sigma_j^2 I & b_j \\ b_j^\top & a_j + \sigma_j^2 \end{bmatrix}^{-1} = \begin{bmatrix} \Gamma_j + \lambda_j \Gamma_j b_j b_j^\top \Gamma_j & -\lambda_j \Gamma_j b_j \\ -\lambda_j b_j^\top \Gamma_j & \lambda_j \end{bmatrix} \qquad (8)$$

using the block-matrix inverse formula, where $\Gamma_j = (K_j + \sigma_j^2 I)^{-1}$ and $\lambda_j = 1 / (a_j + \sigma_j^2 - b_j^\top \Gamma_j b_j)$.

As seen from (7) and (8), it is difficult to directly calculate the expectation in (5). Since for any random positive definite matrix $Q$, $E[\log\det(Q)] \leq \log\det(E[Q])$ due to the concavity of $\log\det(\cdot)$, we alternatively take the upper bound $\log\det(E[\Delta_{\text{post}}^{x(i,j)}])$ as the selection criterion. From (7) and (8), this reduces to computing $E[\lambda_j]$, $E[\lambda_j b_j]$ and $E[\lambda_j b_j b_j^\top]$, where the expectations are with respect to $p(x_i^{(j)} \mid D_O, y)$, a GMM (cf. Section 4.3). In general one needs to calculate these expectations numerically, as different kernel functions lead to different integrals. As another approximation one might assume each of the GMM components is a point mass, such that the mean is used for the calculation.

4.5 Discussion

The mutual information based approach directly measures the expected information gain for every (sample, view) pair. A different (and simpler) approach is based on the predictive uncertainty, in which the most uncertain sample (after the current classifier is trained) is selected for view acquisition (see also (Melville et al., 2004)). This uncertainty (i.e., predictive variance) is estimated as the diagonal entries of the a posteriori covariance matrix $(\Delta_{\text{post}})^{-1}$, as seen from (4). However, it is not clear which view to acquire for this sample (if more than one view is missing for the sample). The advantage of this approach is that no density modeling is necessary for unobserved views.

5 Empirical Study

For the following experiments we are given a classification task with missing views. At each iteration we are allowed to select an unobserved (sample, view) pair for sensing (i.e., feature acquisition). We compare the classification performance on unlabeled data using the following three sensing approaches:

• Active Sensing MI: The pair is selected based on the mutual information criterion (5).
• Active Sensing VAR: A sample is selected first which has the maximal predictive variance and has missing views, and then one of the missing views is randomly selected for sensing.
• Random Sensing: A random unobserved (sample, view) pair is selected for sensing.

After the pair is acquired in each iteration, learning is done using the Bayesian co-training model. Note that for all three approaches, the acquired (sample, view) pair will affect all the samples in the next iteration (via the co-training kernel). In active sensing with MI, we use the EM algorithm to learn the GMM structure with missing entries, and the GMM is re-estimated after each pair is selected and filled in (this is fast thanks to the incremental updates in the EM algorithm).

Figure 3: Toy example for active sensing (left). Big red-square/blue-triangular markers denote +1/-1 labeled points; remaining points are unlabeled. Data are sampled from two Gaussians with different means and unit variance. After hiding some of the features the data look like (middle), with removed features replaced with 0. Comparison of active sensing with random sensing is shown on the right. [Left and middle panels: $x^{(1)}$ versus $x^{(2)}$. Right panel: test AUC score versus number of acquired (sample, view) pairs in order, for Active Sensing MI, Active Sensing VAR, Random Sensing, and Learn with Full Features.]

5.1 Toy Data

We first illustrate active sensing with a toy example. Fig. 3 (left) shows a well separated two-class problem which was used in (Yu et al., 2008), with big squares and triangles representing the labeled positive and negative samples, and black dots denoting unlabeled points. To simulate our active sensing experiment, we randomly hide one of the two features of each sample with 40% probability each, and with 20% probability observe both features. The final incomplete training data are shown in Fig. 3 (middle), with the incomplete samples shown along the first or second axis. It can be seen that only fully observed positive and negative samples are available. For active sensing MI we use the Gaussian kernel with width 0.5, and let the GMM choose the number of clusters automatically. The standard transductive setting is applied, where all the unlabeled data are available for co-training kernel calculation.

In Fig. 3 (right) we compare active sensing with random sensing, using the Area Under the ROC Curve (AUC) for the unlabeled data. The x-axis labels each acquired pair in order. This indicates that active sensing is much better than random sensing in improving the classification performance. The Bayes optimal accuracy (reachable when there is no missing data) is reached by the 16th query by active sensing, whereas random sensing improves much more slowly with the number of acquired pairs. The two active sensing algorithms show similar results.

5.2 Survival Prediction for Lung Cancer

We consider 2-year survival prediction for advanced non-small cell lung cancer (NSCLC) patients treated with (chemo-)radiotherapy. This is currently a very challenging problem in clinical research, since the prognosis of this group of patients is very poor (less than 40% survive two years). Currently most models in the literature rely on various clinical factors of the patient such as gender and the WHO performance status. Very recently, imaging-related factors such as the size of the tumor and the number of positive lymph node stations are shown to be better predictors (Dehing-Oberije et al., 2009). However, it is expensive to obtain the images and to manually measure these factors. Therefore we study how to select the best set of patients to go through imaging to get additional features.

All the relevant factors are listed in Fig. 4 (left) with short descriptions. These factors are all known to be predictive (Dehing-Oberije et al., 2009). From the Bayesian co-training point of view we have 2 views, with 3 features in the first (clinical feature) view and 2 features in the second (imaging-based feature) view. Our study contains 33 advanced NSCLC patients treated at the MAASTRO Clinic in the Netherlands from 2002 to 2006, among which 77 survived 2 years (labeled +1). All the features are available for these patients, and are normalized to have zero mean and unit variance before training. We randomly choose 30% of the patients as training samples (with labels known), and the remaining 70% as unlabeled samples. We use a linear kernel for each view, and let the GMM algorithm automatically choose the number of clusters. As the active sensing setup, the first view is available for all the patients, and the second view is available only for a randomly chosen 50% of the patients. So our goal is to sequentially select patients for whom to acquire features in view 2, such that the overall classifier performance is maximized.

Features for NSCLC 2-year Survival Prediction

Feature   Description                              View
GENDER    1-Male, 2-Female                         1st
WHO       WHO performance status                   1st
FEV1      Forced expiratory volume in 1 second     1st
GTV       Gross tumor volume                       2nd
NPLN      Number of positive lymph node stations   2nd

Figure 4: Experiments on NSCLC survival prediction. The features for the 2 views are listed in the left table, and the performance comparison of active sensing and random sensing is shown in the right figure. As baselines, training with full features (i.e., no sensing needed) yields 0.73; training with mean imputation (i.e., using the mean of each feature to fill in the missing entries) yields 0.6. [Right panel: test AUC score versus number of acquired (sample, view) pairs in order, for Active Sensing MI, Active Sensing VAR, and Random Sensing.]

Fig. 4 (right) shows the test AUC scores (with error bars) of active sensing and random sensing, with different numbers of acquired pairs. Performance is averaged over 20 runs with a randomly chosen 50% of the patients at the start. Active sensing in general yields better performance, and is significantly better after the first 5 pairs. Active sensing based on MI and VAR again yield very similar results. We have also tested other experimental settings, and the comparison is not sensitive to this setup.

5.3 Pathological Complete Response (pCR) Prediction for Rectal Cancer

Our second example is to predict tumor response after chemo-radiotherapy for locally advanced rectal cancer.
This is important in individualizing treatment strategies, since patients with a pathologic complete response (pCR) after therapy, i.e., with no evidence of viable tumor on pathologic analysis, would need less invasive surgery or another radiotherapy strategy instead of resection. Most available models combine clinical factors such as gender and age, and pre-treatment imaging-based factors such as tumor length and SUV max (from CT/PET imaging), but it is expected that adding imaging data collected after therapy would lead to a better predictive model (though with a higher cost). In this study we show how to effectively select patients to go through pre-treatment and post-treatment imaging to better predict pCR.

We use the data from (Capirci et al., 2007), which contains 78 prospectively collected rectal cancer patients. All patients underwent a CT/PET scan before treatment and 4 days after treatment, and 1 of them had pCR (labeled +1). We split all the features into 3 views (clinical, pre-treatment imaging, post-treatment imaging), and the features are listed in Fig. 5 (left). For active sensing, we assume that all the (labeled or unlabeled) patients have view 1 features available, 70% of the patients have view 2 features available, and 40% of the patients have view 3 features available. This is to account for the fact that view 3 features are the most expensive to get. All the other settings are the same as in the NSCLC survival prediction study.

Fig. 5 (right) shows the performance comparison of active sensing with random sensing, and it is seen that after about 18 pair acquisitions, active sensing is significantly better than random sensing. Active sensing MI and VAR share a similar trend, and the MI based active sensing is overall better than VAR based active sensing. The difference is however not statistically significant. The optimal AUC (when there are no missing features) is shown as a dotted line, and we see that with around 34 actively acquired pairs, active sensing can almost achieve the optimum. It takes however much longer for random sensing to reach this performance.

6 Conclusion and Future Work

This paper makes two primary contributions.
First of all, for the purpose of active sensing we extend the Bayesian co-training framework to handle real-life data where objects are often incompletely characterized, i.e., only a subset of views is available for certain samples. Second, we introduce two approaches for active sensing, based on mutual information and predictive variance, respectively, which automatically decide which (sample, view) pair should be acquired further to get the most benefit. Note that one actively acquired pair would improve the overall multi-view classification performance for all the unlabeled samples. Experimental results on two real medical classification problems indicate that the proposed approach is indeed more accurate than randomly acquiring unobserved (sample, view) pairs.

As part of the future work, we will take into account the actual cost involved in the view acquisition for better decision making. This might be important, for instance, in medical diagnosis where Ultrasound and MRI induce quite different costs. Another step is to combine active sensing with active learning, such that one can query both an unobserved (sample, view) pair and an unobserved label.

Features for pCR Prediction in Rectal Cancer

Feature   Description                                                  View
GENDER    1-Male, 2-Female                                             1st
AGE       Age in years                                                 1st
STAGE     Staging of cancer                                            1st
LENGTH    Max diameter of the tumor                                    2nd
SUVPre    SUV max before treatment                                     2nd
ΔSUV      Absolute difference of SUV max before and after treatment    3rd
RI        Response Index, ΔSUV in %                                    3rd

Figure 5: Experiments on pCR prediction for rectal cancer. The features for the 3 views are listed in the left table, and the performance comparison of active sensing and random sensing is shown in the right figure. As baselines, training with full features (i.e., no sensing needed) yields 0.74 (shown as a dotted line); training with mean imputation (i.e., using the mean of each feature to fill in the missing entries) yields 0.55 (not shown). [Right panel: test AUC score versus number of acquired (sample, view) pairs in order, for Active Sensing MI, Active Sensing VAR, Random Sensing, and Learn with Full Features.]

References

M. Bilgic and L. Getoor. VOILA: Efficient feature-value acquisition for classification. In AAAI, 2007.
A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In COLT, 1998.
C. Capirci, L. Rampin, P. Erba, F. Galeotti, G. Crepaldi, E. Banti, M. Gava, S. Fanti, G. Mariani, P. Muzzio, and D. Rubello. Sequential FDG-PET/CT reliably predicts response of locally advanced rectal cancer to neo-adjuvant chemo-radiation therapy. Eur J Nucl Med Mol Imaging, 34, 2007.
D. Cohn, Z. Ghahramani, and M. Jordan. Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129-145, 1996.
T. Cover and J. Thomas. Elements of Information Theory. Wiley Interscience, 1991.
S. Dasgupta, M. Littman, and D. McAllester. PAC generalization bounds for co-training. In NIPS, 2001.
C. Dehing-Oberije, S. Yu, D. De Ruysscher, S. Meerschout, K. van Beek, Y. Lievens, J. van Meerbeeck, W. de Neve, G. Fung, B. Rao, S. Krishnan, H. van der Weide, and P. Lambin. Development and external validation of a prediction model for 2-year survival of non-small cell lung cancer patients treated with (chemo) radiotherapy. To appear in Int J Radiat Oncol Biol Phys, 2009.
V. Fedorov. Theory of Optimal Experiments. Academic Press, 1972.
P. Flaherty, M. Jordan, and A. Arkin. Robust design of biological experiments. In NIPS, 2006.
A. Krause, A. Singh, and C. Guestrin. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. JMLR, 9:235-284, 2008.
D. Lewis and W. Gale. A sequential algorithm for training text classifiers. In SIGIR, pages 3-12, 1994.
D. Lindley. On a measure of the information provided by an experiment. Ann. Math. Stat., 27, 1956.
D. MacKay. Information-based objective functions for active data selection. Neural Comp., 4, 1992.
P. Melville, M. Saar-Tsechansky, F. Provost, and R. Mooney. Active feature-value acquisition for classifier induction. In ICDM, 2004.
N. Roy and A. McCallum. Toward optimal active learning through sampling estimation of error reduction. In ICML, 2001.
S. Seung, M. Opper, and H. Sompolinsky. Query by committee. In Fifth Workshop on Computational Learning Theory, pages 287-294, 1992.
S. Yu, B. Krishnapuram, R. Rosales, H. Steck, and B. Rao. Bayesian co-training. In NIPS, 2008.
