Likelihood Fits
Craig Blocker
Brandeis
August 23, 2004

Outline
I. What is the question?
II. Likelihood Basics
III. Mathematical Properties
IV. Uncertainties on Parameters
V. Miscellaneous
VI. Goodness of Fit
VII. Comparison With Other Fit Methods
What is the Question?
Often in HEP, we make a series of measurements and wish to deduce the value of a fundamental parameter. For example, we may measure the mass of many B → J/ψ K_S decays and then wish to get the best estimate of the B mass. Or we might measure the efficiency for detecting such events as a function of momentum and then wish to derive a functional form.
The question is: What is the best way to do this? (And what does best mean?)
Likelihood Method
P(X | a) is the probability of measuring X on a given event, where a is a parameter or set of parameters on which P depends.
Suppose we make a series of N measurements, yielding a set of X_i's. The likelihood function is defined as
  L = \prod_{i=1}^{N} P(X_i \mid a)
The value of a that maximizes L is known as the Maximum Likelihood Estimator (MLE) of a, which we will denote as a*.
Note that we often work with ln(L).
Example
Suppose we have N measurements of a variable x, which we believe is Gaussian, and wish to get the best estimate of the mean and width. [Note that this and other examples will be doable analytically; usually we must use numerical methods.]
  P(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2 / 2\sigma^2}
  \ln L = \sum_i \ln P(x_i) = \sum_i \left[ -\ln(\sqrt{2\pi}\,\sigma) - \frac{(x_i - \mu)^2}{2\sigma^2} \right]
Maximizing with respect to μ and σ gives
  \mu^* = \langle x \rangle = \frac{1}{N} \sum_i x_i
  \sigma^{*2} = \langle (x - \mu^*)^2 \rangle = \langle x^2 \rangle - \langle x \rangle^2
The bias of these estimators is discussed later.
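As a quick numerical check of the analytic result above, here is a minimal sketch (assuming Python with NumPy, which the slides themselves do not use) that draws Gaussian data and evaluates the MLEs μ* = ⟨x⟩ and σ*² = ⟨x²⟩ − ⟨x⟩²:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
mu_true, sigma_true, n = 5.0, 2.0, 100_000
x = rng.normal(mu_true, sigma_true, size=n)

# Analytic MLEs from the slide: mu* = <x>, sigma*^2 = <x^2> - <x>^2
mu_mle = x.mean()
sigma_mle = np.sqrt(np.mean(x**2) - x.mean()**2)
print(mu_mle, sigma_mle)   # should land close to 5.0 and 2.0
```

With N this large, the estimates agree with the true values to well under a percent, as the consistency property promises.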
Warning
Neither L(a) nor ln(L) is a probability distribution for a.
A frequentist would say that such a statement is just nonsense, since parameters of nature have definite values.
A Bayesian would say that you can convert L(a) to a probability distribution in a by applying Bayes' theorem, which includes a prior probability distribution for a.
Bayesian versus frequentist statistics is a can of worms that I won't open further in this talk.
Bias, Consistency, and Efficiency
What does best mean? We want the estimator to be close to the true value.
  Unbiased:   \langle a^* \rangle = a_0
  Consistent: a^* \to a_0 for large N
  Efficient:  \langle (a^* - a_0)^2 \rangle is minimal for large N
Maximum Likelihood Estimators are NOT necessarily unbiased but are consistent and efficient for large N. This makes MLEs powerful and popular (although we must be aware that we may not be in the large-N limit).
Bias Example
Consider again the mean and width of a Gaussian:
  \mu^* = \langle x \rangle, \qquad \langle \mu^* \rangle = \mu_0
  \sigma^{*2} = \langle (x - \mu^*)^2 \rangle, \qquad \langle \sigma^{*2} \rangle = \frac{N-1}{N}\,\sigma_0^2
Note that the MLE of the mean is unbiased, but that of the width is not (although it is consistent). However,
  s^2 = \frac{1}{N-1} \sum_i (x_i - \mu^*)^2
is unbiased. In this case, we could find the bias analytically; in most cases we must look for it numerically.
Bias Example 2
Bias can depend upon the choice of parameter. Consider an exponential lifetime distribution. We can use either the average lifetime τ or the decay width Γ as the parameter:
  P(t) = \frac{1}{\tau} e^{-t/\tau} = \Gamma e^{-\Gamma t}
  \tau^* = \langle t \rangle is unbiased.
  \Gamma^* = 1 / \langle t \rangle is biased.
Uncertainty on Parameters
Just as important as getting an estimate of a parameter is knowing the uncertainty of that estimate. The maximum likelihood method also provides an estimate of the uncertainty.
For one parameter, L becomes Gaussian for large N. Thus,
  \ln L \approx \ln L^* - \frac{(a - a^*)^2}{2\,\Delta a^2}, \qquad \frac{1}{\Delta a^2} = -\left. \frac{\partial^2 \ln L}{\partial a^2} \right|_{a = a^*}
If a = a^* \pm \Delta a, then \ln L = \ln L^* - \frac{1}{2}. We usually write this as \alpha = \alpha^* \pm \Delta\alpha.
Note that this is a statement about the probability of the measurements, not the probability of the true value.
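The Δln L = 1/2 prescription can be checked numerically. A minimal sketch (Python/NumPy assumed; the exponential example anticipates a later slide) scans ln L for an exponential sample and finds where it drops by 1/2 below the maximum:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
tau_true, n = 0.5, 1000
t = rng.exponential(tau_true, size=n)

def lnL(tau):
    # exponential log-likelihood: sum of ln[(1/tau) exp(-t_i/tau)]
    return -n * (np.log(tau) + t.mean() / tau)

tau_hat = t.mean()                              # the MLE
taus = np.linspace(0.8 * tau_hat, 1.2 * tau_hat, 20001)
inside = taus[lnL(taus) >= lnL(tau_hat) - 0.5]  # the Delta lnL = 1/2 interval
delta_minus, delta_plus = tau_hat - inside[0], inside[-1] - tau_hat
print(delta_minus, delta_plus)   # both near tau_hat / sqrt(n) for large n
```

For N = 1000 the interval is nearly symmetric and matches the second-derivative formula; for small N the two sides start to differ, which is the subject of the next slide.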
Uncertainty Example
[Figure: plot of ln(L) versus a. Horizontal lines at ln(L*) and ln(L*) − 0.5 intersect the likelihood curve at a₋ and a₊, which bracket the maximum at a*.]
Asymmetric Uncertainties
Sometimes ln L may not be parabolic, and there may be asymmetric uncertainties. We then quote a = a* with uncertainties +Δa₊ and −Δa₋.
[Figure: a skewed ln(L) curve; the line at ln(L*) − 0.5 gives unequal distances a* − a₋ and a₊ − a*.]
Note: the Δln L = 1/2 interval does NOT always give a 68% confidence interval (see counterexample in handout).
Correlations
If there are multiple parameters, things are more complicated due to possible correlations. For example, a fit to the linear function y = a x + b will have correlations between the parameters a and b.
[Figure: the Δln L = 0.5 contour in the (a, b) plane, a tilted ellipse spanning a₋ to a₊.]
Numerically, the Δa_i are given by where Δln(L) = 1/2 with ln(L) maximized with respect to the other parameters. The covariance matrix V is given by
  V_{ij} = \langle (a_i - \langle a_i \rangle)(a_j - \langle a_j \rangle) \rangle
V is equal to U^{-1}, where
  U_{ij} = -\frac{\partial^2 \ln L}{\partial a_i \, \partial a_j}
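For the linear example y = a x + b with Gaussian uncertainties, ln L is exactly quadratic in a and b, so V = U⁻¹ holds without any large-N approximation. A sketch (Python/NumPy assumed; the slope, intercept, and noise level are made-up numbers):

```python
import numpy as np

rng = np.random.default_rng(seed=2)
a_true, b_true, sigma = 1.5, 0.7, 0.3
x = np.linspace(1.0, 5.0, 50)      # all-positive x makes a and b anti-correlated
y = a_true * x + b_true + rng.normal(0.0, sigma, size=x.size)

# Least-squares solution of the normal equations (the MLE for Gaussian errors)
A = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x),    float(len(x))]])
a_hat, b_hat = np.linalg.solve(A, np.array([np.sum(x * y), np.sum(y)]))

# U_ij = -d^2 lnL / da_i da_j is exact here (lnL is quadratic), and V = U^-1
U = A / sigma**2
V = np.linalg.inv(U)
rho = V[0, 1] / np.sqrt(V[0, 0] * V[1, 1])
print(a_hat, b_hat, rho)   # rho is strongly negative for all-positive x
```

Note that U (and hence V) depends only on the x values and σ, not on the measured y's, which is a peculiarity of the linear Gaussian case.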
Normalization
Sometimes people will say they don't need to normalize their probability distributions. This is sometimes true. For the Gaussian example, if we omitted the normalization factor of 1/(\sqrt{2\pi}\,\sigma), we would get the mean correct but not the width.
In general, if the normalization depends on any of the parameters of interest, it must be included. My advice is to always normalize (and always check the normalization).
Extended Likelihood
Suppose we have a Gaussian mass distribution with a flat background and wish to determine the number of events in the Gaussian:
  P = \frac{f_S}{\sqrt{2\pi}\,\sigma} e^{-(M - M_0)^2 / 2\sigma^2} + \frac{1 - f_S}{\Delta M}
where f_S is the fraction of signal events and ΔM is the mass range of the fit. We can fit for f_S and get Δf_S. Then N f_S is a good estimate of the number of events in the Gaussian, but N Δf_S is not a good estimate of the variation of the number of signal events. (We could also use \Delta N_S^2 = N^2 \Delta f_S^2 + f_S^2 \Delta N^2 = N^2 \Delta f_S^2 + f_S^2 N.)
We can fix this by adding a Poisson term in the total number of events. This is called an Extended Likelihood fit.
Extended Likelihood 2
Instead of f_S, we use μ_S and μ_BG, the expected numbers of signal and background events. N is the observed total number of events.
  L = \frac{e^{-(\mu_S + \mu_{BG})} (\mu_S + \mu_{BG})^N}{N!} \prod_{i=1}^{N} \frac{\mu_S\, G(M_i; M_0, \sigma) + \mu_{BG}/\Delta M}{\mu_S + \mu_{BG}}
  \ln L = -(\mu_S + \mu_{BG}) - \ln(N!) + \sum_{i=1}^{N} \ln\!\left[ \mu_S\, G(M_i; M_0, \sigma) + \frac{\mu_{BG}}{\Delta M} \right]
If you are not interested in the uncertainty on N_S (for example, you are measuring a lifetime and not a cross section), I recommend not doing an extended likelihood fit.
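A sketch of such an extended likelihood fit (Python with NumPy/SciPy assumed; the peak position M₀, resolution σ, fit window ΔM, and yields are all made-up numbers). The constant ln(N!) is dropped since it does not depend on the parameters:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(seed=3)
M0, sigma, dM = 5.28, 0.02, 0.4      # made-up peak position, resolution, fit window
mu_s_true, mu_b_true = 400.0, 600.0  # made-up expected yields

n_s, n_b = rng.poisson(mu_s_true), rng.poisson(mu_b_true)
masses = np.concatenate([rng.normal(M0, sigma, n_s),
                         rng.uniform(M0 - dM / 2, M0 + dM / 2, n_b)])

def nll(params):
    mu_s, mu_b = params
    if mu_s < 0 or mu_b < 0:
        return np.inf
    dens = mu_s * norm.pdf(masses, M0, sigma) + mu_b / dM
    # extended -lnL: Poisson term in the total, minus the per-event log densities
    return (mu_s + mu_b) - np.sum(np.log(dens))

res = minimize(nll, x0=[300.0, 700.0], method="Nelder-Mead")
mu_s_fit, mu_b_fit = res.x
print(mu_s_fit, mu_b_fit)
```

The fitted μ_S now fluctuates with the Poisson variation of the signal yield, which is exactly what ΔN_S should reflect.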
Constrained Fits
Suppose there is a parameter in the likelihood that is somewhat known from elsewhere. This information can be incorporated in the fit. For example, suppose we are fitting for the mass of a particle decay with resolution σ, and the Particle Data Book lists the mass as M_0 \pm \sigma_M. We can incorporate this into the likelihood function as
  L = \frac{1}{\sqrt{2\pi}\,\sigma_M} e^{-(M - M_0)^2 / 2\sigma_M^2} \left[ \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(m_i - M)^2 / 2\sigma^2} \right]
This is known as a constrained fit.
Constrained Fits 2
Let ΔM be the uncertainty on M that could be determined by the fit alone.
If ΔM >> σ_M, the constraint will dominate, and you might as well just fix M to M_0. For example, you never see a constrained fit to h in an HEP experiment.
If ΔM << σ_M, the constraint does very little. You have a better measurement than the PDG. You should do an unconstrained fit and PUBLISH.
A constrained fit is most useful if σ_M and ΔM are comparable.
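A sketch of a one-parameter constrained fit (Python with NumPy/SciPy assumed; the external value M₀ ± σ_M is a made-up stand-in for a PDG entry). Up to constants, the constraint simply adds (M − M₀)²/(2σ_M²) to −ln L:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(seed=6)
M_true, sigma = 3.0, 0.5
m = rng.normal(M_true, sigma, size=20)   # our own N = 20 mass measurements
M0, sigma_M = 3.1, 0.05                  # made-up external value M0 +/- sigma_M

def nll(M):
    # -lnL up to constants: unbinned Gaussian term plus the external constraint
    return np.sum((m - M)**2) / (2 * sigma**2) + (M - M0)**2 / (2 * sigma_M**2)

res = minimize_scalar(nll, bounds=(2.0, 4.0), method="bounded")
print(res.x)   # a weighted average of the data mean and the external value
```

Analytically the minimum is the inverse-variance weighted average of ⟨m⟩ (weight N/σ²) and M₀ (weight 1/σ_M²), which makes the ΔM vs. σ_M discussion above explicit: whichever weight dominates controls the result.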
Simple Monte Carlo Tests
It is possible to write simple, short, fast Monte Carlo programs that generate data for fitting. We can then look at fit values, uncertainties, and pulls. These are often called toy Monte Carlos to differentiate them from complicated event and detector simulation programs.
This tests the likelihood function, tests for bias, and tests that the uncertainty from the fit is correct.
It does NOT test the correctness of the model of the data. For example, if you think that some data is Gaussian distributed, but it is really Lorentzian, then the simple Monte Carlo test will not reveal this.
Simple Monte Carlo Tests 2
Generate an exponential (τ = 0.5 and N = 1000). Fit. Repeat many times (1000 times here). Histogram τ*, σ_τ, and the pulls.
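The toy study described above can be sketched as follows (Python/NumPy assumed). For an exponential, the MLE and its large-N uncertainty are analytic, so each "fit" is one line:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
tau_true, n_events, n_toys = 0.5, 1000, 1000

pulls = []
for _ in range(n_toys):
    t = rng.exponential(tau_true, size=n_events)
    tau_hat = t.mean()                        # MLE for an exponential
    sigma_tau = tau_hat / np.sqrt(n_events)   # from -d^2 lnL/dtau^2 at the maximum
    pulls.append((tau_hat - tau_true) / sigma_tau)

pulls = np.array(pulls)
print(pulls.mean(), pulls.std())   # near 0 and 1 if the fit is unbiased
```

A pull distribution with mean ≈ 0 and width ≈ 1 is the signature that both the estimator and its quoted uncertainty are behaving correctly.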
Simple Monte Carlo Tests 3
[Figure: histograms of the fitted τ, its uncertainty σ_τ, and the pulls from the toy experiments.]
Goodness of Fit
Unfortunately, the likelihood method does not, in general, provide a measure of the goodness of fit (as a χ² fit does). For example, consider fitting lifetime data to an exponential:
  L^* = \prod_{i=1}^{N} \frac{1}{\tau^*} e^{-t_i/\tau^*}, \qquad \tau^* = \langle t \rangle
  \ln L^* = -N\left(1 + \ln \tau^*\right) = -N\left(1 + \ln \langle t \rangle\right)
Thus the value of L at the maximum depends only on the number of events and the average value of the data.
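The claim above can be made concrete: any two datasets with the same N and the same mean give exactly the same maximum log-likelihood under an exponential fit, however differently they are distributed. A sketch (Python/NumPy assumed):

```python
import numpy as np

def max_log_likelihood(t):
    # For an exponential fit, tau* = <t> and ln L* = -N (1 + ln <t>)
    return -len(t) * (1.0 + np.log(np.mean(t)))

# Two very different datasets with the same size and the same mean:
t1 = np.array([0.1, 0.9, 0.5, 0.5])
t2 = np.array([0.5, 0.5, 0.5, 0.5])
print(max_log_likelihood(t1), max_log_likelihood(t2))  # identical values
```

Since ln L* cannot distinguish these two samples, it clearly cannot serve as a goodness-of-fit statistic here.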
Goodness of Fit 2
Fit to an exponential. Plot ln(L*) for (1) exponential Monte Carlo data and (2) Gaussian data.
[Figure: distributions of ln(L*) for the two cases.]
Goodness of Fit 3
[Figure continuing the ln(L*) comparison.]
Other Types of Fits
Chi-square: If the data are binned and the uncertainties are Gaussian, then maximum likelihood is equivalent to a χ² fit.
Binned Likelihood: If the data are binned and not Gaussian, we can still do a binned likelihood fit. A common case is when the data are Poisson distributed:
  P_i = \frac{e^{-\mu_i} \mu_i^{n_i}}{n_i!}, \qquad \ln L = \sum_{\text{bins}} \ln P_i
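A sketch of a binned Poisson likelihood fit (Python with NumPy/SciPy assumed; the lifetime, sample size, and binning are made-up numbers). The constant ln(nᵢ!) terms are dropped since they do not depend on the parameter:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(seed=5)
tau_true = 0.5
t = rng.exponential(tau_true, size=2000)

edges = np.linspace(0.0, 3.0, 31)        # 30 bins over the fit range
counts, _ = np.histogram(t, bins=edges)

def binned_nll(tau):
    # expected Poisson mean per bin from the integrated exponential pdf
    mu = len(t) * np.diff(1.0 - np.exp(-edges / tau))
    return np.sum(mu - counts * np.log(mu))

res = minimize_scalar(binned_nll, bounds=(0.1, 2.0), method="bounded")
print(res.x)   # close to tau_true
```

Note that the expected bin contents come from integrating the pdf over each bin rather than evaluating it at the bin center, which matters when bins are wide.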
Comparison of Fits
Chi-square:
  - Goodness of fit.
  - Can plot function with binned data.
  - Data should be Gaussian; in particular, χ² doesn't work well with bins with a small number of events.
Binned likelihood:
  - Goodness of fit.
  - Can plot function with binned data.
  - Still need to be careful of bins with a small number of events (don't add in too many zero bins).
Unbinned likelihood:
  - Usually most powerful.
  - Don't need to bin data.
  - Works well for multi-dimensional data.
  - No goodness-of-fit estimate.
  - Can't plot fit with data (unless you bin the data).
Comparison of Fits 2
Generate 100 values for a Gaussian with μ = 0, σ = 1. Fit unbinned likelihood and χ² to the SAME data. Repeat 10,000 times.
Both are unbiased. The unbinned likelihood is more efficient.
Comparison of Fits 3
The fit values are correlated, but not completely. The difference is of the order of half of the uncertainty.
Comparison of Fits 4
The fit to the width is biased for both. But the unbinned likelihood widths tend to the true value for large N.
Numerical Methods
Even slight complications to the probability make analytic methods intractable. Also, likelihood fits often have many parameters (perhaps scores) and can't be done analytically. However, numerical methods are still very effective.
MINUIT is a powerful program from CERN for doing maximum likelihood fits (see references in handout).
Systematic Uncertainties
When fitting for one parameter, there often are other parameters that are imperfectly known. It is tempting to estimate the systematic uncertainty due to these parameters by varying them and redoing the fit. Because of statistical variations, this overestimates the systematic uncertainty (often called double counting). The best way to estimate such systematics is probably with a high-statistics Monte Carlo program.
Potentially Interesting Web Sites
CDF Statistics Committee page: www-cdf.fnal.gov/physics/statistics/statistics_home.html
Lectures by Louis Lyons: www-ppd.fnal.gov/eppoffice-w/academic_lectures
Summary
Maximum likelihood methods are a powerful tool for extracting measured parameters from data. However, it is important to understand their proper use and avoid potential problems.