Likelihood Fits. Craig Blocker Brandeis August 23, 2004

Outline
I. What is the question?
II. Likelihood Basics
III. Mathematical Properties
IV. Uncertainties on Parameters
V. Miscellaneous
VI. Goodness of Fit
VII. Comparison With Other Fit Methods

What is the Question?
Often in HEP, we make a series of measurements and wish to deduce the value of a fundamental parameter. For example, we may measure the mass of many $B \to J/\psi\, K_S$ decays and then wish to get the best estimate of the B mass. Or we might measure the efficiency for detecting such events as a function of momentum and then wish to derive a functional form. The question is: What is the best way to do this? (And what does "best" mean?)

Likelihood Method
$P(X|\alpha)$: Probability of measuring X on a given event. $\alpha$ is a parameter or set of parameters on which P depends. Suppose we make a series of N measurements, yielding a set of $X_i$'s. The likelihood function is defined as
$L = \prod_{i=1}^{N} P(X_i|\alpha)$
The value of $\alpha$ that maximizes L is known as the Maximum Likelihood Estimator (MLE) of $\alpha$, which we will denote as $\alpha^*$. Note that we often work with $\ln(L)$.

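Example
Suppose we have N measurements of a variable x, which we believe is Gaussian, and wish to get the best estimate of the mean and width. [Note that this and other examples will be doable analytically - usually we must use numerical methods.]
$P(x|\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}$
$\ln L = \sum_{i=1}^{N} \ln P(x_i) = -N \ln\left(\sqrt{2\pi}\,\sigma\right) - \sum_{i=1}^{N} \frac{(x_i-\mu)^2}{2\sigma^2}$
Maximizing with respect to $\mu$ and $\sigma$ gives
$\mu^* = \frac{1}{N}\sum_{i=1}^{N} x_i = \bar{x}$
$\sigma^{*2} = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu^*)^2 = \overline{x^2} - \bar{x}^2$
(The estimate of the width is discussed later.)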
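The closed-form result above can be checked numerically. The following is a minimal sketch, assuming a sample drawn with a true mean of 5 and width of 2 (both arbitrary choices, as is the random seed); the function names are illustrative and not from the talk.

```python
import math
import random

def gaussian_log_likelihood(xs, mu, sigma):
    """ln L = sum_i ln P(x_i | mu, sigma) for a normalized Gaussian."""
    n = len(xs)
    return (-n * math.log(math.sqrt(2.0 * math.pi) * sigma)
            - sum((x - mu) ** 2 for x in xs) / (2.0 * sigma ** 2))

def gaussian_mle(xs):
    """Closed-form maximum likelihood estimates (mu*, sigma*)."""
    n = len(xs)
    mu_star = sum(xs) / n
    # Note the 1/N here, not 1/(N-1): this is the MLE of the variance.
    var_star = sum((x - mu_star) ** 2 for x in xs) / n
    return mu_star, math.sqrt(var_star)

rng = random.Random(12345)  # fixed seed so the sketch is repeatable
data = [rng.gauss(5.0, 2.0) for _ in range(1000)]  # assumed true values
mu_star, sigma_star = gaussian_mle(data)
best = gaussian_log_likelihood(data, mu_star, sigma_star)
print(f"mu* = {mu_star:.3f}, sigma* = {sigma_star:.3f}, ln L* = {best:.1f}")
```

Moving either parameter away from the estimate lowers ln L, which is a quick sanity check that the formulas really locate the maximum.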

Warning
Neither $L(\alpha)$ nor $\ln(L)$ is a probability distribution for $\alpha$. A frequentist would say that such a statement is just nonsense, since parameters of nature have definite values. A Bayesian would say that you can convert $L(\alpha)$ to a probability distribution in $\alpha$ by applying Bayes' theorem, which includes a prior probability distribution for $\alpha$. Bayesian versus Frequentist statistics is a can of worms that I won't open further in this talk.

Bias, Consistency, and Efficiency
What does "best" mean? We want the estimator to be close to the true value.
Unbiased: $\langle \alpha^* \rangle = \alpha_0$
Consistent: Unbiased for large N
Efficient: $\langle (\alpha^* - \alpha_0)^2 \rangle$ is minimal for large N
Maximum Likelihood Estimators are NOT necessarily unbiased but are consistent and efficient for large N. This makes MLEs powerful and popular (although we must be aware that we may not be in the large N limit).

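Bias Example
Consider again the mean and width of a Gaussian.
$\langle \mu^* \rangle = \left\langle \frac{1}{N}\sum_i x_i \right\rangle = \mu_0$
$\langle \sigma^{*2} \rangle = \left\langle \frac{1}{N}\sum_i (x_i - \mu^*)^2 \right\rangle = \frac{N-1}{N}\,\sigma_0^2$
Note that the MLE of the mean is unbiased, but for the width is not (although it is consistent). However,
$s^2 = \frac{1}{N-1}\sum_i (x_i - \mu^*)^2$
is unbiased. In this case, we could find the bias analytically - in most cases we must look for it numerically.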
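Looking for a bias numerically can be as simple as averaging the estimator over many small samples. A minimal sketch, assuming samples of 5 events from a unit-width Gaussian and 20000 repetitions (all illustrative choices):

```python
import random

def variance_estimates(xs):
    """Return (MLE variance with 1/N, unbiased variance with 1/(N-1))."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / n, ss / (n - 1)

rng = random.Random(2004)  # fixed seed, arbitrary
n_events, n_trials, sigma0 = 5, 20000, 1.0  # assumed toy settings

sum_mle = 0.0
sum_unbiased = 0.0
for _ in range(n_trials):
    sample = [rng.gauss(0.0, sigma0) for _ in range(n_events)]
    mle, unbiased = variance_estimates(sample)
    sum_mle += mle
    sum_unbiased += unbiased

avg_mle = sum_mle / n_trials
avg_unbiased = sum_unbiased / n_trials
expected_mle = (n_events - 1) / n_events * sigma0 ** 2
print(f"<sigma*^2> = {avg_mle:.3f} (expect about {expected_mle:.3f})")
print(f"<s^2>      = {avg_unbiased:.3f} (expect about {sigma0 ** 2:.3f})")
```

With only 5 events per sample the factor (N-1)/N = 0.8 is large enough to be seen clearly above the statistical scatter of the averages.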

Bias Example 2
Bias can depend upon the choice of parameter. Consider an exponential lifetime distribution. We can use either the average lifetime $\tau$ or the decay width $\Gamma$ as the parameter.
$P(t) = \frac{1}{\tau}\, e^{-t/\tau} = \Gamma\, e^{-\Gamma t}$
$\tau^* = \bar{t}$ is unbiased.
$\Gamma^* = 1/\bar{t}$ is biased.
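The parameter dependence of the bias can be seen in a small toy study. This sketch assumes $\tau = 0.5$, 5 events per sample, and 20000 repetitions (illustrative values). It relies on the standard result, not stated in the slides, that for N exponential events $\langle 1/\bar{t} \rangle = \frac{N}{N-1}\Gamma$.

```python
import random

rng = random.Random(7)  # fixed seed, arbitrary
tau_true = 0.5          # assumed true lifetime
gamma_true = 1.0 / tau_true
n_events, n_trials = 5, 20000

sum_tau = 0.0
sum_gamma = 0.0
for _ in range(n_trials):
    # random.expovariate takes the rate (Gamma), not the mean lifetime.
    times = [rng.expovariate(gamma_true) for _ in range(n_events)]
    t_bar = sum(times) / n_events
    sum_tau += t_bar          # tau*   = mean decay time
    sum_gamma += 1.0 / t_bar  # Gamma* = 1 / mean decay time

avg_tau = sum_tau / n_trials
avg_gamma = sum_gamma / n_trials
print(f"<tau*>   = {avg_tau:.3f} (true {tau_true})")
print(f"<Gamma*> = {avg_gamma:.3f} (true {gamma_true}, "
      f"N/(N-1)*Gamma = {n_events / (n_events - 1) * gamma_true})")
```

Both estimators come from the same fit to the same data; only the choice of which parameter to quote changes whether the result is biased.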

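Uncertainty on Parameters
Just as important as getting an estimate of a parameter is knowing the uncertainty of that estimate. The maximum likelihood method also provides an estimate of the uncertainty. For one parameter, L becomes Gaussian for large N. Thus,
$\ln L \cong \ln L^* + \frac{1}{2} \left. \frac{\partial^2 \ln L}{\partial \alpha^2} \right|_{\alpha=\alpha^*} (\alpha - \alpha^*)^2$
$\Delta\alpha^2 = \langle (\alpha^* - \alpha_0)^2 \rangle = -1 \Big/ \left. \frac{\partial^2 \ln L}{\partial \alpha^2} \right|_{\alpha=\alpha^*}$
We usually write this as $\alpha = \alpha^* \pm \Delta\alpha$. If $\alpha = \alpha^* \pm \Delta\alpha$, then $\ln L = \ln L^* - \frac{1}{2}$.
Note that this is a statement about the probability of the measurements, not the probability of the true value.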
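The curvature formula can be evaluated with a finite-difference second derivative. The sketch below assumes an exponential lifetime sample ($\tau = 0.5$, 1000 events, illustrative values), for which the second derivative at the maximum is $-N/\tau^{*2}$ and hence $\Delta\tau = \tau^*/\sqrt{N}$; the step size `h` is an arbitrary choice.

```python
import math
import random

def exp_log_likelihood(times, tau):
    """ln L for P(t) = (1/tau) exp(-t/tau)."""
    return sum(-math.log(tau) - t / tau for t in times)

rng = random.Random(42)  # fixed seed, arbitrary
times = [rng.expovariate(1.0 / 0.5) for _ in range(1000)]  # assumed tau = 0.5
n = len(times)
tau_star = sum(times) / n  # MLE of the lifetime

# Central finite difference for d^2 ln L / d tau^2 at tau = tau*.
h = 1e-3 * tau_star
second = (exp_log_likelihood(times, tau_star + h)
          - 2.0 * exp_log_likelihood(times, tau_star)
          + exp_log_likelihood(times, tau_star - h)) / h ** 2

delta_tau = math.sqrt(-1.0 / second)  # uncertainty from the curvature
analytic = tau_star / math.sqrt(n)    # closed form for the exponential
print(f"tau* = {tau_star:.4f} +/- {delta_tau:.4f} (analytic {analytic:.4f})")
```

In a real fit with many parameters a minimizer would normally supply this curvature; the finite difference here only illustrates what is being computed.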

Uncertainty Example
[Figure: ln(L) versus $\alpha$, showing the maximum ln(L*) at $\alpha^*$ and the points $\alpha_-$ and $\alpha_+$ where ln(L) = ln(L*) - 0.5.]

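Asymmetric Uncertainties
Sometimes ln L may not be parabolic and there may be asymmetric uncertainties. We write
$\alpha = \alpha^{*\;+\Delta\alpha_+}_{\;\;-\Delta\alpha_-}$
[Figure: non-parabolic ln(L) versus $\alpha$, showing ln(L*) at $\alpha^*$ and the points $\alpha_-$ and $\alpha_+$ where ln(L) = ln(L*) - 0.5.]
Note: the $\Delta \ln L = 1/2$ interval does NOT always give a 68% confidence interval (see counterexample in handout).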
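Asymmetric uncertainties can be found by solving for the two points where ln L has dropped by 1/2 from its maximum. A minimal sketch, assuming an exponential lifetime fit to ten made-up decay times and a plain bisection search (the data, the bracketing ranges, and the helper names are all illustrative):

```python
import math

def exp_log_likelihood(times, tau):
    """ln L for P(t) = (1/tau) exp(-t/tau)."""
    return sum(-math.log(tau) - t / tau for t in times)

def solve_crossing(f, lo, hi, iterations=200):
    """Bisection for f(x) = 0, assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical decay times (arbitrary units), few enough that ln L is skewed.
times = [0.05, 0.12, 0.20, 0.31, 0.38, 0.47, 0.66, 0.81, 1.10, 1.90]
tau_star = sum(times) / len(times)
ln_l_star = exp_log_likelihood(times, tau_star)

def drop(tau):
    """Positive inside the Delta ln L = 1/2 interval, negative outside."""
    return exp_log_likelihood(times, tau) - (ln_l_star - 0.5)

tau_minus = solve_crossing(drop, 0.1 * tau_star, tau_star)
tau_plus = solve_crossing(drop, tau_star, 10.0 * tau_star)
err_down = tau_star - tau_minus
err_up = tau_plus - tau_star
print(f"tau = {tau_star:.3f} +{err_up:.3f} -{err_down:.3f}")
```

For the exponential the upper uncertainty comes out larger than the lower one, reflecting the long tail of ln L toward large lifetimes.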

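Correlations
If there are multiple parameters, things are more complicated due to possible correlations. For example, a fit to the linear function y = a x + b will have correlations between parameters a and b.
[Figure: the $\Delta \ln L = 0.5$ contour in the (a, b) plane, with $a_-$ and $a_+$ marked on the a axis.]
Numerically, the $\alpha_{i\pm}$ are given by the points where $\Delta \ln(L) = 1/2$ and ln(L) is maximized with respect to the other parameters.
The covariance matrix V is given by
$V_{ij} = \langle (\alpha_i - \bar{\alpha}_i)(\alpha_j - \bar{\alpha}_j) \rangle$
V is equal to $U^{-1}$, where
$U_{ij} = -\frac{\partial^2 \ln L}{\partial \alpha_i\, \partial \alpha_j}$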
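For the straight-line example the matrix U can be written down directly. The sketch below assumes Gaussian measurement errors of known, equal size `sigma` on each y value, so that $\ln L = -\sum_i (y_i - a x_i - b)^2 / (2\sigma^2)$ plus a constant; the x positions and the value of `sigma` are made up for illustration.

```python
import math

def line_fit_covariance(xs, sigma):
    """Return (U, V) for the parameters (a, b) of y = a x + b.

    Assumes independent Gaussian errors of size sigma on each y_i, so that
    U_ij = -d^2 ln L / d alpha_i d alpha_j does not depend on the y values.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    u = [[sxx / sigma ** 2, sx / sigma ** 2],
         [sx / sigma ** 2, n / sigma ** 2]]
    det = u[0][0] * u[1][1] - u[0][1] * u[1][0]
    # Explicit inverse of a 2x2 matrix.
    v = [[u[1][1] / det, -u[0][1] / det],
         [-u[1][0] / det, u[0][0] / det]]
    return u, v

xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical measurement positions
u, v = line_fit_covariance(xs, sigma=0.5)
delta_a = math.sqrt(v[0][0])
delta_b = math.sqrt(v[1][1])
correlation = v[0][1] / (delta_a * delta_b)
print(f"Delta a = {delta_a:.3f}, Delta b = {delta_b:.3f}, "
      f"correlation = {correlation:.3f}")
```

With all x values positive the slope and intercept are strongly anti-correlated (about -0.90 here): raising the slope must be compensated by lowering the intercept.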

Normalization
Sometimes, people will say they don't need to normalize their probability distributions. This is sometimes true. For the Gaussian example, if we omitted the normalization factor of $1/(\sqrt{2\pi}\,\sigma)$ we get the mean correct but not the width. In general, if the normalization depends on any of the parameters of interest, it must be included. My advice is always normalize (and always check the normalization).
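The Gaussian case can be illustrated directly: without the normalization factor, ln L keeps growing as the width grows, so the fit has no maximum in $\sigma$. A minimal sketch with six made-up data points:

```python
import math

def unnormalized_log_likelihood(xs, mu, sigma):
    """ln L with the 1/(sqrt(2 pi) sigma) factor (wrongly) omitted."""
    return -sum((x - mu) ** 2 for x in xs) / (2.0 * sigma ** 2)

def normalized_log_likelihood(xs, mu, sigma):
    """ln L for the properly normalized Gaussian."""
    return (-len(xs) * math.log(math.sqrt(2.0 * math.pi) * sigma)
            + unnormalized_log_likelihood(xs, mu, sigma))

xs = [4.1, 5.3, 4.8, 6.0, 5.2, 4.6]  # hypothetical measurements
mu_star = sum(xs) / len(xs)
sigma_star = math.sqrt(sum((x - mu_star) ** 2 for x in xs) / len(xs))

widths = [sigma_star, 10.0 * sigma_star, 100.0 * sigma_star]
unnormalized = [unnormalized_log_likelihood(xs, mu_star, s) for s in widths]
normalized = [normalized_log_likelihood(xs, mu_star, s) for s in widths]
for s, u_val, n_val in zip(widths, unnormalized, normalized):
    print(f"sigma = {s:8.3f}  unnormalized = {u_val:9.4f}  "
          f"normalized = {n_val:9.4f}")
```

The unnormalized column rises toward zero as the width increases, while the normalized one falls away from its maximum at the fitted width. The mean is unaffected because the omitted factor does not depend on $\mu$.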

Extended Likelihood
Suppose we have a Gaussian mass distribution with a flat background and wish to determine the number of events in the Gaussian.
$P = f_S\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(M-M_0)^2/(2\sigma^2)} + (1 - f_S)\, \frac{1}{\Delta M}$
where $f_S$ is the fraction of signal events and $\Delta M$ is the mass range of the fit. We can fit for $f_S$ and get $\Delta f_S$. $N f_S$ is a good estimate of the number of events in the Gaussian, but $N \Delta f_S$ is not a good estimate of the variation on the number of signal events. We can fix this by adding a Poisson term in the total number of events. This is called an Extended Likelihood fit. We could also use
$\Delta N_S^2 = N^2 \Delta f_S^2 + f_S^2\, \Delta N^2 = N^2 \Delta f_S^2 + f_S^2\, N$

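Extended Likelihood 2
Instead of $f_S$, we use $\mu_S$ and $\mu_{BG}$, the expected number of signal and background events. N is the observed total number of events.
$L = \frac{e^{-(\mu_S + \mu_{BG})} (\mu_S + \mu_{BG})^N}{N!} \prod_{i=1}^{N} \left[ \frac{\mu_S}{\mu_S + \mu_{BG}}\, G(M_i|M_0,\sigma) + \frac{\mu_{BG}}{\mu_S + \mu_{BG}}\, \frac{1}{\Delta M} \right] = \frac{e^{-(\mu_S + \mu_{BG})}}{N!} \prod_{i=1}^{N} \left[ \mu_S\, G(M_i|M_0,\sigma) + \frac{\mu_{BG}}{\Delta M} \right]$
$\ln(L) = -(\mu_S + \mu_{BG}) - \ln(N!) + \sum_{i=1}^{N} \ln\left[ \mu_S\, G(M_i|M_0,\sigma) + \frac{\mu_{BG}}{\Delta M} \right]$
Here $G(M_i|M_0,\sigma)$ is the normalized Gaussian of the previous slide. If you are not interested in the uncertainty on $N_S$ (for example, you are measuring a lifetime and not a cross section), I recommend not doing an extended likelihood fit.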
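The extended log-likelihood is short to write down in code. The sketch below assumes a toy sample of 60 Gaussian signal events on 140 flat background events in a made-up mass window; the numbers and function names are illustrative and a real fit would hand `extended_log_likelihood` to a minimizer. It then checks one property of the Poisson term: at fixed signal fraction, ln L is largest when the total expected yield equals the observed N.

```python
import math
import random

def gaussian_pdf(m, m0, sigma):
    """Normalized Gaussian G(m | m0, sigma)."""
    return (math.exp(-(m - m0) ** 2 / (2.0 * sigma ** 2))
            / (math.sqrt(2.0 * math.pi) * sigma))

def extended_log_likelihood(masses, mu_s, mu_bg, m0, sigma, mass_range):
    """ln L = -(mu_S + mu_BG) - ln N! + sum ln[mu_S G + mu_BG / Delta M]."""
    n = len(masses)
    shape = sum(math.log(mu_s * gaussian_pdf(m, m0, sigma)
                         + mu_bg / mass_range) for m in masses)
    return -(mu_s + mu_bg) - math.lgamma(n + 1) + shape  # lgamma(n+1) = ln n!

# Hypothetical toy data: the window is 8 sigma wide on each side, so the
# truncation of the Gaussian by the window is negligible.
rng = random.Random(1)
m_lo, m_hi, m0, sigma = 5.20, 5.36, 5.28, 0.01
masses = ([rng.gauss(m0, sigma) for _ in range(60)]
          + [rng.uniform(m_lo, m_hi) for _ in range(140)])
n_obs = len(masses)

def ln_l_at_total(total, signal_fraction=0.3):
    """ln L with the yields split at a fixed signal fraction."""
    return extended_log_likelihood(masses, signal_fraction * total,
                                   (1.0 - signal_fraction) * total,
                                   m0, sigma, m_hi - m_lo)

for total in (0.9 * n_obs, float(n_obs), 1.1 * n_obs):
    print(f"mu_S + mu_BG = {total:6.1f}  ln L = {ln_l_at_total(total):.3f}")
```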

Constrained Fits
Suppose there is a parameter in the likelihood that is somewhat known from elsewhere. This information can be incorporated in the fit. For example, we are fitting for the mass of a particle decay with resolution $\sigma$. Suppose the Particle Data Book lists the mass as $M_0 \pm \sigma_M$. We can incorporate this into the likelihood function as
$L = \frac{1}{\sqrt{2\pi}\,\sigma_M}\, e^{-(M-M_0)^2/(2\sigma_M^2)} \prod_{i=1}^{N} \left[ \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(m_i-M)^2/(2\sigma^2)} \right]$
This is known as a constrained fit.
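When the resolution $\sigma$ is treated as known, this constrained likelihood can be maximized by hand: setting the derivative of ln L with respect to M to zero gives a weighted average of the external value and the sample mean. A minimal sketch under that assumption, with made-up masses and a made-up external value (not actual PDG numbers):

```python
import math

def constrained_log_likelihood(masses, big_m, sigma, m0, sigma_m):
    """ln L = Gaussian constraint on M plus the Gaussian data terms."""
    norm = math.sqrt(2.0 * math.pi)
    constraint = (-math.log(norm * sigma_m)
                  - (big_m - m0) ** 2 / (2.0 * sigma_m ** 2))
    data = sum(-math.log(norm * sigma) - (m - big_m) ** 2 / (2.0 * sigma ** 2)
               for m in masses)
    return constraint + data

def constrained_mass(masses, sigma, m0, sigma_m):
    """Value of M that maximizes the constrained ln L (sigma held fixed)."""
    w_data = len(masses) / sigma ** 2  # weight of the sample mean
    w_ext = 1.0 / sigma_m ** 2         # weight of the external value
    mean = sum(masses) / len(masses)
    return (w_ext * m0 + w_data * mean) / (w_ext + w_data)

# Hypothetical inputs, for illustration only.
masses = [5.271, 5.284, 5.279, 5.290, 5.275, 5.282]
sigma, m0, sigma_m = 0.010, 5.2750, 0.0040
sample_mean = sum(masses) / len(masses)
m_fit = constrained_mass(masses, sigma, m0, sigma_m)
print(f"sample mean = {sample_mean:.4f}, external = {m0:.4f}, "
      f"constrained fit = {m_fit:.4f}")
```

The fitted mass lands between the external value and the sample mean, pulled toward whichever has the smaller uncertainty.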

Constrained Fits 2
Let $\Delta M$ be the uncertainty on M that could be determined by the fit alone.
If $\Delta M \gg \sigma_M$, the constraint will dominate, and you might as well just fix M to $M_0$. For example, you never see a constrained fit to h in an HEP experiment.
If $\Delta M \ll \sigma_M$, the constraint does very little. You have a better measurement than the PDG. You should do an unconstrained fit and PUBLISH.
A constrained fit is most useful if $\sigma_M$ and $\Delta M$ are comparable.

Simple Monte Carlo Tests
It is possible to write simple, short, fast Monte Carlo programs that generate data for fitting. Can then look at fit values, uncertainties, and pulls. These are often called "toy" Monte Carlos to differentiate them from complicated event and detector simulation programs.
Tests likelihood function.
Tests for bias.
Tests that uncertainty from fit is correct.
This does NOT test the correctness of the model of the data. For example, if you think that some data is Gaussian distributed, but it is really Lorentzian, then the simple Monte Carlo test will not reveal this.

Simple Monte Carlo Tests 2
Generate exponential ($\tau = 0.5$ and N = 1000). Fit. Repeat many times (1000 times here). Histogram $\tau$, $\sigma_\tau$, and pulls.
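A toy Monte Carlo of this kind fits in a few lines. The sketch below is one possible version, not the program used for the talk: it takes the fit result to be $\tau^* = \bar{t}$, assumes the curvature-based uncertainty $\sigma_\tau = \tau^*/\sqrt{N}$ for the exponential, and defines the pull as $(\tau^* - \tau_{true})/\sigma_\tau$. It summarizes the pulls by their mean and width instead of drawing histograms.

```python
import math
import random
import statistics

rng = random.Random(2004)  # fixed seed, arbitrary
tau_true, n_events, n_experiments = 0.5, 1000, 1000

fit_values = []
fit_errors = []
pulls = []
for _ in range(n_experiments):
    # One toy experiment: generate, "fit" analytically, record the pull.
    times = [rng.expovariate(1.0 / tau_true) for _ in range(n_events)]
    tau_star = sum(times) / n_events
    sigma_tau = tau_star / math.sqrt(n_events)  # assumed error estimate
    fit_values.append(tau_star)
    fit_errors.append(sigma_tau)
    pulls.append((tau_star - tau_true) / sigma_tau)

pull_mean = statistics.mean(pulls)
pull_width = statistics.pstdev(pulls)
print(f"mean tau*   = {statistics.mean(fit_values):.4f}")
print(f"mean error  = {statistics.mean(fit_errors):.4f}")
print(f"pull mean   = {pull_mean:.3f} (expect about 0)")
print(f"pull width  = {pull_width:.3f} (expect about 1)")
```

A pull mean far from 0 would signal a bias, and a pull width far from 1 would signal that the fit uncertainty is over- or underestimated.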

Simple Monte Carlo Tests 3
[Figure]

Goodness of Fit
Unfortunately, the likelihood method does not, in general, provide a measure of the goodness of fit (as a $\chi^2$ fit does). For example, consider fitting lifetime data to an exponential.
$L = \prod_{i=1}^{N} \frac{1}{\tau}\, e^{-t_i/\tau}$, with $\tau^* = \bar{t}$
$\ln L^* = \ln L(\tau^*) = -N \left( 1 + \ln \bar{t} \right)$
Thus the value of L at the maximum depends only on the number of events and average value of the data.
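The point can be made concrete with two made-up data sets that have the same number of events and the same mean but very different shapes. In this sketch the second set (every time identical) is obviously not exponential, yet its maximized likelihood is the same.

```python
import math

def max_exp_log_likelihood(times):
    """ln L evaluated at the MLE tau* = mean(t), summed term by term."""
    tau_star = sum(times) / len(times)
    return sum(-math.log(tau_star) - t / tau_star for t in times)

# Hypothetical data sets with the same N and the same mean (0.5).
spread_out = [0.1, 0.2, 0.3, 0.4, 1.5]
all_equal = [0.5, 0.5, 0.5, 0.5, 0.5]

ln_l_spread = max_exp_log_likelihood(spread_out)
ln_l_equal = max_exp_log_likelihood(all_equal)
closed_form = -len(all_equal) * (1.0 + math.log(0.5))
print(f"ln L* (spread out) = {ln_l_spread:.6f}")
print(f"ln L* (all equal)  = {ln_l_equal:.6f}")
print(f"-N (1 + ln t_bar)  = {closed_form:.6f}")
```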

Goodness of Fit 2
Fit to exponential. Plot log(L*) for (1) exponential Monte Carlo and (2) Gaussian data.

Goodness of Fit 3
[Figure]

Other Types of Fits
Chi-square: If data is binned and uncertainties are Gaussian, then maximum likelihood is equivalent to a $\chi^2$ fit.
Binned Likelihood: If data is binned and not Gaussian, can still do a binned likelihood fit. Common case is when data are Poisson distributed.
$P_i = \frac{e^{-\mu_i}\, \mu_i^{n_i}}{n_i!}$
$\ln L = \sum_{i=1}^{\mathrm{bins}} \ln P_i$
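The Poisson binned log-likelihood is a one-line sum. The sketch below evaluates it for made-up bin counts and a deliberately simple model in which every bin has the same expected content (a flat distribution with one free parameter), for which the maximum sits at the average bin content.

```python
import math

def binned_log_likelihood(counts, expected):
    """ln L = sum over bins of ln[ exp(-mu_i) mu_i^n_i / n_i! ]."""
    return sum(-mu + n * math.log(mu) - math.lgamma(n + 1)  # lgamma(n+1) = ln n!
               for n, mu in zip(counts, expected))

counts = [3, 0, 5, 2, 4, 1]  # hypothetical observed bin contents
n_bins = len(counts)

def flat_model_ln_l(mu_per_bin):
    """ln L for a flat model with the same expectation in every bin."""
    return binned_log_likelihood(counts, [mu_per_bin] * n_bins)

mu_best = sum(counts) / n_bins  # analytic maximum for the flat model
for mu in (mu_best - 0.5, mu_best, mu_best + 0.5):
    print(f"mu per bin = {mu:.2f}  ln L = {flat_model_ln_l(mu):.4f}")
```

Empty bins are handled naturally (they contribute $-\mu_i$), as long as the model never predicts exactly zero events in a bin.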

Comparison of Fits
Chi-square: Goodness of fit. Can plot function with binned data. Data should be Gaussian; in particular, $\chi^2$ doesn't work well with bins with a small number of events.
Binned likelihood: Goodness of fit. Can plot function with binned data. Still need to be careful of bins with a small number of events (don't add in too many zero bins).
Unbinned likelihood: Usually most powerful. Don't need to bin data. Works well for multi-dimensional data. No goodness of fit estimate. Can't plot fit with data (unless you bin data).

Comparison of Fits 2
Generate 100 values for Gaussian with $\mu = 0$, $\sigma = 1$. Fit unbinned likelihood and $\chi^2$ to SAME data. Repeat 10,000 times. Both are unbiased. Unbinned likelihood is more efficient.

Comparison of Fits 3
Fit values are correlated, but not completely. Difference is of the order of half of the uncertainty.

Comparison of Fits 4
Fit to width is biased for both. But, unbinned likelihood widths tend to true value for large N.

Numerical Methods
Even slight complications to the probability make analytic methods intractable. Also, likelihood fits often have many parameters (perhaps scores) and can't be done analytically. However, numerical methods are still very effective. MINUIT is a powerful program from CERN for doing maximum likelihood fits (see references in handout).

Systematic Uncertainties
When fitting for one parameter, there often are other parameters that are imperfectly known. It is tempting to estimate the systematic uncertainty due to these parameters by varying them and redoing the fit. Because of statistical variations, this overestimates the systematic uncertainty (often called double counting). Best way to estimate such systematics is probably with a high statistics Monte Carlo program.

Potentially Interesting Web Sites
CDF Statistics Committee page: www-cdf.fnal.gov/physics/statistics/statistics_home.html
Lectures by Louis Lyons: www-ppd.fnal.gov/eppoffice-w/academic_lectures

Summary
Maximum Likelihood methods are a powerful tool for extracting measured parameters from data. However, it is important to understand their proper use and avoid potential problems.