Chapter 7: Estimation Sections


Chapter 7: Estimation - Sections
7.1 Statistical Inference
Bayesian Methods:
7.2 Prior and Posterior Distributions
7.3 Conjugate Prior Distributions
7.4 Bayes Estimators
Frequentist Methods:
7.5 Maximum Likelihood Estimators
7.6 Properties of Maximum Likelihood Estimators (skip p. 434-441, EM algorithm and Sampling Plans)
7.7 Sufficient Statistics
Skip: 7.8 Jointly Sufficient Statistics
Skip: 7.9 Improving an Estimator

7.1 Statistical Inference
We have seen statistical models in the form of probability distributions: f(x | θ). In this section the general notation for any parameter is θ, and the parameter space is denoted by Ω. For example:
The lifetime of a Christmas light string follows the Expo(θ) distribution.
The average of 63 poured drinks is approximately normal with mean θ.
The number of people who have a disease in a group of N people follows the Binomial(N, θ) distribution.
In practice the value of the parameter θ is unknown.

7.1 Statistical Inference (continued)
Statistical inference: given the data we have observed, what can we say about θ? That is, we observe random variables X_1, ..., X_n that we assume follow our statistical model, and we want to draw probabilistic conclusions about the parameter θ.
For example: I tested 5 Christmas light strings from the same manufacturer and they lasted for 21, 103, 76, 88 and 96 days. Assuming that the lifetimes are independent and follow Expo(θ), what does this data set tell me about the failure rate θ?
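As a concrete illustration of "what the data say about θ," here is a minimal Python sketch that evaluates the log-likelihood of a few candidate failure rates for these five lifetimes, assuming the Expo(θ) density f(x | θ) = θ e^(-θx); the specific candidate values are illustrative choices, not from the slides.

```python
import numpy as np

lifetimes = np.array([21, 103, 76, 88, 96])  # observed lifetimes in days

def log_likelihood(theta, x):
    """Log-likelihood of rate theta under i.i.d. Expo(theta) observations."""
    return len(x) * np.log(theta) - theta * x.sum()

# Compare a few candidate failure rates (per day); 1/mean is a natural guess
for theta in [0.005, 0.01, 1 / lifetimes.mean(), 0.05]:
    print(f"theta = {theta:.4f}: log-likelihood = {log_likelihood(theta, lifetimes):.2f}")
```

Rates near 1/76.8 per day give the data the highest likelihood, which previews both the maximum likelihood idea of Section 7.5 and the role of f(x | θ) in Bayesian updating.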

7.1 Statistical Inference: another example
Say I take a random sample of 100 people and test them all for a disease. If 3 of them have the disease, what can I say about θ, the prevalence of the disease in the population? Say I estimate θ as θ̂ = 3/100 = 3%. How sure am I about this number? I want uncertainty bounds on my estimate. Can I be confident that the prevalence of the disease is higher than 2%?

7.1 Statistical Inference: examples of different types of inference
Prediction: predict random variables that have not yet been observed. E.g., if we test 40 more people for the disease, how many do we predict have the disease?
Estimation: estimate (predict) the unknown parameter θ. E.g., we estimated the prevalence of the disease as θ̂ = 3%.

7.1 Statistical Inference: examples of different types of inference (continued)
Making decisions: hypothesis testing, decision theory. E.g., if the disease affects 2% or more of the population, the state will launch a costly public health campaign. Can we be confident that θ is higher than 2%?
Experimental design: what data should we collect, and how much? E.g., how do I select people for my clinical trial? How many do I need in order to be comfortable making a decision based on my analysis? Such choices are often limited by time and/or budget constraints.

7.1 Statistical Inference: Bayesian vs. frequentist inference
Should a parameter θ be treated as a random variable? E.g., consider the prevalence of a disease.
Frequentists: No. The proportion q of the population that has the disease is not a random phenomenon but a fixed number that is simply unknown.
Example, 95% confidence interval: we wish to find random variables T_1 and T_2 that satisfy the probabilistic statement P(T_1 ≤ q ≤ T_2) ≥ 0.95.
Interpretation: P(T_1 ≤ q ≤ T_2) is the probability that the random interval [T_1, T_2] covers q.

7.1 Statistical Inference: Bayesian vs. frequentist inference (continued)
Should a parameter be treated as a random variable? E.g., consider the prevalence of a disease.
Bayesians: Yes. The proportion Q of the population that has the disease is unknown, and the distribution of Q is a subjective probability distribution that expresses the experimenter's (prior) beliefs about Q.
Example, 95% credible interval: we wish to find constants t_1 and t_2 that satisfy the probabilistic statement P(t_1 ≤ Q ≤ t_2 | data) ≥ 0.95.
Interpretation: P(t_1 ≤ Q ≤ t_2 | data) is the probability that the parameter Q lies in the interval [t_1, t_2].

7.2 Prior and Posterior Distributions: Bayesian inference
Prior distribution: the distribution we assign to the parameters before observing the random variables. Notation for the prior pdf/pf: we will use p(θ); the book uses ξ(θ).
Likelihood: when the joint pdf/pf f(x | θ) is regarded as a function of θ for given observations x_1, ..., x_n, it is called the likelihood function.
Posterior distribution: the conditional distribution of the parameters θ given the observed random variables X_1, ..., X_n. Notation for the posterior pdf/pf: we will use p(θ | x_1, ..., x_n) = p(θ | x).

7.2 Prior and Posterior Distributions
Theorem 7.2.1 (calculating the posterior): Let X_1, ..., X_n be a random sample with pdf/pf f(x | θ) and let p(θ) be the prior pdf/pf of θ. Then the posterior pdf/pf is
p(θ | x) = f(x_1 | θ) ⋯ f(x_n | θ) p(θ) / g(x),
where g(x) = ∫_Ω f(x_1 | θ) ⋯ f(x_n | θ) p(θ) dθ is the marginal distribution of X_1, ..., X_n.

7.2 Prior and Posterior Distributions: example, Binomial likelihood and a Beta prior
I take a random sample of 100 people and test them all for a disease. Assume
Likelihood: X | θ ~ Binomial(100, θ), where X denotes the number of people with the disease.
Prior: θ ~ Beta(2, 10).
I observe X = 3 and want to find the posterior distribution of θ.
More generally: find the posterior distribution of θ when X | θ ~ Binomial(n, θ) and θ ~ Beta(α, β), where n, α and β are known.
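A minimal Python sketch of this example, using the standard conjugate result (derived two slides below) that a Beta(α, β) prior combined with a Binomial(n, θ) observation gives a Beta(α + x, β + n − x) posterior; scipy is my choice here, not something the slides prescribe.

```python
from scipy import stats

# Data and prior hyperparameters from the example
n, x = 100, 3          # 100 people tested, 3 positive
alpha, beta = 2, 10    # Beta(2, 10) prior on the prevalence theta

# Conjugate update: posterior is Beta(alpha + x, beta + n - x) = Beta(5, 107)
posterior = stats.beta(alpha + x, beta + n - x)

print("posterior mean:", posterior.mean())               # about 0.045
print("95% credible interval:", posterior.interval(0.95))
```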

7.2 Prior and Posterior Distributions: example, Binomial likelihood and a Beta prior (continued)
Notice how the posterior is more concentrated than the prior. After seeing the data we know more about θ.

7.2 Prior and Posterior Distributions: Bayesian inference
Recall the formula for the posterior distribution:
p(θ | x) = f(x_1 | θ) ⋯ f(x_n | θ) p(θ) / g(x), where g(x) = ∫_Ω f(x | θ) p(θ) dθ is the marginal distribution.
Since g(x) does not depend on θ, we can write p(θ | x) ∝ f(x | θ) p(θ).
In many cases we can recognize the form of the distribution of θ from f(x | θ) p(θ), eliminating the need to calculate the marginal distribution. Example: the Binomial-Beta case.
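A worked instance of this proportionality argument for the Binomial-Beta case mentioned above (a standard derivation, sketched here for completeness):

```latex
\begin{align*}
p(\theta \mid x)
  &\propto f(x \mid \theta)\, p(\theta) \\
  &= \binom{n}{x}\,\theta^{x}(1-\theta)^{n-x}
     \cdot \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,
       \theta^{\alpha-1}(1-\theta)^{\beta-1} \\
  &\propto \theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1},
\end{align*}
```

which is the kernel of a Beta(α + x, β + n − x) density, so the posterior must be Beta(α + x, β + n − x); no computation of g(x) was needed.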

7.2 Prior and Posterior Distributions: sequential updates
If our observations are a random sample, we can do the Bayesian analysis sequentially, each time using the posterior from the previous step as the prior:
p(θ | x_1) ∝ f(x_1 | θ) p(θ)
p(θ | x_1, x_2) ∝ f(x_2 | θ) p(θ | x_1)
p(θ | x_1, x_2, x_3) ∝ f(x_3 | θ) p(θ | x_1, x_2)
...
p(θ | x_1, ..., x_n) ∝ f(x_n | θ) p(θ | x_1, ..., x_{n-1})
For example: say I test 40 more people for the disease and 2 test positive. What is the new posterior?
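A minimal sketch of the sequential Beta-Binomial update for this follow-up question, assuming the earlier Beta(2, 10) prior and the first batch of 3 positives out of 100:

```python
from scipy import stats

alpha, beta = 2, 10                       # prior hyperparameters

# Batch 1: 3 positives out of 100 -> posterior becomes the new "prior"
alpha, beta = alpha + 3, beta + 100 - 3   # Beta(5, 107)

# Batch 2: 2 positives out of 40 more people
alpha, beta = alpha + 2, beta + 40 - 2    # Beta(7, 145)

posterior = stats.beta(alpha, beta)
print(f"posterior: Beta({alpha}, {beta}), mean = {posterior.mean():.4f}")
```

The result is the same as updating once on all 140 observations, which is exactly the point of the sequential formulation.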

7.2 Prior and Posterior Distributions: prior distributions
The prior distribution should reflect what we know a priori about θ. For example:
Beta(2, 10) puts almost all of its density below 0.5 and has mean 2/(2 + 10) ≈ 0.167, saying that a prevalence of more than 50% is very unlikely.
Beta(1, 1), i.e. the Uniform(0, 1), indicates that a priori all values between 0 and 1 are equally likely.

7.2 Prior and Posterior Distributions: choosing a prior
We need to choose prior distributions carefully:
We need a distributional family (e.g. Beta) and its hyperparameters (e.g. α, β).
When hyperparameters are difficult to interpret, we can sometimes set a mean and a variance and solve for the parameters. E.g., what Beta prior has mean 0.1 and variance 0.1²? (A worked answer follows below.)
If more than one option seems sensible, we perform a sensitivity analysis: we compare the posteriors obtained under the different priors.
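A short worked answer to that question, using the Beta mean m = α/(α + β) and variance m(1 − m)/(α + β + 1): setting m = 0.1 and variance 0.01 gives α + β = 8, so α = 0.8 and β = 7.2. A quick numerical check in Python:

```python
from scipy import stats

m, v = 0.1, 0.1 ** 2                    # target mean and variance
s = m * (1 - m) / v - 1                 # alpha + beta, from v = m(1-m)/(s+1)
alpha, beta = m * s, (1 - m) * s        # alpha = 0.8, beta = 7.2

prior = stats.beta(alpha, beta)
print(alpha, beta, prior.mean(), prior.var())   # 0.8 7.2 0.1 0.01
```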

7.2 Prior and Posterior Distributions: sensitivity analysis, Binomial-Beta example
Notice: the posterior mean is always between the prior mean and the observed proportion 0.03.

7.2 Prior and Posterior Distributions: effect of sample size and prior variance
The posterior is influenced both by the sample size and by the prior variance:
The larger the sample size, the less the prior influences the posterior.
The larger the prior variance, the less the prior influences the posterior.

7.2 Prior and Posterior Distributions: example, Normal distribution
Let X_1, ..., X_n be a random sample from N(θ, σ²) where σ² is known, and let the prior distribution of θ be N(μ_0, ν_0²) where μ_0 and ν_0² are known. Show that the posterior distribution p(θ | x) is N(μ_1, ν_1²), where
μ_1 = (σ² μ_0 + n ν_0² x̄_n) / (σ² + n ν_0²) and ν_1² = σ² ν_0² / (σ² + n ν_0²).
The posterior mean is a linear combination of the prior mean μ_0 and the observed sample mean x̄_n.
What happens when ν_0² → ∞? What happens when ν_0² → 0? What happens when n → ∞?
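A minimal numerical sketch of these two formulas; the values of σ², μ_0, ν_0² and the simulated data are illustrative assumptions, not part of the example:

```python
import numpy as np

rng = np.random.default_rng(0)

sigma2 = 4.0            # known data variance sigma^2 (assumed value)
mu0, nu0_2 = 0.0, 1.0   # prior N(mu0, nu0^2) (assumed values)

x = rng.normal(1.5, np.sqrt(sigma2), size=25)   # simulated sample
n, xbar = len(x), x.mean()

# Posterior N(mu1, nu1^2) from the formulas on this slide
mu1 = (sigma2 * mu0 + n * nu0_2 * xbar) / (sigma2 + n * nu0_2)
nu1_2 = sigma2 * nu0_2 / (sigma2 + n * nu0_2)
print(f"posterior mean {mu1:.3f}, posterior variance {nu1_2:.3f}")
```

Letting ν_0² → ∞ in the formulas drives μ_1 → x̄_n and ν_1² → σ²/n, which matches the flat-prior result on the improper-priors slide below.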


7.3 Conjugate Prior Distributions: conjugate priors
Definition (conjugate priors): let X_1, X_2, ... be a random sample from f(x | θ). A family Ψ of distributions is called a conjugate family of prior distributions if, for any prior distribution p(θ) in Ψ, the posterior distribution p(θ | x) is also in Ψ.

Likelihood                 Conjugate prior for θ
Bernoulli(θ)               the Beta distributions
Poisson(θ)                 the Gamma distributions
N(θ, σ²), σ² known         the Normal distributions
Exponential(θ)             the Gamma distributions

We have already seen the Bernoulli-Beta and Normal-Normal cases.

7.3 Conjugate Prior Distributions: conjugate prior families
The Gamma distributions are a conjugate family for the Poisson(θ) likelihood: if X_1, ..., X_n are i.i.d. Poisson(θ) and θ ~ Gamma(α, β), then the posterior is Gamma(α + Σ_{i=1}^n x_i, β + n).
The Gamma distributions are a conjugate family for the Expo(θ) likelihood: if X_1, ..., X_n are i.i.d. Expo(θ) and θ ~ Gamma(α, β), then the posterior is Gamma(α + n, β + Σ_{i=1}^n x_i).
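A minimal sketch of both updates with simulated data; the prior hyperparameters and true parameter values are illustrative assumptions. Note that scipy parameterizes the Gamma by shape and scale, so the textbook rate β becomes scale = 1/β.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, beta = 2.0, 1.0                     # Gamma(alpha, beta) prior, beta a rate

# Poisson likelihood: posterior is Gamma(alpha + sum(x), beta + n)
x_pois = rng.poisson(lam=3.0, size=50)     # simulated counts (illustrative)
post_pois = stats.gamma(a=alpha + x_pois.sum(), scale=1.0 / (beta + len(x_pois)))

# Exponential likelihood: posterior is Gamma(alpha + n, beta + sum(x))
x_expo = rng.exponential(scale=1 / 0.013, size=5)   # simulated lifetimes (illustrative)
post_expo = stats.gamma(a=alpha + len(x_expo), scale=1.0 / (beta + x_expo.sum()))

print("Poisson-Gamma posterior mean:    ", post_pois.mean())
print("Exponential-Gamma posterior mean:", post_expo.mean())
```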

7.3 Conjugate Prior Distributions: improper priors
Improper prior: a "pdf" p(θ) with ∫ p(θ) dθ = ∞.
Used to put more emphasis on the data and downplay the prior; used when there is little or no prior information about θ.
Caution: we always need to check that the posterior pdf is proper, i.e. that it integrates to 1!
Example: let X_1, ..., X_n be i.i.d. N(θ, σ²) and p(θ) = 1 for θ ∈ R. Note that here the prior "variance" is infinite. Then the posterior is N(x̄_n, σ²/n).

Chapter 7: Estimation, continued (chapter outline repeated; next up: 7.4 Bayes Estimators).

7.4 Bayes Estimators: Bayes estimator
In principle, Bayesian inference is the posterior distribution. However, people often wish to estimate the unknown parameter θ with a single number.
A statistic: any function T = r(X_1, X_2, ..., X_n) of the observable random variables X_1, ..., X_n. Example: the sample mean X̄_n is a statistic.
Definition (estimator / estimate): suppose our observable data X_1, ..., X_n are i.i.d. f(x | θ), θ ∈ Ω ⊂ R.
Estimator of θ: a real-valued function δ(X_1, ..., X_n).
Estimate of θ: δ(x_1, ..., x_n), i.e. the estimator evaluated at the observed values.
An estimator is a statistic and a random variable.

26 / 40 Chapter 7 continued 7.4 Bayes Estimators Bayes Estimator Def: Loss Function Loss function: A real valued function L(θ, a) where θ Ω and a R. L(θ, a) = what we loose by using a as an estimate when θ is the true value of the parameter. Examples: Squared error loss function: L(θ, a) = (θ a) 2 Absolute error loss function: L(θ, a) = θ a

7.4 Bayes Estimators: Bayes estimator
Idea: choose an estimator δ(X) that minimizes the expected loss.
Definition (Bayes estimator, minimum expected loss): an estimator is the Bayes estimator of θ if, for every possible observation x of X, it minimizes the expected posterior loss. For given X = x the expected loss is
E(L(θ, a) | x) = ∫_Ω L(θ, a) p(θ | x) dθ.
Let a*(x) be the value of a at which the minimum is attained. Then δ*(x) = a*(x) is the Bayes estimate of θ and δ*(X) is the Bayes estimator of θ.

7.4 Bayes Estimators: Bayes estimators under standard losses
For squared error loss, the Bayes estimator is the posterior mean, δ*(X) = E(θ | X): min_a E(L(θ, a) | x) = min_a E((θ − a)² | x), and the mean of θ | x minimizes this.
For absolute error loss, the Bayes estimator is the posterior median: min_a E(L(θ, a) | x) = min_a E(|θ − a| | x), and the median of θ | x minimizes this.
The posterior mean is the more common estimator because it is often difficult to obtain a closed-form expression for the posterior median.
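A minimal sketch computing both Bayes estimates for the Beta(5, 107) posterior from the disease-prevalence example; scipy's frozen-distribution methods are my choice of tooling here.

```python
from scipy import stats

# Posterior from the disease example: Beta(5, 107)
posterior = stats.beta(5, 107)

# Bayes estimate under squared error loss: posterior mean
# Bayes estimate under absolute error loss: posterior median
print("posterior mean  :", posterior.mean())     # about 0.045
print("posterior median:", posterior.median())   # about 0.042
```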

7.4 Bayes Estimators: examples
Normal Bayes estimator (squared error loss): if X_1, ..., X_n are N(θ, σ²) and θ ~ N(μ_0, ν_0²), then the Bayes estimator of θ is
δ*(X) = (σ² μ_0 + n ν_0² X̄_n) / (σ² + n ν_0²).
Binomial Bayes estimator (squared error loss): if X ~ Binomial(n, θ) and θ ~ Beta(α, β), then the Bayes estimator of θ is
δ*(X) = (α + X) / (α + β + n).

7.4 Bayes Estimators: Bayesian inference, pros and cons
Pros: gives a coherent theory for statistical inference tasks such as estimation; allows the incorporation of prior scientific knowledge about parameters.
Cons: selecting a scientifically meaningful prior distribution (and loss function) is often difficult, especially in high dimensions.

7.5 Maximum Likelihood Estimators: frequentist inference
Likelihood: when the joint pdf/pf f(x | θ) is regarded as a function of θ for given observations x_1, ..., x_n, it is called the likelihood function.
Maximum likelihood estimator (MLE): for given observations x we pick the θ ∈ Ω that maximizes f(x | θ). Given X = x, the maximum likelihood estimate is a function of x; notation: θ̂ = δ(x).
Potentially confusing notation: sometimes θ̂ is used for both the estimator and the estimate.
Note: the MLE is required to be in the parameter space Ω.
Often it is easier to maximize the log-likelihood L(θ) = log f(x | θ).

7.5 Maximum Likelihood Estimators: examples
Let X ~ Binomial(n, θ) where n is given. Find the maximum likelihood estimator of θ. Say we observe X = 3; what is the maximum likelihood estimate of θ?
Let X_1, ..., X_n be i.i.d. N(μ, σ²). Find the MLE of μ when σ² is known. Find the MLEs of μ and σ² when both are unknown.
Let X_1, ..., X_n be i.i.d. Uniform[0, θ], where θ > 0. Find θ̂.
Let X_1, ..., X_n be i.i.d. Uniform[θ, θ + 1]. Find θ̂.
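A minimal numerical check of the first two examples; the closed-form answers (x/n for the binomial, sample mean and divide-by-n sample variance for the normal) are standard results, and the simulated normal data are an illustrative assumption.

```python
import numpy as np
from scipy import optimize, stats

# Binomial(n, theta) with X = 3 out of n = 100: closed-form MLE is x/n
n, x = 100, 3
neg_ll = lambda t: -stats.binom.logpmf(x, n, t)
res = optimize.minimize_scalar(neg_ll, bounds=(1e-6, 1 - 1e-6), method="bounded")
print("binomial MLE (numeric):", res.x, " closed form:", x / n)

# Normal(mu, sigma^2), both unknown: MLEs are the sample mean and the
# biased (divide-by-n) sample variance
data = np.random.default_rng(2).normal(5.0, 2.0, size=200)
print("normal MLEs:", data.mean(), data.var())   # np.var divides by n by default
```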

7.5 Maximum Likelihood Estimators: intuition and limitations
Intuition: we pick the parameter value that makes the observed data most likely.
But the likelihood is not a pdf/pf of θ: if the likelihood of θ_2 is larger than the likelihood of θ_1, i.e. f(x | θ_2) > f(x | θ_1), it does NOT mean that θ_2 is "more likely." Recall that θ is not random here.
Limitations: the MLE does not always exist; it is not always appropriate (we cannot incorporate external prior knowledge); it may not be unique.

Chapter 7: Estimation, continued (chapter outline repeated; next up: 7.6 Properties of Maximum Likelihood Estimators).

7.6 Properties of Maximum Likelihood Estimators: properties of MLEs
Theorem 7.6.2 (MLEs are invariant): if θ̂ is the MLE of θ and g(θ) is a function of θ, then g(θ̂) is the MLE of g(θ).
Example: let p̂ be the MLE of a probability parameter, e.g. the p in Binomial(n, p). Then the MLE of the odds p/(1 − p) is p̂/(1 − p̂).
In general this does not hold for Bayes estimators: e.g., for squared error loss, E(g(θ) | x) ≠ g(E(θ | x)) in general.

7.6 Properties of Maximum Likelihood Estimators: computation
For MLEs: in many practical situations the required maximization is not available analytically, or is too cumbersome. Many numerical optimization methods exist; Newton's method (see Definition 7.6.2) is one example.
For Bayes estimators: in many practical situations the posterior distribution is not available in closed form. This happens when we cannot evaluate the integral for the marginal distribution. Instead, people either approximate the posterior distribution or draw random samples from it, e.g. using Markov chain Monte Carlo (MCMC) methods.
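To make the MCMC remark concrete, here is a toy random-walk Metropolis sketch (my choice of sampler, not one prescribed by the slides) targeting the disease-prevalence posterior, where the exact Beta(5, 107) answer is known and the output can be checked against it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_obs, x_obs, a0, b0 = 100, 3, 2, 10          # data and Beta(2, 10) prior

def log_post(theta):
    """Unnormalized log posterior: log likelihood + log prior."""
    if not 0 < theta < 1:
        return -np.inf
    return stats.binom.logpmf(x_obs, n_obs, theta) + stats.beta.logpdf(theta, a0, b0)

# Random-walk Metropolis with a symmetric normal proposal
theta, draws = 0.1, []
for _ in range(20000):
    proposal = theta + rng.normal(0, 0.02)
    if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
        theta = proposal                      # accept the proposed move
    draws.append(theta)

samples = np.array(draws[5000:])              # discard burn-in
print("MCMC posterior mean:", samples.mean(), " exact Beta(5,107) mean:", 5 / 112)
```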

7.6 Properties of Maximum Likelihood Estimators: method of moments (MOM)
Let X_1, ..., X_n be i.i.d. from f(x | θ), where θ is k-dimensional. The j-th sample moment is defined as m_j = (1/n) Σ_{i=1}^n X_i^j.
Method of moments (MOM) estimator: match the theoretical moments to the sample moments and solve for the parameters:
m_1 = E(X_1 | θ), m_2 = E(X_1² | θ), ..., m_k = E(X_1^k | θ).
Example: let X_1, ..., X_n be i.i.d. Gamma(α, β). Then E(X) = α/β and E(X²) = α(α + 1)/β². Find the MOM estimators of α and β.
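Solving m_1 = α/β and m_2 = α(α + 1)/β² gives α̂ = m_1² / (m_2 − m_1²) and β̂ = m_1 / (m_2 − m_1²). A minimal sketch checking this on simulated data (the true values α = 3, β = 2 are an illustrative assumption; numpy parameterizes Gamma by shape and scale = 1/β):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.gamma(shape=3.0, scale=1.0 / 2.0, size=1000)   # simulated Gamma(3, 2) data

m1 = x.mean()            # first sample moment
m2 = (x ** 2).mean()     # second sample moment

alpha_hat = m1 ** 2 / (m2 - m1 ** 2)   # from m1 = a/b, m2 = a(a+1)/b^2
beta_hat = m1 / (m2 - m1 ** 2)
print("MOM estimates:", alpha_hat, beta_hat)   # close to (3, 2)
```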

7.7 Sufficient Statistics
A statistic: T = r(X_1, ..., X_n).
Definition (sufficient statistic): let X_1, ..., X_n be a random sample from f(x | θ) and let T be a statistic. If the conditional distribution of X_1, ..., X_n given T = t does not depend on θ, then T is called a sufficient statistic.
The idea: it is just as good to have the observed sufficient statistic as it is to have the individual observations X_1, ..., X_n. We can therefore limit our search for a good estimator to sufficient statistics.

7.7 Sufficient Statistics: the factorization criterion
Theorem 7.7.1 (factorization criterion): let X_1, ..., X_n be a random sample from f(x | θ), where θ ∈ Ω is unknown. A statistic T = r(X_1, ..., X_n) is a sufficient statistic for θ if and only if, for all x ∈ R^n and all θ ∈ Ω, the joint pdf/pf f_n(x | θ) can be factored as
f_n(x | θ) = u(x) v(r(x), θ),
where the functions u and v are nonnegative, u may depend on x but not on θ, and v depends on θ but depends on x only through the value of the statistic r(x).
Both MLEs and Bayes estimators depend on the data only through sufficient statistics.
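A quick worked instance of the criterion for a Bernoulli(θ) sample (a standard example, added here for illustration rather than taken from the slides):

```latex
f_n(\mathbf{x} \mid \theta)
  = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i}
  = \underbrace{1}_{u(\mathbf{x})}\;
    \underbrace{\theta^{\,t}(1-\theta)^{\,n-t}}_{v(t,\,\theta)},
  \qquad t = r(\mathbf{x}) = \sum_{i=1}^{n} x_i ,
```

so the factorization holds with u(x) = 1 and v depending on x only through t, and T = Σ X_i is a sufficient statistic for θ.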

END OF CHAPTER 7