Lecture 4: Parameter Estimation and Confidence Intervals. GENOME 560. Doug Fowler, GS (dfowler@uw.edu)



Review: Probability Distributions
Discrete: Binomial distribution, Hypergeometric distribution, Poisson distribution
Continuous: Uniform distribution, Exponential distribution, Gamma distribution, Normal distribution
The sums or means of samples drawn from any distribution are normally distributed.

Goals: Basic concepts of parameter estimation; confidence intervals.

What Is a Parameter? Variables vs. parameters, according to Yonathan Bard (1974)*: "Usually a probabilistic model is designed to explain the relationships that exist among quantities which can be measured independently in an experiment; these are the variables of the model. To formulate these relationships, however, one frequently introduces 'constants' which stand for inherent properties of nature. These are the parameters." We often denote parameters by θ.
* Bard, Yonathan (1974). Nonlinear Parameter Estimation. New York.

Which are parameters, and which are variables?
Binomial distribution (coin tossing). X: number of heads after n coin tosses. P(X = k) = (n choose k) p^k (1 − p)^(n − k). Here X is the variable and the parameter is θ = p.
Poisson distribution. X: number of experiments within a week. P(X = k) = e^(−λ) λ^k / k!. Here X is the variable and the parameter is θ = λ.
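
These pmfs can be evaluated directly in R; a quick sketch with made-up values of n, p, k, and λ (not from the lecture):
> dbinom(3, size = 10, prob = 0.5)   # binomial: choose(10, 3) * 0.5^3 * 0.5^7
[1] 0.1171875
> dpois(3, lambda = 2)               # Poisson: exp(-2) * 2^3 / factorial(3)
[1] 0.180447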

Parameters Can Tell Us About Samples: If we can describe a population using a parametric pdf or pmf f(x|θ), and we know the parameter values, then we can say what typical samples from the population will look like.

...and Samples Can Tell Us About Parameters: If we can describe a population using a parametric pdf or pmf f(x|θ), and we know the parameter values, then we can say what typical samples from the population will look like. We can use sample data to estimate parameter values: if we are tossing a coin, we would like to estimate the parameter p; if we are counting the number of experiments per week, we would like to estimate λ.

Central Dogma of Statistics: If we can describe the population using a parametric distribution, and we know the parameter values, then we can say what typical samples from the population will look like.

Parameter Estimation. Estimator: a statistic whose calculated value is used to estimate a parameter θ. Estimate: a particular realization of an estimator, θ̂. Types of estimates: a point estimate is a single number that can be regarded as the most plausible value of θ, given the data we have; an interval estimate is a range of numbers, called a confidence interval, that informs us about the quality of our estimate.

Simple Example Estimators. Suppose we take a sample from a binomial distribution whose parameters are unknown, and we get m successes from n samples. How can we estimate the parameter π (the population p)?
Method 1: We could just use m and n, setting π̂ = m/n.
Method 2: Alternately, we could just look in the literature for similar experiments. We could ignore our data and set π̂ to the value reported there.

Good Estimators Are:
Consistent: as sample size increases, θ̂ gets closer to θ. Would our example estimators be consistent? Estimator 1, yes (m/n will approach π by the law of large numbers); Estimator 2, no (our data doesn't matter).
What do we mean by unbiased? A biased estimator diverges systematically from the true parameter value.
Unbiased: the expected value of θ̂ is equal to θ. Are our example estimators biased? Estimator 1 turns out to be unbiased; Estimator 2 has a bias.
What do we mean by precise? An imprecise estimator is subject to large amounts of random variability.
Precise: the variance of θ̂ should be minimal. What about Estimator 1 vs. Estimator 2? Estimator 2 has zero variance. Bias and variance are intertwined, and often you will have to choose to minimize one or the other.
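
A minimal simulation sketch makes the bias/variance contrast concrete (the true π of 0.3, the literature value of 0.5, and the sample sizes are invented for illustration):
> set.seed(1)
> m <- rbinom(10000, size = 100, prob = 0.3)   # 10000 replicate experiments, n = 100, true pi = 0.3
> est1 <- m / 100                              # Method 1: m/n in each replicate
> est2 <- rep(0.5, 10000)                      # Method 2: ignore the data, always report 0.5
> mean(est1); var(est1)                        # mean near 0.3 (unbiased); variance near 0.3 * 0.7 / 100
> mean(est2); var(est2)                        # mean 0.5 (biased); variance exactly 0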

Estimators for normally distributed data. Since we know that much experimental data is normally distributed, let's start here. General methods for estimating parameters (MLE, Bayesian) will be covered later.

Estimators for normally distributed data. What two parameters define a normal distribution? The mean, μ, and the standard deviation, σ. [Figure: normal density f(x) over x from −3 to 3, with μ and σ indicated.]

Estimators for normally distributed data. Given a sample from a normally distributed population, what estimators would you use for μ and σ? The sample mean and sample standard deviation: μ̂ = x̄ = (1/n) Σ xᵢ and σ̂ = s = √[ Σ (xᵢ − x̄)² / (n − 1) ].
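
In R these estimators are mean() and sd(); note that sd() already uses the n − 1 denominator. A quick sketch on simulated data (the values 25, 90, and 25 are illustrative, not the lecture's):
> x <- rnorm(25, mean = 90, sd = 25)   # simulate a sample of 25 observations
> mean(x)                              # estimate of mu
> sd(x)                                # estimate of sigma (n - 1 denominator)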

Confidence intervals: how good is my parameter estimate?

Back to our fluorescent yeast. Let's say we measure the fluorescence of 25 yeast cells and find x̄ = 89.1 and s = 24.25. How good is our estimate μ̂ = 89.1? On what will the goodness of the estimate depend? On the sample size and on the variability of the population from which the samples were drawn.

A simple starting point. What is the probability that a second sample mean from the culture is within 79.1 and 99.1, i.e., P(x̄₂ is within 10 of x̄)? Recall that x̄ is a random variable with its own sampling distribution. [Figure: sampling distribution of the sample mean, centered at μ_x̄.] The sampling distribution of the sample mean is normal (by the central limit theorem), has mean μ_x̄ = μ, estimated by x̄, and has standard deviation σ_x̄ = σ/√n, estimated by s/√n.

Standard Error of the Mean. The SEM is the standard deviation of the sampling distribution of the mean. It is often confused with the standard deviation of a sample in the literature. The standard deviation is descriptive of the sample we took, but the SEM describes the spread of the sampling distribution of the mean itself: the SD of the sample is the degree to which individuals within a sample differ from the sample mean, while the SEM reflects uncertainty about where the population mean might be located, given our sample.
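
For the yeast numbers, the estimated SEM is just s/√n, easily checked in R:
> 24.25 / sqrt(25)
[1] 4.85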

A simple starting point, continued. Given that we have an estimate of the PDF of the sampling distribution of the sample mean, how might we calculate the probability that a second sample mean is within some distance of the mean of that sampling distribution? We just need to find the area under the sampling distribution of the sample mean corresponding to the mean ± 10; this area is the probability that a second sample mean falls within 10 of μ_x̄, which we estimate by x̄ = 89.1. How can we do this? With s/√n = 24.25/5 = 4.85:
> pnorm(c(79.1, 99.1), mean = 89.1, sd = 4.85)
[1] 0.01961074 0.98038926
> 0.9804 - 0.01961
[1] 0.96079
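
Equivalently, the two tail probabilities can be differenced in one step:
> diff(pnorm(c(79.1, 99.1), mean = 89.1, sd = 4.85))
[1] 0.9607785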

Generalization. We want to set a confidence interval such that 95% of sample means from the distribution are within the interval. Given that we can estimate the mean and standard deviation of the sampling distribution of the sample mean, how do we do this?

Generalization. We find the number of standard deviations, z, that we must move away from the mean to encompass 95% of the sampling distribution of the sample mean. [Figure: normal curve centered at μ_x̄ with 5% of the total area outside ±z.]

Generalization. Since the distribution is symmetric, we can just use the CDF to accomplish this: to find z such that CI₉₅% = μ_x̄ ± z·(s/√n), we can use the normal cumulative distribution function. [Figure: normal curve with 97.5% of the total area to the left of z.]

Generalization. Now we can set a 95% CI for our fluorescence data, finding z such that CI₉₅% = μ_x̄ ± z·(s/√n) via the normal cumulative distribution function:
> min(which(pnorm(seq(-3, 3, 0.01)) >= 0.975))
[1] 497
> seq(-3, 3, 0.01)[497]
[1] 1.96
> 89.1 - 4.85 * 1.96
[1] 79.594
> 89.1 + 4.85 * 1.96
[1] 98.606
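
The same z comes straight from the normal quantile function, which avoids the seq/which search:
> qnorm(0.975)
[1] 1.959964
> 89.1 + c(-1, 1) * qnorm(0.975) * 4.85   # same limits as above, about 79.59 and 98.61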

A practical note. When sample sizes are greater than ~30, the sampling distribution of the sample mean is normal and σ̂_x̄ = s/√n is a good estimate. When sample sizes are smaller than ~30, s/√n is an underestimate. Thus, in practice we use the t-distribution as opposed to the normal distribution (more on this later).
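
For example, with n = 25 a t-based interval swaps qnorm() for qt() with n − 1 degrees of freedom; a minimal sketch for the yeast numbers:
> qt(0.975, df = 24)                            # t critical value, 24 degrees of freedom
[1] 2.063899
> 89.1 + c(-1, 1) * qt(0.975, df = 24) * 4.85   # roughly 79.1 to 99.1, slightly wider than the z-based interval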

Wait a minute. We began by taking a sample and using it to estimate the sampling distribution of the sample mean. Then, using the central limit theorem and the normal CDF, we computed the interval within which 95% of the area of our sample-based estimate of the sampling distribution of the sample mean falls. We might conclude that there was a 95% chance that this interval contained the true population mean. Does anyone have a problem with this?

Wait a minute. Philosophically, this makes no sense. The population mean is a fixed quantity, so it is either inside or outside the interval we calculated. Period. Additionally, the process of sampling is subject to sampling variability, so we might have drawn a really weird sample that poorly represents the population.

Interpretation of confidence intervals. In fact, it's better to build the idea of sampling variation into our interpretation of a confidence interval: if you repeatedly sample the same population, the CI (which differs for each sample) would contain the true population parameter X% of the time. This is NOT the probability that this particular CI from this particular sample actually contains the population parameter, and NOT a statement that there is an X% probability of a sample mean from a repeat experiment falling within the interval.
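
This coverage interpretation can be checked by simulation; a minimal sketch, with the true mean and SD invented for illustration:
> set.seed(1)
> covered <- replicate(10000, {
+   x <- rnorm(25, mean = 89.1, sd = 24.25)                # a fresh sample of 25 cells
+   ci <- mean(x) + c(-1, 1) * qt(0.975, 24) * sd(x) / 5   # 95% CI from this sample (sqrt(25) = 5)
+   ci[1] < 89.1 & 89.1 < ci[2]                            # does it contain the true mean?
+ })
> mean(covered)                                            # close to 0.95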

R Session Goals: Confidence interval calculations; user-defined functions.
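
As a preview of the user-defined-functions piece, here is one possible sketch of a CI function (the name ci95 and the t-based form are my choices, not necessarily the session's):
> ci95 <- function(x) {
+   se <- sd(x) / sqrt(length(x))                        # estimated standard error of the mean
+   mean(x) + c(-1, 1) * qt(0.975, length(x) - 1) * se   # t-based 95% confidence interval
+ }
> ci95(rnorm(25, mean = 89.1, sd = 24.25))               # returns the lower and upper limits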

Standard Error of the Mean. X₁, X₂, ..., Xₙ are independent observations from a population with mean μ and stdev σ. The sample mean x̄ = (1/n) Σ Xᵢ is itself a random variable, and the SEM is a property of that RV:
E(x̄) = μ
Var(x̄) = σ²/n
SEM = σ_x̄ = σ/√n, estimated in practice by s/√n.
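
A quick simulation sketch showing that the SD of many sample means matches σ/√n (μ = 0, σ = 10, and n = 25 are invented values):
> set.seed(1)
> xbars <- replicate(10000, mean(rnorm(25, mean = 0, sd = 10)))   # 10000 sample means, n = 25
> sd(xbars)                                                       # close to 10 / sqrt(25) = 2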

Multivariate Hypergeometric Distribution. The HGD can be generalized to picking a sample of size n containing exactly (k₁, k₂, ..., k_c) items from each of c classes, from a population of N items in which there are K_i items of class i. The pmf is:
P(k₁, ..., k_c) = [ Π_{i=1..c} (K_i choose k_i) ] / (N choose n)
Example: There are 5 black, 10 white, and 15 red balls in an urn. If you draw six without replacement, what is the probability that you pick 2 of each color?
(5 choose 2)(10 choose 2)(15 choose 2) / (30 choose 6) ≈ 0.08
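
The example probability can be checked directly with choose() in R:
> choose(5, 2) * choose(10, 2) * choose(15, 2) / choose(30, 6)   # = 47250 / 593775
[1] 0.0795756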