Gov 2001: Section 5 I. A Normal Example II. Uncertainty Gov 2001 Spring 2010

A roadmap We started by introducing the concept of likelihood in the simplest univariate context: one observation, one variable. Then we moved to more than one observation and multiplied the individual likelihoods together. Now, we are introducing covariates.

A roadmap (ctd.) Key to all of this is the distinction between stochastic and systematic components:
Stochastic - the probability distribution of the data; key to identifying which model (Poisson, binomial, etc.) you should use. E.g., Y_i ~ f(y_i | γ).
Systematic - how the parameters of the probability distribution vary over your covariates; key to incorporating covariates into your model. E.g., γ = g(X_i, θ).
You'll need both parts to model the likelihood, and you'll need a more sophisticated systematic component to include interesting covariates.
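To make the two components concrete before the normal example below, here is a minimal sketch for a Poisson model (our own illustration, not from the slides; the names x1, theta, lambda, and y.pois, and the parameter values, are arbitrary):

# stochastic:  Y_i ~ Poisson(lambda_i)
# systematic:  lambda_i = exp(X_i %*% theta), which keeps lambda_i positive
set.seed(2138)
x1     <- rnorm(1000)
theta  <- c(0.5, 0.8)                  # intercept and slope, chosen arbitrarily
lambda <- exp(cbind(1, x1) %*% theta)  # systematic component
y.pois <- rpois(1000, lambda)          # stochastic component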

Normal Example Let's work through an example of how this all works. I'm going to create some fake data:
> x <- rnorm(1000, .5, 6)
> z <- rnorm(1000, 100, .5)
> Y <- 14 + 6.4*x + .25*z + rnorm(1000, 0, 1)
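One note we're adding: the slides re-simulate the data at several points without fixing the random seed, which is presumably why the estimates differ slightly from slide to slide below. To make your own runs reproducible, fix the seed before drawing:

> set.seed(12345)   # any fixed value makes the rnorm() draws reproducible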

Normal Example (ctd.) So Y will be normally distributed. Why?
> hist(Y, col = "goldenrod", main = "Distribution of Y")
[Figure: histogram of Y, titled "Distribution of Y", with Frequency on the vertical axis and Y on the horizontal axis.]

Normal Example (ctd.) Since Y is continuous and normally distributed, we could use OLS:
> Y <- 14 + 6.4*x + .25*z + rnorm(1000, 0, 1)
> my.lm <- lm(Y ~ x + z)
> my.lm

Call:
lm(formula = Y ~ x + z)

Coefficients:
(Intercept)            x            z
    16.6723       6.3997       0.2228

Normal Example (ctd.) But we can also calculate the same results using likelihood techniques. How?

Stochastic:  Y_i ~ N(µ_i, σ²)
Systematic:  µ_i = β_0 + β_1 X_1 + ... + β_k X_k

This leaves us with the following likelihood for the ith observation:

L(µ_i, σ² | y_i) ∝ N(y_i | µ_i, σ²) = (2πσ²)^(-1/2) exp( -(y_i - µ_i)² / (2σ²) )

Normal Example (ctd.) To calculate the full log likelihood, we assume independence among observations and multiply; then take the natural log; then introduce our parameterization.

L(β, σ² | y) = ∏_i L(y_i | µ_i, σ²)

ln L(β, σ² | y) = Σ_i ln L(y_i | µ_i, σ²)
                = -1/2 Σ_i [ ln σ² + (y_i - µ_i)² / σ² ]     (dropping the additive constant -(1/2) ln 2π per observation)
                = -1/2 Σ_i [ ln σ² + (y_i - X_i β)² / σ² ]

Normal Example (ctd.) Rather than maximize this analytically, we aim for a numerical solution. We can implement the log likelihood in R using the commands from Monday's lecture notes:

ll.normal <- function(par, y, X){
  beta   <- par[1:ncol(X)]
  sigma2 <- exp(par[ncol(X) + 1])   # we estimate log(sigma2), so the variance stays positive
  -1/2 * sum(log(sigma2) + (y - X %*% beta)^2 / sigma2)
}
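As a quick sanity check (our addition, reusing the x, z, and Y simulated above; trial.par and exact are names we introduce here), ll.normal should match the exact normal log likelihood from dnorm() once we add back the constant -(n/2) ln 2π that we dropped:

> X <- cbind(1, x, z)
> trial.par <- c(14, 6.4, .25, 0)                 # arbitrary trial values; last entry is log(sigma2)
> exact <- sum(dnorm(Y, mean = X %*% trial.par[1:3],
+                    sd = sqrt(exp(trial.par[4])), log = TRUE))
> exact + (1000/2) * log(2*pi)                    # add back the dropped constant
> ll.normal(par = trial.par, y = Y, X = X)        # should agree with the line above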

Normal Example (ctd.) The Zelig package will calculate the MLE estimates automatically:
> install.packages("Zelig")
> library(Zelig)
> ex <- data.frame(Y, x, z)
> my.z <- zelig(Y ~ x + z, model = "normal", data = ex)
> my.z

Coefficients:
(Intercept)            x            z
     13.079        6.394        0.259

Normal Example (ctd.) But we will tackle this manually:

ll.normal <- function(par, y, X){
  beta   <- par[1:ncol(X)]
  sigma2 <- exp(par[ncol(X) + 1])
  -1/2 * sum(log(sigma2) + (y - X %*% beta)^2 / sigma2)
}

where the inputs will be
par - a vector of the parameters you want the likelihood for
y - a vector for the dependent variable
X - a matrix of covariates plus a column of 1s for the intercept
(Why do you need a column of 1s? Because µ_i = X_i β, and the first element of β is the intercept.)

Normal Example (ctd.) Note: X must be in matrix form so that you can do the matrix multiplication.
> ll.normal(par = c(14, 6.4, .25, 40), y = Y, X = cbind(1, x, z))
[1] -20000
> ll.normal(par = c(0, 0, 0, 0), y = Y, X = cbind(1, x, z))
[1] -1591275
Which potential parameters are better? Why? (The first set: it yields a much higher log likelihood, meaning those parameter values make the observed data more probable.)

Normal Example (ctd.) At the end of the day, we don't care about the likelihood value at any one set of parameters. We want to optimize the likelihood across different values of the parameters and find the values that maximize it. We have four parameters: an intercept, a coefficient on x, a coefficient on z, and a value of σ² (parameterized as log σ² in ll.normal). To search over possible values automatically, we use optim.

Normal Example (ctd.) Here's how we can use optim:
> my.optim <- optim(par = c(0,0,0,0), fn = ll.normal,
+                   y = Y, X = cbind(1,x,z),
+                   method = "BFGS", control = list(fnscale = -1), hessian = TRUE)
The inputs to optim include a par argument. These should be your proposed starting values. Choose starting values that substantively make sense; otherwise, the optimizing algorithm might get lost! Also remember to include starting values for your intercept and for ancillary parameters. Setting fnscale = -1 tells optim to maximize rather than minimize, and hessian = TRUE returns the Hessian matrix at the optimum (we'll use it for standard errors below).

Normal Example (ctd.) So let's look at the optim output:
> my.optim$par
[1] 16.67015681  6.39974960  0.22284211 -0.02824721
We can cross-check our answers with the lm function:
> my.lm

Coefficients:
(Intercept)            x            z
    16.6723       6.3997       0.2228
Looks good! (The fourth element of my.optim$par estimates ln σ², so σ̂² = exp(-0.028) ≈ 0.97, close to the true error variance of 1.)

Gov 2001: Section 5 I. A Normal Example II. Uncertainty Gov 2001 Spring 2010

Intro to Uncertainty Once the ML estimates are calculated, we'll want to know how good they are. How much information does the MLE contain about the underlying parameter? How good a summary of the entire likelihood is this one point? The MLE alone isn't satisfying; we need a way to quantify uncertainty.

Intro to Uncertainty (ctd.) Common ways to think about uncertainty:
Likelihood ratio tests - useful for comparing restricted versus unrestricted models. (UPM pp. 84-86)
Estimating standard errors - use the normal approximation to get the standard errors of the coefficients; these may be calculated by estimating only the unrestricted model (more like what Gary was talking about in class). (UPM pp. 87-92)

Likelihood Ratio Tests Useful for when you are comparing two models. We'll call these restricted and unrestricted:

Unrestricted:  µ = β_0 + β_1 X_1
Restricted:    µ = β_0

We want to test the usefulness of the parameters included in the unrestricted model but omitted from the restricted model.

Likelihood Ratio Tests (ctd.) Here's how to operationalize this: Let L be the maximum of the unrestricted likelihood, and let L_r be the maximum of the restricted likelihood. Adding more variables can only increase the maximized likelihood. Thus, L ≥ L_r, or L_r / L ≤ 1. If the likelihood ratio is exactly 1, then the extra parameters have no effect at all.

Likelihood Ratio Tests (ctd.) Now, let's define a test statistic:

R = -2 ln(L_r / L) = 2(ln L - ln L_r)

R will always be greater than or equal to zero. Under the null hypothesis that the restrictions hold, R asymptotically follows a χ² distribution with m degrees of freedom, where m is the number of restrictions. Key question: how much greater than zero does R have to be in order to convince us that the difference is due to systematic differences between the two models?

Likelihood Ratio Test Example Let's go back to our example.
> unrestricted <- optim(par = c(0,0,0,0), fn = ll.normal,
+                       y = Y, X = cbind(1,x,z),
+                       method = "BFGS", control = list(fnscale = -1), hessian = TRUE)
> unrestricted$value
[1] -485.8741
versus
> restricted <- optim(par = c(0,0,0), fn = ll.normal,
+                     y = Y, X = cbind(1,x),
+                     method = "BFGS", control = list(fnscale = -1), hessian = TRUE)
> restricted$value
[1] -492.2747

Likelihood Ratio Test Example (ctd.) Under the null that the restriction is valid, the test statistic is distributed χ² with one degree of freedom (one restriction: the coefficient on z is zero):
> r <- 2*(unrestricted$value - restricted$value)
> 1 - pchisq(r, df = 1)
[1] 0.0003464178
The probability of getting a test statistic this large under the null is extremely small, so we reject the restricted model.

Using Standard Errors We can also move forward using the curvature of the likelihood curve around the MLE, which is a measure of the precision of the ML estimate.

Measure of curvature: the Fisher information,
I(θ̂) = -∂² ln L(θ | y) / ∂θ²   evaluated at θ = θ̂

The inverse of the Fisher information gives us the variance:
Var(θ̂) ≈ [I(θ̂)]⁻¹

The square root of Var(θ̂) gives us the standard error:
SE(θ̂) = sqrt(Var(θ̂))

Using Standard Errors (ctd.) I(θ̂) is based on a quadratic approximation of ln L(θ | y) at θ̂.
If θ̂ is normal, then the quadratic approximation is exactly true.
If θ̂ is not exactly normal, then the quadratic approximation holds as n → ∞.
Why? The central limit theorem and the sampling distribution of θ̂.

Using Standard Errors (ctd.) We can use the standard errors in a variety of ways, including to do hypothesis testing and to calculate confidence intervals. The Wald test is a generalization of the t-test from regression analysis. Here's how it works:
Choose a null hypothesis, H_0: θ = θ_0.
Use that to calculate a test statistic Z, which is approximately standard normal under the null:
Z = (θ̂ - θ_0) / SE(θ̂) ~ N(0, 1)
Then see how likely it is to observe a test statistic that extreme given that the null is true. (We'll compute this in R once we have the standard errors below.)

Using Standard Errors (ctd.) Let's go back to our example:
> my.opt <- optim(par = c(0,0,0,0), fn = ll.normal,
+                 y = Y, X = cbind(1,x,z),
+                 method = "BFGS", control = list(fnscale = -1), hessian = TRUE)
Let's get the Hessian matrix out:
> my.opt$hessian
              [,1]          [,2]          [,3]          [,4]
[1,] -1.069522e+03 -8.922910e+02 -1.069601e+05  3.008466e-03
[2,] -8.922910e+02 -3.939036e+04 -8.900873e+04  8.031144e-02
[3,] -1.069601e+05 -8.900873e+04 -1.069708e+07  3.145044e-01
[4,]  3.008466e-03  8.031144e-02  3.145044e-01 -4.999940e+02

Using Standard Errors (ctd.) To calculate the variance-covariance matrix, invert the negative Hessian:
> opt.vcv <- solve(-1*my.opt$hessian)
> opt.vcv
              [,1]          [,2]          [,3]          [,4]
[1,]  3.663626e+01 -2.173033e-03 -3.663080e-01 -1.032219e-05
[2,] -2.173033e-03  2.600229e-05  2.151180e-05  4.632738e-09
[3,] -3.663080e-01  2.151180e-05  3.662629e-03  1.032319e-07
[4,] -1.032219e-05  4.632738e-09  1.032319e-07  2.000024e-03

Using Standard Errors (ctd.) To calculate the variances and standard errors:
> vars <- diag(opt.vcv)
> vars
[1] 3.663626e+01 2.600229e-05 3.662629e-03 2.000024e-03
> ses <- sqrt(vars)
> ses
[1] 6.052789312 0.005099244 0.060519658 0.044721627
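With these in hand we can carry out the Wald test described above. A minimal sketch of our own (reusing the my.opt and ses objects just created; z.stat and p.val are names we introduce), testing H_0: θ = 0 for each parameter:

> z.stat <- my.opt$par / ses                # Wald Z statistic for H_0: theta = 0
> p.val  <- 2 * (1 - pnorm(abs(z.stat)))    # two-sided p-value from the standard normal
> cbind(estimate = my.opt$par, se = ses, z = z.stat, p = p.val)
# (for the 4th row this tests log(sigma2) = 0, i.e., sigma2 = 1, which may not be of substantive interest)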

Using Standard Errors (ctd.) And, lastly, to compare this with the lm output:
> results <- data.frame(my.opt$par, ses)
> results
    my.opt.par         ses
1  17.92300722 6.052789312
2   6.39448686 0.005099244
3   0.21106796 0.060519658
4  -0.06721196 0.044721627
> summary(my.lm)

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)
(Intercept) 17.928169   6.061845    2.958  0.00317 **
x            6.394485   0.005107 1252.132  < 2e-16 ***
z            0.211016   0.060610    3.482  0.00052 ***
Rock and Roll!