STA 532: Theory of Statistical Inference


Robert L. Wolpert
Department of Statistical Science
Duke University, Durham, NC, USA

2 Estimating CDFs and Statistical Functionals

Empirical CDFs

Let $\{X_i : 1 \le i \le n\}$ be a simple random sample, i.e., let the $\{X_i\}$ be $n$ iid replicates from the same probability distribution. We can't know that distribution exactly from only a sample, but we can estimate it by the empirical distribution, which puts mass $1/n$ at each of the locations $X_i$ (if the same value is taken more than once, its mass is the sum of its $1/n$'s, so everything still adds up to one). The CDF

    \hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}_{[X_i,\infty)}(x)

of the empirical distribution is piecewise constant, with jumps of size $1/n$ at each observation point (or $k/n$ in the event of $k$-way ties). Since $\#\{i \le n : X_i \le x\}$ is a Binomial random variable with $p = F(x)$, where $F$ is the true CDF of the $\{X_i\}$, with mean $np$ and variance $np(1-p)$, it is clear that for each $x \in \mathbb{R}$

    \mathsf{E}\,\hat{F}_n(x) = F(x) \quad\text{and}\quad \mathsf{V}\,\hat{F}_n(x) = F(x)[1-F(x)]/n,

so $\hat{F}_n(x)$ is an unbiased and mean-square consistent estimator of $F(x)$. In fact something stronger is true: not only does $\hat{F}_n(x)$ converge to $F(x)$ pointwise in $x$, but the supremum $\sup_x |\hat{F}_n(x) - F(x)|$ converges to zero. There are many ways a sequence of random variables might converge (studying those is the main topic of a measure-theoretic probability course like Duke's STA 711); the Glivenko-Cantelli theorem asserts that this supremum converges to zero with probability one. Either Hoeffding's inequality (Wassily Hoeffding was a UNC statistics professor) or the DKW inequality of Dvoretzky, Kiefer, and Wolfowitz gives the strong bound

    P\big[\, \sup_x |\hat{F}_n(x) - F(x)| > \epsilon \,\big] \le 2 e^{-2n\epsilon^2}

for every $\epsilon > 0$. It follows that, for any $0 < \gamma < 1$,

    P\big[\, L(x) \le F(x) \le U(x) \text{ for all } x \in \mathbb{R} \,\big] \ge \gamma,

so $[L, U]$ is a non-parametric confidence set for $F$, where $L(x) := 0 \vee \big(\hat{F}_n(x) - \epsilon_n\big)$, $U(x) := 1 \wedge \big(\hat{F}_n(x) + \epsilon_n\big)$, and $\epsilon_n := \sqrt{\log\big(2/(1-\gamma)\big)/2n}$. Here $a \vee b$ denotes the maximum of $a, b \in \mathbb{R}$, and $a \wedge b$ the minimum.
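
To make the band concrete, here is a minimal Python sketch (numpy only; the simulated Exponential sample, the sample size, and the level $\gamma = 0.95$ are illustrative choices, not part of the notes):

    import numpy as np

    def dkw_band(x, gamma=0.95):
        """Empirical CDF at the sorted sample points, plus the DKW
        confidence band [L, U] at level gamma."""
        xs = np.sort(np.asarray(x))
        n = xs.size
        Fhat = np.arange(1, n + 1) / n   # hat F_n at each order statistic
        eps = np.sqrt(np.log(2 / (1 - gamma)) / (2 * n))
        L = np.maximum(Fhat - eps, 0.0)  # 0 v (hat F_n - eps_n)
        U = np.minimum(Fhat + eps, 1.0)  # 1 ^ (hat F_n + eps_n)
        return xs, Fhat, L, U

    # Illustration with a simulated Exponential(1) sample of size 100:
    rng = np.random.default_rng(0)
    xs, Fhat, L, U = dkw_band(rng.exponential(size=100))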

Statistical Functionals

Usually we don't want to estimate all of the CDF $F$ for $X$, but rather some feature of it, like its mean $\mathsf{E}X = \int x\,F(dx)$, or variance $\mathsf{V}X := \mathsf{E}(X - \mathsf{E}X)^2 = \int x^2\,F(dx) - (\mathsf{E}X)^2$, or the probability $[F(B) - F(A)]$ that $X$ lies in some interval $(A, B]$.

Examples of Statistical Functionals

Commonly studied or quoted functionals of a univariate distribution $F(\cdot)$ include:

- The mean $\mathsf{E}[X] = \mu := \int_{\mathbb{R}} x\,F(dx) = \int_0^\infty [1 - F(x)]\,dx - \int_{-\infty}^0 F(x)\,dx$, quantifying location;
- The $q$th quantile $z_q := \inf\{x < \infty : F(x) \ge q\}$, especially
- the median $z_{1/2}$, another way to quantify location;
- The variance $\mathsf{V}[X] = \sigma^2 := \int_{\mathbb{R}} (x-\mu)^2\,F(dx) = \mathsf{E}[X^2] - \mathsf{E}[X]^2$, quantifying spread;
- The skewness $\gamma_1 := \int_{\mathbb{R}} (x-\mu)^3\,F(dx)/\sigma^3$, quantifying asymmetry;
- The (excess) kurtosis $\gamma_2 := \int_{\mathbb{R}} (x-\mu)^4\,F(dx)/\sigma^4 - 3$, quantifying peakedness. "Lepto" is Greek for skinny, "platy" for fat, and "meso" for middle; distributions are called leptokurtic ($t$, Poisson, exponential), platykurtic (uniform, Bernoulli), or mesokurtic (normal) as $\gamma_2$ is positive, negative, or zero, respectively;
- The expectation $\mathsf{E}[g(X)] = \int_{\mathbb{R}} g(x)\,F(dx)$ for any specified problem-specific function $g(\cdot)$.

Not all of these exist for every distribution; for example, the mean, variance, skewness, and kurtosis are all undefined for heavy-tailed distributions like the Cauchy or $\alpha$-stable. There are quantile-based alternatives for quantifying location, spread, asymmetry, and peakedness, however; for example, the interquartile range $\mathrm{IQR} := z_{3/4} - z_{1/4}$ for spread.

Any of these can be estimated by the same expression computed with the empirical CDF $\hat{F}_n(x)$ replacing $F(x)$, without specifying a parametric model for $F$ (a sketch of such "plug-in" estimates follows below). There are methods (one is the "jackknife"; another, the "bootstrap", is described below) for trying to estimate the mean and variance of any of these functionals from a sample $\{X_1, \ldots, X_n\}$. Later we'll see ways of estimating the functionals that do require the assumption of particular parametric statistical models. There's something of a trade-off in deciding which approach to take. The parametric models typically give more precise estimates and more powerful tests, if their underlying assumptions are correct; but the non-parametric approach will give sensible (if less precise) answers even if those assumptions fail. In this way non-parametric methods are said to be more robust.
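
For concreteness, here is a minimal plug-in sketch in Python (the simulated Gamma sample is an illustrative stand-in for data, and the function name is mine, not the notes'); each functional is simply evaluated at $\hat{F}_n$ rather than at $F$:

    import numpy as np

    def plug_in_functionals(x):
        """Plug-in estimates: each functional of F evaluated at hat F_n."""
        x = np.asarray(x)
        mu = x.mean()                       # mean of hat F_n
        sigma2 = x.var()                    # plug-in variance (1/n, not 1/(n-1))
        z = lambda q: np.quantile(x, q)     # empirical q-th quantile
        return {
            "mean": mu,
            "median": z(0.5),
            "variance": sigma2,
            "IQR": z(0.75) - z(0.25),
            "skewness": np.mean((x - mu) ** 3) / sigma2 ** 1.5,
            "excess kurtosis": np.mean((x - mu) ** 4) / sigma2 ** 2 - 3,
        }

    rng = np.random.default_rng(1)
    print(plug_in_functionals(rng.gamma(shape=2.0, size=500)))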

Simulation

The Bootstrap

One way to estimate the probability distribution of a functional $T_n(X) = T(X_1, \ldots, X_n)$ of $n$ iid replicates of a random variable $X \sim F(dx)$, called the bootstrap (Efron, 1979; Efron and Tibshirani, 1993), is to approximate it by the empirical distribution of $T_n(\hat{X})$ based on draws $\hat{X}$ of size $n$ with replacement from a sample $\{X_1, \ldots, X_n\}$. The underlying idea is that these would be drawn from exactly the right distribution of $T(X)$ if we could repeat draws of $X = (X_1, \ldots, X_n)$ from the population; if the sample is large enough, we can hope that the empirical distribution will be close to the population distribution, and so the bootstrap sample will be much like a true random sample from the population (but without the expense of drawing new data).

Bootstrap Variance

For example, the population median $M = T(F) := \inf\{x \in \mathbb{R} : F(x) \ge 1/2\}$ might be estimated by the sample median $M_n = T(\hat{F}_n)$, but how precise is that estimate? One measure would be its standard error

    se(M_n) := \big\{ \mathsf{E}|M_n - M|^2 \big\}^{1/2},

but to calculate that would require knowing the distribution of $X$, while we only have a sample. The bootstrap approach is to use some number $B$ of repeated draws of size $n$ with replacement from this sample, as if they were draws from the population, and estimate

    \hat{se}(M_n) \approx \Big\{ \frac{1}{B} \sum_{b=1}^B \big( M_n^b - \hat{M}_n \big)^2 \Big\}^{1/2},

where $\hat{M}_n$ is the sample average of the $B$ bootstrap medians $\{M_n^b\}$.

Bootstrap Confidence Intervals

Interval estimates $[L, U]$ of a real-valued parameter $\theta$, intended to cover $\theta$ with probability at least $100\gamma\%$ for any $\theta$, can also be constructed using a bootstrap approach. One way to do that is to begin with an iid sample $X = \{X_1, \ldots, X_n\}$ from the uncertain distribution $F$; draw $B$ independent size-$n$ samples $X^b$ with replacement from the sample $X$; for each, compute the statistic $T_n(X^b)$; and set $L$ and $U$ to the $(\alpha/2)$ and $(1 - \alpha/2)$ quantiles of $\{T_n(X^b)\}$, respectively, for $\alpha = (1 - \gamma)$. Wasserman (2004, 8.3) argues why this should work and gives two alternatives.
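
A short sketch of both procedures (the statistic, $B$, the level, and the skewed lognormal sample are illustrative choices; the interval shown is the percentile interval, one of several bootstrap constructions):

    import numpy as np

    def bootstrap(x, stat=np.median, B=2000, gamma=0.95, seed=0):
        """Bootstrap standard error and percentile interval for stat."""
        rng = np.random.default_rng(seed)
        x = np.asarray(x)
        # B replicates, each a size-n draw with replacement from the sample
        t = np.array([stat(rng.choice(x, size=x.size, replace=True))
                      for _ in range(B)])
        se = t.std()                                   # bootstrap se, as above
        alpha = 1 - gamma
        L, U = np.quantile(t, [alpha / 2, 1 - alpha / 2])
        return se, (L, U)

    rng = np.random.default_rng(2)
    se, ci = bootstrap(rng.lognormal(size=200))        # median of a skewed sample
    print(se, ci)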

Bayesian Simulation

Bayesian Bootstrap

Rubin (1981) introduced the Bayesian bootstrap (BB), a minor variation on the bootstrap that leads to a simulation of the posterior distribution of the parameter vector $\theta$ governing a distribution $F(\cdot \mid \theta)$ in a parametric family, from a particular (and, in Rubin's view, implausible) improper prior distribution. This five-page paper is a good read; it argues that neither the BB nor the original bootstrap is suitable as a general inferential tool, because of the implicit use of this prior.
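
In code, a BB replicate reweights the observed points with Dirichlet$(1, \ldots, 1)$ weights instead of resampling them; the sketch below is my illustration (not Rubin's code), using the posterior of the mean since a weighted mean keeps the example short:

    import numpy as np

    def bayesian_bootstrap_means(x, B=2000, seed=3):
        """B Bayesian-bootstrap draws for the mean: each replicate puts
        Dirichlet(1, ..., 1) weights on the observed points (Rubin, 1981)."""
        rng = np.random.default_rng(seed)
        x = np.asarray(x)
        w = rng.dirichlet(np.ones(x.size), size=B)  # B random weight vectors
        return w @ x                                # B weighted means

    rng = np.random.default_rng(4)
    draws = bayesian_bootstrap_means(rng.exponential(size=100))
    print(draws.mean(), np.quantile(draws, [0.025, 0.975]))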

Importance Sampling

Most Bayesian analyses require the evaluation of one or more integrals, often in moderately high-dimensional spaces. For example: if $\pi(\theta)$ is a prior density function on $\Theta \subseteq \mathbb{R}^d$, and if $L(\theta \mid X)$ is the likelihood function for some observed quantity $X \in \mathcal{X}$, then the posterior expectation of any function $g : \Theta \to \mathbb{R}$ is given by the ratio

    \mathsf{E}[g(\theta) \mid X] = \frac{\int_\Theta g(\theta)\, L(\theta \mid X)\, \pi(\theta)\, d\theta}{\int_\Theta L(\theta \mid X)\, \pi(\theta)\, d\theta}.    (1a)

Often the integrals in both numerator and denominator are intractable analytically, so we must resort to numerical approximation. Let $f(\theta)$ be any pdf such that the ratio $w(\theta) := L(\theta \mid X)\pi(\theta)/f(\theta)$ is bounded (for this, $f(\theta)$ must have fatter tails than $L(\theta \mid X)\pi(\theta)$), and let $\{\theta_m\}$ be iid replicates from the distribution with pdf $f(\theta)$. Then

    \mathsf{E}[g(\theta) \mid X] = \frac{\int_\Theta g(\theta)\, w(\theta)\, f(\theta)\, d\theta}{\int_\Theta w(\theta)\, f(\theta)\, d\theta} = \lim_{M \to \infty} \frac{\sum_{m=1}^M g(\theta_m)\, w(\theta_m)}{\sum_{m=1}^M w(\theta_m)},    (1b)

so $\mathsf{E}[g(\theta) \mid X]$ can be evaluated as the limit of weighted averages of $g(\cdot)$ at the simulated points $\{\theta_m\}$. Provided that $\int_\Theta g(\theta)^2 f(\theta)\, d\theta < \infty$, the mean-square error of the sequence of approximations in (1b) will be bounded by $\sigma^2/M$ for a number $\sigma^2$ that can also be estimated from the same Monte Carlo sample $\{\theta_m\}$, giving a simple measure of precision for this estimate. This simulation-based approach to estimating integrals, called Monte Carlo importance sampling, works well in dimensions up to six or seven or so.

A number of ways have been discovered and exploited to reduce the stochastic error bound $\sigma/\sqrt{M}$. These include antithetic variables, in which the iid sequence $\{\theta_m\}$ is replaced by a sequence of negatively-correlated pairs; control variates, in which one tries to estimate $[g(\theta) - h(\theta)]$ for some quantity $h$ whose posterior mean is known; and sequential MC, in which the sampling function $f(\theta)$ is periodically replaced by a better one.
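
For concreteness, here is a self-normalized importance-sampling sketch of (1b); the toy model (a Normal$(\theta, 1)$ likelihood for one observation $X$, with a Cauchy$(0,1)$ prior) and the heavier-tailed Student-$t_3$ proposal $f$ are my illustrative choices, not the notes':

    import numpy as np

    rng = np.random.default_rng(5)
    X, M = 2.0, 100_000

    # Proposal f: Student-t with 3 df, fatter-tailed than L(theta|X)*pi(theta)
    theta = rng.standard_t(df=3, size=M)
    lik = np.exp(-0.5 * (X - theta) ** 2) / np.sqrt(2 * np.pi)  # Normal(theta,1) at X
    prior = 1.0 / (np.pi * (1.0 + theta ** 2))                  # Cauchy(0,1) pdf
    f = (2.0 / (np.pi * np.sqrt(3.0))) * (1.0 + theta ** 2 / 3.0) ** -2  # t_3 pdf
    w = lik * prior / f                                         # weights w(theta)
    print(np.sum(theta * w) / np.sum(w))    # weighted average, eq. (1b), g = identity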

MCMC

A similar approach to (1) that succeeds in many higher-dimensional problems is Markov chain Monte Carlo, based on sample averages of $\{g(\theta_m) : 1 \le m < \infty\}$ for an ergodic sequence $\{\theta_m\}$ constructed so that it has stationary distribution $\pi(\theta \mid X)$. You'll see much more about that in other courses at Duke, so we won't focus on it here.

Particle Methods, Adaptive MCMC, Variational Bayes, ...

There are a number of variations on MCMC methods as well. Some of these involve averaging $\{g(\theta_m^{(k)}) : m < \infty\}$ over a number of streams $\theta_m^{(k)}$ (here the streams are indexed by $k$), possibly with a variable number of streams whose distributions may evolve through the computation. This is an area of active research; ask any Duke statistics faculty member if you're interested.

References

Efron, B. (1979), "Bootstrap methods: Another look at the jackknife," Annals of Statistics, 7, 1–26, doi:10.1214/aos/1176344552.

Efron, B. and Tibshirani, R. J. (1993), An Introduction to the Bootstrap, Boca Raton, FL: Chapman & Hall/CRC.

Rubin, D. B. (1981), "The Bayesian Bootstrap," Annals of Statistics, 9, 130–134.

Wasserman, L. (2004), All of Statistics, New York, NY: Springer-Verlag.

Last edited: October 20, 2017