STATS 200: Introduction to Statistical Inference. Lecture 4: Asymptotics and simulation


Recap: We've discussed a few examples of how to determine the distribution of a statistic computed from data, assuming a certain probability model for the data. For example, last lecture we showed the following results: if X_1, ..., X_n are IID N(0, 1), then X̄ ~ N(0, 1/n) and X_1² + ... + X_n² ~ χ²_n.

Reality check: For many (seemingly simple) statistics, it's difficult to describe the PMF or PDF exactly. For example:

1. Suppose X_1, ..., X_100 are IID Uniform(−1, 1). What is the distribution of X̄?

2. Suppose (X_1, ..., X_6) ~ Multinomial(500, (1/6, ..., 1/6)). What is the distribution of T = (X_1/500 − 1/6)² + ... + (X_6/500 − 1/6)²?

For questions that we don't know how to answer exactly, we'll try to answer them approximately.

Sample mean of IID uniform: If we fully specify the distribution of the data, then we can always simulate the distribution of any statistic:

nreps = 10000
sample.mean = numeric(nreps)
n = 100
for (i in 1:nreps) {
  X = runif(n, min=-1, max=1)
  sample.mean[i] = mean(X)
}
hist(sample.mean)

Sample mean of IID uniform: [Figure: histogram of sample.mean over the 10000 simulations, bell-shaped and centered at 0, with essentially all mass between −0.2 and 0.2.]

Is your friend cheating you in dice?

nreps = 10000
T = numeric(nreps)
n = 500
p = c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6)
for (i in 1:nreps) {
  X = rmultinom(1, n, p)
  T[i] = sum((X/n - p)^2)
}
hist(T)

Is your friend cheating you in dice? [Figure: histogram of T over the 10000 simulations, right-skewed, with values ranging from 0 to about 0.008.]

Asymptotic analysis: Oftentimes, a very good approximate answer emerges when n is large (in other words, when you have many samples). We call results that rely on this type of approximation asymptotic. If we can just simulate, why do asymptotic analysis?

1. Better understanding of the behavior. (Understanding the assumptions: what if the X_i are not uniform? What if I don't really know the distribution of the X_i? Understanding the scaling: what if n = 1000 instead of 100? What if n = 1,000,000?)

2. Faster to get an answer.

(Weak) Law of Large Numbers. Theorem (LLN): Suppose X_1, ..., X_n are IID, with E[X_1] = µ and Var[X_1] < ∞. Let X̄_n = (1/n)(X_1 + ... + X_n). Then, for any fixed ε > 0, as n → ∞,

P[|X̄_n − µ| > ε] → 0.

A sequence of random variables {T_n} converges in probability to a constant c ∈ R if, for any fixed ε > 0, as n → ∞, P[|T_n − c| > ε] → 0. So the LLN says X̄_n → µ in probability.
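The LLN can be checked by the same kind of simulation as above. The lecture's simulations use R; as a rough stdlib-Python sketch (all names here are mine, not from the lecture), estimate P[|X̄_n − µ| > ε] for Uniform(−1, 1) data (so µ = 0) at increasing n and watch it shrink:

```python
import random

random.seed(0)

def prob_far(n, eps=0.1, nreps=2000):
    """Estimate P[|mean of n Uniform(-1,1) draws| > eps] by simulation."""
    count = 0
    for _ in range(nreps):
        xbar = sum(random.uniform(-1, 1) for _ in range(n)) / n
        if abs(xbar) > eps:
            count += 1
    return count / nreps

# The probability of being farther than eps from mu = 0 shrinks as n grows
probs = [prob_far(n) for n in (10, 100, 1000)]
print(probs)
```

With ε = 0.1 fixed, the estimated probability drops from roughly one half at n = 10 to essentially zero at n = 1000, matching the theorem.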

Central Limit Theorem. Theorem (CLT): Suppose X_1, ..., X_n are IID, with E[X_1] = µ and Var[X_1] = σ² < ∞. Let X̄_n = (1/n)(X_1 + ... + X_n). Then, for any fixed x ∈ R, as n → ∞,

P[√n (X̄_n − µ)/σ ≤ x] → Φ(x),

where Φ is the CDF of the N(0, 1) distribution.

A sequence {T_n} converges in distribution to a probability distribution with CDF F if, for every x ∈ R where F is continuous, as n → ∞, P[T_n ≤ x] → F(x). We sometimes write T_n → Z in distribution, where Z is a random variable having this distribution F. So the CLT says √n (X̄_n − µ)/σ → Z in distribution, where Z ~ N(0, 1).
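The CLT statement can also be checked numerically. A stdlib-Python sketch (my translation, not the lecture's R code): for Uniform(−1, 1) data (µ = 0, σ² = 1/3), estimate P[√n (X̄_n − µ)/σ ≤ x] by simulation and compare it with Φ(x) computed via the error function:

```python
import math
import random

random.seed(1)

n, nreps = 100, 5000
x = 1.0  # point at which to compare the CDFs

# Simulate the standardized sample mean sqrt(n) * (Xbar - mu) / sigma
count = 0
for _ in range(nreps):
    xbar = sum(random.uniform(-1, 1) for _ in range(n)) / n
    z = math.sqrt(n) * xbar / math.sqrt(1 / 3)
    if z <= x:
        count += 1

empirical = count / nreps
phi = 0.5 * (1 + math.erf(x / math.sqrt(2)))  # Phi(x) for N(0,1)
print(empirical, phi)  # empirical should be close to Phi(1) = 0.841...
```

Already at n = 100 the two CDF values agree to within simulation error.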

The Difference is in Scaling: How can the same statistic X̄_n converge both in probability and in distribution? The difference is in scaling. With X_1, ..., X_100 ~ Uniform(−1, 1), here is X̄_100 across 10000 simulations: [Figure: histogram of sample.mean drawn on the wide axis range −3 to 3, where it appears as a single narrow spike at 0.] This illustrates the LLN, that is, X̄_n → 0 in probability.

The Difference is in Scaling: Here's the exact same histogram, on a different scale: [Figure: histogram of sample.mean on the axis range −0.2 to 0.2, where the bell shape is visible.] This illustrates the CLT, that is, √(3n) X̄_n → N(0, 1) in distribution. (Here Var[X_1] = 1/3.)

Sample mean of IID uniform: By the CLT, the distribution of X̄_n is approximately N(0, 1/(3n)). How good is this approximation? Here's a comparison of CDF values for sample size n = 10:

Normal | Exact
0.01   | 0.009
0.25   | 0.253
0.50   | 0.500
0.75   | 0.747
0.99   | 0.991

It's already very close! In general, accuracy depends on:
- the sample size n,
- the skewness of the distribution of the X_i, and
- the heaviness of the tails of the distribution of the X_i.

(Exact values computed using www.math.uah.edu/stat/apps/specialcalculator.html)

Multivariate generalizations: Consider X = (X_1, ..., X_k) ∈ R^k (with some k-dimensional joint distribution), and let µ_i = E[X_i], Σ_ii = Var[X_i], and Σ_ij = Cov[X_i, X_j]. Let X^(1), ..., X^(n) ∈ R^k be IID, each with the same joint distribution as X, and let X̄_n = (1/n)(X^(1) + ... + X^(n)) ∈ R^k. For example: we measure the height and weight of n randomly chosen people. X^(l) ∈ R² is the height and weight of person l. Height is not independent of weight for the same person, but let's assume the pairs are IID across different people. X̄_n ∈ R² is the average height and average weight of the n people.

Multivariate generalizations: Theorem (LLN): As n → ∞, X̄_n converges in probability to µ. Theorem (CLT): As n → ∞, √n (X̄_n − µ) converges in distribution to the multivariate normal distribution N(0, Σ). (We say a sequence {T_n} of random vectors in R^k converges in probability to µ ∈ R^k if P[‖T_n − µ‖ > ε] → 0 for any ε > 0, where ‖·‖ is the vector length. We say {T_n} converges in distribution to Z if, for any set A ⊆ R^k such that Z belongs to the boundary of A with probability 0, P[T_n ∈ A] → P[Z ∈ A].)

Approximating the multinomial distribution for large n: Suppose (Y_1, ..., Y_6) ~ Multinomial(n, (1/6, ..., 1/6)). Y represents the number of times we obtain each face 1 through 6 when rolling a 6-sided die n times. For each l = 1, ..., n, let X^(l) = (1, 0, 0, 0, 0, 0) if we got 1 on the l-th roll, (0, 1, 0, 0, 0, 0) if we got 2 on the l-th roll, etc. Then (Y_1, ..., Y_6) = X^(1) + ... + X^(n). Let's apply the (multivariate) LLN and CLT!

Approximating the multinomial distribution for large n: Let's write X^(1) = (X_1, ..., X_6), so X_1, ..., X_6 are random variables where exactly one of them equals 1 (and the rest equal 0). Then:

E[X_i] = P[X_i = 1] = 1/6,
Var[X_i] = E[X_i²] − (E[X_i])² = 1/6 − (1/6)² = 5/36,
Cov[X_i, X_j] = E[X_i X_j] − E[X_i]E[X_j] = 0 − (1/6)² = −1/36 for i ≠ j.
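These moment calculations (mean 1/6, variance 5/36, covariance −1/36 for i ≠ j) can be spot-checked by simulation. A stdlib-Python sketch (variable names are mine): draw one-hot indicator vectors for fair die rolls and estimate the moments empirically.

```python
import random

random.seed(2)

nreps = 20000
sums = [0.0] * 6   # running sums of each X_i
sum12 = 0.0        # running sum of X_1 * X_2
sumsq1 = 0.0       # running sum of X_1^2

for _ in range(nreps):
    roll = random.randrange(6)  # face of one fair die roll, 0..5
    x = [1.0 if i == roll else 0.0 for i in range(6)]
    for i in range(6):
        sums[i] += x[i]
    sum12 += x[0] * x[1]
    sumsq1 += x[0] ** 2

m1 = sums[0] / nreps
m2 = sums[1] / nreps
var1 = sumsq1 / nreps - m1 ** 2    # should be near 5/36 = 0.1389
cov12 = sum12 / nreps - m1 * m2    # should be near -1/36 = -0.0278
print(m1, var1, cov12)
```

Note that X_1 X_2 = 0 on every roll (only one indicator is 1), which is exactly why the covariance comes out negative: it equals 0 − (1/6)².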

Approximating the multinomial distribution for large n: By the LLN, as n → ∞, (Y_1/n, ..., Y_6/n) → (1/6, ..., 1/6) in probability. By the CLT, as n → ∞, √n (Y_1/n − 1/6, ..., Y_6/n − 1/6) → N(0, Σ) in distribution, where Σ ∈ R^{6×6} has diagonal entries 5/36 and off-diagonal entries −1/36. (The negative values of Σ_ij for i ≠ j mean Y_i and Y_j are, as expected, slightly anti-correlated.)

Continuous mapping: The LLN and CLT can be used as building blocks to understand other statistics, via the Continuous Mapping Theorem. Theorem: If T_n → c in probability, then g(T_n) → g(c) in probability for any continuous function g. If T_n → Z in distribution, then g(T_n) → g(Z) in distribution for any continuous function g. (These hold in both the univariate and multivariate settings.)
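As a small illustration of the first statement (my example, not from the lecture): g(x) = eˣ is continuous, and X̄_n → 0 in probability for Uniform(−1, 1) data, so exp(X̄_n) → e⁰ = 1 in probability. A stdlib-Python sketch:

```python
import math
import random

random.seed(3)

def far_fraction(n, eps=0.1, nreps=2000):
    """Estimate P[|exp(Xbar_n) - 1| > eps] for Uniform(-1,1) data."""
    count = 0
    for _ in range(nreps):
        xbar = sum(random.uniform(-1, 1) for _ in range(n)) / n
        if abs(math.exp(xbar) - 1.0) > eps:
            count += 1
    return count / nreps

f_small = far_fraction(10)
f_large = far_fraction(1000)
print(f_small, f_large)  # the fraction shrinks as n grows
```

Because g is continuous at 0, the concentration of X̄_n near 0 carries over directly to the concentration of exp(X̄_n) near 1.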

Is your friend cheating you in dice? Recall

nT_n = n(Y_1/n − 1/6)² + ... + n(Y_6/n − 1/6)².

The function g(x_1, ..., x_6) = x_1² + ... + x_6² is continuous, so

nT_n → Z_1² + ... + Z_6² in distribution,

where (Z_1, ..., Z_6) ~ N(0, Σ). Hence, when n is large, the distribution of T_n is approximately that of (1/n)(Z_1² + ... + Z_6²). Finally, what is the distribution of Z_1² + ... + Z_6²?

Is your friend cheating you in dice? Using bilinearity of covariance, it is easy to show that if W_1, ..., W_6 are IID N(0, 1), then (1/√6)(W_1 − W̄, ..., W_6 − W̄) ~ N(0, Σ). (Here W̄ = (1/6)(W_1 + ... + W_6).) So Z_1² + ... + Z_6² has the same distribution as (1/6)[(W_1 − W̄)² + ... + (W_6 − W̄)²]. This is the sample variance of 6 IID standard normals, which we will show next week has distribution (1/6)χ²_5. Conclusion: T_n has approximate distribution (1/(6n))χ²_5.
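One quick sanity check of this conclusion (a sketch I added, stdlib Python): since E[χ²_5] = 5, the approximation T_n ≈ (1/(6n))χ²_5 predicts E[T_n] ≈ 5/(6n). Simulating fair-die multinomials at n = 500 should reproduce that mean:

```python
import random

random.seed(4)

n, nreps = 500, 4000
p = 1 / 6

tvals = []
for _ in range(nreps):
    counts = [0] * 6
    for _ in range(n):
        counts[random.randrange(6)] += 1  # one fair die roll
    # T = sum of squared deviations of the observed proportions from 1/6
    tvals.append(sum((c / n - p) ** 2 for c in counts))

sim_mean = sum(tvals) / nreps
approx_mean = 5 / (6 * n)  # mean of (1/(6n)) * chi-squared_5
print(sim_mean, approx_mean)
```

In fact E[T_n] = 5/(6n) holds exactly for every n (each coordinate contributes Var[Y_i/n] = 5/(36n)); the chi-squared approximation to the full distribution is what requires large n.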

Is your friend cheating you in dice? Here's our simulated histogram of T_n, overlaid with the (appropriately rescaled) PDF of the (1/(6n))χ²_5 distribution: [Figure: histogram of T over the 10000 simulations, with the rescaled chi-squared density curve overlaid.]