18.443 Exam 2 Spring 2015 Statistics for Applications 4/9/2015

1. True or False (and state why).

(a). The significance level of a statistical test is not equal to the probability that the null hypothesis is true.

(b). If a 99% confidence interval for a distribution parameter θ does not include θ_0, the value under the null hypothesis, then the corresponding test with significance level 1% would reject the null hypothesis.

(c). Increasing the size of the rejection region will lower the power of a test.

(d). The likelihood ratio of a simple null hypothesis to a simple alternate hypothesis is a statistic which is higher the stronger the evidence of the data in favor of the null hypothesis.

(e). If the p-value is 0.02, then the corresponding test will reject the null at the 0.05 level.

Solution: T, T, F, T, T

2. Testing Goodness of Fit. Let X be a binomial random variable with n trials and probability p of success.

(a). Suppose n = 100 and X = 38. Compute the Pearson chi-square statistic for testing the goodness of fit to the multinomial distribution with two cells with H_0: p = p_0 = 0.5.

(b). What is the approximate distribution of the test statistic in (a) under the null hypothesis H_0?

(c). What can you say about the P-value of the Pearson chi-square statistic in (a) using the following table of percentiles for chi-square random variables? (For example, P(χ²_3 ≤ q_.90 = 6.25) = .90.)

df   q_.90   q_.95   q_.975   q_.99   q_.995
1    2.71    3.84    5.02     6.63    9.14
2    4.61    5.99    7.38     9.21    11.98
3    6.25    7.81    9.35     11.34   14.32
4    7.78    9.49    11.14    13.28   16.42

(d). Consider the general case of the Pearson chi-square statistic in (a), where the outcome X = x is kept as a variable (yet to be observed). Show that the Pearson chi-square statistic is an increasing function of |x − n/2|.

(e). Suppose the rejection region of a test of H_0 is {X : |X − n/2| > k} for some fixed known number k. Using the central limit theorem (CLT) as an approximation to the distribution of X, write an expression that approximates the significance level of the test for a given k. (Your answer can use the cdf of Z ~ N(0, 1): Φ(z) = P(Z ≤ z).)

Solution:

(a). The Pearson chi-square statistic for a multinomial distribution with m = 2 cells is

X² = Σ_{i=1}^{m} (O_i − E_i)² / E_i

where the observed counts are O_1 = x = 38 and O_2 = n − x = 62, and the expected counts under the null hypothesis are E_1 = n·p_0 = n·(1/2) = 50 and E_2 = n·(1 − p_0) = n·(1/2) = 50. Plugging these in gives

X² = (38 − 50)²/50 + (62 − 50)²/50 = 144/50 + 144/50 = 288/50 = 5.76

(b). The approximate distribution of X² is chi-squared with degrees of freedom

q = dim({p : 0 ≤ p ≤ 1}) − dim({p : p = 1/2}) = (m − 1) − 0 = 1.

(c). The P-value of the Pearson chi-square statistic is the probability that a chi-square random variable with q = 1 degrees of freedom exceeds 5.76, the observed value of the statistic. Since 5.76 is greater than q_.975 = 5.02 and less than q_.99 = 6.63 (the percentiles of the chi-square distribution with q = 1 degrees of freedom), we know that the P-value is smaller than (1 − .975) = .025 but larger than (1 − .99) = .01.

(d). Substituting O_1 = x, O_2 = n − x, E_1 = n·p_0 = n/2, and E_2 = n·(1 − p_0) = n/2 in the formula from (a), we get

X² = (x − n/2)²/(n/2) + ((n − x) − n/2)²/(n/2)
   = (x − n/2)²/(n/2) + (n/2 − x)²/(n/2)
   = 2·(x − n/2)²/(n/2)
   = (4/n)·(x − n/2)²,

which is an increasing function of |x − n/2|.
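The computation in (a) through (c) can be checked numerically. Below is a minimal Python sketch (not part of the original solution; it assumes numpy and scipy are available, and the variable names are ours) that computes the Pearson statistic and the chi-square tail probability, consistent with X² = 5.76 and a P-value between .01 and .025.

```python
import numpy as np
from scipy import stats

n, x, p0 = 100, 38, 0.5

# Observed and expected counts for the two cells (success, failure)
observed = np.array([x, n - x])              # [38, 62]
expected = np.array([n * p0, n * (1 - p0)])  # [50, 50]

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E
X2 = np.sum((observed - expected) ** 2 / expected)

# Under H0 the statistic is approximately chi-square with m - 1 = 1 df
p_value = stats.chi2.sf(X2, df=1)

print(X2)       # 5.76
print(p_value)  # about 0.016, between .01 and .025

# Same result via scipy's built-in goodness-of-fit test
stat, p = stats.chisquare(observed, expected)
print(stat, p)
```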

(e). X is the sum of n independent Bernoulli(p) random variables, with E[X] = np and Var(X) = np(1 − p), so by the CLT

X ≈ N(np, np(1 − p)) (approximately),

which is N(n/2, n/4) when the null hypothesis (p = 0.5) is true.

The significance level of the test is the probability of rejecting the null hypothesis when it is true, which is given by:

α = P(Reject H_0 | H_0)
  = P(|X − n/2| > k | H_0)
  = P(|X − n/2|/√(n/4) > k/√(n/4) | H_0)
  ≈ P(|N(0, 1)| > k/√(n/4))
  = 2·[1 − Φ(k/√(n/4))]
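As a numeric illustration of this approximation (our own addition, not part of the exam), the sketch below compares the CLT-based significance level 2·[1 − Φ(k/√(n/4))] with the exact Binomial(n, 1/2) tail probability for n = 100 and a hypothetical cutoff k = 12; the choice of k and the variable names are ours.

```python
import numpy as np
from scipy import stats

n, p0 = 100, 0.5
k = 12  # hypothetical cutoff; |X - 50| > 12 is the region that X = 38 just falls into

# CLT approximation: under H0, X ~ N(n/2, n/4) approximately
alpha_clt = 2 * (1 - stats.norm.cdf(k / np.sqrt(n / 4)))

# Exact significance level from the Binomial(n, 1/2) distribution
xs = np.arange(n + 1)
pmf = stats.binom.pmf(xs, n, p0)
alpha_exact = pmf[np.abs(xs - n / 2) > k].sum()

print(alpha_clt)    # about 0.016
print(alpha_exact)  # about 0.012 (the exact binomial tail is a bit smaller)
```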

3. Reliability Analysis. Suppose that n = 10 items are sampled from a manufacturing process and S items are found to be defective. A beta(a, b) prior¹ is used for the unknown proportion θ of defective items, where a > 0 and b > 0 are known.

(a). Consider the case of a beta prior with a = 1 and b = 1. Sketch a plot of the prior density of θ and of the posterior density of θ given S = 2. For each density, give the distribution's mean/expected value and identify it on your plot.

Solution: The random variable S ~ Binomial(n, θ). If θ ~ beta(a = 1, b = 1), then because the beta distribution is a conjugate prior for the binomial distribution, the posterior distribution of θ given S is

beta(a + S, b + (n − S)).

For S = 2, the posterior distribution of θ is thus beta(3, 9). Since the mean of a beta(a, b) distribution is a/(a + b), the prior mean is 1/(1 + 1) = 1/2, and the posterior mean is (a + S)/(a + b + n) = 3/12. These densities are graphed below.

¹ A beta(a, b) distribution has density f_Θ(θ) = [Γ(a + b)/(Γ(a)Γ(b))] θ^(a−1) (1 − θ)^(b−1), 0 < θ < 1. Recall that for a beta(a, b) distribution, the expected value is a/(a + b) and the variance is ab/[(a + b)²(a + b + 1)]. Also, when both a > 1 and b > 1, the mode of the probability density is at (a − 1)/(a + b − 2).

[Figure: density plots over 0 ≤ θ ≤ 1 of the prior beta(1, 1), with mean = 1/2, and the posterior beta(1 + 2, 1 + 8), with mean = 3/12.]

(b). Repeat (a) for the case of a beta(a = 1, b = 10) prior for θ.

Solution: The random variable S ~ Binomial(n, θ). If θ ~ beta(a = 1, b = 10), then because the beta distribution is a conjugate prior for the binomial distribution, the posterior distribution of θ given S is

beta(a + S, b + (n − S)).

For S = 2, the posterior distribution of θ is thus beta(3, 18). Since the mean of a beta(a, b) distribution is a/(a + b), the prior mean is 1/(1 + 10) = 1/11, and the posterior mean is (a + S)/(a + b + n) = 3/21. These densities are graphed below.

[Figure: density plots over 0 ≤ θ ≤ 1 of the prior beta(1, 10), with mean = 1/11, and the posterior beta(1 + 2, 10 + 8), with mean = 3/21.]

(c). What prior beliefs are implied by each prior in (a) and (b)? Explain how they differ.

Solution: The prior in (a) is a uniform distribution on the interval 0 < θ < 1. It is a flat prior and represents ignorance about θ, in the sense that any two intervals of θ with the same width have the same probability. The prior in (b) gives higher density to values of θ closer to zero. The mean value of the prior in (b) is 1/11, which is much smaller than the mean value of the uniform prior in (a), which is 1/2.

(d). Suppose that X = 1 or 0 according to whether an item is defective (X = 1) or not (X = 0). For the general case of a prior beta(a, b) distribution with fixed a and b, what is the marginal distribution of X before the n = 10 sample is taken and S is observed? (Hint: specify the joint distribution of X and θ first.)

Solution: The joint distribution of X and θ has pmf/pdf

f(x, θ) = f(x | θ) π(θ),

where f(x | θ) is the pmf of a Bernoulli(θ) random variable and π(θ) is the pdf of a beta(a, b) distribution. The marginal distribution of X has pmf

f(x) = ∫₀¹ f(x, θ) dθ = ∫₀¹ θ^x (1 − θ)^(1−x) π(θ) dθ,

which equals ∫₀¹ θ π(θ) dθ for x = 1 and 1 − ∫₀¹ θ π(θ) dθ for x = 0. That is, X is Bernoulli(p) with

p = ∫₀¹ θ π(θ) dθ = E[θ | prior] = a/(a + b).

(e). What is the marginal distribution of X after the sample is taken? (Hint: specify the joint distribution of X and θ using the posterior distribution of θ.)

Solution: The marginal distribution of X after the sample is computed by the same argument as in (d), replacing the prior distribution with the posterior distribution of θ given S = s. Thus X is Bernoulli(p) with

p = ∫₀¹ θ π(θ | S) dθ = E[θ | S] = (a + s)/(a + b + n).
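A short Python sketch of the conjugate updates in this problem (our own illustration; the helper name beta_binomial_update is ours): it reproduces the posterior parameters and means from (a) and (b) and the predictive probabilities from (d) and (e).

```python
from scipy import stats

n, s = 10, 2  # sample size and observed number of defectives

def beta_binomial_update(a, b, n, s):
    """Posterior beta parameters after observing s defectives in n items with a beta(a, b) prior."""
    return a + s, b + (n - s)

for a, b in [(1, 1), (1, 10)]:  # priors from parts (a) and (b)
    a_post, b_post = beta_binomial_update(a, b, n, s)
    prior = stats.beta(a, b)
    post = stats.beta(a_post, b_post)
    print(f"prior beta({a},{b}): mean {prior.mean():.4f}; "
          f"posterior beta({a_post},{b_post}): mean {post.mean():.4f}")
    # Predictive probability that a future item is defective (parts (d) and (e)):
    print("  P(X=1) before sample:", a / (a + b))
    print("  P(X=1) after sample: ", (a + s) / (a + b + n))
```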

4. Probability Plots. Random samples of size n = 100 were simulated from four distributions: Uniform(0, 1), Exponential(1), Normal(50, 10), and Student's t (4 degrees of freedom). A quantile-quantile plot was made for each of these 4 samples:

[Figure: four QQ plots, one per sample, each showing the ordered observations against the quantiles of the sampled distribution: "Uniform QQ Plot" (Uniform quantiles), "Normal QQ Plot" (Normal(0,1) quantiles), "Exponential QQ Plot" (Exponential(1) quantiles), and "t Dist. QQ Plot" (t (df=4) quantiles).]

For each sample, the values were re-scaled to have sample mean zero and sample standard deviation 1:

{x_i, i = 1, ..., 100} → {Z_i = (x_i − x̄)/s_x, i = 1, ..., 100}

where x̄ = (1/n) Σ_{i=1}^{n} x_i and s_x = √[ (1/(n − 1)) Σ_{i=1}^{n} (x_i − x̄)² ].
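For readers who want to reproduce a display like this, here is a rough sketch (our own construction, not the code used for the exam figures) that simulates the four samples, standardizes them as above, and draws a Normal QQ plot of each with scipy's probplot:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
n = 100

samples = {
    "Uniform(0,1)": rng.uniform(0, 1, n),
    "Exponential(1)": rng.exponential(1.0, n),
    "Normal(50,10)": rng.normal(50, 10, n),
    "t (df=4)": rng.standard_t(4, n),
}

fig, axes = plt.subplots(2, 2, figsize=(8, 8))
for ax, (name, x) in zip(axes.ravel(), samples.items()):
    z = (x - x.mean()) / x.std(ddof=1)       # standardize: sample mean 0, sample sd 1
    stats.probplot(z, dist="norm", plot=ax)  # normal QQ plot of the standardized values
    ax.set_title(name)
plt.tight_layout()
plt.show()
```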

The Normal QQ plot for each set of standardized sample values is given in the next display, but in a random order. For each distribution, identify the corresponding Normal QQ plot and explain your reasoning.

Uniform(0, 1) = Plot ____
Exponential(1) = Plot ____
Normal(50, 10) = Plot ____
Student's t (4 degrees of freedom) = Plot ____

[Figure: four Normal QQ plots labeled Plot A, Plot B, Plot C, and Plot D, each showing sample quantiles of one standardized sample against theoretical N(0, 1) quantiles.]

Solution: The Student's t sample has two extreme high values and one extreme low value, which are evident in Plot A, so Plot A = t distribution.

Plot B is the only plot with a bow shape, which indicates that the larger observations are higher than would be expected for a normal sample while the smaller observations are less small than would be expected for a normal sample. This is true for the Exponential distribution, which is asymmetric with a right tail heavier than a normal distribution's. Plot B = Exponential.

The Uniform(0, 1) sample has true mean 0.5 and true variance equal to E[X²] − (E[X])² = 1/3 − (1/2)² = 1/12. For a typical sample, the standardized sample values will be bounded (using the true mean and standard deviation to standardize, the values would be no larger in magnitude than (1 − 0.5)/√(1/12) ≈ 1.73). For Plot C the range of the standardized values is smallest, consistent with what would be expected for a sample from a uniform distribution. Plot C = Uniform distribution.

The QQ plot for the normal distribution is essentially unchanged by the standardization and follows a straight-line pattern, indicating consistency of the ordered observations with the theoretical normal quantiles. Plot D = Normal.
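As a quick check of the 1.73 bound used above (our own arithmetic check, not part of the exam): standardizing the endpoints 0 and 1 of the Uniform(0, 1) support by the true mean 1/2 and true standard deviation √(1/12) gives approximately ±1.73.

```python
import math

mu, sd = 0.5, math.sqrt(1 / 12)      # true mean and sd of Uniform(0, 1)
print((1 - mu) / sd, (0 - mu) / sd)  # 1.732..., -1.732...
```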

5. Betas for Stocks in the S&P 500 Index. In financial modeling of stock returns, the Capital Asset Pricing Model associates a Beta with each stock, which measures how risky that stock is compared to the market portfolio. (Note: this name has nothing to do with the beta(a, b) distribution!) Using monthly data, the Beta for each stock in the S&P 500 Index was computed. The following display gives an index plot, a histogram, and a Normal QQ plot for these Beta values.

[Figure: "Index Plot of 500 Stock Betas (X)" (Beta versus index 1 to 500), "Histogram" (density of the stock Betas), and "Normal Q-Q Plot" (sample quantiles of the Betas versus theoretical N(0, 1) quantiles).]

For the sample of 500 Beta values, x̄ = 1.0902 and s_x = 0.5053.

(a). On the basis of the histogram and the Normal QQ plot, are the values consistent with being a random sample from a Normal distribution?

Solution: Yes, the values are consistent with being a random sample from a Normal distribution. The Normal QQ plot is quite straight.

(b). Refine your answer to (a), focusing separately on the extreme low values (smallest quantiles) and on the extreme large values (highest quantiles).

Solution: Consider the extremes of the distribution. The highest points appear a bit higher than would be expected for a normal sample, suggesting there are some outlier stocks with higher betas than would be expected under a normal model. The lowest values, near zero, appear a bit above the straight line through most of the ordered points, suggesting that the stocks with the lowest beta values aren't as low as might be expected under a normal model.

Bayesian Analysis of a Normal Distribution. For a stock that is similar to those that are constituents of the S&P 500 Index above, let X = 1.6 be an estimate of the Beta coefficient θ. Suppose that the following assumptions are reasonable:

The conditional distribution of X given θ is Normal with known variance: X | θ ~ Normal(θ, σ₀²), where σ₀² = (0.2)².

As a prior for θ, assume that θ is Normal with mean and variance equal to those in the sample: θ ~ Normal(µ_prior, σ²_prior), where µ_prior = 1.0902 and σ_prior = 0.5053.

(c). Determine the posterior distribution of θ given X = 1.6.

Solution: This is the case of a normal conjugate prior distribution for a normal sample observation. The posterior distribution of θ is given by

[θ | X = x] ~ N(µ, σ²)

where

1/σ² = 1/σ₀² + 1/σ²_prior

and

µ = [ (1/σ₀²)·x + (1/σ²_prior)·µ_prior ] / [ 1/σ₀² + 1/σ²_prior ].

Plugging in values, we get

σ² = (0.186)²

µ = [ (1/0.2²)·1.6 + (1/0.5053²)·1.0902 ] / [ (1/0.2²) + (1/0.5053²) ] = 1.531

(d). Is the posterior mean between X and µ_prior? Would this always be the case if a different value of X had been observed?

(e). Is the variance of the posterior distribution for θ given X greater or less than the variance of the prior distribution for θ? Does your answer depend on the value of X?

Solution:

(d). Yes, the posterior mean is a weighted average of X and µ_prior, which will always be between the two values.

(e). The variance of the posterior distribution, σ² = (0.186)², is less than σ²_prior = (0.5053)². From part (c), the posterior variance does not vary with the outcome X = x.
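The posterior computation in (c) is straightforward to verify numerically. Here is a minimal sketch (our own, with variable names chosen for illustration) of the normal-normal conjugate update:

```python
import math

x = 1.6                    # observed Beta estimate
sigma0 = 0.2               # known sd of X given theta
mu_prior, sigma_prior = 1.0902, 0.5053

# Precisions (1/variance) add for a normal observation with a normal prior
precision_post = 1 / sigma0**2 + 1 / sigma_prior**2
var_post = 1 / precision_post

# Posterior mean is the precision-weighted average of x and the prior mean
mu_post = (x / sigma0**2 + mu_prior / sigma_prior**2) / precision_post

print(mu_post, math.sqrt(var_post))  # about 1.531 and 0.186
```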

MIT OpenCourseWare
http://ocw.mit.edu

18.443 Statistics for Applications
Spring 2015

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.