MAREK GĄGOLEWSKI, KONSTANCJA BOBECKA-WESOŁOWSKA, PRZEMYSŁAW GRZEGORZEWSKI
Computer Statistics with R
5. Point Estimation
Faculty of Mathematics and Information Science, Warsaw University of Technology
Copyright © 2009–2012 Marek Gągolewski. This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Contents
5.1. Examples ..................................... 1
5.2. Exercises .................................... 5
5.3. Hints and Answers ............................ 7

Info: These tutorials are likely to contain bugs and typos. In case you find any, don't hesitate to contact us! Thanks in advance!
5.1. Examples

Remarks. MVUE = minimum-variance unbiased estimator, MME = method-of-moments estimator, MQE = method-of-quantiles estimator, MLE = maximum-likelihood estimator.

Ex. 5.1. Generate a random sample Y = (Y_1, ..., Y_n), n = 500, from the standard Cauchy distribution C(a, b) with a = 0, b = 1.

1. Compute mean(Y_i) and median(Y_i) of each subsample Y_i = (Y_1, ..., Y_i), i = 1, ..., n, and plot the following sets of points: {(i, mean(Y_i)) : i = 1, ..., n} and {(i, median(Y_i)) : i = 1, ..., n}. Comment on whether the sample mean and the sample median may be used as reasonable estimators of the parameter a.

2. Compute the standard deviation sd(Y_i) and the half interquartile range IQR(Y_i)/2 of each subsample. May these sample statistics be considered well-behaved estimators of the parameter b?

Solution. First let us compare the two estimators of the location parameter a = 0.

n <- 500
X <- rcauchy(n)   # sample of size n from the standard Cauchy distribution
mn <- numeric(n)  # means of all subsamples
md <- numeric(n)  # medians of all subsamples
for (i in 1:n) {
    mn[i] <- mean(X[1:i])
    md[i] <- median(X[1:i])
}
plot(1:n, mn, type="l", col="blue", xlab="", ylab="", lty=1)
lines(1:n, md, col="red", lty=2)
abline(h=0, col="gray", lty=3)
legend("bottom", c("mean.i", "median.i", "a=0"), col=c("blue", "red", "gray"), lty=1:3)
[Figure: running means (blue, solid) and running medians (red, dashed) of the Cauchy subsamples; the gray dotted line marks a = 0.]

Next we compare the two estimators of the scale parameter b = 1.

n <- 500
X <- rcauchy(n)
s <- numeric(n-1)  # standard deviations of all subsamples
r <- numeric(n-1)  # half interquartile ranges of all subsamples
for (i in 2:n) {
    s[i-1] <- sd(X[1:i])
    r[i-1] <- IQR(X[1:i]) * 0.5
}
plot(2:n, s, type="l", col="blue", xlab="", ylab="", ylim=c(0, max(s)))
lines(2:n, r, col="red", lty=2)
abline(h=1, col="gray", lty=3)
legend("topleft", c("s.i", "r.i", "b=1"), col=c("blue", "red", "gray"), lty=1:3)

[Figure: running standard deviations (blue, solid) and half interquartile ranges (red, dashed) of the Cauchy subsamples; the gray dotted line marks b = 1.]

Draw conclusions.
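To back up the conclusion, here is a short Monte Carlo sketch of ours (not from the original text; the seed and repetition count m are arbitrary choices) comparing the sampling spread of the mean and the median over many Cauchy samples of a fixed size:

```r
# Sketch (our own check): for Cauchy data the sample median concentrates
# around a = 0, while the sample mean does not.
set.seed(123)                 # arbitrary seed, for reproducibility
m <- 1000                     # number of repeated samples (our choice)
n <- 500
means   <- replicate(m, mean(rcauchy(n)))
medians <- replicate(m, median(rcauchy(n)))
c(IQR(means), IQR(medians))   # spread of each estimator across samples
```

The mean of n standard Cauchy variables is again standard Cauchy, so its spread does not shrink as n grows, whereas the median's does.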
Ex. 5.2. We are given an i.i.d. random sample of size n = 20 from the uniform distribution U([0, θ]), where θ = 1. Determine the method-of-moments (MME) and maximum-likelihood (MLE) estimators of θ. Compare their biases and mean square errors.

Solution. Let X = (X_1, ..., X_n) be a sample from the uniform distribution U([0, θ]). It may be shown that the estimators of θ are of the form:

The MME estimator:  θ̂_1 = 2X̄,  (5.1)
The MLE estimator:  θ̂_2 = X_{n:n},  (5.2)
The minimum-variance unbiased (MVUE) estimator:  θ̂_3 = ((n+1)/n) X_{n:n} = ((n+1)/n) θ̂_2.  (5.3)

Moreover, the bias

b(θ̂) = E(θ̂ − θ) = E θ̂ − θ  (5.4)

and the mean square error (MSE)

MSE(θ̂) = E(θ̂ − θ)² = Var(θ̂) + [b(θ̂)]²  (5.5)

of the above estimators are as follows:

i | b(θ̂_i)     | MSE(θ̂_i)
1 | 0           | θ²/(3n)
2 | −θ/(n+1)    | 2θ²/((n+1)(n+2))
3 | 0           | θ²/(n(n+2))

Let θ = 1. We will illustrate how to approximate these values using MC simulation. Let us generate m = 10000 random samples of size n = 20 from the uniform distribution U([0, 1]) and calculate the values of the three estimators of θ. The results will be stored in a 3 × m matrix.

m <- 10000
n <- 20
theta <- 1
results <- replicate(m, {
    X <- runif(n, 0, theta)
    c(2 * mean(X), max(X), max(X) * (n + 1)/n)  # result of a single experiment
})
results[, 1:5]  # show the results of the first 5 experiments:
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.9671 1.0245 1.1168 1.0611 0.9359
## [2,] 0.9430 0.9965 0.9438 0.9623 0.8970
## [3,] 0.9902 1.0464 0.9910 1.0104 0.9418

Let us compute the empirical biases and mean square errors of our estimators. The empirical biases:
mean(results[1, ]) - theta
## [1] 0.0009342
mean(results[2, ]) - theta
## [1] -0.04826
mean(results[3, ]) - theta
## [1] -0.0006718

The empirical MSEs:

var(results[1, ]) + (mean(results[1, ]) - theta)^2
## [1] 0.01642
var(results[2, ]) + (mean(results[2, ]) - theta)^2
## [1] 0.004485
var(results[3, ]) + (mean(results[3, ]) - theta)^2
## [1] 0.002377

Compare them to the theoretical (exact) biases and mean square errors of our estimators. The bias b(θ̂_2):

-theta/(n + 1)
## [1] -0.04762

MSE(θ̂_1):

theta^2/(3 * n)
## [1] 0.01667

MSE(θ̂_2):

2 * theta^2/(n + 1)/(n + 2)
## [1] 0.004329

MSE(θ̂_3):

theta^2/n/(n + 2)
## [1] 0.002273

Monte Carlo simulation methods are often used to examine the properties of estimators in cases where analytic solutions are unavailable. By the law of large numbers, we expect that as m increases, the approximation error becomes smaller.
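As a sketch of this last remark (our own construction; the seed and the two repetition counts are arbitrary), one can compare the Monte Carlo estimate of the bias of θ̂_2 with the exact value −θ/(n+1) for a small and a large number of repetitions:

```r
# Sketch: the Monte Carlo approximation of the bias of the MLE
# approaches the exact value -theta/(n+1) as m grows.
set.seed(42)                       # arbitrary seed
n <- 20; theta <- 1
bias_mc <- function(m)
    mean(replicate(m, max(runif(n, 0, theta)))) - theta
exact <- -theta / (n + 1)          # exact bias of the MLE, see the table above
abs(bias_mc(100) - exact)          # crude approximation
abs(bias_mc(100000) - exact)       # typically much closer
```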
5.2. Exercises

Preliminaries

Ex. 5.3. Generate a random sample Y = (Y_1, ..., Y_n), n = 500, from the normal distribution N(µ, σ) with µ = 0, σ = 1.

1. Compute mean(Y_i) and median(Y_i) of each subsample Y_i = (Y_1, ..., Y_i), i = 1, ..., n, and plot the following sets of points: {(i, mean(Y_i)) : i = 1, ..., n} and {(i, median(Y_i)) : i = 1, ..., n}. Comment on whether we can use the sample mean and the sample median as reasonable estimators of µ.

2. Compute the scaled interquartile range IQR(Y_i)/1.35 and the standard deviation sd(Y_i). May these sample statistics be considered well-behaved estimators of σ?

Ex. 5.4. Given a statistical space (X, σ(X), {P_θ : θ ∈ Θ}), show that if θ̃ is an unbiased estimator of θ, then MSE_θ(θ̃) = Var_θ(θ̃).

Ex. 5.5. Given a sequence (X_1, ..., X_n) of i.i.d. random variables from the exponential distribution Exp(λ), show that the statistic T(X_1, ..., X_n) = Σ_{i=1}^n X_i²/(2n) is an unbiased estimator of the distribution's variance.

Ex. 5.6. Given a sequence (X_1, ..., X_n) of i.i.d. random variables from the uniform distribution U[a, a+1], check whether the statistic T(X_1, ..., X_n) = Σ_{i=1}^n X_i/n − 0.5 is an unbiased estimator of the parameter a.

Ex. 5.7. Let X = (X_1, X_2, ..., X_n) denote an i.i.d. random sample with finite (known) expectation µ and finite (unknown) variance σ². Let

s̄² = (1/n) Σ_{i=1}^n (X_i − µ)².  (5.6)

Show that s̄² is an unbiased estimator of the variance σ².

Ex. 5.8. Let X = (X_1, X_2, ..., X_n) denote a sample of i.i.d. random variables with finite variance σ². Show that:

1. s_n² = (1/n) Σ_{i=1}^n (X_i − X̄)² is a biased estimator of σ², and
2. s² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)² is an unbiased estimator of σ².

Ex. 5.9. Let T_1 = T_1(X_1, X_2, ..., X_n) and T_2 = T_2(X_1, X_2, ..., X_n) denote two independent unbiased estimators of a parameter θ.

1. Show that for each γ ∈ [0, 1], the statistic T = γT_1 + (1 − γ)T_2 is also an unbiased estimator of θ.
2. Find the γ that leads to the smallest mean squared error of T.

Ex. 5.10.
Let X = (X_1, X_2, ..., X_n) denote a sample of independent Bern(θ)-distributed random variables, θ ∈ (0, 1). Show that:

1. θ̂(X) = Σ_i X_i/n is an unbiased estimator of θ,
2. T(X) = (n/(n−1)) (θ̂(X) − θ̂(X)²) is an unbiased estimator of g(θ) = θ(1 − θ).
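Before proving the second claim, a Monte Carlo sanity check in the spirit of Ex. 5.2 can make the unbiasedness of T plausible (this is our own sketch; n, θ, m and the seed are arbitrary choices):

```r
# Sketch: empirically, mean(T) should be close to g(theta) = theta*(1-theta).
set.seed(7)                          # arbitrary seed
n <- 10; theta <- 0.3; m <- 100000
Tvals <- replicate(m, {
    p <- mean(rbinom(n, 1, theta))   # theta-hat for one Bernoulli sample
    n / (n - 1) * (p - p^2)          # the estimator T
})
c(mean(Tvals), theta * (1 - theta))  # these should nearly agree
```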
Ex. 5.11. Let X = (X_1, X_2, ..., X_n) denote an i.i.d. sample from the Poisson distribution with a parameter λ > 0. Show that X̄ = (1/n) Σ_{i=1}^n X_i is a consistent MVUE estimator of λ.

Ex. 5.12. Let X = (X_1, X_2, ..., X_n) denote an i.i.d. random sample from the normal distribution with finite (known) expectation µ and finite (unknown) variance σ². Show that s̄² is an MVUE estimator of σ². Show that s² is an unbiased, but only asymptotically efficient, estimator of σ².

Ex. 5.13. Let X = (X_1, X_2, ..., X_n) denote an i.i.d. random sample from the exponential distribution with a parameter λ > 0. Let g(λ) = 1/λ. Show that X̄ is a consistent UMVU estimator of g(λ).

Ex. 5.14. Let X = (X_1, X_2, ..., X_n) denote an i.i.d. random sample from the exponential distribution with a parameter λ > 0. Let g(λ) = 1/λ. Show that although T(X) = nX_{1:n} is an unbiased estimator of g(λ), it is not consistent.

Ex. 5.15. Let X = (X_1, X_2, ..., X_n) denote an i.i.d. random sample from the normal distribution N(µ, σ). Compare the efficiency of two estimators of µ: the sample average and the sample median.

MLE, MME, MQE

Ex. 5.16. A terrorist tapped the line of a very important person (V.I.P.). He noted down the times between successive calls [in minutes] of his victim: 1.74, 21.26, 11.19, 6.64, 0.07, 8.67, 43.25, 55.03, 7.64, 12.54, 39.19, 3.42, 4.89, 2.32, 71.24. The attacker suspects that the V.I.P. is going to make a very important call (V.I.C.). Unfortunately, he needs to go to the toilet! He calls you because he is truly in a state of emergency and you are famous for being a great statistician. You assume the data follow an exponential distribution with an unknown parameter λ > 0.

1. Find estimators of λ using the MME, MLE and MQE.
2. Calculate the probability that the V.I.P. will make a V.I.C. within the next 5 minutes.

Ex. 5.17. Find the MME and MLE of the parameter λ > 0 of the Poisson distribution.

Ex. 5.18. Find the MME and MLE of the probability of success θ in n independent Bernoulli trials.

Ex.
5.19. Generate a random sample of size 25 from the uniform distribution U[0, θ], where θ = 1. Find the MME, MLE and an unbiased estimator of θ. Compare their bias, MSE and variance. Evaluate the estimators for the sample.

Ex. 5.20. Find the MME, MLE and MQE of the mean and variance of the normal distribution.

Ex. 5.21. Find the MME and MLE of the shape a and scale b parameters of the gamma distribution Γ(a, b).

Ex. 5.22. Let X = (X_1, X_2, ..., X_n) denote an i.i.d. random sample from the Rayleigh distribution with unknown parameter σ > 0. The probability density function is f(x) = (x/σ²) exp(−x²/(2σ²)) for x ≥ 0, and E X_i = σ√(0.5π) for i = 1, ..., n. Find the MME and MLE of σ and compute their values for the following data: 4.93, 2.79, 7.19, 10.50, 9.19, 13.13, 0.47, 5.89, 9.90, 11.70, 6.83, 9.36, 2.04, 6.33.

Ex. 5.23. Let (X_1, ..., X_n) denote the sequence of observations of proper operation times of n devices, each working independently. We assume that the time of proper work is exponentially distributed with an unknown parameter θ. The devices are not observed continuously; instead, the measurements are performed at discrete moments 1, 2, ..., k. Hence, we observe only Y_1, ..., Y_n, where

Y_j = i      if i − 1 < X_j ≤ i, for some i = 1, ..., k,
Y_j = k + 1  if X_j > k,
where j = 1, ..., n. Let N_i = #{j : Y_j = i}, i = 1, ..., k+1. Find the maximum likelihood estimator of the parameter θ. Perform the calculations for n = 10, k = 2 and N_1 = 5, N_2 = 2 and N_3 = 3.

5.3. Hints and Answers

Hint to Ex. 5.12. n s̄²/σ² ∼ χ²_n = Γ(n/2, 1/2).

Hint to Ex. 5.12. (n − 1) s²/σ² ∼ χ²_{n−1}.

Hint to Ex. 5.12. X ∼ N(µ, σ) ⟹ Y = (X − µ)/σ ∼ N(0, 1). Then E Y^{2k+1} = 0 and E Y^{2k} = 1 · 3 · ... · (2k − 1).

Hint to Ex. 5.15. If X ∼ N(µ, σ), then Med X is asymptotically N(µ, σ√(π/(2n))).
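The last hint can be checked empirically with a short simulation (our own sketch; n, m and the seed are arbitrary choices):

```r
# Sketch: the standard deviation of the sample median of n N(0,1)
# observations should be close to sqrt(pi/(2n)) for large n.
set.seed(1)                       # arbitrary seed
n <- 100; m <- 5000
med <- replicate(m, median(rnorm(n)))
c(sd(med), sqrt(pi / (2 * n)))    # empirical vs asymptotic value
```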