

BMI 713: Computational Statistics for Biomedical Sciences
Assignment 2

1 Random variables and distributions

1. Assume that a die is fair, i.e. if the die is rolled once, the probability of getting each of the six numbers is 1/6. Calculate the probability of the following events.

Rolling the die once, what is the probability of getting a number less than 3?

Solution. Let $X$ be a random variable denoting the value of a die roll. The probability that one particular die roll satisfies some criteria is

$$\frac{\#\text{ of die rolls that satisfy the criteria}}{\#\text{ of possible die rolls}}.$$

We can easily count both of these values. $X < 3$ means $X = 1$ or $X = 2$, so the number of die rolls satisfying $X < 3$ is 2. There are 6 possible die rolls. Hence $P(X < 3) = 2/6 = 1/3$.

(Optional) Rolling the die twice, what is the probability that the sum of the two rolled numbers is less than 3?

Solution. Let $X$ and $Y$ be random variables such that $X$ is the value of the first die roll and $Y$ is the value of the second. We wish to count the rolls where $X + Y < 3$. Since each die shows at least 1, $X + Y < 3$ occurs only when $X = 1$ and $Y = 1$. That is, only a single roll satisfies the criteria. There are 6 possible values for each of $X$ and $Y$, hence there are $6 \times 6 = 36$ possible rolls. It follows that the probability of rolling a sum less than 3 is

$$\frac{\#\text{ of rolls with } X + Y < 3}{\text{total } \#\text{ of possible rolls}} = \frac{1}{36}.$$

2. Let $p$ be the probability of obtaining a head when flipping a coin. Suppose that Bob flipped the coin $n$ ($n \ge 1$) times. Let $X$ be the total number of heads he obtained.

What distribution does the random variable $X$ follow? Is $X$ a discrete or continuous random variable?

Solution. $X$ follows the binomial distribution with $n$ trials and probability of success $p$. This is a discrete distribution.

What is the probability of $X = k$, i.e. what is $\Pr(X = k)$ ($0 \le k \le n$)? (Write down the mathematical formula for calculating this probability.)

Solution.
In general, the probability that a binomially distributed random variable $X \sim \mathrm{Bin}(n, p)$ is equal to some value $k$ is

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}.$$

What is the probability of $X \ge k$, i.e. what is $\Pr(X \ge k)$?

Solution. $P(X \ge k) = P(X = k) + P(X = k + 1) + \dots + P(X = n)$. Equivalently, $P(X \ge k)$ can be written as $1 - P(X < k)$:

$$P(X \ge k) = 1 - P(X < k) = 1 - [P(X = 0) + P(X = 1) + \dots + P(X = k - 1)].$$
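As a quick check of the formula above, the following R sketch evaluates the binomial probability by hand and compares it with R's built-in `dbinom` and `pbinom`. The values n = 8, p = 0.3 and k = 2, and the helper name `binom_pmf`, are illustrative assumptions and are not taken from the assignment.

```r
# Illustrative values only (assumptions, not from the assignment).
n <- 8
p <- 0.3
k <- 2

# P(X = k) written out from the formula: choose(n, k) * p^k * (1 - p)^(n - k).
# binom_pmf is a hypothetical helper name used only in this sketch.
binom_pmf <- function(k, n, p) choose(n, k) * p^k * (1 - p)^(n - k)

manual_pmf  <- binom_pmf(k, n, p)
builtin_pmf <- dbinom(k, size = n, prob = p)

# P(X >= k) computed two ways: summing the upper terms, or 1 - P(X < k).
upper_sum  <- sum(binom_pmf(k:n, n, p))
complement <- 1 - pbinom(k - 1, size = n, prob = p)

print(c(manual = manual_pmf, builtin = builtin_pmf))
print(c(upper_sum = upper_sum, complement = complement))
```

Both pairs should agree up to floating-point error. The complement form is usually the cheaper one when k is small, because it needs only k terms.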

Suppose $p = 0.4$ and $n = 10$. Calculate the probabilities $\Pr(X = 3)$ and $\Pr(X \ge 3)$. (You may need the functions dbinom and pbinom in R to calculate these two probabilities. Use ?dbinom and ?pbinom to get help information for these two functions.)

Solution. Using the general formula above with $n = 10$ and $p = 0.4$:

$$P(X = 3) = \binom{10}{3}\, 0.4^3\, 0.6^7 \approx 0.215.$$

Using R's dbinom:

    > dbinom(x=3, size=10, prob=0.4)
    [1] 0.2149908

As shown above, we may compute $P(X \ge 3)$ either by summing the probabilities for $X = 3, X = 4, \dots, X = 10$, or by subtracting the probabilities for $X = 0$, $X = 1$ and $X = 2$ from 1. The latter method is more convenient since it requires fewer terms:

$$P(X \ge 3) = 1 - P(X < 3) = 1 - P(X = 0) - P(X = 1) - P(X = 2)$$
$$= 1 - \left[\binom{10}{0} 0.4^0\, 0.6^{10}\right] - \left[\binom{10}{1} 0.4^1\, 0.6^9\right] - \left[\binom{10}{2} 0.4^2\, 0.6^8\right]$$
$$\approx 1 - 0.006 - 0.040 - 0.121 \approx 0.833.$$

Again, using R:

    > 1 - pbinom(q=2, size=10, prob=0.4)
    [1] 0.8327102

3. In a population of a certain type of fish, the lengths of the individual fish follow a normal distribution. The mean length of the fish is 54.00 mm and the standard deviation is 4.50 mm. Answer the following questions.

What percentage of the fish are less than 63 mm?

Solution. We know that the lengths of the fish follow a normal distribution with mean $\mu = 54$ and standard deviation $\sigma = 4.50$. More succinctly: if $X$ is a random variable denoting the length of a randomly selected fish, then $X \sim N(54, 4.50^2)$. The solution we are searching for is then $P(X < 63)$. We can compute this directly using R:

    > pnorm(q=63, mean=54, sd=4.5)
    [1] 0.9772499

However, the typical method for computing such a probability is to standardize the normal variable and then determine the probability from a table. That is, let

$$Z = \frac{X - 54}{4.50}.$$

Then $Z$ is a standard normal random variable: $Z \sim N(0, 1)$. We also know that

$$P(X < 63) = P\left(\frac{X - 54}{4.50} < \frac{63 - 54}{4.50}\right) = P(Z < 2).$$

Since $Z$ follows the standard normal distribution, we can look up the value for $P(Z < 2)$. The table at http://en.wikipedia.org/wiki/standard_normal_table#partial_table gives $P(Z < 2) = 0.9772$, so about 97.7% of the fish are shorter than 63 mm.
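The standardization step can also be checked numerically. The sketch below uses only base R's `pnorm` to confirm that the probability computed on the original scale matches the one computed for the standardized value. The variable names are illustrative.

```r
mu    <- 54    # mean fish length in mm (from the problem)
sigma <- 4.5   # standard deviation in mm (from the problem)
x     <- 63    # cutoff length in mm

# Standardize: subtract the mean, divide by the standard deviation.
z <- (x - mu) / sigma

p_direct   <- pnorm(x, mean = mu, sd = sigma)  # on the original scale
p_standard <- pnorm(z)                         # standard normal, mean 0 and sd 1

print(z)
print(c(direct = p_direct, standardized = p_standard))
```

Here `z` evaluates to 2, and the two probabilities coincide, which is exactly why a single standard normal table is enough for any normal distribution.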
What percentage of the fish are more than 50 mm?

Solution. In this case, we wish to find $P(X > 50)$:

$$P(X > 50) = P\left(\frac{X - 54}{4.50} > \frac{50 - 54}{4.50}\right) = P(Z > -8/9).$$

Since the standard normal distribution is symmetric around 0, we know that $P(Z > -8/9) = P(Z < 8/9)$. Using a table we find

$$P(X > 50) = P(Z < 8/9) \approx 0.811.$$

In R, we may use the following commands to calculate the probability:

    > 1-pnorm(50, mean=54, sd=4.5)
    [1] 0.8129686

or

    > pnorm(50, mean=54, sd=4.5, lower.tail=FALSE)
    [1] 0.8129686

With the standardized value, we may use

    > pnorm(-8/9, lower.tail=FALSE)
    [1] 0.8129686

Suppose that you randomly selected 10 fish from the population. What is the probability that the average length of the fish is between 52 mm and 56 mm?

Solution. Let $X_1, X_2, \dots, X_{10}$ denote the lengths of the 10 randomly selected fish. Then the average is given by

$$Y = \frac{1}{10} \sum_{i=1}^{10} X_i,$$

which is itself a random variable. As we saw in lecture, $Y$ approximately follows $N(54, 4.50^2/10)$. Now we can compute the probability $P(52 < Y < 56) = P(Y < 56) - P(Y < 52)$ by standardizing. Let

$$Z = \frac{Y - 54}{4.50/\sqrt{10}}.$$

Then

$$P(Y < 56) = P\left(Z < \frac{56 - 54}{4.50/\sqrt{10}}\right) \approx P(Z < 1.41) \approx 0.9207$$

and

$$P(Y < 52) = P\left(Z < \frac{52 - 54}{4.50/\sqrt{10}}\right) \approx P(Z < -1.41).$$

Again, we use symmetry to handle the negative value since it is often not available in tables:

$$P(Z < -1.41) = P(Z > 1.41) = 1 - P(Z < 1.41) \approx 1 - 0.9207 = 0.0793.$$

Finally,

$$P(52 < Y < 56) = P(Y < 56) - P(Y < 52) \approx 0.9207 - 0.0793 = 0.8414.$$

We may also calculate the probability using the following command in R:

    > pnorm(56, mean=54, sd=4.5/sqrt(10)) - pnorm(52, mean=54, sd=4.5/sqrt(10))
    [1] 0.8401145
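The table-based answer (0.8414) and the R answer (0.8401) differ slightly. A short sketch suggests the gap comes from rounding the z-value to two decimals before the table lookup. The sample size of 10 used here is an assumption inferred from the z-value 1.41 and the R output quoted above. The variable names are illustrative.

```r
mu    <- 54
sigma <- 4.5
n     <- 10                  # assumed sample size (inferred, see lead-in)
se    <- sigma / sqrt(n)     # standard deviation of the sample mean

# Exact standardized bound and the resulting probability.
z_exact <- (56 - mu) / se
p_exact <- pnorm(z_exact) - pnorm(-z_exact)

# Same calculation after rounding z to two decimals, as in a table lookup.
z_table <- round(z_exact, 2)
p_table <- pnorm(z_table) - pnorm(-z_table)

print(c(z_exact = z_exact, z_table = z_table))
print(c(p_exact = p_exact, p_table = p_table))
```

The unrounded z is a little below 1.41, so the table lookup slightly overstates the probability.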

2 Hypothesis testing

1. In each of the following situations, state an appropriate null hypothesis $H_0$ and alternative hypothesis $H_1$. Be sure to identify the parameters that you use to state the hypotheses.

(a) An experiment on learning in animals measures how long it takes a mouse to find its way through a maze. The mean time is 18 seconds for one particular maze. A researcher thinks that a loud noise will cause the mice to complete the maze slower. She measures how long each of 10 mice takes with a noise as stimulus.

Solution. Let $\mu$ be the mean completion time with noise.
$H_0$: The loud noise does not affect the speed of the mice, i.e. $\mu = 18$ s.
$H_1$: The loud noise causes the mice to complete the maze slower, i.e. $\mu > 18$ s.

(b) A pharmaceutical company developed a new drug for a certain type of cancer, and the company believes that the drug can significantly increase patients' survival time after surgery. The mean survival time after surgery is 16 months. The company selected 20 volunteers who had the surgery, treated them with the drug and recorded their survival times.

Solution. Let $\mu$ be the mean survival time of the treated patients.
$H_0$: $\mu = 16$ months.
$H_1$: The drug increases patients' survival time, i.e. $\mu > 16$ months.

2. Determine if the following statements are true.

(a) The P-value is the probability that the null hypothesis is true. False.

(b) The P-value is the probability, computed assuming that $H_0$ is true, that the test statistic will take a value at least as extreme as that actually observed. True.

(c) The standard normal distribution has fatter tails than Student's t-distribution. False. The t-distribution has fatter tails.

(d) When the sample size is small, the t-test is more accurate than the z-test. True.

3. The mean level of calcium in the blood in healthy young adults is about 9.5 milligrams per deciliter.
A clinic in Boston measures the blood calcium level of healthy pregnant women as follows: 9.09, 9.82, 9.58, 9.03, 10.48, 9.35, 9.85, 9.36, 9.64, 9.43. Is this an indication that the mean calcium level in the population from which these women come differs from 9.5?

State the null hypothesis $H_0$ and the alternative hypothesis $H_1$.

Solution.
$H_0$: The women are sampled from a population with mean blood calcium level 9.5.
$H_1$: The women are sampled from a population with mean blood calcium level greater or less than 9.5.

Calculate the mean and standard deviation of the blood calcium level of the women.

Solution. The mean is

$$\bar{X} = \frac{9.09 + 9.82 + 9.58 + 9.03 + 10.48 + 9.35 + 9.85 + 9.36 + 9.64 + 9.43}{10} = 9.563.$$

Let $X_1, X_2, \dots, X_{10}$ denote the blood calcium levels. Then the sample standard deviation is

$$s = \sqrt{\frac{1}{10 - 1} \sum_{i=1}^{10} (X_i - 9.563)^2} \approx 0.422.$$
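The same summary statistics can be obtained in R. In this sketch the fifth measurement is taken to be 10.48, an assumption inferred from the stated mean: the other nine values sum to 85.15, and a mean of 9.563 over ten values requires a total of 95.63. The variable names are illustrative.

```r
# Blood calcium levels; the fifth value (10.48) is inferred, see lead-in.
calcium <- c(9.09, 9.82, 9.58, 9.03, 10.48, 9.35, 9.85, 9.36, 9.64, 9.43)

n     <- length(calcium)
x_bar <- mean(calcium)

# Sample standard deviation written out, with the n - 1 denominator.
s_manual  <- sqrt(sum((calcium - x_bar)^2) / (n - 1))
s_builtin <- sd(calcium)   # sd() uses the same n - 1 denominator

print(x_bar)
print(c(manual = s_manual, builtin = s_builtin))
```

Note that `sd` divides by n - 1 rather than n, which matches the sample standard deviation formula used in the solution.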

Calculate the z-score and perform the z-test based on the z-score. What is the P-value?

Solution. As we have seen in lecture, the z-score is

$$z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}},$$

where $\mu_0 = 9.5$ is the population mean under the null hypothesis, $n = 10$ is the sample size and $\sigma$ is the population standard deviation. However, we do not know $\sigma$. We use the sample standard deviation $s$ as an estimate for $\sigma$. This gives us the z-score

$$z \approx \frac{\bar{X} - \mu_0}{s / \sqrt{n}} = \frac{9.563 - 9.5}{0.422 / \sqrt{10}} = 0.472.$$

The z-score approximately follows the standard normal distribution (where the approximation improves as the number of samples grows). Therefore, the p-value given by the z-test is

$$P(|Z| > |z|) = P(|Z| > 0.472) = P(Z > 0.472) + P(Z < -0.472).$$

Using the standard normal distribution, we get

$$P(Z > 0.472) = 1 - P(Z < 0.472) = 0.3184$$

and

$$P(Z < -0.472) = 0.3184.$$

Therefore, the p-value is

$$P(|Z| > |z|) = 0.3184 + 0.3184 \approx 0.6369,$$

where the last digit reflects the unrounded tail probabilities. Note that since the standard normal distribution is symmetric about 0, we have $P(Z > |z|) = P(Z < -|z|)$. In R, we may use the following command to compute the p-value:

    > 2*pnorm(-0.472)
    [1] 0.6369268

If we choose a significance level of 0.05, we will fail to reject the null hypothesis $H_0$.

Calculate the t-statistic and perform the t-test. What is the P-value? (Optional) What is the degree of freedom of this t-test?

Solution. If $\bar{X}$ denotes the sample mean, $\mu_0$ denotes the mean under the null hypothesis, $s$ denotes the sample standard deviation and $n$ is the number of samples, the t-statistic is given by

$$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} = \frac{9.563 - 9.5}{0.422 / \sqrt{10}} = 0.472.$$

The probability of obtaining a more extreme t-statistic is $P(T < -0.472) + P(T > 0.472)$, where $T$ follows the t-distribution with $10 - 1 = 9$ degrees of freedom. Using R, we can find these probabilities:

    # P(T < -0.472)
    > pt(-0.472, df=9)
    [1] 0.3240804
    # P(T > 0.472)
    > 1 - pt(0.472, df=9)
    [1] 0.3240804
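The two tests use the same statistic and differ only in the reference distribution. The sketch below computes both two-sided p-values side by side from the summary values quoted in the text. The sample size of 10 is an assumption inferred from the 9 degrees of freedom, and the variable names are illustrative.

```r
x_bar <- 9.563   # sample mean (from the text)
s     <- 0.422   # sample standard deviation as quoted in the text
mu0   <- 9.5     # mean under the null hypothesis
n     <- 10      # assumed sample size (9 degrees of freedom + 1)

# Same statistic for both tests.
stat <- (x_bar - mu0) / (s / sqrt(n))

# Two-sided p-values: twice the lower-tail probability at -|stat|.
p_z <- 2 * pnorm(-abs(stat))           # standard normal reference
p_t <- 2 * pt(-abs(stat), df = n - 1)  # t reference with n - 1 df

print(c(statistic = stat, p_z = p_z, p_t = p_t))
```

The t-based p-value comes out slightly larger than the normal-based one, which is consistent with the t-distribution having fatter tails.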

$$P(T < -0.472) + P(T > 0.472) \approx 0.648.$$

Thus, under the null hypothesis, the probability of selecting 10 random individuals with mean blood calcium level at least as far from 9.5 as the observed mean 9.563 is roughly 65%. As with the z-test, we would not be able to reject the null hypothesis if we select a significance level of 0.05.

To perform the t-test in R:

    > v = c(9.09, 9.82, 9.58, 9.03, 10.48, 9.35, 9.85, 9.36, 9.64, 9.43)
    > t.test(v, mu=9.5, alternative=c("two.sided"))

            One Sample t-test

    data:  v
    t = 0.4714, df = 9, p-value = 0.6486
    alternative hypothesis: true mean is not equal to 9.5
    95 percent confidence interval:
     9.260663 9.865337
    sample estimates:
    mean of x
        9.563
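The 95 percent confidence interval reported by t.test can be reproduced by hand from the sample mean, the standard error and a t quantile. This is a sketch under the assumption that the fifth measurement is 10.48 (the value consistent with the stated mean of 9.563). The variable names are illustrative.

```r
# Data as in the t-test above; the fifth value (10.48) is an inferred assumption.
v <- c(9.09, 9.82, 9.58, 9.03, 10.48, 9.35, 9.85, 9.36, 9.64, 9.43)

n      <- length(v)
se     <- sd(v) / sqrt(n)           # standard error of the mean
t_crit <- qt(0.975, df = n - 1)     # two-sided 95% critical value

# Interval: mean plus or minus critical value times standard error.
ci_manual  <- mean(v) + c(-1, 1) * t_crit * se
ci_builtin <- as.numeric(t.test(v, mu = 9.5)$conf.int)

print(rbind(manual = ci_manual, builtin = ci_builtin))
```

The hypothesized mean 9.5 lies inside this interval, which agrees with failing to reject the null hypothesis at the 0.05 level.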