Chapter Four: Introduction To Inference

4.1 Introduction

In this chapter you will learn the rationale underlying inference. You will also learn to apply certain inferential techniques. The methods introduced in this chapter are not commonly employed in research but are important pedagogically. They are relatively simple, and their mastery will open the way to understanding the more complex methods dealt with in the following chapters.

4.1 Introduction (continued)

The techniques you will learn may be divided into two broad categories.

1. Tests of hypotheses.
2. Confidence intervals.

Before you can begin their study you must understand the concept of sampling distributions.

4.2 Sampling Distributions: Definition

A sampling distribution is a distribution of sample statistics obtained from samples repeatedly drawn from one or more populations.

Sampling Distribution of $\bar{x}$

The sampling distribution of $\bar{x}$ can be formed by taking repeated samples from some population, calculating $\bar{x}$ for each sample, and forming the resultant sample means into a relative frequency distribution.
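The procedure just described can be sketched as a small simulation. This is only an illustration: the population (10,000 normal values with mean 100 and standard deviation 20), the sample size of 25, the number of samples, and the function name `sample_mean_distribution` are all assumptions made for the example, not part of the chapter.

```python
import random
from collections import Counter
from statistics import fmean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population, assumed only for this illustration.
population = [rng.gauss(100, 20) for _ in range(10_000)]

def sample_mean_distribution(population, n, num_samples, rng):
    """Draw repeated samples, compute each sample mean, and return the
    relative frequency of the means after rounding to whole numbers."""
    means = [fmean(rng.sample(population, n)) for _ in range(num_samples)]
    counts = Counter(round(m) for m in means)
    return {value: count / num_samples for value, count in sorted(counts.items())}

rel_freq = sample_mean_distribution(population, n=25, num_samples=2000, rng=rng)
for value, freq in rel_freq.items():
    print(f"{value:4d} {freq:.3f}")
```

Printing the dictionary gives one row per rounded sample mean, which is the relative frequency distribution the slide refers to.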

Characteristics of the Sampling Distribution of $\bar{x}$

The following characteristics of the sampling distribution of $\bar{x}$ should be noted.

1. The mean of the sampling distribution is equal to the mean of the population from which the samples were drawn.
2. The mean of the sampling distribution of some statistic is referred to as the expected value of the statistic and is symbolized by $E[\,\cdot\,]$, where the brackets contain an identifier of the statistic.

Characteristics (continued)

3. $E[\bar{x}] = \mu$. This is a restatement of characteristic (1) given above.
4. The standard deviation of the sampling distribution of $\bar{x}$ is termed the standard error of the mean and is symbolized by $\sigma_{\bar{x}}$.
5. $\sigma_{\bar{x}} = \sigma / \sqrt{n}$, where $\sigma$ is the population standard deviation and $n$ is the sample size.

Characteristics (continued)

6. When the population from which samples are drawn is normally distributed, the sampling distribution of $\bar{x}$ will also be normally distributed.
7. When the population from which samples are drawn is not normally distributed, the sampling distribution of $\bar{x}$ will approach normality as sample size ($n$) increases. This is an expression of the central limit theorem.
8. Roughly speaking, the central limit theorem states that the sampling distributions of certain classes of statistics will approach normality as sample size ($n$) increases, regardless of the shape of the sampled population.
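A rough empirical check of the characteristics listed above can be run as follows. The skewed exponential population (mean 1, standard deviation 1), the sample sizes, and the helper name `simulate_means` are assumptions chosen for illustration; the sketch compares the mean and standard deviation of simulated sample means with $\mu$ and $\sigma/\sqrt{n}$.

```python
import math
import random
from statistics import fmean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu = 1.0
sigma = 1.0

def simulate_means(n, num_samples=4000):
    """Return the means of num_samples random samples of size n."""
    return [fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(num_samples)]

for n in (2, 10, 30):
    means = simulate_means(n)
    # Columns: n, mean of sample means, their sd, and sigma / sqrt(n).
    print(n, round(fmean(means), 3), round(pstdev(means), 3),
          round(sigma / math.sqrt(n), 3))
```

Plotting a histogram of `means` for each `n` would be one way to see the shape move toward the normal curve as `n` grows, even though the population itself is strongly skewed.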

Example

Given a population with standard deviation 5.293, find the standard deviations of the sampling distributions of $\bar{x}$ generated from this population when samples are of sizes 10, 30 and 50.

Solution

Using equation 4.1 on page 76, we calculate the standard errors of the mean for sample sizes 10, 30 and 50 as follows.

$$\frac{5.293}{\sqrt{10}} = 1.67, \qquad \frac{5.293}{\sqrt{30}} = .97, \qquad \frac{5.293}{\sqrt{50}} = .75.$$
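The same arithmetic can be checked with a few lines of code. The function name `standard_error_of_mean` is chosen here for illustration only.

```python
import math

def standard_error_of_mean(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 5.293  # population standard deviation from the example
standard_errors = {n: round(standard_error_of_mean(sigma, n), 2)
                   for n in (10, 30, 50)}
print(standard_errors)  # {10: 1.67, 30: 0.97, 50: 0.75}
```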

Using the Normal Curve

Just as you used the normal curve model to estimate probabilities associated with the selection of a single observation from a population, so too you can use this model to estimate probabilities associated with the means of samples selected from a population.

Z Score

Z scores associated with sample means are calculated as follows.

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Example

Given a population with mean 110.023 and standard deviation 4.970, estimate the probability of randomly selecting a sample of 15 observations and finding that the mean of the sample is greater than 111.

Solution

The Z score for a mean of 111.0 is

$$Z = \frac{111.0 - 110.023}{4.970 / \sqrt{15}} = .76.$$

The associated tail area is .2236, which is our estimated probability.
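As a cross-check, the Z score and tail area can be computed with Python's standard library instead of a printed table. The helper name `z_for_sample_mean` is an assumption for this sketch. Because the code does not round Z to two decimals before looking up the area, its tail area differs slightly from the table value of .2236.

```python
import math
from statistics import NormalDist

def z_for_sample_mean(x_bar, mu, sigma, n):
    """Z score of a sample mean: (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

z = z_for_sample_mean(x_bar=111.0, mu=110.023, sigma=4.970, n=15)
upper_tail = 1 - NormalDist().cdf(z)  # area above z under the standard normal
print(round(z, 2), round(upper_tail, 4))
```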

Example

Suppose 100 observations are randomly selected from a population whose mean and standard deviation are respectively 100 and 20. What is the probability that the mean of these observations will be between 99 and 103?

Solution

The area of a normal curve with mean 100 and standard deviation $20/\sqrt{100}$ that lies between 99 and 103 is the sum of the areas between 99 and 100 and between 100 and 103. The Z score and area between 99 and 100 are respectively

$$Z = \frac{99.0 - 100.0}{20.0 / \sqrt{100}} = -.50 \quad \text{and} \quad .1915.$$

The corresponding values for the area between 100 and 103 are

$$Z = \frac{103.0 - 100.0}{20.0 / \sqrt{100}} = 1.50 \quad \text{and} \quad .4332.$$

The probability estimate is then $.1915 + .4332 = .6247$.
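The same probability can be obtained in one step by treating the sampling distribution of the mean as a normal curve with mean 100 and standard deviation $20/\sqrt{100}$. The sketch below uses `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100, 20, 100

# Sampling distribution of the mean: normal with sd = sigma / sqrt(n).
sampling_dist = NormalDist(mu, sigma / math.sqrt(n))

# Area between 99 and 103 is the difference of the two cumulative areas.
prob_between = sampling_dist.cdf(103) - sampling_dist.cdf(99)
print(round(prob_between, 4))
```

The result should agree with the table-based figure of .6247 to within rounding.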

Distribution of $\hat{p}$

A dichotomous population is one whose members are classified by some dichotomous characteristic, such as lived/died, tumor remission/no tumor remission, pain/no pain, etc. Traditionally, when speaking in a general sense, one of the two dichotomous outcomes is termed success and the other failure.

Distribution of $\hat{p}$ (continued)

If the members of the population with the characteristic success are assigned the number one and those with the characteristic failure are assigned a zero, then the mean of the population will be the sum of the ones and zeros divided by the total number of observations in the population. This is also the proportion of successes in the population.

Distribution of $\hat{p}$ (continued)

We designate the proportion of successes in the population as $\pi$ and the proportion in a sample drawn from the population as $\hat{p}$.

Distribution of $\hat{p}$ (continued)

It can be shown that the standard deviation of the sampling distribution of $\hat{p}$, termed the standard error of $\hat{p}$, is given by

$$\sigma_{\hat{p}} = \sqrt{\frac{\pi (1 - \pi)}{n}}$$

Example

Given a dichotomous population where the proportion of successes is .10, find the standard deviation of the sampling distribution of $\hat{p}$ if the sample size is 5. Recalculate the standard error assuming samples of size 50.

Solution

The standard error of $\hat{p}$ for samples of size 5 is

$$\sigma_{\hat{p}} = \sqrt{\frac{\pi (1 - \pi)}{n}} = \sqrt{\frac{(.10)(.90)}{5}} = .134.$$

The standard error of $\hat{p}$ for samples of size 50 is

$$\sigma_{\hat{p}} = \sqrt{\frac{(.10)(.90)}{50}} = .042.$$
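A short check of these two values, with the function name `standard_error_of_proportion` and the argument name `pi_pop` (the population proportion) chosen for illustration:

```python
import math

def standard_error_of_proportion(pi_pop: float, n: int) -> float:
    """Standard error of p-hat: sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi_pop * (1 - pi_pop) / n)

se_5 = standard_error_of_proportion(0.10, 5)
se_50 = standard_error_of_proportion(0.10, 50)
print(round(se_5, 3), round(se_50, 3))  # 0.134 0.042
```

Multiplying the sample size by ten shrinks the standard error by a factor of $\sqrt{10}$, not ten.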

The Binomial Distribution

If the population is large and certain other conditions are met, the binomial distribution can be used to model the sampling distribution of $\hat{p}$.

The Binomial Distribution (continued)

The binomial distribution is generated by the equation

$$P(y) = \frac{n!}{y!\,(n - y)!}\, \pi^{y} (1 - \pi)^{n - y}$$

where $P(y)$ is the probability of $y$ successes in a sample of size $n$ taken from a population where the proportion of successes is $\pi$.

Example

Calculate the sampling distribution of $\hat{p}$ for samples of size 5 drawn from a population in which the proportion of successes is .10.

Solution

$$P(0) = \frac{5!}{0!\,(5 - 0)!}\,(.10)^{0} (1 - .10)^{5 - 0} = \frac{5!}{0!\,5!}\,(.10)^{0} (.90)^{5} = (.90)^{5} = .59049.$$

Solution (continued)

$$P(1) = \frac{5!}{1!\,(5 - 1)!}\,(.10)^{1} (1 - .10)^{5 - 1} = \frac{5 \cdot 4!}{1!\,4!}\,(.10)^{1} (.90)^{4} = (5)(.10)(.6561) = .32805.$$

Solution (continued)

$$P(2) = \frac{5!}{2!\,(5 - 2)!}\,(.10)^{2} (1 - .10)^{5 - 2} = \frac{5 \cdot 4 \cdot 3!}{2!\,3!}\,(.10)^{2} (.90)^{3} = (10)(.01)(.729) = .0729.$$

Solution (continued)

$$P(3) = \frac{5!}{3!\,(5 - 3)!}\,(.10)^{3} (1 - .10)^{5 - 3} = \frac{5 \cdot 4 \cdot 3!}{3!\,2!}\,(.10)^{3} (.90)^{2} = (10)(.001)(.81) = .0081.$$

Solution (continued)

$$P(4) = \frac{5!}{4!\,(5 - 4)!}\,(.10)^{4} (1 - .10)^{5 - 4} = \frac{5 \cdot 4!}{4!\,1!}\,(.10)^{4} (.90)^{1} = (5)(.0001)(.90) = .00045.$$

Solution (continued)

$$P(5) = \frac{5!}{5!\,(5 - 5)!}\,(.10)^{5} (1 - .10)^{5 - 5} = \frac{5!}{5!\,0!}\,(.10)^{5} (.90)^{0} = (.10)^{5} = .00001.$$

Solution (continued)

Table: Sampling distribution of $\hat{p}$ for n = 5 and π = .10.

Proportion    Number of successes    Probability
$\hat{p}$     y                      P(y)
 .00          0                      .59049
 .20          1                      .32805
 .40          2                      .07290
 .60          3                      .00810
 .80          4                      .00045
1.00          5                      .00001
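The whole table can be regenerated from the binomial equation. This is a minimal sketch using `math.comb` for the factorial term; the function name `binomial_pmf` and the argument name `pi_pop` are illustrative choices.

```python
from math import comb

def binomial_pmf(y: int, n: int, pi_pop: float) -> float:
    """P(y) = n! / (y! (n - y)!) * pi^y * (1 - pi)^(n - y)."""
    return comb(n, y) * pi_pop**y * (1 - pi_pop)**(n - y)

n, pi_pop = 5, 0.10
table = [(y / n, y, binomial_pmf(y, n, pi_pop)) for y in range(n + 1)]

print(" p_hat  y     P(y)")
for p_hat, y, prob in table:
    print(f"{p_hat:6.2f} {y:2d} {prob:8.5f}")
```

Because the six outcomes exhaust all possibilities, the probabilities in the last column sum to one.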

Example

Given that 10% of the residents of the United States would test positive for a certain antibody, what is the probability of randomly selecting five residents of the United States and finding that

- all five test positive for the antibody?
- at least four (i.e., four or more) will test positive?
- at least one will be positive?

Solution

The required probabilities come from the sampling distribution of $\hat{p}$ for n = 5 and π = .10 tabulated in the preceding solution.

The probability that all five residents test positive is $P(5) = .00001$.

The probability that at least four test positive is $P(4) + P(5) = .00045 + .00001 = .00046$.

The probability that at least one tests positive is $P(1) + P(2) + P(3) + P(4) + P(5) = 1 - P(0) = 1 - .59049 = .40951$.
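These three answers are all "at least k successes" probabilities, so a single helper covers them. The names `binomial_pmf` and `prob_at_least` are assumptions for this sketch.

```python
from math import comb

def binomial_pmf(y: int, n: int, pi_pop: float) -> float:
    """Probability of exactly y successes in n trials."""
    return comb(n, y) * pi_pop**y * (1 - pi_pop)**(n - y)

def prob_at_least(k: int, n: int, pi_pop: float) -> float:
    """Probability of k or more successes: P(k) + P(k+1) + ... + P(n)."""
    return sum(binomial_pmf(y, n, pi_pop) for y in range(k, n + 1))

n, pi_pop = 5, 0.10
all_five = prob_at_least(5, n, pi_pop)
at_least_four = prob_at_least(4, n, pi_pop)
at_least_one = prob_at_least(1, n, pi_pop)
print(f"{all_five:.5f} {at_least_four:.5f} {at_least_one:.5f}")
```

For "at least one", computing $1 - P(0)$ is the shorter route by hand; the sum and the complement give the same value.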

Example

A researcher believes that the proportion of blood donors in Iceland with type O positive blood is greater than .38, which is the proportion in the US. If the researcher assesses the blood types of 10 randomly selected donors in Iceland, what is the probability that 9 or 10 of the selected donors will have this blood type if the proportion is .38? If the number of subjects with type O positive blood is in fact 9 or 10, what implications would this have for the researcher's belief?

Solution

Given a population proportion of .38, the probability that the sample will contain 9 or 10 donors with type O positive blood is $P(9) + P(10)$.

Solution (continued)

$$P(9) = \frac{10!}{9!\,(10 - 9)!}\,(.38)^{9} (1 - .38)^{10 - 9} = \frac{10 \cdot 9!}{9!\,1!}\,(.38)^{9} (.62)^{1} = (10)(.00017)(.62) = .00105$$

$$P(10) = (.38)^{10} = .00006.$$

Solution (continued)

The probability that 9 or 10 donors in the sample will have type O positive blood is then $.00105 + .00006 = .00111$. If the number of donors in the sample with type O positive blood is 9 or 10, the researcher's theory is supported because the probability of achieving such a result from a population where the proportion is .38 is so small. It is likely, though not proven, that the proportion of type O positives in the Icelandic blood donor population is greater than .38.
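The calculation can be repeated without intermediate rounding. In the hand calculation, $(.38)^{9}$ is rounded to .00017 before multiplying; carrying full precision gives a total closer to .00109 than .00111. The difference does not affect the conclusion. The function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(y: int, n: int, pi_pop: float) -> float:
    """Probability of exactly y successes in n trials."""
    return comb(n, y) * pi_pop**y * (1 - pi_pop)**(n - y)

n, pi_pop = 10, 0.38
p9 = binomial_pmf(9, n, pi_pop)
p10 = binomial_pmf(10, n, pi_pop)
prob_9_or_10 = p9 + p10
print(f"{p9:.5f} {p10:.5f} {prob_9_or_10:.5f}")
```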

Normal Curve Approximation

When sample size is sufficiently large, the normal curve can be used to approximate the sampling distribution of $\hat{p}$. The question as to how large a sample must be in order to obtain an adequate approximation cannot be answered definitively. An often used rule of thumb states that the normal curve approximation will be satisfactory so long as both $n\pi$ and $n(1 - \pi)$ are greater than or equal to five, though some authors maintain that these values should be greater than or equal to 10.

Normal Curve Approximation (continued)

The normal curve model is used to approximate probabilities associated with the distribution of $\hat{p}$ by means of the following equation.

$$Z = \frac{\hat{p} - \pi}{\sqrt{\dfrac{\pi (1 - \pi)}{n}}}$$

where $\hat{p}$ is the sample proportion of successes, $\pi$ is the population proportion and $n$ is the sample size.

Example

Suppose a random sample of 50 observations is taken from a dichotomous population in which the proportion of successes is .10. What is the probability that the proportion of successes in the sample will be greater than .12?

Solution

The estimated probability will be the area under a normal curve with mean .10 that lies above .13. Because the proportion of successes can only take the values .00, .02, .04, ..., .12, .14, ..., 1.00, the upper real limit of the .12 interval (i.e., .13) is used rather than .12. The upper limit is employed because the problem is to find the probability that the proportion of successes is greater than .12. The lower limit would have been used if the problem required the probability of obtaining a proportion of .12 or greater.

Solution (continued)

Upper and lower real limits of binomial proportions can be computed directly by adding and subtracting $.5/n$. For the present case the upper real limit is $.12 + .5/50 = .13$. Using upper and lower limits in this fashion when using a continuous distribution to approximate probabilities associated with a discrete variable is an example of what is referred to as a continuity correction.

Solution (continued)

We now calculate

$$Z = \frac{\hat{p} - \pi}{\sqrt{\dfrac{\pi (1 - \pi)}{n}}} = \frac{.13 - .10}{\sqrt{\dfrac{(.10)(.90)}{50}}} = \frac{.03}{.0424} = .71.$$

Reference to the normal curve table in Appendix A gives an associated area of .2389. The value as calculated by the binomial method is $P(7) + P(8) + \cdots + P(50) = .2298$.
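The comparison between the continuity-corrected normal estimate and the exact binomial value can be reproduced as follows. The variable names are illustrative. The code does not round Z before finding the area, so its normal estimate differs from the table figure of .2389 in the third decimal place.

```python
import math
from math import comb
from statistics import NormalDist

n, pi_pop = 50, 0.10
se = math.sqrt(pi_pop * (1 - pi_pop) / n)  # standard error of p-hat

# Continuity correction: "greater than .12" starts at the upper real limit.
upper_real_limit = 0.12 + 0.5 / n
z = (upper_real_limit - pi_pop) / se
normal_estimate = 1 - NormalDist().cdf(z)

# Exact binomial: a proportion above .12 means 7 or more successes out of 50.
exact = sum(comb(n, y) * pi_pop**y * (1 - pi_pop)**(n - y)
            for y in range(7, n + 1))

print(round(z, 2), round(normal_estimate, 4), round(exact, 4))
```

Here $n\pi = 5$, which sits right at the rule-of-thumb boundary, so a gap of about .01 between the two figures is not surprising.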

Example

Suppose a random sample of 50 observations is taken from a dichotomous population in which the proportion of successes is .10. What is the probability that the proportion of successes in the sample will be exactly .12?

Solution

The estimate will be the area between the lower real limit of .11 and the upper real limit of .13. As calculated previously, the Z score for .13 is .71, while that for .11 is

$$Z = \frac{.11 - .10}{\sqrt{\dfrac{(.10)(.90)}{50}}} = \frac{.01}{.0424} = .24.$$

Using these values in the normal curve table shows that the areas between .13 and .10 and between .11 and .10 are .2611 and .0948 respectively. The area between .11 and .13 is then $.2611 - .0948 = .1663$.
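A proportion of .12 in a sample of 50 corresponds to exactly 6 successes, so the normal estimate can be set beside the exact binomial probability $P(6)$. The sketch below is illustrative and uses unrounded Z scores, so its normal estimate differs slightly from .1663.

```python
import math
from math import comb
from statistics import NormalDist

n, pi_pop = 50, 0.10
se = math.sqrt(pi_pop * (1 - pi_pop) / n)
std_normal = NormalDist()

# Real limits of the .12 interval: .12 - .5/n and .12 + .5/n.
lower_limit = 0.12 - 0.5 / n
upper_limit = 0.12 + 0.5 / n
z_lower = (lower_limit - pi_pop) / se
z_upper = (upper_limit - pi_pop) / se
normal_estimate = std_normal.cdf(z_upper) - std_normal.cdf(z_lower)

# Exact binomial probability of 6 successes in 50 trials.
exact = comb(n, 6) * pi_pop**6 * (1 - pi_pop)**(n - 6)

print(round(normal_estimate, 4), round(exact, 4))
```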

Example

Approximately 16 percent of men in the United States aged 60 to 64 who exhibit a particular risk profile will have a heart attack in the next 10 years. If a random sample of 300 such men is observed over the next 10 years, what is the probability that less than 5% will experience a heart attack?

Solution

Because the problem specifies that less than 5% will experience a heart attack, the lower real limit of the five percent interval will be used. This limit is $.05 - .5/300 = .048$. The Z score is then

$$Z = \frac{.048 - .16}{\sqrt{\dfrac{(.16)(.84)}{300}}} = \frac{-.112}{.021} = -5.33.$$

The normal curve table does not contain Z values of this magnitude, but it can be safely concluded that the probability is less than .0002. (This is the tail area associated with Z = 3.50, which is the most extreme score in the table.)
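Software is not limited by the range of a printed table, so the tail area can be computed directly. In this sketch (variable names are illustrative) no intermediate values are rounded, which gives a Z of about -5.3 rather than exactly -5.33; either way the probability is far below .0002.

```python
import math
from statistics import NormalDist

n, pi_pop = 300, 0.16
se = math.sqrt(pi_pop * (1 - pi_pop) / n)

# "Less than 5%" ends at the lower real limit of the .05 interval.
lower_real_limit = 0.05 - 0.5 / n
z = (lower_real_limit - pi_pop) / se
lower_tail = NormalDist().cdf(z)  # area below z

print(round(z, 2), f"{lower_tail:.2e}")
```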

Example

Suppose it is believed that a large community is evenly divided in its opinion as to whether a cap should be placed on the amount that can be recovered in medical malpractice lawsuits. If this supposition is correct, what is the probability that a random poll of 200 community members will produce 55 percent or more favorable responses? Compute the probability with and without continuity correction.

Solution

The continuity correction is $.5/200 = .0025$. Because the task is to find the probability that 55 percent or more will be favorable, the lower real limit of the 55 percent category, or $.55 - .0025 = .5475$, will be used. Because the community is assumed evenly divided, the proportion favorable in the population is taken to be .50. The Z score is then

$$Z = \frac{.5475 - .50}{\sqrt{\dfrac{(.50)(.50)}{200}}} = \frac{.0475}{.035} = 1.36.$$

The area above 1.36 is .0869. Without continuity correction the Z score is $.05/.035 = 1.43$, which has an upper tail area of .0764. The probability as computed by the binomial equation is .0895.
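The three figures can be compared side by side in code. This is a sketch with illustrative variable names. It keeps the standard error at full precision (about .03536) instead of rounding it to .035, so the Z scores come out near 1.34 and 1.41 rather than 1.36 and 1.43, and the tail areas shift slightly. The corrected estimate still lands much closer to the exact binomial value than the uncorrected one.

```python
import math
from math import comb
from statistics import NormalDist

n, pi_pop = 200, 0.50
se = math.sqrt(pi_pop * (1 - pi_pop) / n)
std_normal = NormalDist()

# With continuity correction: start at the lower real limit of .55.
z_corrected = (0.55 - 0.5 / n - pi_pop) / se
p_corrected = 1 - std_normal.cdf(z_corrected)

# Without continuity correction: use .55 itself.
z_uncorrected = (0.55 - pi_pop) / se
p_uncorrected = 1 - std_normal.cdf(z_uncorrected)

# Exact binomial: 55 percent of 200 is 110, so sum P(110) through P(200).
# With pi = .5 every outcome has probability C(n, y) / 2**n.
p_exact = sum(comb(n, y) for y in range(110, n + 1)) / 2**n

print(round(p_corrected, 4), round(p_uncorrected, 4), round(p_exact, 4))
```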