Chapter Four: Introduction To Inference

4.1 Introduction

In this chapter you will learn the rationale underlying inference. You will also learn to apply certain inferential techniques. The methods introduced in this chapter are not commonly employed in research but are important pedagogically. They are relatively simple, and their mastery will open the way for understanding the more complex methods dealt with in the following chapters.

The techniques you will learn may be divided into two broad categories.

1. Tests of hypotheses.
2. Confidence intervals.

Before you can begin their study you must understand the concept of sampling distributions.

4.2 Sampling Distributions

Definition. A sampling distribution is a distribution of sample statistics obtained from samples repeatedly drawn from one or more populations.

Sampling Distribution of $\bar{x}$

The sampling distribution of $\bar{x}$ can be formed by taking repeated samples of the same size from some population, calculating $\bar{x}$ for each sample, and forming the resultant sample means into a relative frequency distribution.

Characteristics of the Sampling Distribution of $\bar{x}$

The following characteristics of the sampling distribution of $\bar{x}$ should be noted.

1. The mean of the sampling distribution is equal to the mean of the population from which the samples were drawn.
2. The mean of the sampling distribution of a statistic is referred to as the expected value of the statistic and is symbolized by $E[\,\cdot\,]$, where the brackets contain an identifier of the statistic.

3. $E[\bar{x}] = \mu$. This is a restatement of characteristic (1) given above.
4. The standard deviation of the sampling distribution of $\bar{x}$ is termed the standard error of the mean and is symbolized by $\sigma_{\bar{x}}$.
5. $\sigma_{\bar{x}} = \sigma / \sqrt{n}$, where $\sigma$ is the population standard deviation and $n$ is the sample size.

6. When the population from which samples are drawn is normally distributed, the sampling distribution of $\bar{x}$ will also be normally distributed.
7. When the population from which samples are drawn is not normally distributed, the sampling distribution of $\bar{x}$ will approach normality as sample size ($n$) increases. This is an expression of the central limit theorem.
8. Roughly speaking, the central limit theorem states that the sampling distributions of certain classes of statistics will approach normality as sample size ($n$) increases, regardless of the shape of the sampled population.
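The characteristics listed above can be checked empirically. The following is a minimal simulation sketch and not part of the original material: the exponential population (mean 1, standard deviation 1), the sample size, the number of samples and the seed are all assumptions chosen here for illustration.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the run is repeatable

n = 30                # sample size (illustrative choice)
num_samples = 10_000  # number of repeated samples (illustrative choice)

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
# It is strongly skewed, i.e. clearly not normal.
mu, sigma = 1.0, 1.0

# Draw repeated samples and record the mean of each one.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)  # expected to be near mu
sd_of_means = statistics.stdev(sample_means)    # expected to be near sigma / sqrt(n)
theoretical_se = sigma / n ** 0.5

print(f"mean of sample means: {mean_of_means:.3f} (mu = {mu})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
```

A histogram of `sample_means` would be expected to look far more symmetric than the skewed population itself, in line with characteristic 7.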

Example

Given a population with standard deviation 5.293, find the standard deviations of the sampling distributions generated from this population when samples are of sizes 10, 30 and 50.

Solution

Using equation 4.1 on page 76, we calculate the standard errors of the mean for sample sizes 10, 30 and 50 as follows:

\[ \frac{5.293}{\sqrt{10}} = 1.67, \qquad \frac{5.293}{\sqrt{30}} = .97, \qquad \frac{5.293}{\sqrt{50}} = .75. \]
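The same arithmetic can be reproduced in a few lines. This is a small sketch; the function name `standard_error` is chosen here for illustration and is not from the text.

```python
import math

sigma = 5.293  # population standard deviation from the example


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


standard_errors = {n: standard_error(sigma, n) for n in (10, 30, 50)}
for n, se in standard_errors.items():
    print(f"n = {n}: standard error = {se:.2f}")
```

Note how the standard error shrinks as the sample size grows, but only in proportion to the square root of $n$.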

Using The Normal Curve

Just as you used the normal curve model to estimate probabilities associated with the selection of a single observation from a population, so too you can use this model to estimate probabilities associated with the means of samples selected from a population.

Z Score

Z scores associated with sample means are calculated as follows.

\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]

Example

Given a population with mean 110.023 and standard deviation 4.970, estimate the probability of randomly selecting a sample of 15 observations and finding that the mean of the sample is greater than 111.

Solution

The Z score for a mean of 111.0 is

\[ Z = \frac{111.0 - 110.023}{4.970 / \sqrt{15}} = .76. \]

The associated tail area is .2236, which is our estimated probability.
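As a cross-check, the tail area can be computed directly instead of being read from a table. This is a sketch using Python's standard `statistics.NormalDist`; because Z is not rounded to two decimals first, the result may differ from the table value in the third decimal place.

```python
import math
from statistics import NormalDist

mu, sigma, n = 110.023, 4.970, 15  # values from the example
x_bar = 111.0

# Z score of the sample mean, using the standard error sigma / sqrt(n).
z = (x_bar - mu) / (sigma / math.sqrt(n))

# Area under the standard normal curve above z.
upper_tail = 1 - NormalDist().cdf(z)

print(f"Z = {z:.2f}, upper tail area = {upper_tail:.4f}")
```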

Example

Suppose 100 observations are randomly selected from a population whose mean and standard deviation are 100 and 20, respectively. What is the probability that the mean of these observations will be between 99 and 103?

Solution

The area of a normal curve with mean 100 and standard deviation $20/\sqrt{100}$ that lies between 99 and 103 is the sum of the areas between 99 and 100 and between 100 and 103. The Z score and area for the interval between 99 and 100 are, respectively,

\[ Z = \frac{99.0 - 100.0}{20.0 / \sqrt{100}} = -.50 \quad \text{and} \quad .1915. \]

The same values for the interval between 100 and 103 are

\[ Z = \frac{103.0 - 100.0}{20.0 / \sqrt{100}} = 1.50 \quad \text{and} \quad .4332. \]

The probability estimate is then .1915 + .4332 = .6247.
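The two-step table lookup can also be expressed as a single difference of cumulative areas. The sketch below models the sampling distribution of the mean directly as a normal distribution with standard deviation $\sigma/\sqrt{n}$; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100, 20, 100  # values from the example

# Sampling distribution of the mean: normal with standard error sigma / sqrt(n).
sampling_dist = NormalDist(mu, sigma / math.sqrt(n))

# P(99 < x_bar < 103) as a difference of cumulative probabilities.
prob_between = sampling_dist.cdf(103) - sampling_dist.cdf(99)

print(f"P(99 < mean < 103) = {prob_between:.4f}")
```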

Distribution Of $\hat{p}$

A dichotomous population is one whose members are classified on some two-valued characteristic such as lived/died, tumor remission/no tumor remission, pain/no pain, etc. Traditionally, when speaking in a general sense, one of the two outcomes is termed success and the other failure.

If the members of the population with the characteristic success are assigned the number one and those with the characteristic failure are assigned zero, then the mean of the population is the sum of the ones and zeros divided by the total number of observations in the population. This mean is also the proportion of successes in the population.

We designate the proportion of successes in the population as $\pi$ and the proportion in a sample drawn from the population as $\hat{p}$.

It can be shown that the standard deviation of the sampling distribution of $\hat{p}$, termed the standard error of $\hat{p}$, is given by

\[ \sigma_{\hat{p}} = \sqrt{\frac{\pi (1 - \pi)}{n}} \]

Example

Given a dichotomous population where the proportion of successes is .10, find the standard deviation of the sampling distribution of $\hat{p}$ if sample size is 5. Recalculate the standard error assuming samples of size 50.

Solution

The standard error of $\hat{p}$ for samples of size 5 is

\[ \sigma_{\hat{p}} = \sqrt{\frac{\pi (1 - \pi)}{n}} = \sqrt{\frac{(.10)(.90)}{5}} = .134. \]

The standard error of $\hat{p}$ for samples of size 50 is

\[ \sigma_{\hat{p}} = \sqrt{\frac{(.10)(.90)}{50}} = .042. \]
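A short sketch of the same calculation follows; the helper name `standard_error_of_proportion` is an illustrative choice, not a name from the text.

```python
import math


def standard_error_of_proportion(pi, n):
    """Standard error of p-hat: sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi * (1 - pi) / n)


pi = 0.10  # population proportion of successes from the example
se_n5 = standard_error_of_proportion(pi, 5)
se_n50 = standard_error_of_proportion(pi, 50)

print(f"n = 5:  {se_n5:.3f}")
print(f"n = 50: {se_n50:.3f}")
```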

The Binomial Distribution

If the population is large and certain other conditions are met, the binomial distribution can be used to model the sampling distribution of $\hat{p}$.

The binomial distribution is generated by the equation

\[ P(y) = \frac{n!}{y!\,(n - y)!}\, \pi^{y} (1 - \pi)^{n - y} \]

where $P(y)$ is the probability of $y$ successes in a sample of size $n$ taken from a population where the proportion of successes is $\pi$.

Example

Calculate the sampling distribution of $\hat{p}$ for samples of size 5 drawn from a population in which the proportion of successes is .10.

Solution

\[ P(0) = \frac{5!}{0!\,(5-0)!}\,(.10)^{0} (1 - .10)^{5-0} = \frac{5!}{0!\,5!}\,(.10)^{0} (.90)^{5} = (.90)^{5} = .59049 \]

\[ P(1) = \frac{5!}{1!\,(5-1)!}\,(.10)^{1} (1 - .10)^{5-1} = \frac{5 \cdot 4!}{1!\,4!}\,(.10)^{1} (.90)^{4} = (5)(.10)(.6561) = .32805 \]

\[ P(2) = \frac{5!}{2!\,(5-2)!}\,(.10)^{2} (1 - .10)^{5-2} = \frac{5 \cdot 4 \cdot 3!}{2!\,3!}\,(.10)^{2} (.90)^{3} = (10)(.01)(.729) = .0729 \]

\[ P(3) = \frac{5!}{3!\,(5-3)!}\,(.10)^{3} (1 - .10)^{5-3} = \frac{5 \cdot 4 \cdot 3 \cdot 2!}{3!\,2!}\,(.10)^{3} (.90)^{2} = (10)(.001)(.81) = .0081 \]

\[ P(4) = \frac{5!}{4!\,(5-4)!}\,(.10)^{4} (1 - .10)^{5-4} = \frac{5 \cdot 4!}{4!\,1!}\,(.10)^{4} (.90)^{1} = (5)(.0001)(.90) = .00045 \]

\[ P(5) = \frac{5!}{5!\,(5-5)!}\,(.10)^{5} (1 - .10)^{5-5} = \frac{5!}{5!\,0!}\,(.10)^{5} (.90)^{0} = (.10)^{5} = .00001 \]

Table: Sampling distribution of $\hat{p}$ for $n = 5$ and $\pi = .10$.

Proportion    Number of successes    Probability
$\hat{p}$     $y$                    $P(y)$
 .00          0                      .59049
 .20          1                      .32805
 .40          2                      .07290
 .60          3                      .00810
 .80          4                      .00045
1.00          5                      .00001
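The whole sampling distribution can be generated programmatically from the binomial equation. The sketch below uses `math.comb` for the binomial coefficient; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(y, n, pi):
    """Probability of exactly y successes in n trials with success proportion pi."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)


n, pi = 5, 0.10  # values from the example
distribution = {y: binomial_pmf(y, n, pi) for y in range(n + 1)}

for y, prob in distribution.items():
    print(f"p_hat = {y / n:.2f}   y = {y}   P(y) = {prob:.5f}")
```

Because the six outcomes exhaust all possibilities, the probabilities sum to one, which is a useful check on hand calculations.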

Example

Given that 10% of the residents of the United States would test positive for a certain antibody, what is the probability of randomly selecting five residents of the United States and finding that

1. all five test positive for the antibody?
2. at least four (i.e., four or more) test positive?
3. at least one tests positive?

Solution

The required probabilities come from the sampling distribution of $\hat{p}$ for $n = 5$ and $\pi = .10$ tabulated previously.

The probability that all five residents test positive is $P(5) = .00001$.

The probability that at least four test positive is $P(4) + P(5) = .00045 + .00001 = .00046$.

The probability that at least one tests positive is $P(1) + P(2) + P(3) + P(4) + P(5) = 1 - P(0) = 1 - .59049 = .40951$.
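These three answers are sums over parts of the binomial distribution, which can be sketched as follows. The helper `binomial_pmf` is an illustrative name; the "at least one" case uses the complement $1 - P(0)$ exactly as in the solution.

```python
from math import comb


def binomial_pmf(y, n, pi):
    """Probability of exactly y successes in n trials with success proportion pi."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)


n, pi = 5, 0.10  # five residents, 10% positive in the population

p_all_five = binomial_pmf(5, n, pi)
p_at_least_four = binomial_pmf(4, n, pi) + binomial_pmf(5, n, pi)
p_at_least_one = 1 - binomial_pmf(0, n, pi)  # complement of "none positive"

print(f"all five:      {p_all_five:.5f}")
print(f"at least four: {p_at_least_four:.5f}")
print(f"at least one:  {p_at_least_one:.5f}")
```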

Example

A researcher believes that the proportion of blood donors in Iceland with type O positive blood is greater than .38, which is the proportion in the US. If the researcher assesses the blood types of 10 randomly selected donors in Iceland, what is the probability that 9 or 10 of the selected donors will have this blood type if the proportion is .38? If the number of subjects with type O positive blood is in fact 9 or 10, what implications would this have for the researcher's belief?

Solution

Given a population proportion of .38, the probability that the sample will contain 9 or 10 donors with type O positive blood is $P(9) + P(10)$.

\[ P(9) = \frac{10!}{9!\,(10-9)!}\,(.38)^{9} (1 - .38)^{10-9} = \frac{10 \cdot 9!}{9!\,1!}\,(.38)^{9} (.62)^{1} = (10)(.00017)(.62) = .00105 \]

\[ P(10) = (.38)^{10} = .00006 \]

The probability that 9 or 10 donors in the sample will have type O positive blood is then .00105 + .00006 = .00111. If the number of donors in the sample with type O positive blood is 9 or 10, the researcher's theory is supported because the probability of achieving such a result from a population where the proportion is .38 is so small. It is likely, though not proven, that the proportion of type O positives in the Icelandic blood donor population is greater than .38.
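The tail probability in this example can be recomputed without intermediate rounding. In the sketch below (helper name illustrative), the result comes out near .00109 rather than .00111, because the hand calculation rounds $(.38)^{9}$ to .00017 before multiplying; the conclusion is unaffected.

```python
from math import comb


def binomial_pmf(y, n, pi):
    """Probability of exactly y successes in n trials with success proportion pi."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)


n, pi = 10, 0.38  # ten donors, hypothesized proportion .38

# Probability of observing 9 or 10 type O positive donors if pi really is .38.
p_9_or_10 = binomial_pmf(9, n, pi) + binomial_pmf(10, n, pi)

print(f"P(9) + P(10) = {p_9_or_10:.5f}")
```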

Normal Curve Approximation

When sample size is sufficiently large, the normal curve can be used to approximate the sampling distribution of $\hat{p}$. The question of how large a sample must be in order to obtain an adequate approximation cannot be answered definitively. An often used rule of thumb states that the normal curve approximation will be satisfactory so long as both $n\pi$ and $n(1 - \pi)$ are greater than or equal to five, though some authors maintain that these values should be greater than or equal to 10.

The normal curve model is used to approximate probabilities associated with the distribution of $\hat{p}$ by means of the following equation.

\[ Z = \frac{\hat{p} - \pi}{\sqrt{\frac{\pi (1 - \pi)}{n}}} \]

where $\hat{p}$ is the sample proportion of successes, $\pi$ is the population proportion and $n$ is the sample size.

Example

Suppose a random sample of 50 observations is taken from a dichotomous population in which the proportion of successes is .10. What is the probability that the proportion of successes in the sample will be greater than .12?

Solution

The estimated probability will be the area under a normal curve with mean .10 that lies above .13. Because the proportion of successes can only take the values .00, .02, .04, ..., .12, .14, ..., 1.00, the upper real limit of the .12 interval (i.e., .13) is used rather than .12. The upper limit is employed because the problem is to find the probability that the proportion of successes is greater than .12. The lower limit would have been used if the problem required the probability of obtaining a proportion of .12 or greater.

Upper and lower real limits of binomial proportions can be computed directly by adding and subtracting $.5/n$. For the present case the upper real limit is $.12 + .5/50 = .13$. Using upper and lower limits in this fashion when using a continuous distribution to approximate probabilities associated with a discrete variable is an example of what is referred to as a continuity correction.

We now calculate

\[ Z = \frac{\hat{p} - \pi}{\sqrt{\frac{\pi (1 - \pi)}{n}}} = \frac{.13 - .10}{\sqrt{\frac{(.10)(.90)}{50}}} = \frac{.03}{.0424} = .71. \]

Reference to the normal curve table in Appendix A gives an associated area of .2389. The value as calculated by the binomial method is $P(7) + P(8) + \cdots + P(50) = .2298$.
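The comparison between the continuity-corrected normal approximation and the exact binomial sum can be sketched as follows. Variable names are illustrative, and because Z is not rounded to .71 before the area is found, the approximate value printed may differ slightly from the table-based .2389.

```python
import math
from math import comb
from statistics import NormalDist

n, pi = 50, 0.10  # values from the example
standard_error = math.sqrt(pi * (1 - pi) / n)

# "Greater than .12" -> use the upper real limit of the .12 interval.
upper_real_limit = 0.12 + 0.5 / n
z = (upper_real_limit - pi) / standard_error
normal_approx = 1 - NormalDist().cdf(z)

# Exact binomial: more than 6 successes out of 50, i.e. y = 7, ..., 50.
exact = sum(comb(n, y) * pi**y * (1 - pi) ** (n - y) for y in range(7, n + 1))

print(f"Z = {z:.2f}")
print(f"normal approximation: {normal_approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```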

Example

Suppose a random sample of 50 observations is taken from a dichotomous population in which the proportion of successes is .10. What is the probability that the proportion of successes in the sample will be .12?

Solution

The estimate will be the area between the lower real limit of .11 and the upper real limit of .13. As calculated previously, the Z score for .13 is .71, while that for .11 is

\[ Z = \frac{.11 - .10}{\sqrt{\frac{(.10)(.90)}{50}}} = \frac{.01}{.0424} = .24. \]

Using these values in the normal curve table shows that the areas between .13 and .10 and between .11 and .10 are .2611 and .0948, respectively. The area between .11 and .13 is then $.2611 - .0948 = .1663$.
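For a single value of $\hat{p}$, the approximation is the area between the two real limits, and the exact answer is a single binomial term. The sketch below compares the two; the exact figure is not given in the text and is computed here only as a cross-check, and the unrounded Z scores make the approximate area differ slightly from .1663.

```python
import math
from math import comb
from statistics import NormalDist

n, pi = 50, 0.10  # values from the example
p_hat = 0.12
standard_error = math.sqrt(pi * (1 - pi) / n)

# Real limits of the .12 interval: .12 -/+ .5/n.
lower_limit = p_hat - 0.5 / n
upper_limit = p_hat + 0.5 / n

std_normal = NormalDist()
normal_approx = (
    std_normal.cdf((upper_limit - pi) / standard_error)
    - std_normal.cdf((lower_limit - pi) / standard_error)
)

# Exact binomial probability of exactly 6 successes (6/50 = .12).
y = round(p_hat * n)
exact = comb(n, y) * pi**y * (1 - pi) ** (n - y)

print(f"normal approximation: {normal_approx:.4f}")
print(f"exact binomial P({y}): {exact:.4f}")
```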

Example

Approximately 16 percent of men in the United States aged 60 to 64 who exhibit a particular risk profile will have a heart attack in the next 10 years. If a random sample of 300 such men is observed over the next 10 years, what is the probability that fewer than 5% will experience a heart attack?

Solution

Because the problem specifies that less than 5% will experience a heart attack, the lower real limit of the five percent interval will be used. This limit is $.05 - .5/300 = .048$. The Z score is then

\[ Z = \frac{.048 - .16}{\sqrt{\frac{(.16)(.84)}{300}}} = \frac{-.112}{.021} = -5.33. \]

The normal curve table does not contain Z values of this magnitude, but it can be safely concluded that the probability is less than .0002. (This is the tail area associated with $Z = 3.50$, which is the most extreme score in the table.)
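Software is not limited by the range of a printed table, so the conclusion can be checked directly. In this sketch (variable names illustrative) no intermediate rounding is applied, so Z comes out near $-5.3$ rather than exactly $-5.33$; both the normal approximation and the exact binomial sum fall far below the .0002 bound.

```python
import math
from math import comb
from statistics import NormalDist

n, pi = 300, 0.16  # values from the example

# "Less than 5%" -> lower real limit of the .05 interval.
lower_real_limit = 0.05 - 0.5 / n
standard_error = math.sqrt(pi * (1 - pi) / n)
z = (lower_real_limit - pi) / standard_error
normal_approx = NormalDist().cdf(z)  # area below z

# Exact binomial: fewer than 15 heart attacks out of 300 (15/300 = .05).
exact = sum(comb(n, y) * pi**y * (1 - pi) ** (n - y) for y in range(0, 15))

print(f"Z = {z:.2f}")
print(f"normal approximation: {normal_approx:.2e}")
print(f"exact binomial:       {exact:.2e}")
```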

Example

Suppose it is believed that a large community is evenly divided in its opinion as to whether a cap should be placed on the amount that can be recovered in medical malpractice lawsuits. If this supposition is correct, what is the probability that a random poll of 200 community members will produce 55 percent or more favorable responses? Compute the probability with and without continuity correction.

Solution

The continuity correction is $.5/200 = .0025$. Because the task is to find the probability that 55 percent or more will be favorable, the lower real limit of the 55 percent category, $.55 - .0025 = .5475$, will be used. Because the community is assumed to be evenly divided, the proportion favorable in the population is taken to be .50. The Z score is then

\[ Z = \frac{.5475 - .50}{\sqrt{\frac{(.50)(.50)}{200}}} = \frac{.0475}{.035} = 1.36. \]

The area above 1.36 is .0869. Without continuity correction the Z score is $.05/.035 = 1.43$, which has an upper tail area of .0764. The probability as computed by the binomial equation is .0895.
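The three figures in this solution can be recomputed side by side. The sketch below (variable names illustrative) keeps full precision in the standard error rather than rounding it to .035, so the Z scores and tail areas it prints differ somewhat from the hand-computed 1.36 and 1.43; the exact binomial value should agree with .0895.

```python
import math
from math import comb
from statistics import NormalDist

n, pi = 200, 0.50  # poll size and assumed population proportion
p_hat = 0.55
standard_error = math.sqrt(pi * (1 - pi) / n)
std_normal = NormalDist()

# With continuity correction: "55 percent or more" -> lower real limit.
z_corrected = (p_hat - 0.5 / n - pi) / standard_error
p_corrected = 1 - std_normal.cdf(z_corrected)

# Without continuity correction.
z_uncorrected = (p_hat - pi) / standard_error
p_uncorrected = 1 - std_normal.cdf(z_uncorrected)

# Exact binomial: 110 or more favorable responses out of 200.
p_exact = sum(comb(n, y) * pi**y * (1 - pi) ** (n - y) for y in range(110, n + 1))

print(f"with correction:    Z = {z_corrected:.2f}, p = {p_corrected:.4f}")
print(f"without correction: Z = {z_uncorrected:.2f}, p = {p_uncorrected:.4f}")
print(f"exact binomial:     p = {p_exact:.4f}")
```

In a run of this kind the corrected approximation would be expected to sit closer to the exact binomial value than the uncorrected one, which is the point of the continuity correction.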