Lecture 5: Sampling Distributions Taeyong Park Washington University in St. Louis February 15, 2017 Park (Wash U.) U25 PS323 Intro to Quantitative Methods February 15, 2017 1 / 23
Today...
Review of normal distributions and the standard normal distribution.
Sampling distribution.
Lab: Review the online assignment; generating random numbers; normal distribution; central limit theorem.
Problem set 1 will be assigned after class. It covers lectures 1 through 5 and is due at the beginning of next class. Submit a hard copy for the first part and upload the .r file to Blackboard for the second part.
The weekly online assignment will also be assigned.
Normal Distribution
For a normal distribution with µ = 45 and σ = 5, find the probability that an observation falls:
Above the value of 35
Below the value of 40
Between the values of 45 and 55
Between the values of 30 and 40
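The four probabilities in this exercise can be checked numerically. The following is a minimal sketch using `statistics.NormalDist` from Python's standard library; the variable names are illustrative and not part of the lecture.

```python
from statistics import NormalDist

# Normal distribution from the exercise: mean 45, standard deviation 5.
dist = NormalDist(mu=45, sigma=5)

p_above_35 = 1 - dist.cdf(35)              # P(Y > 35)
p_below_40 = dist.cdf(40)                  # P(Y < 40)
p_45_to_55 = dist.cdf(55) - dist.cdf(45)   # P(45 < Y < 55)
p_30_to_40 = dist.cdf(40) - dist.cdf(30)   # P(30 < Y < 40)

results = {
    "Above 35": p_above_35,
    "Below 40": p_below_40,
    "Between 45 and 55": p_45_to_55,
    "Between 30 and 40": p_30_to_40,
}
for label, p in results.items():
    print(f"{label}: {p:.4f}")
```

The results are roughly 0.9772, 0.1587, 0.4772, and 0.1573, which correspond to z-table lookups at z = -2, z = -1, z from 0 to 2, and z from -3 to -1.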
Normal Distribution - Z-score
z-score: How many standard deviations is my value away from the mean?
$z = \frac{y - \mu}{\sigma}$
Example: $z = \frac{56 - 50}{3} = 2$
Example: $z = \frac{44 - 40}{2} = 2$
The Standard Normal Distribution and Z-score
The standard normal distribution: the normal distribution with mean µ = 0 and standard deviation σ = 1.
z-score and the standard normal distribution: if a variable has a normal distribution, and if its values are converted to z-scores by subtracting the mean and dividing by the standard deviation, then the z-scores have the standard normal distribution.
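To illustrate the standardization claim, the sketch below converts a value to a z-score and checks that the probability under the original normal distribution equals the probability under the standard normal. The helper `z_score` and the example values (µ = 45, σ = 5, y = 40) are assumptions chosen for illustration.

```python
from statistics import NormalDist

def z_score(y, mu, sigma):
    """Number of standard deviations that y lies from the mean mu."""
    return (y - mu) / sigma

mu, sigma = 45, 5   # assumed example values
y = 40

z = z_score(y, mu, sigma)                    # (40 - 45) / 5
p_original = NormalDist(mu, sigma).cdf(y)    # P(Y < 40) under N(45, 5)
p_standard = NormalDist(0, 1).cdf(z)         # P(Z < z) under the standard normal

print(z, p_original, p_standard)
```

Both probabilities agree, which is why a single z-table is enough for any normal distribution.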
z-table
Learning goals
Sampling Distribution
Central Limit Theorem
Probability: Why do we care?
Making a prediction using sample data
Population → sample?: Straightforward
We know that 50% of all California voters (7 million) vote Republican. What will be the proportion voting Republican in a sample of 2,705?
Sample → population?
We don't know about all California voters. What will be the proportion voting Republican among all California voters, given a result from a sample of 2,705?
Making a prediction using sample data
Sample → population?
We don't know about all California voters. What will be the proportion voting Republican among all California voters, given a result from the sample of 2,705 (56.5%)?
Instead, suppose only half the population voted for S. Would it then be surprising that 56.5% of the sampled individuals voted for him? If that would be very unlikely, we can infer that S will win.
What if we suppose only 40% voted for S?
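One way to make "would it be surprising?" concrete is to compute how often a sample of 2,705 would show at least 56.5% support if the true share were 50% (or 40%). The sketch below assumes simple random sampling, so the number of S voters in the sample is treated as binomial; that modeling assumption and the helper `binom_tail` are illustrative and not taken from the lecture.

```python
import math

def binom_tail(n, k_min, p):
    """P(X >= k_min) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    log_p, log_q = math.log(p), math.log(1 - p)
    total = 0.0
    for k in range(k_min, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return total

n = 2705
k_obs = math.ceil(0.565 * n)   # smallest count giving a sample share of at least 56.5%

tail_if_half = binom_tail(n, k_obs, 0.5)   # true share assumed to be 50%
tail_if_40 = binom_tail(n, k_obs, 0.4)     # true share assumed to be 40%

print(f"P(sample share >= 56.5% | true share 50%) = {tail_if_half:.3g}")
print(f"P(sample share >= 56.5% | true share 40%) = {tail_if_40:.3g}")
```

Under these assumptions both probabilities are extremely small, so a 56.5% sample result would be very surprising if S had only half (or less) of the population vote.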
Three types of distributions
Population distribution
Sample data distribution
Sampling distribution
Population distribution: Example
The distribution from which we select the sample.
Sample data distribution: Example
The distribution of data we actually observe.
Sampling distribution
A sampling distribution is the distribution of a statistic given repeated sampling.
Repeated sampling → probabilities for the possible values the statistic can take.
Example: a sampling distribution of a sample mean.
A sampling distribution specifies probabilities not for individual observations but for possible values of a statistic computed from the observations.
Sampling distribution: Example
An example of a sampling distribution?
Population: American voters
Several surveys - several samples
Statistic: proportion of respondents that voted for Obama
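The idea of "several surveys, one statistic per survey" can be mimicked by simulation. The sketch below is only an illustration: the true population share (50%), the survey size (400), and the number of surveys (1,000) are hypothetical values, not figures from the lecture.

```python
import random
import statistics

rng = random.Random(323)   # fixed seed so the run is reproducible

true_share = 0.50          # hypothetical population proportion (assumed)
n = 400                    # hypothetical number of respondents per survey
num_surveys = 1000         # hypothetical number of repeated surveys

def survey_share(n, p, rng):
    """Proportion of 'yes' answers in one simulated survey of n respondents."""
    votes = sum(rng.random() < p for _ in range(n))
    return votes / n

# One statistic per survey: together these values approximate the sampling distribution.
shares = [survey_share(n, true_share, rng) for _ in range(num_surveys)]

print("mean of the sample proportions:", statistics.mean(shares))
print("std. dev. of the sample proportions:", statistics.stdev(shares))
```

A histogram of `shares` would play the role of the sampling distribution: it describes how the statistic varies across surveys, not how individual respondents vary.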
Sampling distribution: Example
[Figure: density of the sample percentage of Obama voters; x-axis: Percentage of Obama Voters (47 to 53), y-axis: Density (0.00 to 0.35)]
Sampling distribution: What is the point?
In most cases we do not have several samples. However...
The form of sampling distributions is often known theoretically.
We can then derive a distribution of the sample statistic for one sample of the given size n.
This allows us to make inferences about population parameters.
The sample mean is the most frequently used statistic.
We derive the sampling distribution of the sample mean to make inferences, assuming repeated sampling.
Sampling distribution of the sample mean
The sample mean of sample data $y = \{y_1, y_2, \ldots, y_n\}$: $\bar{y}$.
Assuming repeated sampling:
$y_1 = \{y_{11}, y_{12}, \ldots, y_{1n}\}$
$y_2 = \{y_{21}, y_{22}, \ldots, y_{2n}\}$
...
$y_k = \{y_{k1}, y_{k2}, \ldots, y_{kn}\}$
Mean of $\bar{y}$: The mean of the sampling distribution of $\bar{y}$ equals the population mean given repeated sampling.
Standard error of $\bar{y}$: The standard deviation of the sampling distribution of $\bar{y}$, denoted by $\sigma_{\bar{y}}$. $\sigma_{\bar{y}}$ describes how $\bar{y}$ varies from sample to sample.
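The statement about the mean of the sampling distribution follows from linearity of expectation. A short derivation, assuming each observation $y_i$ is drawn from a population with mean $\mu$:

```latex
\begin{aligned}
E(\bar{y}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(y_i)
            = \frac{1}{n}\cdot n\mu
            = \mu
\end{aligned}
```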
Sampling distribution of the sample mean
In practice, we don't need to take samples repeatedly to find $\sigma_{\bar{y}}$. Instead, use the following formula:
$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$, where σ is the population standard deviation and n is the sample size.
Suppose a population has σ = 10 and the sample size is 100. Then $\sigma_{\bar{y}} = \frac{10}{\sqrt{100}} = \frac{10}{10} = 1$.
Individual observations tend to vary much more than sample means vary from sample to sample.
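The standard error formula can be compared against brute-force repeated sampling. The sketch below uses σ = 10 and n = 100 from the example; the population mean (50), the normal population shape, and the number of repeated samples (2,000) are assumptions made only for the simulation.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

rng = random.Random(5)       # fixed seed so the run is reproducible
mu, sigma, n = 50, 10, 100   # mu = 50 is an assumed value; sigma and n follow the example
num_samples = 2000           # assumed number of repeated samples

# Draw many samples of size n and record each sample mean.
sample_means = [
    statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(num_samples)
]

formula_se = standard_error(sigma, n)
simulated_se = statistics.stdev(sample_means)

print("formula:", formula_se)
print("simulation:", simulated_se)
```

The simulated spread of the sample means should land close to the formula value of 1, which is far smaller than the spread of individual observations (σ = 10).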
Central limit theorem
For random sampling with a large sample size n, the sampling distribution of the sample mean $\bar{y}$ is approximately normal.
The mean of the distribution is equal to the population mean µ.
The standard deviation of the distribution is equal to $\frac{\sigma}{\sqrt{n}}$.
$\bar{y} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
Central limit theorem: Some notes
The approximate normality of the sampling distribution applies no matter what the shape of the population distribution. Remarkable!
Even if the population distribution is U-shaped, highly discrete, or highly skewed.
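A quick simulation can illustrate this with a highly skewed population. The sketch below assumes an exponential population with rate 1 (mean 1, standard deviation 1), a sample size of 30, and 5,000 repeated samples; all of these are illustrative choices. If the sample means are approximately normal, about 95% of them should fall within two standard errors of µ.

```python
import math
import random
import statistics

rng = random.Random(2017)   # fixed seed so the run is reproducible

n = 30                      # assumed sample size
num_samples = 5000          # assumed number of repeated samples
mu = sigma = 1.0            # an exponential population with rate 1 has mean 1 and sd 1

# Each entry is the mean of one sample drawn from the skewed population.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

se = sigma / math.sqrt(n)
share_within_2se = sum(abs(m - mu) <= 2 * se for m in sample_means) / num_samples

print("mean of sample means:", statistics.fmean(sample_means))
print("share within 2 standard errors:", share_within_2se)
```

Even though individual draws are strongly right-skewed, the share of sample means within two standard errors should come out near the normal benchmark of about 95%.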
It works for EVERYTHING
Central limit theorem: Some notes
If Y is normal, the CLT applies for all n. Otherwise, you need a large enough sample. Usually n = 30 is good enough, but it will depend on the distribution.
As n → ∞, the standard error gets smaller and smaller.
Knowing that the sampling distribution of $\bar{y}$ is approximately normal helps us find probabilities for possible values of $\bar{y}$. For instance, $\bar{y}$ almost certainly falls within $3\sigma_{\bar{y}} = \frac{3\sigma}{\sqrt{n}}$ of µ.
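The "almost certainly within three standard errors" claim and the shrinking standard error can both be checked with a few lines. This is a minimal sketch; σ = 10 and the sample sizes 25, 100, and 400 are assumed values for illustration.

```python
import math
from statistics import NormalDist

# Probability that a normal variable falls within 3 standard deviations of its mean.
std_normal = NormalDist()
p_within_3 = std_normal.cdf(3) - std_normal.cdf(-3)
print(f"P(|Z| <= 3) = {p_within_3:.4f}")

# Standard error sigma / sqrt(n) for growing sample sizes.
sigma = 10   # assumed population standard deviation
standard_errors = {n: sigma / math.sqrt(n) for n in (25, 100, 400)}
for n, se in standard_errors.items():
    print(f"n = {n}: standard error = {se}, 3 standard errors = {3 * se}")
```

The probability is about 0.9973, and quadrupling n halves the standard error (2, 1, 0.5 here).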