Math 140 Introductory Statistics

Let's make our own sampling! If we use a random sample (a survey), or if we randomly assign treatments to subjects (an experiment), we can reach proper, unbiased conclusions. We should work with randomized data to avoid bias. But HOW do we produce, collect, and analyze data?

7.1 Generating sampling distributions. Generate sampling distributions and study: the sample mean; the shape, the center, and the spread; how to draw proper conclusions about what is likely and what is rare; how to relate this to the entire population; how to do this in the easiest way.

Sample vs. Population [diagram: samples of size n drawn from a population]

Our friends at Westvaco. Recall that the people laid off were aged 55, 55, and 64. Is this discrimination or not? We need to compare with RANDOM layoffs of three people.

Perform a random simulation. All possible sets of 3 people chosen from 10: $\binom{10}{3} = 120$. For each of these groups, calculate the average age and create a dot plot. This is your sampling distribution.
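The enumeration described on this slide can be sketched in a few lines of Python. This is only an illustration: the transcript does not list the ten workers' ages, so the `ages` list below is an assumed stand-in (it does include the three laid-off workers aged 55, 55, and 64), and the variable names are hypothetical.

```python
from itertools import combinations
from math import comb
from statistics import mean

# ASSUMPTION: the transcript does not give the ten ages. These values are an
# assumed stand-in that includes the three laid-off workers (55, 55, 64).
ages = [25, 33, 35, 38, 48, 55, 55, 55, 56, 64]

# Every possible set of 3 workers chosen from 10. Workers are chosen by
# position, so the three 55-year-olds count as different people.
groups = list(combinations(ages, 3))
assert len(groups) == comb(10, 3)  # 120 groups

# The sampling distribution: one average age per group.
average_ages = [mean(group) for group in groups]

# How many groups are, on average, at least as old as the actual layoffs?
observed = mean([55, 55, 64])
at_least_as_old = sum(1 for a in average_ages if a >= observed)
proportion = at_least_as_old / len(groups)

print(f"{len(groups)} groups, observed average {observed}")
print(f"{at_least_as_old} groups are at least that old ({proportion:.1%})")
```

Plotting `average_ages` as a dot plot gives the sampling distribution the slide refers to; a small `proportion` is what makes an average of 58 hard to explain by chance.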

Conclusions from Westvaco. The average age of the people who were actually laid off was 58. Common sense: it is rather hard for this to happen by chance. Westvaco has some explaining to do.

Generate a sampling distribution. Simulated sampling distribution: the distribution of a summary statistic obtained from taking repeated random samples.
I. Take a random sample of a fixed size n from a population.
II. Compute a summary statistic for this sample.
III. Repeat steps I and II many times.
IV. Display the distribution of the summary statistic.
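Steps I to IV can be sketched as a small Python function. This is a minimal sketch, not the course's own procedure: the function names, the toy population, and the text histogram used for step IV are all assumptions made for illustration.

```python
import random
from statistics import mean

def simulate_sampling_distribution(population, n, reps, statistic=mean, seed=None):
    """Return `reps` values of `statistic`, one per random sample of size n."""
    rng = random.Random(seed)
    results = []
    for _ in range(reps):                    # step III: repeat many times
        sample = rng.sample(population, n)   # step I: random sample, no replacement
        results.append(statistic(sample))    # step II: summary statistic
    return results

def text_histogram(values, bins=8):
    """Step IV: a rough text display of the distribution, one row per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1            # avoid zero width if all values agree
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    return [f"{lo + i * width:8.2f} | " + "*" * c for i, c in enumerate(counts)]

# ASSUMPTION: a hypothetical population, the integers 1..100.
population = list(range(1, 101))
sample_means = simulate_sampling_distribution(population, n=10, reps=200, seed=1)
print("\n".join(text_histogram(sample_means)))
```

Passing a different `statistic` (for example `max`, or a range function) gives the simulated sampling distribution of that statistic instead of the mean.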

We will often have access to samples, but not necessarily to the entire population (it may be too big or inaccessible).

Our friends at the NBA. These are the salaries of NBA players. The mean is $4.6 million and the SD is $4.7 million. The distribution is highly skewed. THESE ARE POPULATION STATISTICS (EVERYBODY).

Our friends at the NBA. Suppose these data were not public and I am an NBA player who wants to know the average salary of my colleagues. I can only access 10 people at random. How is the average I find different from the true average? Since the distribution is skewed, should I be concerned?

Let's simulate a sampling distribution. Select random samples of 10 from our distribution. Calculate the average salary. Repeat many times (200?). Place the averages in a chart. THESE ARE SAMPLE STATISTICS.

Average simulated salaries: 200 simulations. The distribution is approximately normal, centered at about $4.6 million. The SD is about $1.5 million. This is the equivalent of what we did for Westvaco!

Average simulated salaries. From our 200 simulations, the distribution is approximately normal and centered at about $4.6 million; the SD is about $1.5 million. The mean of the entire population was $4.6 million and the SD was $4.7 million.

Recall properties of the normal distribution. For us the mean is $4.6 million and the SD is $1.5 million. We can be 95% sure that our sample mean is within $3 million (2 SDs) of the population mean.

Normal distribution. We can be 95% sure that ANY mean of 10 people we pick falls between $1.6 million and $7.6 million, an interval centered at about $4.6 million.

Average simulated salaries. We can be pretty confident that the selection of 10 people will give us a good idea about the average salary of NBA players. We did not need to sample the entire population! The SD of our SAMPLING DISTRIBUTION is $1.5 million. The SD of our POPULATION DISTRIBUTION is $4.7 million.

Average simulated salaries. The SD of our SAMPLING DISTRIBUTION is $1.5 million. This is called the STANDARD ERROR. The SD of our POPULATION DISTRIBUTION is $4.7 million. This is called the POPULATION STANDARD DEVIATION.
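The gap between the two SDs can be explored with a short simulation. This is a sketch under loud assumptions: the actual salary data are not in the transcript, so the population below is a hypothetical right-skewed stand-in (gamma-distributed, tuned to a mean near 4.6 and an SD near 4.7, in $ millions) with an assumed size of 450. The comparison with the population SD divided by the square root of n uses a standard result that these slides do not state.

```python
import math
import random
from statistics import mean, pstdev, stdev

rng = random.Random(140)

# ASSUMPTION: hypothetical skewed population standing in for the salary data.
POP_MEAN, POP_SD = 4.6, 4.7            # figures quoted on the slides ($ millions)
shape = (POP_MEAN / POP_SD) ** 2       # gamma shape that matches mean and SD
scale = POP_SD ** 2 / POP_MEAN         # gamma scale that matches mean and SD
population = [rng.gammavariate(shape, scale) for _ in range(450)]

# 200 random samples of 10, one average salary per sample.
n, reps = 10, 200
sample_means = [mean(rng.sample(population, n)) for _ in range(reps)]

population_sd = pstdev(population)             # spread of individual salaries
standard_error = stdev(sample_means)           # spread of the sample averages
rule_of_thumb = population_sd / math.sqrt(n)   # standard result, not from the slides
slide_check = POP_SD / math.sqrt(n)            # 4.7 / sqrt(10), about 1.49

print(f"population SD          {population_sd:.2f}")
print(f"simulated std. error   {standard_error:.2f}")
print(f"population SD / sqrt n {rule_of_thumb:.2f}")
print(f"4.7 / sqrt(10)         {slide_check:.2f}")
```

The last line shows that the quoted standard error of about $1.5 million is consistent with dividing the population SD of $4.7 million by the square root of the sample size.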

Definitions. Values that lie in the middle 95% of a sampling distribution are called REASONABLY LIKELY EVENTS. Values that lie in the left 2.5% or the right 2.5% tail of a sampling distribution are called RARE EVENTS.
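For the NBA example, these definitions reduce to simple arithmetic on the figures quoted earlier (center $4.6 million, standard error $1.5 million). The sketch below assumes the sampling distribution is approximately normal and uses the 2-SD rule as an approximation to the middle 95%; the function name is hypothetical.

```python
CENTER = 4.6          # mean of the sampling distribution, in $ millions
STANDARD_ERROR = 1.5  # SD of the sampling distribution, in $ millions

# Middle 95% of an approximately normal distribution: about 2 SDs either side.
margin = 2 * STANDARD_ERROR
low, high = CENTER - margin, CENTER + margin

def classify_sample_mean(sample_mean):
    """Label an average of 10 salaries as reasonably likely or rare."""
    if low <= sample_mean <= high:
        return "reasonably likely"
    return "rare"

print(f"reasonably likely range: {low:.1f} to {high:.1f} ($ millions)")
for value in (3.0, 8.0):
    print(f"sample mean {value}: {classify_sample_mean(value)}")
```

Note that this classifies averages of 10 players, not individual salaries; an individual salary has to be judged against the population distribution, whose SD is much larger.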

Let's compare population and sample. Would we be surprised to draw a player with a $3 million salary? What about an $8 million salary? Would we be surprised to draw 10 players with an average salary of $8 million?

Utah's national parks. Create the sampling distribution for the total number of square miles in any 2 parks. Use all possible samples of 2 parks.

Utah's national parks [table of park areas not preserved in this transcription]

Utah's national parks. How many possible ways are there of selecting 2 parks? We can only survey 600 square miles a year. What is the probability that we DO NOT finish the survey within the first year?

Utah's national parks. We can use all possible combinations.

Utah's national parks. The probability that we don't finish the survey is 4/10.
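The 4/10 figure can be reproduced by listing every pair of parks. The table of areas did not survive in this transcription, so the values below (in square miles) are an assumption; they were chosen because they reproduce the results quoted on these slides. The variable names are hypothetical.

```python
from itertools import combinations

# ASSUMPTION: park areas in square miles; the original table is missing from
# the transcript, and these values are supplied only for illustration.
areas = {
    "Arches": 119,
    "Bryce Canyon": 56,
    "Canyonlands": 527,
    "Capitol Reef": 378,
    "Zion": 229,
}

YEARLY_CAPACITY = 600  # square miles that can be surveyed in one year

# All possible samples of 2 parks, with the total area of each sample.
pairs = list(combinations(areas, 2))
totals = {pair: areas[pair[0]] + areas[pair[1]] for pair in pairs}

# Pairs whose total area exceeds what can be surveyed in the first year.
too_big = [pair for pair, total in totals.items() if total > YEARLY_CAPACITY]
probability = len(too_big) / len(pairs)

for pair, total in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{pair[0]} + {pair[1]}: {total}")
print(f"P(not finished in year 1) = {len(too_big)}/{len(pairs)} = {probability}")
```

Each of the pairs is equally likely under random selection, which is why the probability is simply the count of oversized pairs divided by the number of pairs.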

Sample and population means. Any sample mean x̄; population mean µ. Usually they are different, but OVER MANY SAMPLES the sample means average out to the population mean. Also, THE LARGER THE SAMPLE SIZE, the closer they tend to be.

Point estimators. Any sample mean x̄; population mean µ. When we use a summary statistic derived from the sample (such as the sample mean) as an estimate of a population parameter (such as the population mean), we call it a point estimator.

Desirable point estimators. The mean of the sampling distribution should equal the value you would get from the entire population; such an estimator is called unbiased. It is also desirable that, as the sample size increases, the SD of the sampling distribution decreases, so that we have the most precision possible and the least standard error.

Back to Utah. Calculate the mean and SD for all parks. Then do the same for the means of all 10 samples of 2 parks.

Back to Utah. At the end, calculate the mean area over all your samples. Is this mean the same as for the initial distribution? If so, the sample mean is an unbiased estimator. Now calculate the SD of the sampling distribution and compare it with the previous SD.

Back to Utah. The SD should be smaller here (105.23) than for the entire population (171.85).
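The comparison can be checked by brute force over all 10 samples of 2 parks. As before, the park areas are an assumption (the original table is missing from the transcript); with these values the two SDs come out close to the 171.85 and 105.23 quoted on the slide. Both SDs are computed with the population formula (dividing by the number of values), which appears to be what the slide used.

```python
from itertools import combinations
from statistics import mean, pstdev

# ASSUMPTION: park areas in square miles, supplied for illustration only
# because the original table is not in the transcript.
areas = [119, 56, 527, 378, 229]

# Population: all five parks (equivalently, samples of size 1).
population_mean = mean(areas)
population_sd = pstdev(areas)

# Sampling distribution of the mean: every possible sample of 2 parks.
sample_means = [mean(pair) for pair in combinations(areas, 2)]
sampling_mean = mean(sample_means)
sampling_sd = pstdev(sample_means)

print(f"population:        mean {population_mean:.2f}, SD {population_sd:.2f}")
print(f"samples of size 2: mean {sampling_mean:.2f}, SD {sampling_sd:.2f}")
```

The two means agree, which is what "unbiased" means here, while the SD of the sample means is smaller than the SD of the individual areas.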

Back to Utah. Sample size 1 vs. sample size 2. This means that the spread is smaller if we use samples of size 2 than if we use samples of size 1. The mean is the same (no bias); the spread is different.

Concepts. A simulated sampling distribution is the distribution of a sample statistic (such as the mean) for a large number of repeated samples. Sampling distributions are best described by shape, center, and spread. Sampling distributions DO NOT necessarily have the same shape as the population from which the samples were taken.

Concepts. The SD of the sampling distribution is called the standard error. If the sampling distribution is normal, reasonably likely outcomes are those that lie within about 2 SDs of the mean (95% of the values).

P5, page 319. Estimate the range of the areas of Utah's national parks. Range = largest area - smallest area. Select 3 parks at random and calculate the range.
1) What is the range of the entire POPULATION?
2) Make a table of the range for each group of 3.
3) Place your values on a dot plot.
4) What is the mean of the sample ranges?
5) Is the sample range a biased or unbiased estimator?
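One way to check an answer to this exercise is to enumerate every group of 3 parks. This is only a sketch: the park areas are again an assumption (the original table is not in the transcript), and the variable names are hypothetical.

```python
from itertools import combinations
from statistics import mean

# ASSUMPTION: park areas in square miles, supplied for illustration only.
areas = [119, 56, 527, 378, 229]

# 1) Range of the entire population.
population_range = max(areas) - min(areas)

# 2) Range of every possible group of 3 parks.
sample_ranges = [max(group) - min(group) for group in combinations(areas, 3)]

# 4) Mean of the sample ranges (the center of the sampling distribution).
mean_sample_range = mean(sample_ranges)

# 5) The sample range is unbiased only if that mean equals the population range.
print(f"population range:      {population_range}")
print(f"sample ranges:         {sorted(sample_ranges)}")
print(f"mean of sample ranges: {mean_sample_range}")
print(f"unbiased? {mean_sample_range == population_range}")
```

A sample can never have a larger range than the population it came from, so the mean of the sample ranges can equal the population range only if every sample contains both the largest and the smallest value.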

Practice, page 321: P3, P4, P5, E1, E2, E3, E5, E6, E7, E10.