Math 140 Introductory Statistics


Professor Silvia Fernández
Lecture 2
Based on the book Statistics in Action by A. Watkins, R. Scheaffer, and G. Cobb.

Summary Statistic
Consider Round 2 of the layoffs as an example for our analysis.
[Figure: Round 2 layoffs, ages plotted on an axis from 20 to 65]
To simplify the statistical analysis to come, it helps to condense the data into a single number, called a summary statistic. One possible summary statistic is the average, or mean, age of the three workers who lost their jobs:

average = (55 + 55 + 64)/3 = 58 years

Martin v. Westvaco
Martin: Look at the pattern in the data. All three of the workers laid off were much older than the average age of all workers. That's evidence of age discrimination.
Westvaco: Not so fast! You're looking at only ten people total, and only three positions were eliminated. Just one small change and the picture would be entirely different. For example, suppose it had been the 25-year-old instead of the 64-year-old who was laid off. Switch the 25 and the 64 and you get a totally different set of averages:

Actual data:  25 33 35 38 48 55 55 55 56 64 (laid off: 55, 55, 64)
Altered data: 25 33 35 38 48 55 55 55 56 64 (laid off: 25, 55, 55)

See! Just one small change and the average age of the three who were laid off is lower than the average age of the others.

              Actual data   Altered data
Laid off          58.0          45.0
Retained          41.4          47.0

Martin: Not so fast, yourself! Of all the possible changes, you picked the one that is most favorable to your side. If you'd switched one of the 55-year-olds who got laid off with the 55-year-old who kept his or her job, the averages wouldn't change at all. Why not compare what actually happened with all the possibilities that might have happened?
Westvaco: What do you mean?
Martin: Start with the ten workers, treat them all alike, and pick three at random. Do this over and over to see what typically happens, and compare the actual data with these results. Then we'll find out how likely it is that their average age would be 58 or more.
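The averages in the table can be checked directly from the ten ages. A minimal Python sketch, assuming the ages listed above; the helper `laid_off_vs_retained` is a hypothetical name introduced here for illustration, not something from the book.

```python
from statistics import mean

# Ages of the ten hourly workers in Round 2.
ages = [25, 33, 35, 38, 48, 55, 55, 55, 56, 64]

def laid_off_vs_retained(all_ages, laid_off):
    """Return (mean age of the laid-off workers, mean age of the retained workers)."""
    retained = list(all_ages)
    for age in laid_off:
        retained.remove(age)  # removes one worker of that age per laid-off worker
    return mean(laid_off), mean(retained)

actual = laid_off_vs_retained(ages, [55, 55, 64])
altered = laid_off_vs_retained(ages, [25, 55, 55])  # the 25 and the 64 switched

print(f"Actual:  laid off {actual[0]:.1f}, retained {actual[1]:.1f}")
print(f"Altered: laid off {altered[0]:.1f}, retained {altered[1]:.1f}")
```

The retained averages are 290/7 ≈ 41.4 for the actual data and 329/7 = 47.0 for the altered data.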

Discussion
D5. If you pick three of the ten ages at random, do you think you are likely to get an average age of 58 or more?
D6. If the probability of getting an average age of 58 or more turns out to be small, does this favor Martin or Westvaco?

Martin: Look at the pattern in the data. All three of the workers laid off were much older than average.
Westvaco: So what? You could get a result like that just by chance. If chance alone can account for the pattern, there's no reason to ask us for any other explanation.
Martin: Of course you could get this result by chance. The question is whether it's easy or hard to do so. If it's easy to get an average as large as 58 by drawing at random, I'll agree that we can't rule out chance as one possible explanation. But if an average that large is really hard to get from random draws, we agree that it's not reasonable to say that chance alone accounts for the pattern. Right?
Westvaco: Right.
Martin: Here are the results of my simulation. If you look at the three hourly workers laid off in Round 2, the probability of getting an average age of 58 or greater by chance alone is only 5%. And if you do the same computations for the entire engineering department, the probability is a lot lower, about 1%. What do you say to that?
Westvaco: Well... I'll agree that it's really hard to get an average age that extreme simply by chance, but that by itself still doesn't prove discrimination.
Martin: No, but I think it leaves you with some explaining to do!

Simulation
In our example we can draw 3 of the 10 ages at random and compute their average, then repeat this process a large number of times to see how likely it would be to get 58 or more as the answer.
Steps in a Simulation:
Random model: Create a model for the chance process (thoroughly mixed pieces of paper, a sequence of random numbers, or computer-generated random numbers).
Summary statistic: Calculate it (the mean, or average, in our example).
Repetition: Repeat a large number of times (thousands).
Display the distribution: Use, for example, a dot plot.
Estimate the probability: In our example, the proportion of values that gave 58 or more.
Reach a conclusion: Interpret your results.
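The steps above can be sketched in Python, using computer-generated random numbers as the random model. This is an illustration under assumptions, not the book's procedure: the function `simulate_averages`, the 10,000 repetitions, and the seed are choices made here, and the text "dot plot" is only a crude stand-in for a real display.

```python
import random
from collections import Counter

# Ages of the ten hourly workers in Round 2.
ages = [25, 33, 35, 38, 48, 55, 55, 55, 56, 64]
observed_average = 58  # average age of the three workers actually laid off

def simulate_averages(population, k, repetitions, seed=None):
    """Random model + summary statistic + repetition:
    draw k values without replacement and record their average, many times."""
    rng = random.Random(seed)
    return [sum(rng.sample(population, k)) / k for _ in range(repetitions)]

averages = simulate_averages(ages, k=3, repetitions=10_000, seed=1)

# Display the distribution: a crude text dot plot, one '*' per 100 simulated averages
# (averages are rounded to the nearest whole year for display only).
counts = Counter(round(avg) for avg in averages)
for value in sorted(counts):
    print(f"{value:3d} {'*' * (counts[value] // 100)}")

# Estimate the probability: proportion of simulated averages of 58 or more.
estimate = sum(avg >= observed_average for avg in averages) / len(averages)
print(f"Estimated P(average >= {observed_average}) = {estimate:.3f}")
```

With this many repetitions the estimate should land close to 0.05; a different seed gives a slightly different estimate, which is why the result is called an estimate.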

Simulation: Martin Case, Round 2, Hourly Workers

Discussion
D7. Why must you estimate the probability of getting an average age of 58 or greater, rather than the probability of getting an average age of exactly 58?
D8. How unlikely is too unlikely? The probability in the previous activity is in fact exactly equal to 0.05. In a typical court case, a probability of 0.025 or less is required to serve as evidence of discrimination.
a. Did the Round 2 layoffs of hourly workers in the Martin case meet the court requirement?
b. If the probability in the Martin case had been 0.01 instead of 0.05, how would that have changed your conclusions? What if it had been 0.10 instead of 0.05?

Inference
Inference is a statistical procedure that involves deciding whether an event can reasonably be attributed to chance or whether you should look for some other explanation. In the Martin case we used simulation as a device for inference, to determine whether the relatively high average age of the laid-off hourly employees in Round 2 could reasonably be due to chance. The probability was about 0.05, which was considered small enough to warrant asking Westvaco for an explanation, but not small enough to present in court as clear evidence of discrimination.
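Because there are only C(10, 3) = 120 equally likely groups of three, the "exactly 0.05" in D8 can be confirmed by listing every group instead of simulating. A sketch, assuming each of the ten workers is equally likely to be chosen (so the three 55-year-olds count as different people); `exact_tail_probability` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import combinations

# Ages of the ten hourly workers in Round 2.
ages = [25, 33, 35, 38, 48, 55, 55, 55, 56, 64]

def exact_tail_probability(population, k, observed_average):
    """P(average of k workers chosen at random >= observed_average), by listing every group of k."""
    # combinations() works by position, so the three 55-year-olds count as different workers.
    groups = list(combinations(population, k))
    hits = sum(1 for g in groups if Fraction(sum(g), k) >= observed_average)
    return Fraction(hits, len(groups))

p = exact_tail_probability(ages, 3, 58)
print(p, float(p))  # 1/20 0.05

# D8: compare with the 0.025 threshold mentioned for court cases.
print("Meets the 0.025 requirement:", p <= Fraction(1, 40))
```

Only 6 of the 120 groups reach an average of 58: the 64-year-old together with any two of the four workers aged 55 or 56. Exact fractions avoid any rounding trouble when an average equals 58 exactly.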

Practice
[Figure: Average age of 3 workers out of 10]
P4. Suppose three workers were laid off from a set of ten whose ages were the same as those of the hourly workers in Round 2 in the Martin case: 25, 33, 35, 38, 48, 55, 55, 55, 56, 64. This time, however, the ages of those laid off were 48, 55, and 55.
Actual average age: (48 + 55 + 55)/3 ≈ 52.67
45/200 = 22.5%
a. Use the dot plot in Display 1.10 on page 14 to estimate the probability of getting an average age as large as or larger than that of those laid off in this situation.
b. What would your conclusion be if Westvaco had laid off workers of these three ages?

Practice
At the beginning of Round 1, there were 14 hourly workers. Their ages were 22, 25, 33, 35, 38, 48, 53, 55, 55, 55, 55, 56, 59, and 64. After the layoffs were complete, the ages of those left were 25, 38, 48, and 56. Think about how you would repeat Activity 1.2a using these data.
a. What is the average age of the ten workers laid off?
(22 + 33 + 35 + 53 + 55 + 55 + 55 + 55 + 59 + 64)/10 = 48.6
b. Describe a simulation for finding the distribution of the average age of ten workers laid off at random.
Step 1. Select 10 of the 14 ages at random and find their average.
Step 2. Repeat Step 1 many times (for example, 200 times).
Step 3. Create a dot plot containing the averages obtained from your repetitions.
c. The results of 200 repetitions from a simulation are shown in Display 1.11. Suppose 10 workers are picked at random for layoff from the 14 hourly workers. Make a rough estimate of the probability of getting, just by chance, the same or a larger average age than that of the workers who actually were laid off (from part a).
45 dots out of 200 lie to the right, corresponding to an average of 48.6 or larger. Estimated probability = 45/200 = 22.5%.
d. Does this analysis provide evidence in Martin's favor?
No. A probability of 22.5% is too large to be considered evidence that the actual average might not be due to chance.
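Both practice problems can be explored the same way. The sketch below, assuming the ages exactly as listed, finds the P4 probability by enumeration (21 of the 120 groups of three have an average of at least 158/3 ≈ 52.67, giving 7/40 = 0.175) and carries out Steps 1 to 3 of the Round 1 simulation with 200 repetitions. The seed and the helper name `exact_tail_probability` are illustrative choices, not from the book; a 200-run estimate changes from seed to seed, so the exact proportion over all C(14, 10) = 1001 groups is computed alongside it for comparison.

```python
import random
from fractions import Fraction
from itertools import combinations

round2 = [25, 33, 35, 38, 48, 55, 55, 55, 56, 64]
round1 = [22, 25, 33, 35, 38, 48, 53, 55, 55, 55, 55, 56, 59, 64]
retained_round1 = [25, 38, 48, 56]

def exact_tail_probability(population, k, observed_average):
    """P(average of k values chosen at random >= observed_average), by listing every k-subset."""
    groups = list(combinations(population, k))
    hits = sum(1 for g in groups if Fraction(sum(g), k) >= observed_average)
    return Fraction(hits, len(groups))

# P4: the laid-off ages are 48, 55 and 55.
p4_observed = Fraction(48 + 55 + 55, 3)  # 158/3, about 52.67
p4_exact = exact_tail_probability(round2, 3, p4_observed)

# Round 1, part a: the ten laid-off workers are everyone not in the retained list.
laid_off = list(round1)
for age in retained_round1:
    laid_off.remove(age)
round1_observed = Fraction(sum(laid_off), len(laid_off))  # 486/10 = 48.6

# Round 1, part b: Steps 1-3 with 200 repetitions (seed fixed for reproducibility).
rng = random.Random(2)
simulated = [Fraction(sum(rng.sample(round1, 10)), 10) for _ in range(200)]
estimate = sum(avg >= round1_observed for avg in simulated) / len(simulated)

# Part c: compare the rough simulation estimate with the exact proportion.
round1_exact = exact_tail_probability(round1, 10, round1_observed)

print(f"P4: observed {float(p4_observed):.2f}, exact P = {p4_exact} = {float(p4_exact):.3f}")
print(f"Round 1: observed {float(round1_observed)}, "
      f"estimate {estimate:.3f}, exact {float(round1_exact):.3f}")
```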

Visualizing Distributions
Recall the definition: a distribution gives the values of a summary statistic (e.g., the average age of the laid-off workers) and how often they occur.
Four of the most common basic shapes:
Uniform or rectangular
Normal
Skewed
Bimodal (multimodal)

Uniform (or Rectangular) Distribution
Each outcome occurs roughly the same number of times. Examples:
Number of U.S. births per month in a particular year (see page 25).
Computer-generated random numbers on a particular interval.
Number of times a fair die lands on each face.

Month                 1    2    3    4    5    6    7    8    9   10   11   12
Births (thousands)  305  289  313  342  311  324  345  341  353  329  304  324
Deaths (thousands)  218  191  198  189  195  182  192  178  176  193  189  192

[Figure: Births in US (1997), number in thousands by month]

Normal Distributions
These distributions arise from:
Variation in measurements (e.g., the pennies example; see 2.3, page 31).
Natural variation within a population (e.g., the weights of a set of people).
Variation in the averages of random samples (e.g., the average age of 3 workers out of 10; see Display 1.10 on page 14).
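A fair die gives a quick picture of a uniform shape. This sketch tallies simulated rolls and prints a bar of `#` for every 50 occurrences of each face; the 6000 rolls and the seed are arbitrary choices made for illustration.

```python
import random
from collections import Counter

rng = random.Random(7)  # arbitrary fixed seed so the run is reproducible
rolls = [rng.randint(1, 6) for _ in range(6000)]
counts = Counter(rolls)

# Each face has probability 1/6, so each should appear close to 6000 / 6 = 1000 times:
# the bars come out roughly the same length, i.e. a uniform (rectangular) shape.
for face in range(1, 7):
    print(face, counts[face], "#" * (counts[face] // 50))
```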

[Figure panels: Pennies example; Average age of 3 workers out of 10]

Normal Distributions
Idealized shape shown below (see 2.4, page 32).
Properties:
Single peak: its x-value is called the mean. The mean tells us where the center of the distribution is.
The distribution is symmetric with respect to the mean.
Inflection points: the points where the concavity changes. Roughly 2/3 of the area under the curve lies between the inflection points.
[Figure: normal curve with the mean and the inflection points marked]
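The "roughly 2/3" figure can be checked with Python's `statistics.NormalDist` (Python 3.8 or later). The sketch assumes that the inflection points lie one standard deviation on either side of the mean, which is how these notes define the SD; the mean and SD values used are arbitrary, and `area_between_inflection_points` is a hypothetical helper name.

```python
from statistics import NormalDist

def area_between_inflection_points(mu, sigma):
    """Area under a normal curve between mu - sigma and mu + sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + sigma) - dist.cdf(mu - sigma)

# The area does not depend on which mean and SD are chosen (these values are arbitrary).
for mu, sigma in [(0, 1), (58, 10)]:
    print(mu, sigma, round(area_between_inflection_points(mu, sigma), 4))  # 0.6827 both times
```

About 68% of the area, a little more than 2/3, lies between the inflection points.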

The distance between the mean and either of the inflection points is called the standard deviation (SD). The standard deviation measures how spread out the distribution is.
[Figure: normal curve with one SD marked on each side of the mean]

Skewed Distributions
These are similar to normal distributions in having a single peak, but they are not symmetric. They have values bunching at one end and a long tail stretching in the other direction. The direction of the tail tells you whether the distribution is skewed left or skewed right.
[Figure: a skewed left and a skewed right distribution]
[Figure: example of a skewed right distribution]
Skewed distributions often occur because of a wall, that is, a value that the data cannot go below or above, such as 0 for measurements that cannot be negative, or 100 for percentages. To find out about center and spread, it is useful to look at quartiles.
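Why is the distance from the mean to an inflection point the SD? For the normal density with mean μ and SD σ (a standard formula, not given in these notes), differentiating twice shows where the concavity changes:

```latex
\begin{aligned}
f(x) &= \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)} \\
f'(x) &= -\frac{x-\mu}{\sigma^2}\, f(x) \\
f''(x) &= -\frac{1}{\sigma^2}\, f(x) - \frac{x-\mu}{\sigma^2}\, f'(x)
        = \left[\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right] f(x) \\
f''(x) = 0 &\iff (x-\mu)^2 = \sigma^2 \iff x = \mu \pm \sigma
\end{aligned}
```

Since f(x) > 0, the sign of f''(x) is the sign of the bracket: negative (concave down) when |x - μ| < σ and positive (concave up) when |x - μ| > σ, so the concavity changes exactly at μ - σ and μ + σ.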

Median and Quartiles
Median: the value that divides the data into two halves with equal numbers of values (or, for a curve, divides the area under the curve into two equal halves).
Repeat this process in each of the two halves to find the lower quartile (Q1) and the upper quartile (Q3). Q1, the median, and Q3 divide the values into quarters, so the quartiles Q1 and Q3 enclose the middle 50% of the values.
[Figure: Visualizing median and quartiles]

Bimodal Distributions
The previous distributions have had only one peak (unimodal), but some have two (bimodal) or even more (multimodal).
[Figure: example of a bimodal distribution]
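The median-and-quartiles procedure described above (find the median, then repeat in each half) can be sketched in Python on the ten Round 2 ages. This assumes one common convention: with an odd number of values, the middle value is left out of both halves. Conventions differ; for example, Python's `statistics.quantiles` interpolates by default and can give slightly different quartiles. The helper names are hypothetical.

```python
def median(sorted_values):
    """Middle value of an already-sorted list (average of the two middle values if even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3): split the data at the median, then take each half's median.
    With an odd count, the middle value belongs to neither half."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[(n + 1) // 2 :]
    return median(lower), median(data), median(upper)

ages = [25, 33, 35, 38, 48, 55, 55, 55, 56, 64]
print(quartiles(ages))  # (35, 51.5, 55)
```

Here the median 51.5 lies between 48 and 55, and Q1 = 35 and Q3 = 55 are the medians of the lower five and upper five ages.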