Putting Things Together Part 1


These exercises blend ideas from various graphs (histograms and boxplots), differing shapes of distributions, and values summarizing the data. Data for 1, 5, and 6 are in the instructor's shared folder on LakerApps: Putting Things Together Part 1.

1. School districts in Kentucky bid on contracts for milk. The boxplots below are for winning bids from districts in the northern (254 bids) and southern (100 bids) regions of the state.

[Figure: side-by-side boxplots of winning bids by Market (Northern, Southern). Horizontal axis: Price, $ per Pint of Bid, from 0.10 to 0.18.]

Answer the following using only the boxplots above.

a) For the Northern region:
   State the 5-number summary.
   Determine the Range and Interquartile Range (IQR).
   Approximately what percent of contracts had a winning bid above $0.140 per pint?
   Guess the standard deviation. Do it two ways: first, using the Range; second, using a reduced Range computed by ignoring outliers.

b) For the Southern region:
   State the 5-number summary.
   Determine the Range and IQR.
   Approximately what percent of contracts had a winning bid below $0.145 per pint?
   Guess the standard deviation both ways again.

c) Are the mean bids for the two regions closer together or further apart than the median bids? Explain. (Hint: a little skew is evident, and there are outliers.)

2. A specialty food company sells gourmet hams by mail order. The hams vary in size from 4.00 to 7.25 pounds, with a mean weight of 6.00 pounds and a standard deviation of 0.65 pounds. The quartiles and median are 5.50, 6.20 and 6.55 pounds.

a) Find the Range and IQR of the weights.

b) Is the distribution of the weights symmetric or skewed? If skewed, which way? Why?

3. John has a radar gun, and collects data on the speed of cars passing his house. The mean is 32.5 mph with a standard deviation of 2.5 mph.

a) Make a rough guess at the percentage of cars that go between 30.0 and 35.0 mph.

b) Suppose that cars traveling outside this interval are equally likely to be going faster than 35.0 mph or slower than 30.0 mph. (This is symmetry.) What percentage of cars go faster than 35.0 mph? Slower than 30.0 mph? What can you now say about the percentile ranks for 30.0 and 35.0 mph?

c) Make a rough guess at the percentage of cars that go between 27.5 and 37.5 mph. Again assuming symmetry: What can you say about the percentile ranks for 27.5 and 37.5 mph?

4. A popular band on tour played a series of concerts in large venues. They always drew a large crowd, averaging 21,359 fans. While the band did not announce (and probably never calculated) the standard deviation, which of these values do you think is most likely to be correct: 20, 200, 2000, or 20000 fans? Explain your choice.

5. The histogram shows the budgets of 110 major movie releases in a recent year. Try answering the questions below just using the histogram. (You can check answers using the data.)

[Figure: histogram of movie budgets. Horizontal axis: Budget ($ millions), 0 to 180 in steps of 30. Vertical axis: # of Movies, 0 to 20.]

a) What shape is this distribution?

b) How many of the movies had a budget less than $40 million?

c) What percent of the movies had a budget less than $40 million? (This is the percentile rank of $40 million.)

d) The median is approximately $_____ million.

e) The mode is approximately $_____ million.

f) Guess the standard deviation of budgets.

g) Which of the following is closest to the mean budget?
   $25 million    $40 million    $52 million    $90 million

6. Surveying members of her church, a young woman obtains the following data on the number of marriages for each adult male (of whom there are 24):

0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 3 3

Obtain the 5-number summary for this data. Notice that the first, second and third quartiles are identical. Our working definition of a percentile is that k% of the data is lower and (100-k)% is higher. In this situation it makes no sense to say all three of the following simultaneously:

   25% of the data is below 1; 75% above 1
   50% of the data is below 1; 50% above 1
   75% of the data is below 1; 25% above 1

This data is far too discrete for percentiles and percentile ranks to be useful. Here's the most effective summary for this type of highly discrete data:

# of marriages    0     1     2     3
% of males        21%   58%   13%   8%

7. Would it make good sense to summarize data with the five-number summary for the following variables? (While you're at it: Circle only the variable; underline only the units.)

a) The number of goals scored in a professional soccer game. (If you don't know anything about scoring in soccer, try an internet search of scores of soccer games. You'll quickly learn something.)

b) The mean number of goals per game scored by professional soccer teams over the course of a lengthy season.

c) Running times of movies (as displayed on a DVD player).

d) The number of living grandparents of currently enrolled college students.

e) Favorite brand of beer of 25-year-old males.
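The percentage table in problem 6 can be reproduced with a few lines of Python. This is only a sketch: the helper name `percent_table` is made up for illustration, and it rounds halves upward so that 12.5% shows as 13%, matching the table in the worksheet.

```python
from collections import Counter

# Number of marriages for each of the 24 adult males in problem 6.
marriages = [0] * 5 + [1] * 14 + [2] * 3 + [3] * 2


def percent_table(data):
    """Return {value: percent of observations}, rounded half-up to whole percents."""
    counts = Counter(data)
    n = len(data)
    # int(x + 0.5) rounds halves upward; Python's round() would turn 12.5 into 12.
    return {value: int(100 * counts[value] / n + 0.5) for value in sorted(counts)}


table = percent_table(marriages)
for value, pct in table.items():
    print(f"{value} marriages: {pct}% of males")
```

Because of rounding, the whole-number percentages need not add to exactly 100 for other data sets.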

Solutions

1.
                a) Northern                                   b) Southern
5-#-Summary     { 0.1041, 0.1224, 0.1280, 0.1400, 0.1800 }    { 0.1064, 0.1374, 0.1447, 0.1525, 0.1690 }
Range           0.0759                                        0.0626
IQR             0.0176                                        0.0152

The above values don't have to be exact, but you should be within 0.001 on each.

a) % above 0.140? Since 0.140 is Q3, 25% of the data are above 0.140.

b) % below 0.145? Since 0.145 is very close to the median (0.1447), about 50% of the data are below 0.145.

Guess SD:

                         Northern                           Southern
Using the Range          0.0759 / 4 = 0.019                 0.0626 / 4 = 0.0157
Ignoring outliers        (0.166 - 0.104) / 4 = 0.0155       (0.168 - 0.119) / 4 = 0.0123
Actual SD                0.01579*                           0.01329*

This somewhat demonstrates that, at least for fairly large data sets, ignoring outliers and using a usual range yields a better guess of SD.

c) Means: The outliers will pull on the mean. For the Northern region the mean will be above the median; for the Southern, below. Consequently, the means are closer than the medians.

   Northern: Mean = 0.13309*    Southern: Mean = 0.14306*

*Here's what Minitab gives [1]:

Descriptive Statistics: $ per Pint

Variable    Market    Minimum  Q1       Median   Q3       Maximum  IQR      Range
$ per Pint  Northern  0.10410  0.12230  0.12800  0.14000  0.18000  0.01770  0.07590
            Southern  0.10640  0.13705  0.14470  0.15250  0.16900  0.01545  0.06260

Difference in medians: 0.0167

Variable    Market    Mean     StDev
$ per Pint  Northern  0.13309  0.01579
            Southern  0.14306  0.01329

Difference in means: 0.0100

[1] Minitab uses a slightly different method for percentiles. Consequently, it reports quartiles a bit different from what you can get in a spreadsheet. However, they are very close, which is always the case for large data sets.
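The two ways of guessing the standard deviation in solution 1 can be checked with a short Python sketch. The function name `sd_guess_from_range` is hypothetical; the endpoints and actual SDs are the values quoted in the solution above.

```python
def sd_guess_from_range(low, high):
    """Rule-of-thumb guess used in the worksheet: SD is roughly Range / 4."""
    return (high - low) / 4


# (full-range endpoints, endpoints ignoring outliers, actual SD from Minitab)
regions = {
    "Northern": ((0.1041, 0.1800), (0.104, 0.166), 0.01579),
    "Southern": ((0.1064, 0.1690), (0.119, 0.168), 0.01329),
}

results = {}
for name, (full, trimmed, actual_sd) in regions.items():
    guess_full = sd_guess_from_range(*full)
    guess_trimmed = sd_guess_from_range(*trimmed)
    # Record whether ignoring outliers brought the guess closer to the actual SD.
    trimmed_is_closer = abs(guess_trimmed - actual_sd) < abs(guess_full - actual_sd)
    results[name] = (guess_full, guess_trimmed, trimmed_is_closer)
    print(f"{name}: full range {guess_full:.4f}, "
          f"ignoring outliers {guess_trimmed:.4f}, actual {actual_sd:.5f}")
```

For both regions the guess that ignores outliers lands closer to the actual SD, which is the point the solution makes.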

2. a) Range = 3.25 pounds; IQR = 1.05 pounds.

b) This would be a left-skewed distribution. The mean is below the median, which hints at left skew or outliers to the left. The simple boxplot is shown. Clearly data falls much further to the left of center than to the right.

[Figure: boxplot of ham weights. Horizontal axis from 4.0 to 7.5 pounds.]

3. a) About 68% of cars go between 30.0 and 35.0 mph, so about 32% do not.

b) If they are split symmetrically, then 16% go less than 30.0 mph. So 30.0 mph is the 16th percentile. Similarly, 16% go faster than 35.0 mph, so 35.0 mph is the 84th percentile. These are only rough guesses, which is the best one can do without more information.

c) About 95% of cars go between 27.5 and 37.5 mph. 27.5 mph has a percentile rank of 2.5; 37.5 mph has a percentile rank of 97.5. (Again, these are only rough guesses.)

4. 2000 is the best choice. This would put about 2 in 3 concerts (2/3 is quite close to 68%) with attendance between 19,359 and 23,359, and most (95%) between 17,359 and 25,359. A standard deviation of 20000 would imply about 68% of shows having attendance between 1,359 and 41,359, and the problem states that concerts always drew a large crowd (values near 1,360 are ruled out). A standard deviation of 20 puts the attendance within 60 of 21,359 for almost all concerts, which is hard to believe. A standard deviation of 200 is not the worst answer (partial credit), but if that's the value then attendance almost never falls outside of 20,759 to 21,959, which is still a fairly narrow range, especially given that the band would be playing in venues (arenas, probably) of a variety of capacities.

5. a) This is a right-skewed distribution.

b) 55 of the movies had a budget less than $40 million (5 + 11 + 19 + 20 = 55).

c) 55 of 110 is 50%. The percentile rank of $40 million is about 50.

d) So $40 million is approximately the 50th percentile; that is, $40 million is approximately the median. (In fact, the median is $40 million. [2])

e) The mode is about $35 million ($30 million is an acceptable answer).

f) The range is about $180 million, yielding a guess of $45 million for the standard deviation. A better guess applies the principle from #1 above: ignoring the outliers, the usual range is about $135 million, which leads to a guess of about $34 million for the standard deviation. (In fact, the standard deviation is $36.60 million. [3])

g) The mean is $52.46 million. [4] So $52 million is the best choice.

[2] The data are available. Just compute it and check.
[3] See the previous footnote.
[4] See the previous footnote.
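The reasoning in solutions 3 and 4 rests on the rough 68% and 95% guideline for values within 1 and 2 standard deviations of the mean, plus the symmetry assumption stated in problem 3. A minimal Python sketch of that arithmetic follows; the function name `rough_intervals` is an illustrative assumption, and the percentages are only rough guesses, as the solutions note.

```python
def rough_intervals(mean, sd):
    """For k = 1 and 2 SDs, return (low, high, low percentile rank, high percentile rank).

    Uses the rough 68% / 95% guideline and assumes the leftover
    percentage is split evenly between the two tails (symmetry).
    """
    coverage = {1: 68.0, 2: 95.0}
    intervals = {}
    for k, pct in coverage.items():
        tail = (100 - pct) / 2
        intervals[k] = (mean - k * sd, mean + k * sd, tail, 100 - tail)
    return intervals


# Problem 3: car speeds with mean 32.5 mph and SD 2.5 mph.
speeds = rough_intervals(32.5, 2.5)
print(speeds[1])  # 30.0 to 35.0 mph, percentile ranks 16 and 84
print(speeds[2])  # 27.5 to 37.5 mph, percentile ranks 2.5 and 97.5

# Problem 4: the 1-SD interval around 21,359 fans for each candidate SD.
for candidate_sd in (20, 200, 2000, 20000):
    low, high, _, _ = rough_intervals(21359, candidate_sd)[1]
    print(f"SD {candidate_sd}: about 68% of concerts between {low} and {high}")
```

Seeing the four candidate intervals side by side makes it easier to judge which spread is plausible for a band that always draws a large crowd.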

6. The 5-number summary is { 0, 1, 1, 1, 3 }. Not very informative, except that you get the feel that 1 is most common.

7.
   Units               Variable                      Percentiles?
a) Soccer games        Number of goals scored        No
b) Soccer teams        Mean number of goals scored   Yes
c) Movies              Running time                  Yes
d) College students    Number of living grandparents No
e) 25-year-old males   Favorite brand of beer        Ridiculous

e is not even quantitative. It makes no sense to talk about how much data is below (which is what percentiles are about) when the data consists of brands of beer.

a and d are very discrete. These situations are very similar to what's going on in #6 above: there'd only be a handful of values, replicated quite often. So a 5-number summary would not be that effective.

The variables described in b and c are both fairly continuous, especially movie running times, which are easily measured to the nearest 1 second, and on directors' computers are displayed to the nearest 0.001 second. You wouldn't expect many soccer teams, after a long season, to have exactly the same average number of goals. While there might be a handful of ties, they would be relatively uncommon, and so percentiles / quartiles would provide a reasonable summary of the data.
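The 5-number summary in solution 6 can be checked with Python's standard library. This is a sketch under one assumption: it uses the "inclusive" quartile method of `statistics.quantiles`, which is just one of several conventions (solution 1 notes that Minitab and spreadsheets differ slightly). For this data the choice does not matter, because the values around each quartile position are all 1.

```python
import statistics

# Number of marriages for each of the 24 adult males in problem 6.
marriages = [0] * 5 + [1] * 14 + [2] * 3 + [3] * 2


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, median, q3, max(data))


summary = five_number_summary(marriages)
print(summary)
```

All three quartiles coincide at 1, which is why the summary says so little about data this discrete.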