Putting Things Together Part 2

These exercises blend ideas from various graphs (histograms and boxplots), differing shapes of distributions, and values summarizing the data. Data for, and are in the instructor's shared folder on LakerApps: Putting Things Together Part 2.

1. Data on the number of laps completed by drivers in a -lap car race. (The race was called off due to rain with laps to go.) n =.
a) Obtain a histogram and identify the shape of this distribution.
b) Determine the five-number summary. { }
c) Draw a simple boxplot below the histogram.
d) Determine the interquartile range and range: IQR = Range =
e) Determine the mean and standard deviation: Mean = SD =
f) Compute Range ÷ Standard Deviation to determine how many times bigger the range is than the standard deviation. The range is ____ times bigger than the standard deviation.
g) Mode =. Technically the mode (most common value) is every value in this data set (because all of them appear once; no ties). One way to more meaningfully identify a mode is to use a representative value from the interval that occurs most often in the histogram. For this histogram, values in the interval - occur most often, and it's correct to say is (approximately) the mode.
h) For left skewed data (like this), how do the mean, mode and median compare? Which is largest? Which is smallest?
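The quantities asked for in parts b), d), e) and f) can be sketched in a few lines of Python. This is only an illustration: the lap counts are hypothetical (the worksheet's data did not survive transcription), the function names are made up here, and the "inclusive" quartile convention is an assumption, so other software may report slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    data = sorted(values)
    # Assumption: the "inclusive" quartile convention; other conventions
    # (and other software) can give slightly different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def spread_measures(values):
    """Return IQR, range, mean, sample SD, and range divided by SD."""
    minimum, q1, _median, q3, maximum = five_number_summary(values)
    value_range = maximum - minimum
    sd = statistics.stdev(values)  # sample standard deviation
    return {
        "iqr": q3 - q1,
        "range": value_range,
        "mean": statistics.mean(values),
        "sd": sd,
        "range_over_sd": value_range / sd,
    }


# Hypothetical lap counts (not the worksheet's data): most drivers finish
# nearly all laps and a few drop out early, which produces left skew.
laps = [12, 35, 48, 52, 55, 57, 58, 59, 60]

print(five_number_summary(laps))
print(spread_measures(laps))
```

For these made-up values the mean falls below the median, which is the pattern part h) asks about for left skewed data.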

2. Here are histograms for data sets A, B, C and D. Notice that all are drawn to the same scales in both the data scale (horizontal) and the frequency scale (vertical).
[Histograms for data sets A, B, C and D; vertical axis: Frequency]
a) Determine the five-number summary for A. Identify the shape of each distribution. Complete the table below.
Data Set   Shape   Min   Q1   Median   Q3   Max
A
B .....
C .....
D .....
c) Use these five-number summaries to construct a simple boxplot for each data set. (Space is provided below. Stack the boxplots one atop the other.)

[Space for boxplots A, B, C and D; horizontal axis: Data]
d) Look at the boxplots (5-# summaries) and histograms. While shape is generally defined in terms of a histogram, you should see that the orientation of a boxplot (the pattern of the 5-# summary) provides a good indication of its shape. For a right skewed distribution (such as A) the 5-# summary has the min, Q1 and median rather close, then Q3 and the max are relatively distant. For a symmetric distribution (B), the distance between the min and Q1 is about equal to that for Q3 and the max; Q1 to the median is about the same as the median to Q3. The left skewed distribution is a mirror image of the right skewed distribution.

A good quantitative key is to compare the distances from the quartiles to the median. If the first quartile is a lot closer to the median than the third, then you're probably looking at right skew. If the third quartile is a lot closer to the median than is the first, then you're probably looking at left skew.

Data Set   Distance Q1 to median vs. distance median to Q3   Shape
A          . << .                                             Right skewed
B          . ≈ .                                              Symmetric
C          . >> .                                             Left skewed
D          . >> .                                             Left skewed

Of course, you want to see these comparisons through the boxplot; you don't want to be computing all this. The << means "a good deal less than"; similarly >> means "a good deal greater than". Finally, ≈ means "approximately equal".
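The quartile-distance comparison in part d) can be written as a small rule. The sketch below is illustrative only: the function name is invented, and the cutoff for "a lot closer" (one distance more than twice the other) is an assumption chosen for the example, not a value given in the worksheet.

```python
def shape_from_quartiles(q1, median, q3, factor=2.0):
    """Guess the shape of a distribution from its quartiles.

    `factor` is a hypothetical cutoff for "a good deal less/greater than":
    one distance must exceed `factor` times the other to call it skewed.
    """
    lower = median - q1   # distance from Q1 to the median
    upper = q3 - median   # distance from the median to Q3
    if upper > factor * lower:
        return "right skewed"   # Q1 is a lot closer to the median
    if lower > factor * upper:
        return "left skewed"    # Q3 is a lot closer to the median
    return "roughly symmetric"


# Hypothetical quartiles, not taken from data sets A-D.
print(shape_from_quartiles(1, 2, 6))     # lower = 1, upper = 4
print(shape_from_quartiles(10, 20, 30))  # lower = 10, upper = 10
print(shape_from_quartiles(2, 8, 9))     # lower = 6, upper = 1
```

As the worksheet says, the point is to read this pattern off the boxplot by eye; the code only spells out the comparison being made.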

e) Here are means and standard deviations for the four data sets:
Data Set   Mean   StDev
A ..
B ..
C ..
D ..
For which data set are the mean and median closest? Notice the shape of the distribution for this data set. For sets C and D the mean is a decent amount less than the median. What shape are the distributions for C and D? For set A the mean is a decent amount greater than the median. What shape is the distribution for A?
Fill in the blanks below with one of these phrases: "less than", "greater than", "about equal to".
For a right skewed distribution the mean is ____ the median.
For a left skewed distribution the mean is ____ the median.
For a symmetric distribution the mean is ____ the median.
For continuous data, it is rare for the mean and median to be exactly the same. Distributions of real data are virtually never exactly symmetric. Slight discrepancies between mean and median exist for a distribution that is best described as symmetric. (By "slight discrepancy" is meant: the mean and median plot very close to each other on the horizontal (x) axis of the histogram or boxplot.)
f) Identify a reasonable value for the mode for each of the sets A-D. A: B: C: D:
You probably answered. for set A. Fine. Most statisticians would say. Why? Because the histogram shows a pattern of increasing frequency for values closer to. In the same way, a statistician would say that the mode for set C is.
Now, examine the relationship between mean, median and mode for the data sets, keeping in mind the distribution shapes.
A: Mode = (or.) Median =. Mean = Right skewed
B: Mode =. Median =. Mean =. Symmetric
C: Mode = (or.) Median =. Mean =. Left skewed
D: Mode =. Median =. Mean =. Left skewed
Suppose the mean and mode are quite different, which generally happens when there is skew. Where does the median generally fall, relative to the mean and mode?
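The mean-versus-median comparison in part e) can be tried on small made-up data sets. Everything below is a sketch under stated assumptions: the three lists are hypothetical, the function name is invented, and treating a gap smaller than 5% of the range as "about equal" is an arbitrary choice standing in for "plots very close on the x axis".

```python
import statistics


def compare_mean_to_median(values, tolerance=0.05):
    """Describe the mean relative to the median.

    `tolerance` is a hypothetical cutoff: gaps smaller than this fraction
    of the range are treated as a "slight discrepancy".
    """
    gap = statistics.mean(values) - statistics.median(values)
    allowed = tolerance * (max(values) - min(values))
    if gap > allowed:
        return "greater than"
    if gap < -allowed:
        return "less than"
    return "about equal to"


# Hypothetical data sets, one per shape.
right_skewed = [1, 1, 2, 2, 3, 4, 10]
left_skewed = [0, 6, 7, 8, 8, 9, 9]
symmetric = [1, 2, 3, 4, 5, 6, 7]

for name, data in [("right skewed", right_skewed),
                   ("left skewed", left_skewed),
                   ("symmetric", symmetric)]:
    print(name, "-> the mean is", compare_mean_to_median(data), "the median")
```

The long tail pulls the mean toward it while the median stays near the bulk of the data, which is why the direction of the gap tracks the direction of the skew.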

3. a) For each histogram, identify the shape of the distribution. Also give reasonable values for the mode of each distribution.
[Histograms A, B, C and D; vertical axis: Percent]
b) Match the histograms to the boxplots.
[Boxplots; horizontal axis: Data]
c) For each of the distributions identified by histogram letters A, B, C and D, determine how the mean and median compare to each other, as well as to the mode.

4. Here's a boxplot of the amounts people paid for an identical model of car. (Different people pay different amounts because automobile prices are usually negotiated.) Answer from the boxplot alone. (Do the best you can. No one can be exact.)
a) Determine the 5-# summary.
b) Determine values for the range and IQR.
c) About what % of people paid over,?
d) About what % of people paid between, and,?
e) Should the mean price be less than, more than, or about equal to the median price?

5. Consider the four data sets
Set A .............
Set B .............
Set C .............
Set D .............
a) Obtain histograms for all four sets. (The preferred method is to use a computer to do this. If you do that, the scales of the histograms might be somewhat different from those shown below, which have been forced to be identical. That's OK.)
[Histograms A, B, C and D; vertical axis: Frequency]
b) Examine the histograms. Without any computing:
i) What approximately are the means of these four sets?
ii) Which set do you think has the most variability? The least? Rank the sets A, B, C and D, from least to most variable.

c) Obtain the five-number summary for data set A. Similar five-number summaries are shown for the other three data sets. Also determine the range and interquartile range (IQR).
A: {,,,, } IQR = Range =
B: {.,.,.,.,. } IQR =. Range =.
C: {.,.,.,.,. } IQR =. Range =.
D: {.,.,.,.,. } IQR =. Range =.
d) Use the Range Rule of Thumb (standard deviation ≈ range / 4) to guess the standard deviations for these four sets of data.
e) Obtain mean and standard deviation for set A.
f) Make some comparisons:
Set    A   B   C   D
Mean       .   .   .
SD         .   .   .
How do the means compare?
Standard deviation, range and IQR are all measures of variability. The aim of this exercise is a finer point: demonstrating that Range alone is somewhat flawed.
How do the ranges compare? Rank the data sets A, B, C and D from smallest range to largest.
How do standard deviations compare? Rank the data sets A, B, C and D from smallest standard deviation to largest. (This is how you want to answer (ii) of part (b) above.) Do these rankings agree with those for the range? Could you use the order of ranges to predict the order of standard deviations?
How do IQRs compare? Rank the data sets A, B, C and D from smallest IQR to largest. Do these rankings agree with those for the standard deviation?
Comment: The Range Rule of Thumb tends to work better when the shape is near Normal (bell). Data set A is closest to Normal shaped. The Range Rule of Thumb predicts. =. for the standard deviation, and, in fact, for set A the actual standard deviation is quite close to that:..
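The point of this exercise, that sets with the same range can have quite different standard deviations, can be illustrated with a short sketch. The three lists below are hypothetical stand-ins (the worksheet's sets A to D were lost in transcription) built to share a range of 20 and a mean of 10; the function name is also made up for the example.

```python
import statistics


def range_rule_estimate(values):
    """Range Rule of Thumb: guess the standard deviation as range / 4."""
    return (max(values) - min(values)) / 4


# Hypothetical data sets, all with min 0, max 20 and mean 10.
bell = [0, 4, 6, 8, 9, 10, 10, 10, 11, 12, 14, 16, 20]   # roughly bell shaped
clustered = [0, 9, 9, 10, 10, 10, 10, 10, 11, 11, 20]    # almost all near the mean
extremes = [0, 0, 0, 1, 10, 19, 20, 20, 20]              # much of the data far from the mean

for name, data in [("bell", bell), ("clustered", clustered), ("extremes", extremes)]:
    print(name,
          "rule-of-thumb SD:", range_rule_estimate(data),
          "actual SD:", round(statistics.stdev(data), 2))
```

The rule of thumb gives the same guess for all three sets because it only sees the minimum and maximum, while the actual standard deviation also reflects how often large deviations occur. In this made-up example the guess is close only for the bell-shaped set.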

6. The histograms below are all drawn to the same scale on the horizontal (x) axis.
[Histograms A, B, C and D, each with Min and Max marked on the horizontal axis]
a) How do the ranges compare for these four distributions?
b) The standard deviations for the distributions are:,, and. Which standard deviation goes with each of the four histograms?
[Boxplots L, N, P and U]
c) Match the boxplots to the histograms.
d) Which of these distributions (A, B, C, D) has the largest interquartile range? Second largest? Second smallest? Which has the smallest? How does this ranking compare to that for the standard deviations?
e) For which distribution is the Range Rule of Thumb (Range ≈ 4 × SD) going to work best?
f) Suppose you know the means of these distributions are all. Use your answer to e to guess the values of Min and Max.

Solutions

1. a) The distribution is left skewed.
b) The five-number summary is {.,.,.,.,. }
d) IQR =., Range =..
e) The mean is., the standard deviation is..
f)..
h) Mean =.; median =.; mode =. So for left skewed data: mean < median < mode.

2. a) A is right skewed; B is symmetric; C and D are both left skewed.
b) {.,.,.,.,. } is the 5-# summary. The mean is. with standard deviation..
c) [Boxplots A, B, C and D; horizontal axis: Data]
e) The mean and median are the closest for the symmetric distribution for data set B. The mean is. and the median is.; these are very close when marked on the scale of the histogram or boxplot. When the mean is below the median (as for C and D) we see left skew. When the mean is above the median (as for A) we see right skew.
For a right skewed distribution the mean is greater than the median.
For a left skewed distribution the mean is less than the median.
For a symmetric distribution the mean is about equal to the median.
f) The median is generally between the mean and mode. We can extend the results of part e:
If Mean < Median < Mode then you are probably looking at a left skewed distribution.
If Mean > Median > Mode then you are probably looking at a right skewed distribution.
If the three are fairly close to each other, you are probably looking at a fairly symmetric distribution.

3. a) A and D are right skewed (D is more skewed than is A); B is symmetric; C is left skewed. For the modes, see the table below.
b) A-; B-; C-; D-.
c) A: Mode =. < Median < Mean
B: Mode =. Median ≈ Mean (so the mean and median are about.)
C: Mean < Median < Mode =.
D: Mode = < Median < Mean
Your modes may be a little different, but should basically be in the same place when you compare to mine with marks under the horizontal (x) axis of the histograms.

4. a) {,,,, } (if you are fairly close, that's good).
b) The range is about and the IQR about. Again: your values should be close.
c) About % from the boxplot.
d) From the boxplot: About %.
e) This is the sort of boxplot you'd see for a right skewed distribution. So, the mean should be larger than the median.

5. c) and e) Here are some corresponding statistics.
Variable   Mean   StDev   Minimum   Q1   Median   Q3   Maximum   Range   IQR
A .........
B .........
C .........
D .........
d) The Range Rule of Thumb does not discriminate the differences in variability among these data sets. It anticipates a standard deviation of (Max − Min) / 4 for each of the data sets, and that value is the same for all four.
f) The means are identical. (The medians are nearly the same, and fairly close to the means, which goes hand in hand with the symmetry shown in all these distributions.) The ranges are identical. The standard deviations are somewhat different. C has lowest, then A, then D, with B highest.
Standard deviation measures a standard (typical) deviation from the mean. Take C for instance: almost all the data is very near the mean. So the standard deviation is small relative to the others. For B however, much of the data is at the extremes, far from the mean. The standard deviation is large relative to the others. The standard deviation is a more subtle measure of variability than is the range.
The rule-of-thumb value is a decent guess for all four, but this exercise is designed to reinforce this idea: while standard deviation tends to be about ¼ the range, there is more to it than that. Standard deviation takes into account not only what the largest deviations from the center (mean) are, but also how often these occur relative to smaller deviations.
The IQRs also measure variability, and they also discriminate the differences in variability among the data sets better than do the ranges. You can see that the order of IQRs (from small to high: C, A, D, B) is the same as for standard deviations.

6. a) They are about the same.
b) C-, A-, B-, D-.
c) U-A, N-B, P-C, L-D.
d) P has largest; then U; then N; L has smallest. The IQR is the width of the box, so a comparison is simple. Then using the result to part c we can order the histograms by IQR: C has largest; then A; then B; D has smallest. IQRs rank the same as standard deviations. (Which is good. They are just different ways of measuring the variability and generally they discriminate the same way. You can see that the range can, in some cases, be unable to make this discrimination. Look at C and D. Clearly there's more variability in C: extremes (values far from the mean/center) are very likely for C and very uncommon in D.)
e) B, the one that has a Normal (bell) shape.
f) Range Rule of Thumb works best with Normal shapes. For B we have a standard deviation of, anticipating a range of. With a mean at, we'd have Min around and Max around.
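The reasoning in 6 f) runs the Range Rule of Thumb backwards: if the range is about four standard deviations and the distribution is roughly symmetric about its mean, then the minimum and maximum sit about two standard deviations either side of the mean. A minimal sketch follows; the mean of 50 and standard deviation of 10 are hypothetical inputs (the worksheet's values were lost), and the function name is invented.

```python
def guess_min_max(mean, sd):
    """Guess (min, max) for a roughly Normal-shaped distribution.

    Uses Range ≈ 4 × SD, split evenly around the mean, so this is only
    a rough guess and assumes a symmetric, bell-like shape.
    """
    half_range = 2 * sd
    return mean - half_range, mean + half_range


# Hypothetical values, not the worksheet's.
low, high = guess_min_max(mean=50, sd=10)
print("Min around", low, "and Max around", high)
```

For a skewed or flat distribution this guess can be well off, which is why part e) singles out the bell-shaped histogram first.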