STAB22 section 1.3 and Chapter 1 exercises

1.101 Go up and down two times the standard deviation from the mean. So 95% of scores will be between 572 - (2)(51) = 470 and 572 + (2)(51) = 674.

1.102 Same idea as the previous exercise, but go up and down 3 times the SD: from 572 - (3)(51) = 419 to 572 + (3)(51) = 725. In a normal distribution, going up and down 3 times the SD includes almost all the data; this range has length 3 + 3 = 6 times the SD (which is the origin of the name "six sigma" in statistical process control).

1.104 z = (510 - 572)/51 = -1.22. A z-score says how your given value, here 510, compares to the mean. Here, the z-score is negative because the value is below the mean (it doesn't matter that the mean is positive). If you want another example, consider a city where the mean temperature in January is -10, with an SD of 6. A temperature of -5 has a z-score of z = ((-5) - (-10))/6 = 0.83 (positive because it is above the mean), and a temperature of -19 has a z-score of z = ((-19) - (-10))/6 = -1.5, negative because -19 is below the mean (and not just because -19 itself is negative).

1.105 Figure out z and look it up in Table A. z = (620 - 572)/51 = 0.94; a proportion 0.8264 of the normal curve is less than this. To find how much is greater, do the same calculation but subtract the answer from 1: 1 - 0.8264 = 0.1736. (The table always gives you "less than".)

1.106 Turn your two values (620 and 660) into z-scores, look them both up in the table, and subtract. As we found in the previous question, 620 has a z-score of 0.94, which corresponds to 0.8264 in the table. 660 has z = (660 - 572)/51 = 1.73, which goes with proportion 0.9582. The proportion between 620 and 660 is 0.9582 - 0.8264 = 0.1318. Another way to see this is: the proportion of students scoring less than 620 is 0.8264; the proportion scoring more than 660 is 1 - 0.9582 = 0.0418; everyone else scores between, so the proportion is 1 - 0.8264 - 0.0418 = 0.1318.
This way is perhaps easier to understand, but the first way is easier to do.

1.107 If 25% of students are going to score bigger than this score (whatever it is), 75% will score less than this score. Look up 0.7500 in the body of Table A; it's between 0.7486 and 0.7517, so z is between 0.67 and 0.68, slightly closer to 0.67. Then unstandardize this z value using the formula on page 64 (just above the question for 1.107) and the given mean and SD to get a score of 572 + (0.67)(51) = 606. (If you are going to be in the top 25%, you'll need a score a bit bigger than the mean.)

1.116 It's a density curve, so the area under it has to be 1. The area of the shape shown is its width times its height; for the area to be 1, the height must be 1 as well. For (b), draw the (vertical) line x = 0.35 on the picture; the piece on the left is what you want. The width is 0.35 - 0 and the height is 1, so the area is 0.35 × 1 = 0.35. (c) is the same idea: draw the area that has x between 0.35 and 0.65; the width is 0.65 - 0.35 = 0.3 and the height is 1, so the area is 0.3. You might guess that the proportion of the uniform distribution between a and b is b - a, and you would be correct. Use a = 0 or b = 1 if you don't have a lower or upper limit, as in (b).

1.117 This density curve is also a rectangle, so the area, width times height, has to be 1. Since the width is 4, the height must be 1/4. In (b), the width is 1 - 0 = 1, so the proportion is 1 × 1/4 = 1/4. (If you draw a picture, the area you want is obviously a quarter of the rectangle.) For (c), the width is 2.5 - 0.5 = 2, so the proportion is 2 × 1/4 = 0.5.

1.118 For the median and quartiles, you want the values that cut off area 0.5, 0.25 and 0.75. These are 0.5 (median), 0.25 (Q1) and 0.75 (Q3). For the mean, note that the density curve has a symmetric shape, so the mean and median must be equal, 0.5.

1.119 For a skewed density curve, like a skewed histogram, the mean is pulled farthest into the tail, and the median lies between the peak and the mean. Thus for (a), C is the mean and B the median. For (c), A is the mean and B the median. For a symmetric density curve with one peak, the mean and median are at that peak, so in (b), mean and median are both A.

1.120 It's easiest to draw the bell curve first and then put the numbers below it.
The mean is at the peak, and the "shoulders" of the curve are one standard deviation above and below the mean (that is, where the curve stops curving downwards and starts curving outwards). My rough sketch is in Figure 1.

Figure 1: Normal density curve for ex. 1.120

1.122 For (a), go up and down 2 standard deviations from the mean, that is from 266 - (2)(16) = 234 to 266 + (2)(16) = 298. Because the normal distribution is symmetric, these values cut off half of 5%, that is, 2.5%, on each end. So the shortest 2.5% of pregnancies last 234 days or less and the longest 2.5% last 298 days or more.

1.123 For (a), go up and down 3 times the SD: between 336 - (3)(3) = 327 and 336 + (3)(3) = 345 days. (The SD for horses is less than for humans, so we can make more precise statements about pregnancy lengths for horses as compared to humans.) (b) is tricky: the middle 68% are between 333 and 339 days (1 SD), so of the other 32%, half (16%) are below 333 days and half (16%) are above 339 days.

1.125 It's easiest to get the data into software first (on the disk, look for the data acidrain). Calling for the Descriptive Statistics gets you the mean, 5.4256, and the SD, 0.5379. The normal probability plot was fairly straight, indicating that a normal distribution is a good fit to these data. So the 68-95-99.7 rule should be fairly accurate. The 68% limits are 5.4256 - 0.5379 = 4.8877 and 5.4256 + 0.5379 = 5.9635. Then go to the data in the Minitab worksheet and count how many of the values are within these two limits. This is made easier by the fact that the data values are sorted in order. The values in rows 18 to 88 inclusive are between these values; there are 88 - 18 + 1 = 71 of them, which is 71/105 × 100% = 67.6%. This is very close to 68%. For the others, go up and down 2 (and 3) times the SD from the mean, and count how many data values fall between those limits. The 95% limits are 5.4256 - (2)(0.5379) = 4.3498 and 5.4256 + (2)(0.5379) = 6.5014. The values in rows 2 to 101 are between these limits; these are 100/105 = 95.2% of the total, again very close to 95%. The 99.7% limits are 5.4256 - (3)(0.5379) = 3.8119 and 5.4256 + (3)(0.5379) = 7.0393. All 105 values, 100%, fall between these limits. This is again very close to 99.7%. This is a reminder that the rules don't work exactly for actual data, but if the data are close to normal, the rules will be close to correct.

1.131 130 goes with z = (130 - 100)/15 = 2. Thus those more than 2 SDs above the mean would qualify, which should be the same proportion as those further than 2 SDs below the mean. So, 2.5%, or more accurately, 0.0228.

1.132 Calculate z values to compare the results fairly, using the mean and SD for the test that was taken. Tonya's z is (1820 - 1509)/321 = 0.97, while Jermaine's is (29 - 21.5)/5.4 = 1.39. Both scored above average, but Jermaine's score is higher in standardized units. Sometimes you will see these scores expressed as percentile ranks, which is the percentage of all people who would score less than the given score. You can get these by looking up the z's in Table A: Tonya is at the 83rd percentile (table gives 0.8340) while Jermaine is at the 92nd percentile (table gives 0.9177). It is perhaps clearer this way that Jermaine's performance is better. See 1.136 and 1.137.

1.133 Jacob scored z = (16 - 21.5)/5.4 = -1.02, and Emily scored z = (1020 - 1509)/321 = -1.52. Both scored below average, but Jacob did better relative to the mean on his test. (Jacob is at the 15th percentile and Emily is at the 6th, using the same kind of calculation as in 1.132.)

1.134 Jose's score standardizes to z = (2080 - 1509)/321 = 1.78. To find out the equivalent ACT score, x, figure out how you would standardize x and put that equal to 1.78: (x - 21.5)/5.4 = 1.78. Then solve for x to get 31.1, or 31 rounded off (since ACT scores are given as whole numbers).
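The conversion in 1.134 can be sketched in a few lines of Python. The function names `z_score` and `unstandardize` are made up for this illustration; the means and SDs are the ones used in the calculations above (SAT: 1509 and 321, ACT: 21.5 and 5.4).

```python
def z_score(x, mean, sd):
    """Standardize: how many SDs x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def unstandardize(z, mean, sd):
    """Turn a z-score back into a value on the original scale."""
    return mean + sd * z


# Means and SDs as used in the solutions above.
SAT_MEAN, SAT_SD = 1509, 321
ACT_MEAN, ACT_SD = 21.5, 5.4

# Jose's SAT score of 2080, rounded to 2 decimals as you would for Table A.
z_jose = round(z_score(2080, SAT_MEAN, SAT_SD), 2)

# The ACT score with the same z-score.
act_equiv = unstandardize(z_jose, ACT_MEAN, ACT_SD)

print(z_jose, round(act_equiv, 1))
```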
Or, if you prefer, do some trial and error to see what ACT score standardizes to 1.78: 31 is a little too low, 32 is too high by more, so 31 is best.

1.135 Same idea: Maria's z is z = (30 - 21.5)/5.4 = 1.57, and the standardized SAT score, say x, has to come to the same thing: (x - 1509)/321 = 1.57, so x = 2012.97, which would presumably be rounded off to 2013. Or (in the last two exercises) you can use the rule for unstandardizing z-values given at the top of page 68: x = µ + σz, where µ is the mean and σ the SD. Thus you'd get x = 21.5 + 5.4(1.78) = 31.1 and x = 1509 + 321(1.57) = 2012.97.

1.136 This is the same Tonya as in 1.132, with z = 0.97. Look in Table A to find that the proportion less than this is 0.8340, so the percentile is 83 (rounded off).

1.137 This is the Jacob of 1.133, with z = -1.02. The proportion less than this is 0.1539, so Jacob is at the 15th percentile (rounded off).

1.138 First, ask yourself what z-value cuts off the top 10% of the standard normal distribution. This same value cuts off the bottom 90%, and thus (Table A) is about z = 1.28. Then unstandardize according to the mean and SD of SAT scores: those SAT scores above 1509 + 321(1.28) = 1920 make up the top 10%.

1.140 First find the quartiles of the standard normal distribution. These are z = -0.67 and z = 0.67 (look up 0.25 and 0.75 in the body of Table A). Then unstandardize them according to the mean and SD of ACT scores: Q1 = 21.5 + 5.4(-0.67) = 17.9 and Q3 = 21.5 + 5.4(0.67) = 25.1. (These don't necessarily need to be rounded off since a quartile of whole numbers doesn't itself have to be a whole number.)

1.141 Same idea as the previous exercise, though it looks a bit more scary. Find the quintiles of the standard normal distribution by looking up 0.2, 0.4, 0.6 and 0.8 in the body of Table A: this gives z = -0.84, -0.25, 0.25, 0.84. (Note the symmetry: 0.2 below is the mirror image of 0.8 below, which is 0.2 above.) Then unstandardize onto the SAT scale: 1509 + 321(-0.84) = 1239, 1509 + 321(-0.25) = 1429, 1509 + 321(0.25) = 1589, 1509 + 321(0.84) = 1779. (You can keep one decimal place here, by the same argument as 1.140.) Note that the quintiles are equally spaced on the proportion scale but not the score scale: 1239 and 1429 are farther apart than 1429 and 1589, for instance.

1.142 (a) Standardize 40 (using the correct mean and SD), and look it up in Table A.
z = (40 - 55)/15.5 = -0.97, so a proportion 0.1660 of young women have cholesterol level lower than 40. (b) This implies "60 or higher": since HDL can take any value, it has no chance of being exactly 60. Turn 60 into a z-score first: z = (60 - 55)/15.5 = 0.32. Table A gives proportion 0.6255 for this, which is the proportion less. So the proportion of women with HDL 60 or higher is 1 - 0.6255 = 0.3745 or 37%. (c) Everyone between is neither low (part (a)) nor high (part (b)), so the answer is 1 - 0.1660 - 0.3745 = 0.4595. If you didn't recognize that you could use parts (a) and (b), you can churn through the whole thing: turn 60 into a z-score (0.32), turn 40 into a z-score (-0.97), look them both up in Table A (0.6255 and 0.1660) and subtract (0.6255 - 0.1660 = 0.4595).

1.143 Same kind of calculations as the previous exercise, but now using the different means and SDs. For 40, z = (40 - 46)/13.6 = -0.44, proportion less is 0.3300. For 60, z = (60 - 46)/13.6 = 1.03, proportion less is 0.8485, proportion more is 1 - 0.8485 = 0.1515. Proportion between 40 and 60 is 1 - 0.3300 - 0.1515 = 0.5185. Compared to the women of 1.142, the men have a lower HDL on average (with a similar spread), so more of them have low HDL and fewer of them have high HDL.

1.145 (a) z = (240 - 266)/16 = -1.63, so 0.0516, or about 5%. (b) For 270 days, z = (270 - 266)/16 = 0.25, so proportion 0.5987 are shorter than this. Proportion between is 0.5987 - 0.0516 = 0.5471. (c) The z for 0.2 longer is that for 0.8 shorter, which is z = 0.84 (look for 0.8000 in the body of Table A). Unstandardize to get 266 + 16(0.84) = 279.4 days.

1.146 As found in 1.140: z = -0.67 and z = 0.67. For (b), unstandardize these to get Q1 = µ - 0.67σ and Q3 = µ + 0.67σ. (Never mind that these are formulas; the unstandardizing works the same way.) For (c), put in the values 266 for µ and 16 for σ to get Q1 = 255.3 and Q3 = 276.7.

1.147 (a) is the difference of the two numbers found in 1.146: 0.67 - (-0.67) = 1.34. The easiest way to do (b) is to look back at 1.146(b) and take the difference between Q3 and Q1 (using the formulas), which is 1.34σ. The µs cancel out, which makes sense: the IQR, as a measure of spread, depends only on the standard deviation, which is another measure of spread. This way, the answer has to be c = 1.34, no matter what µ and σ are: such is the power of mathematics.
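If you have software to hand rather than Table A, the quartile and IQR results of 1.146 and 1.147 can be checked with a short Python sketch. It assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are made up for this illustration. Software does not round z to two decimals as the table does, so expect z of about 0.6745 instead of 0.67, and c of about 1.35 instead of 1.34.

```python
from statistics import NormalDist

# Quartiles of the standard normal distribution (mean 0, SD 1).
std = NormalDist()
z_q1 = std.inv_cdf(0.25)
z_q3 = std.inv_cdf(0.75)

# IQR of any normal distribution is c times its SD.
c = z_q3 - z_q1

# Pregnancy lengths from 1.146(c): mean 266 days, SD 16 days.
preg = NormalDist(mu=266, sigma=16)
q1 = preg.inv_cdf(0.25)
q3 = preg.inv_cdf(0.75)

print(round(z_q1, 4), round(z_q3, 4), round(c, 3))
print(round(q1, 1), round(q3, 1), round(q3 - q1, 1))
```

The small differences from the hand answers (255.3 and 276.7) come only from the table's rounding of z to 0.67.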

1.150 This one is clearly not normal, because the plot is not close to a straight line. If you think about how it fails to be a straight line: there are too many values with emissions close to 0 (the plot is too flat on the left compared to the middle of the plot), and too many values with high emissions (because the plot is too steep on the right compared to the middle). Thus the distribution is skewed to the right. (Or you can memorize which shape of curve goes with which kind of skew, but you would do better to be able to work it out from scratch.) The three countries at the top right look like outliers: if you were to make a straight line through the middle of the data, the line would definitely go below these three points.

1.164 Bar graphs for population and for open space would show only that the cities vary in both population and in how much open space they have. To investigate further, type the data into a Minitab worksheet and create a fourth column (using Calculator) with rate of open space per thousand residents. To make a bar chart, select Graph and Bar Chart. Select "Bars represent values from a table". Select the Simple option. Select OK. In the next dialog, select your rates into the Graph Variables box, and select City as your categorical variable. Select Bar Chart Options, and order the bars by Increasing Y (for (e)). My bar chart is shown in Figure 2. I prefer a chart with the heights of the bars ordered, because it makes comparison easier. Here, you see that Miami and Chicago have a noticeably small amount of open space per inhabitant, and Washington DC and Minneapolis have a noticeably large amount, with the other cities being very similar. New York, however, has a lot of its open space concentrated in Central Park, so you might imagine that the rest of the city doesn't have much.

Figure 2: Bar chart of open space per resident by city

1.165 The statement given is true on average, but a statement that "women score higher than men" hides the fact that there is a lot of variability; the majority of both men and women score between 500 and 650, but it's hard to be more precise. The men's scores have a lower mean and a larger SD than the women's, which means that the lowest scores are overwhelmingly likely to be men. Surprisingly (because the men's scores have higher variability), the extremely high scores might be either men or women.
(Notice that the density curves for scores 700 and higher are about the same height.) If you want to do some calculations: 0.0043 of the women will score less than 450, compared to 0.0183 of the men; 0.0068 of the women will score more than 700, compared to 0.0071 of the men. (This is the usual thing: figure out z's and look them up in the table.) This justifies the statement that the lowest scores are mostly men and the highest could be either men or women. The scores in the middle are all mixed up.

1.170 Two things you can do here: compare the mean and the median, and see whether the median is closer to Q1 or Q3 (or about the same distance from both). Here, the mean is quite a bit bigger than the median, and the median is closer to Q1 and further away from Q3. These both suggest that the distribution is right-skewed; there are a few women with large weights compared to the others.

(b) "How many hours" is quantitative, so a histogram or stemplot (or boxplot). For the second part of (b), you have to figure out how you're going to measure this. One way is to have set times during the semester, such as "week 1", "before midterm", "after midterm", "week 10", "before final", and get the number of study hours in those weeks for your students. Then you could do side-by-side boxplots to compare the study times (since "time during the semester" is categorical). (c) Favourite radio station is a categorical variable, so bar chart (or maybe pie chart if you want to be able to say that "33% of students prefer to listen to radio station XXXX, which is the highest percentage of any radio station"). (d) To assess whether a normal distribution applies to a collection of measurements, you need a normal probability plot, which you then assess for straightness.

1.171 (a) "Make" is categorical, so a bar chart (or maybe a pie chart, if you are thinking of "out of all the students, what fraction drive cars of a certain make?"). For "how old", a quantitative variable, a histogram is the thing (or a stemplot, or even a boxplot).