STAT 203 - Chapter 6: The Standard Deviation (SD) as a Ruler and The Normal Model

In Chapter 5, we introduced a few measures of center and spread, and discussed how the mean and standard deviation are good summaries when the histogram (or distribution) is symmetric and unimodal. When it is not symmetric, we use the median and IQR as summaries, although for most of the course we will deal with distributions that are approximately symmetric and unimodal.

Understanding what a standard deviation is matters a great deal, because many statistical methods rely on it, and we will see it come up again and again throughout the course.

Recall: The SD can be thought of as a measure of the typical deviation from the mean.

Example: I was at a crucial point in my life where I was trying to decide what to do with it: teach or research? For my Masters I graduated with a grade of 87%, and the mean Master's grade is 83% with a standard deviation of 5%. For my course evaluations I have a mean rating of 4.65 (out of 5), and the mean evaluation score is typically 3.5 with a standard deviation of 0.4. Which one am I relatively better at? (Note: These are made-up numbers!)

If we just look at the values, it is hard to compare the two. 87 is 4 larger than 83, and 4.65 is only 1.15 larger than 3.5, but 87 is a lot bigger than 4.65, and you can only go 1.5 over the average evaluation of 3.5 but 17 over the average grade of 83. So how do we compare the two?

The answer is to use the standard deviation as a measuring stick, since it summarizes the typical deviation from the mean. Essentially, we want to find out how far each value is from its respective mean, measured in units of the typical deviation from that mean.

The Masters grade is 87 - 83 = 4% above the mean grade. With SD = 5%, that is $4 / 5 = 0.8$ standard deviations above the mean.

The evaluation score is 4.65 - 3.5 = 1.15 above the mean score. With SD = 0.4, that is $1.15 / 0.4 = 2.875$ standard deviations above the mean.

Measured against their own means and typical deviations from the mean (SDs), the evaluation scores are much higher above their mean than the Masters grade is above its mean.

In your everyday life you are essentially using statistical tools to make decisions, without even knowing it. In my opinion, statistics is simply a discipline that takes the way a person thinks about things and makes logical decisions based on what they observe in everyday life, and formalizes this into a set of objective rules.

Adding or Multiplying each value by a constant:

1. Adding a constant (shifting)
   If we add a constant (c) to each observation in the data, then:
   The measures of center (mean, median, midrange) will all have the constant (c) added to them, and so will the quartiles.
   The measures of spread (variance, SD, range, IQR) will all remain the same.

2. Multiplying by a constant (scaling)
   If we multiply each observation by a constant (c), then:
   The measures of center (mean, median, midrange), the measures of spread (SD, range, IQR), and the quartiles will all be multiplied by the constant (c). If c is negative, the measures of spread are multiplied by |c|, since a spread cannot be negative.
   The variance will be multiplied by $c^2$.

In short, adding changes the center but not the spread. Multiplying changes both the spread and the center.

Multiplying by a constant is how we change measurement units (e.g., kg to lbs).
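The shifting and scaling rules above can be checked numerically. The following is a minimal sketch in Python using only the standard library. The data values, the constants, and the helper name `summary` are made up for illustration and are not part of the course notes.

```python
import statistics

def summary(values):
    """Return (mean, median, SD, range) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.stdev(values),   # sample SD (divides by n - 1)
        max(values) - min(values),
    )

# Hypothetical weights in kg (made-up numbers, for illustration only)
weights_kg = [60.0, 72.0, 65.0, 80.0, 58.0]

c = 10.0                                   # constant to add (shift)
shifted = [w + c for w in weights_kg]

k = 2.2                                    # approximate kg -> lbs factor (scale)
scaled = [w * k for w in weights_kg]

base = summary(weights_kg)
print("original:", base)
print("shifted: ", summary(shifted))   # center moves by c, spread unchanged
print("scaled:  ", summary(scaled))    # center and spread both multiplied by k
```

Comparing the three printed summaries should show the mean and median moving by c under the shift while the SD and range stay put, and all four summaries being multiplied by k under the rescaling.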

Standardizing (Z-scores):

Question: How can we compare observations that were measured on different scales or that come from two different distributions?

Answer: By summarizing how far each observation is from its mean, in terms of its standard deviation (the average/typical deviation from the mean)!

The Z-score summarizes how far a given observation ($y_i$) is from its mean ($\bar{y}$), in terms of its SD ($s$):

$$Z = \frac{y_i - \bar{y}}{s} = \frac{\text{difference between observation and mean}}{\text{standard deviation}}$$

Exercise: A flight from Vancouver to Toronto usually takes 4.5 hours with a SD of 15 minutes. If my last flight took 4 hours and 10 minutes, how far is this from the mean in standard units?

When we standardize, we are adding (actually subtracting) a constant from every observation, and then multiplying (actually dividing) every observation by a constant. Check this against the rules for adding and multiplying by a constant.

If we let $M = y_i - \bar{y}$, then the mean of M is $\bar{y} - \bar{y} = 0$, and the SD of M is unchanged (it is still $s$).

If we now let $Z = M / s$, then the mean of Z is the mean of M times the constant $1/s$, which equals $0 / s = 0$. The SD of Z is the SD of M times the constant, which is $s / s = 1$.

So, Z-scores have a mean of 0 and a SD of 1.

A positive Z-score means that the observation is above the mean, and a negative one means it is below the mean. The farther an observation is from the mean, the larger the Z-score will be in absolute value.
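As a quick check of the Z-score formula, here is a small Python sketch applied to the made-up teach-or-research numbers from the start of the chapter. The function name `z_score` and the list `data` are illustrative choices, not part of the course material.

```python
import statistics

def z_score(y, mean, sd):
    """Number of SDs that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Made-up numbers from the teach-or-research example
z_grade = z_score(87, 83, 5)        # Masters grade
z_eval = z_score(4.65, 3.5, 0.4)    # course evaluation rating
print(round(z_grade, 3), round(z_eval, 3))

# Standardizing a whole (hypothetical) data set: the Z-scores
# should have mean 0 and SD 1, up to floating-point rounding.
data = [62, 70, 75, 81, 92]
y_bar = statistics.mean(data)
s = statistics.stdev(data)
z_values = [z_score(y, y_bar, s) for y in data]
print(statistics.mean(z_values), statistics.stdev(z_values))
```

The first line of output reproduces the hand calculation (0.8 and 2.875 standard deviations above the mean), and the second illustrates that standardized values have mean 0 and SD 1.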

The Normal Model (Bell Curve, Normal Distribution):

This is where we take a small step into the theoretical world of statistics. Many types of data one collects have a distribution that is bell-shaped and roughly symmetric, and the Normal Model is appropriate for summarizing these (note that we are dealing only with quantitative variables here), e.g., weight, IQ scores, ...

Characteristics of the Normal Model:

1. It is bell-shaped, unimodal, and perfectly symmetric about the mean (ȳ or µ).
2. The spread of the distribution is determined by the standard deviation (s or σ).
3. This model is denoted by N(µ, σ²), where µ = mean, σ² = variance, and σ is the SD. For example, N(2, 36) has mean 2 and SD 6.
4. The total area under the curve is 100% (just as the total area of the bars of a histogram is 100%).

[Figure: Theoretical Normal Models. Curves for N(2, 36), N(-4, 9), and N(2, 4). Horizontal axis: Values, from -15 to 15. Vertical axis: Probability (%), from 0.00 to 0.20.]

Notes: For the Normal Model, we use µ for the mean instead of ȳ, and σ for the SD instead of s. Why?

ȳ and s are sample statistics: numerical summaries of the observed data (sample).
µ and σ are population parameters that specify the theoretical model (population).

Standardized Values (for the Normal Model):

$$Z = \frac{y - \mu}{\sigma}$$

When we standardize an observation from a Normal Model, the Z-score follows N(0, 1).

What we do is use a theoretical Normal Model to describe the distribution of an observed variable. One must check the histogram to make sure that such a model is appropriate (symmetric and unimodal).

We take the observed estimates of the mean and SD, and if a Normal Model seems appropriate, we use the Normal Model with the same mean and SD to approximate the observed data.

We then standardize the value(s) of interest, so that we can use a Standard Normal variable, N(0, 1).

We can then answer questions such as: What proportion of males have weights above 190 lbs? How many between 210 and 220? And so on.

The 68-95-99.7 Rule:

Approximately 68% of the data will be within ±1 SD of the mean.
Approximately 95% of the data will be within ±2 SDs of the mean.
Approximately 99.7% of the data will be within ±3 SDs of the mean.

For example, if a class has a mean grade of 70% and a SD of 5%, and the grades are normally distributed, then approximately 68% of students will receive grades between 65% and 75%, approximately 95% will receive grades between 60% and 80%, and approximately 99.7% between 55% and 85%.

Let's Draw a Picture:
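The percentages in the 68-95-99.7 rule can also be checked numerically. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`. The helper name `area_within` is an illustrative choice, and the grade model uses the mean of 70% and SD of 5% from the class example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # the Standard Normal model N(0, 1)

def area_within(k):
    """Area under N(0, 1) between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/-{k} SD: {area_within(k):.4f}")

# The class-grade example: mean 70%, SD 5%
grades = NormalDist(mu=70, sigma=5)
print(f"grades between 65 and 75: {grades.cdf(75) - grades.cdf(65):.4f}")
```

The exact areas are close to, but not exactly, 68%, 95%, and 99.7%, which is why the rule is stated with "approximately".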

Finding Percentages Under the Normal Model:

1. Draw a Normal Model and label where the mean is. Then shade the area of interest.
2. Standardize the y-value(s) that are at the boundaries of the area of interest.
3. Use the Normal Table in Appendix E of the textbook to find the area of the shaded region.

Example: What is the area (probability) below a Z-score of Z = 1.5? What is the area (probability) between Z-scores of -1.2 and 1.2?

Summary:

1. We estimate the mean and SD for our observed data.
2. Check if a Normal Model is appropriate (symmetric, unimodal).
3. If it is, then we standardize the values of interest.
4. Use the Normal Table to find the percentages we are interested in.

(The Normal Model is commonly used in statistics, so make sure to practice many of these problems.)
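A table lookup can be cross-checked with software. The sketch below is one possible way to do this in Python 3.8 or later using `statistics.NormalDist` in place of the Normal Table. The function names `standardize`, `area_below`, and `area_between` are hypothetical helpers, not something the notes or the textbook define. It is applied to the two Z-score examples above.

```python
from statistics import NormalDist

Z = NormalDist()   # defaults to mean 0 and SD 1, i.e. N(0, 1)

def standardize(y, mu, sigma):
    """Step 2: convert a boundary value y to a Z-score."""
    return (y - mu) / sigma

def area_below(z):
    """Step 3: area under N(0, 1) to the left of z."""
    return Z.cdf(z)

def area_between(z_low, z_high):
    """Step 3: area under N(0, 1) between two Z-scores."""
    return Z.cdf(z_high) - Z.cdf(z_low)

# The two examples, already given as Z-scores
print(f"area below Z = 1.5: {area_below(1.5):.4f}")
print(f"area between -1.2 and 1.2: {area_between(-1.2, 1.2):.4f}")

# For a boundary on the original scale, standardize first.
# Here: a grade of 77.5 in the class with mean 70 and SD 5.
z = standardize(77.5, 70, 5)
print(f"Z = {z}, area below: {area_below(z):.4f}")
```

Areas above a boundary follow from the fact that the total area is 100%: the area above z is 1 minus the area below z.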

Exercises:

1. Suppose that math SAT scores follow the normal model. The past results of the math SAT exams show that males and females have mean scores of 500 and 455 and standard deviations of 100 and 120, respectively. Chris and Kim took the math SAT exam, and they both scored 620.
   (a) Compare their scores using the z-score.
   (b) What percentage of males score over 600 on the math SAT test?
   (c) What percentage of females score between 255 and 555 on the math SAT test?

2. Find the area under the Normal Model for the following Z-scores.
   (a) smaller than -1.10
   (b) bigger than -1.10
   (c) bigger than 2.15
   (d) between 0 and 1.18
   (e) between -1.10 and 1.62
   (f) smaller than -4.50
   (g) bigger than -6.34

3. Find the z-scores corresponding to the following percentiles:
   (a) 50th
   (b) 70th
   (c) 15th

4. Suppose that scores on a standard IQ test approximately follow the normal model with mean µ = 110 and standard deviation σ = 25.
   (a) What percentage of people have IQ scores above 100?
   (b) What percentage have scores between 90 and 120?
   (c) Find the interquartile range for the IQ scores.

5. The length of human pregnancies from conception to birth varies according to a distribution that is approximately normal with mean 266 days and standard deviation 16 days.
   (a) Between what values do the lengths of the middle 95% of all pregnancies fall? Use the 68-95-99.7 rule to answer this question.
   (b) How short are the shortest 1% of all pregnancies?