STAT Chapter 6 The Standard Deviation (SD) as a Ruler and The Normal Model

STAT 203 - Chapter 6: The Standard Deviation (SD) as a Ruler and The Normal Model

In Chapter 5, we introduced a few measures of center and spread, and discussed how the mean and standard deviation are good summaries when the histogram (or distribution) is symmetric and unimodal. When it is not symmetric, we use the median and IQR as summaries, although for most of the course we will deal with data that are approximately symmetric and unimodal.

Understanding what a standard deviation is matters a great deal, because almost all statistical methods rely on it, and we will see it come up again and again throughout the course (and in all of statistics, actually).

Recall: The SD can be thought of as a measure of the typical deviation from the mean. We will use the standard deviation as a unit of measurement. I will explain this with an example.

Example: Suppose I were at a crucial point in my life where I was trying to decide what to do with it: pursue my education, or a career as a professional golfer? Suppose that in high school my graduating average was 82%, and the mean graduating average is 67% with a standard deviation of 15%. Suppose that for golfing, I have a mean score of 77, and the mean score of a competitive golfer is typically 79 with a standard deviation of 2.1. Which one am I relatively better at? (Note: These are made-up numbers!)

If we just look at the values, it is hard to compare the two. They are measured on different scales, with different units of measurement. A grade must be between 0 and 100, while a golf score almost never gets below 59. So how can we compare the two? The answer is to use the standard deviation as a measuring stick, as it summarizes the average/typical deviation from the mean. Essentially, we want to find out how far each one is from its respective mean, in terms of its typical deviation from the mean.

The high school grade is ... above the mean grade, and in terms of standard deviation, it is ... standard deviations above the mean.

The golf score is ... below the mean score (in golf, a lower score is better), and in terms of standard deviation, it is ... standard deviations below the mean.

We can compare the two in terms of each of their own respective means and average deviations from the mean (or SDs).

In your everyday life you are essentially using statistical tools to make decisions, without even knowing it. In my opinion, statistics is simply a discipline that takes the way a person thinks about things and makes logical decisions based on what they observe in everyday life, and formalizes this into a set of objective rules.

Adding or Multiplying each value by a constant:

1. Adding a constant (shifting)
If we add a constant (c) to each observation in the data, then:
The measures of center (mean, median, midrange) will all have the constant (c) added to them, and so will the quartiles.
The measures of spread (variance, SD, range, IQR) will all remain the same.

2. Multiplying by a constant (scaling)
If we multiply each observation by a constant (c), then:
The measures of center (mean, median, midrange), the measures of spread (SD, range, IQR), and the quartiles will all be multiplied by the constant (c). (If c is negative, the measures of spread are multiplied by |c|, since a spread is never negative.)
The variance will be multiplied by $c^2$.

In short, adding changes the center, but not the spread. Multiplying changes both the spread and the center. Multiplying by a constant is how we change measurement units (e.g., kg to lbs).
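The shifting and scaling rules above can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the weights are made-up numbers, `summary` is a hypothetical helper name, and 2.2046 is an approximate kg-to-lbs conversion factor.

```python
import statistics as st

def summary(data):
    """Return the mean, median, SD, and variance of a list of numbers."""
    return {
        "mean": st.mean(data),
        "median": st.median(data),
        "sd": st.stdev(data),          # sample SD (divides by n - 1)
        "variance": st.variance(data),
    }

weights_kg = [50.0, 60.0, 65.0, 70.0, 80.0]  # made-up data
c_shift = 5.0      # constant added to every observation
c_scale = 2.2046   # approximate kg -> lbs factor (assumption)

base = summary(weights_kg)
shifted = summary([y + c_shift for y in weights_kg])
scaled = summary([y * c_scale for y in weights_kg])

# Shifting moves the center but leaves the spread alone.
print("shifted mean:", shifted["mean"], "vs", base["mean"] + c_shift)
print("shifted SD:  ", shifted["sd"], "vs", base["sd"])

# Scaling multiplies center and SD by c, and the variance by c squared.
print("scaled mean: ", scaled["mean"], "vs", base["mean"] * c_scale)
print("scaled SD:   ", scaled["sd"], "vs", base["sd"] * c_scale)
print("scaled var:  ", scaled["variance"], "vs", base["variance"] * c_scale**2)
```

Each printed pair should agree up to floating-point rounding. The quartiles and IQR are left out to keep the sketch short; they follow the same rules.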

Standardizing (Z-scores):

Question: How can we compare observations that were measured on different scales or that come from two different distributions?

Answer: By summarizing how far away each of the observations is from its mean, in terms of its standard deviation (or average/typical deviation from the mean)!

The Z-score summarizes how far a given observation (y_i) is from its mean (ȳ), in terms of its SD (s).

$$Z = \frac{y_i - \bar{y}}{s} = \frac{\text{difference between observation and mean}}{\text{standard deviation}}$$

Exercise: A flight from Vancouver to Toronto usually takes 4.5 hours with a SD of 15 minutes. If my last flight took 4 hours and 10 minutes, how far is this from the mean in standard units?

When we standardize, we are adding (actually subtracting) a constant from every observation, and then multiplying (actually dividing) every observation by a constant. Check this against the rules for adding and multiplying by a constant above.

If we let $M = y_i - \bar{y}$, then the mean of M is $\bar{y} - \bar{y} = 0$, and the SD of M is unchanged (it is still s).

If we now let $Z = \frac{M}{s}$, then the mean of Z is the mean of M times the constant $\frac{1}{s}$, which equals $\frac{0}{s} = 0$. The SD of Z is the SD of M times the constant, which is $\frac{s}{s} = 1$.

So, Z-scores have a mean of 0 and a SD of 1.

A positive Z-score means that the observation is above the mean, and a negative one means it is below the mean. The farther an observation is from the mean, the larger the Z-score will be in absolute value.
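The Z-score formula is short enough to try out directly. Here is a minimal Python sketch under stated assumptions: `z_score` is a hypothetical helper name, the flight times in the list are made-up data, and the other numbers come from the flight exercise and the grade/golf example earlier in these notes.

```python
import statistics as st

def z_score(y, mean, sd):
    """Distance of y from the mean, measured in standard deviations."""
    return (y - mean) / sd

# Flight exercise: mean 4.5 h = 270 min, SD 15 min, observed 4 h 10 min = 250 min.
flight_z = z_score(250, 270, 15)

# Grade vs. golf example (made-up numbers from the notes).
grade_z = z_score(82, 67, 15)
golf_z = z_score(77, 79, 2.1)   # negative, but a lower golf score is better

print(f"flight: {flight_z:.2f}, grade: {grade_z:.2f}, golf: {golf_z:.2f}")

# Standardizing a whole data set gives mean 0 and SD 1.
times = [250, 262, 270, 275, 293]          # made-up flight times in minutes
ybar, s = st.mean(times), st.stdev(times)
zs = [z_score(y, ybar, s) for y in times]
print("mean of z:", round(st.mean(zs), 10), "SD of z:", round(st.stdev(zs), 10))
```

Note that the sign of a Z-score only says which side of the mean the observation is on. Whether that side is "good" depends on the context, as the golf score shows.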

The Normal Model (Bell Curve, Normal Distribution):

This is where we take a small step into the theoretical world of statistics. Many types of data that one collects have a distribution that is bell-shaped and roughly symmetric, and the Normal Model is appropriate for summarizing these (note that we are dealing only with quantitative variables here), e.g., weight, IQ scores, ...

Characteristics of the Normal Model:

1. It is bell-shaped, unimodal, and perfectly symmetric about the mean (ȳ or µ).
2. The spread of the distribution is determined by the standard deviation (s or σ).
3. This model is denoted by N(µ, σ²), where µ is the mean, σ² is the variance, and σ is the SD.
4. The total area under the curve is 100% (just as the total area of the bars of a histogram is 100%).

[Figure: Theoretical Normal Models. Curves shown: N(2, 36), N(−4, 9), and N(2, 4). Horizontal axis: Values, from −15 to 15. Vertical axis: Probability (%), from 0.00 to 0.20.]

Notes: For the Normal Model, we use (µ) for the mean instead of (ȳ), and (σ) for the SD instead of (s). Why?

The (ȳ) and (s) are Sample Statistics: numerical summaries of the observed data. (sample)
The (µ) and (σ) are Population Parameters: they specify the theoretical model. (population)
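The characteristics listed above can be checked numerically for one of the models in the figure. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; `normal_model` is a hypothetical helper name. Note that the N(µ, σ²) notation lists the variance, while `NormalDist` expects the SD, so the helper takes a square root.

```python
import math
from statistics import NormalDist

def normal_model(mu, variance):
    """Build N(mu, sigma^2) from its mean and VARIANCE, as in the notes' notation."""
    return NormalDist(mu=mu, sigma=math.sqrt(variance))

model = normal_model(2, 36)   # N(2, 36): mean 2, SD 6
print("SD:", model.stdev)

# Symmetry: the curve has the same height d units below and above the mean.
d = 5
symmetric = math.isclose(model.pdf(model.mean - d), model.pdf(model.mean + d))
print("symmetric about the mean:", symmetric)

# Total area: midpoint Riemann sum over mean +/- 8 SDs (the tails beyond are negligible).
lo = model.mean - 8 * model.stdev
hi = model.mean + 8 * model.stdev
n = 10_000
step = (hi - lo) / n
area = sum(model.pdf(lo + (i + 0.5) * step) for i in range(n)) * step
print("area under the curve:", round(area, 6))
```

The area comes out very close to 1, that is, 100%.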

Standardized Values (for the Normal Model):

$$Z = \frac{y - \mu}{\sigma}$$

When we standardize an observation from a Normal Model, the Z-score follows N(0, 1).

What we do is use a theoretical Normal Model to describe the distribution of an observed variable. One must check the histogram to make sure that such a model is appropriate (symmetric and unimodal).

We take the observed estimates of the mean and SD, and if a Normal Model seems appropriate, then we use the Normal Model (with the same mean and SD) to approximate the observed data.

We then standardize the value(s) of interest, so that we can use a Standard Normal variable (N(0, 1)).

We can then answer questions such as: What proportion of males have weights above 190 lbs? How many between 210 and 220? And so on.

The 68-95-99.7 Rule:

Approximately 68% of the data will be within ±1 SD of the mean.
Approximately 95% of the data will be within ±2 SDs of the mean.
Approximately 99.7% of the data will be within ±3 SDs of the mean.

(e.g.) If a class has a mean grade of 70% and a SD of 5%, and the grades are normally distributed, then approximately 68% of students will receive grades between 65% and 75%, approximately 95% will receive grades between 60% and 80%, and approximately 99.7% between 55% and 85%.

Let's Draw a Picture:
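The 68-95-99.7 Rule can be verified against the Standard Normal model itself. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; `within` is a hypothetical helper name, and the grade numbers are the ones from the class example above.

```python
from statistics import NormalDist

std_normal = NormalDist()   # N(0, 1): mean 0, SD 1

def within(k):
    """Proportion of a Normal Model lying within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# These come out to roughly 68%, 95%, and 99.7%.
for k in (1, 2, 3):
    print(f"within {k} SD: {within(k) * 100:.2f}%")

# Class example: mean grade 70%, SD 5%.
mean, sd = 70, 5
intervals = {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}
print(intervals)
```

The rule's percentages are rounded versions of these exact areas, which is why the notes say "approximately".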

Finding Percentages Under the Normal Model:

1. Draw a Normal Model and label where the mean is. Then shade the area of interest.
2. Standardize the y-value(s) that are at the boundaries of the area of interest.
3. Use the Normal Table in Appendix E of the textbook to find the area of the shaded region.

Example: What is the area (probability) below a Z-score of Z = 1.52?

What is the area (probability) between Z-scores of -1.23 and 1.23?

Summary:

1. We estimate the mean and SD for our observed data.
2. Check if a Normal Model is appropriate (symmetric, unimodal).
3. If it is, then we standardize the values of interest.
4. Use the Normal Table to find the percentages we are interested in.

(The Normal Model is HUGE in statistics, so make sure to practice many of these problems.)
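The table lookups in the two example questions above can be cross-checked in software. This is a minimal sketch that uses the cumulative distribution function of Python's standard-library `statistics.NormalDist` in place of the printed Normal Table; `area_between` is a hypothetical helper name, and the results may differ from table values in the last digit because tables are rounded.

```python
from statistics import NormalDist

std_normal = NormalDist()   # N(0, 1)

def area_between(z_lo, z_hi):
    """Area under the Standard Normal curve between two Z-scores."""
    return std_normal.cdf(z_hi) - std_normal.cdf(z_lo)

# Area below Z = 1.52 (everything to the left of the boundary).
below = std_normal.cdf(1.52)

# Area between Z = -1.23 and Z = 1.23.
between = area_between(-1.23, 1.23)

print(f"below 1.52: {below:.4f}")
print(f"between -1.23 and 1.23: {between:.4f}")
```

For an area above a Z-score, subtract the area below it from 1, since the total area is 100%.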

Exercises:

1. Suppose that math SAT scores follow the normal model. The past results of the math SAT exams show that males and females have mean scores of 500 and 455 and standard deviations of 100 and 120, respectively. Joe and Linda took the math SAT exam, and they both scored 620.
(a) Compare their scores using the z-score.
(b) What percentage of males score over 600 on the math SAT test?
(c) What percentage of females score between 255 and 555 on the math SAT test?

2. Find the area under the Normal Model for the following Z-scores.
(a) smaller than -1.10
(b) bigger than -1.10
(c) bigger than 2.15
(d) between 0 and 1.18
(e) between -1.10 and 1.62
(f) smaller than -4.50
(g) bigger than -6.34

3. Find the z-scores corresponding to the following percentiles:
(a) 50th
(b) 70th
(c) 15th

4. Suppose that scores on a standard IQ test approximately follow the normal model with mean µ = 110 and standard deviation σ = 25.
(a) What percentage of people have IQ scores above 100?
(b) What percentage have scores between 90 and 120?
(c) Find the interquartile range for the IQ scores.

5. The length of human pregnancies from conception to birth varies according to a distribution that is approximately normal with mean 266 days and standard deviation 16 days.
(a) Between what values do the lengths of the middle 95% of all pregnancies fall? Use the 68-95-99.7 rule to answer this question.
(b) How short are the shortest 1% of all pregnancies?
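Several of the exercises run the table lookup in reverse: they start from a percentage and ask for a Z-score or a data value. The following is a minimal sketch of that reverse lookup using `inv_cdf` from Python's standard-library `statistics.NormalDist`. The 90th percentile is a hypothetical illustration (it is not one of the exercises), the grade model N(70, 5²) is the class example from the 68-95-99.7 Rule, and `z_for_percentile` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()   # N(0, 1)

def z_for_percentile(p):
    """Z-score with a proportion p (0 < p < 1) of the area below it."""
    return std_normal.inv_cdf(p)

# Hypothetical illustration: the 90th percentile.
z90 = z_for_percentile(0.90)

# Undo the standardization: Z = (y - mu) / sigma  =>  y = mu + Z * sigma.
mu, sigma = 70, 5           # class grades example from the notes
grade90 = mu + z90 * sigma

print(f"z for the 90th percentile: {z90:.4f}")
print(f"90th percentile grade: {grade90:.2f}%")
```

With a printed Normal Table, the same step is done by finding the area closest to the target percentage in the body of the table and reading off the Z-score from the margins.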