Lecture 5 - Continuous Distributions


Lecture 5 - Continuous Distributions Statistics 102 Colin Rundel January 30, 2013

Announcements
HW1 and Lab 1 have been graded, and your scores are posted in Gradebook on Sakai (it is good practice to always double-check your scores).
You should have picked up Lab 1 yesterday; HW1 will be passed back in class today.
Any questions about grading should be directed to me, not the TAs. Regrade requests need to be made in writing.
I will hold on to any unclaimed assignments; come to office hours to pick them up.

Types of Distributions: Discrete Probability Distributions
A discrete probability distribution lists all possible events and the probabilities with which they occur.
Rules for probability distributions:
1. The events listed must be disjoint.
2. Each probability must be between 0 and 1.
3. The probabilities must total 1.

Types of Distributions: Continuous Probability Distributions
A continuous probability distribution differs from a discrete probability distribution in several ways.
The probability that a continuous RV will equal a specific value is zero.
As such, a continuous distribution cannot be expressed in tabular form. Instead, we use an equation or formula to describe it (the probability density function).
We can calculate probabilities for ranges of values (area under the curve).

Normal distribution
Unimodal and symmetric, bell-shaped curve.
Most variables are nearly normal, but none are exactly normal.
Denoted as N(µ, σ): normal with mean µ and standard deviation σ.
Curve given by the equation
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right]$
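As a rough check on the density formula above, the sketch below writes it out term by term and compares it with `statistics.NormalDist` from Python's standard library. Python is used only for illustration here (the lecture itself uses R), and the function name `normal_pdf` is made up for this example.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, written out from the formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Compare with the library implementation at a few points of N(mu = 19, sigma = 4).
for x in (11, 19, 23):
    print(x, normal_pdf(x, 19, 4), NormalDist(19, 4).pdf(x))
```

The two columns should agree up to floating-point rounding, and the density is highest at x = µ.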

Normal distribution: Heights of males
The male heights on OkCupid very nearly follow the expected normal distribution except the whole thing is shifted to the right of where it should be. Almost universally guys like to add a couple inches. You can also see a more subtle vanity at work: starting at roughly 5'8", the top of the dotted curve tilts even further rightward. This means that guys as they get closer to six feet round up a bit more than usual, stretching for that coveted psychological benchmark.
http://blog.okcupid.com/index.php/the-biggest-lies-in-online-dating/

Normal distribution: Heights of females
When we looked into the data for women, we were surprised to see height exaggeration was just as widespread, though without the lurch towards a benchmark height.
http://blog.okcupid.com/index.php/the-biggest-lies-in-online-dating/

Normal distribution: Normal distribution model
Normal distributions with different parameters (µ: mean, σ: standard deviation).
[Figure: density curves of N(µ = 0, σ = 1) and N(µ = 19, σ = 4).]

Normal distribution: 68-95-99.7 Rule
For nearly normally distributed data,
about 68% falls within 1 SD of the mean,
about 95% falls within 2 SD of the mean,
about 99.7% falls within 3 SD of the mean.
It is possible for observations to fall 4, 5, or more standard deviations away from the mean, but these occurrences are very rare if the data are nearly normal.
[Figure: normal curve with the central 68%, 95%, and 99.7% regions marked, horizontal axis from µ - 3σ to µ + 3σ.]

Normal distribution: 68-95-99.7 Rule
Describing variability using the 68-95-99.7 Rule
SAT scores are distributed nearly normally with mean 1500 and standard deviation 300.
About 68% of students score between 1200 and 1800 on the SAT.
About 95% of students score between 900 and 2100 on the SAT.
About 99.7% of students score between 600 and 2400 on the SAT.
[Figure: normal curve of SAT scores with the 68%, 95%, and 99.7% regions marked, horizontal axis from 600 to 2400.]
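The percentages in the rule can be checked numerically. The following is a minimal sketch, assuming an exactly normal model and using Python's standard-library `statistics.NormalDist`; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

def share_within(k):
    """Probability that a normal observation falls within k SDs of its mean."""
    z = NormalDist()  # standard normal: mu = 0, sigma = 1
    return z.cdf(k) - z.cdf(-k)

# SAT model from the slide: mean 1500, SD 300.
sat_mean, sat_sd = 1500, 300
for k in (1, 2, 3):
    low, high = sat_mean - k * sat_sd, sat_mean + k * sat_sd
    print(f"within {k} SD: {low} to {high}, share = {share_within(k):.4f}")
```

The computed shares are roughly 0.683, 0.954, and 0.997, which is why the rule is stated with "about".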

Normal distribution: 68-95-99.7 Rule
[Figure: histogram of the number of hours of sleep on school nights (horizontal axis from 4 to 10 hours), with intervals labeled 75%, 95%, and 98%.]

Normal distribution: Standardizing with Z scores
Comparing SAT and ACT
SAT scores are distributed nearly normally with mean 1500 and standard deviation 300. ACT scores are distributed nearly normally with mean 21 and standard deviation 5. A college admissions officer wants to determine which of the two applicants scored better on their standardized test with respect to the other test takers: Pam, who earned an 1800 on her SAT, or Jim, who scored a 24 on his ACT?
[Figure: two normal curves, one for SAT scores (600 to 2400) with Pam marked at 1800 and one for ACT scores (6 to 36) with Jim marked at 24.]

Normal distribution: Standardizing with Z scores
Since we cannot just compare these two raw scores, we instead compare how many standard deviations beyond the mean each observation is.
Pam's score is (1800 - 1500) / 300 = 1 standard deviation above the mean.
Jim's score is (24 - 21) / 5 = 0.6 standard deviations above the mean.
[Figure: standard normal curve (horizontal axis from -2 to 2) with Jim marked at 0.6 and Pam marked at 1.]

Normal distribution: Standardizing with Z scores (cont.)
These are called standardized scores, or Z scores.
The Z score of an observation is the number of standard deviations it falls above or below the mean.
$Z = \frac{\text{observation} - \text{mean}}{\text{SD}}$
Z scores are defined for distributions of any shape, but only when the distribution is normal can we use Z scores to calculate percentiles.
Observations that are more than 2 SD away from the mean (|Z| > 2) are usually considered unusual.
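The Z score definition translates directly into code. Here is a small sketch for the Pam and Jim comparison; the function name `z_score` is made up for this example, and the |Z| > 2 flag follows the rule of thumb stated above.

```python
def z_score(observation, mean, sd):
    """Number of SDs the observation lies above (+) or below (-) the mean."""
    return (observation - mean) / sd

pam = z_score(1800, 1500, 300)  # SAT: mean 1500, SD 300
jim = z_score(24, 21, 5)        # ACT: mean 21, SD 5

for name, z in (("Pam", pam), ("Jim", jim)):
    flag = "unusual" if abs(z) > 2 else "not unusual"
    print(f"{name}: Z = {z:.1f} ({flag})")
```

Pam's Z score is larger, so she scored better relative to the other test takers, although neither score counts as unusual.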

Normal distribution: Standardizing with Z scores
Z distribution
Another reason we use Z scores is that if the distribution of X is nearly normal, then the Z scores of X will have a Z distribution.
The Z distribution is a special case of the normal distribution where µ = 0 and σ = 1 (unit normal distribution).
Given that a linear transformation of a normally distributed random variable is also normally distributed, we can easily show that $Z \sim N(0, 1)$:
$E(Z) = E\left(\frac{X-\mu}{\sigma}\right) = E(X/\sigma) - \mu/\sigma = \mu/\sigma - \mu/\sigma = 0$
$\mathrm{Var}(Z) = \mathrm{Var}\left(\frac{X-\mu}{\sigma}\right) = \mathrm{Var}(X/\sigma) = \frac{1}{\sigma^{2}} \mathrm{Var}(X) = 1$
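The derivation can also be illustrated by simulation. This is only a sketch under assumed settings: the parameters µ = 19 and σ = 4, the sample size, and the seed are arbitrary choices, and a simulation illustrates the result rather than proving it.

```python
import random
from statistics import fmean, pstdev

random.seed(102)  # arbitrary seed so the run is repeatable

mu, sigma = 19, 4  # assumed parameters for X
xs = [random.gauss(mu, sigma) for _ in range(100_000)]
zs = [(x - mu) / sigma for x in xs]  # standardize each draw

z_mean = fmean(zs)
z_sd = pstdev(zs)
print(f"mean of Z = {z_mean:.3f}, SD of Z = {z_sd:.3f}")
```

The printed mean should be close to 0 and the SD close to 1, in line with E(Z) = 0 and Var(Z) = 1.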

Normal distribution: Standardizing with Z scores
Percentiles
Percentile is the percentage of observations that fall below a given data point.
Graphically, percentile is the area below the probability distribution curve to the left of that observation.
[Figure: normal curve of SAT scores (600 to 2400) with the area to the left of an observation shaded.]

Normal distribution: Standardizing with Z scores
Example - SAT
Approximately what percent of students score below 1800 on the SAT?
[Figure: normal curve of SAT scores, horizontal axis from 600 to 2400.]

Normal distribution: Calculating percentiles and probabilities
Calculating percentiles
There are many ways to compute percentiles/areas under the curve:
R: pnorm(1800, mean = 1500, sd = 300)
Applet: http://www.socr.ucla.edu/htmls/SOCR_Distributions.html
Calculus: $P(X \le 1800) = \int_{-\infty}^{1800} \frac{1}{300\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-1500}{300}\right)^{2}\right] dx$
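For readers without R at hand, the same area can be sketched in Python with the standard library. `NormalDist.cdf` plays the role of `pnorm` here, and the midpoint-rule helper is a made-up illustration of the calculus route, with 10 SDs below the mean standing in for the lower limit of minus infinity.

```python
from statistics import NormalDist

sat = NormalDist(mu=1500, sigma=300)

# Library CDF: the analogue of R's pnorm(1800, mean = 1500, sd = 300).
p_cdf = sat.cdf(1800)

def midpoint_integral(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Numerical version of the integral of the density up to 1800.
p_numeric = midpoint_integral(sat.pdf, 1500 - 10 * 300, 1800)

print(f"cdf: {p_cdf:.4f}, numeric integral: {p_numeric:.4f}")
```

Both routes give about 0.8413, so roughly 84% of students score below 1800.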

Normal distribution: Calculating percentiles and probabilities
Calculating Exact Percentiles
       Second decimal place of Z
Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0    0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1    0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2    0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3    0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4    0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5    0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6    0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7    0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8    0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9    0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0    0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1    0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2    0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
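Each table entry is the area to the left of Z under the standard normal curve, so a row can be regenerated rather than looked up. A minimal sketch, assuming Python's `statistics.NormalDist`; the helper name `table_row` is made up for this example.

```python
from statistics import NormalDist

def table_row(first_decimal):
    """One table row: P(Z < z) for z = first_decimal + 0.00, ..., + 0.09."""
    std_normal = NormalDist()
    return [round(std_normal.cdf(first_decimal + j / 100), 4) for j in range(10)]

# Rebuild the Z = 1.0 row. Its first entry is the percentile
# for an SAT score of 1800, which has Z = 1.00.
row = table_row(1.0)
print(" ".join(f"{p:.4f}" for p in row))
```

The first entry of that row, 0.8413, matches the SAT example: a score of 1800 sits at about the 84th percentile.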

Evaluating nearly normalness: Normal probability plot
A histogram and normal probability plot of a sample of 100 male heights.
[Figure: histogram of male heights (inches, 60 to 80) and normal probability plot of male heights (in.) against theoretical quantiles.]

Evaluating nearly normalness: Anatomy of a normal probability plot
Data are plotted on the y-axis of a normal probability plot, and theoretical quantiles (following a normal distribution) on the x-axis.
If there is a one-to-one relationship between the data and the theoretical quantiles, then the data follow a nearly normal distribution.
Since a one-to-one relationship would appear as a straight line on a scatter plot, the closer the points are to a perfect straight line, the more confident we can be that the data follow the normal model.
Constructing a normal probability plot requires calculating percentiles and corresponding z-scores for each observation, which is tedious. Therefore we generally rely on software when making these plots.

Evaluating nearly normalness: Constructing a normal probability plot
We construct a normal probability plot for the heights of a sample of 100 men as follows:
1. Order the observations.
2. Determine the percentile of each observation in the ordered data set.
3. Identify the Z score corresponding to each percentile.
4. Create a scatterplot of the observations (vertical) against the Z scores (horizontal).
If the observations are normally distributed, then their Z scores will approximately correspond to their percentiles and thus to the z_i in Table 3.16.
Observation i   1       2       3       ...   100
x_i             61      63      63      ...   78
Percentile      0.99%   1.98%   2.97%   ...   99.01%
z_i             -2.33   -2.06   -1.89   ...   2.33
Table 3.16: Construction details for a normal probability plot of 100 men's heights. The first observation is assumed to be at the 0.99th percentile, and the z_i corresponding to a lower tail of 0.0099 is -2.33.
To create the plot ...
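The four steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the textbook's procedure verbatim: the percentile of the i-th ordered observation is taken to be i / (n + 1), which reproduces the 0.99% for i = 1 and n = 100 in Table 3.16, and the sample of heights below is made up.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Return (z, x) pairs for a normal probability plot of data."""
    ordered = sorted(data)                                      # step 1
    n = len(ordered)
    percentiles = [(i + 1) / (n + 1) for i in range(n)]         # step 2 (assumed convention)
    z_scores = [NormalDist().inv_cdf(p) for p in percentiles]   # step 3
    return list(zip(z_scores, ordered))                         # step 4: plot x against z

# Made-up heights in inches, for illustration only.
heights = [70, 66, 74, 68, 71, 69, 72, 67, 65, 73]
for z, x in normal_plot_points(heights):
    print(f"z = {z:6.2f}, height = {x}")
```

Plotting the heights against the z values gives the normal probability plot; points close to a straight line suggest the data are nearly normal.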

Evaluating nearly normalness: Example - NBA Height
Below are a histogram and normal probability plot for the heights of NBA players from the 2008-2009 season. Do these data appear to follow a normal distribution?
[Figure: histogram of height (inches, 70 to 90) and normal probability plot of height against theoretical quantiles.]
Why do the points on the normal probability plot have jumps?

Evaluating nearly normalness: Normal probability plot and skewness
Right Skew - If the plotted points appear to bend up and to the left of the normal line, that indicates a long tail to the right.
Left Skew - If the plotted points bend down and to the right of the normal line, that indicates a long tail to the left.
Short Tails - An S-shaped curve indicates shorter than normal tails, i.e. narrower than expected.
Long Tails - A curve which starts below the normal line, bends to follow it, and ends above it indicates long tails. That is, you are seeing more variance than you would expect in a normal distribution, i.e. wider than expected.

Examples: Normal probability and Quality control
Six sigma
The term "six sigma process" comes from the notion that if one has six standard deviations between the process mean and the nearest specification limit, as shown in the graph, practically no items will fail to meet specifications.
http://en.wikipedia.org/wiki/Six_Sigma

Examples: Normal probability and Quality control
Example - Dosage
At a pharmaceutical factory the amount of the active ingredient which is added to each pill is supposed to be 36 mg. The amount of the active ingredient added follows a nearly normal distribution with a standard deviation of 0.11 mg. Once every 30 minutes a pill is selected from the production line, and its composition is measured precisely. If the amount of the active ingredient in the pill is below 35.8 mg or above 36.2 mg, then that production run of pills fails the quality control inspection.
What percent of pills have less than 35.8 mg of the active ingredient?

Examples: Normal probability and Quality control
Finding the exact probability
Second decimal place of Z
0.09    0.08    0.07    0.06    0.05    0.04    0.03    0.02    0.01    0.00    Z
0.0014  0.0014  0.0015  0.0015  0.0016  0.0016  0.0017  0.0018  0.0018  0.0019  -2.9
0.0019  0.0020  0.0021  0.0021  0.0022  0.0023  0.0023  0.0024  0.0025  0.0026  -2.8
0.0026  0.0027  0.0028  0.0029  0.0030  0.0031  0.0032  0.0033  0.0034  0.0035  -2.7
0.0036  0.0037  0.0038  0.0039  0.0040  0.0041  0.0043  0.0044  0.0045  0.0047  -2.6
0.0048  0.0049  0.0051  0.0052  0.0054  0.0055  0.0057  0.0059  0.0060  0.0062  -2.5
0.0064  0.0066  0.0068  0.0069  0.0071  0.0073  0.0075  0.0078  0.0080  0.0082  -2.4
0.0084  0.0087  0.0089  0.0091  0.0094  0.0096  0.0099  0.0102  0.0104  0.0107  -2.3
0.0110  0.0113  0.0116  0.0119  0.0122  0.0125  0.0129  0.0132  0.0136  0.0139  -2.2
0.0143  0.0146  0.0150  0.0154  0.0158  0.0162  0.0166  0.0170  0.0174  0.0179  -2.1
0.0183  0.0188  0.0192  0.0197  0.0202  0.0207  0.0212  0.0217  0.0222  0.0228  -2.0
0.0233  0.0239  0.0244  0.0250  0.0256  0.0262  0.0268  0.0274  0.0281  0.0287  -1.9
0.0294  0.0301  0.0307  0.0314  0.0322  0.0329  0.0336  0.0344  0.0351  0.0359  -1.8
0.0367  0.0375  0.0384  0.0392  0.0401  0.0409  0.0418  0.0427  0.0436  0.0446  -1.7
0.0455  0.0465  0.0475  0.0485  0.0495  0.0505  0.0516  0.0526  0.0537  0.0548  -1.6
0.0559  0.0571  0.0582  0.0594  0.0606  0.0618  0.0630  0.0643  0.0655  0.0668  -1.5
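For the dosage question (what percent of pills have less than 35.8 mg), the table lookup can be mirrored in code. The sketch below assumes the model from the example, N(µ = 36, σ = 0.11) in mg, and uses Python's `statistics.NormalDist`; rounding Z to two decimals imitates reading the table.

```python
from statistics import NormalDist

mu, sigma = 36, 0.11  # mg, from the dosage example

z = (35.8 - mu) / sigma                      # about -1.82
p_table = NormalDist().cdf(round(z, 2))      # table-style lookup, Z rounded to 2 decimals
p_exact = NormalDist(mu, sigma).cdf(35.8)    # same area without rounding Z

print(f"Z = {z:.2f}, table value = {p_table:.4f}, unrounded = {p_exact:.4f}")
```

The table-style value is 0.0344 (row -1.8, column 0.02), so about 3.4% of pills fall below 35.8 mg; the unrounded value differs only in the fourth decimal.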

Examples: Normal probability and Quality control
Example - Dosage pt. 2
At the same pharmaceutical factory (µ = 36 mg and σ = 0.11 mg), what percent of production runs pass the quality control inspection (between 35.8 and 36.2 mg of active ingredient in the tested pill)?
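One way to check an answer to this question is to take the difference of two lower-tail areas. This is a sketch under the example's own model, with the active ingredient in the tested pill following N(µ = 36, σ = 0.11) in mg, using Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

pill = NormalDist(mu=36, sigma=0.11)  # mg of active ingredient in the tested pill

# P(35.8 < X < 36.2) = P(X < 36.2) - P(X < 35.8)
p_pass = pill.cdf(36.2) - pill.cdf(35.8)
print(f"P(35.8 < X < 36.2) = {p_pass:.3f}")
```

Because the limits are symmetric about the mean, the same value also equals 1 minus twice the lower-tail area below 35.8 mg.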

Examples: Finding cutoff points
Example - Body Temperature
Body temperatures of healthy humans are distributed nearly normally with mean 98.2 °F and standard deviation 0.73 °F. What is the cutoff for the lowest 3% of human body temperatures?
[Figure: normal curve centered at 98.2 with the lowest 0.03 of the area shaded and the cutoff marked "?".]
0.09    0.08    0.07    0.06    0.05    Z
0.0233  0.0239  0.0244  0.0250  0.0256  -1.9
0.0294  0.0301  0.0307  0.0314  0.0322  -1.8
0.0367  0.0375  0.0384  0.0392  0.0401  -1.7
$P(X < x) = 0.03 \rightarrow P(Z < -1.88) = 0.03$
$Z = \frac{\text{obs} - \text{mean}}{\text{SD}} \rightarrow \frac{x - 98.2}{0.73} = -1.88$
$x = (-1.88 \times 0.73) + 98.2 = 96.8$
Mackowiak, Wasserman, and Levine (1992)
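Finding a cutoff is the reverse of finding a percentile: start from the area and work back to the value. A minimal sketch, assuming Python's `statistics.NormalDist`, whose `inv_cdf` method does the job of the backwards table lookup.

```python
from statistics import NormalDist

temps = NormalDist(mu=98.2, sigma=0.73)  # degrees F, from the example

# Step 1: Z score with 3% of the area to its left (table value: -1.88).
z_cut = NormalDist().inv_cdf(0.03)

# Step 2: convert back to the original scale, x = mean + Z * SD.
x_cut = temps.mean + z_cut * temps.stdev

print(f"Z = {z_cut:.2f}, cutoff = {x_cut:.1f} F")
```

Calling `temps.inv_cdf(0.03)` directly gives the same cutoff in one step; for an upper-tail cutoff, pass one minus the tail area.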

Examples: Finding cutoff points
Example - Body Temperature pt. 2
What is the cutoff for the highest 10% of human body temperatures?