Variance, Standard Deviation Counting Techniques

Variance, Standard Deviation, Counting Techniques
Section 1.3 & 2.1
Cathy Poliak, Ph.D., cathy@math.uh.edu
Department of Mathematics, University of Houston

Outline
1. Quartiles
2. The 1.5IQR Rule
3. Understanding Standard Deviation
4. Calculating The Standard Deviation
5. Coefficient of Variation
6. Counting Techniques
7. Permutations
8. Combinations

Types of Measurements for the Spread
Range
Percentiles
Quartiles
IQR (interquartile range)
Variance
Standard deviation
Coefficient of variation

The Quartiles
The first quartile, \(Q_1\), is the 25th percentile.
The second quartile, \(Q_2\), is the median (the 50th percentile).
The third quartile, \(Q_3\), is the 75th percentile.

Interquartile Range
The interquartile range, IQR, is the difference between \(Q_3\) and \(Q_1\):
\[ \mathrm{IQR} = Q_3 - Q_1 \]

Example
Twelve babies spoke for the first time at the following ages (in months):
8 9 10 11 12 13 15 15 18 20 20 26
Find \(Q_1\), \(Q_2\), \(Q_3\), the range and the IQR.
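
One way to check the hand calculation is with R's fivenum(), which the later slides also use. This is a sketch under one assumption: the quartiles are taken as the hinges (medians of the lower and upper halves). R's quantile() and IQR() use a different default interpolation and can give slightly different values.

```r
# Ages (in months) at which the twelve babies first spoke.
ages <- c(8, 9, 10, 11, 12, 13, 15, 15, 18, 20, 20, 26)

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(ages)
q1 <- five[2]
q2 <- five[3]
q3 <- five[4]

data_range <- max(ages) - min(ages)  # range as a single number
iqr <- q3 - q1                       # IQR = Q3 - Q1

print(c(Q1 = q1, Q2 = q2, Q3 = q3, range = data_range, IQR = iqr))
```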

Find the Five Number Summary of the Course Scores
> stem(grades$score,scale=0.5)

  The decimal point is 1 digit(s) to the right of the |

   0 | 827
   2 | 2
   4 | 0391
   6 | 78825
   8 | 0134445701238
  10 | 1114

Detecting Outliers: 1.5IQR Rule
An outlier is an observation that is "distant" from the rest of the data.
Outliers can occur by chance or through measurement error.
Any point that falls outside the interval from \(Q_1 - 1.5(\mathrm{IQR})\) to \(Q_3 + 1.5(\mathrm{IQR})\) is considered an outlier.

Outliers for Basketball Shoe Prices?
Recall: \(Q_1 = 130\), \(Q_3 = 215\), so \(\mathrm{IQR} = 215 - 130 = 85\).
\(Q_1 - 1.5(\mathrm{IQR}) = 130 - 1.5(85) = 2.5\)
\(Q_3 + 1.5(\mathrm{IQR}) = 215 + 1.5(85) = 342.5\)
Any price that is below $2.50 or above $342.50 is considered an outlier.

Outliers?
The following is information from 91 pairs of basketball shoes:
> fivenum(shoes$price)
[1]  40  75  90 120 250
The highest four numbers in the dataset are ..., 170, 225, 250, 250.
Are any of the prices considered outliers?
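
The 1.5IQR rule can be sketched in R as follows. The helper name outlier_fences is hypothetical (it is not part of R or of the slides), and because the shoes data set itself is not available here, only the four highest prices quoted on the slide are checked.

```r
# Hypothetical helper: lower and upper fences of the 1.5IQR rule.
outlier_fences <- function(q1, q3) {
  iqr <- q3 - q1
  c(lower = q1 - 1.5 * iqr, upper = q3 + 1.5 * iqr)
}

# Quartiles read off the fivenum() output: 40 75 90 120 250.
fences <- outlier_fences(q1 = 75, q3 = 120)

# Only the four highest prices are listed, so only those are tested.
top_prices <- c(170, 225, 250, 250)
flagged <- top_prices[top_prices < fences[["lower"]] |
                      top_prices > fences[["upper"]]]

print(fences)
print(flagged)
```

Since the minimum price (40) is above the lower fence, any outliers in this data set would have to lie above the upper fence.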

A Graph of the Five Number Summary: Boxplot
A central box spans the quartiles.
A line inside the box marks the median.
Lines extend from the box out to the smallest and largest observations.
Asterisks represent any values that are considered to be outliers.
Boxplots are most useful for side-by-side comparison of several distributions.
R code: boxplot(dataset name$variable name)

Boxplot of Prices
[Figure: horizontal boxplot of shoe prices; axis from 50 to 250]
boxplot(shoes$price,horizontal = T)

Boxplot of Course Scores
[Figure: boxplot of course scores; axis from 20 to 100]

Boxplot of Course Scores by Session
[Figure: side-by-side horizontal boxplots for sessions Fal15, Sp16 and Sum16; score axis from 20 to 100]
boxplot(grades$score~grades$session,horizontal=TRUE)

Question about the Graphs
Given the first type of plot indicated in each pair, which of the second plots could not always be generated from it?
a) dot plot, histogram
b) stem and leaf, dot plot
c) histogram, stem and leaf
d) dot plot, box plot

Measuring Spread: The Standard Deviation
Measures spread by looking at how far the observations are from their mean.
Most common numerical description for the spread of a distribution.
A larger standard deviation implies that the values have a wider spread from the mean.
Denoted \(s\) when used with a sample. This is the one we calculate from a list of values.
Denoted \(\sigma\) when used with a population. This is the "idealized" standard deviation.
The standard deviation has the same units of measurement as the original observations.

Definition of the Standard Deviation
The standard deviation is, roughly, the typical distance of the observations from the mean.
Using this list of values from a sample: 3, 3, 9, 15, 15
The mean is 9. The deviations from the mean are -6, -6, 0, 6, 6, so the sum of the squared deviations is 144 and the sample variance is 144/4 = 36.
So the standard deviation is \(\sqrt{36} = 6\).

Values of the Standard Deviation
The standard deviation is a value that is greater than or equal to zero.
It is equal to zero only when all of the observations have the same value.
By the definition of standard deviation, determine \(s\) for the following lists of values.
2, 2, 2, 2: standard deviation = 0
125, 125, 125, 125, 125: standard deviation = 0

Adding or Subtracting a Value to the Observations
Adding the same value to all the original observations, or subtracting it from them, does not change the standard deviation of the list.
Using this list of values: 3, 3, 9, 15, 15: mean = 9, standard deviation = 6.
If we add 4 to all the values: 7, 7, 13, 19, 19: mean = 13, standard deviation = 6.

Multiplying or Dividing the Observations by a Value
Multiplying or dividing all the original observations by the same value changes the standard deviation by that factor (in absolute value).
Using this list of values: 3, 3, 9, 15, 15: mean = 9, standard deviation = 6.
If we double all the values: 6, 6, 18, 30, 30: mean = 18, standard deviation = 12.
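
The shift and scale properties can be checked quickly in R with sd(). The last line goes slightly beyond the slide and is included as an assumption-labeled extra: a negative factor scales the standard deviation by its absolute value.

```r
x <- c(3, 3, 9, 15, 15)

sd(x)        # sample standard deviation of the original list: 6
sd(x + 4)    # adding 4 to every value leaves the spread unchanged: 6
sd(2 * x)    # doubling every value doubles the standard deviation: 12
sd(-2 * x)   # extra case, not on the slide: factor -2 also gives 12
```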

Population Variance and Standard Deviation
If \(N\) is the number of values in a population with mean \(\mu\), and \(x_i\) represents each individual in the population, then the population variance is found by:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
and the population standard deviation is the square root, \(\sigma = \sqrt{\sigma^2}\).

Sample Variance and Standard Deviation
Most of the time we are working with a sample instead of a population. So the sample variance is found by:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
and the sample standard deviation is the square root, \(s = \sqrt{s^2}\), where \(n\) is the number of observations (samples), \(x_i\) is the value for the \(i\)th observation and \(\bar{x}\) is the sample mean.
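
The difference between the two formulas is only the divisor. A minimal sketch in R, where pop_var and samp_var are hypothetical helper names written for illustration (R's built-in var() computes the sample version):

```r
# Hypothetical helper: population variance, dividing by N.
pop_var <- function(x) sum((x - mean(x))^2) / length(x)

# Hypothetical helper: sample variance, dividing by n - 1.
samp_var <- function(x) sum((x - mean(x))^2) / (length(x) - 1)

x <- c(3, 3, 9, 15, 15)

pop_var(x)    # 144 / 5 = 28.8
samp_var(x)   # 144 / 4 = 36
var(x)        # built-in sample variance, agrees with samp_var(x)
```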

Calculating the Standard Deviation By Hand
When calculating by hand we will calculate \(s\).
1. Find the mean of the observations, \(\bar{x}\).
2. Calculate the difference between each observation and the mean, \(x_i - \bar{x}\). These are called the deviations of the observations.
3. Square the deviation for each observation, \((x_i - \bar{x})^2\).
4. Add up the squared deviations, \(\sum_{i=1}^{n} (x_i - \bar{x})^2\).
5. Divide the sum of the squared deviations by one less than the number of observations, \(n - 1\). This is the variance:
\[ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

Step 6: Standard Deviation
6. Find the square root of the variance. This is the standard deviation:
\[ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

Example: Section A
Determine the sample standard deviation of the test scores for Section A.
Section A Scores (\(X_i\)): 65 66 67 68 71 73 74 77 77 77

Step 1: Calculate the Mean
The sample mean is \(\bar{x} = 71.5\).

Use Table To Calculate Standard Deviation
Step 2: Calculate Deviations For All Values
Step 3: Calculate Squared Deviations
Step 4: Calculate the Sum of the Squared Deviations

Score \(X_i\)    Deviation \(X_i - \bar{X}\)    Squared deviation \((X_i - \bar{X})^2\)
65               65 - 71.5 = -6.5               (-6.5)^2 = 42.25
66               66 - 71.5 = -5.5               (-5.5)^2 = 30.25
67               67 - 71.5 = -4.5               (-4.5)^2 = 20.25
68               68 - 71.5 = -3.5               (-3.5)^2 = 12.25
71               71 - 71.5 = -0.5               (-0.5)^2 = 0.25
73               73 - 71.5 = 1.5                1.5^2 = 2.25
74               74 - 71.5 = 2.5                2.5^2 = 6.25
77               77 - 71.5 = 5.5                5.5^2 = 30.25
77               77 - 71.5 = 5.5                5.5^2 = 30.25
77               77 - 71.5 = 5.5                5.5^2 = 30.25
sum                                             \(\sum_{i=1}^{n} (X_i - \bar{X})^2 = 204.5\)

Step 5: Calculate the Variance
\[ \text{variance} = s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{204.5}{9} = 22.7222 \]

Step 6: Take the Square Root of the Variance
\[ \text{standard deviation} = s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{22.7222} = 4.77 \]
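
The six hand-calculation steps for Section A can be mirrored line by line in R. This is only a sketch for checking the arithmetic; the variable names are chosen here for illustration, and in practice sd(scores) gives the same result in one call.

```r
scores <- c(65, 66, 67, 68, 71, 73, 74, 77, 77, 77)

x_bar      <- mean(scores)                  # Step 1: the mean
deviations <- scores - x_bar                # Step 2: deviations from the mean
sq_dev     <- deviations^2                  # Step 3: squared deviations
ss         <- sum(sq_dev)                   # Step 4: sum of squared deviations
s2         <- ss / (length(scores) - 1)     # Step 5: variance, dividing by n - 1
s          <- sqrt(s2)                      # Step 6: standard deviation

# The working table, followed by the summary values.
print(data.frame(score = scores, deviation = deviations, squared = sq_dev))
print(c(mean = x_bar, sum_sq = ss, variance = s2, sd = s))
```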

Sample Standard Deviation of Section A Test Scores
The sample standard deviation is \(s = 4.77\). This implies that, for the sample of 10 students from Section A, the test scores are spread, on average, about 4.77 points from the mean of 71.50 points.

Example
A statistics teacher wants to decide whether or not to curve an exam. From her class of 300 students, she chose a sample of 10 students, and their grades were:
72, 88, 85, 81, 60, 54, 70, 72, 63, 43
Determine the sample mean.
What is the variance?
What is the standard deviation?

Add 10
Suppose the statistics instructor decides to curve the grades by adding 10 points to each score. What are the new mean, variance and standard deviation?
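
The exam example and the 10-point curve can be checked together in R. This is a sketch; summary_stats is a hypothetical helper defined only for this illustration.

```r
grades <- c(72, 88, 85, 81, 60, 54, 70, 72, 63, 43)
curved <- grades + 10   # every score shifted up by 10 points

# Hypothetical helper: mean, sample variance and sample standard deviation.
summary_stats <- function(x) c(mean = mean(x), variance = var(x), sd = sd(x))

# The mean rises by 10; the variance and standard deviation do not change.
print(rbind(original = summary_stats(grades), curved = summary_stats(curved)))
```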

Multiply by 2
For the following dataset the mean is \(\bar{x} = 4.5\), the variance is \(s^2 = 3.5\) and the standard deviation is \(s = 1.870829\).
3, 6, 2, 7, 4, 5
Now, multiply each value by 2. What are the new variance and the new standard deviation?
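
A short check in R shows how the two measures respond to doubling: the variance is multiplied by the square of the factor, while the standard deviation is multiplied by the factor itself.

```r
x <- c(3, 6, 2, 7, 4, 5)
doubled <- 2 * x

var(x)         # 3.5
var(doubled)   # 2^2 * 3.5 = 14
sd(doubled)    # 2 * sd(x), the square root of 14
```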

Calculating Standard Deviation
For larger data sets use a calculator or computer software.
Each calculator is different; if you cannot determine how to compute the standard deviation on your calculator, ask your instructor.
For this course we will be using R as the software.
The function for the sample standard deviation in R is sd(data name$variable name).

Coefficient of Variation
This is used to compare the variation between two groups.
The coefficient of variation (cv) is the ratio of the standard deviation to the mean:
\[ \mathrm{cv} = \frac{\mathrm{sd}}{\mathrm{mean}} \]
A smaller ratio indicates less variation in the data.

CV of test scores
                              Section A               Section B
Sample Size                   10                      10
Sample Mean                   71.5                    71.5
Sample Standard Deviation     4.770                   18.22
CV                            4.77/71.5 = 0.0667      18.22/71.5 = 0.2548

CV Example
The following statistics were collected on two different groups of stock prices:
                              Portfolio A    Portfolio B
Sample size                   10             15
Sample mean                   $52.65         $49.80
Sample standard deviation     $6.50          $2.95
What can be said about the variability of each portfolio?
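
The coefficient of variation is a one-line computation in R. The helper cv below is hypothetical (written for this illustration, not a base R function); it is applied to the summary statistics given for the two test sections and the two portfolios.

```r
# Hypothetical helper: coefficient of variation from a standard deviation s and a mean m.
cv <- function(s, m) s / m

cv_sections   <- c(A = cv(4.77, 71.5),  B = cv(18.22, 71.5))
cv_portfolios <- c(A = cv(6.50, 52.65), B = cv(2.95, 49.80))

print(round(cv_sections, 4))
print(round(cv_portfolios, 4))
```

The group with the smaller ratio has less variation relative to its mean.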

Beginning Example
In the city of Milford, applications for zoning changes go through a two-step process:
1. A review by the planning commission.
2. A final decision by the city council.
At step 1 the planning commission reviews the zoning change request and makes a positive or negative recommendation concerning the change.
At step 2 the city council reviews the planning commission's recommendation and then votes to approve or to disapprove the zoning change.
How many possible decisions can be made for a zoning change in Milford?

Counting Rules
If an experiment can be described as a sequence of \(k\) steps with \(n_1\) possible outcomes on the first step, \(n_2\) possible outcomes on the second step, and so on, then the total number of experimental outcomes is given by \((n_1)(n_2) \cdots (n_k)\).
A tree diagram can be used as a graphical representation in visualizing a multiple-step experiment.

Tree diagram
Step 1 (Planning Commission)    Step 2 (City Council)    Sample Points
positive                        approve                  (positive, approve)
positive                        disapprove               (positive, disapprove)
negative                        approve                  (negative, approve)
negative                        disapprove               (negative, disapprove)
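
The sample points of the tree can also be listed in R with expand.grid(), which builds every combination of its arguments. This is a sketch; the column names commission and council are chosen here for illustration.

```r
# expand.grid() varies its first argument fastest, so the council vote is
# listed first and the columns are reordered afterwards to match the tree.
outcomes <- expand.grid(
  council = c("approve", "disapprove"),
  commission = c("positive", "negative"),
  stringsAsFactors = FALSE
)
outcomes <- outcomes[, c("commission", "council")]

print(outcomes)
nrow(outcomes)   # (n_1)(n_2) = 2 * 2 = 4 sample points
```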

Examples
How many ways can you create a pizza choosing a meat and two veggies if you have 3 choices of meats and 4 choices for veggies?
In how many ways can 6 people be seated in a row?
How many possible outcomes can we have when rolling a pair of 6-sided dice?
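
One possible reading of these three questions, sketched in R. Two assumptions are made that the slide does not state: the two veggies on a pizza are different and their order does not matter, and the two dice are distinguishable (so a 1 then a 2 differs from a 2 then a 1).

```r
# Pizza: 3 meats, times the unordered pairs of distinct veggies (assumption).
pizza_ways <- 3 * choose(4, 2)

# Seating 6 people in a row: 6 choices, then 5, then 4, and so on.
seating_ways <- factorial(6)

# Two dice treated as distinguishable (assumption): 6 outcomes each.
dice_outcomes <- 6 * 6

print(c(pizza = pizza_ways, seating = seating_ways, dice = dice_outcomes))
```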

Permutations
It allows one to compute the number of outcomes when \(r\) objects are to be selected from a set of \(n\) objects where the order of selection is important.
The number of permutations is given by
\[ P^n_r = \frac{n!}{(n - r)!} \]
where \(n! = n(n - 1)(n - 2) \cdots (2)(1)\).
R code for \(n!\): factorial(n)
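
Base R has factorial() but no dedicated permutation-count function, so the formula can be wrapped in a small helper. The name n_perm is hypothetical, and the sketch is intended for small \(n\) only, since factorials grow very quickly.

```r
# Hypothetical helper: number of ordered selections of r objects from n.
n_perm <- function(n, r) factorial(n) / factorial(n - r)

n_perm(6, 6)   # all 6 objects in order: 6! = 720
n_perm(5, 2)   # 5 * 4 = 20
```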

Allowing Repeated Values
When we allow repeated values, the number of orderings of \(n\) objects taken \(r\) at a time, with repetition, is \(n^r\).
Example: In how many ways can you write 4 letters on a tag using each of the letters C O U G A R with repetition?
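
For the tag example, the formula can be compared with a brute-force listing in R. This is a sketch; the variable names are chosen for illustration.

```r
tag_letters <- c("C", "O", "U", "G", "A", "R")
n <- length(tag_letters)   # 6 letters to choose from
r <- 4                     # 4 positions on the tag

by_formula <- n^r

# Enumerate every 4-letter tag, allowing repeated letters.
all_tags <- expand.grid(rep(list(tag_letters), r), stringsAsFactors = FALSE)
by_enumeration <- nrow(all_tags)

print(c(formula = by_formula, enumeration = by_enumeration))
```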

Several Objects At Once
The number of permutations, \(P\), of \(n\) objects taken \(n\) at a time with \(r\) objects alike, \(s\) of another kind alike, and \(t\) of another kind alike is
\[ P = \frac{n!}{r!\,s!\,t!} \]
Example: How many different words (they do not have to be real words) can be formed from the letters in the word MISSISSIPPI?
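
For MISSISSIPPI the letter counts can be tallied with table() and plugged into the formula. This is a sketch; a letter that appears once (here M) contributes a factor of 1! = 1, so including it in the denominator changes nothing.

```r
letters_in_word <- strsplit("MISSISSIPPI", "")[[1]]

# How many times each distinct letter occurs (I: 4, M: 1, P: 2, S: 4).
counts <- as.vector(table(letters_in_word))

# n! divided by the factorial of each repeat count.
arrangements <- factorial(length(letters_in_word)) / prod(factorial(counts))

print(arrangements)
```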

Circular Permutations
The number of circular permutations of \(n\) objects is \((n - 1)!\).
Example: In how many ways can 12 people be seated around a circular table?
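
A minimal sketch of the circular count in R. The helper name circular_perm is hypothetical; it simply fixes one person's seat and orders the remaining \(n - 1\).

```r
# Hypothetical helper: seatings of n objects around a circle,
# where rotations of the same arrangement are not counted separately.
circular_perm <- function(n) factorial(n - 1)

circular_perm(4)    # 3! = 6
circular_perm(12)   # 11!
```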

Combinations
Counts the number of experimental outcomes when the experiment involves selecting \(r\) objects from a (usually larger) set of \(n\) objects.
The number of combinations of \(n\) objects taken \(r\) at a time, unordered, is
\[ C^n_r = \binom{n}{r} = \frac{n!}{r!\,(n - r)!} \]
R code: choose(n,r)

Difference Between Combinations and Permutations

Examples
In how many ways can a committee of 5 be chosen from a group of 12 people?
A manufacturing company has to choose 5 out of 50 boxes to be sent to a store. In how many ways can they choose the 5 boxes?
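
Both questions are unordered selections, so choose() applies directly. As a cross-check for the smaller case, combn() lists every committee as a column, and the number of columns should agree with the formula.

```r
committee_ways <- choose(12, 5)   # committees of 5 from 12 people
box_ways       <- choose(50, 5)   # 5 boxes out of 50

# Cross-check: combn(12, 5) has one column per possible committee.
committee_listed <- ncol(combn(12, 5))

print(c(committees = committee_ways, listed = committee_listed, boxes = box_ways))
```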

Examples
1. A researcher selects 3 fish from a tank of 12 and puts each of the 3 fish into a different container. How many ways can this be done?
2. Among 10 electrical components, 2 are known not to function. If 5 components are randomly selected, in how many ways can exactly one of the selected components be non-functioning?
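
A possible way to set up both problems in R, under two assumptions: in the first, the containers are distinguishable, so which fish goes into which container matters (a permutation); in the second, "only one" is read as exactly one non-functioning component among the 5 selected.

```r
# 1. Choose 3 of the 12 fish, then assign them to 3 different containers.
fish_ways <- choose(12, 3) * factorial(3)   # same as 12 * 11 * 10

# 2. Pick 1 of the 2 non-functioning components and 4 of the 8 working ones.
one_defective_ways <- choose(2, 1) * choose(8, 4)

print(c(fish = fish_ways, one_defective = one_defective_ways))
```

The first answer uses both ideas together: a combination to select the fish, and a factorial to order them across the containers.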