Math 2311 Bekki George bekki@math.uh.edu Office Hours: MW 11am to 12:45pm in 639 PGH Online Thursdays 4-5:30pm And by appointment Class webpage: http://www.math.uh.edu/~bekki/math2311.html
Math 2311 Class Notes for Section 1.3-1.5 Section 1.3: Another important question we want to answer about data is about its spread or dispersion. Roughly speaking, the population standard deviation, σ, tells the average distance that data values fall from the mean. The standard deviation is the square root of the population variance, σ 2. So, what is the variance? The variance is the average of the squared differences of the data values from the mean. If N is the number of values in a population with mean µ, and x i represents each individual value in the population, then the variance is found by: σ 2 = N i=1 (x i µ) N And the population standard deviation is σ = σ 2
Most of the time we are not working with the entire population. Instead, we are working with a sample. Sample variance - s 2 = n i=1 (x i x ) n 1 Sample standard deviation - s = s 2 Example: 1. A statistics teacher wants to decide whether or not to curve an exam. From her class of 300 students, she chose a sample of 10 students and their grades were: 72, 88, 85, 81, 60, 54, 70, 72, 63, 43 Find the mean, variance and standard deviation for this sample.
2. Suppose the statistics teacher decides to curve the grades by adding 10 points to each score. What is the new mean, variance and standard deviation?
We can see from example 2 that adding the same value to all elements does not affect the variance (or standard deviation) of a set of data. What about multiplying? 3. Find the variance and the standard deviation for the following set of data (whose mean is 4.5) 3, 6, 2, 7, 4, 5 Now, multiply each value by 2. What is the new variance and the new standard deviation?
Sometimes we want to compare the variation between two groups. The coefficient of variation can be used for this. The coefficient of variation is the ratio of the standard deviation to the mean. A smaller ratio will indicate less variation in the data. Example: 4. The following statistics were collected on two different groups of stock prices: Portfolio A Portfolio B Sample size 10 15 Sample mean $52.65 $49.80 Sample standard deviation $6.50 $2.95 What can be said about the variability of each portfolio?
Section 1.4: More measures of spread (or dispersion): Range Drawbacks of range: sensitivity to outliers Percentiles: o 25 th percentile, Q1 o 50 th percentile, Median or Q2 o 75 th percentile, Q3
The values of the minimum, Q1, Q2, Q3 and the maximum make up what is called our five number summary. IQR Example: 1. Twelve babies spoke for the first time at the following ages (in months): 8 9 10 11 12 13 15 15 18 20 20 26 Find Q1, Q2, Q3, the range and the IQR.
The IQR is used to determine data classified as outliers. An outlier is an observation that is distant from the rest of the data. Outliers can occur by chance or be measurement errors so it is important to identify them. Any point that falls outside the interval calculated by Q1-1.5(IQR) and Q3 + 1.5(IQR) is considered an outlier. Example: 2. Are there any outliers in the data set given for example 1? If so, what are they?
There are other percentiles as well. The kth percentile means that k% of the ordered data values are at or below that data value. For example, if the median is 100, then 50% of the ordered data values fall at or below 100. Also, (100-k)% represents the amount of ordered data that falls above the percentile data value. If you are looking for the measurement that has a desired percentile rank, the 100Pth percentile, is the measurement with rank (or position in the list) of np+0.5, where n represents the number of data values in the sample. Example: 3. In a collection of 30 data measurements, which measurement represents the 30 th percentile?
Suppose you know the position (the order) of a value and want to know what percentile it is ranked at. In general, if you have n data measurements, x 1 represents the 100( 1 0.5 ) / n th percentile, x 2 represents the 100( 2 0.5 ) / n th th percentile, and x i represents the 100( i 0.5 ) / n percentile. Example: 4. Using the data in example 1, determine the percentile of the 4 th order statistic ( x ). 4
Section 1.5: Graphs and Describing Distributions Lets start with two examples, one for categorical data and one for quantitative data: Example 1: Suppose we have found the eye color for 85 people. 45 have brown eyes, 25 have blue eyes, 10 have green eyes and 10 have hazel eyes. Display the data in a bar graph. Example 2: Height measurements for a group of people were taken. The results are recorded below (in inches): 66, 68, 63, 71, 68, 69, 65, 70, 73, 67, 62, 59, 63, 68, 71, 63, 63, 60, 64, 66, 58 We will organize these data sets using different graphs: A bar graph is created by listing the categorical data along the x-axis and the frequencies along the y-axis. Bars are drawn above each data value.
A dot plot is made simply by putting dots above the values listed on a number line.
A stem and leaf plot, the data is arranged by values. The digits in the largest place are referred to as the stem and the digits in the smallest place are referred to as the leaf (leaves). The leaves are displayed to the right of the stem. A split stemplot divides up the stems into equal groups. Back-to-back stempots can be used when comparing two sets of data.
Histograms are created by first dividing the data into classes, or bins, of equal width. Next, count the number of observations in each class. The horizontal axis will represent the variable values and the vertical axis will represent your frequency or your relative frequency.
Boxplots not only help identify features about our data quickly (such as spread and location of center) but can be very helpful when comparing data sets. How to make a box plot: 1. Order the values in the data set in ascending order (least to greatest). 2. Find and label the median. 3. Of the lower half (less than the median do not include), find and label Q1. 4. Of the upper half (greater than the median do not include), find and label Q3. 5. Label the minimum and maximum. 6. Draw and label the scale on an axis. 7. Plot the five number summary. 8. Sketch a box starting at Q1 to Q3. 9. Sketch a segment within the box to represent the median. 10. Connect the min and max to the box with line segments. Note: If data contains outliers, a box and whiskers plot can be used instead to display the data. In a box and whiskers plot, the outliers are displayed with dots above the value and the segments begin (or end) at the next data value within the outlier interval.
A pie chart is a circular chart, divided into sectors, indicating the proportion of each data value compared to the entire set of values. Pie charts are good for categorical data.
A cumulative frequency plot of the percentages (also called an ogive) can be used to view the total number of events that occurred up to a certain value. Example: Here is an ogive for Hudson Auto Repair s cost of parts sold: Example: Hudson Auto Repair Ogive with Cumulative Percent Frequencies 100 80 60 40 20 50 60 70 80 90 Parts Cost ($) Where is the median of this data?
Patterns and shapes: Uniform graphs Symmetric graphs
Some other features Bell Shaped Skewed right Skewed left