SOLUTIONS TO THE LAB 1 ASSIGNMENT


Question 1

(a) Excel produces the following histogram of pull strengths for the 100 resistors:

[Figure: Histogram of Pull Strengths (lb); vertical axis: Frequency]

(b) The histogram is one-peaked, bell-shaped, and approximately symmetric. Given the relatively small spread, there is one observation (between 74 and 75) lying far above the main body of the data. This observation may be considered an outlier. We will verify in Question 3 that the single observation is indeed an outlier in a formal sense. The tails of the distribution are relatively short. The center of the distribution is at approximately 65 pounds. As the distribution is approximately symmetric, we expect the values of the mean and the median to be very similar, and close to 65.

(c) If all 100 PST values were overestimated by approximately the same small positive value due to a poorly calibrated measuring device, the histogram of the accurate values would have approximately the same shape as the histogram of the overestimated values. However, its center (peak) would be shifted to the left by the difference between the overestimated values and the accurate values. The mean and the median would also be shifted to the left by that difference, but the standard deviation and the interquartile range would not be affected (they would be the same as the values obtained for the overestimated PST values).

Question 2

(a) The summary statistics for the pull strengths obtained with the Descriptive Statistics tool are displayed below:

Summary Statistics
Mean                  64.859
Standard Error        0.29214323
Median                64.4
Mode                  64.3
Standard Deviation    2.921432297
Sample Variance       8.534766667
Kurtosis              0.6677167
Skewness              0.282186648
Range                 16.3
Minimum               58.2
Maximum               74.5
Sum                   6485.9
Count                 100

(b) The Paste Function feature applied to our data returns the following values of the first quartile, the third quartile, and the interquartile range:

First Quartile Q1 = 63.175
Third Quartile Q3 = 66.800
Interquartile range = 3.625

(c) As the distribution of pull strengths is approximately symmetric, the mean and standard deviation are appropriate measures of center and variation. The median and the interquartile range are used for skewed distributions.

Question 3

According to the 1.5*IQR criterion, an outlier is any data point that lies below Q1 - 1.5*IQR or above Q3 + 1.5*IQR. Taking into account the values of the lower and upper quartiles and the interquartile range obtained in Question 2, an outlier lies below 57.7375 or above 72.2375. There is only one observation that satisfies the condition: the value of 74.5, the largest observation in the data set.

The outlier 74.5 lies far above the main body of the data. Thus we expect that the mean and the standard deviation of the remaining 99 observations would decrease. We do not expect a significant change in the value of the median. The summary statistics for the data without the outlier are displayed below:

Summary Statistics (Outlier Removed)
Mean                  64.76161616
Standard Error        0.278230661
Median                64.4
Mode                  64.3
Standard Deviation    2.768360123
Sample Variance       7.66381777
Kurtosis              -0.109386988
Skewness              0.0029634
Range                 13.4
Minimum               58.2
Maximum               71.6
Sum                   6411.4
Count                 99

The table confirms the conclusions we reached above.
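The arithmetic behind the outlier limits in Question 3 can be checked with a short script. This is only a sketch: it reads the quartiles as Q1 = 63.175 and Q3 = 66.800, the sum of all 100 values as 6485.9, and the largest value as 74.5. The function name is_outlier is introduced here for illustration and is not part of the lab.

```python
# Quartiles as read from the Question 2 output (assumed values, see lead-in).
q1 = 63.175
q3 = 66.800
iqr = q3 - q1

# 1.5*IQR fences: points outside [lower, upper] are flagged as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr


def is_outlier(x):
    """Return True if x lies outside the 1.5*IQR fences (illustrative helper)."""
    return x < lower or x > upper


# Mean of the 99 remaining observations, from the reported sum and count.
total, count, largest = 6485.9, 100, 74.5
mean_without_outlier = (total - largest) / (count - 1)

print("IQR:", round(iqr, 4))
print("fences:", round(lower, 4), round(upper, 4))
print("74.5 flagged:", is_outlier(74.5))
print("71.6 flagged:", is_outlier(71.6))
print("mean without outlier:", round(mean_without_outlier, 4))
```

Under these assumptions the fences agree with the limits quoted in Question 3, only the largest observation is flagged, and the recomputed mean matches the mean in the outlier-removed table.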

Question 4

In order to convert all 100 PST measurements to kilograms, it is necessary to multiply each value in the column PST by 0.454. As a consequence, the new mean and the new median can also be obtained by multiplying the mean and the median of the measurements expressed in pounds by 0.454. Moreover, given the formula for the standard deviation, the new standard deviation can be obtained by multiplying the standard deviation of the original data by 0.454. Likewise, the interquartile range for the data in kilograms is equal to the interquartile range for the data on the original scale of measurement multiplied by 0.454.

The histogram for the data expressed in kilograms will have the same shape as the histogram obtained in Question 1. The peak of the new histogram will be at approximately 65*0.454 = 29.51.

Question 5

In order to answer the question whether the new ozone-friendly cleaning process produces similarly strong or stronger solder joints, on the average, we look at the summary statistics for the distribution. The mean of the pull strengths obtained is 64.761616, and it is almost identical to the mean of pull strengths for the old technology (64.8). The small difference is due to sampling variability. Thus the new technology produces solder joints of similar strength, on the average.

Now we compare the variability of the two processes. The standard deviation for the old technology is 2.2 lb. This value is smaller than the value of 2.7683 lb obtained in Question 3 (after excluding the outlier). Given the large sample size that the new standard deviation is based on (99), it is reasonable to conclude that the new process results in slightly higher variability than the old process. More advanced statistical methods are required to determine whether the difference is statistically significant. The new process can be examined thoroughly to determine whether some sources of extra variation can be eliminated.
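The effect of a change of units (Question 4) and of a constant calibration shift (Question 1) on the summary statistics can be illustrated with a small sketch. The data below are hypothetical and are not the lab measurements. The conversion factor of 0.454 kg per pound, the 0.5 lb offset, and the quartile method are all assumptions made for this illustration; Excel's quartile function may use a different interpolation rule.

```python
import math
import statistics

LB_TO_KG = 0.454  # assumed pounds-to-kilograms factor
OFFSET_LB = 0.5   # hypothetical calibration error, in pounds

# Hypothetical pull strengths in pounds (illustration only, not the lab data).
pounds = [61.2, 63.0, 64.3, 64.3, 65.1, 66.8, 68.4]
kilograms = [x * LB_TO_KG for x in pounds]   # change of units
corrected = [x - OFFSET_LB for x in pounds]  # constant shift to the left


def iqr(values):
    """Interquartile range; the quartile method is an assumption of this sketch."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


summaries = {
    "mean": statistics.mean,
    "median": statistics.median,
    "stdev": statistics.stdev,
    "IQR": iqr,
}

results = {}
for name, stat in summaries.items():
    original = stat(pounds)
    # Rescaling multiplies every summary by the conversion factor.
    scales = math.isclose(stat(kilograms), LB_TO_KG * original)
    # A shift moves measures of center by the offset and leaves spread alone.
    change = stat(corrected) - original
    moved = math.isclose(change, -OFFSET_LB, abs_tol=1e-9)
    unchanged = math.isclose(change, 0.0, abs_tol=1e-9)
    results[name] = (scales, moved, unchanged)
    print(f"{name}: scales={scales}, moved by offset={moved}, unchanged={unchanged}")
```

For this sample, all four summaries scale with the conversion factor, while the shift moves only the mean and the median and leaves the standard deviation and the interquartile range as they were.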
Question 6

The histogram of electrical resistance for the 100 boards is displayed below:

[Figure: Histogram of Electrical Resistances; vertical axis: Frequency]

The histogram is one-peaked and skewed to the right. Most of the observations lie between 0 and 1, but there are several observations outside that range. The right tail is longer than the left tail of the distribution. There is one outlier.

As the distribution is skewed, the median and the interquartile range are appropriate measures of center and spread, respectively.

Question 7

The scatterplot of electrical resistance (RES) versus pull strength (PST) displays the relationship between the two variables. It allows you to assess the type of relationship (linear, nonlinear), its direction (positive, negative), and its strength. The scatterplot for the data is displayed below:

[Figure: Scatterplot of RES vs. PST. Horizontal axis: Pull Strength (in pounds); vertical axis: Electrical Resistance (in teraohms)]

There is no clear pattern in the plot. The points appear to be randomly scattered. However, it is worth noting that the variation in pull strength is substantially different for low electrical resistance values than for high electrical resistance values. There are no obvious outliers in the plot.

LAB 1 ASSIGNMENT MARKING SCHEMA

Proper header and appearance: 10 points

1. (a) Correctly formatted histogram: 6 points
   (b) Analysis of the shape of the histogram: 3 points
       Center (estimates of the mean and the median): 2 points
   (c) Histogram of accurate measurements: 2 points
       Mean, median, standard deviation and IQR of accurate values: 2 points

2. Summary Statistics:
   (a) Descriptive Statistics output (mean, median, standard deviation, IQR): 4 points
   (b) First Quartile, Third Quartile, IQR: 3 points
   (c) Discussion of appropriateness: 2 points

3. Determining the lower and upper range for outliers: 2 points
   Identifying the outlier: 2 points
   Effect of removing the outlier on some summary statistics: 3 points

4. Effect of expressing the PST values in kilograms on summaries: 2 points
   Effect of expressing the PST values in kilograms on histogram: 2 points

5. Comparing the average strength of resistors: 2 points
   Comparing the variability of the two processes: 2 points

6. Correctly formatted histogram: 6 points
   Analysis of the shape of the histogram: 3 points
   Numerical measures to describe typical resistance and the spread: 2 points

7. Relationship between pull strengths and resistance
   Discussion of the pattern in the scatterplot: 3 points
   Outliers: 1 point
   Correctly formatted scatterplot: 6 points

TOTAL = 70