STAB22 section 2.2

2.29 A change in price leads to a change in the amount of deforestation, so price is explanatory and deforestation is the response. There are no difficulties in producing a plot; mine is in Figure 1. There is a fairly clear upward trend (positive association): a higher price is associated with more deforestation. So we'd expect the correlation to be positive.

Figure 1: Plot of deforestation vs. price

We can also use Minitab to get the mean and SD for each variable. These are shown in Figure 2.

Descriptive Statistics

Variable   N    Mean   Median   TrMean   StDev   SE Mean
price      5   50.00    54.00    50.00   16.32      7.30
deforest   5   1.738    1.690    1.738   0.928     0.415

Variable   Minimum   Maximum      Q1      Q3
price        29.00     72.00   34.50   63.50
deforest     0.490     3.100   1.040   2.460

Figure 2: Descriptives for price and deforestation

To calculate the correlation, standardize each data value using the correct mean and SD. For example, the first price, 29, has standardized value (29 - 50)/16.32 = -1.29; the first deforestation value, 0.49, standardizes to (0.49 - 1.738)/0.928 = -1.35. Continuing in this way gives:

  z_x     z_y   z_x z_y
-1.29   -1.35      1.73
-0.61   -0.15      0.10
 0.25   -0.05     -0.01
 0.31    0.09      0.03
 1.35    1.47      1.98

As you see, multiply the two standardized values together to get the last column. Most of the pairs of standardized values have the same sign, so the number in the last column is positive. (The x-values and y-values tend to be above or below their means together.) Finally, add up the numbers in the last column and divide by n - 1:

r = (sum of z_x z_y)/(n - 1) = 3.82/4 = 0.955.

(The rounded entries in the table add to 3.83; the sum of the unrounded products is 3.82.) We saw in the scatterplot that the relationship was upward and strong, so we'd expect a correlation that is positive and close to 1, as we got. Minitab should produce the same value (to within rounding error), and it does: select Stat, Basic Statistics and Correlation, select the two columns (the order doesn't matter), and click OK. See Figure 3.
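The hand calculation above can be checked with a short Python sketch. The five data pairs are not listed in this section; the values below are an assumption, reconstructed from the Minitab summaries in Figure 2 and the standardized values in the table (they reproduce the means, SDs, medians and quartiles of both variables). The function name `correlation_by_z_scores` is illustrative, not a Minitab or library function.

```python
from statistics import mean, stdev

# ASSUMPTION: these pairs are reconstructed from the Minitab summaries
# (mean, SD, median, quartiles) and the z-scores; the text does not list them.
price = [29, 40, 54, 55, 72]               # explanatory variable
deforest = [0.49, 1.59, 1.69, 1.82, 3.10]  # response variable


def correlation_by_z_scores(xs, ys):
    """Pearson r as the sum of products of z-scores divided by n - 1."""
    n = len(xs)
    # stdev() divides by n - 1, matching Minitab's StDev column.
    x_bar, s_x = mean(xs), stdev(xs)
    y_bar, s_y = mean(ys), stdev(ys)
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]
    products = [a * b for a, b in zip(z_x, z_y)]
    print("   z_x    z_y  z_x*z_y")
    for a, b, p in zip(z_x, z_y, products):
        print(f"{a:6.2f} {b:6.2f} {p:8.2f}")
    return sum(products) / (n - 1)


r = correlation_by_z_scores(price, deforest)
print(f"r = {r:.3f}")
```

The printed z-scores may differ from the table in the last digit because of rounding; r itself comes out as 0.955, in line with the hand calculation and Minitab. On Python 3.10 or later, `statistics.correlation(price, deforest)` should give the same value.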
Correlations (Pearson)
Correlation of price and deforest = 0.955, P-Value = 0.011

Figure 3: Correlation of price and deforestation

You can see that calculating the correlation by hand is a slow process, so don't expect to be asked to do it on an exam.

2.30 Here and in the next exercise, there's no value in calculating the correlation by hand, so use Minitab to do it: Stat, Basic Statistics, Correlation. Select the two variables (it doesn't matter which one is first) to get a correlation of -0.221. This is a weak correlation, and we saw in Exercise 2.6 that the relationship was weak: it was hard to predict final exam score from the first test score. Indeed, the correlation here is negative, which even hints that students who did better on the first test tended to do slightly worse on the final exam!

2.31 Same deal as in the previous exercise: read the data into Minitab and find the correlation. Here it is 0.519, which is positive and stronger than the previous one (if not itself very strong). This means that higher final exam scores go with higher second test scores, though the correspondence is far from perfect. This matches the message we got from the scatterplot in 2.7.

2.34 This is a moderately strong correlation. A correlation of 0.9 would show the points much closer to the line, while a correlation of 0.1 would show much less pattern. So the correlation must be closest to 0.6. The actual correlation (from Minitab) is 0.6821.

2.35 This is again closest to 0.6, for the same reasons as in 2.34. (The correlation looks weaker to me, but it is often surprising how strong a correlation can actually be.)

2.37 The correlation would not change: correlation is a number without units, so it doesn't change if the units of the variables change. A change of units replaces each x by a + bx with b > 0 (for example, inches to centimetres), and this leaves every standardized value, and hence r, unchanged.

2.38 If you didn't do 2.28, draw a scatterplot here. Mine is in Figure 4. It makes more sense to think of the 2003 returns as the response and the 2002 returns as the explanatory variable. The correlation for all 23 funds is 0.623. (This was found using Minitab: select Stat, Basic Statistics, Correlation, and then select the two variables in either order.) The Gold Fund is in row 20 of the data set; click on the row label 20, right-click on the selected cells, then select Delete Cells. The original row 20 disappears. Now you can re-calculate the correlation: it is now 0.872.

Figure 4: Scatterplot of sector funds

Taking out just one pair of returns has changed the correlation quite dramatically, so the correlation is not resistant. (This is no surprise, given that the correlation is calculated using means and SDs, which are not resistant.) The Gold Fund is the point on the far right of the scatterplot: an outlier because it does not lie on the general trend. Taking it out should make the relationship stronger and thus make the correlation closer to 1.

2.40 The correlation for the day length data is 0.280, and for the icicle lengths it is 0.876. (I found both by reading the data sets into Minitab and getting it to find the correlations.) This question illustrates that you can't really ask "how high a correlation is high?", because the answer is always going to be "it depends". What it depends on is the field of study and the kind of study: a lab experiment will usually give a stronger correlation than field data, so what counts as an interesting correlation will be different in the two cases.
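The lack of resistance described in 2.38 (and again in 2.42) is easy to reproduce. The sketch below uses made-up numbers, not the sector-fund data: five hypothetical points on a clear upward trend, then the same points plus one hypothetical outlier far below the trend. `pearson_r` is an illustrative helper that computes r from sums of squared deviations, which is algebraically the same as averaging products of z-scores with divisor n - 1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# HYPOTHETICAL data: five points on an upward trend.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 7, 9]

# The same five points plus one made-up outlier far below the trend.
x_with_outlier = x + [10]
y_with_outlier = y + [1]

print(f"without outlier: r = {pearson_r(x, y):.3f}")
print(f"with outlier:    r = {pearson_r(x_with_outlier, y_with_outlier):.3f}")
```

With these numbers, the single extra point drags r from about 0.99 down to a small negative value, an exaggerated version of what happens with the Gold Fund.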

2.42 The data for this one don't appear to be on the disk, so if you want to use Minitab, you'll have to type the data in yourself. I used Minitab for this exercise; my plot is in Figure 5. The correlation (calculated by hand, by calculator, or by Minitab) is 0.481, which is about 0.5 and not especially high. On the scatterplot, you see a nice trend through five of the points, but an outlier at the bottom right (the point with x = 10 and y = 1). This point is bringing the correlation down: if you take it out, the correlation shoots up to 0.993. So the correlation (because it's based on means and SDs) can be badly affected by outliers.

Figure 5: Scatterplot of Ex. 2.30 data

2.44 See Figure 6 for the scatterplot. The correlation is -0.172, close to zero; this is because the trend shown on the scatterplot is a curve rather than a straight line. Since there isn't a straight-line relationship, there isn't a very big correlation. You can often guess the sign of the correlation for a curved relationship. This one goes down sharply and then up moderately. Since the overall trend is more down than up, the correlation comes out negative. But the correlation doesn't reflect the fact that if you used a curve to predict fuel consumption from speed, you would be able to get accurate answers.

Figure 6: Fuel used against speed

2.45 My plot is shown in Figure 7. (I thought of highway gas mileage as the response; you could do it the other way around.) The Insight is the car at the far top right. It seems to fit the linear pattern made by the other cars. Using Minitab, the correlation using all the cars is 0.990. Taking out the row containing the Insight and re-calculating gives a correlation of 0.973. Both correlations are high, but the Insight increases the strength of the relationship because it lies on the linear pattern while being far away from the other cars.

2.47 Invent some plausible ages for women getting married, say 20, 22, 24, 26, 28. The men they marry have ages 22, 24, 26, 28, 30. You can draw a scatterplot by hand, or enter the numbers into a Minitab worksheet and draw the scatterplot that way. My plot is in Figure 8; yours will be different according to the ages you chose.
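With the invented ages above, the correlation can be computed directly. This is a minimal sketch; `pearson_r` is an illustrative helper (not a Minitab function) that computes r from deviations about the means. The second set of ages is also made up, to show that any choice with a fixed age gap lies on a straight line.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


wives = [20, 22, 24, 26, 28]          # invented ages from 2.47
husbands = [w + 2 for w in wives]     # 22, 24, 26, 28, 30, so y = x + 2
print(f"r = {pearson_r(wives, husbands)}")

# HYPOTHETICAL alternative ages with a different fixed gap: still a straight
# line, so r is 1 up to floating-point rounding.
other_wives = [19, 23, 25, 30, 31]
other_husbands = [w + 3 for w in other_wives]
print(f"r = {pearson_r(other_wives, other_husbands):.6f}")
```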

The points form a perfect straight line (with equation y = x + 2), so the correlation should be 1. You can check this by calculation if you want.

Figure 7: Scatterplot of city and highway gas mileages
Figure 8: Plot of husband's and wife's ages

2.50 The correlation is a number without units calculated from two quantitative variables. Knowing this helps you see the blunders. (a) Gender is categorical, not quantitative. (b) A correlation of 1.09 is impossible: a correlation must be between -1 and 1. (c) A correlation doesn't have units (such as bushels).

You could calculate a correlation in (a) by turning the categorical variable into numbers, say 1 for males and 0 for females. The result is called a point-biserial correlation coefficient. In the case of (a), if it came out positive, it would mean that males tend to have higher incomes (because males were given the value 1).

2.51 "Examine the relationships" means draw some scatterplots and, if you think it is reasonable, calculate some correlations. For GPA and IQ, the scatterplot is in Figure 9. The relationship, on the whole, is moderately linear, but with a few outliers at the bottom end (the two students with the lowest GPAs, and maybe the student with the lowest IQ also). The correlation is 0.634. For GPA and self-concept, the plot is in Figure 10. There seems to be an upward trend, but a weaker one than for GPA and IQ (the correlation here is 0.542). The student with the lowest GPA again appears to be an outlier. In both cases, the relationship looks roughly linear, so calculating a correlation is sensible. Taking out the outliers would make the correlations (a little) bigger, but not very much bigger, as there is so much data.
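To make the point-biserial idea from 2.50(a) concrete, here is a small sketch with hypothetical incomes, invented for illustration (in thousands). Gender is coded 1 for males and 0 for females, and the ordinary Pearson formula is applied to the coded values. The sketch also checks a standard identity: the point-biserial r equals (M1 - M0)/s times sqrt(p(1 - p)), where M1 and M0 are the group means, s is the SD of all incomes computed with divisor n, and p is the proportion coded 1. `pearson_r` is an illustrative helper.

```python
from math import sqrt
from statistics import mean, pstdev

# HYPOTHETICAL data, invented for illustration only.
gender = [0, 0, 0, 1, 1, 1]          # 1 = male, 0 = female, as in 2.50
income = [31, 36, 40, 38, 45, 50]    # incomes in thousands (made up)


def pearson_r(xs, ys):
    """Pearson correlation from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Point-biserial correlation: ordinary Pearson r on the 0/1 coding.
r_pb = pearson_r(gender, income)

# Same value from the group means; pstdev divides by n, not n - 1.
male = [y for g, y in zip(gender, income) if g == 1]
female = [y for g, y in zip(gender, income) if g == 0]
p = len(male) / len(gender)
r_from_means = (mean(male) - mean(female)) / pstdev(income) * sqrt(p * (1 - p))

print(f"point-biserial r = {r_pb:.3f}")
print(f"from group means = {r_from_means:.3f}")
```

Because the made-up male incomes average higher, r comes out positive, which is exactly the interpretation given for (a).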

Figure 9: Scatterplot of GPA and IQ
Figure 10: Scatterplot of GPA vs. self-concept
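The point made in 2.44, that a strong curved relationship can have a correlation near zero, shows up most clearly in an extreme hypothetical case: y = x squared, with x values placed symmetrically around 0. Here y is determined exactly by x, yet the straight-line correlation is 0, because the downhill and uphill halves of the curve cancel. Dropping the right-most point makes the curve go down more than up, and r turns negative, the same reasoning used for the fuel data. These are made-up values, not the fuel-consumption data; `pearson_r` is an illustrative helper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# HYPOTHETICAL: a perfect but curved relationship, symmetric about x = 0.
xs_sym = [-3, -2, -1, 0, 1, 2, 3]
ys_sym = [x ** 2 for x in xs_sym]
r_sym = pearson_r(xs_sym, ys_sym)

# Same curve, but it goes down further than it comes back up.
xs_down = [-3, -2, -1, 0, 1, 2]
ys_down = [x ** 2 for x in xs_down]
r_down = pearson_r(xs_down, ys_down)

print(f"symmetric curve:       r = {r_sym:.3f}")
print(f"more down than up:     r = {r_down:.3f}")
```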