
Today's Agenda
- Hour 1: Correlation vs. association, Pearson's r, non-linearity, Spearman rank correlation.
- Hour 2: Hypothesis testing for correlation (Pearson); correlation and regression.

Correlation vs. association

Association refers to any sort of trend between two variables. Correlation is a specific type of association: a trend (usually linear) between two variables of interval data measured on the same set of observations.

In each case, 'trend' just means 'happens together'.

Examples of association: Health science is more popular amongst women, and computer science is more popular amongst men. There is an association between field of study and gender.

Lifetime incomes of post-secondary graduates are higher than those of high school graduates. There is a (positive) association between education level and lifetime income. See required reading note 2.1: Ordinal data.

Examples of correlation*: The weight of bearded dragons increases with their head-to-tail length. This is a positive correlation.

Country by country, life expectancy at birth increases as income per capita increases. This is a positive correlation. Heating costs decrease as outdoor temperature increases. This is a negative correlation.

*Some examples have a non-linear component; we will revisit these later.

The most common graph for showing two sets of interval data together is the scatter plot.

Each dot represents a subject. In Length vs. Weight, each dot is a dragon.

The height of the dot represents the length of the dragon, and how far it is to the right represents the weight of the dragon. The dragon for this dot is 18 cm long and weighs 700 g.

There is an obvious upward trend in the graph. This shows a positive correlation.

The negative correlation between heating cost and outdoor temperature can be shown the same way.

The lack of correlation between two variables can also be shown in a scatter plot.

Basil is happy(?) to be a data point.

Pearson coefficient

Pearson's correlation coefficient measures the strength and direction of a linear trend between two numerical variables (usually continuous, but not always). It is the most popular choice and is considered the default option: if someone refers to 'the correlation', they almost always mean the Pearson correlation coefficient, much like how the mean is the default 'average'.

Specifically, the Pearson correlation coefficient is written r when it is a sample statistic, or ρ (rho, pronounced 'row') when it is a population parameter. It is always a value between -1 and 1 that tells how strong a correlation is and in what direction.

The stronger a correlation, the farther the coefficient is from zero (and the closer it is to 1 or -1).

Positive correlations have positive coefficients r. Negative correlations have negative coefficients r. The stronger the negative correlation, the closer it is to -1.

A perfect correlation, one in which all the values fall exactly on a line, has a correlation of 1 (for positive) or -1 (for negative).

If there is no correlation at all in the population, we would expect r to be zero. However, since r is computed from a sample, it varies from sample to sample like any other sample statistic. Instead of exactly zero, it usually takes some value close to zero, on either side.
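To see this sampling variation, here is a minimal sketch (an illustration, not part of the lecture) that draws x and y independently, so the population correlation ρ is zero, and computes r for a few samples. The helper name `pearson_r`, the sample size of 100, and the seed are arbitrary choices; only Python's standard library is used.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation from the definitional formula."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2016)  # fixed seed so the run is repeatable

# Five samples of 100 (x, y) pairs; x and y are generated independently,
# so the population correlation rho is exactly 0.
sample_rs = []
for _ in range(5):
    xs = [rng.gauss(0, 1) for _ in range(100)]
    ys = [rng.gauss(0, 1) for _ in range(100)]
    sample_rs.append(pearson_r(xs, ys))

for r in sample_rs:
    print(f"r = {r:+.3f}")  # small values on either side of 0, essentially never exactly 0
```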

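Pearson's correlation can be computed from a sample in many different ways. This is my personal favourite, because it can be simplified with some intermediate steps:

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} $$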

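First, recall the standard deviation formula for x (and likewise for y), which you have previously seen written all in one square root:

$$ s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}, \qquad \text{so} \qquad \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{n-1}\; s_x $$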

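Next, a handy property of square roots: they cancel, since $\sqrt{n-1}\,\sqrt{n-1} = n-1$. All this means the Pearson correlation formula can be written as:

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y} $$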

But we can go further by putting the standard deviations inside the sum. The sum is over i, so $s_x$ and $s_y$ act like constants:

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Do the parts in the parentheses look familiar? They are the standardized values of x and y, respectively.

$$ z_{x_i} = \frac{x_i - \bar{x}}{s_x} $$

where $z_{x_i}$ is the number of standard deviations that $x_i$ is above or below the mean of x. If $x_i$ is above the mean, then $z_{x_i}$ is positive; if $x_i$ is below the mean, then $z_{x_i}$ is negative. The same holds for $z_{y_i}$.

Now we have this:

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} z_{x_i} z_{y_i} $$

Here $z_{x_i}$ and $z_{y_i}$ are the standardized values of each x and y, respectively, and n is the number of (x, y) pairs. It is the number of observations as usual, but we have measured two variables from each. (Necessary complicated formulae like this will be available on exams.)
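The forms of r in this derivation (the definitional formula, the version with (n - 1)·s_x·s_y in the denominator, and the sum of z-score products divided by n - 1) should agree. Here is a minimal standard-library sketch that checks this numerically; the data values are invented for illustration and are not from the lecture.

```python
import math
import statistics

# Hypothetical (length cm, weight g) pairs, invented for illustration.
xs = [12.0, 14.5, 15.0, 16.2, 18.0, 19.5]
ys = [420.0, 510.0, 560.0, 600.0, 700.0, 760.0]
n = len(xs)
mx, my = statistics.mean(xs), statistics.mean(ys)
sx, sy = statistics.stdev(xs), statistics.stdev(ys)  # sample SDs (divide by n - 1)

# 1) Definitional form: sum of cross-products over the root of the sums of squares.
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
r1 = sxy / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

# 2) Square roots cancelled: the denominator becomes (n - 1) * s_x * s_y.
r2 = sxy / ((n - 1) * sx * sy)

# 3) Standard deviations moved inside the sum: products of z-scores over (n - 1).
zx = [(x - mx) / sx for x in xs]
zy = [(y - my) / sy for y in ys]
r3 = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

print(r1, r2, r3)  # equal up to floating-point rounding
```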

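When both x and y are above average, the term inside the sum, $z_{x_i} z_{y_i}$, is positive. The same is true when both are below average, because $z_{x_i} z_{y_i}$ is then a product of two negatives. So when x and y are often above average together and below average together, r sums to a positive number. Conversely, when one tends to be above average while the other is below, the products are negative and r comes out negative.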
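To make the sign argument concrete, this small sketch lists each $z_{x_i} z_{y_i}$ product for a dataset where y falls as x rises, in the spirit of heating cost against outdoor temperature. The numbers and the helper name `z_scores` are hypothetical. Here every product is negative, so r is negative too.

```python
import statistics


def z_scores(values):
    """Standardized values: how many sample SDs each value lies from the mean."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical data: outdoor temperature (°C) and monthly heating cost ($).
temps = [-10, -5, 0, 5, 10, 15]
costs = [300, 260, 210, 180, 120, 90]

zx, zy = z_scores(temps), z_scores(costs)
products = [a * b for a, b in zip(zx, zy)]
for t, c, p in zip(temps, costs, products):
    print(f"temp={t:>4}  cost={c:>4}  z_x*z_y={p:+.2f}")

r = sum(products) / (len(products) - 1)
print(f"r = {r:.3f}")  # negative: above-average temperatures pair with below-average costs
```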

Why are we looking at these formulae?

- A reminder of the formulae for the standard deviation s and for the standardized value z.
- A demonstration of how several of the classic statistics formulae connect.
- To show what's 'under the hood' of the correlation coefficient.

But sometimes it doesn't come together right.

Scatterplots show the interaction between two variables, and Pearson's correlation coefficient shows the strength and direction of the linear trend in that interaction.

Pearson's correlation does NOT, however, indicate the slope of that linear relationship, only whether the slope is negative or positive.
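A quick sketch of this point, using made-up data: three perfectly linear datasets with very different slopes all give r = 1 or r = -1 (up to rounding). The slope is computed as rise over run between the first and last points, which is exact only because these points lie on a line; the helper name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation from the definitional formula."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


xs = [1, 2, 3, 4, 5]
lines = {
    "y = 2x": [2 * x for x in xs],
    "y = 10x + 3": [10 * x + 3 for x in xs],
    "y = -0.5x": [-0.5 * x for x in xs],
}

results = {}
for label, ys in lines.items():
    r = pearson_r(xs, ys)
    slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])  # exact here because the points lie on a line
    results[label] = (r, slope)
    print(f"{label:<12} r = {r:+.3f}  slope = {slope:+.1f}")
```

All three lines are perfect correlations, yet their slopes are 2, 10 and -0.5: r captures direction and strength, not steepness.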

It is also not an appropriate measure to describe non-linear relationships between variables.

In real-world contexts, the most common form of non-linear relationship is a curvilinear one. (See: Gapminder World)

Life expectancy increases with the logarithm of income, not linearly with income. (See: Gapminder World) In this case, the issue is one of diminishing returns.
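A sketch with hypothetical numbers (not Gapminder data): if an outcome rises by a fixed amount every time income is multiplied by 10, Pearson's r against raw income understates how tight the relationship is, while r against log10(income) is 1. The incomes, the outcome formula, and the helper name `pearson_r` are all illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation from the definitional formula."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Hypothetical incomes spanning several orders of magnitude.
incomes = [100, 1_000, 10_000, 100_000, 1_000_000]
# Hypothetical outcome gaining 5 units per tenfold increase in income (diminishing returns).
outcome = [40 + 5 * math.log10(x) for x in incomes]

r_raw = pearson_r(incomes, outcome)
r_log = pearson_r([math.log10(x) for x in incomes], outcome)
print(f"r vs income:        {r_raw:.2f}")  # about 0.76
print(f"r vs log10(income): {r_log:.2f}")  # 1.00
```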

In other cases, a curvilinear relationship is the result of multiple competing factors.

Mathematically, non-linear means messy.

Spearman's rank correlation

The Spearman correlation coefficient is the go-to alternative to Pearson. Calculating the Spearman correlation doesn't use the values of x and y directly, but their ranks. Compared to Pearson's r, the Spearman correlation is more flexible, but it also ignores how extreme the extreme values are (which also makes it less sensitive to outliers). See: Optional reading note 2.2: Non-parametrics.

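The rank of a value is where it falls within a sample. Amongst n numbers, the lowest value is given rank 1 and the highest is given rank n. Tied values receive the average of the ranks they occupy (in the example below, the two 10s would occupy ranks 5 and 6, so each gets 5.5).

Example data: 2, 10, 999, 4, 7, -30, 12, 10
Ranks of example data: 2, 5.5, 8, 3, 4, 1, 7, 5.5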
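The ranking rule above (lowest value gets 1, ties share the average of their ranks) can be sketched in plain Python. `average_ranks` is a hypothetical helper name; the check uses the lecture's example data.

```python
def average_ranks(values):
    """Rank values from 1 (lowest) to n (highest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # indices in ascending order
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j (0-based) are ranks i+1..j+1; each tie gets their average.
        avg = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


data = [2, 10, 999, 4, 7, -30, 12, 10]
print(average_ranks(data))  # [2.0, 5.5, 8.0, 3.0, 4.0, 1.0, 7.0, 5.5]
```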

Using ranks allows the Spearman correlation to describe the strength and direction of any relationship, as long as the general trend is always increasing or always decreasing (monotonic). In other words, the Spearman correlation is 'blind' to the amount of increase or decrease in a non-linear relationship. Consider that in the example data above, the distances from -30 to 2, from 2 to 4, and from 12 to 999 are each only 1 rank.
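One common way to compute the Spearman coefficient, and the one sketched here, is to apply the Pearson formula to the ranks instead of the raw values. This self-contained sketch (helper names hypothetical) contrasts the two on a curved but always-increasing relationship, y = x³: Pearson's r falls short of 1 because the points are not on a line, while the Spearman coefficient is exactly 1 because the ranks match perfectly.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation from the definitional formula."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def average_ranks(values):
    """Ranks from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman correlation: Pearson's r computed on the ranks of x and of y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # curved, but always increasing

print(f"Pearson  r   = {pearson_r(xs, ys):.3f}")     # about 0.938: the curve is not a line
print(f"Spearman rho = {spearman_rho(xs, ys):.3f}")  # 1.000: the ranks line up perfectly
```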

Take a break, but stay warmed up.

Reading note 2.1: Ordinal data.
Reading note 2.2: Non-parametrics.
The window effect in regression; regression and the bivariate normal.

Diagrams used:
- "Spearman fig1" by Skbkekas - Own work. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/wiki/file:spearman_fig1.svg#/media/file:spearman_fig1.svg
- "Spearman fig2" by Skbkekas - Own work. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/wiki/file:spearman_fig2.svg#/media/file:spearman_fig2.svg
- "Spearman fig3" by Skbkekas - Own work. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/wiki/file:spearman_fig1.svg#/media/file:spearman_fig3.svg
- "Spearman fig4" by Skbkekas - Own work. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/wiki/file:spearman_fig1.svg#/media/file:spearman_fig4.svg