- Camron Barber
Powers and Roots

Quite often when we're dealing with quantitative data, it turns out that for the purposes of analysis it is useful to carry out a transformation of one of the variables of interest. This could occur for any one of a number of reasons:

- A distribution may be skewed. This can be problematic for several reasons.
  1) Skewed distributions are difficult to examine because many observations may be piled up in one place.
  2) Unusual observations in the direction opposite to that of the skew may be hidden.
  3) Summary measures like the mean and median don't necessarily make much sense for a skewed distribution.
- The relationship between two variables may be nonlinear. Again, this can be a problem for several reasons.
  1) Linear relationships are easier to interpret than nonlinear ones. With a linear relationship, after all, a one-unit change in the independent variable always leads to the same change in the dependent variable.
  2) The statistical theory for linear models is simple and well-developed. This is what we will be learning for the rest of the quarter.
  3) Nonparametric regression is not feasible when there is more than one independent variable.
- Spread may be nonconstant. There are cases where the spread of a variable will depend on its level. In that situation, comparing two distributions of the same variable from populations that are at different levels may be difficult.
- One or both of the variables may be a proportion.

Most of the transformations we will deal with will be in the family of powers and roots: X -> (X^p - 1)/p. In the case that p is zero, we substitute the log transformation, X -> log(X). The relationships of these transformed variables to the original are summarized in the figure below:

[Figure: transformed variables plotted against the original variable]

Note that you will not always be using the specific formula listed above. You may need to fiddle a bit so that all values of the variable being transformed are positive.
Also, as a general rule, carrying out a transformation only makes sense when the ratio of the largest to the smallest value is quite large.
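As a rough illustration of the power family and of the largest-to-smallest ratio rule, here is a minimal sketch in Python rather than STATA. The function names power_transform and max_min_ratio and the sample values are made up for this example; they are not part of the course materials.

```python
import math

def power_transform(x, p):
    """Return (x**p - 1)/p for p != 0 and log(x) for p == 0; x must be positive."""
    if x <= 0:
        raise ValueError("x must be positive")
    if p == 0:
        return math.log(x)
    return (x ** p - 1) / p

def max_min_ratio(values):
    """Ratio of the largest to the smallest value (values assumed positive)."""
    return max(values) / min(values)

# Hypothetical positive data, used only for illustration.
values = [0.5, 1.0, 2.0, 4.0]
print("max/min ratio:", max_min_ratio(values))
for p in (-1, 0, 0.5, 1, 2):
    print(p, [round(power_transform(v, p), 3) for v in values])
```

Every member of the family maps X = 1 to 0, and for p close to zero the power form approaches the log, which is why the log is used to fill in the p = 0 case.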
Creating transformed variables in STATA with the generate command

An important command you will need to use in STATA to create transformed variables is the generate command. The format is

    generate newvariable = expression involving existing variables

For example, the curves in the previous transparency were generated from an original variable x by the following commands:

    generate y0 = ln(x)
    generate y1 = x - 1
    generate y2 = x^2 - 1
    generate y3 = x^3 - 1
    generate ym1 = (x^-1 - 1)/-1

The original variable x had been created to consist of 1000 observations ranging from 0 to 4. Note that ln(x) is the natural logarithm of x and x^3 is x cubed. In current versions of STATA, log(x) is a synonym for ln(x), and log10(x) is the log base 10 of x. The range of functions you can include in your expression is quite wide; help functions will summarize them for you. For example, normd(x) is the height of the normal density at x, normprob(x) is the value of the cumulative normal at x, and so on. (Current versions of STATA name these two functions normalden() and normal().)

[Figure: four panels plotting y = normd(x), y = normprob(x), y = sin(1/x), and a fourth sine curve against x]
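For readers without STATA at hand, the same transformed variables can be sketched in Python. This is only an approximate analogue under stated assumptions: the grid for x starts just above 0 (the logarithm and the reciprocal are undefined at 0), and statistics.NormalDist stands in for the normal density and cumulative normal functions.

```python
import math
from statistics import NormalDist

# Assumption: 1000 evenly spaced values from 0.004 to 4.0 (strictly positive).
n = 1000
x = [4 * (i + 1) / n for i in range(n)]

# Python counterparts of the generate commands.
y0 = [math.log(v) for v in x]            # generate y0 = ln(x)
y1 = [v - 1 for v in x]                  # generate y1 = x - 1
y2 = [v ** 2 - 1 for v in x]             # generate y2 = x^2 - 1
y3 = [v ** 3 - 1 for v in x]             # generate y3 = x^3 - 1
ym1 = [(v ** -1 - 1) / -1 for v in x]    # generate ym1 = (x^-1 - 1)/-1

# Height of the standard normal density and value of the cumulative normal.
std_normal = NormalDist()
density = [std_normal.pdf(v) for v in x]
cumulative = [std_normal.cdf(v) for v in x]
```

Dividing by -1 in ym1 flips the sign of the reciprocal so that the transformed variable still increases with x.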
Dealing with skewness

If you have a variable that is skewed, you may use a power/root or log transformation to try to make it look more normal, or at least more symmetric. In general, if a distribution is skewed to the right, you may need to experiment with a log or a root transformation, while if it is skewed to the left, you may need to square or cube it. Here are some examples.

First, a situation where a variable is skewed to the left. On the left is a smoothed univariate distribution (produced with STATA's kdensity) of countries according to the percentage of males with a primary school education in 1993 (prim93m). Note that it is skewed to the left, indicating that while most countries have a fairly high proportion of males with some primary school education, some have very little. On the right is a smoothed distribution of a variable prim93m2 generated by squaring prim93m. Note that it is much more symmetric, and even sort of normal.

[Figure: left, smoothed distribution of Primary 93 M (World Bank); right, smoothed distribution of prim93m2]

Here's a situation where transformation of a left-skewed distribution improves symmetry, but not normality. On the left is a distribution of life expectancies in 1995; on the right is a distribution of life expectancy raised to the 4th power.

[Figure: left, smoothed distribution of e0 in 1995 (World Bank); right, smoothed distribution of e0 raised to the 4th power]

The smoothed distribution is the univariate equivalent of the bivariate smoothing we saw earlier.
Just as we can raise to a power to adjust for a left skew, we can log or take a root to correct for a right skew. On the left is a smoothed distribution of per capita GDP; on the right is the equivalent for logged per capita GDP.

[Figure: left, smoothed distribution of United Nations per capita GDP; right, smoothed distribution of lpcgdp95]

Below is an extreme example. On the left is a distribution of surface areas of countries. It is right skewed, indicating that while most countries have a small surface area, some have very large ones. On the right is a distribution of the tenth root of surface area, the result of some experimentation.

[Figure: left, smoothed distribution of surface area in sq. km. (World Bank); right, smoothed distribution of AreaRt2]
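The effect of these transformations on skewness can be checked numerically. The sketch below is in Python rather than STATA and uses made-up data constructed so that the transformed values are exactly symmetric; the skewness helper is a plain moment-based coefficient written for this example, not a STATA command.

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of the variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data: exponentials of a symmetric grid.
grid = [i / 10 for i in range(-20, 21)]
right_skewed = [math.exp(v) for v in grid]
logged = [math.log(v) for v in right_skewed]

# Hypothetical left-skewed data: square roots of evenly spaced values.
left_skewed = [math.sqrt(i) for i in range(1, 42)]
squared = [v ** 2 for v in left_skewed]

print("right-skewed:", skewness(right_skewed), "after log:", skewness(logged))
print("left-skewed:", skewness(left_skewed), "after squaring:", skewness(squared))
```

Real data will rarely become perfectly symmetric, so in practice it is the direction and rough size of the change that guide the choice of power.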
Transforming to deal with nonlinearity

Often we will want to transform a variable to linearize a relationship. An important rule of thumb applies when we plot the dependent variable on the vertical axis against the independent variable on the horizontal axis. If the plot exhibits a bulge up and to the left, we will probably need to transform either the independent variable downward or the dependent variable upward. If the bulge is down and to the right, we may need to transform the independent variable upward or the dependent variable downward. A bulge down and to the left calls for transforming either variable downward.

Here's an example where, on the left, male life expectancy is plotted against per capita GDP, and on the right, it is plotted against logged per capita GDP.

[Figure: left, male e0 (U.N.) against United Nations per capita GDP; right, male e0 (U.N.) against lpcgdp95]

Here's an example where the dependent variable is the total fertility rate (TFR).

[Figure: left, TFR against United Nations per capita GDP; right, TFR against lpcgdp95]
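One informal way to see whether a transformation has straightened a relationship is to compare the linear correlation before and after. The Python sketch below uses hypothetical GDP and life expectancy figures generated so that life expectancy is exactly linear in log GDP; the pearson helper is written for this example and is not a STATA command.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical data: GDP doubles from one country to the next, and life
# expectancy rises by a constant amount per doubling (bulge up and to the left).
gdp = [250 * 2 ** i for i in range(8)]
life_expectancy = [30 + 4 * math.log(g) for g in gdp]
log_gdp = [math.log(g) for g in gdp]

print("raw GDP:   ", pearson(gdp, life_expectancy))
print("logged GDP:", pearson(log_gdp, life_expectancy))
```

With real data the correlation after transformation will fall short of 1, and the scatterplot itself remains the better guide.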
Nonconstant spread

Sometimes the spread of a variable depends on the level of another variable. For example, according to the figure on the left below, the spread of the infant mortality rate (IMR) varies with the level of fertility, even though the relationship between the two is linear. That can be a problem if you want to compare the distribution of the variable at different levels. Outliers may be harder to locate at one of the levels, for example. When spread increases with level, working down the ladder of powers can stabilize it. The figure on the right below plots the cube root of the infant mortality rate as a function of the TFR.

[Figure: left, IMR against TFR; right, cube root of IMR against TFR]
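A small numerical sketch of this idea, in Python with made-up numbers: two groups share the same relative spread but sit at different levels, so the raw spread of the higher group is much larger. Moving down the ladder of powers (cube root, then log) pulls the two spreads together. The helper names spread and spread_ratio are inventions for this example.

```python
import math

# Hypothetical groups: same multiplicative spread around levels 8 and 64.
factors = [0.5, 0.75, 1.0, 1.25, 1.5]
low = [8 * f for f in factors]
high = [64 * f for f in factors]

def spread(values):
    """Range of the values, used here as a simple measure of spread."""
    return max(values) - min(values)

def spread_ratio(transform):
    """Spread of the high-level group relative to the low-level group after a transform."""
    return spread([transform(v) for v in high]) / spread([transform(v) for v in low])

raw_ratio = spread_ratio(lambda v: v)
cube_root_ratio = spread_ratio(lambda v: v ** (1 / 3))
log_ratio = spread_ratio(math.log)
print(raw_ratio, cube_root_ratio, log_ratio)
```

In this constructed case the log equalizes the spreads completely, while the cube root goes part of the way; which rung of the ladder works best for real data is a matter of experimentation.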
Proportions

An important transformation when one of the variables represents a proportion is the logit transformation, ln(p/(1-p)). This is the log of the odds of p. Proportions are unusual beasts because they are bounded by 0 and 1, and their distributions often behave differently from those of unbounded variables. You will often see examples of nonconstant spread when a proportion is the dependent variable, and the logit transformation is another way of dealing with this. Here is the IMR/TFR relationship from the previous page, followed by one where the IMR is transformed into the logit:

[Figure: left, IMR against TFR; right, logitimr against TFR]
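The logit and its inverse are easy to write down directly. The following is a minimal Python sketch; the function names logit and inv_logit are chosen for this example, and a rate expressed per 1000 or as a percentage is assumed to have been converted to a proportion between 0 and 1 first.

```python
import math

def logit(p):
    """Log of the odds p/(1-p); defined only for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1 - p))

def inv_logit(z):
    """Map a logit back to a proportion between 0 and 1."""
    return 1 / (1 + math.exp(-z))

# Hypothetical proportions, used only for illustration.
for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(p, round(logit(p), 3))
```

The transformation is symmetric about p = 0.5 and stretches out values near 0 and 1, which is what removes the squeezing caused by the bounds.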
Here's an example where we're looking at a distribution of countries by the percentage of dwellings with running water. It's right skewed, but squaring the percentage doesn't do much that's useful; indeed, it produces a bimodal distribution that still looks a bit skewed.

[Figure: left, distribution of Water (World Bank); right, distribution of mwater95; vertical axes: Fraction]

Here's what happens when we look at a logit transformation:

[Figure: distribution of lwater95; vertical axis: Fraction]

Notice the distribution is much better-behaved.