Business Statistics. University of Chicago Booth School of Business Fall Jeffrey R. Russell


Business Statistics, University of Chicago Booth School of Business, Fall 08, Jeffrey R. Russell. There is no textbook for the course. If you would like an outside reference, you may pick up a copy of Statistics for Business and Economics by Newbold, Carlson, and Thorne (8th edition). The course web page will provide our primary communication outside of class. Homework will be given in the first lecture of each week and is due in the first lecture of the following week.

Student Evaluation
Grades will be determined by 10% homework, 30% midterm, and 60% final exam. Homework is due at the start of class. You may work in groups of three or fewer. You may bring an 8.5x11 cheat sheet to the midterm (one-sided) and to the final exam (two-sided). Regrade requests should be made in writing within one week of the return of the midterm. If you sign up for this section of business statistics, you are expected to be available for both the midterm and the final exam times.
Review Sessions
Thursdays 3:00-3:50, Harper 3B
Thursdays 5:00-5:50, Booth 455 room 3
Review sessions don't meet 9/7,/, and /3. The sessions support Excel, but you can use any stat package you would like. Time is split between computing and questions on the material.

I will provide real-time examples in Excel. You are free, however, to use any software to complete the homework.
What is Statistics?
Data (the real world):
1. This is the raw information.
2. Computerization => lots of data!
Models:
1. They are simplified representations of the real-world data.
2. They allow us to concisely summarize real-world activity.
3. They allow us to make predictions about what might happen.

Of course, no model can exactly explain real-world phenomena. There is inherent randomness in everything. This introduces challenges in how we choose which model is a good model. Our models will have a random component to capture randomness or uncertainty.
Statistics is the link between the models and the data. Models capture uncertainty and make statements about how the world should look.
1. Estimation: We can use the observed data to figure out which models are most consistent with the data.
2. Testing: Conversely, the right model should make accurate predictions about how the real-world data should look. If the model's predictions don't look like the real-world data, then we can conclude that the model is not a good one.

Course structure
1. We begin with a discussion of summarizing real-world data (means, variances, covariances, graphical summaries, etc.).
2. We next introduce models. Since the real world is uncertain, our models will involve uncertainty/probability.
3. We then combine the data with the models to discuss estimation and testing. We begin with very simple models and move to more complex models.
4. We finish by applying these ideas to the regression model.
1. Descriptive Statistics
1.1 Dotplot and Histogram
1.2 The Time Series Plot
1.3 The Mean and Median
1.4 Summation Notation
1.5 The Sample Variance and Standard Deviation
1.6 The Empirical Rule
1.7 Other Statistics
1.8 Covariance and Correlation
1.9 Linear Functions
1.10 Mean and Variance of a Linear Function
1.11 Linear Combinations
1.12 Portfolios
1.13 Mean and Variance of a Linear Combination
1.14 Statistics as Estimates (intro)

1.1 Dotplot and Histogram
Let's look at some returns data.
Simple Returns
The return on an asset is the percentage increase in wealth invested in the asset over a given time period. If you invest B at the beginning of the time period, you get $E = B + rB = (1 + r)B$ at the end of the time period, where r is the return. Clearly, given E and B we can calculate r: $r = (E - B)/B$.
The Canadian Returns Data
Here are 07 monthly returns on a broad-based portfolio of Canadian assets (more on portfolios later).
[Table: canada]
Each number corresponds to a month. They are given in time order (go across rows first). Our first observation is .07. In the first month, the return was .07, in the th, .03.
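The return definition above can be sketched in a few lines of Python. This is only an illustration: the function names and the wealth figures (100 and 107) are assumptions made up for the example, not values from the course data.

```python
def simple_return(b: float, e: float) -> float:
    """Simple return r, defined by e = (1 + r) * b."""
    if b == 0:
        raise ValueError("beginning wealth must be nonzero")
    return (e - b) / b


def end_wealth(b: float, r: float) -> float:
    """Wealth at the end of the period: E = (1 + r) * B."""
    return (1 + r) * b


# Hypothetical numbers: invest B = 100 and end the period with E = 107.
r = simple_return(100.0, 107.0)
print(f"return = {r:.2%}")
print(f"end wealth recovered from r = {end_wealth(100.0, r):.2f}")
```

Going from B and E to r and back again is a quick check that the two formulas are inverses of each other.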

The Dotplot
To display the returns we'll first use a dotplot. For each number, simply place a dot above the corresponding point on the number line.
[Dotplot: Canada returns]
The returns are centered or located at about .0. The spread or variation in the returns is huge.
We also have data on countries other than Canada. Let's compare Canada with Japan.
[Character dotplots: canada, japan]

It really helps to get things on the same scale. How is Japan different from Canada?
[Character dotplots on a common scale: canada, japan]
Mutual Funds Data
Let's use the dotplot to compare returns on some other kinds of assets. We'll look at returns on two different mutual funds, the equally weighted market, and T-bills. The equally weighted market is the return on a portfolio where you spread your money out equally over a wide variety of stocks.

Data on 4 different kinds of returns:
Dreyfus growth fund
Putnam income fund
Equally weighted market
T-bills (the risk-free asset)
[Character dotplots: drefus, Putnminc, eqmrkt, tbill. In these plots each dot represents more than one point.]
Beer Data
nbeerm: the number of beers male MBA students claim they can drink without getting drunk
nbeerf: same for females
[Character dotplots: nbeerm, nbeerf. One point is marked: we call a point like this an outlier.]
Generally the males claim more; their numbers are centered or located at larger values.

The Histogram
Sometimes the dotplot can look rather jumpy. The histogram gives us a smoother picture of the data. The height of each bar tells us how many observations are in the corresponding interval.
[Histogram for nbeerf, with bars annotated by the number of women whose number of beers falls in each interval, e.g. the interval (.5,4.5)]
The choice of bin width determines the degree of resolution. A large bin width gives a very smooth (flat) picture with no detail. A small bin width shows more detail, but maybe too much.
Here is the histogram of the Canadian returns.
[Histogram: frequency vs. canada]
Here it is again with the bin width half the size of the one above.
[Histogram: frequency vs. canada, half the bin width]
There is no special way of determining the bin width. I use my eye.
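To see how the bin width changes the picture, here is a minimal sketch that counts observations per bin for two widths. The data values and the helper name histogram_counts are hypothetical, chosen only for illustration; they are not the nbeerf data.

```python
import math
from collections import Counter


def histogram_counts(data, bin_width, start=0.0):
    """Count observations falling in bins [start + k*w, start + (k+1)*w)."""
    counts = Counter(math.floor((x - start) / bin_width) for x in data)
    return {
        (start + k * bin_width, start + (k + 1) * bin_width): c
        for k, c in sorted(counts.items())
    }


# Hypothetical "number of beers" answers, including one extreme value.
beers = [1, 2, 2, 3, 3, 3, 4, 5, 5, 10]

wide = histogram_counts(beers, bin_width=4)    # few bins: smooth, little detail
narrow = histogram_counts(beers, bin_width=2)  # more bins: more detail

for label, hist in (("width 4", wide), ("width 2", narrow)):
    print(label)
    for (lo, hi), count in hist.items():
        print(f"  [{lo:4.1f}, {hi:4.1f})  {'*' * count}")
```

With the wide bins the data collapses into three bars; with the narrow bins the gap between the bulk of the data and the extreme value becomes visible.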

1.2 The Time Series Plot
We just looked at two kinds of data:
the returns data
the number of beers
For the returns data, each number corresponds to a month. For the beers data, each number corresponds to a person. The returns data has an important feature the beer data does not. It has an order! There is a first one, a second one, and so on. A sequence of observations taken over time is called a time series. We could have
daily data (closing stock price)
quarterly data (some macro data, like GDP)
monthly data (real estate index prices)
daily data (like daily stock returns)
trade-by-trade transaction prices.
For time series data, the time series plot is an important way to look at our data.

The returns data: a time series plot of the Canadian returns.
[Time series plot of canada: the return is on the vertical axis and time (date) is on the horizontal axis]
Do you see a pattern?
It's also interesting to plot multiple time series on the same plot for comparison.

[Time series plot: Chicago household income (INCNEW) and home prices (HOME_PRICE)]
1.3 Summarizing a Single Numeric Variable
We have looked at graphs. Suppose we are now interested in numerical summaries of the data rather than graphical representations. Two important features of any numeric variable are:
1) What is a typical or average value?
2) How spread out or variable are the values?

Monthly returns on the Canadian portfolio and the Japanese portfolio: they seem to be centered roughly at the same place, but Japan has more spread. How can we summarize this?
For the beer data, the means are computed in Excel with =AVERAGE(nbeerf) and =AVERAGE(nbeerm). In the picture, I think of the mean as the center of the data.
[Character dotplots: nbeerf, nbeerm]

Let's compare the means of the Canadian and Japanese returns, computed with AVERAGE for canada and for japan. This is a big difference. It was hard to see this difference in the dotplots because the difference is small compared to the variation.
The median is another measure of central tendency. Arrange the data in ascending order. After this ranking, the middle number in the data set is the median. If there is an even number of data points, then the median is the average of the two center numbers. In the first example the median is 7. In the second example, which has an even number of points with 4 and 5 in the middle, the median is 4.5.

Unlike the mean, the median is not affected by outliers. Consider the two data sets: both have Median=7, but their means differ.
Mean and Median Comparison
The median is often nice to report when there are extreme values in the data. Think about US household income (05).
[Figure: income (000's), with Mean=$7,000, Median = $58,476, and Bill Gates marked]
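A small sketch makes the contrast between the mean and the median concrete. The income figures below are hypothetical values (in 000's) invented for the example; they are not the household income data referred to above.

```python
from statistics import mean, median

# Hypothetical incomes in 000's of dollars.
incomes = [40, 50, 58, 65, 72]

# Add one extreme value to play the role of the outlier.
with_outlier = incomes + [100_000]

print("without outlier: mean =", mean(incomes), " median =", median(incomes))
print("with outlier:    mean =", round(mean(with_outlier), 1),
      " median =", median(with_outlier))
```

The single extreme value drags the mean up by orders of magnitude, while the median only moves from one middle value to the average of two neighboring ones.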

1.4 Summation Notation
We are going to be doing a lot of adding up and averaging. To talk about this in general we need some special notation. First of all, we denote a general set of numbers by $x_1, x_2, x_3, \ldots, x_n$, where $x_1$ is the first number, $x_n$ is the last number, and n is the number of data points, or the sample size.
In general, the average (mean) of the numbers x is:
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
We often use the symbol $\bar{x}$ to denote the mean of the numbers x. We call it "x bar".

Let's look at summation in more detail. $\sum_{i=1}^{n} x_i$ means: for each value of i from 1 to n, add to the sum the value indicated, in this case $x_i$.
Example: Suppose we have data x and y with n=4. Think of each row as an observation of both x and y. To make things concrete, think of each row as corresponding to a year and let x and y be annual returns on two different assets.
[Table: x, y, year]
In year asset x had return 7%. In year 4 asset y had return 3%.

The averages are $\bar{x} = \frac{1}{4}\sum_{i=1}^{4} x_i$ and $\bar{y} = \frac{1}{4}\sum_{i=1}^{4} y_i$. But we can do lots of other things with summation notation.
[Table: x, y, year]
1.5 The Sample Variance and Standard Deviation
[Character dotplot of x and y]
The y numbers are more spread out than the x numbers. We want a numerical measure of variation or spread.

To examine the variation of the data x, we look at the deviations $x_i - \bar{x}$. A lot of variation means these numbers are big.
[Table for our example: $x$, $x - \bar{x}$, $y$, $y - \bar{y}$; character dotplot of x and y with $\bar{x}$ and $\bar{y}$ marked]

Now we need an overall measure of how big the differences are. We can't just sum them because the negative distances cancel out the positive ones. We average the squared distances:
$\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$
For technical reasons (we'll see why later) the sample variance of the data x is defined to be:
Sample Variance: $s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
With large n, there is little difference between dividing by n and (n-1). Think of it as the average squared distance from the mean (center).

What is the smallest value a variance can be? What are the units of the variance? It will be helpful to have a measure of spread which is in the original units. The sample standard deviation is:
Sample Standard Deviation: $s_x = \sqrt{s_x^2}$
The units of the standard deviation are the same as those of the original data.
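The sample variance and standard deviation can be sketched directly from their definitions. The data values below are hypothetical and the function names are made up for the example; the only point is the division by n - 1 and the square root that brings the units back to those of the data.

```python
import math


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def sample_sd(xs):
    """Square root of the sample variance, in the units of the data."""
    return math.sqrt(sample_variance(xs))


# Hypothetical data: mean is 4, squared deviations are 9, 4, 1, 0, 36.
x = [1, 2, 3, 4, 10]
print("variance:", sample_variance(x))
print("sd:      ", round(sample_sd(x), 4))
```

If every value in the data is the same, every deviation is zero, which gives the smallest possible variance of zero.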

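Let's try these formulas on our example. From the deviations we compute $s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$ and $s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$, and then take square roots to get $s_y$ and $s_x$.
[Table: $x$, $x - \bar{x}$, $y$, $y - \bar{y}$]
The sample standard deviation for the y data is bigger than that of the x data. This numerically captures the fact that y has more variation about its mean than x.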

Returns Example: the standard deviations measure the fact that there is more spread in the Japanese returns.
[Character dotplots: canada, japan]
[Table: summary measures (count, mean, standard deviation) for canada and japan]
1.6 The Empirical Rule
We now have the two summaries: $\bar{x}$, where the data is, and $s_x$, how spread out, or variable, the data is. The mean is pretty easy to understand. What are the units? We know that the bigger $s_x$ is, the more variable the data is, but how do we interpret the number? What is a big $s_x$, and what is a small one? What are the units of $s_x$?

The empirical rule will help us understand $s_x$ and relate the summaries back to the dotplot (or histogram).
Empirical rule: For mound-shaped data,
Approximately 68% of the data is in the interval $(\bar{x} - s_x,\ \bar{x} + s_x)$, that is, $\bar{x} \pm s_x$.
Approximately 95% of the data is in the interval $(\bar{x} - 2s_x,\ \bar{x} + 2s_x)$, that is, $\bar{x} \pm 2s_x$.
Let's see this with the Canadian returns.
[Histogram of canada (percent), with dotted lines at $\bar{x} \pm s_x$ and dashed lines at $\bar{x} \pm 2s_x$]
The empirical rule says that roughly 95% of the observations are between the dashed lines and 68% between the dotted lines. Looks reasonable.
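One way to get a feel for the empirical rule is to check it on simulated mound-shaped data. The sketch below is an illustration under stated assumptions: the data is drawn from a normal distribution with a made-up mean of 0.01 and standard deviation of 0.05, and it is not the Canadian returns.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
# Simulated mound-shaped "returns" (hypothetical mean and spread).
data = [rng.gauss(0.01, 0.05) for _ in range(10_000)]

xbar = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)


def share_within(k: float) -> float:
    """Fraction of observations inside (xbar - k*s, xbar + k*s)."""
    lo, hi = xbar - k * s, xbar + k * s
    return sum(lo < x < hi for x in data) / len(data)


print(f"within 1 sd: {share_within(1):.3f}")  # empirical rule: about 0.68
print(f"within 2 sd: {share_within(2):.3f}")  # empirical rule: about 0.95
```

For data that is not mound shaped (strongly skewed, or with several peaks) the shares can be quite different, which is why the rule carries that qualifier.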

Comparing Mutual Funds
Let's use means and sd's to compare mutual funds. For 9 different assets we compute the mean and sd, then plot mean vs. sd. The assets are:
#c1 - R Drefus (growth)
#c2 - R30 Fidelity Trend fund (growth)
#c3 - R55 Keystone Speculative fund (max capital gain)
#c4 - R9 Putnam Income Fund (income)
#c5 - R99 Scudder Income
#c6 - R9 Windsor Fund (growth)
#c7 - equally weighted market
#c8 - value weighted market
#c9 - tbill rate
# sample period: monthly returns
[Table: summary measures (count, mean, standard deviation) for drefus, fidel, keystne, Putinc, scudinc, windsr, eqmrkt, valmrkt, and tbill]

It is considered good to have a large mean return and a small standard deviation.
[Scatterplot: mean vs. standard deviation for tbill, windsor, drefus, valmrkt, Putnminc, keystne, fidel, scudinc, and eqmrkt]
How does this graph relate back to the dotplots we saw before?
Let's compare some countries, based on monthly returns.
[Scatterplot: mean vs. standard deviation for usa, singapor, france, honkong, canada, belgium, australi, germany, finland, italy, and japan]

1.7 Other Statistics
We computed the mean by averaging the data. We computed the variance by essentially taking the average of the squared deviations of the data from the mean. We can take averages of higher powers too, like the 3rd or 4th power. In doing so, we learn other interesting summary statistics about the data. Consider the S&P500 daily returns.
[Histogram: frequency of the S&P500 daily returns r]
What do you notice about the symmetry?

Skewness
Skewness is another sample statistic; it measures the degree of symmetry in the data. Technically, it is computed as:
$\text{Skew} = \frac{\frac{1}{N}\sum_{i=1}^{N} (r_i - \bar{r})^3}{s^3}$
where s is the sample standard deviation of the returns and $\bar{r}$ is the sample average of the returns. When we take the 3rd power, we retain the sign. When we cube a data point, we make extreme values even more extreme. The skew examines whether there are more extreme values in the right tail or in the left tail. A distribution with a longer right tail will have a positive skew. A distribution with a longer left tail will have a negative skew. Hence, we look at the sign of the skew.

[Histogram: frequency of r, with the longer left tail marked]
The skew for this dataset, computed with =skew(data), is: .6483
Remember the household income data we spoke about before? That data would have a positive skew because the right tail is longer than the left tail. A perfectly symmetric distribution will have a skew of zero.

Kurtosis
Kurtosis measures how much weight there is in the tails of the distribution relative to the middle (we call this a measure of the fatness of the tails). How is this different from the variance? Let's look at two distributions with the SAME variance, but different relative fatness of the tails.
[Chart: two distributions, one blue and one red]
Both distributions have the same variance. The blue distribution has more weight in the middle AND more weight in the tails!

The kurtosis is defined as:
$\text{Kurt} = \frac{\frac{1}{N}\sum_{i=1}^{N} (r_i - \bar{r})^4}{s^4}$
The blue distribution has a kurtosis of 4.5; the red distribution has a kurtosis of 3. The larger the kurtosis, the more peaked the distribution and the fatter its tails. The Normal or Gaussian distribution (more to come later) has a kurtosis of 3. Sometimes we report the excess kurtosis: Excess Kurtosis = Kurtosis - 3. Hence, if the data were normal, the excess kurtosis should be about zero. Distributions with positive excess kurtosis have fatter tails than the Normal. Most returns data have positive excess kurtosis. This is called leptokurtic. A distribution with negative excess kurtosis is called platykurtic; this is rare.
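The skewness and kurtosis formulas above (average third or fourth power of the deviations, divided by the matching power of the sample standard deviation) can be sketched as follows. The data sets and function names are hypothetical. Note that spreadsheet functions such as Excel's SKEW and KURT apply small-sample adjustments, so their output will generally differ somewhat from this direct version.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def _sample_sd(xs):
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def skewness(xs):
    """Average cubed deviation from the mean, divided by s**3."""
    m, s = _mean(xs), _sample_sd(xs)
    return (sum((x - m) ** 3 for x in xs) / len(xs)) / s ** 3


def kurtosis(xs):
    """Average fourth-power deviation from the mean, divided by s**4."""
    m, s = _mean(xs), _sample_sd(xs)
    return (sum((x - m) ** 4 for x in xs) / len(xs)) / s ** 4


symmetric = [-2, -1, 0, 1, 2]      # hypothetical, perfectly symmetric
right_tail = [1, 1, 1, 1, 10]      # hypothetical, one large value on the right
left_tail = [-10, -1, -1, -1, -1]  # mirror image: long left tail

print("skew, symmetric: ", skewness(symmetric))
print("skew, right tail:", round(skewness(right_tail), 3))
print("skew, left tail: ", round(skewness(left_tail), 3))
print("excess kurtosis, symmetric:", round(kurtosis(symmetric) - 3, 3))
```

The sign of the skew follows the longer tail, as described above: positive for the right-tailed data, negative for its mirror image, and zero for the symmetric data.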

1.8 Covariance and Correlation
The mean and sd help us summarize a bunch of numbers which are measurements of just one thing. A fundamental and totally different question is how one thing relates to another. We just used a scatterplot to look at two things: the mean and sd of different assets. In this section of the notes we look at scatterplots and how correlation can be used to summarize them.
Example: nbeer (# beers without getting drunk) vs. weight
[Scatterplot: nbeer vs. weight; table with columns i, nbeer, weight]
Now we think of each pair of numbers as an observation. Each pair corresponds to a person. Each person has two numbers associated with him/her, #beers and weight.

Example: Are returns on a mutual fund related to market returns?
[Scatterplot: windsor vs. valmrkt; each point corresponds to a month]
In general we have observations $(x_i, y_i)$: the ith observation is a pair of numbers, and each point on the plot corresponds to an observation. Our data looks like a table with columns i, x, and y. The plot enables us to see the relationship between x and y.

In both examples it does look like there is a relationship. Even more, the relationship looks linear, in that it looks like we could draw a line through the plot to capture the pattern. Let me first introduce covariance and correlation; then we will consider the interpretation and use of each.
The sample covariance between x and y is:
$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
The sample correlation between x and y is:
$r_{xy} = \frac{s_{xy}}{s_x s_y}$
so the correlation is just the covariance divided by the two standard deviations.
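The two formulas translate almost word for word into code. The following is a minimal sketch; the weight and beer numbers are hypothetical, not the class data, and the function names are made up for the example.

```python
import math


def sample_cov(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with n >= 2")
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def sample_corr(xs, ys):
    """Covariance divided by the two sample standard deviations."""
    sx = math.sqrt(sample_cov(xs, xs))  # covariance of x with itself is its variance
    sy = math.sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (sx * sy)


# Hypothetical observations: one (weight, nbeer) pair per person.
weight = [120, 140, 150, 175, 190, 210]
nbeer = [2, 3, 5, 4, 6, 8]

print("covariance: ", round(sample_cov(weight, nbeer), 3))
print("correlation:", round(sample_corr(weight, nbeer), 3))
```

The covariance carries the units of x times the units of y, while the correlation is unit free because the standard deviations in the denominator cancel those units.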

We'll get some intuition about these formulas, but first let's see them in action. How do they summarize data for us?
Correlation FACTS:
$-1 \le r_{xy} \le 1$
The closer r is to 1, the stronger the linear relationship is with a positive slope. That is, the largest values of y tend to be associated with the largest values of x (and vice versa).
The closer r is to -1, the stronger the linear relationship is with a negative slope. That is, large values of y tend to be associated with small values of x (and vice versa).
The correlations corresponding to the two scatterplots we looked at are:
Correlation of valmrkt and windsor = 0.93
Correlation of nbeer and weight = 0.69
The larger correlation between valmrkt and windsor indicates that the linear relationship is stronger. Let's look at some more examples.

[Four scatterplots, each labeled with its correlation: y1 vs. x1 (labeled 0.09), y2 vs. x2, y3 vs. x3, and y4 vs. x4]

What's the correlation here?
[Scatterplot: y vs. x]
Correlation of x and y = 0.00 (basically 0). Correlation only measures the linear relationship.
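A small constructed example shows how an exact relationship can still have zero correlation. Here y is completely determined by x through y = x squared, but the relationship is not linear. The data is hypothetical and was chosen to be symmetric around zero.

```python
import math


def sample_corr(xs, ys):
    """Sample correlation: covariance divided by the two standard deviations."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    return sxy / (sx * sy)


x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]  # y is an exact (nonlinear) function of x

print("correlation of x and y:", sample_corr(x, y))
```

The positive contributions from the right half of the plot are cancelled exactly by the negative contributions from the left half, so the covariance, and with it the correlation, is zero.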

Example: The Countries Data
Which countries go up and down together? I have data on 3 countries. That would be a lot of plots!
[Scatterplot: canada vs. usa]
To summarize, we can compute all pairwise correlations.
[Table: pairwise correlations among australi, belgium, canada, finalnd, france, germany, honkong, italy, japan, usa, and singapor. Two annotations appear on the table: "why is this blank?" and "This got wrapped around".]

Understanding the Cov and Corr Formulas
Why are the formulas for cov and corr capturing the relationship? To get a feeling for this, let's go back to the simple example and compute the correlation.
[Table: $x$, $x - \bar{x}$, $y$, $y - \bar{y}$]
First let's compute the covariance:
$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{3}\left[(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + (x_3 - \bar{x})(y_3 - \bar{y}) + (x_4 - \bar{x})(y_4 - \bar{y})\right] = .0004$
Each of the 4 points makes a contribution to the sum. Let's see which point does what.

[Scatterplot of y vs. x, divided into quadrants (I) through (IV) by lines at $\bar{x}$ and $\bar{y}$, with each of the four points labeled by its contribution $(x_i - \bar{x})(y_i - \bar{y})$]
Points in (I) have both x and y bigger than their means, so we get a positive contribution. Points in (II) have both less than their means, so we get a positive contribution. In (III) and (IV), one of x and y is less than its mean and the other is greater, so we get a negative contribution. The farther out the point is, the bigger the contribution.
[Scatterplot: windsor vs. valmrkt. Two opposite quadrants hold lots of positive contributions; the other two hold just a few relatively small contributions.]

So, a positive covariance means that when one variable is above its average, the other one tends to be as well. They move up and down together. A negative covariance means that when one is up, the other tends to be down. They move in opposite directions.
To finish the example, $r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{.0004}{(.0365)(.0183)} \approx .6$
The division by the standard deviations standardizes the covariance so that the correlation is always between -1 and +1. What are the units of the covariance? What are the units of the correlation?

1.9 Linear Functions
In this section of the notes we study the situation where two variables are exactly linearly related. That is, if I give you the value of x, you know exactly what the value of y must be.
Example: Suppose we have these temps in Celsius and Fahrenheit.
[Table: cel, fahr]
How are the F values related to the C values? F = 32 + (9/5)C

1.10 Mean and Variance of a Linear Function
Suppose y is a linear function of x. How are the mean and variance (standard deviation) of y related to those of x? Let's look at our temperature example.
[Table: summary measures (count, mean, standard deviation) for cel and fahr]
Let's get some intuition about what happens when we multiply and add numbers to a data set to obtain a new data set. Suppose we first multiply by (9/5) and then add 32. Create a new column, mul, in Excel: =(9/5)*cel
[Dotplots: Celsius (cel), mul = (9/5)C, and F = 32 + (9/5)C (fahr)]
[Table: mean and standard deviation of cel, fahr, and mul]

Suppose $y = c_0 + c_1 x$. Then
$\bar{y} = c_0 + c_1 \bar{x}$
$s_y^2 = c_1^2 s_x^2$
$s_y = |c_1| s_x$
Example: We convert from fractions to percent by taking y = 100x. What is the mean of y compared to x? What is the standard deviation of y compared to $s_x$?
We convert from dollars to thousands of dollars by taking y = x/1000. Now what are the mean and standard deviation of y?
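These rules are easy to check numerically. The sketch below uses hypothetical Celsius temperatures (not the worksheet data) and converts them with F = 32 + (9/5)C. It also tries a negative slope to show why the standard deviation scales with the absolute value of the slope.

```python
from statistics import mean, stdev

# Hypothetical Celsius temperatures.
cel = [0.0, 10.0, 20.0, 30.0, 40.0]

c0, c1 = 32.0, 9.0 / 5.0
fahr = [c0 + c1 * c for c in cel]

print("mean of fahr:", mean(fahr), " c0 + c1*mean(cel):", c0 + c1 * mean(cel))
print("sd of fahr:  ", stdev(fahr), " |c1|*sd(cel):", abs(c1) * stdev(cel))

# Adding c0 shifts the mean but leaves the spread alone; a negative slope
# flips the data around but still stretches the spread by |slope|.
flipped = [5.0 - 2.0 * c for c in cel]
print("sd with slope -2:", stdev(flipped), " 2*sd(cel):", 2.0 * stdev(cel))
```

The added constant never appears in the standard deviation, since shifting every value by the same amount does not change how spread out the values are.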

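Why? You can verify on your own. With $y_i = c_0 + c_1 x_i$,
$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} (c_0 + c_1 x_i) = \frac{1}{n}\left(n c_0 + c_1 \sum_{i=1}^{n} x_i\right) = c_0 + c_1 \bar{x}$
and, since $s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$,
$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (c_0 + c_1 x_i - c_0 - c_1 \bar{x})^2 = c_1^2 \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = c_1^2 s_x^2$
1.11 Linear Combinations
When a variable y is linearly related to several others, we call it a linear combination.
$y = c_0 + c_1 x_1 + c_2 x_2 + \cdots + c_k x_k$
y is a linear combination of the x's. $c_i$ is the coefficient of $x_i$. We have k data sets (the notation has a different meaning here: $x_i$ is now the ith data set rather than the ith observation).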

1.12 Portfolios and Linear Combinations
Suppose you have $100 to invest. Let x1 be the return on asset 1. If x1 = .1, and you put all your money into asset 1, you will have $110 at the end of the period. Let x2 be the return on asset 2. If x2 = .15, and you put all your money into asset 2, you will have $115 at the end of the period.
Suppose you put 1/2 your money into 1 and 1/2 into 2. What will happen? At the end of the period you will have E = E1 + E2, where E1 is the end-of-period wealth from your investment in asset 1 and E2 is the end-of-period wealth from your investment in asset 2:
E = (1+.1)*.5*100 + (1+.15)*.5*100
Hence the return r on the total investment of $100 satisfies
E = (1+r)*B = (1 + .5*.1 + .5*.15)*100
r = .5*.1 + .5*.15 = .125

To generalize, let $w_1$ be the fraction of your wealth you invest in asset 1 and $w_2$ the fraction of your wealth you invest in asset 2. Let B be the wealth. The w's are called the portfolio weights. What are the restrictions on the weights? Then at the end of the period you have:
$E = (1 + x_1) w_1 B + (1 + x_2) w_2 B = (w_1 + w_2 + x_1 w_1 + x_2 w_2) B = (1 + x_1 w_1 + x_2 w_2) B$
where the term $x_1 w_1 + x_2 w_2$ is $r_p$. Hence the portfolio return is
$r_p = w_1 x_1 + w_2 x_2$
Suppose we have k assets. The return on the ith asset is $x_i$. Put a fraction $w_i$ of your wealth into asset i. Your portfolio is determined by the portfolio weights $w_i$. Then the return on the portfolio is:
$R_p = \sum_{i=1}^{k} w_i x_i$
What is the return on the equally weighted portfolio?
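The portfolio return is just a weighted sum, which the following sketch computes. All the numbers are hypothetical (asset returns of 10%, 15%, and 5%, and a beginning wealth of 100) and portfolio_return is a name made up for the example. The check that the weights add up to one reflects the restriction on the weights asked about above.

```python
def portfolio_return(weights, returns):
    """Portfolio return: sum of w_i * x_i, with weights that add up to one."""
    if len(weights) != len(returns):
        raise ValueError("need one weight per asset")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("portfolio weights must add up to one")
    return sum(w * x for w, x in zip(weights, returns))


# Hypothetical one-period returns on two assets, half the wealth in each.
r_p = portfolio_return([0.5, 0.5], [0.10, 0.15])
begin_wealth = 100.0
print("portfolio return:", round(r_p, 4))
print("end wealth:      ", round((1 + r_p) * begin_wealth, 2))

# Equally weighted portfolio of k assets: every weight is 1/k, so the
# portfolio return is the plain average of the asset returns.
asset_returns = [0.10, 0.15, 0.05]
k = len(asset_returns)
equal = portfolio_return([1 / k] * k, asset_returns)
print("equally weighted:", round(equal, 4))
```

Negative weights pass the check as long as the total is still one, which corresponds to going short in an asset.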

Example (the country data again)
Let's use our country data and suppose that we had put .5 into usa and .5 into Hong Kong. What would our returns have been? In Excel, create a new column, entitled port: =.5*(honkong) +.5*(usa)
[Table: honkong, usa, port]
For each month, we get the portfolio return as 1/2 hongkong + 1/2 usa. How do the returns on this portfolio compare with those of hongkong and usa?
[Scatterplot: mean vs. standard deviation for usa, port, and honkong]
It looks like the mean for my portfolio is right in between the means of usa and hongkong. What about the sd?

Let's try a portfolio with three stocks. The weights must add up to one, but they can be negative; this is called going short in the asset.
= -.5*(canada)+ usa +.5*(honkong)
[Scatterplot: mean vs. standard deviation for usa, canada, port, and honkong]
Clearly, forming portfolios is an interesting thing to do! Why would we form a portfolio? Maybe the port has a nice mean and variance. It would be nice to be able to figure out what the mean and variance are of any portfolio as a function of the portfolio weights. We need some basic equations to relate the means and variances of the assets to the means and variances of the different possible portfolios obtained by different weights. There are some basic formulas that relate the mean and sd of a linear combination to the means, variances, and covariances of the input x variables.

1.13 Mean and Variance of a Linear Combination
First we consider the case where we have two inputs. Suppose $y = c_0 + c_1 x_1 + c_2 x_2$. Then
$\bar{y} = c_0 + c_1 \bar{x}_1 + c_2 \bar{x}_2$
$s_y^2 = c_1^2 s_{x_1}^2 + c_2^2 s_{x_2}^2 + 2 c_1 c_2 s_{x_1 x_2}$
Example: =.5*(hongkong) +.5*(usa)
[Table: honkong, usa, port]
For each month, we get the portfolio return as 1/2 hongkong + 1/2 usa. The mean return on port equals .5 times the mean return on honkong plus .5 times the mean return on usa.

[Table of covariances for hongkong, usa, and port: variances on the diagonal, covariances off the diagonal]
The variance of port is = (.5)*(.5)*(variance of hongkong) + (.5)*(.5)*(variance of usa) + 2*(.5)*(.5)*(covariance of hongkong and usa)
Let's do one more: =.25*(usa)+.75*(hongkong)
[Table of covariances (variances on the diagonal) for honkong, usa, and port]
The variance of port is = (.25)*(.25)*(variance of usa) + (.75)*(.75)*(variance of hongkong) + (2)*(.25)*(.75)*(covariance of usa and hongkong)

The variance of the portfolio depends upon the variances of the individual assets as well as the covariance. The covariance tells us something about the relationship between the returns on the individual assets. Why should this relationship matter? We'll look at several examples to get a feel.
Example: $y = .5x_1 + .5x_2$
[Scatterplot of $x_2$ vs. $x_1$; at each point we plot the value of y. Table of the variances and covariance of $x_1$ and $x_2$.]
The variance of y is = .5*.5*(variance of $x_1$) + .5*.5*(variance of $x_2$) + 2*.5*.5*(covariance of $x_1$ and $x_2$), where the covariance here is negative.
Why is the variance of y so much smaller than those of the x's?

Example: $y = .5x_1 + .5x_2$
[Scatterplot of $x_2$ vs. $x_1$; at each point we plot the value of y. Table of the variances and covariance of $x_1$ and $x_2$.]
The variance of y is again = .5*.5*(variance of $x_1$) + .5*.5*(variance of $x_2$) + 2*.5*.5*(covariance of $x_1$ and $x_2$).
Why is the variance of y similar to those of the x's?
Example: $y = .5x_1 + .5x_2$
[Scatterplot of $x_2$ vs. $x_1$; at each point we plot the value of y. The dashed lines are drawn at the means of $x_1$ and $x_2$. Table of the variances and covariance of $x_1$ and $x_2$.]
The variance of y is computed with the same formula.
Why is the variance of y less than that of $x_1$ and $x_2$?
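The role of the covariance in these examples can be seen by holding the two variances fixed and changing only the covariance. The numbers below are hypothetical (both variances set to 0.04), and two_asset_variance is a name made up for this sketch of the two-input formula.

```python
def two_asset_variance(c1, c2, var1, var2, cov12):
    """Variance of y = c0 + c1*x1 + c2*x2 from the variances and covariance."""
    return c1 ** 2 * var1 + c2 ** 2 * var2 + 2 * c1 * c2 * cov12


var1 = var2 = 0.04  # hypothetical variances of x1 and x2

for cov12 in (-0.03, 0.0, 0.03):  # hypothetical covariances
    var_y = two_asset_variance(0.5, 0.5, var1, var2, cov12)
    print(f"cov = {cov12:+.2f}  ->  variance of y = {var_y:.3f}")
```

With a negative covariance, the two inputs tend to offset each other and the variance of y falls well below either input variance. With a strongly positive covariance, they move together and much less of the variation averages out.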

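Suppose $y = c_0 + c_1 x_1 + c_2 x_2 + c_3 x_3 + \cdots + c_k x_k$. Then
$\bar{y} = c_0 + c_1 \bar{x}_1 + c_2 \bar{x}_2 + c_3 \bar{x}_3 + \cdots + c_k \bar{x}_k$
$s_y^2 = c_1^2 s_{x_1}^2 + c_2^2 s_{x_2}^2 + \cdots + c_k^2 s_{x_k}^2 + {}$ all the possible combinations of covariance terms, $2 c_i c_j s_{x_i x_j}$ for each pair $i < j$.
Example: $y = c_0 + c_1 x_1 + c_2 x_2 + c_3 x_3$
$\bar{y} = c_0 + c_1 \bar{x}_1 + c_2 \bar{x}_2 + c_3 \bar{x}_3$
$s_y^2 = c_1^2 s_{x_1}^2 + c_2^2 s_{x_2}^2 + c_3^2 s_{x_3}^2 + 2 c_1 c_2 s_{x_1 x_2} + 2 c_1 c_3 s_{x_1 x_3} + 2 c_2 c_3 s_{x_2 x_3}$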
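As a check on the general formula, the sketch below builds a three-asset portfolio from hypothetical return series, computes its variance directly from the portfolio returns, and compares that with the value assembled from the variances and covariances. The series, the weights, and the function names are all assumptions made up for the illustration.

```python
def sample_cov(xs, ys):
    """Sample covariance with divisor n - 1 (the variance when xs is ys)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)


def combination_variance(coefs, series):
    """Sum over all i, j of c_i * c_j * s_ij.

    The i == j terms are the c_i^2 * s_i^2 variance terms; each pair i != j
    appears twice, which gives the 2 * c_i * c_j * s_ij covariance terms.
    """
    k = len(coefs)
    return sum(
        coefs[i] * coefs[j] * sample_cov(series[i], series[j])
        for i in range(k)
        for j in range(k)
    )


# Hypothetical monthly returns on three assets.
x1 = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02]
x2 = [0.01, 0.00, 0.02, -0.01, 0.02, -0.01]
x3 = [-0.03, 0.04, 0.01, 0.02, -0.02, 0.05]
series = [x1, x2, x3]
coefs = [-0.5, 1.0, 0.5]  # weights add up to one; the first asset is shorted

# Portfolio return month by month, then its variance computed directly.
port = [sum(c * s[t] for c, s in zip(coefs, series)) for t in range(len(x1))]
direct = sample_cov(port, port)
from_formula = combination_variance(coefs, series)

print("variance from portfolio returns:", direct)
print("variance from the formula:      ", from_formula)
```

The two numbers agree up to floating-point rounding, so the variance of any portfolio can be worked out from the table of variances and covariances without rebuilding the portfolio's return series.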

Example: =.1*(fidel)+.4*(eqmrkt)+.5*(windsor)
[Table of covariances (variances on the diagonal) for fidel, windsor, eqmrkt, and port]
The variance of port is = (.1)*(.1)*(variance of fidel) + (.4)*(.4)*(variance of eqmrkt) + (.5)*(.5)*(variance of windsor) + 2*((.1)*(.4)*(covariance of fidel and eqmrkt) + (.1)*(.5)*(covariance of fidel and windsor) + (.4)*(.5)*(covariance of eqmrkt and windsor))
By forming portfolios we can reduce the variance of returns! This means people should hold portfolios!

[Figure: cut from a finance textbook]

1.14 Statistics as Estimates
Consider the following sample averages of the daily S&P500 returns data using different time periods.
[Table: sample mean by period, from the 1980's to the present]
The different samples give us different values of the sample mean. If you are thinking of investing in the S&P500 over the next decade, which of these should characterize the rate of return you should expect to get?

Stat 101 Exam 1 - Embers Important Formulas and Concepts 1

Stat 101 Exam 1 - Embers Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 - Embers Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2.

More information

Math 2311 Bekki George Office Hours: MW 11am to 12:45pm in 639 PGH Online Thursdays 4-5:30pm And by appointment

Math 2311 Bekki George Office Hours: MW 11am to 12:45pm in 639 PGH Online Thursdays 4-5:30pm And by appointment Math 2311 Bekki George bekki@math.uh.edu Office Hours: MW 11am to 12:45pm in 639 PGH Online Thursdays 4-5:30pm And by appointment Class webpage: http://www.math.uh.edu/~bekki/math2311.html Math 2311 Class

More information

Descriptive Statistics (Devore Chapter One)

Descriptive Statistics (Devore Chapter One) Descriptive Statistics (Devore Chapter One) 1016-345-01 Probability and Statistics for Engineers Winter 2010-2011 Contents 0 Perspective 1 1 Pictorial and Tabular Descriptions of Data 2 1.1 Stem-and-Leaf

More information

Business Statistics 41000: Probability 3

Business Statistics 41000: Probability 3 Business Statistics 41000: Probability 3 Drew D. Creal University of Chicago, Booth School of Business February 7 and 8, 2014 1 Class information Drew D. Creal Email: dcreal@chicagobooth.edu Office: 404

More information

Numerical Descriptive Measures. Measures of Center: Mean and Median

Numerical Descriptive Measures. Measures of Center: Mean and Median Steve Sawin Statistics Numerical Descriptive Measures Having seen the shape of a distribution by looking at the histogram, the two most obvious questions to ask about the specific distribution is where

More information

Lecture 2 Describing Data

Lecture 2 Describing Data Lecture 2 Describing Data Thais Paiva STA 111 - Summer 2013 Term II July 2, 2013 Lecture Plan 1 Types of data 2 Describing the data with plots 3 Summary statistics for central tendency and spread 4 Histograms

More information

Basic Procedure for Histograms

Basic Procedure for Histograms Basic Procedure for Histograms 1. Compute the range of observations (min. & max. value) 2. Choose an initial # of classes (most likely based on the range of values, try and find a number of classes that

More information

Basic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 3: April 25, Abstract

Basic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 3: April 25, Abstract Basic Data Analysis Stephen Turnbull Business Administration and Public Policy Lecture 3: April 25, 2013 Abstract Review summary statistics and measures of location. Discuss the placement exam as an exercise

More information

CHAPTER 2 Describing Data: Numerical

CHAPTER 2 Describing Data: Numerical CHAPTER Multiple-Choice Questions 1. A scatter plot can illustrate all of the following except: A) the median of each of the two variables B) the range of each of the two variables C) an indication of

More information

Simple Descriptive Statistics

Simple Descriptive Statistics Simple Descriptive Statistics These are ways to summarize a data set quickly and accurately The most common way of describing a variable distribution is in terms of two of its properties: Central tendency

More information

Business Statistics 41000: Probability 4

Business Statistics 41000: Probability 4 Business Statistics 41000: Probability 4 Drew D. Creal University of Chicago, Booth School of Business February 14 and 15, 2014 1 Class information Drew D. Creal Email: dcreal@chicagobooth.edu Office:

Describing Data: One Quantitative Variable

Describing Data: One Quantitative Variable STAT 250 Dr. Kari Lock Morgan The Big Picture Describing Data: One Quantitative Variable Population Sampling SECTIONS 2.2, 2.3 One quantitative variable (2.2, 2.3) Statistical Inference Sample Descriptive

IOP 201-Q (Industrial Psychological Research) Tutorial 5

IOP 201-Q (Industrial Psychological Research) Tutorial 5 IOP 201-Q (Industrial Psychological Research) Tutorial 5 TRUE/FALSE [1 point each] Indicate whether the sentence or statement is true or false. 1. To establish a cause-and-effect relation between two variables,

2 Exploring Univariate Data

2 Exploring Univariate Data 2 Exploring Univariate Data A good picture is worth more than a thousand words! Having the data collected we examine them to get a feel for their main messages and any surprising features, before attempting

Prof. Thistleton MAT 505 Introduction to Probability Lecture 3

Prof. Thistleton MAT 505 Introduction to Probability Lecture 3 Sections from Text and MIT Video Lecture: Sections 2.1 through 2.5 http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-041-probabilistic-systemsanalysis-and-applied-probability-fall-2010/video-lectures/lecture-1-probability-models-and-axioms/

A useful modeling tricks.

A useful modeling tricks. .7 Joint models for more than two outcomes We saw that we could write joint models for a pair of variables by specifying the joint probabilities over all pairs of outcomes. In principle, we could do this

DATA SUMMARIZATION AND VISUALIZATION

DATA SUMMARIZATION AND VISUALIZATION APPENDIX DATA SUMMARIZATION AND VISUALIZATION PART 1 SUMMARIZATION 1: BUILDING BLOCKS OF DATA ANALYSIS 294 PART 2 PART 3 PART 4 VISUALIZATION: GRAPHS AND TABLES FOR SUMMARIZING AND ORGANIZING DATA 296

Some Characteristics of Data

Some Characteristics of Data Some Characteristics of Data Not all data is the same, and depending on some characteristics of a particular dataset, there are some limitations as to what can and cannot be done with that data. Some key

1 Describing Distributions with numbers

1 Describing Distributions with numbers 1 Describing Distributions with numbers Only for quantitative variables!! 1.1 Describing the center of a data set The mean of a set of numerical observation is the familiar arithmetic average. To write

STAT:2010 Statistical Methods and Computing. Using density curves to describe the distribution of values of a quantitative

STAT:2010 Statistical Methods and Computing. Using density curves to describe the distribution of values of a quantitative STAT:10 Statistical Methods and Computing Normal Distributions Lecture 4 Feb. 6, 17 Kate Cowles 374 SH, 335-0727 kate-cowles@uiowa.edu 1 2 Using density curves to describe the distribution of values of

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 10 (MWF) Checking for normality of the data using the QQplot Suhasini Subba Rao Checking for

Frequency Distribution and Summary Statistics

Frequency Distribution and Summary Statistics Frequency Distribution and Summary Statistics Dongmei Li Department of Public Health Sciences Office of Public Health Studies University of Hawai i at Mānoa Outline 1. Stemplot 2. Frequency table 3. Summary

Graphical and Tabular Methods in Descriptive Statistics. Descriptive Statistics

Graphical and Tabular Methods in Descriptive Statistics. Descriptive Statistics Graphical and Tabular Methods in Descriptive Statistics MATH 3342 Section 1.2 Descriptive Statistics n Graphs and Tables n Numerical Summaries Sections 1.3 and 1.4 1 Why graph data? n The amount of data

Lecture 1: Review and Exploratory Data Analysis (EDA)

Lecture 1: Review and Exploratory Data Analysis (EDA) Lecture 1: Review and Exploratory Data Analysis (EDA) Ani Manichaikul amanicha@jhsph.edu 16 April 2007 1 / 40 Course Information I Office hours For questions and help When? I ll announce this tomorrow

Since his score is positive, he s above average. Since his score is not close to zero, his score is unusual.

Since his score is positive, he s above average. Since his score is not close to zero, his score is unusual. Chapter 06: The Standard Deviation as a Ruler and the Normal Model This is the worst chapter title ever! This chapter is about the most important random variable distribution of them all the normal distribution.

Establishing a framework for statistical analysis via the Generalized Linear Model

Establishing a framework for statistical analysis via the Generalized Linear Model PSY349: Lecture 1: INTRO & CORRELATION Establishing a framework for statistical analysis via the Generalized Linear Model GLM provides a unified framework that incorporates a number of statistical methods

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 10 (MWF) Checking for normality of the data using the QQplot Suhasini Subba Rao Review of previous

2. Modeling Uncertainty

2. Modeling Uncertainty 2. Modeling Uncertainty Models for Uncertainty (Random Variables): Big Picture We now move from viewing the data to thinking about models that describe the data. Since the real world is uncertain, our

appstats5.notebook September 07, 2016 Chapter 5

appstats5.notebook September 07, 2016 Chapter 5 Chapter 5 Describing Distributions Numerically Chapter 5 Objective: Students will be able to use statistics appropriate to the shape of the data distribution to compare of two or more different data sets.

Business Statistics Midterm Exam Fall 2013 Russell

Business Statistics Midterm Exam Fall 2013 Russell Name Business Statistics Midterm Exam Fall 2013 Russell Do not turn over this page until you are told to do so. You will have 2 hours to complete the exam. There are a total of 100 points divided into

Both the quizzes and exams are closed book. However, For quizzes: Formulas will be provided with quiz papers if there is any need.

Both the quizzes and exams are closed book. However, For quizzes: Formulas will be provided with quiz papers if there is any need. Both the quizzes and exams are closed book. However, For quizzes: Formulas will be provided with quiz papers if there is any need. For exams (MD1, MD2, and Final): You may bring one 8.5 by 11 sheet of

Chapter 6. y y. Standardizing with z-scores. Standardizing with z-scores (cont.)

Chapter 6. y y. Standardizing with z-scores. Standardizing with z-scores (cont.) Starter Ch. 6: A z-score Analysis Starter Ch. 6 Your Statistics teacher has announced that the lower of your two tests will be dropped. You got a 90 on test 1 and an 85 on test 2. You re all set to drop

Copyright 2011 Pearson Education, Inc. Publishing as Addison-Wesley.

Copyright 2011 Pearson Education, Inc. Publishing as Addison-Wesley. Appendix: Statistics in Action Part I Financial Time Series 1. These data show the effects of stock splits. If you investigate further, you ll find that most of these splits (such as in May 1970) are 3-for-1

STAB22 section 1.3 and Chapter 1 exercises

STAB22 section 1.3 and Chapter 1 exercises STAB22 section 1.3 and Chapter 1 exercises 1.101 Go up and down two times the standard deviation from the mean. So 95% of scores will be between 572 (2)(51) = 470 and 572 + (2)(51) = 674. 1.102 Same idea

Descriptive Statistics

Descriptive Statistics Chapter 3 Descriptive Statistics Chapter 2 presented graphical techniques for organizing and displaying data. Even though such graphical techniques allow the researcher to make some general observations

Exploring Data and Graphics

Exploring Data and Graphics Exploring Data and Graphics Rick White Department of Statistics, UBC Graduate Pathways to Success Graduate & Postdoctoral Studies November 13, 2013 Outline Summarizing Data Types of Data Visualizing Data

9/17/2015. Basic Statistics for the Healthcare Professional. Relax.it won t be that bad! Purpose of Statistic. Objectives

9/17/2015. Basic Statistics for the Healthcare Professional. Relax.it won t be that bad! Purpose of Statistic. Objectives Basic Statistics for the Healthcare Professional 1 FRANK COHEN, MBB, MPA DIRECTOR OF ANALYTICS DOCTORS MANAGEMENT, LLC Purpose of Statistic 2 Provide a numerical

Introduction to Computational Finance and Financial Econometrics Descriptive Statistics

Introduction to Computational Finance and Financial Econometrics Descriptive Statistics You can t see this text! Introduction to Computational Finance and Financial Econometrics Descriptive Statistics Eric Zivot Summer 2015 Eric Zivot (Copyright 2015) Descriptive Statistics 1 / 28 Outline

Example: Histogram for US household incomes from 2015 Table:

Example: Histogram for US household incomes from 2015 Table: 1 Example: Histogram for US household incomes from 2015 Table: Income level Relative frequency $0 - $14,999 11.6% $15,000 - $24,999 10.5% $25,000 - $34,999 10% $35,000 - $49,999 12.7% $50,000 - $74,999

Software Tutorial ormal Statistics

Software Tutorial ormal Statistics Software Tutorial ormal Statistics The example session with the teaching software, PG2000, which is described below is intended as an example run to familiarise the user with the package. This documented

Chapter 3. Numerical Descriptive Measures. Copyright 2016 Pearson Education, Ltd. Chapter 3, Slide 1

Chapter 3. Numerical Descriptive Measures. Copyright 2016 Pearson Education, Ltd. Chapter 3, Slide 1 Chapter 3 Numerical Descriptive Measures Copyright 2016 Pearson Education, Ltd. Chapter 3, Slide 1 Objectives In this chapter, you learn to: Describe the properties of central tendency, variation, and

STAT 157 HW1 Solutions

STAT 157 HW1 Solutions STAT 157 HW1 Solutions http://www.stat.ucla.edu/~dinov/courses_students.dir/10/spring/stats157.dir/ Problem 1. 1.a: (6 points) Determine the Relative Frequency and the Cumulative Relative Frequency (fill

David Tenenbaum GEOG 090 UNC-CH Spring 2005

David Tenenbaum GEOG 090 UNC-CH Spring 2005 Simple Descriptive Statistics Review and Examples You will likely make use of all three measures of central tendency (mode, median, and mean), as well as some key measures of dispersion (standard deviation,

THE UNIVERSITY OF TEXAS AT AUSTIN Department of Information, Risk, and Operations Management

THE UNIVERSITY OF TEXAS AT AUSTIN Department of Information, Risk, and Operations Management THE UNIVERSITY OF TEXAS AT AUSTIN Department of Information, Risk, and Operations Management BA 386T Tom Shively PROBABILITY CONCEPTS AND NORMAL DISTRIBUTIONS The fundamental idea underlying any statistical

I. Return Calculations (20 pts, 4 points each)

I. Return Calculations (20 pts, 4 points each) University of Washington Winter 015 Department of Economics Eric Zivot Econ 44 Midterm Exam Solutions This is a closed book and closed note exam. However, you are allowed one page of notes (8.5 by 11 or

We will also use this topic to help you see how the standard deviation might be useful for distributions which are normally distributed.

We will also use this topic to help you see how the standard deviation might be useful for distributions which are normally distributed. We will discuss the normal distribution in greater detail in our unit on probability. However, as it is often of use to use exploratory data analysis to determine if the sample seems reasonably normally

Today's Agenda Hour 1 Correlation vs association, Pearson s R, non-linearity, Spearman rank correlation,

Today's Agenda Hour 1 Correlation vs association, Pearson s R, non-linearity, Spearman rank correlation, Today's Agenda Hour 1 Correlation vs association, Pearson s R, non-linearity, Spearman rank correlation, Hour 2 Hypothesis testing for correlation (Pearson) Correlation and regression. Correlation vs association

Skewness and the Mean, Median, and Mode *

Skewness and the Mean, Median, and Mode * OpenStax-CNX module: m46931 1 Skewness and the Mean, Median, and Mode * OpenStax This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Consider the following

Putting Things Together Part 2

Putting Things Together Part 2 Frequency Putting Things Together Part These exercise blend ideas from various graphs (histograms and boxplots), differing shapes of distributions, and values summarizing the data. Data for, and are in

STATISTICAL DISTRIBUTIONS AND THE CALCULATOR

STATISTICAL DISTRIBUTIONS AND THE CALCULATOR STATISTICAL DISTRIBUTIONS AND THE CALCULATOR 1. Basic data sets a. Measures of Center - Mean ( ): average of all values. Characteristic: non-resistant is affected by skew and outliers. - Median: Either

How Wealthy Are Europeans?

How Wealthy Are Europeans? How Wealthy Are Europeans? Grades: 7, 8, 11, 12 (course specific) Description: Organization of data of to examine measures of spread and measures of central tendency in examination of Gross Domestic Product

Module Tag PSY_P2_M 7. PAPER No.2: QUANTITATIVE METHODS MODULE No.7: NORMAL DISTRIBUTION

Module Tag PSY_P2_M 7. PAPER No.2: QUANTITATIVE METHODS MODULE No.7: NORMAL DISTRIBUTION Subject Paper No and Title Module No and Title Paper No.2: QUANTITATIVE METHODS Module No.7: NORMAL DISTRIBUTION Module Tag PSY_P2_M 7 TABLE OF CONTENTS 1. Learning Outcomes 2. Introduction 3. Properties

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Lecture - 05 Normal Distribution So far we have looked at discrete distributions

NCSS Statistical Software. Reference Intervals

NCSS Statistical Software. Reference Intervals Chapter 586 Introduction A reference interval contains the middle 95% of measurements of a substance from a healthy population. It is a type of prediction interval. This procedure calculates one-, and

ECON 214 Elements of Statistics for Economists

ECON 214 Elements of Statistics for Economists ECON 214 Elements of Statistics for Economists Session 7 The Normal Distribution Part 1 Lecturer: Dr. Bernardin Senadza, Dept. of Economics Contact Information: bsenadza@ug.edu.gh College of Education

STAB22 section 2.2. Figure 1: Plot of deforestation vs. price

STAB22 section 2.2. Figure 1: Plot of deforestation vs. price STAB22 section 2.2 2.29 A change in price leads to a change in amount of deforestation, so price is explanatory and deforestation the response. There are no difficulties in producing a plot; mine is in

The Range, the Inter Quartile Range (or IQR), and the Standard Deviation (which we usually denote by a lower case s).

The Range, the Inter Quartile Range (or IQR), and the Standard Deviation (which we usually denote by a lower case s). We will look at the three common and useful measures of spread. The Range, the Inter Quartile Range (or IQR), and the Standard Deviation (which we usually denote by a lower case s). 1 A measure of the center

The Normal Distribution

The Normal Distribution Stat 6 Introduction to Business Statistics I Spring 009 Professor: Dr. Petrutza Caragea Section A Tuesdays and Thursdays 9:300:50 a.m. Chapter, Section.3 The Normal Distribution Density Curves So far we

AP STATISTICS FALL SEMESTSER FINAL EXAM STUDY GUIDE

AP STATISTICS FALL SEMESTSER FINAL EXAM STUDY GUIDE AP STATISTICS Name: FALL SEMESTSER FINAL EXAM STUDY GUIDE Period: *Go over Vocabulary Notecards! *This is not a comprehensive review you still should look over your past notes, homework/practice, Quizzes,

Mathematics 1000, Winter 2008

Mathematics 1000, Winter 2008 Mathematics 1000, Winter 2008 Lecture 4 Sheng Zhang Department of Mathematics Wayne State University January 16, 2008 Announcement Monday is Martin Luther King Day NO CLASS Today s Topics Curves and Histograms

Measures of Dispersion (Range, standard deviation, standard error) Introduction

Measures of Dispersion (Range, standard deviation, standard error) Introduction Measures of Dispersion (Range, standard deviation, standard error) Introduction We have already learnt that frequency distribution table gives a rough idea of the distribution of the variables in a sample

MAS1403. Quantitative Methods for Business Management. Semester 1, Module leader: Dr. David Walshaw

MAS1403. Quantitative Methods for Business Management. Semester 1, Module leader: Dr. David Walshaw MAS1403 Quantitative Methods for Business Management Semester 1, 2018 2019 Module leader: Dr. David Walshaw Additional lecturers: Dr. James Waldren and Dr. Stuart Hall Announcements: Written assignment

Probability & Statistics Modular Learning Exercises

Probability & Statistics Modular Learning Exercises Probability & Statistics Modular Learning Exercises About The Actuarial Foundation The Actuarial Foundation, a 501(c)(3) nonprofit organization, develops, funds and executes education, scholarship and

Chapter 18: The Correlational Procedures

Chapter 18: The Correlational Procedures Introduction: In this chapter we are going to tackle about two kinds of relationship, positive relationship and negative relationship. Positive Relationship Let's say we have two values, votes and campaign

LINEAR COMBINATIONS AND COMPOSITE GROUPS

LINEAR COMBINATIONS AND COMPOSITE GROUPS CHAPTER 4 LINEAR COMBINATIONS AND COMPOSITE GROUPS So far, we have applied measures of central tendency and variability to a single set of data or when comparing several sets of data. However, in some

Probability. An intro for calculus students P= Figure 1: A normal integral

Probability. An intro for calculus students P= Figure 1: A normal integral Probability An intro for calculus students.8.6.4.2 P=.87 2 3 4 Figure : A normal integral Suppose we flip a coin 2 times; what is the probability that we get more than 2 heads? Suppose we roll a six-sided

Module 6 Portfolio risk and return

Module 6 Portfolio risk and return Module 6 Portfolio risk and return Prepared by Pamela Peterson Drake, Ph.D., CFA 1. Overview Security analysts and portfolio managers are concerned about an investment s return, its risk, and whether it

Math 140 Introductory Statistics

Math 140 Introductory Statistics Math 140 Introductory Statistics Professor Silvia Fernández Lecture 2 Based on the book Statistics in Action by A. Watkins, R. Scheaffer, and G. Cobb. Summary Statistic Consider as an example of our analysis

Section 6-1 : Numerical Summaries

Section 6-1 : Numerical Summaries MAT 2377 (Winter 2012) Section 6-1 : Numerical Summaries With a random experiment comes data. In these notes, we learn techniques to describe the data. Data : We will denote the n observations of the random

Math 2200 Fall 2014, Exam 1 You may use any calculator. You may not use any cheat sheet.

Math 2200 Fall 2014, Exam 1 You may use any calculator. You may not use any cheat sheet. 1 Math 2200 Fall 2014, Exam 1 You may use any calculator. You may not use any cheat sheet. Warning to the Reader! If you are a student for whom this document is a historical artifact, be aware that the

Lecture 6: Non Normal Distributions

Lecture 6: Non Normal Distributions Lecture 6: Non Normal Distributions and their Uses in GARCH Modelling Prof. Massimo Guidolin 20192 Financial Econometrics Spring 2015 Overview Non-normalities in (standardized) residuals from asset return

Dot Plot: A graph for displaying a set of data. Each numerical value is represented by a dot placed above a horizontal number line.

Dot Plot: A graph for displaying a set of data. Each numerical value is represented by a dot placed above a horizontal number line. Introduction We continue our study of descriptive statistics with measures of dispersion, such as dot plots, stem and leaf displays, quartiles, percentiles, and box plots. Dot plots, a stem-and-leaf display,

Random Variables and Probability Distributions

Random Variables and Probability Distributions Chapter 3 Random Variables and Probability Distributions Chapter Three Random Variables and Probability Distributions 3. Introduction An event is defined as the possible outcome of an experiment. In engineering

Fundamentals of Statistics

Fundamentals of Statistics CHAPTER 4 Fundamentals of Statistics Expected Outcomes Know the difference between a variable and an attribute. Perform mathematical calculations to the correct number of significant figures. Construct

Business Statistics Midterm Exam Fall 2013 Russell

Business Statistics Midterm Exam Fall 2013 Russell Name SOLUTION Business Statistics Midterm Exam Fall 2013 Russell Do not turn over this page until you are told to do so. You will have 2 hours to complete the exam. There are a total of 100 points divided

Section 2.2 One Quantitative Variable: Shape and Center

Section 2.2 One Quantitative Variable: Shape and Center Section 2.2 One Quantitative Variable: Shape and Center Outline One Quantitative Variable Visualization: dotplot and histogram Shape: symmetric, skewed Measures of center: mean and median Outliers and

3.1 Measures of Central Tendency

3.1 Measures of Central Tendency 3.1 Measures of Central Tendency n Summation Notation x i or x Sum observation on the variable that appears to the right of the summation symbol. Example 1 Suppose the variable x i is used to represent

The Normal Distribution & Descriptive Statistics. Kin 304W Week 2: Jan 15, 2012

The Normal Distribution & Descriptive Statistics. Kin 304W Week 2: Jan 15, 2012 The Normal Distribution & Descriptive Statistics Kin 304W Week 2: Jan 15, 2012 1 Questionnaire Results I received 71 completed questionnaires. Thank you! Are you nervous about scientific writing? You re

Standardized Data Percentiles, Quartiles and Box Plots Grouped Data Skewness and Kurtosis

Standardized Data Percentiles, Quartiles and Box Plots Grouped Data Skewness and Kurtosis Descriptive Statistics (Part 2) 4 Chapter Percentiles, Quartiles and Box Plots Grouped Data Skewness and Kurtosis McGraw-Hill/Irwin Copyright 2009 by The McGraw-Hill Companies, Inc. Chebyshev s Theorem

MATHEMATICS APPLIED TO BIOLOGICAL SCIENCES MVE PA 07. LP07 DESCRIPTIVE STATISTICS - Calculating of statistical indicators (1)

MATHEMATICS APPLIED TO BIOLOGICAL SCIENCES MVE PA 07. LP07 DESCRIPTIVE STATISTICS - Calculating of statistical indicators (1) LP07 DESCRIPTIVE STATISTICS - Calculating of statistical indicators (1) Descriptive statistics are ways of summarizing large sets of quantitative (numerical) information. The best way to reduce a set of

Random variables The binomial distribution The normal distribution Other distributions. Distributions. Patrick Breheny.

Random variables The binomial distribution The normal distribution Other distributions. Distributions. Patrick Breheny. Distributions February 11 Random variables Anything that can be measured or categorized is called a variable If the value that a variable takes on is subject to variability, then the variable is a random

M249 Diagnostic Quiz

M249 Diagnostic Quiz THE OPEN UNIVERSITY Faculty of Mathematics and Computing M249 Diagnostic Quiz Prepared by the Course Team [Press to begin] c 2005, 2006 The Open University Last Revision Date: May 19, 2006 Version 4.2

We use probability distributions to represent the distribution of a discrete random variable.

We use probability distributions to represent the distribution of a discrete random variable. Now we focus on discrete random variables. We will look at these in general, including calculating the mean and standard deviation. Then we will look more in depth at binomial random variables which are

Wk 2 Hrs 1 (Tue, Jan 10) Wk 2 - Hr 2 and 3 (Thur, Jan 12)

Wk 2 Hrs 1 (Tue, Jan 10) Wk 2 - Hr 2 and 3 (Thur, Jan 12) Wk 2 Hrs 1 (Tue, Jan 10) Wk 2 - Hr 2 and 3 (Thur, Jan 12) Descriptive statistics: - Measures of centrality (Mean, median, mode, trimmed mean) - Measures of spread (MAD, Standard deviation, variance) -

Review: Types of Summary Statistics

Review: Types of Summary Statistics Review: Types of Summary Statistics We re often interested in describing the following characteristics of the distribution of a data series: Central tendency - where is the middle of the distribution?

Models of Patterns. Lecture 3, SMMD 2005 Bob Stine

Models of Patterns. Lecture 3, SMMD 2005 Bob Stine Models of Patterns Lecture 3, SMMD 2005 Bob Stine Review Speculative investing and portfolios Risk and variance Volatility adjusted return Volatility drag Dependence Covariance Review Example Stock and

Confidence Intervals. σ unknown, small samples The t-statistic /22

Confidence Intervals. σ unknown, small samples The t-statistic /22 Confidence Intervals σ unknown, small samples The t-statistic 1 /22 Homework Read Sec 7-3. Discussion Question pg 365 Do Ex 7-3 1-4, 6, 9, 12, 14, 15, 17 2/22 Objective find the confidence interval for

Statistics, Measures of Central Tendency I

Statistics, Measures of Central Tendency I Statistics, Measures of Central Tendency I We are considering a random variable X with a probability distribution which has some parameters. We want to get an idea what these parameters are. We perform

YEAR 12 Trial Exam Paper FURTHER MATHEMATICS. Written examination 1. Worked solutions

YEAR 12 Trial Exam Paper FURTHER MATHEMATICS. Written examination 1. Worked solutions YEAR 12 Trial Exam Paper 2018 FURTHER MATHEMATICS Written examination 1 Worked solutions This book presents: worked solutions explanatory notes tips on how to approach the exam. This trial examination

starting on 5/1/1953 up until 2/1/2017.

starting on 5/1/1953 up until 2/1/2017. An Actuary s Guide to Financial Applications: Examples with EViews By William Bourgeois An actuary is a business professional who uses statistics to determine and analyze risks for companies. In this guide,

On one of the feet? 1 2. On red? 1 4. Within 1 of the vertical black line at the top?( 1 to 1 2

On one of the feet? 1 2. On red? 1 4. Within 1 of the vertical black line at the top?( 1 to 1 2 Continuous Random Variable If I spin a spinner, what is the probability the pointer lands... On one of the feet? 1 2. On red? 1 4. Within 1 of the vertical black line at the top?( 1 to 1 2 )? 360 = 1 180.

Section-2. Data Analysis

Section-2. Data Analysis Section-2 Data Analysis Short Questions: Question 1: What is data? Answer: Data is the substrate for the decision-making process. Data is a measure of some observable characteristic of a set

MA 1125 Lecture 05 - Measures of Spread. Wednesday, September 6, Objectives: Introduce variance, standard deviation, range.

MA 1125 Lecture 05 - Measures of Spread. Wednesday, September 6, Objectives: Introduce variance, standard deviation, range. MA 115 Lecture 05 - Measures of Spread Wednesday, September 6, 017 Objectives: Introduce variance, standard deviation, range. 1. Measures of Spread In Lecture 04, we looked at several measures of central

Categorical. A general name for non-numerical data; the data is separated into categories of some kind.

Categorical. A general name for non-numerical data; the data is separated into categories of some kind. Chapter 5 Categorical A general name for non-numerical data; the data is separated into categories of some kind. Nominal data Categorical data with no implied order. Eg. Eye colours, favourite TV show,

STAT 113 Variability

STAT 113 Variability STAT 113 Variability Colin Reimer Dawson Oberlin College September 14, 2017 1 / 48 Outline Last Time: Shape and Center Variability Boxplots and the IQR Variance and Standard Deviaton Transformations 2

STT 315 Handout and Project on Correlation and Regression (Unit 11)

STT 315 Handout and Project on Correlation and Regression (Unit 11) STT 315 Handout and Project on Correlation and Regression (Unit 11) This material is self contained. It is an introduction to regression that will help you in MSC 317 where you will study the subject in

Summarising Data. Summarising Data. Examples of Types of Data. Types of Data

Summarising Data. Summarising Data. Examples of Types of Data. Types of Data Summarising Data Summarising Data Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester Today we will consider Different types of data Appropriate ways to summarise these data 17/10/2017

1 Exercise One. 1.1 Calculate the mean ROI. Note that the data is not grouped! Below you find the raw data in tabular form:

1 Exercise One. 1.1 Calculate the mean ROI. Note that the data is not grouped! Below you find the raw data in tabular form: 1 Exercise One Note that the data is not grouped! 1.1 Calculate the mean ROI Below you find the raw data in tabular form: Obs Data 1 18.5 2 18.6 3 17.4 4 12.2 5 19.7 6 5.6 7 7.7 8 9.8 9 19.9 10 9.9 11

Math 140 Introductory Statistics

Math 140 Introductory Statistics Math 140 Introductory Statistics Let s make our own sampling! If we use a random sample (a survey) or if we randomly assign treatments to subjects (an experiment) we can come up with proper, unbiased conclusions

Math 227 Elementary Statistics. Bluman 5 th edition

Math 227 Elementary Statistics. Bluman 5 th edition Math 227 Elementary Statistics Bluman 5 th edition CHAPTER 6 The Normal Distribution 2 Objectives Identify distributions as symmetrical or skewed. Identify the properties of the normal distribution. Find
