Chapter 11 Part 6. Correlation Continued. LOWESS Regression


Chapter 11 Part 6. Correlation Continued. LOWESS Regression. February 17, 2009

Goal:
To review the properties of the correlation coefficient.
To introduce you to the various tools that can be used to decide whether a variable is normally distributed.
To introduce you to tests of normality, skewness and kurtosis.
To introduce you to the use of LOWESS regression as a diagnostic tool.
To introduce you to looking carefully at your data.

Skills:
You should be able to determine whether or not a distribution is normal.
You should know how to use and interpret both parametric and non-parametric correlation coefficients.
You should know how to obtain and interpret a LOWESS line.
You should now have the experience to know better than to just run Stata commands without looking at the data.

Commands: pnorm, qnorm, sfrancia, swilk, sktest

Datasets: allhat1000old.dta, GreeneTouchold.dta, Example151Goodold.dta

Last class we looked at the correlation (i.e. linear relationship) between baseline diastolic blood pressure (DBP) and weight at baseline. At the time I pointed out that in order to statistically test whether there was a linear relationship, both of the variables (DBP and weight) would have to be normally distributed. Last class we focused on how to get and interpret correlation coefficients and ignored the need for normality. Today we are going back to check the normality of the variables.

We are first going to consider graphical techniques for deciding whether the variables are normally distributed. For each of baseline DBP and baseline weight I have plotted a histogram, a Q-Q plot (quantiles of the normal distribution, Stata command qnorm) and a standardized normal probability plot (Stata command pnorm). For both the Q-Q plot (also called a quantile-normal plot) and the standardized normal probability plot, the straight line represents the normal distribution, so ideally all of our points would fall on this line. Deviations from the line are indications of non-normality. There is no statistical test that goes with these plots, although there are separate statistical tests for normality, so we just look at the plots and make our best judgement. The Q-Q plot is sensitive to non-normality near the tails of the plot. The standardized normal probability plot is sensitive to non-normality in the middle range of the data.

Below I have added baseline triglycerides to the mix because it is skewed in a different way from the other two variables, so we will include the correlation of triglycerides with the other two variables. After using the plots to consider normality we are going to use the summarize command with the detail option to get the skewness and kurtosis of each variable.

Finally, we are going to use the Shapiro-Wilk normality test (for data sets where 4 ≤ n ≤ 2000; the Stata command is swilk), the Shapiro-Francia normality test (for 5 ≤ n ≤ 5000; the Stata command is sfrancia) and a skewness and kurtosis test for normality (the Stata command is sktest). Usually statistical tests are considered better than plots, but in this case the plots are actually preferred.

Page -1-
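The three plots described above can be produced for all three variables in one pass. This is only a sketch: it assumes allhat1000.dta is in the working directory and that the variables are stored in upper case as BV1DBP, BLWGT and ATRIG (the names used in the test commands later in these notes). The graph names h_, q_ and p_ are made up for illustration.

```stata
* Sketch only: file location and variable-name case are assumptions
use allhat1000.dta, clear

foreach v of varlist BV1DBP BLWGT ATRIG {
    histogram `v', frequency name(h_`v', replace)   // histogram of counts
    qnorm `v', name(q_`v', replace)                 // Q-Q plot: sensitive in the tails
    pnorm `v', name(p_`v', replace)                 // normal probability plot: sensitive in the middle
}
```

Each graph stays in memory under its own name, so the nine plots can be compared side by side.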

[Figure: histogram of baseline visit 1 DBP, allhat1000.dta. x-axis: Avg DBP at Visit 1; y-axis: Frequency.]

Notice the tall bars at 80 mm Hg and 90 mm Hg. This indicates what is called a digit preference (i.e. people tend to round off to 80 and 90). The distribution is skewed to the left.

[Figure: quantiles of normal distribution plot for baseline visit 1 DBP. x-axis: Inverse Normal; y-axis: Avg DBP at Visit 1.]

The Stata command is "qnorm". This plot is sensitive to non-normality near the tails. This is also called a Q-Q plot or a normal quantile plot. Notice that the histogram is skewed to the left and the qnorm plot curves downward. The plot deviates from the normal line more on the left end of the graph.

We know how to use the dropdown menus to get the histogram, but how do we use them to get the Q-Q plot?

Page -2-

You have the option of selecting a plot type (3rd tab from the left), but the usual choice is the scatterplot, which in this case is the default plot. The scatterplot is what is pictured for the Q-Q plot on the page above.

Page -3-

The Stata command is "pnorm". The plot is sensitive to non-normality in the middle range of the data. The 2 spots pointed out by the arrows are probably related to the tall bars at 80 and 90 mm Hg.

The plot above is obtained using the normal probability plot (pnorm).

Page -4-

The distribution of weight is skewed to the right.

The Stata command is qnorm. This plot is sensitive to non-normality near the tails. This is called the Q-Q plot. Notice that the histogram is skewed to the right and the qnorm plot curves upward.

The Stata command is pnorm. The plot is sensitive to non-normality in the middle range of the data.

Page -5-

[Figure: histogram of baseline triglycerides for the antihypertensive study, allhat1000.dta. x-axis: Triglycerides-BL Anti; y-axis: Frequency.]

Notice that the triglyceride distribution is very skewed to the right.

[Figure: quantiles of normal distribution plot for baseline triglycerides for the antihypertensive study. x-axis: Inverse Normal; y-axis: Triglycerides-BL Anti.]

The Stata command is "qnorm". This plot is sensitive to non-normality near the tails. This is also called a Q-Q plot. Notice that the histogram is skewed to the right and the qnorm plot curves upward. Notice that the spaced-out dots on the right side of the plot go with the spaced-out bars in the histogram above.

[Figure: standardized normal probability plot for baseline triglycerides for the antihypertensive study. x-axis: Empirical P[i] = i/(n+1); y-axis: Normal F[(ATRIG-m)/s].]

The Stata command is "pnorm". The plot is sensitive to non-normality in the middle range of the data.

Page -6-

A distribution can be skewed to both the left and the right, in which case the curve in the Q-Q plot can go up at one end and down at the other.

. sum(BV1DBP), det
[Output for Avg DBP at Visit 1: percentiles, smallest and largest values, Obs, Sum of Wgt., Mean, Std. Dev., Variance, Skewness and Kurtosis. Only the 50th percentile (85) survives in this transcription.]

. sum(BLWGT), det
[Output for Weight(lbs) at Baseline: same layout. Only the 50th percentile (178) survives in this transcription.]

. sum(ATRIG), det
[Output for Triglycerides-BL Anti: same layout. Only the 50th percentile (138) survives in this transcription.]

Page -7-
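Since the detailed summaries are long, it can help to pull out just the skewness and kurtosis. The sketch below relies on the results that summarize with the detail option leaves behind in r(skewness) and r(kurtosis). The file location and the upper-case variable names are assumptions.

```stata
* Sketch only: file location and variable-name case are assumptions
use allhat1000.dta, clear

foreach v of varlist BV1DBP BLWGT ATRIG {
    quietly summarize `v', detail
    * For a normal distribution, skewness is 0 and kurtosis is 3
    display "`v'" _col(12) "skewness = " %7.3f r(skewness) ///
            _col(36) "kurtosis = " %7.3f r(kurtosis)
}
```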

[Table: skewness and kurtosis of baseline DBP, baseline weight and baseline triglycerides (TG). The numeric values are missing from this transcription.]

DBP has skewness < 0, so it is skewed to the left (something we have already seen in the graph). Weight and TG are both skewed to the right (i.e. skewness > 0). DBP is the least skewed (i.e. its skewness value is the closest to zero, the skewness value of the normal distribution) and TG is the most skewed (i.e. its skewness value is the furthest from 0). The kurtosis value for the normal distribution is 3. DBP has the kurtosis value closest to 3 and weight has the kurtosis value most distant from 3.

Tests for normality and for skewness and kurtosis:

help swilk, help sfrancia                                dialogs: swilk sfrancia

Title
    [R] swilk -- Shapiro-Wilk and Shapiro-Francia tests for normality

Syntax
    Shapiro-Wilk normality test
        swilk varlist [if] [in] [, options]
    Shapiro-Francia normality test
        sfrancia varlist [if] [in]

Description
    swilk performs the Shapiro-Wilk W test for normality, and sfrancia performs the Shapiro-Francia W' test for normality. swilk can be used with 4<=n<=2,000 observations, and sfrancia can be used with 5<=n<=5,000 observations; see [R] sktest for a test allowing more observations.

Page -8-

help sktest                                              dialog: sktest

Title
    [R] sktest -- Skewness and kurtosis test for normality

Syntax
    sktest varlist [if] [in] [weight] [, noadjust]
    aweights and fweights are allowed; see weight.

Description
    For each variable in varlist, sktest presents a test for normality based on skewness and another based on kurtosis and then combines the two tests into an overall test statistic. sktest requires a minimum of 8 observations to make its calculations.

Option
    Main
    noadjust suppresses the empirical adjustment made by Royston (1991) to the overall chi-squared and its significance level and presents the unaltered test as described by D'Agostino, Belanger, and D'Agostino Jr. (1990).

I usually go with the default since Stata usually chooses as the default the most commonly used statistic.

For each of the three normality tests above, the null hypothesis is that the distribution is normal. So if we reject the null hypothesis we have declared that the distribution is not normal. The skewness and kurtosis test tests the skewness and kurtosis individually, as well as presenting a combined test for overall normality.

You'll find the Shapiro-Francia and Shapiro-Wilk tests under swilk in the Stata manuals. For the Shapiro-Francia test, W' is the test statistic and V' is a transform of W'. Both give the same information. The median value of V' is 1 and large values indicate non-normality.

. sfrancia BV1DBP BLWGT ATRIG
Shapiro-Francia W' test for normal data
[Output columns: Variable, Obs, W', V', z, Prob>z, with one row each for BV1DBP, BLWGT and ATRIG. The numeric values are missing from this transcription.]

The interpretation of the W and V for the Shapiro-Wilk test is similar to that of the test statistics of the Shapiro-Francia test.

Page -9-

. swilk BV1DBP BLWGT ATRIG
Shapiro-Wilk W test for normal data
[Output columns: Variable, Obs, W, V, z, Prob>z, with one row each for BV1DBP, BLWGT and ATRIG. The numeric values are missing from this transcription.]

You'll find the skewness and kurtosis test for normality under sktest in the Stata manuals.

. sktest BV1DBP BLWGT ATRIG
Skewness/Kurtosis tests for Normality
[Output columns: Variable, Pr(Skewness), Pr(Kurtosis), adj chi2(2), Prob>chi2 (joint), with one row each for BV1DBP, BLWGT and ATRIG. The numeric values are missing from this transcription.]

Well, in each case we have rejected the normality of the variables. Last time we considered these same three variables using the spearman and ktau commands for the non-parametric Spearman and Kendall's tau correlations. Those procedures test for association between two variables and do not require normality; they are not themselves tests of normality. For both we failed to reject the null hypothesis in each case. The normality tests above are considered too sensitive. So what should we do? The histograms are somewhat skewed, but the Q-Q plots and the standardized normal probability plots aren't too bad, and the nonparametric results, which do not depend on normality, lead to the same conclusion as the parametric ones. I would just report the parametric correlation results given below.

. pwcorr BV1DBP BLWGT ATRIG, obs sig
[Output: correlation matrix of BV1DBP, BLWGT and ATRIG with significance levels and numbers of observations. The numeric values are missing from this transcription.]

Each of the three p-values says that we fail to reject the null hypothesis that the correlation is equal to zero (i.e. we have no evidence that the pairs of variables are linearly correlated).

Page -10-
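For reference, the whole sequence of checks discussed above can be collected in one short do-file. This is a sketch, assuming allhat1000.dta is in the working directory and the variables are stored in upper case. The stats() options on spearman and ktau are optional extras that print the coefficients alongside the p-values.

```stata
* Sketch only: file location and variable-name case are assumptions
use allhat1000.dta, clear

* Normality tests (null hypothesis: the variable is normally distributed)
swilk    BV1DBP BLWGT ATRIG
sfrancia BV1DBP BLWGT ATRIG
sktest   BV1DBP BLWGT ATRIG

* Parametric (Pearson) correlations with p-values and sample sizes
pwcorr BV1DBP BLWGT ATRIG, obs sig

* Non-parametric correlations (null hypothesis: no association)
spearman BV1DBP BLWGT ATRIG, stats(rho obs p)
ktau     BV1DBP BLWGT ATRIG, stats(taua taub p)
```

The first three commands test normality. The last three test whether pairs of variables are associated, which is a different null hypothesis.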

An example to try to convince you that the assumption behind the linear regression graph with the normal curves is not as strange as it might seem.

[Figure: regression line passing through the means $\mu_{y|x_t}$ and $\mu_{y|x_p}$ of normal curves drawn at $x_t$ and $x_p$.]

I have repeatedly used the graph above so you would get the idea that the regression line is a line through the means of a series of normal curves. Below I have used the data set allhat1000.dta to try to show you with real data that a line through means makes sense. In the plot below the Visit 1 DBP for 1000 people is on the x-axis and the Visit 2 DBP for the same 1000 people is on the y-axis. Notice that for Visit 1 DBP = 70 there are 27 values for Visit 2 DBP with mean = 77.3, for Visit 1 DBP = 80 there are 95 values for Visit 2 DBP with mean = 82.2, and for Visit 1 DBP = 90 there are 108 values for Visit 2 DBP with mean = 87.8.

. sum(BV2DBP) if BV1DBP == 70
. sum(BV2DBP) if BV1DBP == 80
. sum(BV2DBP) if BV1DBP == 90
[Output columns for each command: Variable, Obs, Mean, Std. Dev., Min, Max, with one row for BV2DBP. The numeric values are missing from this transcription; the Obs and Mean values are the ones quoted in the paragraph above.]

The plot below gives the scatter of Visit 1 (x-axis) and Visit 2 (y-axis) values of DBP. The solid dots are for Visit 1 DBP = 70, 80 and 90. The squares are the points (70, 77.3), (80, 82.2) and (90, 87.8).

Page -11-
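The three conditional means quoted above come from summarizing Visit 2 DBP separately at each chosen value of Visit 1 DBP. A loop does the same thing more compactly. As before, the file location and the upper-case variable names are assumptions.

```stata
* Sketch only: file location and variable-name case are assumptions
use allhat1000.dta, clear

foreach x of numlist 70 80 90 {
    display as text _newline "Visit 1 DBP = `x'"
    summarize BV2DBP if BV1DBP == `x'   // the Mean column is the conditional mean
}
```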

[Figure: scatterplot of Visit 2 DBP (mm Hg, y-axis) against Visit 1 DBP (mm Hg, x-axis).]

Below I have added the least squares regression line and 3 more points. The additional points (x = 82, 85 and 100) were obtained in the same manner as the 3 in the graph above. Notice that the 6 points are not so far off the regression line.

[Figure: 1000 people for whom DBP was measured at 2 different time points. x-axis: Diastolic Blood Pressure at Visit 1; y-axis: Diastolic Blood Pressure at Visit 2. The dark circles are the points $(x, \bar{y}_x)$ for x = 70, 80, 82, 85, 90 and 100.]

Page -12-
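A plot along the lines of the one described above can be sketched as follows. The helper variables mean_v2 and pick are made up for illustration, and the file location and upper-case variable names are assumptions. This is not necessarily how the original figure was drawn.

```stata
* Sketch only: mean_v2 and pick are illustrative helper variables
use allhat1000.dta, clear

* Mean of Visit 2 DBP within each distinct value of Visit 1 DBP
egen mean_v2 = mean(BV2DBP), by(BV1DBP)

* Flag one observation per chosen x value so each mean is drawn once
egen pick = tag(BV1DBP) if inlist(BV1DBP, 70, 80, 82, 85, 90, 100)

twoway (scatter BV2DBP BV1DBP, msymbol(oh) mcolor(gs10))               ///
       (lfit BV2DBP BV1DBP)                                            ///
       (scatter mean_v2 BV1DBP if pick == 1, msymbol(S) mcolor(black)), ///
       legend(order(1 "Observed" 2 "OLS line" 3 "Mean of Visit 2 DBP")) ///
       xtitle("Visit 1 DBP (mm Hg)") ytitle("Visit 2 DBP (mm Hg)")
```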

Now of course we are dealing with a finite number of y values at each value of x, whereas for the theoretical graph there would be a whole distribution's worth of y-values. But I hope you can see that the idea of a normal distribution of y's at each value of x, with the mean of the distribution of y's being on the regression line, does make sense (we of course can't show the normal part).

LOWESS regression (a diagnostic tool for regression):

I have mentioned LOWESS regression before and even graphed one but haven't given you any real details. Ordinary least squares regression fits a line to the data even if it is clear that a line is not appropriate (see pages 18 and 19 below). LOWESS regression doesn't make any model assumptions. The LOWESS (locally weighted scatterplot smoother) regression curve is one of our diagnostic tools for regression. The idea is that for each point $(x_i, y_i)$ in the dataset a separate linear regression line is fitted based on adjacent observations. These observations are weighted so that the further away an x value is from $x_i$, the less impact it has on determining the estimate $\hat{y}_i$. The proportion of the total data that is used to create each estimate $\hat{y}_i$ is called the bandwidth.

In Stata the default bandwidth is 0.8 (i.e. 80% of the dataset), which works for mid-size data sets. For large data sets a bandwidth of 0.3 or 0.4 is recommended; a bandwidth of 0.99 is recommended for small data sets. The wider the bandwidth, the smoother the curve. Narrow bandwidths produce curves that are more sensitive to local perturbations in the data. The recommendation is to experiment with different bandwidths. (William D. Dupont, Statistical Modeling for Biomedical Researchers, Cambridge University Press, 2002.)

There is no statistical test related to LOWESS regression. We simply eyeball the graphs. The LOWESS curve below shows that fitting a line to the DBP visit 1 and visit 2 data is a pretty good idea.

Page -13-
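One way to overlay the LOWESS curve on the OLS line for the two DBP visits is sketched below. The bandwidth is set explicitly with bwidth() so it is easy to change while experimenting. The file location and upper-case variable names are assumptions.

```stata
* Sketch only: file location and variable-name case are assumptions
use allhat1000.dta, clear

twoway (scatter BV2DBP BV1DBP, msymbol(oh) mcolor(gs10))        ///
       (lfit   BV2DBP BV1DBP, lpattern(dash))                   ///
       (lowess BV2DBP BV1DBP, bwidth(0.8) lpattern(solid)),     ///
       legend(order(2 "OLS regression line" 3 "LOWESS, bandwidth 0.8"))
```

Rerunning with bwidth(0.3) or bwidth(0.4) shows how a narrower bandwidth makes the curve follow local wiggles in the data.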

[Figure: Lowess Curve and OLS Regression Line. x-axis: DBP at Visit 1; y-axis: DBP at Visit 2. Lowess curve is solid; regression line is dashed.]

Below we are still using the data set allhat1000.dta, but now baseline weight is the predictor and visit 1 DBP is the outcome.

[Figure: Lowess Curve and OLS Regression Line. x-axis: Weight at baseline (lbs); y-axis: DBP at Visit 1. Regression line = dashed; Lowess curve = solid.]

Page -14-

The LOWESS curve immediately above is pretty flat. Below I have given $\bar{y}$ (i.e. the mean of the visit 1 DBP = 84.5) and the LOWESS curve.

[Figure: Lowess Curve and Mean of DBP at Visit 1 (84.5). x-axis: Weight at baseline (lbs); y-axis: DBP at Visit 1. Lowess curve = solid; mean of DBP = dashed.]

Notice there is not a lot of difference between the two graphs above. The regression output is given below.

. regress BV1DBP BLWGT
[Output: ANOVA table and coefficient table for BLWGT and _cons, with F(1, 997) = 0.87. The other numeric values are missing from this transcription.]

We have F = 0.87, df = 1 and 997, with p = [value lost in transcription; an F of 0.87 on 1 and 997 df corresponds to p of roughly 0.35]. Well, that is a definite failure to reject the null hypothesis (i.e. we fail to reject slope = 0). Notice that the estimate of the slope is [value lost in transcription]. That is about as close to a flat line as you can get. Notice that the y-intercept = 83.1 is pretty close to the mean of the Visit 1 DBP (84.5). No wonder the two graphs look alike.

Of course, in this case we didn't really need either the regression output or the LOWESS curve to make that determination. Just looking at the scatter plot should have given us a pretty good idea of what the answer would be.

Page -15-
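The comparison of the LOWESS curve with the mean of visit 1 DBP can be sketched as follows: fit the regression, store the mean in a local macro, and draw it as a horizontal dashed line. The macro name ybar is made up for illustration, and the file location and upper-case variable names are assumptions.

```stata
* Sketch only: file location and variable-name case are assumptions
use allhat1000.dta, clear

regress BV1DBP BLWGT            // slope test: null hypothesis is slope = 0

quietly summarize BV1DBP
local ybar = r(mean)            // mean of visit 1 DBP

twoway (scatter BV1DBP BLWGT, msymbol(oh) mcolor(gs10))  ///
       (lowess  BV1DBP BLWGT, lpattern(solid)),          ///
       yline(`ybar', lpattern(dash))                     ///
       legend(order(2 "Lowess curve"))
```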

Let us go back and look at the Greene-Touchstone data (i.e. the estriol/birth weight problem). Remember I said that it was not really a wonderful set of data to use as an example for regression, but it is a good example of some of the problems with regression.

[Figures: three panels from the Greene-Touchstone Study, each showing the OLS regression line (dashed) and the LOWESS curve (solid) for a different bandwidth. x-axis: Estriol mg/24 hrs; y-axis: Birthweight in gms. The bandwidth labels are missing from this transcription.]

Notice that the smoothest LOWESS curve has bandwidth = 0.99 and the most jagged has bandwidth = 0.3. The LOWESS curve (regardless of bandwidth) says that fitting a regression line is not exactly the best choice we could make. The regression output below shows that while we rejected the null hypothesis (i.e. we concluded that the slope was different from zero), $R^2$ is only 0.37, indicating that the regression line accounts for only 37% of the variability in the data.

Page -16-
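The effect of the bandwidth can be explored with a short loop. In this sketch the variable names bwt100 and estriol are taken from the regress command used for these data, the file name from the dataset list at the top of these notes, and the middle bandwidth of 0.8 (Stata's default) is an assumption, since only 0.3 and 0.99 are named in the text.

```stata
* Sketch only: the 0.8 bandwidth and the graph names are assumptions
use GreeneTouchold.dta, clear

foreach bw of numlist 0.3 0.8 0.99 {
    * Build a legal graph name from the bandwidth (no decimal point)
    local tag = subinstr("`bw'", ".", "_", .)
    twoway (scatter bwt100 estriol)                      ///
           (lfit   bwt100 estriol, lpattern(dash))       ///
           (lowess bwt100 estriol, bwidth(`bw')),        ///
           title("Bandwidth = `bw'")                     ///
           legend(order(2 "OLS regression line" 3 "LOWESS")) ///
           name(bw`tag', replace)
}
```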

. regress bwt100 estriol
[Output: ANOVA table and coefficient table for estriol and _cons, with F on 1 and 29 df. The numeric values are missing from this transcription.]

We need to be careful about interpreting regression results without looking at the graph. The data file used below is Example151Goodold.dta. The example is taken from Common Errors in Statistics (and How to Avoid Them) by P.I. Good and J.W. Hardin. Each of the 4 regression runs below has the same $R^2$ and the same estimates for the slope and the y-intercept. Do you think they are all the same?

. regress y1 x1
. regress y2 x2
. regress y3 x3
. regress y4 x4
[Output for each run: ANOVA table and coefficient table for the x variable and _cons, with F on 1 and 9 df. The numeric values are missing from this transcription.]

Page -17-

The lines in the 4 graphs below should be written $\hat{y} = 3 + 0.5x$, with $R^2 = 0.67$.

[Figures: four panels (y1 vs x1, y2 vs x2, y3 vs x3, y4 vs x4), each showing the OLS regression line and the Lowess regression curve.]

Page -18-
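The four regressions and their graphs can be produced in one loop, which makes it easy to see that identical output can come from very different scatterplots. This sketch assumes Example151Goodold.dta is in the working directory and contains y1 to y4 and x1 to x4, as the regress commands above suggest. The graph names g1 to g4 are made up for illustration.

```stata
* Sketch only: file location and graph names are assumptions
use Example151Goodold.dta, clear

forvalues i = 1/4 {
    regress y`i' x`i'
    twoway (scatter y`i' x`i')                           ///
           (lfit   y`i' x`i')                            ///
           (lowess y`i' x`i'),                           ///
           legend(order(2 "OLS regression line" 3 "Lowess regression curve")) ///
           name(g`i', replace)
}

* Put the four panels side by side for comparison
graph combine g1 g2 g3 g4
```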

Inappropriate use of regression: Baseline medications and Race

. tab RACE
[Output: frequency table of Race with rows White, Black, Asian/Pacific Islander, Other and Total. The counts and percentages are missing from this transcription.]

. tab BLMEDS
[Output: frequency table of Baseline Medications with rows On 1-2 drugs ge 2 months, On drugs lt 2 months, Currently untreated and Total. The counts and percentages are missing from this transcription.]

. label list racelbl
racelbl:
    1 White
    2 Black
    3 Amer Indian/Alaskan native
    4 Asian/Pacific Islander
    5 Other

. label list blmedlbl
blmedlbl:
    1 On 1-2 drugs ge 2 months
    2 On drugs lt 2 months
    3 Currently untreated

Page -19-

Response to question in class: If you think you should transform your data, how do you decide which would be the best transformation to use?

I would use gladder or qladder, which give you an array of potential transformations so you can see how close to normal each transformation would make your variable of interest. gladder gives histograms and qladder gives Q-Q plots. Notice that Ladder-of-powers histograms is highlighted, but that the line below histograms is the command for the Q-Q plots. Just typing in gladder BLWGT gets you the same results.

Page -20-
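From the command line the two ladder-of-powers displays look like this. The ladder command, which is not mentioned above, is a related Stata command that reports a normality test for each transformation in the ladder. The file location and the upper-case variable name are assumptions.

```stata
* Sketch only: file location and variable-name case are assumptions
use allhat1000.dta, clear

gladder BLWGT    // histograms of each ladder-of-powers transformation
qladder BLWGT    // Q-Q plots of the same transformations
ladder  BLWGT    // numeric companion: normality test for each transformation
```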


More information
