Chapter 6 Part 3 October 21, Bootstrapping

Size: px
Start display at page:

Download "Chapter 6 Part 3 October 21, Bootstrapping"

Transcription

1 Chapter 6 Part 3 October 21, 2008 Bootstrapping

2 From the internet: The bootstrap involves repeated re-estimation of a parameter using random samples with replacement from the original data. Because the sampling is with replacement, some items in the data set are selected two or more times and others are not selected at all. When this is repeated a hundred or a thousand times, we get pseudo-samples that behave similarly to the underlying distribution of the data. Bootstrapping by Hand - Sampling with replacement Original 5000 observations:. sum(aftrig),det Fast Triglycerides-BL Anti Percentiles Smallest 1% % % Obs % Sum of Wgt % 140 Mean Largest Std. Dev % % Variance % Skewness % Kurtosis return list scalars: r(n) = 5000 r(sum_w) = 5000 r(mean) = r(var) = Frequency I have omitted the x-axis to make it easier for you to see the very small bars on the right hand tail. You can see that we have a very skewed distribution. skewness = 2.4 (as opposed to 0) Fasting triglycerides for original dataset of 5000 Page -1-

3 Below we have bootstrapping by hand. I have selected 4 samples each of size 10 and obtained the mean of each set of 10. log: W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\Chapter6Part3.log log type: text opened on: 20 Oct 2008, 16:32:28. *dofile used is sample4setsof10.do. use "W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\samplingAHT_5000.dta". sum(aftrig) AFTRIG sort ID. set seed 50. bsample 10. list ID AFTRIG ID AFTRIG sum(aftrig) AFTRIG clear Page -2-

4 . use "W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\samplingAHT_5000.dta". sort ID. set seed 51. bsample 10. list ID AFTRIG ID AFTRIG sum(aftrig) AFTRIG clear. use "W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\samplingAHT_5000.dta". sort ID. set seed 52. bsample 10. list ID AFTRIG ID AFTRIG sum(aftrig) AFTRIG clear Page -3-

5 . use "W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\samplingAHT_5000.dta". sort ID. set seed 53. bsample 10. list ID AFTRIG ID AFTRIG sum(aftrig) AFTRIG log close log: W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\Chapter6Part3.log log type: text closed on: 20 Oct 2008, 16:32:28 So from each of the 4 sets of 10 observations we obtained an estimate of the mean value of the population of 5000 fasting triglycerides. If we had selected a 1000 samples of size 10, we would have a 1000 estimates of the means of fasting triglycerides (one for each sample). You could then create a histogram of the 1000 means. This sample of 1000 means is called the sampling distribution of means. If for each of the 1000 samples, we had asked for the variance instead of the mean, then we would have the sampling distribution of variances and we could obtain the mean of the 1000 variances. Below is how you get the means for the sampling distribution of means using single command rather than a separate command for each sample. We are assuming that the dataset samplingaht_5000.dta represents a population of people. That is, instead of treating it like the sample that it is, we are going to treat it as though it is a population. Results from dofile: Page -4-

6 . do "W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\sample4setsof10singlecommand.do". clear. log using "W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\Chapter6Part3No2.log log: W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\Chapter6Part3No2.log log type: text opened on: 20 Oct 2008, 18:28:17. *dofile used is sample4setsof10singlecommand.do. use "W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\samplingAHT_5000.dta". sort ID. set seed 50. bs TGmeans = r(mean) TGvariances = r(var), reps(4) size(10) noisily saving(tg_r4_s10) :summarize AFTRIG bootstrap: First call to summarize with data as is: This first summarize is for all 5000 people. AFTRIG Warning: Since summarize is not an estimation command or does not set e(sample), bootstrap has no way to determine which observations are used in calculating the statistics and so assumes that all observations are used. This means no observations will be excluded from the resampling because of missing values or other reasons. If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded. Be sure that the dataset in memory contains only the relevant data. Bootstrap replications (4) This is the mean of the first sample of size 10. AFTRIG This is the mean of the second sample of size 10. Page -5-

7 AFTRIG This is the mean of the third sample of size 10. AFTRIG This is the mean of the fourth sample of size 10. AFTRIG Bootstrap results Number of obs = 5000 Replications = 4 command: summarize AFTRIG TGmeans: r(mean) TGvariances: r(var) Observed Bootstrap Normal-based Coef. Std. Err. z P> z [95% Conf. Interval] TGmeans TGvariances clear The observed coefficients above are the mean (169.1) and variance ( ) of AFTRIG with all 5000 participants. As part of the bootstrapping routine we asked Stata to save the mean and variance for each of the 4 samples of size 10 in a data set that we called TG_R4_S10.dta. Notice below that the data set has only 4 observations because we asked for only 4 replications.. use TG_R4_S10.dta (bootstrap: summarize) Page -6-

8 . des Contains data from TG_R4_S10.dta obs: 4 bootstrap: summarize vars: 2 20 Oct :28 size: 48 (99.9% of memory free) storage display value variable name type format label variable label TGmeans float %9.0g r(mean) TGvariances float %9.0g r(var) Sorted by:. list TGmeans TGvariances TGmeans TGvari~s Notice that the only one of the means that is the same as the means obtained by hand is the first one because both by hand version and by single command version use the same seed (50). Notice below that the mean of the 4 sample means is which is not very close to 169.1, the mean of the population. We need to select more than 4 samples and samples larger than 10 to get a good estimate of the population mean.. sum(tgmeans),det r(mean) Percentiles Smallest 1% % % Obs 4 25% Sum of Wgt. 4 50% Mean Largest Std. Dev % % Variance % Skewness % Kurtosis end of do-file Below is a data set of 1000 samples of size 100 which we obtained from the original dataset of Page -7-

9 Notice below that the mean of our 1000 samples of size 100 is Now we are getting closer to the , the mean of the original distribution of size clear. use "W:\WP51\Biometry\AAAABiostatFall2008\Handouts\Chapter 6\Data\samplingAHT_5000.dta", clear. log using W:\WP51\Biometry\AAAABiostatFall2008\Handouts\Chapter 6\Data\classbootstrap.log log: classbootstrap.log log type: text opened on: 20 Oct 2008, 22:31:13. set more off. sort ID. set seed 50. bs TGmeans = r(mean) TGvariances = r(var), reps(1000) size(100) saving(tg_r1000_s100):summarize AFTRIG (running summarize on estimation sample) Warning: Since summarize is not an estimation command or does not set e(sample), bootstrap has no way to determine which observations are used in calculating the statistics and so assumes that all observations are used. This means no observations will be excluded from the resampling because of missing values or other reasons. If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded. Be sure that the dataset in memory contains only the relevant data. Bootstrap replications (1000) Bootstrap results Number of obs = 5000 Replications = 1000 command: summarize AFTRIG TGmeans: r(mean) TGvariances: r(var) Observed Bootstrap Normal-based Coef. Std. Err. z P> z [95% Conf. Interval] TGmeans TGvariances clear Note that the observed coefficient above gives the mean and variance of the original data set of Page -8-

10 . use "W:\WP51\Biometry\AAAABiostatFall2008\Handouts\Chapter 6\Data\TG_R1000_S100.dta (bootstrap: summarize). des Contains data from TG_R1000_S100.dta obs: 1,000 bootstrap: summarize vars: 2 11 Oct :01 size: 12,000 (99.9% of memory free) storage display value variable name type format label variable label TGmeans float %9.0g r(mean) TGvariances float %9.0g r(var) Sorted by:. sum(tgmeans),det r(mean) Percentiles Smallest 1% % % Obs % Sum of Wgt % Mean Largest Std. Dev % % Variance % Skewness % Kurtosis Notice that we have a better estimate (168.6) of the population mean list TGmeans TGvariances in 1/ TGmeans TGvari~s The graph below is a histogram of the 1000 means we got above. Notice that this histogram is rather symmetric looking and not at all like the very skewed histogram of the original variable AFTRIG. Page -9-

11 Frequency Bootstrapping using samplingaht_5000.dta seed = 50, reps = 1000 and size = r(mean) Now let us get more samples and larger samples than we have before. Below we have the output and histogram of 5000 samples of size 3000 selected from the original distribution of AFTRIG.. use "W:\WP51\Biometry\AAAABiostatFall2008\Handouts\Chapter 6\Data\samplingAHT_5000.dta",clear. set more off. sort ID. set seed 50. bs TGmeans = r(mean) TGvariances = r(var), reps(5000) size(3000) saving(tg_r5000_s3000):summarize AFTRIG bootstrap: First call to summarize with data as is: AFTRIG Warning: Since summarize is not an estimation command or does not set e(sample), bootstrap has no way to determine which observations are used in calculating the statistics and so assumes that all observations are used. This means no observations will be excluded from the resampling because of missing values or other reasons. If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded. Be sure that the dataset in memory contains only the relevant data. Bootstrap replications (5000) Page -10-

12 Below is a partial list of the sample means. Sample 1 AFTRIG Sample 2 AFTRIG Sample 3 AFTRIG Sample 4 Sample 5 AFTRIG AFTRIG etc. Let us look at the results of all 5000 samples of size use TG_R5000_S3000.dta (bootstrap: summarize) Page -11-

13 . des Contains data from TG_R5000_S3000.dta obs: 5,000 bootstrap: summarize vars: 2 20 Oct :11 size: 60,000 (99.8% of memory free) - storage display value variable name type format label variable label - TGmeans float %9.0g r(mean) TGvariances float %9.0g r(var) - Sorted by: Below is the mean of the 5000 means.. sum(tgmeans),det r(mean) Percentiles Smallest 1% % % Obs % Sum of Wgt % Mean this is a pretty Largest Std. Dev good estimate of 75% % Variance % Skewness % Kurtosis list TGmeans TGvariances in 1/ TGmeans TGvari~s Notice that these 5 means match up with the means of the 5 samples listed above The variances are the SDs squared from the samples above Page -12-

14 Frequency Bootstrapping using samplingaht_5000.dta seed = 50, reps = 5000 and size = r(mean) The normal curve that I have superimposed on the histogram has the same mean ( ) as the distribution of the 5000 AFTRIG means. The distribution of the 5000 AFTRIG means has kurtosis 3.03 (that is pretty close to 3) and skewness 0.08 (which is pretty close to 0). So the distribution matches up pretty well with a normal distribution. You will see that the table below now matches up with the Stata runs above. The n in the table below is the size of the samples. Page -13-

15 Page -14-

16 There are a number of things to notice in the table above. 1. As the number of repetitions and the sample size get larger the values in the column labeled means get closer to (the mean of the 5000 AFTRIG values). This is Fact 1: μ X = μ X 2. As the number of repetitions and the sample size get larger the values in the column labeled SD (i.e. the standard deviation of the distribution of means) σ n begins to look like the column labeled. The is the size of the samples selected. This is Fact 2: 2 σ 2 X Var( X ) Var( X ) = σ = = X n n or taking the square root of each of the terms above we get The SD of the sampling distributions of means is the SEM. n 3. As the number of repetitions and the sample size get larger the values in the column labeled Min get larger and those in the column labeled Max get smaller. 4. As the number of repetitions and the sample size get larger the values in the column labeled skewness get closer to zero and the values in the column labeled kurtosis get closer to 3. Page -15-

You created this PDF from an application that is not licensed to print to novapdf printer (http://www.novapdf.com)

You created this PDF from an application that is not licensed to print to novapdf printer (http://www.novapdf.com) Monday October 3 10:11:57 2011 Page 1 (R) / / / / / / / / / / / / Statistics/Data Analysis Education Box and save these files in a local folder. name:

More information

Question 1a 1b 1c 1d 1e 1f 2a 2b 2c 2d 3a 3b 3c 3d M ult:choice Points

Question 1a 1b 1c 1d 1e 1f 2a 2b 2c 2d 3a 3b 3c 3d M ult:choice Points Economics 102: Analysis of Economic Data Cameron Spring 2015 April 23 Department of Economics, U.C.-Davis First Midterm Exam (Version A) Compulsory. Closed book. Total of 30 points and worth 22.5% of course

More information

Econ 371 Problem Set #4 Answer Sheet. 6.2 This question asks you to use the results from column (1) in the table on page 213.

Econ 371 Problem Set #4 Answer Sheet. 6.2 This question asks you to use the results from column (1) in the table on page 213. Econ 371 Problem Set #4 Answer Sheet 6.2 This question asks you to use the results from column (1) in the table on page 213. a. The first part of this question asks whether workers with college degrees

More information

Basic Procedure for Histograms

Basic Procedure for Histograms Basic Procedure for Histograms 1. Compute the range of observations (min. & max. value) 2. Choose an initial # of classes (most likely based on the range of values, try and find a number of classes that

More information

Monte Carlo Simulation (General Simulation Models)

Monte Carlo Simulation (General Simulation Models) Monte Carlo Simulation (General Simulation Models) Revised: 10/11/2017 Summary... 1 Example #1... 1 Example #2... 10 Summary Monte Carlo simulation is used to estimate the distribution of variables when

More information

Chapter 11 Part 6. Correlation Continued. LOWESS Regression

Chapter 11 Part 6. Correlation Continued. LOWESS Regression Chapter 11 Part 6 Correlation Continued LOWESS Regression February 17, 2009 Goal: To review the properties of the correlation coefficient. To introduce you to the various tools that can be used to decide

More information

Handout seminar 6, ECON4150

Handout seminar 6, ECON4150 Handout seminar 6, ECON4150 Herman Kruse March 17, 2013 Introduction - list of commands This week, we need a couple of new commands in order to solve all the problems. hist var1 if var2, options - creates

More information

Chapter 6 Part 6. Confidence Intervals chi square distribution binomial distribution

Chapter 6 Part 6. Confidence Intervals chi square distribution binomial distribution Chapter 6 Part 6 Confidence Intervals chi square distribution binomial distribution October 8, 008 Brief review of what we covered last time. In order to get a confidence interval for the population mean

More information

Normal populations. Lab 9: Normal approximations for means STT 421: Summer, 2004 Vince Melfi

Normal populations. Lab 9: Normal approximations for means STT 421: Summer, 2004 Vince Melfi Lab 9: Normal approximations for means STT 421: Summer, 2004 Vince Melfi In previous labs where we investigated the distribution of the sample mean and sample proportion, we often noticed that the distribution

More information

Final Exam - section 1. Thursday, December hours, 30 minutes

Final Exam - section 1. Thursday, December hours, 30 minutes Econometrics, ECON312 San Francisco State University Michael Bar Fall 2013 Final Exam - section 1 Thursday, December 19 1 hours, 30 minutes Name: Instructions 1. This is closed book, closed notes exam.

More information

CHAPTER 7 INTRODUCTION TO SAMPLING DISTRIBUTIONS

CHAPTER 7 INTRODUCTION TO SAMPLING DISTRIBUTIONS CHAPTER 7 INTRODUCTION TO SAMPLING DISTRIBUTIONS Note: This section uses session window commands instead of menu choices CENTRAL LIMIT THEOREM (SECTION 7.2 OF UNDERSTANDABLE STATISTICS) The Central Limit

More information

*1A. Basic Descriptive Statistics sum housereg drive elecbill affidavit witness adddoc income male age literacy educ occup cityyears if control==1

*1A. Basic Descriptive Statistics sum housereg drive elecbill affidavit witness adddoc income male age literacy educ occup cityyears if control==1 *1A Basic Descriptive Statistics sum housereg drive elecbill affidavit witness adddoc income male age literacy educ occup cityyears if control==1 Variable Obs Mean Std Dev Min Max --- housereg 21 2380952

More information

Rationale. Learning about return and risk from the historical record and beta estimation. T Bills and Inflation

Rationale. Learning about return and risk from the historical record and beta estimation. T Bills and Inflation Learning about return and risk from the historical record and beta estimation Reference: Investments, Bodie, Kane, and Marcus, and Investment Analysis and Behavior, Nofsinger and Hirschey Nattawut Jenwittayaroje,

More information

ECO220Y, Term Test #2

ECO220Y, Term Test #2 ECO220Y, Term Test #2 December 4, 2015, 9:10 11:00 am U of T e-mail: @mail.utoronto.ca Surname (last name): Given name (first name): UTORID: (e.g. lihao8) Instructions: You have 110 minutes. Keep these

More information

IOP 201-Q (Industrial Psychological Research) Tutorial 5

IOP 201-Q (Industrial Psychological Research) Tutorial 5 IOP 201-Q (Industrial Psychological Research) Tutorial 5 TRUE/FALSE [1 point each] Indicate whether the sentence or statement is true or false. 1. To establish a cause-and-effect relation between two variables,

More information

Measures of Center. Mean. 1. Mean 2. Median 3. Mode 4. Midrange (rarely used) Measure of Center. Notation. Mean

Measures of Center. Mean. 1. Mean 2. Median 3. Mode 4. Midrange (rarely used) Measure of Center. Notation. Mean Measure of Center Measures of Center The value at the center or middle of a data set 1. Mean 2. Median 3. Mode 4. Midrange (rarely used) 1 2 Mean Notation The measure of center obtained by adding the values

More information

tm / / / / / / / / / / / / Statistics/Data Analysis User: Klick Project: Limited Dependent Variables{space -6}

tm / / / / / / / / / / / / Statistics/Data Analysis User: Klick Project: Limited Dependent Variables{space -6} PS 4 Monday August 16 01:00:42 2010 Page 1 tm / / / / / / / / / / / / Statistics/Data Analysis User: Klick Project: Limited Dependent Variables{space -6} log: C:\web\PS4log.smcl log type: smcl opened on:

More information

Module 9: Single-level and Multilevel Models for Ordinal Responses. Stata Practical 1

Module 9: Single-level and Multilevel Models for Ordinal Responses. Stata Practical 1 Module 9: Single-level and Multilevel Models for Ordinal Responses Pre-requisites Modules 5, 6 and 7 Stata Practical 1 George Leckie, Tim Morris & Fiona Steele Centre for Multilevel Modelling If you find

More information

u panel_lecture . sum

u panel_lecture . sum u panel_lecture sum Variable Obs Mean Std Dev Min Max datastre 639 9039644 6369418 900228 926665 year 639 1980 2584012 1976 1984 total_sa 639 9377839 3212313 682 441e+07 tot_fixe 639 5214385 1988422 642

More information

Percentiles, STATA, Box Plots, Standardizing, and Other Transformations

Percentiles, STATA, Box Plots, Standardizing, and Other Transformations Percentiles, STATA, Box Plots, Standardizing, and Other Transformations Lecture 3 Reading: Sections 5.7 54 Remember, when you finish a chapter make sure not to miss the last couple of boxes: What Can Go

More information

Question scores. Question 1a 1b 1c 1d 1e 2a 2b 2c 2d 2e 2f 3a 3b 3c 3d M ult:choice Points

Question scores. Question 1a 1b 1c 1d 1e 2a 2b 2c 2d 2e 2f 3a 3b 3c 3d M ult:choice Points Economics 02: Analysis of Economic Data Cameron Winter 204 January 30 Department of Economics, U.C.-Davis First Midterm Exam (Version A) Compulsory. Closed book. Total of 30 points and worth 22.5% of course

More information

Overview/Outline. Moving beyond raw data. PSY 464 Advanced Experimental Design. Describing and Exploring Data The Normal Distribution

Overview/Outline. Moving beyond raw data. PSY 464 Advanced Experimental Design. Describing and Exploring Data The Normal Distribution PSY 464 Advanced Experimental Design Describing and Exploring Data The Normal Distribution 1 Overview/Outline Questions-problems? Exploring/Describing data Organizing/summarizing data Graphical presentations

More information

Math 2311 Bekki George Office Hours: MW 11am to 12:45pm in 639 PGH Online Thursdays 4-5:30pm And by appointment

Math 2311 Bekki George Office Hours: MW 11am to 12:45pm in 639 PGH Online Thursdays 4-5:30pm And by appointment Math 2311 Bekki George bekki@math.uh.edu Office Hours: MW 11am to 12:45pm in 639 PGH Online Thursdays 4-5:30pm And by appointment Class webpage: http://www.math.uh.edu/~bekki/math2311.html Math 2311 Class

More information

NCSS Statistical Software. Reference Intervals

NCSS Statistical Software. Reference Intervals Chapter 586 Introduction A reference interval contains the middle 95% of measurements of a substance from a healthy population. It is a type of prediction interval. This procedure calculates one-, and

More information

Simple Descriptive Statistics

Simple Descriptive Statistics Simple Descriptive Statistics These are ways to summarize a data set quickly and accurately The most common way of describing a variable distribution is in terms of two of its properties: Central tendency

More information

Some Characteristics of Data

Some Characteristics of Data Some Characteristics of Data Not all data is the same, and depending on some characteristics of a particular dataset, there are some limitations as to what can and cannot be done with that data. Some key

More information

Descriptive Analysis

Descriptive Analysis Descriptive Analysis HERTANTO WAHYU SUBAGIO Univariate Analysis Univariate analysis involves the examination across cases of one variable at a time. There are three major characteristics of a single variable

More information

The Central Limit Theorem (Solutions) COR1-GB.1305 Statistics and Data Analysis

The Central Limit Theorem (Solutions) COR1-GB.1305 Statistics and Data Analysis The Central Limit Theorem (Solutions) COR1-GB1305 Statistics and Data Analysis 1 You draw a random sample of size n = 64 from a population with mean µ = 50 and standard deviation σ = 16 From this, you

More information

Key Objectives. Module 2: The Logic of Statistical Inference. Z-scores. SGSB Workshop: Using Statistical Data to Make Decisions

Key Objectives. Module 2: The Logic of Statistical Inference. Z-scores. SGSB Workshop: Using Statistical Data to Make Decisions SGSB Workshop: Using Statistical Data to Make Decisions Module 2: The Logic of Statistical Inference Dr. Tom Ilvento January 2006 Dr. Mugdim Pašić Key Objectives Understand the logic of statistical inference

More information

Description Remarks and examples References Also see

Description Remarks and examples References Also see Title stata.com example 41g Two-level multinomial logistic regression (multilevel) Description Remarks and examples References Also see Description We demonstrate two-level multinomial logistic regression

More information

Descriptive Statistics

Descriptive Statistics Chapter 3 Descriptive Statistics Chapter 2 presented graphical techniques for organizing and displaying data. Even though such graphical techniques allow the researcher to make some general observations

More information

R & R Study. Chapter 254. Introduction. Data Structure

R & R Study. Chapter 254. Introduction. Data Structure Chapter 54 Introduction A repeatability and reproducibility (R & R) study (sometimes called a gauge study) is conducted to determine if a particular measurement procedure is adequate. If the measurement

More information

Stat 101 Exam 1 - Embers Important Formulas and Concepts 1

Stat 101 Exam 1 - Embers Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 - Embers Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2.

More information

DATA SUMMARIZATION AND VISUALIZATION

DATA SUMMARIZATION AND VISUALIZATION APPENDIX DATA SUMMARIZATION AND VISUALIZATION PART 1 SUMMARIZATION 1: BUILDING BLOCKS OF DATA ANALYSIS 294 PART 2 PART 3 PART 4 VISUALIZATION: GRAPHS AND TABLES FOR SUMMARIZING AND ORGANIZING DATA 296

More information

Financial Econometrics Jeffrey R. Russell Midterm 2014

Financial Econometrics Jeffrey R. Russell Midterm 2014 Name: Financial Econometrics Jeffrey R. Russell Midterm 2014 You have 2 hours to complete the exam. Use can use a calculator and one side of an 8.5x11 cheat sheet. Try to fit all your work in the space

More information

Monte Carlo Simulation (Random Number Generation)

Monte Carlo Simulation (Random Number Generation) Monte Carlo Simulation (Random Number Generation) Revised: 10/11/2017 Summary... 1 Data Input... 1 Analysis Options... 6 Summary Statistics... 6 Box-and-Whisker Plots... 7 Percentiles... 9 Quantile Plots...

More information

Hydrology 4410 Class 29. In Class Notes & Exercises Mar 27, 2013

Hydrology 4410 Class 29. In Class Notes & Exercises Mar 27, 2013 Hydrology 4410 Class 29 In Class Notes & Exercises Mar 27, 2013 Log Normal Distribution We will not work an example in class. The procedure is exactly the same as in the normal distribution, but first

More information

David Tenenbaum GEOG 090 UNC-CH Spring 2005

David Tenenbaum GEOG 090 UNC-CH Spring 2005 Simple Descriptive Statistics Review and Examples You will likely make use of all three measures of central tendency (mode, median, and mean), as well as some key measures of dispersion (standard deviation,

More information

Maximum Likelihood Estimation Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 13, 2018

Maximum Likelihood Estimation Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 13, 2018 Maximum Likelihood Estimation Richard Williams, University of otre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 3, 208 [This handout draws very heavily from Regression Models for Categorical

More information

Statistics & Flood Frequency Chapter 3. Dr. Philip B. Bedient

Statistics & Flood Frequency Chapter 3. Dr. Philip B. Bedient Statistics & Flood Frequency Chapter 3 Dr. Philip B. Bedient Predicting FLOODS Flood Frequency Analysis n Statistical Methods to evaluate probability exceeding a particular outcome - P (X >20,000 cfs)

More information

Computing Statistics ID1050 Quantitative & Qualitative Reasoning

Computing Statistics ID1050 Quantitative & Qualitative Reasoning Computing Statistics ID1050 Quantitative & Qualitative Reasoning Single-variable Statistics We will be considering six statistics of a data set Three measures of the middle Mean, median, and mode Two measures

More information

Chapter 7 1. Random Variables

Chapter 7 1. Random Variables Chapter 7 1 Random Variables random variable numerical variable whose value depends on the outcome of a chance experiment - discrete if its possible values are isolated points on a number line - continuous

More information

4.2 Probability Distributions

4.2 Probability Distributions 4.2 Probability Distributions Definition. A random variable is a variable whose value is a numerical outcome of a random phenomenon. The probability distribution of a random variable tells us what the

More information

Dot Plot: A graph for displaying a set of data. Each numerical value is represented by a dot placed above a horizontal number line.

Dot Plot: A graph for displaying a set of data. Each numerical value is represented by a dot placed above a horizontal number line. Introduction We continue our study of descriptive statistics with measures of dispersion, such as dot plots, stem and leaf displays, quartiles, percentiles, and box plots. Dot plots, a stem-and-leaf display,

More information

Sampling Distribution of and Simulation Methods. Ontario Public Sector Salaries. Strange Sample? Lecture 11. Reading: Sections

Sampling Distribution of and Simulation Methods. Ontario Public Sector Salaries. Strange Sample? Lecture 11. Reading: Sections Sampling Distribution of and Simulation Methods Lecture 11 Reading: Sections 1.3 1.5 1 Ontario Public Sector Salaries Public Sector Salary Disclosure Act, 1996 Requires organizations that receive public

More information

MATHEMATICS APPLIED TO BIOLOGICAL SCIENCES MVE PA 07. LP07 DESCRIPTIVE STATISTICS - Calculating of statistical indicators (1)

MATHEMATICS APPLIED TO BIOLOGICAL SCIENCES MVE PA 07. LP07 DESCRIPTIVE STATISTICS - Calculating of statistical indicators (1) LP07 DESCRIPTIVE STATISTICS - Calculating of statistical indicators (1) Descriptive statistics are ways of summarizing large sets of quantitative (numerical) information. The best way to reduce a set of

More information

1 Describing Distributions with numbers

1 Describing Distributions with numbers 1 Describing Distributions with numbers Only for quantitative variables!! 1.1 Describing the center of a data set The mean of a set of numerical observation is the familiar arithmetic average. To write

More information

Summarising Data. Summarising Data. Examples of Types of Data. Types of Data

Summarising Data. Summarising Data. Examples of Types of Data. Types of Data Summarising Data Summarising Data Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester Today we will consider Different types of data Appropriate ways to summarise these data 17/10/2017

More information

LESSON 7 INTERVAL ESTIMATION SAMIE L.S. LY

LESSON 7 INTERVAL ESTIMATION SAMIE L.S. LY LESSON 7 INTERVAL ESTIMATION SAMIE L.S. LY 1 THIS WEEK S PLAN Part I: Theory + Practice ( Interval Estimation ) Part II: Theory + Practice ( Interval Estimation ) z-based Confidence Intervals for a Population

More information

chapter 2-3 Normal Positive Skewness Negative Skewness

chapter 2-3 Normal Positive Skewness Negative Skewness chapter 2-3 Testing Normality Introduction In the previous chapters we discussed a variety of descriptive statistics which assume that the data are normally distributed. This chapter focuses upon testing

More information

Hypothesis Tests: One Sample Mean Cal State Northridge Ψ320 Andrew Ainsworth PhD

Hypothesis Tests: One Sample Mean Cal State Northridge Ψ320 Andrew Ainsworth PhD Hypothesis Tests: One Sample Mean Cal State Northridge Ψ320 Andrew Ainsworth PhD MAJOR POINTS Sampling distribution of the mean revisited Testing hypotheses: sigma known An example Testing hypotheses:

More information

Problem Set 6 ANSWERS

Problem Set 6 ANSWERS Economics 20 Part I. Problem Set 6 ANSWERS Prof. Patricia M. Anderson The first 5 questions are based on the following information: Suppose a researcher is interested in the effect of class attendance

More information

The data definition file provided by the authors is reproduced below: Obs: 1500 home sales in Stockton, CA from Oct 1, 1996 to Nov 30, 1998

The data definition file provided by the authors is reproduced below: Obs: 1500 home sales in Stockton, CA from Oct 1, 1996 to Nov 30, 1998 Economics 312 Sample Project Report Jeffrey Parker Introduction This project is based on Exercise 2.12 on page 81 of the Hill, Griffiths, and Lim text. It examines how the sale price of houses in Stockton,

More information

Lecture 12: The Bootstrap

Lecture 12: The Bootstrap Lecture 12: The Bootstrap Reading: Chapter 5 STATS 202: Data mining and analysis October 20, 2017 1 / 16 Announcements Midterm is on Monday, Oct 30 Topics: chapters 1-5 and 10 of the book everything until

More information

Describing Data: One Quantitative Variable

Describing Data: One Quantitative Variable STAT 250 Dr. Kari Lock Morgan The Big Picture Describing Data: One Quantitative Variable Population Sampling SECTIONS 2.2, 2.3 One quantitative variable (2.2, 2.3) Statistical Inference Sample Descriptive

More information

Cameron ECON 132 (Health Economics): FIRST MIDTERM EXAM (A) Fall 17

Cameron ECON 132 (Health Economics): FIRST MIDTERM EXAM (A) Fall 17 Cameron ECON 132 (Health Economics): FIRST MIDTERM EXAM (A) Fall 17 Answer all questions in the space provided on the exam. Total of 36 points (and worth 22.5% of final grade). Read each question carefully,

More information

Chapter 7 Sampling Distributions and Point Estimation of Parameters

Chapter 7 Sampling Distributions and Point Estimation of Parameters Chapter 7 Sampling Distributions and Point Estimation of Parameters Part 1: Sampling Distributions, the Central Limit Theorem, Point Estimation & Estimators Sections 7-1 to 7-2 1 / 25 Statistical Inferences

More information

Summary of Statistical Analysis Tools EDAD 5630

Summary of Statistical Analysis Tools EDAD 5630 Summary of Statistical Analysis Tools EDAD 5630 Test Name Program Used Purpose Steps Main Uses/Applications in Schools Principal Component Analysis SPSS Measure Underlying Constructs Reliability SPSS Measure

More information

Diploma in Business Administration Part 2. Quantitative Methods. Examiner s Suggested Answers

Diploma in Business Administration Part 2. Quantitative Methods. Examiner s Suggested Answers Cumulative frequency Diploma in Business Administration Part Quantitative Methods Examiner s Suggested Answers Question 1 Cumulative Frequency Curve 1 9 8 7 6 5 4 3 1 5 1 15 5 3 35 4 45 Weeks 1 (b) x f

More information

ECON 214 Elements of Statistics for Economists 2016/2017

ECON 214 Elements of Statistics for Economists 2016/2017 ECON 214 Elements of Statistics for Economists 2016/2017 Topic The Normal Distribution Lecturer: Dr. Bernardin Senadza, Dept. of Economics bsenadza@ug.edu.gh College of Education School of Continuing and

More information

Random Variables and Probability Distributions

Random Variables and Probability Distributions Chapter 3 Random Variables and Probability Distributions Chapter Three Random Variables and Probability Distributions 3. Introduction An event is defined as the possible outcome of an experiment. In engineering

More information

1 Exercise One. 1.1 Calculate the mean ROI. Note that the data is not grouped! Below you find the raw data in tabular form:

1 Exercise One. 1.1 Calculate the mean ROI. Note that the data is not grouped! Below you find the raw data in tabular form: 1 Exercise One Note that the data is not grouped! 1.1 Calculate the mean ROI Below you find the raw data in tabular form: Obs Data 1 18.5 2 18.6 3 17.4 4 12.2 5 19.7 6 5.6 7 7.7 8 9.8 9 19.9 10 9.9 11

More information

Analysis of 2x2 Cross-Over Designs using T-Tests for Non-Inferiority

Analysis of 2x2 Cross-Over Designs using T-Tests for Non-Inferiority Chapter 235 Analysis of 2x2 Cross-Over Designs using -ests for Non-Inferiority Introduction his procedure analyzes data from a two-treatment, two-period (2x2) cross-over design where the goal is to demonstrate

More information

Both the quizzes and exams are closed book. However, For quizzes: Formulas will be provided with quiz papers if there is any need.

Both the quizzes and exams are closed book. However, For quizzes: Formulas will be provided with quiz papers if there is any need. Both the quizzes and exams are closed book. However, For quizzes: Formulas will be provided with quiz papers if there is any need. For exams (MD1, MD2, and Final): You may bring one 8.5 by 11 sheet of

More information

Application of the Bootstrap Estimating a Population Mean

Application of the Bootstrap Estimating a Population Mean Application of the Bootstrap Estimating a Population Mean Movie Average Shot Lengths Sources: Barry Sands Average Shot Length Movie Database L. Chihara and T. Hesterberg (2011). Mathematical Statistics

More information

Description Quick start Menu Syntax Options Remarks and examples Acknowledgment Also see

Description Quick start Menu Syntax Options Remarks and examples Acknowledgment Also see Title stata.com collapse Make dataset of summary statistics Description Quick start Menu Syntax Options Remarks and examples Acknowledgment Also see Description collapse converts the dataset in memory

More information

Chapter 7. Random Variables

Chapter 7. Random Variables Chapter 7 Random Variables Making quantifiable meaning out of categorical data Toss three coins. What does the sample space consist of? HHH, HHT, HTH, HTT, TTT, TTH, THT, THH In statistics, we are most

More information

Lecture 2 Describing Data

Lecture 2 Describing Data Lecture 2 Describing Data Thais Paiva STA 111 - Summer 2013 Term II July 2, 2013 Lecture Plan 1 Types of data 2 Describing the data with plots 3 Summary statistics for central tendency and spread 4 Histograms

More information

(ii) Give the name of the California website used to find the various insurance plans offered under the Affordable care Act (Obamacare).

(ii) Give the name of the California website used to find the various insurance plans offered under the Affordable care Act (Obamacare). Cameron ECON 132 (Health Economics): FIRST MIDTERM EXAM (A) Winter 18 Answer all questions in the space provided on the exam. Total of 36 points (and worth 22.5% of final grade). Read each question carefully,

More information

Labor Force Participation and the Wage Gap Detailed Notes and Code Econometrics 113 Spring 2014

Labor Force Participation and the Wage Gap Detailed Notes and Code Econometrics 113 Spring 2014 Labor Force Participation and the Wage Gap Detailed Notes and Code Econometrics 113 Spring 2014 In class, Lecture 11, we used a new dataset to examine labor force participation and wages across groups.

More information

Descriptive Statistics (Devore Chapter One)

Descriptive Statistics (Devore Chapter One) Descriptive Statistics (Devore Chapter One) 1016-345-01 Probability and Statistics for Engineers Winter 2010-2011 Contents 0 Perspective 1 1 Pictorial and Tabular Descriptions of Data 2 1.1 Stem-and-Leaf

More information

Introduction to Descriptive Statistics

Introduction to Descriptive Statistics Introduction to Descriptive Statistics 17.871 Types of Variables ~Nominal (Quantitative) Nominal (Qualitative) categorical Ordinal Interval or ratio Describing data Moment Non-mean based measure Center

More information

Statistical Intervals (One sample) (Chs )

Statistical Intervals (One sample) (Chs ) 7 Statistical Intervals (One sample) (Chs 8.1-8.3) Confidence Intervals The CLT tells us that as the sample size n increases, the sample mean X is close to normally distributed with expected value µ and

More information

Statistics for Business and Economics

Statistics for Business and Economics Statistics for Business and Economics Chapter 7 Estimation: Single Population Copyright 010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 7-1 Confidence Intervals Contents of this chapter: Confidence

More information

CHAPTER 6. ' From the table the z value corresponding to this value Z = 1.96 or Z = 1.96 (d) P(Z >?) =

CHAPTER 6. ' From the table the z value corresponding to this value Z = 1.96 or Z = 1.96 (d) P(Z >?) = Solutions to End-of-Section and Chapter Review Problems 225 CHAPTER 6 6.1 (a) P(Z < 1.20) = 0.88493 P(Z > 1.25) = 1 0.89435 = 0.10565 P(1.25 < Z < 1.70) = 0.95543 0.89435 = 0.06108 (d) P(Z < 1.25) or Z

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 10 (MWF) Checking for normality of the data using the QQplot Suhasini Subba Rao Checking for

More information

GGraph. Males Only. Premium. Experience. GGraph. Gender. 1 0: R 2 Linear = : R 2 Linear = Page 1

GGraph. Males Only. Premium. Experience. GGraph. Gender. 1 0: R 2 Linear = : R 2 Linear = Page 1 GGraph 9 Gender : R Linear =.43 : R Linear =.769 8 7 6 5 4 3 5 5 Males Only GGraph Page R Linear =.43 R Loess 9 8 7 6 5 4 5 5 Explore Case Processing Summary Cases Valid Missing Total N Percent N Percent

More information

Random Effects ANOVA

Random Effects ANOVA Random Effects ANOVA Grant B. Morgan Baylor University This post contains code for conducting a random effects ANOVA. Make sure the following packages are installed: foreign, lme4, lsr, lattice. library(foreign)

More information

Maximum Likelihood Estimation Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 10, 2017

Maximum Likelihood Estimation Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 10, 2017 Maximum Likelihood Estimation Richard Williams, University of otre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 0, 207 [This handout draws very heavily from Regression Models for Categorical

More information

Numerical Descriptive Measures. Measures of Center: Mean and Median

Numerical Descriptive Measures. Measures of Center: Mean and Median Steve Sawin Statistics Numerical Descriptive Measures Having seen the shape of a distribution by looking at the histogram, the two most obvious questions to ask about the specific distribution is where

More information

Statistical Tables Compiled by Alan J. Terry

Statistical Tables Compiled by Alan J. Terry Statistical Tables Compiled by Alan J. Terry School of Science and Sport University of the West of Scotland Paisley, Scotland Contents Table 1: Cumulative binomial probabilities Page 1 Table 2: Cumulative

More information

Lecture Data Science

Lecture Data Science Web Science & Technologies University of Koblenz Landau, Germany Lecture Data Science Statistics Foundations JProf. Dr. Claudia Wagner Learning Goals How to describe sample data? What is mode/median/mean?

More information

Financial Econometrics Jeffrey R. Russell. Midterm 2014 Suggested Solutions. TA: B. B. Deng

Financial Econometrics Jeffrey R. Russell. Midterm 2014 Suggested Solutions. TA: B. B. Deng Financial Econometrics Jeffrey R. Russell Midterm 2014 Suggested Solutions TA: B. B. Deng Unless otherwise stated, e t is iid N(0,s 2 ) 1. (12 points) Consider the three series y1, y2, y3, and y4. Match

More information

The following content is provided under a Creative Commons license. Your support

The following content is provided under a Creative Commons license. Your support MITOCW Recitation 6 The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make

More information

Valid Missing Total. N Percent N Percent N Percent , ,0% 0,0% 2 100,0% 1, ,0% 0,0% 2 100,0% 2, ,0% 0,0% 5 100,0%

Valid Missing Total. N Percent N Percent N Percent , ,0% 0,0% 2 100,0% 1, ,0% 0,0% 2 100,0% 2, ,0% 0,0% 5 100,0% dimension1 GET FILE= validacaonestscoremédico.sav' (só com os 59 doentes) /COMPRESSED. SORT CASES BY UMcpEVA (D). EXAMINE VARIABLES=UMcpEVA BY NoRespostasSignif /PLOT BOXPLOT HISTOGRAM NPPLOT /COMPARE

More information

Web Appendix. Are the effects of monetary policy shocks big or small? Olivier Coibion

Web Appendix. Are the effects of monetary policy shocks big or small? Olivier Coibion Web Appendix Are the effects of monetary policy shocks big or small? Olivier Coibion Appendix 1: Description of the Model-Averaging Procedure This section describes the model-averaging procedure used in

More information

Description of Data I

Description of Data I Description of Data I (Summary and Variability measures) Objectives: Able to understand how to summarize the data Able to understand how to measure the variability of the data Able to use and interpret

More information

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 22 January :00 16:00

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 22 January :00 16:00 Two Hours MATH38191 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER STATISTICAL MODELLING IN FINANCE 22 January 2015 14:00 16:00 Answer ALL TWO questions

More information

DECISION SUPPORT Risk handout. Simulating Spreadsheet models

DECISION SUPPORT Risk handout. Simulating Spreadsheet models DECISION SUPPORT MODELS @ Risk handout Simulating Spreadsheet models using @RISK 1. Step 1 1.1. Open Excel and @RISK enabling any macros if prompted 1.2. There are four on-line help options available.

More information

Standard Deviation. Lecture 18 Section Robb T. Koether. Hampden-Sydney College. Mon, Sep 26, 2011

Standard Deviation. Lecture 18 Section Robb T. Koether. Hampden-Sydney College. Mon, Sep 26, 2011 Standard Deviation Lecture 18 Section 5.3.4 Robb T. Koether Hampden-Sydney College Mon, Sep 26, 2011 Robb T. Koether (Hampden-Sydney College) Standard Deviation Mon, Sep 26, 2011 1 / 42 Outline 1 Variability

More information

Business Statistics 41000: Probability 4

Business Statistics 41000: Probability 4 Business Statistics 41000: Probability 4 Drew D. Creal University of Chicago, Booth School of Business February 14 and 15, 2014 1 Class information Drew D. Creal Email: dcreal@chicagobooth.edu Office:

More information

Introduction to Computational Finance and Financial Econometrics Descriptive Statistics

Introduction to Computational Finance and Financial Econometrics Descriptive Statistics You can t see this text! Introduction to Computational Finance and Financial Econometrics Descriptive Statistics Eric Zivot Summer 2015 Eric Zivot (Copyright 2015) Descriptive Statistics 1 / 28 Outline

More information

Statistics 431 Spring 2007 P. Shaman. Preliminaries

Statistics 431 Spring 2007 P. Shaman. Preliminaries Statistics 4 Spring 007 P. Shaman The Binomial Distribution Preliminaries A binomial experiment is defined by the following conditions: A sequence of n trials is conducted, with each trial having two possible

More information

Simulation Lecture Notes and the Gentle Lentil Case

Simulation Lecture Notes and the Gentle Lentil Case Simulation Lecture Notes and the Gentle Lentil Case General Overview of the Case What is the decision problem presented in the case? What are the issues Sanjay must consider in deciding among the alternative

More information

Examples: Random Variables. Discrete and Continuous Random Variables. Probability Distributions

Examples: Random Variables. Discrete and Continuous Random Variables. Probability Distributions Random Variables Examples: Random variable a variable (typically represented by x) that takes a numerical value by chance. Number of boys in a randomly selected family with three children. Possible values:

More information

Fundamentals of Statistics

Fundamentals of Statistics CHAPTER 4 Fundamentals of Statistics Expected Outcomes Know the difference between a variable and an attribute. Perform mathematical calculations to the correct number of significant figures. Construct

More information

MATH 264 Problem Homework I

MATH 264 Problem Homework I MATH Problem Homework I Due to December 9, 00@:0 PROBLEMS & SOLUTIONS. A student answers a multiple-choice examination question that offers four possible answers. Suppose that the probability that the

More information

Chapter 7. Inferences about Population Variances

Chapter 7. Inferences about Population Variances Chapter 7. Inferences about Population Variances Introduction () The variability of a population s values is as important as the population mean. Hypothetical distribution of E. coli concentrations from

More information

The normal distribution is a theoretical model derived mathematically and not empirically.

The normal distribution is a theoretical model derived mathematically and not empirically. Sociology 541 The Normal Distribution Probability and An Introduction to Inferential Statistics Normal Approximation The normal distribution is a theoretical model derived mathematically and not empirically.

More information

AP STATISTICS FALL SEMESTSER FINAL EXAM STUDY GUIDE

AP STATISTICS FALL SEMESTSER FINAL EXAM STUDY GUIDE AP STATISTICS Name: FALL SEMESTSER FINAL EXAM STUDY GUIDE Period: *Go over Vocabulary Notecards! *This is not a comprehensive review you still should look over your past notes, homework/practice, Quizzes,

More information