Chapter 6 Part 3, October 21, 2008: Bootstrapping
From the internet: The bootstrap involves repeated re-estimation of a parameter using random samples with replacement from the original data. Because the sampling is with replacement, some items in the data set are selected two or more times and others are not selected at all. When this is repeated a hundred or a thousand times, we get pseudo-samples that behave similarly to the underlying distribution of the data.

Bootstrapping by Hand - Sampling with replacement

Original 5000 observations:

. sum(aftrig),det

[Stata output, Fast Triglycerides-BL Anti: detailed summary table with Percentiles, Smallest, Largest, Obs, Sum of Wgt., Mean, Std. Dev., Variance, Skewness and Kurtosis. Only the value 140 survives; the other values are not preserved.]

. return list

scalars:
    r(N) = 5000
    r(sum_w) = 5000
    r(mean) =
    r(var) =

[Figure: histogram of fasting triglycerides for the original dataset of 5000; y-axis: Frequency]

I have omitted the x-axis to make it easier for you to see the very small bars on the right-hand tail. You can see that we have a very skewed distribution: skewness = 2.4 (as opposed to 0 for a symmetric distribution).
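To make "sampling with replacement" concrete, here is a minimal Python sketch of drawing one bootstrap sample of size 10. It is only an illustration, not the Stata procedure used in this handout: the 20 data values are made up (they are not the AFTRIG data), and a Python seed of 50 will not reproduce the observations that Stata selects with `set seed 50`.

```python
import random

# Hypothetical stand-in data: 20 made-up values, NOT the AFTRIG data.
data = [float(v) for v in range(100, 120)]

# Fixed seed so the draw is reproducible (unrelated to Stata's seed 50).
rng = random.Random(50)

# Draw 10 observations WITH replacement: any value may be picked
# more than once, and many values are not picked at all.
sample = rng.choices(data, k=10)

sample_mean = sum(sample) / len(sample)
print("bootstrap sample:", sample)
print("mean of this sample:", sample_mean)
```

Running the sketch several times with different seeds plays the same role as the repeated `bsample 10` runs: each run gives a different sample and therefore a different estimate of the mean.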
Below we have bootstrapping by hand. I have selected 4 samples, each of size 10, and obtained the mean of each set of 10.

log: W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\Chapter6Part3.log
log type: text
opened on: 20 Oct 2008, 16:32:28

. *dofile used is sample4setsof10.do
. use "W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\samplingAHT_5000.dta"
. sum(aftrig)
[summary of AFTRIG; values not preserved]
. sort ID
. set seed 50
. bsample 10
. list ID AFTRIG
[listing of ID and AFTRIG for the 10 sampled observations; values not preserved]
. sum(aftrig)
[summary of AFTRIG for this sample; values not preserved]
. clear
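. use "W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\samplingAHT_5000.dta"
. sort ID
. set seed 51
. bsample 10
. list ID AFTRIG
[listing of ID and AFTRIG for the 10 sampled observations; values not preserved]
. sum(aftrig)
[summary of AFTRIG for this sample; values not preserved]
. clear

. use "W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\samplingAHT_5000.dta"
. sort ID
. set seed 52
. bsample 10
. list ID AFTRIG
[listing of ID and AFTRIG for the 10 sampled observations; values not preserved]
. sum(aftrig)
[summary of AFTRIG for this sample; values not preserved]
. clear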
. use "W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\samplingAHT_5000.dta"
. sort ID
. set seed 53
. bsample 10
. list ID AFTRIG
[listing of ID and AFTRIG for the 10 sampled observations; values not preserved]
. sum(aftrig)
[summary of AFTRIG for this sample; values not preserved]
. log close

log: W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\Chapter6Part3.log
log type: text
closed on: 20 Oct 2008, 16:32:28

So from each of the 4 sets of 10 observations we obtained an estimate of the mean value of the population of 5000 fasting triglycerides. If we had selected 1000 samples of size 10, we would have 1000 estimates of the mean of fasting triglycerides (one for each sample). You could then create a histogram of the 1000 means. This collection of 1000 means is called the sampling distribution of means. If, for each of the 1000 samples, we had asked for the variance instead of the mean, then we would have the sampling distribution of variances, and we could obtain the mean of the 1000 variances.

Below is how you get the means for the sampling distribution of means using a single command rather than a separate command for each sample. We are assuming that the dataset samplingaht_5000.dta represents a population of people. That is, instead of treating it like the sample that it is, we are going to treat it as though it is a population.

Results from dofile:
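The idea of building a sampling distribution of means from many resamples can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the handout's Stata code: the "population" is 5000 simulated right-skewed values (not the AFTRIG data), and the function name `bootstrap_means` and its parameters are hypothetical.

```python
import random
import statistics


def bootstrap_means(data, reps, size, seed):
    """Return `reps` means, each computed from `size` draws with replacement."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(data, k=size)) for _ in range(reps)]


# Hypothetical right-skewed "population" of 5000 values (simulated,
# NOT the AFTRIG data from samplingAHT_5000.dta).
pop_rng = random.Random(0)
population = [pop_rng.lognormvariate(5.0, 0.5) for _ in range(5000)]

# 1000 samples of size 10 give 1000 estimates of the population mean.
means = bootstrap_means(population, reps=1000, size=10, seed=50)

print("population mean:       ", round(statistics.fmean(population), 1))
print("mean of the 1000 means:", round(statistics.fmean(means), 1))
print("smallest and largest sample mean:",
      round(min(means), 1), round(max(means), 1))
```

Replacing `statistics.fmean` inside the function with `statistics.variance` would give the corresponding collection of sample variances instead of sample means.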
. do "W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\sample4setsof10singlecommand.do"
. clear
. log using "W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\Chapter6Part3No2.log"

log: W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\Chapter6Part3No2.log
log type: text
opened on: 20 Oct 2008, 18:28:17

. *dofile used is sample4setsof10singlecommand.do
. use "W:\WP51\Biometry\AAAABiostat1725_Fall2008\Handouts\Chapter 6\data\samplingAHT_5000.dta"
. sort ID
. set seed 50
. bs TGmeans = r(mean) TGvariances = r(var), reps(4) size(10) noisily saving(tg_r4_s10) :summarize AFTRIG

bootstrap: First call to summarize with data as is:

This first summarize is for all 5000 people.
[summary of AFTRIG; values not preserved]

Warning: Since summarize is not an estimation command or does not set e(sample), bootstrap has no way to determine which observations are used in calculating the statistics and so assumes that all observations are used. This means no observations will be excluded from the resampling because of missing values or other reasons. If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded. Be sure that the dataset in memory contains only the relevant data.

Bootstrap replications (4)

This is the mean of the first sample of size 10.
[summary of AFTRIG; values not preserved]

This is the mean of the second sample of size 10.
[summary of AFTRIG; values not preserved]

This is the mean of the third sample of size 10.
[summary of AFTRIG; values not preserved]

This is the mean of the fourth sample of size 10.
[summary of AFTRIG; values not preserved]

Bootstrap results
Number of obs = 5000
Replications = 4
command: summarize AFTRIG
TGmeans: r(mean)
TGvariances: r(var)

Columns: Observed Coef. | Bootstrap Std. Err. | z | P>|z| | Normal-based [95% Conf. Interval]
TGmeans      [values not preserved]
TGvariances  [values not preserved]

. clear

The observed coefficients above are the mean (169.1) and variance ( ) of AFTRIG with all 5000 participants. As part of the bootstrapping routine we asked Stata to save the mean and variance for each of the 4 samples of size 10 in a data set that we called TG_R4_S10.dta. Notice below that the data set has only 4 observations because we asked for only 4 replications.

. use TG_R4_S10.dta
(bootstrap: summarize)
. des

Contains data from TG_R4_S10.dta
obs: 4    bootstrap: summarize
vars: 2    20 Oct :28
size: 48 (99.9% of memory free)

variable name | storage type | display format | value label | variable label
TGmeans       | float        | %9.0g          |             | r(mean)
TGvariances   | float        | %9.0g          |             | r(var)

Sorted by:

. list TGmeans TGvariances
[listing of TGmeans and TGvari~s for the 4 samples; values not preserved]

Notice that only the first of these means is the same as the corresponding mean obtained by hand, because that is the only case in which the by-hand version and the single-command version use the same seed (50).

Notice below that the mean of the 4 sample means is not very close to 169.1, the mean of the population. We need to select more than 4 samples, and samples larger than 10, to get a good estimate of the population mean.

. sum(tgmeans),det
[detailed summary of r(mean): Obs = 4, Sum of Wgt. = 4; percentiles, Mean, Std. Dev., Variance, Skewness and Kurtosis values not preserved]

end of do-file

Below is a data set of 1000 samples of size 100 which we obtained from the original dataset of 5000.
Notice below that the mean of our 1000 samples of size 100 is 168.6. Now we are getting closer to 169.1, the mean of the original distribution of size 5000.

. clear
. use "W:\WP51\Biometry\AAAABiostatFall2008\Handouts\Chapter 6\Data\samplingAHT_5000.dta", clear
. log using W:\WP51\Biometry\AAAABiostatFall2008\Handouts\Chapter 6\Data\classbootstrap.log

log: classbootstrap.log
log type: text
opened on: 20 Oct 2008, 22:31:13

. set more off
. sort ID
. set seed 50
. bs TGmeans = r(mean) TGvariances = r(var), reps(1000) size(100) saving(tg_r1000_s100):summarize AFTRIG
(running summarize on estimation sample)

Warning: Since summarize is not an estimation command or does not set e(sample), bootstrap has no way to determine which observations are used in calculating the statistics and so assumes that all observations are used. This means no observations will be excluded from the resampling because of missing values or other reasons. If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded. Be sure that the dataset in memory contains only the relevant data.

Bootstrap replications (1000)

Bootstrap results
Number of obs = 5000
Replications = 1000
command: summarize AFTRIG
TGmeans: r(mean)
TGvariances: r(var)

Columns: Observed Coef. | Bootstrap Std. Err. | z | P>|z| | Normal-based [95% Conf. Interval]
TGmeans      [values not preserved]
TGvariances  [values not preserved]

. clear

Note that the observed coefficient above gives the mean and variance of the original data set of 5000.
. use "W:\WP51\Biometry\AAAABiostatFall2008\Handouts\Chapter 6\Data\TG_R1000_S100.dta"
(bootstrap: summarize)

. des

Contains data from TG_R1000_S100.dta
obs: 1,000    bootstrap: summarize
vars: 2    11 Oct :01
size: 12,000 (99.9% of memory free)

variable name | storage type | display format | value label | variable label
TGmeans       | float        | %9.0g          |             | r(mean)
TGvariances   | float        | %9.0g          |             | r(var)

Sorted by:

. sum(tgmeans),det
[detailed summary of r(mean); percentiles, Obs, Mean, Std. Dev., Variance, Skewness and Kurtosis values not preserved]

Notice that we have a better estimate (168.6) of the population mean.

. list TGmeans TGvariances in 1/
[listing of TGmeans and TGvari~s; values not preserved]

The graph below is a histogram of the 1000 means we got above. Notice that this histogram looks rather symmetric and not at all like the very skewed histogram of the original variable AFTRIG.
[Figure: histogram titled "Bootstrapping using samplingaht_5000.dta, seed = 50, reps = 1000 and size = 100"; x-axis: r(mean), y-axis: Frequency]

Now let us get more samples, and larger samples, than we have before. Below we have the output and histogram of 5000 samples of size 3000 selected from the original distribution of AFTRIG.

. use "W:\WP51\Biometry\AAAABiostatFall2008\Handouts\Chapter 6\Data\samplingAHT_5000.dta",clear
. set more off
. sort ID
. set seed 50
. bs TGmeans = r(mean) TGvariances = r(var), reps(5000) size(3000) saving(tg_r5000_s3000):summarize AFTRIG

bootstrap: First call to summarize with data as is:
[summary of AFTRIG; values not preserved]

Warning: Since summarize is not an estimation command or does not set e(sample), bootstrap has no way to determine which observations are used in calculating the statistics and so assumes that all observations are used. This means no observations will be excluded from the resampling because of missing values or other reasons. If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded. Be sure that the dataset in memory contains only the relevant data.

Bootstrap replications (5000)
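Below is a partial list of the sample means.

Sample 1: [summary of AFTRIG; values not preserved]
Sample 2: [summary of AFTRIG; values not preserved]
Sample 3: [summary of AFTRIG; values not preserved]
Sample 4: [summary of AFTRIG; values not preserved]
Sample 5: [summary of AFTRIG; values not preserved]
etc.

Let us look at the results of all 5000 samples of size 3000.

. use TG_R5000_S3000.dta
(bootstrap: summarize)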
. des

Contains data from TG_R5000_S3000.dta
obs: 5,000    bootstrap: summarize
vars: 2    20 Oct :11
size: 60,000 (99.8% of memory free)

variable name | storage type | display format | value label | variable label
TGmeans       | float        | %9.0g          |             | r(mean)
TGvariances   | float        | %9.0g          |             | r(var)

Sorted by:

Below is the mean of the 5000 means.

. sum(tgmeans),det
[detailed summary of r(mean); percentiles, Obs, Mean, Std. Dev., Variance, Skewness and Kurtosis values not preserved]

The Mean in this output is a pretty good estimate of the population mean.

. list TGmeans TGvariances in 1/5
[listing of TGmeans and TGvari~s for the first 5 samples; values not preserved]

Notice that these 5 means match up with the means of the 5 samples listed above. The variances are the SDs squared from the samples above.
[Figure: histogram with a superimposed normal curve, titled "Bootstrapping using samplingaht_5000.dta, seed = 50, reps = 5000 and size = 3000"; x-axis: r(mean), y-axis: Frequency]

The normal curve that I have superimposed on the histogram has the same mean ( ) as the distribution of the 5000 AFTRIG means. The distribution of the 5000 AFTRIG means has kurtosis 3.03 (that is pretty close to 3) and skewness 0.08 (which is pretty close to 0). So the distribution matches up pretty well with a normal distribution.

You will see that the table below now matches up with the Stata runs above. The n in the table below is the size of the samples.
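[Table: summary of the bootstrap runs by number of repetitions and sample size n, with columns for the means, SD, σ/√n, Min, Max, skewness and kurtosis; values not preserved]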
There are a number of things to notice in the table above.

1. As the number of repetitions and the sample size get larger, the values in the column labeled means get closer to 169.1 (the mean of the 5000 AFTRIG values). This is Fact 1:

$$\mu_{\bar{X}} = \mu_X$$

2. As the number of repetitions and the sample size get larger, the values in the column labeled SD (i.e. the standard deviation of the distribution of means) begin to look like the column labeled $\sigma/\sqrt{n}$. The $n$ is the size of the samples selected. This is Fact 2:

$$\operatorname{Var}(\bar{X}) = \sigma^2_{\bar{X}} = \frac{\operatorname{Var}(X)}{n} = \frac{\sigma^2_X}{n}$$

or, taking the square root of each of the terms above, we get

$$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$$

The SD of the sampling distribution of means is the SEM.

3. As the number of repetitions and the sample size get larger, the values in the column labeled Min get larger and those in the column labeled Max get smaller.

4. As the number of repetitions and the sample size get larger, the values in the column labeled skewness get closer to zero and the values in the column labeled kurtosis get closer to 3.
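The four observations can be checked numerically with a small simulation. The Python sketch below is an illustration under stated assumptions rather than a reproduction of the Stata runs: the population is 5000 simulated right-skewed values (not the AFTRIG data), the helper name `moments` is hypothetical, 2000 repetitions are used for every sample size, and skewness and kurtosis are computed with simple moment formulas that may differ slightly from Stata's.

```python
import math
import random
import statistics


def moments(xs):
    """Return (mean, SD, skewness, kurtosis) using population formulas."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    skew = statistics.fmean([((x - m) / sd) ** 3 for x in xs])
    kurt = statistics.fmean([((x - m) / sd) ** 4 for x in xs])
    return m, sd, skew, kurt


rng = random.Random(50)

# Hypothetical right-skewed population of 5000 values (NOT the AFTRIG data).
population = [rng.lognormvariate(5.0, 0.5) for _ in range(5000)]
mu, sigma, pop_skew, pop_kurt = moments(population)
print(f"population: mean={mu:.2f} SD={sigma:.2f} "
      f"skew={pop_skew:.2f} kurt={pop_kurt:.2f}")

results = {}
for n in (10, 100, 1000):
    # 2000 resamples of size n, drawn with replacement; keep each mean.
    means = [statistics.fmean(rng.choices(population, k=n))
             for _ in range(2000)]
    results[n] = moments(means) + (min(means), max(means))
    m, sd, skew, kurt, lo, hi = results[n]
    print(f"n={n:5d} mean={m:7.2f} SD={sd:6.2f} "
          f"sigma/sqrt(n)={sigma / math.sqrt(n):6.2f} "
          f"min={lo:7.2f} max={hi:7.2f} skew={skew:5.2f} kurt={kurt:5.2f}")
```

Each printed row plays the role of one row of the table: the mean column should stay near the population mean, the SD column should track sigma/sqrt(n), the Min and Max should close in on the mean, and skewness and kurtosis should drift toward 0 and 3 as n grows.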
More informationData Analysis and Statistical Methods Statistics 651
Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 10 (MWF) Checking for normality of the data using the QQplot Suhasini Subba Rao Checking for
More informationGGraph. Males Only. Premium. Experience. GGraph. Gender. 1 0: R 2 Linear = : R 2 Linear = Page 1
GGraph 9 Gender : R Linear =.43 : R Linear =.769 8 7 6 5 4 3 5 5 Males Only GGraph Page R Linear =.43 R Loess 9 8 7 6 5 4 5 5 Explore Case Processing Summary Cases Valid Missing Total N Percent N Percent
More informationRandom Effects ANOVA
Random Effects ANOVA Grant B. Morgan Baylor University This post contains code for conducting a random effects ANOVA. Make sure the following packages are installed: foreign, lme4, lsr, lattice. library(foreign)
More informationMaximum Likelihood Estimation Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 10, 2017
Maximum Likelihood Estimation Richard Williams, University of otre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 0, 207 [This handout draws very heavily from Regression Models for Categorical
More informationNumerical Descriptive Measures. Measures of Center: Mean and Median
Steve Sawin Statistics Numerical Descriptive Measures Having seen the shape of a distribution by looking at the histogram, the two most obvious questions to ask about the specific distribution is where
More informationStatistical Tables Compiled by Alan J. Terry
Statistical Tables Compiled by Alan J. Terry School of Science and Sport University of the West of Scotland Paisley, Scotland Contents Table 1: Cumulative binomial probabilities Page 1 Table 2: Cumulative
More informationLecture Data Science
Web Science & Technologies University of Koblenz Landau, Germany Lecture Data Science Statistics Foundations JProf. Dr. Claudia Wagner Learning Goals How to describe sample data? What is mode/median/mean?
More informationFinancial Econometrics Jeffrey R. Russell. Midterm 2014 Suggested Solutions. TA: B. B. Deng
Financial Econometrics Jeffrey R. Russell Midterm 2014 Suggested Solutions TA: B. B. Deng Unless otherwise stated, e t is iid N(0,s 2 ) 1. (12 points) Consider the three series y1, y2, y3, and y4. Match
More informationThe following content is provided under a Creative Commons license. Your support
MITOCW Recitation 6 The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make
More informationValid Missing Total. N Percent N Percent N Percent , ,0% 0,0% 2 100,0% 1, ,0% 0,0% 2 100,0% 2, ,0% 0,0% 5 100,0%
dimension1 GET FILE= validacaonestscoremédico.sav' (só com os 59 doentes) /COMPRESSED. SORT CASES BY UMcpEVA (D). EXAMINE VARIABLES=UMcpEVA BY NoRespostasSignif /PLOT BOXPLOT HISTOGRAM NPPLOT /COMPARE
More informationWeb Appendix. Are the effects of monetary policy shocks big or small? Olivier Coibion
Web Appendix Are the effects of monetary policy shocks big or small? Olivier Coibion Appendix 1: Description of the Model-Averaging Procedure This section describes the model-averaging procedure used in
More informationDescription of Data I
Description of Data I (Summary and Variability measures) Objectives: Able to understand how to summarize the data Able to understand how to measure the variability of the data Able to use and interpret
More informationTwo Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 22 January :00 16:00
Two Hours MATH38191 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER STATISTICAL MODELLING IN FINANCE 22 January 2015 14:00 16:00 Answer ALL TWO questions
More informationDECISION SUPPORT Risk handout. Simulating Spreadsheet models
DECISION SUPPORT MODELS @ Risk handout Simulating Spreadsheet models using @RISK 1. Step 1 1.1. Open Excel and @RISK enabling any macros if prompted 1.2. There are four on-line help options available.
More informationStandard Deviation. Lecture 18 Section Robb T. Koether. Hampden-Sydney College. Mon, Sep 26, 2011
Standard Deviation Lecture 18 Section 5.3.4 Robb T. Koether Hampden-Sydney College Mon, Sep 26, 2011 Robb T. Koether (Hampden-Sydney College) Standard Deviation Mon, Sep 26, 2011 1 / 42 Outline 1 Variability
More informationBusiness Statistics 41000: Probability 4
Business Statistics 41000: Probability 4 Drew D. Creal University of Chicago, Booth School of Business February 14 and 15, 2014 1 Class information Drew D. Creal Email: dcreal@chicagobooth.edu Office:
More informationIntroduction to Computational Finance and Financial Econometrics Descriptive Statistics
You can t see this text! Introduction to Computational Finance and Financial Econometrics Descriptive Statistics Eric Zivot Summer 2015 Eric Zivot (Copyright 2015) Descriptive Statistics 1 / 28 Outline
More informationStatistics 431 Spring 2007 P. Shaman. Preliminaries
Statistics 4 Spring 007 P. Shaman The Binomial Distribution Preliminaries A binomial experiment is defined by the following conditions: A sequence of n trials is conducted, with each trial having two possible
More informationSimulation Lecture Notes and the Gentle Lentil Case
Simulation Lecture Notes and the Gentle Lentil Case General Overview of the Case What is the decision problem presented in the case? What are the issues Sanjay must consider in deciding among the alternative
More informationExamples: Random Variables. Discrete and Continuous Random Variables. Probability Distributions
Random Variables Examples: Random variable a variable (typically represented by x) that takes a numerical value by chance. Number of boys in a randomly selected family with three children. Possible values:
More informationFundamentals of Statistics
CHAPTER 4 Fundamentals of Statistics Expected Outcomes Know the difference between a variable and an attribute. Perform mathematical calculations to the correct number of significant figures. Construct
More informationMATH 264 Problem Homework I
MATH Problem Homework I Due to December 9, 00@:0 PROBLEMS & SOLUTIONS. A student answers a multiple-choice examination question that offers four possible answers. Suppose that the probability that the
More informationChapter 7. Inferences about Population Variances
Chapter 7. Inferences about Population Variances Introduction () The variability of a population s values is as important as the population mean. Hypothetical distribution of E. coli concentrations from
More informationThe normal distribution is a theoretical model derived mathematically and not empirically.
Sociology 541 The Normal Distribution Probability and An Introduction to Inferential Statistics Normal Approximation The normal distribution is a theoretical model derived mathematically and not empirically.
More informationAP STATISTICS FALL SEMESTSER FINAL EXAM STUDY GUIDE
AP STATISTICS Name: FALL SEMESTSER FINAL EXAM STUDY GUIDE Period: *Go over Vocabulary Notecards! *This is not a comprehensive review you still should look over your past notes, homework/practice, Quizzes,
More information