Lab 9 Distributions and the Central Limit Theorem


Distributions: You will need to become familiar with at least five types of distributions in your introductory statistics study:

  - the Normal distribution, N(μ, σ), where μ is the mean and σ is the standard deviation
  - the Binomial distribution, B(n, p), where n is the number of trials and p is the probability of success on each trial
  - the Uniform distribution, U(a, b), where a is the minimum value and b is the maximum value
  - the Exponential distribution, exp(λ), where the density function is $f(x) = \lambda e^{-\lambda x}$
  - the Poisson distribution, pois(λ), where $P(k \text{ events in interval}) = \lambda^k e^{-\lambda} / k!$

We have already worked with the Normal distribution, and your textbook discusses the usefulness of the others.

Normal Distribution: This distribution, whose density function is shown below, is symmetric, bell-shaped, and completely described by two parameters, the mean μ and the standard deviation σ. It is a continuous distribution.

    # Distributions
    # normal N(12,3)
    curve(dnorm(x, mean=12, sd=3), xlim=c(2, 22), ylim=c(0, .15),
          ylab="density", main="N(12,3)")
    abline(v=12, lty=2)   # dashed vertical line at the mean

Binomial Distribution: The next distribution is the Binomial, where B(n, p) stands for the binomial with n independent trials, each of which has probability p of being a success. The Binomial is a discrete distribution.

    # binomial B(15,.2)
    heights <- dbinom(0:15, size=15, prob=.2)   # probability of each k = 0..15
    plot(0:15, heights, type="h", main="spike plot of binom(x)",
         xlab="k", ylab="p.m.f.")
    points(0:15, heights, pch=20, cex=1)        # dot on top of each spike

Uniform Distribution: The Uniform distribution, a continuous distribution, has the form U(a, b), where a is the minimum value and b is the maximum value of x.

    # uniform U(3,12)
    curve(dunif(x, min=3, max=12), xlim=c(0, 15), ylab="density", main="U(3,12)")

Poisson Distribution: The Poisson distribution is a discrete distribution with only one parameter, λ.

Below is pois(7).

    # Poisson pois(7)
    heights <- dpois(0:15, lambda=7)            # probability of each k = 0..15
    plot(0:15, heights, type="h", main="spike plot of pois(x)",
         xlab="k", ylab="p.m.f.")
    points(0:15, heights, pch=20, cex=1)        # dot on top of each spike

The Poisson can sometimes look like the Normal, except that it is discrete, whereas the Normal is continuous.

Exponential Distribution: The Exponential distribution (decay function) is a continuous distribution with only one parameter, λ, the decay parameter. It is strongly right-skewed. Below is exp(.35).

    # exponential exp(.35)
    curve(dexp(x, rate=.35), xlim=c(0, 25), ylab="density", main="exponential exp(.35)")
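The remark that the Poisson can resemble the Normal can be checked with a rough sketch. It relies on the fact that a Poisson distribution with parameter λ has mean λ and standard deviation sqrt(λ); the choice of plotting k from 0 to 20 and the name max_gap are assumptions made only for this illustration.

```r
# Sketch: compare pois(7) with a Normal that has the same mean and sd.
lambda <- 7
k <- 0:20
heights <- dpois(k, lambda=lambda)

# Spike plot of the Poisson probabilities, in the same style as the lab code
plot(k, heights, type="h", main="pois(7) with N(7, sqrt(7)) overlaid",
     xlab="k", ylab="probability")
points(k, heights, pch=20, cex=1)

# Overlay the Normal density with mean = lambda and sd = sqrt(lambda)
curve(dnorm(x, mean=lambda, sd=sqrt(lambda)), add=TRUE, lty=2)

# Largest gap between the Poisson probability and the Normal density
# evaluated at the same integer values of k
max_gap <- max(abs(heights - dnorm(k, mean=lambda, sd=sqrt(lambda))))
print(max_gap)
```

The dashed curve follows the tops of the spikes fairly closely, but the spikes exist only at whole numbers, which is the discrete versus continuous difference noted above.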

Notice that for all of the discrete distributions we had to use different R code from the usual curve() command. We used plot() and points() instead to make spike plots of the values, which distinguishes them from the continuous distributions.

Homework [1]: I made 6 variables in lab9.csv (labeled sample1 through sample6) from the list below, using rnorm(), runif(), rbinom(), rexp(), and rpois(). Each of these R commands was used to sample 20 values from its respective distribution. Make histograms of sample1 through sample6 and match them with their parent distributions in the table below. Use your detective skills and the techniques from previous labs and study to accomplish this task.

Homework [2]: Make a vector of 200 values sampled from the pois(7) distribution, find the mean of the sample, and compare it with the theoretical mean.

Homework [3]: Repeat [2] with a vector of 200 values sampled from the exp(.35) distribution.

Quantile-quantile plots: Previously we superimposed density plots of normal curves on our histograms to judge informally how close to normal the histogram distribution was. We have another way to compare a distribution with its normal counterpart, the QQ plot. See picture below.
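As a rough illustration of how a QQ plot is put together, the sketch below builds one by hand: the sorted sample values go on the y axis and the matching standard normal quantiles go on the x axis. The sample y, the seed, and the variable names are assumptions for illustration only, not the data behind the lab's figure.

```r
# Sketch: build a normal QQ plot by hand for a right-skewed sample.
set.seed(1)                        # arbitrary seed so the sketch is repeatable
y <- rexp(n=50, rate=.35)          # hypothetical skewed sample

# Probabilities at which to evaluate the quantiles, one per data point
p <- ppoints(length(y))

# Standard normal quantiles (x axis) and sorted sample values (y axis)
theoretical <- qnorm(p)
observed <- sort(y)

plot(theoretical, observed, xlab="normal quantiles", ylab="sample quantiles",
     main="hand-built QQ plot, exp(.35) sample")

# The coordinates that qqnorm() computes for the same sample, without plotting
built_in <- qqnorm(y, plot.it=FALSE)
```

Because the sample is skewed, the points curve away from a straight line instead of following one.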

In Figure 3.4 we have the 5th percentile values of our skewed distribution on the y axis and the 5th percentile values of its corresponding normal distribution on the x axis. The dots plotted are the respective 5th percentile pairs (normal, y distribution). The closer the tested distribution (on the y axis) is to normal, the more nearly the dots line up in a straight line. In our figure above, we see a large bow in the dots, indicating that the distribution on the y axis is not normal.

The graphs below show, from left to right, QQ plots which result from short symmetric, average symmetric, and long symmetric distributions on the y axis.

The QQ plots below are from y distributions which are, from left to right, short skew, regular skew, and long skew. Note that the short skew distribution is also discrete. Note also that right-skewed distributions tend to bow down (concave up, i.e., convex), and left-skewed distributions tend to bow up (concave down).

Below is a QQ plot, with the R code that produced it, for a 50-element sample from the N(10,3) distribution on the y axis.

    # qq plots
    vec1 <- rnorm(n=50, mean=10, sd=3)   # 50 values sampled from N(10,3)
    qqnorm(vec1)                         # sample quantiles vs. normal quantiles
    qqline(vec1)                         # reference line

Using the qqline() command, we also plotted the line that the points should follow if the distribution being tested (y axis) matches its companion normal (x axis).

Homework [4]: Take the 6 samples from lab9.csv and run qqnorm() plots, with the qqline() reference line drawn. Comment on your results.

Extra Information with this lab:

[1] I recommend creating and editing data in the EXCEL spreadsheet environment, saving the data as a .csv (comma delimited) or .txt (space delimited) file. You can also edit in the RStudio environment by typing and executing edit(data1) in the Console window, assuming the name of your data is data1. A spreadsheet view of the data will appear, where you can edit the data and then return to the program with the edited file. To save the result for later use, type save(data1, file="file1.rda") in your R workspace. To retrieve it later, type load("file1.rda").

[2] On scatter plots you can make at least 25 basic kinds of dots, using the pch= argument. See below for the types.
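A quick way to see the symbols for yourself is to plot each pch value once. The sketch below lays out pch = 1 through 25 on a 5 by 5 grid; the grid layout and the variable names are choices made for this illustration.

```r
# Sketch: draw plotting symbols pch = 1 through 25 on a 5 x 5 grid.
symbols_to_show <- 1:25
grid_x <- (symbols_to_show - 1) %% 5 + 1      # column 1..5
grid_y <- 5 - (symbols_to_show - 1) %/% 5     # row 5 (top) down to row 1

# bg= sets the fill color, which only affects the fillable symbols 21 to 25
plot(grid_x, grid_y, pch=symbols_to_show, cex=2, bg="gray",
     xlim=c(0.5, 5.5), ylim=c(0.5, 5.5), axes=FALSE, xlab="", ylab="",
     main="pch = 1 to 25")

# Label each symbol with its pch number, just below the symbol
text(grid_x, grid_y, labels=symbols_to_show, pos=1, offset=1)
```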

[3] You can use text() to type text within the plot and mtext() to type text within the margins. With text(), position the text using the (x, y) coordinates we have used in previous labs. With mtext(), choose the margin with side=1 (bottom), side=2 (left), side=3 (top), or side=4 (right), and add line=4, which places the text 4 (or however many) lines out from the plot on the bottom, left, top, or right, depending on what you used for the side= argument. Use col="red" or whatever color you want to color the text. Using cex=.7 (or whatever number, where the default is cex=1) adjusts the type size. Using adj=1 justifies text far right, adj=0 justifies far left, and a number between 0 and 1 justifies between the right and left. This is usually used to print text next to (left of, right of, etc.) points.

[4] ggplot2 information: Graphs are made in layers, using aesthetics (aes()), geoms (functions such as geom_point()), and various layers of other items. The reference web page shown below gives a much more detailed treatment of these ggplot2 items.
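The text() and mtext() options described in item [3] can be tried on a small plot. The following is only a sketch; the data, the label wording, and the particular coordinates are made up for illustration.

```r
# Sketch: annotate a plot with text() inside and mtext() in the margins.
x <- 1:10                 # hypothetical data for illustration
y <- x^2

plot(x, y, pch=20, main="text() and mtext() demo")

# text(): placed at (x, y) coordinates inside the plot.
# adj=0 left-justifies the label so that it starts at x = 2.
text(2, 80, "inside the plot", col="red", cex=.7, adj=0)

# mtext(): placed in the margin chosen by side=, line= lines out from the plot
mtext("bottom margin, line 2", side=1, line=2, cex=.7)
mtext("right margin, right-justified", side=4, line=0, adj=1, col="blue")
```

Changing adj= in the text() call to 1 makes the same label end at x = 2 instead of starting there.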

Below are some aesthetics used with geoms in ggplot2.

A generic example is shown below. Here mydata is the data set, the variables are the x and y variables from the data set, and the name given to the graph in this case is mygraph.

The next example shows how to color the graphs by the category gender.

Now, we add another layer.

Next, a generic histogram.
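A minimal sketch of the generic steps described above might look like the following. It assumes the ggplot2 package is installed; the data frame mydata and its columns height, weight, and gender are made up for illustration, and the fitted-line layer is just one possible choice of an added layer.

```r
library(ggplot2)

# Hypothetical data set for illustration only
set.seed(2)
mydata <- data.frame(
  height = rnorm(40, mean=170, sd=10),
  weight = rnorm(40, mean=70, sd=12),
  gender = rep(c("F", "M"), each=20)
)

# Generic scatter plot: data set, aesthetics, then a geom layer
mygraph <- ggplot(mydata, aes(x=height, y=weight)) + geom_point()

# Color the points by the category gender
mygraph_color <- ggplot(mydata, aes(x=height, y=weight, color=gender)) +
  geom_point()

# Add another layer: a straight-line fit for each gender
mygraph_layers <- mygraph_color + geom_smooth(method="lm", se=FALSE)

# Generic histogram of a single variable
myhist <- ggplot(mydata, aes(x=height)) + geom_histogram(bins=10)

print(mygraph_layers)
```

Each graph is stored under a name first and drawn only when it is printed, so layers can be added to a saved graph with + before it is displayed.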

In another lab we will continue with some actual examples of code used to make various graphs and give more features of the ggplot2 package.