Introduction to Statistical Data Analysis II

Introduction to Statistical Data Analysis II JULY 2011 Afsaneh Yazdani

Preface Major branches of Statistics: - Descriptive Statistics - Inferential Statistics

Preface What is Inferential Statistics? Making statements about a population based on information contained in a sample drawn from that population. Because a sample represents the population only imperfectly, uncertainty is always present, and we need to assess how accurately the sample represents the population.

Preface Probability is the: Language of uncertainty Tool for making inferences

Probability Probability Definitions: Classical Interpretation: Each possible distinct result is called an outcome; an event is identified as a collection of outcomes. When all outcomes are equally likely, the probability of an event E is: $\Pr(E) = N_e / N$, where $N_e$ is the number of outcomes favorable to event E and $N$ is the total number of possible outcomes.

Probability Probability Definitions: Relative Frequency Interpretation: An empirical approach to probability; if an experiment is conducted n different times and event E occurs on $n_e$ of these trials, then the probability of event E is approximately: $\Pr(E) \approx n_e / n$, where n is a very large number of observations or repetitions.
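The relative frequency interpretation can be illustrated with a short simulation. This is a minimal sketch, assuming a fair six-sided die and the event "the die shows 6"; the function name `relative_frequency` and the fixed seed are illustrative choices, not part of the original slides.

```python
import random


def relative_frequency(n_trials, seed=0):
    """Estimate Pr(die shows 6) as n_e / n from simulated rolls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # n_e: number of trials on which the event occurred
    n_e = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 6)
    return n_e / n_trials


# The estimate should settle near 1/6 as the number of trials grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

With few trials the estimate can be far from 1/6, which is why the definition asks for a very large number of repetitions.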

Probability Probability Definitions: Subjective Interpretation: Subjective or personal probabilities express an individual's degree of belief; the problem is that they can vary from person to person and cannot be checked.

Probability Basic Event Relations and Probability Laws: The probability of an event, say event A, will always satisfy the property: $0 \le P(A) \le 1$, where $P(A) = 0$ corresponds to an impossible event and $P(A) = 1$ to a sure event.

Probability Basic Event Relations and Probability Laws: Two events A and B are said to be mutually exclusive if the occurrence of one of the events excludes the possibility of the occurrence of the other event. For mutually exclusive events: P(either A or B) = P(A) + P(B)

Probability Basic Event Relations and Probability Laws: The complement of an event A is the event that A does not occur. The complement of A is denoted by the symbol $\bar{A}$: $P(A) + P(\bar{A}) = 1$

Probability Basic Event Relations and Probability Laws: The union of two events A and B is the set of all outcomes that are included in either A or B (or both). The union is denoted as $A \cup B$.

Probability Basic Event Relations and Probability Laws: The intersection of two events A and B is the set of all outcomes that are included in both A and B. The intersection is denoted as $A \cap B$.

Probability Basic Event Relations and Probability Laws: Consider two events A and B; the probability of the union of A and B is: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
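The union law can be checked by counting outcomes under the classical interpretation. The sketch below assumes one roll of a fair die; the events `A` (even number) and `B` (greater than 3) and the helper `pr` are hypothetical examples chosen for illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # one roll of a fair die
A = {2, 4, 6}                # event: an even number
B = {4, 5, 6}                # event: a number greater than 3


def pr(event):
    """Classical probability: favorable outcomes over total outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))


lhs = pr(A | B)                  # P(A union B), counted directly
rhs = pr(A) + pr(B) - pr(A & B)  # P(A) + P(B) - P(A intersect B)
print(lhs, rhs)

# Complement law: P(A) + P(not A) = 1
print(pr(A) + pr(outcomes - A))
```

Subtracting $P(A \cap B)$ corrects for the outcomes 4 and 6, which would otherwise be counted twice.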

Probability Conditional Probability and Independence: Consider two events A and B with nonzero probabilities, P(A) and P(B). The conditional probability of event A given event B is: $P(A \mid B) = P(A \cap B) / P(B)$

Probability Conditional Probability and Independence: The Multiplication Law implies that the probability of the intersection of two events A and B is: $P(A \cap B) = P(A \mid B)\,P(B)$

Probability Conditional Probability and Independence: Two events A and B are independent if: $P(A \mid B) = P(A)$, or equivalently $P(A \cap B) = P(A)\,P(B)$
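Conditional probability and the independence check can be sketched the same way. This example again assumes one roll of a fair die; the events `A`, `B`, `C` and the helper names are hypothetical and only serve the illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # one roll of a fair die


def pr(event):
    """Classical probability of an event (a subset of outcomes)."""
    return Fraction(len(event), len(outcomes))


def pr_given(a, b):
    """Conditional probability P(a | b) = P(a and b) / P(b)."""
    return pr(a & b) / pr(b)


def independent(a, b):
    """True when P(a and b) equals P(a) * P(b)."""
    return pr(a & b) == pr(a) * pr(b)


A = {2, 4, 6}     # even number
B = {1, 2, 3, 4}  # at most 4
C = {4, 5, 6}     # greater than 3

print(pr_given(A, B), pr(A), independent(A, B))  # knowing B does not change P(A)
print(pr_given(A, C), pr(A), independent(A, C))  # knowing C does change P(A)
```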

Probability Random variable: The quantitative variable Y is called a random variable when the value that Y assumes in a given experiment is a chance or random outcome.

Random Variable Discrete Random Variable: When observations on a quantitative random variable can assume only a countable number of values.

Random Variable Continuous Random Variable: When observations on a quantitative random variable can assume any one of the uncountable number of values in a line interval.

We have drawn a sample from a population.
- We need to make an inference about the population.
- We need to know the probability of observing a particular sample outcome.
- We need to know the probability associated with each value of the variable Y.
- We need to know the probability distribution of the variable Y.

Probability Distributions Discrete Random Variables The Binomial A binomial experiment has the following properties:
- The experiment consists of n identical trials.
- Each trial results in one of two outcomes (a success or a failure).
- The probability of success on a single trial is equal to π, and π remains the same from trial to trial.
- The trials are independent; that is, the outcome of one trial does not influence the outcome of any other trial.
- The random variable Y is the number of successes observed during the n trials.

Probability Distributions Discrete Random Variables The Binomial The probability of observing y successes in n trials of a binomial experiment is: $\Pr(Y = y) = \frac{n!}{y!\,(n-y)!}\,\pi^{y}(1-\pi)^{n-y}$, where π is the probability of success. The mean and standard deviation of Y are $\mu = n\pi$ and $\sigma = \sqrt{n\pi(1-\pi)}$.
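The binomial formula and its mean and standard deviation can be verified numerically. This is a small sketch using only the standard library; the values `n = 10` and `pi = 0.3` are arbitrary illustrative choices.

```python
import math


def binomial_pmf(y, n, pi):
    """Pr(Y = y) for a binomial experiment with n trials and success probability pi."""
    return math.comb(n, y) * pi**y * (1 - pi) ** (n - y)


n, pi = 10, 0.3  # illustrative values, not from the slides
pmf = [binomial_pmf(y, n, pi) for y in range(n + 1)]

# Mean and standard deviation computed directly from the distribution
mean = sum(y * p for y, p in enumerate(pmf))
sd = math.sqrt(sum((y - mean) ** 2 * p for y, p in enumerate(pmf)))

print(sum(pmf))                              # probabilities sum to 1 (up to rounding)
print(mean, n * pi)                          # compare with mu = n*pi
print(sd, math.sqrt(n * pi * (1 - pi)))      # compare with sigma = sqrt(n*pi*(1-pi))
```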

Probability Distributions Discrete Random Variables The Poisson Applicable for modeling the number of events of a particular type occurring over a unit of time or space.

Probability Distributions Discrete Random Variables The Poisson Let Y be the number of events occurring during a fixed time interval of length t. Then the probability distribution of Y is Poisson, provided the following conditions hold:
- Events occur one at a time; two or more events do not occur at precisely the same time.
- The occurrence (or nonoccurrence) of an event during one period does not affect the probability of an event occurring at some other time.
- The expected number of events during one period is the same as the expected number of events in any other period.

Probability Distributions Discrete Random Variables The Poisson Let Y be the number of events occurring during a fixed time interval of length t. Then: $\Pr(Y = y) = \frac{\lambda^{y} e^{-\lambda}}{y!}$, where λ is the expected number of events in the interval. The mean and variance of Y are both equal to λ: $\mu = \sigma^{2} = \lambda$.

Probability Distributions Discrete Random Variables The Binomial & The Poisson When n is large and π is small in a binomial experiment, the Poisson distribution (with $\lambda = n\pi$) provides a good approximation to the binomial distribution.
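The quality of the Poisson approximation can be inspected by comparing the two probability functions side by side. The following is a minimal sketch; the values `n = 1000` and `pi = 0.002` are assumptions chosen to represent "large n, small π", and `lam` stands for the Poisson parameter $n\pi$.

```python
import math


def poisson_pmf(y, lam):
    """Pr(Y = y) for a Poisson distribution with expected count lam."""
    return lam**y * math.exp(-lam) / math.factorial(y)


def binomial_pmf(y, n, pi):
    """Pr(Y = y) for a binomial experiment with n trials and success probability pi."""
    return math.comb(n, y) * pi**y * (1 - pi) ** (n - y)


n, pi = 1000, 0.002  # illustrative: large n, small pi
lam = n * pi         # Poisson parameter matched to the binomial mean

for y in range(6):
    print(y, round(binomial_pmf(y, n, pi), 5), round(poisson_pmf(y, lam), 5))

# Largest pointwise difference over the first few counts
max_gap = max(abs(binomial_pmf(y, n, pi) - poisson_pmf(y, lam)) for y in range(11))
print(max_gap)
```

Repeating the comparison with a larger π (for example 0.3) shows the gap growing, which is why the approximation is recommended only when π is small.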

Probability Distributions Continuous Random Variables The Normal The normal distribution (which has a smooth bell-shaped curve, symmetrical about the mean μ) plays an important role in statistical inference. $f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}$

Probability Distributions Continuous Random Variables The Normal Empirical Law (figure: standard normal curve with the z-axis marked from -3 to 3)

Probability Distributions Continuous Random Variables The Normal Standardized value: $z = \frac{y - \mu}{\sigma}$
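The areas behind the empirical law can be computed from the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library; the helper names `prob_within` and `z_score`, and the example values passed to `z_score`, are illustrative assumptions.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def prob_within(k):
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


def z_score(y, mu, sigma):
    """Standardize y: number of standard deviations between y and the mean."""
    return (y - mu) / sigma


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))

# A hypothetical measurement of 130 from a population with mean 100 and sd 15
print(z_score(130, 100, 15))
```

The three probabilities come out at roughly 68%, 95% and 99.7%, the figures usually quoted for the empirical law.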

Random Sampling A sample of n measurements selected from a population is said to be a random sample if every different sample of size n from the population has a non-zero probability of being selected. Sample data selected in a nonrandom fashion are frequently distorted by a selection bias. A selection bias exists whenever there is a systematic tendency to overrepresent or underrepresent some part of the population.

Random Sampling Sample Statistic:
- Is a function of sample values.
- Is a random variable.
- Is subject to random variation because it is based on a random sample of measurements selected from the population of interest.
- Like any other random variable, has a probability distribution, called its sampling distribution.

Sampling Distribution Central Limit Theorem (for $\bar{y}$): Let:
- $\bar{y}$ be the sample mean computed from a random sample of n measurements from a population having mean μ and finite standard deviation σ
- $\mu_{\bar{y}}$ and $\sigma_{\bar{y}}$ be the mean and standard deviation of the sampling distribution of $\bar{y}$, respectively.
Based on repeated random samples of size n from the population, we can conclude the following:
- $\mu_{\bar{y}} = \mu$
- $\sigma_{\bar{y}} = \sigma / \sqrt{n}$
- When n is large, the sampling distribution of $\bar{y}$ will be approximately normal.
- When the population distribution is normal, the sampling distribution of $\bar{y}$ is exactly normal for any sample size n.

Sampling Distribution The shape of the sampling distribution is affected by:
- Sample size n
- Shape of the distribution of population measurements: if it is symmetric, the CLT holds for $n \ge 30$; if it is heavily skewed, n should be larger.
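A simulation makes the sampling distribution of the sample mean concrete. This is a sketch under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a skewed distribution chosen only for illustration), and the function name `sample_means`, the seed, and the number of repetitions are arbitrary.

```python
import math
import random
import statistics


def sample_means(n, reps, rng):
    """Means of `reps` random samples of size n from an exponential population (mu = 1, sigma = 1)."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (5, 30, 200):
    means = sample_means(n, 5000, rng)
    # Compare the simulated mean and sd of the sample means with mu and sigma / sqrt(n)
    print(n,
          round(statistics.fmean(means), 3),
          round(statistics.stdev(means), 3),
          round(1 / math.sqrt(n), 3))
```

Plotting a histogram of `means` for each n would show the skewness of the population fading as n grows, in line with the remark that heavily skewed populations need larger samples.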

Sampling Distribution Central Limit Theorem (for $\sum y$): Let:
- $\sum y$ be the sum of a random sample of n measurements from a population having mean μ and finite standard deviation σ
- $\mu_{\sum y}$ and $\sigma_{\sum y}$ be the mean and standard deviation of the sampling distribution of $\sum y$, respectively.
Based on repeated random samples of size n from the population, we can conclude the following:
- $\mu_{\sum y} = n\mu$
- $\sigma_{\sum y} = \sqrt{n}\,\sigma$
- When n is large, the sampling distribution of $\sum y$ will be approximately normal.
- When the population distribution is normal, the sampling distribution of $\sum y$ is exactly normal for any sample size n.
Similar theorems exist for the sample median, the sample standard deviation, and the sample proportion.

We have drawn a sample from a population.
- We need to make an inference about the population.
- We use a sample statistic to estimate a population parameter.
- We need to know how accurate the estimate is.
- We need to know the sampling distribution.
- We seldom know the sampling distribution.
- We use the normal approximation from the CLT.

Be aware of the unfortunate similarity between two phrases:
- Sampling Distribution (the theoretically derived probability distribution of a statistic)
- Sample Distribution (the histogram of individual values actually observed in a particular sample)

Sampling Distribution Normal Approximation to the Binomial Probability Distribution For large n and π not too near 0 or 1, the distribution of a binomial random variable Y may be approximated by a normal distribution with $\mu = n\pi$ and $\sigma = \sqrt{n\pi(1-\pi)}$

Sampling Distribution Normal Approximation to the Binomial Probability Distribution This approximation should be used only if $n\pi \ge 5$ and $n(1-\pi) \ge 5$. When $n\pi < 5$, the actual binomial distribution is seriously skewed to the right; when $n(1-\pi) < 5$, it is seriously skewed to the left.
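The rule of thumb and the approximation itself can be sketched in a few lines. The function names and the example values (`n = 100`, `pi = 0.4`, `y = 45`) are illustrative assumptions. Note also that the half-unit continuity correction used in `normal_approx_cdf` is a common refinement that the slides do not mention; it is included here as an assumption.

```python
import math
from statistics import NormalDist


def normal_approx_ok(n, pi):
    """Rule of thumb: both n*pi and n*(1 - pi) should be at least 5."""
    return n * pi >= 5 and n * (1 - pi) >= 5


def binomial_cdf(y, n, pi):
    """Exact Pr(Y <= y) for a binomial random variable."""
    return sum(math.comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(y + 1))


def normal_approx_cdf(y, n, pi):
    """Approximate Pr(Y <= y) using a normal with mu = n*pi, sigma = sqrt(n*pi*(1-pi))."""
    mu = n * pi
    sigma = math.sqrt(n * pi * (1 - pi))
    # Continuity correction (+0.5): an added assumption, not stated in the source
    return NormalDist(mu, sigma).cdf(y + 0.5)


n, pi, y = 100, 0.4, 45
print(normal_approx_ok(n, pi), binomial_cdf(y, n, pi), normal_approx_cdf(y, n, pi))
print(normal_approx_ok(20, 0.1))  # n*pi = 2, so the rule of thumb fails
```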

Sampling Distribution Why normality is important: - Helps to draw inferences about population based on the sample - Most statistical procedures require that population distribution be normal or can adequately be approximated by a normal distribution

Sampling Distribution Tools for Evaluating Whether or Not a Population Distribution Is Normal:
- Graphical procedures
- Quantitative assessment of how well a normal distribution models the population distribution

Checking Normality Graphical Procedures Histogram Stem-and-leaf plot

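Checking Normality Graphical Procedures Normal Probability Plot Compares the quantiles of the data observed from the population with the corresponding quantiles of the standard normal distribution.
- Sort the data: $y_{(1)} \le y_{(2)} \le \dots \le y_{(n)}$
- Treat each ordered value as a sample quantile: $y_{(i)} = Q\!\left(\frac{i - 0.5}{n}\right)$
- Plot $Q\!\left(\frac{i - 0.5}{n}\right)$ versus $z_{(i - 0.5)/n}$, the corresponding standard normal quantile.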

Checking Normality Quantitative Assessment Correlation coefficient of $Q\!\left(\frac{i - 0.5}{n}\right)$ versus $z_{(i - 0.5)/n}$
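The plotting positions and the correlation coefficient can be computed without any plotting library. This is a minimal sketch, assuming simulated data in place of real measurements; the function names, sample sizes, and seed are hypothetical choices for illustration.

```python
import math
import random
from statistics import NormalDist


def normal_plot_points(data):
    """Pair each ordered value y_(i) with the standard normal quantile at (i - 0.5) / n."""
    ys = sorted(data)
    n = len(ys)
    zs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(zs, ys))


def correlation(pairs):
    """Pearson correlation coefficient of the (z, y) pairs."""
    n = len(pairs)
    mean_z = sum(z for z, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    s_zy = sum((z - mean_z) * (y - mean_y) for z, y in pairs)
    s_zz = sum((z - mean_z) ** 2 for z, _ in pairs)
    s_yy = sum((y - mean_y) ** 2 for _, y in pairs)
    return s_zy / math.sqrt(s_zz * s_yy)


rng = random.Random(0)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(50, 10) for _ in range(200)]      # simulated normal data
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]   # simulated skewed data

print(correlation(normal_plot_points(normal_sample)))
print(correlation(normal_plot_points(skewed_sample)))
```

A correlation close to 1 means the points lie near a straight line, which is what a normal probability plot of normally distributed data looks like; the skewed sample should give a visibly lower value.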

Checking Normality Quantitative Assessment
- Kolmogorov-Smirnov
- Shapiro-Wilk
- Shapiro-Francia
- Cramér-von Mises
- Anderson-Darling