Lecture 5: Fundamentals of Statistical Analysis and Distributions Derived from Normal Distributions

ELE 525: Random Processes in Information Systems
Hisashi Kobayashi, Department of Electrical Engineering, Princeton University
September 27, 2013
Textbook: Hisashi Kobayashi, Brian L. Mark and William Turin, Probability, Random Processes and Statistical Analysis (Cambridge University Press, 2012)
Copyright Hisashi Kobayashi 2013

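6 Fundamentals of Statistical Data Analysis

6.1 Sample mean and sample variance

The sample mean (or the empirical average) is defined as

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i. \qquad (6.1)$$

Each sample $x_i$ is an instance or realization of the associated RV $X_i$. The sample mean of (6.1) is an instance of the sample mean variable defined by

$$\overline{X} = \frac{1}{n}\sum_{i=1}^{n} X_i. \qquad (6.2)$$

The expectation is

$$E[\overline{X}] = \mu_X. \qquad (6.4)$$

The variance, when the $X_i$ are independent with common variance $\sigma_X^2$, is

$$\mathrm{Var}[\overline{X}] = \frac{\sigma_X^2}{n}.$$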

The sample variance is defined by

$$s_X^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad (6.9)$$

which can be viewed as an instance of the sample variance variable

$$S_X^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \overline{X})^2, \qquad (6.10)$$

which is also often called the sample variance. We can show

$$E[S_X^2] = \sigma_X^2. \qquad (6.12)$$

Equations (6.4) and (6.12) show that the sample mean variable of (6.2) and the sample variance variable of (6.10) are unbiased estimates of the (population) mean $\mu_X$ and the (population) variance $\sigma_X^2$, respectively. The square root of the sample variance (6.9), i.e., $s_X$, is called the sample standard deviation.
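The definitions of the sample mean and the sample variance can be sketched in a few lines of Python. This is only an illustration: the function names and the data values are hypothetical, and the sample variance is assumed to use the 1/(n-1) normalization, which is the one that makes the estimate unbiased.

```python
from math import sqrt


def sample_mean(xs):
    """Sample mean: (1/n) times the sum of the observations."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance with the 1/(n-1) normalization (needs n >= 2)."""
    n = len(xs)
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)


# Hypothetical observations, chosen only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
x_bar = sample_mean(data)
s2 = sample_variance(data)
s = sqrt(s2)  # sample standard deviation
print(x_bar, s2, s)
```

Dividing by n instead of n-1 would give an estimate whose expectation is (n-1)/n times the population variance, which is why the n-1 form is used here.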

6.2 Relative frequency and histograms

Consider observed data of sample size $n$ that take on $k$ distinct discrete values. Let $n_j$ be the number of times that the $j$th value is observed, $j = 1, 2, \ldots, k$. Then

$$f_j = \frac{n_j}{n}$$

is called the relative frequency of the $j$th value.

When the underlying RV $X$ is continuous, we group or classify the data. Divide the range of observations into $k$ class intervals at points $c_0, c_1, c_2, \ldots, c_k$, and let $n_j$ be the number of observations in the $j$th interval, of length $\Delta_j = c_j - c_{j-1}$. Then

$$h_j = \frac{n_j}{n\,\Delta_j}, \quad x \in (c_{j-1}, c_j],$$

is called a histogram, and is an estimate of the PDF of the population.
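A minimal sketch of both quantities follows. It assumes that the histogram height on each class interval is the relative frequency divided by the interval length, so that the total area is one, and that intervals are closed on the right; the function names and the data are hypothetical.

```python
from collections import Counter


def relative_frequencies(data):
    """Map each distinct value to n_j / n."""
    n = len(data)
    counts = Counter(data)
    return {value: counts[value] / n for value in sorted(counts)}


def histogram(data, edges):
    """Heights n_j / (n * (c_j - c_{j-1})) for the intervals (c_{j-1}, c_j]."""
    n = len(data)
    heights = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        n_j = sum(1 for x in data if lo < x <= hi)
        heights.append(n_j / (n * (hi - lo)))
    return heights


# Hypothetical data with unequal class lengths.
values = [0.5, 1.2, 1.7, 2.4, 3.9]
edges = [0.0, 1.0, 2.0, 4.0]
heights = histogram(values, edges)
area = sum(h * (hi - lo) for h, lo, hi in zip(heights, edges[:-1], edges[1:]))
print(heights, area)
```

Dividing by the class length is what keeps the estimate comparable across intervals of different widths.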

Cumulative relative frequency

Let $\{x_k : 1 \le k \le n\}$ be $n$ observations in the order observed, and $\{x_{(i)} : 1 \le i \le n\}$ be the same observations in order of magnitude. Let $H(x)$ be the relative frequency of observations that are smaller than or equal to $x$:

$$H(x) = \begin{cases} 0, & x < x_{(1)}, \\ i/n, & x_{(i)} \le x < x_{(i+1)},\ i = 1, \ldots, n-1, \\ 1, & x \ge x_{(n)}, \end{cases}$$

which can be more concisely written as

$$H(x) = \frac{1}{n}\sum_{k=1}^{n} u(x - x_k),$$

where $u(\cdot)$ is the unit step function.

When grouped data are presented as a cumulative relative frequency distribution, it is called the cumulative histogram. The cumulative histogram is far less sensitive to variation in class lengths than the histogram.
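The cumulative relative frequency for ungrouped data can be sketched as a step function built on the sorted observations. The helper name is hypothetical; `bisect_right` counts how many ordered observations are smaller than or equal to x.

```python
from bisect import bisect_right


def cumulative_relative_frequency(data):
    """Return H, where H(x) is the fraction of observations <= x."""
    ordered = sorted(data)
    n = len(ordered)

    def H(x):
        # bisect_right gives the number of ordered values that are <= x.
        return bisect_right(ordered, x) / n

    return H


H = cumulative_relative_frequency([3, 1, 2, 2])  # hypothetical observations
print([H(x) for x in (0, 1, 2, 2.5, 3)])
```

Repeated observations simply produce a larger jump at that value.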

6.3 Graphical presentations

6.3.1 Histogram on probability paper

6.3.1.1 Testing the normal distribution hypothesis

For a given distribution function $F(x)$, let

$$P = F(x).$$

The inverse

$$x_P = F^{-1}(P)$$

is the value of $x$ that corresponds to the cumulative probability $P$. $x_P$ is called the P-fractile (or P-percentile or P-quantile).

Consider the standard normal distribution

$$\Phi(u) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{u} e^{-t^2/2}\,dt.$$

The fractile $u_P$ of the distribution $N(0,1)$ is

$$u_P = \Phi^{-1}(P).$$

For a given cumulative relative frequency $H(x)$, we wish to test whether

$$H(x) \approx \Phi\!\left(\frac{x - \mu}{\sigma}\right)$$

holds for some $\mu$ and $\sigma$. Testing the above is equivalent to testing the relation

$$u_{H(x)} = \Phi^{-1}(H(x)) \approx \frac{x - \mu}{\sigma}.$$

The plot of $u_{H(x)}$ versus $x$ forms a step (or staircase) curve:

$$u_{H(x)} = u_{i/n}, \quad x_{(i)} \le x < x_{(i+1)}.$$

The plot of the staircase function in the $(x, u)$-coordinates is called the fractile diagram, and provides an estimate of the straight line

$$u = \frac{x - \mu}{\sigma}.$$

The probability paper

On the ordinate axis, the values $P = \Phi(u)$ are marked, rather than the $u$ values.

[Figure: example on probability paper, n = 50.]

The dot diagram

Instead of the step curve, we plot the $n$ points $(x_{(i)}, (i - \tfrac{1}{2})/n)$, which are situated at the middle points of the steps.

[Figure: dot diagram, n = 50.]
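The dot diagram can be sketched numerically by mapping each plotting position (i - 1/2)/n through the inverse standard normal distribution function, which is what the probability-paper scale does graphically. The least-squares line fit below is an added illustration, not something the lecture prescribes; it reads off mu and sigma from the line u = (x - mu)/sigma. Function names, the seed, and the N(10, 2^2) population are hypothetical.

```python
import random
from statistics import NormalDist


def dot_diagram_points(data):
    """Points (x_(i), u_i) with u_i the standard normal fractile of (i - 1/2)/n."""
    ordered = sorted(data)
    n = len(ordered)
    std = NormalDist()  # standard normal N(0, 1)
    return [(x, std.inv_cdf((i - 0.5) / n)) for i, x in enumerate(ordered, start=1)]


def fit_mu_sigma(points):
    """Least-squares fit u = a + b*x, returned as (mu, sigma) = (-a/b, 1/b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_u = sum(u for _, u in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxu = sum((x - mean_x) * (u - mean_u) for x, u in points)
    b = sxu / sxx
    a = mean_u - b * mean_x
    return -a / b, 1.0 / b


rng = random.Random(1)  # fixed seed so the run is repeatable
sample = [rng.gauss(10.0, 2.0) for _ in range(50)]
mu_hat, sigma_hat = fit_mu_sigma(dot_diagram_points(sample))
print(mu_hat, sigma_hat)
```

If the points fall close to a straight line, the normal hypothesis is plausible; systematic curvature suggests another distribution.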

6.3.1.2 Testing the log-normal distribution hypothesis

The log-normal paper: modify the probability paper by changing the horizontal axis from the linear scale to the logarithmic scale, i.e., $\log_{10} x$.

[Figure: example on log-normal paper, n = 50, $x_i = \exp y_i$, where $y_i$ is drawn from $N(2, 4)$.]

6.3.2 Log-survivor function curve

The survivor function or the survival function:

$$S_X(x) = P[X > x] = 1 - F_X(x).$$

The log-survivor function or the log survival function:

$$\log S_X(x) = \log[1 - F_X(x)].$$

The sample log-survivor function or empirical log-survivor function:

$$\log[1 - H(x)],$$

where $H(x)$ is the cumulative relative frequency (for ungrouped data) or the cumulative histogram (for grouped data).

For the ungrouped case: plot $\log\!\left(1 - \frac{i}{n}\right)$ against $x_{(i)}$.

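In order to avoid difficulty at $i = n$, where $1 - i/n = 0$, we may modify (6.32), the quantity $\log(1 - i/n)$ plotted against $x_{(i)}$, into

$$\log\!\left(1 - \frac{i}{n+1}\right).$$

Example: a mixed exponential (or hyperexponential) distribution:

$$F_X(x) = 1 - \pi_1 e^{-\alpha_1 x} - \pi_2 e^{-\alpha_2 x}, \quad x \ge 0, \quad \pi_1 + \pi_2 = 1.$$

Numerical example: $\pi_2 = 0.0526$, $\pi_1 = 1 - \pi_2$, $\alpha_2 = 0.1$ and $\alpha_1 = 2.0$.

Note: to be consistent with the assumption in (6.29), we should exchange the subscripts 1 and 2.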
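The empirical log-survivor curve for the hyperexponential example can be sketched as follows. The parameter values are the ones quoted in the numerical example; the natural logarithm, the function names, the seed, and the sample size are assumptions made for illustration.

```python
import math
import random

# Parameters quoted in the numerical example (subscripts as printed there).
PI2, ALPHA1, ALPHA2 = 0.0526, 2.0, 0.1
PI1 = 1.0 - PI2


def log_survivor(x):
    """Natural log of S(x) = pi1*exp(-alpha1*x) + pi2*exp(-alpha2*x)."""
    return math.log(PI1 * math.exp(-ALPHA1 * x) + PI2 * math.exp(-ALPHA2 * x))


def sample_mixed_exponential(n, rng):
    """Pick an exponential branch with probability pi1 or pi2, then sample it."""
    return [rng.expovariate(ALPHA1 if rng.random() < PI1 else ALPHA2)
            for _ in range(n)]


def empirical_log_survivor(data):
    """Points (x_(i), log(1 - i/(n+1))), which stay finite at i = n."""
    ordered = sorted(data)
    n = len(ordered)
    return [(x, math.log(1.0 - i / (n + 1))) for i, x in enumerate(ordered, start=1)]


rng = random.Random(7)  # fixed seed so the run is repeatable
points = empirical_log_survivor(sample_mixed_exponential(1000, rng))
print(points[0], points[-1])
```

For large x the slower exponential dominates, so the curve approaches a straight line of slope -0.1, while near the origin the slope is close to -2; the bend between the two is the signature of a mixture.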

Correction to the figure caption: exchange the subscripts 1 and 2 of $\pi$ and $\alpha$ to be consistent with (6.29) and (6.30).

6.3.3 Hazard function and mean residual life curves

The hazard function or the failure rate:

$$h_X(t) = \frac{f_X(t)}{1 - F_X(t)} = \frac{f_X(t)}{S_X(t)},$$

which is called the completion rate function when $X$ represents a service time variable.

The survivor function and the hazard function are related by

$$h_X(t) = -\frac{d}{dt}\ln S_X(t)$$

and, for a nonnegative RV $X$,

$$S_X(t) = \exp\!\left(-\int_0^t h_X(u)\,du\right).$$

Given that the service time variable $X$ is greater than $t$, $R = X - t$ is the residual life conditioned on $X > t$.

The mean residual life function:

$$R_X(t) = E[X - t \mid X > t] = \frac{\int_t^{\infty} S_X(u)\,du}{S_X(t)}.$$
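For a mixture of exponentials, both the hazard function and the mean residual life have closed forms, which makes a compact sketch possible. The mixture family and the function names are assumptions chosen for illustration; the formulas used are h(t) = f(t)/S(t) and the mean residual life as the integral of S from t to infinity divided by S(t).

```python
import math


def survivor(t, weights, rates):
    """S(t) = sum of w_i * exp(-a_i * t) for a mixture of exponentials."""
    return sum(w * math.exp(-a * t) for w, a in zip(weights, rates))


def hazard(t, weights, rates):
    """h(t) = f(t) / S(t), with f(t) = sum of w_i * a_i * exp(-a_i * t)."""
    density = sum(w * a * math.exp(-a * t) for w, a in zip(weights, rates))
    return density / survivor(t, weights, rates)


def mean_residual_life(t, weights, rates):
    """Integral of S from t to infinity, divided by S(t)."""
    tail = sum(w * math.exp(-a * t) / a for w, a in zip(weights, rates))
    return tail / survivor(t, weights, rates)


weights, rates = [0.9474, 0.0526], [2.0, 0.1]  # hyperexponential example values
for t in (0.0, 1.0, 10.0):
    print(t, hazard(t, weights, rates), mean_residual_life(t, weights, rates))
```

A single exponential has a constant hazard and a constant mean residual life. For the mixture the hazard decreases with t and the mean residual life increases, because long survival makes the slow branch more likely.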

6.3.4 Dot diagram and correlation coefficient

[Figure: dot or scatter diagram.]

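Correlation coefficient

$$\rho = \frac{\mathrm{Cov}[X, Y]}{\sigma_X \sigma_Y},$$

where

$$\mathrm{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)].$$

$X$ and $Y$ are said to be properly linearly correlated if a linear relation $aX + bY = c$ holds with probability one for constants $a$, $b$, $c$ with $ab \ne 0$. In that case $\rho = -1$ or $\rho = +1$, depending on whether $ab$ is positive or negative.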

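Conversely, if $\rho = \pm 1$, then $X$ and $Y$ are properly linearly correlated (Problem 6.17).

The sample variances and the sample covariance based on observations $\{(x_i, y_i) : 1 \le i \le n\}$:

$$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \quad s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2, \quad s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}).$$

The sample correlation coefficient:

$$r = \frac{s_{xy}}{s_x s_y}.$$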
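A short sketch of the sample correlation coefficient, computed as the sample covariance divided by the product of the sample standard deviations. The function name and the data are hypothetical.

```python
from math import sqrt


def sample_correlation(pairs):
    """Sample covariance divided by the product of sample standard deviations."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / (n - 1)
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs) / (n - 1)
    s_yy = sum((y - y_bar) ** 2 for _, y in pairs) / (n - 1)
    return s_xy / sqrt(s_xx * s_yy)


# Points lying exactly on the line y = 3 - 2x (negative slope).
on_a_line = [(x, 3.0 - 2.0 * x) for x in (0.0, 1.0, 2.5, 4.0)]
print(sample_correlation(on_a_line))
```

The 1/(n-1) factors cancel in the ratio, so the same value results from using 1/n throughout.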

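Let $U_i$, $1 \le i \le n$, be $n$ i.i.d. RVs with the standard normal distribution $N(0,1)$. Define

$$\chi_n^2 = \sum_{i=1}^{n} U_i^2.$$

The PDF of this RV is (Problem 7.2)

$$f_{\chi_n^2}(x) = \frac{x^{n/2 - 1} e^{-x/2}}{2^{n/2}\,\Gamma(n/2)}, \quad x \ge 0,$$

which is called the chi-squared ($\chi^2$) distribution with $n$ degrees of freedom.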

[Figure: PDFs of the $\chi^2$ distribution for n = 1, 2, 3, ...; mode at n - 2.]

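The relations to other distributions: let

$$Y = \frac{\chi_n^2}{2}, \quad f_Y(y) = \frac{y^{n/2 - 1} e^{-y}}{\Gamma(n/2)}, \quad y \ge 0,$$

which is the special case $\lambda = 1$, $\beta = n/2$ of the gamma distribution (4.30).

The case where $n$ is an even integer, $n = 2k$:

$$f_Y(y) = \frac{y^{k-1} e^{-y}}{(k-1)!}, \quad y \ge 0,$$

which is the $k$-stage Erlang distribution with mean $k$.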
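The link between the chi-squared and Erlang distributions can be checked numerically. The sketch assumes the standard chi-squared PDF with n degrees of freedom, x^(n/2-1) e^(-x/2) / (2^(n/2) Gamma(n/2)), and applies the change of variables Y = chi-squared/2; the function names are hypothetical.

```python
import math


def chi2_pdf(x, n):
    """Chi-squared PDF with n degrees of freedom (taken as 0 for x <= 0)."""
    if x <= 0:
        return 0.0
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))


def half_chi2_pdf(y, n):
    """PDF of Y = chi2 / 2 by change of variables: f_Y(y) = 2 * f(2y)."""
    return 2.0 * chi2_pdf(2.0 * y, n)


def erlang_pdf(y, k):
    """k-stage Erlang PDF with unit rate per stage, hence mean k."""
    return y ** (k - 1) * math.exp(-y) / math.factorial(k - 1)


for k in (1, 2, 3):
    y = 1.5
    print(k, half_chi2_pdf(y, 2 * k), erlang_pdf(y, k))
```

For each k the two printed densities should agree up to floating-point rounding.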

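The relation to the Poisson distribution: for a chi-squared variable $\chi_{2k}^2$ with $2k$ degrees of freedom and $Y = \chi_{2k}^2 / 2$,

$$P[Y > y] = \sum_{j=0}^{k-1} \frac{y^j}{j!} e^{-y},$$

i.e., the probability that a Poisson RV with mean $y$ does not exceed $k - 1$.

Example 7.1: Independent observations from $N(\mu, \sigma^2)$

Case 1: An estimate of $\sigma^2$ when the population mean $\mu$ is known:

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2 = \frac{\sigma^2}{n}\sum_{i=1}^{n} U_i^2,$$

where

$$U_i = \frac{X_i - \mu}{\sigma}$$

are i.i.d. standard normal variables. Thus, we can write

$$\hat{\sigma}^2 = \frac{\sigma^2 \chi_n^2}{n}.$$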

Case 2: An estimate of $\sigma^2$ when $\mu$ is unknown:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \overline{X})^2.$$

We can show (Problem 7.1)

$$S^2 = \frac{\sigma^2 \chi_{n-1}^2}{n-1}.$$

Karl Pearson (1857-1936) was a British statistician who applied statistics to biological problems of heredity and evolution.
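The two cases of Example 7.1 can be illustrated by simulation. The sketch relies on the fact that a chi-squared variable with m degrees of freedom has mean m, so the scaled estimate with known mean should average about n and the one with unknown mean about n - 1. The function name, the parameter values, and the seed are hypothetical.

```python
import random


def simulate_scaled_estimates(n, mu, sigma, trials, seed=0):
    """Average n*var_known/sigma^2 and (n-1)*S^2/sigma^2 over many samples."""
    rng = random.Random(seed)
    total_known = 0.0
    total_unknown = 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(xs) / n
        var_known_mu = sum((x - mu) ** 2 for x in xs) / n           # mu known
        var_unknown_mu = sum((x - x_bar) ** 2 for x in xs) / (n - 1)  # mu unknown
        total_known += n * var_known_mu / sigma ** 2
        total_unknown += (n - 1) * var_unknown_mu / sigma ** 2
    return total_known / trials, total_unknown / trials


avg_known, avg_unknown = simulate_scaled_estimates(n=5, mu=1.0, sigma=2.0, trials=20000)
print(avg_known, avg_unknown)  # expected to be close to 5 and 4
```

Replacing the known mean by the sample mean costs one degree of freedom, which is the reason for the n - 1 divisor.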

The sample mean $\overline{X}$ of $n$ independent observations $\{X_1, X_2, \ldots, X_n\}$ from $N(\mu, \sigma^2)$ is normally distributed according to $N(\mu, \sigma^2/n)$. Thus,

$$U = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}}$$

is a standard normal variable.

We wish to estimate the population mean $\mu$. If $\sigma$ is known, we can use the table of the standard normal distribution to test whether $U$ is significantly different from 0. If $\sigma$ is unknown, we use

$$t = \frac{\overline{X} - \mu}{S/\sqrt{n}},$$

where $S^2$ is the sample variance. Using (7.26),

$$t = \frac{U}{\sqrt{\chi_{n-1}^2/(n-1)}}.$$

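The distribution of the variable

$$t_k = \frac{U}{\sqrt{\chi_k^2 / k}}$$

is called the (Student's) t-distribution with $k$ degrees of freedom (d.f.). Its PDF is given by (Problem 7.6)

$$f_{t_k}(t) = \frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\,\Gamma\!\left(\frac{k}{2}\right)}\left(1 + \frac{t^2}{k}\right)^{-(k+1)/2}, \quad -\infty < t < \infty.$$

For $k = 1$:

$$f_{t_1}(t) = \frac{1}{\pi(1 + t^2)},$$

which is called Cauchy's distribution.

For $k = 2$:

$$f_{t_2}(t) = \frac{1}{(2 + t^2)^{3/2}},$$

which has zero mean but infinite variance.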
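The two special cases can be verified against the general PDF. The sketch assumes the standard form of the t-distribution PDF with k degrees of freedom, Gamma((k+1)/2) / (sqrt(k*pi) Gamma(k/2)) times (1 + t^2/k)^(-(k+1)/2); the function name is hypothetical.

```python
import math


def t_pdf(t, k):
    """Student's t PDF with k degrees of freedom (standard form)."""
    c = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return c * (1 + t * t / k) ** (-(k + 1) / 2)


t = 0.7
cauchy = 1 / (math.pi * (1 + t * t))  # closed form for k = 1
two_df = (2 + t * t) ** -1.5          # closed form for k = 2
print(t_pdf(t, 1), cauchy)
print(t_pdf(t, 2), two_df)
```

For k = 2 the density decays like |t|^(-3), so t^2 times the density decays like 1/|t| and the second moment diverges.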

William S. Gosset (1876-1937) was a statistician of the Guinness brewing company.

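7.3 Fisher's F-distribution

RVs $V_1$ and $V_2$ are independent and are $\chi^2$ distributed with $n_1$ and $n_2$ degrees of freedom (d.f.), respectively. Then the variable $F$ defined by

$$F = \frac{V_1 / n_1}{V_2 / n_2}$$

has the following PDF:

$$f_F(x) = \frac{\Gamma\!\left(\frac{n_1 + n_2}{2}\right)}{\Gamma\!\left(\frac{n_1}{2}\right)\Gamma\!\left(\frac{n_2}{2}\right)}\left(\frac{n_1}{n_2}\right)^{n_1/2} x^{n_1/2 - 1}\left(1 + \frac{n_1 x}{n_2}\right)^{-(n_1 + n_2)/2}, \quad x > 0,$$

which is called the F-distribution with $(n_1, n_2)$ degrees of freedom, also called the Snedecor distribution. Its $r$th moment is

$$E[F^r] = \left(\frac{n_2}{n_1}\right)^{r}\frac{\Gamma\!\left(\frac{n_1}{2} + r\right)\Gamma\!\left(\frac{n_2}{2} - r\right)}{\Gamma\!\left(\frac{n_1}{2}\right)\Gamma\!\left(\frac{n_2}{2}\right)},$$

which exists for $-n_1 < 2r < n_2$.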
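The moment condition -n1 < 2r < n2 can be made concrete with a small sketch. It assumes the standard moment formula for the F-distribution, E[F^r] = (n2/n1)^r Gamma(n1/2 + r) Gamma(n2/2 - r) / (Gamma(n1/2) Gamma(n2/2)), which follows from the independence of V1 and V2; the function name is hypothetical.

```python
import math


def f_moment(r, n1, n2):
    """r-th moment of the F-distribution with (n1, n2) degrees of freedom."""
    if not (-n1 < 2 * r < n2):
        raise ValueError("moment exists only for -n1 < 2r < n2")
    # Work with log-gamma to avoid overflow for large degrees of freedom.
    log_value = (r * math.log(n2 / n1)
                 + math.lgamma(n1 / 2 + r) + math.lgamma(n2 / 2 - r)
                 - math.lgamma(n1 / 2) - math.lgamma(n2 / 2))
    return math.exp(log_value)


mean = f_moment(1, 5, 10)  # reduces to n2 / (n2 - 2) = 1.25
print(mean)
```

With r = 1 the formula reduces to n2/(n2 - 2), so the mean exists only for n2 > 2 and does not depend on n1.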

7.4 Log-normal distribution

A positive RV $X$ is said to have the log-normal distribution if

$$Y = \ln X$$

is normally distributed, i.e.,

$$Y \sim N(\mu_Y, \sigma_Y^2).$$

In order to find the expectation and variance of $X$, we use the moment generating function (MGF) of $Y$ (to be studied in Section 8.1):

$$M_Y(t) = E[e^{tY}] = \exp\!\left(\mu_Y t + \frac{\sigma_Y^2 t^2}{2}\right).$$

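Then

$$E[X] = E[e^{Y}] = M_Y(1) = \exp\!\left(\mu_Y + \frac{\sigma_Y^2}{2}\right),$$

$$E[X^2] = E[e^{2Y}] = M_Y(2) = \exp\!\left(2\mu_Y + 2\sigma_Y^2\right).$$

From (7.49) and (7.51) we find

$$\mathrm{Var}[X] = E[X^2] - (E[X])^2 = \exp\!\left(2\mu_Y + \sigma_Y^2\right)\left(e^{\sigma_Y^2} - 1\right).$$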
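The mean and variance of a log-normal variable follow from the normal MGF evaluated at t = 1 and t = 2, and a short sketch makes the arithmetic explicit. The function name is hypothetical, and the example values assume that N(2, 4) in the earlier log-normal paper example means mean 2 and variance 4, following the N(mu, sigma^2) notation of the lecture.

```python
import math


def lognormal_mean_and_variance(mu_y, var_y):
    """Mean and variance of X = exp(Y), where Y ~ N(mu_y, var_y)."""
    mean = math.exp(mu_y + var_y / 2)                # M_Y(1)
    second_moment = math.exp(2 * mu_y + 2 * var_y)   # M_Y(2)
    return mean, second_moment - mean ** 2


mean_x, var_x = lognormal_mean_and_variance(2.0, 4.0)
closed_form = math.exp(2 * 2.0 + 4.0) * (math.exp(4.0) - 1)
print(mean_x, var_x, closed_form)
```

Note that the mean of X exceeds exp(mu_Y), which is the median, by the factor exp(var_Y / 2); for a variance of 4 that factor is already about 7.4.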