
MVE051/MSG810 (2017) Lecture 7
Petter Mostad, Chalmers
November 20, 2017

The purpose of collecting and analyzing data

Purpose: to build and select models for parts of the real world (which can then be used for prediction). The first part of data analysis is always to summarize and visualize the data; this is called descriptive statistics. (Most people call it just "statistics".) What separates mathematical statistics from descriptive statistics is that we use probability theory to formulate, build, and select the models for parts of the real world.

Data collection and analysis is always subjective

What one decides to study, how one decides to study it, and what data one decides to collect are necessarily based on one's preconceptions. The way you summarize and visualize data is also influenced by your preconceptions; indeed, different ways to summarize data can be used to promote different ideas. The choice between different statistical models (and, in some settings, the choice between different statistical methods) is necessarily subjective.

Summarizing data

- Graphical summaries: illustrating the data (or part of the data) in a plot or figure.
- Numerical summaries: computing from the data (or part of the data) one or more numbers that tell us something important about the data.

There are a large number of ways to summarize; you should at least know the ones we go through below.

Numerical summaries

Let $x_1, x_2, \ldots, x_n$ be observed real values.

- Mean: $\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$.
- Median: if we sort the data and write it $y_1, \ldots, y_n$ in order of size, then the median is $y_{(n+1)/2}$ if $n$ is odd, and the mean of $y_{n/2}$ and $y_{n/2+1}$ if $n$ is even.
- Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$. The sample standard deviation is the square root of this.
- Min and max.
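As a minimal sketch in R (the tool recommended later in these notes; the data vector here is made up purely for illustration), all of these summaries are built-in one-liners:

```r
# Made-up observations, for illustration only
x <- c(2.1, 3.4, 2.8, 5.0, 3.9, 2.2, 4.1)

mean(x)          # mean
median(x)        # median
var(x)           # sample variance (uses the 1/(n-1) divisor)
sd(x)            # sample standard deviation, sqrt(var(x))
min(x); max(x)   # min and max
```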

Quantiles and percentiles

Quantile: a number such that a certain proportion of the data values is smaller than the number. We may also talk about a percentile, where the proportion is specified as a percentage.

- Example: the 30th percentile is a number such that 30% of the data is smaller than the number.
- Example: the median is the same thing as the 50th percentile.
- Example: the first quartile is a number such that a quarter of the data is smaller than the number.
- Example: the inter-quartile range is the interval between the 25th and the 75th percentile.

We also talk about quantiles for probability densities. Example: the first quartile of a normal density Normal(µ, σ) is the number $z_0$ such that $\Pr(z < z_0) = 1/4$ when $z \sim \text{Normal}(\mu, \sigma)$.
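A small sketch of the same ideas in R (reusing the made-up vector x from above); note that quantile() supports several interpolation rules, so results may differ slightly between programs:

```r
x <- c(2.1, 3.4, 2.8, 5.0, 3.9, 2.2, 4.1)   # made-up data from above

quantile(x, 0.30)                # 30th percentile of the data
quantile(x, c(0.25, 0.75))       # endpoints of the inter-quartile range
qnorm(0.25, mean = 10, sd = 2)   # first quartile of a Normal(10, 2) density
```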

Graphical summaries

- Scatterplots.
- Histograms. Note how certain parameters must be selected (they are usually selected automatically by the program making the histogram).
- Boxplots: an efficient way to illustrate and compare the spread in one or more groups of data. A boxplot shows a box spanning the inter-quartile range and the median, together with whiskers indicating the spread of the data (definitions may vary) and individual observations outside this spread.

The exact parameter defaults in the functions above, and more generally the choice of graphical functions, depend on the program you use. We may regard graphical summaries as a step on the way to selecting a probabilistic model for the data. A free and powerful tool for statistics: R (www.r-project.org).
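A minimal sketch of these plots in base R, using simulated data (a real analysis would of course use observed data):

```r
x <- rnorm(100, mean = 10, sd = 2)   # simulated data, for illustration only
y <- x + rnorm(100)

plot(x, y)             # scatterplot
hist(x, breaks = 12)   # histogram; 'breaks' is one of the parameters to select
boxplot(x)             # box over the inter-quartile range, median, and whiskers
```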

Random variables as models for data

The second step in a statistical data analysis is to find a probabilistic model for your data. We describe a population of objects, or maybe of possible observations, where our data represents a subset of this population.

Example: we have measured the concentration of lead in 10 fish from a lake. The population may be the lead concentrations of all the fish in the lake. (Which species? Only this lake? ...) The model of the population could be, for example, a normal distribution, or a normal distribution for the logged values.

We generally have to assume that our data is a random sample from the population, i.e., that
- each data value is randomly chosen from the population (so each population member has the same chance of being observed, or, given a model, the model specifies the probability (density) of each possible observation), and
- the observations are independent of each other.

It is very important to specify the population so that the assumption that your data is a random sample is reasonable!

Finding a probabilistic model for your data

In this course, finding a probabilistic (or stochastic) model for your data will have two steps:

Step 1: Find the type of model, i.e., a family of probability distributions that fits the context: the Normal family, the Binomial family, the Poisson family, etc. Consider: Are the observed values real numbers or integers? (Could they be real numbers?) Is this a sequence of trials? Etc. In our course we may use hypothesis testing for selecting between possible models, but alternative methods also exist.

Step 2: Find the parameters of the model (for example, find values for µ and σ² if the model is Normal, or λ if the model is Poisson). In this course we will use estimators, which compute from the data an estimate for the model parameters. It is also possible to use probability theory to obtain probability distributions for the parameters; this is outside this course.
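A minimal sketch of the two steps in R, under the assumption that the data are counts (the data here is simulated, so the appropriate family is known in advance):

```r
# Simulated count data; in practice these would be observations
counts <- rpois(50, lambda = 4)

# Step 1 (informal): the values are non-negative integers,
# which suggests, e.g., the Poisson family
hist(counts)

# Step 2: estimate the parameter; for Poisson the standard estimator is the mean
lambda_hat <- mean(counts)
lambda_hat
```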

Estimates and estimators

Assume we have a model with an unknown parameter θ and data $x_1, \ldots, x_n$ which we assume is a random sample from the model. We separate between

- an estimator for θ: a function or formula which, from a random sample $x_1, \ldots, x_n$, computes a number which may function as a value for the parameter θ, and
- an estimate for θ: the value of the estimator for specific values of $x_1, \ldots, x_n$.

We often write $\hat\theta$ both for the estimate and for the estimator for θ. (So if the parameter is called, for example, µ, we write $\hat\mu$, etc.)

A function of a random sample is called a statistic, so an estimator is a statistic. A statistic is also a random variable, as it is a function of random variables, so we can talk about its distribution, expectation, variance, etc.

Constructing an estimator

Generally, in this course we will use standard estimators for each context, but here is a discussion of how estimators can be obtained:

- There is no general mathematical specification for how to construct an estimator. Instead, one may specify some properties one believes a good estimator should have, and try to find estimators fulfilling these criteria.
- A good property for an estimator: to be unbiased. This means that the expectation of the estimator is equal to the parameter it is estimating.
- A good property for an estimator: to have as small a variance as possible.
- A common way to construct an estimator, the Maximum Likelihood (ML) method: write the probability of the observed data as a function of the model parameters. This is the likelihood function. Find the parameters maximizing this function. The formula for computing this maximum from the data becomes the estimator.
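A small numerical illustration of the ML method (a sketch with made-up count data; for the Poisson model the maximum can also be found analytically, and it turns out to be the sample mean):

```r
counts <- c(3, 5, 2, 4, 6, 3, 4)   # made-up count data

# Log-likelihood of lambda given the data (the log turns the product into a sum)
loglik <- function(lambda) sum(dpois(counts, lambda, log = TRUE))

# Maximize numerically over a reasonable interval for lambda
fit <- optimize(loglik, interval = c(0.001, 20), maximum = TRUE)
fit$maximum    # numerical ML estimate of lambda
mean(counts)   # the analytical ML estimate, for comparison
```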

Estimator for the expectation

Assume the data is a random sample from Normal(µ, σ), so that it is represented by independent random variables $X_1, X_2, \ldots, X_n \sim \text{Normal}(\mu, \sigma)$. We want to find an estimator for µ.

- A natural estimator is $\hat\mu = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$. (This is an ML estimator.)
- The estimator is unbiased, i.e., $E[\bar{X}] = \mu$. (Simple proof.)
- The proof works equally well for any distribution of the $X_i$. So the estimator $\bar{X}$ is always an unbiased estimator for the expectation of a distribution.

Example: assume the observations $x_1, x_2, \ldots, x_n$ are a random sample from a Poisson(λ) distribution, where the expectation is equal to the parameter λ. Then $\bar{X}$ is an unbiased estimator for λ.
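Unbiasedness can be illustrated by simulation (a sketch; the parameter values are arbitrary): averaged over many samples, the estimates center on µ.

```r
mu <- 10; sigma <- 2; n <- 25   # arbitrary values, for illustration

# Compute the estimate X-bar for 10000 simulated samples
estimates <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
mean(estimates)   # close to mu = 10, illustrating E[X-bar] = mu
```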

Variance of an estimator

The estimator $\bar{X}$ has variance $\sigma^2/n$, where $\sigma^2$ is the variance of the distribution of $X_1, \ldots, X_n$. The proof is good to understand. NOTE: the proof uses that, if X and Y are independent random variables, we have
$$\text{Var}[X + Y] = \text{Var}[X] + \text{Var}[Y].$$
This is also good to understand.

Example: the Bernoulli distribution (which is the Binomial distribution with only one trial). The distribution has a parameter p, and $X \sim \text{Bernoulli}(p)$ has possible values 0 and 1. The expectation is p and the variance is $p(1-p)$. Then $\hat{p} = \bar{X}$ is an unbiased estimator for p, and the variance of this estimator is $p(1-p)/n$.
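A simulation sketch of the Bernoulli example (arbitrary parameter values): the empirical variance of the estimates comes out close to $p(1-p)/n$.

```r
p <- 0.6; n <- 40   # arbitrary values, for illustration

# Each sample of n Bernoulli trials gives one estimate p-hat = X-bar
p_hats <- replicate(10000, mean(rbinom(n, size = 1, prob = p)))

var(p_hats)       # close to...
p * (1 - p) / n   # ...the theoretical variance p(1-p)/n = 0.006
```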

Estimator for variance

If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with expectation µ and variance $\sigma^2$, then
$$\hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
is an unbiased estimator for $\sigma^2$. The proof may be useful to understand:
$$E[\hat\sigma^2] = E\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = E\left[\frac{1}{n-1}\sum_{i=1}^{n}\bigl((X_i - \mu) - (\bar{X} - \mu)\bigr)^2\right] = \cdots = \sigma^2.$$
This is the reason why we divide by $n-1$ to compute the sample variance: it makes the estimator unbiased.
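The effect of the divisor can be checked by simulation (a sketch with arbitrary parameter values): dividing by n-1 centers the estimates on σ², while dividing by n underestimates it by the factor (n-1)/n.

```r
mu <- 10; sigma <- 2; n <- 10   # arbitrary values, for illustration

one_sample <- function() {
  x <- rnorm(n, mean = mu, sd = sigma)
  c(unbiased = sum((x - mean(x))^2) / (n - 1),   # same as var(x)
    biased   = sum((x - mean(x))^2) / n)
}
estimates <- replicate(10000, one_sample())

rowMeans(estimates)   # about sigma^2 = 4 and sigma^2 * (n-1)/n = 3.6, respectively
```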

The distribution of an estimator

To further find out how good an estimator is, we can study its whole distribution, not only its expectation and variance.

Example: if $X_1, \ldots, X_n \sim \text{Normal}(\mu, \sigma)$, then we know the estimator $\bar{X}$ has expectation µ and variance $\sigma^2/n$. Also, one can show that if $X_1, \ldots, X_n$ are independent and normally distributed, then $X_1 + X_2 + \cdots + X_n$ is normally distributed. We know that if Y is normally distributed then $Y/n$ is also normally distributed. From this we get that
$$\bar{X} \sim \text{Normal}(\mu, \sigma/\sqrt{n}).$$

If $X_1, \ldots, X_n$ have a Bernoulli distribution with parameter p, then we get from the definitions that $X_1 + X_2 + \cdots + X_n$ has a Binomial distribution with parameters n and p. From this we can also get an explicit description of the distribution of $\bar{X} = (X_1 + X_2 + \cdots + X_n)/n$.

One can show that if $X_1, \ldots, X_n \sim \text{Normal}(\mu, \sigma)$ then, for the variance estimator $\hat\sigma^2$, we get
$$(n-1)\hat\sigma^2/\sigma^2 \sim \chi^2(n-1).$$
So $(n-1)\hat\sigma^2$ has a distribution that corresponds to $\sigma^2$ multiplied by a chi-squared distribution with $n-1$ degrees of freedom.
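The chi-squared result can be illustrated by simulation (a sketch with arbitrary parameter values): the simulated values of $(n-1)\hat\sigma^2/\sigma^2$ follow the χ²(n-1) density.

```r
mu <- 10; sigma <- 2; n <- 15   # arbitrary values, for illustration

# Simulate (n-1) * sigma-hat^2 / sigma^2 for many samples
stats <- replicate(10000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  (n - 1) * var(x) / sigma^2
})

hist(stats, breaks = 50, freq = FALSE)
curve(dchisq(x, df = n - 1), add = TRUE)   # the chi-squared(n-1) density
```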

We study estimators, not estimates

Assume we investigate a type of trial which each time results in success (1) or failure (0), and the probability of success is an unknown parameter p. Assume we make some trials and get the results
0, 1, 0, 0, 1, 0, 0, 1
We make the estimate 3/8 = 0.375 for p. How good is this estimate? We cannot say anything about that before we specify the estimator.

ALTERNATIVE 1: The estimator consists of making 8 trials, letting x be the number of successes, and computing $\hat{p} = x/8$.

ALTERNATIVE 2: The estimator consists of making trials until 3 successes have been observed, and letting x be the number of trials needed for this outcome. Then one computes $\hat{p} = 3/x$.

The two estimators have different properties! One is unbiased and the other is biased.

Example, cont.

Let us for example assume that the real value of p is 0.6. We can then study which distributions our two estimators have.

ALTERNATIVE 1: We have $X \sim \text{Binomial}(8, 0.6)$. The possible values for $\hat{p} = X/8$ and their probabilities are given in the table below:

  p-hat:        0/8    1/8    2/8    3/8    4/8    5/8    6/8    7/8    8/8
  probability:  0.001  0.008  0.041  0.124  0.232  0.279  0.209  0.090  0.017

The estimator has expectation 0.6; it is unbiased.

ALTERNATIVE 2: We get $X \sim \text{Neg-Binomial}(3, 0.6)$. The possible values for $\hat{p} = 3/X$ and their probabilities are given in the table below:

  p-hat:        3/3    3/4    3/5    3/6    3/7    3/8    3/9    3/10   3/11
  probability:  0.216  0.259  0.207  0.138  0.083  0.046  0.025  0.013  0.006

  p-hat:        3/12   3/13   3/14   3/15   3/16, 3/17, ... (in total)
  probability:  0.003  0.001  0.001  0.000  0.000

The estimator has expectation 0.672. It is biased.
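Both tables and expectations can be reproduced numerically (a sketch in base R; note that dnbinom() counts the number of failures before the 3rd success, so the number of trials is failures + 3):

```r
p <- 0.6

# Alternative 1: X ~ Binomial(8, 0.6) successes, estimator p-hat = X/8
x1 <- 0:8
prob1 <- dbinom(x1, size = 8, prob = p)
round(prob1, 3)         # the first table
sum((x1 / 8) * prob1)   # expectation exactly 0.6: unbiased

# Alternative 2: trials until the 3rd success, estimator p-hat = 3/trials
failures <- 0:1000      # failures before the 3rd success (truncated tail)
trials <- failures + 3
prob2 <- dnbinom(failures, size = 3, prob = p)
round(prob2[1:9], 3)        # the start of the second table
sum((3 / trials) * prob2)   # expectation about 0.672: biased
```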