Introduction to Alternative Statistical Methods, or Stuff They Didn't Teach You in STAT 101


Classical Statistics For the most part, classical statistics assumes normality, i.e., if all experimental units of interest were measured and those measurements were plotted, then the distribution would look bell-shaped.


Normal Distribution The standard normal is the iconic normal distribution, but there are as many normal distributions as there are combinations of finite means and finite, positive variances.


Central Limit Theorem All this is well and good, you say, but there is, in general, no reason to assume normality. How do we proceed? The famous French mathematician Laplace (1749-1827) discovered in the early 19th century that the sample mean, X̄, is approximately normally distributed for sufficiently large sample sizes, provided the sample is random and the underlying population variance is finite.

Central Limit Theorem How large is sufficiently large? For light-tailed distributions, a sample size of n ≥ 30 or so is usually sufficient. What do I mean by light-tailed? In math-speak, the tail (or tails) of the distribution must approach zero at least as fast as e^(-x). (The tails of the normal distribution approach zero at the rate of e^(-x²), which is faster still.)
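To see the theorem in action, here is a minimal simulation sketch (my own illustration, not from the talk; the exponential population, the sample size n = 30, and the replicate count are arbitrary choices). Sample means of size 30 drawn from a skewed, light-tailed population already behave like N(1, 1/30):

```python
# CLT demo: means of samples from a skewed, light-tailed population
# (exponential with mean 1) are approximately N(1, 1/30) for n = 30.
import math
import random
import statistics

random.seed(42)

n = 30        # sample size ("sufficiently large" for light tails)
reps = 5000   # number of replicate samples

# expovariate(1.0) draws from the exponential distribution with mean 1
sample_means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
                for _ in range(reps)]

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

print(f"mean of sample means: {center:.3f} (theory: 1)")
print(f"sd of sample means:   {spread:.3f} (theory: 1/sqrt(30) = {1/math.sqrt(30):.3f})")
```

The exponential distribution is quite skewed, yet a histogram of these 5,000 means would already look convincingly bell-shaped.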

Limitations of the CLT So where can things come off the rails? It seems like the Central Limit Theorem takes care of just about everything.

What Manner of Distribution is this?

The Cauchy Distribution This is the Cauchy distribution. It sort of looks normal, and some people might mistake it for a normal distribution (present company excepted, of course!), but observe the tails.

Cauchy vs. Normal

Cauchy vs. Normal The Cauchy distribution has heavy tails. What is meant by heavy tails here? The tails of the Cauchy distribution approach zero at the rate of 1/x². How does that compare with the normal distribution? Example: for x = 10, 1/x² = 1/100 = 0.01. By way of contrast, e^(-x²) = e^(-100) ≈ 3.72 × 10^(-44) (i.e., a decimal with 43 leading zeroes). Quite a difference!
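The arithmetic on this slide is easy to check for yourself; here is a two-line sketch of the comparison at x = 10:

```python
# Tail-decay comparison at x = 10: Cauchy-type tails shrink like 1/x^2,
# normal-type tails like e^(-x^2).
import math

x = 10.0
cauchy_rate = 1 / x**2         # 1/100 = 0.01
normal_rate = math.exp(-x**2)  # e^(-100), about 3.72e-44

print(f"1/x^2    at x = 10: {cauchy_rate}")
print(f"e^(-x^2) at x = 10: {normal_rate:.3g}")
print(f"ratio: {cauchy_rate / normal_rate:.3g}")  # astronomically large
```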

A Pathological Distribution In fact, the tails of the Cauchy are so heavy that the mean and the variance do not exist. A single observation is as good an estimator of the center of the Cauchy distribution as the average of an entire random sample: the mean of n Cauchy observations has exactly the same distribution as one observation. The Cauchy distribution represents an extreme situation and is good for testing where our methods break down. Will you encounter the Cauchy distribution in practice? Probably not, but it is a distinct possibility that you will encounter the next problematic example.
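You can watch averaging fail with a short simulation (my own sketch, not from the talk; the replicate counts are arbitrary). Standard Cauchy draws come from the inverse CDF, tan(π(U − 1/2)); for a well-behaved population the spread of means of 100 observations would shrink tenfold, but for the Cauchy it does not shrink at all:

```python
# Averaging does not help with the Cauchy: the IQR of means of samples of
# size 100 is the same as the IQR of single draws (theoretical IQR = 2).
import math
import random

random.seed(7)

def cauchy():
    # inverse-CDF draw from the standard Cauchy distribution
    return math.tan(math.pi * (random.random() - 0.5))

def iqr(values):
    # crude quartile spread: difference of the 75th and 25th percentiles
    v = sorted(values)
    n = len(v)
    return v[(3 * n) // 4] - v[n // 4]

singles = [cauchy() for _ in range(2000)]
means_100 = [sum(cauchy() for _ in range(100)) / 100 for _ in range(2000)]

print(f"IQR of single draws: {iqr(singles):.2f}")    # about 2
print(f"IQR of means of 100: {iqr(means_100):.2f}")  # still about 2
```

For a normal population with the same IQR, the second number would be roughly a tenth of the first.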

Mixed Normal A mixed normal distribution occurs when a population of interest actually contains two subpopulations, each normally distributed but with different means and/or variances. For whatever reason, these subpopulations cannot be easily isolated, and the resulting distribution is not normal, even though each component is.

Mixed Normal A specific example (from Rand Wilcox's Applying Contemporary Statistical Techniques): Assume a population consists of both dieters and non-dieters in a ratio of 1:9, i.e., 10% have dieted and 90% have not. Let X represent the amount of weight loss observed for an individual during the previous year. Further assume that X is distributed N(0, 100) for dieters and N(0, 1) for non-dieters.

Mixed Normal The resulting distribution can be represented as (0.9)N(0, 1) + (0.1) N(0, 100), which has mean = (0.9)(0) + (0.1)(0) = 0 and variance = (0.9)(1) + (0.1)(100) = 0.9 + 10 = 10.9. Thus, even though non-dieters represent 90% of the population and their variance is 1, we observe much greater variability in the resulting mixed model. This presents a problem for inferences about the population mean, for example. How do we proceed in such problem cases?
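The moment calculation on this slide is a special case of the general mixture formulas (mean = Σ wᵢμᵢ, variance = Σ wᵢ(σᵢ² + μᵢ²) − mean²); a quick sketch confirms the 10.9:

```python
# Mixture moments for (0.9)N(0, 1) + (0.1)N(0, 100), variance parameterization.
weights   = [0.9, 0.1]    # non-dieters, dieters
means     = [0.0, 0.0]    # both subpopulations centered at 0
variances = [1.0, 100.0]  # N(0, 1) and N(0, 100)

mixture_mean = sum(w * m for w, m in zip(weights, means))
mixture_var = (sum(w * (v + m**2) for w, m, v in zip(weights, means, variances))
               - mixture_mean**2)

print(mixture_mean)              # 0
print(round(mixture_var, 10))    # 10.9
```

Because both component means are 0 here, the variance reduces to the weighted average of the component variances, exactly as on the slide.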

Trimming and Winsorizing Data A trimmed mean is obtained by deleting a certain percentage of the smallest and largest values and then calculating the mean based on the remaining values. For example, a 10% trimmed mean is obtained by deleting 10% of the highest values and 10% of the lowest values, leaving you with 80% of the original values upon which a mean is then calculated.

Trimmed Mean A concrete example: {3.54, 6.61, 2.88, 2.20, 8.04, 5.31, 6.51, 6.37, 3.86, 7.82, 0.967, 1.12, 7.00, 4.87, 8.39, 4.15, 3.11, 7.48, 16.62, 2.77}. The mean of this sample is 5.48. Trimming 10% from each end removes 0.967, 1.12, 8.39, and 16.62. The 10% trimmed mean is based on the 20 - 2 - 2 = 16 remaining values, which in this example gives 5.16.
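A minimal sketch of this calculation (the helper name `trimmed_mean` is mine; SciPy users can reach for `scipy.stats.trim_mean` instead):

```python
# 10% trimmed mean: sort, drop 10% of the values from each tail, average.
import statistics

data = [3.54, 6.61, 2.88, 2.20, 8.04, 5.31, 6.51, 6.37, 3.86, 7.82,
        0.967, 1.12, 7.00, 4.87, 8.39, 4.15, 3.11, 7.48, 16.62, 2.77]

def trimmed_mean(xs, proportion=0.10):
    """Delete `proportion` of the data from each tail, then average the rest."""
    k = int(len(xs) * proportion)   # values dropped from each end (2 of 20)
    kept = sorted(xs)[k:len(xs) - k]
    return statistics.fmean(kept)

print(round(statistics.fmean(data), 2))  # sample mean: 5.48
print(round(trimmed_mean(data), 2))      # 10% trimmed mean: 5.16
```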

Trimmed Mean Isn't throwing out data a bad idea? Not necessarily. In the 1960s and 1970s, Lehmann and Bickel (two famous statisticians) showed that the 10% trimmed mean is nearly as good as X̄ for approximately normal data and a much safer bet than X̄ for heavy-tailed data. (A. DasGupta, Asymptotic Theory of Statistics and Probability, p. 271)

Trimmed Mean Incidentally, you have encountered trimmed means before, even if you have not recognized them as such. The median of a distribution is, in effect, a 50% trimmed mean: you remove (just under) 50% of the lowest data values and 50% of the highest data values and are left with the middle of the data as an estimate of the location parameter, or center, of the distribution.

Trimmed Mean Moreover, removing transparently erroneous data, or data collected on an unsuitable experimental unit, is not trimming. Trimming occurs when you remove data values that are legitimate (or cannot be identified as illegitimate) but are small or large with respect to the sample as a whole.

Winsorizing Data Winsorizing is distinct from trimming in that a certain proportion of the lowest and highest values are not discarded but are instead replaced by the most extreme values remaining in the data set, so that the sample size stays the same.

Winsorizing Data Returning to my previous example: {3.54, 6.61, 2.88, 2.20, 8.04, 5.31, 6.51, 6.37, 3.86, 7.82, 0.967, 1.12, 7.00, 4.87, 8.39, 4.15, 3.11, 7.48, 16.62, 2.77}. If this data is winsorized at 10%, then the 2 lowest and 2 highest values (2 = 10% of 20) are identified; the smallest and largest of the remaining values, 2.20 and 8.04, respectively, replace them. The winsorized data set becomes:

Winsorizing Data {3.54, 6.61, 2.88, 2.20, 8.04, 5.31, 6.51, 6.37, 3.86, 7.82, 2.20, 2.20, 7.00, 4.87, 8.04, 4.15, 3.11, 7.48, 8.04, 2.77} The 10% winsorized mean is the average of these 20 values with the repeats, which is 5.15 as compared to the sample mean of 5.48. The winsorized standard deviation is just the standard deviation of the winsorized data. In this example, it is 2.22 (as compared to 3.5 for the original sample).

(1-α)% CI for Trimmed Mean You cannot use the standard method of constructing confidence intervals about means for trimmed means! It would be unsound for a number of reasons, including that the remaining values in the trimmed data set are no longer independent or identically distributed.

(1-α)% CI for Trimmed Mean How can the leftover data points be dependent when they were independent just a few minutes ago, before trimming? I did not trim at random: I ordered the data first, then trimmed the lowest and highest 10%. For observations to be independent, one cannot tell you anything about another. When you order the data and observe that the second-largest value is, say, 8, then you know the largest value cannot be 7 or, indeed, any value less than 8. Hence, the trimmed data set is neither independent nor identically distributed.

(1-α)% CI for Trimmed Mean The correct standard error for a 10% trimmed mean is s_w / (0.8 √n), where s_w is the standard deviation of the 10% winsorized sample and 0.8 is the proportion of the data retained after trimming 10% from each tail.

(1-α)% CI for Trimmed Mean The correct standard error for the 10% trimmed mean of my example data is 0.62, and a 95% confidence interval for the 10% trimmed mean is 5.16 ± 2.13(0.62) = (3.84, 6.47), where 2.13 is the appropriate critical value from the T distribution with 16 - 1 = 15 degrees of freedom, 16 being the number of values left after deleting the two smallest and two largest (i.e., t_0.975(15) = 2.13).
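Putting the pieces together, here is a sketch of the whole interval calculation (the critical value 2.131 is hard-coded from a t table to keep this stdlib-only):

```python
# 95% CI for the 10% trimmed mean: SE = s_w / (0.8 * sqrt(n)), t with 15 df.
import math
import statistics

data = [3.54, 6.61, 2.88, 2.20, 8.04, 5.31, 6.51, 6.37, 3.86, 7.82,
        0.967, 1.12, 7.00, 4.87, 8.39, 4.15, 3.11, 7.48, 16.62, 2.77]

n = len(data)
k = 2                                  # 10% of n = 20, trimmed from each tail
s = sorted(data)

t_mean = statistics.fmean(s[k:n - k])  # 10% trimmed mean, about 5.16

# Winsorized sd: clamp the extremes, then take the ordinary sample sd.
lo_val, hi_val = s[k], s[n - k - 1]
s_w = statistics.stdev(min(max(x, lo_val), hi_val) for x in data)

se = s_w / (0.8 * math.sqrt(n))        # 0.8 = proportion of data retained
t_crit = 2.131                         # t_0.975 with 15 df, from a t table
ci = (t_mean - t_crit * se, t_mean + t_crit * se)
print(f"SE = {se:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
# SE = 0.62, 95% CI = (3.84, 6.47)
```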

CI's for Binomial Proportions Everyone is taught the Wald confidence interval for binomial proportions, i.e., p̂ ± z_(1-α/2) √(p̂(1 - p̂)/n), which is based on a normal approximation. (The interval is sometimes miscalled "Wald-Wolfowitz"; that name properly belongs to a runs test.) No one is taught how poorly it performs in general, however, even for large n.

CI's for Binomial Proportions What do I mean by performs poorly? The coverage probability of the Wald CI is often less than you intend when you choose your α. (For example, you might intend to construct a 95% CI but end up with an 89% CI.) What is the coverage probability? It is the long-run proportion of intervals, constructed the same way from repeated samples, that contain the true value of the parameter. A 95% confidence interval has (or should have) a coverage probability of 0.95.
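The under-coverage is easy to demonstrate without simulation (the example numbers n = 100 and p = 0.07 are my own choices, not from the talk): the exact coverage is just the binomial probability of landing on a success count k whose Wald interval happens to contain p.

```python
# Exact coverage of the nominal 95% Wald interval at n = 100, p = 0.07:
# sum the binomial pmf over every k whose interval covers p.
import math

n, p, z = 100, 0.07, 1.96

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

coverage = 0.0
for k in range(n + 1):
    phat = k / n
    half = z * math.sqrt(phat * (1 - phat) / n)
    if phat - half < p < phat + half:
        coverage += binom_pmf(k, n, p)

print(f"nominal 95% Wald interval, actual coverage: {coverage:.3f}")
```

For these particular values the actual coverage comes out around 0.92, noticeably short of the nominal 0.95, and for other (n, p) combinations it can be worse.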

CI's for Binomial Proportions But what does this mean? For the frequentist statistician, it means that were she to replicate her experiment 20 times, she would expect 19 of the 20 confidence intervals she constructs to contain the true value of the parameter she is estimating. For the frequentist, the parameter is a fixed constant of nature, and once the CI is constructed, it either contains the true value or it does not.

CI's for Binomial Proportions By way of contrast, for the Bayesian statistician, the parameter is not fixed: there is a probability distribution associated with it, and even after the CI is constructed, he can speak of the probability that the parameter is contained in that interval. The distinction between frequentist and Bayesian statistics is not particularly important here and would, in any event, require a talk of its own.

CI's for Binomial Proportions What should you use instead of the Wald confidence interval? There are a number of alternatives, but I recommend the Wilson score interval.

Wilson Score Interval The most general form of the Wilson score interval is as follows: ( p̂ + z²/(2n) ± z √( p̂(1 - p̂)/n + z²/(4n²) ) ) / ( 1 + z²/n ), where z is the standard normal critical value z_(1-α/2).

Wilson Score Interval A 95% Wilson interval is (approximately): ( p̂ + 2/n ± 2 √( p̂(1 - p̂)/n + 1/n² ) ) / ( 1 + 4/n )

Wilson Score Interval As an example, suppose we observe 7 successes in 100 trials. An (approximate) 95% Wilson interval would be (0.034, 0.14). I have been writing "approximate" in parentheses because I used 2 in place of the correct critical value z_0.975 = 1.96 to make the formula on the previous slide look cleaner. The difference is slight. If you want to be extra fastidious, you can use the critical value from the T distribution, t_0.975(n - 1), which in this example would be 1.984.
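A sketch of the approximate (z = 2) Wilson interval, checked against the 7-successes-in-100-trials example (the function name `wilson` is mine; statsmodels users can get the same interval from `proportion_confint` with method="wilson"):

```python
# Wilson score interval; z defaults to 2 to match the simplified slide formula,
# but any critical value (e.g. 1.96) can be passed in.
import math

def wilson(successes, n, z=2.0):
    phat = successes / n
    center = phat + z**2 / (2 * n)
    half = z * math.sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2))
    denom = 1 + z**2 / n
    return ((center - half) / denom, (center + half) / denom)

lo, hi = wilson(7, 100)
print(f"({lo:.3f}, {hi:.2f})")  # (0.034, 0.14)
```

Unlike the Wald interval, this interval can never extend below 0 or above 1, and it behaves sensibly even when the observed count is 0 or n.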

In Summary Bell-shaped distributions need not have nice properties. Do not simply assume the underlying distribution of your data is normal and apply standard statistical methods; to do so is the statistical equivalent of running with scissors. Instead, investigate the data until you are satisfied of its approximate normality.

In Summary Even when the data appear to be approximately normal, you might want to consider trimming and/or winsorizing. The efficiency of the 10% trimmed mean is such that it is competitive with the sample mean even under normality, and it is a better bet for heavy-tailed data. (This situation is not unlike the Wilcoxon-Mann-Whitney test vs. the two-sample T test.)

In Summary The Wald confidence interval for binomial proportions, like fast food, probably should be avoided.