BE.104 Spring Biostatistics: Distribution and the Mean J. L. Sherley

Outline:
1) Review of variation & error
2) Binomial distributions
3) The normal distribution
4) Defining the mean of a population

Goals:
1) Understand the concepts that underlie common statistical analyses for EHS studies
2) Evaluate them! Appropriate use? Quality? Meaning?
3) This requires familiarity with the operations and mathematics, but not formal training as a statistician

Why do we need statistics in EHS?
1) To organize data for analysis
2) To quantify error
3) To quantify and compare variation
4) To detect differences between populations (e.g., affected vs. non-affected, exposed vs. non-exposed)
5) To detect relationships
6) To make predictions about future events; specifically, to estimate risk

The basic tool for organizing population data is the frequency plot, or distribution: we sample measurements from a population (people, towns) and plot the number of occurrences of each value against the value itself. <Graph: frequency plot, measurement value vs. number of occurrences>
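The frequency-plot idea can be sketched in a few lines of Python; the heart-rate values below are made up purely for illustration.

```python
from collections import Counter

# Hypothetical measurements (e.g., resting heart rates, rounded to 5 bpm)
values = [70, 75, 70, 65, 80, 75, 70, 75, 70, 85]

# A frequency distribution: number of occurrences of each value
freq = Counter(values)
for value in sorted(freq):
    print(f"{value}: {'#' * freq[value]}")
```

Each row is one measured value; the bar length is how often it occurred, which is exactly the "# of occurrences vs. value" plot described above.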

1) All measurements and observations are samplings (e.g., heart rates, bead numbers, hand measurements). There is an ideal universe of all our measurements that is infinite. We are trying to estimate the characteristic features of this ideal universe of measures. Why? To develop the best representation of reality: the best representation of risk, prediction, and difference.

If we measured body weight, how would the plot look? We expect variability in this measure; this expected variation is always present (biological, quantitative, statistical).

Now, suppose I gave each of you a 12-inch ruler and asked you to measure this table. How would the distribution of measurements look? <Graphs>

What if I gave you 100 1-foot rulers and asked you to measure them with a 2-ft ruler? <Graph>

We call the spread in the data variance (mathematical definition later). Three sources of variance in population distributions:
I) Errors of measurement (quantitative, investigator, seasonal)
II) Variation (statistical, sampling, biological, physical)
III) Things are really different: more than one distribution is present
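A quick simulation illustrates the ruler thought experiment. The table length and error sizes are hypothetical assumptions; the point is only that a coarser instrument produces a wider spread of measurements around the true value.

```python
import random
import statistics

random.seed(42)

TRUE_LENGTH = 60.0  # hypothetical table length, in inches

def measure(n_measurers, error_sd):
    # Each "measurement" is the true value plus a small independent
    # random error (calibration, investigator judgment, etc.)
    return [random.gauss(TRUE_LENGTH, error_sd) for _ in range(n_measurers)]

coarse = measure(100, error_sd=1.0)   # short ruler: many placements, more error
fine = measure(100, error_sd=0.25)    # longer ruler: fewer placements, less error

print(round(statistics.pstdev(coarse), 2), round(statistics.pstdev(fine), 2))
```

Both sets of measurements center near the true length, but the coarse-instrument distribution is visibly wider, which is the "spread" we will later quantify as variance.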

I) Errors of Measurement
1) Technical: calibration, systematic
2) Investigator: ability, judgment, bias (more later, for sure!)
3) Process variation: temperature-dependent, diurnal rhythms, seasonal
4) Population heterogeneity: samples are not homogeneous (with persons, mixing is never complete)

II) Variation
1) Known
   A) Controllable: diet, fasting, activity level
   B) Uncontrollable: gender, height, age (biological/physical); but these can be "controlled" for by matching strategies and adjustments (e.g., per capita, age-specific)
2) Unknown: "random" statistical variation, due to sampling from the universe of possibilities

III) Real Differences: what we seek to evaluate

Back to distributions: How do we describe them? How do we compare them?

Analysis of distributions:
1) We could use mathematics to develop an exact function f(v) describing each distribution. Hard to do, especially for each distribution studied!
2) Or we can model the data with idealized distributions for which mathematical treatments are tractable (though still not trivial!); this is the basis of statistical methods.

Three related distributions for statistical analyses:

Binomial: based on the frequency of occurrences when there are only two possible outcomes (e.g., bead drawings: each bead is either one color or the other; coin flipping: each toss is heads or tails). Well-developed mathematics, but two approximations are used for population data. <Graph>
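The binomial's "well-developed mathematics" is just P(X = k) = C(n, k) p^k (1-p)^(n-k). A minimal sketch for a fair coin, using the four-toss example:

```python
import math

# Exact binomial probabilities for X = number of heads in n fair tosses
def binomial_pmf(n, p):
    return {k: math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

pmf = binomial_pmf(n=4, p=0.5)
print({k: round(v, 4) for k, v in pmf.items()})
# {0: 0.0625, 1: 0.25, 2: 0.375, 3: 0.25, 4: 0.0625}
```

Two heads is the most frequent outcome and the distribution is symmetric; as n grows, this shape converges to the bell curve discussed next.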

Poisson: for infrequent events.
Normal: for frequent events.

As sample size increases, both the Binomial and the Poisson distributions converge to the Normal.

Normal Distribution (ND): the ideal. It arises when the data (x values) are the sum of many independent small random factors (e.g., measurements, blood pressure, heart rate). Also known as the Gaussian distribution, or bell curve. <Graph>

Properties:
1) Symmetric
2) The most frequent value (the mode) equals µ, the mean (arithmetic average):
   µ = (Σ x_i) / N
3) mode = mean = median, the value of x below which 50% of the values lie

Two parameters define the ND entirely:
1) µ, the mean, indicates where the distribution is centered.
2) σ, the standard deviation, measures the spread of the data around µ:
   σ = sqrt( Σ (x_i - µ)² / N )
   For measurements, σ is an indicator of precision.
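The two formulas above can be computed directly. The data values here are hypothetical; the point is that the hand computation matches the library's population statistics.

```python
import math
import statistics

data = [4.2, 4.8, 5.1, 3.9, 5.0]  # hypothetical measurements

N = len(data)
mu = sum(data) / N                                        # µ = Σx_i / N
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / N)   # spread around µ

print(round(mu, 3), round(sigma, 3))
# These agree with the standard library's population mean and sd:
assert math.isclose(mu, statistics.mean(data))
assert math.isclose(sigma, statistics.pstdev(data))
```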

Now, how can we compare the spread of a distribution, or the quality of precision, at different scales and for measurements of different types? For example:

Measure of bacteria: scale = microns
Measure of temperature: scale = degrees
Measure of rat tails: scale = inches

CV, the coefficient of variation = (σ/µ) × 100%

The 68%-95%-99% rule: as a probability density function, the ND places about 68% of values within ±1σ of µ, about 95% within ±2σ, and about 99.7% within ±3σ.

Statistical methods based on the ND employ the two parameters σ and µ; therefore they are called PARAMETRIC STATISTICS.

Important rule: before you apply parametric statistics, confirm that the data are normally distributed. If not:
1) Transform the data to an ND and then use parametric methods (e.g., log-normal transformation), or
2) Use non-parametric methods (which are often conversions to pseudo-normal distributions, e.g., Mann-Whitney).

What happens if you use parametric statistics when the data are not normally distributed? You may miss differences that more appropriate statistical methods would have had sufficient power to detect. What happens if you use non-parametric statistics when the data are normally distributed? You may miss differences that parametric methods would have detected, because when data are normally distributed, parametric statistics are the most powerful methods. What type of formal statistical errors are these?
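A small sketch of the CV at work. The bacteria and rat-tail lengths below are invented, and deliberately constructed so that the two samples have the same relative spread despite living on different scales; the CV exposes this where the raw σ values would not.

```python
import statistics

# CV lets us compare spread across different scales and units
def cv(data):
    return statistics.pstdev(data) / statistics.mean(data) * 100

bacteria_um = [2.0, 2.2, 1.9, 2.1, 2.3]    # hypothetical lengths, microns
rat_tails_in = [8.0, 8.8, 7.6, 8.4, 9.2]   # hypothetical lengths, inches

print(round(cv(bacteria_um), 1), round(cv(rat_tails_in), 1))
```

The σ values differ by a factor of four (matching the scales), but the CVs are identical, so the two measurement processes are equally precise in relative terms.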

Move from the ideal discussion to a practical one. We want to know about µ in most cases.

Consider an ideal population with N → infinity (e.g., stars), or N large but finite (e.g., people), and some measured property that is normally distributed about a population mean µ with standard deviation σ.

When we measure a sample of n individuals, we can construct a new sample distribution with mean x-bar. The sample mean x-bar is an estimate of what we seek, µ.

First concern: How close is x-bar to µ?

Consider n = 1: x-bar could be near µ, or it could be far. x-bar is most likely to be near, because of the normal probability density function, but it is not likely to equal µ.

Consider how x-bar approaches µ as n approaches N. Central Limit Theorem ("convergence to the mean"): as the size of a sample increases, its mean x-bar approaches the mean µ of the sampled population.

************************************************************************

Now consider: if we knew σ, the population standard deviation (often we do, e.g., for infant birth weights or heart rates), what can we say about how likely x-bar, the sample mean, is to be near µ, the population mean?

Consider a single x again. Given that the population is normally distributed [How could we tell? Evaluate the sample distribution's form.], we can say that our x is within ±2σ of µ with 95% confidence. <Graph>

I.e., the 95% Confidence Interval (CI) for µ about x = x ± 1.96 σ

What does this mean? It means that, if we drew an x many times, 95% of the time µ, the mean of the sampled population, would lie within this interval. Therefore, we have set a limit on what µ might be.
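A sketch of the known-σ confidence interval, anticipating the 1/√n shrinkage discussed next. The birth-weight numbers are hypothetical; the 1.96 comes from inverting the standard normal CDF.

```python
import math
from statistics import NormalDist

# CI for µ when the population sigma is known: x-bar ± z * sigma / sqrt(n)
def ci_known_sigma(xbar, sigma, n, confidence=0.95):
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # ≈ 1.96 for 95%
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half

# Hypothetical example: infant birth weights, sigma taken as known (0.5 kg)
lo, hi = ci_known_sigma(xbar=3.4, sigma=0.5, n=25)
print(round(lo, 3), round(hi, 3))
# 3.204 3.596
```

With n = 25 the interval is x-bar ± 1.96(0.5)/5 ≈ ±0.196 kg; quadrupling n to 100 halves that width, exactly the √n behavior described below.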

What happens to this interval as we increase the sample size n? Because of the Central Limit Theorem, our confidence that x-bar = µ increases. Mathematically, the interval shrinks by √n:

95% CI for µ = x-bar ± 1.96 σ/√n

Note that as n approaches N, 1.96 σ/√n approaches 0, and therefore x-bar approaches µ.

For analysis of measurements, a
Small 95% CI implies precision
Large 95% CI implies uncertainty

How would you determine the 99% CI? 99% CI for µ = x-bar ± 2.576 σ/√n

What does the 95% CI mean?
1) You are 95% confident that, over many samplings, µ will lie in this interval.
2) If you performed 100 samplings of the same size n, 95 of the resulting intervals are predicted to contain µ and 5 are not. Why? Because x-bar is an estimate of the population mean µ, and the interval has a 95% probability of covering it.

************************************************************************

Often σ, the population standard deviation, is not known. What then? Then we can estimate σ with s, the sample standard deviation:

s = sqrt( Σ (x_i - x-bar)² / (n-1) )

Note the n-1, not n as in the calculation of σ. When n approaches N this distinction is not as important; it matters when the sample size is small.
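The n-1 versus n distinction is built into the standard library: `statistics.stdev` divides by n-1 (the sample sd, s) while `statistics.pstdev` divides by n (the population sd, σ). A sketch with a hypothetical small sample:

```python
import statistics

sample = [4.2, 4.8, 5.1, 3.9, 5.0]  # hypothetical small sample, n = 5

s = statistics.stdev(sample)        # divides by n-1: estimates sigma
p = statistics.pstdev(sample)       # divides by n: population formula

print(round(s, 4), round(p, 4))
# With small n the two differ noticeably; as n grows they converge.
```

Dividing by n-1 inflates the estimate slightly, compensating for the fact that deviations are measured from x-bar rather than from the unknown µ.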

n-1 = degrees of freedom. We use n-1 instead of n because x-bar is used in the calculation, and x-bar is known.

With this estimate of σ, the CI for µ = x-bar ± t s/√n

Note: as n approaches N, t approaches 1.960 for the 95% CI (see a t-table), and of course s approaches σ.

t = the t-statistic, a parameter that specifies the desired CI for a given n.

So, 95% CI for µ = x-bar ± 1.96 σ/√n. Look familiar?
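A sketch of the t-based interval. Rather than assume any statistics library is available, the critical values below are hardcoded from a standard two-tailed 95% t-table; note how they fall toward 1.960 as the degrees of freedom grow.

```python
import math
import statistics

# Two-tailed 95% critical t values for selected degrees of freedom,
# taken from a standard t-table; t -> 1.960 as df grows.
T95 = {4: 2.776, 9: 2.262, 29: 2.045, 99: 1.984}

def ci_unknown_sigma(sample):
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)     # sample sd, divides by n-1
    t = T95[n - 1]                   # look up t for df = n-1
    half = t * s / math.sqrt(n)
    return xbar - half, xbar + half

sample = [4.2, 4.8, 5.1, 3.9, 5.0]  # hypothetical measurements, n = 5
lo, hi = ci_unknown_sigma(sample)
print(round(lo, 3), round(hi, 3))
# 3.949 5.251
```

With only 4 degrees of freedom, t = 2.776 rather than 1.96, so the interval is noticeably wider than the known-σ version: the extra width is the price of having to estimate σ from the same small sample.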