BIOS 4120: Introduction to Biostatistics. Breheny. Lab #7.


I. Binomial Distribution

$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$

R code: dbinom(x, size, prob), binom.test(x, n, p = 0.5)

$P(X < k) = P(X = 0) + P(X = 1) + \cdots + P(X = k-1)$

$P(X \ge 1) = 1 - P(X = 0)$

Assumptions:
- The number of trials n must be fixed in advance
- The probability that the event occurs, p, must be the same from trial to trial
- The trials must be independent
- Each trial has only two possible outcomes

II. Practice Problems

1) An agent sells life insurance policies to five healthy people of the same age. According to recent data, the probability that a person in these conditions lives 30 years or more is 2/3. Calculate the probability that after 30 years:

Use the formula $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ with n = 5, p = 2/3.

a. All five people are still living. k = 5: $P(X = 5) = (2/3)^5 = 32/243 \approx 0.132$
b. At least three people are still living. $1 - P(X = 0) - P(X = 1) - P(X = 2) = 192/243 \approx 0.790$
c. Exactly two people are still living. $P(X = 2) = 40/243 \approx 0.165$
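The three parts of Problem 1 can be checked with the `dbinom` function listed in the lab. The following is a minimal sketch, not part of the original handout. The variable names are illustrative, and `pbinom` (the base R binomial CDF) is not mentioned in the lab. It is used here only to confirm the complement calculation.

```r
# Problem 1: X ~ Binomial(n = 5, p = 2/3), the number still living after 30 years
n <- 5
p <- 2 / 3

p_all_five       <- dbinom(5, size = n, prob = p)         # (a) P(X = 5)
p_at_least_three <- sum(dbinom(3:5, size = n, prob = p))  # (b) P(X >= 3)
p_exactly_two    <- dbinom(2, size = n, prob = p)         # (c) P(X = 2)

# Complement form, 1 - P(X <= 2); should agree with (b)
p_at_least_three_alt <- 1 - pbinom(2, size = n, prob = p)

print(round(c(a = p_all_five, b = p_at_least_three, c = p_exactly_two), 3))
```

Summing `dbinom` over 3:5 and subtracting the lower tail from 1 are two routes to the same quantity. Comparing them is a quick way to catch an off-by-one error in "at least" versus "more than".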

2) A pharmaceutical lab states that a drug causes negative side effects in 3 of every 100 patients. To check this claim, another laboratory chooses 5 people at random who have taken the drug. What is the probability of the following events?

Use the formula $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ with n = 5, p = 0.03.

a. None of the five patients experience side effects. $P(X = 0) = 0.86$
b. At least two had side effects. $1 - P(X = 0) - P(X = 1) = 0.008$
c. It is highly plausible that Hispanic people experience side effects more often than Caucasian patients. Suppose that of the 5 people, three are Caucasian and two are Hispanic. Is this a problem for the previous two situations? Explain.
Yes, because the binomial assumption of a constant probability would be violated: the Hispanic patients would have a higher probability of a side effect than the Caucasian patients.

3) Let X = the number of 65- to 74-year-olds who suffer from diabetes in a sample of size 7. X is a Bin(7, 0.125) random variable.

Use the formula $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ with n = 7, p = 0.125.

a. If you wish to make a list of the seven persons chosen, in how many ways can they be ordered? $7! = 5040$
b. Without regard to order, in how many ways can you select four individuals from this group of 7? $7!/(4! \cdot 3!) = 35$
c. What is the probability that two of the seven people have diabetes? $P(X = 2) = 0.17$
d. What is the probability that four of the seven people have diabetes? $P(X = 4) = 0.006$
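Problems 2 and 3 combine counting with binomial probabilities, so a short R check may help. This is a sketch only: the variable names are illustrative, and `factorial` and `choose` are base R functions that the lab sheet itself does not list.

```r
# Problem 2: X ~ Binomial(5, 0.03), number of patients with side effects
p_none         <- dbinom(0, size = 5, prob = 0.03)  # (a) P(X = 0)
p_at_least_two <- 1 - dbinom(0, size = 5, prob = 0.03) -
  dbinom(1, size = 5, prob = 0.03)                  # (b) P(X >= 2)

# Problem 3: counting, then X ~ Binomial(7, 0.125)
orderings  <- factorial(7)   # (a) ways to order seven people
selections <- choose(7, 4)   # (b) ways to pick four of seven, order ignored
p_two      <- dbinom(2, size = 7, prob = 0.125)     # (c) P(X = 2)
p_four     <- dbinom(4, size = 7, prob = 0.125)     # (d) P(X = 4)

print(c(orderings = orderings, selections = selections))
print(signif(c(p_none, p_at_least_two, p_two, p_four), 3))
```

Note that `choose(7, 4)` is the same binomial coefficient that appears inside the formula for P(X = 4), which is why part (b) of Problem 3 precedes parts (c) and (d).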

4) Suppose you are interested in monitoring air pollution in LA over a one-week period. Let X be a random variable that represents the number of days out of seven on which the concentration of carbon monoxide surpasses a specified level. Do you believe X has a binomial distribution? Explain.
No. If the CO concentration is above the specified level on one of the seven days, it is more likely than usual to remain above that level for the next few days. Independence would be violated.

III. Quiz Review

$\beta = r \, \frac{\sigma_y}{\sigma_x}$

$y = \alpha + \beta x$

What is α? What is β?

The correlation coefficient says that if you go up in x by one standard deviation, you can expect to go up in y by r standard deviations (standard units).

Predicting y with x:
1. $z_x = \frac{x - \bar{x}}{SD_x}$
2. $z_y = r z_x$
3. $\hat{y} = \bar{y} + z_y \, SD_y$

Plots and Descriptive Measures
Be familiar with: histograms, boxplots, bar charts, standard deviations (+/- 1, +/- 2), mean, median, percentiles, skewness.

Probability
Intersections, unions, complements
Addition rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Multiplication rule: $P(A \cap B) = P(A) P(B \mid A) = P(B) P(A \mid B)$
$P(A^C) = 1 - P(A)$
$P(A) = P(A \cap B) + P(A \cap B^C)$

Bayes' Theorem: $P(A \mid B) = \frac{P(A) P(B \mid A)}{P(A) P(B \mid A) + P(A^C) P(B \mid A^C)}$

Diagnostic Tests
Sensitivity: $P(T^+ \mid D^+)$
Specificity: $P(T^- \mid D^-)$
Prevalence: $P(D^+)$

IV. Practice Problems

1. What does the Pearson correlation coefficient measure?
-- The linear association between two continuous variables.

2. It is hypothesized that fluctuations in norepinephrine (NE) levels accompany fluctuations in affect in bipolar affective disorder (manic-depressive illness; low affect scores represent increased mania). Let's say the regression line looks like: NE = 39 - 0.017*Affect
a. What is the relationship between norepinephrine levels and affect test score?
-- Since the slope is negative, we can say that on average, as affect score increases, NE levels decrease. Low affect scores are associated with higher NE levels.
b. Interpret the slope coefficient.
-- For a one-unit increase in affect score, we can expect the NE level to decrease by 0.017 units.
c. Find the correlation coefficient if the standard deviations for NE and Affect are 8.43 and 384.9, respectively.
-- $r = \beta \, \sigma_x / \sigma_y = -0.017 \times 384.9 / 8.43 = -0.78$

3. Given a dataset: 3.21 3.38 4.19 4.37 4.71 4.76 4.79 5.06 5.23 5.36 5.50 5.56 5.64 5.76
a. Find the 25th and 75th percentiles.
-- 25th: 14 × 0.25 = 3.5, so round up to position 4, which is 4.37; 75th: 14 × 0.75 = 10.5, so round up to position 11, which is 5.50

b. Find the mean and median.
-- Mean 4.82; median 4.93
c. Is this data skewed or symmetric?
-- Slightly skewed left, but not by much. It is roughly symmetric.

4. The prevalence of colon cancer is 40%. A colonoscopy can test for colon cancer, and it has a sensitivity of ____ and a specificity of ____. The predictive value positive (PVP) of this test is about ____.

                 Colon Cancer   No Colon Cancer
Positive Test         30               20
Negative Test         10               40

-- Sensitivity = 30/40 = 0.75; specificity = 40/60 = 0.67
-- PVP = (0.75 × 0.4) / (0.75 × 0.4 + 0.33 × 0.6) = 0.6

5. Examine the following boxplots:

[Figure: two side-by-side boxplots]

Which boxplot has the higher median? Which has the most outliers? Which has the most variability? Are both data sets symmetric? What are the components of the boxplot? Explain.
-- Higher median: the boxplot on the left. Most outliers: the boxplot on the right. Most variability: the boxplot on the right. Symmetry: the left boxplot is roughly symmetric and the right boxplot is right skewed. Components: the edges of the box are the 25th and 75th percentiles, the thick bar is the median, and the circles are outliers.
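The Bayes' theorem step in Problem 4 can be traced in R. The sketch below is illustrative only: the matrix layout and variable names are assumptions, with the counts taken from the 2x2 table in Problem 4 and the 40% prevalence from the problem statement.

```r
# 2x2 table from Problem 4 (rows: test result, columns: disease status)
counts <- matrix(c(30, 20,
                   10, 40),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(test = c("positive", "negative"),
                                 disease = c("cancer", "no_cancer")))

sensitivity <- counts["positive", "cancer"] / sum(counts[, "cancer"])        # 30/40
specificity <- counts["negative", "no_cancer"] / sum(counts[, "no_cancer"])  # 40/60
prevalence  <- 0.40

# Bayes' theorem: P(D+ | T+), with the false positive rate as 1 - specificity
pvp <- (sensitivity * prevalence) /
  (sensitivity * prevalence + (1 - specificity) * (1 - prevalence))

print(round(c(sensitivity = sensitivity, specificity = specificity, pvp = pvp), 2))
```

The false positive rate enters the denominator as 1 - specificity, which is where the 0.33 in the hand calculation comes from.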

6. The probability of event A occurring is 47%. The probability of event B occurring is 18%. The probability of both events occurring at the same time is 10%.
a. Is event A independent of event B?
-- No; 0.47 × 0.18 = 0.08, which does not equal 0.1
b. Find $P(B \mid A)$ and $P(A \mid B)$.
-- 0.1/0.47 = 0.21; 0.1/0.18 = 0.56
c. Find $P(A \cup B)$.
-- 0.47 + 0.18 - 0.1 = 0.55

The rest of the lab is open for questions.
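For review, the rules used in Problem 6 (independence check, conditional probability, addition rule) can be written out in a few lines of R. This is a sketch with illustrative variable names and is not part of the original lab.

```r
# Problem 6: given probabilities
p_a       <- 0.47
p_b       <- 0.18
p_a_and_b <- 0.10

# (a) Independence would require P(A and B) = P(A) * P(B)
product_if_independent <- p_a * p_b
independent <- isTRUE(all.equal(product_if_independent, p_a_and_b))

# (b) Conditional probabilities from the multiplication rule
p_b_given_a <- p_a_and_b / p_a
p_a_given_b <- p_a_and_b / p_b

# (c) Addition rule
p_a_or_b <- p_a + p_b - p_a_and_b

print(independent)
print(round(c(p_b_given_a, p_a_given_b, p_a_or_b), 2))
```

`all.equal` is used instead of `==` so that the comparison tolerates floating-point rounding. Here the two values differ by far more than that tolerance, so the events are not independent.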