Review for Final Exam 18.05 Spring 2014 Jeremy Orloff and Jonathan Bloom

THANK YOU!!!! JON!! PETER!! RUTHI!! ERIKA!! ALL OF YOU!!!!

Probability
Counting: sets; inclusion-exclusion principle; rule of product (multiplication rule); permutations and combinations.
Basics: outcome, sample space, event; discrete, continuous; probability function; conditional probability; independent events; law of total probability; Bayes' theorem.
June 10, 2014

Probability
Random variables: discrete (general, uniform, Bernoulli, binomial, geometric); continuous (general, uniform, normal, exponential); pmf, pdf, cdf; expectation = mean = average value; variance; standard deviation.
Joint distributions: joint pmf and pdf; independent random variables; covariance and correlation.
Central limit theorem.

Statistics
Maximum likelihood.
Least squares.
Bayesian inference: discrete sets of hypotheses; continuous ranges of hypotheses; beta distributions; conjugate priors; choosing priors; probability intervals.
Frequentist inference: NHST (rejection regions, significance; p-values; z, t, $\chi^2$; type I and type II error; power); confidence intervals; bootstrapping.

Problem 17. Directly from the definitions of expected value and variance, compute $E(X)$ and $\mathrm{Var}(X)$ when $X$ has probability mass function given by the following table:
X      -2    -1    0     1     2
p(X)   1/15  2/15  3/15  4/15  5/15
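A minimal sketch of the computation Problem 17 asks for, applying the definitions $E(X) = \sum x\,p(x)$ and $\mathrm{Var}(X) = \sum (x - \mu)^2 p(x)$ to the table. The helper names `expectation` and `variance` are illustrative, not part of the course materials.

```python
from fractions import Fraction

# pmf copied from the table in Problem 17: value -> probability
pmf = {
    -2: Fraction(1, 15),
    -1: Fraction(2, 15),
    0: Fraction(3, 15),
    1: Fraction(4, 15),
    2: Fraction(5, 15),
}

def expectation(pmf):
    """E(X) = sum of x * p(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of (x - mu)^2 * p(x), straight from the definition."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

mean_x = expectation(pmf)
var_x = variance(pmf)
print(mean_x, var_x)
```

Exact fractions are used so the result can be compared with a hand computation without rounding.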

Problem 18. Suppose that $X$ takes values between 0 and 1 and has probability density function $f(x) = 2x$. Compute $\mathrm{Var}(X)$ and $\mathrm{Var}(X^2)$.

Problem 20. For a certain random variable $X$ it is known that $E(X) = 2$ and $\mathrm{Var}(X) = 3$. What is $E(X^2)$?

Problem 21. Determine the expectation and variance of a Bernoulli(p) random variable.

Problem 22. Suppose 100 people all toss a hat into a box and then each randomly picks out a hat. What is the expected number of people who get their own hat back? Hint: express the number of people who get their own hat as a sum of random variables whose expected value is easy to compute.
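The hint in Problem 22 points to linearity of expectation: each person gets their own hat with probability 1/100, so the indicators sum to an expected value of 1. The sketch below is only a simulation check of that value, assuming the hats are handed out as a uniformly random permutation; the function name, trial count, and seed are arbitrary choices.

```python
import random

def average_matches(n_people=100, n_trials=20_000, seed=0):
    """Estimate the expected number of people who get their own hat back."""
    rng = random.Random(seed)
    hats = list(range(n_people))
    total = 0
    for _ in range(n_trials):
        rng.shuffle(hats)  # hats[i] is the hat picked by person i
        total += sum(1 for person, hat in enumerate(hats) if person == hat)
    return total / n_trials

estimate = average_matches()
print(estimate)  # should be close to the exact answer 100 * (1/100) = 1
```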

pmf, pdf, cdf
Probability Mass Functions, Probability Density Functions and Cumulative Distribution Functions

Problem 27. Suppose you roll a fair 6-sided die 25 times (independently), and you get $3 every time you roll a 6. Let X be the total number of dollars you win.
(a) What is the pmf of X?
(b) Find E(X) and Var(X).
(c) Let Y be the total won on another 25 independent rolls. Compute and compare E(X + Y), E(2X), Var(X + Y), Var(2X). Explain briefly why this makes sense.
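One way to check Problem 27 numerically is to write X as 3 times a Binomial(25, 1/6) count of sixes and tabulate its pmf exactly. This is a sketch under that reading of the problem; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

n, p, payout = 25, Fraction(1, 6), 3

# X = 3 * (number of sixes), so X takes the values 0, 3, 6, ..., 75
pmf_x = {payout * k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

mean_x = sum(x * q for x, q in pmf_x.items())
var_x = sum((x - mean_x) ** 2 * q for x, q in pmf_x.items())

# Y is an independent copy of X: means add, and variances add for X + Y.
# Scaling by 2 doubles the mean but multiplies the variance by 4.
mean_sum, var_sum = 2 * mean_x, 2 * var_x
mean_double, var_double = 2 * mean_x, 4 * var_x

print(mean_x, var_x)
print(mean_sum, var_sum, mean_double, var_double)
```

The two means agree while Var(2X) is twice Var(X + Y), because 2X doubles a single random total whereas X + Y adds two independent ones whose fluctuations partly cancel.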

Problem 28. A continuous random variable $X$ has pdf $f(x) = x + ax^2$ on $[0, 1]$. Find $a$, the cdf, and $P(.5 < X < 1)$.
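A short sketch of Problem 28, assuming the density is $f(x) = x + ax^2$ on $[0, 1]$: the normalization $\int_0^1 f(x)\,dx = 1/2 + a/3 = 1$ fixes $a$, and integrating gives the cdf. The function name `cdf` is an illustrative choice.

```python
from fractions import Fraction

# Normalization: integral over [0, 1] of (x + a x^2) is 1/2 + a/3, which must equal 1
a = (1 - Fraction(1, 2)) * 3

def cdf(x):
    """F(x) = x^2/2 + a*x^3/3 for 0 <= x <= 1."""
    x = Fraction(x)
    return x**2 / 2 + a * x**3 / 3

prob = cdf(1) - cdf(Fraction(1, 2))
print(a, prob)
```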

Problem 32. For each of the following say whether it can be the graph of a cdf. If it can be, say whether the variable is discrete or continuous.
[Figure: candidate graphs (i) to (iv) of F(x) against x, with vertical-axis ticks at 0.5 and 1.]

Continued
[Figure: candidate graphs (v) to (viii) of F(x) against x, with vertical-axis ticks at 0.5 and 1.]

Distributions with names

Problem 35. Suppose that buses are scheduled to arrive at a bus stop at noon but are always $X$ minutes late, where $X$ is an exponential random variable with probability density function $f_X(x) = \lambda e^{-\lambda x}$. Suppose that you arrive at the bus stop precisely at noon.
(a) Compute the probability that you have to wait for more than five minutes for the bus to arrive.
(b) Suppose that you have already been waiting for 10 minutes. Compute the probability that you have to wait an additional five minutes or more.

Problem 39. More Transforming Normal Distributions
(a) Suppose $Z$ is a standard normal random variable and let $Y = aZ + b$, where $a > 0$ and $b$ are constants. Show $Y \sim N(b, a^2)$.
(b) Suppose $Y \sim N(\mu, \sigma^2)$. Show $\frac{Y - \mu}{\sigma}$ follows a standard normal distribution.

Problem 40. (Sums of normal random variables) Let $X$ and $Y$ be independent random variables where $X \sim N(2, 5)$ and $Y \sim N(5, 9)$ (we use the notation $N(\mu, \sigma^2)$). Let $W = 3X - 2Y + 1$.
(a) Compute $E(W)$ and $\mathrm{Var}(W)$.
(b) It is known that the sum of independent normal random variables is normal. Compute $P(W \le 6)$.
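A sketch of Problem 40, reading the garbled expression as $W = 3X - 2Y + 1$ and the event as $W \le 6$ (both are assumptions about symbols lost in extraction). It uses linearity of expectation, $\mathrm{Var}(aX + bY + c) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y)$ for independent X and Y, and the standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

mean_x, var_x = 2, 5   # X ~ N(2, 5), second parameter is the variance
mean_y, var_y = 5, 9   # Y ~ N(5, 9)

# W = 3X - 2Y + 1 (assumed reading of the problem statement)
mean_w = 3 * mean_x - 2 * mean_y + 1
var_w = 3**2 * var_x + (-2) ** 2 * var_y   # independence: variances add

w = NormalDist(mu=mean_w, sigma=var_w**0.5)
prob = w.cdf(6)   # P(W <= 6)
print(mean_w, var_w, round(prob, 4))
```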

Problem 41. Let $X \sim U(a, b)$. Compute $E(X)$ and $\mathrm{Var}(X)$.

Problem 42. In $n + m$ independent Bernoulli(p) trials, let $S_n$ be the number of successes in the first $n$ trials and $T_m$ the number of successes in the last $m$ trials.
(a) What is the distribution of $S_n$? Why?
(b) What is the distribution of $T_m$? Why?
(c) What is the distribution of $S_n + T_m$? Why?
(d) Are $S_n$ and $T_m$ independent? Why?

Problem 43. Compute the median for the exponential distribution with parameter λ.

Joint distributions
Joint pmf, pdf, cdf. Marginal pmf, pdf, cdf. Covariance and correlation.

Problem 46. To investigate the relationship between hair color and eye color, the hair color and eye color of 5383 persons were recorded. Eye color is coded by the values 1 (Light) and 2 (Dark), and hair color by 1 (Fair/red), 2 (Medium), and 3 (Dark/black). The data are given in the following table:
Eye \ Hair   1      2      3
1            1168   825    305
2            573    1312   1200
The table is turned into a joint pmf for X (hair color) and Y (eye color).
(a) Determine the joint and marginal pmf of X and Y.
(b) Are X and Y independent?
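The steps in Problem 46 can be sketched as follows: divide each count by the total to get the joint pmf, sum along rows and columns for the marginals, and test independence by checking whether every joint probability equals the product of its marginals. The dictionary layout and variable names are illustrative.

```python
from fractions import Fraction

# counts[eye][hair], copied from the table in Problem 46
counts = {
    1: {1: 1168, 2: 825, 3: 305},
    2: {1: 573, 2: 1312, 3: 1200},
}
total = sum(sum(row.values()) for row in counts.values())

# Joint pmf keyed by (hair, eye), i.e. (X, Y)
joint = {
    (hair, eye): Fraction(c, total)
    for eye, row in counts.items()
    for hair, c in row.items()
}
p_hair = {h: sum(joint[h, e] for e in counts) for h in (1, 2, 3)}
p_eye = {e: sum(joint[h, e] for h in (1, 2, 3)) for e in counts}

# Independent only if the joint pmf factors into the marginals in every cell
independent = all(joint[h, e] == p_hair[h] * p_eye[e] for (h, e) in joint)
print(total, independent)
```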

Problem 47. Let $X$ and $Y$ be two continuous random variables with joint pdf
$f(x, y) = \frac{12}{5}\, xy(1 + y)$ for $0 \le x \le 1$ and $0 \le y \le 1$,
and $f(x, y) = 0$ otherwise.
(a) Find the probability $P(\frac{1}{4} \le X \le \frac{1}{2},\ \frac{1}{3} \le Y \le \frac{2}{3})$.
(b) Determine the joint cdf $F(a, b)$ of $X$ and $Y$ for $a$ and $b$ between 0 and 1.
(c) Use your answer from (b) to find the marginal cdf $F_X(a)$ for $a$ between 0 and 1.
(d) Find the marginal pdf $f_X(x)$ directly from $f(x, y)$ and check that it is the derivative of $F_X(x)$.
(e) Are $X$ and $Y$ independent?

Problem 50. (Arithmetic Puzzle) The joint pmf of X and Y is partly given in the following table.
X \ Y   0     1     2
-1      ...   ...   ...   1/2
1       ...   1/2   ...   1/2
        1/6   2/3   1/6   1
(a) Complete the table.
(b) Are X and Y independent?

Problem 51. (Simple Joint Probability) Let X and Y have joint pmf given by the table:
X \ Y   1        2        3        4
1       16/136   3/136    2/136    13/136
2       5/136    10/136   11/136   8/136
3       9/136    6/136    7/136    12/136
4       4/136    15/136   14/136   1/136
Compute:
(a) $P(X = Y)$.
(b) $P(X + Y = 5)$.
(c) $P(1 < X \le 3,\ 1 < Y \le 3)$.
(d) $P((X, Y) \in \{1, 4\} \times \{1, 4\})$.
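Each part of Problem 51 is a sum of table entries over the cells where the event holds. A minimal sketch, with rows indexed by X and columns by Y as in the table, and reading the inequalities in (c) as $1 < X \le 3$, $1 < Y \le 3$; the helper `prob` is an illustrative name.

```python
from fractions import Fraction

# Numerators of the joint pmf (denominator 136); rows are X = 1..4, columns Y = 1..4
rows = [
    [16, 3, 2, 13],
    [5, 10, 11, 8],
    [9, 6, 7, 12],
    [4, 15, 14, 1],
]
pmf = {
    (x, y): Fraction(rows[x - 1][y - 1], 136)
    for x in range(1, 5)
    for y in range(1, 5)
}

def prob(event):
    """Sum the joint pmf over all (x, y) for which event(x, y) is true."""
    return sum(p for (x, y), p in pmf.items() if event(x, y))

p_a = prob(lambda x, y: x == y)
p_b = prob(lambda x, y: x + y == 5)
p_c = prob(lambda x, y: 1 < x <= 3 and 1 < y <= 3)
p_d = prob(lambda x, y: x in (1, 4) and y in (1, 4))
print(p_a, p_b, p_c, p_d)
```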

Problem 52. Toss a fair coin 3 times. Let X = the number of heads on the first toss, Y the total number of heads on the last two tosses, and Z the number of heads on the first two tosses.
(a) Give the joint probability table for X and Y. Compute Cov(X, Y).
(b) Give the joint probability table for X and Z. Compute Cov(X, Z).
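The covariances in Problem 52 can be checked by enumerating the 8 equally likely outcomes and using $\mathrm{Cov}(U, V) = E(UV) - E(U)E(V)$. This is a sketch; the helpers `expect` and `cov` are illustrative names.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product((0, 1), repeat=3))  # 1 = heads; 8 equally likely outcomes
p = Fraction(1, len(outcomes))

def expect(f):
    """Expected value of f over the uniform distribution on outcomes."""
    return sum(f(t) * p for t in outcomes)

def cov(f, g):
    """Cov(f, g) = E(fg) - E(f)E(g)."""
    return expect(lambda t: f(t) * g(t)) - expect(f) * expect(g)

X = lambda t: t[0]          # heads on the first toss
Y = lambda t: t[1] + t[2]   # heads on the last two tosses
Z = lambda t: t[0] + t[1]   # heads on the first two tosses

cov_xy = cov(X, Y)
cov_xz = cov(X, Z)
print(cov_xy, cov_xz)
```

X and Y depend on disjoint tosses, so their covariance vanishes, while X and Z share the first toss and so Cov(X, Z) equals Var(X).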

Problem 54. Continuous Joint Distributions. Suppose $X$ and $Y$ are continuous random variables with joint density function $f(x, y) = x + y$ on the unit square $[0, 1] \times [0, 1]$.
(a) Let $F(x, y)$ be the joint cdf. Compute $F(1, 1)$. Compute $F(x, y)$.
(b) Compute the marginal densities for $X$ and $Y$.
(c) Are $X$ and $Y$ independent?
(d) Compute $E(X)$, $E(Y)$, $E(X^2 + Y^2)$, $\mathrm{Cov}(X, Y)$.

Law of Large Numbers, Central Limit Theorem

Problem 55. Suppose $X_1, \ldots, X_{100}$ are i.i.d. with mean 1/5 and variance 1/9. Use the central limit theorem to estimate $P(\sum X_i < 30)$.

Problem 57. (Central Limit Theorem) Let $X_1, X_2, \ldots, X_{144}$ be i.i.d., each with expected value $\mu = E(X_i) = 2$ and variance $\sigma^2 = \mathrm{Var}(X_i) = 4$. Approximate $P(X_1 + X_2 + \cdots + X_{144} > 264)$ using the central limit theorem.
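A sketch of the central limit theorem approximation for Problem 57: the sum of $n$ i.i.d. terms is treated as approximately normal with mean $n\mu$ and variance $n\sigma^2$, and then standardized. Variable names are illustrative.

```python
from statistics import NormalDist

n, mu, var = 144, 2, 4

sum_mean = n * mu           # mean of the sum
sum_sd = (n * var) ** 0.5   # standard deviation of the sum

z = (264 - sum_mean) / sum_sd      # standardized threshold
prob = 1 - NormalDist().cdf(z)     # P(sum > 264) under the normal approximation
print(sum_mean, sum_sd, z, round(prob, 4))
```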

Problem 59. (More Central Limit Theorem) The average IQ in a population is 100 with standard deviation 15 (by definition, IQ is normalized so this is the case). What is the probability that a randomly selected group of 100 people has an average IQ above 115?

Post unit 2:
1. Confidence intervals
2. Bootstrap confidence intervals
3. Linear regression

Confidence intervals 1
Suppose that against a certain opponent the number of points the MIT basketball team scores is normally distributed with unknown mean $\theta$ and unknown variance $\sigma^2$. Suppose that over the course of the last 10 games between the two teams MIT scored the following points:
59, 62, 59, 74, 70, 61, 62, 66, 62, 75
(a) Compute a 95% t confidence interval for $\theta$. Does 95% confidence mean that the probability $\theta$ is in the interval you just found is 95%?
answer: Data mean and variance: $\bar{x} = 65$, $s^2 = 35.778$. The number of degrees of freedom is 9. We look up $t_{9,.025} = 2.262$ in the t-table. The 95% confidence interval is
$\left[\bar{x} - \frac{t_{9,.025}\, s}{\sqrt{n}},\ \bar{x} + \frac{t_{9,.025}\, s}{\sqrt{n}}\right] = \left[65 - 2.262\sqrt{3.5778},\ 65 + 2.262\sqrt{3.5778}\right] \approx [60.72, 69.28]$.
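The t interval above can be reproduced with a few lines of standard-library Python. This is a sketch: the critical value 2.262 is taken from the answer's t-table lookup rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

scores = [59, 62, 59, 74, 70, 61, 62, 66, 62, 75]
t_crit = 2.262  # t_{9, .025}, the t-table value quoted in the answer

n = len(scores)
xbar = mean(scores)
s2 = variance(scores)                 # sample variance, n - 1 in the denominator
half_width = t_crit * sqrt(s2 / n)    # t * s / sqrt(n)
ci = (xbar - half_width, xbar + half_width)
print(xbar, round(s2, 3), ci)
```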

Confidence interval 2
The volume in a set of wine bottles is known to follow a $N(\mu, 25)$ distribution. You take a sample of the bottles and measure their volumes. How many bottles do you have to sample to have a 95% confidence interval for $\mu$ with width 1?
answer: Suppose we have taken data $x_1, \ldots, x_n$ with mean $\bar{x}$. Remember in these probabilities $\mu$ is a given (fixed) hypothesis.
$P(|\bar{x} - \mu| \le .5 \mid \mu) = .95 \iff P\left(\frac{|\bar{x} - \mu|}{\sigma/\sqrt{n}} < \frac{.5}{\sigma/\sqrt{n}} \,\Big|\, \mu\right) = .95 \iff P\left(|Z| < \frac{.5\sqrt{n}}{5}\right) = .95$
Using the table, we have precisely that $\frac{.5\sqrt{n}}{5} = 1.96$. So, $n = (19.6)^2 \approx 384$.
If we use our rule of thumb that the .95 interval is $2\sigma$ we have $\sqrt{n}/10 = 2 \Rightarrow n = 400$.
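The sample-size calculation can be sketched as solving $z_{.025}\,\sigma/\sqrt{n} = 0.5$ for $n$. Here the critical value comes from `statistics.NormalDist().inv_cdf` instead of a table; rounding up with `ceil` is an added assumption for the case where the width must not exceed 1.

```python
from math import ceil
from statistics import NormalDist

sigma = 5          # N(mu, 25) has variance 25, so sigma = 5
half_width = 0.5   # a total width of 1 means plus or minus 0.5

z = NormalDist().inv_cdf(0.975)   # two-sided 95% critical value, about 1.96

n_exact = (z * sigma / half_width) ** 2
n_needed = ceil(n_exact)   # smallest whole number of bottles meeting the width
print(round(z, 3), round(n_exact, 2), n_needed)
```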

Polling confidence intervals
You do a poll to see what fraction $p$ of the population supports candidate A over candidate B.
1. How many people do you need to poll to know $p$ to within 1% with 95% confidence?
answer: The rule-of-thumb is that a 95% confidence interval is $\bar{x} \pm 1/\sqrt{n}$. To be within 1% we need
$\frac{1}{\sqrt{n}} = .01 \Rightarrow n = 10000$.
Using $z_{.025} = 1.96$ instead, the 95% confidence interval is
$\bar{x} \pm \frac{z_{.025}}{2\sqrt{n}}$.
To be within 1% we need
$\frac{z_{.025}}{2\sqrt{n}} = .01 \Rightarrow n = 9604$.
Note, we are using the standard Bernoulli approximation $\sigma \le 1/2$.

Polling confidence intervals 2
2. If you poll 400 people, how many have to prefer candidate A to make the 90% confidence interval entirely in the range where A is preferred?
answer: The 90% confidence interval is
$\bar{x} \pm \frac{z_{.05}}{2\sqrt{n}} = \bar{x} \pm \frac{1.64}{40}$.
We want $\bar{x} - \frac{1.64}{40} > .5$, that is $\bar{x} > .541$.
So $\frac{\text{number preferring A}}{400} > .541$. So, number preferring A $> 216.4$.

Confidence intervals 3
Suppose you made 40 confidence intervals with confidence level 95%. About how many of them would you expect to be wrong? That is, how many would not actually contain the parameter being estimated? Should you be surprised if 10 of them are wrong?
answer: A 95% confidence level means about 5% = 1/20 will be wrong. You'd expect about 2 to be wrong.
With a probability $p = .05$ of being wrong, the number wrong follows a Binomial(40, p) distribution. This has expected value 2 and standard deviation $\sqrt{40(.05)(.95)} = 1.38$. 10 wrong is $(10 - 2)/1.38 = 5.8$ standard deviations from the mean. This would be surprising.
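The standard-deviation argument above can be complemented by the exact Binomial(40, .05) tail probability of seeing 10 or more wrong intervals. This is a sketch, and the exact tail is an addition to the slide's reasoning rather than part of it.

```python
from math import comb, sqrt

n, p = 40, 0.05

expected_wrong = n * p
sd_wrong = sqrt(n * p * (1 - p))
z = (10 - expected_wrong) / sd_wrong   # distance of 10 from the mean in sd units

# Exact probability that 10 or more of the 40 intervals miss the parameter
tail = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(10, n + 1))
print(expected_wrong, round(sd_wrong, 2), round(z, 1), tail)
```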

$\chi^2$ confidence interval
A statistician chooses 27 randomly selected dates and, when examining the occupancy records of a particular motel for those dates, finds a standard deviation of 5.86 rooms rented. If the number of rooms rented is normally distributed, find the 95% confidence interval for the standard deviation of the number of rooms rented.

Solution
answer: We have $n = 27$ and $s = 5.86$, so $s^2 = 5.86^2 = 34.34$. If we fix a hypothesis for $\sigma^2$ we know
$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$.
We used R to find the critical values. (Or use the $\chi^2$ table.)
c025 = qchisq(.975,26) = 41.923
c975 = qchisq(.025,26) = 13.844
The 95% confidence interval for $\sigma^2$ is
$\left[\frac{(n-1)s^2}{c_{.025}},\ \frac{(n-1)s^2}{c_{.975}}\right] = \left[\frac{26 \cdot 34.34}{41.923},\ \frac{26 \cdot 34.34}{13.844}\right] = [21.30, 64.49]$.
We can take square roots to find the 95% confidence interval for $\sigma$: $[4.61, 8.03]$.
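A sketch of the $\chi^2$ interval computation, taking the problem statement's 5.86 as the sample standard deviation $s$ (so $s^2 = 5.86^2$), which is an assumption about how the figure should be read. The critical values are the `qchisq` outputs quoted in the solution rather than computed here, and the function name `sigma_ci` is illustrative.

```python
from math import sqrt

def sigma_ci(n, s2, c_upper, c_lower):
    """Intervals for sigma^2 and sigma from (n - 1) s^2 / sigma^2 ~ chi^2_{n-1}.

    c_upper is the upper critical value and c_lower the lower one.
    """
    var_ci = ((n - 1) * s2 / c_upper, (n - 1) * s2 / c_lower)
    sd_ci = tuple(sqrt(v) for v in var_ci)
    return var_ci, sd_ci

n = 27
s = 5.86        # sample standard deviation, as stated in the problem (assumed reading)
c_025 = 41.923  # qchisq(.975, 26), quoted in the solution
c_975 = 13.844  # qchisq(.025, 26), quoted in the solution

var_ci, sd_ci = sigma_ci(n, s**2, c_025, c_975)
print(var_ci, sd_ci)
```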

Linear regression (least squares)
1. Set up fitting the least squares line through the points (1, 1), (2, 1), and (3, 3).
2. You have trivariate data $(x_1, x_2, y)$: (1, 2, 3), (2, 3, 5), (3, 0, 1). Set up a least squares fit of the multiple regression model $y = ax_1 + bx_2 + c$.
3. Redo problem (2) for general data $(x_{i,1}, x_{i,2}, y_i)$.
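For item 1, one possible setup writes the line as $y = ax + b$ (the names $a$ and $b$ are an assumption; the slide does not name the coefficients) and minimizes $\sum (y_i - ax_i - b)^2$. Setting both partial derivatives to zero gives the normal equations, which the sketch below solves exactly with Cramer's rule.

```python
from fractions import Fraction

points = [(1, 1), (2, 1), (3, 3)]
n = len(points)

sx = sum(x for x, _ in points)
sy = sum(y for _, y in points)
sxx = sum(x * x for x, _ in points)
sxy = sum(x * y for x, y in points)

# Normal equations for y = a*x + b:
#   sxx * a + sx * b = sxy
#   sx  * a + n  * b = sy
det = sxx * n - sx * sx
a = Fraction(sxy * n - sx * sy, det)
b = Fraction(sxx * sy - sx * sxy, det)
print(a, b)
```

The same approach extends to items 2 and 3, where three normal equations arise for the three unknowns of the multiple regression model.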

MIT OpenCourseWare http://ocw.mit.edu 18.05 Introduction to Probability and Statistics Spring 2014 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.