CALIFORNIA INSTITUTE OF TECHNOLOGY. Introduction to Probability and Statistics Winter Assignment 3


CALIFORNIA INSTITUTE OF TECHNOLOGY
Ma 3, KC Border
Introduction to Probability and Statistics, Winter 2015
Assignment 3
Due Monday, January 25 by 4:00 p.m. at 253 Sloan

Instructions: When asked for a probability or an expectation, give both a formula and an explanation for why you used that formula, and also give a numerical value when available. When asked to plot something, use informative labels (even if handwritten), so the TA knows what you are plotting, attach a copy of the plot, and, if appropriate, the commands that produced it.

Exercise 1 (30 pts) Is it possible to have three random variables X, Y, and Z, where X and Y are stochastically independent, Y and Z are stochastically independent, and X and Z are stochastically independent; but the set {X, Y, Z} of random variables is not stochastically independent? Explain why your answer is correct.

Exercise 2 (Problem 3.3.15 in Pitman) (20 pts) Let X and Y be independent random variables. Show that $\operatorname{Var}(X - Y) = \operatorname{Var}(X + Y)$.

Exercise 3 (The Standard Normal Distribution) The standard normal density is given by

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}.$$

The cumulative distribution function for the standard normal is denoted Φ. That is,

$$\Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-z^2/2}\, dz.$$
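As a quick sanity check on the definition of Φ, the integral can be evaluated by numerical quadrature. The sketch below is in R. The names `f` and `Phi_numeric` are made up for this illustration, and `pnorm` is R's built-in standard normal cdf. It only illustrates the definition and is no substitute for the built-in function.

```r
# Standard normal density, written out from the formula for f(z).
f <- function(z) exp(-z^2 / 2) / sqrt(2 * pi)

# Phi(t) as the integral of f from -Inf to t, by numerical quadrature.
# Phi_numeric is a hypothetical helper name, not part of R.
Phi_numeric <- function(t) integrate(f, lower = -Inf, upper = t)$value

Phi_numeric(0)    # by symmetry this should be close to 1/2
Phi_numeric(1.5)  # numerical value of the integral at t = 1.5
pnorm(1.5)        # built-in cdf, for comparison
```

The two values at t = 1.5 should agree to several decimal places; any small difference is quadrature error.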

There is no closed-form expression for Φ in terms of elementary functions, but there are some decent approximations. Most statistics books have tables of selected values of this cdf, but nowadays it is built into languages such as R and Mathematica.

In Mathematica (since version 8), to find Φ(t), you evaluate CDF[NormalDistribution[0, 1], t]. The value of the density at z is given by PDF[NormalDistribution[0, 1], z]. Mathematica also lets you find the probability mass function and cdf for a Binomial(n, p) variable with PDF[BinomialDistribution[n, p], k] and CDF[BinomialDistribution[n, p], k].

In R, to find Φ(t), you evaluate pnorm(t), or more completely, pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) if you don't trust the defaults. The value of the density at z is given by dnorm(z). R also lets you find the probability mass function and cdf for a Binomial(n, p) variable with dbinom(k, n, p) and pbinom(t, n, p).

To get the corresponding functions for an $N(\mu, \sigma^2)$ normal random variable with expectation $\mu$ and variance $\sigma^2$, replace the 0 by $\mu$ and the 1 by $\sigma$ (not $\sigma^2$!) in the commands above.

Note: You may want to look at Section 4.3 in Larsen and Marx [1] and/or Sections 2.2-2.3 in Pitman [2].

1. (20 pts for answering yes) Do you have some appropriate software installed? (E.g., R, Matlab, NumPy, Mathematica, Octave, or something else useful for statistical calculations and plotting. Excel does not count.)
2. (10 pts) Use the program of your choice to make a table of values of Φ(t) for $t = -3, -2, -1, 0, 1, 1.96, 2, 2.58, 3, 4, 5, 6$.
3. (5 pts) If Z is a standard normal random variable, what are Prob (Z 1.96) and Prob ( Z 2.58)?

Exercise 4 (Cf. problem 3.3.26 in Pitman) (25 pts) Use Jensen's Inequality (Lecture 6) to show that for a random variable X with finite mean $\mu$,

$$\text{std. dev. } X \geq E\,|X - \mu|,$$

with equality if and only if X is degenerate.

Exercise 5 (25 pts) There are n balls numbered 1, ..., n and n bins numbered 1, ..., n. The balls are put into the bins at random, one per bin. What is the expected number of balls put in the matching bin? Explain your reasoning. (Hint: Let $E_i$ be the event that ball i is in bin i. Use indicator functions.)

Exercise 6 (Exploring some data) (40 pts) In lecture I suggested that examining the empirical distribution function was a good way to look at data. Let's compare it to using histograms.
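For readers who have not met the empirical distribution function before, here is a minimal R sketch of what it computes. It uses simulated uniform numbers, not the class data, and the names `x`, `Fn`, and `grid` are chosen only for this illustration. The value of the empirical cdf at a point t is simply the fraction of sample values that are at most t.

```r
# Simulated stand-in data: 200 draws from Uniform[0,1] (an assumption,
# not the coin-flip data used in the exercise).
set.seed(1)
x <- runif(200)

# ecdf() returns a step function; Fn(t) is the fraction of x that is <= t.
Fn <- ecdf(x)
Fn(0.5)
mean(x <= 0.5)   # the same number, computed straight from the definition

# For uniform data the empirical cdf should track the line y = t on [0, 1].
# The largest gap on a grid gives a rough numerical companion to the eyeball test.
grid <- seq(0, 1, by = 0.01)
max(abs(Fn(grid) - grid))
```

Calling plot(Fn) draws the step function, which is the picture the exercise asks you to compare with a histogram.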

At the beginning of the term you flipped coins. This generated a long string of 0s and 1s. A segment of this string can be interpreted as a binary number, and by dividing this by the appropriate power of two, it can be interpreted as a number between 0 and 1. Moreover, if the coin tosses are independent and Heads and Tails are equally likely, then these numbers should be i.i.d. with an approximately uniform distribution. We are going to subject this to an eyeball test, which is one of the first things you should always do with data.

I have taken the liberty of chopping the coin toss data from this year and three previous years into 3,464 strings of length 32, and converting them into numbers between 0 and 1. You can download these results from http://www.math.caltech.edu/%7E2015-16/2term/ma003/Data/Random32.txt. Or you can do it yourself from the raw data at http://www.math.caltech.edu/%7e2015-16/2term/ma003/data/flips.txt

Using the program/language of your choice, do the following. (I give hints for R and Mathematica below.)

1. What is the expected value of a Uniform[0,1] random variable? What is its standard deviation?
2. What is the average of the numbers in your samples? What is the sample standard deviation of each sample? (The sample standard deviation is obtained by squaring the deviation of each sample value from the sample mean, summing the squares, dividing by (sample size − 1), and then taking the square root.)
3. Plot a histogram of these numbers, using the default. Then plot a histogram using bins of length 0.02.
4. Now plot a cumulative histogram or the empirical cumulative distribution function. (In Mathematica, this is just an option of the Histogram command, and in R use the ecdf command.)
5. Which method makes it easier to check by eye if the data appear to be uniform?

If you don't have a preference, there is a lot to be said for learning the R statistical programming language. It is used widely on campus, and it looks like it will be around for a while. It is also free and runs on the major operating systems. You can get it at http://www.r-project.org. But if you are familiar with something else, go ahead. Even Excel can probably handle this assignment, but future ones may be trickier.

Hint: Badly documented sample R code:

Warning: I am not an R programmer, and I am sure there are probably better ways to do things. Most of what I know I got by Googling various questions. Also, typing ?command will bring up help on the command named command.
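Before reading in the prepared file, it may help to see two recipes from this exercise spelled out: turning strings of flips into numbers between 0 and 1, and the sample standard deviation from item 2. The R sketch below uses simulated flips rather than the class data, and the names `flips`, `m`, `weights`, `u`, and `s` are invented for this illustration.

```r
# Hypothetical flips: 320 simulated fair coin tosses (not the class data).
set.seed(2016)
flips <- rbinom(320, size = 1, prob = 0.5)

# Chop the flips into strings of length 32, one string per row.
m <- matrix(flips, ncol = 32, byrow = TRUE)

# Read each row as a binary fraction: bit j contributes bit_j * 2^(-j).
# This equals the 32-bit binary number divided by 2^32.
weights <- 2^(-(1:32))
u <- as.vector(m %*% weights)
u   # ten numbers in [0, 1)

# Sample standard deviation by the recipe in item 2:
# squared deviations from the mean, summed, divided by (n - 1), square root.
s <- sqrt(sum((u - mean(u))^2) / (length(u) - 1))
s
sd(u)   # R's built-in sd() also divides by (n - 1)
```

The hand-computed `s` and the built-in `sd(u)` should agree up to floating-point rounding.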

First, use setwd("your_data_pathname") to change your working directory to the folder where the data file is. (Or be prepared to use a full path name.) You can use getwd() and list.files() to check that you are in the right place.

Read the data from the file into an array. Check the length; it should be 3464 for the file Random32. (# is a comment character.)

    a = as.matrix(read.table("Random32.txt"))
    length(a)

Now try a default histogram:

    hist(a)  # the as.matrix is important!

Now try a histogram with bins of size 0.02. Also, instead of actual counts, use relative frequencies (density):

    bins = seq(0.0, 1.0, by=0.02)
    hist(a, breaks=bins, freq=FALSE)  # freq=FALSE uses relative frequencies?!

Now let's examine the empirical cdf.

    c = ecdf(a)
    plot(c)

How do you save these plots? Well, on my Mac, I just click on the graphic's window and hit Save, and it saves the graphic as a pdf. But here is a better way. Say you want to save the plot above to a png file named Hist.png. Here you go:

    png("Hist.png")  # open the file for writing
    plot(c)          # plot to the file
    dev.off()        # close the file. This is crucial.

To save to a pdf file, use pdf("Hist.pdf") for the first line. I found this at http://wiki.stdout.org/rcookbook/graphs/output%20to%20a%20file/.

Hint: Undocumented sample Mathematica code:

    SetDirectory["Your path goes here"]
    a = Flatten[ Import["Random32", "Table"] ];
    g = Histogram[a]
    Export["File name 1.pdf", g]
    g = Histogram[a, {0, 1, 0.02}]
    Export["File name 2.pdf", g]
    g = Histogram[a, {0, 1, 0.02}, "CDF"]
    Export["File name 3.pdf", g]

Exercise 7 (10 pts) How much time did you spend on the previous exercises?

Exercise 8 (Optional Exercise) (50 pts) There are n balls numbered 1, ..., n and n bins numbered 1, ..., n. The balls are put into the bins at random, one per bin. For each k = 0, ..., n, what is the probability that exactly k balls are put in the matching bin? Explain your reasoning. (Hint: Let $E_i$ be the event that ball i is in bin i. Use indicator functions.)
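For Exercises 5 and 8, a simulation gives a rough check on whatever formula you derive. The R sketch below is a Monte Carlo estimate under assumptions chosen for illustration (n = 10 balls, 10,000 repetitions, and the names `n`, `reps`, and `matches`). It does not replace the reasoning the exercises ask for.

```r
set.seed(3)
n    <- 10      # number of balls and bins (illustrative choice)
reps <- 10000   # number of simulated assignments

# sample(n) is a uniformly random permutation of 1..n; read entry i as the
# bin that ball i lands in. Ball i matches when that entry equals i.
# The comparison below is a vector of indicators, and its sum counts matches.
matches <- replicate(reps, sum(sample(n) == 1:n))

mean(matches)           # estimate of the expected number of matches
table(matches) / reps   # estimated probability of exactly k matches
```

Rerunning with different values of n shows how the estimates depend on the number of balls, which is a useful thing to compare with your formula.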

References

[1] R. J. Larsen and M. L. Marx. 2012. An introduction to mathematical statistics and its applications, fifth ed. Boston: Prentice Hall.

[2] J. Pitman. 1993. Probability. New York, Berlin, and Heidelberg: Springer.