Simulation Wrap-up, Statistics COS 323


Today: Simulation re-cap. Statistics. Variance and confidence intervals for simulations. Simulation wrap-up. FYI: no class or office hours Thursday.

Simulation wrap-up

Last time: Time-driven and event-driven simulation. Simulation from differential equations. Cellular automata, microsimulation, agent-based simulation; see e.g. http://www.microsimulation.org/ima/what%20is%20microsimulation.htm. Example applications: SIR disease model, population genetics.

Simulation: Pros and Cons. Pros: building a model can be easier than with other approaches; outcomes can be easy to understand; cheap and safe; good for comparisons. Cons: hard to debug; no guarantee of optimality; hard to establish validity; can't produce absolute numbers.

Simulation: Important Considerations. Are outcomes statistically significant? (Need many simulation runs to assess this.) What should the initial state be? How long should the simulation run? Is the model realistic? How sensitive is the model to parameters and initial conditions?

Statistics Overview

Random Variables. A random variable is any probabilistic outcome, e.g., a coin flip, or the height of someone randomly chosen from a population. An R.V. takes on a value in a sample space; the space can be discrete, e.g., {H, T}, or continuous, e.g., height in (0, ∞). An R.V. is denoted with a capital letter (X), a realization with a lowercase letter (x); e.g., X is a coin flip, x is the value (H or T) of that coin flip.

Probability Mass Function. Describes probability for a discrete R.V.

Probability Density Function. Describes probability for a continuous R.V.

[Population] Mean of a Random Variable, aka expected value or first moment. For a discrete RV: E[X] = μ = Σ_i x_i p_i. For a continuous RV: E[X] = μ = ∫ x p(x) dx.
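As a quick sketch of the discrete formula, the expected value of a fair six-sided die (an assumed example, not from the slides) can be computed exactly from the PMF and compared with a simulated estimate:

```python
import random

# Expected value of a fair die: E[X] = sum_i x_i * p_i
faces = [1, 2, 3, 4, 5, 6]
p = 1.0 / 6.0
exact = sum(x * p for x in faces)  # 3.5

# Monte Carlo estimate of the same expectation
rng = random.Random(0)
n = 100_000
estimate = sum(rng.choice(faces) for _ in range(n)) / n
print(exact, estimate)  # estimate should be close to 3.5
```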

[Population] Variance. σ² = E[(X − μ)²] = E[X² − 2Xμ + μ²] = E[X²] − μ² = E[X²] − (E[X])². For a discrete RV: σ² = Σ_i p_i (x_i − μ)². For a continuous RV: σ² = ∫ (x − μ)² p(x) dx.

Sample mean and sample variance. Suppose we have N independent observations of X: x_1, x_2, …, x_N. Sample mean: x̄ = (1/N) Σ_{i=1..N} x_i. Sample variance: s² = (1/(N−1)) Σ_{i=1..N} (x_i − x̄)². Then E[x̄] = μ and E[s²] = σ².
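A minimal sketch of these estimators, using an assumed Gaussian population with μ = 10 and σ² = 4 so the estimates can be sanity-checked:

```python
import random

# N independent observations from N(10, sigma=2), i.e. true variance 4
rng = random.Random(42)
xs = [rng.gauss(10.0, 2.0) for _ in range(10_000)]

N = len(xs)
xbar = sum(xs) / N                                   # sample mean
s2 = sum((x - xbar) ** 2 for x in xs) / (N - 1)      # unbiased sample variance
print(xbar, s2)  # should be near 10 and 4
```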

1/(N−1) and the sample variance. The N differences x_i − x̄ are not independent: Σ_i (x_i − x̄) = 0, so if you know N−1 of these values you can deduce the last one; i.e., there are only N−1 degrees of freedom. We could treat the sample as a population and compute the population variance, (1/N) Σ_{i=1..N} (x_i − x̄)², BUT this underestimates the true population variance (especially bad if the sample is small).

Sample variance using 1/(N−1) is unbiased: E[s²] = (1/(N−1)) E[Σ_{i=1..N} (x_i − x̄)²] = (1/(N−1)) E[Σ_{i=1..N} x_i² − N x̄²] = (1/(N−1)) (N(σ² + μ²) − N(σ²/N + μ²)) = σ².

Computing sample variance. Can compute as s² = (1/(N−1)) Σ_{i=1..N} (x_i − x̄)². Prefer: s² = (Σ_{i=1..N} x_i² − N x̄²) / (N − 1).
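A small check (on an assumed toy data set) that the one-pass "shortcut" form agrees with the two-pass definition:

```python
# Toy data; xbar = 5, sum of squared deviations = 32
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
N = len(xs)
xbar = sum(xs) / N

# Two-pass definition vs. one-pass shortcut
two_pass = sum((x - xbar) ** 2 for x in xs) / (N - 1)
one_pass = (sum(x * x for x in xs) - N * xbar * xbar) / (N - 1)
print(two_pass, one_pass)  # both 32/7
```

One caveat worth knowing: in floating point, the one-pass form can suffer catastrophic cancellation when the mean is large relative to the spread, so the two-pass form is numerically safer even though the one-pass form needs only a single sweep over the data.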

The Gaussian Distribution. p(x) = (1/(σ√(2π))) exp(−½ ((x − μ)/σ)²). E[X] = μ, Var[X] = σ².

Why so important? The sum of independent observations of a random variable converges to a Gaussian. In nature, events whose variation results from many small, independent effects tend to have Gaussian distributions, e.g., measurement error. Demo: http://www.mongrav.org/math/falling-ballsprobability.htm. If effects are multiplicative, the logarithm is often normally distributed.

Central Limit Theorem. Suppose we sample x_1, x_2, …, x_N from a distribution with mean μ and variance σ². Let x̄ = (1/N) Σ_{i=1..N} x_i; then z = (x̄ − μ)/(σ/√N) → N(0, 1), i.e., x̄ is distributed normally with mean μ and variance σ²/N.
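An empirical sketch of the theorem, using an assumed uniform(0, 1) population (μ = 0.5, σ² = 1/12): standardized sample means should look like N(0, 1).

```python
import math
import random

rng = random.Random(1)
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)  # uniform(0,1) population
N = 30          # sample size per mean
runs = 20_000   # number of sample means drawn

zs = []
for _ in range(runs):
    xbar = sum(rng.random() for _ in range(N)) / N
    zs.append((xbar - mu) / (sigma / math.sqrt(N)))  # standardize

z_mean = sum(zs) / runs
z_var = sum(z * z for z in zs) / runs
print(z_mean, z_var)  # should be near 0 and 1
```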

Important Properties of Normal Distribution. 1. The family of normal distributions is closed under linear transformations: if X ~ N(μ, σ²) then aX + b ~ N(aμ + b, a²σ²). 2. A linear combination of independent normals is also normal: if X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²) then aX₁ + bX₂ ~ N(aμ₁ + bμ₂, a²σ₁² + b²σ₂²).

Important Properties of Normal Distribution. 3. Of all distributions with a given mean and variance, the normal has maximum entropy. Information theory: entropy measures uninformativeness. Principle of maximum entropy: choose to represent the world with as uninformative a distribution as possible, subject to testable information. If all we know is that x is in [a, b], the uniform distribution on [a, b] has maximum entropy. If all we know is that the distribution has mean μ and variance σ², the normal distribution N(μ, σ²) has maximum entropy.

Important Properties of Normal Distribution. 4. If errors are normally distributed, a least-squares fit yields the maximum likelihood estimator: finding the least-squares x such that Ax ≈ b finds the value of x that maximizes the likelihood of the data b under the model b = Ax + Gaussian noise.

Important Properties of Normal Distribution. 5. Many derived random variables have analytically known densities, e.g., the sample mean and sample variance. 6. The sample mean and variance of n independent, identically distributed samples are independent; the sample mean is a normally distributed random variable: X̄ₙ ~ N(μ, σ²/n).

Distribution of Sample Variance (for a Gaussian R.V. X). With s² = (1/(N−1)) Σ_{i=1..N} (x_i − x̄)², define U = (n−1)s²/σ². Then U has a χ² distribution with (n−1) d.o.f. The χ² density with n d.o.f. is p(x) = (1/(2^{n/2} Γ(n/2))) x^{n/2 − 1} e^{−x/2}, x ≥ 0. E[U] = n − 1, Var[U] = 2(n − 1).
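An empirical sketch of this fact (toy parameters assumed): for Gaussian samples, U = (n−1)s²/σ² should have mean n−1 and variance 2(n−1).

```python
import random

rng = random.Random(7)
n, sigma2, runs = 10, 4.0, 20_000  # n-1 = 9, so expect E[U]=9, Var[U]=18

us = []
for _ in range(runs):
    xs = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    us.append((n - 1) * s2 / sigma2)

u_mean = sum(us) / runs
u_var = sum((u - u_mean) ** 2 for u in us) / runs
print(u_mean, u_var)  # should be near 9 and 18
```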

The Chi-Squared Distribution

What if we don't know the true variance? The sample mean is a normally distributed R.V., X̄ₙ ~ N(μ, σ²/n), but taking advantage of this presumes we know σ². Instead, (x̄ − μ)/(sₙ/√n) has a t distribution with (n−1) d.o.f.

[Student s] t-distribution

Forming a confidence interval. E.g.: given that I observed a sample mean of ___, I'm 99% confident that the true mean lies between ___ and ___. We know that (x̄ − μ)/(sₙ/√n) has a t distribution; choose q₁, q₂ such that a Student t variable with (n−1) d.o.f. has 99% probability of lying between q₁ and q₂.

Confidence interval for the mean: if P(q₁ < (x̄ₙ − μ)/(sₙ/√n) < q₂) = 0.99, then P(x̄ₙ − q₂ sₙ/√n < μ < x̄ₙ − q₁ sₙ/√n) = 0.99.
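A minimal sketch of this procedure, assuming n = 20 observations and a symmetric interval (q₁ = −q₂); the t quantile is taken from a standard table rather than computed:

```python
import math
import random

# Assumed toy data: 20 draws from N(50, sigma=5)
rng = random.Random(3)
n = 20
xs = [rng.gauss(50.0, 5.0) for _ in range(n)]

xbar = sum(xs) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

# Two-sided 99% Student-t quantile with 19 d.o.f., from a t-table
q = 2.861

lo = xbar - q * s / math.sqrt(n)
hi = xbar + q * s / math.sqrt(n)
print(lo, xbar, hi)  # 99% confidence interval for the true mean
```

Because the t distribution is symmetric, choosing q₂ = q and q₁ = −q gives the familiar x̄ ± q·s/√n form of the interval.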

Interpreting Simulation Outcomes. How long will customers have to wait, on average? E.g., for a given number of tellers, arrival rate, service time distribution, etc.

Simulate a bank for N customers. Let x_i be the wait time of customer i. Is mean(x) a good estimate for μ? How do we compute a 95% confidence interval for μ? Problem: the x_i are not independent!

Replications. Run the simulation to get M observations; repeat the simulation N times (different random numbers each time). Treat the sample means X̄_i of the different runs as approximately uncorrelated: s² = (1/(N−1)) Σ_i (X̄_i − X̄)², where X̄ is the mean of the run means.
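A sketch of the replications method, using an assumed toy "simulation" (an autocorrelated sequence standing in for customer wait times):

```python
import random

def one_run(seed, m=1_000):
    """Toy stand-in for one simulation run: m correlated 'wait times'."""
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(m):
        x = 0.9 * x + rng.expovariate(1.0)  # autocorrelated sequence
        total += x
    return total / m                         # this run's sample mean

# Repeat the simulation with different random seeds; the per-run means
# are treated as approximately independent observations.
n_runs = 30
means = [one_run(seed) for seed in range(n_runs)]
grand = sum(means) / n_runs
s2 = sum((m_ - grand) ** 2 for m_ in means) / (n_runs - 1)
print(grand, s2)  # estimate of the mean, and variance of the run means
```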

Batch Means. Run the simulation for N steps (N large). Divide the x_i into k consecutive batches of size b. If b is large enough, mean(batch 1) is approximately uncorrelated with mean(batch 2), etc.
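A sketch of the batch-means method on the same kind of assumed toy autocorrelated output: one long run is split into k consecutive batches, and the batch means are treated as roughly uncorrelated.

```python
import random

rng = random.Random(5)
N, k = 100_000, 20
b = N // k  # batch size

# One long run of a toy autocorrelated process
x, xs = 0.0, []
for _ in range(N):
    x = 0.9 * x + rng.expovariate(1.0)
    xs.append(x)

# Split into k consecutive batches and take each batch's mean
batch_means = [sum(xs[i * b:(i + 1) * b]) / b for i in range(k)]
grand = sum(batch_means) / k
s2 = sum((m - grand) ** 2 for m in batch_means) / (k - 1)
print(grand, s2)  # overall estimate, and variance of the batch means
```

The design trade-off: larger b makes adjacent batch means less correlated (better variance estimate), but leaves fewer batches, so k must stay large enough for the variance estimate itself to be stable.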

Other approaches. Use an estimate of the autocorrelation between the x_i to derive a better estimate of the variance that can be used for a confidence interval. Regenerative method: take advantage of regeneration points or cycles in behavior, e.g., points when the bank is empty of customers.

Simulation Wrap-up

Finally

Implications. Who designed it all? How should we behave? What if we start running too many of our own simulations?

Software: http://en.wikipedia.org/wiki/List_of_computer_simulation_software