Lecture 3: Review of Probability, MATLAB, Histograms


CS 4980/6980: Introduction to Data Science, © Spring 2018
Lecture 3: Review of Probability, MATLAB, Histograms
Instructor: Daniel L. Pimentel-Alarcón
Scribed by: Chad Conley and Ken Varghese

This is preliminary work and has not been reviewed by the instructor. If you have comments about typos, errors, notation inconsistencies, etc., please email Chad Conley and Ken Varghese at cconley8@student.gsu.edu, kvarghese2@student.gsu.edu.

3.1 Introduction

This lecture gives a broad review of probability, including the Bernoulli, binomial, exponential, and Gaussian distributions. It also covers some MATLAB commands that are helpful for completing Mini-project 1, as well as the use of histograms.

3.2 Basics of Probability

Probability is the science of analyzing events that may or may not happen. Several models are used to produce theoretical outcomes that mirror real-world events, and each has its strengths and weaknesses. We cover four models in class. To understand those models, we first need some terms.

3.3 Terms

A random variable is a numeric variable whose value depends on the outcome of a random experiment. Random variables are written with a curly x.

A function f(x) describes how likely each possible value x of a random variable is. For a discrete random variable, f(x) is the probability of the value x; for a continuous random variable, f(x) is a probability density.

The phrase i.i.d. stands for independent and identically distributed. It states that a collection of random variables, such as a sequence of coin flips, satisfies two conditions: the variables are independent (the outcome of one trial has no effect on previous or future trials), and they all follow the same distribution (for coin flips, every flip has the same probability p of landing heads).
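The i.i.d. idea can be illustrated with simulated coin flips. The following is a minimal sketch in base MATLAB; the variable names and the values of n and p are illustrative assumptions, not part of the lecture.

```matlab
% Simulate n i.i.d. coin flips, each landing heads (1) with probability p.
n = 10000;               % number of flips (illustrative choice)
p = 0.3;                 % probability of heads (illustrative choice)

% rand draws independent uniform numbers in (0,1), so each comparison
% below is an independent flip with the same probability p of being 1.
flips = rand(1, n) < p;

% Because the flips are i.i.d., the fraction of heads should be close to p.
fractionHeads = sum(flips) / n;
fprintf('Fraction of heads: %.3f (p = %.1f)\n', fractionHeads, p);
```

Rerunning the script gives a slightly different fraction each time, but for large n it stays near p.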

3.4 Bernoulli Distribution

A Bernoulli distribution describes a single trial with two possible outcomes: success (k = 1), which occurs with probability p, and failure (k = 0), which occurs with probability 1 - p. It is defined by

$$ f(k; p) = \begin{cases} p & \text{if } k = 1 \\ 1 - p & \text{if } k = 0 \end{cases} \tag{3.1} $$

The two outcomes are equally likely only in the special case p = 1/2, such as a fair coin flip. Because there are only two possible outcomes, the function is discrete.

3.5 Binomial Distribution

A binomial distribution has two parameters, n and p, where n is the number of independent Bernoulli-distributed experiments and p is the probability of success in each one. In other words, a binomial distribution describes the number of successes k in a sequence of n Bernoulli trials. A popular use of this distribution is to check for statistical significance. It is defined as

$$ f(k; n, p) = P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}, \quad k = 0, 1, \dots, n \tag{3.2} $$

3.6 Exponential Distribution

The exponential distribution is a continuous distribution with a single parameter λ (lambda):

$$ f(x) = \lambda e^{-\lambda x}, \quad x \ge 0 \tag{3.3} $$

Here λ > 0 is the rate parameter, x is the variable, and e is Euler's number (a constant). The density starts at f(0) = λ and decays toward zero. As λ increases, the curve starts higher and falls off more steeply.

3.7 Gaussian/Normal Distribution

The Gaussian distribution, also known as the normal distribution, describes a variable whose values are distributed along a symmetric bell curve. It is used to model many quantities that we expect to see in real life. A random value has a higher probability of being close to the mean than far from it. The probability density function of the normal distribution is

$$ P(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x - \mu)^{2} / (2\sigma^{2})} \tag{3.4} $$
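The binomial formula (3.2) can be checked numerically. The sketch below, which uses only base MATLAB functions, evaluates the formula for every k and compares one value against a simulation of repeated Bernoulli trials. The parameter values and variable names are illustrative assumptions.

```matlab
% Evaluate the binomial pmf of equation (3.2) for all k = 0..n.
n = 10;       % number of Bernoulli trials (illustrative)
p = 0.5;      % success probability of each trial (illustrative)
k = 0:n;      % every possible number of successes

pmf = zeros(1, n + 1);
for i = 1:numel(k)
    kk = k(i);
    pmf(i) = nchoosek(n, kk) * p^kk * (1 - p)^(n - kk);
end

% The probabilities of all possible outcomes must sum to 1.
total = sum(pmf);

% Compare with a simulation: m experiments of n Bernoulli trials each.
m = 5000;                          % number of simulated experiments
trials = rand(m, n) < p;           % each row is one sequence of n trials
successes = sum(trials, 2);        % number of successes in each row
empirical5 = sum(successes == 5) / m;

fprintf('Sum of pmf: %.6f\n', total);
fprintf('P(X = 5): formula %.4f, simulation %.4f\n', pmf(6), empirical5);
```

Note that pmf(6) holds the value for k = 5, because MATLAB indices start at 1 while k starts at 0.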

Example 3.1 (Manipulating the probability density function). Suppose

$$ X \sim \mathcal{N}(\mu, \sigma^{2}) \tag{3.5} $$

where:

N - Normal/Gaussian
µ - mu - mean
σ² - sigma squared - variance

The standard normal distribution (µ = 0, σ² = 1) is represented by the red curve in the accompanying figure. A change in µ causes a lateral shift of the function. A change in σ² changes the maximum value of the function: a larger variance gives a lower peak and a wider, less concentrated distribution of data points.
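The effect of µ and σ² can be reproduced by plotting equation (3.4) directly. This is a minimal sketch in base MATLAB; the helper name gauss, the grid, and the chosen parameter values are assumptions made for illustration, and the plot is not the figure from the lecture.

```matlab
% Gaussian density of equation (3.4), written in terms of the variance sigma2.
gauss = @(x, mu, sigma2) exp(-(x - mu).^2 ./ (2 * sigma2)) ./ sqrt(2 * pi * sigma2);

x = linspace(-6, 6, 1201);     % evaluation grid with step 0.01

pStd   = gauss(x, 0, 1);       % standard normal: mu = 0, sigma^2 = 1
pShift = gauss(x, 2, 1);       % larger mean: curve shifts to the right
pWide  = gauss(x, 0, 4);       % larger variance: lower peak, wider curve

figure(1);
plot(x, pStd, 'r');            % red curve for the standard normal
hold on;
plot(x, pShift, 'b');
plot(x, pWide, 'k');
hold off;
legend('\mu = 0, \sigma^2 = 1', '\mu = 2, \sigma^2 = 1', '\mu = 0, \sigma^2 = 4');
xlabel('x');
ylabel('P(x)');
```

The peak of each curve sits at x = µ and has height 1/(σ√(2π)), so quadrupling the variance halves the peak.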

3.8 MATLAB

MATLAB commands useful to complete Mini-project 1:

Command: description of command

help name
    Displays the help text for the functionality specified by name, such as a function, method, class, toolbox, or variable.

X = rand(m,n)
    Returns an m-by-n matrix of uniformly distributed random numbers in the interval (0,1).

X = randn(m,n)
    Returns an m-by-n matrix of normally distributed random numbers.

figure(n)
    Finds a figure in which the Number property is equal to n, and makes it the current figure. If no figure exists with that property value, MATLAB creates a new figure and sets its Number property to n.

hist(x)
    Creates a histogram bar chart of the elements in vector x.

plot(X,Y)
    Creates a 2-D line plot of the data in Y versus the corresponding values in X.

hold on
    Retains plots in the current axes so that new plots added to the axes do not delete existing plots.

B = reshape(A,sz)
    Reshapes A using the size vector, sz, to define size(B). For example, reshape(A,[2,3]) reshapes A into a 2-by-3 matrix. sz must contain at least 2 elements, and prod(sz) must be the same as numel(A).

B = repmat(A,n)
    Returns an array containing n copies of A in the row and column dimensions. The size of B is size(A)*n when A is a matrix.

S = sum(A,dim)
    Returns the sum along dimension dim. For example, if A is a matrix, then sum(A,2) is a column vector containing the sum of each row.

k = find(X)
    Returns a vector containing the linear indices of each nonzero element in array X.

M = max(A)
    Returns the largest elements of A.

M = max(A,[],dim)
    Returns the largest elements along dimension dim. For example, if A is a matrix, then max(A,[],2) is a column vector containing the maximum value of each row.

Y = abs(X)
    Returns the absolute value of each element in array X.
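Several of the array commands above can be tried on a small matrix. The following sketch uses an arbitrary 2-by-3 matrix chosen for illustration; the variable names are assumptions.

```matlab
A = [1 -2 3; -4 5 -6];        % a 2-by-3 example matrix

% reshape keeps the same 6 elements, read in column-major order.
B = reshape(A, [3, 2]);       % B = [1 5; -4 3; -2 -6]

% repmat tiles A twice along rows and twice along columns.
C = repmat(A, 2);             % size(C) is [4 6]

% sum along dimension 2 adds up each row.
rowSums = sum(A, 2);          % rowSums = [2; -5]

% find returns linear (column-major) indices of the nonzero entries
% of its argument; here the argument is the logical matrix A < 0.
idx = find(A < 0);            % idx = [2; 3; 6]

% max with one argument works down each column; with dim = 2 it works
% along each row.
colMax = max(A);              % colMax = [1 5 3]
rowMax = max(A, [], 2);       % rowMax = [3; 5]

% abs is applied element by element.
absA = abs(A);                % absA = [1 2 3; 4 5 6]
```

Typing help reshape (or the name of any other command) at the MATLAB prompt shows the full description of its arguments.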

3.9 Histograms

A histogram represents the distribution of a sample as a diagram of adjacent rectangles. The width of each rectangle is equal to the class interval (bin), and its area is proportional to the number of sample values that fall in that interval.

Example 3.2 (Creating a histogram in MATLAB). Suppose a histogram is generated in MATLAB from a 1-by-100 vector.

[Figure: histogram of the 100 values; horizontal axis from -3 to 4, counts from 0 to 25]

This can be used to help verify that the randn function of MATLAB has produced an approximately Gaussian distribution, and it can be compared to the following randomly generated distribution:

[Figure: histogram; horizontal axis from 0 to 1, counts from 0 to 14]
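A comparison of this kind can be sketched with the commands from Section 3.8. The code below is an illustration rather than the script used in the lecture: it assumes the second sample comes from rand, which the notes do not state, and the variable names are hypothetical.

```matlab
x = randn(1, 100);    % 1-by-100 vector of normally distributed numbers
u = rand(1, 100);     % 1-by-100 vector of uniform numbers in (0,1) (assumed comparison sample)

figure(1);
hist(x);              % with no bin argument, hist uses 10 equally spaced bins
title('Histogram of randn(1,100)');

figure(2);
hist(u);
title('Histogram of rand(1,100)');

% With output arguments, hist returns the bin counts and bin centers
% instead of drawing the chart.
[counts, centers] = hist(x);
fprintf('Tallest bin is centered at %.2f with %d samples\n', ...
        centers(counts == max(counts))(1:0 + 1 - 0 == 1) * 0 + centers(find(counts == max(counts), 1)), max(counts));
```

With only 100 samples the bell shape is rough, and the bars change from run to run. Recent MATLAB releases recommend histogram(x) over hist(x); hist is used here because it is the command listed in the notes.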