Lecture Data Science

1 Web Science & Technologies University of Koblenz Landau, Germany Lecture Data Science Statistics Foundations JProf. Dr. Claudia Wagner

2 Learning Goals
How to describe sample data?
What is mode/median/mean?
What is variance/standard deviation?
What is kurtosis/skewness?
What types of data exist?
Which statistic can be computed on nominal data?
What is a probability?
Which distribution describes the success probability of an experiment?
Which distribution describes the success probability of multiple experiments?

3 Statistics
Aim: learn something about a population by analyzing sample data.
[Diagram labels: Population, Probability, Sample, Descriptive Statistics, Inferential Statistics]

4 Types of Data (Statistician Viewpoint)
Ratio (e.g., weight): absolute zero
Interval (e.g., temperature in Celsius): distance is meaningful
Ordinal (e.g., status): observations can be ordered
Nominal (e.g., ethnic group, sex, nationality): observations are only named
Stevens, S. S. (1946). "On the Theory of Scales of Measurement". Science 103 (2684).

5 Why should we care?

6 Frequencies
Absolute frequency h_k
Relative frequency (proportion) f_k
Cumulative frequency c_t (the order of the values needs to be meaningful)
Observations: Y Y N Y Y N N
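
The three frequencies can be computed directly from the observations on the slide. This is a minimal sketch; the ordering N < Y used for the cumulative frequency is an assumption made only for illustration, since Y/N data has no natural order.

```python
from collections import Counter
from itertools import accumulate

observations = ["Y", "Y", "N", "Y", "Y", "N", "N"]

# Absolute frequency h_k: how often each value k occurs.
absolute = Counter(observations)

# Relative frequency f_k: absolute frequency divided by the sample size.
n = len(observations)
relative = {k: h / n for k, h in absolute.items()}

# Cumulative frequency c_t needs an order; alphabetical order (N < Y) is an
# illustrative assumption, not something the slide specifies.
ordered_values = sorted(absolute)
running_totals = accumulate(absolute[k] for k in ordered_values)
cumulative = dict(zip(ordered_values, running_totals))

print(absolute, relative, cumulative)
```

The last cumulative value always equals the sample size, which is a quick sanity check.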

7 Observations: Y Y N Y Y N N

8 Mode
The mode is the value that appears most often in a set of data.
It already applies to nominal data and can be used for all types of data.
What is the mode of X = [17, 19, 20, 21, 22, 23, 23, 23, 23]?

9 Median
X = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]: the median of X is 22.5
X = [17, 19, 20, 21, 22, 23, 23, 23, 23]: the median of X is 22
The median is useful for skewed distributions, where the mean can be misleading.
Applies to ordinal, interval and ratio data.

10 Mean (expected value)
Applies to interval and ratio scales: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Example: X = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25], mean = 216 / 10 = 21.6
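
The mode, median and mean of the example data can be checked with Python's standard `statistics` module. This is only a small sketch; the variable names are illustrative.

```python
import statistics

x_odd = [17, 19, 20, 21, 22, 23, 23, 23, 23]        # 9 observations
x_even = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]   # 10 observations

# Mode: the most frequent value.
mode_value = statistics.mode(x_odd)

# Median: the middle value, or the average of the two middle values
# when the number of observations is even.
median_odd = statistics.median(x_odd)
median_even = statistics.median(x_even)

# Mean: the sum of the values divided by their count.
mean_even = statistics.mean(x_even)

print(mode_value, median_odd, median_even, mean_even)
```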

11 Mode, median, mean
[Figure: mode, median and mean of two log-normal distributions]

12 Sample of Dogs
Range R = x_max - x_min = 430 mm
The average height of a dog (measured at the shoulders) is 394 mm.

13 Dispersion
Variance: the average squared deviation of the observations from the mean.
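
A short sketch of how range, mean and variance fit together. The five heights below are an assumption: the original data are not preserved in this transcript, and these values were chosen only because they reproduce the range (430 mm) and mean (394 mm) quoted on the dog slide. Whether the lecture divides by n or by n - 1 is also not recoverable, so both variants are shown.

```python
import statistics

# Hypothetical shoulder heights in mm (assumed, see lead-in).
heights = [600, 470, 170, 430, 300]

value_range = max(heights) - min(heights)
mean_height = statistics.mean(heights)

# Population variance: average squared deviation from the mean (divide by n).
population_variance = statistics.pvariance(heights)

# Sample variance: same sum of squares, divided by n - 1.
sample_variance = statistics.variance(heights)

print(value_range, mean_height, population_variance, sample_variance)
```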

14 Standard Deviation
The standard deviation is just the square root of the variance.
We can now show what lies within 1 standard deviation of the mean.
This helps us to assess what is normal and what is extra large or extra small.
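
As an illustration of the "within 1 standard deviation" idea, the sketch below flags which values are normal, extra large or extra small. The heights are hypothetical values (chosen to match a mean of 394 mm), and the population standard deviation is used as an assumption.

```python
import statistics

# Hypothetical shoulder heights in mm; the original data are not in the transcript.
heights = [600, 470, 170, 430, 300]

mean_height = statistics.mean(heights)
std_height = statistics.pstdev(heights)  # square root of the population variance

lower = mean_height - std_height
upper = mean_height + std_height

within_one_std = [h for h in heights if lower <= h <= upper]
extra_large = [h for h in heights if h > upper]
extra_small = [h for h in heights if h < lower]

print(round(std_height, 1), within_one_std, extra_large, extra_small)
```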

15 Standard Deviation
The standard deviation does not measure how far typical values tend to be from the mean.
How could we compute that?
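
One possible answer to the question on this slide is the mean absolute deviation, which averages the unsquared distances to the mean. The slide itself does not name a measure, so this choice is an assumption, as are the example heights.

```python
import statistics

# Hypothetical shoulder heights in mm (assumed example data).
heights = [600, 470, 170, 430, 300]

mean_height = statistics.mean(heights)

# Mean absolute deviation: the average distance of a value from the mean.
mean_abs_deviation = sum(abs(h - mean_height) for h in heights) / len(heights)

# For comparison: the standard deviation squares the distances first,
# so large deviations count more heavily.
std_height = statistics.pstdev(heights)

print(mean_abs_deviation, round(std_height, 1))
```

For this data the mean absolute deviation is smaller than the standard deviation, because squaring gives the two extreme heights more weight.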

16 Percentiles
The n-th percentile is a value such that n% of all observations fall at or below it.
Quartiles: Q1 is the value for which 25% of all observations fall at or below it.
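
The definition above can be turned into code in several ways; statistical packages use different interpolation rules. The sketch below uses the nearest-rank convention, which is an assumption and not necessarily the convention used in the lecture. The function name `percentile` is illustrative.

```python
import math

def percentile(values, n):
    """Nearest-rank n-th percentile: the smallest observation such that
    at least n% of all observations fall at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < n <= 100:
        raise ValueError("n must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(n * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

x = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]
q1 = percentile(x, 25)
q2 = percentile(x, 50)
q3 = percentile(x, 75)
print(q1, q2, q3)
```

Note that the nearest-rank median (22) differs from the interpolated median (22.5) for this even-sized sample; both satisfy the "at or below" definition in different ways.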

17 Boxplots
IQR = Q3 - Q1
Outliers are usually 3 IQR or more above the third quartile or 3 IQR or more below the first quartile.
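
A minimal sketch of the outlier rule, using `statistics.quantiles` from the standard library (Python 3.8+). The data set, the value 60 added as an obvious outlier, and the helper name `find_outliers` are assumptions for illustration. The factor is a parameter because many textbooks use 1.5 IQR instead of 3 IQR.

```python
import statistics

def find_outliers(values, factor=3.0):
    """Return (q1, q3, outliers) using the rule: more than
    factor * IQR above Q3 or below Q1."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - factor * iqr
    upper_fence = q3 + factor * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return q1, q3, outliers

# Example data from the earlier slides plus one assumed extreme value.
data = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25, 60]
q1, q3, outliers = find_outliers(data)
print(q1, q3, outliers)
```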

18 Skewness
Skewness quantifies how symmetrical a distribution is.
A symmetrical distribution has a skewness of zero.
Negative values for the skewness indicate that the data is skewed left.
Positive values for the skewness indicate that the data is skewed right.
[Figure panels: skewness < 0 (left skew), skewness = 0, skewness > 0 (right skew)]
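
The slide does not give a formula, so the sketch below assumes the common moment-based definition: the third central moment divided by the variance raised to the power 1.5. The function name and the example data are illustrative.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5, where m_k is the k-th central moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]      # long tail to the right
left_skewed = [-10, -2, -1, -1, -1]  # long tail to the left

print(skewness(symmetric), skewness(right_skewed), skewness(left_skewed))
```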

19 Kurtosis
Kurtosis quantifies how peaked a distribution is compared to a normal distribution.
A normal distribution has an (excess) kurtosis of 0.
A flatter distribution has a negative kurtosis.
A distribution more peaked than a normal distribution has a positive kurtosis.
[Figure panels: kurtosis < 0, kurtosis = 0, kurtosis > 0]

20 Kurtosis
Kurtosis is based on the fourth central moment.
Numerator: weights up strong deviations from the mean (how long are the tails?)
Denominator: a large variance decreases kurtosis
Large kurtosis: low variance and long tails
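
The formula itself is missing from this transcript. The sketch below assumes the usual excess-kurtosis definition, with the fourth central moment in the numerator, the squared variance in the denominator, and 3 subtracted so that a normal distribution scores 0. The function name and data are illustrative.

```python
def excess_kurtosis(values):
    """Excess kurtosis: m4 / m2**2 - 3, where m_k is the k-th central moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]                                # evenly spread, no tails
peaked_long_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # mass at centre, two far values

print(excess_kurtosis(flat), excess_kurtosis(peaked_long_tails))
```

The evenly spread sample comes out negative and the peaked sample with two far-away values comes out positive, matching the sign convention on the slide.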

21 Normal Distribution
[Figure] More peaked than the normal distribution! Positive kurtosis!

22 Normal Distribution
[Figure] Flatter than the normal distribution! Negative kurtosis!

23 Normal Distribution
[Figure] Right skewed! Positive skewness value!

24 Normal Distribution
[Figure] Left skewed! Negative skewness value!

25 WHAT IS A PROBABILITY?

26 Random Variables
Discrete random variable: can take on only a discrete set of values. You can count the values it can take on.
Continuous random variable: can take on any value in an interval.

27 Discrete Random Variable X
X = (S, P)
S is a finite set of values
$P: S \to [0, 1]$, where $\sum_{s \in S} P(s) = 1$ (probability mass function)
Example: S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. What does this mean?
[Figure: bar chart "Two Dice", probability of each value in S]
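
The example set S is the set of possible sums of two dice. As a sketch, the probability mass function can be built by enumerating all 36 equally likely outcomes, assuming two fair six-sided dice; exact fractions avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 outcomes produce each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# P(s) = number of outcomes with sum s, divided by 36.
pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, prob in pmf.items():
    print(s, prob, round(float(prob), 3))
```

The probabilities sum to 1, and the most likely sum is 7 with probability 1/6.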

28 Expected Value / Expectation
Which value of the random variable do we expect?
E.g., for a die, E(X) = 3.5
The expected value is the long-run average.
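
For a discrete random variable the expected value is the probability-weighted sum of its values, E(X) = sum of s * P(s). A small sketch for one fair die; the helper name `expected_value` is illustrative.

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X) = sum over all values s of s * P(s)."""
    return sum(s * p for s, p in pmf.items())

# A fair six-sided die: each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
e_die = expected_value(die_pmf)

print(float(e_die))
```

No single roll can show 3.5; the value describes the average over many rolls.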

29 Continuous Random Variable X
f(x): probability density function (PDF)
The probability that someone is exactly 180 cm tall is zero.
Better: P(|x - 180 cm| < 0.1) or P(179 < x < 181), the area under the curve over that interval.
[Figure: density f(x) over height x, with 180 cm marked]
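
To make "area under the curve" concrete, the sketch below assumes that height follows a normal distribution with mean 168 cm and standard deviation 10 cm. Both the distribution family and the standard deviation are assumptions for illustration; the slides only mention an expected height of 168 cm.

```python
from statistics import NormalDist

# Assumed height model (see lead-in): mean 168 cm, standard deviation 10 cm.
height = NormalDist(mu=168, sigma=10)

# Probability of an interval = difference of the cumulative distribution function.
p_wide = height.cdf(181) - height.cdf(179)        # P(179 < x < 181)
p_narrow = height.cdf(180.1) - height.cdf(179.9)  # P(|x - 180| < 0.1)

# An interval of width zero has zero area, hence zero probability.
p_exact = height.cdf(180) - height.cdf(180)

print(p_wide, p_narrow, p_exact)
```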

30 Expected Value / Expectation
Which value of the random variable do we expect?
E.g., height, length of snowboard, weight
E(X) = 168 cm

31 Example
Equal number of black and red balls in an urn: p = 1 - p = 0.5
You win if you pick a red ball.
Bernoulli distribution, probability mass function: $P(X = x) = p^x (1 - p)^{1 - x}$ for $x \in \{0, 1\}$

32 Bernoulli
Single experiment: draw one ball.
Repeat the experiment 5 times. Observe: BBBBR
What is the probability of observing this sequence?
0.5 * 0.5 * 0.5 * 0.5 * 0.5 = 0.03125
Let's assume 30% of the balls are red and 70% are black:
0.7 * 0.7 * 0.7 * 0.7 * 0.3 = 0.07203

33 Likelihood for a Bernoulli
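
The likelihood formula on this slide is not preserved in the transcript. A common way to write it, assumed here, is the product of the per-draw Bernoulli probabilities, viewed as a function of the parameter p. The sketch evaluates it for the observed sequence BBBBR and searches a coarse grid for the p that maximizes it; the function name and the grid are illustrative.

```python
def sequence_likelihood(sequence, p_red):
    """Product of Bernoulli probabilities: p_red for each red ball (R),
    1 - p_red for each black ball (B). Draws are assumed independent."""
    likelihood = 1.0
    for ball in sequence:
        likelihood *= p_red if ball == "R" else 1 - p_red
    return likelihood

observed = "BBBBR"

l_half = sequence_likelihood(observed, 0.5)    # 0.5^5
l_thirty = sequence_likelihood(observed, 0.3)  # 0.7^4 * 0.3

# Grid search over p = 0.01, 0.02, ..., 0.99 for the most likely parameter.
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=lambda p: sequence_likelihood(observed, p))

print(l_half, l_thirty, best_p)
```

The maximum lands at the observed share of red balls, 1 out of 5.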

34 Binomial Distribution
What if drawing 5 balls becomes one experiment?
If we repeat the experiment, what is the probability of observing k successes out of n?
Success is, e.g., picking a red ball.
PMF of a binomial distribution: $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$
$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$ is the number of ways to choose k elements from a set of n elements disregarding their order.
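
A minimal sketch of the binomial PMF using `math.comb` for the binomial coefficient. The function name `binomial_pmf` is illustrative. As a usage example it computes the probability of exactly one red ball in five draws when 30% of the balls are red, assuming independent draws (drawing with replacement).

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# One red ball in five draws, 30% red: 5 orderings, each with
# probability 0.3 * 0.7^4.
p_one_red = binomial_pmf(1, 5, 0.3)

# The PMF sums to 1 over k = 0..n.
total = sum(binomial_pmf(k, 5, 0.3) for k in range(6))

print(p_one_red, total)
```

Compared with the single sequence BBBBR, the binomial probability is five times larger, because the order of the draws no longer matters.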

35 Change Parameters
p and n are the parameters of the binomial distribution.

36 Discrete Random Variable X
One experiment: toss 4 coins.
Each coin shows either head H or tail T.
Number of all possible outcomes? 2^4 = 16

37 Discrete Random Variable X
4 coin tosses, X = number of heads: S = {0, 1, 2, 3, 4}
Number of all possible outcomes? 2^4 = 16
Number of outcomes that give 3 heads = 4!/(3!*1!) = 4
Probability of observing 3 heads: 4/16 = 0.25
The binomial coefficient is the number of ways to choose k elements from a set of n elements disregarding their order.

38 Discrete Random Variable X
One experiment: toss 4 coins.
Each coin shows either head H or tail T.
Number of all possible outcomes? 2^4 = 16
Number of outcomes that give you 3 heads = 4

39 Exploit the fact that you know the PMF of the binomial distribution:
Probability of observing 3 heads when we toss a fair coin 4 times (no order):
4!/(3!*1!) * 0.5^3 * 0.5^1 = 0.25
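
Both routes to the result, counting outcomes and using the binomial PMF, can be checked against each other. This sketch enumerates all 16 outcomes of four fair coin tosses.

```python
from itertools import product
from math import comb

# All sequences of four tosses, e.g. ('H', 'T', 'H', 'H').
outcomes = list(product("HT", repeat=4))

# Outcomes with exactly three heads, regardless of order.
three_heads = [o for o in outcomes if o.count("H") == 3]

p_by_counting = len(three_heads) / len(outcomes)
p_by_formula = comb(4, 3) * 0.5**3 * 0.5**1

print(len(outcomes), len(three_heads), p_by_counting, p_by_formula)
```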

40 What is the probability of observing exactly three 3s when rolling 4 dice (no order)?
4 * (1/6)^3 * (5/6)^1 = 20/1296 ≈ 0.0154
Check by counting: there are 6^4 = 1296 combinations if you roll 4 dice. How many of them show three 3s? 4!/(3!*1!) * 5 = 20, and 20/1296 ≈ 0.0154
What is the probability of observing three 3s in a row when rolling a die 4 times (order)?
2 * (1/6)^3 * (5/6) = 10/1296 ≈ 0.0077
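
The two dice questions can be verified by brute force, since 6^4 = 1296 outcomes are easy to enumerate. The sketch assumes fair dice and reads "three 3s in a row" as exactly three 3s that are adjacent.

```python
from itertools import product

# Every possible result of rolling a die four times.
rolls = list(product(range(1, 7), repeat=4))

# Exactly three 3s, in any positions (no order).
exactly_three = [r for r in rolls if r.count(3) == 3]

# Exactly three 3s that are consecutive: 3,3,3,x or x,3,3,3 with x != 3.
in_a_row = [r for r in exactly_three
            if r[:3] == (3, 3, 3) or r[1:] == (3, 3, 3)]

p_no_order = len(exactly_three) / len(rolls)
p_in_a_row = len(in_a_row) / len(rolls)

print(len(rolls), len(exactly_three), len(in_a_row))
print(round(p_no_order, 4), round(p_in_a_row, 4))
```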

41 What is the probability of observing at least 5 heads when we flip a fair coin 6 times (no order)?

42 Discrete Random Variable X
6 times repeated coin toss: S = {0, 1, 2, 3, 4, 5, 6}
Number of all possible outcomes? 2^6 = 64
Number of outcomes that give you at least 5 heads = 6!/(5!*1!) + 6!/(6!*0!) = 6 + 1 = 7
Probability: 7/64 ≈ 0.109
Using the binomial PMF: 6!/(5!*1!) * 0.5^5 * 0.5^1 + 6!/(6!*0!) * 0.5^6 * 0.5^0 = 0.109375
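
"At least 5 heads" means 5 or 6 heads, so the probability is the sum of two binomial terms. A short sketch for a fair coin; the function name is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_tosses = 6

# Count the favourable outcomes: 6 ways for five heads plus 1 way for six heads.
favourable = comb(n_tosses, 5) + comb(n_tosses, 6)

# Sum the PMF over k = 5 and k = 6.
p_at_least_5 = sum(binomial_pmf(k, n_tosses, 0.5) for k in (5, 6))

print(favourable, p_at_least_5)
```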

43 Probability Distribution
Why do we care?
Compute the probability of observations!
How likely is an observation given certain parameters?

45 Why should we care?
Most of the time we do not know the parameter of the true distribution that generated our sample data.
1. But we can test hypotheses about the parameter. If we observe heads 5 times in 6 coin tosses, what is the probability that the coin was fair?
2. And we can estimate the parameter from the observed sample data: inference! If we observe heads 5 times in 6 coin tosses, what was the parameter p of the coin?

46 HYPOTHESIS TESTING

47 Hypothesis Testing
Example: my hypothesis is that the coin is unfair.
We create a null hypothesis which would falsify our hypothesis if it were true.
Then we try to reject the null hypothesis (with a certain probability).
Null hypothesis H0 would be? H0: X ~ Binom(n, p = 0.5)
Alternative hypothesis HA would be? HA: X ~ Binom(n, p ≠ 0.5)

48 Can we verify my hypothesis and if so how?
We can never verify a hypothesis, we can only reject it!
If we can reject H0, we have more evidence supporting the assumption that HA can be true, but we do not know it.
Which experiment could we conduct in order to test if we should reject H0?
Repeated coin toss experiments. How many heads do we observe?
How many heads would we expect if H0 were true?

49 Remember
The binomial distribution shows how likely it is that we observe X heads.
Parameters: number of trials per experiment n = 6 and fairness of the coin p = 0.5
[Figure: bar chart of the binomial distribution, labelled "Reference distribution"]
We expect the outcome of the experiment to look like this if p = 0.5 and n = 6.
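
The reference distribution shown in the figure can be recomputed from the binomial PMF with n = 6 and p = 0.5. A minimal sketch:

```python
from math import comb

n, p = 6, 0.5

# P(X = k) for k = 0..6 heads under the null hypothesis of a fair coin.
reference = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Expected number of heads under the reference distribution.
expected_heads = sum(k * prob for k, prob in enumerate(reference))

for k, prob in enumerate(reference):
    print(k, prob)
print("expected heads:", expected_heads)
```

The most likely outcome is 3 heads with probability 0.3125, and the extremes 0 and 6 heads each have probability 0.015625.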

50 Hypothesis Testing: 3 Steps
1. Compute a suitable test statistic t_obs from the observed sample data. E.g., the expected number of heads estimated from our repeated experiments is 5.
2. Compare it to a reference distribution, which describes what your data look like if the null hypothesis is true. E.g., expected number of heads = 3.
3. Find out whether t_obs lies in the critical region of the reference distribution.

51 Remember
The binomial distribution shows how likely it is that we observe X heads.
Parameters: number of trials n = 6 and fairness of the coin p = 0.5
[Figure: bar chart of the binomial distribution with the critical region of the reference distribution marked]
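
One way to decide whether an observation lies in the critical region is to compute a p-value. The sketch below makes two assumptions that the slides do not state: a significance level of 0.05, and a two-sided rule that sums the probabilities of all outcomes that are at most as likely as the observed one. Function names are illustrative; exact fractions are used so that ties are compared without rounding error.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_p_value(observed, n, p_null):
    """Sum the probabilities of all outcomes that are no more likely
    than the observed one under the null hypothesis."""
    pmf = [binomial_pmf(k, n, p_null) for k in range(n + 1)]
    return sum(prob for prob in pmf if prob <= pmf[observed])

ALPHA = Fraction(5, 100)  # assumed significance level

# Observation: 5 heads in 6 tosses; H0: the coin is fair (p = 1/2).
p_value = two_sided_p_value(5, 6, Fraction(1, 2))
reject_h0 = p_value < ALPHA

print(float(p_value), reject_h0)
```

Under these assumptions, 5 heads in 6 tosses is not extreme enough to reject the fair-coin hypothesis: the outcomes 0, 1, 5 and 6 heads together have probability 7/32, roughly 0.22.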

52 Learning Goals
How to describe sample data?
What is mode/median/mean?
What is variance/standard deviation?
What is kurtosis/skewness?
What types of data exist?
Which statistic can be computed on nominal data?
What is a probability?
Which distribution describes the success probability of an experiment?
Which distribution describes the success probability of multiple experiments?

Web Science & Technologies University of Koblenz Landau, Germany. Lecture Data Science. Statistics and Probabilities JProf. Dr.

Web Science & Technologies University of Koblenz Landau, Germany. Lecture Data Science. Statistics and Probabilities JProf. Dr. Web Science & Technologies University of Koblenz Landau, Germany Lecture Data Science Statistics and Probabilities JProf. Dr. Claudia Wagner Data Science Open Position @GESIS Student Assistant Job in Data

More information

Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics.

Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics. Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics. Convergent validity: the degree to which results/evidence from different tests/sources, converge on the same conclusion.

More information

Model Paper Statistics Objective. Paper Code Time Allowed: 20 minutes

Model Paper Statistics Objective. Paper Code Time Allowed: 20 minutes Model Paper Statistics Objective Intermediate Part I (11 th Class) Examination Session 2012-2013 and onward Total marks: 17 Paper Code Time Allowed: 20 minutes Note:- You have four choices for each objective

More information

2011 Pearson Education, Inc

2011 Pearson Education, Inc Statistics for Business and Economics Chapter 4 Random Variables & Probability Distributions Content 1. Two Types of Random Variables 2. Probability Distributions for Discrete Random Variables 3. The Binomial

More information

The normal distribution is a theoretical model derived mathematically and not empirically.

The normal distribution is a theoretical model derived mathematically and not empirically. Sociology 541 The Normal Distribution Probability and An Introduction to Inferential Statistics Normal Approximation The normal distribution is a theoretical model derived mathematically and not empirically.

More information

The Bernoulli distribution

The Bernoulli distribution This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Econ 6900: Statistical Problems. Instructor: Yogesh Uppal

Econ 6900: Statistical Problems. Instructor: Yogesh Uppal Econ 6900: Statistical Problems Instructor: Yogesh Uppal Email: yuppal@ysu.edu Lecture Slides 4 Random Variables Probability Distributions Discrete Distributions Discrete Uniform Probability Distribution

More information

Some Characteristics of Data

Some Characteristics of Data Some Characteristics of Data Not all data is the same, and depending on some characteristics of a particular dataset, there are some limitations as to what can and cannot be done with that data. Some key

More information

Lecture 2. Probability Distributions Theophanis Tsandilas

Lecture 2. Probability Distributions Theophanis Tsandilas Lecture 2 Probability Distributions Theophanis Tsandilas Comment on measures of dispersion Why do common measures of dispersion (variance and standard deviation) use sums of squares: nx (x i ˆµ) 2 i=1

More information

Basic Procedure for Histograms

Basic Procedure for Histograms Basic Procedure for Histograms 1. Compute the range of observations (min. & max. value) 2. Choose an initial # of classes (most likely based on the range of values, try and find a number of classes that

More information

MATH 118 Class Notes For Chapter 5 By: Maan Omran

MATH 118 Class Notes For Chapter 5 By: Maan Omran MATH 118 Class Notes For Chapter 5 By: Maan Omran Section 5.1 Central Tendency Mode: the number or numbers that occur most often. Median: the number at the midpoint of a ranked data. Ex1: The test scores

More information

ME3620. Theory of Engineering Experimentation. Spring Chapter III. Random Variables and Probability Distributions.

ME3620. Theory of Engineering Experimentation. Spring Chapter III. Random Variables and Probability Distributions. ME3620 Theory of Engineering Experimentation Chapter III. Random Variables and Probability Distributions Chapter III 1 3.2 Random Variables In an experiment, a measurement is usually denoted by a variable

More information

Lecture 1: Review and Exploratory Data Analysis (EDA)

Lecture 1: Review and Exploratory Data Analysis (EDA) Lecture 1: Review and Exploratory Data Analysis (EDA) Ani Manichaikul amanicha@jhsph.edu 16 April 2007 1 / 40 Course Information I Office hours For questions and help When? I ll announce this tomorrow

More information

Chapter 3 Discrete Random Variables and Probability Distributions

Chapter 3 Discrete Random Variables and Probability Distributions Chapter 3 Discrete Random Variables and Probability Distributions Part 2: Mean and Variance of a Discrete Random Variable Section 3.4 1 / 16 Discrete Random Variable - Expected Value In a random experiment,

More information

Prof. Thistleton MAT 505 Introduction to Probability Lecture 3

Prof. Thistleton MAT 505 Introduction to Probability Lecture 3 Sections from Text and MIT Video Lecture: Sections 2.1 through 2.5 http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-041-probabilistic-systemsanalysis-and-applied-probability-fall-2010/video-lectures/lecture-1-probability-models-and-axioms/

More information

MAS1403. Quantitative Methods for Business Management. Semester 1, Module leader: Dr. David Walshaw

MAS1403. Quantitative Methods for Business Management. Semester 1, Module leader: Dr. David Walshaw MAS1403 Quantitative Methods for Business Management Semester 1, 2018 2019 Module leader: Dr. David Walshaw Additional lecturers: Dr. James Waldren and Dr. Stuart Hall Announcements: Written assignment

More information

STAT 113 Variability

STAT 113 Variability STAT 113 Variability Colin Reimer Dawson Oberlin College September 14, 2017 1 / 48 Outline Last Time: Shape and Center Variability Boxplots and the IQR Variance and Standard Deviaton Transformations 2

More information

Example - Let X be the number of boys in a 4 child family. Find the probability distribution table:

Example - Let X be the number of boys in a 4 child family. Find the probability distribution table: Chapter8 Probability Distributions and Statistics Section 8.1 Distributions of Random Variables tthe value of the result of the probability experiment is a RANDOM VARIABLE. Example - Let X be the number

More information

Theoretical Foundations

Theoretical Foundations Theoretical Foundations Probabilities Monia Ranalli monia.ranalli@uniroma2.it Ranalli M. Theoretical Foundations - Probabilities 1 / 27 Objectives understand the probability basics quantify random phenomena

More information

Binomial Random Variables. Binomial Random Variables

Binomial Random Variables. Binomial Random Variables Bernoulli Trials Definition A Bernoulli trial is a random experiment in which there are only two possible outcomes - success and failure. 1 Tossing a coin and considering heads as success and tails as

More information

Example. Chapter 8 Probability Distributions and Statistics Section 8.1 Distributions of Random Variables

Example. Chapter 8 Probability Distributions and Statistics Section 8.1 Distributions of Random Variables Chapter 8 Probability Distributions and Statistics Section 8.1 Distributions of Random Variables You are dealt a hand of 5 cards. Find the probability distribution table for the number of hearts. Graph

More information

5.1 Personal Probability

5.1 Personal Probability 5. Probability Value Page 1 5.1 Personal Probability Although we think probability is something that is confined to math class, in the form of personal probability it is something we use to make decisions

More information

CS145: Probability & Computing

CS145: Probability & Computing CS145: Probability & Computing Lecture 8: Variance of Sums, Cumulative Distribution, Continuous Variables Instructor: Eli Upfal Brown University Computer Science Figure credits: Bertsekas & Tsitsiklis,

More information

2 Exploring Univariate Data

2 Exploring Univariate Data 2 Exploring Univariate Data A good picture is worth more than a thousand words! Having the data collected we examine them to get a feel for they main messages and any surprising features, before attempting

More information

David Tenenbaum GEOG 090 UNC-CH Spring 2005

David Tenenbaum GEOG 090 UNC-CH Spring 2005 Simple Descriptive Statistics Review and Examples You will likely make use of all three measures of central tendency (mode, median, and mean), as well as some key measures of dispersion (standard deviation,

More information

Lecture 2 Describing Data

Lecture 2 Describing Data Lecture 2 Describing Data Thais Paiva STA 111 - Summer 2013 Term II July 2, 2013 Lecture Plan 1 Types of data 2 Describing the data with plots 3 Summary statistics for central tendency and spread 4 Histograms

More information

1 Exercise One. 1.1 Calculate the mean ROI. Note that the data is not grouped! Below you find the raw data in tabular form:

1 Exercise One. 1.1 Calculate the mean ROI. Note that the data is not grouped! Below you find the raw data in tabular form: 1 Exercise One Note that the data is not grouped! 1.1 Calculate the mean ROI Below you find the raw data in tabular form: Obs Data 1 18.5 2 18.6 3 17.4 4 12.2 5 19.7 6 5.6 7 7.7 8 9.8 9 19.9 10 9.9 11

More information

Example - Let X be the number of boys in a 4 child family. Find the probability distribution table:

Example - Let X be the number of boys in a 4 child family. Find the probability distribution table: Chapter7 Probability Distributions and Statistics Distributions of Random Variables tthe value of the result of the probability experiment is a RANDOM VARIABLE. Example - Let X be the number of boys in

More information

1/2 2. Mean & variance. Mean & standard deviation

1/2 2. Mean & variance. Mean & standard deviation Question # 1 of 10 ( Start time: 09:46:03 PM ) Total Marks: 1 The probability distribution of X is given below. x: 0 1 2 3 4 p(x): 0.73? 0.06 0.04 0.01 What is the value of missing probability? 0.54 0.16

More information

The Binomial Distribution

The Binomial Distribution The Binomial Distribution January 31, 2018 Contents The Binomial Distribution The Normal Approximation to the Binomial The Binomial Hypothesis Test Computing Binomial Probabilities in R 30 Problems The

More information

INF FALL NATURAL LANGUAGE PROCESSING. Jan Tore Lønning, Lecture 3, 1.9

INF FALL NATURAL LANGUAGE PROCESSING. Jan Tore Lønning, Lecture 3, 1.9 INF5830 015 FALL NATURAL LANGUAGE PROCESSING Jan Tore Lønning, Lecture 3, 1.9 Today: More statistics Binomial distribution Continuous random variables/distributions Normal distribution Sampling and sampling

More information

Probability Models.S2 Discrete Random Variables

Probability Models.S2 Discrete Random Variables Probability Models.S2 Discrete Random Variables Operations Research Models and Methods Paul A. Jensen and Jonathan F. Bard Results of an experiment involving uncertainty are described by one or more random

More information

INF FALL NATURAL LANGUAGE PROCESSING. Jan Tore Lønning, Lecture 3, 1.9

INF FALL NATURAL LANGUAGE PROCESSING. Jan Tore Lønning, Lecture 3, 1.9 1 INF5830 2015 FALL NATURAL LANGUAGE PROCESSING Jan Tore Lønning, Lecture 3, 1.9 Today: More statistics 2 Recap Probability distributions Categorical distributions Bernoulli trial Binomial distribution

More information

KARACHI UNIVERSITY BUSINESS SCHOOL UNIVERSITY OF KARACHI BS (BBA) VI

KARACHI UNIVERSITY BUSINESS SCHOOL UNIVERSITY OF KARACHI BS (BBA) VI 88 P a g e B S ( B B A ) S y l l a b u s KARACHI UNIVERSITY BUSINESS SCHOOL UNIVERSITY OF KARACHI BS (BBA) VI Course Title : STATISTICS Course Number : BA(BS) 532 Credit Hours : 03 Course 1. Statistical

More information

DESCRIBING DATA: MESURES OF LOCATION

DESCRIBING DATA: MESURES OF LOCATION DESCRIBING DATA: MESURES OF LOCATION A. Measures of Central Tendency Measures of Central Tendency are used to pinpoint the center or average of a data set which can then be used to represent the typical

More information

The Binomial Distribution

The Binomial Distribution The Binomial Distribution January 31, 2019 Contents The Binomial Distribution The Normal Approximation to the Binomial The Binomial Hypothesis Test Computing Binomial Probabilities in R 30 Problems The

More information

Data Distributions and Normality

Data Distributions and Normality Data Distributions and Normality Definition (Non)Parametric Parametric statistics assume that data come from a normal distribution, and make inferences about parameters of that distribution. These statistical

More information

DATA SUMMARIZATION AND VISUALIZATION

DATA SUMMARIZATION AND VISUALIZATION APPENDIX DATA SUMMARIZATION AND VISUALIZATION PART 1 SUMMARIZATION 1: BUILDING BLOCKS OF DATA ANALYSIS 294 PART 2 PART 3 PART 4 VISUALIZATION: GRAPHS AND TABLES FOR SUMMARIZING AND ORGANIZING DATA 296

More information

Some estimates of the height of the podium

Some estimates of the height of the podium Some estimates of the height of the podium 24 36 40 40 40 41 42 44 46 48 50 53 65 98 1 5 number summary Inter quartile range (IQR) range = max min 2 1.5 IQR outlier rule 3 make a boxplot 24 36 40 40 40

More information

Statistical Methods in Practice STAT/MATH 3379

Statistical Methods in Practice STAT/MATH 3379 Statistical Methods in Practice STAT/MATH 3379 Dr. A. B. W. Manage Associate Professor of Mathematics & Statistics Department of Mathematics & Statistics Sam Houston State University Overview 6.1 Discrete

More information

درس هفتم یادگیري ماشین. (Machine Learning) دانشگاه فردوسی مشهد دانشکده مهندسی رضا منصفی

درس هفتم یادگیري ماشین. (Machine Learning) دانشگاه فردوسی مشهد دانشکده مهندسی رضا منصفی یادگیري ماشین توزیع هاي نمونه و تخمین نقطه اي پارامترها Sampling Distributions and Point Estimation of Parameter (Machine Learning) دانشگاه فردوسی مشهد دانشکده مهندسی رضا منصفی درس هفتم 1 Outline Introduction

More information

Binomial and Normal Distributions

Binomial and Normal Distributions Binomial and Normal Distributions Bernoulli Trials A Bernoulli trial is a random experiment with 2 special properties: The result of a Bernoulli trial is binary. Examples: Heads vs. Tails, Healthy vs.

More information

Key Objectives. Module 2: The Logic of Statistical Inference. Z-scores. SGSB Workshop: Using Statistical Data to Make Decisions

Key Objectives. Module 2: The Logic of Statistical Inference. Z-scores. SGSB Workshop: Using Statistical Data to Make Decisions SGSB Workshop: Using Statistical Data to Make Decisions Module 2: The Logic of Statistical Inference Dr. Tom Ilvento January 2006 Dr. Mugdim Pašić Key Objectives Understand the logic of statistical inference

More information

Descriptive Statistics

Descriptive Statistics Petra Petrovics Descriptive Statistics 2 nd seminar DESCRIPTIVE STATISTICS Definition: Descriptive statistics is concerned only with collecting and describing data Methods: - statistical tables and graphs

More information

Description of Data I

Description of Data I Description of Data I (Summary and Variability measures) Objectives: Able to understand how to summarize the data Able to understand how to measure the variability of the data Able to use and interpret

More information

Data that can be any numerical value are called continuous. These are usually things that are measured, such as height, length, time, speed, etc.

Data that can be any numerical value are called continuous. These are usually things that are measured, such as height, length, time, speed, etc. Chapter 8 Measures of Center Data that can be any numerical value are called continuous. These are usually things that are measured, such as height, length, time, speed, etc. Data that can only be integer

More information

Point Estimation. Some General Concepts of Point Estimation. Example. Estimator quality

Point Estimation. Some General Concepts of Point Estimation. Example. Estimator quality Point Estimation Some General Concepts of Point Estimation Statistical inference = conclusions about parameters Parameters == population characteristics A point estimate of a parameter is a value (based

More information

Sampling Distributions and the Central Limit Theorem

Sampling Distributions and the Central Limit Theorem Sampling Distributions and the Central Limit Theorem February 18 Data distributions and sampling distributions So far, we have discussed the distribution of data (i.e. of random variables in our sample,

More information

Chapter 4: Commonly Used Distributions. Statistics for Engineers and Scientists Fourth Edition William Navidi

Chapter 4: Commonly Used Distributions. Statistics for Engineers and Scientists Fourth Edition William Navidi Chapter 4: Commonly Used Distributions Statistics for Engineers and Scientists Fourth Edition William Navidi 2014 by Education. This is proprietary material solely for authorized instructor use. Not authorized

More information

Business Statistics 41000: Probability 4

Business Statistics 41000: Probability 4 Business Statistics 41000: Probability 4 Drew D. Creal University of Chicago, Booth School of Business February 14 and 15, 2014 1 Class information Drew D. Creal Email: dcreal@chicagobooth.edu Office:

More information

Shifting our focus. We were studying statistics (data, displays, sampling...) The next few lectures focus on probability (randomness) Why?

Shifting our focus. We were studying statistics (data, displays, sampling...) The next few lectures focus on probability (randomness) Why? Probability Introduction Shifting our focus We were studying statistics (data, displays, sampling...) The next few lectures focus on probability (randomness) Why? What is Probability? Probability is used

More information

Unit 5: Sampling Distributions of Statistics

Unit 5: Sampling Distributions of Statistics Unit 5: Sampling Distributions of Statistics Statistics 571: Statistical Methods Ramón V. León 6/12/2004 Unit 5 - Stat 571 - Ramon V. Leon 1 Definitions and Key Concepts A sample statistic used to estimate

More information

Unit 5: Sampling Distributions of Statistics

Unit 5: Sampling Distributions of Statistics Unit 5: Sampling Distributions of Statistics Statistics 571: Statistical Methods Ramón V. León 6/12/2004 Unit 5 - Stat 571 - Ramon V. Leon 1 Definitions and Key Concepts A sample statistic used to estimate

More information

4 Random Variables and Distributions

4 Random Variables and Distributions 4 Random Variables and Distributions Random variables A random variable assigns each outcome in a sample space. e.g. called a realization of that variable to Note: We ll usually denote a random variable

More information

9/17/2015. Basic Statistics for the Healthcare Professional. Relax.it won t be that bad! Purpose of Statistic. Objectives

9/17/2015. Basic Statistics for the Healthcare Professional. Relax.it won t be that bad! Purpose of Statistic. Objectives Basic Statistics for the Healthcare Professional 1 F R A N K C O H E N, M B B, M P A D I R E C T O R O F A N A L Y T I C S D O C T O R S M A N A G E M E N T, LLC Purpose of Statistic 2 Provide a numerical

More information

Statistics for IT Managers

Statistics for IT Managers Statistics for IT Managers 95-796, Fall 212 Course Overview Instructor: Daniel B. Neill (neill@cs.cmu.edu) TAs: Eli (Han) Liu, Kats Sasanuma, Sriram Somanchi, Skyler Speakman, Quan Wang, Yiye Zhang (see

More information
