Lecture Data Science

Web Science & Technologies, University of Koblenz-Landau, Germany. Lecture Data Science: Statistics Foundations. JProf. Dr. Claudia Wagner

Learning Goals
How to describe sample data?
What is the mode/median/mean?
What is the variance/standard deviation?
What is kurtosis/skewness?
What types of data exist?
Which statistics can be computed on nominal data?
What is a probability?
Which distribution describes the success probability of a single experiment?
Which distribution describes the success probability of multiple experiments?

Statistics
Aim: learn something about a population by analyzing sample data.
[Diagram labels: Population, Probability, Sample, Descriptive Statistics, Inferential Statistics]

Types of Data (Statistician Viewpoint)
Ratio (e.g., weight): absolute zero
Interval (e.g., temperature in Celsius): distance is meaningful
Ordinal (e.g., status): observations can be ordered
Nominal (e.g., ethnic group, sex, nationality): observations are only named
Stevens, S. S. (1946). "On the Theory of Scales of Measurement". Science 103 (2684): 677-680.

Why should we care?

Frequencies
Absolute frequency $h_k$
Relative frequency (proportion) $f_k$
Cumulative frequency $c_t$ (order needs to be meaningful)
Observations:
1 2 3 4 5 6 7
Y Y N Y Y N N
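
The three frequencies can be sketched for the seven observations above. This is only an illustration: the variable names are made up, and the category order used for the cumulative frequency ("N" before "Y") is an arbitrary assumption, since Y/N data has no meaningful order.

```python
from collections import Counter
from itertools import accumulate

# The seven observations from the slide
observations = ["Y", "Y", "N", "Y", "Y", "N", "N"]

# Absolute frequency h_k: how often each value occurs
absolute = Counter(observations)

# Relative frequency f_k: absolute frequency divided by the sample size
n = len(observations)
relative = {value: count / n for value, count in absolute.items()}

# Cumulative frequency c_t: running total over ordered categories.
# Assumption: alphabetical order, which is NOT meaningful for nominal data.
ordered = sorted(absolute)
cumulative = dict(zip(ordered, accumulate(absolute[v] for v in ordered)))

print(absolute, relative, cumulative)
```

For ordinal data (e.g., school grades) the same code would give a meaningful cumulative frequency once `ordered` follows the natural order of the categories.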


Mode
The mode is the value that appears most often in a set of data.
It already applies to nominal data and can therefore be used for all types of data.
What is the mode of X = [17, 19, 20, 21, 22, 23, 23, 23, 23]?

Median
X = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]: the median of X is 22.5
X = [17, 19, 20, 21, 22, 23, 23, 23, 23]: the median of X is 22
The median is useful for skewed distributions, where the mean can be misleading.
Applies to ordinal, interval and ratio data.

Mean (expected value)
Applies to interval and ratio scales: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Example: X = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25], mean = 216/10 = 21.6
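
The three measures of location from the slides above can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen here for illustration only.

```python
import statistics

# Example data from the slides (10 values and the 9-value variant)
X_even = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]
X_odd = X_even[:-1]

mode = statistics.mode(X_odd)              # most frequent value
median_even = statistics.median(X_even)    # average of the two middle values
median_odd = statistics.median(X_odd)      # the single middle value
mean = statistics.mean(X_even)             # sum of the values divided by their count

print(mode, median_even, median_odd, mean)
```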

Mode, median, mean
[Figure: comparison of mode, median and mean for two log-normal distributions; https://en.wikipedia.org/wiki/file:comparison_mean_median_mode.svg]

Sample of Dogs
Range: $R = x_{\max} - x_{\min}$
Range = 600 - 170 = 430
The average height of a dog (measured at the shoulders) is 394 mm.
src: http://www.mathsisfun.com/data/standard-deviation.html

Dispersion
Variance: the average of the squared deviations from the mean, $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$
src: http://www.mathsisfun.com/data/standard-deviation.html

Standard Deviation
The standard deviation is just the square root of the variance.
We can now show what lies within 1 standard deviation of the mean. This helps us to assess what is normal and what is extra large or extra small.
src: http://www.mathsisfun.com/data/standard-deviation.html
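
A small sketch of variance and standard deviation. The individual dog heights are not part of this transcript, so the example reuses the list X from the earlier slides; it also assumes the population form (division by N), whereas a sample estimate would divide by N - 1 (`statistics.variance`).

```python
import statistics

# Assumption: reuse X from the mean/median slides, since the dog data is not in the transcript
X = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]

mean = statistics.mean(X)
variance = statistics.pvariance(X)   # average squared deviation from the mean
std = statistics.pstdev(X)           # square root of the variance

# Values within one standard deviation of the mean ("normal" values)
within_one_std = [x for x in X if abs(x - mean) <= std]

print(variance, std, within_one_std)
```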

Standard Deviation
The standard deviation does not directly measure how far typical values tend to be from the mean. How could we compute that?
src: http://www.mathsisfun.com/data/standard-deviation.html
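
One possible answer to the question above, though not necessarily the one intended in the lecture, is the mean absolute deviation: the average distance of the values from the mean. The sketch below again assumes the list X from the earlier slides.

```python
# Assumption: reuse X from the mean/median slides as example data
X = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]

mean = sum(X) / len(X)

# Mean absolute deviation: average of |x - mean|, no squaring involved
mean_abs_dev = sum(abs(x - mean) for x in X) / len(X)

print(mean_abs_dev)
```

Because deviations are not squared, large deviations are weighted less heavily than in the standard deviation.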

Percentiles
The n-th percentile is a value such that n% of all observations fall at or below it.
Quartiles
Q1 is the value for which 25% of all observations fall at or below it.
Image: http://www.gs.washington.edu/academics/courses/akey/56008/lecture/lecture2.pdf

Boxplots
$IQR = Q_3 - Q_1$
Extreme outliers are usually 3 IQR or more above the third quartile or 3 IQR or more below the first quartile; a common, milder rule already flags values more than 1.5 IQR beyond the quartiles as outliers.
Image: http://www.gs.washington.edu/academics/courses/akey/56008/lecture/lecture2.pdf
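
Quartiles, the IQR and the outlier fences can be sketched with `statistics.quantiles`. Note the assumptions: the data is the list X from the earlier slides, and `quantiles` interpolates between observations (its default "exclusive" method), so the quartiles may differ slightly from the "at or below" definition given above.

```python
import statistics

# Assumption: reuse X from the mean/median slides as example data
X = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]

# Three cut points that split the data into four equally likely groups
q1, q2, q3 = statistics.quantiles(X, n=4)
iqr = q3 - q1

# Fences at 3 IQR beyond the quartiles, as on the slide
factor = 3
lower_fence = q1 - factor * iqr
upper_fence = q3 + factor * iqr
outliers = [x for x in X if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr, outliers)
```

Setting `factor = 1.5` gives the milder convention used by many boxplot implementations.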

Skewness
Skewness quantifies how asymmetrical a distribution is.
A symmetrical distribution has a skewness of zero.
Negative skewness values indicate that the data is skewed left.
Positive skewness values indicate that the data is skewed right.
[Figure panels: skewness < 0 (left skew), skewness = 0, skewness > 0 (right skew)]

Kurtosis
Kurtosis quantifies how peaked a distribution is compared to a normal distribution.
A normal distribution has a kurtosis of 0 (this is the excess kurtosis, i.e., kurtosis minus 3).
A flatter distribution has a negative kurtosis.
A distribution more peaked than a normal distribution has a positive kurtosis.
[Figure panels: kurtosis < 0, kurtosis = 0, kurtosis > 0]

Kurtosis
Based on the fourth central moment.
Numerator: weights up strong deviations from the mean (how long are the tails?).
Denominator: a large variance decreases kurtosis.
Large kurtosis: low variance and long tails.
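
The formulas themselves did not survive in this transcript, so the sketch below assumes the standard moment-based (population) definitions: skewness as the third central moment divided by the variance to the power 1.5, and excess kurtosis as the fourth central moment divided by the squared variance, minus 3. The function names are illustrative.

```python
def central_moment(data, k):
    """k-th central moment: average of (x - mean)^k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    # Third central moment, normalised by variance^(3/2)
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    # Fourth central moment (numerator) over squared variance (denominator), minus 3
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
```

The symmetric sample has skewness zero and, being flatter than a normal distribution, a negative excess kurtosis; the sample with one large value has positive skewness.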

Normal Distribution
[Figure] More peaked than a normal distribution: positive kurtosis.

Normal Distribution
[Figure] Flatter than a normal distribution: negative kurtosis.

Normal Distribution
[Figure] Right skewed: positive skewness value.

Normal Distribution
[Figure] Left skewed: negative skewness value.

WHAT IS A PROBABILITY?

Random Variables
Discrete random variable: can take on only a discrete set of values. You can count the values it can take on.
Continuous random variable: can take on any value in an interval.

Discrete Random Variable X
X = (S, P)
S is a finite set of values
$P: S \to [0,1]$, where $\sum_{s \in S} P(s) = 1$
P is the probability mass function (PMF).
Example: S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. What does this mean?
[Figure: "Two dice", bar chart of the PMF over the sums 2 to 12]
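
The example set S is the set of possible sums of two dice. Assuming two fair six-sided dice, the PMF can be obtained by enumerating all 36 equally likely outcomes; the names below are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die1, die2) outcomes give each sum
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# PMF: P(s) = number of outcomes with sum s / 36
pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, prob in pmf.items():
    print(s, prob, round(float(prob), 3))
```

The probabilities sum to 1, and the most likely sum is 7 with probability 1/6.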

Expected Value / Expectation
Which value of the random variable do we expect?
E.g., for a die: E(X) = 3.5
The expected value is the long-run average.
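
A short sketch of both readings of the expected value for a fair six-sided die: the probability-weighted sum of the values, and the average over many simulated rolls. The seed and the number of rolls are arbitrary choices for the illustration.

```python
import random
from fractions import Fraction

faces = range(1, 7)

# Expected value: sum over all faces of value * probability (1/6 each)
expected = sum(Fraction(face, 6) for face in faces)

# Long-run average: mean of many simulated rolls (seed fixed only for reproducibility)
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(100_000)]
long_run_average = sum(rolls) / len(rolls)

print(float(expected), long_run_average)
```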

Continuous Random Variable X
f(x): probability density function (PDF)
The probability that someone is exactly 180 cm tall is zero.
Better: $P(|x - 180\,\text{cm}| < 0.1)$ or $P(179 < x < 181)$, i.e., the area under the curve.
[Figure: density f(x) over height x, with 180 cm marked]
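
The "area under the curve" can be sketched with `statistics.NormalDist`. The slide does not say which distribution the heights follow, so the normal shape, the mean of 168 cm and the standard deviation of 10 cm are assumptions made only for this illustration.

```python
from statistics import NormalDist

# Assumption: heights are normally distributed with mean 168 cm and sd 10 cm
height = NormalDist(mu=168, sigma=10)

# Probability of an interval = area under the PDF = difference of CDF values
p_interval = height.cdf(181) - height.cdf(179)

# A single point has zero width, hence zero area and zero probability
p_exact = height.cdf(180) - height.cdf(180)

print(p_interval, p_exact)
```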

Expected Value / Expectation
Which value of the random variable do we expect?
E.g., height, length of snowboard, weight
E(X) = 168 cm

Example
Equal number of black and red balls in an urn: p = 1 - p = 0.5
You win if you pick a red ball.
Bernoulli distribution, probability mass function: $P(X = 1) = p$, $P(X = 0) = 1 - p$

Bernoulli
Single experiment: draw one ball.
Repeat the experiment 5 times. Observe: BBBBR
What is the probability of observing this sequence?
0.5 * 0.5 * 0.5 * 0.5 * 0.5 = 0.031
Let's assume 30% of the balls are red and 70% are black:
0.7 * 0.7 * 0.7 * 0.7 * 0.3 = 0.072
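
The two products above can be reproduced with a small helper that multiplies the per-draw probabilities. The function name is made up for this sketch, and it assumes independent draws (i.e., drawing with replacement).

```python
def sequence_probability(sequence, p_red):
    """Probability of an exact sequence of independent draws, 'R' = red, 'B' = black."""
    prob = 1.0
    for ball in sequence:
        prob *= p_red if ball == "R" else 1 - p_red
    return prob


p_fair = sequence_probability("BBBBR", p_red=0.5)     # 0.5^5
p_skewed = sequence_probability("BBBBR", p_red=0.3)   # 0.7^4 * 0.3

print(round(p_fair, 3), round(p_skewed, 3))
```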

Likelihood for a Bernoulli
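
The likelihood formula from this slide is missing in the transcript. A minimal sketch, assuming the usual form $L(p) = p^k (1-p)^{n-k}$ for k successes in n independent Bernoulli trials, evaluates it on a grid of candidate values for p; the grid resolution and the function name are arbitrary choices.

```python
def bernoulli_likelihood(p, successes, trials):
    """Likelihood of parameter p given the observed number of successes."""
    return p ** successes * (1 - p) ** (trials - successes)


# Observation from the previous slide: BBBBR, i.e. 1 red ball (success) in 5 draws
successes, trials = 1, 5

# Evaluate the likelihood for p = 0.00, 0.01, ..., 1.00 and pick the maximum
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=lambda p: bernoulli_likelihood(p, successes, trials))

print(best_p, bernoulli_likelihood(best_p, successes, trials))
```

On this grid the likelihood peaks at the observed success rate 1/5.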

Binomial Distribution
What if drawing 5 balls becomes one experiment?
If we repeat the experiment, what is the probability of observing k successes out of n? Success is, e.g., picking a red ball.
PMF of a binomial distribution: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$
$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$: number of ways to choose k elements from a set of n elements, disregarding their order
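
A minimal sketch of the binomial PMF using `math.comb` for the binomial coefficient. The function name is illustrative; the example uses the urn setting above with n = 5 draws and p = 0.5.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 5, 0.5
distribution = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

for k, prob in distribution.items():
    print(k, prob)
```

Changing `n` and `p` changes the shape of the distribution, since these are its only two parameters.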

Change Parameters
p and n are the parameters of the binomial distribution.

Discrete Random Variable X
One experiment: toss 4 coins. Each coin shows either head (H) or tail (T).
Number of all possible outcomes? $2^4 = 16$

Discrete Random Variable X
4 coin tosses: S = {0, 1, 2, 3, 4}
Number of all possible outcomes? $2^4 = 16$
Number of outcomes that give 3 heads = 4!/(3! * 1!) = 4
Probability of observing 3 heads: 4/16 = 0.25
The binomial coefficient is the number of ways to choose k elements from a set of n elements, disregarding their order.
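
The counting argument can be verified by brute force: list all sequences of four fair coin tosses and count those with exactly three heads. Variable names are illustrative.

```python
from itertools import product

# All 2^4 sequences of heads (H) and tails (T)
outcomes = list(product("HT", repeat=4))

# Sequences with exactly three heads, regardless of order
three_heads = [o for o in outcomes if o.count("H") == 3]

probability = len(three_heads) / len(outcomes)
print(len(outcomes), len(three_heads), probability)
```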

Discrete Random Variable X
One experiment: toss 4 coins. Each coin shows either head (H) or tail (T).
Number of all possible outcomes? $2^4 = 16$
Number of outcomes that give you 3 heads = 4

Exploit the fact that you know the PMF of the binomial distribution.
Probability of observing 3 heads when we toss a fair coin 4 times (no order): $\frac{4!}{3!\,1!} \cdot 0.5^3 \cdot 0.5^1 = 0.25$

What is the probability of observing 3 THREEs when rolling 4 dice (no order)?
$4 \cdot (1/6)^3 \cdot (5/6)^1 = 0.015$
There are $6^4 = 1296$ combinations if you roll 4 dice. How many of them show 3 THREEs? 4!/(3! * 1!) * 5 = 20, and 20/1296 = 0.015
What is the probability of observing 3 THREEs in a row when rolling a die 4 times (order)?
$2 \cdot (1/6)^3 \cdot (5/6) = 0.0077$
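
Both dice results can be checked by enumerating all 1296 outcomes of four fair dice. "In a row" is read here as exactly three THREEs in consecutive positions (rolls 1 to 3 or rolls 2 to 4), which matches the factor 2 above.

```python
from fractions import Fraction
from itertools import product

# All 6^4 ordered outcomes of rolling four dice
rolls = list(product(range(1, 7), repeat=4))

# Exactly three THREEs, in any positions
exactly_three = [r for r in rolls if r.count(3) == 3]

# Exactly three THREEs that are also consecutive
in_a_row = [r for r in exactly_three if r[:3] == (3, 3, 3) or r[1:] == (3, 3, 3)]

p_any_order = Fraction(len(exactly_three), len(rolls))
p_in_a_row = Fraction(len(in_a_row), len(rolls))
print(float(p_any_order), float(p_in_a_row))
```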

What is the probability of observing at least 5 heads when we flip a fair coin 6 times (no order)?

Discrete Random Variable X
6 repeated coin tosses: S = {0, 1, 2, 3, 4, 5, 6}
Number of all possible outcomes? $2^6 = 64$
Number of outcomes that give you at least 5 heads = 6!/(5! * 1!) + 6!/(6! * 0!) = 6 + 1 = 7
Probability: 7/64 = 0.1094
Using the binomial PMF: 6!/(5! * 1!) * 0.5^5 * 0.5 + 6!/(6! * 0!) * 0.5^6 = 0.1094
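
"At least 5 heads" means 5 or 6 heads, so the probability is the sum of two binomial terms. A short sketch with exact fractions, assuming a fair coin:

```python
from fractions import Fraction
from math import comb

n = 6

# P(X >= 5) = P(X = 5) + P(X = 6); each sequence of 6 fair tosses has probability 1/2^6
p_at_least_5 = sum(Fraction(comb(n, k), 2 ** n) for k in (5, 6))

print(p_at_least_5, float(p_at_least_5))
```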

Probability Distribution
Why do we care? Compute the probability of observations!
How likely is an observation given certain parameters?

Example
Source: https://west.uni-koblenz.de/en/studium/lehrveranstaltungen/ws1617/probabilistic-functionalprogramming

Why should we care?
Most of the time we do not know the parameters of the true distribution that generated our sample data.
1. But we can test hypotheses about the parameters. If we observe 5 heads in 6 coin tosses, what is the probability that the coin was fair?
2. And we can estimate the parameters from the observed sample data (inference!). If we observe 5 heads in 6 coin tosses, what was the parameter p of the coin?

HYPOTHESIS TESTING

Hypothesis Testing
Example: my hypothesis is that the coin is unfair.
We create a null hypothesis which would falsify our hypothesis if it were true. Then we try to reject the null hypothesis (with a certain probability).
Null hypothesis $H_0$: X ~ Binom(n, p = 0.5)
Alternative hypothesis $H_A$: X ~ Binom(n, p ≠ 0.5)

Can we verify my hypothesis, and if so, how?
We can never verify a hypothesis, we can only reject it!
If we can reject $H_0$, we have more evidence supporting the assumption that $H_A$ can be true, but we do not know it.
Which experiment could we conduct in order to test whether we should reject $H_0$? Repeated coin toss experiments.
How many heads do we observe? How many heads would we expect if $H_0$ were true?

Remember
The binomial distribution shows how likely it is that we observe X heads. Parameters: number of trials per experiment n = 6 and fairness of the coin p = 0.5.
Heads X    Outcomes    Probability
0          1           0.015625
1          6           0.09375
2          15          0.234375
3          20          0.3125
4          15          0.234375
5          6           0.09375
6          1           0.015625
Reference distribution: we expect the outcome of the experiment to look like this if p = 0.5 and n = 6.
[Figure: bar chart of the binomial distribution]

Hypothesis Testing: 3 Steps
1. Compute a suitable test statistic $t_{obs}$ from the observed sample data, to be compared with a reference distribution. E.g., the number of heads observed in our repeated experiments is 5.
2. The reference distribution describes what your data looks like if the null hypothesis is true. E.g., expected number of heads = 3.
3. Find out whether $t_{obs}$ lies in the critical region of the reference distribution.

Remember
The binomial distribution shows how likely it is that we observe X heads. Parameters: number of trials n = 6 and fairness of the coin p = 0.5.
[Figure: the same binomial reference distribution (X = 0 to 6 with probabilities 0.015625, 0.09375, 0.234375, 0.3125, 0.234375, 0.09375, 0.015625), with the critical region of the reference distribution marked]
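
The test can be sketched end to end for the coin example. The significance level of 0.05 and the symmetric, two-sided construction of the critical region are assumptions of this sketch; the slides do not state which level or construction the lecture uses.

```python
from math import comb

n, p0 = 6, 0.5        # reference distribution under H0: Binom(6, 0.5)
t_obs = 5             # observed number of heads
alpha = 0.05          # assumed significance level (not given on the slides)

# Reference distribution: probability of k heads if H0 is true
pmf = [comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(n + 1)]

# Two-sided critical region: grow symmetric tails while their total probability stays <= alpha
critical_region = set()
for c in range(n // 2):
    candidate = set(range(c + 1)) | set(range(n - c, n + 1))
    if sum(pmf[k] for k in candidate) <= alpha:
        critical_region = candidate

reject_h0 = t_obs in critical_region

# Two-sided p-value: total probability of outcomes at most as likely as the observed one
p_value = sum(prob for prob in pmf if prob <= pmf[t_obs])

print(critical_region, reject_h0, p_value)
```

Under these assumptions only 0 or 6 heads fall into the critical region, so observing 5 heads in 6 tosses does not suffice to reject the hypothesis of a fair coin.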
