
Lecture Data Science: Statistics Foundations
JProf. Dr. Claudia Wagner
Web Science & Technologies, University of Koblenz-Landau, Germany

Learning Goals
- How to describe sample data?
- What is the mode/median/mean?
- What is the variance/standard deviation?
- What is kurtosis/skewness?
- What types of data exist?
- Which statistic can be computed on nominal data?
- What is a probability?
- Which distribution describes the success probability of a single experiment?
- Which distribution describes the number of successes in multiple experiments?

Statistics
Aim: learn something about a population by analyzing sample data.
(Figure: probability leads from the population to a sample; descriptive statistics summarizes the sample; inferential statistics leads from the sample back to the population.)

Types of Data (Statistician Viewpoint)
- Ratio (e.g., weight): absolute zero
- Interval (e.g., temperature in Celsius): distance is meaningful
- Ordinal (e.g., status): observations can be ordered
- Nominal (e.g., ethnic group, sex, nationality): observations are only named
Stevens, S. S. (1946). "On the Theory of Scales of Measurement". Science 103 (2684): 677–680.
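The Python sketch below makes the link between measurement scale and admissible statistics concrete. The `Scale` enum, the `MIN_SCALE` table and `permissible_stats` are hypothetical helpers written for illustration. The table entries follow the later slides (mode for nominal data, median for ordinal data, mean for interval and ratio data) and the usual reading of Stevens' scales. They are not a rule from any library.

```python
from enum import IntEnum

class Scale(IntEnum):
    """Stevens' measurement scales, ordered from least to most structure."""
    NOMINAL = 1   # values are only named (e.g., nationality)
    ORDINAL = 2   # values can be ordered (e.g., status)
    INTERVAL = 3  # distances are meaningful (e.g., temperature in Celsius)
    RATIO = 4     # absolute zero, so ratios are meaningful (e.g., weight)

# Hypothetical lookup: the weakest scale on which each statistic makes sense.
MIN_SCALE = {
    "mode": Scale.NOMINAL,
    "median": Scale.ORDINAL,
    "percentile": Scale.ORDINAL,
    "mean": Scale.INTERVAL,
    "variance": Scale.INTERVAL,
    "ratio_of_values": Scale.RATIO,  # e.g., "twice as heavy"
}

def permissible_stats(scale):
    """Return the statistics from MIN_SCALE that can be computed on data of `scale`."""
    return sorted(name for name, minimum in MIN_SCALE.items() if scale >= minimum)

print(permissible_stats(Scale.NOMINAL))  # ['mode']
print(permissible_stats(Scale.ORDINAL))  # ['median', 'mode', 'percentile']
```

Each scale inherits every statistic of the weaker scales. This is why the mode, defined already for nominal data, works for all types of data.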

Why should we care?

Frequencies
- Absolute frequency $h_k$: number of observations taking the k-th value
- Relative frequency (proportion): $f_k = h_k / n$
- Cumulative frequency: $c_t = \sum_{k \le t} h_k$ (the order of the values needs to be meaningful)

Observation  1  2  3  4  5  6  7
Value        Y  Y  N  Y  Y  N  N
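Below is a minimal sketch of the three frequencies in Python, using only the standard library. `frequencies` and `cumulative_frequencies` are hypothetical helper names. The Y/N values are nominal, so cumulative frequencies are not meaningful for them. For that reason the numeric list passed to `cumulative_frequencies` is an invented ordered example.

```python
from collections import Counter
from fractions import Fraction

def frequencies(observations):
    """Return absolute (h_k) and relative (f_k) frequencies per distinct value."""
    n = len(observations)
    h = Counter(observations)
    f = {value: Fraction(count, n) for value, count in h.items()}
    return h, f

def cumulative_frequencies(observations):
    """Cumulative absolute frequency c_t for each distinct value t, in sorted order.
    Only meaningful when the values have a natural order (ordinal or higher)."""
    h = Counter(observations)
    running = 0
    c = {}
    for value in sorted(h):
        running += h[value]
        c[value] = running
    return c

obs = ["Y", "Y", "N", "Y", "Y", "N", "N"]
h, f = frequencies(obs)
print(h["Y"], h["N"])  # 4 3
print(f["Y"], f["N"])  # 4/7 3/7
print(cumulative_frequencies([1, 2, 2, 3, 3, 3]))  # {1: 1, 2: 3, 3: 6}
```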


Mode
The mode is the value that appears most often in a set of data. It already applies to nominal data and can therefore be used for all types of data.
Example: what is the mode of X = [17, 19, 20, 21, 22, 23, 23, 23, 23]? (23, which appears four times.)

Median
X = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]: the median of X is 22.5 (the average of the two middle values, 22 and 23).
X = [17, 19, 20, 21, 22, 23, 23, 23, 23]: the median of X is 22 (the middle value).
The median is useful for skewed distributions, where the mean can be misleading. It applies to ordinal, interval and ratio data.

Mean (expected value)
Applies to interval and ratio data: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Example: X = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25] gives $\bar{x} = 216/10 = 21.6$.
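Python's standard `statistics` module can check the three location measures. This sketch reproduces the slide examples.

```python
import statistics

x9 = [17, 19, 20, 21, 22, 23, 23, 23, 23]
x10 = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]

print(statistics.mode(x9))     # 23 (appears four times)
print(statistics.median(x9))   # 22 (odd n: the middle value)
print(statistics.median(x10))  # 22.5 (even n: average of 22 and 23)
print(statistics.mean(x10))    # 21.6 (216 / 10)

# The mode also works on nominal data, where median and mean are undefined.
print(statistics.mode(["Y", "Y", "N", "Y", "Y", "N", "N"]))  # Y
```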

Mode, median, mean
(Figure: mode, median and mean of two log-normal distributions. Source: https://en.wikipedia.org/wiki/file:comparison_mean_median_mode.svg)

Sample of Dogs
Range: $R = x_{max} - x_{min}$
Range = 600 mm - 170 mm = 430 mm
The average height of a dog (measured at the shoulders) is 394 mm.
src: http://www.mathsisfun.com/data/standard-deviation.html

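Dispersion
Variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$ (when a sample is used to estimate the population variance, divide by $n - 1$ instead of $n$)
src: http://www.mathsisfun.com/data/standard-deviation.html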

Standard Deviation
The standard deviation is the square root of the variance: $\sigma = \sqrt{\sigma^2}$.
We can now show which values lie within one standard deviation of the mean. This gives us a standard way to assess what is normal and what is extra large or extra small.
src: http://www.mathsisfun.com/data/standard-deviation.html
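The sketch below computes the range, variance and standard deviation of the dog sample with Python's `statistics` module. The slide does not list the five heights. They are an assumption taken from the cited mathsisfun example, and they reproduce the range of 430 mm and the mean of 394 mm. `pvariance`/`pstdev` divide by n; `variance`/`stdev` divide by n - 1.

```python
import statistics

# Assumption: the five shoulder heights (mm) from the cited mathsisfun example.
heights = [600, 470, 170, 430, 300]

value_range = max(heights) - min(heights)  # R = x_max - x_min
mean = statistics.mean(heights)            # 394
pop_var = statistics.pvariance(heights)    # divides by n
pop_std = statistics.pstdev(heights)       # square root of pvariance
sample_var = statistics.variance(heights)  # divides by n - 1
sample_std = statistics.stdev(heights)

print(value_range, mean, pop_var, round(pop_std, 1))  # 430 394 21704 147.3
print(sample_var, round(sample_std, 1))               # 27130 164.7

# Which dogs lie within one (population) standard deviation of the mean?
low, high = mean - pop_std, mean + pop_std
print([h for h in heights if low <= h <= high])       # [470, 430, 300]
```

The population standard deviation is about 147 mm, and three of the five dogs lie within one standard deviation of the mean. The tallest and the shortest dog are the "extra large" and "extra small" ones.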

Standard Deviation
The standard deviation does not measure how far typical values tend to be from the mean: because the deviations are squared, large deviations get more weight than small ones. How could we compute that?
src: http://www.mathsisfun.com/data/standard-deviation.html
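One possible answer is the mean absolute deviation, $\frac{1}{n}\sum_i |x_i - \bar{x}|$. It averages the plain distances from the mean instead of their squares. The sketch below compares it with the standard deviation. It again assumes the five dog heights from the mathsisfun example, which are not on the slide.

```python
import statistics

# Assumption: the same five dog heights (mm) as in the mathsisfun example.
heights = [600, 470, 170, 430, 300]
mean = statistics.mean(heights)

# Mean absolute deviation: average distance of the values from the mean.
mad = sum(abs(h - mean) for h in heights) / len(heights)
std = statistics.pstdev(heights)

print(mad)            # 127.2
print(round(std, 1))  # 147.3
```

Squaring gives large deviations more weight, so the standard deviation is never smaller than the mean absolute deviation.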

Percentiles
The n-th percentile is a value such that n% of all observations fall at or below it.
Quartiles: Q1 is the value at or below which 25% of all observations fall (the 25th percentile); Q2 is the median (50th percentile) and Q3 the 75th percentile.
Image: http://www.gs.washington.edu/academics/courses/akey/56008/lecture/lecture2.pdf

Boxplots
$IQR = Q_3 - Q_1$
Outliers are usually defined as observations more than 1.5 IQR above the third quartile or more than 1.5 IQR below the first quartile. Observations 3 IQR or more beyond the quartiles are often called extreme outliers.
Image: http://www.gs.washington.edu/academics/courses/akey/56008/lecture/lecture2.pdf
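Below is a Python sketch of quartiles, IQR and the outlier fences. It uses `statistics.quantiles` with its default `"exclusive"` method. Other quartile conventions, such as those in spreadsheets or plotting libraries, can give slightly different Q1 and Q3. `tukey_fences` and `outliers` are hypothetical helpers, and the extra value 40 is made up purely to show a flagged outlier.

```python
import statistics

def tukey_fences(data, k=1.5):
    """Return (Q1, Q3, lower fence, upper fence) using IQR = Q3 - Q1.
    k = 1.5 flags outliers; k = 3 flags extreme outliers."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
    iqr = q3 - q1
    return q1, q3, q1 - k * iqr, q3 + k * iqr

def outliers(data, k=1.5):
    """Values lying more than k * IQR below Q1 or above Q3."""
    _, _, low, high = tukey_fences(data, k)
    return [v for v in data if v < low or v > high]

x = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]
print(statistics.quantiles(x, n=4))  # [19.75, 22.5, 23.0]
print(outliers(x))                   # []

# Hypothetical extra observation of 40, added only to demonstrate the rule.
print(outliers(x + [40]))            # [40]
print(outliers(x + [40], k=3))       # [40]
```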

Skewness
Skewness quantifies how asymmetric a distribution is.
- A symmetric distribution has a skewness of zero.
- Negative skewness values indicate that the data is skewed left (long left tail).
- Positive skewness values indicate that the data is skewed right (long right tail).
(Figure: skewness < 0, left skew; skewness = 0, symmetric; skewness > 0, right skew.)

Kurtosis
Kurtosis quantifies how peaked (and how heavy-tailed) a distribution is compared to a normal distribution. Here kurtosis means excess kurtosis, i.e. kurtosis minus 3, so that:
- A normal distribution has a kurtosis of 0.
- A flatter distribution has a negative kurtosis.
- A distribution more peaked than a normal distribution has a positive kurtosis.
(Figure: kurtosis < 0, kurtosis = 0, kurtosis > 0.)

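Kurtosis
Standardized fourth central moment, minus 3:
$\text{Kurt} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^2} - 3$
- Numerator: the fourth power weights strong deviations from the mean heavily (how long are the tails?).
- Denominator: a large variance decreases the kurtosis.
- Large kurtosis: low variance and long tails.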
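The moment-based definitions of skewness and excess kurtosis translate directly into Python. This sketch uses the simple population-moment estimators, which divide by n. Statistical packages often also offer bias-corrected variants that give slightly different numbers on small samples. The example lists are invented for illustration.

```python
def central_moment(data, order):
    """k-th central moment: (1/n) * sum((x - mean)**k)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** order for x in data) / n

def skewness(data):
    """Third standardized moment m3 / m2**1.5 (0 for symmetric data)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Fourth standardized moment minus 3, so a normal distribution scores 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 10]

print(skewness(symmetric))                   # 0.0
print(round(excess_kurtosis(symmetric), 2))  # -1.3 (flatter than normal)
print(skewness(right_skewed) > 0)            # True (long right tail)
```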

(Figure: a distribution more peaked than the normal distribution: positive kurtosis.)

(Figure: a distribution flatter than the normal distribution: negative kurtosis.)

(Figure: a right-skewed distribution compared with the normal distribution: positive skewness.)

(Figure: a left-skewed distribution compared with the normal distribution: negative skewness.)

WHAT IS A PROBABILITY?

Random Variables
- Discrete random variable: can take on only a discrete set of values; you can count the values it can take on.
- Continuous random variable: can take on any value in an interval.

Discrete Random Variable X
$X = (S, P)$, where S is a finite set of values and $P: S \to [0, 1]$ with $\sum_{s \in S} P(s) = 1$ (probability mass function).
Example: S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. What does this mean?
(Figure: probability mass function of the sum of two dice; x-axis: sum from 2 to 12, y-axis: probability from 0 to 0.18, highest at 7.)
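To answer "What does this mean?" for the two-dice example, we can compute the PMF by enumerating all 36 equally likely outcomes. This sketch assumes two fair six-sided dice and uses exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# PMF of X = sum of two fair dice: count each of the 36 equally likely outcomes.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

print(sorted(pmf))        # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(pmf[2], pmf[7])     # 1/36 1/6
print(sum(pmf.values()))  # 1
```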

Expected Value / Expectation
Which value of the random variable do we expect? $E(X) = \sum_{s \in S} s \, P(s)$
E.g., for a fair die: $E(X) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5$
The expected value is the long-run average.
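The sketch below shows both views of the expected value for a fair die: the exact sum $\sum_s s\,P(s)$, and the long-run average of many simulated rolls. The seed and the number of rolls are arbitrary choices.

```python
import random
from fractions import Fraction

# E(X) = sum over s of s * P(s); for a fair die every face has P(s) = 1/6.
faces = range(1, 7)
expected = sum(s * Fraction(1, 6) for s in faces)
print(expected, float(expected))  # 7/2 3.5

# Long-run average: the mean of many simulated rolls approaches E(X).
random.seed(0)  # fixed seed only to make the sketch reproducible
rolls = [random.randint(1, 6) for _ in range(100_000)]
print(sum(rolls) / len(rolls))    # close to 3.5
```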

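Continuous Random Variable X
Probability density function (PDF) f(x), e.g. over height x.
The probability that someone is exactly 180 cm tall is zero.
Better: ask for an interval, e.g. $P(|x - 180\,\text{cm}| < 0.1\,\text{cm})$ or $P(179\,\text{cm} < x < 181\,\text{cm})$. This probability is the area under the curve: $P(a < X < b) = \int_a^b f(x)\,dx$.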
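The sketch below uses Python's `statistics.NormalDist` to show that probabilities of a continuous variable are areas under the density. The height model is a hypothetical assumption: a normal distribution with mean 168 cm and standard deviation 10 cm. The mean only echoes the 168 cm on the next slide, and the standard deviation is made up.

```python
from statistics import NormalDist

# Hypothetical height model: normal with mean 168 cm and std 10 cm.
height = NormalDist(mu=168, sigma=10)

def prob_between(dist, a, b):
    """P(a < X < b) = area under the density between a and b."""
    return dist.cdf(b) - dist.cdf(a)

print(round(prob_between(height, 179, 181), 4))  # about 0.0389

# Shrinking the interval around 180 cm drives the probability towards 0.
for half_width in (1, 0.1, 0.001):
    print(half_width, prob_between(height, 180 - half_width, 180 + half_width))
```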

Expected Value / Expectation
Which value of the random variable do we expect? For a continuous random variable: $E(X) = \int_{-\infty}^{\infty} x \, f(x)\,dx$
E.g., height, length of a snowboard, weight: E(X) = 168 cm
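For a continuous random variable, the sum in $E(X)$ becomes an integral. Below is a minimal numerical sketch. It assumes a hypothetical normal height model with mean 168 cm and standard deviation 10 cm as a stand-in for f(x), and applies a simple midpoint rule over ±10 standard deviations. `expected_value` is a hypothetical helper.

```python
from statistics import NormalDist

def expected_value(pdf, lo, hi, steps=10_000):
    """Approximate E(X) = integral of x * f(x) dx over [lo, hi] (midpoint rule)."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += x * pdf(x) * width
    return total

# Hypothetical model: heights ~ Normal(168 cm, 10 cm); integrate over +-10 std.
height = NormalDist(mu=168, sigma=10)
print(round(expected_value(height.pdf, 68, 268), 3))  # 168.0
```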

Example
An urn contains an equal number of black and red balls, so $p = 1 - p = 0.5$. You win if you pick a red ball.
Bernoulli distribution, probability mass function: $P(X = x) = p^x (1 - p)^{1 - x}$ for $x \in \{0, 1\}$ (x = 1: red ball).

Bernoulli
Single experiment: draw one ball. Repeat the experiment 5 times (drawing with replacement, so the draws are independent) and observe BBBBR.
What is the probability of observing this sequence?
$0.5 \cdot 0.5 \cdot 0.5 \cdot 0.5 \cdot 0.5 = 0.031$
Let's assume 30% of the balls are red and 70% are black:
$0.7 \cdot 0.7 \cdot 0.7 \cdot 0.7 \cdot 0.3 = 0.072$

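Likelihood for a Bernoulli
For n independent draws $x_1, \dots, x_n$ with k successes: $L(p) = \prod_{i=1}^{n} p^{x_i}(1 - p)^{1 - x_i} = p^k (1 - p)^{n - k}$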
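The Bernoulli PMF and the likelihood of the observed sequence BBBBR fit in a few lines of Python. The encoding (red = 1 = success) and the helper names are assumptions for illustration. The draws are treated as independent, i.e. drawn with replacement.

```python
import math

def bernoulli_pmf(x, p):
    """P(X = x) = p**x * (1 - p)**(1 - x) for x in {0, 1}; here 1 = red (win)."""
    return p ** x * (1 - p) ** (1 - x)

def likelihood(observations, p):
    """Product of Bernoulli probabilities of independent draws (with replacement)."""
    return math.prod(bernoulli_pmf(x, p) for x in observations)

# B B B B R encoded as 0 0 0 0 1 (red = success).
draws = [0, 0, 0, 0, 1]
print(round(likelihood(draws, 0.5), 3))  # 0.031
print(round(likelihood(draws, 0.3), 3))  # 0.072
```

The sequence is more likely under p = 0.3 than under p = 0.5, which is the intuition behind choosing the parameter with the highest likelihood.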

Binomial Distribution
What if drawing 5 balls becomes one experiment? If we repeat the experiment, what is the probability of observing k successes out of n? (A success is, e.g., picking a red ball.)
PMF of a binomial distribution: $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$, where $\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$ is the number of ways to choose k elements from a set of n elements, disregarding their order.

Change Parameters
p and n are the parameters of the binomial distribution; changing them changes the shape of the PMF.
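Here is a sketch of the binomial PMF, using `math.comb` for the binomial coefficient. `binom_pmf` is a hypothetical helper name. Printing the PMF for n = 5 (the five-ball experiment) under two values of p shows how p shifts the probability mass.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Changing the parameters n and p changes the shape of the distribution.
for n, p in [(5, 0.5), (5, 0.3)]:
    print(n, p, [round(binom_pmf(k, n, p), 3) for k in range(n + 1)])
```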

Discrete Random Variable X
One experiment: toss 4 coins. Each coin shows either head H or tail T.
Number of all possible outcomes? $2^4 = 16$

Discrete Random Variable X
4 coin tosses, X = number of heads: S = {0, 1, 2, 3, 4}
Number of all possible outcomes: $2^4 = 16$
Number of outcomes that give 3 heads: $\binom{4}{3} = \frac{4!}{3! \cdot 1!} = 4$ (the number of ways to choose k elements from a set of n elements, disregarding their order)
Probability of observing 3 heads: 4/16 = 0.25


Exploit the fact that you know the PMF of the binomial distribution: the probability of observing 3 heads when we toss a fair coin 4 times (order does not matter) is
$\frac{4!}{3! \cdot 1!} \cdot 0.5^3 \cdot 0.5^1 = 4 \cdot 0.0625 = 0.25$
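A brute-force check confirms both the counting argument and the PMF shortcut: enumerate all outcomes of 4 coin tosses and count those with exactly 3 heads.

```python
from itertools import product
from math import comb

# All 2**4 = 16 equally likely outcomes of tossing 4 fair coins.
outcomes = list(product("HT", repeat=4))
three_heads = [o for o in outcomes if o.count("H") == 3]

print(len(outcomes), len(three_heads))   # 16 4
print(len(three_heads) / len(outcomes))  # 0.25 (counting)
print(comb(4, 3) * 0.5 ** 3 * 0.5 ** 1)  # 0.25 (binomial PMF)
```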

What is the probability of observing three 3s when rolling 4 dice (order does not matter)?
$\binom{4}{3} \cdot (1/6)^3 \cdot (5/6)^1 = 0.015$
Check by counting: there are $6^4 = 1296$ outcomes when you roll 4 dice. How many of them show exactly three 3s? $\frac{4!}{3! \cdot 1!} \cdot 5 = 20$, and 20/1296 = 0.015.
What is the probability of observing three 3s in a row when rolling a die 4 times (order matters)?
$2 \cdot (1/6)^3 \cdot (5/6) = 0.0077$ (the run is either 333x or x333, with x ≠ 3)
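Enumerating all $6^4$ ordered outcomes verifies both dice answers. The sketch reads "three 3s in a row" as exactly three 3s that are consecutive (333x or x333 with x ≠ 3), which is the reading behind $2 \cdot (1/6)^3 \cdot (5/6)$.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=4))  # 6**4 = 1296 ordered outcomes

# Exactly three 3s anywhere (order ignored): C(4, 3) * 5 = 20 outcomes.
three_threes = [r for r in rolls if r.count(3) == 3]

# Exactly three 3s that are consecutive: 333x or x333 with x != 3.
in_a_row = [r for r in three_threes if r[:3] == (3, 3, 3) or r[1:] == (3, 3, 3)]

print(len(rolls), len(three_threes), len(in_a_row))    # 1296 20 10
print(float(Fraction(len(three_threes), len(rolls))))  # about 0.0154
print(float(Fraction(len(in_a_row), len(rolls))))      # about 0.0077
```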

What is the probability of observing at least 5 heads when we flip a fair coin 6 times (order does not matter)?

Discrete Random Variable X
6 repeated coin tosses: S = {0, 1, 2, 3, 4, 5, 6}
Number of all possible outcomes: $2^6 = 64$
Number of outcomes that give at least 5 heads: $\frac{6!}{5! \cdot 1!} + \frac{6!}{6! \cdot 0!} = 6 + 1 = 7$
Probability: 7/64 = 0.1094
Equivalently, with the binomial PMF: $\frac{6!}{5! \cdot 1!} \cdot 0.5^5 \cdot 0.5 + \frac{6!}{6! \cdot 0!} \cdot 0.5^6 = 0.1094$
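Exact arithmetic gives the same result by summing the binomial PMF over k = 5 and k = 6.

```python
from fractions import Fraction
from math import comb

n = 6
# P(X >= 5) = P(X = 5) + P(X = 6) for X ~ Binom(6, 0.5); each outcome has prob 1/2**6.
p_at_least_5 = sum(Fraction(comb(n, k), 2 ** n) for k in (5, 6))

print(p_at_least_5, float(p_at_least_5))  # 7/64 0.109375
```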

Probability Distribution
Why do we care? A probability distribution lets us compute the probability of observations: how likely is an observation, given certain parameters?

Example
Source: https://west.uni-koblenz.de/en/studium/lehrveranstaltungen/ws1617/probabilistic-functionalprogramming

Why should we care?
Most of the time we do not know the parameters of the true distribution that generated our sample data.
1. But we can test hypotheses about the parameters. If we observe 5 heads in 6 coin tosses, how probable is such an outcome if the coin is fair?
2. And we can estimate the parameters from the observed sample data (inference!). If we observe 5 heads in 6 coin tosses, what was the parameter p of the coin?
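Question 2 (estimating p) can be answered with maximum likelihood estimation: pick the p that makes 5 heads in 6 tosses most likely. The grid search below is a deliberately simple stand-in for calculus. For the binomial, the maximum is known to lie at $\hat{p} = k/n = 5/6$. `binom_likelihood` is a hypothetical helper.

```python
from math import comb

def binom_likelihood(p, k=5, n=6):
    """Likelihood of observing k heads in n tosses as a function of p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Grid search over p in [0, 1]; the maximum should sit at k / n = 5/6.
grid = [i / 1000 for i in range(1001)]
p_hat = max(grid, key=binom_likelihood)
print(p_hat)  # 0.833
```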

HYPOTHESIS TESTING

Hypothesis Testing
Example: my hypothesis is that the coin is unfair.
We formulate a null hypothesis which, if it were true, would falsify our hypothesis. Then we try to reject the null hypothesis, accepting a certain probability of rejecting it wrongly.
Null hypothesis $H_0$? $H_0: X \sim \text{Binom}(n, p = 0.5)$
Alternative hypothesis $H_A$? $H_A: X \sim \text{Binom}(n, p \neq 0.5)$

Can we verify my hypothesis, and if so, how?
We can never verify a hypothesis; we can only reject it! If we can reject $H_0$, we have more evidence supporting the assumption that $H_A$ may be true, but we do not know it.
Which experiment could we conduct to test whether we should reject $H_0$? Repeated coin toss experiments: how many heads do we observe, and how many heads would we expect if $H_0$ were true?

Remember
The binomial distribution shows how likely it is that we observe X heads. Parameters: number of trials per experiment n = 6 and fairness of the coin p = 0.5.
Reference distribution:

k (heads)       0         1         2         3         4         5         6
C(6, k)         1         6         15        20        15        6         1
P(X = k)        0.015625  0.09375   0.234375  0.3125    0.234375  0.09375   0.015625

(Figure: bar chart of this binomial distribution.)
We expect the outcome of the experiment to look like this if p = 0.5 and n = 6.

Hypothesis Testing: 3 Steps
1. Compute a suitable test statistic $t_{obs}$ from the observed sample data, e.g. the number of heads observed in our experiments, $t_{obs} = 5$.
2. Compare it to a reference distribution, which describes how your data would look if the null hypothesis were true, e.g. Binom(6, 0.5) with an expected number of heads of $np = 3$.
3. Find out whether $t_{obs}$ lies in the critical region of the reference distribution.

Remember
The binomial distribution shows how likely it is that we observe X heads (number of trials n = 6, fairness of the coin p = 0.5).
(Figure: the binomial reference distribution Binom(6, 0.5), with the critical region of the reference distribution marked.)
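Below is a sketch of the three steps for the running example: 5 heads in 6 tosses, $H_0: p = 0.5$. Three choices are assumptions that do not appear on the slides. The significance level is $\alpha = 0.05$. The critical region is equal-tailed, with probability at most $\alpha/2$ in each tail. The two-sided p-value doubles the smaller tail probability. These are common conventions, not the only ones, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) = C(n, k) * p**k * (1 - p)**(n - k), exact when p is a Fraction
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def critical_region(n, p, alpha):
    """Equal-tailed critical region: tails whose probability is at most alpha/2 each."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    region, tail = [], 0
    for k in range(n + 1):          # lower tail: 0, 1, 2, ...
        tail += pmf[k]
        if tail > alpha / 2:
            break
        region.append(k)
    tail = 0
    for k in range(n, -1, -1):      # upper tail: n, n - 1, ...
        tail += pmf[k]
        if tail > alpha / 2:
            break
        region.append(k)
    return sorted(region)

def two_sided_p_value(k_obs, n, p):
    """Doubled smaller tail probability, capped at 1."""
    lower = sum(binom_pmf(k, n, p) for k in range(0, k_obs + 1))
    upper = sum(binom_pmf(k, n, p) for k in range(k_obs, n + 1))
    return min(1, 2 * min(lower, upper))

n, p0, alpha = 6, Fraction(1, 2), Fraction(5, 100)  # H0: p = 0.5; alpha is an assumption
t_obs = 5                                            # observed number of heads
print(critical_region(n, p0, alpha))                 # [0, 6]
print(two_sided_p_value(t_obs, n, p0))               # 7/32 (= 0.21875)
print(t_obs in critical_region(n, p0, alpha))        # False: H0 is not rejected
```

Here the critical region is {0, 6}. Because $t_{obs} = 5$ lies outside it (two-sided p-value 7/32 ≈ 0.22), $H_0$ cannot be rejected at this level. This does not show that the coin is fair: we can only reject a hypothesis, never verify it.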