Lecture Data Science

1 Lecture Data Science: Statistics Foundations
JProf. Dr. Claudia Wagner, Web Science & Technologies, University of Koblenz-Landau, Germany

2 Learning Goals
How do we describe sample data?
What are the mode, median and mean?
What are the variance and standard deviation?
What are skewness and kurtosis?
What types of data exist?
Which statistic can be computed on nominal data?
What is a probability?
Which distribution describes the success probability of a single experiment?
Which distribution describes the number of successes in repeated experiments?

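3 Statistics
Aim: learn something about a population by analyzing sample data.
Figure: population and sample; probability reasons from the population to the sample, inferential statistics reasons from the sample back to the population, and descriptive statistics summarizes the sample.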

4 Types of Data (Statistician Viewpoint)
Ratio (e.g., weight): there is an absolute zero.
Interval (e.g., temperature in Celsius): distances are meaningful.
Ordinal (e.g., status): observations can be ordered.
Nominal (e.g., ethnic group, sex, nationality): observations are only named.
Stevens, S. S. (1946). "On the Theory of Scales of Measurement". Science 103 (2684).

5 Why should we care?

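6 Frequencies
Absolute frequency $h_k$: number of observations with value $k$.
Relative frequency (proportion) $f_k = h_k / n$, where $n$ is the number of observations.
Cumulative frequency $c_t$: frequency of all values up to and including $t$; the order of the values needs to be meaningful.
Observations: Y Y N Y Y N N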

7 Example: the observations Y Y N Y Y N N give $h_Y = 4$, $h_N = 3$, $f_Y = 4/7 \approx 0.57$ and $f_N = 3/7 \approx 0.43$.
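
A minimal sketch of these frequencies using Python's standard `collections.Counter`. Because Y/N is nominal, the cumulative frequency is illustrated on the numeric sample X used on the following slides instead; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction

observations = ["Y", "Y", "N", "Y", "Y", "N", "N"]
n = len(observations)

# Absolute frequency h_k: how often each value k occurs
absolute = Counter(observations)

# Relative frequency f_k = h_k / n, kept as exact fractions
relative = {k: Fraction(h, n) for k, h in absolute.items()}

# Cumulative relative frequency c_t: share of observations <= t.
# Only meaningful for ordered data, so a numeric sample is used here.
X = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]
counts = Counter(X)
cumulative = {}
running = 0
for value in sorted(counts):
    running += counts[value]
    cumulative[value] = Fraction(running, len(X))

print(absolute)        # Counter({'Y': 4, 'N': 3})
print(relative)        # {'Y': Fraction(4, 7), 'N': Fraction(3, 7)}
print(cumulative[22])  # 1/2
```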

8 Mode
Applies to nominal data already, so it can be used for all types of data.
The mode is the value that appears most often in a set of data.
What is the mode of X = [17, 19, 20, 21, 22, 23, 23, 23, 23]? (Answer: 23, which appears four times.)

9 Median
X = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]: the median of X is 22.5, the average of the two middle values 22 and 23.
X = [17, 19, 20, 21, 22, 23, 23, 23, 23]: the median of X is 22, the middle value.
The median is robust to extreme values, so it is useful for skewed distributions, where the mean can be misleading.
Applies to ordinal, interval and ratio data.

10 Mean (expected value)
Applies to interval and ratio data: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Example: X = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25] gives $\bar{x} = 216/10 = 21.6$.
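
The three location measures can be checked with Python's standard `statistics` module; this sketch simply reuses the samples from the slides and also shows that the mode works on nominal data.

```python
import statistics

X9 = [17, 19, 20, 21, 22, 23, 23, 23, 23]        # sample from the mode slide
X10 = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]   # sample from the median/mean slides

print(statistics.mode(X9))     # 23 (occurs four times)
print(statistics.median(X9))   # 22 (middle of 9 sorted values)
print(statistics.median(X10))  # 22.5 (average of the two middle values 22 and 23)
print(statistics.mean(X10))    # 21.6 (216 / 10)

# The mode also applies to nominal data
print(statistics.mode(["Y", "Y", "N", "Y", "Y", "N", "N"]))  # Y
```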

11 Mode, Median, Mean
Figure: mode, median and mean of two log-normal distributions.
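
For a right-skewed distribution such as the log-normal, the three measures separate in the order mode < median < mean. The sketch below uses the closed-form expressions for a log-normal whose logarithm is normal with mean `mu` and standard deviation `sigma`; the two parameter choices are hypothetical, not those of the original figure.

```python
import math

def lognormal_summary(mu, sigma):
    """Mode, median and mean of a log-normal distribution with log-mean mu and log-sd sigma."""
    mode = math.exp(mu - sigma ** 2)
    median = math.exp(mu)
    mean = math.exp(mu + sigma ** 2 / 2)
    return mode, median, mean

# Two hypothetical distributions: mild and strong skew
for sigma in (0.25, 1.0):
    mode, median, mean = lognormal_summary(0.0, sigma)
    print(f"sigma={sigma}: mode={mode:.3f}, median={median:.3f}, mean={mean:.3f}")
```

The larger `sigma`, the stronger the skew and the further the mean is pulled away from the mode.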

12 Sample of Dogs
Range: $R = x_{max} - x_{min}$, here $R = 600 - 170 = 430$ (heights in mm).
The average height of a dog (measured at the shoulders) is 394 mm. src:

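13 Dispersion
Variance: $s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, the average squared deviation of the observations from their mean $\bar{x}$ (the sample estimate of a population variance divides by $n - 1$ instead of $n$). src: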
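
A short sketch of the variance with both divisors, checked against the standard library's `statistics.pvariance` (divisor n) and `statistics.variance` (divisor n - 1). The dog heights are not listed on the slides, so the sample X from the median slide is used instead.

```python
import statistics

X = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]
n = len(X)
mean = sum(X) / n

# Average squared deviation from the mean (divide by n)
var_n = sum((x - mean) ** 2 for x in X) / n
# Sample estimate of the population variance (divide by n - 1)
var_n1 = sum((x - mean) ** 2 for x in X) / (n - 1)

# The standard library offers both variants
assert abs(var_n - statistics.pvariance(X)) < 1e-9
assert abs(var_n1 - statistics.variance(X)) < 1e-9
print(round(var_n, 2), round(var_n1, 2))  # 5.04 5.6
```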

14 Standard Deviation
The standard deviation is the square root of the variance.
We can now show which observations lie within one standard deviation of the mean. This helps us assess what is normal and what is extra large or extra small. src:
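
A minimal sketch of the "within one standard deviation" idea on the sample X, using `statistics.pstdev` (the square root of the variance with divisor n); which divisor to use is an assumption, since the slide does not say.

```python
import statistics

X = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]
mean = statistics.mean(X)
std = statistics.pstdev(X)  # square root of the variance with divisor n

# Observations within one standard deviation of the mean
lower, upper = mean - std, mean + std
within = [x for x in X if lower <= x <= upper]

print(f"mean={mean:.2f}, std={std:.2f}, interval=[{lower:.2f}, {upper:.2f}]")
print(within)  # values outside the interval count as unusually small or large
```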

15 Standard Deviation
The standard deviation does not directly measure how far typical values are from the mean, because squaring gives large deviations more weight. How could we compute that? src:
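
One common answer to the slide's question is the mean absolute deviation, the average distance of an observation from the mean. The slide does not say which measure it had in mind, so treat this as one possible sketch, applied to the sample X.

```python
X = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]
mean = sum(X) / len(X)

# Mean absolute deviation: average distance from the mean
mean_abs_dev = sum(abs(x - mean) for x in X) / len(X)

# Standard deviation for comparison: squaring weights large deviations more
std = (sum((x - mean) ** 2 for x in X) / len(X)) ** 0.5

print(f"mean absolute deviation={mean_abs_dev:.2f}, std={std:.2f}")  # 1.88, 2.24
```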

16 Percentiles
The $n$th percentile is a value such that $n\%$ of all observations fall at or below it.
Quartiles: $Q_1$ is the value for which 25% of all observations fall at or below it; $Q_2$ (the median) and $Q_3$ correspond to 50% and 75%. Image:
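
The slide's definition corresponds to the nearest-rank method: take the smallest observation such that at least n% of the data are at or below it. The helper name `percentile_nearest_rank` is hypothetical; this is a sketch of that one definition.

```python
import math

def percentile_nearest_rank(data, p):
    """Smallest observation such that at least p% of the data are at or below it."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(data)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

X = [17, 19, 20, 21, 22, 23, 23, 23, 23, 25]
q1 = percentile_nearest_rank(X, 25)
q3 = percentile_nearest_rank(X, 75)
print(q1, q3)  # 20 23
```

Other definitions, such as the interpolating ones in `statistics.quantiles` or NumPy's `percentile`, can give slightly different values; with this one the 50th percentile of X is 22 rather than the median 22.5.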

17 Boxplots
$IQR = Q_3 - Q_1$
Observations more than 1.5 IQR above the third quartile or below the first quartile are commonly flagged as outliers; observations 3 IQR or more beyond the quartiles are considered extreme outliers. Image:
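
A sketch of boxplot-style outlier fences $Q_1 - k \cdot IQR$ and $Q_3 + k \cdot IQR$, comparing k = 1.5 and k = 3. The quartiles use the nearest-rank definition, and the sample with the value 30 is hypothetical.

```python
import math

def nearest_rank(data, p):
    ordered = sorted(data)
    return ordered[math.ceil(p / 100 * len(ordered)) - 1]

def outlier_fences(data, k=1.5):
    """Return the fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, q3 = nearest_rank(data, 25), nearest_rank(data, 75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical sample with one suspiciously large value
data = [17, 19, 20, 21, 22, 23, 23, 23, 23, 30]
for k in (1.5, 3):
    low, high = outlier_fences(data, k)
    flagged = [x for x in data if x < low or x > high]
    print(f"k={k}: fences=({low}, {high}), outliers={flagged}")
```

With k = 1.5 the value 30 is flagged; with the stricter k = 3 it is not.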

18 Skewness
Skewness quantifies how asymmetric a distribution is. A symmetric distribution has a skewness of zero. Negative skewness indicates that the data are skewed left (long left tail); positive skewness indicates that the data are skewed right (long right tail).
Figure panels: skewness < 0 (left skew), skewness = 0, skewness > 0 (right skew).
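
The slide gives no formula, so this sketch assumes the moment coefficient of skewness $m_3 / m_2^{3/2}$ (third central moment over the variance to the power 1.5, no small-sample correction); SciPy's `scipy.stats.skew` with its default `bias=True` computes the same quantity. The samples are hypothetical.

```python
def skewness(data):
    """Moment coefficient of skewness m3 / m2**1.5 with population moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))               # 0.0: symmetric
print(skewness([1, 2, 2, 3, 3, 3, 10]) > 0)    # True: long right tail
print(skewness([-10, 1, 2, 2, 3, 3, 3]) < 0)   # True: long left tail
```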

19 Kurtosis
Kurtosis quantifies how peaked and heavy-tailed a distribution is compared to a normal distribution. It is usually reported as excess kurtosis, which subtracts 3 so that a normal distribution has a kurtosis of 0. A flatter distribution has a negative kurtosis; a distribution more peaked than a normal distribution has a positive kurtosis.
Figure panels: kurtosis < 0, kurtosis = 0, kurtosis > 0.

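20 Kurtosis
Based on the fourth central moment, with $\bar{x}$ the mean: $\text{Kurt} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^2} - 3$
Numerator: the fourth power weights up strong deviations from the mean (how long are the tails?).
Denominator: a large variance decreases the kurtosis.
Large kurtosis: low variance and long tails.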
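
A direct sketch of the excess-kurtosis formula (fourth central moment over the squared variance, minus 3). SciPy's `scipy.stats.kurtosis` also returns excess kurtosis by default (`fisher=True`). Both samples are hypothetical.

```python
def excess_kurtosis(data):
    """m4 / m2**2 - 3 with population moments; a normal distribution scores about 0."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

# Flat sample without tails: negative excess kurtosis (about -1.3)
print(excess_kurtosis([1, 2, 3, 4, 5]))
# Most mass at the centre plus two far-out values: positive excess kurtosis (2.0)
print(excess_kurtosis([-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]))
```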

21 Normal Distribution
More peaked than the normal distribution: positive kurtosis.

22 Normal Distribution
Flatter than the normal distribution: negative kurtosis.

23 Normal Distribution
Right-skewed: positive skewness.

24 Normal Distribution
Left-skewed: negative skewness.

25 WHAT IS A PROBABILITY?

26 Random Variables
Discrete random variable: can take on only a discrete set of values; you can count the values it can take on.
Continuous random variable: can take on any value in an interval.

27 Discrete Random Variable X
$X = (S, P)$, where $S$ is a finite set of values and $P: S \to [0, 1]$ with $\sum_{s \in S} P(s) = 1$ is the probability mass function (PMF).
Example: the sum of two dice, $S = \{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}$. What does this mean?
Figure: bar chart of the PMF of the sum of two dice (y-axis from 0 to 0.18).
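
The PMF of the sum of two fair dice can be obtained by enumerating all 36 equally likely outcomes; this sketch uses `itertools.product` and exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))
counts = Counter(a + b for a, b in outcomes)

# Probability mass function P: S -> [0, 1] on S = {2, ..., 12}
pmf = {s: Fraction(c, len(outcomes)) for s, c in sorted(counts.items())}

assert sum(pmf.values()) == 1   # the probabilities sum to one
print(pmf[7])                   # 1/6, the most likely sum
print(pmf[2], pmf[12])          # 1/36 1/36
```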

28 Expected Value / Expectation
Which value of the random variable do we expect? $E(X) = \sum_{s \in S} s \, P(s)$
E.g., for a fair die $E(X) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5$.
The expected value is the long-run average.
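
A sketch contrasting the exact expectation of a fair die with the long-run average of simulated rolls; the seed and the number of rolls are arbitrary choices.

```python
import random
from fractions import Fraction

# Fair die: each face s in {1, ..., 6} has probability 1/6
die_pmf = {s: Fraction(1, 6) for s in range(1, 7)}

# E(X) = sum over s of s * P(s)
expected = sum(s * p for s, p in die_pmf.items())
print(expected, float(expected))  # 7/2 3.5

# Long-run average: the mean of many simulated rolls approaches E(X)
random.seed(0)
rolls = [random.randint(1, 6) for _ in range(100_000)]
print(sum(rolls) / len(rolls))  # close to 3.5
```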

29 Continuous Random Variable X
Probability density function (PDF) $f(x)$, e.g. for height $x$ (in cm).
The probability that someone is exactly 180 cm tall is zero.
Better: ask for an interval, e.g. $P(|x - 180| < 0.1)$ or $P(179 < x < 181) = \int_{179}^{181} f(x)\,dx$, the area under the curve.
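
A sketch of interval probabilities as areas under a density, using the standard library's `statistics.NormalDist`. Modelling height as normal is an assumption: the mean of 168 cm is the value used on the expectation slide, and the standard deviation of 10 cm is purely hypothetical.

```python
from statistics import NormalDist

# Hypothetical height model: mean 168 cm, assumed standard deviation 10 cm
height = NormalDist(mu=168, sigma=10)

# A single point has zero width, hence zero area under the density
p_exact = height.cdf(180) - height.cdf(180)

# Interval probabilities are areas under the PDF: P(a < X < b) = F(b) - F(a)
p_narrow = height.cdf(180.1) - height.cdf(179.9)   # P(|X - 180| < 0.1)
p_wide = height.cdf(181) - height.cdf(179)         # P(179 < X < 181)
print(p_exact, round(p_narrow, 4), round(p_wide, 4))
```

Note that `height.pdf(180)` returns a density value, not a probability.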

30 Expected Value / Expectation
Which value of the random variable do we expect? For a continuous random variable, $E(X) = \int x f(x)\,dx$.
E.g., height, length of a snowboard, weight: $E(X) = 168$ cm.

31 Example
An urn contains an equal number of black and red balls, so $p = 1 - p = 0.5$. You win if you pick a red ball.
Bernoulli distribution, probability mass function: $P(X = k) = p^k (1 - p)^{1 - k}$ for $k \in \{0, 1\}$.

32 Bernoulli
Single experiment: draw one ball (and put it back, so that the draws are independent).
Repeat the experiment 5 times and observe BBBBR. What is the probability of observing this sequence?
$0.5 \cdot 0.5 \cdot 0.5 \cdot 0.5 \cdot 0.5 = 0.03125$
Let's assume 30% of the balls are red and 70% are black:
$0.7 \cdot 0.7 \cdot 0.7 \cdot 0.7 \cdot 0.3 = 0.07203$

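33 Likelihood for a Bernoulli
For $n$ independent draws $x_1, \dots, x_n \in \{0, 1\}$ with $k$ successes: $L(p) = \prod_{i=1}^{n} p^{x_i}(1 - p)^{1 - x_i} = p^{k}(1 - p)^{n - k}$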
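
A sketch of the Bernoulli likelihood as a product over independent draws, reproducing the two numbers from the BBBBR example (red ball = success = 1). The function name is illustrative.

```python
def bernoulli_likelihood(p, outcomes):
    """L(p) = product of p**x * (1 - p)**(1 - x) over observations x in {0, 1}."""
    likelihood = 1.0
    for x in outcomes:
        likelihood *= p ** x * (1 - p) ** (1 - x)
    return likelihood

# BBBBR encoded with success (red ball) = 1
sequence = [0, 0, 0, 0, 1]
print(bernoulli_likelihood(0.5, sequence))  # 0.03125
print(bernoulli_likelihood(0.3, sequence))  # about 0.07203
```

The higher value for p = 0.3 says that BBBBR is more plausible with 30% red balls than with 50%.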

34 Binomial Distribution
What if drawing 5 balls becomes one experiment? If we repeat the experiment, what is the probability of observing $k$ successes out of $n$? A success is, e.g., picking a red ball.
PMF of a binomial distribution: $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$, where $\binom{n}{k} = \frac{n!}{k!(n - k)!}$ is the number of ways to choose $k$ elements from a set of $n$ elements, disregarding their order.
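
The binomial PMF translates directly into code with `math.comb` (Python 3.8+). As an illustration, this sketch computes the probability of exactly one red ball among 5 draws when 30% of the balls are red, in any order.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binom(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Exactly one red ball in 5 draws with p = 0.3: 5 orderings of the BBBBR case
print(binomial_pmf(1, 5, 0.3))  # about 0.36015 = 5 * 0.07203
```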

35 Changing the Parameters
$n$ and $p$ are the parameters of the binomial distribution.

36 Discrete Random Variable X
One experiment: toss 4 coins. Each coin shows either head (H) or tail (T).
Number of all possible outcomes? $2^4 = 16$

37 Discrete Random Variable X
4 coin tosses; number of heads: $S = \{0, 1, 2, 3, 4\}$
Number of all possible outcomes? $2^4 = 16$
Number of outcomes that give 3 heads: $\binom{4}{3} = \frac{4!}{3! \cdot 1!} = 4$
Probability of observing 3 heads: $4/16 = 0.25$
$\binom{n}{k}$ is the number of ways to choose $k$ elements from a set of $n$ elements, disregarding their order.
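
Enumerating all head/tail sequences confirms the count of 4 outcomes with exactly 3 heads; a small sketch with `itertools.product`.

```python
from itertools import product

# All 2**4 = 16 sequences of heads (H) and tails (T)
sequences = ["".join(s) for s in product("HT", repeat=4)]
three_heads = [s for s in sequences if s.count("H") == 3]

print(len(sequences))                     # 16
print(three_heads)                        # ['HHHT', 'HHTH', 'HTHH', 'THHH']
print(len(three_heads) / len(sequences))  # 0.25
```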

38 Discrete Random Variable X
One experiment: toss 4 coins. Each coin shows either head (H) or tail (T).
Number of all possible outcomes? $2^4 = 16$
Number of outcomes that give 3 heads: 4

39 Exploit the fact that you know the PMF of the binomial distribution:
Probability of observing 3 heads when we toss a fair coin 4 times (no order): $\binom{4}{3} \, 0.5^3 \, 0.5^1 = \frac{4!}{3! \cdot 1!} \cdot 0.0625 = 0.25$

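40 What is the probability of observing exactly three 3s when rolling 4 dice (no order)?
$\binom{4}{3} \left(\frac{1}{6}\right)^3 \left(\frac{5}{6}\right)^1 = \frac{20}{1296} \approx 0.0154$
There are $6^4 = 1296$ combinations if you roll 4 dice. How many of them show three 3s? $\frac{4!}{3! \cdot 1!} \cdot 5 = 20$, so the probability is $20/1296 \approx 0.0154$.
What is the probability of observing three 3s in a row when rolling a die 4 times (order)?
$2 \left(\frac{1}{6}\right)^3 \left(\frac{5}{6}\right) = \frac{10}{1296} \approx 0.0077$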
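
Both dice answers can be verified by brute force over all $6^4$ outcomes; "three 3s in a row" is read here as exactly three 3s at positions 1 to 3 or 2 to 4, which matches the factor 2 above.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=4))  # 6**4 = 1296 outcomes

# Exactly three of the four dice show a 3 (any positions)
exactly_three = [r for r in rolls if r.count(3) == 3]

# Exactly three 3s next to each other (positions 1-3 or 2-4)
in_a_row = [r for r in exactly_three if r[:3] == (3, 3, 3) or r[1:] == (3, 3, 3)]

print(len(rolls), len(exactly_three), len(in_a_row))  # 1296 20 10
print(Fraction(len(exactly_three), len(rolls)))       # 5/324
print(Fraction(len(in_a_row), len(rolls)))            # 5/648
```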

41 What is the probability of observing at least 5 heads when we flip a fair coin 6 times (no order)?

42 Discrete Random Variable X
Coin tossed 6 times; number of heads: $S = \{0, 1, 2, 3, 4, 5, 6\}$
Number of all possible outcomes? $2^6 = 64$
Number of outcomes that give at least 5 heads: $\frac{6!}{5! \cdot 1!} + \frac{6!}{6! \cdot 0!} = 6 + 1 = 7$
Probability: $7/64 = \binom{6}{5} 0.5^5 \cdot 0.5^1 + \binom{6}{6} 0.5^6 \approx 0.109$
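
The "at least" probability is a sum of binomial PMF terms; this sketch uses exact fractions so the result matches 7/64 exactly.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# P(X >= 5) for 6 flips of a fair coin: sum over k = 5 and k = 6
p_at_least_5 = sum(binomial_pmf(k, 6, Fraction(1, 2)) for k in range(5, 7))
print(p_at_least_5, float(p_at_least_5))  # 7/64 0.109375
```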

43 Probability Distribution
Why do we care? To compute the probability of observations: how likely is an observation given certain parameters?

44 Example Source:

45 Why should we care?
Most of the time we do not know the parameter of the true distribution that generated our sample data.
1. But we can test hypotheses about the parameter. If we observe 5 heads in 6 coin tosses, is this consistent with a fair coin?
2. And we can estimate the parameter from the observed sample data (inference!). If we observe 5 heads in 6 coin tosses, what was the parameter $p$ of the coin?
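
One way to estimate $p$ is maximum likelihood: choose the $p$ under which the observed data are most likely. For binomial data the maximizer is the sample proportion $k/n$; this sketch finds it numerically with a simple grid search, an illustrative choice rather than the method from the lecture.

```python
def likelihood(p, heads, n):
    """Likelihood of `heads` heads in n tosses; the binomial coefficient is
    omitted because it does not depend on p."""
    return p ** heads * (1 - p) ** (n - heads)

# Grid search over candidate values of p for 5 heads in 6 tosses
grid = [i / 1000 for i in range(1001)]
p_hat = max(grid, key=lambda p: likelihood(p, 5, 6))
print(p_hat)  # 0.833, close to 5/6
```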

46 HYPOTHESIS TESTING

47 Hypothesis Testing
Example: my hypothesis is that the coin is unfair. We formulate a null hypothesis which, if it were true, would falsify our hypothesis. Then we try to reject the null hypothesis at a chosen significance level, i.e. accepting a small probability of rejecting it wrongly.
Null hypothesis $H_0$? $H_0: X \sim \text{Binom}(n, p = 0.5)$
Alternative hypothesis $H_A$? $H_A: X \sim \text{Binom}(n, p \neq 0.5)$

48 Can we verify my hypothesis, and if so, how?
We can never verify a hypothesis; we can only reject it. If we can reject $H_0$, we have evidence that supports $H_A$, but we do not know that $H_A$ is true.
Which experiment could we conduct to test whether we should reject $H_0$? Repeated coin toss experiments: how many heads do we observe, and how many heads would we expect if $H_0$ were true?

49 Remember
The binomial distribution shows how likely it is that we observe $X$ heads. Parameters: number of trials per experiment $n = 6$ and fairness of the coin $p = 0.5$.
Figure: reference distribution Binom(n = 6, p = 0.5), with its highest bar at $P(X = 3) = 0.3125$.
We expect the outcome of the experiment to look like this if $p = 0.5$ and $n = 6$.
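
The reference distribution in the figure can be tabulated directly from the binomial PMF with $n = 6$ and $p = 0.5$; a short sketch.

```python
from math import comb

n, p = 6, 0.5
# Reference distribution under H0: X ~ Binom(n=6, p=0.5)
reference = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
for k, prob in enumerate(reference):
    print(f"P(X = {k}) = {prob:.4f}")
```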

50 Hypothesis Testing: 3 Steps
1. Compute a suitable test statistic $t_{obs}$ from the observed sample data, e.g. the average number of heads observed in our repeated experiments is 5.
2. Compare it to a reference distribution, which describes what the data look like if the null hypothesis is true, e.g. the expected number of heads is 3.
3. Find out whether $t_{obs}$ lies in the critical region of the reference distribution.

51 Remember
The binomial distribution shows how likely it is that we observe $X$ heads. Parameters: number of trials $n = 6$ and fairness of the coin $p = 0.5$.
Figure: reference distribution Binom(n = 6, p = 0.5) with its critical region highlighted.
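
A sketch of the three steps for 5 heads in 6 tosses. The significance level $\alpha = 0.05$ is an assumption (the slides do not state one); the two-sided critical region collects the outcomes in each tail whose cumulative probability stays at or below $\alpha/2$, and the p-value doubles one tail, which is valid here because Binom(6, 0.5) is symmetric.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p0 = 6, Fraction(1, 2)
alpha = Fraction(5, 100)  # assumed significance level
pmf = [binomial_pmf(k, n, p0) for k in range(n + 1)]

# Two-sided critical region: each tail may hold at most alpha / 2
lower = [k for k in range(n + 1) if sum(pmf[: k + 1]) <= alpha / 2]
upper = [k for k in range(n + 1) if sum(pmf[k:]) <= alpha / 2]
critical_region = lower + upper

t_obs = 5  # observed number of heads
# Two-sided p-value by doubling the smaller tail (capped at 1)
p_value = min(1, 2 * min(sum(pmf[t_obs:]), sum(pmf[: t_obs + 1])))

print(critical_region)           # [0, 6]
print(p_value, float(p_value))   # 7/32 0.21875
print("reject H0" if t_obs in critical_region else "cannot reject H0")
```

With only 6 tosses, 5 heads is not extreme enough to reject the fair-coin hypothesis at this level.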
