Introduction to Statistics I
Keio University, Faculty of Economics
Continuous random variables
Simon Clinet (Keio University) Intro to Stats November 1, 2018

Definition (Continuous random variable)
A random variable is continuous if it can take any value in an interval [a, b]. (We can have a = −∞ and/or b = +∞.)
Example
The following random variables are continuous:
Let X be the random time (in hours) someone spends watching TV every day. The possible values for X are all the numbers in [0, 24].
Let X be the price of a stock in $ on a random day. The possible values for X are all the numbers in [0, +∞).

Probability distribution for a continuous random variable
Unlike discrete random variables, continuous ones can take any value in an interval [a, b]. Therefore, it is not informative to list P(X = c) for each possible c ∈ [a, b]: each of these probabilities is 0.
Definition (Probability distribution for a continuous random variable)
A probability distribution for a continuous random variable X lists the probabilities P(x_1 ≤ X ≤ x_2) for all possible values x_1, x_2 such that a ≤ x_1 < x_2 ≤ b.
Conditions on the probability distribution
We require that:
0 ≤ P(x_1 ≤ X ≤ x_2) ≤ 1.
P(a ≤ X ≤ b) = 1.

Probability distribution for a continuous random variable
Remark
For a continuous random variable X, the probabilities P(x_1 ≤ X ≤ x_2), P(x_1 < X ≤ x_2), P(x_1 < X < x_2), P(x_1 ≤ X < x_2) are all equal. In other words, the symbols < and ≤ can be used interchangeably.

Density function, density curve
Definition (Density curve, density function)
We associate to a random variable X a density function f, such that the probability P(x_1 ≤ X ≤ x_2) is equal to the area under the curve x ↦ f(x) between x = x_1 and x = x_2.

Density function, density curve
Conditions on the density function
We thus require that:
f(x) ≥ 0.
The area under x ↦ f(x) between x = a and x = b is 1.
Interpretation of the density
We can interpret the density function as follows.
X is likely to take a value where the curve is high.
If we repeat the experiment a large number of times, and collect the measured values X_1, X_2, ..., then the histogram of those values will look like the density curve.
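
As a rough numerical illustration of the two conditions, the sketch below uses a hypothetical density f(x) = 2x on [a, b] = [0, 1] (it is not taken from the lecture) and approximates areas under the curve with a midpoint Riemann sum. The names `density` and `area_under` are made up for this example.

```python
def density(x):
    """Hypothetical density on [0, 1]: f(x) = 2x, which is >= 0 on that interval."""
    return 2.0 * x


def area_under(f, x1, x2, steps=10_000):
    """Approximate the area under f between x1 and x2 with a midpoint Riemann sum."""
    width = (x2 - x1) / steps
    return sum(f(x1 + (i + 0.5) * width) for i in range(steps)) * width


total = area_under(density, 0.0, 1.0)  # area between a and b, should be close to 1
prob = area_under(density, 0.2, 0.5)   # P(0.2 <= X <= 0.5), close to 0.21
print(round(total, 6), round(prob, 6))
```

For this particular density the exact area between 0.2 and 0.5 is 0.5² − 0.2² = 0.21, so the numerical value can be checked by hand.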

Interpretation of the density
Example
Let X be the random variable corresponding to the queuing time in minutes in a fast food restaurant. Statistical research has shown that X is distributed according to the red curve below. We repeat the experiment 1,000 times and get the following histogram:
[Figure: histogram of the 1,000 observed queuing times, with the density curve in red.]
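
The experiment can be imitated by simulation. The sketch below assumes, purely for illustration, that the queuing time follows an exponential density with a mean of 5 minutes (the actual curve on the slide is not specified). It draws 1,000 values and compares the fraction that falls between 2 and 6 minutes with the area under the assumed density over the same range.

```python
import math
import random

MEAN_WAIT = 5.0  # assumed mean queuing time in minutes (hypothetical)


def exact_probability(x1, x2):
    """Area under the assumed exponential density between x1 and x2."""
    return math.exp(-x1 / MEAN_WAIT) - math.exp(-x2 / MEAN_WAIT)


rng = random.Random(2018)  # fixed seed so that the run is reproducible
waits = [rng.expovariate(1.0 / MEAN_WAIT) for _ in range(1000)]

# Fraction of simulated waits in [2, 6]: the share of the histogram over that range.
observed = sum(2.0 <= w <= 6.0 for w in waits) / len(waits)
expected = exact_probability(2.0, 6.0)
print(f"observed {observed:.3f} vs. area under the density {expected:.3f}")
```

With more repetitions the observed fraction tends to get closer to the area, which is the sense in which the histogram "looks like" the density curve.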

Expected value and variance
Definition (Expected value and variance)
If X is a continuous random variable, we associate to it an expected (or mean) value E[X] and a variance Var[X], which play the same role as for discrete random variables. They are also often denoted by µ and σ². Moreover, we call σ the standard deviation of X.
Remark
We do not give definitions of the mean and the variance because they require more advanced mathematics. However, they can be interpreted in exactly the same way as for discrete random variables.
When the density curve is symmetric, the mean E[X] is its center of symmetry.
The more dispersed the density curve, the higher the variance Var[X].

Example
We consider three random variables with three distinct density curves (blue, green, red). All three have µ = 0.
Blue: Var[X] = 1, Green: Var[X] = 3, Red: Var[X] = 10.
[Figure: the three density curves, in blue, green and red.]

Fundamental example: the standard normal distribution
Definition (Standard normal distribution)
A continuous random variable X which is described by the following symmetric bell-shaped density curve (see below) is a standard normal random variable. We also say that it follows a standard normal distribution, and we write X ~ N(0, 1). In particular, E[X] = 0 and Var[X] = 1, and X can take any value between −∞ and +∞.
[Figure: density curve of the standard normal distribution.]

Standard normal variable
Remarks
The standard normal distribution is the most important one in statistics because, after centering and rescaling, it approximates very well the distributions of many variables in practice. For example, the heights or weights of people, the total annual sales of a firm, and the exam scores of a given population are often approximately normally distributed.
There are other symmetric bell-shaped curves which are not the normal distribution. In fact, the standard normal distribution can be defined more precisely by specifying the density with an equation of the form f(x) = ..., but, again, this would require more advanced mathematics.

General normal distribution
Definition (Normal distribution)
For a number µ and a positive number σ², we say that a random variable X is normally distributed with parameters µ and σ² if (X − µ)/σ ~ N(0, 1). We write X ~ N(µ, σ²), and we have E[X] = µ and Var[X] = σ².
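
The definition can be checked empirically. The sketch below uses Python's standard `statistics.NormalDist` with hypothetical parameters µ = 10 and σ = 3 (chosen only for illustration): it draws values of X, standardizes each one as (x − µ)/σ, and prints the mean and standard deviation before and after.

```python
from statistics import NormalDist, fmean, pstdev

MU, SIGMA = 10.0, 3.0  # hypothetical parameters: X ~ N(10, 9)

# Draw values of X, then standardize each one: z = (x - mu) / sigma.
xs = NormalDist(MU, SIGMA).samples(10_000, seed=1)
zs = [(x - MU) / SIGMA for x in xs]

print(f"X: mean {fmean(xs):.2f}, standard deviation {pstdev(xs):.2f}")
print(f"Z: mean {fmean(zs):.2f}, standard deviation {pstdev(zs):.2f}")
```

Note that `NormalDist` takes the standard deviation σ as its second argument, not the variance σ². The standardized values should have a mean near 0 and a standard deviation near 1, as expected for N(0, 1).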

Distribution table
In practice, for a continuous random variable X, we use a distribution table to calculate the probabilities P(x_1 ≤ X ≤ x_2).
Example: distribution table for the standard normal distribution
A distribution table for N(0, 1) reports all the probabilities of the form P(X ≤ x_1) for x_1 ≥ 0. By symmetry of the normal curve, this is sufficient to deduce any probability related to X (see the exercise on the next slide).

Exercise
Calculate for X ~ N(0, 1):
P(X ≤ 1.02)
P(X ≥ 2.36) = 1 − P(X ≤ 2.36)
P(0 ≤ X ≤ 0.9) = P(X ≤ 0.9) − P(X < 0)
P(−1.36 ≤ X) = P(X ≤ 1.36) (by symmetry of the curve)
P(−1.36 ≤ X ≤ 0)
P(−1.36 ≤ X ≤ 0.03)
Answers: 0.8461, 0.0091, 0.3159, 0.9131, 0.4131, 0.4251.
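
The table lookups in this exercise can be mimicked in code. The sketch below is one possible way to do it, not the lecture's method: the hypothetical helper `table_value` plays the role of the printed table (only x ≥ 0, four decimals) and is computed from `math.erf`, while `cdf` extends it to negative x by symmetry.

```python
import math


def table_value(x):
    """P(X <= x) for X ~ N(0, 1) and x >= 0, rounded to 4 decimals like a printed table."""
    if x < 0:
        raise ValueError("the table only lists x >= 0")
    return round(0.5 * (1.0 + math.erf(x / math.sqrt(2.0))), 4)


def cdf(x):
    """P(X <= x) for any x, using symmetry for negative x: P(X <= -x) = 1 - P(X <= x)."""
    return table_value(x) if x >= 0 else 1.0 - table_value(-x)


answers = [
    cdf(1.02),               # P(X <= 1.02)
    1.0 - cdf(2.36),         # P(X >= 2.36)
    cdf(0.9) - cdf(0.0),     # P(0 <= X <= 0.9)
    1.0 - cdf(-1.36),        # P(-1.36 <= X)
    cdf(0.0) - cdf(-1.36),   # P(-1.36 <= X <= 0)
    cdf(0.03) - cdf(-1.36),  # P(-1.36 <= X <= 0.03)
]
print([round(a, 4) for a in answers])
```

Every probability is obtained from values of the form P(X ≤ x) with x ≥ 0, which is why a table restricted to non-negative x is sufficient.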

Normal distribution table - Example
When X ~ N(µ, σ²), we use the fact that the transformed variable Z = (X − µ)/σ ~ N(0, 1).
Example
Assume that X ~ N(1, 4). Let's calculate P(X ≤ 2). With Z defined as above, this is the same probability as P(Z ≤ (2 − 1)/√4) = P(Z ≤ 0.5). From the distribution table, we get that it is approximately equal to 0.69.
Exercise
We know that a certain stock's dividend yield has a mean of µ = 6% and a standard deviation of σ = 2%. Assume the dividends follow a normal distribution. Compute the probability that the dividend yield will be:
Less than 2%
Greater than 10%
Between 4% and 8%
Answers: 0.0228, 0.0228, 0.6826.
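
As a cross-check of the worked example and the dividend exercise, the sketch below uses Python's standard `statistics.NormalDist` instead of a printed table. The variable names are illustrative only.

```python
from statistics import NormalDist

# Worked example: X ~ N(1, 4), so sigma = 2 (NormalDist takes sigma, not sigma squared).
p_example = NormalDist().cdf((2 - 1) / 2)    # P(Z <= 0.5) after standardizing
p_direct = NormalDist(mu=1, sigma=2).cdf(2)  # P(X <= 2), the same probability

# Dividend exercise: yield in percent, with mu = 6 and sigma = 2.
dividend = NormalDist(mu=6, sigma=2)
p_below_2 = dividend.cdf(2)
p_above_10 = 1 - dividend.cdf(10)
p_between_4_and_8 = dividend.cdf(8) - dividend.cdf(4)

print(round(p_example, 4), round(p_direct, 4))
print(round(p_below_2, 4), round(p_above_10, 4), round(p_between_4_and_8, 4))
```

The last probability comes out as 0.6827 rather than 0.6826; the small gap is due to the rounding of the entries in a four-decimal table.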

Quantile of the standard normal distribution
Definition (Quantile of the standard normal distribution)
We call quantile of level a of the standard normal distribution the number z_a such that P(Z ≤ z_a) = a, where Z ~ N(0, 1).

Calculating quantiles in practice
In practice, we also use the distribution table to calculate quantiles.
Example
Let us calculate z_0.75. In the distribution table, we look for the value such that P(Z ≤ value) ≈ 0.75. The table gives P(Z ≤ 0.67) = 0.7486 and P(Z ≤ 0.68) = 0.7517, so z_0.75 lies between 0.67 and 0.68. The probability closest to 0.75 is 0.7486, and so z_0.75 ≈ 0.67.
Exercise
If we pick a student at random in a class, we assume that her grade X (a score between 0 and 100) approximately follows a N(µ, σ²) distribution with µ = 73 and σ² = 225.
1. What is the distribution of the variable Z = (X − 73)/15?
2. Find the score x_10% such that P(X ≤ x_10%) = 10%. What is the proportion of students who got less than this score?
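
Quantiles can also be computed without a table. The sketch below relies on `statistics.NormalDist.inv_cdf` from the Python standard library, which returns the number z with P(Z ≤ z) equal to a given level. For the grade exercise it uses X = 73 + 15 Z, which follows from the definition of Z in question 1.

```python
from statistics import NormalDist

Z = NormalDist()         # standard normal N(0, 1)
z_075 = Z.inv_cdf(0.75)  # number z such that P(Z <= z) = 0.75

# Grade exercise: X ~ N(73, 225), so sigma = 15 and X = 73 + 15 * Z.
z_010 = Z.inv_cdf(0.10)
x_10 = 73 + 15 * z_010   # score such that P(X <= x_10) = 10%

print(round(z_075, 3), round(z_010, 3), round(x_10, 1))
```

The value of z_0.75 is about 0.674, which falls between the table entries 0.67 and 0.68; a two-decimal table can only bracket it. The score x_10% comes out at roughly 53.8.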

Conclusion: summary
A continuous random variable can take any value in a given interval.
We associate to the variable a density function such that the probability of getting a number between two values x_1 and x_2 is the area under the density curve between those two points.
We also associate an expected value, a variance and a standard deviation, which play the same role as for discrete random variables.
A fundamental continuous random variable is the standard normal variable, whose density is bell-shaped and symmetric.
We can use a distribution table to calculate probabilities and quantiles.