Section 0: Introduction and Review of Basic Concepts Carlos M. Carvalho The University of Texas McCombs School of Business mccombs.utexas.edu/faculty/carlos.carvalho/teaching 1
Getting Started Syllabus www.mccombs.utexas.edu/faculty/carlos.carvalho/teaching/ General Expectations 1. Read the notes / Practice 2. Be on schedule 2
Course Overview Section 0: Basic Concepts: Probability and Estimation Section 1: Simple Regression Model Section 2: Multiple Regression Section 3: Dummy Variables and Interactions Section 4: Regression Diagnostics and Transformations Section 5: Time Series Section 6: Model Selection, Logistic Regression and more... 3
Review of Basic Concepts Probability and statistics let us talk efficiently about things we are unsure about. If I only ask 1,000 voters out of 10 million, how sure can I be about how they all will vote? What is the true proportion of yes voters? If I am trying to predict sales next quarter, how sure am I? If I am trying to choose my portfolio, how sure am I about returns on the assets next period? If I want to do target marketing, which customers are more likely to respond to a promotion? All of these involve inferring or predicting unknown quantities!! 4
Random Variables Random variables are numbers that we are NOT sure about, but whose potential outcomes we may have some idea of how to describe. We usually use a capital letter to denote a random variable. Example: Suppose we are about to toss two coins. Let X denote the number of heads. We say that X is the random variable that stands for the number we are not sure about. Note that we always assign numbers to random variables! 5
Probability Distribution We describe the behavior of random variables with a Probability Distribution. Example: If X is the random variable denoting the number of heads in two independent coin tosses, we can describe its behavior through the following probability distribution:
X = 0 with prob. 0.25
X = 1 with prob. 0.50
X = 2 with prob. 0.25
X is called a Discrete Random Variable as we are able to list all the possible outcomes. Question: What is Pr(X = 0)? How about Pr(X >= 1)? 6
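As a quick check (not part of the original slides), the two-coin distribution can be tabulated in a few lines of Python:

```python
# Distribution of X = number of heads in two independent fair coin tosses.
# The four equally likely outcomes are HH, HT, TH, TT.
dist = {0: 0.25, 1: 0.50, 2: 0.25}

print(dist[0])                 # Pr(X = 0) = 0.25
print(dist[1] + dist[2])       # Pr(X >= 1) = 0.75
print(sum(dist.values()))      # total probability = 1.0
```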
Probability Distribution Probability is always a number between 0 and 1. The total probability across all possible values of a random variable equals 1. 7
The Bernoulli Distribution Suppose the random variable X can only take the values 0 or 1. This is a dummy variable representing success or failure of an experiment... e.g.: X = 1 if I win a hand in blackjack... X = 0 if I lose; X = 1 if David Ortiz gets a hit in an at bat... X = 0 otherwise; X = 1 if the S&P500 return is positive... X = 0 if negative. The Bernoulli is a distribution defined by the probability parameter p. We denote X ~ Bernoulli(p):
X = 1 with prob. p
X = 0 with prob. 1 - p
8
Mean and Variance of a Random Variable Suppose someone asks you for a prediction of X. What would you say? Suppose someone asks you how sure you are. What would you say? 9
Mean and Variance of a Random Variable The Mean or Expected Value is defined as (for a discrete X): E(X) = sum_{i=1}^{n} Pr(x_i) x_i. We weight each possible value by how likely it is... this provides us with a measure of centrality of the distribution... a good prediction for X! 10
Mean and Variance of a Random Variable Suppose X ~ Bernoulli(p): E(X) = sum_{i=1}^{n} Pr(x_i) x_i = 0 (1 - p) + 1 p, so E(X) = p. 11
Mean and Variance of a Random Variable The Variance is defined as (for a discrete X): Var(X) = sum_{i=1}^{n} Pr(x_i) [x_i - E(X)]². Weighted average of squared prediction errors... This is a measure of spread of a distribution. More risky distributions have larger variance. 12
Mean and Variance of a Random Variable Suppose X ~ Bernoulli(p): Var(X) = sum_{i=1}^{n} Pr(x_i) [x_i - E(X)]² = (0 - p)² (1 - p) + (1 - p)² p = p(1 - p) [(1 - p) + p], so Var(X) = p(1 - p). Question: For which value of p is the variance the largest? 13
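These two formulas are easy to verify numerically. The sketch below (Python; the helper names bern_mean and bern_var are our own) evaluates the Bernoulli mean and variance directly from the definitions:

```python
def bern_mean(p):
    # E(X) = 0*(1 - p) + 1*p
    return 0 * (1 - p) + 1 * p

def bern_var(p):
    # Var(X) = (0 - p)^2 * (1 - p) + (1 - p)^2 * p = p*(1 - p)
    return (0 - p) ** 2 * (1 - p) + (1 - p) ** 2 * p

print(bern_mean(0.3), round(bern_var(0.3), 2))  # 0.3 0.21
# The variance p*(1 - p) is largest at p = 0.5:
print(max([p / 100 for p in range(101)], key=bern_var))  # 0.5
```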
The Standard Deviation What are the units of E(X)? What are the units of Var(X)? A more intuitive way to understand the spread of a distribution is to look at the standard deviation: sd(X) = sqrt(Var(X)). What are the units of sd(X)? 14
Continuous Random Variables Suppose we are trying to predict tomorrow's return on the S&P500... Question: What is the random variable of interest? Question: How can we describe our uncertainty about tomorrow's outcome? Listing all possible values seems like a crazy task... we'll work with intervals instead. These are called continuous random variables. The probability of an interval is defined by the area under the probability density function. 15
The Normal Distribution A random variable is a number we are NOT sure about but we might have some idea of how to describe its potential outcomes. The Normal distribution is the most used probability distribution to describe a random variable. The probability the number ends up in an interval is given by the area under the curve (pdf). [Figure: standard normal pdf] 16
The Normal Distribution The standard Normal distribution has mean 0 and variance 1. Notation: Z ~ N(0, 1) (Z is the random variable). Pr(-1 < Z < 1) = 0.68 and Pr(-1.96 < Z < 1.96) = 0.95. [Figures: standard normal pdf with the central 68% and 95% regions shaded] 17
The Normal Distribution Note: For simplicity we will often use Pr(-2 < Z < 2) ≈ 0.95. Questions: What is Pr(Z < 2)? How about Pr(Z > 2)? What is Pr(Z < 0)? 18
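These standard normal probabilities can be checked without tables. A minimal sketch in Python, using only the standard library's error function (the helper name phi is our own, not something from the slides):

```python
import math

def phi(z):
    # Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(phi(2) - phi(-2), 3))   # Pr(-2 < Z < 2) ≈ 0.954
print(round(phi(2), 3))             # Pr(Z < 2) ≈ 0.977
print(round(1 - phi(2), 3))         # Pr(Z > 2) ≈ 0.023
print(phi(0))                       # Pr(Z < 0) = 0.5
```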
The Normal Distribution The standard normal is not that useful by itself. When we say the normal distribution, we really mean a family of distributions. We obtain pdfs in the normal family by shifting the bell curve around and spreading it out (or tightening it up). 19
The Normal Distribution We write X ~ N(µ, σ²): the Normal distribution with mean µ and variance σ². The parameter µ determines where the curve is; the center of the curve is µ. The parameter σ determines how spread out the curve is. The area under the curve in the interval (µ - 2σ, µ + 2σ) is 95%: Pr(µ - 2σ < X < µ + 2σ) ≈ 0.95. [Figure: normal pdf with ticks at µ - 2σ, µ - σ, µ, µ + σ, µ + 2σ] 20
The Normal Distribution Example: Below are the pdfs of X_1 ~ N(0, 1), X_2 ~ N(3, 1), and X_3 ~ N(0, 16). Which pdf goes with which X? [Figure: three normal pdfs plotted over the range -8 to 8] 21
The Normal Distribution Example Assume the annual returns on the SP500 are normally distributed with mean 6% and standard deviation 15%, i.e., SP500 ~ N(6, 225). (Notice: 15² = 225.) Two questions: (i) What is the chance of losing money in a given year? (ii) What is the value such that there's only a 2% chance of losing that much or more? Lloyd Blankfein: "I spend 98% of my time thinking about 2% probability events!" (i) Pr(SP500 < 0) and (ii) Pr(SP500 < ?) = 0.02 22
The Normal Distribution Example [Figures: pdf of SP500 with the area below 0 shaded; pdf with the lower 2% tail shaded] (i) Pr(SP500 < 0) = 0.35 and (ii) Pr(SP500 < -25) = 0.02. In Excel: NORMDIST and NORMINV (homework!) 23
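The slide points to Excel's NORMDIST and NORMINV. As a sketch, the same two answers can be reproduced in plain Python; norm_cdf and norm_inv are our own helper names, with the inverse CDF done by bisection:

```python
import math

def norm_cdf(x, mu, sigma):
    # Pr(X < x) for X ~ N(mu, sigma^2), via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def norm_inv(p, mu, sigma):
    # Invert the CDF by bisection (what Excel's NORMINV does for us)
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    for _ in range(100):
        mid = (lo + hi) / 2
        if norm_cdf(mid, mu, sigma) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Annual SP500 returns ~ N(6, 225), i.e. mean 6%, sd 15%
p_loss = norm_cdf(0, 6, 15)      # (i) chance of losing money, ≈ 0.345
cutoff = norm_inv(0.02, 6, 15)   # (ii) 2% worst-case return, ≈ -25
print(p_loss, cutoff)
```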
The Normal Distribution 1. Note: In X ~ N(µ, σ²), µ is the mean and σ² is the variance. 2. Standardization: if X ~ N(µ, σ²) then Z = (X - µ)/σ ~ N(0, 1). 3. Summary: X ~ N(µ, σ²): µ: where the curve is; σ: how spread out the curve is; 95% chance X is in µ ± 2σ. 24
The Normal Distribution Another Example Prior to the 1987 crash, monthly S&P500 returns (r) followed (approximately) a normal with mean 0.012 and standard deviation equal to 0.043. How extreme was the crash of -0.2176? The standardization helps us interpret these numbers... r ~ N(0.012, 0.043²). For the crash, z = (r - 0.012)/0.043 ~ N(0, 1): z = (-0.2176 - 0.012)/0.043 = -5.34. How extreme is this z-value? More than 5 standard deviations away!! 25
Mean and Variance of a Random Variable Suppose X ~ N(µ, σ²). Suppose someone asks you for a prediction of X. What would you say? Suppose someone asks you how sure you are. What would you say? [Figure: normal pdf with ticks at µ - 2σ, µ - σ, µ, µ + σ, µ + 2σ] 26
Mean and Variance of a Random Variable For the normal family of distributions we can see that the parameter µ talks about where the distribution is located or centered. We often use µ as our best guess for a prediction. The parameter σ talks about how spread out the distribution is. This gives us an indication of how uncertain or how risky our prediction is. If X is any random variable, the mean will be a measure of the location of the distribution and the variance will be a measure of how spread out it is. 27
The Mean and Variance of a Normal For continuous distributions, the above formulas for E(X) and Var(X) get a bit more complicated as we are adding an infinite number of possible outcomes... not to worry, the interpretation is still the same. If X ~ N(µ, σ²) then E(X) = µ, Var(X) = σ², sd(X) = σ. [Figure: normal pdf with ticks at µ - 2σ, µ - σ, µ, µ + σ, µ + 2σ] 28
Two More (very important!) Formulas Let X and Y be two random variables: E(aX + bY) = aE(X) + bE(Y) and Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y). We will get back to this later... 29
Conditional, Joint and Marginal Distributions In general we want to use probability to address problems involving more than one variable at a time. We need to be able to describe what we think will happen to one variable relative to another... we want to answer questions like: How are my sales impacted by the overall economy? 30
Conditional, Joint and Marginal Distributions Let E denote the performance of the economy next quarter... for simplicity, say E = 1 if the economy is expanding and E = 0 if the economy is contracting (what kind of random variable is this?). Let's assume E ~ Bernoulli(0.7). Let S denote my sales next quarter... and let's suppose the following probability statements:
S  pr(S | E = 1)    S  pr(S | E = 0)
1      0.05         1      0.20
2      0.20         2      0.30
3      0.50         3      0.30
4      0.25         4      0.20
These are called Conditional Distributions 31
Conditional, Joint and Marginal Distributions
S  pr(S | E = 1)    S  pr(S | E = 0)
1      0.05         1      0.20
2      0.20         2      0.30
3      0.50         3      0.30
4      0.25         4      0.20
On the left is the conditional distribution of S given E = 1; on the right is the conditional distribution of S given E = 0. We read: the probability of Sales of 4 (S = 4) given (or conditional on) the economy growing (E = 1) is 0.25. 32
Conditional, Joint and Marginal Distributions The conditional distributions tell us what can happen to S for a given value of E... but what about S and E jointly? pr(S = 4 and E = 1) = pr(E = 1) × pr(S = 4 | E = 1) = 0.70 × 0.25 = 0.175. In English: 70% of the time the economy grows, and 1/4 of those times sales equals 4... 25% of 70% is 17.5%. 33
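The whole sales-and-economy setup fits in a few lines of Python (a sketch, not part of the slides), building the joint distribution from the marginal of E and the conditionals of S:

```python
# Marginal for the economy: E ~ Bernoulli(0.7)
pr_E = {1: 0.7, 0: 0.3}
# Conditional distributions of sales S given E (from the table above)
pr_S_given_E = {
    1: {1: 0.05, 2: 0.20, 3: 0.50, 4: 0.25},
    0: {1: 0.20, 2: 0.30, 3: 0.30, 4: 0.20},
}

# Joint: pr(S = s, E = e) = pr(E = e) * pr(S = s | E = e)
joint = {(s, e): pr_E[e] * pr_S_given_E[e][s]
         for e in pr_E for s in pr_S_given_E[e]}
print(joint[(4, 1)])   # 0.7 * 0.25 = 0.175

# Marginal of S = 4: sum the joint over the values of E
pr_S4 = joint[(4, 1)] + joint[(4, 0)]
print(pr_S4)           # 0.175 + 0.06 = 0.235

# Flipping the conditioning: pr(E = 1 | S = 4)
print(round(joint[(4, 1)] / pr_S4, 3))  # 0.745
```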
Conditional, Joint and Marginal Distributions [Table: joint distribution of S and E] 34
Conditional, Joint and Marginal Distributions We call the probabilities of E and S together the joint distribution of E and S. In general the notation is... pr(Y = y, X = x) is the joint probability of the random variable Y equaling y AND the random variable X equaling x. pr(Y = y | X = x) is the conditional probability that the random variable Y takes the value y GIVEN that X equals x. pr(Y = y) and pr(X = x) are the marginal probabilities of Y = y and X = x. 35
Important relationships Relationship between the joint and conditional... pr(y, x) = pr(x) pr(y | x) = pr(y) pr(x | y). Relationship between the joint and marginal... pr(x) = sum_y pr(x, y) and pr(y) = sum_x pr(x, y). 36
Conditional, Joint and Marginal Distributions Why we call marginals marginals... the table represents the joint, and at the margins, we get the marginals. [Table: joint distribution with row and column totals in the margins] 37
Conditional, Joint and Marginal Distributions Example... Given E = 1, what is the probability of S = 4? pr(S = 4 | E = 1) = pr(S = 4, E = 1) / pr(E = 1) = 0.175 / 0.7 = 0.25 38
Conditional, Joint and Marginal Distributions Example... Given S = 4, what is the probability of E = 1? pr(E = 1 | S = 4) = pr(S = 4, E = 1) / pr(S = 4) = 0.175 / 0.235 = 0.745 39
Bayes Theorem Disease testing example... Let D = 1 indicate you have a disease Let T = 1 indicate that you test positive for it If you take the test and the result is positive, you are really interested in the question: Given that you tested positive, what is the chance you have the disease? 40
Bayes Theorem pr(D = 1 | T = 1) = 0.019 / (0.019 + 0.0098) = 0.66 41
Bayes Theorem The computation of pr(x | y) from pr(x) and pr(y | x) is called Bayes theorem... pr(x | y) = pr(y, x) / pr(y) = pr(y, x) / sum_x pr(y, x) = pr(x) pr(y | x) / sum_x pr(x) pr(y | x). In the disease testing example: pr(D = 1 | T = 1) = pr(T = 1 | D = 1) pr(D = 1) / [pr(T = 1 | D = 1) pr(D = 1) + pr(T = 1 | D = 0) pr(D = 0)] = 0.019 / (0.019 + 0.0098) = 0.66 42
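A sketch of the disease-testing computation in Python. The slides only report the joint probabilities 0.019 and 0.0098; the prevalence, sensitivity, and false-positive rate below are one set of inputs consistent with those numbers, not values stated on the slides:

```python
# Inputs consistent with the slide's joint probabilities
# (prevalence and test error rates are inferred, not stated on the slide)
p_D1 = 0.02           # pr(D = 1): prevalence of the disease
p_T1_given_D1 = 0.95  # pr(T = 1 | D = 1): test sensitivity
p_T1_given_D0 = 0.01  # pr(T = 1 | D = 0): false-positive rate

# Bayes theorem: pr(D = 1 | T = 1)
num = p_T1_given_D1 * p_D1               # 0.019
den = num + p_T1_given_D0 * (1 - p_D1)   # 0.019 + 0.0098
posterior = num / den
print(round(posterior, 2))  # 0.66
```

Even with a fairly accurate test, the low prevalence keeps the posterior well below 1: only about two thirds of positives actually have the disease.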
Independence Two random variables X and Y are independent if pr(Y = y | X = x) = pr(Y = y) for all possible x and y. In other words, knowing X tells you nothing about Y! e.g., tossing a coin 2 times... what is the probability of getting H in the second toss given we saw a T in the first one? 43
Independence We can extend the notion of independence to any number of variables. For example, Y_1 is independent of Y_2 and Y_3 if pr(Y_1 = y_1 | Y_2 = y_2, Y_3 = y_3) = pr(Y_1 = y_1) 44
IID Suppose you are about to toss a coin n times. Let Y_i = 1 if the i-th toss is a head and 0 otherwise... What is pr(Y_29 = 1)? What is pr(Y_29 = 1 | Y_27 = 0, Y_28 = 1)? What is pr(Y_57 = 1)? 45
IID Each Y_i ~ Bernoulli(0.5). Each Y_i is independent of all others... hence IID: independent and identically distributed 46
A First Modeling Exercise I have US$ 1,000 invested in the Brazilian stock index, the IBOVESPA. I need to predict tomorrow s value of my portfolio. I also want to know how risky my portfolio is, in particular, I want to know how likely am I to lose more than 3% of my money by the end of tomorrow s trading session. What should I do? 47
IBOVESPA - Data [Figures: histogram (density) of IBOVESPA daily returns; time series plot of daily returns over roughly 100 trading days] 48
As a first modeling decision, let's call the random variable associated with daily returns on the IBOVESPA X and assume that returns are independent and identically distributed as X ~ N(µ, σ²). Question: What are the values of µ and σ²? We need to estimate these values from the sample at hand (n = 113 observations)... 49
Let's assume that each observation in the random sample {x_1, x_2, x_3, ..., x_n} is independent and distributed according to the model above, i.e., x_i ~ N(µ, σ²). A usual strategy is to estimate µ and σ², the mean and the variance of the distribution, via the sample mean (X̄) and the sample variance (s²)... (their sample counterparts): X̄ = (1/n) sum_{i=1}^{n} x_i and s² = 1/(n-1) sum_{i=1}^{n} (x_i - X̄)² 50
For the IBOVESPA data at hand, X̄ = 0.04 and s² = 2.19. [Figure: histogram of daily returns with the fitted normal density overlaid] The red line represents our model, i.e., the normal distribution with mean and variance given by the estimated quantities X̄ and s². What is Pr(X < -3)? 51
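Under the fitted model X ~ N(0.04, 2.19), the chance of losing more than 3% in a day can be sketched in Python (standard library only; not part of the slides):

```python
import math

xbar, s2 = 0.04, 2.19    # sample mean and variance from the slide
s = math.sqrt(s2)        # sample standard deviation

# Pr(X < -3) under the fitted model X ~ N(0.04, 2.19):
# standardize, then apply the standard normal CDF via erf
z = (-3 - xbar) / s
p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
print(round(p, 3))  # ≈ 0.02
```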
Models, Parameters, Estimates... In general we talk about unknown quantities using the language of probability... and the following steps: Define the random variables of interest Define a model (or probability distribution) that describes the behavior of the RV of interest Based on the data available, we estimate the parameters defining the model We are now ready to describe possible scenarios, generate predictions, make decisions, evaluate risk, etc... 52
Oracle vs SAP Example (understanding variation) 53
Oracle vs. SAP Do we buy the claim from this ad? We have a dataset of 81 firms that use SAP... The industry ROE is 15% (also an estimate, but let's assume it is true). We assume that the random variable X represents the ROE of SAP firms and can be described by X ~ N(µ, σ²). For the SAP firms: X̄ = 0.1263 and s² = 0.065. Well, 0.12/0.15 ≈ 0.8! I guess the ad is correct, right? Not so fast... 54
Oracle vs. SAP Let's assume that the ROE of firms using SAP is, on average, the same as the industry's. Assume further that s² is a good estimate of the variance... ROE ~ N(0.15, 0.065). In a sample of 81 firms, how often can we expect the sample mean to be below 0.15? What does this mean if I am trying to compare the profitability of firms using SAP versus the industry? 55
Oracle vs. SAP Let's do a little simulation... Generate 1000 different samples of size 81 from a N(0.15, 0.065) and plot the histogram of X̄... Now, what do you think about the ad? [Figure: histogram of the 1000 sample means] 56
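The simulation described on the slide can be reproduced in a few lines of Python (a sketch; the seed and variable names are our own):

```python
import random
import statistics

random.seed(42)
mu, sigma = 0.15, 0.065 ** 0.5   # N(0.15, 0.065): sd is the sqrt of the variance
n, reps = 81, 1000

# 1000 samples of size 81; keep the sample mean of each
xbars = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(reps)]

# How often does a sample of 81 firms land below the industry ROE of 0.15?
print(sum(x < 0.15 for x in xbars) / reps)   # about half the time
# Spread of the sample means: close to sqrt(0.065/81) ≈ 0.028
print(statistics.stdev(xbars))
```

Even if SAP firms match the industry exactly, a sample mean below 0.15 happens about half the time, so observing 0.1263 is hardly evidence against them.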
Sampling Distribution of Sample Mean Consider the mean for an iid sample of n observations of a random variable {X_1, ..., X_n}. Suppose that E(X_i) = µ and var(X_i) = σ². Then E(X̄) = (1/n) sum E(X_i) = µ and var(X̄) = var((1/n) sum X_i) = (1/n²) sum var(X_i) = σ²/n. If X is normal, then X̄ ~ N(µ, σ²/n). This is called the sampling distribution of the mean... 57
Sampling Distribution of Sample Mean The sampling distribution of X̄ describes how our estimate would vary over different datasets of the same size n. It provides us with a vehicle to evaluate the uncertainty associated with our estimate of the mean... It turns out that s² is a good proxy for σ², so that we can approximate the sampling distribution by X̄ ~ N(µ, s²/n). We call sqrt(s²/n) the standard error of X̄... it is a measure of its variability... I like the notation s_X̄ = sqrt(s²/n) 58
Sampling Distribution of Sample Mean X̄ ~ N(µ, s²_X̄). X̄ is unbiased... E(X̄) = µ. On average, X̄ is right! X̄ is consistent... as n grows, s²_X̄ → 0, i.e., with more information, eventually X̄ correctly estimates µ! 59
Back to the Oracle vs. SAP example Our simulation was done assuming that µ = 0.15... in that case X̄ ~ N(0.15, 0.065/81). [Figure: histogram of the 1000 simulated sample means with the sampling distribution overlaid] 60
Confidence Intervals X̄ ~ N(µ, s²_X̄), so... (X̄ - µ) ~ N(0, s²_X̄), right? What is a good prediction for µ? What is our best guess?? X̄. How do we make mistakes? How far from µ can we be?? 95% of the time, within ±2 s_X̄. [X̄ ± 2 s_X̄] gives a 95% range of plausible values for µ... this is called the 95% Confidence Interval for µ. 61
Oracle vs. SAP example... one more time In this example, X̄ = 0.1263, s² = 0.065 and n = 81... therefore, s²_X̄ = 0.065/81, so the 95% confidence interval for the ROE of SAP firms is [X̄ - 2 s_X̄; X̄ + 2 s_X̄] = [0.1263 - 2 sqrt(0.065/81); 0.1263 + 2 sqrt(0.065/81)] = [0.069; 0.183]. Is 0.15 a plausible value? What does that mean? 62
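The interval arithmetic is easy to verify in Python (a sketch, not part of the slides):

```python
import math

xbar, s2, n = 0.1263, 0.065, 81
se = math.sqrt(s2 / n)            # standard error of the sample mean, ≈ 0.028
ci = (xbar - 2 * se, xbar + 2 * se)
print(ci)                         # matches the slide's [0.069, 0.183]
print(ci[0] <= 0.15 <= ci[1])     # 0.15 is inside, so it is a plausible value
```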
Estimating Proportions... another modeling example Your job is to manufacture a part. Each time you make a part, it is defective or not. Below we have the results from 100 parts you just made. Y_i = 1 means a defect, 0 a good one. How would you predict the next one? [Figure: the 100 outcomes, each 0 or 1, plotted against their index] Would you model these Y as iid Bernoulli draws for some p? There are 18 ones and 82 zeros. 63
In this case, it might be reasonable to model the defects as iid Bernoulli(0.18). We can't be sure this is right, but the data looks like the kind of thing we would get if we had iid draws with that p!!! If we believe our model, what is the chance that the next 10 are good? 0.82^10 = 0.137. 64
We used the proportion of defects in our sample to estimate p, the true, long-run proportion of defects. Could this estimate be wrong?!! Let p̂ denote the sample proportion. The standard error associated with the sample proportion as an estimate of the true proportion is: s_p̂ = sqrt(p̂(1 - p̂)/n) 65
Suppose we have iid Bernoulli data and estimate the true p by the observed sample proportion of 1's, p̂. The (approximate) 95% confidence interval for the true proportion is: p̂ ± 2 s_p̂. 66
Defects: In our defect example we had p̂ = 0.18 and n = 100. This gives s_p̂ = sqrt((0.18)(0.82)/100) = 0.04. The confidence interval is 0.18 ± 0.08 = (0.10, 0.26), big!!!! 67
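A quick check of the defect numbers in Python (a sketch, not part of the slides):

```python
import math

phat, n = 0.18, 100
se = math.sqrt(phat * (1 - phat) / n)  # standard error of the proportion
ci = (phat - 2 * se, phat + 2 * se)
print(round(se, 2))   # 0.04
print(ci)             # about (0.10, 0.26): a wide interval for n = 100
```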
Polls: yet another example... If we take a relatively small random sample from a large population and ask each respondent yes or no, with yes Y_i = 1 and no Y_i = 0, then, approximately, Y_i ~ Bernoulli(p), where p is the true population proportion of yes. Suppose, as is common, n = 1000 and p̂ ≈ 0.5. Then s_p̂ = sqrt((0.5)(0.5)/1000) = 0.0158. The standard error is 0.0158, so the ± is 0.0316, or about ±3%. 68
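The poll arithmetic above is where the familiar "±3% margin of error" comes from; a Python sketch:

```python
import math

phat, n = 0.5, 1000
se = math.sqrt(phat * (1 - phat) / n)  # largest possible se occurs at phat = 0.5
print(round(se, 4))      # 0.0158
print(round(2 * se, 3))  # 0.032, i.e. roughly the ±3% quoted with polls
```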