ECE 295: Lecture 03 Estimation and Confidence Interval Spring 2018 Prof Stanley Chan School of Electrical and Computer Engineering Purdue University 1 / 23
Theme of this Lecture
What is Estimation?
  You give me a set of data points.
  I make a guess of the parameters, e.g., mean, variance, etc.
What is a Confidence Interval?
  You estimate the mean.
  How good is your estimate?
  An accurate estimate with a large variance is not a good estimate.
Mean and Variance
Two Parameters of a Gaussian
  Mean $\mu$: Where is the center of the Gaussian?
  Variance $\sigma^2$: How wide is the Gaussian?
  The standard deviation $\sigma$ is the square root of the variance.
Question: When $\sigma$ decreases, why does the Gaussian become taller?
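One way to approach the question above: the area under the density must stay equal to 1, so a narrower Gaussian has to be taller. The sketch below evaluates the peak height $1/(\sigma\sqrt{2\pi})$ of the Gaussian density for a few values of $\sigma$. The helper name `gaussian_peak` and the chosen values are illustrative assumptions, not part of the lecture.

```python
import math

def gaussian_peak(sigma):
    """Height of the N(mu, sigma^2) density at its center x = mu."""
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))

# Halving sigma doubles the peak height, because the total area stays 1.
for sigma in (2.0, 1.0, 0.5):
    print(f"sigma = {sigma}: peak height = {gaussian_peak(sigma):.4f}")
```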
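Expectation and Variance
Definition (Expectation). The expectation of a random variable $X$ is
$$E[X] = \sum_x x\, p_X(x) \quad \text{(discrete)}, \qquad \text{or} \qquad E[X] = \int x\, p_X(x)\, dx \quad \text{(continuous)}.$$
Definition (Variance). The variance of a random variable $X$ is
$$\mathrm{Var}[X] = \sum_x (x-\mu)^2\, p_X(x), \qquad \text{or} \qquad \mathrm{Var}[X] = \int (x-\mu)^2\, p_X(x)\, dx.$$
We usually denote $E[X] = \mu$ and $\mathrm{Var}[X] = \sigma^2$.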
Sample Mean and Sample Variance
Given data points $X_1, \ldots, X_N$, how do we estimate the mean and variance?
$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad S^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \bar{X})^2.$$
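The two estimators can be sketched in a few lines. This is a minimal illustration that uses the $1/N$ normalization from the slide. The function names are hypothetical, and the five data points are borrowed from the bootstrap example later in the lecture purely for illustration.

```python
def sample_mean(xs):
    """Sample mean: (1/N) * sum of X_i."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance with 1/N normalization: (1/N) * sum of (X_i - mean)^2."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

data = [4.2, 4.8, 4.7, 4.5, 4.9]
print(f"sample mean = {sample_mean(data):.4f}")
print(f"sample variance = {sample_variance(data):.4f}")
```

Python's standard library offers the same quantities: `statistics.pvariance` uses the $1/N$ normalization, while `statistics.variance` divides by $N-1$ instead.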
True Mean and Sample Mean
True Mean $E[X]$
  A statistical property of a random variable.
  A deterministic number.
  Often unknown, and often the central question of estimation.
  You have to know the distribution of $X$ in order to find $E[X]$: top down.
Sample Mean $\bar{X}$
  A numerical value, calculated from data.
  It is itself a random variable, so it has uncertainty.
  The uncertainty shrinks as more samples are used.
  We use the sample mean to estimate the true mean.
  You do not need to know the distribution of $X$ in order to find $\bar{X}$: bottom up.
Distribution of $\bar{X}$
$\bar{X}$ is the sample mean of one experiment.
$\bar{X}$ has a distribution! (You see it if you repeat the experiment many times.)
Distribution of $\bar{X}$
What is the distribution of $\bar{X}$?
Gaussian (approximately, for large $N$), thanks to the Central Limit Theorem.
Why Gaussian? It follows from a second-order approximation of the moment generating function $M_X(s) = E[e^{sX}]$. See ECE 302 Lecture 25.
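Influence of $N$
Assume $X_1, \ldots, X_N$ are independent random variables with identical distributions, with $E[X_i] = \mu$ and $\mathrm{Var}[X_i] = \sigma^2$. Then
$$E[\bar{X}] = E\left[\frac{1}{N}\sum_{i=1}^{N} X_i\right] = \frac{1}{N}\sum_{i=1}^{N} E[X_i] = \frac{1}{N}\sum_{i=1}^{N} \mu = \mu,$$
$$\mathrm{Var}[\bar{X}] = \mathrm{Var}\left[\frac{1}{N}\sum_{i=1}^{N} X_i\right] = \frac{1}{N^2}\sum_{i=1}^{N} \mathrm{Var}[X_i] = \frac{1}{N^2}\sum_{i=1}^{N} \sigma^2 = \frac{\sigma^2}{N}.$$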
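The result $\mathrm{Var}[\bar{X}] = \sigma^2/N$ can be checked numerically. The sketch below repeats the experiment many times, records the sample mean each time, and compares the empirical variance of those sample means with $\sigma^2/N$. The function name, the number of trials, and the seed are illustrative assumptions.

```python
import random
import statistics

def sample_mean_variance(n, trials=5000, mu=0.0, sigma=1.0, seed=0):
    """Empirical variance of the sample mean of n i.i.d. N(mu, sigma^2) draws."""
    rng = random.Random(seed)
    means = [
        sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        for _ in range(trials)
    ]
    # pvariance uses 1/trials normalization over the recorded sample means.
    return statistics.pvariance(means)

for n in (1, 4, 16, 64):
    empirical = sample_mean_variance(n)
    print(f"N = {n:2d}: empirical = {empirical:.4f}, sigma^2 / N = {1.0 / n:.4f}")
```

The empirical values fluctuate from run to run (here they are fixed by the seed), but they should track $\sigma^2/N$ closely.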
Outlier Tool 1: Likelihood
Assume we have a Gaussian. Call it $\mathcal{N}(\mu, \sigma^2)$.
You have a data point $X = x_j$.
How probable is it that $X = x_j$ shows up under this Gaussian?
This quantity (the value of the density at $x_j$) is called the likelihood:
$$p(x_j) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x_j-\mu)^2}{2\sigma^2}\right\} \overset{\text{def}}{=} \mathcal{N}(x_j \mid \mu, \sigma^2).$$
Outlier Tool 1: Likelihood
Here is a way to determine an outlier:
  Start with your distribution, say $\mathcal{N}(\mu, \sigma^2)$.
  Find the likelihood of your data point $X$.
  If the likelihood is extremely small, then $X$ is an outlier.
How small? You set the tolerance level, maybe 0.05.
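The procedure above can be sketched as follows. The function names are hypothetical, and the parameters $\mu = 5$, $\sigma = 1$ and the test points are chosen only for illustration; the tolerance 0.05 is the example value from the slide. Note that the likelihood is a density value, so a sensible tolerance depends on the scale of $\sigma$.

```python
import math

def gaussian_likelihood(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    var = sigma ** 2
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def is_outlier_by_likelihood(x, mu, sigma, tol=0.05):
    """Flag x as an outlier when its likelihood falls below the tolerance."""
    return gaussian_likelihood(x, mu, sigma) < tol

for x in (5.3, 2.2):
    lik = gaussian_likelihood(x, mu=5.0, sigma=1.0)
    flag = is_outlier_by_likelihood(x, mu=5.0, sigma=1.0)
    print(f"x = {x}: likelihood = {lik:.4f}, outlier = {flag}")
```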
Outlier Tool 2: p-value
The p-value is an alternative tool.
Instead of comparing the likelihood, we check how far $X$ is from the center.
"Far" and "close" are measured in units of $\sigma$.
If $X$ is $3\sigma$ away, then it is very unlikely.
Typically we set a tolerance level $\alpha$ for the tail area. The corresponding distance from the center is the critical value $z_\alpha$; the tail area beyond an observed point is its p-value.
Outlier Tool 2: p-value
Standardized Gaussian
  Before we had computers, calculating the likelihood was hard.
  One easy solution: shift and scale $\mathcal{N}(\mu, \sigma^2)$ to $\mathcal{N}(0, 1)$.
  We can then build a look-up table for $\mathcal{N}(0, 1)$.
  The process of turning $\mathcal{N}(\mu, \sigma^2)$ into $\mathcal{N}(0, 1)$ is called standardization.
  Quite useful: instead of checking $3\sigma$, just check 3.
  Also useful for theoretical analysis.
Standardization: Given $X \sim \mathcal{N}(\mu, \sigma^2)$, the standardized Gaussian is
$$Z = \frac{X - \mu}{\sigma}.$$
We can show that $Z \sim \mathcal{N}(0, 1)$.
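A small sketch of why standardization is convenient: a tail probability under $\mathcal{N}(\mu, \sigma^2)$ equals the corresponding tail probability under $\mathcal{N}(0, 1)$ after standardizing. It uses `statistics.NormalDist` from the Python standard library in place of a look-up table; the values $\mu = 5$, $\sigma = 1.5$ and the helper name `standardize` are assumptions made for illustration.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Map a value from N(mu, sigma^2) to the N(0, 1) scale."""
    return (x - mu) / sigma

mu, sigma = 5.0, 1.5        # illustrative parameters
x = mu + 3.0 * sigma        # a point 3 sigma above the center
z = standardize(x, mu, sigma)

# P(X > x) under N(mu, sigma^2) equals P(Z > z) under N(0, 1).
tail_original = 1.0 - NormalDist(mu, sigma).cdf(x)
tail_standard = 1.0 - NormalDist(0.0, 1.0).cdf(z)
print(f"z = {z:.2f}")
print(f"tail under N(mu, sigma^2) = {tail_original:.6f}")
print(f"tail under N(0, 1)        = {tail_standard:.6f}")
```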
Outlier Tool 2: p-value
Example: You have a dataset with $\mu = 5$, $\sigma = 1$; check the data point $x_j = 2.2$.
$$z_j = \frac{x_j - \mu}{\sigma} = -2.8.$$
Set the tolerance level $\alpha = 0.01$ on one tail. Is $x_j$ an outlier?
$\alpha = 0.01$ on the lower tail is equivalent to $z_\alpha \approx -2.33$.
Since $z_j < z_\alpha$, $x_j$ is an outlier.
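The worked example can be reproduced with the standard library. The sketch below computes the z-score and the lower-tail critical value for $\alpha = 0.01$ using `NormalDist().inv_cdf`; the function name is hypothetical, and the lower-tail convention (a negative $z_\alpha$) is an assumption chosen to match the example.

```python
from statistics import NormalDist

def is_lower_tail_outlier(x, mu, sigma, alpha):
    """Return (z, z_alpha, flag) for a one-sided, lower-tail outlier check."""
    z = (x - mu) / sigma
    z_alpha = NormalDist().inv_cdf(alpha)  # negative when alpha < 0.5
    return z, z_alpha, z < z_alpha

z, z_alpha, flag = is_lower_tail_outlier(2.2, mu=5.0, sigma=1.0, alpha=0.01)
print(f"z = {z:.2f}, z_alpha = {z_alpha:.3f}, outlier = {flag}")
```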
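Compare Two Means
You have two classes of data: Class 1 and Class 0.
For each class you have $(\mu_1, \sigma_1, n_1)$ and $(\mu_0, \sigma_0, n_0)$.
Does Class 1 have a significantly different mean than Class 0?
Approach:
  Pick $\alpha$ and hence $z_\alpha$.
  Compute
  $$z = \frac{\mu_1 - \mu_0}{\sigma} \quad \text{or} \quad z = \frac{\mu_0 - \mu_1}{\sigma}, \qquad \text{where } \sigma^2 = \frac{\sigma_0^2}{n_0} + \frac{\sigma_1^2}{n_1}.$$
  Check whether $z > z_\alpha$ (upper tail) or $z < -z_\alpha$ (lower tail), taking $z_\alpha > 0$.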
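A minimal sketch of the comparison, assuming an upper-tail check with a positive critical value $z_\alpha$. The class statistics used here ($\mu_1 = 5.4$, $\mu_0 = 5.0$, $\sigma_1 = \sigma_0 = 1$, $n_1 = n_0 = 100$) are made-up numbers for illustration, and `two_mean_z` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def two_mean_z(mu1, sigma1, n1, mu0, sigma0, n0):
    """z = (mu1 - mu0) / sigma, with sigma^2 = sigma0^2 / n0 + sigma1^2 / n1."""
    sigma = math.sqrt(sigma0 ** 2 / n0 + sigma1 ** 2 / n1)
    return (mu1 - mu0) / sigma

alpha = 0.01
z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # positive upper-tail critical value
z = two_mean_z(mu1=5.4, sigma1=1.0, n1=100, mu0=5.0, sigma0=1.0, n0=100)
print(f"z = {z:.3f}, z_alpha = {z_alpha:.3f}, significant = {z > z_alpha}")
```

With these numbers the difference of 0.4 is significant at $\alpha = 0.01$; with smaller $n_0$ and $n_1$ the same difference would not be, because $\sigma$ grows.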
Confidence Interval: So What?
Why care about the confidence interval?
  From data, you tell me $\bar{X}$.
  I ask you: how good is $\bar{X}$?
  The confidence interval quantifies how good $\bar{X}$ is.
Bottom Line: Whenever you report an estimate $\bar{X}$, you also need to report the confidence interval. Otherwise, your $\bar{X}$ is meaningless.
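Confidence Interval
How good is $\bar{X}$?
Set $\alpha$, and then find $z_\alpha$.
Then we say that $\bar{X}$ has a confidence interval
$$\left[\bar{X} - z_\alpha \frac{\sigma}{\sqrt{N}},\ \bar{X} + z_\alpha \frac{\sigma}{\sqrt{N}}\right].$$
Two factors: $N$ and $\sigma$. ($z_\alpha$ is user defined.)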
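The interval can be computed directly from the formula. In this sketch, $\alpha$ is assumed to be the tail area on each side, so $z_\alpha$ is the upper-tail critical value and the interval covers $1 - 2\alpha$; that convention, the function name, and the example numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist

def confidence_interval(xbar, sigma, n, alpha):
    """Interval xbar +/- z_alpha * sigma / sqrt(n); alpha is the area of each tail."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    half_width = z_alpha * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Quadrupling N halves the width of the interval.
for n in (50, 200):
    lo, hi = confidence_interval(xbar=0.0, sigma=1.0, n=n, alpha=0.025)
    print(f"N = {n:3d}: [{lo:.4f}, {hi:.4f}]")
```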
Bootstrap Illustrated
A technique to estimate the confidence interval for small datasets.
  Your dataset has very few data points.
  You can estimate $\sigma$, but the estimate will not be accurate.
Key idea:
  Start with a set $\Omega = \{X_1, \ldots, X_N\}$.
  Sample $N$ points from $\Omega$ with replacement, and repeat $T$ times.
Example: $\Omega = \{4.2, 4.8, 4.7, 4.5, 4.9\}$, then
  $\Omega_1 = \{4.2, 4.8, 4.8, 4.7, 4.8\} \rightarrow \bar{X}_1$
  $\vdots$
  $\Omega_T = \{4.5, 4.9, 4.2, 4.2, 4.7\} \rightarrow \bar{X}_T$
The bootstrapped standard deviation $\sigma_b$ is given by
$$\sigma_b^2 = \frac{1}{T}\sum_{t=1}^{T} (\bar{X}_t - \bar{X})^2, \qquad \text{where } \bar{X} = \frac{1}{T}\sum_{t=1}^{T} \bar{X}_t.$$
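The resampling procedure above can be sketched with `random.choices`, which samples with replacement. The number of replicates `T`, the seed, and the function name are illustrative assumptions; the data set is the five-point example from the slide.

```python
import math
import random

def bootstrap_mean_std(data, T=2000, seed=0):
    """Bootstrapped standard deviation of the sample mean."""
    rng = random.Random(seed)
    n = len(data)
    means = []
    for _ in range(T):
        resample = rng.choices(data, k=n)   # N points drawn with replacement
        means.append(sum(resample) / n)     # sample mean of this replicate
    grand_mean = sum(means) / T
    var_b = sum((m - grand_mean) ** 2 for m in means) / T
    return math.sqrt(var_b)

omega = [4.2, 4.8, 4.7, 4.5, 4.9]
print(f"sigma_b = {bootstrap_mean_std(omega):.4f}")
```

For the sample mean, the result should land near $S/\sqrt{N}$, where $S^2$ is the $1/N$-normalized sample variance of $\Omega$; for this data that is roughly 0.11.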
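How good is Bootstrap?
Example. Ideal distribution $F$: $\mathcal{N}(0, 1)$. Let's draw $X_1, \ldots, X_m$, with $m = 10,000$.
Sample empirical distribution $\hat{F}$, composed of $\Omega = \{X_1, \ldots, X_n\}$, $n = 50$.
The true values: $\mu_{\text{true}} = 0$, $\sigma_{\text{true}} = 1$.
  True confidence interval: $\mu_{\text{true}} \pm z_\alpha \frac{\sigma_{\text{true}}}{\sqrt{n}} = 0 \pm 0.1414\, z_\alpha$.
The estimated values: $\mu_{\text{est}} = 0.0416$, $\sigma_{\text{est}} = 1.0203$ (one possible pair).
  Estimated confidence interval: $\mu_{\text{est}} \pm z_\alpha \frac{\sigma_{\text{est}}}{\sqrt{n}} = \mu_{\text{est}} \pm 0.1443\, z_\alpha$.
The bootstrap values: $\mu_{\text{boot}} = 0.0401$, $\sigma_{\text{boot}} = 0.1434$.
  Bootstrap confidence interval: $\mu_{\text{boot}} \pm z_\alpha \sigma_{\text{boot}} = \mu_{\text{boot}} \pm 0.1434\, z_\alpha$.
  Note that $\sigma_{\text{boot}}$ already has the factor $1/\sqrt{n}$ embedded.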
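An experiment of this kind can be sketched as follows. It draws $n = 50$ points from $\mathcal{N}(0, 1)$, then compares the formula-based half-width factor $\sigma_{\text{est}}/\sqrt{n}$ with the bootstrapped $\sigma_{\text{boot}}$. The seed and the number of replicates `T` are assumptions, so the printed numbers will differ from the values quoted on the slide.

```python
import math
import random
import statistics

rng = random.Random(1)
n = 50
sample = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Formula-based quantities from the one observed sample.
mu_est = sum(sample) / n
sigma_est = statistics.pstdev(sample)
formula_half_width = sigma_est / math.sqrt(n)

# Bootstrap: resample with replacement, record the mean of each replicate.
T = 5000
boot_means = [sum(rng.choices(sample, k=n)) / n for _ in range(T)]
mu_boot = sum(boot_means) / T
sigma_boot = statistics.pstdev(boot_means)

print(f"true:      0 +/- {1.0 / math.sqrt(n):.4f} z_alpha")
print(f"estimated: {mu_est:.4f} +/- {formula_half_width:.4f} z_alpha")
print(f"bootstrap: {mu_boot:.4f} +/- {sigma_boot:.4f} z_alpha")
```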
Power of Bootstrap
Wait a minute...
  You don't need the bootstrap for the sample mean.
  There is a formula for the sample mean's confidence interval: $\bar{X} \pm z_\alpha \frac{\sigma_{\text{est}}}{\sqrt{n}}$.
But in reality...
  You are not just interested in estimating the mean.
  You may want to estimate the median, the mode, higher-order moments, or any functional mapping $\theta = g(X_1, \ldots, X_n)$.
  Then the confidence interval is no longer $\bar{X} \pm z_\alpha \frac{\sigma_{\text{est}}}{\sqrt{n}}$.
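Bootstrap for Median
Start with a set $\Omega = \{X_1, \ldots, X_N\}$.
Sample $N$ points from $\Omega$ with replacement, and repeat $T$ times.
Example: $\Omega = \{4.2, 4.8, 4.7, 4.5, 4.9\}$, then
  $\Omega_1 = \{4.2, 4.8, 4.8, 4.7, 4.8\} \rightarrow M_1 \overset{\text{def}}{=} \text{median}(\Omega_1)$
  $\vdots$
  $\Omega_T = \{4.5, 4.9, 4.2, 4.2, 4.7\} \rightarrow M_T \overset{\text{def}}{=} \text{median}(\Omega_T)$
The bootstrapped standard deviation $\sigma_b$ is given by
$$\sigma_b^2 = \frac{1}{T}\sum_{t=1}^{T} (M_t - \bar{M})^2, \qquad \text{where } \bar{M} = \frac{1}{T}\sum_{t=1}^{T} M_t.$$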
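The same resampling loop works for the median, where no simple closed-form interval is available. A minimal sketch, with the number of replicates `T`, the seed, and the function name as illustrative assumptions:

```python
import random
import statistics

def bootstrap_median_std(data, T=2000, seed=0):
    """Bootstrapped standard deviation of the sample median."""
    rng = random.Random(seed)
    n = len(data)
    # One median per bootstrap replicate (sampling with replacement).
    medians = [statistics.median(rng.choices(data, k=n)) for _ in range(T)]
    m_bar = sum(medians) / T
    return (sum((m - m_bar) ** 2 for m in medians) / T) ** 0.5

omega = [4.2, 4.8, 4.7, 4.5, 4.9]
print(f"sigma_b (median) = {bootstrap_median_std(omega):.4f}")
```

Swapping `statistics.median` for any other function of the resample gives the bootstrapped standard deviation of that statistic.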
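Principle behind Bootstrap
Typically:
  The gap between $\sigma_{\text{true}}$ and $\sigma_{\text{est}}$ is not always small; it depends on $n$.
  The gap between $\sigma_{\text{est}}$ and $\sigma_{\text{boot}}$ is usually very small.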
Additional Readings
B. Efron, "Bootstrap Methods: Another Look at the Jackknife," Annals of Statistics, vol. 7, no. 1, pp. 1-26, 1979.
L. Wasserman, All of Statistics, Springer.
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer.