Lecture 10 on BST 631: Statistical Theory I, Kui Zhang, 09/29/2008

Review of the previous lecture
- Definitions: several continuous distributions, including the uniform, gamma, normal, beta, Cauchy, and double exponential.
- Examples: means, variances, mgfs, and relationships among these distributions.
- Examples: applications of these distributions.

Chapter 3: Common Families of Distributions

Section 3.4: Exponential Families

Definition 3.4.1: A family of pdfs or pmfs is called an exponential family if it can be expressed as

(3.4.1)    f(x|θ) = h(x) c(θ) exp(Σ_{i=1}^k w_i(θ) t_i(x)),

where h(x) ≥ 0 and t_1(x), ..., t_k(x) are real-valued functions of the observation x (they cannot depend on θ), and c(θ) ≥ 0 and w_1(θ), ..., w_k(θ) are real-valued functions of the possibly vector-valued parameter θ (they cannot depend on x).

Important notes: To verify that a family of pdfs or pmfs is an exponential family:
1. Identify the functions h(x), c(θ), t_i(x), and w_i(θ) and check that they satisfy the conditions above.
2. Show that the family has the form of (3.4.1).
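Definition 3.4.1 can be checked mechanically for a concrete family. The following Python sketch (our own illustration, not part of the lecture; the helper names are hypothetical) verifies numerically that the Poisson pmf factors as h(x)c(λ)exp(w(λ)t(x)) with h(x) = 1/x!, c(λ) = e^{−λ}, w(λ) = log λ, and t(x) = x:

```python
import math

# Exponential-family factorization of the Poisson pmf (illustrative sketch):
# f(x | lam) = h(x) * c(lam) * exp(w(lam) * t(x))
# with h(x) = 1/x!, c(lam) = exp(-lam), w(lam) = log(lam), t(x) = x.

def poisson_pmf(x, lam):
    """Direct Poisson pmf: lam^x * e^(-lam) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def poisson_expfam(x, lam):
    """Same pmf assembled from the exponential-family pieces."""
    h = 1.0 / math.factorial(x)
    c = math.exp(-lam)
    return h * c * math.exp(math.log(lam) * x)

# The two forms agree for every x and lam tested.
for lam in (0.5, 1.0, 3.7):
    for x in range(10):
        assert abs(poisson_pmf(x, lam) - poisson_expfam(x, lam)) < 1e-12
print("factorization ok")
```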
Example: Several exponential families (binomial, Poisson, exponential, normal).

Solution:

(1) f(x|n,p) = (n choose x) p^x (1−p)^(n−x) = (n choose x) (1−p)^n exp(x log(p/(1−p))),

so h(x) = (n choose x), c(p) = (1−p)^n, t(x) = x, and w(p) = log(p/(1−p)).

Note: Here 0 < p < 1. The pmf f(x|p) has different forms for p = 0, 0 < p < 1, and p = 1, and the formula above must match for all x. Therefore, f(x|n,p) defines an exponential family only if 0 < p < 1.

(2) f(x|λ) = λ^x e^{−λ}/x! = (1/x!) e^{−λ} exp(x log λ), so h(x) = 1/x!, c(λ) = e^{−λ}, t(x) = x, and w(λ) = log λ.

(3) f(x|β) = (1/β) exp(−x/β), x ≥ 0, β > 0, so h(x) = 1, c(β) = 1/β, t(x) = x, and w(β) = −1/β.

(4) f(x|μ,σ²) = (1/(√(2π)σ)) exp(−(x−μ)²/(2σ²)) = (1/(√(2π)σ)) exp(−μ²/(2σ²)) exp(−x²/(2σ²) + μx/σ²),

so h(x) = 1, c(μ,σ) = (1/(√(2π)σ)) exp(−μ²/(2σ²)), t_1(x) = −x²/2, w_1(μ,σ) = 1/σ², t_2(x) = x, and w_2(μ,σ) = μ/σ².

Theorem 3.4.2: If X is a random variable with pdf or pmf of the form (3.4.1), then

E(Σ_{i=1}^k (∂w_i(θ)/∂θ_j) t_i(X)) = −(∂/∂θ_j) log c(θ),

Var(Σ_{i=1}^k (∂w_i(θ)/∂θ_j) t_i(X)) = −(∂²/∂θ_j²) log c(θ) − E(Σ_{i=1}^k (∂²w_i(θ)/∂θ_j²) t_i(X)).
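The expectation identity in Theorem 3.4.2 can be sanity-checked numerically. A minimal sketch for the Poisson family, where w(λ) = log λ and log c(λ) = −λ, so both sides of the identity equal 1 (the helper names are ours, not from the text):

```python
import math

# Numerical check of the moment identity in Theorem 3.4.2 for the Poisson
# family: E( w'(lam) * t(X) ) = -d/dlam log c(lam), with w(lam) = log(lam),
# t(x) = x, and log c(lam) = -lam.

def poisson_pmf(x, lam):
    return lam ** x * math.exp(-lam) / math.factorial(x)

def lhs(lam, nmax=60):
    # E( w'(lam) * X ) = (1/lam) * E(X), by truncated summation over the pmf
    ex = sum(x * poisson_pmf(x, lam) for x in range(nmax))
    return ex / lam

def rhs(lam, eps=1e-6):
    # -d/dlam log c(lam) via a central finite difference;
    # here log c(lam) = -lam, so the exact value is 1.
    log_c = lambda t: -t
    return -(log_c(lam + eps) - log_c(lam - eps)) / (2 * eps)

for lam in (0.5, 2.0, 7.0):
    assert abs(lhs(lam) - rhs(lam)) < 1e-8  # both sides equal 1
print("identity verified")
```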
Proof: Omitted.

Example 3.4.3 (Binomial mean and variance)

Solution: With w(p) = log(p/(1−p)) and c(p) = (1−p)^n,

(d/dp) w(p) = (d/dp) log(p/(1−p)) = 1/(p(1−p)),
(d²/dp²) w(p) = −1/p² + 1/(1−p)²,
(d/dp) log c(p) = (d/dp) n log(1−p) = −n/(1−p),
(d²/dp²) log c(p) = −n/(1−p)².

Therefore, applying Theorem 3.4.2 with t(X) = X, we have

E(X/(p(1−p))) = n/(1−p), so E(X) = np,

and

Var(X/(p(1−p))) = n/(1−p)² − (−1/p² + 1/(1−p)²) E(X),

so that

Var(X) = (p(1−p))² (n/(1−p)² + n/p − np/(1−p)²) = (p(1−p))² · n/(p(1−p)) = np(1−p).

Example (Normal mean and variance)

Solution:
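The conclusion of Example 3.4.3 can be confirmed by brute-force summation over the binomial pmf; a small illustrative Python check (helper names are ours):

```python
import math

# Direct check of Example 3.4.3: for X ~ binomial(n, p),
# E(X) = n*p and Var(X) = n*p*(1-p), by summing over the pmf.

def binom_pmf(x, n, p):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_moments(n, p):
    mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
    var = sum((x - mean) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))
    return mean, var

n, p = 20, 0.3
mean, var = binom_moments(n, p)
assert abs(mean - n * p) < 1e-10           # E(X) = np = 6.0
assert abs(var - n * p * (1 - p)) < 1e-10  # Var(X) = np(1-p) = 4.2
print(mean, var)
```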
With t_1(x) = −x²/2, w_1(μ,σ) = 1/σ², t_2(x) = x, w_2(μ,σ) = μ/σ², and c(μ,σ) = (1/(√(2π)σ)) exp(−μ²/(2σ²)):

(∂/∂μ) w_1(μ,σ) = (∂/∂μ)(1/σ²) = 0,
(∂/∂μ) w_2(μ,σ) = (∂/∂μ)(μ/σ²) = 1/σ²,
(∂/∂σ) w_1(μ,σ) = (∂/∂σ)(1/σ²) = −2/σ³,
(∂/∂σ) w_2(μ,σ) = (∂/∂σ)(μ/σ²) = −2μ/σ³,
(∂/∂μ) log c(μ,σ) = (∂/∂μ)(−0.5 log(2π) − log σ − μ²/(2σ²)) = −μ/σ²,
(∂/∂σ) log c(μ,σ) = (∂/∂σ)(−0.5 log(2π) − log σ − μ²/(2σ²)) = −1/σ + μ²/σ³.

Therefore, applying Theorem 3.4.2 with θ_j = μ we have

E(X)/σ² = μ/σ²,

and with θ_j = σ,

E(X²)/σ³ − 2μE(X)/σ³ = 1/σ − μ²/σ³.

From these we obtain E(X) = μ and E(X²) = σ² + μ².

Definition 3.4.5: The indicator function of a set A, most often denoted by I_A(x), is the function
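The normal moments derived above can be confirmed by direct numerical integration; a rough sketch using a midpoint Riemann sum (function names are ours, not from the lecture):

```python
import math

# Numerical check of the normal moments E(X) = mu and E(X^2) = sigma^2 + mu^2
# derived via Theorem 3.4.2, using a simple midpoint Riemann sum.

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def moment(k, mu, sigma, lo=-50.0, hi=50.0, steps=200_000):
    # E(X^k) approximated by sum of x^k * pdf(x) * dx at midpoints
    dx = (hi - lo) / steps
    return sum((lo + (i + 0.5) * dx) ** k * normal_pdf(lo + (i + 0.5) * dx, mu, sigma) * dx
               for i in range(steps))

mu, sigma = 1.5, 2.0
assert abs(moment(1, mu, sigma) - mu) < 1e-5
assert abs(moment(2, mu, sigma) - (sigma ** 2 + mu ** 2)) < 1e-5
print("moments verified")
```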
I_A(x) = 1 if x ∈ A, and I_A(x) = 0 if x ∉ A.

Alternatively, we can use the notation I(x ∈ A).

Example: Let X have the pdf

f(x|θ) = (1/θ) exp(1 − x/θ), 0 < θ < x < ∞,

which can be written with an indicator function as f(x|θ) = (1/θ) exp(1 − x/θ) I_[θ,∞)(x). Show that this is not an exponential family.

Solution: The support of the pdf depends on θ: the factor I_[θ,∞)(x) depends on x and θ jointly, so it cannot be factored as h(x)c(θ), and it cannot be absorbed into exp(Σ_i w_i(θ)t_i(x)), since the exponential factor is strictly positive. Hence f(x|θ) cannot be written in the form (3.4.1), and the family is not an exponential family.

Example (revisited): The exponential families above, rewritten with indicator functions so that the support is carried by h(x).

(1) f(x|n,p) = (n choose x) I_{0,1,...,n}(x) p^x (1−p)^(n−x) = (n choose x) I_{0,1,...,n}(x) (1−p)^n exp(x log(p/(1−p))),

so h(x) = (n choose x) I_{0,1,...,n}(x), c(p) = (1−p)^n, t(x) = x, and w(p) = log(p/(1−p)).

Note: As before, 0 < p < 1, since f(x|p) has different forms for p = 0, 0 < p < 1, and p = 1, and the formula must match for all x. Therefore, f(x|n,p) is an exponential family only if 0 < p < 1.
(2) f(x|λ) = (λ^x e^{−λ}/x!) I_{0,1,2,...}(x) = (I_{0,1,2,...}(x)/x!) e^{−λ} exp(x log λ), so h(x) = I_{0,1,2,...}(x)/x!, c(λ) = e^{−λ}, t(x) = x, and w(λ) = log λ.

(3) f(x|β) = (1/β) exp(−x/β) I_[0,∞)(x), β > 0, so h(x) = I_[0,∞)(x), c(β) = 1/β, t(x) = x, and w(β) = −1/β.

(4) f(x|μ,σ²) = (1/(√(2π)σ)) exp(−(x−μ)²/(2σ²)) = (1/(√(2π)σ)) exp(−μ²/(2σ²)) exp(−x²/(2σ²) + μx/σ²), so h(x) = 1 (the support is already all of the real line), c(μ,σ) = (1/(√(2π)σ)) exp(−μ²/(2σ²)), t_1(x) = −x²/2, w_1(μ,σ) = 1/σ², t_2(x) = x, and w_2(μ,σ) = μ/σ².

A Re-parameterization of Exponential Families (Canonical Form):

f(x|η) = h(x) c*(η) exp(Σ_{i=1}^k η_i t_i(x)),

where h(x) and t_i(x) are the same as in the original parameterization. The set

H = {η = (η_1, ..., η_k): ∫ h(x) exp(Σ_{i=1}^k η_i t_i(x)) dx < ∞}

is called the natural parameter space for the family.

Example 3.4.6 (Re-parameterization of the normal distribution)

Solution: f(x|η_1,η_2) = √(η_2/(2π)) exp(−η_1²/(2η_2)) exp(η_1 x − η_2 x²/2), where η_1 = μ/σ² and η_2 = 1/σ² (with η_2 > 0).
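The canonical form of Example 3.4.6 can be checked pointwise against the usual normal pdf; a short illustrative script (the function names are ours):

```python
import math

# Check that the canonical (natural-parameter) form of the normal density,
# with eta1 = mu/sigma^2 and eta2 = 1/sigma^2, reproduces the usual pdf.

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def normal_canonical(x, eta1, eta2):
    # c*(eta) = sqrt(eta2/(2*pi)) * exp(-eta1^2/(2*eta2)); h(x) = 1
    c_star = math.sqrt(eta2 / (2 * math.pi)) * math.exp(-eta1 ** 2 / (2 * eta2))
    return c_star * math.exp(eta1 * x - eta2 * x ** 2 / 2)

mu, sigma = -0.7, 1.8
eta1, eta2 = mu / sigma ** 2, 1 / sigma ** 2
for x in (-3.0, 0.0, 0.5, 2.2):
    assert abs(normal_pdf(x, mu, sigma) - normal_canonical(x, eta1, eta2)) < 1e-12
print("canonical form ok")
```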
Definition 3.4.7: A curved exponential family is a family of densities of the form (3.4.1) for which the dimension of the vector θ is equal to d < k. If d = k, the family is called a full exponential family.

Example 3.4.8: The normal distribution with mean μ and variance σ² = μ² is a curved exponential family.

Notes:
- Theorem 3.4.2 also applies to curved exponential families.
- Exponential families have nice properties that are very useful in statistical inference.

Section 3.5: Location and Scale Families

Three types of families of interest:
1. location families
2. scale families
3. location-scale families

Notes: Each of these families is constructed from a single pdf (or pmf), known as the standard pdf (pmf) for the family. All other pdfs (or pmfs) in the family are obtained by transforming the standard pdf (or pmf) in a prescribed way.
Theorem 3.5.1: Let f(x) be any pdf and let μ and σ > 0 be any given constants. Then the function

g(x|μ,σ) = (1/σ) f((x−μ)/σ)

is a valid pdf.

Proof: g(x|μ,σ) = (1/σ) f((x−μ)/σ) ≥ 0 for all x, and

∫ g(x|μ,σ) dx = ∫ (1/σ) f((x−μ)/σ) dx = ∫ f(y) dy = 1   (substituting y = (x−μ)/σ).

Definition 3.5.2: Let f(x) be any pdf. Then the family of pdfs f(x−μ), indexed by the parameter μ (−∞ < μ < ∞), is called the location family with standard pdf f(x), and μ is called the location parameter for the family.

Notes:
- The location parameter μ shifts the density to the left or right; the shape remains unchanged.
- If Z has pdf f(z), then X = Z + μ has density f(x−μ).

Example 3.5.3 (Exponential location family): Let f(x) = e^{−x} for x ≥ 0 and f(x) = 0 for x < 0. To form a location family, we replace x with x−μ to obtain

f(x|μ) = e^{−(x−μ)} for x − μ ≥ 0 (that is, x ≥ μ), and f(x|μ) = 0 for x < μ.
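Theorem 3.5.1 can be illustrated numerically: transform a standard pdf and check that the result still integrates to 1. A sketch using the standard exponential pdf of Example 3.5.3 (helper names are our own):

```python
import math

# Theorem 3.5.1: g(x | mu, sigma) = (1/sigma) * f((x - mu)/sigma) is a pdf
# whenever f is. Quick numerical check with the standard exponential
# f(z) = e^{-z} for z >= 0, integrated by a midpoint Riemann sum.

def f(z):
    return math.exp(-z) if z >= 0 else 0.0

def g(x, mu, sigma):
    return f((x - mu) / sigma) / sigma

def integrate(func, lo, hi, steps=100_000):
    dx = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * dx) * dx for i in range(steps))

# The support of g is [mu, infinity); the upper limit truncates a
# negligible tail.
total = integrate(lambda x: g(x, 3.0, 0.5), 3.0, 60.0)
assert abs(total - 1.0) < 1e-4  # integrates to 1 up to grid/truncation error
print(total)
```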
Using the indicator function, f(x|μ) = e^{−(x−μ)} I_[0,∞)(x−μ) = e^{−(x−μ)} I_[μ,∞)(x).

Definition 3.5.4: Let f(x) be any pdf. Then for any σ > 0, the family of pdfs (1/σ) f(x/σ), indexed by the parameter σ, is called the scale family with standard pdf f(x), and σ is called the scale parameter of the family.

Note: The scale parameter σ either stretches or contracts the graph of f(x) while maintaining its basic shape.

Example (Normal distribution):

f(x|σ) = (1/(√(2π)σ)) exp(−x²/(2σ²)), −∞ < x < ∞, σ > 0.

Definition 3.5.5: Let f(x) be any pdf. Then for any μ (−∞ < μ < ∞) and any σ > 0, the family of pdfs (1/σ) f((x−μ)/σ), indexed by the parameter (μ, σ), is called the location-scale family with standard pdf f(x); μ is called the location parameter and σ is called the scale parameter.

Examples (Normal and double exponential distributions):

f(x|μ,σ) = (1/(√(2π)σ)) exp(−(x−μ)²/(2σ²)), −∞ < x < ∞, −∞ < μ < ∞, σ > 0,

f(x|μ,σ) = (1/(2σ)) exp(−|x−μ|/σ), −∞ < x < ∞, −∞ < μ < ∞, σ > 0.
Theorem 3.5.6: Let f(·) be any pdf. Let μ be any real number and let σ be any positive real number. Then X is a random variable with pdf (1/σ) f((x−μ)/σ) if and only if there exists a random variable Z with pdf f(z) such that X = σZ + μ.

Proof: To prove the "if" part, define g(z) = σz + μ and let X = g(Z). Here g(z) is a monotone function with g^{−1}(x) = (x−μ)/σ and (d/dx) g^{−1}(x) = 1/σ. Thus by Theorem 2.1.5 (the transformation theorem), we have

f_X(x) = f_Z(g^{−1}(x)) |(d/dx) g^{−1}(x)| = (1/σ) f((x−μ)/σ).

The "only if" part is proved similarly: define g(x) = (x−μ)/σ and let Z = g(X).

Theorem 3.5.7: Let Z be a random variable with pdf f(z). Suppose E(Z) and Var(Z) exist. If X is a random variable with pdf (1/σ) f((x−μ)/σ), then

E(X) = σE(Z) + μ and Var(X) = σ²Var(Z).

Proof: Write X = σZ + μ (by Theorem 3.5.6) and take expectations.

Section 3.6: Inequalities and Identities

Theorem 3.6.1 (Chebychev's Inequality): Let X be a random variable and let g(x) be a nonnegative function. Then, for any r > 0, we have

P(g(X) ≥ r) ≤ E g(X) / r.
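Theorems 3.5.6 and 3.5.7 suggest a simple Monte Carlo illustration: generate Z from the standard pdf, set X = σZ + μ, and compare sample moments with σE(Z) + μ and σ²Var(Z). A sketch (our own, with Z taken standard normal so that E(Z) = 0 and Var(Z) = 1):

```python
import random

# Monte Carlo illustration of Theorems 3.5.6/3.5.7: if Z has the standard
# pdf and X = sigma*Z + mu, then E(X) = sigma*E(Z) + mu and
# Var(X) = sigma^2 * Var(Z).

random.seed(0)
mu, sigma = 4.0, 3.0
z = [random.gauss(0.0, 1.0) for _ in range(200_000)]
x = [sigma * zi + mu for zi in z]

mean_x = sum(x) / len(x)
var_x = sum((xi - mean_x) ** 2 for xi in x) / len(x)

assert abs(mean_x - mu) < 0.05        # E(X) = sigma*0 + mu = 4
assert abs(var_x - sigma ** 2) < 0.3  # Var(X) = sigma^2 * 1 = 9
print(mean_x, var_x)
```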
Proof:

E g(X) = ∫ g(x) f_X(x) dx ≥ ∫_{x: g(x) ≥ r} g(x) f_X(x) dx ≥ r ∫_{x: g(x) ≥ r} f_X(x) dx = r P(g(X) ≥ r).

Example 3.6.2: Let g(x) = (x−μ)²/σ², where μ = E(X) and σ² = Var(X), and let r = t² for convenience. Then

P((X−μ)²/σ² ≥ t²) ≤ (1/t²) E((X−μ)²/σ²) = 1/t².

Therefore, it follows that P(|X−μ| ≥ tσ) ≤ 1/t² and P(|X−μ| < tσ) ≥ 1 − 1/t². For instance, if t = 3, then P(|X−μ| < 3σ) ≥ 1 − 1/9 = 0.8889. Hence, the probability that any random variable is within 3 standard deviations of its mean is at least 88.89%.

Example 3.6.3 (A normal probability inequality): If Z ~ n(0, 1), then

P(Z ≥ t) ≤ (1/(t√(2π))) e^{−t²/2} for all t > 0.

Proof: Since x/t ≥ 1 over the range of integration,

P(Z ≥ t) = ∫_t^∞ (1/√(2π)) e^{−x²/2} dx ≤ ∫_t^∞ (x/t)(1/√(2π)) e^{−x²/2} dx = (1/(t√(2π))) e^{−t²/2}.
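The two tail bounds can be compared with the exact normal tail probability, which Python's math.erfc supplies via P(|Z| ≥ t) = erfc(t/√2). An illustrative comparison (helper names are ours):

```python
import math

# Compare the Chebychev bound, the (two-sided) normal tail bound from the
# example, and the exact tail probability P(|Z| >= t) for a standard normal Z.

def chebychev_bound(t):
    return 1.0 / t ** 2

def normal_tail_bound(t):
    # two-sided version of the bound: 2 * e^{-t^2/2} / (t * sqrt(2*pi))
    return 2.0 * math.exp(-t ** 2 / 2) / (t * math.sqrt(2 * math.pi))

def exact_two_sided(t):
    # P(|Z| >= t) = erfc(t / sqrt(2))
    return math.erfc(t / math.sqrt(2))

# The exact probability never exceeds either bound, and the normal tail
# bound is much tighter than Chebychev's for these t.
for t in (1.5, 2.0, 3.0):
    assert exact_two_sided(t) <= normal_tail_bound(t) <= chebychev_bound(t)
print(round(chebychev_bound(2.0), 4), round(normal_tail_bound(2.0), 4),
      round(exact_two_sided(2.0), 4))
```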
Note: Chebychev's Inequality is widely applicable but very conservative. For example, if Z ~ n(0, 1) and t = 2, then Chebychev's Inequality gives P(|Z| ≥ t) ≤ 1/t² = 1/4 = 0.25. By the example above, P(|Z| ≥ t) ≤ (2/(t√(2π))) e^{−t²/2} = 0.054, while the exact value is P(|Z| ≥ t) = 0.0455. In fact, one can prove the sharper two-sided bounds

(t/(1+t²)) (e^{−t²/2}/√(2π)) ≤ P(Z ≥ t) ≤ (1/t) (e^{−t²/2}/√(2π)), t > 0.

Lemma 3.6.5 (Stein's Lemma): Let X ~ n(μ, σ²), and let g be a differentiable function satisfying E|g′(X)| < ∞. Then

E[g(X)(X − μ)] = σ² E g′(X).

Proof: Integrating by parts,

E[g(X)(X−μ)] = (1/(√(2π)σ)) ∫ g(x)(x−μ) exp(−(x−μ)²/(2σ²)) dx
= (1/(√(2π)σ)) [−σ² g(x) exp(−(x−μ)²/(2σ²)) |_{−∞}^{∞} + σ² ∫ g′(x) exp(−(x−μ)²/(2σ²)) dx]
= σ² E g′(X),

where the boundary term vanishes.

Example 3.6.6 (Higher-order normal moments): If X ~ n(μ, σ²), then

E(X) = E(X − μ + μ) = E(X − μ) + μ = μ,
E(X²) = E(X(X − μ + μ)) = E(X(X − μ)) + μE(X) = σ² + μ²,

using Stein's Lemma with g(x) = x.
E(X³) = E(X²(X − μ + μ)) = E(X²(X − μ)) + μE(X²) = σ²E(2X) + μE(X²)
= 2μσ² + μ(σ² + μ²) = 3μσ² + μ³,

using Stein's Lemma with g(x) = x².
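Stein's Lemma and the resulting third moment can be verified by numerical integration; a closing sketch (the helper names are ours, not from the lecture):

```python
import math

# Numerical check of Stein's Lemma and Example 3.6.6: for X ~ n(mu, sigma^2)
# and g(x) = x^2, E[X^2 (X - mu)] = sigma^2 * E(2X), which gives
# E(X^3) = 3*mu*sigma^2 + mu^3.

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def expect(func, mu, sigma, lo=-60.0, hi=60.0, steps=200_000):
    # E[func(X)] by a midpoint Riemann sum
    dx = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * dx) * normal_pdf(lo + (i + 0.5) * dx, mu, sigma) * dx
               for i in range(steps))

mu, sigma = 1.2, 2.0
# Stein's Lemma with g(x) = x^2:
lhs = expect(lambda x: x ** 2 * (x - mu), mu, sigma)
rhs = sigma ** 2 * expect(lambda x: 2 * x, mu, sigma)
assert abs(lhs - rhs) < 1e-4
# Resulting third moment:
assert abs(expect(lambda x: x ** 3, mu, sigma) - (3 * mu * sigma ** 2 + mu ** 3)) < 1e-4
print("Stein's lemma verified")
```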