2.1 Probability, stochastic variables and distribution functions
Chapter 2 Probability and statistics

2.1 Probability, stochastic variables and distribution functions

The defining characteristic of a stochastic experiment $E$ is that it produces different outcomes under ostensibly similar circumstances. Although it is not possible to know with certainty the outcome of a stochastic experiment, it is possible to describe all possible outcomes. We denote by $\Omega$ the set of all possible outcomes from an experiment. The set $\Omega$ can be finite, countable or uncountable. Here are some examples:

$E$: toss a coin, $\Omega = \{Heads, Tails\}$;
$E$: observe the number of days a light bulb lasts, $\Omega = \{0, 1, \dots\}$;
$E$: observe the price of a stock at time $t$, $\Omega = (0, \infty)$.

The collection $\mathcal{F}$ of subsets of $\Omega$ that are of interest are called events. The collection of events $\mathcal{F}$ must satisfy

1. $\Omega \in \mathcal{F}$;
2. if $E \in \mathcal{F}$ then $E^c \in \mathcal{F}$;
3. if $E_1, E_2, \dots \in \mathcal{F}$ then $\cup_{i=1}^{\infty} E_i \in \mathcal{F}$.

We associate with each event $E$ a probability by defining the set function

  $P(E): \mathcal{F} \to [0,1]$,   (2.1)

where $\mathcal{F}$ is the collection of events. $P$ must satisfy
1. $P(E) \geq 0$, $P(\Omega) = 1$;
2. for $E_1, E_2, \dots \in \mathcal{F}$ such that $E_i \cap E_j = \emptyset$ we have $P(\cup_{i=1}^{\infty} E_i) = \sum_{i=1}^{\infty} P(E_i)$.

The set function $P$ has the following properties:

1. for any event $E$, $P(E^c) = 1 - P(E)$;
2. $P(\emptyset) = 0$;
3. if $A \subseteq B$ then $P(A) \leq P(B)$ (monotonicity property);
4. for any two events $A$ and $B$, $P(A \cup B) = P(A) + P(B) - P(A \cap B) \leq P(A) + P(B)$;¹
5. if $A_1 \subseteq A_2 \subseteq A_3 \subseteq \cdots$ and we define $A = \cup_{i=1}^{\infty} A_i$, then $P(A) = \lim_{n \to \infty} P(A_n)$. Also, if $A_1 \supseteq A_2 \supseteq A_3 \supseteq \cdots$ and we define $A = \cap_{i=1}^{\infty} A_i$, then $P(A) = \lim_{n \to \infty} P(A_n)$ (continuity property).

For any two events $E_1$ and $E_2$, the conditional probability of $E_2$ given $E_1$ is denoted by

  $P(E_2 | E_1) = \dfrac{P(E_2 \cap E_1)}{P(E_1)}$

whenever $P(E_1) > 0$. For fixed $E_2$, $P(E_1 | E_2)$ is a legitimate probability measure in that it satisfies requirements 1 and 2 following (2.1).

The events $E_1, E_2, \dots, E_n$ are independent if for any $1 \leq i_1 < i_2 < \cdots < i_k \leq n$, where $k = 1, \dots, n$, we have

  $P(E_{i_1} \cap E_{i_2} \cap \cdots \cap E_{i_k}) = P(E_{i_1}) \cdots P(E_{i_k})$.   (2.2)

We call $\{B_1, B_2, \dots, B_K\}$ a partition of $\Omega$ if $\cup_{i=1}^{K} B_i = \Omega$ and for all $i \neq j$ we have $B_i \cap B_j = \emptyset$. For any event $A$, we have that

  $A = (A \cap B_1) \cup (A \cap B_2) \cup \cdots \cup (A \cap B_K)$

and by construction

  $P(A) = P(A \cap B_1) + P(A \cap B_2) + \cdots + P(A \cap B_K)$.

Thus,

  $P(B_j | A) = \dfrac{P(B_j \cap A)}{P(A)} = \dfrac{P(A | B_j) P(B_j)}{P(A)} = \dfrac{P(A | B_j) P(B_j)}{\sum_{i=1}^{K} P(A | B_i) P(B_i)}$.   (2.3)

Equation (2.3) is called Bayes' law. It allows the calculation of the probability associated with a particular event (in this case $B_j$) with knowledge that event $A$ has occurred. What follows is a simple example of how Bayes' law can be used to update our beliefs on probabilities.

¹ In fact, if $A_1, A_2, \dots$ is a sequence of events, $P(\cup_{i=1}^{\infty} A_i) \leq \sum_{i=1}^{\infty} P(A_i)$ (subadditivity property).
Example 2.1. Suppose the stochastic experiment under consideration is the tossing of a coin. Then $\Omega = \{H, T\}$ and let $\mathcal{F} = \{\emptyset, \{H\}, \{T\}, \{H, T\}\}$. Let $B_1 = \{H\}$, $B_2 = \{T\}$ and note that $B_1 \cap B_2 = \emptyset$ and $B_1 \cup B_2 = \Omega$. A fair coin is such that $a = P(B_1) = 0.5$ and consequently $P(B_2) = 1 - a = 0.5$. Suppose that our prior belief is that either $a = 0.5$ (coin fair) with probability 0.5 or $a = 0.8$ (coin unfair) with probability 0.5. Suppose that after tossing the coin three times we observe $H$ in each of the tosses. If $A$ represents such an occurrence and the tosses are independent, by Bayes' law

  $P(a = 0.5 | A) = \dfrac{0.5\, P(A | a = 0.5)}{0.5\, P(A | a = 0.5) + 0.5\, P(A | a = 0.8)} = \dfrac{0.5\,(0.5)^3}{0.5\,(0.5)^3 + 0.5\,(0.8)^3} \approx 0.196.$

$P(a = 0.5 | A)$ represents the probability that the coin is fair conditional, or updated, on the fact that three heads were observed after the tosses. It is commonly called the posterior probability (after observing the outcome of the tosses) that the coin is fair. Not surprisingly, it is smaller than the prior probability (0.5).

Let $X$ be a function $X(s): \Omega \to \mathbb{R}$. We call $X$ a stochastic variable if, and only if, the inverse image of any interval $(-\infty, x]$ under $X$ is an event. Formally, $X^{-1}((-\infty, x]) \in \mathcal{F}$ for all $x \in \mathbb{R}$. In this case, we can write

  $P(X^{-1}((-\infty, x])) = P \circ X^{-1}((-\infty, x]) = P_X((-\infty, x])$   (2.4)

for all $x \in \mathbb{R}$. We can now think of $\Omega$ as the real line $\mathbb{R}$, the events as (suitable) subsets of $\mathbb{R}$ and $P_X$ as a probability on subsets of $\mathbb{R}$. When we fix $X$ and $P$, that is, when we have a particular stochastic variable and probability in mind, we can think of $P_X((-\infty, x])$ as a function of $x$ and define the distribution function of $X$ as

  $F_X(x) = P_X((-\infty, x])$ for all $x \in \mathbb{R}$.   (2.5)

We have the following properties for $F_X$.

Theorem 2.1. Let $F_X(x): \mathbb{R} \to [0,1]$ be the distribution function associated with the stochastic variable $X$. Then, (a) $F_X$ is continuous from the right; (b) $F_X$ is monotonically nondecreasing; (c) $\lim_{x \to \infty} F_X(x) = 1$ and $\lim_{x \to -\infty} F_X(x) = 0$.

Proof. We first prove (b). Note that for $x < y$, $(-\infty, x] \subset (-\infty, y]$, hence $F_X(x) = P_X((-\infty, x]) \leq P_X((-\infty, y]) = F_X(y)$. For (c), consider a sequence $\{x_n\}$ such that $x_n \uparrow \infty$. Then
$F(x_n) = P_X((-\infty, x_n])$ and $\lim_{n \to \infty} F(x_n) = \lim_{n \to \infty} P_X((-\infty, x_n]) = P_X(\lim_{n \to \infty} (-\infty, x_n]) = P_X(\mathbb{R}) = 1$. Similarly, if $\{x_n\}$ is such that $x_n \downarrow -\infty$, then $F(x_n) = P_X((-\infty, x_n])$ and $\lim_{n \to \infty} F(x_n) =$
$\lim_{n \to \infty} P_X((-\infty, x_n]) = P_X(\lim_{n \to \infty} (-\infty, x_n]) = P_X(\emptyset) = 0$. Finally, to prove (a) we need to show that $F(x_n) \to F_X(x)$ as $x_n \downarrow x$, but this follows directly from the fact that $(-\infty, x_n] \downarrow (-\infty, x]$ and the continuity property of $P_X$.

It is convenient to classify stochastic variables as discrete or continuous. Discrete stochastic variables are those whose image forms a countable set. Continuous stochastic variables have uncountable image. Hence, for a discrete stochastic variable $X$ that takes values $\{x_1, x_2, \dots\}$ we have that $A_j = X^{-1}(x_j)$ for $j = 1, 2, \dots$ are events with $P(A_j) = P_X(X = x_j) = p_j > 0$ and $\sum_{j=1}^{\infty} p_j = 1$.

We say that $X$ (or $F_X$) is absolutely continuous if there exists a non-negative function $f_X: \mathbb{R} \to [0, \infty)$ that satisfies

  $F_X(a) = \int_{-\infty}^{a} f_X(x)\,dx$.   (2.6)

We call $f_X$ the density function associated with $X$. It is easy to verify that $P(a \leq X \leq b) = \int_{a}^{b} f_X(x)\,dx = F_X(b) - F_X(a)$, $\lim_{a \to \infty} F_X(a) = 1$ and $\lim_{a \to -\infty} F_X(a) = 0$.

If $F_X(x)$ is continuous and strictly increasing, it has an inverse function which we denote by $F_X^{-1}: (0,1) \to \mathbb{R}$. For each $q \in (0,1)$ there exists $x_q$ such that

  $x_q = F_X^{-1}(q) \iff F_X(x_q) = q$.   (2.7)

$x_q$ is called the $q$-quantile associated with the distribution of the stochastic variable $X$. For example, if $q = 0.95$, then $x_{0.95}$ is the value of the stochastic variable that will be exceeded with probability 5 percent. Or, alternatively, the value of the stochastic variable that will not be exceeded with probability 0.95. If $F$ is not strictly increasing, then there might exist several values of $X$ associated with a particular quantile $q$. If $X$ is a discrete stochastic variable, that is, a stochastic variable that takes on countably many values, then $F_X$ is a step function and does not have an inverse.

2.2 Expectation and variance

The expected value of a continuous stochastic variable $X$, denoted by $E(X)$, is given (whenever it exists) by

  $E(X) = \int_{\mathbb{R}} x f_X(x)\,dx$.   (2.8)

If the stochastic variable is discrete, taking on countably many values $\{x_1, x_2, \dots\}$, we write

  $E(X) = \sum_{i=1}^{\infty} x_i P(X = x_i)$   (2.9)
whenever the summation exists. The variance of a stochastic variable $X$, denoted by $V(X)$, is given (whenever it exists) by

  $V(X) = E((X - E(X))^2)$.   (2.10)

It is easy to show that $V(X) = E(X^2) - (E(X))^2$. The standard deviation of a stochastic variable is given by $\sqrt{V(X)}$.

Note that $E(X) = \int_{\mathbb{R}} x f_X(x)\,dx = \int_{-\infty}^{0} x f_X(x)\,dx + \int_{0}^{\infty} x f_X(x)\,dx = I_1 + I_2$. If $I_1$ and $I_2$ are both finite, then $E(X)$ exists as a real number. If $I_1$ ($I_2$) is a real number but $I_2 = \infty$ ($I_1 = -\infty$), then $E(X) = \infty$ ($-\infty$), and if $I_1 = -\infty$ and $I_2 = \infty$ then $E(X)$ is not defined, or does not exist. If $E(X)$ does not exist, neither does $V(X)$. Note also that even if $E(X)$ is finite, $V(X)$ can be infinite provided that $E(X^2) = \infty$.

2.3 Functions of stochastic variables

If $X$ is a stochastic variable and $g$ is a continuous function defined on the set in which $X$ takes values, then $Y = g(X)$ is also a stochastic variable. Furthermore, if $g$ is strictly increasing,

  $p = P(Y \leq y) = F_Y(y) = P(g(X) \leq y) = P(X \leq g^{-1}(y)) = F_X(g^{-1}(y))$,   (2.11)

and differentiating we have

  $\dfrac{d}{dy} F_Y(y) = \dfrac{d}{dy} \int_{-\infty}^{y} f_Y(z)\,dz = f_Y(y) = \dfrac{d}{dy} F_X(g^{-1}(y)) = f_X(g^{-1}(y)) \dfrac{d}{dy} g^{-1}(y)$.

If $g$ is strictly decreasing, we have

  $p = P(Y \leq y) = F_Y(y) = P(g(X) \leq y) = P(X \geq g^{-1}(y)) = 1 - F_X(g^{-1}(y))$,   (2.12)

and differentiating we have

  $\dfrac{d}{dy} F_Y(y) = \dfrac{d}{dy} \int_{-\infty}^{y} f_Y(z)\,dz = f_Y(y) = \dfrac{d}{dy} (1 - F_X(g^{-1}(y))) = -f_X(g^{-1}(y)) \dfrac{d}{dy} g^{-1}(y)$.

Hence, for strictly monotone functions we have

  $f_Y(y) = f_X(g^{-1}(y)) \left| \dfrac{d}{dy} g^{-1}(y) \right|$.   (2.13)

Also, note that if $p = F_Y(y)$, from equation (2.11) we have that

  $F_X^{-1}(p) = g^{-1}(y) \implies g(F_X^{-1}(p)) = y = F_Y^{-1}(p)$.   (2.14)

That is, the $p$-quantile of $Y$ is just the mapping under $g$ of the $p$-quantile of $X$.
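The change-of-variables formula (2.13) can be checked numerically. The sketch below (plain Python; the illustrative map $g(x) = x^2$ is our choice, not from the text) takes $X \sim U[0,1]$, so $f_X = 1$, and verifies that integrating $f_Y(y) = f_X(\sqrt{y})\,\frac{1}{2\sqrt{y}}$ reproduces $F_Y(y) = F_X(g^{-1}(y))$ from (2.11):

```python
import math

# With X ~ U[0,1] (f_X = 1) and the strictly increasing map g(x) = x^2,
# equation (2.13) gives f_Y(y) = 1 / (2*sqrt(y)) on (0, 1].
def f_Y(y):
    return 1.0 / (2.0 * math.sqrt(y))

# By (2.11), F_Y(0.25) = F_X(g^{-1}(0.25)) = F_X(0.5) = 0.5.
# Midpoint rule integrates f_Y from 0 to 0.25 (avoids the singularity at 0).
slices = 100_000
h = 0.25 / slices
F = sum(f_Y((k + 0.5) * h) * h for k in range(slices))
print(F)  # close to 0.5
```

The midpoint rule is used because $f_Y$ is unbounded (but integrable) at zero.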
Example 2.2. Let $g(x) = a + bx$ for $b \neq 0$. Then $g^{-1}(y) = \frac{y-a}{b}$ and $\frac{d}{dy} g^{-1}(y) = 1/b$ and

  $f_Y(y) = f_X\!\left(\dfrac{y-a}{b}\right) \dfrac{1}{|b|}$.

Also, $F_Y^{-1}(p) = a + b F_X^{-1}(p)$ if $b > 0$ and $F_Y^{-1}(p) = a + b F_X^{-1}(1-p)$ if $b < 0$.

2.4 Samples

A sample of size $n$ is a collection of values (realizations) of stochastic variables. We denote it by $\chi_n = \{x_1, x_2, \dots, x_n\}$ with $x_i$ being the realization of stochastic variable $X_i$. To avoid additional notation, we will use uppercase $X_i$ to denote both a stochastic variable (a function) and its realized value. As such, it will be clear from the context when $\{X_i\}_{i=1}^{n}$ represents a collection of stochastic variables or a collection of realizations of stochastic variables. If the stochastic variables $\{X_i\}_{i=1}^{n}$ are independent, i.e.,

  $P(\{X_1 \in A_1\} \cap \{X_2 \in A_2\} \cap \cdots \cap \{X_n \in A_n\}) = P(\{X_1 \in A_1\}) P(\{X_2 \in A_2\}) \cdots P(\{X_n \in A_n\})$,

and if $X_i \overset{d}{=} X$ for all $i$, we say that $\chi_n$ is a stochastic sample and $\{X_i\}_{i=1}^{n}$ is a sequence of independent and identically distributed stochastic variables.

The sample average, normally denoted by $\bar{X}$, is given by $\bar{X} = n^{-1} \sum_{i=1}^{n} X_i$. The sample variance, normally denoted by $s^2$, is given by $s^2 = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$ and the sample standard deviation is given by $s = \sqrt{s^2}$.

2.5 Parametric models

Often it is assumed that the density $f_X(x)$ associated with a stochastic variable $X$ is an element of an indexed class of densities. Let the index be represented by $\theta$ and the set in which the index takes values be represented by $\Theta \subseteq \mathbb{R}^K$, $K$ a positive integer. Then, we write $f_X(x; \theta)$, call $\theta$ a finite dimensional parameter and $\Theta$ the parameter space. If there exists a one-to-one relation between $\Theta$ and the class of densities we say that the parameter $\theta$ is identified. As a consequence, knowledge of $\theta$ is equivalent to knowledge of $f_X$. In this case, $E(X) = m(\theta)$, $V(X) = h(\theta)$ and $F_X^{-1}(q) = x_q(\theta)$.

Example 2.3. Let $Y = X + \mu$ for $\mu \in \mathbb{R}$. From above we have that $f_Y(y) = f_X(y - \mu)$, $E(Y) = E(X) + \mu$, $V(Y) = V(X)$ and $F_Y^{-1}(q) = \mu + F_X^{-1}(q)$ for $q \in (0,1)$.
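The sample statistics defined above can be sketched in a few lines. The notes use MATLAB for numerical work; here is a minimal Python version (function name and data are ours):

```python
import math

# Sample average, sample variance (with the (n-1) divisor, as in the text)
# and sample standard deviation for an illustrative sample.
def sample_stats(xs):
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return xbar, s2, math.sqrt(s2)

xbar, s2, s = sample_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(xbar, s2)  # 5.0 and 32/7
```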
More generally, let $Y = \theta X + \mu$ for $\mu \in \mathbb{R}$ and $\theta > 0$. Then $E(Y) = \theta E(X) + \mu$ and $V(Y) = \theta^2 V(X)$. Hence, we have that $f_Y(y; \mu, \theta) = \frac{1}{\theta} f_X\!\left(\frac{y-\mu}{\theta}\right)$. Note that $f_X$ can be viewed as a special case ($\mu = 0$ and $\theta = 1$) of a family of distributions given by $\mathcal{F} = \{f_Y(y; \mu, \theta): \mu \in \mathbb{R}, \theta \in (0, \infty)\}$. This is called a location-scale family, where the location is given by $\mu$ (location parameter) and the scale is given by $\theta$ (scale parameter).

What follows are examples of parametrically indexed families of distributions:

Binomial - Let $n$ be the number of trials associated with a stochastic experiment that allows for two outcomes: success with probability $\theta$ and failure with probability $1 - \theta$. Let $X$ be the total number of successes in $n$ trials; then

  $P(X = k) = \dbinom{n}{k} \theta^k (1-\theta)^{n-k}$

for $k = 0, 1, \dots, n$. The probability distribution $P(\cdot)$ is called a binomial distribution with parameters $n$ and $\theta$ and is denoted $B(n, \theta)$. In this case we say that $X \sim B(n, \theta)$. If $n = 1$ we say that $X$ has a Bernoulli distribution.

Theorem 2.2. If $X \sim B(n, \theta)$, then $E(X) = n\theta$ and $V(X) = n\theta(1-\theta)$.

Proof. Recall that $(1+x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k$. Differentiating both sides with respect to $x$ we have

  $n(1+x)^{n-1} = \sum_{k=0}^{n} \dbinom{n}{k} k x^{k-1}$

and multiplying both sides by $x$ we get

  $nx(1+x)^{n-1} = \sum_{k=0}^{n} \dbinom{n}{k} k x^{k}$.   (2.15)

Now,

  $E(X) = \sum_{k=0}^{n} k \dbinom{n}{k} \theta^k (1-\theta)^{n-k} = (1-\theta)^n \sum_{k=0}^{n} \dbinom{n}{k} k \left(\dfrac{\theta}{1-\theta}\right)^{k}$.
Letting $x = \frac{\theta}{1-\theta}$ in (2.15), we have

  $E(X) = (1-\theta)^n\, n \dfrac{\theta}{1-\theta} \left(1 + \dfrac{\theta}{1-\theta}\right)^{n-1} = n\theta$.

For the variance, we put $S_{n,k} = (1-\theta)^n \sum_{k=0}^{n} \binom{n}{k} k^2 \left(\frac{\theta}{1-\theta}\right)^k$ and note that $V(X) = S_{n,k} - n^2\theta^2$. Differentiating (2.15) with respect to $x$ we obtain

  $n(n-1)(1+x)^{n-2} x^2 = \sum_{k=0}^{n} \dbinom{n}{k} k(k-1) x^{k} = \sum_{k=0}^{n} \dbinom{n}{k} k^2 x^{k} - \sum_{k=0}^{n} \dbinom{n}{k} k x^{k}$.   (2.16)

Letting $x = \theta/(1-\theta)$, multiplying both sides of (2.16) by $(1-\theta)^n$ and noting that $(1-\theta)^n \sum_{k=0}^{n} \binom{n}{k} k x^k = n\theta$, we have $n(n-1)\theta^2 = S_{n,k} - n\theta$. Thus,

  $V(X) = n(n-1)\theta^2 + n\theta - n^2\theta^2 = n\theta(1-\theta)$.

Figure 2.1 provides a graph of a $B(5, 0.6)$ distribution. The height of each bar gives the probability of $X = k$, $k = 0, 1, \dots, 5$, in 5 trials (see MATLAB code binomial.m).

Figure 2.1: Plot of a Binomial probability function with $n = 5$ and $\theta = 0.6$.
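The moments in Theorem 2.2 can be checked directly from the probability function, without the combinatorial identities. A short Python sketch for the $B(5, 0.6)$ case plotted in Figure 2.1 (the notes use MATLAB; this translation is ours):

```python
from math import comb

# Check E(X) = n*theta and V(X) = n*theta*(1 - theta) for B(5, 0.6)
# by summing over the probability function directly.
n, theta = 5, 0.6
pmf = [comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(n + 1)]
mean = sum(k * p for k, p in enumerate(pmf))
var = sum(k**2 * p for k, p in enumerate(pmf)) - mean**2
print(mean, var)  # 3.0 and 1.2, i.e. n*theta and n*theta*(1-theta)
```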
Uniform - Let $X$ be a continuous stochastic variable that takes values in the interval $[a, b]$ for $a, b \in \mathbb{R}$ with density

  $f(x) = \begin{cases} \frac{1}{b-a} & \text{if } x \in [a,b] \\ 0 & \text{if } x \notin [a,b]. \end{cases}$

In this case we say that $X \sim U[a,b]$ with parameters $a$ and $b$. It is easy to show that $E(X) = \frac{a+b}{2}$ and $V(X) = \frac{(b-a)^2}{12}$. Note that we can reparametrize this density by setting $\mu = \frac{a+b}{2}$ and $\sigma = (b-a)/\sqrt{12}$. The first parametrization emphasizes the endpoints of the set in which $X$ takes values and the second parametrization emphasizes the expected value and variance of the distribution.

Since strictly increasing cdfs ($F$) have inverses, it is always possible to generate a stochastic sample from any such distribution by first generating a stochastic sample from $U[0,1]$, say $\{u_1, \dots, u_n\}$, and then obtaining $x_i = F^{-1}(u_i)$, since cdfs take values in $[0,1]$.

Normal - Let $X$ be a continuous stochastic variable that takes values in the interval $(-\infty, \infty)$ with density

  $f(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}}$,

with $\mu \in \mathbb{R}$ and $\sigma > 0$. In this case we say that $X \sim N(\mu, \sigma^2)$ with parameters $\mu$ and $\sigma^2$. It can be shown (the best way to do this is integrating using polar coordinates) that $E(X) = \mu$ and $V(X) = \sigma^2$. When $\mu = 0$ and $\sigma^2 = 1$ we write $X \sim N(0,1)$ and say that $X$ has a standard normal density. Note that if $Z \sim N(0,1)$, then $Y = \mu + \sigma Z$ where $\mu \in \mathbb{R}$ and $\sigma > 0$ is such that $E(Y) = \mu$, $V(Y) = \sigma^2$. Furthermore,

  $f_Y(y) = \dfrac{1}{\sigma} f_Z\!\left(\dfrac{y-\mu}{\sigma}\right) = \dfrac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2}\frac{(y-\mu)^2}{\sigma^2}}$.

Also, $F_Y(y) = \int_{-\infty}^{y} f_Y(\alpha)\,d\alpha = \int_{-\infty}^{y} \frac{1}{\sigma} f_Z\!\left(\frac{\alpha-\mu}{\sigma}\right) d\alpha$. Changing variables by letting $z = \frac{\alpha-\mu}{\sigma}$ we have that

  $F_Y(y) = \int_{-\infty}^{(y-\mu)/\sigma} \dfrac{1}{\sigma} f_Z(z)\,\sigma\,dz = \int_{-\infty}^{(y-\mu)/\sigma} f_Z(z)\,dz$.

Figures 2.2, 2.3 and 2.4 show the graphs of normal densities with different $\mu$ and $\sigma^2$, a normal distribution function and a normal quantile function (see MATLAB code normgen.m).

Log-normal - Let $Y \sim N(\mu, \sigma^2)$; then $X = \exp(Y)$ is said to have a Log-Normal density
Figure 2.2: Normal densities N(0,1), N(0,2) and N(2,1)

Figure 2.3: Standard normal density and distribution function

Figure 2.4: Standard normal quantile function
Figure 2.5: Log-normal densities

and we write $X \sim LN(\mu, \sigma^2)$. Clearly,

  $f_X(x) = \dfrac{1}{x} f_Y(\log x)$   (2.17)
  $\phantom{f_X(x)} = \dfrac{1}{x\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2}\frac{(\log(x)-\mu)^2}{\sigma^2}}$   (2.18)

for $0 < x < \infty$. It can be shown that $E(X) = e^{\mu + (1/2)\sigma^2}$ and $V(X) = e^{2\mu+\sigma^2}(e^{\sigma^2} - 1)$. Figure 2.5 contains the graphs of three log-normal densities (see MATLAB code lognormgen.m).

Exponential - Let $X$ be a stochastic variable taking values in $(0, \infty)$ with density given by

  $f(x) = \dfrac{e^{-x/\theta}}{\theta}$ with $\theta > 0$.

In this case we say that $X$ has an exponential density with parameter $\theta$ and we write $X \sim \exp(\theta)$. It can be shown that $E(X) = \theta$ and $V(X) = \theta^2$.

Pareto (Type 1) - Let $X$ be a stochastic variable taking values in $(c, \infty)$, $c > 0$, such that $P(X > x) = \left(\frac{c}{x}\right)^\alpha$, $\alpha > 0$. Then $F(x) = 1 - \left(\frac{c}{x}\right)^\alpha$ and $f(x) = \alpha c^\alpha x^{-\alpha-1}$. Here $E(X) = \alpha c/(\alpha - 1)$ (for $\alpha > 1$) and $V(X) = \left(\frac{c}{\alpha-1}\right)^2 \frac{\alpha}{\alpha-2}$ (for $\alpha > 2$). An important characteristic of the Pareto distribution is that it decays at a slow polynomial rate compared to densities that decay exponentially.
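The inverse-cdf method described above (draw $u \sim U[0,1]$ and set $x = F^{-1}(u)$) can be combined with the exponential distribution just introduced, whose inverse cdf is available in closed form. A small Python sketch (seed and sample size are our arbitrary choices):

```python
import math
import random

# Inverse-cdf sampling: for the exponential, F(x) = 1 - exp(-x/theta),
# so F^{-1}(u) = -theta * log(1 - u). The sample mean should be near theta.
theta = 2.0
rng = random.Random(0)
n = 100_000
xs = [-theta * math.log(1.0 - rng.random()) for _ in range(n)]
print(sum(xs) / n)  # close to E(X) = theta = 2
```

The same recipe works for any strictly increasing cdf; when $F^{-1}$ has no closed form (the normal, for instance) it is evaluated numerically.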
Figure 2.6: Normal and t densities with $E(X) = 0$, $V(X) = 8/6$ and $v = 8$

2.6 Important densities related to the normal

Suppose that $\{Z_i\}_{i=1}^{v}$ is a sequence of independent and identically distributed stochastic variables with $Z_i \sim N(0,1)$. Then $X = \sum_{i=1}^{v} Z_i^2 \sim \chi^2(v)$ where $v$ is the parameter of the density. Note that $X$ takes values in $[0, \infty)$. This density is called a chi-squared density and the parameter $v$ is called its degrees of freedom. It can be shown that $E(X) = v$ and $V(X) = 2v$.

If $Z \sim N(0,1)$, $W \sim \chi^2(v)$ and $Z$ and $W$ are independent, then the ratio

  $Y = \dfrac{Z}{\sqrt{W/v}} \sim t(v)$ with density $f_Y(x) = \dfrac{\Gamma\!\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma(v/2)}\left(1 + \dfrac{x^2}{v}\right)^{-\frac{v+1}{2}}$,

where $\Gamma(x) = \int_0^\infty t^{x-1}\exp(-t)\,dt$ is the gamma function. It can be shown that $E(Y) = 0$ if $v > 1$ (if $v = 1$, $E(Y)$ does not exist) and $V(Y) = \frac{v}{v-2}$ if $v > 2$ (if $v = 2$, $V(Y)$ is infinite). If $v > 2$ and $X = \mu + \sigma\sqrt{\frac{v-2}{v}}\,Y$, then $E(X) = \mu$ and $V(X) = \sigma^2$. Also, $f_X(x) = \frac{1}{\sigma}\sqrt{\frac{v}{v-2}}\, f_Y\!\left(\sqrt{\frac{v}{v-2}}\,\frac{x-\mu}{\sigma}\right)$, which we denote by $t(v, \mu, \sigma)$. Figure 2.6 shows the graph of a t-density and a normal density (same expected value and variance) and Figure 2.7 shows the tails of the same densities in Figure 2.6.

If $V \sim \chi^2(v_1)$ and $W \sim \chi^2(v_2)$, and $V$ and $W$ are independent, then for $v_1, v_2 > 0$

  $X = \dfrac{V/v_1}{W/v_2} \sim F(v_1, v_2)$,

where $F(v_1, v_2)$ denotes a Fisher's $F$-distribution with $v_1$ and $v_2$ degrees of freedom.
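The chi-squared construction above is easy to verify by simulation: summing $v$ squared independent standard normal draws should give a variable with mean $v$ and variance $2v$. A Python sketch (degrees of freedom, seed and sample size are our arbitrary choices):

```python
import random

# X = Z_1^2 + ... + Z_v^2 with Z_i ~ N(0,1) i.i.d. should have
# E(X) = v and V(X) = 2v (here v = 8, so 8 and 16).
v, n = 8, 50_000
rng = random.Random(3)
xs = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(v)) for _ in range(n)]
m = sum(xs) / n
var = sum((x - m) ** 2 for x in xs) / n
print(m, var)  # near 8 and 16
```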
Figure 2.7: Tails of the normal and t densities with $E(X) = 0$, $V(X) = 8/6$ and $v = 8$

2.7 Characteristic functions

If $X$ is a stochastic variable with density function $f_X$ we define the characteristic function associated with $f_X$ as the complex valued function

  $\phi_{f_X}(\tau) = \int_{-\infty}^{\infty} \exp(i\tau x) f_X(x)\,dx \equiv E(\exp(i\tau X))$   (2.19)

where $\tau \in \mathbb{R}$ and $i^2 = -1$. Characteristic functions always exist because by the triangle inequality

  $\left|\int \exp(i\tau x) f_X(x)\,dx\right| \leq \int |\exp(i\tau x)| f_X(x)\,dx$,

and since $|\exp(i\tau x)| = |\cos(\tau x) + i\sin(\tau x)| = \sqrt{\cos^2(\tau x) + \sin^2(\tau x)} = 1$, we have $\left|\int \exp(i\tau x) f_X(x)\,dx\right| \leq 1$. In mathematical parlance, the characteristic function is called the exponential Fourier transform of the density $f_X$. The following example gives the characteristic functions of some density functions.

Example 2.4.

Let $Y \sim U[-a, a]$; then $\phi_{f_Y}(\tau) = \dfrac{\sin(a\tau)}{a\tau}$;
Let $Y \sim N(\mu, \sigma^2)$; then $\phi_{f_Y}(\tau) = \exp\!\left(i\tau\mu - \frac{1}{2}\tau^2\sigma^2\right)$;
Let $Y \sim \exp(\theta)$; then $\phi_{f_Y}(\tau) = \dfrac{1}{1 - i\theta\tau}$.

There is a very important result that establishes that the characteristic function of a density function uniquely determines (or characterizes) the density function. Put differently, for every characteristic function there is one, and only one, density function. This result is known as the Uniqueness Theorem for characteristic functions and we will state it without proof.
Theorem 2.3. The characteristic function $\phi_{f_X}$ associated with $f_X$ uniquely determines $f_X$.

Proof. The proof is advanced for this course. If interested you may consult Jacod and Protter (2000, p. 17) or Resnick (2005, p. 32).

The usefulness of this theorem is illustrated by the following result, which shows that linear combinations of independent normally distributed stochastic variables are normally distributed.

Theorem 2.4. Let $\{Z_j\}_{j=1}^{n}$ be a collection of independent stochastic variables such that $Z_j \sim N(\mu_j, \sigma_j^2)$ and consider $Y = \sum_{j=1}^{n} a_j Z_j$ where $a_j \in \mathbb{R}$ are non-stochastic. Then,

  $Y \sim N\!\left(\sum_{j=1}^{n} a_j\mu_j,\; \sum_{j=1}^{n} \sigma_j^2 a_j^2\right)$.

Proof. $\phi_{f_Y}(\tau) = E(\exp(i\tau Y)) = E(\exp(i\tau \sum_{j=1}^{n} a_j Z_j)) = \prod_{j=1}^{n} E(\exp(i\tau a_j Z_j))$, where the last equality follows from the independence of the $Z_j$'s. Note that $E(\exp(i\tau a_j Z_j))$ is the characteristic function of a normally distributed stochastic variable evaluated at $\tau a_j$. From Example 2.4 we have

  $E(\exp(i\tau a_j Z_j)) = \exp\!\left(i\tau a_j\mu_j - \tfrac{1}{2}\sigma_j^2 a_j^2\tau^2\right)$,

and consequently,

  $\phi_{f_Y}(\tau) = \prod_{j=1}^{n} \exp\!\left(i\tau a_j\mu_j - \tfrac{1}{2}\sigma_j^2 a_j^2\tau^2\right) = \exp\!\left(i\tau \sum_{j=1}^{n} \mu_j a_j - \tfrac{1}{2}\tau^2 \sum_{j=1}^{n} \sigma_j^2 a_j^2\right)$.

This is a characteristic function that is uniquely associated with a normal density given by $N\!\left(\sum_{j=1}^{n} \mu_j a_j, \sum_{j=1}^{n} \sigma_j^2 a_j^2\right)$.

2.8 Order statistics and empirical distributions

Let $X$ be a stochastic variable with distribution function given by $F_X$ and $\chi_n = \{X_i\}_{i=1}^{n}$ be a stochastic sample of size $n$. The empirical distribution associated with $\chi_n$ is given by

  $F_n(x) = \dfrac{1}{n} \sum_{i=1}^{n} I_{\{\omega: X_i \leq x\}}(\omega)$,   (2.20)

where $I_A(\omega)$ is the indicator function associated with set $A$, that is, $I_A(\omega) = 1$ if $\omega \in A$ and $I_A(\omega) = 0$ if $\omega \notin A$. The empirical distribution is a stochastic variable (it depends on $\chi_n$) and it is easy to see that

  $E(F_n(x)) = \dfrac{1}{n} \sum_{i=1}^{n} E(I_{\{X_i \leq x\}}(X_i)) = \dfrac{1}{n}\, n F_X(x) = F_X(x)$   (2.21)
and, by independence of the $X_i$'s, we have

  $V(F_n(x)) = \dfrac{1}{n^2} \sum_{i=1}^{n} V(I_{\{X_i \leq x\}}(X_i)) = \dfrac{1}{n} F_X(x)(1 - F_X(x))$.   (2.22)

The order statistics associated with the sample $\chi_n$ are the elements of $\chi_n$ listed in ascending order, denoted by $X_{(1)}, X_{(2)}, \dots, X_{(n)}$. The empirical distribution is discontinuous with jumps at the order statistics. In this case, we define the quantile of order $a \in (0,1)$ to be $F_X^{-}(a) = \inf\{x: F_X(x) \geq a\}$. Then, we have that

  $F_n^{-}(a) = \inf\{x: F_n(x) \geq a\} = \inf\{X_{(j)}: F_n(X_{(j)}) \geq a\} = \inf\{X_{(j)}: j/n \geq a\} = \inf\{X_{(j)}: j \geq na\}$.

Thus, the quantile $F_n^{-}(a)$ is the smallest order statistic $X_{(j)}$ for which $j \geq na$. Since $j$ is a natural number, we write

  $q_n(a) = F_n^{-}(a) = \begin{cases} X_{(na)} & \text{if } na \in \mathbb{N} \\ X_{([na]+1)} & \text{if } na \notin \mathbb{N}, \end{cases}$   (2.23)

where $[x]$ denotes the integer part of the number $x$. For example, if $n = 100$ and $a = 0.95$, then the 0.95 quantile associated with the empirical distribution is $X_{(na)}$, which in this case is $X_{(95)}$. If $na$ is not a natural number, then we round to the next largest order statistic, that is, $X_{([na]+1)}$.

We note that if $X \sim N(\mu, \sigma^2)$ then $F_X^{-1}(a) = \mu + \sigma F_Z^{-1}(a)$ for $a \in (0,1)$. This equation says that the quantiles of $X$ can be written as a linear function of the quantiles of a standard normal distribution with intercept given by $\mu$ and slope given by $\sigma$. $F_Z^{-1}(a)$ can be easily calculated and $F_X^{-1}(a)$ can be approximated by order statistics of a sample of observations on $X$. If $X$ is indeed normally distributed, then plotting $q_n(a) \equiv F_n^{-}(a)$ against $F_Z^{-1}(a)$ should produce a graph close to a linear function. Deviations from a linear function should result only from the fact that $q_n(a)$ is an estimator for $F_X^{-1}(a)$. The resulting plot is called a Q-Q plot and significant deviations from a linear function are evidence of nonnormality. Figures 2.8 and 2.9 show Q-Q plots for stochastic samples from a $N(0, 8/6)$ and a $t(5, 0, 8/6)$.
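The empirical quantile rule (2.23) can be sketched directly. A minimal Python implementation (function name is ours) on data where $X_{(j)} = j$, so the answers are easy to read off:

```python
# Empirical quantile q_n(a) from equation (2.23): the order statistic
# X_(na) if na is a natural number, and X_([na]+1) otherwise.
def empirical_quantile(sample, a):
    xs = sorted(sample)                # order statistics X_(1) <= ... <= X_(n)
    na = len(xs) * a
    j = int(na) if na == int(na) else int(na) + 1
    return xs[j - 1]                   # X_(j); the text indexes from 1

data = list(range(1, 11))              # n = 10, X_(j) = j
print(empirical_quantile(data, 0.5))   # na = 5, so X_(5) = 5
print(empirical_quantile(data, 0.95))  # na = 9.5, so X_(10) = 10
```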
Figure 2.10 provides a Q-Q plot for a sequence of 527 daily log-returns for three-month future contracts for wheat at the Chicago Board of Trade, where the last trading day is September 8, 2014. Visual inspection suggests that the Q-Q plot for the daily log-returns for wheat contracts is more similar to that of a t-distribution than to that of a normal distribution, exhibiting much thicker tails than those associated with a normal distribution.
Figure 2.8: Q-Q plot for a normal density with $E(X) = 0$, $V(X) = 8/6$

Figure 2.9: Q-Q plot for a Student-t density with $E(X) = 0$, $V(X) = 8/6$ and $v = 5$

Figure 2.10: Q-Q plot for daily log-returns for wheat contracts at the Chicago Board of Trade
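The coordinates behind a Q-Q plot like Figure 2.8 are just pairs $(F_Z^{-1}(a), q_n(a))$. The sketch below (Python rather than the MATLAB used in the notes; probabilities, seed and sample size are our choices) builds a few such pairs for a normal sample and checks that consecutive pairs line up with slope close to $\sigma$:

```python
import random
import statistics

# Q-Q pairs for a N(mu, sigma^2) sample: empirical quantiles against
# standard normal quantiles should be nearly linear with slope sigma.
mu, sigma = 0.0, (8 / 6) ** 0.5
rng = random.Random(11)
xs = sorted(rng.gauss(mu, sigma) for _ in range(100_000))
z = statistics.NormalDist()
probs = [0.10, 0.25, 0.50, 0.75, 0.90]       # n*a is an integer for each
pairs = [(z.inv_cdf(a), xs[int(len(xs) * a) - 1]) for a in probs]
slopes = [(y2 - y1) / (x2 - x1)
          for (x1, y1), (x2, y2) in zip(pairs, pairs[1:])]
print(slopes)  # each close to sigma, about 1.155
```

For the wheat returns of Figure 2.10 the same construction bends away from a straight line in the tails, which is the visual evidence of nonnormality discussed above.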
2.9 Skewness and kurtosis

Skewness and kurtosis of a distribution are measures of its shape. A measure of skewness captures the extent to which a distribution is asymmetric. We say that a distribution is symmetric about a point $\mu$ if $P(X \geq \mu + x) = P(X \leq \mu - x)$ for all $x$ in the set where the stochastic variable takes values. When $\mu = E(X)$ we say that the distribution of $X$ is symmetric about the mean. When it exists, the skewness of a distribution is given by

  $Sk(X) = \dfrac{E((X - E(X))^3)}{\sigma^3}$, where $V(X) = \sigma^2$.   (2.24)

The next theorem shows that if a distribution is symmetric about $E(X)$ then $Sk(X) = 0$.

Theorem 2.5. Let $F$ be symmetric about $\mu = \int x\,dF(x)$. Then $\int (x-\mu)^3\,dF(x) = 0$.

Proof. Let $z = x - \mu$. Then

  $I = \int_{-\infty}^{\infty} (x-\mu)^3\,dF(x) = \int_{-\infty}^{\infty} z^3\,dF(z+\mu) = \int_{-\infty}^{0} z^3\,dF(z+\mu) + \int_{0}^{\infty} z^3\,dF(z+\mu)$.

Now, letting $z = -y$,

  $\int_{-\infty}^{0} z^3\,dF(z+\mu) = \int_{0}^{\infty} (-y)^3\,dF(-y+\mu)$,

and by symmetry of $F$, $F(\mu - y) = 1 - F(\mu + y)$. Consequently, $dF(\mu - y) = -dF(\mu + y)$. Hence,

  $\int_{-\infty}^{0} z^3\,dF(z+\mu) = -\int_{0}^{\infty} y^3\,dF(\mu+y)$.

Thus, $I = 0$.

The normal, Student-t and uniform densities are examples of densities which are symmetric about $\mu$. Examples of asymmetric distributions include the log-normal, where $Sk(X) = (e^{\sigma^2} + 2)(e^{\sigma^2} - 1)^{1/2} > 0$, and the binomial, where $Sk(X) = \frac{1 - 2\theta}{(n\theta(1-\theta))^{1/2}}$. When $Sk(X) > 0$ we speak of right skewness and when $Sk(X) < 0$ we speak of left skewness.

Kurtosis is a measure of the relative probability weight of the tails and center of a density relative to its shoulders. What constitutes the center, shoulders and tails of a distribution is arbitrary. Normally, for a distribution that is symmetric about $\mu$, the center is defined as $[\mu - \sigma, \mu + \sigma]$, the shoulders as $[\mu - 2\sigma, \mu - \sigma) \cup (\mu + \sigma, \mu + 2\sigma]$ and the tails as $(-\infty, \mu - 2\sigma) \cup (\mu + 2\sigma, \infty)$. Kurtosis is normally defined as

  $K(X) = \dfrac{E((X - E(X))^4)}{\sigma^4}$, where $V(X) = \sigma^2$.   (2.25)
The greater the probability mass on the center and (especially) the tails relative to the shoulders, the greater the kurtosis. If $X \sim N(0,1)$, then $K(X) = E(X^4) = \int x^4 f_X(x)\,dx$. Letting $Y = X^2$ we have that $K(X) = \int y^2 f_Y(y)\,dy = V(Y) + (E(Y))^2 = 2 + 1 = 3$, since $Y \sim \chi^2_1$. It is common to measure kurtosis relative to $K = 3$. Hence, if a distribution has $K(X) > 3$ we say that it has excess kurtosis (relative to the standard normal distribution it has more weight on the center and tails). Figures 2.6 and 2.7 show that this is the case for the Student-t distribution. In fact, the kurtosis for a Student-t distribution $t(v, 0, v/(v-2))$ is given by $K = 3 + \frac{6}{v-4}$ for $v > 4$. When a distribution is not symmetric, the measure of kurtosis embeds both a measure of asymmetry and relative probability weight, making it more difficult to interpret the meaning of $K(X)$.

2.9.1 Tail behavior

An important way in which the normal and Student-t densities differ has to do with their tail behavior. From the analytical expression for the normal, we see that it decays to zero as $x \to \pm\infty$ at an exponential rate, that is $f(x) \propto \exp(-0.5x^2)$, whereas the Student-t density decays at a polynomial rate, since $f(x) \propto |x|^{-(v+1)}$. Note that the rate of decay for the Student-t slows down as $v$ gets smaller. In this sense, the tail behavior of the Student-t is akin to the tail behavior of the Pareto density, with the constraint, in the case of the Student-t density, that $v$ is an integer rather than a continuous parameter.

An arbitrary distribution function $F(x)$ is said to have a Pareto right tail if

  $1 - F(x) = L(x)\, x^{-\alpha}$   (2.26)

for some $\alpha > 0$ where $L(x)$ is slowly varying at $\infty$. By slowly varying at $\infty$ it is meant that $\frac{L(\lambda x)}{L(x)} \to 1$ as $x \to \infty$ for all $\lambda > 0$. To understand what this condition means about $L(x)$, take $\lambda = 1/2$; then for $x$ sufficiently large $L(x/2)$ and $L(x)$ are nearly the same, that is, in parts of the domain where $x$ is sufficiently large, multiplying $x$ by 2 produces little change in $L$.
Put differently, for $x$ sufficiently large $L$ is nearly a constant, and as a result for $x$ large enough $1 - F(x)$ is nearly proportional to $x^{-\alpha}$. The probability $1 - F(x)$ is called the survival function of the stochastic variable $X$.
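The benchmark value $K = 3$ for the standard normal derived above can be checked by simulation. A Python sketch using the sample analogue of (2.25) (seed and sample size are our arbitrary choices):

```python
import random

# Sample kurtosis of a standard normal sample: fourth central moment
# divided by the squared second central moment; should be close to 3.
rng = random.Random(7)
n = 200_000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
m = sum(xs) / n
m2 = sum((x - m) ** 2 for x in xs) / n
m4 = sum((x - m) ** 4 for x in xs) / n
K = m4 / m2**2
print(K)  # close to 3
```

Repeating this with heavy-tailed draws (a Student-t with small $v$, say) produces values well above 3, in line with the excess-kurtosis discussion above.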
2.9.2 Multivariate distributions

It is often the case that we are interested in multiple stochastic variables. Suppose, for example, that rather than dealing with a single stochastic variable $X$ we are interested in a collection of $d$ stochastic variables. It is convenient to collect them in a vector

  $X = (X_1, X_2, \dots, X_d)'$.

In this case we speak of a stochastic vector. We are interested in attaching probabilities to the event $A = \{X_1 \in (-\infty, x_1]\} \cap \cdots \cap \{X_d \in (-\infty, x_d]\}$, where $x = (x_1, \dots, x_d)'$. The joint cumulative distribution associated with the vector $X$ is given by $F_X(x) = P(X \in A)$. When $F_X$ admits a density $f_X(x): \mathbb{R}^d \to \mathbb{R}$, it must satisfy

  $F_X(y) = \int_{-\infty}^{y_1} \cdots \int_{-\infty}^{y_d} f_X(x_1, \dots, x_d)\,dx_1 \cdots dx_d$.   (2.27)

If the stochastic vector has independent components, then

  $f_X(x_1, \dots, x_d) = f_{X_1}(x_1) \cdots f_{X_d}(x_d)$

and $f_{X_i}(x_i)$ is called the marginal density of $X_i$.

One synthetic measure of how two stochastic variables behave relative to each other is called the covariance. Whenever it exists, we define it as

  $C(X_1, X_2) = E((X_1 - E(X_1))(X_2 - E(X_2)))$.

It is easy to show that $C(X_1, X_2) = E(X_1 X_2) - E(X_1)E(X_2)$. Furthermore, it follows that if $X_1$ and $X_2$ are independent, then $C(X_1, X_2) = 0$. In fact, for any two (continuous) functions $g$ and $h$, when $X_1$ and $X_2$ are independent, we have $E(g(X_1)h(X_2)) = E(g(X_1))E(h(X_2))$.

The correlation between two stochastic variables $X_1$ and $X_2$ is given by

  $\rho(X_1, X_2) = \dfrac{C(X_1, X_2)}{\sqrt{V(X_1)}\sqrt{V(X_2)}}$.

We note that for any $a \in \mathbb{R}$,

  $E\!\left((a(X_1 - E(X_1)) + (X_2 - E(X_2)))^2\right) = f(a) = a^2 V(X_1) + 2a C(X_1, X_2) + V(X_2)$.
This is a nonnegative quadratic function and consequently it must be that $4C^2(X_1, X_2) - 4V(X_1)V(X_2) \leq 0$, which implies that

  $|C(X_1, X_2)| \leq \sqrt{V(X_1)V(X_2)}$   (2.28)

and $|\rho(X_1, X_2)| \leq 1$. The inequality in (2.28) is a special case of a more general inequality called the Cauchy-Schwarz inequality.

Given a sample of two stochastic variables $\{(X_{1i}, X_{2i})\}_{i=1}^{n}$ we define the sample covariance as

  $\hat{C} = \dfrac{1}{n} \sum_{i=1}^{n} (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)$

and the sample correlation as

  $\hat{\rho} = \dfrac{\hat{C}}{\sqrt{s^2_{X_1} s^2_{X_2}}}$.

The conditional density of $X_d$ given $X_{d-1}, \dots, X_1$ is given by

  $f_{X_d | X_{d-1} \cdots X_1}(x) = \dfrac{f_X(X_{-d}, x)}{f_{X_{-d}}(X_{-d})}$   (2.29)

where $X_{-d} = (X_1, \dots, X_{d-1})$ and $f_{X_{-d}}(X_{-d})$ is the joint marginal density of $X_{-d}$. We define

  $E(X_d | X_{d-1} \cdots X_1) = \int z\, f_{X_d | X_{d-1} \cdots X_1}(z)\,dz$

and

  $V(X_d | X_{d-1} \cdots X_1) = \int (z - E(X_d | X_{d-1} \cdots X_1))^2 f_{X_d | X_{d-1} \cdots X_1}(z)\,dz$.

Example 2.5. Let $f_{XY}(x, y) = 2$ if $0 < x < 1$ and $x < y < 1$. Then $f_X(x) = \int_{x}^{1} 2\,dy = 2(1 - x)$, $f_{Y|X}(y) = \frac{1}{1 - X}$ and $E(Y|X) = \int_{X}^{1} \frac{y}{1 - X}\,dy = \frac{1 + X}{2}$. Similar calculations yield $V(Y|X) = \frac{(1 - X)^2}{12}$.

2.9.3 Multivariate normal

Definition 2.1. The stochastic vector $X = (X_1, X_2, \dots, X_d)'$ is said to have a multivariate normal distribution if for any set of constants $a_1, \dots, a_d$, the stochastic variable

  $Y = \sum_{i=1}^{d} a_i X_i \sim N(\mu, \sigma^2)$, for some $\mu \in \mathbb{R}$ and some $\sigma^2 \in (0, \infty)$.   (2.30)

Clearly, if all $a_i = 0$ except for $i = j$, where $a_j = 1$, then we can conclude that $Y = X_j \sim N(E(X_j), V(X_j))$. Also, $E(Y) = \sum_{i=1}^{d} a_i E(X_i) = a' E(X)$, where

  $E(X) = (E(X_1), E(X_2), \dots, E(X_d))'$ and $a = (a_1, a_2, \dots, a_d)'$
and $a'$ represents the transposition of the vector $a$. The variance of $Y$ is given by

  $V(Y) = E\!\left(\left(\sum_{i=1}^{d} a_i (X_i - E(X_i))\right)^2\right)$   (2.31)
  $\phantom{V(Y)} = E\!\left(\sum_{i=1}^{d} a_i^2 (X_i - E(X_i))^2 + 2\sum_{i=1}^{d} \sum_{j<i} a_i a_j (X_i - E(X_i))(X_j - E(X_j))\right)$   (2.32)
  $\phantom{V(Y)} = \sum_{i=1}^{d} a_i^2 V(X_i) + 2\sum_{i=1}^{d} \sum_{j<i} a_i a_j C(X_i, X_j)$   (2.33)
  $\phantom{V(Y)} = a'\,\mathrm{cov}(X)\, a$   (2.34)

where

  $\mathrm{cov}(X) = \begin{pmatrix} V(X_1) & C(X_1, X_2) & \cdots & C(X_1, X_d) \\ C(X_2, X_1) & V(X_2) & \cdots & C(X_2, X_d) \\ \vdots & \vdots & \ddots & \vdots \\ C(X_d, X_1) & C(X_d, X_2) & \cdots & V(X_d) \end{pmatrix}$.

In this case we write $X \sim N(E(X), \mathrm{cov}(X))$.

It is useful to have an expression for the characteristic function of a stochastic vector that has a multivariate normal distribution.

Theorem 2.6. $X \sim N(E(X), \mathrm{cov}(X))$ if, and only if, the characteristic function of its joint density is written as

  $\phi_{f_X}(t) = \exp\!\left(i t' E(X) - \tfrac{1}{2} t'\,\mathrm{cov}(X)\, t\right)$   (2.35)

for $t \in \mathbb{R}^d$.

Proof. Suppose (2.35) holds; we need to show that $Y = a'X$ is univariate normal for any $a \in \mathbb{R}^d$. Note that $\phi_{f_Y}(u) = E(\exp(iuY)) = E(\exp(iua'X)) = \phi_{f_X}(ua) = \exp\!\left(iua'E(X) - \tfrac{1}{2}u^2 a'\,\mathrm{cov}(X)\,a\right)$, which implies that $Y \sim N(a'E(X), a'\,\mathrm{cov}(X)\,a)$ by Theorem 2.4. Now, suppose $X$ is multivariate normal; then $Y = t'X$ is univariate normal for any $t \in \mathbb{R}^d$ and $\phi_{f_Y}(u) = \exp(iu\mu - \tfrac{1}{2}u^2\sigma^2)$ where $\mu = \sum_{i=1}^{d} t_i E(X_i)$ and $\sigma^2 = t'\,\mathrm{cov}(X)\,t$. Then $\phi_{f_Y}(1) = \exp(it'E(X) - \tfrac{1}{2}t'\,\mathrm{cov}(X)\,t)$, which is (2.35).

If $X$ is partitioned as $X = (X_1, X_{-1}')'$, $E(X)$ as $E(X) = (E(X_1), E(X_{-1})')'$ and $\mathrm{cov}(X)$ as

  $\mathrm{cov}(X) = \begin{pmatrix} V(X_1) & \Sigma_{1,-1} \\ \Sigma_{-1,1} & \Sigma_{-1,-1} \end{pmatrix}$,

and if $X \sim N(E(X), \mathrm{cov}(X))$, then

  $X_1 \mid X_{-1} \sim N\!\left(E(X_1) + \Sigma_{1,-1}\Sigma_{-1,-1}^{-1}(X_{-1} - E(X_{-1})),\; V(X_1) - \Sigma_{1,-1}\Sigma_{-1,-1}^{-1}\Sigma_{-1,1}\right)$.   (2.36)

Equation (2.36) states that components of a multivariate normally distributed stochastic vector have normal conditional distributions.
2.10 Estimation

Given a sample $S_n = \{X_1, \dots, X_n\}$ of observations on the stochastic variable $X \sim F(x; \theta)$, for $\theta \in \Theta$, an estimator is a function $\hat{\theta}(X_1, \dots, X_n): S_n \to \Theta$. The bias of an estimator $\hat{\theta}$ is defined as $B(\hat{\theta}) = E(\hat{\theta}) - \theta$ and the mean squared error of the estimator is defined as $MSE(\hat{\theta}) = E((\hat{\theta} - \theta)^2)$. We normally seek estimators that are efficient, in that MSE is minimized. It is clearly the case that

  $MSE(\hat{\theta}) = V(\hat{\theta}) + B(\hat{\theta})^2$.   (2.37)

Hence, efficiency calls for estimators that have small variance and bias. When an estimator is such that $B(\hat{\theta}) = 0$, we call the estimator unbiased. When this is the case, efficiency involves variance minimization.

2.10.1 Two basic estimation procedures

We will consider two generic estimation procedures: maximum likelihood (ML) and method of moments (MM) estimation.

Maximum likelihood estimation: Let $S_n = \{X_i\}_{i=1}^{n}$ be a sample and let $X$ be a vector with components $X_i$ and assume that $X \sim f(x; \theta)$ where $\theta \in \Theta \subseteq \mathbb{R}^p$. The function $L(\theta) = f(X; \theta): \Theta \to \mathbb{R}$ (for fixed $X$) is called the likelihood function associated with the sample $S_n$. The maximum likelihood estimator for $\theta$, denoted by $\hat{\theta}_{ML}$, is (whenever it exists) defined as

  $\hat{\theta}_{ML} = \underset{\theta \in \Theta}{\mathrm{argmax}}\; f(X; \theta)$.   (2.38)

Often, it is easier to maximize the logarithm of $f(X; \theta)$. Since $\log(x)$ is a strictly increasing function of $x$, it follows that we can similarly define $\hat{\theta}_{ML}$ as

  $\hat{\theta}_{ML} = \underset{\theta \in \Theta}{\mathrm{argmax}}\; \log f(X; \theta)$.   (2.39)

It is often the case that enough assumptions are placed on the structure of the optimization in (2.39) to assure that $\hat{\theta}_{ML}$ is the unique solution of $\frac{\partial}{\partial \theta} \log f(X; \theta) = 0$. For example, if $\log f(X; \theta)$ is strictly concave in $\Theta$, differentiable and reaches a maximum in the interior of $\Theta$, then $\hat{\theta}_{ML}$ is indeed the solution of $\frac{\partial}{\partial \theta} \log f(X; \theta) = 0$. In this case, the maximum likelihood estimator can be defined as the value of $\theta$ that solves $\frac{\partial}{\partial \theta} \log f(X; \theta) = 0$. Note that the last equality defines a system of $p$ equations. The vector $\frac{\partial}{\partial \theta} \log f(X; \theta)$ is called the score.
Example 2.6. Suppose $S_n$ is a stochastic sample from $N(\mu, \sigma^2)$. Then

$$f(X_i; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\frac{(X_i - \mu)^2}{\sigma^2}\right)$$

and $f(X; \mu, \sigma^2) = \prod_{i=1}^{n} f(X_i; \mu, \sigma^2)$. Hence,

$$\log f(X; \mu, \sigma^2) = \sum_{i=1}^{n} \log f(X_i; \mu, \sigma^2) = -\frac{n}{2}\left(\log \sigma^2 + \log 2\pi\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(X_i - \mu)^2.$$

The score vector in this case is given by

$$\begin{pmatrix} \frac{\partial}{\partial \mu} \log f(X; \mu, \sigma^2) \\ \frac{\partial}{\partial \sigma^2} \log f(X; \mu, \sigma^2) \end{pmatrix} = \begin{pmatrix} \frac{1}{2\sigma^2}\sum_{i=1}^{n} 2(X_i - \mu) \\ -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(X_i - \mu)^2 \end{pmatrix} \quad (2.40)$$

and setting the score equal to zero and solving for $\mu$ and $\sigma^2$ gives $\hat{\mu}_{ML} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $\hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}(X_i - \hat{\mu}_{ML})^2$.

Example 2.7. Suppose $S_n$ is a stochastic sample from a Pareto distribution with parameters $(c, \alpha)$, where $c$ is known (if $c$ is not known, it can be estimated by $\hat{c} = \min\{X_1, \ldots, X_n\}$). Since the density associated with a Pareto distribution is given by $f(X_i; c, \alpha) = \frac{\alpha c^{\alpha}}{X_i^{\alpha+1}}$, the likelihood function is given by

$$f(X; c, \alpha) = \prod_{i=1}^{n} \frac{\alpha c^{\alpha}}{X_i^{\alpha+1}} \quad (2.41)$$

and

$$\log f(X; c, \alpha) = \sum_{i=1}^{n}\left(\log \alpha + \alpha \log c - (\alpha + 1)\log X_i\right). \quad (2.42)$$

Taking the first derivative with respect to $\alpha$ and solving $\frac{\partial}{\partial \alpha} \log f(X; \alpha) = 0$ gives

$$\hat{\alpha}_{ML} = \frac{n}{\sum_{i=1}^{n} \log(X_i/c)}.$$

Often, the solution of optimization problems such as (2.39) cannot be obtained analytically. In this case, it is necessary to maximize $\log f(X; \theta)$ numerically. In MATLAB, the function fminsearch performs numerical minimization; maximization is carried out by minimizing the negative of the objective. See the code norm_mle.m, which conducts numerical optimization to obtain maximum likelihood estimates of the parameters $\mu$ and $\sigma^2$ of a normal density.

Method of Moments estimation: The main idea behind method of moments estimation is to substitute theoretical moments with their sample equivalents. Consider the following two examples.

Example 2.8. Consider a stochastic sample $S_n$ from a stochastic variable $X \sim N(\mu, \sigma^2)$. Since $E(X) = \mu$, we define the estimator $\hat{\mu}_M = \frac{1}{n}\sum_{i=1}^{n} X_i$. That is, $E(X)$ is estimated by the sample average. Also, since $\sigma^2 = V(X) = E\left((X - E(X))^2\right)$, we define the estimator $\hat{\sigma}^2_M = \frac{1}{n}\sum_{i=1}^{n}(X_i - \hat{\mu}_M)^2$.
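A minimal check (Python, with illustrative data) of the closed-form solutions in Example 2.6, which coincide with the method of moments estimates of Example 2.8:

```python
import math

# Closed-form ML estimates for N(mu, sigma^2) from Example 2.6 -- identical
# to the method of moments estimates of Example 2.8. Data are illustrative.

def normal_ml(xs):
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # 1/n, not 1/(n - 1)
    return mu_hat, sigma2_hat

def normal_loglik(xs, mu, sigma2):
    n = len(xs)
    return (-0.5 * n * (math.log(sigma2) + math.log(2 * math.pi))
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma2))

xs = [1.0, 2.0, 2.0, 3.0]
mu_hat, s2_hat = normal_ml(xs)  # mu_hat = 2.0, s2_hat = 0.5
# The closed form maximizes the log-likelihood: perturbed values do worse.
assert normal_loglik(xs, mu_hat, s2_hat) >= normal_loglik(xs, mu_hat + 0.1, s2_hat)
assert normal_loglik(xs, mu_hat, s2_hat) >= normal_loglik(xs, mu_hat, s2_hat + 0.1)
print(mu_hat, s2_hat)
```

When no closed form exists, the same log-likelihood function can be handed to a numerical optimizer, as the notes do in MATLAB with fminsearch applied to the negative log-likelihood.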
Example 2.9. Consider a stochastic sample $S_n$ from a stochastic variable $X$ which has a Pareto distribution with parameters $(c, \alpha)$, where $c$ is known. Recall that $E(X) = \frac{\alpha c}{\alpha - 1}$; hence we write $\frac{1}{n}\sum_{i=1}^{n} X_i = \frac{\hat{\alpha}_M c}{\hat{\alpha}_M - 1}$, which implies that

$$\hat{\alpha}_M = \frac{\frac{1}{n}\sum_{i=1}^{n} X_i}{\frac{1}{n}\sum_{i=1}^{n} X_i - c}.$$

Note that whereas in the case of estimation of the parameters of a normal density the MM and ML estimators coincide, this is not the case when estimating the parameters of the Pareto distribution.

Evaluating estimators

Since estimators are functions of stochastic variables, they are themselves (in general) stochastic variables. As a result, it is of interest to ask what their distributions are. Ideally, an arbitrary estimator $\hat{\theta}$ ought to take values that are close to, and concentrated around, the true parameter value $\theta$. Unbiasedness, mentioned above, is a measure of the closeness of $\hat{\theta}$ to $\theta$; in essence, it says that the distribution of $\hat{\theta}$ is located at $\theta$. Furthermore, a small variance (and, in the case of unbiasedness, a small MSE) means that the distribution of $\hat{\theta}$ is largely concentrated around $\theta$.

There are other useful ways to ascertain how close $\hat{\theta}$ is to $\theta$. Some of the most useful concepts of closeness are related to the behavior of the estimator when the sample size $n$ grows, i.e., as $n \to \infty$. The collection of concepts and results that pertain to the behavior of $\hat{\theta}$ (or any sequence of stochastic variables $X_n$) as $n \to \infty$ is called asymptotic theory.

One of the most used asymptotic concepts of closeness between an estimator $\hat{\theta}(X_1, \ldots, X_n)$ and the true parameter value $\theta$ is that of convergence in probability. An estimator is said to converge in probability to $\theta$ if for all $\epsilon, \delta > 0$ there exists $N_{\epsilon,\delta}$ such that whenever $n > N_{\epsilon,\delta}$ we have $P(\{|\hat{\theta}(X_1, \ldots, X_n) - \theta| > \epsilon\}) < \delta$. If this is the case we say that $\hat{\theta}(X_1, \ldots, X_n) \stackrel{p}{\to} \theta$. If $\theta \in \mathbb{R}^q$, then we write $\hat{\theta}(X_1, \ldots, X_n) \stackrel{p}{\to} \theta$ if for all $\epsilon, \delta > 0$ there exists $N_{\epsilon,\delta}$ such that whenever $n > N_{\epsilon,\delta}$ we have $P(\|\hat{\theta}(X_1, \ldots, X_n) - \theta\| > \epsilon) < \delta$,
where $\|\hat{\theta}(X_1, \ldots, X_n) - \theta\|$ is the Euclidean distance between $\hat{\theta}(X_1, \ldots, X_n)$ and $\theta$.

Knowing that $Z_n \equiv \hat{\theta} - \theta$ gets arbitrarily close to zero with probability approaching 1 as the sample size grows is useful, but it conveys no useful information about the distribution $F_n(z)$ of $Z_n$, other than the fact that as $n \to \infty$ it degenerates to

$$F(z) = \begin{cases} 0, & \text{if } z < 0 \\ 1, & \text{if } z \geq 0. \end{cases}$$
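Convergence in probability of the estimators from Examples 2.7 and 2.9 can be illustrated by simulation. The sketch below (Python, not part of the notes; the true values $\alpha = 3$, $c = 1$ and the seed are illustrative assumptions) draws Pareto samples by inverse-transform sampling and shows both estimates settling near the true $\alpha$ as $n$ grows, while differing from each other on any finite sample:

```python
import math
import random

# Convergence in probability, illustrated with the Pareto estimators of
# Examples 2.7 (ML) and 2.9 (MM). The true parameters alpha = 3, c = 1 and
# the seed are illustrative assumptions, not values from the notes.

def pareto_sample(n, c, alpha, rng):
    # Inverse-transform sampling: if U ~ Uniform(0, 1], then c * U**(-1/alpha)
    # has the Pareto(c, alpha) distribution.
    return [c * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

def alpha_ml(xs, c):
    # Example 2.7: n / sum(log(X_i / c))
    return len(xs) / sum(math.log(x / c) for x in xs)

def alpha_mm(xs, c):
    # Example 2.9: Xbar / (Xbar - c)
    xbar = sum(xs) / len(xs)
    return xbar / (xbar - c)

rng = random.Random(1)
c, alpha = 1.0, 3.0
for n in (100, 10000):
    xs = pareto_sample(n, c, alpha, rng)
    print(n, alpha_ml(xs, c), alpha_mm(xs, c))
# Both estimators settle near alpha = 3 as n grows, but on any finite sample
# they give different numbers: the ML and MM estimators do not coincide here.
```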
A more useful result would be to know the circumstances under which there exists a sequence $a_n$, which may be stochastic or non-stochastic, such that the distribution $F_{a_n Z_n}(z)$ of $a_n Z_n$ converges to a non-degenerate $F(z)$; that is, $F_{a_n Z_n}(z) \to F(z)$ as $n \to \infty$. Formally, we say that a sequence $\{X_n\}$ of stochastic variables with distribution functions $F_n$ converges in distribution to the stochastic variable $X$ with distribution function $F$ if $F_n(x) \to F(x)$ at every point $x$ where $F$ is continuous. In this case we write $X_n \stackrel{d}{\to} X$.

Theorem 2.7. Let $X : \Omega \to \mathbb{R}$ be a stochastic variable and $h : \mathbb{R} \to [0, \infty)$ be such that $E(h(X)) < \infty$. Then, for all $M > 0$,

$$P(\{\omega : h(X(\omega)) \geq M\}) \leq \frac{E(h(X))}{M}.$$

Proof. Let $A_M = \{\omega : h(X(\omega)) \geq M\}$ and note that for all $\omega \in \Omega$ we have $h(X(\omega)) \geq M I_{A_M}(\omega)$. Hence, $E(h(X)) \geq M P(A_M)$.

If we take $h(x) = |x|$ in Theorem 2.7 we conclude that

$$P(\{\omega : |X(\omega)| \geq M\}) \leq \frac{E(|X|)}{M},$$

provided $E(|X|) < \infty$. This is called Markov's inequality. Also, if we take $h(x) = (x - E(X))^2$ in Theorem 2.7 (with threshold $M^2$) we conclude that

$$P(\{\omega : |X(\omega) - E(X)| \geq M\}) \leq \frac{E((X - E(X))^2)}{M^2} = \frac{V(X)}{M^2},$$

provided $V(X) < \infty$. Thus we have the following corollary, called the Bienaymé-Chebyshev inequality.

Corollary 2.1. Let $h(x) = (x - E(X))^2$ in Theorem 2.7. Then,

$$P(|X - E(X)| \geq M) \leq \frac{V(X)}{M^2}.$$
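A quick numeric sanity check of Corollary 2.1 (not part of the notes) using exact rational arithmetic for a fair six-sided die:

```python
from fractions import Fraction

# Exact check of the Bienayme-Chebyshev inequality (Corollary 2.1) for a fair
# six-sided die. Illustrative sketch, not from the notes.
outcomes = [Fraction(k) for k in range(1, 7)]
p = Fraction(1, 6)

mean = sum(p * x for x in outcomes)               # 7/2
var = sum(p * (x - mean) ** 2 for x in outcomes)  # 35/12

M = Fraction(2)
exact = sum(p for x in outcomes if abs(x - mean) >= M)  # P(|X - 7/2| >= 2)
bound = var / M ** 2                                    # 35/48

print(exact, bound)  # 1/3 <= 35/48, so the bound holds
assert exact <= bound
```

As is typical of Chebyshev-type bounds, the inequality holds but is far from tight: the exact tail probability is 1/3 while the bound is 35/48.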
More information