MS&E 321 Spring Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 17


Section 10: Martingales

Contents
10.1 Martingales in Discrete Time ........................... 1
10.2 Optional Sampling for Discrete-Time Martingales ........ 5
10.3 Martingales for Discrete-Time Markov Chains ............ 10
10.4 The Strong Law for Martingales ......................... 14
10.5 The Central Limit Theorem for Martingales .............. 16

10.1 Martingales in Discrete Time

A fundamental tool in the analysis of DTMC's and continuous-time Markov processes is the notion of a martingale. Martingales also underlie the definition we will adopt for stochastic integrals with respect to Brownian motion. A martingale is basically a real-valued sequence that is a suitable generalization of a random walk with independent, mean-zero increments.

Definition 10.1.1 Let (M_n : n ≥ 0) be a sequence of real-valued random variables. Then, (M_n : n ≥ 0) is said to be a martingale (with respect to the sequence of random elements (Z_n : n ≥ 0)) if:

(i) E|M_n| < ∞ for n ≥ 0;

(ii) for each n ≥ 0, there exists a deterministic function g_n(·) such that M_n = g_n(Z_0, Z_1, ..., Z_n);

(iii) E[M_{n+1} | Z_0, Z_1, ..., Z_n] = M_n for n ≥ 0.

Remark 10.1.1 When a process (M_n : n ≥ 0) satisfies condition (ii), one says that (M_n : n ≥ 0) is adapted to (Z_n : n ≥ 0).

The critical component of the martingale definition is condition (iii). If we view M_n as the fortune of a gambler at time n, then condition (iii) asserts that the gambler is playing a fair game, in which he/she has no propensity (in expectation) to either win or lose on any given gamble.

As we asserted earlier, a random walk with independent mean-zero increments is a martingale. To see this, let S_0, X_1, X_2, ... be independent random variables with finite mean, and suppose that EX_i = 0 for i ≥ 1. Set Z_n = S_n = S_0 + X_1 + ... + X_n. Then, conditions (i) and (ii) of Definition 10.1.1 are trivial to verify. For condition (iii), observe that

E[S_{n+1} | S_0, ..., S_n] = E[S_n + X_{n+1} | S_0, ..., S_n]
                          = S_n + E[X_{n+1} | S_0, ..., S_n]
                          = S_n + EX_{n+1}
                          = S_n.

Martingales inherit many of the properties of mean-zero random walks. In view of the analogy with random walks, it is natural to consider the increments

D_i = M_i - M_{i-1},   i ≥ 1,

namely, the martingale differences. The following proposition is a clear generalization of two of the most important properties of mean-zero random walks.

Proposition 10.1.1 Let (M_n : n ≥ 0) be a martingale with respect to (Z_n : n ≥ 0). Then,

EM_n = EM_0,   n ≥ 0.   (10.1.1)

In addition, if EM_n^2 < ∞ for n ≥ 0, then

Cov(D_i, D_j) = 0,   i ≠ j,   (10.1.2)

so that

Var[M_n] = Var[M_0] + Σ_{i=1}^n Var[D_i].   (10.1.3)

Proof: Relation (10.1.1) is immediate from condition (iii) of the martingale definition. For (10.1.2), note that (10.1.1) implies that ED_i = 0, so that (10.1.2) is equivalent to asserting that E[D_i D_j] = 0 for i < j. But

E[D_i D_j | Z_0, ..., Z_{j-1}] = D_i E[D_j | Z_0, ..., Z_{j-1}] = 0,

where condition (ii) of the martingale definition was used for the first equality, and condition (iii) was used for the final step. Taking expectations with respect to (Z_0, ..., Z_{j-1}), we get (10.1.2). Finally, (10.1.3) is immediate from (10.1.2).

Definition 10.1.2 A martingale (M_n : n ≥ 0) for which EM_n^2 < ∞ for n ≥ 0 is called a square-integrable martingale.

Before we turn to exploring further properties of martingales, let us develop some additional examples of martingales in the random walk setting.

Example 10.1.1 Let (X_n : n ≥ 1) be a sequence of iid mean-zero random variables with finite variance σ^2. Let S_n = X_1 + ... + X_n and let M_n = S_n^2 - nσ^2. Then (M_n : n ≥ 0) is a martingale with respect to (S_n : n ≥ 0). The critical property to verify is (iii). Note that

E[M_{n+1} | S_0, ..., S_n] = E[(S_n + X_{n+1})^2 - (n+1)σ^2 | S_0, ..., S_n]
  = E[S_n^2 + 2 S_n X_{n+1} + X_{n+1}^2 - (n+1)σ^2 | S_0, ..., S_n]
  = S_n^2 + 2 S_n E[X_{n+1} | S_0, ..., S_n] + E[X_{n+1}^2 | S_0, ..., S_n] - (n+1)σ^2
  = S_n^2 + σ^2 - (n+1)σ^2
  = M_n.

Example 10.1.2 Let (X_n : n ≥ 1) be a sequence of iid random variables with common density g. Suppose that f is another density with the property that whenever g(x) = 0, then f(x) = 0. Set L_0 = 1 and

L_n = ∏_{i=1}^n f(X_i)/g(X_i),   n ≥ 1.

Then, (L_n : n ≥ 0) is a martingale with respect to (X_n : n ≥ 1). Again, the critical property is verifying (iii). Here,

E[L_{n+1} | X_1, ..., X_n] = E[L_n · f(X_{n+1})/g(X_{n+1}) | X_1, ..., X_n]
  = L_n E[f(X_{n+1})/g(X_{n+1})]
  = L_n ∫ (f(x)/g(x)) g(x) dx
  = L_n,

since f is a density that integrates to 1. This is known as a likelihood ratio martingale.

To show why the likelihood ratio martingale arises naturally, suppose that we have observed an iid sample from a population, yielding observations X_1, X_2, ..., X_n. Assume that the underlying population is known to be iid, either with common density f or with common density g. To test the hypothesis that the X_i's have common density f (the "f-hypothesis") against the hypothesis that the X_i's have common density g (the "g-hypothesis"), the Neyman-Pearson lemma asserts that we should accept the f-hypothesis if the relative likelihood

(f(X_1) ⋯ f(X_n)) / (g(X_1) ⋯ g(X_n))   (10.1.4)

is sufficiently large, and reject it otherwise. So, studying L_n in the case where the X_i's have common density g corresponds to studying the test statistic (10.1.4) when the state of nature is that the g-hypothesis is true. Given this interpretation, it seems natural to expect that L_n converges to zero as the sample size n goes to infinity: for a large sample size n, it is extremely unlikely that such a sample will be better explained by the f-hypothesis than by the g-hypothesis. The fact that L_n ought to go to zero as n → ∞ is perhaps a bit surprising, given that EL_n = 1 for n ≥ 0.

To prove that L_n → 0 almost surely as n → ∞, note that

log L_n = Σ_{i=1}^n log(f(X_i)/g(X_i)).

Then, the strong law of large numbers guarantees that

(1/n) log L_n → E log(f(X_1)/g(X_1))   a.s.

as n → ∞. In other words,

(1/n) log L_n → ∫ log(f(x)/g(x)) g(x) dx.   (10.1.5)

(The right-hand side of (10.1.5) is the negative of what is known as a relative entropy.) Since log is strictly concave, Jensen's inequality asserts that if f ≠ g,

E log(f(X_1)/g(X_1)) < log E[f(X_1)/g(X_1)] = 0.   (10.1.6)

As a consequence, not only does L_n converge to zero a.s. as n → ∞, but the rate of convergence is exponentially fast. It is worth noting that this is an example of a sequence of random variables (L_n : n ≥ 0) for which L_n → 0 a.s. and yet EL_n does not converge to 0 as n → ∞ (in other words, passing limits through expectations is not always valid).
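This behavior (EL_n = 1 for every n, yet L_n → 0 a.s. exponentially fast) is easy to see numerically. The sketch below is an illustration added to these notes, not part of the original argument; it takes g = N(0,1) as the true density and f = N(0.5,1) as an arbitrary alternative, so that log(f(x)/g(x)) = 0.5x − 0.125 and the a.s. decay rate in (10.1.5) equals −0.125.

```python
import numpy as np

# Likelihood ratio martingale with true density g = N(0,1) and f = N(0.5,1).
# Under g: log(f(x)/g(x)) = 0.5*x - 0.125, so (1/n) log L_n -> -0.125 a.s.,
# while EL_n = 1 for every fixed n.
rng = np.random.default_rng(2)
n_paths, n_steps = 100_000, 400
X = rng.standard_normal((n_paths, n_steps))   # iid samples from g
log_L = (0.5 * X - 0.125).cumsum(axis=1)      # log L_1, ..., log L_400 per path

mean_L_5 = np.exp(log_L[:, 4]).mean()         # Monte Carlo estimate of EL_5 (~ 1)
slope = log_L[:, -1].mean() / n_steps         # estimate of the a.s. decay rate
frac_tiny = (log_L[:, -1] < -10).mean()       # fraction of paths with L_400 near 0
print(mean_L_5, slope, frac_tiny)
```

On almost every path L_n is already negligible by n = 400 even though its mean is exactly 1: the expectation is carried by exponentially rare paths, which is precisely why the limit cannot be passed through the expectation.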

Example 10.1.3 In this example, we specialize the likelihood ratio martingale a bit. Suppose that the X_i's are iid with common density g, and suppose that the moment generating function m_X(θ) = E e^{θ X_i} converges in some neighborhood of the origin. For θ within the domain of convergence of m_X(·), let

f(x) = e^{θx} g(x) / m_X(θ),

or, equivalently, f(x) = e^{θx - ψ(θ)} g(x), where ψ(θ) = log m_X(θ). In this case,

L_n = ∏_{i=1}^n f(X_i)/g(X_i) = e^{θ S_n - n ψ(θ)}.   (10.1.7)

The martingale (L_n : n ≥ 0) defined by (10.1.7) is known as an exponential martingale. Because the random walk (S_n : n ≥ 0) appears explicitly in the exponent of the martingale, (L_n : n ≥ 0) is well suited to studying random walks. Some indication of the power of this martingale should be apparent if we explicitly display the dependence of L_n on θ as follows:

L_n(θ) = e^{θ S_n - n ψ(θ)}.

The defining property (iii) of a martingale asserts that E[L_{n+1}(θ) | S_0, ..., S_n] = L_n(θ). For θ inside the domain of convergence of m_X(·), one can interchange the derivative and the expectation, yielding

E[L'_{n+1}(θ) | S_0, ..., S_n] = L'_n(θ).

In particular, (L'_n(0) : n ≥ 0) is a martingale. But L'_n(0) = S_n - n ψ'(0). It turns out that ψ'(0) = EX_1. So, by differentiating our exponential martingale, we retrieve the random walk martingale. And by differentiating a second time, it turns out that L''_n(0) is the martingale of Example 10.1.1. Through successive differentiation, we can obtain a whole infinite family of such martingales.

Exercise 10.1.1 (a) Prove that ψ(·) is convex. (b) Prove that ψ'(0) = EX_1. (c) Prove that ψ''(0) = Var[X_1]. (d) Prove that L''_n(0) = (S_n - nμ)^2 - nσ^2, where μ = EX_1 and σ^2 = Var[X_1]. (e) Compute L'''_n(0).

We now turn to a fundamental result in the theory of martingales known as the Martingale Convergence Theorem.

Theorem 10.1.1 (Martingale Convergence Theorem in L^2) Let (M_n : n ≥ 0) be a martingale with respect to (Z_n : n ≥ 0). If sup_{n ≥ 0} EM_n^2 < ∞, then there exists a square-integrable random variable M_∞ such that E[(M_n - M_∞)^2] → 0 as n → ∞, i.e. M_n converges to M_∞ in mean square.

Proof: The space L^2 of square-integrable random variables is a Hilbert space under the inner product ⟨X, Y⟩ = E[XY]. Since

EM_n^2 = EM_0^2 + Σ_{i=1}^n ED_i^2,

it follows that Σ_{i=1}^∞ ED_i^2 < ∞. For ε > 0, choose m = m(ε) so that Σ_{i=m}^∞ ED_i^2 < ε. Then, for n_2 > n_1 ≥ m,

E(M_{n_2} - M_{n_1})^2 = Σ_{j=n_1+1}^{n_2} ED_j^2 < ε,

so that (M_n : n ≥ 0) is a Cauchy sequence in L^2. Then, the completeness of L^2 yields the conclusion of the theorem.

Actually, one does not need square integrability in order that the Martingale Convergence Theorem hold.

Theorem 10.1.2 (Martingale Convergence Theorem) Let (M_n : n ≥ 0) be a martingale with respect to (Z_n : n ≥ 0). If sup_{n ≥ 0} E|M_n| < ∞, then there exists a finite-valued random variable M_∞ such that M_n → M_∞ a.s. as n → ∞.

For a proof, see p. 233 of Probability: Theory and Examples, 3rd ed., by R. Durrett.

We conclude this section with a brief discussion of stochastic integrals in discrete time. Let (M_n : n ≥ 0) be a square-integrable martingale with respect to (Z_n : n ≥ 0). Suppose that (W_n : n ≥ 0) is a sequence of random variables that is adapted to (Z_n : n ≥ 0). We define the stochastic integral of (W_n : n ≥ 0) with respect to (M_n : n ≥ 0) as the sequence

V_n = Σ_{i=1}^n W_{i-1} D_i = Σ_{i=1}^n W_{i-1} (M_i - M_{i-1}).

We could also have defined the stochastic integral here as Σ_{i=1}^n W_i (M_i - M_{i-1}). But in that case, we would lose the nice properties listed below.

Exercise 10.1.2 Let (M_n : n ≥ 0) be a square-integrable martingale with respect to (Z_n : n ≥ 0), with M_0 = 0. Suppose (W_n : n ≥ 0) is a square-integrable sequence that is adapted to (Z_n : n ≥ 0).

(a) Prove that if V_0 = 0 and V_n = Σ_{i=1}^n W_{i-1} (M_i - M_{i-1}) for n ≥ 1, then (V_n : n ≥ 0) is a martingale with respect to (Z_n : n ≥ 0).

(b) Suppose that the martingale differences (D_i : i ≥ 1) are a stationary sequence of independent random variables. Show that EV_n^2 = σ^2 Σ_{i=0}^{n-1} EW_i^2, where σ^2 = Var[D_i].

10.2 Optional Sampling for Discrete-Time Martingales

An important property of martingales is

EM_n = EM_0,   n ≥ 0.   (10.2.1)

The theory of optional sampling is concerned with extending (10.2.1) from deterministic times n to random times T. As in the discussion of the strong Markov property, it is natural to restrict ourselves to stopping times. However,

EM_T = EM_0   (10.2.2)

fails to hold for all finite-valued stopping times T.

Example 10.2.1 Let (S_n : n ≥ 0) be a random walk with S_0 = 0 and iid increments (X_n : n ≥ 1) defined by P(X_n = 1) = P(X_n = -1) = 1/2. Put T = inf{n ≥ 0 : S_n = 1}. Since (S_n : n ≥ 0) is null recurrent, T < ∞ a.s. and S_T = 1. Therefore, ES_T = 1 and ES_0 = 0, and so ES_T ≠ ES_0. Hence, the class of stopping times needs to be restricted somewhat.

Theorem 10.2.1 Let (M_n : n ≥ 0) be a martingale with respect to (Z_n : n ≥ 0). Suppose that T is a bounded random variable that is a stopping time with respect to (Z_n : n ≥ 0). Then EM_T = EM_0.

Proof: Let m be such that P(T ≤ m) = 1. Then M_T = M_0 + Σ_{i=1}^m D_i I(T ≥ i), and thus

EM_T = EM_0 + E Σ_{i=1}^m D_i I(T ≥ i).   (10.2.3)

Because T is a stopping time,

E[D_i I(T ≥ i) | Z_0, ..., Z_{i-1}] = I(T ≥ i) E[D_i | Z_0, ..., Z_{i-1}] = 0,

and so E Σ_{i=1}^m D_i I(T ≥ i) = 0.

If T is a stopping time, then T ∧ n is a stopping time for n ≥ 0 (and is clearly bounded). So, optional sampling applies at T ∧ n (see Theorem 10.2.1), i.e. EM_{T∧n} = EM_0 for n ≥ 0. If T < ∞ a.s., then M_{T∧n} → M_T a.s. as n → ∞. Hence, if

E lim_{n→∞} M_{T∧n} = lim_{n→∞} EM_{T∧n},   (10.2.4)

then (10.2.2) holds, since

EM_T = E lim_{n→∞} M_{T∧n} = lim_{n→∞} EM_{T∧n} = lim_{n→∞} EM_0 = EM_0.

Therefore, the key to establishing (10.2.2) is (10.2.4). There are various results which one can invoke to justify (10.2.4); the most powerful of these results is the Dominated Convergence Theorem. To apply this result, we need to find a random variable W having finite mean, such that |M_{T∧n}| ≤ W for n ≥ 0. The obvious candidate for W is

W = |M_0| + Σ_{i=1}^T |D_i|.   (10.2.5)

So, if EW < ∞, we conclude that (10.2.4) is valid.

Proposition 10.2.1 Suppose that there exists c < ∞ such that P(|D_i| ≤ c) = 1 for i ≥ 1. If ET < ∞, then EM_T = EM_0.

Proof: Note that W ≤ |M_0| + cT. Since ET < ∞, then EW < ∞. Then, the Dominated Convergence Theorem implies that EM_{T∧n} → EM_T as n → ∞, yielding the result.

Now, let's turn to an application of optional sampling.

Application 10.2.1 Let (S_n : n ≥ 0) be a random walk with S_0 = 0 and iid increments (X_n : n ≥ 1) defined by P(X_n = 1) = P(X_n = -1) = 1/2. Let T = inf{n ≥ 0 : S_n ≤ -a or S_n ≥ b} be the exit time from (-a, b). Suppose that we wish to compute P(S_T = -a), the probability that the random walk exits at the left boundary. (This is basically the gambler's ruin computation for the probability of ruin.) Note that |D_i| = 1 and ET < ∞ (see Exercise 10.2.1). Hence, Proposition 10.2.1 applies and ES_T = 0. However,

ES_T = -a P(S_T = -a) + b P(S_T = b) = -a P(S_T = -a) + b [1 - P(S_T = -a)].

Therefore, P(S_T = -a) = b/(a + b).

Exercise 10.2.1 (a) Prove that ET < ∞ in Application 10.2.1. (b) Compute the value of P(S_T = -a) by setting up a suitable system of linear equations involving the unknowns P_x(S_T = -a) and solving them. (This is an alternative approach to computing the exit probability.)

Application 10.2.2 In this continuation of Application 10.2.1, we wish to compute ET. (In the gambler's ruin setting, this is the mean duration of the game.) Let M_n = S_n^2 - nσ^2, where σ^2 = Var[X_i] = 1. Assuming that (10.2.2) holds,

ES_T^2 - σ^2 ET = 0, so that ET = ES_T^2.   (10.2.6)

Solving for ES_T^2, we have

ES_T^2 = a^2 P(S_T = -a) + b^2 P(S_T = b) = (a^2 b + a b^2)/(a + b) = ab,

so ET = ab. Does Proposition 10.2.1 apply? Here,

D_i = S_i^2 - S_{i-1}^2 - σ^2 = (S_i + S_{i-1}) X_i - 1.

Clearly, the D_i do not satisfy the hypotheses of Proposition 10.2.1, so something else is needed here.

Proposition 10.2.2 Suppose that there exists c < ∞ for which

E[|D_i| | Z_0, Z_1, ..., Z_{i-1}] ≤ c on {T ≥ i} for i ≥ 1.

If ET < ∞, then EM_T = EM_0.

Proof: Note that

EW = E|M_0| + E Σ_{i=1}^T |D_i| = E|M_0| + Σ_{i=1}^∞ E[|D_i| I(T ≥ i)].

However,

E[|D_i| I(T ≥ i) | Z_0, Z_1, ..., Z_{i-1}] = I(T ≥ i) E[|D_i| | Z_0, Z_1, ..., Z_{i-1}] ≤ c I(T ≥ i).

Thus, EW ≤ E|M_0| + c Σ_{i=1}^∞ E I(T ≥ i) = E|M_0| + c ET < ∞, and consequently the Dominated Convergence Theorem applies.
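The two exit-time formulas of Applications 10.2.1 and 10.2.2, P(S_T = -a) = b/(a+b) and ET = ab, are easy to corroborate by simulation. The sketch below is an added illustration (the choice a = 3, b = 5 is arbitrary, giving predictions 5/8 and 15).

```python
import numpy as np

# Simulate the symmetric gambler's ruin with a = 3, b = 5: optional sampling
# predicts P(S_T = -a) = b/(a+b) = 0.625 and ET = ab = 15.
rng = np.random.default_rng(3)
a, b, n_paths = 3, 5, 50_000
hits_left, total_T = 0, 0
for _ in range(n_paths):
    s, t = 0, 0
    while -a < s < b:                 # run until the walk exits (-a, b)
        s += 1 if rng.random() < 0.5 else -1
        t += 1
    hits_left += (s == -a)
    total_T += t

p_left = hits_left / n_paths          # estimate of P(S_T = -a)
mean_T = total_T / n_paths            # estimate of ET
print(p_left, mean_T)
```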

Application 10.2.2 (continued) Here, |D_i| ≤ (|S_i| + |S_{i-1}|) + 1. So, on {T ≥ i}, |D_i| ≤ 2(a ∨ b) + 1, validating the hypotheses of Proposition 10.2.2, and thus completing the desired computation.

How do we perform corresponding calculations if the random walk does not have mean zero? Specifically, suppose that (S_n : n ≥ 0) is a random walk with S_0 = 0 and iid increments (X_n : n ≥ 1) given by P(X_n = 1) = p = 1 - P(X_n = -1). Here, the key is to switch to our exponential martingale.

Application 10.2.3 Here, m_X(θ) = p e^θ + (1-p) e^{-θ}, so ψ(θ) = log(p e^θ + (1-p) e^{-θ}). Then, the martingale of interest is L_n(θ) = e^{θ S_n - n ψ(θ)}. Assuming that optional sampling applies at time T, we arrive at E L_T(θ) = 1, or, in other words,

E e^{θ S_T - T ψ(θ)} = 1.   (10.2.7)

To compute the exit probabilities from (-a, b), it is desirable to eliminate the term T ψ(θ) from the exponent of (10.2.7). Recall that ψ is convex (see Exercise 10.1.1). There exists a unique θ* ≠ 0 such that ψ(θ*) = 0, given by

θ* = log((1-p)/p).

Substituting θ = θ* into (10.2.7), we get E e^{θ* S_T} = 1. But

E e^{θ* S_T} = e^{-θ* a} P(S_T = -a) + e^{θ* b} P(S_T = b).

Hence,

P(S_T = -a) = (((1-p)/p)^b - 1) / (((1-p)/p)^b - ((1-p)/p)^{-a}).

(This is basically the probability of ruin in a gambler's ruin problem that is not fair.)

Exercise 10.2.2 Rigorously apply the optional sampling theorem in Application 10.2.3.

Application 10.2.4 Let (S_n : n ≥ 0) be a random walk with S_0 = 0 and iid increments (X_n : n ≥ 1) given by P(X_n = 1) = p = 1 - P(X_n = -1) with p > 1/2. This is a walk with positive drift, so that T < ∞ a.s. if we set T = inf{n ≥ 0 : S_n ≥ b}. Our goal here is to compute the moment generating function of T, using martingale methods. Assuming that we can invoke the optional sampling theorem at T,

E e^{θ S_T - T ψ(θ)} = 1.   (10.2.8)

For T as described above, S_T = b. (This is a consequence of the continuity of the nearest-neighbor random walk. If X_i could take on values greater than or equal to 2, then S_T would not be deterministic, and this calculation would become much harder.) Relation (10.2.8) yields

E e^{-T ψ(θ)} = e^{-θ b}.

Set γ = ψ(θ), so that θ = ψ^{-1}(γ). Then,

E e^{-γ T} = e^{-ψ^{-1}(γ) b}

is the moment generating function of T. (In computing ψ^{-1}(γ), one may find multiple roots; to formally determine the appropriate root, note that the function E e^{-γ T} of the non-negative random variable T must be non-increasing in γ.)

To make this result rigorous, note that if p > 1/2, then ψ'(0) > 0. The convexity of ψ(·) then guarantees that ψ(θ) > 0 for θ > 0. Consequently, for θ > 0,

e^{θ S_{T∧n} - ψ(θ)(T∧n)} ≤ e^{θ S_{T∧n}} ≤ e^{θ b},

so the Dominated Convergence Theorem ensures that (10.2.8) holds for θ > 0. Then, for γ > 0, let η = ψ^{-1}(γ) be the non-negative root of

ψ(η) = γ.   (10.2.9)

Relation (10.2.9) yields the expression

E e^{-γ T} = e^{-ψ^{-1}(γ) b} = e^{-η b},

where ψ^{-1} is defined as above. Note that a rigorous application of optional sampling theory has led us to the correct choice of root for the equation (10.2.9).

A similar analysis is possible for the one-sided hitting time T = inf{n ≥ 0 : S_n ≤ -a} with a > 0. Since p > 1/2, T is infinite with positive probability in this case. Again, consider the sequence e^{θ S_{T∧n} - ψ(θ)(T∧n)}. Note that if θ < 0 and ψ(θ) > 0, this sequence is bounded above by e^{-θ a}. Hence, we may interchange limits and expectations in the identity

e^{-θ a} E[e^{-T ψ(θ)} I(T ≤ n)] + E[e^{θ S_n - n ψ(θ)} I(T > n)] = 1,

thereby yielding the identity

E[e^{-T ψ(θ)} ; T < ∞] = e^{θ a}

for θ < 0 satisfying ψ(θ) > 0. So, for γ ≥ 0, let η = ψ^{-1}(γ) be the root less than or equal to θ* = log((1-p)/p) < 0 defined by ψ(η) = γ. For the root defined as above, we then have

E[e^{-γ T} ; T < ∞] = e^{ψ^{-1}(γ) a}.

Note that by setting γ = 0, we obtain the identity P(T < ∞) = e^{θ* a}. In other words, we have computed the probability that a positive-drift nearest-neighbor random walk ever drops below -a.

The theory of optional sampling extends beyond the martingale setting to supermartingales and submartingales.

Definition 10.2.1 Let (M_n : n ≥ 0) be an integrable sequence of random variables that is adapted to (Z_n : n ≥ 0). If for n ≥ 0,

E[M_{n+1} | Z_0, ..., Z_n] ≤ M_n,

then (M_n : n ≥ 0) is said to be a supermartingale with respect to (Z_n : n ≥ 0). On the other hand, if

E[M_{n+1} | Z_0, ..., Z_n] ≥ M_n,

then (M_n : n ≥ 0) is said to be a submartingale with respect to (Z_n : n ≥ 0).
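Before interpreting these definitions, here is a numerical sanity check of Application 10.2.3, added as an illustration with the arbitrary parameters p = 0.6, a = 2, b = 4: with r = (1-p)/p, the root θ* = log r does satisfy ψ(θ*) = 0, and the exit-probability formula matches simulation.

```python
import numpy as np

# Check Application 10.2.3 with p = 0.6, a = 2, b = 4.  With r = (1-p)/p,
# theta* = log(r) is the nonzero root of psi, and
# P(S_T = -a) = (r**b - 1) / (r**b - r**(-a)).
rng = np.random.default_rng(4)
p, a, b = 0.6, 2, 4
r = (1 - p) / p

def psi(theta):
    return np.log(p * np.exp(theta) + (1 - p) * np.exp(-theta))

psi_at_root = psi(np.log(r))                 # should vanish: psi(theta*) = 0
predicted = (r**b - 1) / (r**b - r**(-a))    # exit probability at -a

n_paths, hits_left = 50_000, 0
for _ in range(n_paths):
    s = 0
    while -a < s < b:                        # run until the walk exits (-a, b)
        s += 1 if rng.random() < p else -1
    hits_left += (s == -a)
p_left = hits_left / n_paths                 # simulated P(S_T = -a)
print(psi_at_root, predicted, p_left)
```

With upward drift, the left-exit probability drops below the fair-game value b/(a+b), as the formula predicts.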

If M_n corresponds to the fortune of a gambler at time n, then a supermartingale indicates that the game is unfavorable to the gambler, whereas a submartingale indicates that the game is favorable.

Proposition 10.2.3 Let T be a stopping time with respect to (Z_n : n ≥ 0). If (M_n : n ≥ 0) is a supermartingale with respect to (Z_n : n ≥ 0), then

EM_{T∧n} ≤ EM_0,   n ≥ 0.

On the other hand, if (M_n : n ≥ 0) is a submartingale with respect to (Z_n : n ≥ 0), then

EM_{T∧n} ≥ EM_0,   n ≥ 0.

Exercise 10.2.3 Prove Proposition 10.2.3.

Exercise 10.2.4 Let (M_n : n ≥ 0) be a martingale with respect to (Z_n : n ≥ 0). Suppose that φ : R → R is a convex function for which E|φ(M_n)| < ∞ for n ≥ 0. Prove that (φ(M_n) : n ≥ 0) is a submartingale with respect to (Z_n : n ≥ 0).

10.3 Martingales for Discrete-Time Markov Chains

In this section, we show how the random walk martingales introduced earlier generalize to the DTMC setting. Each of the martingales constructed here will have natural analogs in the SDE context.

Let (Y_n : n ≥ 0) be a real-valued sequence of random variables, not necessarily Markov. A standard trick for constructing a martingale in this very general setting is to set

D_i = Y_i - E[Y_i | Y_0, ..., Y_{i-1}]

for i ≥ 1. Assuming that the Y_i's are integrable, the D_i's are martingale differences with respect to the Y_i's. Hence,

M_n = Σ_{i=1}^n [Y_i - E[Y_i | Y_0, ..., Y_{i-1}]]

is a martingale.

The same kind of idea works nicely in the DTMC setting. For f : S → R that is bounded, note that

D_i = f(X_i) - E[f(X_i) | X_0, ..., X_{i-1}] = f(X_i) - E[f(X_i) | X_{i-1}] = f(X_i) - (Pf)(X_{i-1})

is a martingale difference with respect to (X_i : i ≥ 0). Hence,

M_n = Σ_{i=1}^n [f(X_i) - (Pf)(X_{i-1})]

is a mean-zero martingale. But

M_n = Σ_{i=1}^n [f(X_i) - (Pf)(X_{i-1})]
    = Σ_{i=0}^{n-1} [f(X_i) - (Pf)(X_i)] + f(X_n) - f(X_0)
    = f(X_n) - f(X_0) - Σ_{i=0}^{n-1} (Af)(X_i),

where A = P - I, so that (Af)(x) = (Pf)(x) - f(x).

It follows easily that M_n = f(X_n) - Σ_{i=0}^{n-1} (Af)(X_i) is a martingale whenever f is bounded. We have proved the following result.

Proposition 10.3.1 For f : S → R bounded,

M_n = f(X_n) - Σ_{i=0}^{n-1} (Af)(X_i)

is a martingale with respect to (X_n : n ≥ 0).

This martingale is known as the Dynkin martingale. Viewing -(Af)(X_i) as the increment of a random walk-type process, this is clearly the DTMC analog to the random walk martingale.

Suppose that Af = 0. Then Proposition 10.3.1 implies that (f(X_n) : n ≥ 0) is a martingale with respect to (X_n : n ≥ 0).

Definition 10.3.1 A function f : S → R for which Af = 0 is called a harmonic function.

The term harmonic function is widely used in the analysis literature. It refers to functions f : R^d → R for which Δf = 0, where

Δ = ∂^2/∂x_1^2 + ∂^2/∂x_2^2 + ... + ∂^2/∂x_d^2.

(The operator Δ is known as the Laplacian operator.) Note that if the Markov chain X corresponds (for example) to simple random walk on the lattice plane, then

P((x_1, y_1), (x_2, y_2)) = 1/4 if (x_2, y_2) ∈ {(x_1+1, y_1), (x_1-1, y_1), (x_1, y_1+1), (x_1, y_1-1)},
                          = 0 otherwise.

Requiring that f be harmonic in this setting forces f to satisfy

[f(x_1+1, y_1) + f(x_1-1, y_1) + f(x_1, y_1+1) + f(x_1, y_1-1) - 4 f(x_1, y_1)] / 4 = 0.   (10.3.1)

The left-hand side turns out to be a finite-difference approximation to Δf in two dimensions. Thus, Definition 10.3.1 legitimately extends the classical notion of harmonic functions.

Proposition 10.3.2 (a) If f is a bounded function for which Af ≤ 0, then (f(X_n) : n ≥ 0) is a supermartingale with respect to (X_n : n ≥ 0). (b) If f is a bounded function for which Af ≥ 0, then (f(X_n) : n ≥ 0) is a submartingale with respect to (X_n : n ≥ 0).

Exercise 10.3.1 Prove Proposition 10.3.2.

Definition 10.3.2 A function f for which Af ≤ 0 is said to be superharmonic. If instead Af ≥ 0, then f is said to be subharmonic.

Again, this definition extends the classical usage, which states that f is superharmonic if Δf ≤ 0 and subharmonic if Δf ≥ 0. It is in order to remain consistent with the classical usage that we apply the term supermartingale (rather than submartingale) to an unfavorable game in which M_n has a tendency to decrease in expectation.

There is a nice connection between harmonic functions and recurrence.
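Before turning to that connection, note that Proposition 10.3.1 can be verified exactly (no simulation needed) on a small chain, since EM_n can be computed from powers of P. The sketch below is an added illustration; the 3-state transition matrix, test function f, and initial distribution are arbitrary choices.

```python
import numpy as np

# Exact check of the Dynkin martingale: M_n = f(X_n) - sum_{i<n} (Af)(X_i),
# with Af = Pf - f, has EM_n = EM_0 = E f(X_0) for every n.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
f = np.array([1.0, -2.0, 5.0])
Af = P @ f - f                        # generator applied to f
nu = np.array([0.2, 0.5, 0.3])        # initial distribution

EM = []
dist = nu.copy()                      # distribution of X_n
correction = 0.0                      # E sum_{i=0}^{n-1} (Af)(X_i)
for _ in range(10):
    EM.append(dist @ f - correction)  # EM_n = E f(X_n) - correction
    correction += dist @ Af
    dist = dist @ P                   # advance one step

EM = np.array(EM)
print(EM)                             # constant sequence, equal to nu @ f
```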

Exercise 10.3.2 Suppose that X is an irreducible DTMC. (a) If X is recurrent, prove that all the bounded harmonic functions are constants. (Hint: This is easy if |S| < ∞. To prove the general case, use Theorem 10.1.2.) (b) If X is transient, show that there always exists at least one non-constant bounded harmonic function.

To apply martingale theory to additive processes of the form

Σ_{j=0}^{n-1} g(X_j)   (10.3.2)

with X Markov, the obvious device to apply is Proposition 10.3.1. So, note that if we could find f such that

Af = -g,   (10.3.3)

then we effectively would have our desired martingale for (10.3.2), namely

M_n = f(X_n) + Σ_{j=0}^{n-1} g(X_j).   (10.3.4)

(In the Markov setting, one cannot expect (10.3.2) itself to be a martingale; it just isn't. But (10.3.4) shows that it can be represented as a martingale if one adds on the correction term f(X_n).) Because (10.3.3) plays a key role in representing (10.3.2) as a martingale, this equation has an important place in the theory of Markov processes. Equation (10.3.3) is called Poisson's equation. (In the symmetric simple random walk setting, (10.3.3) is just a finite-difference analog of Δf = -g, which is Poisson's equation in the partial differential equations setting.)

Poisson's equation need not have a solution for arbitrary g.

Exercise 10.3.3 Suppose that X is an irreducible transient DTMC. If g has finite support (i.e. {x ∈ S : g(x) ≠ 0} has finite cardinality), show that Poisson's equation has a solution.

Exercise 10.3.4 Suppose that X is an irreducible finite-state DTMC. Let π be the stationary distribution of X. Let Π be the matrix in which all rows are identical to π.

(a) Prove that ΠP = PΠ = Π^2.
(b) Prove that (P - Π)^n = P^n - Π for n ≥ 1.
(c) Prove that if X is aperiodic, then Σ_{n=0}^∞ (P - Π)^n converges absolutely.
(d) Prove that if X is aperiodic, then (I - P + Π)^{-1} exists.
(e) Extend (d) to the periodic case.
(f) Prove that if g is such that πg = 0, then f = (Π - A)^{-1} g solves Poisson's equation Af = -g.
(g) Prove that if g is such that πg ≠ 0, then Af = -g has no solution.
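The construction in part (f) is easy to carry out numerically. The sketch below, an added illustration with an arbitrary 3-state chain and an arbitrary g, centers g so that it has stationary mean zero and then solves the linear system (I - P + Π) f = g_c, checking that (P - I) f = -g_c.

```python
import numpy as np

# Solve Poisson's equation Af = -g_c on a small chain via f = (I - P + Pi)^{-1} g_c,
# where Pi is the matrix whose rows all equal the stationary distribution pi and
# g_c = g - (pi @ g) is centered so that pi @ g_c = 0.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
n = P.shape[0]

# stationary distribution: left eigenvector of P for eigenvalue 1
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

Pi = np.tile(pi, (n, 1))              # every row equals pi
g = np.array([2.0, -1.0, 3.0])
g_c = g - pi @ g                      # centered: pi @ g_c = 0

f = np.linalg.solve(np.eye(n) - P + Pi, g_c)
residual = (P - np.eye(n)) @ f + g_c  # Af + g_c, should be the zero vector
print(f, residual)
```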

Exercise 10.3.5 We extend here the existence of solutions to Poisson's equation to infinite-state irreducible positive recurrent Markov chains X = (X_n : n ≥ 0). Let f : S → R be such that Σ_x π(x) |f(x)| < ∞. Set f_c(x) = f(x) - Σ_y π(y) f(y), and put

u*(x) = E_x Σ_{n=0}^{τ(z)-1} f_c(X_n),

where τ(z) = inf{n ≥ 1 : X_n = z}.

(a) Prove that E_x Σ_{n=0}^{τ(z)-1} |f_c(X_n)| < ∞ for each x ∈ S (so that u*(·) is finite-valued).

(b) Prove that

u*(x) = f_c(x) + Σ_{y ∈ S} P(x, y) u*(y),

so that u* is a solution of Poisson's equation.

We now turn to developing an analog to the likelihood ratio martingale that was discussed in the random walk setting. Let X = (X_n : n ≥ 0) be an S-valued DTMC with initial distribution ν and (one-step) transition matrix Q = (Q(x, y) : x, y ∈ S). Suppose that we select a stochastic vector μ and transition matrix P such that:

(i) μ(x) = 0 whenever ν(x) = 0, for x ∈ S;
(ii) P(x, y) = 0 whenever Q(x, y) = 0, for x, y ∈ S.

Proposition 10.3.3 The sequence (L_n : n ≥ 0) is a martingale with respect to (X_n : n ≥ 0), where

L_n = (μ(X_0)/ν(X_0)) ∏_{j=0}^{n-1} P(X_j, X_{j+1})/Q(X_j, X_{j+1}),   n ≥ 0.

Exercise 10.3.6 Prove Proposition 10.3.3.

We close this section with a discussion of the exponential martingale's extension to the DTMC setting. Suppose that we wish to study an additive process of the form Σ_{j=0}^{n-1} g(X_j), where (X_n : n ≥ 0) is an irreducible finite-state DTMC. In the random walk setting, the moment generating function of the random walk played a critical role in constructing the exponential martingale. This suggests considering

u_n(θ, x, y) = E_x[e^{θ Σ_{j=0}^{n-1} g(X_j)} ; X_n = y]

for x, y ∈ S. Observe that

u_n(θ, x, y) = Σ_{x_1, ..., x_{n-1}} e^{θ g(x)} P(x, x_1) e^{θ g(x_1)} P(x_1, x_2) ⋯ e^{θ g(x_{n-1})} P(x_{n-1}, y) = K^n(θ, x, y),

where K^n(θ, x, y) is the (x, y)'th component of the nth power of the matrix K(θ), where

K(θ, x, y) = e^{θ g(x)} P(x, y).   (10.3.5)

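The identity $u_n(\theta, x, y) = K^n(\theta, x, y)$ can also be checked by brute-force enumeration over paths on a small chain. A sketch assuming NumPy; the matrix $P$, function $g$ and value of $\theta$ are illustrative.

```python
import itertools
import numpy as np

P = np.array([[0.5, 0.5],
              [0.3, 0.7]])
g = np.array([1.0, -1.0])            # illustrative reward function
theta = 0.3
K = np.diag(np.exp(theta * g)) @ P   # K(theta,x,y) = e^{theta g(x)} P(x,y)

def u_n(n, x, y):
    # E_x[e^{theta sum_{j<n} g(X_j)}; X_n = y] by summing over all paths
    total = 0.0
    for mid in itertools.product([0, 1], repeat=n - 1):
        path = (x,) + mid + (y,)
        weight = np.exp(theta * sum(g[s] for s in path[:-1]))
        prob = np.prod([P[a, b] for a, b in zip(path, path[1:])])
        total += weight * prob
    return total

Kn = np.linalg.matrix_power(K, 3)
for x in range(2):
    for y in range(2):
        assert abs(u_n(3, x, y) - Kn[x, y]) < 1e-12
print("u_n(theta, x, y) = K^n(theta, x, y) verified for n = 3")
```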
Note that $K(\theta)$ is a non-negative finite irreducible matrix. Then, the Perron-Frobenius theorem for non-negative matrices implies that there exists a positive eigenvalue $\lambda(\theta)$ and corresponding positive column eigenvector $r(\theta)$ such that

    $K(\theta) r(\theta) = \lambda(\theta) r(\theta)$.    (0.3.6)

Let $\psi(\theta) = \log \lambda(\theta)$. We can rewrite (0.3.6) as

    $e^{-\psi(\theta)} \sum_y K(\theta, x, y) \frac{r(\theta, y)}{r(\theta, x)} = 1$, $x \in S$.    (0.3.7)

Substituting (0.3.5) into (0.3.7), we obtain

    $\sum_y e^{\theta g(x) - \psi(\theta)} P(x, y) \frac{r(\theta, y)}{r(\theta, x)} = 1$,

or equivalently,

    $E_x\left[ e^{\theta g(x) - \psi(\theta)} \frac{r(\theta, X_1)}{r(\theta, X_0)} \right] = 1$.

Proposition 0.3.4 For each $\theta \in \mathbb{R}$,

    $L_n(\theta) = e^{\theta \sum_{j=0}^{n-1} g(X_j) - n \psi(\theta)} \frac{r(\theta, X_n)}{r(\theta, X_0)}$

is a martingale with respect to $(X_n : n \geq 0)$.

Proof: The critical verification involves showing that $E[L_{n+1}(\theta) \mid X_0, \ldots, X_n] = L_n(\theta)$. But

    $E[L_{n+1}(\theta) \mid X_0, \ldots, X_n] = L_n(\theta) \, E\left[ e^{\theta g(X_n) - \psi(\theta)} \frac{r(\theta, X_{n+1})}{r(\theta, X_n)} \,\Big|\, X_0, \ldots, X_n \right] = L_n(\theta)$,

where the last equality follows from the Markov property and (0.3.7).

We can rewrite this martingale as follows. Set $h(\theta, x) = \log r(\theta, x)$. Then, Proposition 0.3.4 asserts that

    $\exp\left( h(\theta, X_n) + \theta \sum_{j=0}^{n-1} g(X_j) - n \psi(\theta) \right)$

is a martingale. This exponential martingale can be used in a manner identical to the random walk setting to study $\sum_{j=0}^{n-1} g(X_j)$.

0.4 The Strong Law for Martingales

As for sums of independent mean zero rv's, we expect that in great generality,

    $n^{-1} \sum_{i=1}^n D_i \to 0$ a.s.    (0.4.1)

as $n \to \infty$. This is easy to establish if we weaken the a.s. convergence to convergence in probability, since

    $P\left( \left| n^{-1} \sum_{i=1}^n D_i \right| > \epsilon \right) \leq \frac{E\left( n^{-1} \sum_{i=1}^n D_i \right)^2}{\epsilon^2} = \frac{1}{n^2 \epsilon^2} \sum_{i=1}^n E D_i^2$,

so that if

    $\sup_n E D_n^2 < \infty$,    (0.4.2)

it clearly follows that

    $n^{-1} \sum_{i=1}^n D_i \overset{p}{\to} 0$    (0.4.3)

as $n \to \infty$. To strengthen (0.4.3) to a.s. convergence, we need to apply the Martingale Convergence Theorem. Since $(n^{-1} \sum_{i=1}^n D_i : n \geq 1)$ is not a martingale, we need something to bridge the gap between $n^{-1} \sum_{i=1}^n D_i$ and the world of martingales. The appropriate bridge is Kronecker's lemma.

Kronecker's Lemma: If $(x_n : n \geq 1)$ and $(a_n : n \geq 1)$ are two real-valued sequences for which $(a_n : n \geq 1)$ is non-negative and increasing to infinity, then the existence of a finite-valued $z$ such that

    $\sum_{j=1}^n \frac{x_j}{a_j} \to z$

as $n \to \infty$ implies that

    $\frac{1}{a_n} \sum_{j=1}^n x_j \to 0$

as $n \to \infty$.

To apply this result in our martingale setting, let

    $\tilde{M}_n = \sum_{j=1}^n \frac{D_j}{j}$

and observe that $(\tilde{M}_n : n \geq 1)$ is a martingale for which

    $E \tilde{M}_n^2 = \sum_{j=1}^n E (D_j/j)^2 = \sum_{j=1}^n \frac{E D_j^2}{j^2}$,

so that in the presence of (0.4.2), $\sup_n E \tilde{M}_n^2 \leq (\sup_j E D_j^2) \sum_{j=1}^\infty j^{-2} < \infty$ and the Martingale Convergence Theorem can be applied, yielding the conclusion that there exists a finite-valued $\tilde{M}_\infty$ for which

    $\tilde{M}_n \to \tilde{M}_\infty$ a.s.

as $n \to \infty$. An application of Kronecker's lemma path-by-path then yields

    $n^{-1} \sum_{i=1}^n D_i \to 0$ a.s.

as $n \to \infty$.
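The route just taken (bound $E \tilde{M}_n^2$, apply the convergence theorem, then Kronecker's lemma) can be watched numerically. A simulation sketch assuming NumPy; i.i.d. $\pm 1$ variables serve as the simplest example of a martingale-difference sequence satisfying (0.4.2).

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200_000
D = rng.choice([-1.0, 1.0], size=n)          # sup_n E D_n^2 = 1 < infinity

# M_tilde_n = sum_{j<=n} D_j / j converges (Martingale Convergence Theorem)...
Mt = np.cumsum(D / np.arange(1, n + 1))
# ...so by Kronecker's lemma, n^{-1} sum_{j<=n} D_j -> 0
avg = np.cumsum(D) / np.arange(1, n + 1)

assert abs(Mt[-1] - Mt[-1000]) < 0.05        # tail of M_tilde has nearly settled
assert abs(avg[-1]) < 0.02                   # strong-law limit
print("M_tilde tail oscillation:", abs(Mt[-1] - Mt[-1000]), "| running mean:", avg[-1])
```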

Exercise 0.4.1 Use the above argument to prove the strong law

    $n^{-1} \sum_{i=0}^{n-1} f(X_i) \to \sum_z \pi(z) f(z)$ a.s.

as $n \to \infty$ for a given finite-state irreducible Markov chain (with equilibrium distribution $\pi = (\pi(x) : x \in S)$).

0.5 The Central Limit Theorem for Martingales

We discuss here general conditions under which

    $n^{-1/2} \sum_{i=1}^n D_i \Rightarrow \sigma N(0, 1)$

as $n \to \infty$ in discrete time, or under which

    $t^{-1/2} M(t) \Rightarrow \sigma N(0, 1)$    (0.5.1)

as $t \to \infty$ in continuous time. Since discrete-time martingales are just a special case of continuous-time martingales, we focus on (0.5.1). Note that

    $M(t) = M(0) + \sum_{i=1}^n \left( M(it/n) - M((i-1)t/n) \right)$,

so that in the presence of square integrability,

    $E M^2(t) = E M^2(0) + \sum_{i=1}^n E\left( M(it/n) - M((i-1)t/n) \right)^2$.

For a given square-integrable martingale $(M(t) : t \geq 0)$, define the quadratic variation of $M$ to be

    $[M](t) = \lim_{n \to \infty} \sum_{i=1}^n \left( M(it/n) - M((i-1)t/n) \right)^2$.

Theorem 0.5.1 Let $(M(t) : t \geq 0)$ be a square-integrable martingale with right continuous paths with left limits, and let $\langle M \rangle$ denote its predictable quadratic variation (the compensator of $M^2$). If either:

    $\frac{1}{t} E \sup_{0 \leq s \leq t} |M(s) - M(s-)| \to 0$ and $\frac{[M](t)}{t} \overset{p}{\to} \sigma^2$

as $t \to \infty$, or

    $\frac{1}{t} E \sup_{0 \leq s \leq t} |M(s) - M(s-)|^2 \to 0$, $\frac{1}{t} E \sup_{0 \leq s \leq t} |\langle M \rangle(s) - \langle M \rangle(s-)| \to 0$,

and $\frac{\langle M \rangle(t)}{t} \overset{p}{\to} \sigma^2$ as $t \to \infty$, then

    $t^{-1/2} M(t) \Rightarrow \sigma N(0, 1)$

as $t \to \infty$.

Remark 0.5.1 Note that Markov jump processes have right continuous paths with left limits, so this result applies in the Markov jump process setting.

Remark 0.5.2 When specialized to discrete time,

    $[M](n) = M^2(0) + \sum_{i=1}^n D_i^2$ and $\langle M \rangle(n) = \sum_{i=1}^n E[D_i^2 \mid Z_0, \ldots, Z_{i-1}]$.

Exercise 0.5.1 Use the Martingale CLT to prove that there exists $\sigma$ for which

    $n^{-1/2} \left( \sum_{i=0}^{n-1} f(X_i) - n \sum_z \pi(z) f(z) \right) \Rightarrow \sigma N(0, 1)$

as $n \to \infty$, provided that $(X_n : n \geq 0)$ is a finite state irreducible Markov chain.
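The discrete-time formulas of Remark 0.5.2 and the CLT scaling can be illustrated in the simplest possible case: i.i.d. $\pm 1$ differences, for which $[M](n) = \langle M \rangle(n) = n$, so $\sigma^2 = 1$. A simulation sketch assuming NumPy; the parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

# i.i.d. +/-1 martingale differences: per Remark 0.5.2 (with M(0) = 0),
# [M](n) = sum_i D_i^2 = n and <M>(n) = sum_i E[D_i^2 | past] = n.
n, reps = 5_000, 2_000
D = rng.choice([-1.0, 1.0], size=(reps, n))

QV = (D ** 2).sum(axis=1)                 # quadratic variation [M](n), here exactly n
assert np.all(QV == n)

Z = D.sum(axis=1) / np.sqrt(n)            # n^{-1/2} M(n), one value per replication
# Martingale CLT: Z should be approximately N(0, sigma^2) with sigma = 1
assert abs(Z.mean()) < 0.1 and abs(Z.std() - 1.0) < 0.1
print("sample mean:", Z.mean(), "sample std:", Z.std())
```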