An Introduction to Stochastic Calculus


An Introduction to Stochastic Calculus
Haijun Li (lih@math.wsu.edu)
Department of Mathematics and Statistics, Washington State University
Lisbon, May 2018 (169 slides)

Outline
Basic Concepts from Probability Theory
Random Vectors
Stochastic Processes

Notations
Sample or outcome space Ω := {all possible outcomes ω of the underlying experiment}.
σ-field or σ-algebra F: a non-empty class of subsets (observable events) of Ω that is closed under countable unions, countable intersections and complements.
Probability measure P(·) on F: P(A) denotes the probability of the event A.
Random variable X: Ω → R is a real-valued measurable function defined on Ω. That is, the events X⁻¹((a, b)) ∈ F are observable for all a, b ∈ R.
Induced probability measure: P_X(B) := P(X ∈ B) = P({ω : X(ω) ∈ B}) for any Borel set B ⊆ R.
Distribution function: F_X(x) := P(X ≤ x), x ∈ R.

Continuous and Discrete Random Variables
A random variable X is said to be continuous if the distribution function F_X has no jumps, that is, lim_{h→0} F_X(x + h) = F_X(x) for all x ∈ R. Most continuous distributions of interest have a density f_X:
F_X(x) = ∫_{−∞}^{x} f_X(y) dy, x ∈ R, where ∫_{−∞}^{∞} f_X(y) dy = 1.
A random variable X is said to be discrete if the distribution function F_X is a pure jump function:
F_X(x) = Σ_{k : x_k ≤ x} p_k, x ∈ R,
where the probability mass function {p_k} satisfies 0 ≤ p_k ≤ 1 and Σ_{k=1}^{∞} p_k = 1.

Expectation, Variance and Moments
A General Formula: For a real-valued function g, the expectation of g(X) is given by Eg(X) = ∫ g(x) dF_X(x).
The k-th moment of X is given by E(X^k) = ∫ x^k dF_X(x).
The mean µ_X (or "center of gravity") of X is the first moment.
The variance (or "spread") of X is defined as σ_X² = var(X) := E(X − µ_X)². Clearly σ_X² = E(X²) − µ_X².
If the variance exists, then the Chebyshev inequality holds:
P(|X − µ_X| > kσ_X) ≤ k⁻², k > 0.
That is, the probability of the tail regions that are k standard deviations away from the mean is bounded by 1/k².
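The Chebyshev bound can be checked numerically. The sketch below (the exponential test distribution, sample size and values of k are illustrative choices, not from the slides) verifies that the empirical tail probability stays below 1/k²:

```python
import numpy as np

# Empirical check of Chebyshev's inequality P(|X - mu| > k*sigma) <= 1/k^2,
# using an exponential distribution (true mean = sd = 1). Illustrative only.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)
mu, sigma = x.mean(), x.std()

for k in (2.0, 3.0):
    tail = np.mean(np.abs(x - mu) > k * sigma)   # empirical tail probability
    assert tail <= 1.0 / k**2, (k, tail)
```

For the exponential distribution the true tails (e.g. P(X > 3) = e⁻³ ≈ 0.05 for k = 2) sit well below the Chebyshev bounds 1/4 and 1/9, which is typical: the inequality is distribution-free and therefore loose.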

Random Vectors
Let (Ω, F, P) be a probability space. X = (X_1, ..., X_d): Ω → R^d denotes a d-dimensional random vector, where its components X_1, ..., X_d are real-valued random variables.
The induced probability measure: P_X(B) = P(X ∈ B) := P({ω : X(ω) ∈ B}) for all Borel subsets B of R^d.
The distribution function: F_X(x) := P(X_1 ≤ x_1, ..., X_d ≤ x_d), x = (x_1, ..., x_d) ∈ R^d.
If X has a density f_X, then F_X(x) = ∫_{−∞}^{x_1} ··· ∫_{−∞}^{x_d} f_X(y) dy, with ∫_{R^d} f_X(y) dy = 1.
For any J ⊆ {1, ..., d}, let X_J := (X_j ; j ∈ J) be the J-margin of X. The marginal density of X_J is given by f_{X_J}(x_J) = ∫ f_X(x) dx_{J^c}.

Expectation, Variance, and Covariance
The expectation or mean value of X is denoted by µ_X = EX := (E(X_1), ..., E(X_d)).
The covariance matrix of X is defined as Σ_X := (cov(X_i, X_j); i, j = 1, ..., d), where the covariance of X_i and X_j is defined as cov(X_i, X_j) := E[(X_i − µ_{X_i})(X_j − µ_{X_j})] = E(X_i X_j) − µ_{X_i} µ_{X_j}.
The correlation of X_i and X_j is denoted by corr(X_i, X_j) := cov(X_i, X_j) / (σ_{X_i} σ_{X_j}).
It follows from the Cauchy-Schwarz inequality that −1 ≤ corr(X_i, X_j) ≤ 1.

Independence and Dependence
The events A_1, ..., A_n are independent if for any 1 ≤ i_1 < i_2 < ··· < i_k ≤ n, P(∩_{j=1}^{k} A_{i_j}) = Π_{j=1}^{k} P(A_{i_j}).
The random variables X_1, ..., X_n are independent if for any Borel sets B_1, ..., B_n, the events {X_1 ∈ B_1}, ..., {X_n ∈ B_n} are independent.
The random variables X_1, ..., X_n are independent if and only if F_{X_1,...,X_n}(x_1, ..., x_n) = Π_{i=1}^{n} F_{X_i}(x_i) for all (x_1, ..., x_n) ∈ R^n.
The random variables X_1, ..., X_n are independent if and only if E[Π_{i=1}^{n} g_i(X_i)] = Π_{i=1}^{n} E g_i(X_i) for any bounded measurable real-valued functions g_1, ..., g_n.
In the continuous case, the random variables X_1, ..., X_n are independent if and only if f_{X_1,...,X_n}(x_1, ..., x_n) = Π_{i=1}^{n} f_{X_i}(x_i) for all (x_1, ..., x_n) ∈ R^n.

Two Examples
Let X = (X_1, ..., X_d) have a d-dimensional Gaussian distribution. The random variables X_1, ..., X_d are independent if and only if corr(X_i, X_j) = 0 for i ≠ j.
For non-Gaussian random vectors, however, independence and uncorrelatedness are not equivalent. Let X be a standard normal random variable. Since both X and X³ have expectation zero, X and X² are uncorrelated: cov(X, X²) = E(X³) − EX E(X²) = 0. But X and X² are clearly dependent. Since {X ∈ [−1, 1]} = {X² ∈ [0, 1]}, we obtain
P(X ∈ [−1, 1], X² ∈ [0, 1]) = P(X ∈ [−1, 1]) > [P(X ∈ [−1, 1])]² = P(X ∈ [−1, 1]) P(X² ∈ [0, 1]).
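The "uncorrelated but dependent" example is easy to reproduce by simulation. A minimal sketch (sample size and tolerances are my choices):

```python
import numpy as np

# For standard normal X, cov(X, X^2) = E[X^3] = 0, yet X and X^2 are
# dependent: the events {X in [-1,1]} and {X^2 in [0,1]} coincide.
rng = np.random.default_rng(1)
x = rng.standard_normal(500_000)

# Sample covariance of X and X^2 is close to zero, as theory predicts.
cov = np.mean(x * x**2) - x.mean() * np.mean(x**2)
assert abs(cov) < 0.02

# Dependence: the joint probability strictly exceeds the product of marginals.
p_joint = np.mean(np.abs(x) <= 1.0)          # {X in [-1,1]} = {X^2 in [0,1]}
p_prod = np.mean(np.abs(x) <= 1.0) * np.mean(x**2 <= 1.0)
assert p_joint > p_prod
```

Here p_joint ≈ 0.68 while the product of the marginals is ≈ 0.68² ≈ 0.47, exactly the strict inequality on the slide.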

Autocorrelations
For a time series X_0, X_1, X_2, ..., the autocorrelation at lag h is defined by corr(X_0, X_h), h = 0, 1, ....
Log-returns: X_t := log(S_t / S_{t−1}), where S_t is the price of a speculative asset (equities, indexes, exchange rates, commodities) at the end of the t-th period. If the relative returns are small, then X_t ≈ (S_t − S_{t−1}) / S_{t−1}. Note that the log-returns are scale-free, additive, stationary, ...
Stylized Fact #1: Log-returns X_t are not iid (independent and identically distributed), although they show little serial autocorrelation.
Stylized Fact #2: Series of absolute returns |X_t| or squared returns X_t² show profound serial autocorrelation (long-range dependence).
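The two properties of log-returns mentioned above, additivity over periods and closeness to relative returns for small moves, can be illustrated directly (the short price series below is a made-up example):

```python
import math

# Log-returns X_t = log(S_t / S_{t-1}) for an illustrative price series.
prices = [100.0, 101.0, 100.5, 102.0]
log_returns = [math.log(s1 / s0) for s0, s1 in zip(prices, prices[1:])]
rel_returns = [(s1 - s0) / s0 for s0, s1 in zip(prices, prices[1:])]

# Additivity: per-period log-returns sum (telescope) to the whole-period one.
assert abs(sum(log_returns) - math.log(prices[-1] / prices[0])) < 1e-12

# For small moves, log-returns are close to relative returns.
assert all(abs(x - r) < 1e-3 for x, r in zip(log_returns, rel_returns))
```

The telescoping sum is what makes log-returns convenient: aggregating over time is just addition, which relative returns do not satisfy.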

Stochastic Processes
A stochastic process X := (X_t, t ∈ T) is a collection of random variables defined on some space Ω, where T ⊆ R. If the index set T is a finite or countably infinite set, X is said to be a discrete-time process. If T is an interval, then X is a continuous-time process.
A stochastic process X is a (measurable) function of two variables: time t and sample point ω. For fixed time t, X_t = X_t(ω), ω ∈ Ω, is a random variable. For a fixed sample point ω, X_t = X_t(ω), t ∈ T, is a sample path.
Example: An autoregressive process of order 1 is given by X_t = φX_{t−1} + Z_t, t ∈ Z, where φ is a real parameter. Time series models can be understood as discretizations of stochastic differential equations.

Finite-Dimensional Distributions
All possible values of a stochastic process X = (X_t, t ∈ T) constitute a function space of all sample paths (X_t(ω), t ∈ T), ω ∈ Ω. Specifying the distribution of X on this function space is equivalent to specifying which information is available in terms of the observable events from the σ-field generated by X.
The distribution of X can be described by the distributions of the finite-dimensional vectors (X_{t_1}, ..., X_{t_n}) for all possible choices of times t_1, ..., t_n ∈ T.
Example: A stochastic process is called Gaussian if all its finite-dimensional distributions are multivariate Gaussian. The distribution of such a process is determined by the collection of its mean vectors and covariance matrices.

Expectation and Covariance Functions
The expectation function of a process X = (X_t, t ∈ T) is defined as µ_X(t) := µ_{X_t} = EX_t, t ∈ T.
The covariance function of X is given by C_X(t, s) := cov(X_t, X_s) = E[(X_t − EX_t)(X_s − EX_s)], t, s ∈ T. In particular, the variance function of X is given by σ_X²(t) = C_X(t, t) = var(X_t), t ∈ T.
Example: A Gaussian white noise X = (X_t, t ∈ [0, 1]) consists of iid N(0, 1) random variables. In this case its finite-dimensional distributions are given by, for any 0 ≤ t_1 < ··· < t_n ≤ 1,
P(X_{t_1} ≤ x_1, ..., X_{t_n} ≤ x_n) = Π_{i=1}^{n} P(X_{t_i} ≤ x_i) = Π_{i=1}^{n} Φ(x_i), x ∈ R^n.
Its expectation and covariance functions are given by µ_X(t) = 0 and C_X(t, s) = 1 if t = s, C_X(t, s) = 0 if t ≠ s.

Dependence Structure
A process X = (X_t, t ∈ T) is said to be strictly stationary if for any t_1, ..., t_n ∈ T and any admissible shift h,
(X_{t_1}, ..., X_{t_n}) =_d (X_{t_1+h}, ..., X_{t_n+h}).
That is, its finite-dimensional distribution functions are invariant under time shifts.
A process X = (X_t, t ∈ T) is said to have stationary increments if X_t − X_s =_d X_{t+h} − X_{s+h} whenever t, s, t + h, s + h ∈ T.
A process X = (X_t, t ∈ T) is said to have independent increments if for all t_1 < ··· < t_n in T, the increments X_{t_2} − X_{t_1}, ..., X_{t_n} − X_{t_{n−1}} are independent.

Strictly Stationary vs Stationary
A process X is said to be stationary (in the wide sense) if µ_X(t + h) = µ_X(t) and C_X(t, s) = C_X(t + h, s + h). If second moments exist, then strict stationarity implies stationarity.
Example: Consider a strictly stationary Gaussian process X. The distribution of X is determined by µ_X(0) and C_X(t, s) = g_X(|t − s|) for some function g_X. In particular, for Gaussian white noise X, g_X(0) = 1 and g_X(x) = 0 for any x ≠ 0.

Homogeneous Poisson Process
A stochastic process X = (X_t, t ≥ 0) is called a Poisson process with intensity rate λ > 0 if X_0 = 0, it has stationary, independent increments, and for every t > 0, X_t has a Poisson distribution Poi(λt).
Simulation of Poisson Processes: Simulate iid exponential Exp(λ) random variables Y_1, Y_2, ..., and set T_n := Σ_{i=1}^{n} Y_i. The Poisson process can be constructed by X_t := #{n : T_n ≤ t}, t ≥ 0.
Example: Claims arriving in an insurance portfolio.
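The interarrival construction above can be sketched in a few lines. This is an illustrative implementation (function name, rate and horizon are my choices), with a Monte Carlo sanity check that X_t has mean approximately λt:

```python
import numpy as np

# Poisson process from iid Exp(lam) interarrival times, as on the slide:
# T_n = Y_1 + ... + Y_n and X_t = #{n : T_n <= t}.
rng = np.random.default_rng(2)
lam, t_end = 3.0, 100.0

def poisson_count(rng, lam, t):
    """Number of arrivals in [0, t] for a rate-lam Poisson process."""
    total, count = 0.0, 0
    while True:
        total += rng.exponential(1.0 / lam)   # next interarrival Y ~ Exp(lam)
        if total > t:
            return count
        count += 1

counts = [poisson_count(rng, lam, t_end) for _ in range(500)]
# X_t ~ Poi(lam * t), so the sample mean should be close to lam * t = 300.
assert abs(np.mean(counts) / (lam * t_end) - 1) < 0.05
```

Note that numpy's `exponential` is parametrized by the scale 1/λ, not the rate λ.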

Outline
Brownian Motion
Simulation of Brownian Sample Paths

Definition
A stochastic process B = (B_t, t ∈ [0, ∞)) is called a (standard) Brownian motion or a Wiener process if B_0 = 0, it has stationary, independent increments, for every t > 0, B_t has a normal N(0, t) distribution, and it has continuous sample paths.
Historical Note: Brownian motion is named after the botanist Robert Brown, who first observed, in the 1820s, the irregular motion of pollen grains immersed in water. By the end of the nineteenth century, the phenomenon was understood by means of kinetic theory as a result of molecular bombardment. In 1900, L. Bachelier employed it to model the stock market, where the analogue of molecular bombardment is the interplay of the myriad of individual market decisions that determine the market price. Norbert Wiener (1923) was the first to put Brownian motion on a firm mathematical basis.

Distributional Properties of Brownian Motion
For any t > s, B_t − B_s =_d B_{t−s} − B_0 = B_{t−s} has an N(0, t − s) distribution. That is, the larger the interval, the larger the fluctuations of B on this interval.
µ_B(t) = EB_t = 0, and for any t > s,
C_B(t, s) = E[((B_t − B_s) + B_s) B_s] = E(B_t − B_s) EB_s + E(B_s²) = 0 + s = min(s, t).
Brownian motion is a Gaussian process: its finite-dimensional distributions are multivariate Gaussian.
Question: How irregular are Brownian sample paths?

Self-Similarity
A stochastic process X = (X_t, t ≥ 0) is H-self-similar for some H > 0 if it satisfies
(T^H X_{t_1}, ..., T^H X_{t_n}) =_d (X_{Tt_1}, ..., X_{Tt_n})
for every T > 0 and any choice of t_i ≥ 0, i = 1, ..., n. Self-similarity means that the properly scaled patterns of a sample path in any small or large time interval have a similar shape.
Non-Differentiability of Self-Similar Processes: For any H-self-similar process X with stationary increments and 0 < H < 1,
limsup_{s→t} |X_s − X_t| / |s − t| = ∞ at any fixed t.
That is, sample paths of H-self-similar processes are nowhere differentiable with probability 1.

Path Properties of Brownian Motion
Brownian motion is 0.5-self-similar. Its sample paths are nowhere differentiable. That is, any sample path changes its shape in the neighborhood of any time epoch in a completely non-predictable fashion (Wiener; Paley and Zygmund, 1930s).
Unbounded Variation of Brownian Sample Paths:
sup_τ Σ_{i=1}^{n} |B_{t_i}(ω) − B_{t_{i−1}}(ω)| = ∞, a.s.,
where the supremum is taken over all possible partitions τ: 0 = t_0 < ··· < t_n = T of any finite interval [0, T].
The unbounded variation and non-differentiability of Brownian sample paths are major reasons for the failure of classical integration methods, when applied to these paths, and for the introduction of stochastic calculus.

Brownian Bridge
Let B = (B_t, t ∈ [0, ∞)) denote Brownian motion. The process X := (B_t − tB_1, t ∈ [0, 1]) satisfies X_0 = X_1 = 0. This process is called the (standard) Brownian bridge.
Since multivariate normal distributions are closed under linear transforms, the finite-dimensional distributions of X are Gaussian. The Brownian bridge is characterized by the two functions µ_X(t) = 0 and C_X(t, s) = min(t, s) − ts, for all s, t ∈ [0, 1].
The Brownian bridge appears as the limit process of the normalized empirical distribution function of a sample of iid uniform U(0, 1) random variables.
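The construction X_t = B_t − tB_1 is immediate to code. A minimal sketch, using a random-walk approximation of the Brownian path (resolution n is an illustrative choice):

```python
import numpy as np

# Brownian bridge X_t = B_t - t*B_1 on [0, 1], built from an approximate
# Brownian path (cumulative sum of N(0, 1/n) increments).
rng = np.random.default_rng(3)
n = 1000
t = np.linspace(0.0, 1.0, n + 1)

increments = rng.standard_normal(n) * np.sqrt(1.0 / n)
B = np.concatenate([[0.0], np.cumsum(increments)])   # B_0 = 0

X = B - t * B[-1]                                    # the bridge
assert X[0] == 0.0 and abs(X[-1]) < 1e-12            # pinned at both ends
```

Subtracting t·B_1 tilts the whole path linearly so that it is forced back to 0 at time 1, which is exactly why X_0 = X_1 = 0 holds path by path, not just in distribution.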

Brownian Motion with Drift
Let B = (B_t, t ∈ [0, ∞)) denote Brownian motion. The process X := (µt + σB_t, t ≥ 0), for constants σ > 0 and µ ∈ R, is called Brownian motion with (linear) drift. X is a Gaussian process with expectation and covariance functions
µ_X(t) = µt, C_X(t, s) = σ² min(t, s), s, t ≥ 0.

Geometric Brownian Motion (Black, Scholes and Merton 1973)
The process X = (exp(µt + σB_t), t ≥ 0), for constants σ > 0 and µ ∈ R, is called geometric Brownian motion.
Since E e^{tZ} = e^{t²/2} for an N(0, 1) random variable Z, it follows from the self-similarity of Brownian motion that
µ_X(t) = e^{µt} E e^{σB_t} = e^{µt} E e^{σt^{1/2} B_1} = e^{(µ+0.5σ²)t}.
Since B_t − B_s and B_s are independent for any s ≤ t, and B_t − B_s =_d B_{t−s}, then
C_X(t, s) = e^{(µ+0.5σ²)(t+s)} (e^{σ²s} − 1), s ≤ t.
In particular, σ_X²(t) = e^{(2µ+σ²)t} (e^{σ²t} − 1).
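The mean formula E X_t = e^{(µ+0.5σ²)t} is easy to confirm by Monte Carlo, using only the fact that B_t ~ N(0, t). The parameter values below are illustrative:

```python
import numpy as np

# Monte Carlo check of the GBM mean: E[exp(mu*t + sigma*B_t)]
# should equal exp((mu + 0.5*sigma^2) * t).
rng = np.random.default_rng(4)
mu, sigma, t, n = 0.05, 0.2, 1.0, 1_000_000

B_t = np.sqrt(t) * rng.standard_normal(n)    # B_t ~ N(0, t)
X_t = np.exp(mu * t + sigma * B_t)

theory = np.exp((mu + 0.5 * sigma**2) * t)
assert abs(X_t.mean() / theory - 1) < 0.01
```

The extra 0.5σ² in the exponent is the lognormal correction coming from E e^{σB_t} = e^{σ²t/2}; simulating with E X_t = e^{µt} instead is a classic mistake this check catches.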

Central Limit Theorem
Consider a sequence Y_1, Y_2, ... of iid non-degenerate random variables with mean µ_Y = EY_1 and variance σ_Y² = var(Y_1) > 0. Define the partial sums: R_0 := 0, R_n := Σ_{i=1}^{n} Y_i, n ≥ 1.
Central Limit Theorem (CLT): If Y_1 has finite variance, then the sequence (R_n) obeys the CLT via the following uniform convergence:
sup_{x ∈ R} |P((R_n − ER_n) / [var(R_n)]^{1/2} ≤ x) − Φ(x)| → 0, as n → ∞,
where Φ(x) denotes the standard normal distribution function. That is, for large sample size n, the distribution of (R_n − ER_n) / [var(R_n)]^{1/2} is approximately standard normal.
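The statement can be tested empirically by standardizing partial sums and comparing their empirical distribution function with Φ. A sketch with Uniform(0, 1) summands (n, the number of replications and the check points are illustrative):

```python
import numpy as np
from math import erf, sqrt

# CLT sketch: standardized partial sums R_n of iid Uniform(0,1) variables
# are approximately N(0,1) for large n.
rng = np.random.default_rng(5)
n, reps = 200, 50_000
mu_Y, var_Y = 0.5, 1.0 / 12.0      # mean and variance of Uniform(0,1)

R_n = rng.random((reps, n)).sum(axis=1)
Z = (R_n - n * mu_Y) / np.sqrt(n * var_Y)     # (R_n - E R_n) / sd(R_n)

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))   # standard normal cdf
for x in (-1.0, 0.0, 1.0):
    assert abs(np.mean(Z <= x) - Phi(x)) < 0.01
```

The empirical cdf of Z matches Φ to within Monte Carlo error at each check point, which is the pointwise content of the uniform convergence above.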

Functional Approximation
Let (Y_i) be a sequence of iid random variables with mean µ_Y = EY_1 and variance σ_Y² = var(Y_1) > 0. Consider the process S_n = (S_n(t), t ∈ [0, 1]) with continuous sample paths on [0, 1]:
S_n(t) = (σ_Y² n)^{−1/2} (R_i − µ_Y i), if t = i/n, i = 0, ..., n; linearly interpolated, otherwise.
Example: If the Y_i's are iid N(0, 1), consider the restriction of the process S_n to the points i/n: S_n(i/n) = n^{−1/2} Σ_{k=1}^{i} Y_k, i = 0, ..., n.
S_n(0) = 0.
S_n has independent increments: for any 0 ≤ i_1 ≤ ··· ≤ i_m ≤ n, the increments S_n(i_2/n) − S_n(i_1/n), ..., S_n(i_m/n) − S_n(i_{m−1}/n) are independent.
For any 0 ≤ i ≤ n, S_n(i/n) has a normal N(0, i/n) distribution.
S_n and Brownian motion B on [0, 1], when restricted to the points i/n, have very much the same properties.

Functional Central Limit Theorem
Let C[0, 1] denote the space of all continuous functions defined on [0, 1]. With the maximum norm, C[0, 1] is a complete separable space.
Donsker's Theorem: If Y_1 has finite variance, then the process S_n obeys the functional CLT: Eφ(S_n) → Eφ(B), as n → ∞, for all bounded continuous functionals φ: C[0, 1] → R, where B = (B_t, t ∈ [0, 1]) is Brownian motion on [0, 1].
The finite-dimensional distributions of S_n converge to the corresponding finite-dimensional distributions of B: as n → ∞,
P(S_n(t_1) ≤ x_1, ..., S_n(t_m) ≤ x_m) → P(B_{t_1} ≤ x_1, ..., B_{t_m} ≤ x_m),
for all possible t_i ∈ [0, 1] and x_i ∈ R.
The max functional max_{i ≤ n} S_n(t_i) converges in distribution to max_{t ≤ 1} B_t as n → ∞.

Functional CLT for Jump Processes
Stochastic processes are infinite-dimensional objects, and therefore unexpected events may happen. For example, the sample paths of the converging processes may fluctuate very wildly with increasing n. In order to avoid such irregular behavior, a so-called tightness or stochastic compactness condition must be satisfied.
The functional CLT remains valid for the jump process S̃_n = (S̃_n(t), t ∈ [0, 1]), where S̃_n(t) = (σ_Y² n)^{−1/2} (R_{[nt]} − µ_Y [nt]) and [nt] denotes the integer part of the real number nt.
In contrast to S_n, the process S̃_n is constant on the intervals [(i−1)/n, i/n) and has jumps at the points i/n. S̃_n and S_n coincide at the points i/n, and the differences between these two processes are asymptotically negligible: the normalization n^{−1/2} makes the jumps of S̃_n arbitrarily small for large n.

Simulating a Brownian Sample Path
Plot the paths of the processes S_n, or S̃_n, for sufficiently large n, and get a reasonable approximation to Brownian sample paths. Since Brownian motion appears as a distributional limit, completely different graphs may appear for different values of n, even for the same sequence of realizations Y_i(ω).
Simulating a Brownian Sample Path on [0, T]: Simulate one path of S_n, or S̃_n, on [0, 1], then scale the time interval by the factor T and the sample path by the factor T^{1/2}.
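The recipe above, simulate S_n on [0, 1] and rescale, can be written out directly. A sketch with illustrative n and T, plus a statistical sanity check that the scaled endpoint has variance T:

```python
import numpy as np

# Brownian path on [0, T] via the scaled random walk S_n on [0, 1]:
# scale time by T and the path values by sqrt(T).
rng = np.random.default_rng(6)
n, T = 100_000, 5.0

Y = rng.standard_normal(n)
S_n = np.concatenate([[0.0], np.cumsum(Y)]) / np.sqrt(n)   # path on [0, 1]

time = T * np.linspace(0.0, 1.0, n + 1)    # scale time by T
path = np.sqrt(T) * S_n                    # scale space by T**0.5
assert path[0] == 0.0

# Sanity check over many independent endpoints: Var(B_T) should be about T.
endpoints = np.sqrt(T) * rng.standard_normal((500, 1000)).sum(axis=1) / np.sqrt(1000)
assert abs(endpoints.var() / T - 1) < 0.3
```

The T^{1/2} space scaling is exactly the 0.5-self-similarity of Brownian motion noted earlier: (B_{Tt}, t ∈ [0, 1]) =_d (T^{1/2} B_t, t ∈ [0, 1]).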

Lévy-Ciesielski Representation
Since Brownian sample paths are continuous functions, we can try to expand them in a series. However, the paths are random functions: for different ω, we obtain different path functions. This means that the coefficients of this series are random variables. Since the process is Gaussian, the coefficients must be Gaussian as well.
Lévy-Ciesielski Expansion: Brownian motion on [0, 1] can be represented in the form
B_t(ω) = Σ_{n=1}^{∞} Z_n(ω) ∫_0^t φ_n(x) dx, t ∈ [0, 1],
where the Z_n's are iid N(0, 1) random variables and (φ_n) is a complete orthonormal function system on [0, 1].

Paley-Wiener Representation
There are infinitely many possible series representations of Brownian motion. Let (Z_n, n ≥ 0) be a sequence of iid N(0, 1) random variables; then
B_t(ω) = Z_0(ω) t / (2π)^{1/2} + (2 / π^{1/2}) Σ_{n=1}^{∞} Z_n(ω) sin(nt/2) / n, t ∈ [0, 2π].
This series converges for every t, and uniformly for t ∈ [0, 2π].
Simulating a Brownian Path via the Paley-Wiener Expansion: Calculate
Z_0(ω) t_j / (2π)^{1/2} + (2 / π^{1/2}) Σ_{n=1}^{M} Z_n(ω) sin(nt_j/2) / n, t_j = 2πj/N, for j = 0, 1, ..., N.
The problem of choosing the right values for M and N is similar to the choice of the sample size n in the functional CLT.
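The truncated Paley-Wiener sum vectorizes naturally. A sketch with illustrative truncation M and grid size N:

```python
import numpy as np

# Truncated Paley-Wiener expansion on [0, 2*pi]:
# B_t ~ Z_0*t/sqrt(2*pi) + (2/sqrt(pi)) * sum_{n=1}^{M} Z_n*sin(n*t/2)/n.
rng = np.random.default_rng(7)
M, N = 500, 1000

Z = rng.standard_normal(M + 1)
t = 2.0 * np.pi * np.arange(N + 1) / N            # grid t_j = 2*pi*j/N

n = np.arange(1, M + 1)[:, None]                  # shape (M, 1) for broadcasting
series = (np.sin(n * t / 2.0) / n * Z[1:, None]).sum(axis=0)
B = Z[0] * t / np.sqrt(2.0 * np.pi) + (2.0 / np.sqrt(np.pi)) * series

assert B[0] == 0.0            # every term vanishes at t = 0
assert B.shape == t.shape
```

Larger M adds finer oscillations to the path; just as with the sample size n in the functional CLT, there is no single "right" truncation, only a trade-off between fidelity and cost.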

Outline
Conditional Expectation: An Illustration
General Conditional Expectation

Discrete Conditioning
Let X be a random variable defined on a probability space (Ω, F, P), and let B ⊆ Ω with P(B) > 0. The conditional distribution function of X given B is defined as
F_X(x | B) := P(X ≤ x | B) = P({X ≤ x} ∩ B) / P(B).
The conditional expectation of X given B is given by
E(X | B) = ∫ x dF_X(x | B) = E(X I_B) / P(B),
where I_B(ω) is the indicator function of the event B.
E(X | B) can be viewed as our estimate of X given the information that the event B has occurred. E(X | B^c) is similarly defined. Together, E(X | B) and E(X | B^c) provide our estimate of X depending on whether or not B occurs.

Conditional Expectation Under Discrete Conditioning
Think of I_B as a random variable that carries the information on whether the event B occurs; the conditional expectation E(X | I_B) of X given I_B is a random variable defined as
E(X | I_B)(ω) = E(X | B) if ω ∈ B, and E(X | I_B)(ω) = E(X | B^c) if ω ∉ B.
The random variable E(X | I_B) is our estimate of X based on the information provided by I_B.
Consider a discrete random variable Y on Ω taking distinct values y_i, i = 1, 2, .... Let A_i = {ω ∈ Ω : Y(ω) = y_i}. Note that Y carries the information on whether or not the events A_i occur. Define the conditional expectation of X given Y:
E(X | Y)(ω) := E(X | A_i) = E(X | Y = y_i), if ω ∈ A_i, i = 1, 2, ....
The random variable E(X | Y) can be viewed as our estimate of X based on the information carried by Y.

Example: Uniform Random Variable
Consider the random variable X(ω) = ω on Ω = (0, 1], endowed with the probability measure P((a, b]) := b − a for any (a, b] ⊆ (0, 1]. Assume that one of the events
A_i = ((i−1)/n, i/n], i = 1, ..., n,
occurred. Then E(X | A_i) = (1/P(A_i)) ∫_{A_i} x f_X(x) dx = (2i−1)/(2n) (i.e., the center of A_i). The value E(X | A_i) is the updated expectation on the new space A_i, given the information that A_i occurred.
Define Y(ω) := Σ_{i=1}^{n} ((i−1)/n) I_{A_i}(ω). The conditional expectation E(X | Y)(ω) = (2i−1)/(2n) if ω ∈ A_i, i = 1, ..., n. Since E(X | Y)(ω) is the average of X given the information that ω ∈ ((i−1)/n, i/n], E(X | Y) is a coarser version of X, that is, an approximation to X, given the information about which of the A_i's occurred.
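The interval centers (2i−1)/(2n) can be verified with exact rational arithmetic; the computation below is the worked example above, with n = 4 as an illustrative choice:

```python
from fractions import Fraction

# X(omega) = omega on (0, 1], conditioned on A_i = ((i-1)/n, i/n].
# Since X is uniform, E(X | A_i) = (1/P(A_i)) * integral_{A_i} x dx,
# which should equal the interval center (2i - 1)/(2n). Exact check:
n = 4
for i in range(1, n + 1):
    a, b = Fraction(i - 1, n), Fraction(i, n)
    # integral of x over (a, b] is (b^2 - a^2)/2, and P(A_i) = b - a
    cond_exp = (b**2 - a**2) / 2 / (b - a)
    assert cond_exp == Fraction(2 * i - 1, 2 * n)
```

Algebraically (b² − a²)/2/(b − a) = (a + b)/2, the midpoint, which is exactly the "center of A_i" statement on the slide.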

Properties of Conditional Expectation
The conditional expectation is linear: for random variables X_1, X_2 and constants c_1, c_2,
E([c_1 X_1 + c_2 X_2] | Y) = c_1 E(X_1 | Y) + c_2 E(X_2 | Y).
The expectation is preserved: EX = E[E(X | Y)].
If X and Y are independent, then E(X | Y) = EX.
The random variable E(X | Y) is a (measurable) function of Y: E(X | Y) = g(Y), where g(y) = Σ_{i=1}^{∞} E(X | Y = y_i) I_{{y_i}}(y).

σ-fields
Observe that the values of Y did not really matter for the definition of E(X | Y) under discrete conditioning, but it was crucial that the conditioning events A_i describe the information carried by all the distinct values of Y. That is, we estimate the random variable X via E(X | Y) based on the information provided by the observable events A_i and their composite events, such as A_i ∪ A_j and A_i ∩ A_j, ...
Definition of σ-fields: A σ-field F on Ω is a collection of subsets (observable events) of Ω satisfying the following conditions:
∅ ∈ F and Ω ∈ F.
If A ∈ F, then A^c ∈ F.
If A_1, A_2, ... ∈ F, then ∪_{i=1}^{∞} A_i ∈ F and ∩_{i=1}^{∞} A_i ∈ F.

Generated σ-fields
For any collection C of events, let σ(C) denote the smallest σ-field containing C, obtained by adding all possible unions, intersections and complements. σ(C) is said to be generated by C. The following are some examples.
F = {∅, Ω} = σ({∅}).
F = {∅, Ω, A, A^c} = σ({A}).
F = {A : A ⊆ Ω} = σ({A : A ⊆ Ω}).
Let C = {(a, b] : −∞ < a < b < ∞}; then any set in B¹ = σ(C) is called a Borel subset of R.
Let C = {(a_1, b_1] × ··· × (a_d, b_d] : −∞ < a_i < b_i < ∞, i = 1, ..., d}; then any set in B^d = σ(C) is called a Borel subset of R^d.

σ-fields Generated By Random Variables
Let Y be a discrete random variable taking distinct values y_i, i = 1, 2, .... Define A_i = {ω : Y(ω) = y_i}, i = 1, 2, .... A typical set in the σ-field σ({A_i}) is of the form A = ∪_{i∈I} A_i, I ⊆ {1, 2, ...}. σ({A_i}) is called the σ-field generated by Y, and is denoted by σ(Y).
Let Y be a d-dimensional random vector and A(a, b] = {ω : Y(ω) ∈ (a_1, b_1] × ··· × (a_d, b_d], −∞ < a_i < b_i < ∞, i = 1, ..., d}. The σ-field σ({A(a, b] : a, b ∈ R^d}) is called the σ-field generated by Y, and is denoted by σ(Y).
σ(Y) provides the essential information about the structure of Y, and contains all the observable events {ω : Y(ω) ∈ C}, where C is a Borel subset of R^d.

σ-fields Generated By Stochastic Processes
For a stochastic process Y = (Y_t, t ∈ T) and any (measurable) set C of functions on T, let A(C) = {ω : the sample path (Y_t(ω), t ∈ T) belongs to C}. The σ-field generated by the process Y is the smallest σ-field that contains all the events of the form A(C).
Example: For Brownian motion B = (B_t, t ≥ 0), let F_t := σ({B_s, s ≤ t}) denote the σ-field generated by Brownian motion prior to time t. F_t contains the essential information about the structure of the process B on [0, t]. One can show that this σ-field is generated by all sets of the form
A_{t_1,...,t_n}(C) = {ω : (B_{t_1}(ω), ..., B_{t_n}(ω)) ∈ C}, t_1, ..., t_n ≤ t,
for all the n-dimensional Borel sets C.

Information Represented by σ-fields
For a random variable, a random vector or a stochastic process Y on Ω, the σ-field σ(Y) generated by Y contains the essential information about the structure of Y as a function of ω ∈ Ω. It consists of all subsets {ω : Y(ω) ∈ C} for suitable sets C. Because Y generates a σ-field, we also say that Y contains the information represented by σ(Y), or that Y carries the information σ(Y).
For any measurable function f acting on Y, since
{ω : f(Y(ω)) ∈ C} = {ω : Y(ω) ∈ f⁻¹(C)} for every measurable set C,
we have σ(f(Y)) ⊆ σ(Y). That is, a function f acting on Y does not provide new information about the structure of Y.
Example: For Brownian motion B = (B_t, t ≥ 0), consider the function f(B) = sup_{t≤1} B_t. Then σ(f(B)) ⊆ σ({B_s, s ≤ t}) for any t ≥ 1.

The General Conditional Expectation

Let (Ω, F, P) be a probability space, and let Y, Y_1 and Y_2 denote random variables (or random vectors, stochastic processes) defined on Ω.

The information of Y is contained in F (Y does not contain more information than that contained in F): σ(Y) ⊂ F. Y_1 contains more information than Y_2: σ(Y_2) ⊂ σ(Y_1).

Conditional Expectation Given the σ-field
Let X be a random variable defined on Ω. The conditional expectation given F is a random variable, denoted by E(X | F), with the following properties:
E(X | F) does not contain more information than that contained in F: σ(E(X | F)) ⊂ F.
For any event A ∈ F, E(X I_A) = E(E(X | F) I_A).

By virtue of the Radon-Nikodym theorem, we can show the existence and almost sure (a.s.) uniqueness of E(X | F).

Conditional Expectation Given Generated Information

Let Y be a random variable (random vector or stochastic process) on Ω. The conditional expectation of X given Y, denoted by E(X | Y), is defined as E(X | Y) := E(X | σ(Y)).

The random variables X and E(X | F) are close to each other, not in the sense that they coincide for every ω, but in that the averages (expectations) of X and E(X | F) on suitable sets A are the same. The conditional expectation E(X | F) is a coarser version of the original random variable X and is our estimate of X given the information F.

Example: Let Y be a discrete random variable taking distinct values y_i, i = 1, 2, .... Any set A ∈ σ(Y) can be written as A = ∪_{i∈I} A_i = ∪_{i∈I} {ω : Y(ω) = y_i}, for some I ⊂ {1, 2, ...}. Let Z := E(X | Y). Then σ(Z) ⊂ σ(Y) and Z(ω) = E(X | A_i), for ω ∈ A_i. Observe that E(X I_A) = E(X Σ_{i∈I} I_{A_i}) = Σ_{i∈I} E(X I_{A_i}) = Σ_{i∈I} E(X | A_i) P(A_i) = E(Z I_A).
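The discrete example can be checked numerically: on each atom A_i = {Y = y_i}, E(X | Y) is just the average of X over that atom. A minimal Monte Carlo sketch, with a hypothetical toy model (not from the slides) in which Y is uniform on {0, 1, 2} and X = Y + noise, so that E(X | Y = y) = y:

```python
import random

random.seed(0)

# Hypothetical toy model: Y takes values in {0, 1, 2} and
# X = Y + standard normal noise, so E(X | Y = y) = y.
samples = []
for _ in range(100_000):
    y = random.choice([0, 1, 2])
    x = y + random.gauss(0.0, 1.0)
    samples.append((y, x))

def cond_exp(samples, y):
    """Monte Carlo estimate of E(X | Y = y): average X over the atom {Y = y}."""
    xs = [x for (yy, x) in samples if yy == y]
    return sum(xs) / len(xs)

est = {y: cond_exp(samples, y) for y in (0, 1, 2)}
```

Each estimate est[y] should be close to y, illustrating that E(X | Y) is constant on each atom of σ(Y).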

Special Cases

Classical Conditional Expectation: Let B be an event with P(B) > 0 and P(B^c) > 0. Define F_B := σ({B}) = {∅, Ω, B, B^c}. Then E(X | F_B)(ω) = E(X | B), for ω ∈ B.

Classical Conditional Probability: If X = I_A, then E(I_A | F_B)(ω) = E(I_A | B) = P(A ∩ B) / P(B), for ω ∈ B.

Rules for Calculation of Conditional Expectations

Let X, X_1, X_2 denote random variables defined on (Ω, F, P).
1. For any two constants c_1, c_2, E(c_1 X_1 + c_2 X_2 | F) = c_1 E(X_1 | F) + c_2 E(X_2 | F).
2. EX = E[E(X | F)].
3. If X and F are independent, then E(X | F) = EX. In particular, if X and Y are independent, then E(X | Y) = EX.
4. If σ(X) ⊂ F, then E(X | F) = X. In particular, if X is a function of Y, then σ(X) ⊂ σ(Y) and E(X | Y) = X.
5. If σ(X) ⊂ F, then E(X X_1 | F) = X E(X_1 | F). In particular, if X is a function of Y, then σ(X) ⊂ σ(Y) and E(X X_1 | Y) = X E(X_1 | Y).
6. If F′ and F are two σ-fields with F′ ⊂ F, then E(X | F′) = E[E(X | F) | F′] and E(X | F′) = E[E(X | F′) | F].
7. Let G be a stochastic process with σ(G) ⊂ F. If X and F are independent, then for any function h(x, y), E[h(X, G) | F] = E(E_X[h(X, G)] | F), where E_X[h(X, G)] denotes the expectation of h(X, G) with respect to X alone.

Examples

Example 1: If X and Y are independent, then E(XY | Y) = Y EX and E(X + Y | Y) = EX + Y.

Example 2: Consider Brownian motion B = (B_t, t ≥ 0). The σ-fields F_s = σ(B_x, x ≤ s) represent an increasing stream of information about the structure of the process. Find E(B_t | F_s) = E(B_t | B_x, x ≤ s) for s, t ≥ 0.
If s ≥ t, then F_t ⊂ F_s, so σ(B_t) ⊂ F_s and thus E(B_t | F_s) = B_t.
If s < t, then E(B_t | F_s) = E(B_t − B_s | F_s) + E(B_s | F_s) = E(B_t − B_s) + B_s = B_s, since the increment B_t − B_s is independent of F_s and has mean zero.
In summary, E(B_t | F_s) = B_{min(s,t)}.

Another Example: Squared Brownian Motion

Consider again Brownian motion B = (B_t, t ≥ 0), with the σ-fields F_s = σ(B_x, x ≤ s). Define X_t := B_t² − t, t ≥ 0.
If s ≥ t, then F_t ⊂ F_s and thus E(X_t | F_s) = X_t.
If s < t, observe that X_t = [(B_t − B_s) + B_s]² − t = (B_t − B_s)² + B_s² + 2 B_s (B_t − B_s) − t. Since B_t − B_s and (B_t − B_s)² are independent of F_s, we have E[(B_t − B_s)² | F_s] = E(B_t − B_s)² = t − s and E[B_s (B_t − B_s) | F_s] = B_s E(B_t − B_s) = 0. Since σ(B_s²) ⊂ σ(B_s) ⊂ F_s, we have E(B_s² | F_s) = B_s². Thus E(X_t | F_s) = (t − s) + B_s² + 0 − t = X_s.
In summary, E(X_t | F_s) = X_{min(s,t)}.
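Both conditional-expectation identities can be illustrated by simulation: fix one value of B_s, average many independent continuations of the path to time t, and compare the averages with B_s and B_s² − s. A sketch with illustrative choices s = 1, t = 2 (the grid of sample sizes is arbitrary):

```python
import random

random.seed(1)
s, t = 1.0, 2.0
b_s = random.gauss(0.0, s ** 0.5)   # one realization of B_s

# Given F_s, the increment B_t - B_s is N(0, t - s) and independent of B_s,
# so averaging over the increment estimates the conditional expectation.
n = 400_000
sum_bt, sum_xt = 0.0, 0.0
for _ in range(n):
    b_t = b_s + random.gauss(0.0, (t - s) ** 0.5)
    sum_bt += b_t
    sum_xt += b_t ** 2 - t
avg_bt = sum_bt / n   # estimates E(B_t | F_s), should be close to B_s
avg_xt = sum_xt / n   # estimates E(B_t^2 - t | F_s), should be close to B_s^2 - s
```

The two averages approximate E(B_t | F_s) = B_s and E(B_t² − t | F_s) = B_s² − s up to Monte Carlo error.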

The Projection Property of Conditional Expectations

We now formulate precisely the meaning of the statement that the conditional expectation E(X | F) can be understood as the optimal estimate of X given the information F. Define L²(F) := {Z : σ(Z) ⊂ F, EZ² < ∞}. If F = σ(Y), then Z ∈ L²(σ(Y)) implies that Z is a function of Y.

The Projection Property
Let X be a random variable with EX² < ∞. The conditional expectation E(X | F) is the random variable in L²(F) which is closest to X in the mean square sense:
E[X − E(X | F)]² = min_{Z ∈ L²(F)} E(X − Z)².

If F = σ(Y), then E(X | Y) is the function of Y which has a finite second moment and which is closest to X in the mean square sense.

The Best Prediction Based on Available Information

It follows from the projection property that the conditional expectation E(X | F) can be viewed as the best prediction of X given the information F. For example, for Brownian motion B = (B_t, t ≥ 0), we have, for s ≤ t,
E(B_t | B_x, x ≤ s) = B_s and E(B_t² − t | B_x, x ≤ s) = B_s² − s.
That is, the best predictions of the future values B_t and B_t² − t, given the information about Brownian motion up to the present time s, are the present values B_s and B_s² − s, respectively. This property characterizes the whole class of martingales with a finite second moment.

Outline

Martingales
Martingale Transforms

Filtration

Let (F_t, t ≥ 0) be a collection of σ-fields on the same probability space (Ω, F, P) with F_t ⊂ F, for all t ≥ 0.

Definition
The collection (F_t, t ≥ 0) of σ-fields on Ω is called a filtration if F_s ⊂ F_t for all s ≤ t.

A filtration represents an increasing stream of information. The index t can be discrete; for example, the filtration (F_n, n = 0, 1, ...) is a sequence of σ-fields on Ω with F_n ⊂ F_{n+1} for all n.

Adapted Processes

A filtration is usually linked up with a stochastic process.

Definition
The stochastic process Y = (Y_t, t ≥ 0) is said to be adapted to the filtration (F_t, t ≥ 0) if σ(Y_t) ⊂ F_t for all t ≥ 0.

The stochastic process Y is always adapted to the natural filtration generated by Y: F_t = σ(Y_s, s ≤ t). For a discrete-time process Y = (Y_n, n = 0, 1, ...), adaptedness means σ(Y_n) ⊂ F_n for all n.

Example

Let (B_t, t ≥ 0) denote Brownian motion and (F_t, t ≥ 0) the corresponding natural filtration. Stochastic processes of the form X_t = f(t, B_t), t ≥ 0, where f is a function of two variables, are adapted to (F_t, t ≥ 0).
Examples: X_t^(1) = B_t and X_t^(2) = B_t² − t.
More examples: X_t^(3) = max_{s≤t} B_s and X_t^(4) = max_{s≤t} B_s².
Examples that are not adapted to the Brownian motion filtration: X_t^(5) = B_{t+1} and X_t^(6) = B_t + B_T for some fixed number T > 0.

Definition
If the stochastic process Y = (Y_t, t ≥ 0) is adapted to the natural Brownian filtration (F_t, t ≥ 0) (that is, Y_t is a function of (B_s, s ≤ t) for all t ≥ 0), we say that Y is adapted to Brownian motion.

Adapted to Different Filtrations

Consider Brownian motion (B_t, t ≥ 0) and the corresponding natural filtration F_t = σ(B_s, s ≤ t). The stochastic process X_t := B_t², t ≥ 0, generates its own natural filtration F′_t = σ(B_s², s ≤ t), t ≥ 0. The process (X_t, t ≥ 0) is adapted to both F_t and F′_t. Observe that F′_t ⊂ F_t: from B_t² we can only reconstruct |B_t|, but not B_t itself; we can say nothing about the sign of B_t.

Market Information or Information Histories

Share prices, exchange rates, interest rates, etc., can be modelled by solutions of stochastic differential equations which are driven by Brownian motion. These solutions are then functions of Brownian motion. The fluctuations of these processes actually represent the information about the market. This relevant knowledge is contained in the natural filtration.

In finance there are always people who know more than the others. For example, they might know that an essential political decision will be taken in the very near future which will completely change the financial landscape. This enables the informed persons to act with more competence than the others. Thus they have their own filtrations, which can be bigger than the natural filtration.

Martingale

If the information F_s and X are dependent, we can expect that knowing F_s reduces the uncertainty about the values of X_t at t > s. That is, X_t can be better predicted via E(X_t | F_s) with the information F_s than without it.

Definition
The stochastic process X = (X_t, t ≥ 0) adapted to the filtration (F_t, t ≥ 0) is called a continuous-time martingale with respect to (F_t, t ≥ 0) if
1. E|X_t| < ∞ for all t ≥ 0.
2. X_s is the best prediction of X_t given F_s: E(X_t | F_s) = X_s for all s ≤ t.

A discrete-time martingale is defined similarly, replacing the second condition by E(X_{n+1} | F_n) = X_n, n = 0, 1, .... A martingale has the remarkable property that its expectation function is constant: EX_s = E[E(X_t | F_s)] = EX_t for all s ≤ t.

Example: Partial Sums

Let (Z_n) be a sequence of independent random variables with finite expectations and Z_0 = 0. Consider the partial sums
R_n = Σ_{i=0}^n Z_i, n ≥ 0,
and the corresponding natural filtration F_n = σ(R_0, ..., R_n) = σ(Z_0, ..., Z_n), n ≥ 0. Observe that
E(R_{n+1} | F_n) = E(R_n | F_n) + E(Z_{n+1} | F_n) = R_n + EZ_{n+1},
and hence, if EZ_n = 0 for all n, then (R_n, n ≥ 0) is a martingale with respect to the filtration (F_n, n ≥ 0).
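A quick numerical illustration of the partial-sum martingale, using hypothetical fair-coin increments Z_i = ±1 with EZ_i = 0: the constant-expectation property implies E(R_n) = R_0 = 0 for every n, which Monte Carlo averaging confirms.

```python
import random

random.seed(2)

# Simulate many independent random-walk paths R_n = Z_1 + ... + Z_n
# with fair-coin steps Z_i = +1 or -1, each with probability 1/2.
n_steps, n_paths = 50, 20_000
final = []
for _ in range(n_paths):
    r = 0
    for _ in range(n_steps):
        r += random.choice((-1, 1))
    final.append(r)

# By the martingale property, E(R_50) = E(R_0) = 0.
mean_final = sum(final) / n_paths
```

The sample mean of R_50 stays near zero up to Monte Carlo noise of order sqrt(50 / n_paths).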

Collecting Information About a Random Variable

Let Z be a random variable on Ω with E|Z| < ∞ and (F_t, t ≥ 0) a filtration on Ω. Define X_t = E(Z | F_t), t ≥ 0. Since F_t increases as time goes by, X_t gives us more and more information about the random variable Z. In particular, if σ(Z) ⊂ F_t for some t, then X_t = Z.

An appeal to Jensen's inequality yields E|X_t| = E|E(Z | F_t)| ≤ E[E(|Z| | F_t)] = E|Z| < ∞.
σ(X_t) ⊂ F_t.
For s ≤ t, E(X_t | F_s) = E[E(Z | F_t) | F_s] = E(Z | F_s) = X_s.
Hence X is a martingale with respect to (F_t, t ≥ 0).

Brownian Motion is a Martingale

Let B = (B_t, t ≥ 0) be Brownian motion with the natural filtration F_t = σ(B_s, s ≤ t).
B and (B_t² − t, t ≥ 0) are martingales with respect to the natural filtration.
(B_t³ − 3tB_t, t ≥ 0) is a martingale.

Martingale Transform

Let X = (X_n, n = 0, 1, ...) be a discrete-time martingale with respect to the filtration (F_n, n = 0, 1, ...). Let Y_n := X_n − X_{n−1}, n ≥ 1, and Y_0 := X_0. The sequence Y = (Y_n, n = 0, 1, ...) is called a martingale difference sequence with respect to the filtration (F_n, n = 0, 1, ...).

Consider a stochastic process C = (C_n, n = 1, 2, ...) satisfying σ(C_n) ⊂ F_{n−1}, n ≥ 1. Given F_{n−1}, we completely know C_n at time n − 1. Such a sequence is called predictable with respect to (F_n, n = 0, 1, ...). Define
Z_0 = 0, Z_n = Σ_{i=1}^n C_i Y_i = Σ_{i=1}^n C_i (X_i − X_{i−1}), n ≥ 1.
The process C · Y := (Z_n, n ≥ 0) is called the martingale transform of Y by C. Note that if C_n = 1 for all n ≥ 1, then C · Y = X − X_0, the original martingale started from zero.
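The transform Z_n = Σ_{i≤n} C_i (X_i − X_{i−1}) is a simple running sum and easy to compute directly. A minimal sketch (the function name and the sample path below are hypothetical, chosen only to illustrate the definition):

```python
def martingale_transform(C, X):
    """Return Z_0, ..., Z_N with Z_0 = 0 and Z_n = sum_{i<=n} C_i * (X_i - X_{i-1}).

    C has one stake per step (C[0] is the stake C_1 used on step 1);
    X is the list of martingale values X_0, ..., X_N.
    """
    Z = [0.0]
    for i in range(1, len(X)):
        Z.append(Z[-1] + C[i - 1] * (X[i] - X[i - 1]))
    return Z

# Unit stakes recover the original martingale recentered at zero: C . Y = X - X_0.
X = [0.0, 1.0, -1.0, 2.0, 0.5]
Z = martingale_transform([1.0] * (len(X) - 1), X)
```

With C_n = 1 and X_0 = 0, the transform reproduces X itself, matching the remark on the slide.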

Martingale Transform Leads to a Martingale

Assume that the second moments of C_n and Y_n are finite. It follows from the Cauchy-Schwarz inequality that
E|Z_n| ≤ Σ_{i=1}^n E|C_i Y_i| ≤ Σ_{i=1}^n [EC_i² EY_i²]^{1/2} < ∞.
Since Y_1, ..., Y_n do not carry more information than F_n, and σ(C_1, ..., C_n) ⊂ F_{n−1} (predictability), we have σ(Z_n) ⊂ F_n. Due to the predictability of C,
E(Z_n − Z_{n−1} | F_{n−1}) = E(C_n Y_n | F_{n−1}) = C_n E(Y_n | F_{n−1}) = 0.
Hence (Z_n − Z_{n−1}, n ≥ 1) is a martingale difference sequence, and (Z_n, n ≥ 0) is a martingale with respect to (F_n, n = 0, 1, ...).

A Brownian Martingale Transform

Consider Brownian motion B = (B_s, 0 ≤ s ≤ t) and a partition 0 = t_0 < t_1 < ... < t_{n−1} < t_n = t. The σ-fields at these time instants are described by the filtration F_0 = {∅, Ω}, F_i = σ(B_{t_j}, 1 ≤ j ≤ i), i = 1, ..., n.

The sequence ΔB := (Δ_i B, 1 ≤ i ≤ n) defined by Δ_i B = B_{t_i} − B_{t_{i−1}}, i = 1, ..., n, forms a martingale difference sequence with respect to the filtration (F_i, 1 ≤ i ≤ n). The sequence (B_{t_{i−1}}, 1 ≤ i ≤ n) is predictable with respect to (F_i, 1 ≤ i ≤ n). The corresponding martingale transform is then a martingale:
Σ_{i=1}^k B_{t_{i−1}} (B_{t_i} − B_{t_{i−1}}), k = 1, ..., n.
This is precisely a discrete-time analogue of the Itô stochastic integral ∫_0^t B_s dB_s.

Martingale as a Fair Game

Let X = (X_n, n = 0, 1, ...) be a discrete-time martingale with respect to the filtration (F_n, n = 0, 1, ...). Let Y_n = X_n − X_{n−1}, n ≥ 1, denote the martingale differences, and let C_n, n ≥ 1, be predictable with respect to (F_n, n = 0, 1, ...).

Think of Y_n as your net winnings per unit stake at the n-th game, adapted to a filtration (F_n, n = 0, 1, ...). At the n-th game, your stake C_n does not contain more information than F_{n−1} does; at time n − 1, this is the best information we have about the game. C_n Y_n is the net winnings for stake C_n at the n-th game, and (C · Y)_n = Σ_{i=1}^n C_i Y_i is the net winnings up to time n. The game is fair because the best prediction of the net winnings C_n Y_n of the n-th game, just before the n-th game starts, is zero: E(C_n Y_n | F_{n−1}) = 0.

Outline

The Itô Integrals
The Stratonovich Integrals

Integrating With Respect To a Function

Let B = (B_t, t ≥ 0) be Brownian motion.
Goal: Define an integral of the type ∫_0^1 f(t) dB_t(ω), where f(t) is a function or a stochastic process on [0, 1] and B_t(ω) is a Brownian sample path.
Difficulty: The path B_t(ω) does not have a derivative.

The pathwise integral of the Riemann-Stieltjes type is one option. Consider a partition of the interval [0, 1]: τ_n: 0 = t_0 < t_1 < t_2 < ... < t_{n−1} < t_n = 1, n ≥ 1. Let f and g be two real-valued functions on [0, 1], define Δ_i g := g(t_i) − g(t_{i−1}), 1 ≤ i ≤ n, and form the Riemann-Stieltjes sum
S_n = Σ_{i=1}^n f(y_i) Δ_i g = Σ_{i=1}^n f(y_i) (g(t_i) − g(t_{i−1})), with t_{i−1} ≤ y_i ≤ t_i, i = 1, ..., n.

Riemann-Stieltjes Integrals

Definition
If the limit S = lim_{n→∞} S_n exists as mesh(τ_n) → 0, and S is independent of the choice of the partitions τ_n and their intermediate points y_i, then S, denoted by ∫_0^1 f(t) dg(t), is called the Riemann-Stieltjes integral of f with respect to g on [0, 1].

When does the Riemann-Stieltjes integral ∫_0^1 f(t) dg(t) exist, and is it possible to take g = B for Brownian motion B on [0, 1]? One usual assumption is that f is continuous and g has bounded variation:
sup_{τ_n} Σ_{i=1}^n |g(t_i) − g(t_{i−1})| < ∞.
But Brownian sample paths B_t(ω) do not have bounded variation.

Bounded p-variation

The real function h on [0, 1] is said to have bounded p-variation for some p > 0 if
sup_τ Σ_{i=1}^n |h(t_i) − h(t_{i−1})|^p < ∞,
where the supremum is taken over all partitions τ of [0, 1]. Brownian motion has bounded p-variation on any fixed finite interval for p > 2, and unbounded p-variation for p ≤ 2.

A Sufficient and Almost Necessary Condition
The Riemann-Stieltjes integral ∫_0^1 f(t) dg(t) exists if
1. The functions f and g do not have discontinuities at the same point t ∈ [0, 1].
2. The function f has bounded p-variation and the function g has bounded q-variation such that p^{−1} + q^{−1} > 1.

Existence of the Riemann-Stieltjes Integral

Assume that f is a differentiable function with bounded derivative f′(t) on [0, 1]. Then f has bounded variation, and the Riemann-Stieltjes integral ∫_0^1 f(t) dB_t(ω) exists for every Brownian sample path B_t(ω). Examples: ∫_0^1 t^k dB_t(ω) for k ≥ 0, ∫_0^1 e^t dB_t(ω), ... But existence does not mean that you can evaluate these integrals explicitly in terms of Brownian motion.

A more serious issue: How to define ∫_0^1 B_t(ω) dB_t(ω)? Brownian motion has bounded p-variation for p > 2, not for p ≤ 2, and so the sufficient condition p^{−1} + q^{−1} = 2p^{−1} > 1 for the existence of the Riemann-Stieltjes integral is not satisfied. In fact, it can be shown that ∫_0^1 B_t(ω) dB_t(ω) does not exist as a Riemann-Stieltjes integral.

Another Fatal Blow to the Riemann-Stieltjes Approach

It can be shown that if ∫_0^1 f(t) dg(t) exists as a Riemann-Stieltjes integral for all continuous functions f on [0, 1], then g necessarily has bounded variation. But Brownian sample paths do not have bounded variation on any finite interval.

Since the pathwise average with respect to a Brownian sample path, as suggested by the Riemann-Stieltjes integral, does not lead to a sufficiently large class of integrable functions f, one has to find a different approach to define stochastic integrals such as ∫_0^1 B_t(ω) dB_t(ω). We will try to define the integral as a probabilistic average, leading to the Itô Integrals.

A Motivating Example

Let B = (B_t, t ≥ 0) be Brownian motion. Consider a partition of [0, t]: τ_n: 0 = t_0 < t_1 < ... < t_{n−1} < t_n = t, with Δ_i = t_i − t_{i−1}, n ≥ 1, and the Riemann-Stieltjes sums, for n ≥ 1,
S_n = Σ_{i=1}^n B_{t_{i−1}} Δ_i B, with Δ_i B := B_{t_i} − B_{t_{i−1}}, 1 ≤ i ≤ n.
Rewriting: S_n = (1/2) B_t² − (1/2) Σ_{i=1}^n (Δ_i B)² =: (1/2) B_t² − (1/2) Q_n(t).
The limit of S_n boils down to the limit of Q_n(t) as n → ∞. One can show that Q_n(t) does not converge for a given Brownian sample path and suitable choices of partitions τ_n. We will show, however, that Q_n(t) converges in probability to t as n → ∞. This is the key to defining the Itô Integral!

Quadratic Variation of Brownian Motion

Since Brownian motion has independent and stationary increments, E(Δ_i B Δ_j B) = 0 for i ≠ j and E(Δ_i B)² = Var(Δ_i B) = t_i − t_{i−1} = Δ_i. Thus E(Q_n(t)) = Σ_{i=1}^n E(Δ_i B)² = Σ_{i=1}^n Δ_i = t.
Var(Q_n(t)) = Σ_{i=1}^n Var((Δ_i B)²) = Σ_{i=1}^n [E((Δ_i B)⁴) − Δ_i²]. Since E(B_1⁴) = 3 (standard normal), we have E((Δ_i B)⁴) = E(B_{t_i − t_{i−1}})⁴ = E(Δ_i^{1/2} B_1)⁴ = 3Δ_i² (self-similarity). Thus Var(Q_n(t)) = 2 Σ_{i=1}^n Δ_i².
With mesh(τ_n) = max_{1≤i≤n} Δ_i, we obtain
Var(Q_n(t)) = E(Q_n(t) − t)² ≤ 2 mesh(τ_n) Σ_{i=1}^n Δ_i = 2t · mesh(τ_n).
It follows from the Chebyshev inequality that Q_n(t) → t in probability as mesh(τ_n) → 0 (n → ∞). The limiting function f(t) = t is called the quadratic variation of Brownian motion.
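The convergence Q_n(t) → t is easy to watch numerically: sum the squared increments of a simulated path over a fine uniform partition. A minimal sketch (the grid size is an arbitrary choice):

```python
import random

random.seed(3)

def qn(t, n):
    """Sum of squared Brownian increments over a uniform n-point partition of [0, t].

    Each increment is an independent N(0, t/n) variable, so the sum
    estimates the quadratic variation Q_n(t), which converges to t.
    """
    dt = t / n
    return sum(random.gauss(0.0, dt ** 0.5) ** 2 for _ in range(n))

t = 2.0
q = qn(t, 100_000)
```

By the variance bound on the slide, the fluctuation of q around t is of order sqrt(2 t² / n), here about 0.009.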

Mean Square Limit is a Martingale

Quadratic variation is a characteristic that emerges only for integrators, such as Brownian motion, whose paths have unbounded variation. Since S_n = (1/2) B_t² − (1/2) Q_n(t) converges in mean square to (1/2) B_t² − (1/2) t, we define the Itô Integral in the mean square sense:
∫_0^t B_s dB_s = (1/2)(B_t² − t).
The values of Brownian motion were evaluated at the left end points of the intervals [t_{i−1}, t_i], so the martingale transform
Σ_{i=1}^k B_{t_{i−1}} (B_{t_i} − B_{t_{i−1}})
is a martingale with respect to the filtration σ(B_{t_i}, i ≤ k), for all k = 1, ..., n. As a result, the mean square limit lim_{n→∞} S_n = (1/2)(B_t² − t) is a martingale with respect to the natural Brownian filtration.
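The identity ∫_0^t B_s dB_s = (1/2)(B_t² − t) can be illustrated on a single simulated path: the left-endpoint Riemann-Stieltjes sum tracks the closed form as the grid refines. A sketch (grid size chosen for illustration):

```python
import random

random.seed(4)

# Simulate one Brownian path on a fine uniform grid of [0, t] and accumulate
# the left-endpoint sum sum B_{t_{i-1}} (B_{t_i} - B_{t_{i-1}}) on the fly.
t, n = 1.0, 200_000
dt = t / n
b = 0.0
ito_sum = 0.0
for _ in range(n):
    db = random.gauss(0.0, dt ** 0.5)
    ito_sum += b * db   # integrand evaluated at the LEFT endpoint
    b += db             # b is now the path value at the next grid point

closed_form = (b ** 2 - t) / 2   # the Ito integral (B_t^2 - t) / 2 for this path
```

The gap between the two quantities is (1/2)(t − Q_n(t)), which vanishes in mean square as the mesh shrinks.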

Heuristic Rules

The increment Δ_i B = B_{t_i} − B_{t_{i−1}} on the interval [t_{i−1}, t_i] satisfies E(Δ_i B) = 0, Var(Δ_i B) = Δ_i = t_i − t_{i−1}. These properties suggest that (Δ_i B)² is of order Δ_i.
In terms of differentials, we write (dB_t)² = (B_{t+dt} − B_t)² = dt.
In terms of integrals, we write ∫_0^t (dB_s)² = ∫_0^t ds = t.
These rules can be made mathematically precise in the mean square sense.

Stratonovich Integral

Consider partitions τ_n: 0 = t_0 < t_1 < ... < t_{n−1} < t_n = t, with mesh(τ_n) → 0. Using the same arguments and tools as for the Itô Integral, the Riemann-Stieltjes sums
S_n = Σ_{i=1}^n B_{y_i} Δ_i B, with Δ_i B := B_{t_i} − B_{t_{i−1}},
where y_i = (1/2)(t_{i−1} + t_i), 1 ≤ i ≤ n, converge to the mean square limit (1/2) B_t². This quantity is called the Stratonovich stochastic integral and denoted by
∫_0^t B_s ∘ dB_s = (1/2) B_t².
The Riemann-Stieltjes sums Σ_{i=1}^k B_{y_i} Δ_i B, k = 1, ..., n, do not constitute a martingale, and neither does the limit process (1/2) B_t².
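Replacing the left endpoints with midpoints changes the limit: on the same kind of simulated path the midpoint sums approach (1/2) B_t² rather than (1/2)(B_t² − t). A sketch that simulates the path at half-step resolution so the midpoint values are available:

```python
import random

random.seed(5)

# Path sampled at resolution h = (t/n)/2: even indices are partition points
# t_0, t_1, ..., t_n and odd indices are the interval midpoints.
t, n = 1.0, 100_000
h = t / (2 * n)
path = [0.0]
for _ in range(2 * n):
    path.append(path[-1] + random.gauss(0.0, h ** 0.5))

# Midpoint (Stratonovich-style) sum: B at the midpoint times the increment.
strat_sum = sum(
    path[2 * i + 1] * (path[2 * i + 2] - path[2 * i])
    for i in range(n)
)
closed_form = path[-1] ** 2 / 2   # B_t^2 / 2 for this path
```

The extra t/2 relative to the Itô value comes from the quadratic variation accumulated between left endpoints and midpoints.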

Itô Integral vs Stratonovich Integral

The Itô Integral is a martingale with respect to the natural Brownian filtration, but it does not obey the classical chain rule of integration. A chain rule well suited to Itô integration is given by the Itô lemma.
The Stratonovich Integral is not a martingale, but it does obey the classical chain rule of integration. It turns out that the Stratonovich Integral will also be a useful tool for solving Itô stochastic differential equations.

Outline

The Itô Stochastic Integrals

Simple Processes

Let B = (B_t, t ≥ 0) denote Brownian motion and F_t = σ(B_s, s ≤ t) the corresponding natural filtration. Consider a partition of [0, T]: τ_n: 0 = t_0 < t_1 < ... < t_{n−1} < t_n = T.

The stochastic process C = (C_t, t ∈ [0, T]) is said to be simple if there exists a sequence (Z_i, i = 1, ..., n) of random variables such that
Z_i is a function of (B_s, s ≤ t_{i−1}) and EZ_i² < ∞, 1 ≤ i ≤ n;
C_t = Σ_{i=1}^n Z_i I{t_{i−1} ≤ t < t_i} + Z_n I{t = T}.

Example: f_n(t) = Σ_{i=1}^n ((i−1)/n) I_{[(i−1)/n, i/n)}(t) + ((n−1)/n) I_{{1}}(t) on [0, 1].
Example: C_n(t) = Σ_{i=1}^n B_{t_{i−1}} I_{[t_{i−1}, t_i)}(t) + B_{t_{n−1}} I_{{T}}(t) on [0, T].
Note that C_t is a function of Brownian motion up to time t.

Itô Stochastic Integrals of Simple Processes

Define
∫_0^T C_s dB_s := Σ_{i=1}^n C_{t_{i−1}} (B_{t_i} − B_{t_{i−1}}) = Σ_{i=1}^n Z_i Δ_i B.

Itô Integrals of Simple Processes on [0, t], t_{k−1} ≤ t < t_k:
∫_0^t C_s dB_s := ∫_0^T C_s I_{[0,t]}(s) dB_s = Σ_{i=1}^{k−1} Z_i Δ_i B + Z_k (B_t − B_{t_{k−1}}).

Example: ∫_0^t f_n(s) dB_s = Σ_{i=1}^{k−1} ((i−1)/n)(B_{t_i} − B_{t_{i−1}}) + ((k−1)/n)(B_t − B_{t_{k−1}}) for (k−1)/n ≤ t < k/n. Note that lim_{n→∞} ∫_0^t f_n(s) dB_s = ∫_0^t s dB_s.
Example: ∫_0^t C_n(s) dB_s = Σ_{i=1}^{k−1} B_{t_{i−1}} Δ_i B + B_{t_{k−1}} (B_t − B_{t_{k−1}}) for t_{k−1} ≤ t < t_k.

Itô Integral of a Simple Process is a Martingale

The form of the Itô stochastic integral for simple processes very much reminds us of a martingale transform, which results in a martingale.

A Martingale Property
The stochastic process I_t(C) := ∫_0^t C_s dB_s, t ∈ [0, T], is a martingale with respect to the natural Brownian filtration (F_t, t ∈ [0, T]).

Using the isometry property, E(|I_t(C)|) < ∞ for all t ∈ [0, T].
I_t(C) is adapted to (F_t, t ∈ [0, T]).
E(I_t(C) | F_s) = I_s(C), for s < t.

Properties

The Itô stochastic integral has expectation zero.
The Itô stochastic integral satisfies the isometry property:
E(∫_0^t C_s dB_s)² = ∫_0^t EC_s² ds, t ∈ [0, T].
For any constants c_1 and c_2, and simple processes C^(1) and C^(2) on [0, T],
∫_0^t (c_1 C_s^(1) + c_2 C_s^(2)) dB_s = c_1 ∫_0^t C_s^(1) dB_s + c_2 ∫_0^t C_s^(2) dB_s.
For any t ∈ [0, T],
∫_0^T C_s dB_s = ∫_0^t C_s dB_s + ∫_t^T C_s dB_s.
The process I(C) has continuous sample paths.
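The isometry can be checked by Monte Carlo for a deterministic simple integrand. Taking (as a hypothetical illustration) the left-endpoint step approximation of f(s) = s on [0, 1], the isometry predicts E(∫_0^1 s dB_s)² = ∫_0^1 s² ds = 1/3:

```python
import random

random.seed(6)

# Monte Carlo check of E (int_0^1 f(s) dB_s)^2 = int_0^1 f(s)^2 ds
# for the simple integrand f(s) = s, approximated by its value at the
# left endpoint of each grid interval.
n_steps, n_paths = 100, 20_000
dt = 1.0 / n_steps
second_moment = 0.0
for _ in range(n_paths):
    integral = sum(
        (i * dt) * random.gauss(0.0, dt ** 0.5)   # f(t_{i-1}) * Brownian increment
        for i in range(n_steps)
    )
    second_moment += integral ** 2
second_moment /= n_paths   # should be close to 1/3
```

The estimate differs from 1/3 only by the step-approximation bias of order dt and the Monte Carlo noise.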