k-layer neural networks: High capacity scoring functions + tips on how to train them


1 k-layer neural networks: High capacity scoring functions + tips on how to train them

2 A new class of scoring functions
Linear scoring function: s = W x + b
2-layer Neural Network: s1 = W1 x + b1, h = max(0, s1), s = W2 h + b2
[Figure: before, the input x is mapped directly to the output s = W x + b; now, x is mapped to hidden activations h = max(0, s1) with s1 = W1 x + b1, and then to the output s = W2 h + b2.]

3 Not restricted to two layers
2-layer Neural Network: s1 = W1 x + b1, h = max(0, s1), s = W2 h + b2
3-layer Neural Network: s1 = W1 x + b1, h1 = max(0, s1), s2 = W2 h1 + b2, h2 = max(0, s2), s = W3 h2 + b3
[Figure: the 2-layer network has one hidden layer h between the input x and the output s; the 3-layer network has two hidden layers h1 and h2.]
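
To make the layer recursion concrete, here is a minimal NumPy sketch of a k-layer ReLU forward pass; the function name forward_k_layer and the list-of-matrices interface are illustrative assumptions, not code from the lecture.

import numpy as np

def forward_k_layer(x, Ws, bs):
    # Forward pass of a k-layer ReLU network (illustrative sketch).
    # Ws, bs: lists of weight matrices and bias vectors, one pair per layer.
    h = x
    hiddens = []
    for W, b in zip(Ws[:-1], bs[:-1]):
        s = W @ h + b            # linear transformation
        h = np.maximum(0, s)     # ReLU non-linearity
        hiddens.append(h)
    s = Ws[-1] @ h + bs[-1]      # final linear layer produces the class scores
    return s, hiddens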

4 Some terminology
3-layer Neural Network:
s1 = W1 x + b1, where W1 is m1 x d
1st hidden layer activations: h1 = max(0, s1) (apply non-linearity via the activation fn)
s2 = W2 h1 + b2, where W2 is m2 x m1
2nd hidden layer activations: h2 = max(0, s2) (apply non-linearity via the activation fn)
Output responses: s = W3 h2 + b3, where W3 is c x m2
Sometimes referred to as a 2-hidden-layer neural network.

5 Computational graph of our 2-layer neural network
[Figure: computational graph x -> W1 x + b1 -> s1 -> max(0, s1) -> h -> W2 h + b2 -> s, with the parameters W1, b1, W2, b2 feeding into the corresponding nodes.]

6 2-layer neural network with probabilistic outputs
[Figure: the same computational graph extended with a softmax node: x -> s1 -> h -> s -> softmax(s) -> p.]

7 Effect of the number of hidden nodes in a 2-layer network
[Figure: results for m = 3, m = 20, m = 30, m = 100.]
m is the number of nodes in the hidden layer. No regularization.

8 Result depends on parameter initialization
[Figure: results for m = 3, m = 20, m = 30, m = 100.]
m is the number of nodes in the hidden layer. No regularization. Different random parameter initialization from the previous slide.

9 Effect of regularization
J(D, λ, Θ) = Σ_{(x,y) ∈ D} l(x, y, Θ) + λ r(Θ)
[Figure: results for λ = 0, λ = .001, λ = .01, λ = .1.]
m = 100 nodes in the hidden layer. L2 regularization.
Do not use the size of the neural network as a regularizer. Use stronger regularization instead.

10 High-level overview of how to train the network
Mini-batch GD (or a variant). Loop:
1. Sample a batch of the training data.
2. Forward propagate it through the graph and calculate the loss/cost.
3. Backward propagate to calculate the gradients.
4. Update the parameters using the gradient.
A minimal sketch of this loop is given below.
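
The following is a minimal sketch of the mini-batch loop above, under the assumption of a helper loss_and_grads(params, X, y) that runs the forward and backward passes and returns (loss, grads); the helper and parameter names are illustrative, not the lecture's code.

import numpy as np

def train(params, data, labels, loss_and_grads, lr=0.01, batch_size=100, n_steps=1000):
    n = data.shape[0]
    for step in range(n_steps):
        idx = np.random.choice(n, batch_size, replace=False)  # 1. sample a batch
        X, y = data[idx], labels[idx]
        loss, grads = loss_and_grads(params, X, y)             # 2.-3. forward + backward pass
        for k in params:
            params[k] -= lr * grads[k]                         # 4. gradient step
    return params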

11 Options for activation functions
[Figure: plots of the sigmoid, tanh, and ReLU functions.]
σ(x) = 1 / (1 + exp(-x))
tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
ReLU(x) = max(0, x)
The activation function is applied independently to each element of the score vector.

12 Options for activation functions
[Figure: plots of the Leaky ReLU and ELU functions.]
Leaky ReLU(x) = max(0.1x, x)
ELU(x) = x if x > 0, α (exp(x) - 1) otherwise
The activation function is generally applied independently to each element of the vector.
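
The activation functions above in NumPy, as a small illustrative sketch (these are the standard definitions, not code from the lecture):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x):
    return np.maximum(0.1 * x, x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))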

13 Options for activation functions
[Figure: plots of the sigmoid, tanh, and ReLU functions and their definitions, as on slide 11.]
In modern networks ReLU is the most common activation function.

14 A better understanding of gradient flow during BackProp has helped the training of neural networks: understanding the effect of activation functions.

15 Sigmoid
[Figure: plot of σ(x) and its derivative dσ(x)/dx.]
σ(x) = 1 / (1 + exp(-x))
Problems:
1. Saturated activations kill the gradient flow.
2. Sigmoid outputs are not zero-centered.
3. exp() is expensive to compute.

16 tanh
[Figure: plot of tanh(x) and its derivative d tanh(x)/dx.]
tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
Properties:
1. Squashes numbers to the range [-1, 1].
2. tanh outputs are zero-centered.
3. Saturated activations kill the gradients.

17 Rectified Linear Unit (ReLU)
[Figure: plot of max(0, x) and its derivative d max(0, x)/dx.]
ReLU(x) = max(0, x)
Pros:
1. Does not saturate for large positive x.
2. Very computationally efficient.
3. In practice, training of a ReLU network converges much faster than one with sigmoid/tanh activation functions.
Cons:
4. The output is not zero-centered.
5. Negative activations have zero gradients, which freezes some of the parameter weights.

18 Effect of weight initialization & activation function on gradient flow

19 Some activation histograms
Initialize a 10-layer network with 500 nodes at each layer. Use a tanh activation function at each layer. Initialize the weights with small random numbers. Generate random input data from N(0, 1) of dimension d.
[Figure: histograms of the activations at each layer, layers 1-8.]

20 Change the initialization to bigger random numbers
Almost all neurons are completely saturated, either -1 or +1. => The gradients will be all zero. (Remember the picture of the gradient of tanh.)
[Figure: histograms of the activations at each layer, layers 1-8.]

21 Change the initialization to Xavier initialization
Initialize a 10-layer network with 500 nodes at each layer. Use a tanh activation function at each layer. Initialize the weights with Xavier initialization: Wi,lm ~ N(w; 0, 1/sqrt(500)). Generate random input data from N(0, 1) of dimension d.
[Figure: histograms of the activations at each layer, layers 1-8.]
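
To make the three initializations concrete, here is an illustrative NumPy sketch (the scales 0.01 for "small" and 1.0 for "bigger" are assumed values, not taken from the slides):

import numpy as np

fan_in, fan_out = 500, 500

W_small  = 0.01 * np.random.randn(fan_out, fan_in)             # small random numbers: activations shrink towards 0
W_big    = 1.0  * np.random.randn(fan_out, fan_in)             # bigger random numbers: tanh saturates at -1/+1
W_xavier = np.random.randn(fan_out, fan_in) / np.sqrt(fan_in)  # Xavier: keeps the activation variance roughly constant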

22 Lessening the effect of initialization: Batch normalization

23 Batch Normalization
Want unit Gaussian activations at each layer? Just make them unit Gaussian!
Idea introduced in: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, S. Ioffe, C. Szegedy, arXiv 2015.
Consider the activations at some layer j for a batch: s1(j), s2(j), ..., sn(j).
To make each dimension unit Gaussian, apply:
ŝi(j) = diag(σ1, ..., σm)^(-1) (si(j) - µ)
where µ = (1/n) Σ_{i=1}^{n} si(j) and σp² = (1/n) Σ_{i=1}^{n} (si,p(j) - µp)².

24 Batch Normalization
Usually the normalization is applied after the fully connected layer and before the non-linearity. Therefore, for a k-layer network:
- for i = 1, ..., k-1:
  - for (x(i-1), y) in D: apply the ith linear transformation to the batch: s(i) = Wi x(i-1) + bi
  - compute the batch mean and variances of the ith layer: µ = (1/|D|) Σ_D s(i) and σj² = (1/|D|) Σ_D (sj(i) - µj)² for j = 1, ..., mi
  - for (s(i), y) in D: apply BN and the activation function: ŝ(i) = BatchNormalise(s(i), µ, σ1, ..., σmi), x(i) = max(0, ŝ(i))
- Apply the final linear transformation: s(k) = Wk x(k-1) + bk
A NumPy sketch of the normalization step is given below.
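
A minimal NumPy sketch of the BatchNormalise step referred to above; the function name and the (n, m) batch layout are illustrative assumptions:

import numpy as np

def batchnorm_forward(S, gamma=None, beta=None, eps=1e-8):
    # S has shape (n, m): n samples in the batch, m dimensions per sample.
    mu = S.mean(axis=0)                     # per-dimension batch mean
    var = S.var(axis=0)                     # per-dimension batch variance
    S_hat = (S - mu) / np.sqrt(var + eps)   # normalise each dimension to zero mean, unit variance
    if gamma is not None:
        S_hat = gamma * S_hat + beta        # optional learned scale and shift (next slide)
    return S_hat, mu, var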

25 Batch Normalization: scale & shift the range
Can also allow the network to squash and shift the range of the ŝ(i)'s at each layer:
ŝ(i) <- γ(i) ŝ(i) + β(i)
Can learn the γ(i)'s and β(i)'s and add them as parameters of the network. To keep things simple, this added complexity is often omitted.

26 Benefits of Batch Normalization
- Improves gradient flow through the network.
- Reduces the strong dependence on initialization. => You can learn deeper networks more reliably.
- Allows higher learning rates.
- Acts as a form of regularization.
If training a deep network, you should use Batch Normalization.

27 Batch Normalization at test time
At test time we do not have a batch. Instead, fixed empirical means and variances of the activations at each level are used. These quantities are estimated during training (with running averages).
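
A sketch of the running averages mentioned above; the smoothing factor alpha (e.g. 0.99) is an assumption, not a value from the lecture:

def update_running_stats(running_mu, running_var, mu, var, alpha=0.99):
    # Exponential moving averages of the batch statistics, updated after every batch
    # during training; at test time they replace the batch mean and variance.
    running_mu = alpha * running_mu + (1 - alpha) * mu
    running_var = alpha * running_var + (1 - alpha) * var
    return running_mu, running_var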

28 Babysitting the training process

29 Training neural networks is not completely trivial
Several hyper-parameters affect the quality of your training. These include:
- learning rate
- degree of regularization
- network architecture
- hyper-parameters controlling weight initialization
If these (potentially correlated) hyper-parameters are not appropriately set => you will not learn an effective network.
There are multiple quantities you should monitor during training. These quantities indicate:
- a reasonable hyper-parameter setting, and/or
- how the hyper-parameter settings could be changed for the better.

30 What to monitor during training

31 Monitor & visualize the loss/cost curve
The evolution of your training loss is telling you something!
[Figure: typical training loss over time.]

32 Telltale sign of a bad initialization

33 Monitor & visualize the accuracy
The gap between the training and validation accuracy indicates the amount of over-fitting.
Over-fitting => you should increase regularization during training:
- increase the degree of L2 regularization
- more dropout
- use more training data

34 Monitor & visualize the accuracy
The gap between the training and validation accuracy indicates the amount of over-fitting.
Under-fitting => the model capacity is not high enough:
- increase the size of the network

35 Optimization of the training hyper-parameters

36 Hyper-parameters to adjust
- Initial learning rate.
- Learning rate decay schedule.
- Regularization strength:
  - L2 penalty
  - Dropout strength

37 Cross-validation strategy
Do a coarse-to-fine cross-validation in stages.
Stage 0: Identify the range of feasible learning rates & regularization penalties. (Usually done interactively; train only for a few updates.)
Stage 1: Broad search. The goal is to narrow the search range. Only run training for a few epochs.
Stage 2: Finer search. Increase the training times.
Stage ...: Repeat Stage 2 as necessary.
Use performance on the validation set to identify good hyper-parameter settings.

38 Prefer random search to grid search
"Randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid."
Random Search for Hyper-Parameter Optimization, Bergstra and Bengio, 2012.
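
A minimal random-search sketch for the two hyper-parameters from slide 36 (learning rate and L2 penalty). The search ranges and the helper train_and_validate are hypothetical stand-ins, not part of the lecture:

import numpy as np

def train_and_validate(lr, lam):
    # Hypothetical helper: train for a few epochs with learning rate lr and
    # L2 penalty lam, and return the validation accuracy.
    return np.random.rand()

best_setting, best_acc = None, -np.inf
for trial in range(50):
    lr = 10 ** np.random.uniform(-5, -1)     # sample the learning rate on a log scale
    lam = 10 ** np.random.uniform(-6, -1)    # sample the L2 penalty on a log scale
    acc = train_and_validate(lr, lam)
    if acc > best_acc:
        best_setting, best_acc = (lr, lam), acc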

39 Parameter Updates: Variations of Stochastic Gradient Descent

40 One weakness of SGD
SGD can be very slow...
Example: use SGD to find the optimum of f(x) = exp(-0.5 xᵀ Σ x).
[Figure: 150 iterations, η = .01; the curves show the iso-contours of f(x).]
SGD has trouble navigating ravines, as it oscillates across the bottom of the ravine. You could increase the learning rate, but the more you increase it => the more likely the optimizer will diverge. Unfortunately, ravines are common around local optima.

41 Solution: SGD with momentum
Introduce a momentum vector as well as the gradient vector. Let γ ∈ [0, 1] and let v be the momentum vector:
v(t+1) = γ v(t) + η ∇x f(x(t))   (update vector)
x(t+1) = x(t) - v(t+1)
Typically γ is set somewhere in the range [.9, .99].
[Figure: the update vector is the sum of the momentum term γ v(t) and the gradient step η ∇x f(x(t)).]
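
A sketch of the momentum update on a toy ill-conditioned quadratic (the quadratic, its parameters, and the starting point are assumptions standing in for the lecture's toy problem):

import numpy as np

A = np.diag([1.0, 50.0])             # elongated bowl: a simple "ravine"
grad_f = lambda x: A @ x             # gradient of f(x) = 0.5 x^T A x
x = np.array([1.5, 0.5])
v = np.zeros_like(x)
gamma, eta = 0.9, 0.01
for t in range(150):
    v = gamma * v + eta * grad_f(x)  # accumulate velocity
    x = x - v                        # take the step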

42 How and why momentum helps
How? Momentum helps accelerate SGD in the appropriate direction and dampens the oscillations of default SGD => faster convergence.
[Figure: γ = .9, η = .01, 150 iterations.]
Why? For dimensions whose gradient is constantly changing sign, the entries in the update vector are damped. For dimensions whose gradient is approximately constant, the entries in the update vector are not damped.

43 Momentum is not the complete answer
When using momentum => the optimizer can pick up too much speed in one direction => it can overshoot the local optimum.
[Figure: γ = .9, η = .03.]

44 Solution: Nesterov accelerated gradient (NAG)
Look and measure ahead: use the gradient at an estimate of the parameters at the next iteration. Let γ ∈ [0, 1], then
e(t+1) = x(t) - γ v(t)                (estimate of x(t+1))
v(t+1) = γ v(t) + η ∇x f(e(t+1))      (update vector)
x(t+1) = x(t) - v(t+1)
Typically γ is set to .9.
[Figure: comparison of the momentum update, which uses the gradient at x(t), with the NAG update, which uses the gradient at the look-ahead point e(t+1).]
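
The corresponding NAG sketch on the same assumed toy quadratic as in the momentum example:

import numpy as np

A = np.diag([1.0, 50.0])
grad_f = lambda x: A @ x
x = np.array([1.5, 0.5])
v = np.zeros_like(x)
gamma, eta = 0.9, 0.01
for t in range(150):
    e = x - gamma * v                # look-ahead estimate of the next iterate
    v = gamma * v + eta * grad_f(e)  # gradient measured at the look-ahead point
    x = x - v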

45 How and why NAG helps
The anticipatory update prevents the algorithm from taking too large updates and overshooting. The algorithm has increased responsiveness to the landscape of f.
[Figure: γ = .9, η = .01, 150 iterations.]
Note: NAG has been shown to greatly increase the ability to train RNNs: Bengio, Y., Boulanger-Lewandowski, N. & Pascanu, R., Advances in Optimizing Recurrent Networks, 2012.

46 Improvements to NAG
Want to adapt the updates to each individual parameter: perform larger or smaller updates depending on the landscape of the cost function. A family of algorithms with adaptive learning rates:
- AdaGrad
- AdaDelta
- RMSProp
- Adam

47 AdaGrad
For a cleaner statement introduce some notation: gt = ∇x f(x(t)) and gt = (gt,1, ..., gt,d)ᵀ.
Keep a record of the sum of the squares of the gradients w.r.t. each xi up to time t:
Gt,i = Σ_{j=1}^{t} g²j,i
The AdaGrad update step for each dimension is
xi(t+1) = xi(t) - η gt,i / (√Gt,i + ε)
Usually set ε = 1e-8 and η = .01.
J. Duchi, E. Hazan & Y. Singer, Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, Journal of Machine Learning Research, 2011.
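
An AdaGrad sketch on the same assumed toy quadratic as in the earlier examples:

import numpy as np

A = np.diag([1.0, 50.0])
grad_f = lambda x: A @ x
x = np.array([1.5, 0.5])
G = np.zeros_like(x)
eta, eps = 0.01, 1e-8
for t in range(150):
    g = grad_f(x)
    G += g ** 2                        # accumulated squared gradients, per dimension
    x -= eta * g / (np.sqrt(G) + eps)  # per-dimension scaled step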

48 AdaGrad's convergence on our toy problem
[Figure: ε = 1e-8, η = .01, 150 iterations.]

49 Big weakness of AdaGrad
Each g²t,i is positive => each Gt,i = Σ_{j=1}^{t} g²j,i keeps growing during training => the effective learning rate η / (√Gt,i + ε) shrinks and eventually goes to 0 => the updates of x(t) stop.

50 AdaDelta
Devised as an improvement to AdaGrad. Tackles AdaGrad's convergence of the learning rate to zero as t increases.
AdaDelta's two central ideas:
- scale the learning rate based on the previous gradient values (like AdaGrad), but only using a recent time window,
- include an acceleration term (like momentum) by accumulating prior updates.
M. Zeiler, ADADELTA: An Adaptive Learning Rate Method, arXiv 2012.

51 Technical details of AdaDelta
Compute the gradient vector gt at the current estimate x(t).
Update the average of the previous squared gradients (AdaGrad-like step):
Gt,i = ρ Gt-1,i + (1 - ρ) g²t,i
Compute the update vector:
ut,i = (√(Ut-1,i + ε) / √(Gt,i + ε)) gt,i
Compute the exponentially decaying average of updates (momentum-like step):
Ut,i = ρ Ut-1,i + (1 - ρ) u²t,i
The AdaDelta update step:
xi(t+1) = xi(t) - ut,i
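
An AdaDelta sketch on the same assumed toy quadratic (ρ and ε here are assumed values):

import numpy as np

A = np.diag([1.0, 50.0])
grad_f = lambda x: A @ x
x = np.array([1.5, 0.5])
G = np.zeros_like(x)  # running average of squared gradients
U = np.zeros_like(x)  # running average of squared updates
rho, eps = 0.9, 1e-6
for t in range(150):
    g = grad_f(x)
    G = rho * G + (1 - rho) * g ** 2
    u = np.sqrt(U + eps) / np.sqrt(G + eps) * g  # step scaled by the ratio of RMS values
    U = rho * U + (1 - rho) * u ** 2
    x -= u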

52 Adaptive Moment Estimation (Adam)
Computes adaptive learning rates for each parameter. How?
- Stores an exponentially decaying average of past gradients m(t) and past squared gradients v(t).
- m(t) and v(t) are estimates of the first and second moments, respectively, of the gradient in each dimension.
- Uses the second-moment (variance + mean²) estimate to damp the update in dimensions with a high second moment.
D. P. Kingma & J. L. Ba, Adam: a Method for Stochastic Optimization, International Conference on Learning Representations, 2015.

53 Update equations for Adam
Let gt = ∇x f(x(t)).
m(t+1) = β1 m(t) + (1 - β1) gt
v(t+1) = β2 v(t) + (1 - β2) gt ⊙ gt   (element-wise product)
Setting m(0) = v(0) = 0 => m(t) and v(t) are biased towards zero (especially during the initial time-steps). Counter these biases by setting:
m̂(t+1) = m(t+1) / (1 - β1^t),   v̂(t+1) = v(t+1) / (1 - β2^t)
The Adam update rule:
x(t+1) = x(t) - η m̂(t+1) / (√v̂(t+1) + ε)
Suggested default values: β1 = .9, β2 = .999, ε = 1e-8.
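
An Adam sketch on the same assumed toy quadratic, using the suggested default values:

import numpy as np

A = np.diag([1.0, 50.0])
grad_f = lambda x: A @ x
x = np.array([1.5, 0.5])
m, v = np.zeros_like(x), np.zeros_like(x)
beta1, beta2, eta, eps = 0.9, 0.999, 0.01, 1e-8
for t in range(1, 151):
    g = grad_f(x)
    m = beta1 * m + (1 - beta1) * g       # first-moment estimate
    v = beta2 * v + (1 - beta2) * g ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)          # bias correction
    v_hat = v / (1 - beta2 ** t)
    x -= eta * m_hat / (np.sqrt(v_hat) + eps)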

54 Adam's performance on our toy problem
[Figure: default parameter settings, 150 iterations.]

55 Comparison of different algorithms on our toy problem
[Figure: trajectories of Adam, AdaGrad, NAG, Momentum, and SGD for the settings (ε = 1e-8, γ = .9, η = .01, 150 iterations) and (ε = 1e-8, γ = .9, η = .03, 150 iterations).]

56 Which optimizer to use?
Data sparse => likely to achieve the best results using one of the adaptive learning-rate methods. Using the adaptive learning-rate methods => you won't need to tune the learning rate (much!).
RMSProp, AdaDelta, and Adam are very similar algorithms that do well in similar circumstances. Adam slightly outperforms RMSProp near the end of optimization, so Adam might be the best overall choice.
But vanilla SGD (without momentum) with a simple learning rate annealing schedule may be sufficient, though the time until finding a local minimum may be long...

57 Annealing the learning rate

58 Useful to anneal the learning rate
When training deep networks, it is usually helpful to anneal the learning rate over time. Why?
- It stops the parameter vector from bouncing around too widely.
- => the optimizer can reach into the deeper, but narrower, parts of the loss function.
But knowing when to decay the learning rate is tricky! Decay too slowly => you waste computation bouncing around chaotically with little improvement. Decay too aggressively => the system is unable to reach the best position it can.

59 Common approaches to learning rate decay
Step decay: after every nth epoch set η = α η where α ∈ (0, 1). (Sometimes people instead monitor the validation loss and reduce the learning rate when this loss stops improving.)
Exponential decay: η = η0 e^(-kt), where t is the iteration number (either w.r.t. the number of update steps or epochs); η0 and k are hyper-parameters.
1/t decay: η = η0 / (1 + kt).
Step decay is the most common. It is better to decay conservatively and train for longer.
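
The three schedules above, written as small Python functions (an illustrative sketch; the step length n is an assumed parameter):

import numpy as np

def step_decay(eta0, alpha, epoch, n=10):
    return eta0 * alpha ** (epoch // n)  # multiply the rate by alpha after every n epochs

def exponential_decay(eta0, k, t):
    return eta0 * np.exp(-k * t)

def one_over_t_decay(eta0, k, t):
    return eta0 / (1 + k * t)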
