$tock Forecasting using Machine Learning


Greg Colvin, Garrett Hemann, and Simon Kalouche

Abstract

We present an implementation of three different machine learning algorithms (gradient descent, a support vector machine, and a deep neural network) which attempt to model and forecast individual stock prices at some future date based on a set of 11 company-related features. We vary internal parameters of each algorithm (optimization step size, epoch number, loss function, hypothesis functions, number of hidden layers, etc.) as well as more intuitive feature-based parameters (forecast horizon, window length, etc.). We summarize the performance of several experiments and indicate which parameters are most influential in producing the best model for the 1-day, 10-day, and 20-day forecast horizons.

I. INTRODUCTION

The US stock market is said to be semi-strong efficient, meaning that at any given time, prices fully reflect all publicly available information on a particular stock and/or market. Therefore, no investor has a legal advantage in predicting a return on a stock, because no individual investor has access to information not already available to the public. The goal of this study is to determine whether an advantage could be gained by utilizing the available information more effectively than the individual investor.

Existing mathematical financial market models include Black-Scholes and Brownian models [1][2]. In this study, however, artificial intelligence, specifically machine learning techniques, was implemented with the goal of predicting future stock prices based on past and present values of metrics that are commonly deemed influential indicators of stock value. Gradient descent minimizing a softmax loss, a support vector machine, and a deep neural network were used to predict performance based on 25 years of data for 11 different criteria across 6 different companies from different industries.
Three technology companies (Apple, Microsoft, and HP), two retail companies (Kohl's and Macy's), and one manufacturing company (3M) were evaluated. Daily values for the criteria listed in Table I, spanning 25 years, were collected for each of the companies from the Bloomberg Database.

*This work was done in partial fulfillment of CMU's Graduate Artificial Intelligence Course (15-780) in the Spring of 2016. G. Colvin, G. Hemann, and S. Kalouche are with the Robotics Institute at Carnegie Mellon University's School of Computer Science.

II. TECHNICAL APPROACH

Three artificial intelligence techniques were used to attempt prediction of stock performance: a traditional machine learning approach which minimizes a softmax loss with gradient descent, a support vector machine which uses a linear kernel and minimizes a squared hinge loss, and a neural network which minimizes a logistic loss. The input data parsed from the Bloomberg Database is assembled in the form

$$F = [f_1, f_2, f_3, \ldots, f_n] \tag{1}$$

where $F$ is the complete feature set and $n$ is the number of features ($n = 11$ for the Bloomberg Database features, $n = 15$ with the additional Google Trends data, see II-D).

Each machine learning algorithm used a supervised learning approach, which requires a labeled data set, $Y$. $Y$ is chosen based on the metric we are attempting to predict. For instance, if we wish to predict the 2-day net price change on day $i + 1 = 51$ with a 10-day window, the $Y$ for that prediction will be the actual 2-day net price change that occurred on day 51 and the $X$ will be the feature set data from day 40 to day 50. The feature set data is shaped by the number of features used, $n$, as well as the window length $w$, which indicates how many past days of data (back to day $i - w$) are used to predict the desired feature on day $i + 1$. The data fed into the algorithm is $X \in \mathbb{R}^{m \times n}$, and the label data is $Y \in \mathbb{R}^{m \times q}$, where $m$ is the number of days (data points per feature) and $q$ is the number of classes in the classification problem.
In the case of 2-day net price change, $q = 2$, where $y_i = [0, 1]$ if the 2-day net price change was positive and $y_i = [1, 0]$ if the 2-day net price change was negative. We took a classification approach where two classes indicate a feature's sign. Thus, if $\operatorname{argmax}(\hat{y}) = 0$ the ML model predicts a negative 2-day net price change, and if $\operatorname{argmax}(\hat{y}) = 1$ the model predicts a positive 2-day net price change.
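The windowing and labeling just described (features from day i back to day i − w, label taken from day i + 1) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `build_windows`, the oldest-first list layout, and the choice to treat a zero change as the negative class are all assumptions.

```python
from typing import List, Tuple


def build_windows(
    features: List[List[float]], target_index: int, window: int
) -> Tuple[List[List[float]], List[List[int]]]:
    """Build (X, Y) from per-day feature lists, ordered oldest day first.

    Row i of X concatenates the features of day i, i-1, ..., i-window.
    The label is the one-hot sign of feature `target_index` on day i+1:
    [0, 1] for a positive change, [1, 0] otherwise (zero counted as
    negative here, which is an assumption; the paper does not say).
    """
    X: List[List[float]] = []
    Y: List[List[int]] = []
    # Day i needs `window` earlier days and one later day for its label.
    for i in range(window, len(features) - 1):
        row: List[float] = []
        for lag in range(window + 1):
            row.extend(features[i - lag])
        X.append(row)
        positive = features[i + 1][target_index] > 0
        Y.append([0, 1] if positive else [1, 0])
    return X, Y


# Toy data: 6 days, 2 features per day; feature 1 alternates in sign.
toy = [[float(d), float((-1) ** d)] for d in range(6)]
X, Y = build_windows(toy, target_index=1, window=2)
```

With a 10-day window and i = 50, a row built this way covers days 40 through 50 and is labeled from day 51, matching the example in the text.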

We generate our input data set $X$ based on window length by

$$X_i = [F_i, F_{i-1}, F_{i-2}, \ldots, F_{i-w}] \tag{2}$$

where $i$ indexes the day and $F_i$ is the full feature set, $F$, for day $i$. $Y$ is formulated to be

$$Y_i = [\operatorname{sign}(f_3)_{i+1}] \tag{3}$$

where, in the case of predicting 2-day net price change (i.e. $f_3$ in the feature set), we look one day into the future and use day $(i + 1)$'s 2-day net price change as the label for $X_i$. We then implement our three ML algorithms and feed them $X$ and $Y$ of this form to train the model weights, $\Theta \in \mathbb{R}^{q \times k}$, where $k = n w$.

In order to compare each algorithm's performance, the stock price was predicted for each stock while varying the amount of input data (i.e. window length $w$) and the forecast horizon (1-day, 2-day, 10-day, or 20-day). There were approximately 6200 time steps covering the 25-year period, and this was split using the standard 70/30 convention, where 70% becomes training data and 30% becomes testing data. The test and training prediction error was calculated from the ratio of correctly predicted sign changes in metric $Y$ (1-day, 10-day, or 20-day net price change) to the total number of training or testing data samples. The precise equation used for error is

$$\text{error} = \frac{\sum 1\{\operatorname{sign}(\hat{y}) = \operatorname{sign}(y)\}}{n} \tag{4}$$

where $n$ is the number of testing or training samples and the 1-function returns 1 when the expression in the brackets is satisfied and 0 otherwise.

A. Machine Learning Using Gradient Descent

A traditional machine learning algorithm using gradient descent to minimize a softmax loss was implemented first. The softmax loss is defined by

$$\ell(y, \hat{y}) = \log\Big(\sum_j \exp(\hat{y}_j)\Big) - y^T \hat{y} \tag{5}$$

where $\hat{y} = h_\theta(x)$. The gradient of the softmax loss is defined by

$$\nabla \ell(y, \hat{y}) = \frac{\exp(\hat{y})}{\sum_j \exp(\hat{y}_j)} - y \tag{6}$$

The algorithm was tuned with a step size of $\alpha = 0.025$ and was shown to approach steady state after 150 epochs.
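As an illustration of the softmax loss, its gradient, and a single gradient-descent update, the sketch below uses only the Python standard library. It assumes a linear hypothesis ŷ = Θx, which the text suggests through Θ ∈ R^{q×k} but does not state outright. The function names are hypothetical, and the max-shift inside the exponentials is a standard numerical-stability device added here, not something taken from the paper.

```python
import math
from typing import List


def softmax_loss(y_hat: List[float], y: List[float]) -> float:
    """log(sum_j exp(y_hat_j)) - y^T y_hat for a one-hot label y."""
    shift = max(y_hat)  # subtracting the max leaves the value unchanged
    log_sum = shift + math.log(sum(math.exp(v - shift) for v in y_hat))
    return log_sum - sum(t * v for t, v in zip(y, y_hat))


def softmax_loss_grad(y_hat: List[float], y: List[float]) -> List[float]:
    """Gradient with respect to y_hat: softmax(y_hat) - y."""
    shift = max(y_hat)
    exps = [math.exp(v - shift) for v in y_hat]
    total = sum(exps)
    return [e / total - t for e, t in zip(exps, y)]


def predict(theta: List[List[float]], x: List[float]) -> List[float]:
    """Linear hypothesis y_hat = theta x (an assumption, see lead-in)."""
    return [sum(w * v for w, v in zip(row, x)) for row in theta]


def gd_step(theta, x, y, alpha=0.025):
    """One gradient-descent update of theta on a single sample."""
    g = softmax_loss_grad(predict(theta, x), y)
    # Chain rule: d loss / d theta[c][j] = g[c] * x[j]
    return [[w - alpha * g_c * v for w, v in zip(row, x)]
            for row, g_c in zip(theta, g)]


theta0 = [[0.0, 0.0], [0.0, 0.0]]
x, y = [1.0, 2.0], [0.0, 1.0]
theta1 = gd_step(theta0, x, y)
loss_before = softmax_loss(predict(theta0, x), y)
loss_after = softmax_loss(predict(theta1, x), y)
```

On this toy sample the loss decreases after one step. A full training run would repeat the update over every training sample for the stated number of epochs.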
TABLE I
FEATURE SET

| Index | Feature | Description |
|---|---|---|
| f_1 | 1-Day Net Price Change | Difference in Today's Last Price and Yesterday's Last Price |
| f_2 | Last Price | End of Day Market Price |
| f_3 | 2-Day Net Price Change | Difference in Today's Last Price and Two Days Previous Last Price |
| f_4 | 10-Day Volatility | Standard Deviation of Price Change Over 10 Days |
| f_5 | 50-Day Moving Average | Mean Market Price Over 50 Days |
| f_6 | Price to Earnings Ratio | Market Price per Share / Earnings per Share |
| f_7 | Volume | Shares of Stock Outstanding |
| f_8 | Enterprise Value | Company's Total Value |
| f_9 | Overridable Alpha | Stock Performance against S&P 500 Sector Index |
| f_10 | Overridable Beta | Measure of Stock's Price Volatility Compared to Sector Index |
| f_11 | Alpha for Beta Plus | Alpha Over Beta Minus |
| f_12 | Google Trends | Trends data indicating search popularity of certain keywords related to a company |

B. Support Vector Machine

Our second algorithm, the support vector machine (SVM), generalizes the gradient descent approach above: hyperplanes classify different segments of the data. SVMs can use a linear kernel with a hinge loss, but can also use higher-order polynomial kernels for more complex data. In our formulation, we found that the linear kernel, which uses a squared hinge loss on a binary class, performs better. The squared hinge function is

$$\ell(y, \hat{y}) = \sum_i \max\{1 - y_i x_i^T \Theta,\ 0\}^2 + \lambda \lVert \Theta \rVert_2^2 \tag{7}$$

where the second term, $\lambda \lVert \Theta \rVert_2^2$, is the L2 penalty regularizing the loss. The SVM was implemented using Python's scikit-learn library.

C. Deep Neural Network

The third algorithm that was implemented was a deep convolutional neural network (CNN). Although recurrent neural networks (RNNs) are typically used for time-series data (as is the case here), we employ a CNN to predict stock price movement from a representative picture of a time series of past price fluctuations. The neural network implementation differs from the previous two algorithms in that a logistic loss was used
to yield a single prediction value $\hat{y}$, thus making the neural network a regression algorithm as opposed to the previous two classifiers. The CNN implemented a logistic loss ($\ell$) and a loss gradient ($\nabla \ell$) defined as

$$\ell(y, h_\theta(x)) = \ell(y, \hat{y}) = \log(1 + \exp(-\hat{y} \cdot y)) \tag{8}$$

$$\nabla \ell(y, \hat{y}) = \frac{-y \exp(-\hat{y} \cdot y)}{1 + \exp(-\hat{y} \cdot y)} \tag{9}$$

Before the optimization occurs, a non-linear function $f_{u_j}$ is applied to the linear hypothesis function $h_\theta(x)$. Therefore, the hypothesis function now takes the form

$$h_\theta(x) = f_{u_{j+1}}\big(W_{j+1}\, f_{u_j}(W_j x + b_j) + b_{j+1}\big) \tag{10}$$

Here, $f_u$ is a non-linear activation function; common choices are the sigmoid, the hyperbolic tangent, and the rectified linear unit (ReLU). In our study we found that the ReLU activation function produced the best experimental results for all layers except the last layer, which was set to be a linear activation function. The ReLU function is applied element-wise and defined by

$$f_u(x)_{\text{ReLU}} = \max\{0, x\} \tag{11}$$

where the function returns zero for values where $x$ is negative and $x$ for values of $x$ that are positive.

The number of layers ($s$) was also varied to determine whether the prediction improved with a deeper or shallower network. The total number of layers varied from 5 (3 hidden) to 25 (23 hidden), where the layer sizes were linearly spaced from layer 1 ($L_1$), having the size of the input data ($\mathbb{R}^{m \times k}$), to the last layer ($L_s$), having a size of $\mathbb{R}^{1}$, because the last layer should yield the predicted value of only one feature. It was determined that more layers did not produce a more accurate model, and so the network was kept to 5 layers. The weight matrix $W$ and bias vector $b$, which make up the parameters $\theta$, are initialized randomly and then fed into a stochastic gradient descent optimization algorithm which searches for the parameters $\theta = [W\ b]$ that minimize the logistic loss function. The optimization used a constant step size of $\alpha = 0.005$ and 10 epochs (i.e. 10 runs through the entire input data set).
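The layered hypothesis function and the logistic loss above can be illustrated with a small forward pass. This is a sketch under stated assumptions, not the authors' network: it uses fully connected layers (the affine-plus-activation form of the hypothesis function) rather than convolutions, assumes labels y in {-1, +1}, and all function names and weights are made up for the example.

```python
import math
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (W, b)


def relu(v: List[float]) -> List[float]:
    """Element-wise max{0, x}."""
    return [max(0.0, a) for a in v]


def affine(W: List[List[float]], b: List[float], x: List[float]) -> List[float]:
    """Compute W x + b for a dense weight matrix W."""
    return [sum(w * a for w, a in zip(row, x)) + bias
            for row, bias in zip(W, b)]


def forward(layers: List[Layer], x: List[float]) -> float:
    """ReLU on every layer except the last, which stays linear."""
    h = x
    for j, (W, b) in enumerate(layers):
        h = affine(W, b, h)
        if j < len(layers) - 1:
            h = relu(h)
    return h[0]  # the last layer has a single output


def logistic_loss(y: float, y_hat: float) -> float:
    """log(1 + exp(-y_hat * y)), with y assumed to be -1 or +1."""
    return math.log(1.0 + math.exp(-y_hat * y))


def logistic_loss_grad(y: float, y_hat: float) -> float:
    """Derivative of the logistic loss with respect to y_hat."""
    e = math.exp(-y_hat * y)
    return -y * e / (1.0 + e)


# Hypothetical two-layer network with hand-picked weights.
layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, -2.0]], [0.5]),
]
y_hat = forward(layers, [2.0, 1.0])
```

In a full implementation the gradient from `logistic_loss_grad` would be backpropagated through the layers and used in the stochastic gradient descent update with the stated step size.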
To appropriately compare the error of the three algorithms, the predicted values of the CNN are converted into classes based on $\operatorname{sign}(\hat{y})$, where a positive prediction corresponds to $\hat{y} = [0, 1]$ and a negative prediction corresponds to $\hat{y} = [1, 0]$. The error of the neural network is then calculated according to Eq. 4. As with the previous two algorithms, the window length and forecast horizon parameters were varied from 1 to 6 weeks and 1 to 20 days, respectively.

D. External Features using Google Trends¹

In addition to the Bloomberg Database stock criteria, we evaluated the effects of external data pulled from Google Trends, as seen in Fig. 4. For the Apple stock, we used the search terms apple, ipad, iphone, and ipod to see whether the magnitude of searches over time would impact stock predictability. Magnitude is normalized per topic by the maximum of searches per day over the time window (12 years).

III. RESULTS

We first compare the results of the three algorithms, shown in Figs. 1 and 2, with varying forecast horizons. The vertical axes show the testing error of the algorithm and the horizontal axes show the price change at the various distances in the future that we were trying to predict: 1, 10, and 20 days. In addition to varying the forward distance of the prediction, each bar represents how much past data we based our prediction on: 1, 2, 3, 4, 5, and 6 weeks of data. The plots show that the algorithms performed best when predicting the short-term price change. They also demonstrate that no algorithm showed significant improvement over another.

The traditional machine learning algorithm using gradient descent was run on the 6 test companies and compared against a baseline of an always-increase predictive strategy (see Fig. 3). This baseline is admissible because, on average, stock prices can be expected to rise steadily at a rate of 2-5% per year. From Fig. 3 it is evident that the model's prediction accuracy varies across companies.
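The comparison against the always-increase baseline can be illustrated with the sign-agreement count from Eq. 4. The sketch below computes that quantity exactly as the equation is written, that is, the fraction of samples whose predicted and actual signs match. The helper names and the toy price changes are hypothetical and are not results from the paper.

```python
from typing import List


def sign(v: float) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def sign_agreement(y_pred: List[float], y_true: List[float]) -> float:
    """Fraction of samples where sign(y_pred) == sign(y_true)."""
    hits = sum(1 for p, t in zip(y_pred, y_true) if sign(p) == sign(t))
    return hits / len(y_true)


def always_increase(n: int) -> List[float]:
    """Baseline strategy: predict a positive change for every sample."""
    return [1.0] * n


# Made-up net price changes and model outputs, for illustration only.
actual = [0.5, -0.2, 0.1, 0.3, -0.4]
model = [0.1, -0.3, -0.2, 0.4, -0.1]

model_score = sign_agreement(model, actual)
baseline_score = sign_agreement(always_increase(len(actual)), actual)
```

Because the baseline always predicts a rise, its score equals the fraction of days on which the price actually rose, so a model is only informative to the extent that it beats that fraction.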
Our results demonstrate that external effects (features not included in our algorithms) have a greater impact on Apple stock than they do on 3M stock. This may occur because Apple's volatility is influenced on a weekly or monthly basis by the release of, reviews of, and issues with its consumer products. In contrast, 3M is a much less volatile stock whose value is less dependent on product releases and thus varies much less on a day-to-day basis.

¹ https://www.google.com/trends/

[Fig. 1. Comparison of ML algorithms for 3M stock.]

[Fig. 2. Comparison of ML algorithms for Apple stock.]

In an attempt to improve Apple stock prediction, we added Google Trends data for Apple search terms to the feature set. These additional features showed no prediction improvement, which we believe is the result of Google Trends data not publishing a connotation associated with the keywords. For example, if the keyword iphone has a high search frequency on an arbitrary day, it is inconclusive whether that spike is the result of a new iPhone being released (good connotation) or the result of an issue or recall with the iPhone (bad connotation). Having more richly annotated data would likely provide more informative results.

In addition to predicting the future stock net change, we trained models to predict the other stock characteristics as well. The same data set previously discussed was used and the prediction variable was changed. Fig. 5 shows that only 2-day net price change and 50-day moving average price could be predicted with better than 50% performance. This result is expected, since these two metrics have a moving window that filters noise and are therefore easier to predict.

[Fig. 3. Stock comparisons for 1-day changes vs. always-increasing prediction (vertical axis: Error (%); series: Apple, Microsoft, HP, Kohl's, Macy's, and 3M, each with its always-increase baseline). The lower and upper limit companies are 3M and Apple respectively. The dotted lines are the always-increase baseline predictive strategy.]

[Fig. 4. Comparison of GD algorithm with Google Trends data (vertical axis: loss and error value; series: loss and error with and without trends). Google Trends data for Apple search terms did not show any improved results for stock prediction. Here, we use the search-per-day value of apple, ipad, iphone, and ipod.]

[Fig. 5. Stock characteristic comparisons for 1-day changes, 3M (vertical axis: Error (%); series: price earnings ratio, volume, enterprise value, 2-day net price change, 10-day volatility, 50-day moving average, last price, 1-day net price change). A comparison of models trained to predict different stock characteristics. Only 2-day net price change and 50-day moving average price could be predicted better than 50%.]

IV. CONCLUSIONS

As expected, the machine learning algorithms could only achieve an error slightly better than 50%, and further evaluation is necessary to determine whether following these predictions is profitable when compared to an index fund. Shorter time horizons were easier to predict but less valuable financially. Google Trends had negligible impact on the performance of the algorithms, which may be attributed to the lack of information on the connotation associated with a given keyword search. Each algorithm produced comparable results, and the 2-day net price change and the 50-day moving average were the only two categories that predicted above a 65% success rate.

V. FUTURE WORK

Using external data such as Google Trends could help make more informed and more accurate stock predictions. More thorough analysis is required to identify which parameters have an impact without overfitting to any one company. Additional work can be done to determine the profitability of short-term stock prediction, as our results indicate some degree of success in comparison to our baseline. To determine profitability, we must take into account the value of the loss in dollars when our prediction is incorrect and make judgments on thresholds for when to buy, sell, or hold a stock.

ACKNOWLEDGMENT

The authors would like to acknowledge Professor Zico Kolter, Professor Tuomas Sandholm, and the Spring 2016 15-780 course TAs.

REFERENCES

[1] "Black-Scholes model." Wikipedia.
[2] "Brownian model of financial markets." Wikipedia.
[3] Y. Dai and Y. Zhang. "Machine Learning in Stock Price Trend Forecasting." Stanford School of Computer Science Course Project, 2013.
[4] A. Siripurapu. "Convolutional Networks for Stock Trading." Stanford School of Computer Science Course Project, 2014.