A Big Data Framework for the Prediction of Equity Variations for the Indian Stock Market

Cerene Mariam Abraham 1, M. Sudheep Elayidom 2 and T. Santhanakrishnan 3
1,2 Computer Science and Engineering, Kochi, Kerala, India
3 Defence Research & Development Organization, Kochi, Kerala, India

ABSTRACT
Recently the National Stock Exchange (NSE) has been in the news for a record Nifty index, which reflects the sharp growth and positive sentiment in the financial transactions taking place in the Indian stock market. Prediction of stock prices plays a crucial role in getting the maximum benefit from these transactions. Prediction is difficult but extremely important, and it demands the development of algorithms that identify trading opportunities by detecting patterns in past data. This paper explores three simple but less commonly used indicators derived from stock data and shows how they can be used in practice to identify investor thumb rules in a quantitative manner. The work to date analyses ninety days of stock data for three companies in the derivative market and shows experimentally how these three indicators affect variations in the price of the stock.

KEYWORDS
Time series regression; Derivative market; Open interest; Deliverable quantity

1. INTRODUCTION
Big data analytics is becoming an important tool wherever huge volumes of data are available, as in the financial industry. More than 80% of stored corporate data is unstructured, that is, data not held in a relational database management system. Big data analytics refers to tools and methodologies that aim to transform such massive data into data that is useful for analytical purposes. Mining time series data (TSD) with such tools in the financial sector provides useful information to investors and to fund managers in banks and insurance companies, helping them channel their funds for better returns.
Although several prediction models exist, there is scope for further improvement in prediction accuracy and in preserving the data trend. Since financial TSD is highly volatile, accurate forecasting is all the more difficult.

A stock market is broadly divided into two segments: the equity market and the derivatives market. The derivatives market is a type of financial market where the prices of shares are predicted for future dates. Futures and options are the two most commonly traded types of derivatives. A futures contract is an agreement between two parties to buy or sell a particular share at a specific price on a future date. Options are contracts that give the right, but not the obligation, to buy or sell a particular asset at a certain price on or before a specified date. In the financial derivatives market, there is risk involved due to the high volatility of stock prices. This uncertainty comes from the nature of financial derivatives, the complexity of the operation of the financial derivatives market, intensified speculation, and so on.

This work suggests the role of three parameters from the stock data and experimentally shows their effect on stock price in the financial derivatives market. For the analysis, ninety days of data for three companies are taken into account.

The rest of the paper is organized as follows. Section 2 discusses some prediction models existing in the literature. Section 3 details the implementation; the model is applied to NSE India data and the results are analyzed. The paper ends with conclusions in Section 4.

2. RELATED WORKS
A motivation for this work is suggested by the early observations of Ms. Shalini H S and Dr. Raveendra P V [3]. They explain that the derivatives market provides an opportunity to transfer risk from one party to another. The launch of the equity derivatives market in India has been extremely encouraging and successful.
According to Michael Chui, derivatives are created in response to some fundamental changes in the global financial system. If properly handled, they should help improve the resilience of the system and bring economic benefits to the users [7]. Using the basic methods and principles of VAR, Shiguang Lin quantified and analyzed the financial derivatives markets and created a risk control model for them [9]. M. Thenmozhi and Abhijeet Chandra studied the asymmetric relationship between the India Volatility Index and stock market returns [4], and demonstrated that Nifty returns are negatively related to changes in the India Volatility Index levels. Abhi Dattasharma and Praveen Kumar Tripathi identified interdependencies between different stocks, through which they found that investment in one stock can be made when a related stock is performing well [6]. They detected similarity based on the directions of the amount of change for both stocks.

Priti Saxena and co-authors propose a new approach to analyzing the stock market and making predictions with a hybrid form of the linguistic-a priori concept. They report that it provides accurate results in stock prediction, which helps clients in decision making and helps brokers discover various useful patterns. The approach also gives clients easy and immediate access to the status of any stock price movement [8].

Aditya Gupta explains stock market prediction using Hidden Markov Models (HMM) [5]. The paper adopts a Maximum a Posteriori HMM approach for forecasting stock values for the next day based on historical data. It considers the small change in stock value and the intra-day high and low values of the stock to train the continuous HMM. This HMM is then used to make a Maximum a Posteriori decision over all the possible stock prices for the next day, and the method was compared with existing models such as the HMM-fuzzy model, ARIMA and ANN for stock forecasting.

For highly volatile financial TSD, a hybrid ARIMA-GARCH model is suggested which is suitable for multi-step ahead forecasting [2].
This model is applied to selected NSE India data sets to obtain multi-step ahead predictions. The results are evaluated using the error performance measures MAPE, MaxAPE and RMSE, whose values confirm improved prediction accuracy compared with traditional models such as ARIMA, GARCH, trend-ARIMA and wavelet-ARIMA. The proposed model also preserved the data trend over the prediction horizon better than the others.

3. EXPERIMENTAL RESULTS AND DISCUSSIONS
In a stock market, investors study various factors such as earnings per share, inflation, book value and economic strength to predict the future value of a share. As part of the research, these factors and others were studied. Based on past data on stock price variations and the experience of some stock exchange members, the following factors were selected to study their effect on the future price of shares: open interest, number of contracts, and deliverable quantity.

A futures contract is a contractual agreement to buy or sell a particular commodity or financial asset at a predetermined price in the future. Open interest is the total number of options and/or futures contracts that are not closed or delivered on a particular day. It is denoted by X. Number of contracts is the total number of contracts on a security. It is denoted by Y. Deliverable quantity (deliverable volume) is the quantity of shares that actually moves from one set of people (those who held the shares in their demat accounts before today and are selling today) to another set of people. It is the number of shares that actually get transacted. It is denoted by Z.

3.1. STATISTICAL ANALYSIS
Correlation is a statistical technique used to find out how strongly two variables are associated. It indicates whether one variable depends on the other. Regression is a statistical analysis process used to estimate the relationship between a dependent (response) variable and independent (predictor) variables.
Based on past data, regression provides the best-fit line or equation relating the dependent variable to the independent variables. The coefficient of determination, $R^2$, indicates how well the data fit the statistical model or equation.
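As a minimal sketch of the two techniques just described, the Python below computes a Pearson correlation coefficient and fits a least-squares line with its $R^2$. It is an illustration under stated assumptions, not the authors' implementation: it uses a single predictor for brevity (the models in this paper use three), the function names `pearson_r` and `fit_line` are hypothetical, and the numbers are made up rather than taken from NSE data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, returned with its R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Made-up values for illustration only (not data from the paper).
open_interest = [120.0, 135.0, 150.0, 160.0, 180.0, 210.0]
price = [505.0, 498.0, 490.0, 487.0, 476.0, 462.0]

r = pearson_r(open_interest, price)
slope, intercept, r_squared = fit_line(open_interest, price)
print(f"r = {r:.3f}, slope = {slope:.3f}, R^2 = {r_squared:.3f}")
```

A negative `r` and a negative `slope` correspond to an inverse relation between the predictor and the price. With a single predictor, $R^2$ equals the square of the correlation coefficient.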

Correlation and regression analysis were used to understand the strength of the relation and to estimate the relationship between the factors (open interest, number of contracts, deliverable quantity) and the price of shares. The result clearly showed an inverse relation between the factors and the price of shares [1]. The regression analysis also showed that the predictor variables significantly affect the response variable, with an $R^2$ value greater than 0.6 for the regression model of each company, indicating that each model fits reasonably well and explains more than 60% of the response variation. Since the p-value is very small, the null hypothesis is rejected; that is, the regression is significant [10].

Multicollinearity is the undesirable situation where the correlations among the regressors are strong. Tolerance, variance inflation factor (VIF), eigenvalues and condition index are the collinearity statistics that help identify multicollinearity.

Variable   Tolerance   VIF
X          0.294       3.398
Y          0.238       4.210
Z          0.666       1.500
Fig 1: First Company (model 1)

Variable   Tolerance   VIF
X          0.273       3.663
Y          0.273       3.668
Z          0.991       1.009
Fig 2: Second Company (model 1)

Variable   Tolerance   VIF
X          0.323       3.094
Y          0.330       3.034
Z          0.737       1.357
Fig 3: Third Company (model 1)

Here the variance inflation factor is less than 10 and the tolerance is not close to 0 for every variable. Therefore the problem of multicollinearity does not arise in these models.

Autocorrelation can be checked using the Durbin-Watson test. For independent errors the Durbin-Watson test statistic is approximately 2.
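The two diagnostics used in this section can be sketched in plain Python as follows. This is an illustrative sketch, not the procedure the authors ran: it assumes the standard definitions (tolerance of a predictor is $1 - R_j^2$ from regressing it on the other predictors, VIF is the reciprocal of tolerance, and the Durbin-Watson statistic is the sum of squared successive residual differences divided by the sum of squared residuals). All function names are hypothetical and the sample values are made up.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def r_squared(predictors, target):
    """R^2 of a least-squares fit of target on the predictors plus an intercept."""
    n = len(target)
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (A^T A) beta = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    beta = solve(ata, aty)
    mean_y = sum(target) / n
    ss_res = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
                 for r, y in zip(rows, target))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    return 1.0 - ss_res / ss_tot

def collinearity(predictors):
    """Return a (tolerance, VIF) pair for each predictor column."""
    stats = []
    for j, column in enumerate(predictors):
        others = predictors[:j] + predictors[j + 1:]
        tolerance = 1.0 - r_squared(others, column)
        stats.append((tolerance, 1.0 / tolerance))
    return stats

def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest uncorrelated errors."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Made-up predictor columns for illustration only (not data from the paper).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
z = [2.0, 1.0, 4.0, 3.0, 5.0]
for name, (tol, vif) in zip("XYZ", collinearity([x, y, z])):
    print(f"{name}: tolerance = {tol:.3f}, VIF = {vif:.3f}")
```

Because VIF is the reciprocal of tolerance, the two columns reported for each company carry the same information; for example, a tolerance of 0.294 corresponds to a VIF of about 3.4.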

Table 1. Durbin-Watson value of three companies
Company   Durbin-Watson value
A         0.732
B         0.694
C         0.494

Since the Durbin-Watson test statistics are well below 2, the errors are not independent, which violates the regression assumptions. Thus we move on to time series regression.

3.2. IMPLEMENTATION DETAILS
The authors used the Hadoop framework, version 1.0.2, installed on Ubuntu 16.04 LTS on a Core i5 CPU running at 4.0 GHz, with Java as the programming language. Details of the implementation of the wordcount program are beyond the scope of this paper. The parameters taken are open interest and number of contracts from the derivative market, and deliverable quantity from the cash market. Three months of stock data for three large-cap stocks were analyzed over these parameters against stock price.

X denotes open interest, Y denotes number of contracts, Z denotes deliverable quantity and P denotes stock price. If the value of X increases on a particular day compared with the previous day, this is indicated as 1, and if it decreases, as 0. The same coding is applied to Y, Z and P. Hence X, Y, Z and P each take the value "1" or "0" on each day of the three-month period. Comparing the values of X, Y, Z and P, the different possibilities are as follows.

Table 2: Combinations
X  Y  Z  P
0  0  0  0
0  0  0  1
-  -  -  -
1  1  1  1

Since there are 16 possibilities, each possible combination of the four values is given an index 0, 1, 2, 3, ..., 14, 15. If a is the value of X, b is the value of Y, c is the value of Z and d is the value of P, then the index value is

$(a \times 2^3) + (b \times 2^2) + (c \times 2^1) + (d \times 2^0)$

Our research problem states that if the values of X, Y and Z increase from the previous day, then the value of P decreases, and vice versa. That is, from the above pattern we need to find the occurrences of the index values 1 and 14.
To obtain the index value 1, the X value should be 0, the Y value should be 0, the Z value should be 0 and the P value should be 1; that is, the X-Y-Z-P value is 0-0-0-1. This means that a decrease in the values of X, Y and Z was accompanied by an increase in the value of P. Likewise, the index value 14 corresponds to the X-Y-Z-P value 1-1-1-0, an increase in X, Y and Z accompanied by a decrease in P. Analysis of the 90 days of data showed that the index values 14 and 1 occur with high frequency compared with the others.
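The encoding and counting steps described above can be sketched in Python as follows. This is a minimal sketch rather than the authors' Hadoop and Java implementation. The function names are hypothetical, the sample series are made up, and treating an unchanged value as 0 is an assumption, since the text only defines the increase and decrease cases.

```python
from collections import Counter

def direction_flags(series):
    """Return 1 where a value rose versus the previous day, else 0.

    Assumption: an unchanged value is coded as 0 (not specified in the text).
    """
    return [1 if today > prev else 0 for prev, today in zip(series, series[1:])]

def pattern_index(a, b, c, d):
    """Index of an X-Y-Z-P combination: a*2^3 + b*2^2 + c*2^1 + d*2^0."""
    return a * 2**3 + b * 2**2 + c * 2**1 + d * 2**0

def index_frequencies(x, y, z, p):
    """Count how often each of the 16 index values occurs over the period."""
    daily = zip(direction_flags(x), direction_flags(y),
                direction_flags(z), direction_flags(p))
    return Counter(pattern_index(*flags) for flags in daily)

# Made-up six-day series for illustration only (not data from the paper).
open_interest = [10, 12, 11, 13, 12, 14]       # X
contracts     = [5, 6, 5, 7, 6, 8]             # Y
deliverable   = [100, 110, 105, 120, 115, 130] # Z
price         = [50, 49, 51, 48, 50, 47]       # P

freq = index_frequencies(open_interest, contracts, deliverable, price)
for index in sorted(freq):
    print(f"index {index:2d} ({index:04b}): {freq[index]} day(s)")
```

Six daily values yield five day-over-day comparisons. In this toy data the price always moves against the three indicators, so only the index values 14 and 1 appear.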

3.3. DISCUSSIONS
The main findings of the experiment are as follows.
(1) There exists an inverse relationship between each of the factors (open interest, number of contracts and deliverable quantity) and the stock price; that is, in the correlation analysis, as one increases, the other decreases.
(2) Each of the variables open interest, number of contracts and deliverable quantity significantly affects the share price.
(3) Ordinary regression is not suitable for estimating the relationship between the factors and price, because the regression assumption that the errors must be uncorrelated is violated here. Thus we move on to time series regression.

4. CONCLUSIONS
Through the study, we have explored the effect of three parameters on stock price. This might become an interesting area of study for investors, as it may help one to understand the price movement of some well-known stocks. We described three parameters and showed that the results obtained match the price movement of the stocks and can be used for short-term and long-term predictions. The indicators successfully brought out investors' thumb rules in a quantitative way. The prediction performance can now be studied on five years of data for five large-cap stocks using big data analytics, which forms the future scope of this paper.

ACKNOWLEDGEMENT
The authors wish to express their gratitude to Mr. Manoj P. Michel, member of Cochin Stock Exchange, for sharing his profound knowledge of the equity market and for his useful discussions.

REFERENCES
[1] Cerene Mariam Abraham, M. Sudheep Elayidom, Implementation of Correlation Techniques for the Equity Market in India, International Journal of Advanced Research in Computer and Communication Engineering, Vol. 4, Special Issue 1, June 2015.
[2] C. Narendra Babu and B. Eswara Reddy, Selected Indian Stock Predictions using a Hybrid ARIMA-GARCH Model, International Conference on Advances in Electronics, Computers and Communications, IEEE, 2014.
[3] Ms. Shalini H S, Dr. Raveendra P V, A Study of Derivatives Market in India and its Current Position in Global Financial Derivatives Markets, IOSR Journal of Economics and Finance (IOSR-JEF), e-ISSN: 2321-5933, p-ISSN: 2321-5925, Volume 3, Issue 3 (Mar-Apr. 2014), pp. 25-42.
[4] M. Thenmozhi and Abhijeet Chandra, India Volatility Index (India VIX) and Risk Management in the Indian Stock Market, NSE Working Paper, WP/9/2013.
[5] Aditya Gupta, Stock Market Prediction Using Hidden Markov Models, IEEE, 2012.
[6] Abhi Dattasharma and Praveen Kumar Tripathi, Practical Inter-stock Dependency Indicators using Time Series and Derivatives, IEEE, 2008.
[7] Michael Chui, Derivatives markets, products and participants: an overview, IFC Bulletin No 35.
[8] Priti Saxena, Bhaskar Pant, R.H. Goudar, Smriti Srivastav, Varsha Garg and Shreela Pareek, Future Predictions in Indian Stock Market through Linguistic-Temporal Approach, IEEE, 2012.
[9] Shiguang Lin, The Quantitative Analytic Research of Extenics by VAR on the Risks of the Financial Derivatives Markets, IEEE, 2010.
[10] Cerene Mariam Abraham, M. Sudheep Elayidom, T. Santhanakrishnan, Design and Implementation of Efficient Regression Analysis Techniques in Derivative Market, The International Journal Of Science & Technoledge (ISSN 2321 919X), Vol. 4, Issue 11, November 2016.