A Big Data Analytical Framework For Portfolio Optimization

(Presented at the Workshop on Internet and BigData Finance (WIBF '14), in conjunction with the International Conference on Frontiers of Finance, City University of Hong Kong, Hong Kong, June 6-7, 2014)

Dhanya Jothimani, Ravi Shankar and Surendra S. Yadav
Department of Management Studies, Indian Institute of Technology Delhi
{dhanya.jothimani, ravi1, ssyadav}@dms.iitd.ac.in

Abstract. With the advent of Web 2.0, various types of data are being produced every day, which has led to the big data revolution. Huge amounts of structured and unstructured data are produced in financial markets. Processing these data could help an investor make an informed investment decision. In this paper, a framework is developed to incorporate both structured and unstructured data for portfolio optimization. Portfolio optimization consists of three processes: asset selection, asset weighting and asset management. The framework proposes to achieve the first two processes using a 5-stage methodology. The stages are shortlisting stocks using Data Envelopment Analysis (DEA), incorporating qualitative factors using text mining, stock clustering, stock ranking, and optimizing the portfolio using heuristics. This framework would help investors select appropriate assets to form a portfolio, invest in them so as to minimize risk and maximize return, and monitor their performance.

Keywords. Portfolio Optimization, Big Data, Hadoop, DEA, Stock Selection

1. Introduction

Big data has gained popularity in recent years and has become a buzzword. However, it is time to understand what big data is. Big data refers to large volumes of data generated continuously, much of it unstructured, by a heterogeneous group of applications ranging from social networks to scientific computing. The datasets range from a few hundred gigabytes to terabytes, which is beyond the capacity of existing data management tools to capture, store, manage and analyze [1]. Big data is characterized by the following dimensions: Volume, Velocity, Variety, Variability, Complexity and Low Value Density [2].
Various types of data, namely structured, semi-structured and unstructured data, have been produced since the advent of Web 2.0 technology [3]. In the era of big data, one of the major challenges for capital market firms is handling the velocity at which data is generated, particularly given the unstructured data produced in financial services such as investment banking. The New York Stock Exchange (NYSE) and NASDAQ have employed big data applications like IBM Netezza to store and process big data from various sources. Capital market firms use big data technologies for risk mitigation (fraud mitigation, on-demand enterprise management), regulation, trading analytics (high-frequency trading, pre-trade decision making), and data tagging [4]. Firms adopting big data technologies and predictive analytics have an edge over other firms in uncertain market conditions [5].

In finance, a portfolio is defined as a collection of assets. Assets range from stocks and bonds to real estate. Since the seminal work of Markowitz [6], portfolio optimization has been a topic of research. Portfolio optimization is the investment decision-making process of holding a set of financial assets to meet various criteria of the investor. In general terms, the criteria are maximizing return and minimizing risk. In this paper, the scope of the work is limited to stock analysis. Portfolio optimization consists of three major steps: asset selection, asset weighting, and asset management. In this paper, a framework is proposed to integrate unstructured and structured data to make an informed investment decision.

The paper is organized as follows. Section 2 discusses the proposed framework for portfolio optimization, followed by the expected outcomes in Section 3. Conclusions are presented in Section 4.

2. Proposed Framework

The proposed framework for portfolio optimization can be explained as a 5-step process: (a) Data Envelopment Analysis (DEA), (b) validation of selected stocks, (c) stock clustering, (d) stock ranking, and (e) optimization. All firms listed at a particular stock exchange are taken as the initial input to the framework, and the output is a set of stocks that would maximize return and minimize risk. The abstract framework for portfolio optimization is shown in Figure 1. DEA is used to narrow the sample space of firms by identifying the efficient firms. To validate these firms as potential candidates for portfolio optimization, the latest information about each company is retrieved from online news articles and tweets and processed using text mining to determine the sentiment about the company in the current context. The validated efficient firms are clustered into different groups to aid diversification of the portfolio. This is followed by ranking of the stocks within each cluster and then by asset weighting using optimization algorithms. Each process is explained in Sections 2.1 to 2.5.

Figure 1: Framework for Portfolio Optimization

2.1 Data Envelopment Analysis

Data Envelopment Analysis (DEA) is a non-parametric linear programming technique that calculates the efficiency score of a Decision-Making Unit (DMU) based on a given set of inputs and outputs.
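The paper does not state which DEA model is intended. As an illustration only, the widely used input-oriented CCR model in multiplier form is written out below. The symbols are assumptions introduced here: there are n DMUs, DMU j consumes inputs x_{ij} (i = 1, ..., m) and produces outputs y_{rj} (r = 1, ..., s), and u_r and v_i are the output and input weights chosen for the DMU under evaluation, indexed o.

```latex
\begin{aligned}
\max_{u,\,v}\quad & \theta_o = \sum_{r=1}^{s} u_r\, y_{ro} \\
\text{s.t.}\quad & \sum_{i=1}^{m} v_i\, x_{io} = 1, \\
& \sum_{r=1}^{s} u_r\, y_{rj} - \sum_{i=1}^{m} v_i\, x_{ij} \le 0, \qquad j = 1, \dots, n, \\
& u_r \ge 0,\; v_i \ge 0 .
\end{aligned}
```

Under this formulation one linear program is solved per DMU, and the optimal value \theta_o cannot exceed 1 because DMU o itself appears among the n ratio constraints.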

The DMUs with a score of 1 are considered efficient [7]. Apart from its applicability in manufacturing, DEA can be used for stock selection. In this case, the stocks form the DMUs. Based on previous studies [8, 9], four input parameters (total assets, total equity, cost of sales and operating expenses) and two output parameters (net sales and net income) can be considered. These data can be obtained from standard databases like Bloomberg. The stocks with an efficiency score of 1 are carried forward to the next stage.

2.2 Hadoop Framework for Sentiment Analysis

This stage involves processing frequently generated unstructured data using Hadoop MapReduce. It complements the previous stage. Events like elections, changes of management and dividend announcements affect market sentiment, which is not captured by the quantitative analysis. As a first step, online news articles and tweets about the efficient firms are retrieved. Tweets can be obtained through the Twitter API, but retrieval is limited to 1500 tweets. Its ease of use, scalability and failover properties make Hadoop MapReduce a popular choice for processing big data efficiently [10]. Tweets and news articles are processed using text mining to obtain the positive and negative sentiments about the firm. The Hadoop MapReduce infrastructure speeds up the distributed text mining process. Figure 2 shows the MapReduce framework for distributed text mining. As shown in the figure, the company tweets and news articles are distributed among different Map processors to produce intermediate data. This intermediate data is processed by the Reduce processors to give the aggregated result. The firms with positive sentiment are chosen for the next stage.

Figure 2: Hadoop framework for Distributed Text Mining

2.3 Stock Clustering

In this stage, the correlation coefficients of the returns of the stocks are calculated.
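As a minimal sketch of this correlation step, the Python below converts price series into simple returns and computes pairwise Pearson coefficients. The tickers, prices and function names are made up for illustration and do not come from the paper; the use of simple rather than log returns is likewise an assumption.

```python
from math import sqrt


def simple_returns(prices):
    """Convert a price series into period-over-period simple returns."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(price_table):
    """Map every ordered pair of tickers to the correlation of their returns."""
    returns = {ticker: simple_returns(p) for ticker, p in price_table.items()}
    tickers = sorted(returns)
    return {(a, b): pearson(returns[a], returns[b])
            for a in tickers for b in tickers}


# Hypothetical closing prices, for illustration only.
prices = {
    "AAA": [100.0, 102.0, 101.0, 105.0, 107.0],
    "BBB": [50.0, 51.0, 50.5, 52.5, 53.5],
    "CCC": [20.0, 19.5, 20.5, 19.0, 19.2],
}
corr = correlation_matrix(prices)
```

In this toy data, AAA and BBB move in lockstep while CCC tends to move against them, so a correlation-based grouping would be expected to separate CCC from the other two.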
The stocks are assigned to different clusters based on these correlation coefficients. The greater the number of clusters, the greater the diversification. The objective in choosing the number and quality of clusters is to maximize similarity within each cluster and to minimize similarity between clusters. Various clustering algorithms like k-means clustering or Louvain clustering [11] can be used. This process reduces portfolio risk through diversification of stocks [12]. In practice, the resulting clusters may consist of firms with similar business activity or size.

2.4 Stock Ranking

The appropriate stocks from each cluster should be selected. The stocks in each cluster can be ranked using an Artificial Neural Network (ANN) [13]. Up to the previous stage, only the internal factors of the firms were considered. At this stage, external factors like Gross Domestic Product (GDP) growth rate and interest rate can be considered [12]. An ANN is a model for information processing that consists of three layers: input, hidden and output. The inputs to the ANN can be GDP growth rate and interest rate, and the output can be the future return on investment. This results in a ranking of the stocks within a cluster.

2.5 Optimization

The previous stage suggests that the investor might choose the top stocks from each cluster. But the question that remains is: how much to invest in each stock? A previous study [12] used simple (equal) stock weighting, a primitive method, to allocate resources among the stocks. Hence the weights of the ranked stocks should be optimized to maximize return and minimize risk. Markowitz's mean-variance model can be used at this stage [6]. Various optimization heuristics like Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO) and Genetic Algorithm (GA) can be used. The distribution of the stocks in a portfolio is determined at the end of this stage. The top three performing portfolios will be suggested to the investor.

3. Expected Outcomes

Apart from letting investors make an informed investment decision, this framework would be useful to naïve investors as well. The expected outcome of the framework is a portfolio that conforms to the investor's criteria: minimize risk and maximize return. The first stage, DEA, yields a group of potential stocks for portfolio optimization. The current information related to the resulting companies is analyzed using text mining to obtain the sentiment about each company. Sentiment analysis captures the qualitative aspect of the firms. The stocks resulting from this stage are grouped into different categories using the correlation coefficients of their returns. This leads to diversification of the stocks. The stocks are ranked to select appropriate stocks from each group. At the end of the entire process, the top three portfolio suggestions would be provided to investors so that they can select one of them.

4. Conclusion

The proposed framework tries to integrate both structured data from databases (stock prices, balance sheet data, etc.) and unstructured data from online news articles and tweets. Considering qualitative factors (management of firms, etc.) along with quantitative factors (financial ratios) provides better alternatives for portfolio formation. The assets are well diversified using k-means clustering and ANN. The top three portfolio suggestions obtained using optimization heuristics give investors the flexibility to choose the assets that suit their risk profile. The proposed model can be applied to any stock market data. The utility of the framework is limited to stock analysis and investments. The parameters suggested for DEA are based on previous studies. Instead of variance as the risk measure, other measures such as Value-at-Risk and downside risk measures can be used in the final stage of the framework.

References

[1] X. Qin, H. Wang, F. Li, B. Zhou, Y. Cao, C. Li, H. Chen, X. Zhou, X. Du, and S. Wang, "Beyond simple integration of RDBMS and MapReduce - Paving the way toward a unified system for big data analytics: Vision and progress," in Proceedings of the 2012 Second International Conference on Cloud and Green Computing, CGC '12, (Washington, DC, USA), pp. 716-725, IEEE Computer Society, 2012.
[2] N. S. Sachchidanand Singh, "Big data analytics," in International Conference on Communication, Information & Computing Technology (ICCICT), pp. 1-4, 2012.
[3] M. Minelli, M. Chambers, and A. Dhiraj, Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses (Wiley CIO series). Wiley Publishing, Hoboken, New Jersey, 1st ed., 2013.
[4] R. Verma and S. R. Mani, "Use of big data technologies in capital markets." Available online at: http://www.infosys.com/industries/financialservices/whitepapers/documents/big-dataanalytics.pdf, 2012. Accessed on: April 2, 2014.
[5] M. Peat, "Big data in finance," InFinance: The Magazine for Finsia Members, vol. 127, no. 1, pp. 34-36, 2013.
[6] H. Markowitz, "Portfolio selection," The Journal of Finance, vol. 7, no. 1, pp. 77-91, 1952.
[7] W. Cooper, L. Seiford, and J. Zhu, "Data envelopment analysis: History, models and interpretations," in Handbook on Data Envelopment Analysis (W. W. Cooper, L. M. Seiford, and J. Zhu, eds.), vol. 71 of International Series in Operations Research & Management Science, pp. 1-39, Springer US, 2004.
[8] H.-H. Chen, "Stock selection using data envelopment analysis," Industrial Management and Data Systems, vol. 108, no. 9, pp. 1255-1268, 2008.
[9] Y. S. Chen and B. S. Chen, "Applying DEA, MPI and Grey model to explore the operation performance of the Taiwanese wafer fabrication industry," Technological Forecasting and Social Change, vol. 78, no. 3, pp. 536-546, 2011.
[10] J. Dittrich and J.-A. Quiané-Ruiz, "Efficient big data processing in Hadoop MapReduce," Proc. VLDB Endow., vol. 5, pp. 2014-2015, Aug. 2012.
[11] V. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of communities in large networks," Journal of Statistical Mechanics: Theory and Experiment, p. 10008, 2008.
[12] N. Koochakzadeh, A heuristic stock portfolio optimization approach based on data mining techniques. PhD thesis, Department of Computer Science, University of Calgary, March 2013.
[13] T. Ware, "Adaptive statistical evaluation tools for equity ranking models." Submitted to: Canadian Industrial Problem Solving Workshops (Calgary, Canada, May 15-19, 2005), 2005.
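As a hedged illustration of the asset-weighting stage described in Section 2.5, the sketch below scores long-only weight vectors with a mean-variance objective (expected return minus a penalty on variance) and improves on equal weighting by plain random search. Random search is used here only as a simple stand-in for the PSO, ACO or GA heuristics the paper names; the expected returns, covariance matrix, risk-aversion value and function names are all hypothetical.

```python
import random


def portfolio_return(weights, mean_returns):
    """Expected portfolio return: weighted sum of asset returns."""
    return sum(w * m for w, m in zip(weights, mean_returns))


def portfolio_variance(weights, cov):
    """Portfolio variance: the quadratic form w' * cov * w."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j]
               for i in range(n) for j in range(n))


def utility(weights, mean_returns, cov, risk_aversion):
    """Mean-variance objective: expected return minus penalized variance."""
    return (portfolio_return(weights, mean_returns)
            - risk_aversion * portfolio_variance(weights, cov))


def random_search(mean_returns, cov, risk_aversion=3.0, trials=5000, seed=0):
    """Search long-only weights that sum to 1, starting from equal weights."""
    rng = random.Random(seed)
    n = len(mean_returns)
    best = [1.0 / n] * n
    best_score = utility(best, mean_returns, cov, risk_aversion)
    for _ in range(trials):
        raw = [rng.expovariate(1.0) for _ in range(n)]
        total = sum(raw)
        candidate = [r / total for r in raw]  # non-negative, sums to 1
        score = utility(candidate, mean_returns, cov, risk_aversion)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Hypothetical inputs for three top-ranked stocks, one per cluster.
mean_returns = [0.10, 0.07, 0.04]
cov = [
    [0.040, 0.006, 0.002],
    [0.006, 0.020, 0.001],
    [0.002, 0.001, 0.010],
]
weights, score = random_search(mean_returns, cov)
```

Because the search starts from the equal-weight portfolio and only accepts improvements, the returned objective value is never worse than that of the equal weighting used in [12]. Replacing variance with another risk measure, as the conclusion suggests, would only require changing `utility`.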