Topic-based vector space modeling of Twitter data with application in predictive analytics

Topic-based vector space modeling of Twitter data with application in predictive analytics
Guangnan Zhu (U6023358)
Australian National University
COMP4560 Individual Project Presentation
Supervisor: Dr. Timothy Graham

3 Stock prediction is Magic!!!

4 Module Outline
- Motivation and Background
- Goal
- Methods
- Experiments
- Future Work

5 Motivation
- People want to make money from the stock market, so stock price prediction is attractive.
- Predicting stock prices is challenging (some researchers believe that stock prices follow a random walk).
- Research on the relationship between social media and stock price movement.

6 Background
Bag of words, Word2vec
Problems:
- Language is not merely a bag of words but a tool with particular properties which have been fashioned in the course of its use. Language is "due to meaning" in one sense [1].
- The size is too big (document-term matrix).
[1] Zellig S. Harris (1954) Distributional Structure, WORD, 10:2-3, 146-162, DOI: 10.1080/00437956.1954.11659520

7 Background
Topics <- Topic-based modeling
- Topic-modeling-based approaches can work for stock market prediction [1].
- LDA can be used as an effective dimension reduction method for text modeling and to extract topics from the text [2].
[1] Topic Modeling based Sentiment Analysis on Social Media for Stock Market Prediction (Thien Hai Nguyen, 2015)
[2] An empirical study of text classification using Latent Dirichlet Allocation, Lei Li; Yimeng Zhang

8 Goal
Hypothesis: the 'bag of words' model performs worse than a topic-based model.
Topic-based modeling:
- How can LDA topic modelling be used as a feature extraction technique for supervised machine learning using social media data?
- How accurately can a topic-based vector space model predict Google's stock market prices?
- What is the relationship between topics and stock price movements?
Construct a good prediction model: improve the ML model.

9 Dataset
Social media dataset:
- 3 years (2014-2017) of tweet text about Google, collected using the Twitter API
Historical price dataset:
- End of day: 3 years (2014-2017) of data for Google from NASDAQ
- Intra-day: 7 months (01/01/2016-01/07/2016) of data for Google from NASDAQ
Price(Pi+1, Pi) -> Labels -> (Up, Fair, Down), representing price movement
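The mapping from consecutive prices to labels could be sketched as follows. This is only an illustration: the slides do not say how the boundary between Up, Fair and Down was chosen, so the relative-change rule and the `threshold` value are assumptions, and the function names are hypothetical.

```python
def label_movement(p_now, p_next, threshold=0.005):
    """Label the move from price P_i (p_now) to P_{i+1} (p_next).

    threshold is a hypothetical cut-off on the relative price change;
    the slides do not state the value that was actually used.
    """
    change = (p_next - p_now) / p_now
    if change > threshold:
        return "Up"
    if change < -threshold:
        return "Down"
    return "Fair"


def label_series(prices, threshold=0.005):
    """Label every consecutive pair (P_i, P_{i+1}) in a price series."""
    return [label_movement(a, b, threshold) for a, b in zip(prices, prices[1:])]


# Made-up prices, for illustration only.
labels = label_series([100.0, 102.0, 102.1, 100.0])
print(labels)
```

A series of n prices yields n-1 labels, one per consecutive pair.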

10 Methods: Combine Tweets
Combine in days (end-of-day closing price): the stock market opens and closes once a day
- Pros: the combined text is long, so topics are easy to extract
- Cons: the number of instances is small (22 open days per month)
Combine in hours (intra-day stock prices): combine the tweets of every hour
- Pros: the number of instances is large (22*24 = 528 hours per month)
- Cons: the combined text is short, so topics are hard to extract
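A minimal sketch of the two ways of combining tweets, using only the Python standard library. The input format (pairs of timestamp and text), the function name `combine_tweets` and the sample tweets are assumptions made for illustration; the slides do not describe the actual preprocessing code.

```python
from collections import defaultdict
from datetime import datetime


def combine_tweets(tweets, by="day"):
    """Concatenate tweet texts into one document per day or per hour.

    tweets: iterable of (datetime, text) pairs (an assumed input format).
    Returns a dict mapping a period key to the combined text.
    """
    if by not in ("day", "hour"):
        raise ValueError("by must be 'day' or 'hour'")
    fmt = "%Y-%m-%d" if by == "day" else "%Y-%m-%d %H:00"
    buckets = defaultdict(list)
    for timestamp, text in sorted(tweets):  # chronological order
        buckets[timestamp.strftime(fmt)].append(text)
    return {period: " ".join(texts) for period, texts in buckets.items()}


# Made-up tweets, for illustration only.
sample = [
    (datetime(2016, 1, 4, 10, 5), "google earnings soon"),
    (datetime(2016, 1, 4, 10, 50), "new pixel rumours"),
    (datetime(2016, 1, 4, 11, 1), "android update"),
    (datetime(2016, 1, 5, 9, 0), "search ads revenue"),
]
daily = combine_tweets(sample, by="day")
hourly = combine_tweets(sample, by="hour")
print(len(daily), len(hourly))
```

On this toy input the daily grouping gives fewer but longer documents than the hourly one, which is the trade-off listed above.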

11 Method: Topic Model
Unsupervised topic modeling: Latent Dirichlet Allocation (LDA)
LDA represents documents as mixtures of topics that spit out words with certain probabilities.
α: Dirichlet prior on the per-document topic distributions; β: per-topic word distributions
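For reference, the generative process usually assumed by LDA can be written out as below. This follows the standard textbook formulation rather than anything shown on the slide; the symbols θ_d (topic mixture of document d), z_{d,n} (topic of the n-th word) and w_{d,n} (the n-th word itself) are introduced here for illustration, with α acting as the Dirichlet parameter that governs the per-document topic distributions and β holding the per-topic word distributions.

```latex
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha) \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d) \\
w_{d,n} \mid z_{d,n}, \beta &\sim \mathrm{Multinomial}(\beta_{z_{d,n}})
\end{align*}
```

Fitting the model means inferring θ_d and β from the observed words only, which is why no labels are needed.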

12 Method: Topic Model
Gamma function: construct a document-topic matrix
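The slide does not define "gamma". Assuming it refers to the per-document Dirichlet parameters estimated by variational inference for LDA (an assumption, not something the slide confirms), a document-topic matrix can be obtained by normalising each document's row so that it sums to one. The function name and the numbers below are hypothetical.

```python
def document_topic_matrix(gamma):
    """Normalise each row of gamma into topic proportions.

    gamma: list of rows, one per document, holding one positive weight
    per topic (assumed to be variational Dirichlet parameters).
    """
    matrix = []
    for row in gamma:
        total = sum(row)
        if total <= 0:
            raise ValueError("each document needs positive total weight")
        matrix.append([weight / total for weight in row])
    return matrix


# Made-up gamma values for 2 documents and 3 topics.
gamma_example = [[2.0, 1.0, 1.0], [0.5, 0.5, 4.0]]
theta = document_topic_matrix(gamma_example)
print(theta)
```

Each row of the result is one document expressed as a K-dimensional vector of topic proportions, which is the feature vector passed on to a classifier.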

13 Method: Topic Model as a Black Box
A topic model is unsupervised and is a black box. We don't know exactly what the topics are, but we can look at which words belong to each topic.
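Looking at the words that belong to a topic usually means listing the highest-weighted words of each topic. A minimal sketch, assuming the fitted model exposes a topic-word weight matrix (one row per topic, one column per vocabulary word); the vocabulary and weights below are invented for illustration and are not results from the project.

```python
def top_words(topic_word, vocabulary, n=3):
    """Return the n highest-weighted words for every topic."""
    result = []
    for weights in topic_word:
        ranked = sorted(range(len(vocabulary)),
                        key=lambda i: weights[i], reverse=True)
        result.append([vocabulary[i] for i in ranked[:n]])
    return result


# Made-up vocabulary and topic-word weights (2 topics, 5 words).
vocabulary = ["search", "android", "earnings", "pixel", "revenue"]
topic_word = [
    [0.05, 0.40, 0.05, 0.45, 0.05],
    [0.10, 0.05, 0.45, 0.05, 0.35],
]
summary = top_words(topic_word, vocabulary, n=2)
print(summary)
```

Reading such lists is how a human assigns an informal meaning to each topic, since the model itself provides none.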

14 Method: Machine Learning
Prediction using ML
ML methods:
- Support Vector Machine (SVM)
- XGBoost

15 Result: Combine Methods
An accuracy of 56% in prediction is regarded as a satisfactory result (Schumaker and Chen, 2009b; Si et al., 2013; Tsibouris and Zeidenberg, 1995)

16 Result: No. of Topics (Value of K) (End-of-day data)
[Figure: accuracy (axis from 0.35 to 0.65) plotted against the number of topics k (axis from 20 to 120)]

17 Result: ML Methods & Predicted Result (End-of-day data)
An accuracy of 56% in prediction is regarded as a satisfactory result (Schumaker and Chen, 2009b; Si et al., 2013; Tsibouris and Zeidenberg, 1995)

18 Result: ML Methods & Predicted Result (End-of-day data)
[Figure: evaluation measures (SVM_Acc, SVM_F_up_Measure, XGB_Acc, XGB_F_up_Measure) plotted against k_value (axis from 0 to 125); vertical axis value_d from 0.2 to 0.6]
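The two kinds of measure named in the figure legend can be computed as sketched below. It is assumed here that "Acc" is plain classification accuracy and that "F_up_Measure" is the F1 score of the Up class; the slides do not spell out either definition, and the labels in the example are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive="Up"):
    """F1 score for one class (assumed meaning of 'F_up_Measure')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels, for illustration only.
y_true = ["Up", "Down", "Fair", "Up", "Down", "Up"]
y_pred = ["Up", "Up", "Fair", "Down", "Down", "Up"]
acc = accuracy(y_true, y_pred)
f_up = f_measure(y_true, y_pred)
print(round(acc, 3), round(f_up, 3))
```

Reporting a per-class F-measure next to accuracy helps when the three classes are unbalanced, because accuracy alone can look good for a model that rarely predicts Up.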

19 Conclusion
- Topics discussed on Twitter can predict stock price movements.
- LDA topic models can be used as input features for supervised machine learning and achieve close to state-of-the-art accuracy.
- SVM tends to perform better than more advanced algorithms such as XGBoost.
- The topic-based vector space model performs better than a BoW model.

20 Future Work
- LDA focuses only on the topics of texts. More factors, such as opinions and mood, need to be considered.
- LDA requires the parameter K to be specified first. A non-parametric technique is needed.
- The result is not stable and is sometimes very poor, so it is not suitable for real-world prediction. A more stable technique is needed.
- Prediction of stock prices needs more factors. Different models should be fused together.

21 Thank you. Any questions?