Applications of Twitter Emotion Detection for Stock Market Prediction. Clare H. Liu. S.B., Massachusetts Institute of Technology (2016)

Applications of Twitter Emotion Detection for Stock Market Prediction
by
Clare H. Liu
S.B., Massachusetts Institute of Technology (2016)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Computer Science and Engineering at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, June 2017

© Massachusetts Institute of Technology. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, May 18, 2017
Certified by: Andrew W. Lo, Charles E. and Susan T. Harris Professor, Thesis Supervisor
Accepted by: Christopher J. Terman, Chairman, Masters of Engineering Thesis Committee

Applications of Twitter Emotion Detection for Stock Market Prediction
by
Clare H. Liu

Submitted to the Department of Electrical Engineering and Computer Science on May 18, 2017, in partial fulfillment of the requirements for the degree of Master of Engineering in Computer Science and Engineering

Abstract

Currently, most applications of sentiment analysis focus on detecting sentiment polarity: whether a piece of text can be classified as positive or negative. However, in some applications it is important to distinguish between specific emotions rather than just polarity. In this thesis, we use a supervised learning approach to develop an emotion classifier for the six Ekman emotions: joy, fear, sadness, disgust, surprise, and anger. We then apply our emotion classifier to tweets from the 2016 presidential election and to financial tweets labeled with Twitter cashtags, and evaluate the effectiveness of finer-grained emotion categorization for predicting future stock market performance.

Thesis Supervisor: Andrew W. Lo
Title: Charles E. and Susan T. Harris Professor

Acknowledgments

First of all, I would like to express my gratitude to my thesis supervisor, Professor Andrew Lo, for giving me the opportunity to explore a new field, and for his insightful ideas and feedback. I would also like to thank Allie, Jayna, and Crystal for providing me with important resources and for their scheduling help. I especially want to thank Shomesh Chaudhuri for giving me crash courses on finance and providing invaluable suggestions and guidance over the past two years. Finally, I wish to thank my parents for their unconditional support and encouragement.

Contents

1 Introduction
  1.1 Thesis Organization
2 Literature Review
  2.1 Emotion Classification
  2.2 Relationship Between Twitter Sentiment and Stock Market Performance
  2.3 Predicting Presidential Elections
3 Creating an Emotion Classifier
  3.1 Multiclass Classification Algorithms
    3.1.1 One-vs-rest
    3.1.2 One-vs-one
    3.1.3 Logistic Regression
    3.1.4 Random Forests
  3.2 Datasets
  3.3 Baselines
  3.4 Methodology
    3.4.1 Feature Selection
    3.4.2 Data Preparation
    3.4.3 Implementation Details
  3.5 Evaluation Metrics
  3.6 Results
  3.7 Discussion
4 Emotion Analysis of Presidential Election Tweets
  Datasets
  Data Preparation
  Emotion Distributions on Election Day
  Election Day Key Events
  Comparison with Polarity-Based Sentiment Analysis
  Using Volume to Identify Events
  Can Presidential Debates Predict Market Returns?
  Summary of Candidate Policies
  S&P 500 Returns after Election Day
  Who won the Presidential Debates?
  S&P 500 Reactions to Presidential Debates
  Discussion
5 Emotion Analysis of Financial Tweets
  Datasets
  Correlation Between Emotions and Stock Prices
  Using Volume to Identify Events
  Sentiment-Based Trading Strategy
  Preliminary Results
  Reevaluation of Emotion Classifier Performance
  Keyword-Based Trading Strategy
  Evaluation of Trading Strategy Performance
  Discussion
6 Conclusions and Future Work

List of Figures

4-1 Average Sentiment during the 2016 Presidential Election
Election Day Emotion Distributions
First Presidential Debate Emotion Distributions during the First Presidential Debate
Twitter Volume Plots for Microsoft and Facebook
Preliminary Trading Strategy Performance for Microsoft, Facebook, and Yahoo
Microsoft sentiment using keywords during earnings announcement on April
Keyword-based Trading Strategy

List of Tables

3.1 Examples of Labeled Tweets
3.2 Tweet Processing Example
3.3 Model Comparison
3.4 Logistic Regression Accuracy Metrics
3.5 Classification Examples
3.6 Examples of Classification Errors
S&P 500 Sectors before and after Election Day
Clinton: Change in joy tweets before and after debates
Trump: Change in joy tweets before and after debates
Morning Consult Poll Results
S&P 500 Industries Before and After First Presidential Debate
S&P 500 Industries Before and After Second Presidential Debate
S&P 500 Industries before and after Third Presidential Debate
Correlation between average emotion percentages and next-day stock returns
Correlation between average emotion percentages and same-day stock returns
Noise in $AAPL Tweets
Microsoft Earnings Announcement Classification Examples
Yahoo Earnings Announcement Classification Errors
Trading Strategy Comparison
Trading Strategy Statistics

Chapter 1: Introduction

Over the past decade, the rise of social media has enabled millions of people to share their opinions and react to current events in real time. As of June 2016, Twitter had over 300 million monthly active users, and over 500 million tweets were posted per day [53]. Ever since the official Twitter API was introduced in 2006, users and researchers have applied sentiment analysis algorithms to this massive data source to gauge public opinion about emerging events. Automatic sentiment analysis algorithms have been used in a variety of applications, including evaluating customer satisfaction, detecting fraud, and predicting future events such as the results of a presidential election.

Currently, most publicly available sentiment analysis libraries focus on detecting sentiment polarity, which is whether a piece of text expresses a positive, negative, or neutral sentiment. However, because the range of human emotions is so wide, this coarse-grained approach has limitations in some applications. For instance, the producers of a horror movie may wish to use sentiment analysis to understand their audience's opinion of the movie. Boredom and fear could both be classified as negative emotions. However, the producers would be happy if their viewers expressed fear, while they would probably change their approach for future movies if the viewers were bored.

In this thesis, we evaluate the merits and limitations of a finer-grained emotion classification scheme compared to the more common sentiment polarity approach. We also evaluate the possibility of predicting future stock returns from the emotion distributions of tweets in two contrasting domains: presidential elections and financial tweets mentioning NASDAQ-100 companies. The election of a new president has wide implications for the future of the United States and international economies, which often results in stock market volatility. Company stock prices have also been shown to be affected by market sentiment, especially following important events such as earnings announcements and acquisitions. Presidential elections and stock market volatility often evoke strong emotions. A finer-grained emotion analysis could therefore reveal more about the public's perception of candidates and publicly traded companies, potentially leading to more accurate and profitable stock market predictions.

1.1 Thesis Organization

The remainder of this thesis is organized as follows. Chapter 2 reviews past work on automatic emotion detection and on using Twitter to predict future stock market performance and the results of presidential elections. Chapter 3 details the construction of an emotion classifier for the six basic Ekman emotions and evaluates its performance. In Chapter 4, we analyze tweets from the 2016 presidential election to determine whether emotion classification can identify differences in public opinion towards the two presidential candidates. We then investigate the correlation between the policies of presidential debate winners and the market performance of related industries on the following day. In Chapter 5, we evaluate the correlation between the emotion distributions of tweets tagged with the cashtags of NASDAQ-100 companies and the future stock returns of these companies. We then look at trends in Twitter volume and sentiment for different tickers to identify significant events and predict their effect on future returns. Finally, we propose a simple trading strategy based on the sentiment expressed in earnings announcement tweets. Chapter 6 summarizes our major findings and suggests possible avenues for future research.

Chapter 2: Literature Review

This chapter discusses approaches to automatic emotion classification and related work on using social media for stock market prediction.

2.1 Emotion Classification

In 1992, psychologist Paul Ekman argued that there are six basic emotions: anger, fear, sadness, joy, disgust, and surprise. These emotions share nine characteristics that have a biological basis, including distinctive universal signals, presence in other primates, and quick onset. He also argued that all other emotional states can either be grouped under one of these basic emotions or be classified as moods, emotional traits, or emotional attitudes [11]. Much of the recent research on finer-grained emotion detection has focused on these six basic Ekman emotions.

In 2007, SemEval (Semantic Evaluation), an ongoing series of evaluations of computational semantic analysis systems, presented a task whose objective was to "annotate text for emotions (e.g. joy, fear, surprise) and/or for polarity orientation (positive/negative)" [51]. Participants were provided with a development corpus of 250 news headlines annotated with the six Ekman emotions and a test corpus of 1000 news headlines. Many later studies on emotion detection used this corpus to develop classifiers and to build larger emotion-annotated corpora.

Roberts et al. developed EmpaTweet, a corpus of tweets annotated with the six Ekman emotions plus "love", using a semi-automated process [42]. They first used a supervised learning approach to automatically annotate unlabeled tweets with one or more emotion categories. They then asked human annotators to verify the predominant emotion of ambiguous tweets. Mohammad et al. also created the Twitter Emotion Corpus (TEC) by collecting tweets containing hashtags of the six Ekman emotions, such as #joy and #sadness [30].

The two major approaches to automatic emotion classification are supervised learning methods and affect lexicon-based approaches [31]. Supervised learning approaches analyze labeled training examples to learn a prediction function that can be applied to unseen data. Many supervised learning algorithms use n-gram features to learn which words or phrases in the training data are associated with each emotion. An affect lexicon is a list of words together with the emotions or sentiment they are associated with. For example, the word "abandoned" is associated with fear and sadness, while "amuse" is associated with joy. Lexicon-based approaches usually look up the emotions associated with each word in a piece of text, if any, and label the text with the predominant emotion. One example of an affect lexicon is Mohammad and Turney's NRC Word-Emotion Association Lexicon (EmoLex), which they built by crowdsourcing annotations on Amazon Mechanical Turk [29] [32]. Lexicon-based approaches usually perform worse than supervised learning approaches because they do not consider context or sentence structure, which can greatly affect the meaning of a piece of text. However, lexicon-based approaches are much faster and more memory-efficient than supervised learning methods, which usually use tens of thousands of features to generate models.
Supervised learning approaches also may not generalize well to other domains that share few n-gram features with the training set. Mohammad then investigated whether combining affect lexicon and n-gram features in a supervised learning algorithm could improve the accuracy of a classifier [31]. The combination of both types of features outperformed n-grams alone and affect lexicon features alone. This held both for test sets from the same domain (newspaper headlines) and for a different domain (blog posts). We therefore replicate Mohammad's approach of using both n-gram and affect lexicon features in our classifier. Next, we discuss previous studies on how effectively polarity-based sentiment analysis and emotion classification predict future stock market movements.

2.2 Relationship Between Twitter Sentiment and Stock Market Performance

Several groups have studied the correlation between sentiment polarity and the performance of various stock market indicators. Many of them found that sentiment polarity was not useful for predicting future stock returns, but that other factors, such as tweet volume, were. Ranco et al. measured the correlation between the Twitter volume and sentiment of Dow Jones constituents and the Dow Jones Industrial Average (DJIA). They found that sentiment polarity was not correlated with future stock returns. Tweet volume, however, was predictive of abnormal returns for about one third of the 30 Dow Jones companies [40]. Hentschel et al. then studied the properties of Twitter cashtags for NASDAQ and NYSE stocks. They also found that tweet volume and market performance are sometimes, but not always, related [19]. The correlation between tweet volume and future returns suggests that increases in tweet volume can indicate important events that affect the market.

Azar and Lo focused specifically on tweets mentioning the Federal Open Market Committee (FOMC) from … They calculated the sentiment polarity of these tweets, weighting the polarity values by each Twitter user's number of followers. The effect of sentiment polarity on returns was negligible except on the eight days on which the FOMC met. On those days, increases in sentiment polarity were positively correlated with returns [3]. Furthermore, they were able to develop a sentiment-based trading strategy that significantly outperformed benchmarks, even when using only eight days of data. Sentiment polarity therefore seems to have the most predictive value when applied to significant market events.

Other studies focused on identifying emotions or moods expressed in tweets and other forms of social media. Bollen et al. measured the mood of tweets along six dimensions (Calm, Alert, Sure, Vital, Kind, and Happy) in addition to their polarity (positive/negative). Like Ranco et al., they found that polarity alone was not correlated with future stock returns. The Calm dimension, however, could be used to predict movements in the DJIA [6]. Mittal and Goel also found that calmness and happiness had a strong positive correlation with the DJIA. They were also able to accurately predict future DJIA closing prices using a neural network. They then developed an improved portfolio management strategy that makes buy and sell decisions based on whether predicted future stock prices are above or below their mean values [28]. Gilbert and Karahalios used a supervised learning approach to create the "Anxiety Index", a metric of the anxiety, fear, and worry expressed in blog posts published on LiveJournal. Increases in anxiety, worry, and fear across all of LiveJournal predicted downward pressure on the S&P 500 index, even when blogs not related to finance were included [16]. Zhang et al. used a simpler approach, categorizing tweets into the six Ekman emotions by counting the words associated with each emotion. Interestingly, outbursts of both positive and negative emotions on Twitter were negatively correlated with the Dow Jones, S&P 500, and NASDAQ indices [56]. These results support our hypothesis that categorizing tweets into finer-grained emotions can be more useful for stock market prediction than classifying tweets as just positive or negative.

2.3 Predicting Presidential Elections

Twitter sentiment analysis has also been used to predict the results of presidential elections. Jahanbakhsh and Moon applied a variety of analysis techniques, including frequency distributions, sentiment analysis, and topic modeling, to identify the topics discussed in tweets during the 2012 presidential election [22]. Using only Twitter data, they were able to determine that Obama was leading during the election, which demonstrates the potential predictive power of Twitter for elections. Shi et al. investigated public opinion on Twitter during the 2012 Republican primary election. They tested the correlation between official poll results from the RealClearPolitics website and various Twitter factors. These factors included the Twitter volume for each candidate, the geolocation of Twitter users, and whether a Twitter account is promotional. Their algorithm accurately predicted public opinion trends for two of the four candidates, Mitt Romney and Newt Gingrich. Again, combining Twitter sentiment with volume gave results very similar to using volume alone [48].

Presidential election results have also been shown to be tied to future stock returns. Prechter et al. found that social mood, as reflected by the stock market, was more predictive of the success of an incumbent president's reelection bid than traditional macroeconomic factors such as Gross Domestic Product, the inflation rate, and the unemployment rate [39]. Oehler et al. analyzed stock market returns following presidential elections from 1976 to 2008. The election of almost every recent president caused abnormal returns in many sectors and industries, but these returns eventually stabilized. These effects were also more closely tied to the specific policies of individual presidents than to the general ideology of the president's political party. The authors hypothesized that the effect is caused by initial uncertainty about the president-elect's new policies [35]. Together, these results suggest that a combination of Twitter volume and sentiment can gauge public opinion towards presidential candidates. That opinion can in turn be used to predict stock market returns following elections.

Chapter 3: Creating an Emotion Classifier

Many corpora and libraries are publicly available for polarity-based sentiment analysis. However, finer-grained emotion categorization has not been studied as much. In this chapter, we therefore develop our own emotion classifier to label unseen tweets with one of the six Ekman emotions. We first summarize several approaches to multiclass classification, then describe the implementation of our emotion classifier and evaluate its performance.

3.1 Multiclass Classification Algorithms

Many machine learning classification algorithms are designed to classify input examples into two groups, such as positive and negative. These binary classification algorithms generally work by generating features for each training example and then calculating a decision boundary between the two classes. Since we want to classify each tweet as one of the six basic Ekman emotions, however, we must use a multiclass classification approach. Multiclass classification is the problem of assigning each input example one of more than two possible labels [1] [2]. Most multiclass classification approaches are based on binary classification methods. The one-vs-rest and one-vs-one strategies reduce the problem to multiple binary classification tasks. Other algorithms, such as logistic regression and random forests, extend naturally to multiclass problems. These approaches are summarized below.

3.1.1 One-vs-rest

The one-vs-rest approach trains a single binary classifier per class. For each classifier, samples from its class are treated as positive samples and all other samples as negative samples. Each classifier produces a real-valued confidence score rather than just a class label. We then apply every classifier to each unseen sample and choose the label whose classifier produces the highest confidence score:

$$\hat{y} = \arg\max_{k \in \{1, \dots, K\}} f_k(x) \qquad (3.1)$$

Here $K$ is the number of classes, $x$ is an unseen sample, and $f_k(x)$ is the confidence score obtained by applying classifier $k$ to $x$. The predicted label $\hat{y}$ is the class $k$ whose classifier $f_k$ produces the highest confidence score [1] [2].

3.1.2 One-vs-one

The one-vs-one method trains $K(K-1)/2$ binary classifiers, one for each pair of the $K$ classes. Each of these classifiers is applied to every unseen sample, and the classifiers then vote. Each binary classifier votes for whichever of its two classes receives the higher confidence score, and the class with the most votes is predicted for the sample [1] [2].

3.1.3 Logistic Regression

Logistic regression builds on linear regression, which predicts real-valued outputs as a linear function of the input features. Equation 3.2 gives the basic linear prediction function, where $x$ is the feature vector of a sample, $y$ is the predicted output, and $\theta$ is the vector of model parameters [34].

$$y = h_\theta(x) = \sum_i \theta_i x_i = \theta^T x \qquad (3.2)$$

However, the linear regression model does not work well for classifying examples into a few discrete classes. The logistic regression classifier therefore uses the sigmoid function in equation 3.3 to map the output of the linear prediction function into the range $[0, 1]$. Thus, $h_\theta(x)$ represents the probability that $x$ is a positive example, and $1 - h_\theta(x)$ represents the probability that $x$ is a negative example [34].

$$P(y = 1 \mid x) = h_\theta(x) = \frac{1}{1 + \exp(-\theta^T x)} \qquad (3.3)$$

For multiclass classification with $K$ classes, we can use multinomial logistic regression, which runs $K - 1$ independent binary logistic regression models. One class is chosen as a pivot, and each of the other $K - 1$ classes is compared against it. The class with the highest probability is predicted, much as in the one-vs-rest algorithm described above [27].

3.1.4 Random Forests

The random forest classification algorithm is an ensemble learning method based on decision trees. A decision tree consists of decision nodes and leaves, where each leaf represents a possible class. At each decision node, a single feature of the sample is examined, and the result of a comparison on that feature determines which node to visit next. The leaf that is finally reached is output as the predicted label [43]. The random forest algorithm constructs many decision trees and outputs the class predicted most frequently by the individual trees. Combining the results of multiple decision trees helps to correct for a single decision tree's tendency to overfit its training set [20].
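The decision rules in Section 3.1 can be sketched in a few lines of plain Python. This is a minimal illustration, not the scikit-learn code used in the thesis. The confidence scores and linear scores below are made-up numbers, and `pairwise_winner` is a hypothetical stand-in for a trained binary classifier. The last function follows the pivot formulation of multinomial logistic regression described above, with the K-th class as the pivot.

```python
import math
from collections import Counter
from itertools import combinations

EMOTIONS = ["joy", "fear", "sadness", "disgust", "surprise", "anger"]


def one_vs_rest_predict(scores):
    """Equation (3.1): return the class k whose classifier score f_k(x) is largest."""
    return max(scores, key=scores.get)


def one_vs_one_predict(classes, pairwise_winner):
    """Each of the K(K-1)/2 pairwise classifiers votes once; most votes wins.

    pairwise_winner(a, b) is a hypothetical callback that returns whichever of
    a and b the binary classifier for that pair prefers for the current sample.
    """
    votes = Counter(pairwise_winner(a, b) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]


def sigmoid(z):
    """Equation (3.3): map a linear score theta^T x into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def pivot_multinomial_probs(linear_scores):
    """Probabilities for K classes from K-1 linear scores theta_k^T x.

    The K-th (pivot) class implicitly has score 0, so
    P(k) = exp(s_k) / (1 + sum_j exp(s_j)) and P(pivot) = 1 / (1 + sum_j exp(s_j)).
    """
    exps = [math.exp(s) for s in linear_scores]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps] + [1.0 / denom]


# Made-up confidence scores f_k(x) for a single tweet.
scores = {"joy": 0.2, "fear": 1.3, "sadness": -0.5,
          "disgust": -1.1, "surprise": 0.4, "anger": -0.2}


def prefer_higher_score(a, b):
    """Toy pairwise classifier: pick the class with the larger score above."""
    return a if scores[a] >= scores[b] else b


print(one_vs_rest_predict(scores))                        # fear
print(len(list(combinations(EMOTIONS, 2))))               # 15 = 6 * 5 / 2
print(one_vs_one_predict(EMOTIONS, prefer_higher_score))  # fear
print(sigmoid(0.0))                                       # 0.5
print(round(sum(pivot_multinomial_probs([0.5, -1.0, 2.0, 0.0, 1.0])), 10))  # 1.0
```

With six emotions, one-vs-one needs 15 pairwise classifiers, compared with 6 for one-vs-rest. This is one practical cost of the one-vs-one strategy as the number of classes grows.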

3.2 Datasets

We use Mohammad's Twitter Emotion Corpus (TEC) as training data for our classifier. This corpus contains over 21,000 tweets, each annotated with one of the six Ekman emotions [30]. We also use Mohammad and Turney's NRC Word-Emotion Association Lexicon (EmoLex) to identify words that are associated with each of the six Ekman emotions. EmoLex is an affect lexicon that contains over 14,000 English words, each listed with the Ekman emotions it is associated with. Table 3.1 shows examples of tweets in the TEC that are labeled with each of the six Ekman emotions.

Table 3.1: Examples of Labeled Tweets
Tweet | Emotion
FANTASTIC. My amazing memory saves the day again! Now I can sleep in tomorrow | joy
I also hate the dentist and that's were I am heading to. I wish he was on strike lol #brokentooth | fear
I have a package at the post office. Can't think what could be in it. I don't remember internet shopping while drinking. | surprise
Feeling left out... I guess I always have my boyfriend. | sadness
People who say you broke their computer because you figured out what was wrong should die in a house fire. | anger
The fact wedding makes headlines and provides that pathetic excuse of a celebrity with more money makes me sick | disgust

3.3 Baselines

We implemented two simple baseline approaches to better evaluate the performance of our emotion classifier. The first baseline was random guessing: each tweet is assigned a random number between 1 and 6, and each number corresponds to one of the six Ekman emotions. This approach had an average 10-fold cross-validation score of … over 20 trials. We also implemented an affect lexicon approach. It counts the words in a tweet that correspond to each of the six emotions in EmoLex and labels the tweet with the emotion that has the most words. This approach had a 10-fold cross-validation score of 0.275, which slightly outperforms random guessing. However, even though every tweet in the training set was labeled with one of the six Ekman emotions, …% of the tweets in the training set did not contain any emotion words. For example, the tweet "One more week and I'm officially done with my first semester of college." clearly expresses joy. It contains none of the joy words, however, so it would be classified as neutral. The poor performance of our baselines indicates that a supervised learning approach is necessary to develop a classifier with acceptable accuracy.

3.4 Methodology

This section describes the implementation of our classifier using a supervised learning approach, including feature selection and preprocessing of the training corpus.

3.4.1 Feature Selection

Since tweets are limited to 140 characters, the main idea of each tweet can usually be captured in just a few words. We therefore chose simple features: the presence or absence of each unigram and bigram that appeared more than once in the training corpus. Bigrams were included to account for negation and basic sentence patterns that can affect the meaning of a tweet. For example, the phrase "not happy" conveys the opposite emotion to "happy", even though both phrases contain exactly one word associated with joy. We also included features for the number of words associated with each of the Ekman emotions, as in the second baseline above. Mohammad found that such affect lexicon features improved classifier performance across different domains [31].
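A minimal sketch of the feature layout in Section 3.4.1 and of the lexicon-count baseline in Section 3.3, in plain Python. It makes four assumptions:

- tweets arrive already tokenized (and stemmed);
- "appeared more than once" means a total count above one across the corpus;
- `TOY_LEXICON` is a hypothetical stand-in for EmoLex that contains only the example words from Chapter 2;
- a sparse row is a dict from column index to value rather than a SciPy sparse matrix.

```python
from collections import Counter

EMOTIONS = ["joy", "fear", "sadness", "disgust", "surprise", "anger"]

# Hypothetical stand-in for EmoLex: word -> set of associated Ekman emotions.
TOY_LEXICON = {"happy": {"joy"}, "amuse": {"joy"},
               "abandoned": {"fear", "sadness"}}


def ngrams(tokens):
    """Unigrams followed by bigrams (each bigram joined with a space)."""
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def build_vocab(corpus):
    """Map every n-gram that occurs more than once in the corpus to a column index."""
    counts = Counter(g for tokens in corpus for g in ngrams(tokens))
    kept = sorted(g for g, c in counts.items() if c > 1)
    return {g: j for j, g in enumerate(kept)}


def emotion_counts(tokens, lexicon):
    """Number of tokens associated with each emotion in the lexicon."""
    return {e: sum(1 for t in tokens if e in lexicon.get(t, ())) for e in EMOTIONS}


def featurize(tokens, vocab, lexicon):
    """Sparse row {column: value}: n-gram presence, then six emotion-count columns."""
    row = {vocab[g]: 1 for g in ngrams(tokens) if g in vocab}
    counts = emotion_counts(tokens, lexicon)
    for offset, emotion in enumerate(EMOTIONS):
        if counts[emotion]:
            row[len(vocab) + offset] = counts[emotion]
    return row


def lexicon_baseline(tokens, lexicon):
    """Baseline: emotion with the most lexicon words (ties go to the earlier
    emotion in EMOTIONS), or None (neutral) if the tweet has no emotion words."""
    counts = emotion_counts(tokens, lexicon)
    best = max(EMOTIONS, key=counts.get)
    return best if counts[best] > 0 else None


corpus = [["not", "happy"], ["so", "happy"], ["not", "happy", "today"]]
vocab = build_vocab(corpus)
print(vocab)  # {'happy': 0, 'not': 1, 'not happy': 2}
print(featurize(["not", "happy"], vocab, TOY_LEXICON))  # n-gram columns 0-2, joy count in column 3
print(lexicon_baseline(["one", "more", "week"], TOY_LEXICON))  # None
```

The "not happy" bigram gets its own column, so a classifier can learn a different weight for it than for "happy" alone. The lexicon counts cannot make that distinction.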

3.4.2 Data Preparation

All words in the NRC Lexicon and all unigrams and bigrams in all tweets were converted to lowercase and stemmed with NLTK's Snowball Stemmer. This ensures that English words sharing the same base word but differing in tense or form are treated as the same word. Stemmers work by stripping suffixes to extract a common stem, which need not be a dictionary word [37]. For example, "organized" and "organizing" are both reduced to the same stem ("organ" with the English Snowball stemmer). Punctuation marks are also treated as separate words, because some punctuation marks can be used to emphasize an emotion. For instance, exclamation points are often used when expressing joy, and question marks when expressing surprise. All other special characters are removed from tweets. Table 3.2 shows an example of a tweet before and after processing.

Table 3.2: Tweet Processing Example
Original Tweet: "I will NOT go to he d until I have my eyebrows threaded and my Mani/ Pedi... As a matter of fact I will be sleeping on the chair!!"
Processed Tweet: "i will not go to he d until i have my eyebrow thread and my mani pedi... as a matter of fact i will be sleep on the chair!!"

3.4.3 Implementation Details

Features are stored in an $m \times n$ matrix $X$, where each row represents a sample and each column represents a feature, so $X[i, j]$ is the value of feature $j$ for sample $i$. Labels are stored in an $m \times 1$ vector $y$, so $y[i]$ is the label of sample $i$. To populate the feature vectors, each unique unigram and bigram in the training corpus was assigned its own column index $j$. Each tweet is stemmed and split into unigrams and bigrams, and if n-gram $j$ is present in tweet $i$, $X[i, j]$ is set to 1. The training set contains over 35,000 unique stemmed unigrams and bigrams, and the vast majority of them do not appear in any given tweet. We therefore use sparse matrices for space efficiency. Six additional features represent the counts of words from each EmoLex emotion category.

The training set did not contain any neutral tweets that expressed no emotion. A classifier trained on it would therefore erroneously force tweets that express no emotion into one of the six emotion classes. To handle this, we also used Pattern to calculate the sentiment polarity of each tweet. Pattern is a web mining Python module that includes sentiment analysis and natural language processing tools. Pattern uses SentiWordNet, a corpus of English words annotated with positivity, negativity, and objectivity scores, to calculate polarity scores. Pattern groups each tweet into n-grams of varying sizes and averages the positivity, negativity, and objectivity scores for each group of words to calculate a final polarity and subjectivity score. Adjectives and adverbs can also amplify or negate the polarity score of a tweet [10]. Pattern's sentiment module reports, for each tweet, a polarity ranging from -1 to 1 and a subjectivity score ranging from 0 to 1 [10]. A polarity score of -1 means that the tweet is totally negative, 0 represents a neutral tweet, and 1 represents a totally positive tweet. We reclassified any tweet with a sentiment polarity score of 0.0 as neutral.

We then tested various multiclass classification algorithms implemented in scikit-learn to determine which would produce the best accuracy on our training set. The algorithms we tested were support vector machines using the one-vs-rest and one-vs-one strategies, logistic regression, and random forests [33].
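The preprocessing in Section 3.4.2 and the neutral override in Section 3.4.3 might look roughly like the following sketch. It makes three assumptions:

- only `!`, `?` and `.` are kept as punctuation tokens;
- `stem` defaults to the identity so that the sketch has no dependencies. In the pipeline described above it would be NLTK's Snowball stemmer, for example `SnowballStemmer("english").stem` from `nltk.stem.snowball`;
- the polarity score is passed in rather than computed, since in the thesis it comes from Pattern.

```python
import re

# Assumption: lowercase alphanumeric runs are words; ! ? . are kept as separate
# punctuation tokens; every other character (e.g. "/", "#", "'") is dropped.
TOKEN_RE = re.compile(r"[a-z0-9]+|[!?.]")


def preprocess(tweet, stem=lambda word: word):
    """Lowercase, split words from emotive punctuation, and stem the words."""
    tokens = TOKEN_RE.findall(tweet.lower())
    return [stem(t) if t.isalnum() else t for t in tokens]


def final_label(predicted_emotion, polarity):
    """Relabel a tweet as neutral when its polarity score is exactly 0.0."""
    return "neutral" if polarity == 0.0 else predicted_emotion


print(preprocess("My Mani/ Pedi!!"))  # ['my', 'mani', 'pedi', '!', '!']
print(final_label("joy", 0.0))        # neutral
print(final_label("anger", 0.1))      # anger
```

Keeping each `!` as its own token lets "!!" contribute the unigram `!` and the bigram `! !`. Both can carry weight for emotions such as joy.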

3.5 Evaluation Metrics

Since no separate test set was available, we used scikit-learn's built-in cross_val_predict function to evaluate the performance of our classifiers. cross_val_predict splits the training set into $k$ groups (folds) of roughly equal size. For each group $i$, the other $k - 1$ groups are used as training data, and predictions are made for group $i$, which serves as the test set. This is repeated for all $k$ groups, so that every sample is included in a test set exactly once. The function returns, for each sample, the label predicted when that sample was part of the test set [9].

We used the output of cross_val_predict to compute precision, recall, and F1 scores for each of the four models we tested. For a binary classification problem, precision is the percentage of samples predicted as positive that are actually positive. Recall is the percentage of actual positive samples that the classifier predicted as positive. The F1 score is the harmonic mean of precision and recall. It is often the main metric used to evaluate a classifier's performance, because naive classifiers can be designed with artificially high precision or recall. For example, a classifier that predicts every sample as positive has 100 percent recall. Equations 3.4 to 3.6 define precision, recall, and F1, where:

- $tp$ counts true positives (the sample is positive and was predicted as positive);
- $fp$ counts false positives (the sample is not positive but was predicted as positive);
- $fn$ counts false negatives (the sample is positive but was predicted as negative).

$$\text{Precision} = \frac{tp}{tp + fp} \qquad (3.4)$$

$$\text{Recall} = \frac{tp}{tp + fn} \qquad (3.5)$$

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (3.6)$$

We extend these metrics to multiclass problems by calculating each metric separately for every class and then taking the weighted average. For the "joy" class, all samples labeled "joy" are counted as positive and all other samples as negative, and likewise for the other classes. The binary formulas for precision, recall, and F1 can then be applied directly.
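The following plain-Python sketch makes the evaluation procedure concrete. It assigns samples to $k$ contiguous, unshuffled folds so that each sample lands in exactly one test fold, as Section 3.5 describes. By default, scikit-learn's cross_val_predict uses stratified folds for classifiers, so the fold assignment here is illustrative only. The sketch then computes per-class precision, recall, and F1 from equations 3.4 to 3.6. The thesis does not state how the classes are weighted. The sketch assumes each class is weighted by its number of true samples (its support), which is what scikit-learn's `average="weighted"` option does. The labels are made up.

```python
from collections import Counter


def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def binary_prf(y_true, y_pred, label):
    """Precision, recall, F1 treating `label` as positive and all else negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def weighted_prf(y_true, y_pred):
    """Average the per-class scores, weighting each class by its support."""
    support = Counter(y_true)
    totals = [0.0, 0.0, 0.0]
    for label, count in support.items():
        for i, value in enumerate(binary_prf(y_true, y_pred, label)):
            totals[i] += value * count / len(y_true)
    return tuple(totals)


print(kfold_indices(10, 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

y_true = ["joy", "joy", "joy", "fear", "anger", "anger"]
y_pred = ["joy", "joy", "fear", "fear", "anger", "joy"]
print(binary_prf(y_true, y_pred, "fear"))  # precision 0.5, recall 1.0, F1 ~0.667
print(weighted_prf(y_true, y_pred))        # approximately (0.75, 0.667, 0.667)
```

In this example the weighted recall (4/6) equals plain accuracy, since four of the six predictions are correct. This holds for support-weighted recall in general, because the per-class weights cancel the per-class denominators.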

3.6 Results
Table 3.3 shows the precision, recall, and F1 results for each of the four models we tested.

Table 3.3: Model Comparison
Model | Precision | Recall | F1
One-vs-rest (SVM)
One-vs-all (SVM)
Logistic Regression
Random Forest

All four supervised machine learning models substantially outperformed our baselines of random guessing and using only an affect lexicon. The logistic regression model performed best on all three evaluation metrics, so we use this model for all classification problems throughout this thesis. Table 3.4 shows the precision, recall, and F1 scores for each emotion class for our logistic regression model.

Table 3.4: Logistic Regression Accuracy Metrics
Emotion | Number of Tweets | Precision | Recall | F1
joy
fear
anger
surprise
sadness
disgust
All Emotions | 21,

The joy emotion had the highest F1 score and the disgust emotion had the lowest. One likely explanation is that joy is the only positive Ekman emotion, whereas the other Ekman emotions are harder to distinguish from one another. In addition, joy was the most common emotion in the training set, while disgust was the least common; therefore, obtaining more training examples could help improve the classifier's accuracy.

3.7 Discussion
We looked at a sample of tweets from the 2016 presidential debates to subjectively evaluate the classifier's performance on unseen data. In general, the classifier seems to work well, likely because Twitter's character limit usually prevents users from expressing multiple conflicting emotions in a single tweet. Table 3.5 shows some example tweets for which the classifier predicted the correct emotion. Many of these tweets contain words or phrases that are strongly associated with an emotion, such as "dangerous" for fear and "shut up" for anger.

Table 3.5: Classification Examples
Tweet | Emotion | Polarity
Hilary is calm, measured, has the facts on her side. Trump is turning red and frothing at the mouth like a twitter troll. | disgust | 0.15
RT this if you're proud to be standing with Hillary tonight. #debatenight | joy | 0.8
shut up and let her speak you 3 year old brat | anger | 0.1
Hillary Clinton policy created ISIS. She is dangerous AF. Plus she's a huge LIAR #debatenight | fear | -0.1
Hillary invited Marc Cuban to the debates as we all know; unfortunately not everyone could make it. RIP #SethRich #deb | sadness | 0.25
#Debates #Debates | none | 0.0
#Polls #slipping, have HER camp on defense/lowering expectations, goi | surprise | -0.1

However, our classifier does not perform as well on certain types of tweets. Table 3.6 shows some examples of misclassified tweets. Relying on Pattern to identify neutral tweets introduces additional errors, because sentiment polarity algorithms are not completely accurate either. The first tweet clearly expresses joy and the second expresses disgust, but our classifier labeled them as neutral because the Pattern sentiment analysis algorithm assigned them polarities of 0.0.

Table 3.6: Examples of Classification Errors
Tweet | Emotion | Polarity
HILLARY HAS GOT TRUMP SOOO OUTCLASSED!!!! | none | 0.0
Hillary is the most corrupt person to ever run for the presidency of the United States. #DrainTheSwamp | none | 0.0
Three key questions for Trump and Clinton ahead of the first debate #Debates | joy |
Honestly, you can't win any debate having lied so often to the world. | joy | 0.1

The third tweet is labeled with "joy", but it actually has a neutral sentiment. Since "joy" was the most common emotion in our training set, many tweets that contain no emotional words and none of the unigrams or bigrams in the training set are labeled with "joy" by default. This example also shows a case where Pattern fails to identify a tweet as neutral. In the future, creating an expanded corpus that also includes neutral tweets could mitigate these types of mistakes, since we would no longer have to rely on external libraries that are not 100 percent accurate themselves. The final tweet is labeled with "joy", even though it expresses a negative opinion. This is probably because the tweet includes the word "win", which is associated with joy. Even though "can't" negates the meaning of "win", the bigram "can't win" was probably not present in our training set. Splitting contractions into their base words, such as converting "can't" to "can not", could help to resolve this issue. In addition, the word "lied" has a negative connotation, but it does not appear next to "win", so the bigram features would also fail to capture the negative emotion. Therefore, using more advanced features that take sentence structure into account could also lead to more accurate results in future studies.
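The contraction-splitting idea can be sketched with a few regular expressions applied before bigram extraction. The rule list and function names below are hypothetical and deliberately incomplete (for example, "ain't" would become "ai not"); the point is only to show how "can't win" turns into the bigram ("not", "win"), which a negation-aware feature set could then learn.

```python
import re

# Hypothetical, minimal contraction rules; order matters (special cases first).
_CONTRACTION_RULES = [
    (re.compile(r"\bcan't\b", re.IGNORECASE), "can not"),
    (re.compile(r"\bwon't\b", re.IGNORECASE), "will not"),
    (re.compile(r"n't\b", re.IGNORECASE), " not"),  # don't -> do not, isn't -> is not
]


def expand_contractions(text):
    """Split "n't" contractions into base words."""
    text = text.replace("\u2019", "'")  # normalize curly apostrophes from tweets
    for pattern, replacement in _CONTRACTION_RULES:
        text = pattern.sub(replacement, text)
    return text


def bigrams(text):
    """Lowercased word bigrams computed after contraction expansion."""
    tokens = re.findall(r"[a-z']+", expand_contractions(text).lower())
    return list(zip(tokens, tokens[1:]))
```

Applied to the last tweet in Table 3.6, the bigrams include ("can", "not") and ("not", "win") rather than a single unseen "can't win" token pair.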

Chapter 4 Emotion Analysis of Presidential Election Tweets
The 2016 United States presidential election was the most tweeted election in history. Over 1 billion tweets had been posted since the primary debates began in August 2015, and over 75 million tweets were posted on Election Day alone, more than double the number posted on the previous Election Day in 2012 [8] [18]. The presidential candidates themselves were also very active on social media; Hillary Clinton's tweet telling Donald Trump to "Delete your account" became the most retweeted tweet of the entire election cycle. In this chapter, we explore whether Twitter sentiment during the election cycle could have been leveraged to predict future returns for key S&P 500 industries.

4.1 Datasets
We obtained tweets from George Washington University's 2016 presidential election dataset published on Harvard's Dataverse repository [26]. This dataset contains approximately 280 million tweet ids from the 2016 presidential election cycle, from July 13, 2016 to November 10, 2016. The tweets are grouped into several collections, including the three presidential debates, the Democratic and Republican conventions, and Election Day itself. S&P 500 daily adjusted closing prices for all sectors and
industries were obtained from Yahoo Finance.

Data Preparation
We used the Twarc Python library to hydrate the lists of tweet ids for the collections corresponding to Election Day and each of the three presidential debates. Twarc calls the Twitter API to retrieve each tweet's text and metadata, such as the time and date it was posted, the user who posted it, and the number of times it was retweeted [52]. Deleted tweets and tweet ids associated with deleted accounts were dropped. We were able to successfully retrieve % of the 14 million tweets contained in these four collections. We then extracted the timestamp and text from each hydrated tweet and applied the emotion classifier described in Chapter 3 to label each tweet with an Ekman emotion. We again used the Pattern module to label tweets with a sentiment polarity score of 0.0 as neutral.

Since many Twitter users hold opposing opinions of Clinton and Trump, we also categorized each tweet as being about Clinton, Trump, or both candidates. This allows us to identify differences in emotion distribution trends between the two candidates across key events during the election. To identify tweets about Donald Trump, we selected tweets that contained at least one of the following keywords or hashtags: "@realdonaldtrump", "trump", "#trump", "donald". Similarly, tweets containing at least one of the following words or hashtags were categorized as being about Hillary Clinton: "clinton", "hillary", "#clinton", "#hillary", "@hillaryclinton".

4.2 Emotion Distributions on Election Day
This section highlights some insights revealed by the emotion distributions of tweets from Election Day, November 8, 2016.

4.2.1 Election Day Key Events
Prior to the election, Hillary Clinton was predicted to win based on poll results and on her stronger performance in the presidential debates. However, there were several turning points during the election. According to Leip's 2016 election night events timeline, all polls closed at midnight on November 9, 2016. This was a turning point in the election: many key swing states (such as Florida and North Carolina) had been called for Trump in the previous hour, so it became evident that Trump was very likely to win. At this point, Trump had 244 of the 270 electoral votes needed to win, and many of the remaining states were traditionally red states [25]. Afterwards, at 2:43 AM on November 9, 2016, NBC reported that Hillary Clinton had called Donald Trump to officially concede [38].

Comparison with Polarity-Based Sentiment Analysis
As a baseline, we first use Pattern's sentiment analysis algorithm, which returns a sentiment polarity between -1 and 1 [10]. Figure 4-1 shows the average sentiment per minute during Election Day on November 8, 2016.

Figure 4-1: Average Sentiment during the 2016 Presidential Election

The first dotted line in this figure indicates the closing of the polls and the second
dotted line indicates Hillary Clinton's concession. Clinton and Trump had similar sentiment trends over the course of election night. The average sentiment polarity for both candidates remained fairly stable at around 0.1 until the polls closed. It then dropped for both candidates and started to stabilize after Clinton's concession. Compared to tweets about Trump, the average sentiment for Clinton dropped more after the polls closed and remained more volatile after her concession. Even though we can identify differences in sentiment, it is still difficult to draw conclusions about how the public's attitude towards Clinton and Trump evolved throughout the election, since a wide variety of emotions are associated with negative sentiment. In contrast, Figure 4-2 shows how the emotion distributions shifted throughout the night in ten-minute intervals. After the polls closed and it became clear that Trump had accumulated most of the 270 electoral votes required, anger quickly became the predominant emotion in tweets about Clinton. After Clinton's concession to Trump, the predominant emotion for Clinton changed to sadness.
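The per-candidate, ten-minute emotion distributions behind Figure 4-2 can be approximated with a short pure-Python sketch. It assumes each tweet already carries the classifier's predicted emotion and its Pattern polarity. The keyword lists come from Section 4.1; the case-insensitive substring matching, the choice to count tweets about both candidates toward each candidate, the function names and the toy tweets are all illustrative assumptions.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Keyword lists from Section 4.1; substring matching is an assumption
# (it also matches words such as "trumpet").
TRUMP_KEYWORDS = ("@realdonaldtrump", "trump", "#trump", "donald")
CLINTON_KEYWORDS = ("clinton", "hillary", "#clinton", "#hillary", "@hillaryclinton")


def candidates_mentioned(text):
    """Return the set of candidates ('clinton', 'trump') a tweet is about."""
    lowered = text.lower()
    found = set()
    if any(k in lowered for k in TRUMP_KEYWORDS):
        found.add("trump")
    if any(k in lowered for k in CLINTON_KEYWORDS):
        found.add("clinton")
    return found


def emotion_distributions(tweets, candidate, minutes=10):
    """tweets: iterable of (datetime, text, emotion, polarity) tuples.

    Returns {window_start: {emotion: percentage}} for tweets about `candidate`.
    A polarity of exactly 0.0 overrides the predicted emotion with 'none',
    mirroring the Pattern-based neutral rule. `minutes` should divide 60.
    """
    windows = defaultdict(Counter)
    for ts, text, emotion, polarity in tweets:
        if candidate not in candidates_mentioned(text):
            continue
        label = "none" if polarity == 0.0 else emotion
        start = ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)
        windows[start][label] += 1
    result = {}
    for start in sorted(windows):
        counts = windows[start]
        total = sum(counts.values())
        result[start] = {e: 100.0 * c / total for e, c in counts.items()}
    return result


# Made-up toy tweets, for illustration only.
sample = [
    (datetime(2016, 11, 8, 23, 1), "Hillary will win tonight", "joy", 0.8),
    (datetime(2016, 11, 8, 23, 5), "Clinton campaign is a mess", "anger", -0.2),
    (datetime(2016, 11, 8, 23, 9), "hillary lost florida", "anger", -0.3),
    (datetime(2016, 11, 8, 23, 12), "So sad for Hillary and Trump", "sadness", -0.5),
    (datetime(2016, 11, 8, 23, 14), "#Trump wins", "joy", 0.0),
]
print(emotion_distributions(sample, "clinton"))
```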

Figure 4-2: 2016 Election Day Emotion Distributions. (a) Tweets about Clinton. (b) Tweets about Trump.

Interestingly, the emotion distributions after these key events did not appear to fluctuate as much for tweets about Trump, even though one would expect the percentage of "joy" tweets about Trump to increase after Clinton's concession. One possible explanation is that Twitter users are not fully representative of the average US voter, since social media appeals more to young users, who have historically been more likely to support the Democratic party [15].

Using Volume to Identify Events
Next, we analyzed tweets from the first presidential debate. George Washington University's dataset includes tweets from a 24-hour period starting on the morning of each presidential debate and ending the morning after the debate. In Figure 4-3, we plot the number of tweets aggregated over each ten-minute window throughout this 24-hour period. As expected, the number of tweets spikes dramatically during the debate, which took place from 9:00 PM to 10:30 PM Eastern time (marked by the dotted lines). We also see that the relative frequencies of the Ekman emotions remain relatively stable before and after the debate but fluctuate greatly during it. Thus, a combination of Twitter volume and changes in sentiment could potentially be used to identify unusual events that occur during a given time period. This topic is explored further in Chapter 5 in the context of financial tweets. Since major current events often lead to volatility in the stock market, we now investigate the impact of presidential debates on future stock returns.
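One way to turn the volume observation above into a simple event detector is to count tweets per ten-minute window and flag windows far above the typical level. This is a hypothetical sketch: the thesis only plots the counts, and the mean-plus-k-standard-deviations threshold (k = 2 here) is an illustrative assumption, not the author's method. Windows with no tweets are absent from the counts, which slightly inflates the baseline.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev


def volume_per_window(timestamps, minutes=10):
    """Count tweets per `minutes`-long window (minutes should divide 60)."""
    counts = Counter(
        ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)
        for ts in timestamps
    )
    return dict(sorted(counts.items()))


def flag_spikes(counts, k=2.0):
    """Return windows whose count exceeds the mean by more than k population
    standard deviations (k=2.0 is an illustrative choice)."""
    values = list(counts.values())
    threshold = mean(values) + k * pstdev(values)
    return [window for window, n in counts.items() if n > threshold]


# Made-up timestamps, for illustration only.
stamps = [datetime(2016, 9, 26, 21, 1), datetime(2016, 9, 26, 21, 5), datetime(2016, 9, 26, 21, 15)]
print(volume_per_window(stamps))
```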

Figure 4-3: First Presidential Debate. (a) First Presidential Debate Tweet Volume. (b) First Presidential Debate Emotions.

4.3 Can Presidential Debates Predict Market Returns?
Oehler et al. previously found that stock returns for related sectors and industries following a presidential election were highly correlated with the new president's policies [35]. In this section, we aim to determine whether this observation also holds after presidential debates. We analyze the predicted impact of Clinton's and Trump's proposed policies on a subset of S&P 500 industries and compare the stock market reaction immediately following each debate.

Summary of Candidate Policies
Here we briefly summarize Clinton's and Trump's contrasting policies relating to a subset of S&P 500 sectors and industries.

Pharmaceuticals and Biotechnology: Clinton proposed tighter regulations on drugmakers and wanted to set monthly price limits on drugs, both of which would reduce pharmaceutical companies' profits. Trump also wanted to make drugs more affordable but was less detailed about his plans. Therefore, the pharmaceuticals industry was predicted to perform better under a Trump administration [14].

Financials: Clinton proposed tighter regulations on banks, so the financials sector was also predicted to perform better under Trump [5].

Energy: Trump planned to lift restrictions on oil and gas companies and to increase fossil fuel production to create jobs. Clinton's policies focused on renewable energy. Since the majority of stocks in the Energy sector are oil and gas companies, Trump's election was predicted to benefit the Energy sector [4].

Defense: The Defense industry was predicted to benefit from a Trump presidency due to his plans for increased defense spending [5].

Technology: The Technology sector was predicted to perform better under Clinton due to her support for highly skilled immigration and her plans to increase spending on STEM education [47].

Healthcare Facilities: Trump wanted to repeal and replace the Affordable Care Act, which would create a lot of uncertainty for hospitals. Therefore, healthcare facilities and hospitals were predicted to benefit from a Clinton presidency [23].

S&P 500 Returns after Election Day
Table 4.1 shows the closing prices and returns for each of these sectors on November 9, 2016, the day after the election. As predicted, pharmaceuticals, financials, defense, and energy made large gains after Trump was elected. Healthcare facilities fell sharply while the technology sector fell slightly, consistent with Oehler's observations about the impact of presidential elections on specific sectors.

Table 4.1: S&P 500 Sectors before and after Election Day
Sector/Industry | November 8 | November 9 | Return
Pharmaceuticals and Biotech
Financials
Aerospace and Defense
Energy
Technology
Healthcare Facilities
S&P 500

To determine whether this pattern also holds for presidential debates, we use our emotion classifier to determine the winners of the presidential debates.

4.3.3 Who won the Presidential Debates?
We now analyze the changes in emotion distributions to predict a winner for each of the three presidential debates. Figure 4-4 shows the emotion distributions before and after the first presidential debate (marked by the black dotted lines) for both candidates.

Figure 4-4: Emotion Distributions during the First Presidential Debate. (a) Tweets about Clinton. (b) Tweets about Trump.

We can see that the percentage of joy tweets for Clinton increased after the debate, while the percentage decreased for Trump. Thus, we use the change in the percentage of joy tweets to estimate how each debate affected public opinion towards both candidates. Tables 4.2 and 4.3 display the change in the percentage of tweets expressing joy before and after each presidential debate for Clinton and Trump, respectively. The percentage of joy tweets increased for Clinton after all three debates and decreased for Trump after all three debates. Therefore, based on our emotion distributions, Clinton's performances in all three presidential debates appear to have been better received than Trump's.

Table 4.2: Clinton: Change in joy tweets before and after debates
Debate | Before | After | Change
First Debate
Second Debate
Third Debate

Table 4.3: Trump: Change in joy tweets before and after debates
Debate | Before | After | Change
First Debate
Second Debate
Third Debate

These results are supported by the polls that Morning Consult conducted after each debate (Table 4.4). All three polls showed that a higher percentage of participants believed Clinton won each debate [12] [36] [13].
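The before/after comparison in Tables 4.2 and 4.3 can be sketched as follows. Two assumptions are made here that the text does not fix: "before" means tweets posted before the debate starts and "after" means tweets posted after it ends (tweets during the debate are ignored), and "Change" is a difference in percentage points rather than a relative change, which would instead be (after - before) / before. Timestamps are assumed to be naive Eastern-time datetimes; the toy tweets are made up.

```python
from datetime import datetime


def joy_percentage(emotions, label="joy"):
    """Percentage of tweets whose emotion equals `label`."""
    emotions = list(emotions)
    return 100.0 * sum(e == label for e in emotions) / len(emotions)


def joy_change(tweets, debate_start, debate_end, label="joy"):
    """tweets: (datetime, emotion) pairs about one candidate.

    Returns (before %, after %, change in percentage points).
    """
    before = [e for ts, e in tweets if ts < debate_start]
    after = [e for ts, e in tweets if ts >= debate_end]
    b, a = joy_percentage(before, label), joy_percentage(after, label)
    return b, a, a - b


debate_start = datetime(2016, 9, 26, 21, 0)   # first debate, 9:00 PM ET
debate_end = datetime(2016, 9, 26, 22, 30)    # 10:30 PM ET
toy = [
    (datetime(2016, 9, 26, 20, 0), "joy"), (datetime(2016, 9, 26, 20, 30), "anger"),
    (datetime(2016, 9, 26, 20, 45), "fear"), (datetime(2016, 9, 26, 20, 50), "anger"),
    (datetime(2016, 9, 26, 21, 30), "joy"),  # during the debate: ignored
    (datetime(2016, 9, 26, 22, 45), "joy"), (datetime(2016, 9, 26, 23, 0), "joy"),
    (datetime(2016, 9, 26, 23, 10), "anger"), (datetime(2016, 9, 26, 23, 20), "fear"),
]
print(joy_change(toy, debate_start, debate_end))
```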

Table 4.4: Morning Consult Poll Results
Debate | Clinton Won | Trump Won
First Debate | 49% | 26%
Second Debate | 42% | 28%
Third Debate | 43% | 26%

S&P 500 Reactions to Presidential Debates
We now evaluate whether there is any correlation between Clinton's debate wins and stock returns for industries related to her major policies. Table 4.5 shows S&P 500 returns following the first presidential debate. Technology stocks gained 1.15% and energy stocks fell after Clinton's win, as predicted in the previous section. The other four industries also made small gains.

Table 4.5: S&P 500 Industries Before and After First Presidential Debate
Sector/Industry | September 26 | September 27 | Return
Technology | | | 1.15%
Financials
Pharmaceuticals and Biotech
Aerospace and Defense
Healthcare Facilities
Energy
S&P 500

However, the industry-specific returns following the second debate do not seem to be correlated with Clinton's policies, as energy stocks rose significantly after the second debate (Table 4.6). Nevertheless, the overall S&P 500 index still rallied following the first and second presidential debates, which is another predicted result based
on the similarity of Clinton's policies to those of the incumbent president, Barack Obama: Prechter had previously found a positive relationship between an incumbent's vote margin and the percentage gain in the stock market during the three years prior to the election [39].

Table 4.6: S&P 500 Industries Before and After Second Presidential Debate
Sector/Industry | October 7 | October 10 | Return
Healthcare Facilities
Energy
Technology
Financials
Aerospace and Defense
Pharmaceuticals and Biotech
S&P 500

Likewise, after the third debate (Table 4.7), pharmaceuticals gained, technology stocks fell, and the S&P 500 index also fell, contrary to what Clinton's proposed policies would predict. However, the third presidential debate occurred around the same time as many earnings announcements, which could explain some of the unexpected returns [24].

Table 4.7: S&P 500 Industries before and after Third Presidential Debate
Sector/Industry | October 19 | October 20 | Return
Pharmaceuticals and Biotech
Healthcare Facilities
Financials
Energy
Aerospace and Defense
Technology
S&P 500

4.4 Discussion
Even though we were unable to identify a clear pattern between presidential debate winners and stock returns for related S&P 500 industries and sectors, we have still shown that categorizing tweets into emotions highlights differences in public opinion towards the presidential candidates more effectively than a polarity-based approach. Oehler's study also concluded that abnormal returns after elections are probably caused by initial uncertainty about the new president's policies [35]. Even though Clinton performed better in all three debates, her policies were still only theoretical at the time. Other economic factors, such as earnings announcements and the state of the global economy, may also overshadow the impact of presidential debates on the stock market. Furthermore, participants who believed that Clinton won the debates may still have disagreed with some or all of her policies. The first poll conducted by Morning Consult showed that even 12% of Trump supporters believed that Clinton won the debate [12]. Thus, in addition to categorizing tweets by the presidential candidates mentioned, it would also be interesting to analyze the sentiment of tweets about
specific policies or key election issues in the future.


Chapter 5 Emotion Analysis of Financial Tweets
In 2012, Twitter introduced cashtags, which are stock ticker symbols prefixed with a $ symbol that behave similarly to hashtags. Cashtags can be used to search for financial news about publicly traded companies. In this chapter, we explore the relationships between the sentiment and volume of tweets tagged with NASDAQ-100 cashtags and future returns for NASDAQ-100 companies.

5.1 Datasets
Tweets were obtained from Enrique Rivera's NASDAQ 100 Tweets dataset published on Dataworld. This dataset contains approximately 1 million tweets mentioning any NASDAQ-100 ticker cashtag between March 10, 2016 and June 15, 2016 [41]. However, most ticker symbols were missing data at the beginning of this period, so we only used tweets starting from March 28, 2016. This dataset also contains additional metadata for each of the 100 cashtags, such as the most retweeted tweets and the top 100 Twitter users sorted by number of followers. We also used Yahoo Finance to obtain daily adjusted closing prices during this three-month period. Millisecond trade data was obtained from the Wharton Research Data Services (WRDS) TAQ database. Earnings announcement dates and estimates were obtained from Zacks Investment Research.

5.2 Correlation Between Emotions and Stock Prices
Previous work by Zhang suggested that emotional outbursts of any type on Twitter had weak negative correlations with future Dow Jones, S&P 500, and NASDAQ index prices [56]. We want to investigate whether focusing only on financial tweets tagged with cashtags, instead of using a sample of all tweets as Zhang did, would produce a stronger correlation with future stock market performance. First, we calculated the distribution of Ekman emotions on each day over all cashtags in our dataset using the emotion classifier described in Chapter 3. Then, we calculated the Pearson correlation coefficients between the percentages of each Ekman emotion and the NASDAQ-100 return on the next day. The Pearson correlation coefficient r (Equation 5.1) measures the strength of the linear relationship between two variables [34]. r ranges between -1 and 1, where 1 represents a perfect positive linear correlation, 0 represents no linear correlation at all, and -1 represents a perfect negative linear correlation. We used the percentage of each emotion on day t as x and the return corresponding to the price change from day t to day t + 1 as y.

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{5.1}$$

Since anyone can create a Twitter account and post random tweets containing cashtags, we also wanted to determine whether tweets from more reliable sources were more predictive of future returns. Thus, we also collected tweets from only the top 100 Twitter users sorted by number of followers and recalculated the correlation coefficients for this subset of tweets for all NASDAQ-100 stocks. Table 5.1 displays the average correlation between the emotion percentages and each stock's return on the following day, both for all tweets and for only the tweets written by the top 100 users.
Since surprise can be either a positive or a negative emotion, depending on the type of news, we also calculated separate correlation coefficients for "surprise" tweets with a positive polarity score and "surprise" tweets with a negative polarity score. Bolded values are statistically significant at p <
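Equation 5.1 and the day-t-to-day-t+1 alignment described above can be sketched in pure Python. The function names are hypothetical, and the inputs are assumed to be aligned by trading day. The sketch computes r only; for the p-values used to mark significance, `scipy.stats.pearsonr(x, y)` returns both the coefficient and a two-sided p-value. The positive/negative "surprise" split would simply produce two extra emotion series (surprise tweets with polarity above zero and below zero) to pass in as `emotion_pct`.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient, as in Equation 5.1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def next_day_correlation(emotion_pct, closes):
    """Correlate the emotion percentage on day t with the return from day t to t + 1.

    emotion_pct[t] and closes[t] must refer to the same trading day t.
    """
    if len(emotion_pct) != len(closes):
        raise ValueError("series must be aligned day by day")
    returns = [(closes[t + 1] - closes[t]) / closes[t] for t in range(len(closes) - 1)]
    return pearson_r(emotion_pct[:-1], returns)
```

For the same-day analysis that follows, the emotion percentage on day t would instead be paired with the return from day t - 1 to day t.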

We found that none of the original Ekman emotions had statistically significant correlations with next-day returns for either of the two groups, with all correlation coefficients below 0.2 in magnitude. However, tweets from the top 100 users expressing positive surprise and negative surprise showed stronger positive and negative correlations, respectively. This could be because uncertainty usually leads to volatility in the stock market, as seen in the aftermath of the 2016 presidential election. Therefore, combining sentiment polarity with finer-grained emotion classification may reveal more information about future stock returns than either approach alone.

Table 5.1: Correlation between average emotion percentages and next-day stock returns
Emotion | Top 100 Users | All Users
Joy
Fear
Sadness
Disgust
Anger
Surprise
Surprise (positive)
Surprise (negative)
No Emotion

We then calculated the correlations between the current day's emotion percentages and the current day's returns to determine whether Twitter users are instead reacting to changes in stock prices. Table 5.2 shows the average correlation between each stock's emotions and its return on the same day. Interestingly, the top 100 users did not show significant differences between the same-day and next-day correlations. In contrast, the general public had a much stronger positive
correlation for tweets expressing joy and a much stronger negative correlation for tweets expressing anger. Both of these correlation coefficients were statistically significant at p < These results suggest that the general public is more reactive to stock price movements, while the top users have more neutral attitudes. This could be explained by the fact that many of the top users by follower count are professional news sources, such as Reuters, the Wall Street Journal, and Business Insider, whose tweets mostly report news about companies in an unbiased manner. In the future, it may be interesting to analyze sentiment in tweets posted by professional investors to determine whether expert opinions can be leveraged to predict changes in stock prices.

Table 5.2: Correlation between average emotion percentages and same-day stock returns
Emotion | Top 100 Users | All Users
Joy
Fear
Sadness
Disgust
Anger
Surprise
Surprise (positive)
Surprise (negative)
No Emotion

Excess noise in the Twitter dataset is another factor that could explain the low correlation values for emotions other than surprise. Zhang's study was conducted in 2009, when there were only 18 million Twitter users, compared to over 300 million today [53]. Table 5.3 shows several examples of noise in the Twitter data. Many tweets contain multiple cashtags, even when not all of the companies are actually
discussed in the tweet.

Table 5.3: Noise in $AAPL Tweets
Tweet | Emotion | Polarity
Bad News For Twitter Longs $AAPL #APPLE $DIS $GOOG $GOOGL $SQ $TWTR | sadness | -0.7
Fitbit Management Upbeat on Expected New Product, Says Raymond James - Tech Trader Daily - $FIT $GRMN $AAPL | joy |
Florida to face flooding, dangerous seas from Tropical Storm Colin #TRUMP $TWTR $AAPL #wlst | fear | -0.6
Classic Marxist economics about how a servile population will submit to any old crap $AAPL | disgust |

Even though all of these tweets contain the $AAPL cashtag and are labeled with the correct emotion, none of them is actually related to Apple. The first and second tweets express emotions towards Twitter and Fitbit, respectively, while the last two tweets do not mention any NASDAQ-100 company at all. The prevalence of these types of tweets can skew the emotion distributions and mask patterns and correlations that may be present. Nevertheless, many previous studies have shown that Twitter volume has a greater impact on future stock prices, so we explore this relationship in the next section.

5.3 Using Volume to Identify Events
In the previous chapter, we saw that Twitter volume spiked while a presidential debate was ongoing. We use a similar approach here to determine whether there is a correlation between tweet volume and stock returns. Spikes in Twitter volume can
indicate that a significant event has occurred, such as an earnings announcement, acquisition, or new product release. The stock market's response to such events may be positive or negative, depending on the nature of the event. For instance, Figure 5-1a shows the daily Twitter volume for the $MSFT cashtag and the daily returns for Microsoft stock. There are two main spikes in volume during this three-month period. The first spike occurred on April 21, 2016, the date of Microsoft's quarterly earnings announcement. Microsoft missed analysts' earnings estimates by 2 cents per share, causing shares to fall by up to 5 percent in after-hours trading [21]. The second spike occurred on June 13, 2016, when Microsoft announced its planned acquisition of LinkedIn that morning [44]. While LinkedIn's share price increased by 47 percent, Microsoft's stock price fell by 3.2 percent and remained relatively flat afterwards. Experts suggest that this negative response could have resulted from Microsoft's poor track record with prior large acquisitions, including Skype and Nokia, which were not as successful as analysts had hoped [49]. On the other hand, Figure 5-1b displays the daily Twitter volume and returns for Facebook. In contrast to Microsoft, the response to Facebook's first quarter earnings announcement was overwhelmingly positive. Facebook crushed analysts' expectations, beating the consensus earnings-per-share estimate by a whopping 15 cents. Consequently, shares rose by 9 percent in the hours following Facebook's earnings announcement on April 27, 2016 [46]. These observations suggest that Twitter sentiment may help predict whether a particular event will have a positive or negative effect on a company's stock price. Figures 5-1c and 5-1d show the daily tweet volumes versus the percentage of tweets expressing a positive sentiment for each day.
As Figures 5-1c and 5-1d show, the percentage of positive tweets dropped on the day of Microsoft's earnings announcement, while it increased on the day of Facebook's earnings announcement. Thus, it may be possible to construct a trading strategy that takes into account both the number of tweets and the sentiment on a given day to decide whether to buy or sell certain stocks.
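Daily series like those in Figure 5-1 (tweet volume per cashtag and the share of tweets carrying a given label) could be built along the lines of the sketch below. The cashtag regular expression, the function names and the toy tweets are hypothetical. Note that a tweet carrying several cashtags, as in Table 5.3, is counted once for every ticker it tags, which is exactly how that kind of noise enters the counts.

```python
import re
from collections import defaultdict
from datetime import date

# Assumed cashtag pattern: "$" followed by 1-5 letters (fits tickers such as $MSFT, $GOOGL).
CASHTAG = re.compile(r"\$([A-Za-z]{1,5})\b")


def cashtags(text):
    """Set of upper-cased ticker symbols tagged in a tweet."""
    return {symbol.upper() for symbol in CASHTAG.findall(text)}


def daily_volume_and_share(tweets, ticker, label="joy"):
    """tweets: iterable of (date, text, emotion) triples.

    Returns {day: (number of tweets tagged $ticker, % of them labeled `label`)}.
    """
    counts = defaultdict(lambda: [0, 0])  # day -> [total, labeled]
    for day, text, emotion in tweets:
        if ticker.upper() in cashtags(text):
            counts[day][0] += 1
            counts[day][1] += int(emotion == label)
    return {day: (total, 100.0 * hits / total) for day, (total, hits) in sorted(counts.items())}


# Made-up toy tweets, for illustration only.
example = [
    (date(2016, 4, 21), "Missed estimates, ouch $MSFT", "sadness"),
    (date(2016, 4, 21), "$MSFT $AAPL cloud still growing", "joy"),
    (date(2016, 4, 22), "$AAPL down again", "sadness"),
]
print(daily_volume_and_share(example, "MSFT"))
```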

Figure 5-1: Twitter Volume Plots for Microsoft and Facebook. (a) MSFT Tweet Volume vs Returns. (b) FB Tweet Volume vs Returns. (c) MSFT Tweet Volume vs Sentiment. (d) FB Tweet Volume vs Sentiment.

5.4 Sentiment-Based Trading Strategy
We now propose a simple trading strategy based on Twitter volume and the percentage of tweets expressing joy. For simplicity, we assume that the price of a stock does not change due to after-hours trading and that there are no fees associated with buying or shorting stocks. We use a two-dimensional array to store daily returns for each of the NASDAQ-100 components in Rivera's dataset. Let R_{i,t} represent the return for stock i at time t:

$$R_{i,t} = \frac{p_{i,t} - p_{i,t-1}}{p_{i,t-1}}$$

where p_{i,t} is the price of stock i on day t. T_{i,t} and J_{i,t} represent, respectively, the total number of tweets for stock i at time t and the percentage of those tweets labeled with the "joy" emotion. C_{i,t} represents the amount of capital for stock i at time t that is either currently invested or in the bank. For each stock i, we keep track of moving averages of the total number of tweets and of the percentage of tweets labeled with the "joy" emotion, using a rolling window of five days. We use five days because the trading week is five days long and we only consider the Twitter volume and sentiment on days immediately preceding a trading day, so tweets
on Fridays and Saturdays are not included. Figure 5-1 also shows that there are fewer tweets tagged with cashtags on weekends, since no stocks are traded and no company announcements are made.

We initially allocate $1 to invest in each NASDAQ-100 stock. To calculate the amount of capital on day t (C_{i,t}), we consider the percentage of joy tweets and the Twitter volume for day t - 1. For each day t - 1, if the total number of tweets (T_{i,t-1}) for a stock i is at least one standard deviation greater than the previous week's average, this signals that a noteworthy event may have occurred. We then look at the percentage of joy tweets for that day. If the percentage of joy tweets (J_{i,t-1}) is at least half a standard deviation greater than the previous week's average, the event will probably result in a profit, so we buy the stock when the market opens on day t and sell it when the market closes on day t. Thus, we gain a profit equal to the previous day's capital times the daily return for stock i on day t. Likewise, if the percentage of joy tweets is at least half a standard deviation below the average, we short the stock and repurchase it the next day. If neither of these conditions is satisfied, C_{i,t} remains unchanged from the previous day. Equation 5.2 shows how our calculation of the amount of capital invested in stock i depends on our decision for day t.

$$C_{i,t} = \begin{cases} C_{i,t-1} \cdot (1 + R_{i,t}) & \text{if buying stock} \\ C_{i,t-1} \cdot (1 - R_{i,t}) & \text{if shorting stock} \\ C_{i,t-1} & \text{otherwise} \end{cases} \tag{5.2}$$

Preliminary Results
Figure 5-2 shows the results of this strategy on Microsoft, Facebook, and Yahoo during this three-month period. The green lines represent the amount of capital using a baseline buy-and-hold strategy, while the blue lines show the results of our sentiment- and volume-based trading strategy. As shown in Figures 5-2a and 5-2b, this
59 strategy performs quite well for Microsoft and Facebook. Even though Microsoft s shares fell after the earnings announcement, our strategy was able to recognize that it should short the stock, leading to an overall profit. However, this strategy does not produce the intended results for Yahoo (figure 5-2c). Yahoo s earnings announcement occurred on April 19, 2016 and the response was more mixed compared to Microsoft and Facebook. Even though Yahoo s Q1 earnings were 11.3 percent lower than they were in first quarter of 2015, Yahoo was still able to beat EPS expectations by $0.01, so its shares rose by 1 percent in after hours trading following the announcement [45]. However, the percentage of tweets expressing joy was still below the average for the previous week, so our strategy would short Yahoo shares instead of buying them. One possible explanation for this inconsistency is that the public generally had negative opinions towards Yahoo as a company, and the earnings announcement drew more attention to Yahoo, prompting even occasional tweeters to express their negative opinions. In addition, AT&T announced its bid for Yahoo on May 25, 2016 causing Yahoo shares to fall by 2.3 percent [17]. Even though the shares fell, the percentage of positive tweets actually increased. Many tweets on this day mentioned both AT&T and Yahoo, so expressions of joy for AT&T may have skewed the results. In addition, since Verizon had also previously made a bid for Yahoo, the increase in competition could also be perceived as good news for Yahoo. Since so many factors can impact stock market movement, it becomes clear that a naive sentiment analysis algorithm alone cannot perform consistently well for more unstable companies. This is another example where focusing on the sentiment of tweets by professional investors who have more knowledge of companies financial situations could potentially result in greater profits. 59
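The decision rule and Equation 5.2 can be summarized in a short backtest loop. The sketch below is a minimal illustration for a single stock, not the code used in this thesis. It assumes the price, tweet-count, and joy-percentage series are plain Python lists already aligned on a common daily index with non-trading days removed, that "the previous week" means the five entries before the signal day, and that the population standard deviation is used; the text does not pin down any of these choices. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence

WINDOW = 5  # rolling window of five trading days (assumed alignment, see above)


def daily_returns(prices: Sequence[float]) -> List[float]:
    """R_t = (p_t - p_{t-1}) / p_{t-1}; the first day has no return, so use 0.0."""
    return [0.0] + [
        (prices[t] - prices[t - 1]) / prices[t - 1] for t in range(1, len(prices))
    ]


def decide(volumes: Sequence[float], joy: Sequence[float], s: int) -> int:
    """Position for the day after signal day s: +1 buy, -1 short, 0 stay out."""
    if s < WINDOW:
        return 0  # not enough history for the rolling statistics
    past_v = volumes[s - WINDOW:s]
    past_j = joy[s - WINDOW:s]
    # Volume spike: at least one standard deviation above the previous week's mean.
    if volumes[s] < mean(past_v) + pstdev(past_v):
        return 0
    j_mean, j_std = mean(past_j), pstdev(past_j)
    if joy[s] >= j_mean + 0.5 * j_std:
        return 1   # unusually joyful: buy at the open, sell at the close
    if joy[s] <= j_mean - 0.5 * j_std:
        return -1  # unusually joyless: short for the day
    return 0


def run_strategy(
    prices: Sequence[float],
    volumes: Sequence[float],
    joy: Sequence[float],
    capital: float = 1.0,
) -> List[float]:
    """Apply Equation 5.2 day by day and return the capital history."""
    returns = daily_returns(prices)
    history = [capital]
    for t in range(1, len(prices)):
        position = decide(volumes, joy, t - 1)  # signal comes from day t - 1
        capital *= 1 + position * returns[t]    # 1 + R, 1 - R, or unchanged
        history.append(capital)
    return history
```

Starting from the $1 allocation, `run_strategy(prices, volumes, joy)[-1]` gives the final capital for one stock; calling it once per NASDAQ-100 component mirrors the per-stock bookkeeping described above.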

[Figure 5-2: Preliminary Trading Strategy Performance for Microsoft, Facebook, and Yahoo. Panels: (a) $MSFT; (b) $FB; (c) $YHOO.]

5.4.2 Reevaluation of Emotion Classifier Performance

We then obtained TAQ millisecond trade data for the hours following the earnings announcements and calculated hourly emotion averages to examine whether the daily emotion percentages could have been skewed by tweets from earlier in the day. Figure 5-3a plots Yahoo's price changes against the percentage of tweets expressing joy out of all non-neutral tweets in each hour on the day of the earnings announcement. The figure shows that, despite the positive earnings announcement, sentiment towards Yahoo still decreased slightly immediately after the announcement.

We then discovered that our emotion classifier is less accurate in the context of earnings announcement tweets. Table 5.4 shows some examples of tweets posted immediately after Microsoft's earnings announcement on April 21. The first four tweets all express disappointment over Microsoft's failure to meet targets, but they are classified as different Ekman emotions with negative connotations. In this case, whether the tweet has a positive or negative sentiment seems to matter more than the specific emotion identified. Therefore, a finer-grained emotion classifier may have no advantage over a polarity-based categorization for earnings announcement tweets, because we group all of the negative emotions together in our analysis. The remaining three tweets also express disappointment but were classified as neutral by Pattern, possibly due to their neutral tone and lack of obviously positive or negative words.
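The hourly measure plotted in Figure 5-3a, the share of joy among non-neutral tweets in each hour, amounts to a simple grouping. This is a minimal sketch under assumptions the text does not state: each tweet is a `(timestamp, label)` pair, neutral tweets carry the label `"none"` (following the classification tables in this section), and hours containing only neutral tweets are omitted. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def hourly_joy_share(tweets: Iterable[Tuple[datetime, str]]) -> Dict[datetime, float]:
    """Fraction of non-neutral tweets labelled "joy", keyed by the start of each hour."""
    counts = defaultdict(lambda: [0, 0])  # hour -> [joy tweets, non-neutral tweets]
    for timestamp, label in tweets:
        if label == "none":
            continue  # neutral tweets are left out of the denominator
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[hour][1] += 1
        if label == "joy":
            counts[hour][0] += 1
    return {hour: joy / total for hour, (joy, total) in sorted(counts.items())}
```

Excluding neutral tweets from the denominator keeps the measure from being diluted by the many fact-only tweets that Pattern labels as neutral.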

Table 5.4: Microsoft Earnings Announcement Classification Examples

Tweet | Emotion | Polarity
Just when the coast was clear. Earnings disaster. Haters taking over. Momentum hit. Yowsa. $msft $v $sbux $goog $spx | anger | 0.1
Microsoft had a lousy quarter, partly because of factors beyond its control $MSFT | fear | -0.5
Microsoft stock belly-flops on earnings miss and weak guidance -now off more than 5% $MSFT More than one third in cash now. | sadness |
The after-hours performance of $GOOG, $MSFT, $V, & $SBUX: indicative of a market ready to roll over? | fear |
Microsoft profit misses estimates, shares fall $MSFT | none | 0.0
MICROSOFT MISSES. It just cratered 4% after earnings: $MSFT | none | 0.0
$MSFT $GOOGL Not only did they miss expectations, they missed soft/manip ones by analysts. 3 consecutive Q's of falling earnings. Ouch! | none | 0.0

Similarly, Table 5.5 shows several misclassified tweets about Yahoo from the hour after the earnings announcement on April 19, 2016. Pattern classified all of them as neutral, even though the first four tweets are positive and the last two are negative. Many of these tweets simply state facts and use abbreviations that are not recognized as words, so traditional sentiment analyzers classify them as neutral. Looking at these tweets about Microsoft and Yahoo, we can see that many tweets expressing disappointment share common words, including forms of "miss" and "fall". The positive tweets about Yahoo also share common words such as "up" and "beats".

Table 5.5: Yahoo Earnings Announcement Classification Errors

Tweet | Emotion | Polarity
$YHOO delivered $390M in Mavens GAAP revenue in Q1, up | none | 0.0
And why not, $YHOO looks like a heck of a buy. Non-GAAP of course. | none | 0.0
Yahoo $YHOO Q EPS $0.08 beats by $0.01, Rev of $1.09B -11.4% Y/Y #investors #Yahoo | none | 0.0
$YHOO Posts a Loss as Revenue Falls | none |
$YHOO 1Q loss of $99.2M, after reporting a profit in same period last year - #CEO #Crisis #Tech | none | 0.0

These examples show that earnings announcement tweets use very specific language, and we can identify the sentiment of a tweet just by checking for the presence of a few keywords. Tweets about companies that exceeded expectations usually include words such as "beat", "up", "buy", and "gain", while tweets about companies that missed expectations include words such as "miss", "negative", "down", and "loss". We now investigate a simple classification scheme that determines the polarity of these tweets by checking for the presence or absence of positive or negative terms. We first stemmed the text of each tweet, then classified a tweet as positive if the processed text contained any positive words, negative if it contained any negative words, and neutral otherwise. Figure 5-3b graphs the percentage of tweets with a positive sentiment in one-hour intervals, for hours with at least 10 tweets containing words specific to earnings announcements. Figure 5-4 shows the percentage of positive tweets for Microsoft. At least 80 percent of Yahoo's tweets containing earnings-announcement terms were positive, while fewer than 50 percent were positive for Microsoft. These results suggest that even a simple keyword-based trading strategy may be effective in the context of earnings announcements.

[Figure 5-3 panels: (a) Yahoo sentiment during earnings announcement on April 19; (b) Yahoo sentiment using keywords during earnings announcement on April 19.]
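The keyword scheme and the hourly aggregation behind Figure 5-3b can be sketched as follows. The thesis does not name its stemmer or give its full keyword lists, so this sketch makes three labeled assumptions: the example words quoted above serve as the complete lists, a crude approximation of the first steps of the Porter stemmer stands in for the real stemmer, and the rule is applied in its literal order (a tweet containing both positive and negative terms counts as positive). Hours with fewer than 10 keyword-bearing tweets are dropped, mirroring the threshold in the text. All names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Example keywords quoted in the text; the thesis's full lists may be longer.
POSITIVE = {"beat", "up", "buy", "gain"}
NEGATIVE = {"miss", "negative", "down", "loss"}


def crude_stem(word: str) -> str:
    """Rough stand-in for Porter steps 1a/1b: strip plural, -ed and -ing endings."""
    if word.endswith("sses") or word.endswith("ies"):
        word = word[:-2]              # misses -> miss, ponies -> poni
    elif word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]              # beats -> beat, but loss stays loss
    for suffix in ("ing", "ed"):
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and re.search(r"[aeiouy]", stem):
            return stem               # missed -> miss, gaining -> gain
    return word


def classify(text: str) -> str:
    """Positive if any positive stem appears, else negative if any negative stem, else neutral."""
    stems = {crude_stem(token) for token in re.findall(r"[a-z]+", text.lower())}
    if stems & POSITIVE:
        return "positive"
    if stems & NEGATIVE:
        return "negative"
    return "neutral"


def hourly_positive_share(
    tweets: Iterable[Tuple[datetime, str]], min_tweets: int = 10
) -> Dict[datetime, float]:
    """Share of keyword-bearing tweets that are positive, for hours with >= min_tweets of them."""
    counts = defaultdict(lambda: [0, 0])  # hour -> [positive, positive + negative]
    for timestamp, text in tweets:
        label = classify(text)
        if label == "neutral":
            continue  # no earnings keywords found
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[hour][1] += 1
        if label == "positive":
            counts[hour][0] += 1
    return {h: pos / n for h, (pos, n) in sorted(counts.items()) if n >= min_tweets}
```

For instance, `classify("Yahoo $YHOO Q EPS $0.08 beats by $0.01")` returns `"positive"`, while `classify("$YHOO Posts a Loss as Revenue Falls")` returns `"negative"`. Swapping `crude_stem` for a full stemmer such as NLTK's `PorterStemmer().stem` would be a natural refinement.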


More information

ECS171: Machine Learning

ECS171: Machine Learning ECS171: Machine Learning Lecture 15: Tree-based Algorithms Cho-Jui Hsieh UC Davis March 7, 2018 Outline Decision Tree Random Forest Gradient Boosted Decision Tree (GBDT) Decision Tree Each node checks

More information

STOCK MARKET FORECASTING USING NEURAL NETWORKS

STOCK MARKET FORECASTING USING NEURAL NETWORKS STOCK MARKET FORECASTING USING NEURAL NETWORKS Lakshmi Annabathuni University of Central Arkansas 400S Donaghey Ave, Apt#7 Conway, AR 72034 (845) 636-3443 lakshmiannabathuni@gmail.com Mark E. McMurtrey,

More information

Evaluating Performance

Evaluating Performance Evaluating Performance Evaluating Performance Choosing investments is just the beginning of your work as an investor. As time goes by, you ll need to monitor the performance of these investments to see

More information

MS&E 448 Final Presentation High Frequency Algorithmic Trading

MS&E 448 Final Presentation High Frequency Algorithmic Trading MS&E 448 Final Presentation High Frequency Algorithmic Trading Francis Choi George Preudhomme Nopphon Siranart Roger Song Daniel Wright Stanford University June 6, 2017 High-Frequency Trading MS&E448 June

More information

arxiv: v1 [cs.cy] 30 Apr 2017

arxiv: v1 [cs.cy] 30 Apr 2017 Tales of Emotion and Stock in China: Volatility, Causality and Prediction Zhenkun Zhou 1, Ke Xu 1 and Jichang Zhao 2, 1 State Key Lab of Software Development Environment, Beihang University 2 School of

More information

How To Prevent Another Financial Crisis On Wall Street

How To Prevent Another Financial Crisis On Wall Street How To Prevent Another Financial Crisis On Wall Street Helin Gao helingao@stanford.edu Qianying Lin qlin1@stanford.edu Kaidi Yan kaidi@stanford.edu Abstract Riskiness of a particular loan can be estimated

More information

Investigating Algorithmic Stock Market Trading using Ensemble Machine Learning Methods

Investigating Algorithmic Stock Market Trading using Ensemble Machine Learning Methods Investigating Algorithmic Stock Market Trading using Ensemble Machine Learning Methods Khaled Sharif University of Jordan * kldsrf@gmail.com Mohammad Abu-Ghazaleh University of Jordan * mohd.ag@live.com

More information

EMPLOYABILITY OF NEURAL NETWORK ALGORITHMS IN PREDICTION OF STOCK MARKET BASED ON SENTIMENT ANALYSIS

EMPLOYABILITY OF NEURAL NETWORK ALGORITHMS IN PREDICTION OF STOCK MARKET BASED ON SENTIMENT ANALYSIS EMPLOYABILITY OF NEURAL NETWORK ALGORITHMS IN PREDICTION OF STOCK MARKET BASED ON SENTIMENT ANALYSIS Pranjal Bajaria Student, Bal Bharti Public School, Dwarka, Delhi ABSTRACT Expansion of verbal technologies

More information

Creating Equity Indices: A Case Exercise

Creating Equity Indices: A Case Exercise Creating Equity Indices: A Case Exercise Judson W. Russell, Ph.D., CFA* Clinical Associate Professor of Finance University of North Carolina Charlotte Department of Finance Charlotte, NC 28223 jrussell@uncc.edu

More information

International Journal of Computer Engineering and Applications, Volume XII, Issue II, Feb. 18, ISSN

International Journal of Computer Engineering and Applications, Volume XII, Issue II, Feb. 18,   ISSN Volume XII, Issue II, Feb. 18, www.ijcea.com ISSN 31-3469 AN INVESTIGATION OF FINANCIAL TIME SERIES PREDICTION USING BACK PROPAGATION NEURAL NETWORKS K. Jayanthi, Dr. K. Suresh 1 Department of Computer

More information

EASY MAKE IT. Behavioral finance pioneer Richard Thaler on how the DC industry can continue to nudge participants and even plan sponsors

EASY MAKE IT. Behavioral finance pioneer Richard Thaler on how the DC industry can continue to nudge participants and even plan sponsors Photography credit: France Leclerc MAKE IT Behavioral finance pioneer Richard Thaler on how the DC industry can continue to nudge participants and even plan sponsors EASY toward better behavior 16 The

More information

Backtesting Performance with a Simple Trading Strategy using Market Orders

Backtesting Performance with a Simple Trading Strategy using Market Orders Backtesting Performance with a Simple Trading Strategy using Market Orders Yuanda Chen Dec, 2016 Abstract In this article we show the backtesting result using LOB data for INTC and MSFT traded on NASDAQ

More information

Automated Options Trading Using Machine Learning

Automated Options Trading Using Machine Learning 1 Automated Options Trading Using Machine Learning Peter Anselmo and Karen Hovsepian and Carlos Ulibarri and Michael Kozloski Department of Management, New Mexico Tech, Socorro, NM 87801, U.S.A. We summarize

More information

Marist College Institute for Public Opinion Poughkeepsie, NY Phone Fax

Marist College Institute for Public Opinion Poughkeepsie, NY Phone Fax Marist College Institute for Public Opinion Poughkeepsie, NY 12601 Phone 845.575.5050 Fax 845.575.5111 www.maristpoll.marist.edu POLL MUST BE SOURCED: MSNBC/Telemundo/Marist Poll* Decision 2016: Clinton

More information

Portfolio Recommendation System Stanford University CS 229 Project Report 2015

Portfolio Recommendation System Stanford University CS 229 Project Report 2015 Portfolio Recommendation System Stanford University CS 229 Project Report 205 Berk Eserol Introduction Machine learning is one of the most important bricks that converges machine to human and beyond. Considering

More information

SPOTLIGHT ON GLOBALIZATION AND DEMOGRAPHICS

SPOTLIGHT ON GLOBALIZATION AND DEMOGRAPHICS SPOTLIGHT ON GLOBALIZATION AND DEMOGRAPHICS TWO MEGAFORCES SURE TO STEER THE ECONOMY FOR YEARS TO COME DECEMBER Ask any political junkie, and they ll tell you that the presidential campaign season has

More information

Stock Price Prediction using Deep Learning

Stock Price Prediction using Deep Learning San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 2018 Stock Price Prediction using Deep Learning Abhinav Tipirisetty San Jose State University

More information

Many students of the Wyckoff method do not associate Wyckoff analysis with futures trading. A Wyckoff Approach To Futures

Many students of the Wyckoff method do not associate Wyckoff analysis with futures trading. A Wyckoff Approach To Futures A Wyckoff Approach To Futures by Craig F. Schroeder The Wyckoff approach, which has been a standard for decades, is as valid for futures as it is for stocks, but even students of the technique appear to

More information

Exploiting Alternative Data in the Investment Process Bringing Semantic Intelligence to Financial Markets

Exploiting Alternative Data in the Investment Process Bringing Semantic Intelligence to Financial Markets Exploiting Alternative Data in the Investment Process Bringing Semantic Intelligence to Financial Markets Data is growing at an incredible speed Source: IDC - 2014, Structured Data vs. Unstructured Data:

More information

BUZ. Powered by Artificial Intelligence. BUZZ US SENTIMENT LEADERS ETF INVESTMENT PRIMER: DECEMBER 2017 NYSE ARCA

BUZ. Powered by Artificial Intelligence. BUZZ US SENTIMENT LEADERS ETF INVESTMENT PRIMER: DECEMBER 2017 NYSE ARCA BUZZ US SENTIMENT LEADERS ETF INVESTMENT PRIMER: DECEMBER 2017 BUZ NYSE ARCA Powered by Artificial Intelligence. www.alpsfunds.com 855.215.1425 Investors have not previously had a way to capitalize on

More information

Stock Trading Following Stock Price Index Movement Classification Using Machine Learning Techniques

Stock Trading Following Stock Price Index Movement Classification Using Machine Learning Techniques Stock Trading Following Stock Price Index Movement Classification Using Machine Learning Techniques 6.1 Introduction Trading in stock market is one of the most popular channels of financial investments.

More information

Instruction (Manual) Document

Instruction (Manual) Document Instruction (Manual) Document This part should be filled by author before your submission. 1. Information about Author Your Surname Your First Name Your Country Your Email Address Your ID on our website

More information

White Paper. Demystifying Analytics. Proven Analytical Techniques and Best Practices for Insurers

White Paper. Demystifying Analytics. Proven Analytical Techniques and Best Practices for Insurers White Paper Demystifying Analytics Proven Analytical Techniques and Best Practices for Insurers Contents Introduction... 1 Data Preparation... 1 Data Warehousing and Analytical Data Tables...1 Binning...1

More information

MONEY IN POLITICS JANUARY 2016

MONEY IN POLITICS JANUARY 2016 JANUARY 2016 JANUARY 2016 PAGE 2 TABLE OF CONTENTS I. INTRODUCTION... 3 METHODOLOGY... 4 II. EXECUTIVE SUMMARY... 5 III. SUMMARY OF RESULTS... 8 IV. DATA TABLES... 27 V. DEMOGRAPHICS... 50 VI. QUESTIONNAIRE...

More information

SUMMARY OF BORROWER SURVEY DATA

SUMMARY OF BORROWER SURVEY DATA SUMMARY OF BORROWER SURVEY DATA STUDENT LOAN BORROWER COUNSELING PROGRAM An Initiative of the Center for Excellence in Financial Counseling Introduction This summary provides results from the pilot test

More information

International Journal of Computer Engineering and Applications, Volume XII, Issue II, Feb. 18, ISSN

International Journal of Computer Engineering and Applications, Volume XII, Issue II, Feb. 18,   ISSN International Journal of Computer Engineering and Applications, Volume XII, Issue II, Feb. 18, www.ijcea.com ISSN 31-3469 AN INVESTIGATION OF FINANCIAL TIME SERIES PREDICTION USING BACK PROPAGATION NEURAL

More information