UNDERSTANDING ML/DL MODELS USING INTERACTIVE VISUALIZATION TECHNIQUES Chakri Cherukuri Senior Researcher Quantitative Financial Research Group 1
OUTLINE Introduction Applied machine learning in finance Case studies Twitter Sentiment Analysis Learning option prices using deep learning tools Yield Curve dimensionality reduction (PCA vs Autoencoder) Conclusion 2
APPLIED MACHINE LEARNING IN FINANCE 3
STRUCTURED DATA SETS
Task | Features | Labels | Machine Learning Technique
Time series prediction | Past returns, market conditions | Future returns | LSTM
Illiquid asset pricing | Asset characteristics | Market price | Boosted Trees/Random Forests
Trading strategies | Market conditions | Strategy to invest in | Boosted Trees/Random Forests
Dimensionality reduction | Yield curve | Yield curve | PCA/Autoencoder
Exotic option pricing | Deal/market parameters | Price | Neural nets
4
UNSTRUCTURED DATA SETS
Task | Deep Learning Model
Object detection from satellite images | Conv nets
Abstractive summarization of news articles for quick consumption | RNN, attention-based models
News/Twitter sentiment for stocks, commodities, etc. | NLP models (word embeddings + nets)
Entity embeddings for companies, news, documents | LSTM/RNN
5
TWITTER SENTIMENT ANALYSIS 6
NEWS/TWITTER SENTIMENT News & social sentiment from the raw news story or tweet Unstructured Highly time-sensitive Story-level sentiment Company-level sentiment Sentiment score can be used as a trading signal 7
RUSSELL 2000 STOCKS 8
TWITTER SENTIMENT CLASSIFICATION Problem statement: Predict the sentiment (negative, neutral, positive) of a tweet about a company Ex: "$CTIC Rated strong buy by three WS analysts. Increased target from $5 to $8." : Positive Three-way classification problem Input: raw tweets Output: sentiment label {negative, neutral, positive} 9
METHODOLOGY We are given labeled train and test data sets Train classifier on training data set Predict labels on test data and evaluate performance 10
ONE-VS-REST LOGISTIC REGRESSION Train three binary classifiers, one per label Model 1: Negative vs. Not Negative Model 2: Neutral vs. Not Neutral Model 3: Positive vs. Not Positive Get probabilities (measures of confidence) for each label Output the label associated with the highest probability 11
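A minimal sketch of the one-vs-rest scheme on this slide, written in plain Python so it runs without extra libraries. The toy tweets, the binary bag-of-words features, the gradient-descent settings, and all function names are assumptions made for illustration; the talk does not say how its features or classifier were built. Rescaling the three binary scores so they sum to 1 is also an assumption, chosen to match the slide's "3 probabilities (which sum to 1)".

```python
import math

CLASSES = ["negative", "neutral", "positive"]

def tokenize(text):
    return text.lower().split()

def featurize(text, vocab):
    # Binary bag-of-words vector with a leading constant term for the bias.
    tokens = set(tokenize(text))
    return [1.0] + [1.0 if word in tokens else 0.0 for word in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_binary(X, y, epochs=300, lr=0.5):
    # Batch gradient descent on the logistic loss for one "label vs. rest" model.
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(a * b for a, b in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def train_one_vs_rest(X, labels, classes):
    # One binary classifier per label: target is 1 for that label, 0 otherwise.
    return {c: train_binary(X, [1.0 if l == c else 0.0 for l in labels])
            for c in classes}

def predict_proba(models, x):
    raw = {c: sigmoid(sum(a * b for a, b in zip(w, x))) for c, w in models.items()}
    total = sum(raw.values())
    return {c: p / total for c, p in raw.items()}  # rescale so the scores sum to 1

def predict(models, x):
    proba = predict_proba(models, x)
    return max(proba, key=proba.get)  # label with the highest probability

# Hypothetical labeled tweets (not from the talk's data set).
train = [
    ("rated strong buy by analysts", "positive"),
    ("target price increased strong quarter", "positive"),
    ("shares plunge after weak earnings", "negative"),
    ("downgrade to sell on weak guidance", "negative"),
    ("company to report earnings on tuesday", "neutral"),
    ("annual meeting scheduled for tuesday", "neutral"),
]
vocab = sorted({w for text, _ in train for w in tokenize(text)})
X = [featurize(text, vocab) for text, _ in train]
labels = [label for _, label in train]

models = train_one_vs_rest(X, labels, CLASSES)
new_x = featurize("analysts rated it a strong buy", vocab)
proba = predict_proba(models, new_x)
label = predict(models, new_x)
```

In practice a library implementation such as scikit-learn (listed under Resources) would replace the hand-written training loop; the structure of three binary models and an arg-max over their probabilities stays the same.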
CLASSIFIER PERFORMANCE ANALYSIS Look at misclassifications Confusion matrix Understand model predicted probabilities Triangle visualization Fix data issues 12
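The confusion matrix mentioned above can be sketched as follows. The label lists are made-up examples, and the function name and the row/column convention (rows are true labels, columns are predicted labels) are assumptions for illustration.

```python
from collections import Counter

LABELS = ["negative", "neutral", "positive"]

def confusion_matrix(y_true, y_pred, labels):
    # Entry [i][j] counts items whose true label is labels[i]
    # and whose predicted label is labels[j].
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

# Hypothetical true and predicted labels for seven tweets.
y_true = ["negative", "negative", "neutral", "neutral",
          "positive", "positive", "positive"]
y_pred = ["negative", "neutral", "neutral", "neutral",
          "positive", "negative", "positive"]

cm = confusion_matrix(y_true, y_pred, LABELS)

# Off-diagonal cells are the misclassifications; keep their indices for inspection.
misclassified = [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]
accuracy = sum(cm[i][i] for i in range(len(LABELS))) / len(y_true)
```

Each off-diagonal cell points to a specific kind of error (for example, positive tweets predicted as negative), which is the set of examples worth reading one by one.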
TRIANGLE VISUALIZATION Model returns 3 probabilities (which sum to 1) How can we visualize these 3 numbers? As points inside an equilateral triangle Figure annotations: Negative / Neutral, Not sure, Very positive 13
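One way to place three probabilities inside an equilateral triangle is to treat them as barycentric weights on the triangle's corners. The sketch below assumes a particular corner layout (negative bottom-left, neutral bottom-right, positive top); the slide does not specify the layout, and the names are illustrative.

```python
import math

# Assumed corner positions of an equilateral triangle with unit side length.
VERTICES = {
    "negative": (0.0, 0.0),
    "neutral": (1.0, 0.0),
    "positive": (0.5, math.sqrt(3) / 2),
}

def to_triangle(p_negative, p_neutral, p_positive):
    """Map three probabilities that sum to 1 to an (x, y) point in the triangle."""
    weights = {"negative": p_negative, "neutral": p_neutral, "positive": p_positive}
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Barycentric combination: each corner pulls the point in proportion to its weight.
    x = sum(weights[k] * VERTICES[k][0] for k in VERTICES)
    y = sum(weights[k] * VERTICES[k][1] for k in VERTICES)
    return x, y

very_positive = to_triangle(0.0, 0.0, 1.0)        # lands on the positive corner
not_sure = to_triangle(1 / 3, 1 / 3, 1 / 3)       # lands at the centre
negative_or_neutral = to_triangle(0.5, 0.5, 0.0)  # midpoint of the bottom edge
```

Under this mapping a confident prediction sits near a corner, a tie between two labels sits on the edge joining them, and a prediction the model is unsure about sits near the centre.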
PERFORMANCE ANALYSIS DASHBOARD Use the dashboard to: Analyze misclassifications (using confusion matrix) Improve model by adding more features (by looking at model coefficients) Fix data issues (using triangle and lasso) 14
ANALYZE MISCLASSIFICATIONS 15
USE LASSO TO FIND DATA ISSUES 18
DEEP LEARNING TOOLS 20
NEURAL NETWORK WIZARD Graphical tool to build, train and diagnose deep learning models Real time plots during the training process: Loss/Accuracy curves Distributions of weights/biases/activations at each layer Diagnostic plots: Analysis of residuals (for regression) / Confusion matrix (for classification) Partial dependencies Conditional residual plots/histograms 21
NETWORK PARAMETERS 22
NETWORK ARCHITECTURE 23
LOSS AND ACCURACY CURVES 24
DISTRIBUTIONS OF WEIGHTS/BIASES/ACTIVATIONS 25
PARTIAL AND CONDITIONAL DEPENDENCIES
Training dataset:
S | T | sigma | moneyness
103.1720 | 1.0001 | 0.2970 | 1.1086
106.8025 | 1.9337 | 0.0059 | 1.5291
73.6899 | 1.0049 | 0.3483 | 1.2806
96.9050 | 0.4798 | 0.2530 | 1.6489
129.9036 | 1.6109 | 0.0286 | 0.3932
70.6674 | 0.6879 | 0.5089 | 1.6949
126.6076 | 1.1710 | 0.4051 | 1.3195
95.6398 | 0.0855 | 0.4133 | 0.4837
114.1751 | 1.4486 | 0.2888 | 0.3599
79.0308 | 0.5609 | 0.4857 | 1.1420
127.8912 | 0.3830 | 0.4798 | 0.3025
Conditioned on S=70, S=80, S=120: the same rows with every S value replaced by 70, 80 or 120 respectively; T, sigma and moneyness are unchanged.
Conditioned on T=1, T=.5, T=2: the same rows with every T value replaced by 1, .5 or 2 respectively; S, sigma and moneyness are unchanged. 26
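The conditioned datasets on this slide can be built by overwriting one column with a constant and leaving the others as they are. The sketch below uses the first three rows of the training dataset; `toy_model` is a hypothetical stand-in for the trained network (it is not a pricing formula), and the function names are assumptions.

```python
# First three rows of the training dataset shown on the slide.
rows = [
    {"S": 103.1720, "T": 1.0001, "sigma": 0.2970, "moneyness": 1.1086},
    {"S": 106.8025, "T": 1.9337, "sigma": 0.0059, "moneyness": 1.5291},
    {"S": 73.6899, "T": 1.0049, "sigma": 0.3483, "moneyness": 1.2806},
]

def condition_on(rows, column, value):
    # Copy every row, replacing one column with a fixed value.
    return [{**row, column: value} for row in rows]

def partial_dependence(model, rows, column, grid):
    # Average model output over the conditioned dataset, for each grid value.
    return [
        (value, sum(model(r) for r in condition_on(rows, column, value)) / len(rows))
        for value in grid
    ]

def toy_model(row):
    # Hypothetical placeholder for the trained model's prediction.
    return 0.1 * row["S"] + row["T"]

rows_s70 = condition_on(rows, "S", 70)
pd_curve = partial_dependence(toy_model, rows, "S", [70, 80, 120])
```

Plotting the averaged predictions against the grid values gives a partial dependence curve; plotting the individual predictions or residuals on each conditioned dataset gives the conditional views.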
DIAGNOSTIC PLOTS 27
YIELD CURVE DIMENSIONALITY REDUCTION 28
YIELD CURVE PRIMER Bonds have a fixed maturity (e.g. 1M, 3M, 10Y) and pay coupons Examples of bonds: Treasury bonds, corporates, munis, etc. Yield Curve: Plot of bond yields against maturities Adjacent points on the yield curve move together (correlated) 29
U.S. TREASURY YIELD CURVE 11 tenors/maturities Typically upward sloping Different shapes Pre-crisis Post-crisis Current 30
YIELD CURVE DYNAMICS Yield for each tenor (point on the yield curve) changes every day Problem: How to model the changes in the yield curve driven by 11 correlated variables? Any parsimonious representation possible? 31
PRINCIPAL COMPONENT ANALYSIS (PCA) PCA can be used to: Reduce dimensionality Retain as much variance in the dataset as possible Typically the first few (3-5) PCA factors are enough to explain almost all the variance 32
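A small sketch of the "explained variance" idea, using power iteration on the covariance matrix so that it runs in plain Python. The data are synthetic: 500 simulated daily changes across 11 tenors, driven by an assumed level factor, a smaller slope factor, and a little noise. These are not Treasury yields, and the factor sizes are invented for illustration.

```python
import random

def covariance(rows):
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenpairs(cov, k, iters=200):
    # Power iteration with deflation: find the largest eigenpair, remove it, repeat.
    d = len(cov)
    cov = [row[:] for row in cov]
    pairs = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(d)]  # fixed, non-degenerate start vector
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        pairs.append((lam, v))
        for i in range(d):
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
    return pairs

# Synthetic yield-curve changes (assumption): level + slope + noise across 11 tenors.
rng = random.Random(0)
changes = []
for _ in range(500):
    level, slope = rng.gauss(0, 1), rng.gauss(0, 0.3)
    changes.append([level + slope * (i - 5) / 5 + rng.gauss(0, 0.01)
                    for i in range(11)])

cov = covariance(changes)
total_variance = sum(cov[i][i] for i in range(11))
pairs = top_eigenpairs(cov, k=2)
explained = [lam / total_variance for lam, _ in pairs]
```

Here `explained[0]` is the share of total variance captured by the first factor and `explained[0] + explained[1]` the share captured by the first two. A library routine such as scikit-learn's PCA would normally be used instead of hand-written power iteration.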
PCA OVER DIFFERENT TIME PERIODS PCA factors vary with time periods Interval Selector Quickly select different time intervals Perform stats on the selected time slices (using callbacks) 33
YIELD CURVE PCA: CRISIS 34
YIELD CURVE PCA: AFTER CRISIS 35
YIELD CURVE PCA: CURRENT 36
DIMENSION REDUCTION: AUTOENCODERS Figure labels: layer activations tanh, relu, linear; compressed feature vector 38
PCA VS AUTOENCODER PCA Autoencoder 39
DIMENSION REDUCTION: AE VS PCA 40
CONCLUSION Abundance of financial data Abundance of already existing models/techniques ML/DL techniques provide new ways of modeling financial data Interactive visualization tools help us better understand and interpret these models 41
RESOURCES
Widget libraries used to build the applications:
ipywidgets: https://github.com/jupyter-widgets/ipywidgets
bqplot: https://github.com/bloomberg/bqplot (and other custom widgets)
ML/DL libraries:
scikit-learn: http://scikit-learn.org
tensorflow: https://www.tensorflow.org
keras: https://keras.io
Tech at Bloomberg: www.techatbloomberg.com
42