UNDERSTANDING ML/DL MODELS USING INTERACTIVE VISUALIZATION TECHNIQUES

Chakri Cherukuri
Senior Researcher, Quantitative Financial Research Group

OUTLINE
- Introduction
- Applied machine learning in finance
- Case studies
  - Twitter sentiment analysis
  - Learning option prices using deep learning tools
  - Yield curve dimensionality reduction (PCA vs. autoencoder)
- Conclusion

APPLIED MACHINE LEARNING IN FINANCE

STRUCTURED DATA SETS

Task                     | Features                        | Labels                | Machine learning technique
Time series prediction   | Past returns, market conditions | Future returns        | LSTM
Illiquid asset pricing   | Asset characteristics           | Market price          | Boosted trees/random forests
Trading strategies       | Market conditions               | Strategy to invest in | Boosted trees/random forests
Dimensionality reduction | Yield curve                     | Yield curve           | PCA/autoencoder
Exotic option pricing    | Deal/market parameters          | Price                 | Neural nets

UNSTRUCTURED DATA SETS

Task                                                             | Deep learning model
Object detection from satellite images                           | Conv nets
Abstractive summarization of news articles for quick consumption | RNN, attention-based models
News/Twitter sentiment for stocks, commodities, etc.             | NLP models (word embeddings + nets)
Entity embeddings for companies, news, documents                 | LSTM/RNN

TWITTER SENTIMENT ANALYSIS

NEWS/TWITTER SENTIMENT
- News and social sentiment is derived from the raw news story or tweet
  - Unstructured
  - Highly time-sensitive
- Story-level sentiment
- Company-level sentiment
- The sentiment score can be used as a trading signal

RUSSELL 2000 STOCKS

TWITTER SENTIMENT CLASSIFICATION
- Problem statement: predict the sentiment (negative, neutral, positive) of a tweet about a company
- Example: "$CTIC Rated strong buy by three WS analysts. Increased target from $5 to $8." Label: positive
- Three-way classification problem
  - Input: raw tweets
  - Output: sentiment label {negative, neutral, positive}

METHODOLOGY
- We are given labeled training and test data sets
- Train a classifier on the training data set
- Predict labels on the test data set and evaluate performance

ONE-VS-REST LOGISTIC REGRESSION
- Train three binary classifiers, one for each label
  - Model 1: Negative vs. Not Negative
  - Model 2: Neutral vs. Not Neutral
  - Model 3: Positive vs. Not Positive
- Get probabilities (measures of confidence) for each label
- Output the label associated with the highest probability
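The one-vs-rest scheme can be sketched with scikit-learn, which the deck lists among its libraries. This is a minimal illustration, not the talk's actual pipeline: the toy tweets, the TF-IDF features, and the `max_iter` setting are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; the real labeled data set is not part of the slides.
train_tweets = [
    "rated strong buy by analysts, target raised",
    "great earnings beat, shares surge",
    "company misses estimates, stock plunges",
    "downgraded to sell after weak guidance",
    "annual meeting scheduled for next week",
    "company files routine quarterly report",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# One binary logistic regression per label (label vs. not label),
# fitted on TF-IDF features (an assumed feature choice).
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(train_tweets, train_labels)

test_tweets = ["analysts raise target after strong earnings", "stock plunges on weak guidance"]

# One row per tweet, one column per label; each row is normalized to sum to 1.
probs = model.predict_proba(test_tweets)
classes = model.classes_

# Output the label with the highest probability.
predicted = classes[probs.argmax(axis=1)]
for tweet, label, row in zip(test_tweets, predicted, probs):
    print(f"{label:>8}  {row.round(2)}  {tweet}")
```

The columns of `probs` follow the order of `classes`, so the same array can feed both the confusion matrix and any plot of the three probabilities.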

CLASSIFIER PERFORMANCE ANALYSIS
- Look at misclassifications
  - Confusion matrix
- Understand model-predicted probabilities
  - Triangle visualization
- Fix data issues
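A confusion matrix for the three-way problem can be built in a few lines of plain Python. The sketch below uses made-up true and predicted labels; the helper names `confusion_matrix` and `misclassified` are illustrative and not taken from the talk.

```python
from collections import Counter

LABELS = ["negative", "neutral", "positive"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def misclassified(y_true, y_pred):
    """Indices of examples whose predicted label differs from the true one."""
    return [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]


# Hypothetical labels, for illustration only.
y_true = ["positive", "neutral", "negative", "positive", "neutral", "negative"]
y_pred = ["positive", "positive", "negative", "neutral", "neutral", "negative"]

cm = confusion_matrix(y_true, y_pred)
for label, row in zip(LABELS, cm):
    print(f"{label:>8}: {row}")
print("misclassified indices:", misclassified(y_true, y_pred))
```

Off-diagonal cells point at the tweets worth reading one by one, which is the drill-down the dashboard makes interactive.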

TRIANGLE VISUALIZATION
- The model returns 3 probabilities (which sum to 1)
- How can we visualize these 3 numbers?
- As points inside an equilateral triangle
[Figure: triangle with annotated regions "Negative / Neutral", "Not sure", "Very positive"]
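Three probabilities that sum to 1 are barycentric coordinates, so each prediction maps to one point inside a triangle whose corners stand for the three labels. The sketch below shows that mapping; which corner holds which label is an assumption here, since the slides do not specify the layout.

```python
import math

# Assumed vertex layout; the slides do not say which corner holds which label.
VERTICES = {
    "negative": (0.0, 0.0),
    "positive": (1.0, 0.0),
    "neutral": (0.5, math.sqrt(3) / 2),
}


def to_triangle(probs):
    """Map {'negative': p, 'neutral': p, 'positive': p} to an (x, y) point."""
    if not math.isclose(sum(probs.values()), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Weighted average of the corners, weighted by the probabilities.
    x = sum(p * VERTICES[label][0] for label, p in probs.items())
    y = sum(p * VERTICES[label][1] for label, p in probs.items())
    return x, y


# A confident prediction lands on a corner, an unsure one near the centre.
confident = to_triangle({"negative": 0.0, "neutral": 0.0, "positive": 1.0})
unsure = to_triangle({"negative": 1 / 3, "neutral": 1 / 3, "positive": 1 / 3})
print(confident, unsure)
```

Points near an edge are cases where the model hesitates between two labels, and points near the centre are cases where it is unsure about all three.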

PERFORMANCE ANALYSIS DASHBOARD
Use the dashboard to:
- Analyze misclassifications (using the confusion matrix)
- Improve the model by adding more features (by looking at model coefficients)
- Fix data issues (using the triangle and lasso)

ANALYZE MISCLASSIFICATIONS

USE LASSO TO FIND DATA ISSUES

DEEP LEARNING TOOLS

NEURAL NETWORK WIZARD
- Graphical tool to build, train and diagnose deep learning models
- Real-time plots during the training process:
  - Loss/accuracy curves
  - Distributions of weights/biases/activations at each layer
- Diagnostic plots:
  - Analysis of residuals (for regression) / confusion matrix (for classification)
  - Partial dependencies
  - Conditional residual plots/histograms

NETWORK PARAMETERS

NETWORK ARCHITECTURE

LOSS AND ACCURACY CURVES

DISTRIBUTIONS OF WEIGHTS/BIASES/ACTIVATIONS

PARTIAL AND CONDITIONAL DEPENDENCIES

Training dataset:

S        | T      | sigma  | moneyness
103.1720 | 1.0001 | 0.2970 | 1.1086
106.8025 | 1.9337 | 0.0059 | 1.5291
73.6899  | 1.0049 | 0.3483 | 1.2806
96.9050  | 0.4798 | 0.2530 | 1.6489
129.9036 | 1.6109 | 0.0286 | 0.3932
70.6674  | 0.6879 | 0.5089 | 1.6949
126.6076 | 1.1710 | 0.4051 | 1.3195
95.6398  | 0.0855 | 0.4133 | 0.4837
114.1751 | 1.4486 | 0.2888 | 0.3599
79.0308  | 0.5609 | 0.4857 | 1.1420
127.8912 | 0.3830 | 0.4798 | 0.3025

Conditioned on S = 70, S = 80 and S = 120: the S column is set to the conditioning value in every row; T, sigma and moneyness are unchanged. For S = 70:

S  | T      | sigma  | moneyness
70 | 1.0001 | 0.2970 | 1.1086
70 | 1.9337 | 0.0059 | 1.5291
70 | 1.0049 | 0.3483 | 1.2806
70 | 0.4798 | 0.2530 | 1.6489
70 | 1.6109 | 0.0286 | 0.3932
70 | 0.6879 | 0.5089 | 1.6949
70 | 1.1710 | 0.4051 | 1.3195
70 | 0.0855 | 0.4133 | 0.4837
70 | 1.4486 | 0.2888 | 0.3599
70 | 0.5609 | 0.4857 | 1.1420
70 | 0.3830 | 0.4798 | 0.3025

Conditioned on T = 1, T = 0.5 and T = 2: the T column is set to the conditioning value in every row; S, sigma and moneyness are unchanged. For T = 1:

S        | T | sigma  | moneyness
103.1720 | 1 | 0.2970 | 1.1086
106.8025 | 1 | 0.0059 | 1.5291
73.6899  | 1 | 0.3483 | 1.2806
96.9050  | 1 | 0.2530 | 1.6489
129.9036 | 1 | 0.0286 | 0.3932
70.6674  | 1 | 0.5089 | 1.6949
126.6076 | 1 | 0.4051 | 1.3195
95.6398  | 1 | 0.4133 | 0.4837
114.1751 | 1 | 0.2888 | 0.3599
79.0308  | 1 | 0.4857 | 1.1420
127.8912 | 1 | 0.4798 | 0.3025
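The conditioned data sets above are copies of the training data with one column overwritten by a constant. Averaging a model's predictions over such a copy gives one point of a partial dependence curve. The sketch below shows that construction with NumPy on the first four training rows; `toy_model` is a hypothetical linear stand-in, not the trained network from the talk and not an option pricing formula.

```python
import numpy as np

# First four rows of the training table; columns: S, T, sigma, moneyness.
X = np.array([
    [103.1720, 1.0001, 0.2970, 1.1086],
    [106.8025, 1.9337, 0.0059, 1.5291],
    [73.6899, 1.0049, 0.3483, 1.2806],
    [96.9050, 0.4798, 0.2530, 1.6489],
])
S_COLUMN = 0


def condition_on(X, column, value):
    """Return a copy of X with one column set to a constant."""
    X_cond = X.copy()
    X_cond[:, column] = value
    return X_cond


def partial_dependence(model, X, column, grid):
    """Average model output over the data set for each conditioning value."""
    return np.array([model(condition_on(X, column, v)).mean() for v in grid])


def toy_model(X):
    """Hypothetical stand-in for a trained model (assumption, linear in S, T, sigma)."""
    return 0.1 * X[:, 0] + 2.0 * X[:, 1] + 5.0 * X[:, 2]


grid = [70.0, 80.0, 120.0]
pd_curve = partial_dependence(toy_model, X, S_COLUMN, grid)
for s, value in zip(grid, pd_curve):
    print(f"S = {s:6.1f}  average prediction = {value:.4f}")
```

Because the stand-in is linear in S, the curve is a straight line with slope 0.1; a trained network would generally give a nonlinear curve, which is what makes the plot informative.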

DIAGNOSTIC PLOTS

YIELD CURVE DIMENSIONALITY REDUCTION

YIELD CURVE PRIMER
- Bonds have a fixed maturity (e.g. 1M, 3M, 10Y) and pay coupons
- Examples of bonds: Treasury bonds, corporate bonds, municipal bonds, etc.
- Yield curve: plot of bond yields against maturities
- Adjacent points on the yield curve move together (they are correlated)

U.S. TREASURY YIELD CURVE
- 11 tenors/maturities
- Typically upward sloping
- Different shapes
  - Pre-crisis
  - Post-crisis
  - Current

YIELD CURVE DYNAMICS
- The yield for each tenor (point on the yield curve) changes every day
- Problem: how do we model changes in the yield curve, which are driven by 11 correlated variables?
- Is a parsimonious representation possible?

PRINCIPAL COMPONENT ANALYSIS (PCA)
- PCA can be used to:
  - Reduce dimensionality
  - Retain as much variance in the data set as possible
- Typically the first few (3-5) PCA factors are enough to explain almost all the variance
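PCA on an 11-tenor curve can be sketched with a singular value decomposition in NumPy. The data below is synthetic, generated from three made-up factors plus small noise so that a few components dominate by construction; it is not Treasury data, and the `pca` helper is written for this example only.

```python
import numpy as np


def pca(X, n_components):
    """PCA via SVD of the centred data matrix (rows = days, columns = tenors)."""
    X_centred = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X_centred, full_matrices=False)
    explained_ratio = s**2 / np.sum(s**2)   # share of total variance per component
    components = Vt[:n_components]          # orthonormal directions in tenor space
    scores = X_centred @ components.T       # low-dimensional representation
    return components, scores, explained_ratio[:n_components]


# Synthetic daily yield changes (assumption): three factors plus noise.
rng = np.random.default_rng(0)
n_days, n_tenors = 500, 11
tenor_grid = np.linspace(0.0, 1.0, n_tenors)
loadings = np.vstack([
    np.ones(n_tenors),            # parallel move
    tenor_grid - 0.5,             # tilt
    (tenor_grid - 0.5) ** 2,      # bend
])
factors = rng.normal(size=(n_days, 3)) * np.array([1.0, 0.4, 0.2])
changes = factors @ loadings + 0.01 * rng.normal(size=(n_days, n_tenors))

components, scores, explained = pca(changes, n_components=3)
print("explained variance ratios:", explained.round(4))
print("cumulative:", explained.sum().round(4))
```

Running the same function on different date ranges of real data is one way to see how the factors change across time periods.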

PCA OVER DIFFERENT TIME PERIODS
- PCA factors vary with time periods
- Interval selector
  - Quickly select different time intervals
  - Compute statistics on the selected time slices (using callbacks)

YIELD CURVE PCA: CRISIS

YIELD CURVE PCA: AFTER CRISIS

YIELD CURVE PCA: CURRENT

DIMENSION REDUCTION: AUTOENCODERS
[Figure: autoencoder network diagram; labels: tanh, relu, linear, "Compressed feature vector"]
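An autoencoder compresses each 11-point curve into a short feature vector and then reconstructs the curve from it. The sketch below is a deliberately small version in plain NumPy: one tanh encoder layer, one linear decoder layer, trained by gradient descent on synthetic data. The layer sizes, learning rate, and data are assumptions, and the relu layer named on the slide is omitted; the talk's own models would be built with TensorFlow/Keras.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_tenors, code_dim = 300, 11, 3

# Synthetic stand-in for standardized yield curve changes (assumption).
latent = rng.normal(size=(n_days, code_dim))
X = latent @ rng.normal(size=(code_dim, n_tenors))
X += 0.05 * rng.normal(size=(n_days, n_tenors))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Encoder (tanh) and decoder (linear) parameters, small random initialization.
W1 = 0.1 * rng.normal(size=(n_tenors, code_dim))
b1 = np.zeros(code_dim)
W2 = 0.1 * rng.normal(size=(code_dim, n_tenors))
b2 = np.zeros(n_tenors)


def encode(X):
    """Compressed feature vector for each row of X."""
    return np.tanh(X @ W1 + b1)


def decode(H):
    """Linear reconstruction of the curve from the compressed vector."""
    return H @ W2 + b2


def reconstruction_loss(X):
    return np.mean((decode(encode(X)) - X) ** 2)


initial_loss = reconstruction_loss(X)
learning_rate = 0.01
for _ in range(5000):
    H = encode(X)
    error = decode(H) - X
    d_out = 2.0 * error / error.size             # gradient of the mean squared error
    grad_W2 = H.T @ d_out
    grad_b2 = d_out.sum(axis=0)
    d_hidden = (d_out @ W2.T) * (1.0 - H**2)     # back through tanh
    grad_W1 = X.T @ d_hidden
    grad_b1 = d_hidden.sum(axis=0)
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

final_loss = reconstruction_loss(X)
codes = encode(X)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

With a purely linear encoder and decoder, the best such model spans the same subspace as PCA; the nonlinear activation is what lets an autoencoder differ from it.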

PCA VS AUTOENCODER
[Figure: panels titled "PCA" and "Autoencoder"]

DIMENSION REDUCTION: AE VS PCA

CONCLUSION
- Abundance of financial data
- Abundance of already existing models/techniques
- ML/DL techniques provide new ways of modeling financial data
- Interactive visualization tools help us better understand and interpret these models

RESOURCES
- Widget libraries used to build the applications:
  - ipywidgets: https://github.com/jupyter-widgets/ipywidgets
  - bqplot: https://github.com/bloomberg/bqplot
  - (and other custom widgets)
- ML/DL libraries:
  - scikit-learn: http://scikit-learn.org
  - tensorflow: https://www.tensorflow.org
  - keras: https://keras.io
- Tech at Bloomberg: www.techatbloomberg.com