Predicting Economic Recession using Data Mining Techniques

Authors: Naveed Ahmed, Kartheek Atluri, Tapan Patwardhan, Meghana Viswanath

Abstract

Over the past 50 years the US economy has seen as many as six recessions. Every recession causes significant economic loss, and recovering from one takes considerable planning and effort. This damage could be reduced if we knew that the economy was heading towards a recession, because measures could then be taken beforehand to reduce the economic loss. Identifying the economic changes that lead to a recession is not a trivial problem. Over the years analysts have tried to find patterns of changing economic conditions that cause recessions, and economists have proposed various predictive models of the factors leading up to a recession. We propose using several data mining techniques to build a better prediction model. We analyze a few major leading economic indicators and, using their values during previous recessions, try to predict the occurrence of an economic recession.

1. Introduction

By definition, a recession is a general slowdown in economic activity in a country over a sustained period of time, or a business cycle contraction [1]. Any economy hit by a recession suffers a long period of economic loss and recovery. Predicting a recession would give a chance to reduce the economic loss it causes: preventive measures and careful decisions may save considerable amounts of money. Despite its importance, predicting recessions remains a difficult task with limited success in economic analysis. Researchers have applied various statistical analysis and data mining techniques to the problem, but none of them is foolproof. In this paper we use four data mining techniques to predict economic recession: neural networks, decision trees, linear regression and autoregression.

The economic growth of a country is measured by its Gross Domestic Product (GDP), which is affected by several factors. Some of these factors drive changes in GDP and are known as leading indicators, whereas others change in response to GDP and are known as lagging indicators. The key to predicting a recession is to study how the leading indicators changed in the periods before previous known recessions and to check whether similar changes appear in current data.

Neural networks are widely known for their pattern recognition capability; we use different types of neural networks and analyze which one gives the best result for predicting recession. We also use decision trees, a method for classifying data: we treat recession prediction as the problem of classifying whether or not the next quarter will be in recession. Regression is another popular approach for predicting values; we use multiple linear regression to find the best-fitting model for our data and then apply it to input data to predict recession. Autoregression is widely used for predicting future values of time series data, and we use it to predict future GDP values from previous GDP values. ARMA (autoregression with moving average) is a popular approach for improving the results of autoregression, and we combine moving average with autoregression through an ARIMA model to predict GDP values for future quarters.

The rest of the paper is structured as follows. Section 2 discusses related work in the literature. Section 3 explains the leading economic indicators that we used. Section 4 discusses the use of neural networks and decision trees in predicting recession using Weka. Section 5 discusses the use of neural networks in NeuroShell and its results. Section 6 presents the linear regression approach, and Section 7 discusses autoregression. Section 8 gives our conclusion.

2. Background

Several efforts have been made to predict recessions using data mining techniques. At Monash University in Australia [1], researchers developed a bivariate nonlinear model of output (GDP) and the interest rate spread, and compared its ability to predict recessions with that of linear and nonlinear models of output. They found that a nonlinear model of output and spread gives fewer false warnings of recession than a linear model. A study at Kent State University [2] used a neural network approach to predicting recessions, recursively modeling the relationship between the leading indicators and the probability of a future recession. Its out-of-sample results show that, via the NN model, indicators such as the interest rate spread and the S&P 500 index are useful in predicting US recessions. Furthermore, when the out-of-sample forecasting period was divided into three subperiods, the relevance of the various leading indicators was found to change over time. The out-of-sample results showed that the interest rate spread is the single best indicator for predicting US recessions.

3. Economic Indicators

The following leading economic indicators were taken into account for predicting a recession (sources: [5], [6], [7], [8]).

1. Housing Starts: the total number of private houses that started construction in a particular quarter. (Figure 3.1)
2. Bank PLR: the base lending rate at which banks offer loans to customers. (Figure 3.2)
3. Personal Income: individual per capita personal income. (Figure 3.3)
4. S&P 500: a value-weighted index of 500 large-cap common stocks. (Figure 3.4)
5. Production Index: an index of the production volume of some major products. (Figure 3.5)
6. Consumer Index: an index of the prices of all urban consumer goods. (Figure 3.6)
7. Money Supply: the total amount of money available in an economy at a given point in time. (Figure 3.7)
8. Oil Prices: the price of one gallon of oil, in chained year-2000 US dollars. (Figure 3.8)
9. Tax Revenues: the total value of tax revenues for a particular year. (Figure 3.9)
10. Gross Domestic Product: the GDP measure for a particular quarter. (Figure 3.10)

4. Decision Trees and Neural Network using Weka

Decision Trees

Introduction: A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs and utility. In data mining and machine learning, a decision tree is a predictive model that maps observations about an item to conclusions about its target value. (Source: Wikipedia)

Approach: The decision tree algorithm J48 was used to predict recessions, and the Weka package was used to generate the trees. Three definitions of economic recession were considered for our testing:
1. Negative Gross Domestic Product (GDP) growth for only 1 quarter.
2. Negative GDP growth for any 2 months in a quarter.
3. Negative GDP growth for any 2 consecutive months in a quarter.
For each of the leading economic indicators discussed earlier, we used inputs covering the previous 12 quarters and, separately, the previous 8 quarters. Data from 1961 Quarter 1 to 2006 Quarter 4 was used as the training set, and the periods 1974 Quarter 1 to 1975 Quarter 4 and 2007 Quarter 1 to 2008 Quarter 4 were used as the testing set.

Results for Decision Trees (12-quarter input, negative GDP growth for 2 consecutive months criterion):

inst#    actual  predicted  error  prob(Yes)  prob(No)
1974 1   1:Yes   1:Yes             *1         0
1974 2   1:Yes   1:Yes             *1         0
1974 3   1:Yes   1:Yes             *1         0
1974 4   1:Yes   1:Yes             *1         0
1975 1   2:No    2:No              0          *1
1975 2   2:No    2:No              0          *1
1975 3   2:No    2:No              0          *1
1975 4   2:No    2:No              0          *1
2007 1   2:No    2:No              0          *1
2007 2   2:No    2:No              0          *1
2007 3   2:No    2:No              0          *1
2007 4   2:No    2:No              0          *1
2008 1   2:No    2:No              0.06       *0.938
2008 2   2:No    2:No              0.06       *0.938
2008 3   1:Yes   2:No       +      0.06       *0.938
2008 4   1:Yes   2:No       +      0.06       *0.938
Table 4.1
Correctly Classified Instances: 14 (87.5%)
Incorrectly Classified Instances: 2 (12.5%)

Neural Networks

Introduction: An artificial neural network (ANN) is a mathematical or computational model based on biological neural networks. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. In most cases, an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase. (Source: Wikipedia)

Approach: We used the Weka package to predict recessions, with data from 1961 Quarter 1 to 2006 Quarter 4 as the training set and the periods 2007 Quarter 1 to 2008 Quarter 4 and 1974 Quarter 1 to 1975 Quarter 4 as the testing set. The network used 2 hidden layers, each containing 3 hidden nodes. As with the decision trees, the 3 definitions of economic recession were each tested with 12-quarter and 8-quarter inputs.

Results for Neural Networks (12-quarter input, negative GDP growth for 2 consecutive months criterion):

inst#    actual  predicted  error  prob(Yes)  prob(No)
1974 1   1:Yes   1:Yes             *0.982     0.018
1974 2   1:Yes   1:Yes             *0.984     0.016
1974 3   1:Yes   1:Yes             *0.977     0.023
1974 4   1:Yes   1:Yes             *0.977     0.023
1975 1   2:No    2:No              0.03       *0.97
1975 2   2:No    2:No              0.022      *0.978
1975 3   2:No    2:No              0.017      *0.983
1975 4   2:No    2:No              0.015      *0.985
2007 1   2:No    2:No              0.015      *0.985
2007 2   2:No    2:No              0.014      *0.986
2007 3   2:No    2:No              0.014      *0.986
2007 4   2:No    2:No              0.014      *0.986
2008 1   2:No    2:No              0.014      *0.986
2008 2   2:No    2:No              0.014      *0.986
2008 3   1:Yes   2:No       +      0.014      *0.986
2008 4   1:Yes   2:No       +      0.014      *0.986
Table 4.2
Correctly Classified Instances: 14 (87.5%)
Incorrectly Classified Instances: 2 (12.5%)

5. Neural Networks using NeuroShell

Introduction: A neural network in Weka can only classify each input as recession or no recession; that is, it gives a single categorical output. We therefore used NeuroShell, which can predict numeric values and give more than one output. With NeuroShell we predict the percent change in GDP, based on chained 2000 dollars. The network predicts the GDP change for the present quarter and the next two quarters, and the goal is for it to output two consecutive quarters of negative growth, which indicates that a recession has started. The focus is not on how far the predicted GDP deviates from the actual value, but on whether the predictions show two consecutive quarters of negative growth indicating the start of a recession. The network is also used to predict how long the recession lasts. Various networks were tried, and the results shown are those of the networks that performed best.

Methodology: Along with the nine leading indicators, the GDP change from previous quarters is also given as input to predict the percentage change in GDP for the present quarter and the next two quarters.
To predict how long a recession lasts, another input is added indicating how long it has been since the last recession occurred, and the network outputs how long the current recession will last along with the GDP change for the three quarters. Recession periods are identified using data from the National Bureau of Economic Research (NBER), which is well known for providing the start and end dates of recessions in the United States [3]. The data is divided into training, test and validation sets: the years 1962 to 1966, 1971 to 1986 and 1992 to 2005 form the training set; 1967 to 1970 and 1987 to 1991 form the test set; and 2006 to 2008 form the validation set.
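The input arrangement described above (leading indicators plus past GDP change as inputs, GDP change for the present and next two quarters as outputs) can be sketched as a sliding-window dataset builder. This is a minimal Python sketch under stated assumptions: the helper names `make_windows` and `assign_split` are hypothetical, and the text does not say exactly how NeuroShell laid out the lagged inputs, so the flattening order (each quarter's indicators followed by that quarter's GDP change) is an assumption. The year ranges are the training, test and validation periods listed above.

```python
from typing import List, Sequence, Tuple

# Training / test / validation periods (inclusive year ranges) taken from the text.
SPLITS = {
    "train": [(1962, 1966), (1971, 1986), (1992, 2005)],
    "test": [(1967, 1970), (1987, 1991)],
    "validation": [(2006, 2008)],
}


def assign_split(year: int) -> str:
    """Return the set a year belongs to, or 'unused' if it falls in none."""
    for name, ranges in SPLITS.items():
        if any(lo <= year <= hi for lo, hi in ranges):
            return name
    return "unused"


def make_windows(
    features: Sequence[Sequence[float]],
    gdp_change: Sequence[float],
    lags: int = 12,
    horizon: int = 3,
) -> List[Tuple[List[float], List[float], int]]:
    """Build (inputs, targets, t) samples from quarterly series.

    features[k]   -- leading-indicator values for quarter k (assumed layout)
    gdp_change[k] -- percent change in GDP for quarter k
    inputs        -- indicators and GDP change for quarters t-lags .. t-1, flattened
    targets       -- GDP change for quarters t .. t+horizon-1 (present, next, next-next)
    """
    if len(features) != len(gdp_change):
        raise ValueError("features and gdp_change must cover the same quarters")
    samples = []
    for t in range(lags, len(gdp_change) - horizon + 1):
        inputs: List[float] = []
        for k in range(t - lags, t):
            inputs.extend(features[k])
            inputs.append(gdp_change[k])
        samples.append((inputs, list(gdp_change[t:t + horizon]), t))
    return samples
```

Each sample's quarter index `t` can be mapped to its calendar year and passed to `assign_split` to route it to the training, test or validation set; `lags` would be 8, 12 or 16 to match the input periods compared in this section.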

Setup: Different types of neural networks are used to predict the GDP change: the Ward network, which uses 3 hidden layers in parallel, each with a different activation function; the Elman network, which feeds the output of the hidden layer back as input for the next pattern; and the Jordan network, which feeds the output of the previous pattern back as input for the next pattern. Other simple neural networks with one or more hidden layers were also tried but did not yield good results. Various input time periods (the last eight, twelve and sixteen quarters) are used to train the neural network, and the best time period is chosen. The learning rate is set to 0.3 and the momentum to 0.1; both are arbitrary choices. Training stops when any of the following holds: the error on the training set falls below 0.000002; the number of epochs since the minimum average error exceeds 40,000; the average error on the test set reaches 0.0002; or the number of events since the minimum average error exceeds 200,000. The calibration interval is 500. The number of hidden neurons is calculated as ½ (inputs + outputs) + the square root of the number of patterns in the training set. These settings are used to separate the networks that did well from those that did not. The better-performing networks are then retrained with different learning rates, momentum values, stopping criteria and numbers of hidden nodes. Once the network with the best settings is found (the best network was the one that predicted two consecutive quarters of negative growth exactly at the start of the recession), it is used to predict how long the recession will last. The inputs to the network are the nine leading economic indicators and the GDP change from previous quarters. The outputs are the change in GDP for the present quarter and for the next two consecutive quarters.
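The hidden-neuron formula and the stopping rules above can be written out as a small sketch. The function and field names are hypothetical, the text does not say how a fractional neuron count is rounded (so no rounding is applied), and reading the 200,000-event rule as referring to the test-set error is an assumption about the NeuroShell settings.

```python
import math
from dataclasses import dataclass


def hidden_neurons(n_inputs: int, n_outputs: int, n_patterns: int) -> float:
    """1/2 * (inputs + outputs) + sqrt(number of training patterns).

    Returned unrounded because the text does not say how fractions are handled.
    """
    return 0.5 * (n_inputs + n_outputs) + math.sqrt(n_patterns)


@dataclass
class StopCriteria:
    # Thresholds quoted in the text; the field names are hypothetical.
    min_train_error: float = 0.000002
    max_epochs_since_min_train_error: int = 40_000
    target_test_error: float = 0.0002
    max_events_since_min_test_error: int = 200_000  # assumed to refer to the test set


def should_stop(c: StopCriteria, train_error: float, epochs_since_min: int,
                test_error: float, events_since_min: int) -> bool:
    """Training stops as soon as any one of the four conditions holds."""
    return (
        train_error < c.min_train_error
        or epochs_since_min > c.max_epochs_since_min_train_error
        or test_error <= c.target_test_error
        or events_since_min > c.max_events_since_min_test_error
    )
```

For example, a hypothetical network with 10 inputs, 3 outputs and 100 training patterns would get ½ (10 + 3) + √100 = 16.5 hidden neurons before any rounding.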
The performance criterion is how well the network predicts the start of a recession by giving two consecutive quarters of negative GDP growth. Mean absolute error (MAE) is also collected as a measure, but when predicting the 2008 quarters, the GDP values for the following quarters are taken as 0 because information on those recent quarters is not available; MAE is therefore not an appropriate measure for this problem.

Analysis of Results: The Ward network performed better than any other network in predicting the start of the recession, using the last 12 quarters as inputs. Table 5.1 shows the predicted change in GDP from 2006 quarter 1 to 2008 quarter 4. Ideally, the output should show negative GDP growth for both the present and the next quarter in 2007 Q4, but the network did not give such results. Instead, it predicted negative present-quarter growth for 2007 Q3 and again for 2007 Q4, which still indicates that the recession has started. The next-quarter prediction for 2007 Q3 (1.637) should equal the present-quarter prediction for 2007 Q4 (-0.424), which is not the case in the table. Nevertheless, the network gave some indication of the recession. Using the last 16 quarters, the network predicted negative GDP growth for every quarter in the Present column and positive growth for both of the following quarters. We therefore dropped the 16-quarter input and continued with 12 quarters only. This network setup is then used to predict how long the recession will last, by adding an input giving the time since the last recession and a fourth output giving the duration of the recession. The difficulty with this input and output is representing non-recession periods: for those, the input is simply how long it has been since the last recession and the output is for how many quarters the recession has lasted.
Table 5.2 gives the details of how this network performed. According to its output, the recession started in quarter 1 of 2008, and in quarter 2 of 2008 the duration of the current recession is predicted as 5 quarters. The network therefore predicted that the current recession would end in quarter 3 of 2009.

Different settings of the Ward network were tried in order to predict the start of the recession accurately with the indicator of two consecutive quarters of negative GDP growth. The Ward network with eight quarters as input gave the best result; Table 5.3 shows how it did. The network predicted the start of the recession perfectly with the indicator we are looking for: for quarter 4 of 2007 it predicted negative growth for both the present quarter and the next quarter, which indicates the start of the recession.

All values below are predicted percent changes in GDP.

Period   Present  Next    Next-next  Recession
2006q1    2.128    0.981   2.173     NO
2006q2    0.670    1.236   1.926     NO
2006q3    0.498    1.526   1.110     NO
2006q4    0.207    0.893   0.710     NO
2007q1    0.506    0.213   1.588     NO
2007q2    0.445    0.135   1.654     NO
2007q3   -0.992    1.637   0.254     NO
2007q4   -0.424    1.734   0.563     YES
2008q1    0.200    0.848   2.621     YES
2008q2   -0.359    0.519   1.772     YES
2008q3    0.975   -0.735   2.524     YES
2008q4    1.529    1.028   3.823     YES
Table 5.1

Period   How Long  Present  Next   Next-next  Recession
2006q1   4          2.03    2.42   4.66       NO
2006q2   4          1.47    2.24   3.48       NO
2006q3   4          3.21    0.70   4.00       NO
2006q4   4          2.98    2.56   1.87       NO
2007q1   4          0.12    2.54   5.95       NO
2007q2   4          1.34    2.30   3.75       NO
2007q3   4          2.35    1.24   0.40       NO
2007q4   4          1.89    0.96   0.51       YES
2008q1   4         -1.07    1.65   5.28       YES
2008q2   5         -0.64    1.02   5.48       YES
2008q3   5          0.83    1.42   4.65       YES
2008q4   5          1.55    0.77   6.67       YES
Table 5.2

Period   Present  Next    Next-next  Recession
2006q1    1.55     2.53    2.12      NO
2006q2    0.62     2.07    1.41      NO
2006q3    0.40     1.34    2.41      NO
2006q4    0.57     0.23    1.22      NO
2007q1   -0.43     1.14    2.02      NO
2007q2   -0.98     0.74    2.21      NO
2007q3   -1.42     0.51    2.01      NO
2007q4   -1.06    -0.15    2.94      YES
2008q1    0.58     1.99    3.66      YES
2008q2   -0.13     2.98    3.20      YES
2008q3    1.26     3.26    2.54      YES
2008q4    3.04     1.96    2.95      YES
Table 5.3

Input        Network          How Long  Present  Next   Next-next
12 quarters  Ward             NA        2.520    2.039  2.699
12 quarters  Jordan           NA        3.033    2.251  2.243
12 quarters  Elman            NA        2.886    2.282  2.209
12 quarters  Ward (duration)  1.699     3.126    1.606  3.029
8 quarters   Ward             NA        2.867    2.251  2.741
8 quarters   Jordan           NA        2.904    2.327  2.350
8 quarters   Elman            NA        3.171    2.332  2.376
8 quarters   Ward (duration)  1.635     2.618    3.339  4.153
16 quarters  Ward             NA        3.821    2.224  2.282
Table 5.4: MAE for different networks

Table 5.4 shows the MAE values for some of the best-performing networks. It shows that the Ward network was the best neural network for indicating the start of the recession, and that using input from the previous 12 quarters gave the lowest MAE compared with 8 and 16 quarters. Because of its better performance, the Ward network was chosen to predict the length of the recession.
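The recession-start indicator used throughout this section (predicted GDP growth negative for both the present and the next quarter) and the MAE measure can be checked mechanically. The sketch below is an illustration, not part of the authors' NeuroShell workflow: the function names are hypothetical, and the rows are the three predicted values per period (present, next and following quarter) copied from Tables 5.1 and 5.3.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, float, float, float]  # (period, present, next, next-next)

# Ward network, 12-quarter input (Table 5.1)
TABLE_5_1: List[Row] = [
    ("2006q1", 2.128, 0.981, 2.173), ("2006q2", 0.670, 1.236, 1.926),
    ("2006q3", 0.498, 1.526, 1.110), ("2006q4", 0.207, 0.893, 0.710),
    ("2007q1", 0.506, 0.213, 1.588), ("2007q2", 0.445, 0.135, 1.654),
    ("2007q3", -0.992, 1.637, 0.254), ("2007q4", -0.424, 1.734, 0.563),
    ("2008q1", 0.200, 0.848, 2.621), ("2008q2", -0.359, 0.519, 1.772),
    ("2008q3", 0.975, -0.735, 2.524), ("2008q4", 1.529, 1.028, 3.823),
]

# Ward network, 8-quarter input (Table 5.3)
TABLE_5_3: List[Row] = [
    ("2006q1", 1.55, 2.53, 2.12), ("2006q2", 0.62, 2.07, 1.41),
    ("2006q3", 0.40, 1.34, 2.41), ("2006q4", 0.57, 0.23, 1.22),
    ("2007q1", -0.43, 1.14, 2.02), ("2007q2", -0.98, 0.74, 2.21),
    ("2007q3", -1.42, 0.51, 2.01), ("2007q4", -1.06, -0.15, 2.94),
    ("2008q1", 0.58, 1.99, 3.66), ("2008q2", -0.13, 2.98, 3.20),
    ("2008q3", 1.26, 3.26, 2.54), ("2008q4", 3.04, 1.96, 2.95),
]


def recession_start_signals(rows: Sequence[Row]) -> List[str]:
    """Periods whose predicted present AND next-quarter GDP change are both negative."""
    return [period for period, present, nxt, _ in rows if present < 0 and nxt < 0]


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |actual - predicted| over paired values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Applied to Table 5.3, only 2007q4 is flagged, matching the discussion of the 8-quarter Ward network; applied to Table 5.1, no period is flagged, which is why the 12-quarter network gave only a partial indication.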

6. Linear Regression

Linear regression is a well-known data mining technique widely used for predicting values. Given a random sample $(Y_i, X_{i1}, \ldots, X_{ip})$, $i = 1, \ldots, n$, a linear regression model assumes a possibly imperfect relationship between $Y_i$, the dependent variable, and the independent variables (indicators) $X_{i1}, \ldots, X_{ip}$ [4]. A disturbance term $\varepsilon_i$, itself a random variable, is added to this assumed relationship to capture the influence on $Y_i$ of everything other than $X_{i1}, \ldots, X_{ip}$. Hence the multiple linear regression model takes the form

$$Y_i = b_0 + b_1 X_{i1} + b_2 X_{i2} + \cdots + b_p X_{ip} + \varepsilon_i, \qquad i = 1, \ldots, n \tag{1}$$

Once the values of $b_0, \ldots, b_p$ are found, the dependent variable can be predicted from known values of $X_{i1}, \ldots, X_{ip}$.

Approach: Fig. 6.1 shows our approach for predicting recession using linear regression. We tested the performance of linear regression on different inputs:
1. All economic indicators for the previous 8 quarters.
2. All economic indicators for the previous 12 quarters.
3. All economic indicators for the previous 16 quarters.
The set of independent variables therefore varies with the number of quarters considered for prediction. Also, since the definition of recession is not very precise, we applied the approach to three different criteria:
1. A recession is indicated by a single quarter of negative GDP growth.
2. A recession is indicated by two consecutive quarters of negative GDP growth.
3. A recession is indicated by two quarters of negative GDP growth with at most one positive-growth quarter in between.

Fig. 6.1: Diagram showing the multiple regression approach.

Our dependent variable indicates whether or not the next quarter will be in recession, thus predicting the start of a recession. We used a moving average of the previous 5 values to smooth the input values. Smoothing is necessary because a plot of any economic indicator shows a lot of noise.
This noise arises because the indicator values do not follow a smooth curve. After the input data is smoothed, we fit the linear model used for prediction: the coefficients in equation (1) are found by minimizing the errors $\varepsilon_i$. Once the coefficients are found, we use our test input to predict whether or not the next quarter will be in recession.

Fig. 6.2: Graph showing the smoothing effect on the economic indicator Housing Starts.
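The smoothing and fitting steps can be sketched with NumPy. Several details are assumptions because the text does not specify them: the 5-value moving average is a trailing window that drops the first four quarters, the coefficients of equation (1) are estimated by ordinary least squares (minimizing the sum of squared errors), the recession target is coded 1 for recession and 0 otherwise, and a fitted value of 0.5 or more is read as a predicted recession. The function names are hypothetical.

```python
import numpy as np


def moving_average(values, window=5):
    """Mean of each run of `window` consecutive values (trailing window).

    The first window-1 quarters have no full window and are dropped.
    """
    v = np.asarray(values, dtype=float)
    if v.size < window:
        raise ValueError("series is shorter than the smoothing window")
    return np.convolve(v, np.ones(window) / window, mode="valid")


def fit_linear_model(X, y):
    """Ordinary least-squares estimates of b_0..b_p in equation (1)."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])  # column of 1s for b_0
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict_recession(coef, X, threshold=0.5):
    """Flag the next quarter as recession when the fitted value reaches `threshold`."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:] >= threshold
```

In the setting described above, each row of `X` would hold the smoothed indicator values for the previous 8, 12 or 16 quarters, and `y` the 0/1 recession label of the following quarter under one of the three criteria.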

Analysis of Results: The available data was divided into two parts: training data and testing data. We used data from 1974-1975 and 2006-2008 as test data for linear regression; the remaining data was used to find the coefficients of the linear model.

Year  Quarter  Predicted  Actual
1974  1        Yes        Yes
1974  2        No         Yes
1974  3        No         Yes
1974  4        No         Yes
1975  1        Yes        Yes
1975  2        No         No
1975  3        No         No
1975  4        Yes        No
2006  1        No         No
2006  2        No         No
2006  3        Yes        No
2006  4        No         No
2007  1        No         No
2007  2        No         No
2007  3        No         No
2007  4        No         No
2008  1        No         No
2008  2        No         No
2008  3        No         Yes
2008  4        Yes        Yes
Table 6.1: Results for 12-quarter input with alternate negative-growth quarters as the recession criterion.

Year  Quarter  Predicted  Actual
1974  1        Yes        No
1974  2        No         No
1974  3        No         Yes
1974  4        No         Yes
1975  1        Yes        Yes
1975  2        Yes        No
1975  3        No         No
1975  4        No         No
2006  1        No         No
2006  2        No         No
2006  3        No         No
2006  4        No         No
2007  1        No         No
2007  2        No         No
2007  3        No         No
2007  4        No         No
2008  1        Yes        No
2008  2        No         No
2008  3        No         Yes
2008  4        No         Yes
Table 6.2: Results for 12-quarter input with consecutive negative-growth quarters as the recession criterion.

The data was tested with different numbers of input quarters. Linear regression performed best when the input covered the previous 12 quarters. Input from the previous 16 quarters generated many false negatives, whereas input from the previous 8 quarters gave more false positives. This was expected: over the previous 4 years (16 quarters) of data, the dominant trend was that no recession should occur, while the 8-quarter window was too short and therefore predicted more quarters as recession. When we tested the different recession criteria, the results were most accurate when the criterion was two consecutive negative-growth quarters. This again is expected, because it is a widely followed definition of a recession.

            False -ve  False +ve  True -ve  True +ve  Accuracy
Table 6.1   4          3          12        1         65%
Table 6.2   4          2          11        3         70%
Table 6.3: Accuracy table

7. Autoregression

A variant of the regression technique known as autoregression, which is usually applied to time series data, was used to forecast future GDP values and decide whether or not the next quarter will be in recession. Autoregression predicts future values from some of the immediately preceding values in the time series. The ARIMA (Autoregressive Integrated Moving Average) model was used to perform autoregression on the time-varying GDP values. The ARIMA model has two parts: the autoregressive (AR) part and the moving average (MA) part. It takes three parameters:
a) p: the order of the autoregressive part
b) d: the degree of differencing
c) q: the order of the moving average part
The equation obtained from the model, applied to the series after d rounds of differencing, is

$$X_t = a_1 X_{t-1} + \cdots + a_p X_{t-p} + e_t + b_1 e_{t-1} + \cdots + b_q e_{t-q}$$

where $e_t$ is the error term at time $t$.
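To make the equation concrete, the sketch below computes a one-step point forecast from it. The coefficients used in the example are hypothetical; in practice they come from fitting the model (for instance the ARIMA(8,1,1) model the authors use). Because the new error $e_t$ has mean zero, it drops out of the point forecast. For d = 1 the equation applies to first differences of GDP, so the forecast difference is added back to the last observed level. The function names are hypothetical.

```python
from typing import Sequence


def arma_one_step(x_hist: Sequence[float], e_hist: Sequence[float],
                  a: Sequence[float], b: Sequence[float]) -> float:
    """Point forecast of X_t = a_1 X_{t-1} + ... + a_p X_{t-p} + b_1 e_{t-1} + ... + b_q e_{t-q}.

    x_hist and e_hist are ordered oldest -> newest, so x_hist[-1] is X_{t-1}.
    The new error e_t has mean zero and is omitted from the point forecast.
    """
    if len(x_hist) < len(a) or len(e_hist) < len(b):
        raise ValueError("not enough history for the requested AR/MA orders")
    ar_part = sum(a_i * x_hist[-i] for i, a_i in enumerate(a, start=1))
    ma_part = sum(b_j * e_hist[-j] for j, b_j in enumerate(b, start=1))
    return ar_part + ma_part


def arima_d1_forecast(levels: Sequence[float], e_hist: Sequence[float],
                      a: Sequence[float], b: Sequence[float]) -> float:
    """d = 1: apply the ARMA equation to first differences, then undo the differencing."""
    diffs = [levels[k] - levels[k - 1] for k in range(1, len(levels))]
    return levels[-1] + arma_one_step(diffs, e_hist, a, b)
```

Forecasting several quarters ahead, as done for the next 3 quarters in this section, repeats this step with each forecast appended to the history and its future error set to zero.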

Moving average smoothing is performed on the data to remove any bias due to single values in the data set. The ARIMA(0,1,1) model is equivalent to simple exponential smoothing, which is often used in forecasting; exponential smoothing assigns exponentially decreasing weights to observations as they get older. For the AR part, trials were performed using the previous 4, 8 and 12 quarters of GDP values.

Approach: Using the ARIMA(8,1,1) model, the predicted GDP values for the next 3 quarters were found. If the first of the 3 quarters and one of the next two showed negative growth, that quarter was declared to be in recession. The best results were obtained when the previous 8 quarters of data were used as the order of regression. The results are shown below.

ARIMA(8,1,1)
Yr / Qtr   Q1 pred.   Q2 pred.   Q3 pred.   Result
1974 Q1    4422.56    4457.32    4472.95    No
1974 Q2    4347.93    4338.78    4365.45    Yes
1974 Q3    4338.74    4365.45    4401.6     No
1974 Q4    4324.93    4347.41    4374.48    No
1975 Q1    4302.68    4314.35    4307.23    Yes
1975 Q2    4232.26    4196.36    4194.44    Yes
1975 Q3    4241.52    4256.26    4244.01    Yes
1975 Q4    4399.13    4456.97    4538.42    No
1976 Q1    4455.46    4536.74    4610.28    No
1976 Q2    4587.11    4672.85    4732.48    No
2008 Q1    11653.3    11692.4    11733.8    No
2008 Q2    11682.8    11721.9    11761.6    No
2008 Q3    11780.7    11834.1    11889.0    No
2008 Q4    11744.3    11779.3    11800.3    No
2009 Q1    11480.4    11439.7    11393.1    Yes
Table 7.1

It was observed from the above results that, with a few exceptions, recession was predicted only at the end of the actual recession period, implying that this technique makes its prediction after the fact. Hence, autoregression was found not to be a good technique for predicting recession.

8. Conclusion

Among the four methods employed to predict recession, neural networks and multiple linear regression gave the best results. Neural networks, especially the Ward network in NeuroShell, predicted the start of the recession very accurately and also provided an estimate of how long the recession would last. The autoregression technique proved inadequate for predicting recession, since it gave results only after the actual start of the recession.

9. References

[1] Heather M. Anderson and Farshid Vahid, "Predicting the Probability of a Recession with Nonlinear Autoregressive Leading Indicator Models."
[2] Min Qi, "Predicting US recessions with leading indicators via neural network models," International Journal of Forecasting 17 (2001) 383-401.
[3] http://www.nber.org/cycles.html
[4] http://en.wikipedia.org/wiki/linear_regression
[5] http://www.economagic.com
[6] http://fraser.stlouisfed.org/publications/ei/
[7] http://www.nber.org/data_index.html/
[8] http://www.gpoaccess.gov/indicators/99janbro.html