Agricultural and Applied Economics 637
Applied Econometrics II
Assignment I: Using Search Algorithms to Determine Optimal Parameter Values in Nonlinear Regression Models
(Due: February 3, 2015)

(Note: Make sure you hand in the computer code and output files used in answering the following questions. If your output contains a listing of iterative results, edit the output file so only the first few and last few iterations are shown. There is no need to waste paper.)

(60 Total Points)

1. (40 pts) As you will discover throughout the semester, unlike estimating the parameters of the classical regression model (CRM), finding the optimal parameters of a regression model that is nonlinear in its parameters usually requires an iterative search process. What constitutes "optimal" obviously depends on the objective function used to guide the choice of preferred parameter values. For example, are you trying to find parameter values that minimize the sum of squared differences between predicted and actual dependent variable values (i.e., the sum of squared errors, SSEs), or are you trying to find parameter values that maximize the joint probability of obtaining the endogenous variable values you have in your dataset? Unlike the CRM, a nonlinear regression model requires you to make an initial guess at the parameter values and then check whether these parameter values are indeed optimal. If they are not, the estimation algorithm you are using should have a specific procedure for obtaining new, updated parameter estimates. This parameter updating is an iterative process in which each change to the parameter vector improves the parameters, where improvement is defined relative to the algorithm's objective function.
Over the next few weeks you will be learning alternative methods for conducting the above iterative process of parameter selection. We will undertake this estimation assuming alternative algorithm objective functions. One method we will not be reviewing to any large degree, given its severe limitations as to the number of optimal parameters that can be identified, is what is known as the Grid Search method. Under the Grid Search method you divide the feasible range of parameter values into a finite grid of discrete values and then evaluate the impact of every parameter combination on the defined objective function (e.g., the SSE value, the log-likelihood function (LLF) value, etc.).

a) (10 pts) Let's assume you want to estimate the following relationship between annual per capita U.S. gasoline quantity (not expenditures) (PC_Gas_Qt), lagged consumption (PC_Gas_Qt-1), and a gasoline price index (Gas_Pt):

    PC_Gas_Qt = Gas_Pt^βP × PC_Gas_Qt-1^βQ + εt    (1.1)

where t = 1953–2004, the β's are unknown parameters whose values you are trying to estimate, and εt is the error term for the t-th year, where εt ~ (0, σ²). Note the lag structure depicted in (1.1). Also note that (1.1) cannot be linearized with respect to the parameters given the assumed additive error structure.

I would like you to develop MATLAB code that determines the values of βP and βQ that minimize the SSE from predicting PC_Gas_Qt via the Grid Search method. The code you develop should display (and write to an output file) your estimate of σ² conditional on these final estimates. Remember that under the Grid Search method you use a finite number of pre-defined grid points, and you compare the SSEs across this finite number of candidate parameter combinations. How do you evaluate whether this pair of parameters generates a global versus a local minimum SSE value? In contrast to the CRM, the SSE function for many regression models that are nonlinear in their parameters may not be globally convex.
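To make the Grid Search mechanics concrete, here is a minimal sketch. The assignment itself asks for MATLAB; this illustration is in Python, and the data series, grid ranges, and all variable names are hypothetical stand-ins for the gas_market_1_15.xlsx series, assuming a multiplicative model of the form PC_Gas_Qt = Gas_Pt^βP × PC_Gas_Qt-1^βQ + εt.

```python
import numpy as np

def sse(beta_p, beta_q, gas_p, pc_q_lag, pc_q):
    """SSE for the model PC_Gas_Q_t = Gas_P_t**beta_P * PC_Gas_Q_{t-1}**beta_Q + e_t."""
    resid = pc_q - gas_p**beta_p * pc_q_lag**beta_q
    return float(resid @ resid)

def grid_search(gas_p, pc_q_lag, pc_q, p_grid, q_grid):
    """Evaluate the SSE at every (beta_P, beta_Q) grid point; keep the best pair."""
    best = (None, None, np.inf)
    for bp in p_grid:
        for bq in q_grid:
            s = sse(bp, bq, gas_p, pc_q_lag, pc_q)
            if s < best[2]:
                best = (bp, bq, s)
    return best

# Hypothetical series standing in for the gas_market_1_15.xlsx data.
rng = np.random.default_rng(0)
gas_p = rng.uniform(0.5, 2.0, 51)
pc_q_lag = rng.uniform(0.5, 2.0, 51)
pc_q = gas_p**-0.3 * pc_q_lag**0.8 + rng.normal(0.0, 0.01, 51)

bp, bq, s = grid_search(gas_p, pc_q_lag, pc_q,
                        np.linspace(-1.0, 1.0, 41), np.linspace(-1.0, 1.0, 41))
sigma2_hat = s / (len(pc_q) - 2)  # estimate of sigma^2 conditional on the grid estimates
print(bp, bq, s, sigma2_hat)
```

Note that the best the grid can do is land on the grid point nearest the true minimum; a finer grid raises the cost quadratically in the number of parameters, which is the severe limitation noted above.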
For the first 10 and last 10 iterations, have your software print out:
i. the iteration number;
ii. the current parameter pair values; and
iii. the resulting SSE values.
In what direction should the SSE values move as more iterations are undertaken, assuming you develop a correctly working estimation system and use reasonable starting values? Does the behavior of your iterations follow this pattern?

The data you will be using for this question is a dataset containing annual total U.S. gasoline expenditures and other aggregate U.S. data encompassing the period 1953–2004. These data are contained in the file gas_market_1_15.xlsx, which can be obtained from the class website. You will have to create some of the variables used in (1.1), given that you only have the raw market data on your desk.

b) (15 pts) To further refine your parameter estimates, I would like you to modify the code you developed in 1(a) so that, once you obtain your parameter estimates via the Grid Search method, you use these values as starting points for a more refined General Search algorithm. Under the General Search algorithm you take the optimal Grid Search parameter values and then examine relative SSE values within the neighborhood of these optimal values. This refined General Search algorithm for a single parameter (ρ) can be illustrated via the diagram shown to the right. In words, we can describe this iterative General Search algorithm as follows:
i. Use the Grid Search estimates of βP and βQ as starting values;
ii. Given the above estimate of βQ, compare the current SSE value with the SSE values obtained under the scenario of βP being slightly
larger and slightly smaller than the above Grid Search parameter estimate;
iii. Of the two (i.e., larger and smaller) new candidate values of βP, identify the one that generates a smaller SSE than the Grid Search parameter value used in (i), and take it as your new updated βP estimate;
iv. Continue to change your βP value in the same direction as identified in (iii), adjusting the value of βP used in (iii) until the SSE starts to increase;
v. When the SSE starts to increase, reverse the direction of the change in the parameter value and continue generating new estimates of βP until the SSE starts to increase again. In this iteration, make the absolute value of the change in parameter value smaller than that used in (iv) (or the previous iteration);
vi. Repeat step (v) until you feel you are close enough to the true but unknown value of βP conditional on the fixed value of βQ;
vii. Given the βP value obtained in (vi), undertake the same iterative process starting with step (ii), but instead of changing βP, change βQ;
viii. Repeat the iterative process, cycling between βP and βQ, each time holding the other parameter at its most recent value, etc.

In developing this new General Search algorithm you will need to address several questions:
i. What magnitude of parameter steps (changes) should I use to move from one parameter value to another? (Note: the parameter step is the absolute value of the change in a parameter's value from one iteration to the next.) Specifically, how do I determine a parameter-specific step length, given that the parameters can vary significantly in size?
ii. What new step size should I use whenever I reverse the search direction?
iii. What criteria do I use to determine whether my current parameter estimates are close enough to the true but unknown optimal parameter values?
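The steps above can be sketched generically. This Python sketch (the assignment itself asks for MATLAB) treats the objective as a black box; the relative initial step, the halving factor on each reversal, and the stopping tolerance are illustrative answers to questions (i), (ii), and (iii), not the required ones. Because it loops over a parameter vector of arbitrary length, the same code also covers the dynamically sized model asked for in 1(c).

```python
def coordinate_search(obj, theta, rel_step=0.1, shrink=0.5, tol=1e-8, max_sweeps=200):
    """Minimize obj(theta) one coordinate at a time.

    For each parameter: step in the improving direction until the objective
    rises, then reverse direction with a halved step, and stop that coordinate
    once the step size falls below tol."""
    theta = list(theta)
    f = obj(theta)
    for _ in range(max_sweeps):
        f_start = f
        for i in range(len(theta)):
            # Step scaled to the parameter's magnitude (question i).
            step = rel_step * max(abs(theta[i]), 1.0)
            while abs(step) > tol:             # per-coordinate stopping rule (question iii)
                trial = theta.copy()
                trial[i] = theta[i] + step
                f_trial = obj(trial)
                if f_trial < f:
                    theta, f = trial, f_trial  # keep moving in this direction
                else:
                    step *= -shrink            # reverse and shrink the step (question ii)
        if abs(f_start - f) <= tol * (1.0 + abs(f)):
            break                              # a full sweep produced no real improvement
    return theta, f

# Usage on a simple two-parameter bowl (hypothetical objective, minimum at (1.5, -0.7)).
theta_hat, f_min = coordinate_search(lambda b: (b[0] - 1.5)**2 + (b[1] + 0.7)**2,
                                     [0.0, 0.0])
print(theta_hat, f_min)
```

In the assignment the objective would be the SSE of model (1.1) or (1.2) evaluated on the gasoline data, with the Grid Search estimates as the starting vector.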
Present a summary, in words, of the algorithm you developed. How did you address the issues raised in (i) through (iii) above? Similar to 1(a), your program should present your first five iterations as well as the last five, with the final estimated parameters, SSE, and your final estimate of σ² being generated from the final updated parameter vector. What are your results? How many iterations did it take before you could say you had obtained parameter values that minimize the SSE function?

c) (15 pts) Finally, let's extend the methodologies you developed in sections (a) and (b) above to estimate the following:

    PC_Gas_Qt = Gas_Pt^βP × PC_Gas_Qt-1^βQ × NC_Pt^βN + εt    (1.2)

where NC_Pt is the price of new cars. We now have four parameters to be estimated: βP, βQ, βN, and σ². Present your final estimates of these parameters. Your program should be designed to handle any number of parameters without changes to the iteration code: the model size should be determined dynamically, so the same code can estimate optimal parameter vectors regardless of their size. What were your starting parameter, SSE, and σ² values? What are your final parameter, SSE, and σ² values? What was your convergence criterion? How many iterations did it take to generate your final parameter estimates?

2. (20 pts) When attempting to determine optimal parameter values in (1) we did not make any assumption concerning the shape of the distribution of the error term, εt, other than that E(εt) = 0 and its variance (i.e., σ²) is homoscedastic and non-autocorrelated. Another method that can be used to obtain parameter estimates is to make an assumption concerning the data generating process of our observed dependent variable, PC_Gas_Qt (and therefore εt). Once an assumption is made concerning the dependent variable's probability distribution, one can choose as the preferred parameter values those that maximize the joint probability of observing our T dependent variable values, PC_Gas_Qt (t = 1,…,T).
The typical assumption is that the dependent variable values are independently and identically distributed
(i.e., iid). This implies that, given the relationships represented in (1.1), we have via the Markov theorem:

    f(y1, y2, …, yT) = p(y1) p(y2|y1) ⋯ p(yT|yT-1) = p(ε1) p(ε2) ⋯ p(εT)    (2.1)

where the yt's are our dependent variable values (t = 1,…,T). Let's assume that our error terms, the εt's, are iid normally distributed. This implies that our dependent variable (PC_Gas_Qt) is also normally distributed. Incorporating this additional information, we can restate (1.1) as the following:

    PC_Gas_Qt = Gas_Pt^βP × PC_Gas_Qt-1^βQ + εt    (2.2)

where for this application we have εt ~ N(0, σ²). Given the above normality assumption, we can represent the natural logarithm of the joint PDF of the T observations of our dependent variable via the following:

    ln f(y1,…,yT | β̂) = −0.5 Σ(t=1 to T) [ln(2π) + ln(σ̂²) + ε̂t²/σ̂²]    (2.3)

where β̂ is the current estimate of β. Given that we are treating our data as given, our objective is to choose the values of βP and βQ that maximize the logarithm of the joint probability (i.e., eq. 2.3) of observing the data we actually have. We can derive what is referred to as the data sample's log-likelihood function, L(β̂), where:

    L(β̂ | y1,…,yT) = ln f(y1,…,yT | β̂) = −0.5 Σ(t=1 to T) [ln(2π) + ln(σ̂²) + ε̂t²/σ̂²]    (2.4)

(Hint: Remember the formula for an unbiased estimate of σ² under the CRM and how one identifies the error term vector given current parameter estimates.) A depiction of a log-likelihood function for a single parameter, θ, is shown in the figure to the right, which also displays the general search procedure for identifying the optimal parameter value.
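As one way to picture the evaluation of the log-likelihood, the following Python sketch (in place of the MATLAB the assignment asks for) computes a sample log-likelihood of this form for a multiplicative model with normal errors, plugging in a CRM-style unbiased variance estimate SSE/(T − k) for σ̂². The data are hypothetical, and this plug-in choice is one reasonable reading of the hint, not the required answer.

```python
import math
import numpy as np

def loglik(beta, X, y):
    """Sample log-likelihood for y_t = prod_j X[t, j]**beta[j] + e_t, e_t ~ N(0, sigma^2).

    sigma^2 is replaced by the unbiased-style estimate SSE / (T - k)
    computed from the residuals at the current parameter vector."""
    resid = y - np.prod(X**np.asarray(beta), axis=1)
    T, k = X.shape
    sigma2 = float(resid @ resid) / (T - k)
    return -0.5 * sum(math.log(2 * math.pi) + math.log(sigma2) + e * e / sigma2
                      for e in resid)

# Hypothetical data; the log-likelihood should be higher near the true
# parameters (-0.3, 0.8) than at an arbitrary point such as (0, 0).
rng = np.random.default_rng(2)
X = rng.uniform(0.5, 2.0, (52, 2))
y = np.prod(X**np.array([-0.3, 0.8]), axis=1) + rng.normal(0.0, 0.05, 52)
print(loglik([-0.3, 0.8], X, y), loglik([0.0, 0.0], X, y))
```

A general search routine would then maximize this function (or equivalently minimize its negative) over the parameter vector.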
a) (15 pts) Modify the General Search method you developed for 1(a) to instead estimate the two parameters using the maximum likelihood approach. Your task is to find the values of these two parameters that maximize the value of (2.4). The following figure depicts a similar problem, but with a different log-likelihood function and two parameters, Beta_1 and Beta_2. Besides the final maximum likelihood parameter estimates, you should also display the final total sample log-likelihood function value.

b) (5 pts) Generate a graph similar to the above for parameter values surrounding the final parameter estimates. Graphically identify the optimal values of βP and βQ.

Extra Credit (Due Feb. 10, 2015): The above questions have been devoted to using search methods to obtain estimates of a limited number of unknown regression parameters. Obviously, to examine the properties of these point estimates we need to know the distribution of the estimates. I would like you to propose a method by which you can obtain parameter estimate standard errors. There is no one method to do this. Develop MATLAB code to implement your proposed algorithm and apply it to the regression model estimated in question #2 above. Modify your output to include these standard errors in a final results table.
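For the extra credit, one candidate method (among several; bootstrapping is another) is to approximate the Hessian of the log-likelihood at the final estimates by finite differences and take standard errors from the inverse of its negative. The Python sketch below is generic and illustrative: the step size h and the one-parameter check at the end are assumptions, and any log-likelihood function of a parameter vector can be passed in.

```python
import numpy as np

def hessian_std_errors(loglik, theta, h=1e-4):
    """Standard errors from the inverted negative Hessian of the log-likelihood,
    with the Hessian built by central finite differences at theta."""
    theta = np.asarray(theta, dtype=float)
    k = len(theta)
    H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k); ei[i] = h
            ej = np.zeros(k); ej[j] = h
            H[i, j] = (loglik(theta + ei + ej) - loglik(theta + ei - ej)
                       - loglik(theta - ei + ej) + loglik(theta - ei - ej)) / (4 * h * h)
    cov = np.linalg.inv(-H)  # asymptotic covariance matrix of the estimates
    return np.sqrt(np.diag(cov))

# Check on a known case: a normal log-likelihood in its mean, with n
# observations of known variance v, has analytic s.e. sqrt(v / n) = 0.4 here.
n, v = 25, 4.0
ll = lambda th: -0.5 * n * (th[0] - 1.0)**2 / v
print(hessian_std_errors(ll, [1.0]))
```

Applied to question #2, loglik would be the sample log-likelihood from (2.4) and theta the final maximum likelihood estimates.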