Econ 582: Nonlinear Regression
Eric Zivot
June 3, 2013
Nonlinear Regression

In the linear regression model
$$ y_i = x_i'\beta + \varepsilon_i, \quad E[\varepsilon_i \mid x_i] = 0, $$
so that
$$ E[y_i \mid x_i = x] = x'\beta, $$
it is assumed that the regression function $m(x) = E[y_i \mid x_i = x] = x'\beta$ is a linear (in $x$) function of the $k \times 1$ vector $\beta$.

In parametric nonlinear regression, the regression function $m(x, \theta)$ is a nonlinear function of the parameters $\theta$:
$$ y_i = m(x_i, \theta) + \varepsilon_i, \quad E[\varepsilon_i \mid x_i = x] = 0, $$
so that
$$ E[y_i \mid x_i = x] = m(x, \theta). $$
Examples of nonlinear regression functions:

$$ m(x,\theta) = \theta_1 + \theta_2\,\frac{x}{1+\theta_3 x} \quad (k=1,\ p=3) $$
$$ m(x,\theta) = \theta_1 + \theta_2\, x^{\theta_3} \quad (k=1,\ p=3) $$
$$ m(x,\theta) = \theta_1 + \theta_2 \exp(\theta_3 x) \quad (k=1,\ p=3) $$
$$ m(x,\theta) = \theta_1 + \theta_2 x + \theta_3 (x - \theta_4)\,1(x > \theta_4) $$
$$ m(x,\theta) = G(x'\theta), \quad G \text{ known} $$
$$ m(x,\theta) = \theta_1' x_1 + (\theta_2' x_1)\,\Phi\!\left(\frac{x_2 - \theta_3}{\theta_4}\right) $$
$$ m(x,\theta) = (\theta_1' x_1)\,1(x_2 \le \theta_3) + (\theta_2' x_1)\,1(x_2 > \theta_3) $$
Remarks

- Typically $m(x,\theta)$ is a continuous and differentiable function of $\theta$.
- In the switching examples with the indicator function, $m(x,\theta)$ is not differentiable in $\theta$.
- The form of $m(x,\theta)$ is sometimes motivated by economic theory (e.g., cost function estimation).
- Sometimes the form of $m(x,\theta)$ is adopted as a flexible approximation to an unknown regression function (e.g., $m(x,\theta)$ = polynomial in $x$).
Nonlinear Least Squares Estimation

$$ y_i = m(x_i, \theta) + \varepsilon_i, \quad i = 1, \ldots, n $$

Example (Cobb-Douglas production function):
$$ y_i = \theta_1\, x_{1i}^{\theta_2}\, x_{2i}^{\theta_3} + \varepsilon_i, \quad k = 2,\ p = 3 $$

The nonlinear least squares (NLS) estimator $\hat\theta$ solves
$$ \min_\theta\ S(\theta) = \sum_{i=1}^n \left(y_i - m(x_i,\theta)\right)^2 = \sum_{i=1}^n \varepsilon_i^2 $$
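As a concrete illustration, NLS for the Cobb-Douglas model can be run with an off-the-shelf optimizer. The sketch below is not from the notes: the simulated data, parameter values, and noise level are my own choices, and `scipy.optimize.least_squares` (a trust-region/Levenberg-Marquardt-type solver, related to but not identical to Gauss-Newton) stands in for the minimizer.

```python
import numpy as np
from scipy.optimize import least_squares

# Simulate data from y_i = theta1 * x1i^theta2 * x2i^theta3 + eps_i
rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(1.0, 5.0, n)
x2 = rng.uniform(1.0, 5.0, n)
theta_true = np.array([2.0, 0.6, 0.3])          # illustrative values
y = theta_true[0] * x1**theta_true[1] * x2**theta_true[2] \
    + 0.1 * rng.standard_normal(n)

def resid(theta):
    # residuals y_i - m(x_i, theta) for the Cobb-Douglas model
    return y - theta[0] * x1**theta[1] * x2**theta[2]

# least_squares minimizes sum(resid(theta)**2) over theta
fit = least_squares(resid, x0=np.array([1.0, 0.5, 0.5]))
theta_hat = fit.x
```

With this much data and little noise, `theta_hat` should be close to `theta_true`.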
Assume $m(x,\theta)$ is a continuous and differentiable function of $\theta$. The FOCs for a minimum are
$$ \frac{\partial S(\hat\theta)}{\partial \theta} = -2 \sum_{i=1}^n \left(y_i - m(x_i,\hat\theta)\right) m_\theta(x_i,\hat\theta) = 0 $$
where
$$ \underset{(p\times 1)}{m_\theta(x,\theta)} = \frac{\partial m(x,\theta)}{\partial \theta} $$
Matrix Notation

$$ \underset{(n\times 1)}{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad
\underset{(n\times 1)}{m(X,\theta)} = \begin{pmatrix} m(x_1,\theta) \\ \vdots \\ m(x_n,\theta) \end{pmatrix}, \quad
X = \begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix} $$

$$ S(\theta) = \left(y - m(X,\theta)\right)'\left(y - m(X,\theta)\right) $$

$$ \underset{(n\times p)}{m_\theta(X,\theta)} = \frac{\partial m(X,\theta)}{\partial \theta'} =
\begin{pmatrix}
\frac{\partial m(x_1,\theta)}{\partial \theta_1} & \cdots & \frac{\partial m(x_1,\theta)}{\partial \theta_p} \\
\vdots & \ddots & \vdots \\
\frac{\partial m(x_n,\theta)}{\partial \theta_1} & \cdots & \frac{\partial m(x_n,\theta)}{\partial \theta_p}
\end{pmatrix} $$
FOCs:
$$ \frac{\partial S(\hat\theta)}{\partial \theta} = -2\, m_\theta(X,\hat\theta)'\left(y - m(X,\hat\theta)\right) = -2\, m_\theta(X,\hat\theta)'\hat\varepsilon = 0 $$

Note: In general we have $p$ nonlinear equations in $p$ unknowns, and there is no analytical solution. Hence, $\hat\theta$ must be found numerically using an iterative algorithm. The most commonly used algorithm is Gauss-Newton iteration.
Gauss-Newton (GN) Algorithm

The GN algorithm can be motivated as follows. Consider a first-order Taylor series approximation to $m(X,\theta)$ at $\theta = \theta_1$ (the starting value):
$$ \underset{(n\times 1)}{m(X,\theta)} = \underset{(n\times 1)}{m(X,\theta_1)} + \underset{(n\times p)}{m_\theta(X,\theta_1)}\,\underset{(p\times 1)}{(\theta - \theta_1)} + \text{error} $$

Approximate the nonlinear regression using the Taylor series approximation:
$$ y = m(X,\theta) + \varepsilon \approx m(X,\theta_1) + m_\theta(X,\theta_1)(\theta - \theta_1) + \varepsilon $$
Using
$$ m(X,\theta_1) + m_\theta(X,\theta_1)(\theta - \theta_1) = \left[m(X,\theta_1) - m_\theta(X,\theta_1)\theta_1\right] + m_\theta(X,\theta_1)\theta, $$
rewrite the model as
$$ y - m(X,\theta_1) + m_\theta(X,\theta_1)\theta_1 = m_\theta(X,\theta_1)\theta + \varepsilon $$
or
$$ \bar y(\theta_1) = m_\theta(X,\theta_1)\theta + \varepsilon, \quad \bar y(\theta_1) = y - m(X,\theta_1) + m_\theta(X,\theta_1)\theta_1. $$
This approximate model is linear in $\theta$.
Estimate the approximate linear model by least squares:
$$ \min_\theta\ S_1(\theta) = \left(\bar y(\theta_1) - m_\theta(X,\theta_1)\theta\right)'\left(\bar y(\theta_1) - m_\theta(X,\theta_1)\theta\right) $$
$$ \theta_2 = \left(m_\theta(X,\theta_1)'\, m_\theta(X,\theta_1)\right)^{-1} m_\theta(X,\theta_1)'\, \bar y(\theta_1) $$

Then repeat the estimation of the approximate linear model using the updated estimate $\theta_2$:
$$ \min_\theta\ S_2(\theta) = \left(\bar y(\theta_2) - m_\theta(X,\theta_2)\theta\right)'\left(\bar y(\theta_2) - m_\theta(X,\theta_2)\theta\right) $$
$$ \theta_3 = \left(m_\theta(X,\theta_2)'\, m_\theta(X,\theta_2)\right)^{-1} m_\theta(X,\theta_2)'\, \bar y(\theta_2) $$
At iteration $j$ we have
$$ \theta_{j+1} = \left(m_\theta(X,\theta_j)'\, m_\theta(X,\theta_j)\right)^{-1} m_\theta(X,\theta_j)'\, \bar y(\theta_j). $$
Substituting in $\bar y(\theta_j) = y - m(X,\theta_j) + m_\theta(X,\theta_j)\theta_j$ and writing $m_\theta = m_\theta(X,\theta_j)$, we have
$$
\begin{aligned}
\theta_{j+1} &= \left(m_\theta' m_\theta\right)^{-1} m_\theta' \left[y - m(X,\theta_j) + m_\theta\,\theta_j\right] \\
&= \left(m_\theta' m_\theta\right)^{-1} m_\theta' \left[y - m(X,\theta_j)\right] + \left(m_\theta' m_\theta\right)^{-1} m_\theta' m_\theta\,\theta_j \\
&= \theta_j + \left(m_\theta' m_\theta\right)^{-1} m_\theta' \left[y - m(X,\theta_j)\right]
\end{aligned}
$$
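The iteration translates directly into code. A minimal sketch follows; the model choice $m(x,\theta) = \theta_1 x^{\theta_2}$ and all simulated data are illustrative assumptions, not taken from the notes.

```python
import numpy as np

def gauss_newton(y, m, m_theta, theta0, tol=1e-8, max_iter=100):
    """Plain Gauss-Newton: theta_{j+1} = theta_j + (J'J)^{-1} J'(y - m(theta_j)),
    where J = m_theta(theta_j) is the n x p Jacobian of the regression function."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        r = y - m(theta)                            # current residual vector
        J = m_theta(theta)                          # n x p Jacobian at theta_j
        step = np.linalg.solve(J.T @ J, J.T @ r)    # GN update direction
        theta = theta + step
        if np.linalg.norm(step) < tol:
            break
    return theta

# Illustrative model: m(x, theta) = theta_1 * x**theta_2 (hypothetical choice)
rng = np.random.default_rng(1)
x = rng.uniform(0.5, 2.0, 300)
theta_true = np.array([2.0, 0.7])
y = theta_true[0] * x**theta_true[1] + 0.05 * rng.standard_normal(300)

m = lambda th: th[0] * x**th[1]
# Jacobian columns: dm/dtheta1 = x^theta2, dm/dtheta2 = theta1 * x^theta2 * ln(x)
m_theta = lambda th: np.column_stack([x**th[1], th[0] * x**th[1] * np.log(x)])

theta_hat = gauss_newton(y, m, m_theta, np.array([1.0, 0.5]))
```

Note that `np.linalg.solve(J.T @ J, ...)` computes the update without explicitly inverting $J'J$, which is the standard numerically safer choice.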
Note: Using
$$ \frac{\partial S(\theta_j)}{\partial \theta} = -2\, m_\theta(X,\theta_j)'\left[y - m(X,\theta_j)\right] $$
we have
$$ \theta_{j+1} = \theta_j - \frac{1}{2}\left(m_\theta(X,\theta_j)'\, m_\theta(X,\theta_j)\right)^{-1} \frac{\partial S(\theta_j)}{\partial \theta}, $$
provided that $m_\theta(X,\theta_j)'\, m_\theta(X,\theta_j)$ is positive definite, i.e., $m_\theta(X,\theta_j)$ is full rank.

Then the FOCs are satisfied if $\theta_{j+1} \approx \theta_j$; that is,
$$ \frac{\partial S(\theta_j)}{\partial \theta} = -2\, m_\theta(X,\theta_j)'\left[y - m(X,\theta_j)\right] \approx 0. $$
Common Convergence Criteria

- Stop when $\|\theta_{j+1} - \theta_j\| < 10^{-6}$, where $\|x\| = \left(x_1^2 + \cdots + x_p^2\right)^{1/2}$.
- To avoid issues with the units of $\theta$, it is better to stop when
$$ \frac{\|\theta_{j+1} - \theta_j\|}{\|\theta_j\| + 10^{-5}} < \epsilon. $$
- Stop when $\left\|\dfrac{\partial S(\theta_j)}{\partial \theta}\right\| < \epsilon$.
- Stop when
$$ \frac{\left|S(\theta_{j+1}) - S(\theta_j)\right|}{S(\theta_j) + \epsilon} < \epsilon. $$
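These criteria can be packaged as a single check. In the sketch below the $10^{-6}$ and $10^{-5}$ thresholds come from the slides, while the combination rule (stop as soon as any criterion fires) is my own assumption:

```python
import numpy as np

def converged(theta_new, theta_old, S_new, S_old, grad,
              tol=1e-6, eps=1e-5):
    """Return True if any of the common stopping rules fires."""
    abs_change = np.linalg.norm(theta_new - theta_old)           # ||theta_{j+1} - theta_j||
    rel_change = abs_change / (np.linalg.norm(theta_old) + eps)  # unit-free version
    grad_norm = np.linalg.norm(grad)                             # ||dS/dtheta||
    obj_change = abs(S_new - S_old) / (abs(S_old) + eps)         # relative change in S
    return bool(abs_change < tol or rel_change < tol
                or grad_norm < tol or obj_change < tol)
```

In practice one criterion (often the unit-free parameter change) is chosen as primary, with the others used as sanity checks.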
Remarks

- The solution to the FOCs can be a local minimum, a local maximum, or the global minimum.
- The GN iteration scheme always moves in the direction of a minimum rather than a maximum, provided $m_\theta(X,\theta_j)'\, m_\theta(X,\theta_j)$ is positive definite:
$$ \theta_{j+1} = \theta_j + \left(m_\theta(X,\theta_j)'\, m_\theta(X,\theta_j)\right)^{-1} m_\theta(X,\theta_j)'\left[y - m(X,\theta_j)\right]
= \theta_j - \frac{1}{2}\left(m_\theta(X,\theta_j)'\, m_\theta(X,\theta_j)\right)^{-1} \frac{\partial S(\theta_j)}{\partial \theta}. $$
- (Scalar intuition) If $\partial S(\theta_j)/\partial \theta > 0$ then $\theta_{j+1} < \theta_j$; if $\partial S(\theta_j)/\partial \theta < 0$ then $\theta_{j+1} > \theta_j$.
- The GN iteration scheme can overshoot the global minimum. To guard against this, a step-length correction $\alpha_j$ is often added to the algorithm:
$$ \theta_{j+1} = \theta_j - \frac{\alpha_j}{2}\left(m_\theta(X,\theta_j)'\, m_\theta(X,\theta_j)\right)^{-1} \frac{\partial S(\theta_j)}{\partial \theta}, $$
where $\alpha_j$ is chosen such that $S(\theta_{j+1}) < S(\theta_j)$.
- To guard against getting stuck at a local minimum, it is often suggested that several different starting values be used.
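One simple way to choose $\alpha_j$ is backtracking (step halving): start at $\alpha = 1$ and halve until the objective decreases. The sketch below uses halving as an assumed rule (the requirement is only that $S(\theta_{j+1}) < S(\theta_j)$), and the exponential test model and data are illustrative choices, not from the notes.

```python
import numpy as np

def gn_with_step_halving(y, m, m_theta, theta0, tol=1e-8, max_iter=200):
    """Gauss-Newton with a step-length correction alpha_j chosen by
    backtracking so that S(theta_{j+1}) < S(theta_j)."""
    S = lambda th: np.sum((y - m(th)) ** 2)        # NLS objective
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        r = y - m(theta)
        J = m_theta(theta)
        step = np.linalg.solve(J.T @ J, J.T @ r)   # full GN step
        alpha = 1.0
        while S(theta + alpha * step) >= S(theta) and alpha > 1e-10:
            alpha *= 0.5                           # halve until S decreases
        theta_new = theta + alpha * step
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

# Illustrative use with m(x, theta) = theta_1 * exp(theta_2 * x) (hypothetical model)
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 200)
y = 1.5 * np.exp(0.8 * x) + 0.05 * rng.standard_normal(200)
m = lambda th: th[0] * np.exp(th[1] * x)
m_theta = lambda th: np.column_stack([np.exp(th[1] * x),
                                      th[0] * x * np.exp(th[1] * x)])
theta_hat = gn_with_step_halving(y, m, m_theta, np.array([1.0, 0.5]))
```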
Asymptotic Distribution of the NLS Estimator (Homoskedastic Case)

$$ y_i = m(x_i,\theta) + \varepsilon_i, \quad E[\varepsilon_i^2] = \sigma^2 $$

Consider the linear approximation evaluated at the true value of $\theta$:
$$ \bar y_i(\theta) = m_\theta(x_i,\theta)'\theta + \varepsilon_i. $$

Assuming
$$ \frac{1}{n}\sum_{i=1}^n m_\theta(x_i,\theta)\, m_\theta(x_i,\theta)' \xrightarrow{p} E\!\left[m_\theta(x_i,\theta)\, m_\theta(x_i,\theta)'\right] = M_{\theta\theta} $$
$$ \frac{1}{\sqrt n}\sum_{i=1}^n m_\theta(x_i,\theta)\,\varepsilon_i \xrightarrow{d} N\!\left(0,\ \sigma^2 M_{\theta\theta}\right), $$
then
$$ \sqrt n\left(\hat\theta - \theta\right) \xrightarrow{d} N\!\left(0,\ \sigma^2 M_{\theta\theta}^{-1}\right). $$
Equivalently,
$$ \hat\theta \overset{A}{\sim} N\!\left(\theta,\ \frac{\sigma^2}{n} M_{\theta\theta}^{-1}\right). $$
The asymptotic variance $\frac{\sigma^2}{n} M_{\theta\theta}^{-1}$ can be consistently estimated using $\frac{\hat\sigma^2}{n}\hat M_{\theta\theta}^{-1}$, with
$$ \hat\sigma^2 = \frac{1}{n-p}\sum_{i=1}^n \hat\varepsilon_i^2, \quad
\hat M_{\theta\theta} = \frac{1}{n}\sum_{i=1}^n m_\theta(x_i,\hat\theta)\, m_\theta(x_i,\hat\theta)'. $$
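Both pieces come straight from the residuals and the Jacobian at $\hat\theta$, and since $\hat M_{\theta\theta} = J'J/n$ the estimated variance simplifies to $\hat\sigma^2 (J'J)^{-1}$. A sketch (the degrees-of-freedom correction $n-p$ matches the formula above; with $J = X$ this reduces to the classic OLS standard errors):

```python
import numpy as np

def nls_se_homoskedastic(resid, J):
    """Homoskedastic NLS standard errors: sqrt of the diagonal of
    (sigma2_hat / n) * M_hat^{-1}, which equals sigma2_hat * (J'J)^{-1}."""
    n, p = J.shape
    sigma2 = resid @ resid / (n - p)          # sigma2_hat with dof correction
    avar = sigma2 * np.linalg.inv(J.T @ J)    # estimated Var(theta_hat)
    return np.sqrt(np.diag(avar))

# Small deterministic check against the OLS formula (J = X, linear model):
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
resid = np.array([0.1, -0.1, 0.1, -0.1])
se = nls_se_homoskedastic(resid, X)
```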
Asymptotic Distribution of the NLS Estimator (Heteroskedastic Case)

$$ y_i = m(x_i,\theta) + \varepsilon_i, \quad E[\varepsilon_i^2 \mid x_i] = \sigma^2(x_i) $$
Then
$$ \sqrt n\left(\hat\theta - \theta\right) \xrightarrow{d} N\!\left(0,\ M_{\theta\theta}^{-1} V M_{\theta\theta}^{-1}\right), \quad
V = E\!\left[m_\theta(x_i,\theta)\, m_\theta(x_i,\theta)'\, \varepsilon_i^2\right]. $$
The matrix $V$ can be consistently estimated using the White-type HC estimator
$$ \hat V = \frac{1}{n}\sum_{i=1}^n m_\theta(x_i,\hat\theta)\, m_\theta(x_i,\hat\theta)'\, \hat\varepsilon_i^2. $$
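In terms of the Jacobian $J = m_\theta(X,\hat\theta)$ and residuals $\hat\varepsilon$, the estimated sandwich $\frac{1}{n}\hat M_{\theta\theta}^{-1}\hat V\hat M_{\theta\theta}^{-1}$ collapses to $(J'J)^{-1}\left(\sum_i \hat\varepsilon_i^2\, j_i j_i'\right)(J'J)^{-1}$, where $j_i'$ is the $i$-th row of $J$. A sketch:

```python
import numpy as np

def nls_se_hc(resid, J):
    """White heteroskedasticity-robust standard errors for NLS:
    Var(theta_hat) ~ (J'J)^{-1} (sum_i e_i^2 j_i j_i') (J'J)^{-1}."""
    bread = np.linalg.inv(J.T @ J)
    meat = (J * (resid ** 2)[:, None]).T @ J    # sum_i e_i^2 j_i j_i'
    avar = bread @ meat @ bread
    return np.sqrt(np.diag(avar))

# Deterministic check: sample-mean model (J = column of ones) gives
# SE = sqrt(sum e_i^2) / n.
se = nls_se_hc(np.array([1.0, -1.0, 2.0, -2.0]), np.ones((4, 1)))
```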