Yield Curve Construction: Draft 2

Stefanus Lie
Risk Management Institute, National University of Singapore

Abstract

In this work, several combinations of models are explored. Analysis, comparison, and stress testing are first carried out on a set of government bonds, before a possible future extension to policy bank and corporate bonds. Constructing our own yield curves is useful for two reasons. First, it removes the black-box component of the Chinabond yield curve, for example in bond valuation with the Hull-White model. Second, it can be used in future pricing projects, or even developed into a commercial product. The contribution of this Draft 2 to the literature is a comprehensive analysis, comparison, and stress test of different combinations of filtering criteria, models, knot points, and weights (these terms are explained later) in the Chinese bond market.

Keywords: Spot Rate, Forward Rate, Discounting Rate, Spline, Filtering, Smoothness, Stress Testing, Knot Points, Global Optimization

Preprint submitted to Risk Management Institute, December 16, 2016

1. Preliminaries

1.1. Introduction and Notation

Define $s(t)$ to be the $t$-year spot rate, $f(t)$ the $t$-year instantaneous forward rate, and $d(t)$ the $t$-year discounting factor (the price of a risk-free zero-coupon bond paying 1 in $t$ years), for all $t \ge 0$. Recall the following relations (continuous compounding):
1. $d(t) = \exp(-t\,s(t))$.
2. $d(t) = \exp\left(-\int_0^t f(u)\,du\right)$.
3. $f(t) = t\,s'(t) + s(t)$.
If we know any one of $s(t)$, $f(t)$, $d(t)$, we know all three.

The terms "term structure of interest rates" and "yield curve" of a class of bonds are used interchangeably, and may refer to any of:
1. Yield to maturity against time to maturity $t \ge 0$.
2. Spot rate against time to maturity $t \ge 0$.
3. Forward rate against time to maturity $t \ge 0$.
4. Discounting factor against time to maturity $t \ge 0$.
If we know any one of (2), (3), or (4), we know all three.

1.2. Government Bond Market in China

Most government bonds in China are zero-coupon or fixed-coupon bonds with no embedded option. The term to maturity can be 0.25, 0.5, 0.75, 1, 2, 3, 5, 7, 8 (only once, in 1999), 9 (only twice, in 1983 and 1984), 10, 15, 20, 30, or 50 years. By coupon payment structure, a bond is classified as a zero-coupon bond (issued at 100, with a single final payment of 100 plus coupon at maturity), a discounted bond (issued below 100, with a final payment of 100 at maturity), or a coupon bond (with coupon payments between issuance and maturity, in addition to the coupon paid at maturity). A summary of the market is given below:

Figure 1: Summary of government bonds in Chinese market

Note that a bond is uniquely identified by its short name, not by its symbol, because the symbol differs across markets (Interbank, Shanghai Exchange, Shenzhen Exchange, Bank Counter, and Others). Note also that for some bonds, after the first issuance, the government may issue additional bonds with the same properties in the primary market, thus increasing the amount outstanding. The second (and any later) issuance may carry a different symbol from the first, even in the same market.

1.3. Description of Data

Currently, the database contains two sets of data on which the project is based:
1. Static: short name, face value, symbol, market, issue end date, issue amount, carry date, maturity date, coupon frequency, coupon at issue, coupon description, interest reference, tax exemption, tax rate, coupon date description, subordinated, benchmark interest name, special provisions, option-embedded, option-embedded description, spread at issue, redemption date, min coupon adjustment, max coupon adjustment, benchmark method.
2. Dynamic: date, short name, market, open price, previous close price, close price, high price, low price, volume, amount, average, quotes, best bid clean price, best bid yield, best ask clean price, best ask yield, last trading day, settlement data (transaction, average, high, close, open).
The variables highlighted in blue are those currently used in the model.

The static data is obtained by exporting data on active, delisted, and matured bonds from Wind Data Explorer to Excel, combining them, and uploading the result to the database; it is currently updated only to 31 October 2016. The dynamic data is obtained from the static data with the Wind code generator in Python and uploaded directly to the database. Hence, both static and dynamic data are only updated to 31 October 2016. The daily-update problem therefore reduces to updating the static data automatically, i.e., how do we learn about recently issued bonds?

1.4. Introduction to Models

Two steps are crucial in building the yield curve for a given day:
1. Filtering: selecting data from the market, i.e., which prices/yields should be included?
2. Modeling: running the model that generates the yield curve from the filtered data.
In my view, an empirical model for yield curve construction is always a compromise between fitting current market data and imposing theoretical/parametric assumptions.
By requiring the constructed yield curve to follow current market data closely, we may have to relax some theoretical/parametric assumptions (no-arbitrage, economic meaning, shape, and smoothness); conversely, by imposing theoretical/parametric assumptions, we accept some pricing errors relative to the noisy market data and have fewer parameters. Call these two extremes the left and right extremes. For pricing and hedging purposes we should not move too far to the right, whereas for policy making most countries use right models with fewer parameters, as reported by the Bank for International Settlements (2005). The most common examples of right models are Nelson-Siegel (1987), with at most 4 parameters to be estimated, and Svensson (1994, 1995), with at most 6 parameters to be estimated.

2. Literature Review

In this section, I provide a selected historical perspective on yield curve construction. By selected, I mean that only contributions relevant to our modelling purpose are discussed in detail.

2.1. Pre-1970s era

The reports of Durand (1942) and the US Treasury Department (1966), among others, contain only the first type of yield curve (yield to maturity against time to maturity). The curves were fitted by hand: neither quantitative nor objective. An attempt at a quantitative fit was made by Cohen, Kramer, and Waugh (1966), using a regression with one dependent variable and two independent variables. The dependent variable is the yield to maturity or its logarithm, the first independent variable is the time to maturity or its square, and the second independent variable is the squared logarithm of time to maturity. Yet approximating YTM against time to maturity directly is not rigorous because of the coupon effect: bonds with equal maturity but different coupons cannot be assumed to have the same YTM. The spot, forward, or discounting curve is more appropriate.

2.2. Introduction to Spline

There are several definitions of spline; for consistency, this project uses the following definition throughout (when a paper is quoted, its definition is adjusted to this one):

Definition 1: For given $n > 1$ and an increasing sequence $t_1 < t_2 < \cdots < t_k$ of knot points, a function $f : [t_1, t_k] \to \mathbb{R}$ is called an $n$-spline if $f^{(n)}$ is constant on each of the intervals $(t_1, t_2), (t_2, t_3), \ldots, (t_{k-1}, t_k)$, and $f^{(n-1)}$ is continuous on $[t_1, t_k]$.
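As a concrete illustration of Definition 1 and of the dimension count $n + k - 1$ stated in Proposition 1, one convenient (though not the only) basis is the truncated power basis: the monomials $1, t, \ldots, t^n$ plus one function $(t - t_i)_+^n$ for each interior knot. The sketch below assumes NumPy, and the function name is hypothetical. It builds this basis and checks numerically that the $n + k - 1$ columns are linearly independent; in practice B-splines are better conditioned than truncated powers.

```python
import numpy as np


def truncated_power_basis(t, knots, n=3):
    """Evaluate a basis of the n-spline space V(n, t_1, ..., t_k) at the points t.

    Columns: 1, t, ..., t^n, then (t - t_i)_+^n for each interior knot
    t_2, ..., t_{k-1}, i.e. (n + 1) + (k - 2) = n + k - 1 columns.
    """
    t = np.asarray(t, dtype=float)
    cols = [t ** p for p in range(n + 1)]                # global polynomial part
    for tau in np.asarray(knots, dtype=float)[1:-1]:     # interior knots only
        cols.append(np.clip(t - tau, 0.0, None) ** n)    # (t - tau)_+^n
    return np.column_stack(cols)


knots = [0.0, 1.0, 3.0, 5.0, 10.0]             # k = 5 knot points
grid = np.linspace(0.0, 10.0, 200)
B = truncated_power_basis(grid, knots, n=3)
print(B.shape[1], np.linalg.matrix_rank(B))    # 7 7, i.e. n + k - 1 = 3 + 5 - 1
```

Each truncated power $(t - t_i)_+^n$ has a jump in its $n$-th derivative at $t_i$ while its $(n-1)$-th derivative stays continuous, which is exactly the smoothness required by Definition 1.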

The following propositions can be shown by induction:

Proposition 1: The set of $n$-splines defined above is a vector space over $\mathbb{R}$ with dimension $n + k - 1$. Denote this vector space by $V(n, t_1, t_2, \ldots, t_k)$.

Proposition 2: Let $S$ be a subset of $\{1, 2, \ldots, k\}$. Define $W(S, n, t_1, t_2, \ldots, t_k) = \{f \in V(n, t_1, t_2, \ldots, t_k) \mid f(t_s) = 0 \ \forall s \in S\}$. Then $W$ is a subspace of $V$ with $\dim(W) = n + k - |S| - 1$. In particular, the space of 3-splines (cubic splines) $f$ such that $f(t_s) = 0$ for all $s = 1, 2, \ldots, k$ has dimension 2.

Proposition 3 (a consequence of a well-known result about vector spaces): Let $S$ be a subset of $\{1, 2, \ldots, k\}$, $t_1 < t_2 < \cdots < t_k$ be knot points, and $f_j : [t_1, t_k] \to \mathbb{R}$ ($1 \le j \le n + k - |S| - 1$) be linearly independent $n$-splines on these knot points such that $f_j(t_s) = 0$ for all $s \in S$. Then $\{f_j \mid 1 \le j \le n + k - |S| - 1\}$ is a basis of the vector space $W(S, n, t_1, t_2, \ldots, t_k)$.

There are several ways to define a basis of the spline space. A popular one is the B-spline basis, discussed comprehensively in de Boor (1978). We could also use McCulloch's definition of the cubic spline basis. Steeley (1991) suggested that B-splines may be preferable if we want to impose constraints on the curve, such as bounds on the first/second derivative or monotonicity.

2.3. Breakthrough by McCulloch (1971, 1975)

Recall the discounting function $d(t)$. In McCulloch (1971), $d(t)$ is assumed to be a 2-spline on knot points $0 = t_1 < t_2 < \cdots < t_k$ such that $d(0) = 1$. Therefore the family of functions $d - 1$ ($d$ minus the constant function 1) is a vector space with dimension $2 + k - 1 - 1 = k$, by Proposition 2. This means that the set of all possible $d$ satisfying the condition above is
$$\left\{ 1 + \sum_{i=1}^{k} a_i f_i(t) \;\middle|\; a_i \in \mathbb{R} \right\},$$
where $f_i : [0, t_k] \to \mathbb{R}$ are $k$ functions forming a basis.
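Because $d - 1$ lies in a finite-dimensional vector space, the model price $\sum_j A_j d(s_j)$ of a bond is linear in the coefficients $a_i$: $P - \sum_j A_j = \sum_i a_i \sum_j A_j f_i(s_j)$ plus noise. The sketch below builds this regression and solves it by weighted least squares, minimizing $\sum_l (P_l - \hat P_l)^2 / w(P_l)$. It assumes NumPy and uses a truncated power basis without the constant term, so every basis function vanishes at $t = 0$; this basis is a swapped-in choice, not McCulloch's own. `degree=2` gives the 2-spline of McCulloch (1971) with $k$ coefficients and `degree=3` the cubic case with $k + 1$. Function names and the bond data are hypothetical.

```python
import numpy as np


def discount_basis(t, knots, degree=2):
    """Basis of {d - 1}: t, ..., t^degree and (t - t_i)_+^degree for interior knots.

    There is no constant column, so every basis function is 0 at t = 0 and d(0) = 1.
    degree=2 gives k columns, degree=3 gives k + 1 columns (Proposition 2).
    """
    t = np.asarray(t, dtype=float)
    cols = [t ** p for p in range(1, degree + 1)]
    cols += [np.clip(t - tau, 0.0, None) ** degree for tau in knots[1:-1]]
    return np.column_stack(cols)


def fit_discount_spline(bonds, prices, knots, degree=2, weights=None):
    """Weighted least squares for d(t) = 1 + sum_i a_i f_i(t).

    bonds   : list of (times, amounts) pairs, one per bond
    prices  : observed dirty prices P_l
    weights : w(P_l); the objective is sum_l (P_l - P_hat_l)^2 / w(P_l)
    """
    X, y = [], []
    for (s, A), P in zip(bonds, prices):
        s, A = np.asarray(s, float), np.asarray(A, float)
        X.append(A @ discount_basis(s, knots, degree))  # regressors sum_j A_j f_i(s_j)
        y.append(P - A.sum())                            # response P - sum_j A_j
    X, y = np.array(X), np.array(y)
    w = np.ones(len(y)) if weights is None else np.asarray(weights, float)
    r = 1.0 / np.sqrt(w)                                 # row scaling turns WLS into OLS
    a, *_ = np.linalg.lstsq(X * r[:, None], y * r, rcond=None)
    return a


# Synthetic check: price 10 annual 3% bonds off a known 2-spline discount curve.
knots = [0.0, 2.0, 5.0, 10.0]
a_true = np.array([-0.03, 4e-4, -1e-4, 5e-5])
bonds = []
for mat in range(1, 11):
    s = np.arange(1.0, mat + 1.0)
    A = np.full(mat, 3.0)
    A[-1] += 100.0
    bonds.append((s, A))
prices = [A @ (1.0 + discount_basis(s, knots) @ a_true) for s, A in bonds]
print(np.allclose(fit_discount_spline(bonds, prices, knots), a_true))  # True
```

Since the synthetic prices are generated exactly from the spline, the regression recovers the coefficients for any choice of weights; with real prices the weights determine which bonds the curve follows most closely.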
One disadvantage of assuming the discounting function to be a 2-spline is that the forward curve $f(t) = -d'(t)/d(t)$ has knuckles (kinks) at the knot points, since the first derivative of $d$ is not differentiable there.

In McCulloch (1975), $d(t)$ is assumed to be a cubic spline on knot points $0 = t_1 < t_2 < \cdots < t_k$ such that $d(0) = 1$. Therefore the family of functions $d - 1$ ($d$ minus the constant function 1) is a vector space with dimension $3 + k - 1 - 1 = k + 1$, by Proposition 2. This means that the set of all possible $d$ satisfying the condition above is
$$\left\{ 1 + \sum_{i=1}^{k+1} a_i f_i(t) \;\middle|\; a_i \in \mathbb{R} \right\},$$
where $f_i : [0, t_k] \to \mathbb{R}$ are $k + 1$ functions forming a basis. McCulloch (1975) proposed a particular choice of the $k + 1$ basis functions. Now, for each bond in the sample set, with payments $A_1, \ldots, A_n$ at times $s_1 < \cdots < s_n$ and price $P$, it is assumed that
$$P = \sum_{j=1}^{n} A_j\, d(s_j) + \sqrt{w(P)}\,\varepsilon.$$
This is equivalent to
$$P = \sum_{j=1}^{n} A_j \left(1 + \sum_{i=1}^{k+1} a_i f_i(s_j)\right) + \sqrt{w(P)}\,\varepsilon = \sum_{j=1}^{n} A_j + \sum_{i=1}^{k+1} a_i \left(\sum_{j=1}^{n} A_j f_i(s_j)\right) + \sqrt{w(P)}\,\varepsilon.$$
The parameters to be estimated are $a_1, a_2, \ldots, a_{k+1}$, and they can be estimated by weighted linear regression of $P - \sum_j A_j$ on the regressors $\sum_j A_j f_i(s_j)$. McCulloch (1971) proposed using the broker fee plus half of the bid-ask price difference, from quotes data, as the weight.

The natural questions are: how many knot points should there be, and how should they be chosen? Suppose that the ordered bond maturities in the filtered set are $m_1 < m_2 < \cdots < m_n$. McCulloch (1975) suggested choosing the number of knot points to be around $\sqrt{n} - 1$. Let the number of knot points be $k$. The knot points $0 = t_1 < t_2 < \cdots < t_k = m_n$ depend on the sequence $m_1 < m_2 < \cdots < m_n$: to compute $t_i$ for $2 \le i \le k - 1$, define $f(i) = n(i-1)/(k-1)$, let $g(i)$ and $h(i)$ be the integer and fractional parts of $f(i)$, respectively, and set
$$t_i = m_{g(i)} + h(i)\left(m_{g(i)+1} - m_{g(i)}\right), \quad 2 \le i \le k - 1.$$
This ensures an almost equal number of bond maturities in each interval $(t_i, t_{i+1})$. With more knot points there are more parameters to estimate, and the model moves to the left. Moreover, ensuring an equal number of bonds in each interval makes the curve between adjacent knot points stable: it is not sensitive to a price change or error in one particular bond whose maturity lies in that interval.

2.4. Contribution by Carleton & Cooper (1976): Direct discrete discounting estimation

Carleton & Cooper (1976) assume that the price of a coupon bond can be seen as the sum of several zero-coupon bonds, and hence as a linear combination of discounting factors. During the 1970s, most US government bonds paid coupons only on Feb 15, May 15, Aug 15, and Nov 15. Thus, by selecting about 40 bonds maturing within 4 years, the price of every bond is a linear combination of 16 discounting factors. Standard linear regression on the bid, ask, and mean prices of each bond is used to estimate the discounting factors. Afterwards, the spot and forward yield curves are constructed by linear interpolation of forward rates between two discrete dates. This model cannot be implemented in our project: coupon payment dates are irregular in the Chinese market, so we would have more unknown discounting factors than equations.

2.5. Pioneer of a good Parametric Model: Cooper (1977)

Recall that Cohen, Kramer, and Waugh (1966) use a parametric model and then regression to construct the YTM yield curve. The seed of parametric models can be traced to Echols and Elliott (1976), who assumed that the simple forward rate follows $f(t) = A\exp(bt)$. From this, it is implied that the simple spot rate is a linear combination of $t$, $1/t$, and $1$. By considering the coupon effect on the spot rate, Cohen, Kramer, and Waugh (1966) use a regression of the simple spot rate on the independent variables $1$, $t$, $1/t$, and $c$ (coupon). However, this is still a rough estimation. A good contribution was made by Cooper (1977), who assumes that $f(t) = A + (B - A)\exp(-Ct)$, implying that
$$s(t) = A + \frac{B - A}{Ct}\left(1 - \exp(-Ct)\right).$$
The parameters $A$, $B$, $C$ are estimated to give the smallest residual standard error in price.

2.6. Attempt to improve McCulloch (1971, 1975): Vasicek-Fong (1982)

Vasicek-Fong (1982) criticized McCulloch's assumption that the discounting function is a cubic spline, arguing that it should rather be exponential-like, and suggested a transformation of the discounting function. For $t \ge 0$, define $x = 1 - \exp(-\alpha t)$, and define $G(x) = d(t)$ for $0 \le x < 1$. Here $\alpha$ plays the role of the long-term forward rate. Since $d(0) = 1$, $d$ is decreasing, and $d(t) \to 0$ as $t \to \infty$, we have $G(0) = 1$, $G(1) = 0$ (where $G(1)$ is defined as the left limit), and $G$ is decreasing. Now $G$ is assumed to be a cubic spline on the converted knot points $0 = x_1 < x_2 < \cdots < x_k < x_{k+1} = 1$, where $x_i = 1 - \exp(-\alpha t_i)$ for $1 \le i \le k$. Therefore the family of functions $G - 1 + x$ ($G$ minus the constant function 1 plus the function $y = x$), which vanish at $x = 0$ and $x = 1$, is a vector space with dimension $3 + (k + 1) - 2 - 1 = k + 1$, by Proposition 2. The set of all possible $G$ satisfying the conditions above is
$$\left\{ 1 - x + \sum_{i=1}^{k+1} a_i f_i(x) \;\middle|\; a_i \in \mathbb{R} \right\},$$
where $f_i : [0, 1] \to \mathbb{R}$ are $k + 1$ functions forming a basis. The parameters to be estimated are the $a_i$'s. Define $y_j = 1 - \exp(-\alpha s_j)$. It is assumed that
$$P = \sum_{j=1}^{n} A_j\, G(y_j) + \sqrt{w(P)}\,\varepsilon$$
for all bonds in the sample set. This is equivalent to
$$P = \sum_{j=1}^{n} A_j \left(1 - y_j + \sum_{i=1}^{k+1} a_i f_i(y_j)\right) + \sqrt{w(P)}\,\varepsilon = \sum_{j=1}^{n} A_j - \sum_{j=1}^{n} y_j A_j + \sum_{i=1}^{k+1} a_i \left(\sum_{j=1}^{n} A_j f_i(y_j)\right) + \sqrt{w(P)}\,\varepsilon.$$
The parameters $a_1, a_2, \ldots, a_{k+1}$ can again be estimated by weighted linear regression. Vasicek & Fong (1982) suggested using elasticity (the first derivative of price with respect to yield) as the weight in the regression. The paper does not provide any empirical test.

2.7. Extension to McCulloch (1971, 1975) and Vasicek & Fong (1982): Smoot (1983)

Smoot (1983), in his Ph.D. thesis, proposes a new method that applies the spline directly to the spot rate. There are several layers of consideration in this work: knot placement, tax models, weights for price errors, and the model itself (McCulloch/Vasicek-Fong/Smoot). After fixing a particular knot placement and tax model, data from 1971-1982 are tested with all combinations of weight and model. For each choice of weight, the lowest MSE is produced by the new model: the spline on the spot rate. However, I doubt that this result can be generalized directly to our project. First, the data may be very different in time and place. Second, Smoot (1983) had already zoomed in on a specific knot placement and tax model. Therefore, we still consider the McCulloch and Vasicek-Fong models.

In the rest of this section, the model is explained in more detail. The spot rate $s(t)$ is assumed to be a cubic spline on knot points $0 = t_1 < t_2 < \cdots < t_k$. Therefore the family of spot functions $s(t)$ is a vector space with dimension $3 + k - 1 = k + 2$, by Proposition 1. This means that the set of all possible $s$ is
$$\left\{ \sum_{i=1}^{k+2} a_i f_i(t) \;\middle|\; a_i \in \mathbb{R} \right\}.$$
The discounting function is then
$$d(t) = \exp(-t\,s(t)) = \exp\left(-t \sum_{i=1}^{k+2} a_i f_i(t)\right).$$
Let there be $m$ bonds in the sample set, let $P_l$ ($1 \le l \le m$) be their dirty prices, and let $n_1, n_2, \ldots, n_m$ be the numbers of future payments of each bond. For the $l$-th bond, let $s_{l,1} < s_{l,2} < \cdots < s_{l,n_l}$ be the times to the next coupon/final payments, and $A_{l,1}, A_{l,2}, \ldots, A_{l,n_l}$ the corresponding payment amounts. Under the spot rate assumption above, the price of the $l$-th bond should be
$$\sum_{j=1}^{n_l} A_{l,j}\, d(s_{l,j}) = \sum_{j=1}^{n_l} A_{l,j} \exp\left(-s_{l,j} \sum_{i=1}^{k+2} a_i f_i(s_{l,j})\right) = \sum_{j=1}^{n_l} A_{l,j} \prod_{i=1}^{k+2} \exp\left(-s_{l,j} f_i(s_{l,j})\right)^{a_i}.$$
Therefore, our aim is to minimize the sum of weighted squared errors
$$\sum_{l=1}^{m} \frac{1}{w(P_l)} \left( \sum_{j=1}^{n_l} A_{l,j} \prod_{i=1}^{k+2} \exp\left(-s_{l,j} f_i(s_{l,j})\right)^{a_i} - P_l \right)^2,$$
where $w(P_l)$ is the weight associated with the $l$-th bond.

2.8. Chambers, Carleton, Waldman (1984)

The spot rate is assumed to be a polynomial:
$$s(t) = \sum_{j=1}^{J} x_j t^{j-1}.$$
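With a polynomial spot rate, as with Smoot's spline on the spot rate, the model price is nonlinear in the coefficients, so the fit needs an iterative solver. The sketch below assumes NumPy and uses a plain Gauss-Newton iteration with the analytic Jacobian $\partial \hat P_l / \partial x_i = -\sum_j A_{l,j}\, s_{l,j}^{\,i}\, d(s_{l,j})$; this solver is a swapped-in choice, since the nonlinear regression routine used in the paper is not described here. Function names and the synthetic bonds are hypothetical. Replacing `poly_basis` with a cubic spline basis would give Smoot's model.

```python
import numpy as np


def poly_basis(t, J):
    """Columns t^0, ..., t^(J-1), so that s(t) = sum_j x_j t^(j-1)."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([t ** p for p in range(J)])


def model_prices(x, bonds, J):
    """Model prices sum_j A_j exp(-s_j * s(s_j)) and their Jacobian w.r.t. x."""
    prices, jac = [], []
    for s, A in bonds:
        s, A = np.asarray(s, float), np.asarray(A, float)
        B = poly_basis(s, J)
        d = np.exp(-s * (B @ x))            # discount factors d(s_j)
        prices.append(A @ d)
        jac.append(-(A * d * s) @ B)        # dP/dx_i = -sum_j A_j d_j s_j^i
    return np.array(prices), np.array(jac)


def fit_poly_spot(bonds, prices, J=3, x0=None, weights=None, iters=50, tol=1e-12):
    """Gauss-Newton for min_x sum_l (P_l - P_hat_l(x))^2 / w(P_l)."""
    prices = np.asarray(prices, float)
    w = np.ones_like(prices) if weights is None else np.asarray(weights, float)
    r = 1.0 / np.sqrt(w)                    # row scaling implements the weights
    x = np.zeros(J) if x0 is None else np.asarray(x0, float).copy()
    for _ in range(iters):
        p_hat, jac = model_prices(x, bonds, J)
        step, *_ = np.linalg.lstsq(jac * r[:, None], (prices - p_hat) * r, rcond=None)
        x = x + step
        if np.max(np.abs(step)) < tol:
            break
    return x


# Synthetic check: 10 annual 3% bonds priced off s(t) = 0.02 + 0.002 t - 0.0001 t^2.
bonds = []
for mat in range(1, 11):
    A = np.full(mat, 3.0)
    A[-1] += 100.0
    bonds.append((np.arange(1.0, mat + 1.0), A))
x_true = np.array([0.02, 0.002, -1e-4])
prices, _ = model_prices(x_true, bonds, 3)
print(np.allclose(fit_poly_spot(bonds, prices, J=3), x_true))  # True
```

Gauss-Newton converges quickly here because rates are small and the exponent is nearly linear in the coefficients; for real data a damped method (for example Levenberg-Marquardt) may be safer.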

This paper carries out empirical tests with polynomials of degree 1, 2, 3, 4, and 5, minimizing MSE by nonlinear regression with equal error weights. Furthermore, arguing that MSE minimization does not produce good results (because of heteroscedasticity), it proposes maximum likelihood methods in which the error variance is weighted by a power of time to maturity. In my opinion, the contribution of this paper is limited. Assuming the spot rate to be a single polynomial everywhere may not be good, since it may not capture the bend at longer terms, in contrast to spline methods.

2.9. Nelson-Siegel (1987) and Svensson (1994, 1995)

These models lie further to the right. The NS model assumes that the forward rate follows
$$f(t) = \beta_0 + \beta_1 \exp(-t/\tau) + \beta_2 (t/\tau) \exp(-t/\tau).$$
The first term is relevant for the long term, the second for the short term, and the third for the medium term (hump). The NSS model is a generalization of the NS model:
$$f(t) = \beta_0 + \beta_1 \exp(-t/\tau_1) + \beta_2 (t/\tau_1) \exp(-t/\tau_1) + \beta_3 (t/\tau_2) \exp(-t/\tau_2).$$
The last term accounts for a second hump.

2.10. Yield Curve Smoothness Concept by Adams and van Deventer (1994)

This paper pioneered applying the smoothness concept to the forward curve, borrowing it from numerical analysis: a curve $f$ is smooth if $\int_0^T (f''(s))^2\,ds$ is relatively small. Given the discounting factors at discrete points $d(t_1), d(t_2), \ldots, d(t_k)$, a closed formula for the forward curve is proposed such that (1) it matches the discounting factors at the discrete points, and (2) among all curves satisfying (1), it is the smoothest one. The computation in this paper was later corrected by Lim & Xiao (2002). Although the paper only provides a solution given discounting factors at discrete points, which is directly relevant only to Carleton & Cooper (1976), its way of quantifying smoothness is useful and is used in subsequent papers.

2.11. Fisher, Nychka, and Zervos (1994): Estimating and Smoothing simultaneously

While Adams and van Deventer (1994) smooth after estimating the discounting factors at several discrete points, Fisher, Nychka, and Zervos (1994) are the first to combine estimation and smoothing. This paper lists three possible estimations: spline with smoothing on the discounting function, spline with smoothing on the log discounting function, and spline with smoothing on forward rates. The quantity to be minimized is the sum of squared differences between model and observed prices (with equal weights) plus
$$\lambda \int_0^T (g''(s))^2\,ds,$$
where $g$ is the discounting, log discounting, or forward curve, and $\lambda$ is chosen by the Generalized Cross Validation formula. Applying the three methods to data from 1987-1994, the paper reports that fitting the forward curve gives the most accurate fit. However, this result is limited by the choice of data set and, more importantly, by the use of equal weights for price errors only. Therefore, we can still consider all three methods (the first one originates from McCulloch (1971, 1975)). The rest of this section gives implementation details for cubic splines on forward rates and on log discounting.

Spline forward rate: The forward rate $f(t)$ is assumed to be a cubic spline on knot points $0 = t_1 < t_2 < \cdots < t_k$. Therefore the family of forward functions $f(t)$ is a vector space with dimension $3 + k - 1 = k + 2$, by Proposition 1. This means that the set of all possible $f$ is
$$\left\{ \sum_{i=1}^{k+2} a_i f_i(t) \;\middle|\; a_i \in \mathbb{R} \right\}.$$
Define $g_i(t) = \int_0^t f_i(u)\,du$. Then the discounting function is
$$d(t) = \exp\left(-\int_0^t f(u)\,du\right) = \exp\left(-\sum_{i=1}^{k+2} a_i g_i(t)\right).$$
With $m$, $P_l$, $n_l$, $s_{l,j}$, and $A_{l,j}$ defined as in Section 2.7, the price of the $l$-th bond under the forward rate assumption above should be
$$\sum_{j=1}^{n_l} A_{l,j}\, d(s_{l,j}) = \sum_{j=1}^{n_l} A_{l,j} \exp\left(-\sum_{i=1}^{k+2} a_i g_i(s_{l,j})\right) = \sum_{j=1}^{n_l} A_{l,j} \prod_{i=1}^{k+2} \exp\left(-g_i(s_{l,j})\right)^{a_i}.$$
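For the forward-rate spline, the integrals $g_i(t) = \int_0^t f_i(u)\,du$ are available in closed form if the $f_i$ are taken to be the cubic truncated power basis $1, t, t^2, t^3, (t - t_i)_+^3$ ($k + 2$ functions, matching the dimension above). The sketch below assumes NumPy, uses hypothetical function names, and uses truncated powers rather than B-splines; it evaluates $d(t)$ and a bond price from given coefficients $a_i$. Because the price is nonlinear in the $a_i$, fitting them requires an iterative solver.

```python
import numpy as np


def forward_basis(t, knots):
    """f_i(t): 1, t, t^2, t^3 and (t - tau)_+^3 for each interior knot tau."""
    t = np.asarray(t, dtype=float)
    cols = [t ** p for p in range(4)]
    cols += [np.clip(t - tau, 0.0, None) ** 3 for tau in knots[1:-1]]
    return np.column_stack(cols)


def integrated_forward_basis(t, knots):
    """g_i(t) = int_0^t f_i(u) du for the basis above (interior knots tau > 0)."""
    t = np.asarray(t, dtype=float)
    cols = [t ** (p + 1) / (p + 1) for p in range(4)]
    cols += [np.clip(t - tau, 0.0, None) ** 4 / 4.0 for tau in knots[1:-1]]
    return np.column_stack(cols)


def discount_from_forward(t, a, knots):
    """d(t) = exp(-sum_i a_i g_i(t))."""
    return np.exp(-integrated_forward_basis(t, knots) @ np.asarray(a, float))


def bond_price(times, amounts, a, knots):
    """Model price sum_j A_j d(s_j) of one bond."""
    return float(np.asarray(amounts, float) @ discount_from_forward(times, a, knots))


knots = [0.0, 1.0, 5.0, 10.0]                  # k = 4, so k + 2 = 6 coefficients
a_flat = [0.03, 0.0, 0.0, 0.0, 0.0, 0.0]       # constant 3% forward curve
print(bond_price([1.0, 2.0], [3.0, 103.0], a_flat, knots))  # equals 3 e^{-0.03} + 103 e^{-0.06}
```

A flat coefficient vector reproduces $d(t) = e^{-0.03t}$, and for any coefficients $-\frac{d}{dt}\log d(t)$ returns the spline forward rate $\sum_i a_i f_i(t)$, which is a quick consistency check on the integrated basis.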

Therefore, our aim is to minimize the sum of weighted squared errors
$$\sum_{l=1}^{m} \frac{1}{w(P_l)} \left( \sum_{j=1}^{n_l} A_{l,j} \prod_{i=1}^{k+2} \exp\left(-g_i(s_{l,j})\right)^{a_i} - P_l \right)^2,$$
where $w(P_l)$ is the weight associated with the $l$-th bond.

Spline of log discounting: $c(t) = \log d(t)$ is assumed to be a cubic spline on knot points $0 = t_1 < t_2 < \cdots < t_k$ such that $c(0) = 0$. Therefore the family of such functions $c$ is a vector space with dimension $3 + k - 1 - 1 = k + 1$, by Proposition 2. This means that the set of all possible $c$ satisfying the condition above is
$$\left\{ \sum_{i=1}^{k+1} a_i c_i(t) \;\middle|\; a_i \in \mathbb{R} \right\},$$
where $c_i : [0, t_k] \to \mathbb{R}$ are $k + 1$ functions forming a basis. Now, it is assumed that
$$P = \sum_{j=1}^{n} A_j \exp(c(s_j)) + \sqrt{w(P)}\,\varepsilon$$
for all bonds in the sample set. This is equivalent to
$$P = \sum_{j=1}^{n} A_j \prod_{i=1}^{k+1} \exp(c_i(s_j))^{a_i} + \sqrt{w(P)}\,\varepsilon.$$
Therefore, our aim is to minimize the sum of weighted squared errors
$$\sum_{l=1}^{m} \frac{1}{w(P_l)} \left( \sum_{j=1}^{n_l} A_{l,j} \prod_{i=1}^{k+1} \exp(c_i(s_{l,j}))^{a_i} - P_l \right)^2,$$
where $w(P_l)$ is the weight associated with the $l$-th bond.

2.12. Waggoner (1997)

Waggoner generalized the smoothness penalty to $\int_0^T \lambda(s) (g''(s))^2\,ds$. The paper suggests $\lambda(s) = 0.1$ for $0 \le s \le 1$, $\lambda(s) = 100$ for $1 \le s \le 10$, and $\lambda(s) = 100000$ for $s \ge 10$, forcing more smoothness at longer terms and less smoothness at shorter terms. For the reasoning, see the paper.
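The roughness penalties of Sections 2.10-2.12 can be approximated numerically for any candidate curve. The sketch below assumes NumPy and uses hypothetical names: it evaluates $\int_0^T \lambda(s) (g''(s))^2\,ds$ with Waggoner's piecewise-constant $\lambda$ as quoted above, integrating each constant piece separately with the trapezoidal rule so that the jumps at $s = 1$ and $s = 10$ do not distort the result. The function takes $g''$ as a callable; passing a constant `lam` gives the penalty $\lambda \int_0^T (g''(s))^2\,ds$ used by Fisher, Nychka, and Zervos (1994).

```python
import numpy as np


def waggoner_lambda(s):
    """lambda(s) = 0.1 on [0, 1], 100 on (1, 10], 100000 beyond 10."""
    s = np.asarray(s, dtype=float)
    return np.where(s <= 1.0, 0.1, np.where(s <= 10.0, 100.0, 1e5))


def trapezoid(y, x):
    """Composite trapezoidal rule (written out to avoid NumPy version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def roughness_penalty(g2, T, lam=waggoner_lambda, breaks=(1.0, 10.0), pts=2001):
    """Approximate int_0^T lam(s) * g2(s)^2 ds, where g2 is the second derivative g''.

    The integral is split at `breaks`, where lam may jump, and lam is taken
    as constant on each piece (evaluated at the piece midpoint).
    """
    edges = [0.0] + [b for b in breaks if 0.0 < b < T] + [float(T)]
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        s = np.linspace(lo, hi, pts)
        total += float(lam(0.5 * (lo + hi))) * trapezoid(np.asarray(g2(s)) ** 2, s)
    return total


# g(s) = s^2 has g''(s) = 2, so the penalty on [0, 10] is 4 * (0.1 * 1 + 100 * 9).
print(roughness_penalty(lambda s: np.full_like(s, 2.0), T=10.0))  # about 3600.4
```

The example makes the effect of Waggoner's choice visible: the same curvature costs a thousand times more between 1 and 10 years than below 1 year, and another factor of a thousand beyond 10 years.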

2.13. Lapshin & Wang (2013)

This paper proposes a method to fit a forward curve that is always positive (no-arbitrage): $f(t) = g(t)^2$, where $g(t)$ is the curve to be estimated, minimizing the weighted sum of squared errors plus the smoothness penalty $\lambda \int_0^T (g''(s))^2\,ds$. The result is tested on the Chinese market and compared with the Chinabond yield curve.

2.14. A Short Reflection

Filtering data is a highly market-specific and subjective task, while modelling the filtered data into a yield curve is more open to discussion: many authors propose models, others accept or reject them on theoretical/empirical grounds, and others improve or combine earlier models. Currently (as of 16 December 2016), we focus on implementing cubic spline methods without a smoothness penalty. The approach can be summarized in one sentence with four blanks: after filtering the data with (criteria), to construct the yield curve we assume a cubic spline on (model), with knot points chosen by (knot method), minimizing the sum of squared differences between observed and model prices weighted by (weight method).

2.15. Regarding error weight and knot points

The choice of error weight also depends on the sample data. Our sample contains volume and quotes data; therefore the McCulloch weight (broker fee plus half the bid-ask price difference) is not available. Using equal weights is standard. Using price as the weight is suggested in Litzenberger (1982). Using elasticity or duration is suggested in Vasicek & Fong (1982) and by several central banks in the BIS report (2005).

3. Implementation and Analysis: Part 1

Following the previous discussion, the choices are:
1. Criteria: 1, 2, 3
2. Model: McCulloch (spline on discounting), Vasicek-Fong (exponential spline on discounting)
3. Knot method: McCulloch (mc for short), fixed following Chinabond, custom
4. Weight method: equal, price, duration, elasticity

Let $m_1 < m_2 < \cdots < m_n$ be the times to maturity of all bonds in the sample for a given day; there are about 100 bonds in the sample on a given day. For the choice of knot points, there are currently three alternatives:
1. McCulloch (1975) suggestion: the number of knot points is chosen to be around $\sqrt{n} - 1$, so we take the number of knot points to be 9. The knot points $0 = t_1 < t_2 < \cdots < t_9 = m_n$ depend on the sequence $m_1 < m_2 < \cdots < m_n$: to compute $t_i$ for $2 \le i \le 8$, define $f(i) = n(i-1)/8$, let $g(i)$ and $h(i)$ be the integer and fractional parts of $f(i)$, respectively, and set $t_i = m_{g(i)} + h(i)(m_{g(i)+1} - m_{g(i)})$. This ensures an almost equal number of bond maturities in each interval $(t_i, t_{i+1})$.
2. Follow the Chinabond key terms, so that the knot points are fixed at 0, 0.167, 0.25, 0.5, 0.75, 1, 2, 3, 5, 7, 10, 15, 20, 30 years.
3. Use custom fixed knot points: for Criteria 1, 0, 0.2, 0.5, 2, 4, 5, 7, 9, 50 years; for Criteria 3, 0, 0.167, 0.5, 2, 3.5, 4.5, 6, 8.5, 50 years.

With observed price $P$ and model price $\hat{P}$ for each bond: by using equal weight, we minimize $\sum (P - \hat{P})^2$; by using price as weight, we minimize $\sum (P - \hat{P})^2 / P$; by using duration as weight, we minimize $\sum (P - \hat{P})^2 / D(P)$; and by using elasticity as weight, we minimize $\sum (P - \hat{P})^2 / E(P)$. Here
$$E(P) = -\frac{\partial P}{\partial y}, \qquad D(P) = -\frac{1}{P}\,\frac{\partial P}{\partial y},$$
where $y$ is the yield of the bond (both quantities are positive).
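The McCulloch knot rule in item 1 can be sketched as follows. The sketch assumes NumPy, the function name is hypothetical, and the default `k=9` mirrors the choice above. Indices are shifted by one because the text uses 1-based maturities $m_1, \ldots, m_n$. If several bonds share a maturity, adjacent knots can coincide; how to handle that (for example, merging duplicates before fitting) is an assumption not discussed in the text.

```python
import numpy as np


def mcculloch_knots(maturities, k=9):
    """Knots 0 = t_1 < ... < t_k = m_n with roughly equal numbers of maturities per interval.

    For 2 <= i <= k-1: f(i) = n(i-1)/(k-1), g(i) = floor(f(i)), h(i) = f(i) - g(i),
    t_i = m_{g(i)} + h(i) * (m_{g(i)+1} - m_{g(i)}), with 1-based maturities m_1 < ... < m_n.
    """
    m = np.sort(np.asarray(maturities, dtype=float))
    n = len(m)
    if k < 3 or n < k - 1:
        raise ValueError("need k >= 3 and at least k - 1 maturities")
    knots = [0.0]
    for i in range(2, k):
        f = n * (i - 1) / (k - 1)
        g = int(np.floor(f))
        h = f - g
        # 1-based m_g and m_{g+1} are m[g - 1] and m[g] in 0-based indexing
        knots.append(m[g - 1] + h * (m[g] - m[g - 1]))
    knots.append(m[-1])
    return np.array(knots)


print(mcculloch_knots(np.arange(1, 101)))  # knots 0, 12.5, 25, ..., 87.5, 100
```

With 100 equally spaced maturities the rule places a knot after every 12.5 bonds, which is the "almost equal number of maturities per interval" property; on real data the knots cluster where maturities are dense, as in the 31 Oct 2016 example of Section 3.3.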

3.1. Metric Function

In this section, metric functions are introduced to give a quantitative measure of the distance between functions.

Definition 2: For two functions $f, g : [a, b] \to \mathbb{R}$ and $[c, d] \subseteq [a, b]$, define the metric of $f$ and $g$ on $[c, d]$ to be
$$m_{[c,d]}(f, g) = \int_c^d |f(x) - g(x)|\,dx.$$

Definition 3: For two functions $f, g : [a, b] \to \mathbb{R}$ and $[c, d] \subseteq [a, b]$, define the adjusted metric of $f$ and $g$ on $[c, d]$ to be
$$am_{[c,d]}(f, g) = m_{[c,d]}(f, g)/(d - c).$$

Definition 4: For two functions $f, g : [a, b] \to \mathbb{R}$ and $[c, d] \subseteq [a, b]$, define the max-metric of $f$ and $g$ on $[c, d]$ to be
$$mm_{[c,d]}(f, g) = \max\{|f(x) - g(x)| : x \in [c, d]\}.$$

3.2. Regarding the Criteria

Criteria 1:
1. Suppose that we want to build the yield curve for a given transaction day. Collect all zero-coupon/fixed-coupon bonds with no option from the set of bonds in the category (government/ADBC/CDB/Export-Import) that are active on that day.
2. For each bond in the collection, if the day is the issuance end date of the bond, define the value of the bond to be the issue price in the Interbank or Exchange Market.
3. Otherwise, if there is volume in the Interbank Market, define the value of the bond to be the average transaction dirty price.
4. Otherwise, if there are quotes in the Interbank Market, define the value of the bond to be the average of the best bid and best ask dirty prices, where accrued interest is computed assuming settlement on the quote day.
5. Otherwise, remove the bond from the collection.
6. In the end, we have a collection of bonds with corresponding values. This serves as the daily input to the model.
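The valuation rule of Criteria 1 is a short cascade of fallbacks. In the sketch below, the record type and all field names are hypothetical stand-ins for the Wind fields listed in Section 1.3; the eligibility screen of step 1 is assumed to have been applied already, and missing data are represented by `None`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BondDay:
    """Hypothetical per-bond, per-day record; field names are illustrative only."""
    short_name: str
    issue_end_date: date
    issue_price: Optional[float]       # issue price (Interbank or Exchange Market)
    ib_volume: float                   # Interbank traded volume on the day
    ib_avg_dirty: Optional[float]      # average transaction dirty price
    best_bid_dirty: Optional[float]    # best bid dirty price, settled on the quote day
    best_ask_dirty: Optional[float]    # best ask dirty price, settled on the quote day


def criteria1_value(rec: BondDay, day: date) -> Optional[float]:
    """Value of one eligible bond under Criteria 1, or None if it is removed."""
    if day == rec.issue_end_date and rec.issue_price is not None:
        return rec.issue_price                                   # step 2
    if rec.ib_volume > 0 and rec.ib_avg_dirty is not None:
        return rec.ib_avg_dirty                                  # step 3
    if rec.best_bid_dirty is not None and rec.best_ask_dirty is not None:
        return 0.5 * (rec.best_bid_dirty + rec.best_ask_dirty)   # step 4
    return None                                                  # step 5


def criteria1_sample(records, day):
    """Step 6: map short name -> value for the bonds that survive the rules."""
    out = {}
    for rec in records:
        v = criteria1_value(rec, day)
        if v is not None:
            out[rec.short_name] = v
    return out
```

Keying the output by short name follows the note in Section 1.2 that a bond is identified by its short name rather than its market-specific symbol.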

Now we test the robustness of the current criteria, i.e., how stable is the daily set of bonds under the criteria? The testing period is May 2016 to October 2016. There are 125 trading days in this period, with one anomaly on 12 June 2016, when only about 20 bonds have trades/quotes recorded on Wind, possibly because the quotes data for that day are missing.

Figure 2: Criteria 1 samples

Now let $B_1, B_2, \ldots, B_{125}$ be the sets of bonds included on the 1st, 2nd, ..., 125th transaction day of the testing period. Define $TNY_i = |B_i \setminus B_{i-1}|$ for $2 \le i \le 125$ (the number of bonds in the sample on day $i$ but not on day $i-1$), and $TNT_i = |B_i \setminus B_{i+1}|$ for $1 \le i \le 124$ (the number of bonds in the sample on day $i$ but not on day $i+1$). Because of the anomaly, each of the sequences $TNY_i$ and $TNT_i$ contains an extreme outlier. The average of the sequence $TNY_i$ is 3.179, and that of $TNT_i$ is 3.154. This is far above the average number of new issuances per day, which is 0.296. This means that, compared with the previous day, about three bonds are excluded from and three new bonds are included in the sample on a given day. Looking at Criteria 1, four disadvantages can be seen:

1. It may include an illiquid bond if there is volume on a given day, although there had been no trading for a long time before.
2. It may exclude a liquid bond if there is no volume/quotes on a given day, although it had been traded for a long time before.
3. It may include bonds with price anomalies: a large bid-ask price difference, or a sudden jump in transaction price.
4. From observation, there is usually no volume/quote record on the transaction day after the issuance end date, although afterwards the bond may be very liquid.

Define Criteria 2 as follows:
1. Suppose that we want to build the yield curve for a given transaction day. Collect all zero-coupon/fixed-coupon bonds with no option from the set of bonds in the category (government/ADBC/CDB/Export-Import) that are active on that day.
2. For each bond in the collection, if the day or the previous trading day is the issuance end date of the bond, define the value of the bond to be the dirty price corresponding to the issue price in the Interbank or Exchange Market.
3. Otherwise, if there is volume in the Interbank Market on that day and on the previous trading day, and also on at least one of the two trading days before that, define the value of the bond to be the average transaction dirty price.
4. Otherwise, if there are quotes in the Interbank Market, define the value of the bond to be the average of the best bid and best ask dirty prices, where accrued interest is computed assuming settlement on the quote day.
5. Otherwise, remove the bond from the collection.
6. In the end, we have a collection of bonds with corresponding values. This serves as the daily input to the model.
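The day-over-day stability statistics $TNY_i$ and $TNT_i$ used to compare the criteria reduce to set differences between consecutive daily samples. A minimal sketch in plain Python follows; the function names are hypothetical, and each day's sample is assumed to be a set of bond short names. The text does not spell out how the extreme outlier is removed, so the caller simply passes the positions to skip.

```python
from statistics import mean


def turnover_counts(daily_sets):
    """TNY_i = |B_i minus B_{i-1}| for i >= 2 and TNT_i = |B_i minus B_{i+1}| for i <= N-1.

    daily_sets is the list [B_1, ..., B_N] of per-day sets of short names.
    """
    tny = [len(daily_sets[i] - daily_sets[i - 1]) for i in range(1, len(daily_sets))]
    tnt = [len(daily_sets[i] - daily_sets[i + 1]) for i in range(len(daily_sets) - 1)]
    return tny, tnt


def average_turnover(counts, skip=()):
    """Mean of a count sequence, leaving out the 0-based positions listed in `skip`."""
    skip = set(skip)
    return mean(c for idx, c in enumerate(counts) if idx not in skip)


days = [{"A", "B", "C"}, {"A", "B", "D"}, {"B", "D", "E", "F"}]
tny, tnt = turnover_counts(days)
print(tny, tnt)  # [1, 2] [1, 1]
```

Comparing these averages with the average number of new issuances per day (0.296 in the testing period) separates genuine market turnover from churn introduced by the filtering rule.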

Figure 3: Criteria 2 samples

We now compute the average (removing the extreme outlier) of the sequence $TNY_i$, which is 1.236, and of $TNT_i$, which is 1.260; the average number of new issuances per day is 0.296. The robustness is clearly improved compared with Criteria 1.

Define Criteria 3 as follows:
1. Suppose that we want to build the yield curve for a given transaction day. Collect all zero-coupon/fixed-coupon bonds with no option from the set of bonds in the category (government/ADBC/CDB/Export-Import) that are active on that day.
2. For each bond in the collection, if the day or the previous trading day is the issuance end date of the bond, define the value of the bond to be the dirty price corresponding to the issue price in the Interbank or Exchange Market.
3. Otherwise, if there is volume in the Interbank Market on that day and on the previous trading day, and also on at least one of the two trading days before that, define the value of the bond to be the average transaction dirty price.
4. Otherwise, if there are quotes in the Interbank Market with a bid-ask yield spread not exceeding 50 bps, define the value of the bond to be the average of the best bid and best ask dirty prices, where accrued interest is computed assuming settlement on the quote day.
5. Otherwise, remove the bond from the collection.
6. In the end, we have a collection of bonds with corresponding values. This serves as the daily input to the model.

Figure 4: Criteria 3 samples

We now compute the average (removing the extreme outlier) of the sequence $TNY_i$, which is 1.667, and of $TNT_i$, which is 1.593; the average number of new issuances per day is 0.296. The robustness is again improved compared with Criteria 1.

Furthermore, a stress test is performed on the yield curve under a hypothetical but realistic scenario. The question is: what happens if three bonds (one with maturity about 0.01 years, one about 1.2 years, and one about 6 years) are removed from the set of bonds selected by Criteria 1 on a given day? The testing period is October 2016. The following is a summary of the adjusted metric and max-metric for different intervals of the spot rate curve after removing the 3 bonds (all figures are in bps):

Figure 5: Criteria 1 remove 3 bonds maturity about 0.01, 1.2, 6

From observations:
1. When 3 bonds are removed from the sample list, the shift is relatively local and small (less than 1 bp). Theoretical explanation: the sample size is large.
2. The shift becomes even smaller when duration or elasticity is used as the weight (less than 0.5 bp). Theoretical explanation: more weight is put on shorter-term bonds, and there are many of them, so removing 3 bonds does not affect the curve much.
3. The shift becomes relatively smaller when mc or custom is used as the knot method. Theoretical explanation: with fixed knot points, sparse maturities within an interval make the curve sensitive.

A further stress test is done on the yield curve under another hypothetical but realistic condition. The question is: suppose one bond with maturity around 0.5 years is removed from the set of bonds selected by Criteria 3 on a given day. The testing period is October 2016. The adjusted metric and maximum metric for different intervals of the spot rate, obtained by removing 1 bond, are summarized below (all figures are in bps):

Figure 6: Criteria 3 remove 1 bond maturity about 0.5

From observations:
1. When 1 bond is removed from the sample list, the shift is relatively local and small (less than 0.5 bp). Theoretical explanation: the sample size is large.
2. The shift becomes even smaller when duration or elasticity is used as the weight. Theoretical explanation: more weight is put on shorter-term bonds, and there are many of them, so removing one bond does not affect the curve much.
3. Theoretically, since the input is more stable under Criteria 3, the yield curve will be more stable.

Comparison of Criteria 1 vs Criteria 3: The testing period is October 2016. The adjusted metric and maximum metric for different intervals of the spot rate when changing from Criteria 1 to Criteria 3 are summarized below:

Figure 7: Criteria 1 vs Criteria 3

From observations:

1. Changing from Criteria 1 to Criteria 3 shifts the curve considerably, especially for maturities longer than 10 years. Theoretical explanation: there is little trading/quote data for long-term bonds, and some of it may consist of outliers, which Criteria 3 addresses.

3.3. Regarding the knot methods

First, let us see how many bond maturities fall in each interval of the Chinabond knot points, for each Criteria. The table is obtained by averaging from May to October 2016:

Figure 8: Average number of maturities in Chinabond knot points

Some intervals are very sparse, with only 0-2 data points available, which makes the curve on those intervals very sensitive to the sample data. This is one disadvantage of adopting the Chinabond fixed knot points, and it could be addressed by McCulloch knot points: choosing knot points such that there is an almost equal number of bond maturities in each interval. Let me start with an example of how the McCulloch knot method works. On 31 Oct 2016, this is the distribution of maturities of the bonds in the sample:

Figure 9: Distribution 0-20 years
Figure 10: Distribution 20-50 years

In this case, the knot points would be: 0, 0.23, 0.863, 2.479, 3.693, 4.978, 6.559, 9.51, 49.559. If the distribution of maturities changes, the knot points also change, so the knot points change daily. Possibly this makes the curve using McCulloch knot points unstable from day to day (although this needs further research). I am therefore motivated to define custom knot points for each Criteria, such that they are fixed and there is an almost equal number of bond maturities in each interval. For Criteria 1, use 0, 0.2, 0.5, 2, 4, 5, 7, 9, 50, while for Criteria 3, use 0, 0.167, 0.5, 2, 3.5, 4.5, 6, 8.5, 50 years. Here are the averages for May-Oct 2016:

Figure 11: Custom knot points

We currently use 9 knot points in the McCulloch choice of knot points. The question now is: what if we change to 10 points? The testing period is October 2016. The adjusted metric and maximum metric for different intervals of the spot rate when changing from 9 knot points to 10 knot points are summarized below:

Figure 12: Stress Testing McCulloch knot points from 9 to 10

From observations:
1. The shift is quite large for 0-1 year and moderate for the other intervals.
2. The shift becomes very small when duration or elasticity is used as the weight.

How do the three knot methods compare to each other? The testing period is October 2016. The adjusted metric and maximum metric for different intervals of the spot rate, for pairwise comparisons among the Chinabond fixed, McCulloch, and custom knot methods, are summarized below:

Figure 13: Comparison among 3 knot methods

From observations:
1. Moderate to large shift when changing from fixed to mc or from fixed to custom.
2. Low to moderate shift when changing from mc to custom.
3. The shift becomes small when duration or elasticity is used as the weight.
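As a rough illustration of the McCulloch rule described above (an almost equal number of maturities per interval), the sketch below places the first knot at 0, the last knot at the longest maturity, and the interior knots at empirical quantiles of the sorted maturities. This quantile rule is an assumption that approximates McCulloch's placement and may differ in detail from his exact interpolation formula; the helper names and the synthetic maturities are hypothetical. The counting helper applies equally to the fixed custom knot points.

```python
import numpy as np


def quantile_knots(maturities, n_knots=9):
    """Hypothetical McCulloch-style knots: 0, interior empirical quantiles, longest maturity.

    The interior knots split the sorted maturities into n_knots - 1 groups
    of (almost) equal size.
    """
    t = np.sort(np.asarray(maturities, dtype=float))
    n_intervals = n_knots - 1
    levels = np.arange(1, n_intervals) / n_intervals   # interior quantile levels
    interior = np.quantile(t, levels)
    return np.concatenate(([0.0], interior, [t[-1]]))


def counts_per_interval(maturities, knots):
    """Number of maturities in each [knot_i, knot_{i+1}); the last interval is closed."""
    counts, _ = np.histogram(maturities, bins=knots)
    return counts


# Synthetic maturities (hypothetical, not the October 2016 sample)
maturities = np.linspace(0.1, 10.0, 80)
knots9 = quantile_knots(maturities, n_knots=9)
knots10 = quantile_knots(maturities, n_knots=10)   # the "9 -> 10 knot points" stress case
print(np.round(knots9, 3))
print(counts_per_interval(maturities, knots9))

# Fixed custom knot points for Criteria 1, taken from the text
custom_c1 = [0, 0.2, 0.5, 2, 4, 5, 7, 9, 50]
print(counts_per_interval(maturities, custom_c1))
```

Because the interior knots are quantiles of each day's sample, they move whenever the maturity distribution changes, which is the day-to-day instability mentioned above; fixed custom knots give up that adaptivity in exchange for stability.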

3.4. Regarding the weight methods

By giving more weight to a bond, we allow more uncertainty/noise in its price. Let us look at the bid-ask price difference for quotes recorded under Criteria 1 on one day. The trend is that the price spread increases with maturity, i.e. longer bonds carry more price uncertainty.

Figure 14: Bid-ask price difference vs time

Let us also look at the bid-ask yield difference for quotes recorded under Criteria 1 on one day. No trend can be seen here.

Figure 15: Bid-ask yield difference vs time

Since the price spread grows with maturity while the yield spread shows no trend, price noise appears to scale with the bond's sensitivity to yield. Therefore, theoretically, using duration or elasticity as the weight is better than using equal or price weights. Indeed, when elasticity is used as the weight, the error term in the regression is approximately the yield error (neglecting the convexity effect). The testing period is October 2016. The adjusted metric and maximum metric for different intervals of the spot rate, for the pairwise comparisons equal vs duration, equal vs price, and duration vs elasticity, are summarized below:

Figure 16: Weight comparisons

From observations:
1. There is little shift when changing from equal to price weights, or from duration to elasticity weights, except on some intervals under Criteria 1.
2. There is a large shift when changing from equal to duration weights, especially for shorter terms.
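To make the weighting concrete, the sketch below fits a discount curve by minimizing squared price residuals, each divided by the bond's weight, following the reading above that a larger weight tolerates more price noise. The discount factor is written in the product form of Section 5.1, $d(t) = \prod_j a_j(t)^{\beta_j}$, using a hypothetical basis $a_j(t) = e^{-t^j}$ (not a spline), and the minimization uses scipy.optimize.minimize with method="Newton-CG" and an analytic gradient and Hessian. The bonds, coupons, annual payment schedule, and the use of Macaulay duration as the weight are assumptions for this synthetic example only, not the author's implementation.

```python
import numpy as np
from scipy.optimize import minimize


def make_objective(cf, log_a, bond_id, prices, scale):
    """Weighted least-squares price objective with its gradient and Hessian.

    cf[r]       : cash flow r (plays the role of a_{i,0,l})
    log_a[r, j] : log a_{i,j,l}; cash flow r is discounted by exp(log_a[r] @ beta)
    bond_id[r]  : index l of the bond that pays cash flow r
    prices[l]   : observed dirty price P_l
    scale[l]    : weight of bond l; its price residual is divided by this value
    """
    m, k = len(prices), log_a.shape[1]

    def parts(beta):
        pv = cf * np.exp(log_a @ beta)                       # discounted cash flows
        model = np.bincount(bond_id, weights=pv, minlength=m)
        grad_model = np.zeros((m, k))
        np.add.at(grad_model, bond_id, pv[:, None] * log_a)  # d(model_l)/d(beta)
        return pv, model, grad_model

    def fun(beta):
        _, model, _ = parts(beta)
        return np.sum(((model - prices) / scale) ** 2)

    def jac(beta):
        _, model, grad_model = parts(beta)
        return grad_model.T @ (2.0 * (model - prices) / scale**2)

    def hess(beta):
        pv, model, grad_model = parts(beta)
        w = 2.0 * (model - prices) / scale**2
        curv = np.zeros((m, k, k))                           # d2(model_l)/d(beta)2
        np.add.at(curv, bond_id, pv[:, None, None] * log_a[:, :, None] * log_a[:, None, :])
        gauss_newton = 2.0 * (grad_model / scale[:, None] ** 2).T @ grad_model
        return gauss_newton + np.einsum("l,ljk->jk", w, curv)

    return fun, jac, hess


# --- Synthetic example: hypothetical annual-coupon bonds priced off a known curve ---
maturities = np.array([0.5, 1.0, 2.0, 3.0, 5.0, 7.0])
coupons = np.array([0.0, 0.025, 0.027, 0.028, 0.030, 0.031])
beta_true = np.array([0.025, 0.0005])   # d(t) = exp(-(0.025 t + 0.0005 t^2))

cf, times, bond_id = [], [], []
for l, (T, c) in enumerate(zip(maturities, coupons)):
    pay = np.arange(T, 0.0, -1.0)       # payment times T, T-1, ... (all > 0)
    flow = np.full(pay.shape, 100.0 * c)
    flow[0] += 100.0                    # principal repaid at maturity
    cf.extend(flow)
    times.extend(pay)
    bond_id.extend([l] * len(pay))
cf, times, bond_id = np.array(cf), np.array(times), np.array(bond_id)

log_a = -np.column_stack([times, times**2])   # hypothetical basis a_j(t) = exp(-t^j)
pv_true = cf * np.exp(log_a @ beta_true)
prices = np.bincount(bond_id, weights=pv_true)
duration = np.bincount(bond_id, weights=pv_true * times) / prices  # Macaulay duration as weight

fun, jac, hess = make_objective(cf, log_a, bond_id, prices, scale=duration)
res = minimize(fun, np.zeros(2), method="Newton-CG", jac=jac, hess=hess)
print(res.x)
```

Swapping `scale=duration` for a vector of ones gives the equal-weight fit, and restarting `minimize` from a few different starting vectors is a cheap check on the global-optimum concern raised in Section 5.1.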

3.5. Stress Testing alpha in Vasicek-Fong

What if we change the long-term forward rate α in Vasicek-Fong from 0.035 to 0.030 or 0.040? The testing period is October 2016. The adjusted metric and maximum metric for different intervals of the spot rate, for the pairwise comparisons α = 0.035 vs 0.030 and α = 0.035 vs 0.040, are summarized below:

Figure 17: Stress Testing alpha

From observations:
1. In all cases, choosing a different alpha does not change the curve much.

3.6. Regarding the model

How does McCulloch compare to Vasicek-Fong? The testing period is October 2016. The adjusted metric and maximum metric for different intervals of the spot rate, for McCulloch vs Vasicek-Fong, are summarized below:

Figure 18: Models comparison

From observations:
1. Changing from the McCulloch to the Vasicek-Fong method does not shift the curve much, but the shift grows with maturity. Theoretical explanation: alpha (the long-term forward rate) pins down the forward curve at infinity and pulls the long end towards it.
2. Advantage of Vasicek-Fong: we can specify the long-term forward rate, and the curve is not sensitive to this choice of alpha for maturities up to 30 years.

3.7. Smoothness

The testing period is October 2016. The smoothness results, calculated with the Adams & van Deventer (1994) measure, are summarized below:

Figure 19: Smoothness comparison

From observations:
1. The curve is smoother under Criteria 3.
2. The curve is smoother when duration or elasticity is used as the weight.
3. The curve is smoother when mc or custom is used as the knot method.

3.8. Conclusions

From these observations, using equal or price weights can be rejected, on both theoretical and empirical grounds. Likewise, the fixed Chinabond knot points are rejected on theoretical and empirical grounds. Although Criteria 3 improves on Criteria 1, it also has a disadvantage: it may exclude a bond's volume (transaction) data while still including its quotes. We will therefore still consider both Criteria 1 and 3 in further testing.

4. Implementation and Analysis: Part 2

Following the previous discussions, the choices are:

1. Criteria: 1, 3
2. Model: McCulloch (spline on discounting), Vasicek-Fong (exponential spline on discounting), Smoot (spline on spot), spline on log discounting, spline on forward.
3. Knot method: McCulloch (mc for short), custom according to the Criteria.
4. Weight method: duration, elasticity.

4.1. Regarding the Criteria

A stress test is done on the yield curve under a hypothetical but realistic condition. The question is: suppose three bonds (with maturities of about 0.01, 1.2, and 6 years) are removed from the set of bonds selected by Criteria 1 on a given day. The testing period is October 2016. The adjusted metric and maximum metric for different intervals of the spot rate, obtained by removing the 3 bonds, are summarized below (all figures are in bps):

Figure 20: Criteria 1 remove 3 bonds maturity about 0.01, 1.2, 6

From observations:
1. When 3 bonds are removed from the sample list, the shift is relatively local and small (less than 1 bp). Theoretical explanation: the sample size is large.

Next, suppose one bond with maturity around 0.5 years is removed from the set of bonds selected by Criteria 3 on a given day. The testing period is October 2016. The adjusted metric and maximum metric for different intervals of the spot rate, obtained by removing 1 bond, are summarized below (all figures are in bps):

Figure 21: Criteria 3 remove 1 bond maturity about 0.5

From observations:
1. When 1 bond is removed from the sample list, the shift is relatively local and small (less than 0.3 bp). Theoretical explanation: the sample size is large.
2. Theoretically, since the input is more stable under Criteria 3, the yield curve will be more stable.

Comparison of Criteria 1 vs Criteria 3: The testing period is October 2016. The adjusted metric and maximum metric for different intervals of the spot rate when changing from Criteria 1 to Criteria 3 are summarized below:

Figure 22: Criteria 1 vs Criteria 3

From observations:
1. Changing from Criteria 1 to Criteria 3 shifts the curve considerably, especially for maturities longer than 10 years. Theoretical explanation: there is little trading/quote data for long-term bonds, and some of it may consist of outliers, which Criteria 3 addresses.
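The Criteria 3 rules defined earlier can be sketched as a per-bond decision function. The record type and all field names below are hypothetical; the sketch assumes step 1 (restricting to active zero-coupon/fixed-coupon bonds with no option) and all dirty-price calculations (issue price, average traded price, best bid/ask with accrued interest at quote-day settlement) are done upstream, and it reads "one of the previous-previous or previous-previous-previous day" as at least one of those two days.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

MAX_SPREAD = 0.0050  # rule 4: bid-ask yield spread limit of 50 bps


@dataclass
class BondDay:
    """Hypothetical per-bond inputs for one transaction day."""
    code: str
    issue_end_recent: bool = False            # the day or previous trading day is the issuance end date
    issue_dirty_price: Optional[float] = None
    traded: Tuple[bool, bool, bool, bool] = (False, False, False, False)  # interbank volume on t, t-1, t-2, t-3
    avg_trade_dirty_price: Optional[float] = None
    bid_yield: Optional[float] = None         # best bid yield, as a decimal
    ask_yield: Optional[float] = None         # best ask yield, as a decimal
    bid_dirty: Optional[float] = None         # best bid dirty price
    ask_dirty: Optional[float] = None         # best ask dirty price


def criteria3_value(b: BondDay) -> Optional[float]:
    """Return the bond's value under rules 2-4, or None when rule 5 removes it."""
    # Rule 2: recently issued bonds use the dirty issue price
    if b.issue_end_recent and b.issue_dirty_price is not None:
        return b.issue_dirty_price
    # Rule 3: traded today, yesterday, and on at least one of the two days before
    t0, t1, t2, t3 = b.traded
    if t0 and t1 and (t2 or t3) and b.avg_trade_dirty_price is not None:
        return b.avg_trade_dirty_price
    # Rule 4: tight two-sided quotes give the mid of best bid and best ask dirty prices
    quotes = (b.bid_yield, b.ask_yield, b.bid_dirty, b.ask_dirty)
    if all(q is not None for q in quotes) and abs(b.bid_yield - b.ask_yield) <= MAX_SPREAD:
        return 0.5 * (b.bid_dirty + b.ask_dirty)
    # Rule 5: otherwise the bond is dropped
    return None


def criteria3_inputs(bonds: Iterable[BondDay]) -> Dict[str, float]:
    """Rule 6: the daily model input, mapping bond code to value."""
    values = {}
    for b in bonds:
        v = criteria3_value(b)
        if v is not None:
            values[b.code] = v
    return values


# Hypothetical bonds exercising each rule
bonds = [
    BondDay("A", issue_end_recent=True, issue_dirty_price=100.0),
    BondDay("B", traded=(True, True, False, True), avg_trade_dirty_price=101.2),
    BondDay("C", traded=(True, False, True, True), bid_yield=0.0262, ask_yield=0.0260,
            bid_dirty=99.9, ask_dirty=100.1),
    BondDay("D", bid_yield=0.0300, ask_yield=0.0240, bid_dirty=98.0, ask_dirty=99.0),
]
print(criteria3_inputs(bonds))
```

Bond C fails rule 3 (no volume on the previous day) and falls through to its quotes, while bond D is dropped because its 60 bps yield spread exceeds the limit.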

4.2. Regarding the knot methods

We currently use 9 knot points in the McCulloch choice of knot points. The question now is: what if we change to 10 points? The testing period is October 2016. The adjusted metric and maximum metric for different intervals of the spot rate when changing from 9 knot points to 10 knot points are summarized below:

Figure 23: Stress Testing McCulloch knot points from 9 to 10

From observations:
1. The shift is moderate to small in all cases.

How do the three knot methods compare to each other? The testing period is October 2016. The adjusted metric and maximum metric for different intervals of the spot rate, for pairwise comparisons among the Chinabond fixed, McCulloch, and custom knot methods, are summarized below:

Figure 24: Comparison among 3 knot methods

From observations:
1. For Criteria 1, there is a moderate shift for short terms and a small shift for longer terms when changing the knot method from mc to custom.
2. For Criteria 3, there is a small shift for all terms when changing the knot method from mc to custom.
3. Custom knot points could be appealing because they are stable from day to day, but this needs further research.

4.3. Regarding the weight methods

The testing period is October 2016. The adjusted metric and maximum metric for different intervals of the spot rate, using duration vs elasticity as the weight, are summarized below:

Figure 25: Weight comparisons

From observations:
1. There is little shift when changing from elasticity to duration as the weight, except at longer maturities under Criteria 1.

4.4. Stress Testing alpha in Vasicek-Fong

What if we change the long-term forward rate α in Vasicek-Fong from 0.035 to 0.030 or 0.040? The testing period is October 2016. The adjusted metric and maximum metric for different intervals of the spot rate, for the pairwise comparisons α = 0.035 vs 0.030 and α = 0.035 vs 0.040, are summarized below:

Figure 26: Stress Testing alpha

From observations:
1. In all cases, choosing a different alpha does not change the curve much.

4.5. Regarding the model

How does McCulloch compare to Vasicek-Fong? The testing period is October 2016. The adjusted metric and maximum metric for different intervals of the spot rate, for McCulloch vs Vasicek-Fong, are summarized below:

Figure 27: Models comparison

Figure 28: Models comparison

From observations:
1. There is little shift among the mc, vf, and logdisc models.
2. The shift is larger for longer maturities when the forward model is used.
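To see why α governs the long end of the Vasicek-Fong curve, consider a one-segment simplification (assumed here for illustration; the actual fit uses several segments with knots). Vasicek and Fong transform maturity as $x = 1 - e^{-\alpha t}$ and fit the discount function as a spline in $x$. Writing $u = e^{-\alpha t} = 1 - x$ and taking a single cubic in $u$ with no constant term, $d(t) = \beta_1 u + \beta_2 u^2 + \beta_3 u^3$ with $\beta_1 + \beta_2 + \beta_3 = 1$ so that $d(0) = 1$, the forward rate is

$$f(t) = -\frac{d'(t)}{d(t)} = \alpha\,\frac{\beta_1 + 2\beta_2 u + 3\beta_3 u^2}{\beta_1 + \beta_2 u + \beta_3 u^2} \;\longrightarrow\; \alpha \quad \text{as } t \to \infty \ (\beta_1 \neq 0).$$

The function names and coefficients below are hypothetical.

```python
import numpy as np


def vf_discount(t, beta, alpha):
    """One-segment sketch: d(t) = b1*u + b2*u^2 + b3*u^3 with u = exp(-alpha*t)."""
    u = np.exp(-alpha * np.asarray(t, dtype=float))
    b1, b2, b3 = beta
    return b1 * u + b2 * u**2 + b3 * u**3


def vf_forward(t, beta, alpha):
    """Forward rate f(t) = -d'(t)/d(t), using d'(t) = -alpha*(b1*u + 2*b2*u^2 + 3*b3*u^3)."""
    u = np.exp(-alpha * np.asarray(t, dtype=float))
    b1, b2, b3 = beta
    return alpha * (b1 * u + 2 * b2 * u**2 + 3 * b3 * u**3) / vf_discount(t, beta, alpha)


beta = (1.2, -0.1, -0.1)   # hypothetical coefficients summing to 1, so d(0) = 1
t = np.array([0.0, 1.0, 10.0, 30.0, 50.0, 200.0])
for alpha in (0.030, 0.035, 0.040):
    print(alpha, np.round(vf_forward(t, beta, alpha), 4))
```

Holding β fixed while varying α exaggerates the effect; in the stress test the coefficients are refitted for each α, which is consistent with the observation that the curve changes little up to 30 years.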

4.6. Stress Testing by changing the YTM of one bond by ±1%

The testing period is October 2016. Here is the average shift when the YTM of one bond with maturity about 0.2 years is increased by one percent:

Figure 29: YTM one bond (maturity 0.2 years) plus one percent

Here is the average shift when the YTM of one bond with maturity about 0.2 years is decreased by one percent:

Figure 30: YTM one bond (maturity 0.2 years) minus one percent

Here is the average shift when the YTM of one bond with maturity about 4 years is increased by one percent:

Figure 31: YTM one bond (maturity 4 years) plus one percent

Here is the average shift when the YTM of one bond with maturity about 4 years is decreased by one percent:

Figure 32: YTM one bond (maturity 4 years) minus one percent

Here is the average shift when the YTM of the bond with the largest maturity in the sample is increased by one percent:

Figure 33: YTM one bond (maturity largest in sample) plus one percent

Here is the average shift when the YTM of the bond with the largest maturity in the sample is decreased by one percent:

Figure 34: YTM one bond (maturity largest in sample) minus one percent

From observations:
1. In all cases, stressing the YTM of one bond in the sample affects the curve only locally.
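The stressed input in this test can be produced by converting the bumped YTM back into a dirty price before refitting. The sketch below assumes a simplified convention (compounding at the coupon frequency, cash flows at whole coupon periods counted back from maturity, face value 100); actual Chinabond and interbank yield conventions may differ, and the function names and bonds are hypothetical.

```python
import numpy as np


def dirty_price_from_ytm(ytm, coupon_rate, years_to_maturity, freq=1, face=100.0):
    """Dirty price under an assumed compounding-at-coupon-frequency convention."""
    times = np.arange(years_to_maturity, 0.0, -1.0 / freq)  # T, T - 1/freq, ... (> 0)
    flows = np.full(times.shape, face * coupon_rate / freq)
    flows[0] += face                                       # principal at maturity
    discount = (1.0 + ytm / freq) ** (-freq * times)
    return float(np.sum(flows * discount))


def ytm_stress(ytm, coupon_rate, years_to_maturity, bump=0.01, **kwargs):
    """Base, +bump and -bump dirty prices for one bond (bump = 0.01 is one percent)."""
    return {
        label: dirty_price_from_ytm(ytm + shift, coupon_rate, years_to_maturity, **kwargs)
        for label, shift in (("base", 0.0), ("up", bump), ("down", -bump))
    }


# Hypothetical bonds: a 0.2-year zero and a 4-year 3% annual-coupon bond
short = ytm_stress(0.025, 0.0, 0.2)
medium = ytm_stress(0.030, 0.03, 4.0)
print(short)
print(medium)
```

Bumping the yield rather than the price keeps the stress comparable across maturities, but the implied price change grows with duration, so the 4-year bump moves its input price much more than the 0.2-year bump does.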

4.7. Smoothness

The testing period is October 2016. The smoothness results, calculated with the Adams & van Deventer (1994) measure, are summarized below:

Figure 35: Smoothness comparison

From observations:
1. The lower smoothness of the smoot and forward models may be inconclusive, since it could be driven by a bend at the short end. The Waggoner (1997) measure could be used in further research.
2. The curve is smoother under Criteria 3.

4.8. Conclusions

There are trade-offs between using McCulloch knot points and custom knot points. Criteria 3 excludes outliers in volume data but still includes quote data, so further improvements could be made to the Criteria. There is little difference between using duration and elasticity as the weight.
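For reference, the Adams & van Deventer (1994) maximum-smoothness criterion penalizes the integrated squared second derivative of the forward curve, $Z = \int_0^T f''(t)^2\,dt$, with smaller values meaning a smoother curve. A minimal numerical sketch, assuming the fitted forward curve can be evaluated as a vectorized function on a uniform grid, approximates $f''$ by central second differences and the integral by a Riemann sum; the grid size, function names, and test curves are arbitrary choices for illustration.

```python
import numpy as np


def smoothness(forward, t_max, n=2001):
    """Approximate Z = integral_0^t_max f''(t)^2 dt for a vectorized forward curve."""
    t = np.linspace(0.0, t_max, n)
    h = t[1] - t[0]
    f = forward(t)
    f2 = (f[2:] - 2.0 * f[1:-1] + f[:-2]) / h**2  # central second differences at interior points
    return float(np.sum(f2**2) * h)                # Riemann sum over the interior grid


def linear_fwd(t):
    return 0.02 + 0.001 * t        # f'' = 0, so Z should be about 0


def quadratic_fwd(t):
    return 0.02 + 0.001 * t**2     # f'' = 0.002, so Z is about 0.002^2 * t_max


print(smoothness(linear_fwd, 10.0), smoothness(quadratic_fwd, 10.0))
```

Because the measure depends on the whole integration range, a sharp bend at the short end can dominate it, which is one reason the Smoot and forward results above may be inconclusive.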

5. Discussions and Future Work

In the future, the Criteria could be improved by using a ranking process as in the Chinamoney/SSI yield curves. Furthermore, a smoothness penalty could be implemented for some curves, but the optimization would then no longer be trivial and would take longer. Conditions could also be imposed, for example fixing the spot/forward rate at several points. Alternatively, we could first run the same tests for policy bank/corporate bonds. Other models could also be explored. Finally, by comparing prices from the model curve with those from the Chinabond yield curve, we could learn why there is a consistent difference between the two.

5.1. Optimization Problem

In this section, the following optimization problem is stated; it is used repeatedly afterwards. Given positive integers $n_1, n_2, \ldots, n_m$, real numbers $P_1, P_2, \ldots, P_m$, and positive real numbers $a_{i,j,l}$ for $1 \le i \le n_l$, $1 \le l \le m$ and $0 \le j \le k$, we seek real numbers $(\beta_1, \beta_2, \ldots, \beta_k)$ that minimize

$$\sum_{l=1}^{m} \left( \sum_{i=1}^{n_l} a_{i,0,l}\, a_{i,1,l}^{\beta_1} a_{i,2,l}^{\beta_2} \cdots a_{i,k,l}^{\beta_k} - P_l \right)^2 .$$

Currently, this optimization problem is solved using scipy.optimize.minimize, in particular the Newton Conjugate-Gradient (Newton-CG) algorithm, with the zero vector as the starting point. This method gives no guarantee that the optimized point is the global optimum, although in the cases tried with different starting points it indeed converged to the same point. Further research needs to be done on this issue.

5.2. How does Chinabond construct the spot yield curve?

Lapshin and Wang (2013) and other sources note that there are 4 steps:
1. Filtering data on several bonds with maturities near the key terms, e.g. removing quotes that differ significantly from history.
2. Augmenting trading data with expert estimates and sometimes historical data.

3. Key yields are chosen for interpolation, and the YTM at these key terms is estimated. Spot rates at those key terms are then approximated from the YTMs.
4. Monotone cubic spline interpolation (which is unique) through these key points gives the whole curve.

Estimating YTMs first, rather than estimating spot rates directly, is a non-rigorous step. Chinabond's advantage is access to more data (for example, market-maker details and transaction/quote times).

6. Bibliography

[1] Adams, K.J., van Deventer, D.R., Fitting Yield Curves and Forward Rate Curves with Maximum Smoothness. Journal of Fixed Income, June 1994.
[2] Bank for International Settlements, Zero-Coupon Yield Curves: Technical Documentation. BIS Papers No 25, October 2005.
[3] Carleton, W.T., Cooper, I.A., Estimation and Uses of the Term Structure of Interest Rates. The Journal of Finance, Vol XXXI, No 4, September 1976.
[4] Chambers, D.R., Carleton, W.T., Waldman, D.W., A New Approach to the Estimation of the Term Structure of Interest Rates. Journal of Financial and Quantitative Analysis, Vol 19, No 3, September 1984.
[5] China Central Depository and Clearing Co., Ltd., Chinabond Pricing System. 2013.
[6] Cohen, K.J., Kramer, R.L., Waugh, W.H., Regression Yield Curves for U.S. Government Securities. Management Science, December 1966.
[7] Cooper, I.A., Asset Values, Interest-Rate Changes, and Duration. The Journal of Financial and Quantitative Analysis, Vol 12, No 5, December 1977.
[8] de Boor, C., A Practical Guide to Splines. Applied Mathematical Sciences Volume 27, first edition 1978, second edition 2001.
[9] Durand, D., Basic Yields of Corporate Bonds, 1900-1942. National Bureau of Economic Research, June 1942.