Technische Universität Ilmenau Institut für Mathematik

Preprint No. M 09/23

The Repeater Tree Construction Problem

Bartoschek, Christoph; Held, Stephan; Maßberg, Jens; Rautenbach, Dieter; Vygen, Jens

2009

Imprint: Published by the Head of the Institut für Mathematik, Weimarer Straße 25, 98693 Ilmenau, Tel.: +49 3677 69 3621, Fax: +49 3677 69 3270, http://www.tu-ilmenau.de/ifm/, ISSN xxxx-xxxx

The Repeater Tree Construction Problem

C. Bartoschek¹, S. Held¹, J. Maßberg¹, D. Rautenbach², and J. Vygen¹

¹ Forschungsinstitut für Diskrete Mathematik, Universität Bonn, Lennéstr. 2, D-53113 Bonn, Germany, emails: {bartosch,held,massberg,vygen}@or.uni-bonn.de
² Institut für Mathematik, TU Ilmenau, Postfach 100565, D-98684 Ilmenau, Germany, email: {dieter.rautenbach}@tu-ilmenau.de

Abstract

A tree-like substructure on a computer chip whose task it is to carry a signal from a source circuit to possibly many sink circuits and which consists only of wires and so-called repeater circuits is called a repeater tree. We present a mathematical formulation of the optimization problems related to the construction of such repeater trees. Furthermore, we prove theoretical properties of a simple iterative procedure for these problems which was successfully applied in practice.

Keywords: VLSI design; repeater tree; Steiner tree; minimum spanning tree

AMS subject classification: 05C05, 05C85, 68W25, 68W35

1 Introduction

During every computation cycle of a modern highly complex computer chip, millions of signals have to travel between circuits at different locations on the chip area. While for most of these signals the distances are relatively small and can be bridged by a pure metal connection between the circuits, there are still many signals which have to travel a relatively long distance. Elementary physical considerations [5] imply that the delay of an electrical signal propagating along a metal connection grows approximately quadratically with the traversed distance. Traditionally, the circuit delay dominated the wire delay and this quadratic growth did not represent a problem. Nowadays though, due to the continuous shrinking of feature sizes [4, 10], an ever growing part of the total delay is caused by wires, and long metal connections have to be split into several parts by inserting so-called repeaters. These repeaters just evaluate the Boolean identity function and serve no logical purpose within the computation of the chip.
Their task is only to linearize the delay as a function of the distance. It is estimated [11] that for the upcoming 45 nm and 32 nm technologies up to 35% and 70%, respectively, of all circuits on a chip might have to be repeaters.

A tree-like substructure on a chip whose task it is to carry a signal from a source circuit to possibly many sink circuits and which consists only of wires and repeaters is called a repeater tree. In [2, 3] we proposed algorithms for the construction of repeater tree topologies and for the actual insertion of repeater circuits into these topologies. During this research we conceived a simple yet relatively accurate delay model which allows a concise mathematical formulation of the repeater tree problem. The purpose of the present paper is to present this formulation, to explain the main optimization goals, and to prove some theoretical properties of the algorithms in [2, 3].

[Figure 1: Quality of the delay model. Vertical axis: exact delay after buffering and sizing (ns); horizontal axis: estimated delay (ns); both axes range from 0 to 2.]

2 The Repeater Tree Problem

An instance of the repeater tree (topology) problem consists of a source $r \in \mathbb{R}^2$, a finite non-empty set $S \subseteq \mathbb{R}^2$ of sinks, a required arrival time $a_s \in \mathbb{R}$ for every sink $s \in S$, and two numbers $c, d \in \mathbb{R}_{>0}$. A feasible solution of such an instance is a rooted tree $T = (V(T), E(T))$ with vertex set $\{r\} \cup S \cup I$, where $I \subseteq \mathbb{R}^2$ is a set of $|S| - 1$ points, such that $r$ is the root of $T$ and has exactly one child, the elements of $I$ are the internal vertices of $T$ and have exactly two children each, and the elements of $S$ are the leaves of $T$. In [2, 3] such a feasible solution was called a repeater tree topology, because the number, types, and positions of the actual repeaters are not yet determined.

The optimization goals for a repeater tree are related to the wiring, to the number of repeater circuits, and to the timing. We assume that every edge $e = (u,v) \in E(T)$ of $T$ is realized along a path between the two points $u$ and $v$ in the plane which is shortest with respect to some norm $\|\cdot\|$ on $\mathbb{R}^2$. Furthermore, we assume that repeaters are inserted in a relatively uniform way into all wires in order to linearize the delay within the repeater tree. Hence the wiring and also the number of repeater circuits needed for the physical realization of the edge $e$ are proportional to $\|u - v\|$. For the entire repeater tree topology, this results in a total cost of

$$l(T) := \sum_{(u,v) \in E(T)} \|u - v\|.$$

The delay of the signal starting at the root and travelling through $T$ to the sinks has two components. Let $E[r,s]$ denote the set of edges on the path $P$ in $T$ between the root $r$ and some sink $s \in S$. The linearized delay along the edges of $P$ is modelled by

$$d \sum_{(u,v) \in E[r,s]} \|u - v\|.$$

Furthermore, every internal vertex on $P$ corresponds to a bifurcation which causes an additive delay of $c$ along $P$. For the entire path $P$, these additional delays sum up to $c\,(|E[r,s]| - 1)$. In practice there is sometimes a certain degree of freedom how to distribute the additional delay caused by a bifurcation to the two branches [9].

Altogether, we estimate the delay of the signal along $P$ by the sum of these two components. Assuming that the signal starts at time 0 at the root, the slack at some sink $s \in S$ in $T$ is estimated by

$$\sigma(T,s) := a_s - d \sum_{(u,v) \in E[r,s]} \|u - v\| - c\,(|E[r,s]| - 1)$$

and the worst slack equals

$$\sigma(T) := \min\{\sigma(T,s) \mid s \in S\}.$$

The restrictions on the number of children of the root and the internal vertices of $T$ imply that the number of sinks contributes logarithmically to the delay, which corresponds to physical experience. The accuracy of our simple delay estimation is shown in Figure 1, which compares our estimation with the real physical delay once the repeater tree has been realized and optimized. The parameters $c$ and $d$ are technology-dependent. For the 65 nm technology their values are about $c = 20$ ps and $d = 220$ ps/mm.

In principle, a repeater tree topology is acceptable with respect to timing if $\sigma(T)$ is non-negative, i.e. the signal arrives at every sink $s \in S$ not later than $a_s$. Nevertheless, in order to account for inaccurate estimations and manufacturing variation, the worst slack $\sigma(T)$ should have at least some reasonable positive value $\sigma_{\min}$ or should even be maximized.

We can formulate three main optimization scenarios: Determine $T$ such that

(O1) $\sigma(T)$ is maximized, or
(O2) $l(T)$ is minimized, or
(O3) for suitable constants $\alpha, \beta, \sigma_{\min} > 0$, the expression $\alpha \min\{\sigma(T), \sigma_{\min}\} - \beta\, l(T)$ is maximized.

While scenario (O1) is reasonable for instances which are very timing critical, scenario (O2) is reasonable for very timing uncritical instances. Scenario (O3) is probably the practically most relevant one.

In the next section, we will show that (O1) can be solved exactly in polynomial time. In contrast to that, (O2) is hard even for restricted choices of the norm such as the $\ell_1$-norm, since it is essentially the Steiner tree problem [6].
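The quantities of this section are easy to evaluate for a given topology. The following is a minimal sketch, not taken from [2, 3]: it assumes the $\ell_1$-norm, stores the tree as a hypothetical `parent` map from each non-root vertex to its parent, and uses a made-up instance (coordinates in mm, times in ps) together with the 65 nm values of $c$ and $d$ quoted above.

```python
def l1(p, q):
    """l1 (rectilinear) distance; one admissible choice for the norm."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def total_length(pos, parent, norm=l1):
    """l(T): sum of all edge lengths; parent maps each non-root vertex to its parent."""
    return sum(norm(pos[v], pos[u]) for v, u in parent.items())


def slack(pos, parent, arrival, s, c, d, norm=l1):
    """sigma(T, s) = a_s - d * (length of the root-to-s path) - c * (|E[r,s]| - 1)."""
    path_length, num_edges, v = 0, 0, s
    while v in parent:  # walk from the sink up to the root
        u = parent[v]
        path_length += norm(pos[u], pos[v])
        num_edges += 1
        v = u
    return arrival[s] - d * path_length - c * (num_edges - 1)


def worst_slack(pos, parent, arrival, c, d, norm=l1):
    """sigma(T): minimum slack over all sinks."""
    return min(slack(pos, parent, arrival, s, c, d, norm) for s in arrival)


# Hypothetical instance: root r, one internal vertex x, two sinks s1 and s2.
pos = {"r": (0, 0), "x": (1, 0), "s1": (2, 0), "s2": (1, 1)}
parent = {"x": "r", "s1": "x", "s2": "x"}
arrival = {"s1": 500, "s2": 600}
c, d = 20, 220  # ps and ps/mm, the values quoted for the 65 nm technology

print(total_length(pos, parent))                # 3
print(slack(pos, parent, arrival, "s1", c, d))  # 500 - 220*2 - 20*1 = 40
print(worst_slack(pos, parent, arrival, c, d))  # 40
```

Both sinks are reached over two edges of total length 2 mm, so each path contributes $220 \cdot 2$ ps of wire delay and one bifurcation delay of 20 ps; the sink with the earlier required arrival time determines the worst slack.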

3 A Simple Procedure and its Properties

In [2, 3] we considered the following very simple procedure for the construction of repeater tree topologies.

    Choose a sink $s_1 \in S$;
    $V(T_1) \leftarrow \{r, s_1\}$; $E(T_1) \leftarrow \{(r, s_1)\}$; $T_1 \leftarrow (V(T_1), E(T_1))$;
    $n \leftarrow |S|$;
    for $i = 2$ to $n$ do
        Choose a sink $s_i \in S \setminus \{s_1, s_2, \ldots, s_{i-1}\}$, an edge $e_i = (u,v) \in E(T_{i-1})$, and an internal vertex $x_i \in \mathbb{R}^2$;
        $V(T_i) \leftarrow V(T_{i-1}) \cup \{x_i\} \cup \{s_i\}$;
        $E(T_i) \leftarrow (E(T_{i-1}) \setminus \{(u,v)\}) \cup \{(u,x_i), (x_i,v), (x_i,s_i)\}$;
        $T_i \leftarrow (V(T_i), E(T_i))$;
    end

The procedure inserts the sinks one by one according to some order $s_1, s_2, \ldots, s_n$, starting with a tree containing only the root $r$ and the first sink $s_1$. The sinks $s_i$ for $i \geq 2$ are inserted by subdividing an edge $e_i$ with a new internal vertex $x_i$ and connecting $x_i$ to $s_i$. The behaviour of the procedure clearly depends on the choice of the order, the choice of the edge $e_i$, and the choice of the point $x_i \in \mathbb{R}^2$.

In view of the large number of instances which have to be solved in an acceptable time [2, 3], the simplicity of the above procedure is an important advantage for its practical application. Furthermore, implementing suitable rules for the choice of $s_i$, $e_i$, and $x_i$ makes it possible to pursue and balance various practical optimization goals.

We present two variants (P1) and (P2) of the procedure corresponding to the above optimization scenarios (O1) and (O2), respectively.

(P1) The sinks are inserted in an order of non-increasing criticality, where the criticality of a sink $s \in S$ is quantified by $-(a_s - d\,\|r - s\|)$. (Note that $a_s - d\,\|r - s\|$ is the estimated worst slack of a repeater tree topology containing only the one sink $s$. Since a sink $s$ can be critical because its required arrival time $a_s$ is small and/or because its distance $\|r - s\|$ to the root is large, this is a reasonable measure for its criticality.) During the $i$-th execution of the for-loop, the new internal vertex $x_i$ is always chosen at the same position as $r$ (formally this turns $V(T_i)$ into a multiset) and the edge $e_i$ is chosen such that $\sigma(T_i)$ is maximized.
(P2) $s_1$ is chosen such that $\|r - s_1\| = \min\{\|r - s\| \mid s \in S\}$ and during the $i$-th execution of the for-loop, $s_i$, $e_i = (u,v)$, and $x_i$ are chosen such that

$$l(T_i) = l(T_{i-1}) + \|u - x_i\| + \|x_i - v\| + \|x_i - s_i\| - \|u - v\|$$

is minimized.
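Variant (P1) can be sketched in a few lines. Because every internal vertex is placed at the position of $r$, the path from the root to a sink $s$ always has length $\|r - s\|$, so the slack of $s$ depends only on $a_s - d\,\|r - s\|$ and on the number of edges above it. The code below is an illustrative sketch under that observation, not the implementation of [2, 3]: the input `a_tilde` (a hypothetical name) maps each sink to $a_s - d\,\|r - s\|$, the tree is a `parent` map, and every edge is tried by brute force.

```python
def depth(parent, v):
    """Number of edges |E[r, v]| on the path from the root to v."""
    k = 0
    while v in parent:
        v = parent[v]
        k += 1
    return k


def worst_slack(parent, a_tilde, c):
    """Minimum of a_tilde[s] - c * (|E[r,s]| - 1) over the sinks inserted so far."""
    return min(a_tilde[s] - c * (depth(parent, s) - 1)
               for s in a_tilde if s in parent)


def build_p1(a_tilde, c, root="r"):
    """Sketch of (P1); a_tilde[s] stands for a_s - d * ||r - s||."""
    # Non-increasing criticality means non-decreasing a_tilde.
    order = sorted(a_tilde, key=a_tilde.get)
    parent = {order[0]: root}
    for i, s in enumerate(order[1:], start=2):
        x = ("x", i)  # label of the new internal vertex (positioned at r)
        best_value, best_tree = None, None
        for v in list(parent):  # every edge has the form (parent[v], v)
            trial = dict(parent)
            trial[x] = trial[v]  # x subdivides the edge (parent[v], v)
            trial[v] = x
            trial[s] = x         # the new sink hangs below x
            value = worst_slack(trial, a_tilde, c)
            if best_value is None or value > best_value:
                best_value, best_tree = value, trial
        parent = best_tree
    return parent, worst_slack(parent, a_tilde, c)


# Hypothetical instance with c = 1: one critical sink and two uncritical ones.
parent, sigma = build_p1({"s1": 0, "s2": 3, "s3": 3}, c=1)
print(sigma)                                   # -1
print({s: depth(parent, s) for s in ("s1", "s2", "s3")})
```

In this instance the critical sink `s1` ends up two edges below the root, while the two uncritical sinks share a deeper bifurcation. The brute-force edge search re-evaluates the whole tree for every candidate edge and is meant for illustration only.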

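Theorem 1 The largest achievable worst slack $\sigma_{opt}$ equals

$$\sigma^*(S) := \max\left\{ \sigma \in \mathbb{R} \;\middle|\; \sum_{s \in S} 2^{-\left\lfloor \frac{1}{c}\left(a_s - d\,\|r - s\| - \sigma\right) \right\rfloor} \leq 1 \right\},$$

and (P1) generates a repeater tree topology $T^{(P1)}$ with $\sigma(T^{(P1)}) = \sigma_{opt}$.

Proof: Let $\tilde{a}_s = a_s - d\,\|r - s\|$ for $s \in S$. Let $T$ be an arbitrary repeater tree topology. By the definition of $\sigma(T)$ and the triangle inequality for $\|\cdot\|$, we obtain

$$|E[r,s]| - 1 \leq \left\lfloor \frac{1}{c}\left(a_s - d \sum_{(u,v) \in E[r,s]} \|u - v\| - \sigma(T)\right) \right\rfloor \leq \left\lfloor \frac{1}{c}\left(\tilde{a}_s - \sigma(T)\right) \right\rfloor$$

for every $s \in S$. Since the unique child of the root $r$ is itself the root of a binary subtree of $T$ in which each sink $s \in S$ has depth exactly $|E[r,s]| - 1$, Kraft's inequality [8] implies

$$\sum_{s \in S} 2^{-\left\lfloor \frac{1}{c}\left(\tilde{a}_s - \sigma(T)\right) \right\rfloor} \leq \sum_{s \in S} 2^{-|E[r,s]| + 1} \leq 1.$$

By the definition of $\sigma^*(S)$, this implies $\sigma(T) \leq \sigma^*(S)$. Since $T$ was arbitrary, we obtain $\sigma_{opt} \leq \sigma^*(S)$.

It remains to prove that $\sigma(T^{(P1)}) = \sigma_{opt} = \sigma^*(S)$, which we will do by induction on $n = |S|$. For $n = 1$, the statement is trivial. Now let $n \geq 2$. Let $s_n$ be the last sink inserted by (P1), i.e. $\tilde{a}_{s_n} = \max\{\tilde{a}_s \mid s \in S\}$. Let $S' = S \setminus \{s_n\}$.

Claim

$$\mathrm{frac}\left(\frac{\sigma^*(S)}{c}\right) \in \left\{ \mathrm{frac}\left(\frac{\tilde{a}_s}{c}\right) \;\middle|\; s \in S' \right\} \qquad (1)$$

where $\mathrm{frac}(x) := x - \lfloor x \rfloor$ denotes the fractional part of $x \in \mathbb{R}$.

Proof of the claim: Note that the definition of $\sigma^*(S)$ implies that $\frac{1}{c}(\tilde{a}_s - \sigma^*(S))$ is an integer for at least one $s \in S$. If the claim is false, then $\frac{1}{c}(\tilde{a}_{s_n} - \sigma^*(S)) \in \mathbb{Z}$ and $\frac{1}{c}(\tilde{a}_s - \sigma^*(S)) \notin \mathbb{Z}$ for every $s \in S'$. Since $\tilde{a}_{s_n} - \sigma^*(S) \geq \tilde{a}_s - \sigma^*(S)$ for every $s \in S'$, this implies

$$\frac{1}{c}\left(\tilde{a}_{s_n} - \sigma^*(S)\right) > \max\left\{ \left\lfloor \frac{1}{c}\left(\tilde{a}_s - \sigma^*(S)\right) \right\rfloor \;\middle|\; s \in S' \right\}$$

and hence

$$\sum_{s \in S} 2^{-\left\lfloor \frac{1}{c}\left(\tilde{a}_s - \sigma^*(S)\right) \right\rfloor} \leq 1 - 2^{-\frac{1}{c}\left(\tilde{a}_{s_n} - \sigma^*(S)\right)}.$$

Now, for some sufficiently small $\epsilon > 0$, we obtain

$$\sum_{s \in S} 2^{-\left\lfloor \frac{1}{c}\left(\tilde{a}_s - (\sigma^*(S) + \epsilon)\right) \right\rfloor} = 2^{-\frac{1}{c}\left(\tilde{a}_{s_n} - \sigma^*(S)\right) + 1} + \sum_{s \in S'} 2^{-\left\lfloor \frac{1}{c}\left(\tilde{a}_s - \sigma^*(S)\right) \right\rfloor} \leq 1,$$

which contradicts the definition of $\sigma^*(S)$ and completes the proof of the claim.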

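Let $T'^{(P1)}$ denote the tree produced by (P1) just before the insertion of the last sink $s_n$. By induction, $\sigma(T'^{(P1)}) = \sigma^*(S')$.

First, we assume that there is some sink $s' \in S'$ such that within $T'^{(P1)}$

$$|E[r,s']| - 1 < \left\lfloor \frac{1}{c}\left(\tilde{a}_{s'} - \sigma^*(S')\right) \right\rfloor.$$

Choosing $e_n$ as the edge of $T'^{(P1)}$ leading to $s'$ results in a tree $T$ such that

$$\sigma^*(S) \geq \sigma_{opt} \geq \sigma(T^{(P1)}) \geq \sigma(T) = \sigma^*(S') \geq \sigma^*(S),$$

which implies $\sigma(T^{(P1)}) = \sigma_{opt} = \sigma^*(S)$.

Next, we assume that within $T'^{(P1)}$

$$|E[r,s]| - 1 = \left\lfloor \frac{1}{c}\left(\tilde{a}_s - \sigma^*(S')\right) \right\rfloor$$

for every $s \in S'$. This implies

$$\sum_{s \in S} 2^{-\left\lfloor \frac{1}{c}\left(\tilde{a}_s - \sigma^*(S')\right) \right\rfloor} > \sum_{s \in S'} 2^{-\left\lfloor \frac{1}{c}\left(\tilde{a}_s - \sigma^*(S')\right) \right\rfloor} = 1$$

and hence $\sigma^*(S) < \sigma^*(S')$. By (1), we obtain

$$\sigma^*(S) \leq \max\left\{ \sigma \;\middle|\; \sigma < \sigma^*(S'),\ \mathrm{frac}\left(\frac{\sigma}{c}\right) \in \left\{ \mathrm{frac}\left(\frac{\tilde{a}_s}{c}\right) \;\middle|\; s \in S' \right\} \right\}$$

$$= \max\left\{ \sigma \;\middle|\; \sigma < \sigma^*(S'),\ \mathrm{frac}\left(\frac{\sigma - \sigma^*(S')}{c}\right) \in \left\{ \mathrm{frac}\left(\frac{\tilde{a}_s - \sigma^*(S')}{c}\right) \;\middle|\; s \in S' \right\} \right\}$$

$$= c \cdot \max\left\{ x \;\middle|\; x < \frac{\sigma^*(S')}{c},\ \mathrm{frac}\left(x - \frac{\sigma^*(S')}{c}\right) \in \left\{ \mathrm{frac}\left(\frac{\tilde{a}_s - \sigma^*(S')}{c}\right) \;\middle|\; s \in S' \right\} \right\}$$

$$= c \left( \frac{\sigma^*(S')}{c} - 1 + \max\left\{ \mathrm{frac}\left(\frac{\tilde{a}_s - \sigma^*(S')}{c}\right) \;\middle|\; s \in S' \right\} \right)$$

$$= \sigma^*(S') - c\,(1 - \delta)$$

for

$$\delta = \max\left\{ \mathrm{frac}\left(\frac{\tilde{a}_s - \sigma^*(S')}{c}\right) \;\middle|\; s \in S' \right\}.$$

If $s' \in S'$ is such that

$$\delta = \mathrm{frac}\left(\frac{\tilde{a}_{s'} - \sigma^*(S')}{c}\right),$$

then choosing $e_n$ as the edge of $T'^{(P1)}$ leading to $s'$ results in a tree $T$ such that

$$\sigma^*(S) \geq \sigma_{opt} \geq \sigma(T^{(P1)}) \geq \sigma(T) = \sigma^*(S') - c\,(1 - \delta) \geq \sigma^*(S),$$

which implies $\sigma(T^{(P1)}) = \sigma_{opt} = \sigma^*(S)$ and completes the proof.

Theorem 2 (P2) generates a repeater tree topology $T$ for which $l(T)$ is at most the total length of a minimum spanning tree on $\{r\} \cup S$ with respect to $\|\cdot\|$.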
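As a numerical illustration of the statement of Theorem 2 (it is not part of any proof), the following sketch runs (P2) for the $\ell_1$-norm and compares the result with a minimum spanning tree computed by Prim's algorithm. The specialization to $\ell_1$ is an assumption made for this example: for that norm the coordinate-wise median of $u$, $v$ and $s_i$ is an optimal position for $x_i$, whereas other norms need a different rule. All function names are hypothetical, and the search over sinks and edges is plain brute force.

```python
def l1(p, q):
    """l1 (rectilinear) distance between two points in the plane."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def median_point(u, v, s):
    """Coordinate-wise median; it minimizes the sum of l1 distances to u, v, s."""
    return tuple(sorted(coords)[1] for coords in zip(u, v, s))


def build_p2(root, sinks):
    """Sketch of (P2) for the l1-norm; returns the edges as pairs of points."""
    remaining = list(sinks)
    first = min(remaining, key=lambda s: l1(root, s))  # sink closest to r
    remaining.remove(first)
    edges = [(root, first)]
    while remaining:
        best = None
        for s in remaining:
            for idx, (u, v) in enumerate(edges):
                x = median_point(u, v, s)
                # Increase of l(T) when (u, v) is subdivided by x and s is attached.
                increase = l1(u, x) + l1(x, v) + l1(x, s) - l1(u, v)
                if best is None or increase < best[0]:
                    best = (increase, s, idx, x)
        _, s, idx, x = best
        u, v = edges.pop(idx)
        edges += [(u, x), (x, v), (x, s)]
        remaining.remove(s)
    return edges


def tree_length(edges):
    return sum(l1(u, v) for u, v in edges)


def mst_length(points):
    """Length of a minimum spanning tree on the complete l1 graph (Prim)."""
    in_tree, rest, total = [points[0]], list(points[1:]), 0
    while rest:
        dist, p = min((min(l1(q, p) for q in in_tree), p) for p in rest)
        total += dist
        in_tree.append(p)
        rest.remove(p)
    return total


# Hypothetical instance: the Steiner point (1, 1) saves one unit of length.
root, sinks = (0, 0), [(2, 1), (1, 2)]
print(tree_length(build_p2(root, sinks)))  # 4
print(mst_length([root] + sinks))          # 5
```

Here (P2) subdivides the edge from the root to $(2,1)$ at $(1,1)$ and attaches $(1,2)$ there, which gives total length 4, while every spanning tree on the three terminals has length at least 5.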

Proof: Let $n = |S|$ and for $i = 0, 1, \ldots, n$, let $\bar{T}_i$ denote the forest which is the union of the tree produced by (P2) after the insertion of the first $i$ sinks and the remaining $n - i$ sinks as isolated vertices. Note that $\bar{T}_0$ has vertex set $\{r\} \cup S$ and no edge, while for $1 \leq i \leq n$, $\bar{T}_i$ has vertex set $\{r\} \cup S \cup \{x_j \mid 2 \leq j \leq i\}$ and $2i - 1$ edges.

Let $F_0 = (V(F_0), E(F_0))$ be a spanning tree on $V(F_0) = \{r\} \cup S$ such that $l(F_0) = \sum_{uv \in E(F_0)} \|u - v\|$ is minimum. For $i = 1, 2, \ldots, n$, let $F_i = (V(F_i), E(F_i))$ arise from

$$\left( V(\bar{T}_i),\ E(F_{i-1}) \cup E(\bar{T}_i) \right)$$

by deleting an edge $e \in E(F_{i-1}) \cap E(F_0)$ which has exactly one endvertex in $V(T_{i-1})$ such that $F_i$ is a tree. (Note that this uniquely determines $F_i$.) Since (P2) has the freedom to use the edges of $F_0$, the specification of the insertion order and the locations of the internal vertices in (P2) imply that

$$l(F_0) \geq l(F_1) \geq l(F_2) \geq \ldots \geq l(F_n).$$

Since $F_n = T_n$, the proof is complete.

For the $\ell_1$-norm, the well-known result of Hwang [7] together with Theorem 2 implies that (P2) is an approximation algorithm for the $\ell_1$-minimum Steiner tree on the set $\{r\} \cup S$ with approximation guarantee 3/2.

We have seen in Theorems 1 and 2 that different insertion orders are favourable for different optimization scenarios such as (O1) and (O2). Alon and Azar [1] gave an example showing that for the online rectilinear Steiner tree problem the best approximation ratio we can achieve is $\Theta(\log n / \log\log n)$, where $n$ is the number of terminals. Hence inserting the sinks in an order disregarding the locations, like in (P1), can lead to long Steiner trees, no matter how we decide where to insert the sinks.

The next example shows that inserting the sinks in an order different from the one considered in (P1), but still choosing the edge $e_i$ as in (P1), results in a repeater tree topology whose worst slack can be much smaller than the largest achievable worst slack.

Example 3 Let $c = 1$, $d = 0$ and $a \in \mathbb{N}$.
We consider the following sequences of $-a$'s and $0$'s:

$$A(1) = (-a, 0),$$
$$A(2) = (A(1), -a, 0),$$
$$A(3) = (A(2), -a, \underbrace{0, \ldots, 0}_{1 + (2^1 - 1)(a+2)}),$$
$$A(4) = (A(3), -a, \underbrace{0, \ldots, 0}_{1 + (2^2 - 1)(a+2)}),$$
$$\ldots,$$

i.e. for $l \geq 2$, the sequence $A(l)$ is the concatenation of $A(l-1)$, one $-a$, and a sequence of $0$'s of length $1 + (2^{l-2} - 1)(a + 2)$.

If the entries of $A(l)$ are considered as the required arrival times of an instance of the repeater tree topology problem, then Theorem 1 together with the choice of $c$ and $d$ implies that the largest achievable worst slack for this instance equals

$$-\left\lceil \log_2\left( l\,2^{a} + \left( 1 + \sum_{i=2}^{l} \left( 1 + (2^{i-2} - 1)(a + 2) \right) \right) 2^{0} \right) \right\rceil.$$

For $l = a + 1$ this is at least $-2 - a - \log_2(a + 2)$.

If we insert the sinks in the order specified by the sequences $A(l)$, and always choose the edge into which we insert the next internal vertex such that the worst slack is maximized, then the following sequence of topologies can arise: $T(1)$ is the topology with exactly two sinks at depth 2. The worst slack of $T(1)$ is $-(a + 2)$. For $l \geq 2$, $T(l)$ arises from $T(l-1)$ by

(a) subdividing the edge of $T(l-1)$ incident with the root with a new vertex $x$,
(b) appending an edge $(x,y)$ to $x$,
(c) attaching to $y$ a complete binary tree $B$ of depth $l - 2$,
(d) attaching to one leaf of $B$ two new leaves corresponding to sinks with required arrival times $-a$ and $0$, and
(e) attaching to each of the remaining $2^{l-2} - 1$ many leaves of $B$ a binary tree which has $a + 2$ leaves, all corresponding to sinks of arrival times $0$, whose depths in this binary tree are $1, 2, 3, \ldots, a - 1, a, a + 1, a + 1$.

Note that this uniquely determines $T(l)$. Clearly, the worst slack in $T(l)$ equals $-a - (l + 1)$. Hence for $l = a + 1$, the worst slack equals $-2a - 2$, which differs approximately by a factor of 2 from the largest achievable worst slack as calculated above.

This example, however, does not show that there is no online algorithm for approximately maximizing the worst slack, say up to an additive constant of $c$. It is an open question to find a bicriteria approximation algorithm, or an algorithm for (O3).

References

[1] N. Alon and Y. Azar, On-line Steiner trees in the Euclidean plane, Discrete and Computational Geometry 10 (1993), 113-121.
[2] C. Bartoschek, S. Held, D. Rautenbach, and J. Vygen, Efficient generation of short and fast repeater tree topologies, in: Proceedings of the International Symposium on Physical Design (2006), 120-127.
[3] C. Bartoschek, S. Held, D. Rautenbach, and J. Vygen, Fast buffering for optimizing worst slack and resource consumption in repeater trees, in: Proceedings of the International Symposium on Physical Design (2009), 43-50.
[4] J. Cong, An interconnect-centric design flow for nanometer technologies, in: Proceedings of the IEEE 89 (2001), 505-528.
[5] W.C. Elmore, The transient response of damped linear networks with particular regard to wideband amplifiers, Journal of Applied Physics 19 (1948), 55-63.
[6] M.R. Garey and D.S. Johnson, The rectilinear Steiner tree problem is NP-complete, SIAM Journal on Applied Mathematics 32 (1977), 826-834.

[7] F.K. Hwang, On Steiner minimal trees with rectilinear distance, SIAM Journal on Applied Mathematics 30 (1976), 104-114.
[8] L.G. Kraft, A device for quantizing grouping and coding amplitude modulated pulses, Master thesis, EE Dept., MIT, Cambridge 1949.
[9] J. Maßberg and D. Rautenbach, Binary trees with choosable edge lengths, to appear in Information Processing Letters.
[10] G.E. Moore, Cramming more components onto integrated circuits, Electronics 38 (1965), 114-117.
[11] P. Saxena, N. Menezes, P. Cocchini, and D. Kirkpatrick, The scaling challenge: can correct-by-construction design help?, in: Proceedings of the International Symposium on Physical Design (2003), 51-58.