STOCHASTIC COALESCENCE IN LOGARITHMIC TIME. BY PO-SHEN LOH AND EYAL LUBETZKY Carnegie Mellon University and Microsoft Research


The Annals of Applied Probability 2013, Vol. 23, No. 2, DOI: 10.1214/11-AAP832. Institute of Mathematical Statistics, 2013.

The following distributed coalescence protocol was introduced by Dahlia Malkhi in 2006, motivated by applications in social networking. Initially there are $n$ agents wishing to coalesce into one cluster via a decentralized stochastic process, where each round is as follows: every cluster flips a fair coin to dictate whether it is to issue or accept requests in this round. Issuing a request amounts to contacting a cluster randomly chosen proportionally to its size. A cluster accepting requests is to select an incoming one uniformly (if there are such) and merge with that cluster. Empirical results by Fernandess and Malkhi suggested the protocol concludes in $O(\log n)$ rounds with high probability, whereas numerical estimates by Oded Schramm, based on an ingenious analytic approximation, suggested that the coalescence time should be super-logarithmic. Our contribution is a rigorous study of the stochastic coalescence process with two consequences. First, we confirm that the above process indeed requires super-logarithmic time w.h.p., where the inefficient rounds are due to oversized clusters that occasionally develop. Second, we remedy this by showing that a simple modification produces an essentially optimal distributed protocol; if clusters favor their smallest incoming merge request then the process does terminate in $O(\log n)$ rounds w.h.p., and simulations show that the new protocol readily outperforms the original one. Our upper bound hinges on a potential function involving the logarithm of the number of clusters and the cluster-susceptibility, carefully chosen to form a supermartingale. The analysis of the lower bound builds upon the novel approach of Schramm which may find additional applications: rather than seeking a single parameter that controls the system behavior, instead one approximates the system by the Laplace transform of the entire cluster-size distribution.
Received March 2011; revised November 2011. MSC2010 subject classifications. 60K30, 60K35, 60J10. Key words and phrases. Stochastic coalescence processes, randomized distributed algorithms.

1. Introduction. The following stochastic distributed coalescence protocol was proposed by Malkhi in 2006, motivated by applications in social networking and the reliable formation of peer-to-peer networks (see [11] for more on these applications). The objective is to coalesce $n$ participating agents into a single hierarchical cluster reliably and efficiently. To do so without relying on a centralized authority, the protocol first identifies each agent as a cluster (a singleton), and then proceeds in rounds as follows:

(1) Each cluster flips a fair coin to determine whether it will be issuing a merge request or accepting requests in the upcoming round.
(2) Issuing a request amounts to selecting another cluster randomly proportionally to its size.
(3) Accepting requests amounts to choosing an incoming request (if there are any) uniformly at random and proceeding to merge with that cluster.

In practice, each cluster is in fact a layered tree whose root is entrusted with running the protocol, for example, each root decides whether to issue or accept requests in a given round, etc. When attempting to merge with another cluster, the root of cluster $C_i$ simply chooses a vertex $v$ uniformly out of $[n]$, which then propagates the request to its root. This therefore corresponds to choosing the cluster $C_j$ proportionally to $|C_j|$. This part of the protocol is well-justified by the fact that agents within a cluster typically have no information on the structure of other clusters in the system.

A second feature of the protocol is the symmetry between the roles of issuing or accepting requests played by the clusters. Clearly, every protocol enjoying this feature would have (roughly) at most half of its clusters become acceptors in any given round, and as such could terminate within $O(\log n)$ rounds. Furthermore, on an intuitive level, as long as all clusters are of roughly the same size (as is the case initially), there are few collisions (multiple clusters issuing a request to the same cluster) each round and hence, the effect of a round is similar to that of merging clusters according to a random perfect matching. As such, one might expect that the protocol should conclude with a roughly balanced binary tree in logarithmic time. Indeed, empirical evidence by Fernandess and Malkhi [10] showed that this protocol seems highly efficient, typically taking a logarithmic number of rounds to coalesce. However, rigorous performance guarantees for the protocol were not available.
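The three steps of a round can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions, not the authors' code: the function names `uniform_round` and `coalescence_time` are hypothetical, a request is drawn over all clusters proportionally to size (so a request that lands on the issuer's own cluster, or on another issuer, is simply wasted), and cluster sizes are kept as agent counts.

```python
import random

def uniform_round(sizes, rng):
    """Run one round of the uniform process on a list of cluster sizes."""
    k = len(sizes)
    # Step (1): fair coin per cluster; True means "issue", False means "accept".
    issuing = [rng.random() < 0.5 for _ in range(k)]
    issuers = [i for i in range(k) if issuing[i]]
    # Step (2): each issuer contacts a cluster chosen proportionally to its size
    # (assumption: the draw is over all clusters, mirroring a uniform vertex of [n]).
    targets = rng.choices(range(k), weights=sizes, k=len(issuers))
    incoming = {}
    for i, j in zip(issuers, targets):
        if not issuing[j]:  # only acceptors listen; this also drops self-requests
            incoming.setdefault(j, []).append(i)
    # Step (3): every acceptor with requests picks one uniformly and merges with it.
    new_sizes = list(sizes)
    absorbed = set()
    for j, requests in incoming.items():
        i = rng.choice(requests)
        new_sizes[j] += sizes[i]
        absorbed.add(i)
    return [s for idx, s in enumerate(new_sizes) if idx not in absorbed]

def coalescence_time(n, rng):
    """Number of rounds until n singletons have merged into one cluster."""
    sizes = [1] * n
    rounds = 0
    while len(sizes) > 1:
        sizes = uniform_round(sizes, rng)
        rounds += 1
    return rounds

if __name__ == "__main__":
    print(coalescence_time(1024, random.Random(2013)))
```

Since a merge needs one issuer and one acceptor, a round can at most halve the number of clusters, so any run from $n$ singletons takes at least $\log_2 n$ rounds.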
While there are numerous examples of stochastic processes that have been successfully analyzed by means of identifying a single tractable parameter that controls their behavior, here it appears that the entire distribution of the cluster-sizes plays an essential role in the behavior of the system. Demonstrating this is the following example: suppose that the cluster $C_1$ has size $n - o(\sqrt{n})$ while all others are singletons. In this case it is easy to see that with high probability all of the merge-requests will be issued to $C_1$, who will accept at most one of them (we say an event holds with high probability, or w.h.p. for brevity, if its probability tends to 1 as $n \to \infty$). Therefore, starting from this configuration, coalescence will take at least $n^{1/2 - o(1)}$ rounds w.h.p., a polynomial slowdown. Of course, this scenario is extremely unlikely to arise when starting from $n$ individual agents, yet possibly other mildly unbalanced configurations are likely to occur and slow the process down.

In 2007, Schramm proposed a novel approach to the problem, approximately reducing it to an analytic problem of determining the asymptotics of a recursively defined family of real functions. Via this approximation framework Schramm then gave numerical estimates suggesting that the running time of the stochastic coalescence protocol is w.h.p. super-logarithmic. Unfortunately, the analytical problem itself seemed highly nontrivial and overall no bounds for the process were known.

1.1. New results. In this work we study the stochastic coalescence process with two main consequences. First, we provide a rigorous lower bound confirming that this process w.h.p. requires a super-logarithmic number of rounds to terminate. Second, we identify the vulnerability in the protocol, namely the choice of which merge-request a cluster should approve. While the original choice seems promising in order to maintain the balance between clusters, it turns out that typical deviations in cluster-sizes are likely to be amplified by this rule and lead to irreparably unbalanced configurations. On the other hand, we show that a simple modification of this rule to favor the smallest incoming request is already enough to guarantee coalescence in $O(\log n)$ rounds w.h.p. [Here and in what follows we let $f \lesssim g$ denote that $f = O(g)$ while $f \asymp g$ is short for $f \lesssim g \lesssim f$.]

THEOREM 1.1. The uniform coalescence process $U$ coalesces in
$$\tau_c(U) \gtrsim \log n \cdot \frac{\log\log n}{\log\log\log n}$$
rounds w.h.p. Consider a modified size-based process $S$ where every accepting cluster $C_i$ has the following rule: Ignore requests from clusters of size larger than $|C_i|$. Among other requests (if any), select one issued by a cluster $C_j$ of smallest size. Then the coalescence time of the size-based process satisfies $\tau_c(S) \asymp \log n$ w.h.p.

Observe that the new protocol is easy to implement efficiently in practice as each root can keep track of the size of its cluster and can thus include it as part of the merge-request.

1.2. Empirical results. Our simulations show that the running time of the size-based process is approximately $5 \log_2 n$.
Moreover, they further demonstrate that the new size-based process empirically performs substantially better than the uniform process even for fairly small values of $n$, that is, the improvement appears not only asymptotically in the limit but already for ordinary input sizes. These results are summarized in Figure 1, where the plot on the left clearly shows how the uniform process diverges from the linear (in logarithmic scale) trend corresponding to the runtime of the size-based process. The right-most plot identifies the crux of the matter: the uniform process rapidly produces a highly skewed cluster-size distribution, which slows it down considerably.

FIG. 1. The left plot compares the running times for the two processes. Statistics are derived from 100 independent runs of each process, for each $n \in \{1024, 2048, \ldots, 2^{20}\}$. The right plot tracks the ratio between the maximum and average cluster-sizes, through a single run of each process, for $n = [\ldots]$. There, the uniform process took [...] rounds, while the size-based process finished in 96.

1.3. Related work. There is extensive literature on stochastic coalescence processes whose various flavors fit the following scheme: the clusters act via a continuous-time process where the coalescence rate of two clusters with given masses $x, y$ (which can be either discrete or continuous) is dictated up to re-scaling by a rate kernel $K$. A notable example of this is Kingman's coalescent [18], which corresponds to the kernel $K(x,y) = 1$ and has been intensively studied in mathematical population genetics (see, e.g., [8] for more on Kingman's coalescent and its applications in genetics). Other rate kernels that have been thoroughly studied include the additive coalescent $K(x,y) = x + y$ which corresponds to Aldous's continuum random tree [1], and the multiplicative coalescent $K(x,y) = xy$ that corresponds to Erdős–Rényi random graphs [9] (see the books [4, 17]). For further information on these as well as other coalescence processes, whose applications range from physics to chemistry to biology, we refer the reader to the excellent survey of Aldous [2].

A major difference between the classical stochastic coalescence processes mentioned above and those studied in this work is the synchronous nature of the latter ones. Instead of individual merges whose occurrences are governed by independent exponentials, here the process is comprised of rounds where all clusters act simultaneously and the outcome of a round (multiple disjoint merges) is a function of these combined actions. This framework introduces delicate dependencies between the clusters, and rather than having the coalescence rate of two clusters be given by the rate kernel $K$ as a function of their masses, here it is a function of the entire cluster distribution.
For instance, suppose nearly all of the mass is in one cluster $C_i$ (which thus attracts almost all merge requests); its coalescence rate with a given cluster $C_j$ in the uniform coalescence process $U$ clearly depends on the total number of clusters at that given moment, and similarly in the size-based coalescence process $S$ it depends on the sizes of all other clusters, viewed as competing with $C_j$ over this merge. In face of these mentioned dependencies, the task of analyzing the evolution of the clusters along the high-dimensional stochastic processes $U$ and $S$ becomes highly nontrivial.

In terms of applications and related work in computer science, the processes studied here have similar flavor to those which arose in the 1980s, most notably the random mate algorithm introduced by Reif, and used by Gazit [15] for parallel graph components and by Miller and Reif [20] for parallel tree contraction. However, as opposed to the setting of those algorithms, a key difference here is the fact that as the process evolves through time, each cluster is oblivious to the distribution of its peers at any given round (including the total number of clusters for that matter). Therefore, for instance, it is impossible for a cluster to sample from the uniform distribution over the other clusters when issuing its merge request.

For another related line of works in computer science, recall that the coalescence processes studied in this work organize $n$ agents in a hierarchic tree, where each merged cluster reports to its acceptor cluster. This is closely related to the rich and intensively studied topic of randomized leader elections (see, e.g., [6, 12, 22, 23, 28]), where a computer network comprised of $n$ processors attempts to single out a leader (in charge of communication, etc.) by means of a distributed randomized process generating the hierarchic tree.

Finally, studying the dynamics of randomly merging sets is also fundamental to understanding the average-case performance of disjoint-set data structures (see, e.g., the works of Bollobás and Simon [5], Knuth and Schönhage [19] and Yao [27]). These structures, which are of fundamental importance in computer science, store collections of disjoint sets and support two operations: (i) taking the union of a pair of sets and (ii) determining which set a particular element is in (see, e.g., [14] for a survey of these data structures).
The processes studied here precisely consider the evolution of a collection of disjoint sets under random merge operations and it is plausible that the tools used here could contribute to advances in that area.

1.4. Main techniques. As we mentioned above, the main obstacle in the coalescence processes studied here is that since requests go to other clusters with probability proportional to their size, the largest clusters can create a bottleneck, absorbing all requests yet each granting only one per round. An intuitive approach for analyzing the size-based process $S$ would be to track a statistic that would warn against this scenario, with the most obvious candidate being the size of the largest cluster. However, simulations indicate that this alone will be insufficient as the largest cluster does in fact grow out of proportion in typical runs of the process. Nevertheless, the distribution of large clusters turns out to be sparse. The key idea is then to track a smoother parameter involving the susceptibility, which is essentially the second moment of the cluster-size distribution. To simplify notation, normalize the cluster-sizes $w_i$ to sum to 1 so that the initial distribution consists of $n$ clusters of size $n^{-1}$ each. With this normalization, the susceptibility $\chi_t$ is defined as $\sum_i w_i^2$, the sum of squares of cluster-sizes after the $t$th round. (We note in passing that this parameter has played a central role in the study of the phase-transition in percolation and random graphs; see, e.g., [16, 26].) The proof that the size-based protocol is optimal hinges on a carefully chosen potential function $\Phi_t = \chi_t \kappa_t + C \log \kappa_t$, where $\kappa_t$ denotes the number of clusters after the $t$th round and $C$ is an absolute constant chosen to turn $\Phi_t$ into a supermartingale. In Sections 3 and 4 we will control the evolution of $\Phi_t$ and prove our upper bound on the running time of the size-based process.

The analysis of the uniform process $U$ is delicate and relies on rigorizing and analyzing the novel framework of Schramm [24, 25] for approximating the problem by an analytic one. We believe this technique is of independent interest and may find additional applications in the analysis of high-dimensional stochastic processes. Instead of seeking a single parameter to summarize the system behavior, one instead measures the system using the Laplace transform of the entire cluster-size distribution.

DEFINITION 1.2. For any integer $t \ge 0$ let $\mathcal{F}_t$ be the $\sigma$-algebra generated by the first $t$ rounds of the process. Conditioned on $\mathcal{F}_t$, define the functions $F_t(s)$ and $G_t(s)$ on the domain $\mathbb{R}$ as follows. Let $\kappa$ be the number of clusters and let $w_1, \ldots, w_\kappa$ be the normalized cluster-sizes after $t$ rounds. Set
$$F_t(s) = \sum_{i=1}^{\kappa} \exp(-w_i s), \qquad G_t(s) = \frac{1}{\kappa} F_t(\kappa s). \tag{1.1}$$

As we will further explain in Section 2, the Laplace transform $F_t$ simultaneously captures all the moments of the cluster-size distribution, in a manner analogous to the moment generating function of a random variable. This form is particularly useful in our application as we will see in Section 5 that the specific evaluation $G_t(\frac12)$ governs the expected coalescence rate. Furthermore, it turns out that it is possible to estimate values of $F_t$ (and $G_t$) recursively. Although the resulting recursion is nonstandard and highly complex, a somewhat intricate analysis eventually produces a lower bound for the uniform process.

1.5. Organization.
The rest of this paper is organized as follows. In Section 2 we describe Schramm's analytic approach for approximating the uniform process $U$. Sections 3 and 4 are devoted to the size-based process $S$. In the former we prove that $E[\tau_c(S)] = O(\log n)$ and in the latter we build on this proof together with additional ideas to show that $\tau_c(S) = O(\log n)$ w.h.p. The final section, Section 5, builds upon Schramm's aforementioned framework to produce a super-logarithmic lower bound for $\tau_c(U)$.

2. Schramm's analytic approximation framework for the uniform process. In this section we describe Schramm's analytic approach as it was presented in [24, 25] for analyzing the uniform coalescence process $U$, as well as the numerical evidence that Schramm obtained based on this approach suggesting that $\tau_c(U)$ is super-logarithmic. Throughout this section we write approximations loosely as they were sketched by Schramm and postpone any arguments on their validity (including concentration of random variables, etc.) to Section 5, where we will turn elements from this approach into a rigorous lower bound on $\tau_c(U)$.

Let $\mathcal{F}_t$ denote the $\sigma$-algebra generated by the first $t$ rounds of the coalescence process $U$. The starting point of Schramm's approach was to examine the following function conditioned on $\mathcal{F}_t$:
$$F_t(s) = \sum_{i=1}^{\kappa_t} \exp(-w_i s),$$
where $\kappa_t$ is the number of clusters after $t$ rounds and $w_1, \ldots, w_{\kappa_t}$ denote the normalized cluster-sizes at that time (see Definition 1.2). The benefit that one could gain from understanding the behavior of $F_t(s)$ is obvious as $F_t(0)$ recovers the number of clusters at time $t$. More interesting is the following observation of Schramm regarding the role that $F_t(\kappa_t/2)$ plays in the evolution of the clusters. Conditioned on $\mathcal{F}_t$, the probability that the cluster $C_i$ receives a merge request from another cluster $C_j$ is $\frac12 w_i$ (the factor $\frac12$ accounts for the choice of $C_j$ to issue rather than accept requests). Thus, the probability that $C_i$ will receive any incoming request in round $t+1$ and independently decide to be an acceptor is
$$\tfrac12\bigl[1 - (1 - w_i/2)^{\kappa_t - 1}\bigr] \approx \tfrac12\bigl[1 - \exp(-w_i \kappa_t/2)\bigr].$$
On this event, $C_i$ will account for one merge at time $t+1$, and summing this over all clusters yields
$$E[\kappa_{t+1} \mid \mathcal{F}_t] \approx \kappa_t - \sum_{i=1}^{\kappa_t} \tfrac12\bigl[1 - \exp(-w_i \kappa_t/2)\bigr] = \tfrac12\bigl[\kappa_t + F_t(\kappa_t/2)\bigr]$$
or equivalently, re-scaling $F_t(s)$ into $G_t(s) = (1/\kappa_t) F_t(\kappa_t s)$ as in (1.1),
$$E[\kappa_{t+1}/\kappa_t \mid \mathcal{F}_t] \approx \frac{1 + G_t(1/2)}{2}. \tag{2.1}$$
In order to have $\tau_c(U) \asymp \log n$ the number of clusters would need to typically drop by at least a constant factor at each round. This would require the ratio in (2.1) to be bounded away from 1, or equivalently, $G_t(\frac12)$ should be bounded away from 1.
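Whether $G_t(\frac12)$ stays away from 1 can be explored numerically. The sketch below is illustrative only and every function name in it is hypothetical. It evaluates $F_t$ and $G_t$ of (1.1) for a given normalized configuration, computes the one-round prediction (2.1), and iterates the deterministic recurrence $g_t$ that this section goes on to derive. The recurrence in the code is a reconstruction from that derivation, namely $g_0(s) = e^{-s}$ and $g_{t+1}(s) = \frac{1}{2\alpha}[g_t(\alpha s)^2 - g_t(\alpha s + \frac12) g_t(\alpha s) + g_t(\alpha s + \frac12) + g_t(\frac12) g_t(\alpha s)]$ with $\alpha = \frac12[1 + g_t(\frac12)]$; treat it as an assumption rather than as Schramm's exact formulation.

```python
import math
from functools import lru_cache

def laplace_F(weights, s):
    """F_t(s) = sum_i exp(-w_i * s) for normalized cluster sizes w_i."""
    return sum(math.exp(-w * s) for w in weights)

def rescaled_G(weights, s):
    """G_t(s) = F_t(kappa * s) / kappa, where kappa is the number of clusters."""
    kappa = len(weights)
    return laplace_F(weights, kappa * s) / kappa

def predicted_ratio(weights):
    """Heuristic (2.1): E[kappa_{t+1} / kappa_t] is about (1 + G_t(1/2)) / 2."""
    return 0.5 * (1.0 + rescaled_G(weights, 0.5))

@lru_cache(maxsize=None)
def g(t, s):
    """Reconstructed deterministic recurrence approximating G_t(s)."""
    if t == 0:
        return math.exp(-s)
    a = alpha(t - 1)
    x = g(t - 1, a * s)
    y = g(t - 1, a * s + 0.5)
    h = g(t - 1, 0.5)
    return (x * x - y * x + y + h * x) / (2.0 * a)

@lru_cache(maxsize=None)
def alpha(t):
    """Predicted one-round shrink factor kappa_{t+1} / kappa_t at step t."""
    return 0.5 * (1.0 + g(t, 0.5))

if __name__ == "__main__":
    uniform = [1 / 8] * 8           # all clusters equal
    skewed = [0.93] + [0.01] * 7    # one dominant cluster
    print(predicted_ratio(uniform), predicted_ratio(skewed))
    # Each level needs g_{t-1} at two new points, so the cost grows like 2^t.
    for t in range(0, 13, 4):
        print(t, g(t, 0.5))
```

For equal cluster sizes $G_t(s) = e^{-s}$, so (2.1) predicts a shrink factor of about $(1 + e^{-1/2})/2 \approx 0.80$ per round; by convexity of the exponential, any unequal configuration with the same number of clusters gives a larger $G_t(\frac12)$ and hence a slower predicted round.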
Unfortunately, the evolution of the sequence $G_t(\frac12) = (1/\kappa_t) F_t(\kappa_t/2)$ appears to be quite complex and there does not seem to be a simple way to determine its limiting behavior. Nevertheless, Schramm was able to write down an approximate recursion for the expected value of $F_{t+1}$ in terms of multiple evaluations of $F_t$ by observing the following. On the above event that $C_i$ chooses to accept the merge request of some other cluster $C_j$, by definition of the process $U$, the identity of the cluster $C_j$ is uniformly distributed over all $\kappa_t - 1$ clusters other than $C_i$. Hence,
$$E[F_{t+1}(s) - F_t(s) \mid \mathcal{F}_t] \approx \sum_i \tfrac12\bigl(1 - e^{-w_i \kappa_t/2}\bigr) \frac{1}{\kappa_t} \sum_{j \ne i} \bigl(e^{-(w_i + w_j)s} - e^{-w_i s} - e^{-w_j s}\bigr).$$
Ignoring the fact that the last sum in the approximation skips the diagonal terms $j = i$, one arrives at a summation over all $1 \le i, j \le \kappa_t$ of exponents similar to those in the definition of $F_t$ with an argument of either $s$, $\kappa_t/2$, or $s + \kappa_t/2$, which, after rearranging, gives
$$E[F_{t+1}(s) \mid \mathcal{F}_t] \approx \tfrac12 F_t(s + \kappa_t/2) + \frac{1}{2\kappa_t} F_t(s)\bigl[F_t(s) + F_t(\kappa_t/2) - F_t(s + \kappa_t/2)\bigr].$$
To turn the above into an expression for $G_{t+1}(s)$ one needs to evaluate $F_{t+1}(\kappa_{t+1} s)$ rather than $F_{t+1}(\kappa_t s)$, to which end the approximation $\kappa_{t+1} \approx \frac12[1 + G_t(\frac12)]\kappa_t$ can be used based on (2.1). Additionally, for the starting point of the recursion, note that the initial configuration of $w_i = 1/\kappa_0$ for all $1 \le i \le \kappa_0$ has $G_0(s) = \exp(-s)$. Altogether, Schramm obtained the following deterministic analytic recurrence, whose behavior should (approximately) dictate the coalescence rate:
$$g_0(s) = \exp(-s),$$
$$g_{t+1}(s) = \frac{1}{2\alpha}\Bigl[g_t(\alpha s)^2 - g_t\bigl(\alpha s + \tfrac12\bigr) g_t(\alpha s) + g_t\bigl(\alpha s + \tfrac12\bigr) + g_t\bigl(\tfrac12\bigr) g_t(\alpha s)\Bigr], \qquad \text{where } \alpha = \tfrac12\bigl[1 + g_t\bigl(\tfrac12\bigr)\bigr].$$

In light of this, aside from the task of assessing how good of an approximation the above defined functions $g_t$ provide for the random variables $G_t$ along the uniform coalescence process $U$, the other key question is whether the sequence $g_t(\frac12)$ converges to 1 as $t \to \infty$, and if so, at what rate. For the latter, as the complicated definition of $g_{t+1}$ attests, analyzing the recursion of $g_t$ seems highly nontrivial. Moreover, a naive evaluation of $g_t(\frac12)$ involves exponentially many terms, making numerical simulations already challenging. The computer-assisted numerical estimates performed by Schramm for the above recursion, shown in Figure 2, seemed to suggest that indeed $g_t(\frac12) \to 1$ (albeit very slowly), which should lead to a super-logarithmic coalescence time for $U$. However, no rigorous results were known for the limit of $g_t(\frac12)$ or its stochastic counterpart $G_t(\frac12)$.

As we show in Section 5, in order to turn Schramm's argument into a rigorous lower bound on $\tau_c(U)$, we move our attention away from the sought value of $G_t(\frac12)$ and focus instead on $G_t(1)$. By manipulating Schramm's recursion for $G_t$ and combining it with additional analytic arguments and appropriate concentration inequalities, we show that as long as $\kappa_t$ is large enough and $G_t(\frac12) < 1 - \delta$ for some fixed $\delta > 0$, then typically $G_{t+1}(1) > G_t(1) + \varepsilon$ for some $\varepsilon(\delta) > 0$. Since by definition $0 \le G_t(1) \le 1$, this can be used to show that ultimately $G_t(\frac12) \to 1$ w.h.p., and a careful quantitative version of this argument produces the rigorous lower bound on $\tau_c(U)$ stated in Theorem 1.1.

FIG. 2. Numerical estimations by Oded Schramm for the functions $G_t(s)$ from his analytic approximation of the uniform coalescence process. The left plot features $G_t(s)$ for $t \in \{0, 2, \ldots, 40\}$ and $s \in [0, 1]$ and demonstrates how these increase with $t$. The right plot focuses on $G_t(\frac12)$ and suggests that $G_t(\frac12) \to 1$ and that in turn the coalescence rate should be super-logarithmic.

3. Expected running time of the size-based process. The goal of this section is to prove that the expected time for the size-based process to complete has logarithmic order, as stated in Proposition 3.1. Following a few simple observations on the process, we will prove this proposition using two key lemmas, Lemmas 3.4 and 3.5, whose proofs will appear in Sections 3.2 and 3.3, respectively. In Section 4 we extend the proof of this proposition using some additional ideas to establish that the coalescence time is bounded by $O(\log n)$ w.h.p.

PROPOSITION 3.1. Let $\tau_c = \tau_c(S)$ denote the coalescence time of the size-based process $S$. Then there exists an absolute constant $C > 0$ such that $E_1[\tau_c] \le C \log n$, where $E_1[\cdot]$ denotes expectation w.r.t. an initial cluster distribution comprised of $n$ singletons.

Throughout Sections 3 and 4 we refer only to the size-based process and use the following notation. Define the filtration $\mathcal{F}_t$ to be the $\sigma$-algebra generated by the process up to and including the $t$th round. Let $\kappa_t$ denote the number of clusters after the conclusion of round $t$, noting that with these definitions we are interested in bounding the expected value of the stopping time
$$\tau_c = \min\{t : \kappa_t = 1\}. \tag{3.1}$$
As mentioned in the Introduction, we normalize the cluster-sizes so that they sum to 1. Finally, the susceptibility $\chi_t$ denotes the sum of squares of the cluster-sizes at the end of round $t$.
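The notation just introduced ($\kappa_t$, $\chi_t$ and $\tau_c$) can be made concrete with a small simulation of the size-based process. This is a hedged sketch rather than the authors' simulation code: the names `size_based_round` and `run_size_based` are hypothetical, requests are drawn over all clusters proportionally to size, and ties among equally small requesters are broken by arrival order, a detail the rule in Theorem 1.1 leaves open.

```python
import random

def size_based_round(sizes, rng):
    """One round of the size-based process on a list of cluster sizes."""
    k = len(sizes)
    issuing = [rng.random() < 0.5 for _ in range(k)]
    issuers = [i for i in range(k) if issuing[i]]
    targets = rng.choices(range(k), weights=sizes, k=len(issuers))
    best = {}  # acceptor index -> index of its smallest eligible requester
    for i, j in zip(issuers, targets):
        if issuing[j] or sizes[i] > sizes[j]:
            continue  # target is not accepting, or the requester is larger than it
        if j not in best or sizes[i] < sizes[best[j]]:
            best[j] = i  # assumption: first arrival wins ties
    new_sizes = list(sizes)
    for j, i in best.items():
        new_sizes[j] += sizes[i]
    absorbed = set(best.values())
    return [s for idx, s in enumerate(new_sizes) if idx not in absorbed]

def run_size_based(n, rng):
    """Return [(kappa_t, chi_t)] for t = 0, ..., tau_c, with sizes normalized by n."""
    sizes = [1] * n
    history = [(n, sum((s / n) ** 2 for s in sizes))]
    while len(sizes) > 1:
        sizes = size_based_round(sizes, rng)
        history.append((len(sizes), sum((s / n) ** 2 for s in sizes)))
    return history

if __name__ == "__main__":
    hist = run_size_based(4096, random.Random(7))
    print("tau_c =", len(hist) - 1)
    for t, (kappa, chi) in enumerate(hist):
        print(t, kappa, round(kappa * chi, 3))  # kappa_t * chi_t: normalized susceptibility
```

The printed product $\kappa_t \chi_t$ is the normalized susceptibility that the rest of this section tracks; it equals 1 exactly when all clusters have the same size.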

Observe that by Cauchy–Schwarz, if $w_1, \ldots, w_{\kappa_t}$ are the cluster-sizes at the end of round $t$ (and as such $\chi_t = \sum_i w_i^2$) then we always have
$$\chi_t \kappa_t \ge \Bigl(\sum_{i=1}^{\kappa_t} w_i\Bigr)^2 = 1 \tag{3.2}$$
with equality iff all clusters have the same size. Indeed, the susceptibility $\chi_t$ measures the variance of the cluster-size distribution. When $\chi_t$ is smaller (closer to $\kappa_t^{-1}$), the distribution is more uniform. We further claim that
$$\chi_{t+1} \le 2\chi_t \quad \text{for all } t. \tag{3.3}$$
To see this, note that if a cluster of size $a$ merges with a cluster of size $b$ the susceptibility increases by exactly $(a+b)^2 - (a^2 + b^2) = 2ab \le a^2 + b^2$. Since each round only involves merges between disjoint pairs of clusters, this immediately implies that the total additive increase in susceptibility is bounded by the current sum of squares of the cluster sizes, that is, the current susceptibility $\chi_t$.

Before commencing with the proof of Proposition 3.1, we present a trivial linear bound for the expected running time of the coalescence process, which will later serve as the final step in our proof. Here and in what follows, $P_w$ and $E_w$ denote probability and expectation given the initial cluster distribution $w$. While the estimate featured here appears to be quite crude when $w$ is uniform, recall that in general $\tau_c$ can in fact be linear in the initial number of clusters w.h.p., for example, when $w$ is comprised of one cluster of mass $1 - 1/\sqrt{n}$ and $\sqrt{n}$ other clusters of mass $1/n$ each.

LEMMA 3.2. Starting from $\kappa$ clusters with an arbitrary cluster distribution $w = (w_1, \ldots, w_\kappa)$ we have $E_w[\tau_c] \le 8\kappa$. Furthermore, $P_w(\tau_c > 16\kappa) \le e^{-\kappa/4}$.

PROOF. Consider an arbitrary round in which at least 2 clusters still remain. We claim that the probability that there is at least one merge in this round is at least $\frac18$. Indeed, let $C_1$ be a cluster of minimal size. The probability that it decides to send a request is $\frac12$, and since there are at least two clusters and $C_1$ is the smallest one, the probability that this request goes to some $C_j$ with $j \ne 1$ is at least $\frac12$. Finally, the probability that $C_j$ is accepting requests is again $\frac12$.
Conditioned on these events, $C_j$ will definitely accept some request (possibly not the one from $C_1$, as another cluster of the same size as $C_1$ may have sent it a request), leading to at least one merge, as claimed. The process terminates when the total cumulative number of merges reaches $\kappa - 1$. Therefore, the time of completion is stochastically dominated by the sum of $\kappa - 1$ geometric random variables with success probability $\frac18$, and in particular $E_w[\tau_c] \le 8(\kappa - 1)$.

By the same reasoning, the total number of merges that occurred in the first $t$ rounds clearly stochastically dominates a binomial variable $\mathrm{Bin}(t, \frac18)$ as long as $t \le \tau_c$. Therefore,
$$P_w(\tau_c > 16\kappa) \le P\bigl(\mathrm{Bin}(16\kappa, \tfrac18) < \kappa - 1\bigr) \le e^{-\kappa/4},$$
where the last inequality used the well-known Chernoff bounds (see, e.g., [17], Theorem 2.1).

3.1. Proof of Proposition 3.1 via two key lemmas. We next present the two main lemmas on which the proof of the proposition hinges. The key idea is to design a potential function comprised of two parts, $\Phi_1, \Phi_2$, while identifying a certain event $A_t$ such that the following holds: $E[\Phi_1(t+1) - \Phi_1(t) \mid \mathcal{F}_t, A_t] < -c_1 < 0$ and $E[\Phi_2(t+1) - \Phi_2(t) \mid \mathcal{F}_t] < c_2$, where $c_1, c_2$ are absolute constants, and a similar statement holds conditioned on $A_t^c$ when reversing the roles of $\Phi_1$ and $\Phi_2$. At this point we will establish that an appropriate linear combination of $\Phi_1, \Phi_2$ is a supermartingale, and the required bound on $\tau_c$ will follow from optional stopping. Note that throughout the proof we make no attempt to optimize the absolute constants involved.

The event $A_t$ of interest is defined as follows.

DEFINITION 3.3. Let $A_t$ be the event that the following two properties hold after the $t$th round:

(i) At least $\kappa_t/2$ clusters have size at most $1/(600\kappa_t)$.
(ii) The cluster-size distribution satisfies $\sum_i w_i \mathbf{1}_{\{w_i < 41/\kappa_t\}} < [\ldots]$.

The intuition behind this definition is that property (i) boosts the number of tiny clusters, thereby severely retarding the growth of the largest clusters, which will tend to see incoming requests from these tiny clusters. Property (ii) ensures that most of the mass of the cluster-size distribution is on relatively large clusters, of size at least 41 times the average. Examining the event $A_t$ will aid in tracking the variable $\chi_t \kappa_t$, the normalized susceptibility [recall from (3.2) that this quantity is always at least 1 and it equals 1 whenever all clusters are of the same size]. The next lemma, whose proof appears in Section 3.2, estimates the expected change in this quantity and most notably shows that it is at most $-\frac{1}{200}$ if we condition on $A_t$.

LEMMA 3.4. Let $\Phi_1(t) = \chi_t \kappa_t$ and suppose that at the end of the $t$th round one has $\kappa_t \ge 2$. Then
$$E[\Phi_1(t+1) - \Phi_1(t) \mid \mathcal{F}_t] \le 5 \tag{3.4}$$
and furthermore,
$$E[\Phi_1(t+1) - \Phi_1(t) \mid \mathcal{F}_t, A_t, \chi_t < [\ldots]] \le -\tfrac{1}{200}. \tag{3.5}$$
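To get a feel for the scale of $\Phi_1(t) = \chi_t \kappa_t$, here is a worked example on a hypothetical configuration that does not appear in the paper: one cluster of mass $\frac12$ together with $m$ clusters of mass $\frac{1}{2m}$ each.

```latex
\kappa = m + 1, \qquad
\chi = \Bigl(\tfrac12\Bigr)^2 + m \cdot \Bigl(\tfrac{1}{2m}\Bigr)^2
     = \tfrac14 + \tfrac{1}{4m}
     = \frac{m+1}{4m},
\qquad
\chi\kappa = \frac{(m+1)^2}{4m}.
```

For $m = 1$ (two equal clusters) this gives $\chi\kappa = 1$, the minimum allowed by (3.2), whereas for large $m$ it grows like $m/4$. A single dominant cluster therefore already pushes the normalized susceptibility to the order of the number of clusters.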

Fortunately, when $A_t$ does not hold the behavior in the next round can still be advantageous in the sense that in this case the number of clusters tends to fall by at least a constant fraction. This is established by the following lemma, whose proof is postponed to Section 3.3.

LEMMA 3.5. Let $\Phi_2(t) = \log \kappa_t$ and suppose that after the $t$th round one has $\kappa_t \ge 2$. Then
$$E[\Phi_2(t+1) - \Phi_2(t) \mid \mathcal{F}_t, A_t^c] < -10^{-7}. \tag{3.6}$$

We are now in a position to derive Proposition 3.1 from the above two lemmas.

PROOF OF PROPOSITION 3.1. Define the stopping time $\tau$ to be
$$\tau = \min\{t : \chi_t \ge [\ldots]\}.$$
Observe that the susceptibility is initially $1/n$, its value is 1 once the process arrives at a single cluster (i.e., at time $\tau_c$) and until that point it is nondecreasing, hence $E\tau \le E\tau_c < \infty$ by Lemma 3.2. Further define the random variable
$$Z_t = \chi_t \kappa_t + [\ldots] \log \kappa_t + \frac{t}{200}.$$
We claim that $(Z_{t \wedge \tau})$ is a supermartingale. Indeed, consider $E[Z_{t+1} \mid \mathcal{F}_t, \tau > t]$ and note that the fact that $\tau > t$ implies in particular that $\kappa_t \ge 2$ since in that case $\chi_t < [\ldots] < 1$. If $A_t$ holds then by (3.5) the conditional expected change in $\chi_t \kappa_t$ is below $-\frac{1}{200}$, while $\log \kappa_t$ can only decrease (as $\kappa_t$ is nonincreasing), hence $E[Z_{t+1} \mid \mathcal{F}_t, A_t, \tau > t] \le Z_t$. If $A_t$ does not hold, then by (3.4) the conditional expected change in $\chi_t \kappa_t$ is at most $+5$ whereas the conditional expected change in $\log \kappa_t$ is below $-10^{-7}$ due to (3.6). By the scaling in the definition of $Z_t$, these add up to give $E[Z_{t+1} \mid \mathcal{F}_t, A_t^c, \tau > t] \le Z_t$. Altogether, $(Z_{t \wedge \tau})$ is indeed a supermartingale. As its increments are bounded and the stopping time $\tau$ is integrable we can apply the optional stopping theorem (see, e.g., [7], Chapter 5) and get
$$E Z_\tau \le Z_0 = \chi_0 \kappa_0 + [\ldots] \log \kappa_0 = O(\log n). \tag{3.7}$$
At the same time, by definition of $\tau$ we have $\chi_\tau \ge [\ldots]$ and so
$$Z_\tau = \chi_\tau \kappa_\tau + [\ldots] \log \kappa_\tau + \frac{\tau}{200} \ge [\ldots]\,(\kappa_\tau + \tau/8). \tag{3.8}$$
Taking expectation in (3.8) and combining it with (3.7) we find that $E[\tau + 8\kappa_\tau] \le O(\log n)$.

Finally, conditioned on the cluster distribution at time $\tau$ we know by Lemma 3.2 that the expected number of additional rounds it takes the process to conclude is at most $8\kappa_\tau$, thus $E[\tau_c] \le E[\tau + 8\kappa_\tau]$. We can now conclude that $E[\tau_c] = O(\log n)$, as required.

3.2. Proof of Lemma 3.4: Estimating the normalized susceptibility when $A_t$ holds. The first step in controlling the product $\chi_t \kappa_t$ is to quantify the coalescence rate in terms of the susceptibility, as achieved by the following claim.

CLAIM 3.6. Suppose that at the end of the $t$th round one has $\kappa_t \ge 2$. Then
$$E[\kappa_{t+1} \mid \mathcal{F}_t] \le \kappa_t - (46\chi_t)^{-1} \tag{3.9}$$
and furthermore,
$$P\bigl(\kappa_{t+1} < \kappa_t - (100\chi_t)^{-1} \mid \mathcal{F}_t, \chi_t < [\ldots]\bigr) \ge 1 - e^{-100}.$$

PROOF. To simplify the notation let $\kappa = \kappa_t$, $\chi = \chi_t$ and $\kappa' = \kappa_{t+1}$ throughout the proof of the claim. Further let the clusters $C_i$ be indexed in increasing order of their sizes and let $w_i = |C_i|$. Recall that the number of merges in round $t+1$ is precisely the number of clusters which decide to accept requests and then receive at least one incoming request from a cluster of size no larger than itself. Consider the probability of the latter event for a cluster $C_i$ with $i > \lfloor \kappa/2 \rfloor$. Since the clusters are ordered by size there are at least $\lfloor \kappa/2 \rfloor$ clusters of size at most $w_i$ and each will send a request to $C_i$ independently with probability $w_i/2$ (the factor of 2 is due to the probability of issuing rather than receiving requests this round). The probability that none of these clusters do so is thus at most
$$(1 - w_i/2)^{\lfloor \kappa/2 \rfloor} \le e^{-w_i \kappa/6}$$
(where we used the fact that $\lfloor \kappa/2 \rfloor \ge \kappa/3$ for any $\kappa \ge 2$), and altogether the probability that $C_i$ accepts a merge request from one of these clusters is at least $\frac12(1 - e^{-w_i \kappa/6})$. Summing over these clusters we conclude that
$$E[\kappa - \kappa' \mid \mathcal{F}_t] \ge \frac12 \sum_{i > \lfloor \kappa/2 \rfloor} \bigl(1 - e^{-w_i \kappa/6}\bigr) \ge \frac14 \sum_{i=1}^{\kappa} \bigl(1 - e^{-w_i \kappa/6}\bigr),$$
where the last inequality follows from the fact that the summand is increasing in $w_i$ and hence, the sum over the $\lceil \kappa/2 \rceil$ largest clusters should be at least as large as the sum over the $\lfloor \kappa/2 \rfloor$ smallest ones.
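Next, observe that by concavity, for all $0 \le w_i \le 6\chi$ the final summand is at least $w_i \cdot \frac14(1 - e^{-\chi\kappa})/(6\chi)$, which in turn is at least $w_i \cdot \frac14(1 - e^{-1})/(6\chi)$ by (3.2). As this last expression always exceeds $w_i/(38\chi)$ we get
$$E[\kappa - \kappa' \mid \mathcal{F}_t] \ge \frac{1}{38\chi} \sum_{i=1}^{\kappa} w_i \mathbf{1}_{\{w_i \le 6\chi\}}. \tag{3.10}$$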
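The constant 38 in (3.10) leaves very little slack, which a direct numerical check of the preceding step makes visible (values rounded to five significant digits):

```latex
\frac14 \cdot \frac{1 - e^{-1}}{6}
  = \frac{1 - e^{-1}}{24}
  \approx \frac{0.63212}{24}
  \approx 0.026338
  \;>\; 0.026316 \approx \frac{1}{38}.
```

So the per-cluster lower bound $w_i \cdot \frac14(1 - e^{-1})/(6\chi)$ does exceed $w_i/(38\chi)$, but only in the fourth significant digit.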

We now aim to show that much of the overall mass is spread on clusters of size at most $6\chi$. To this end recall that by definition $\chi = \sum_i w_i^2$ while $\sum_i w_i = 1$, hence we can write $\chi = EY$ where $Y$ is the random variable that takes the value $w_i$ with probability $w_i$ for $i = 1, \ldots, \kappa$. This gives that
$$\sum_{i} w_i \mathbf{1}_{\{w_i \le 6\chi\}} = P(Y \le 6\,EY) > \frac56$$
(with the final bound due to Markov's inequality) and revisiting (3.10) we obtain that
$$E[\kappa - \kappa' \mid \mathcal{F}_t] > \frac{1}{38\chi} \cdot \frac56 > \frac{1}{46\chi},$$
establishing inequality (3.9).

To complete the proof of the claim it suffices to show that the random variable $X = \kappa - \kappa'$ is suitably concentrated, to which end we use Talagrand's inequality (see, e.g., [21], Chapter 10). In its following version we say that a function $f : \Omega \to \mathbb{R}$ is $C$-Lipschitz if changing its argument $\omega$ in any single coordinate changes $f(\omega)$ by at most $C$, and that $f$ is $r$-certifiable if for every $s$ and $\omega$ with $f(\omega) \ge s$ there exists a subset $I$ of at most $rs$ coordinates such that every $\omega'$ that agrees with $\omega$ on the coordinates indexed by $I$ also has $f(\omega') \ge s$. In the context of a product space $\Omega = \prod_i \Omega_i$ these definitions carry to the random variable that $f$ corresponds to via the product measure.

THEOREM 3.7 (Talagrand's inequality). If $X$ is a $C$-Lipschitz and $r$-certifiable random variable on $\Omega = \prod_{i=1}^n \Omega_i$, then
$$P\bigl(|X - EX| > t + 60C\sqrt{r\,EX}\bigr) \le 4\exp\bigl(-t^2/(8C^2 r\,EX)\bigr)$$
for any $0 \le t \le EX$.

Observe that round $t+1$, conditioned on $\mathcal{F}_t$, is clearly a product space as the actions of the individual clusters are independent. Formally, each cluster chooses either to accept requests or to send a request to a random cluster. Changing the action of a single cluster can only affect $X$, the number of merges in round $t+1$, by at most one merge and so $X$ is 1-Lipschitz. Also, if $X \ge s$ then one can identify $s$ clusters which accepted merge requests from smaller clusters. By fixing the decisions of the $2s$ clusters comprising these merges (the acceptors together with their corresponding requesters) we must have $X \ge s$ regardless of the other clusters' actions, as the $s$ acceptors will accept (possibly different) merge-requests no matter what. Thus, $X$ is also 2-certifiable.
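Let $\mu = \mathbb{E}X$ and assume now that $\chi <$ […]. By the first part of the proof [equation (3.9)], it then follows that $\mu \ge (46\chi)^{-1} > 70{,}000$, in which case Talagrand's inequality gives

$$\mathbb{P}\Bigl(|X - \mu| > \frac{\mu}{6} + 60\sqrt{2\mu}\Bigr) \le 4\exp\Bigl(-\frac{(\mu/6)^2}{16\mu}\Bigr) = 4e^{-\mu/576} < e^{-100}.$$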
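The arithmetic in this application of Theorem 3.7 is easy to verify numerically. The sketch below is illustrative only: the helper name `talagrand_tail` is an assumption, and it simply evaluates the right-hand side of the inequality as quoted above with $C = 1$, $r = 2$, $t = \mu/6$ at the boundary value $\mu = 70{,}000$.

```python
import math


def talagrand_tail(t: float, C: float, r: float, mean: float) -> float:
    """Right-hand side 4 * exp(-t^2 / (8 * C^2 * r * mean)) of Theorem 3.7 as quoted."""
    return 4.0 * math.exp(-t * t / (8.0 * C * C * r * mean))


mu = 70_000.0
tail = talagrand_tail(t=mu / 6.0, C=1.0, r=2.0, mean=mu)
exponent = (mu / 6.0) ** 2 / (16.0 * mu)  # simplifies to mu / 576

print(exponent, tail, math.exp(-100.0))
```

Since the tail bound decreases in $\mu$, checking the boundary value suffices for every $\mu > 70{,}000$.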

Also, note that our above bound $\mu > 70{,}000 > 2\cdot 180^2$ implies that $60\sqrt{2\mu} < \mu/3$, so in fact the probability of $X$ falling below $\mu - (\frac{\mu}{6} + \frac{\mu}{3})$ is at most $e^{-100}$. As $\mu \ge (46\chi)^{-1}$ we conclude that $\kappa - \kappa' = X > (100\chi)^{-1}$ with probability at least $1 - e^{-100}$, as required.

As the above claim demonstrated the effect of the susceptibility on the coalescence rate, we move to study the evolution of the susceptibility. The critical advantage of the size-based process is that large clusters grow more slowly than small clusters. The intuition behind this is that larger clusters tend to receive more requests, and since clusters choose to accept their smallest incoming request, these clusters typically have more choices to minimize over. It turns out that this effect is enough to produce a useful quantitative bound on the growth of the susceptibility.

CLAIM 3.8. Suppose that after the $t$th round $\kappa_t \ge 2$. Then

$$\mathbb{E}[\chi_{t+1} \mid \mathcal{F}_t] \le \chi_t + \frac{5}{\kappa_t}. \tag{3.11}$$

PROOF. Set $\kappa = \kappa_t$ and $\chi = \chi_t$. Let the clusters $C_i$ be indexed in increasing order of their sizes and let $w_i = |C_i|$. For each cluster $C_i$ let the random variable $X_i$ be the size of the smallest cluster that it receives a merge request from, as long as that cluster is no larger than itself, and not itself; otherwise (the case where $C_i$ receives no merge requests from another cluster of size less than or equal to its own) set $X_i = 0$. Under these definitions we have

$$\mathbb{E}[\chi_{t+1} \mid \mathcal{F}_t] = \chi + \sum_{i=1}^{\kappa} w_i \mathbb{E}[X_i], \tag{3.12}$$

since each $C_i$ is an acceptor with probability $\frac12$ and if it indeed accepts a request from a cluster of size $X_i$ then the susceptibility will increase by exactly $(w_i + X_i)^2 - (w_i^2 + X_i^2) = 2w_iX_i$.

Next, note that since we ordered the clusters by increasing order of size, each of the first $\lfloor\kappa/2\rfloor$ clusters has size at most $2/\kappa$ (otherwise the last $\lceil\kappa/2\rceil$ clusters would combine to a total mass larger than 1). We will use this fact to bound $\mathbb{E}[X_i \mid \mathcal{F}_t]$ by considering two situations:

(1) If $C_i$ receives an incoming request from at least one of the first $\lfloor\kappa/2\rfloor$ clusters (including itself), then $X_i \le 2/\kappa$ by the above argument. The probability of this is precisely $1 - (1 - \frac{w_i}{2})^{\lfloor\kappa/2\rfloor}$, as each of the first $\lfloor\kappa/2\rfloor$ clusters $C_j$ independently sends a request to $C_i$ with probability $w_i/2$ (with the factor of 2 due to the decision of $C_j$ whether or not to issue requests).

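(2) If $C_i$ gets no requests from the first $\lfloor\kappa/2\rfloor$ clusters, then use the trivial bound $X_i \le w_i$.

Combining the two cases we deduce that

$$\mathbb{E}X_i \le \Bigl(1 - \bigl(1 - \tfrac{w_i}{2}\bigr)^{\lfloor\kappa/2\rfloor}\Bigr)\frac{2}{\kappa} + \bigl(1 - \tfrac{w_i}{2}\bigr)^{\lfloor\kappa/2\rfloor} w_i. \tag{3.13}$$

We claim that $\mathbb{E}X_i$ is in fact always at most $5/\kappa$. To see this, first note that if $w_i \le 2/\kappa$ then this immediately holds, for example, since $X_i \le w_i$. Consider therefore the case where $w_i > 2/\kappa$. Since (3.13) is a weighted average of $2/\kappa$ and $w_i > 2/\kappa$, it increases whenever the weight on $w_i$ is increased. As $(1 - \frac{w_i}{2})^{\lfloor\kappa/2\rfloor} \le e^{-(w_i/2)\lfloor\kappa/2\rfloor} \le e^{-w_i\kappa/6}$, we have that, in this case,

$$\mathbb{E}X_i \le \bigl(1 - e^{-w_i\kappa/6}\bigr)\frac{2}{\kappa} + e^{-w_i\kappa/6} w_i \le \frac{1}{\kappa}\bigl(2 + w_i\kappa\, e^{-w_i\kappa/6}\bigr).$$

One can easily verify that the function $f(x) = xe^{-x/6}$ satisfies $f(x) \le 3$ for all $x$, hence we conclude that $\mathbb{E}X_i \le 5/\kappa$ in all cases, as claimed. Plugging this into (3.12) we obtain that

$$\mathbb{E}[\chi_{t+1} \mid \mathcal{F}_t] \le \chi + \frac{5}{\kappa}\sum_{i=1}^{\kappa} w_i = \chi + \frac{5}{\kappa},$$

as required.

While the last claim allows us to limit the growth of the susceptibility, this bound is unfortunately too weak in general. For instance, when used in tandem with Claim 3.6, it results in the susceptibility growing out of control, while the number of clusters decreases slower and slower. Crucially, however, conditioned on the event $A_t$ (as given in Definition 3.3) we can refine these bounds to show that the growth of $\chi_{t+1}$ slows down dramatically, as the following claim establishes.

CLAIM 3.9. Suppose that at the end of the $t$th round $\kappa_t \ge 2$. Then

$$\mathbb{E}[\chi_{t+1} \mid \mathcal{F}_t, A_t] \le \chi_t + (201\kappa_t)^{-1}. \tag{3.14}$$

PROOF. Let $\kappa = \kappa_t$ and $\chi = \chi_t$, and define the random variables $X_i$ as in the proof of Claim 3.8. By the same reasoning used to deduce inequality (3.13), only now using property (i) of $A_t$ according to which each of the smallest $\lceil\kappa/2\rceil$ clusters has size at most $1/(600\kappa_t)$, we have

$$\mathbb{E}X_i \le \Bigl(1 - \bigl(1 - \tfrac{w_i}{2}\bigr)^{\lceil\kappa/2\rceil}\Bigr)\frac{1}{600\kappa} + \bigl(1 - \tfrac{w_i}{2}\bigr)^{\lceil\kappa/2\rceil} w_i. \tag{3.15}$$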

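Recall that equation (3.12) established that $\mathbb{E}[\chi_{t+1} \mid \mathcal{F}_t] = \chi + \sum_{i=1}^{\kappa} w_i \mathbb{E}X_i$. This time we will need to bound this sum more delicately by splitting it into two parts based on whether or not $w_i < 41/\kappa$. In the case $w_i < 41/\kappa$ we can use the trivial bound $X_i \le w_i$ to arrive at

$$\sum_i w_i \mathbf{1}_{\{w_i < 41/\kappa\}} \mathbb{E}X_i < \frac{41}{\kappa}\sum_i w_i \mathbf{1}_{\{w_i < 41/\kappa\}} < \frac{[\ldots]}{\kappa},$$

where the last inequality is by property (ii) of $A_t$. For the second part of the summation we use the same weighted mean argument from the proof of Claim 3.8 to deduce that when $w_i > (600\kappa)^{-1}$, the right-hand side of (3.15) increases with the weight on $w_i$, which in turn is at most $(1 - \frac{w_i}{2})^{\lceil\kappa/2\rceil} \le \exp(-w_i\kappa/4)$. In particular, in case $w_i \ge 41/\kappa$, we have

$$\mathbb{E}X_i \le \bigl(1 - e^{-w_i\kappa/4}\bigr)\frac{1}{600\kappa} + e^{-w_i\kappa/4} w_i \le \frac{1}{\kappa}\Bigl(\frac{1}{600} + w_i\kappa\, e^{-w_i\kappa/4}\Bigr) \le \frac{1}{\kappa}\Bigl(\frac{1}{600} + 41e^{-41/4}\Bigr)$$

(here we used the fact that the function $xe^{-x/4}$ is decreasing for $x \ge 41$). Combining our bounds,

$$\sum_{i=1}^{\kappa} w_i \mathbb{E}X_i < \frac{[\ldots]}{\kappa} + \sum_i w_i \mathbf{1}_{\{w_i \ge 41/\kappa\}}\,\frac{1}{\kappa}\Bigl(\frac{1}{600} + 41e^{-41/4}\Bigr) < \frac{1}{201\kappa},$$

since $\sum_i w_i = 1$. Together with (3.12), the proof is complete.

Combining the bound on $\kappa_{t+1}$ in Claim 3.6 with the bounds on $\chi_{t+1}$ from Claims 3.8 and 3.9 will now result in the statement of Lemma 3.4.

PROOF OF LEMMA 3.4. For convenience let $\kappa = \kappa_t$ and $\chi = \chi_t$, as well as $\kappa' = \kappa_{t+1}$ and $\chi' = \chi_{t+1}$. The first statement of the lemma is an immediate consequence of Claim 3.8 since $\kappa' \le \kappa$ and so

$$\mathbb{E}[\chi'\kappa' \mid \mathcal{F}_t] \le \kappa\,\mathbb{E}[\chi' \mid \mathcal{F}_t] \le \kappa\Bigl(\chi + \frac{5}{\kappa}\Bigr) = \chi\kappa + 5.$$

For the second statement, we can break down $\chi'\kappa'$ into

$$\chi'\kappa' = \chi'\Bigl(\kappa - \frac{1}{100\chi}\Bigr) + \chi'\Bigl(\kappa' - \kappa + \frac{1}{100\chi}\Bigr)\mathbf{1}_{\{\kappa' \ge \kappa - 1/(100\chi)\}} + \chi'\Bigl(\kappa' - \kappa + \frac{1}{100\chi}\Bigr)\mathbf{1}_{\{\kappa' < \kappa - 1/(100\chi)\}}.$$

Noticing that the last expression in the right-hand side is at most 0, and recalling that $0 < \chi \le \chi' \le 2\chi$ [due to (3.3)] and $1 \le \kappa' \le \kappa$, we now obtain that $\mathbb{E}[\chi'\kappa' \mid \mathcal{F}_t, A_t, \chi < [\ldots]]$ is at most

$$\mathbb{E}\Bigl[\chi'\Bigl(\kappa - \frac{1}{100\chi}\Bigr) \Bigm| \mathcal{F}_t, A_t, \chi < [\ldots]\Bigr] + \mathbb{E}\Bigl[2\chi\cdot\frac{1}{100\chi}\mathbf{1}_{\{\kappa' \ge \kappa - 1/(100\chi)\}} \Bigm| \mathcal{F}_t, A_t, \chi < [\ldots]\Bigr]$$

$$= \Bigl(\kappa - \frac{1}{100\chi}\Bigr)\mathbb{E}\bigl[\chi' \mid \mathcal{F}_t, A_t, \chi < [\ldots]\bigr] + \frac{1}{50}\,\mathbb{P}\Bigl(\kappa' \ge \kappa - \frac{1}{100\chi} \Bigm| \mathcal{F}_t, A_t, \chi < [\ldots]\Bigr).$$

Applying Claims 3.6 and 3.9 now gives

$$\mathbb{E}\bigl[\chi'\kappa' \mid \mathcal{F}_t, A_t, \chi < [\ldots]\bigr] \le \Bigl(\kappa - \frac{1}{100\chi}\Bigr)\Bigl(\chi + \frac{1}{201\kappa}\Bigr) + \frac{1}{50}e^{-100} < \chi\kappa - \frac{1}{100} + \frac{1}{201} + e^{-100} < \chi\kappa - \frac{1}{200},$$

and the proof is complete.

Proof of Lemma 3.5: Estimating the number of components when $A_t$ fails.

We wish to show that whenever either one of the two properties specified in $A_t$ does not hold, the expected number of clusters drops by a constant factor.

Suppose that property (i) of $A_t$ fails. In this case a constant fraction of the clusters have size which is at least a constant fraction of the average size $1/\kappa_t$. We will show that each such cluster receives an incoming request (from another cluster of no larger size) in the next round with a probability that is uniformly bounded from below. Consequently, we will be able to conclude that the number of clusters shrinks by at least a constant factor in expectation.

CLAIM 3.10. Suppose that at the end of the $t$th round $\kappa_t \ge 2$ and property (i) of $A_t$ does not hold, that is, more than $\kappa_t/2$ clusters have size greater than $(600\kappa_t)^{-1}$. Then

$$\mathbb{E}[\kappa_{t+1} \mid \mathcal{F}_t] \le (1 - [\ldots])\kappa_t. \tag{3.16}$$

PROOF. Let $\kappa = \kappa_t$ and $\kappa' = \kappa_{t+1}$ and, as usual, order the clusters by increasing order of size. Consider an arbitrary cluster $C_i$ which is one of the last $\lceil\kappa/2\rceil$ clusters, and let $w_i$ denote its size. If $C_i$ opts to accept requests in this round (with probability $\frac12$) and any of the first $\lfloor\kappa/2\rfloor$ clusters sends it a request, it will contribute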
a merge in this round. This occurs with probability

$$\frac12\Bigl(1 - \bigl(1 - \tfrac{w_i}{2}\bigr)^{\lfloor\kappa/2\rfloor}\Bigr) \ge \frac12\bigl(1 - e^{-w_i\kappa/6}\bigr) > \frac12\bigl(1 - e^{-1/3600}\bigr) > 10^{-4},$$

where we used our assumption that $w_i \ge (600\kappa)^{-1}$. Thus, the probability that $C_i$ contributes to a merge is at least $10^{-4}$. We conclude that the expected number of merges in this round is at least $10^{-4}\lceil\kappa/2\rceil$, from which the desired result follows.

Now suppose that property (ii) of $A_t$ fails. Here at least a constant proportion of the mass of the cluster-size distribution falls on clusters with size at most a constant multiple of the average size. Such clusters behave nicely as in this window the relation between the cluster-size and the typical number of incoming requests can be bounded by a linear function. Again, this will result in a constant proportion of clusters merging in the next round in expectation.

CLAIM 3.11. Suppose that at the end of the $t$th round $\kappa_t \ge 2$ and property (ii) of $A_t$ does not hold, that is, $\sum_i w_i \mathbf{1}_{\{w_i < 41/\kappa_t\}} \ge$ […], where $w_i$ denotes the size of $C_i$. Then

$$\mathbb{E}[\kappa_{t+1} \mid \mathcal{F}_t] \le (1 - [\ldots])\kappa_t. \tag{3.17}$$

PROOF. Let $\kappa = \kappa_t$ and $\kappa' = \kappa_{t+1}$. Order the clusters by size and let $r$ be the number of clusters which are smaller than $41/\kappa$. Since clearly at most $\kappa/41$ clusters can have size at least $41/\kappa$, we have $r \ge \frac{40}{41}\kappa$. Notice that since $\kappa \ge 2$, this implies that in particular $\lfloor r/2\rfloor \ge \kappa/3$. By the same arguments as before, each cluster $C_i$ with $\lfloor r/2\rfloor < i \le r$ will accept a merge request from a smaller cluster with probability at least

$$\frac12\Bigl(1 - \bigl(1 - \tfrac{w_i}{2}\bigr)^{\lfloor r/2\rfloor}\Bigr) \ge \frac12\bigl(1 - e^{-(w_i/2)\lfloor r/2\rfloor}\bigr) \ge \frac12\bigl(1 - e^{-w_i\kappa/6}\bigr).$$

Since we are concentrating our attention on the clusters of size $w_i < 41/\kappa$, concavity implies that the last expression is actually at least

$$\frac12\bigl(1 - e^{-41/6}\bigr)\frac{w_i}{41/\kappa} > \frac{w_i\kappa}{100}.$$

We conclude that the expected number of merges in this round is at least

$$\sum_{i=\lfloor r/2\rfloor+1}^{r} \frac{w_i\kappa}{100} \ge \frac{\kappa}{200}\sum_{i=1}^{r} w_i \ge [\ldots]\,\kappa,$$

where we used the fact that the $w_i$'s are sorted in increasing order to relate the sum over the cluster indices $\lfloor r/2\rfloor+1,\ldots,r$ to the one over the first $r$ clusters. This gives the desired result.
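The round analysed in the claims above can be simulated directly. The following minimal Python sketch is not the authors' code. It assumes, following the definition of $X_i$ in the proof of Claim 3.8, that an accepting cluster merges with the smallest requester that is no larger than itself; the names `size_based_round`, `susceptibility` and `rounds_to_coalesce` are illustrative.

```python
import random


def size_based_round(sizes, rng):
    """Run one round on a list of positive cluster sizes and return the new sizes.

    Assumed model: each cluster flips a fair coin to accept or issue; an issuer
    contacts a cluster chosen proportionally to size; an acceptor merges with
    the smallest requester that is no larger than itself.
    """
    n_clusters = len(sizes)
    accepts = [rng.random() < 0.5 for _ in range(n_clusters)]
    incoming = {}  # acceptor index -> index of its smallest eligible requester
    for j in range(n_clusters):
        if accepts[j]:
            continue
        target = rng.choices(range(n_clusters), weights=sizes, k=1)[0]
        # Requests sent to non-acceptors (this covers a cluster picking itself)
        # or coming from a strictly larger cluster are dropped.
        if not accepts[target] or sizes[j] > sizes[target]:
            continue
        best = incoming.get(target)
        if best is None or sizes[j] < sizes[best]:
            incoming[target] = j
    merged_away = set(incoming.values())
    new_sizes = []
    for i in range(n_clusters):
        if i in merged_away:
            continue
        extra = sizes[incoming[i]] if i in incoming else 0
        new_sizes.append(sizes[i] + extra)
    return new_sizes


def susceptibility(sizes):
    """chi = sum of squared normalized cluster sizes."""
    total = sum(sizes)
    return sum((s / total) ** 2 for s in sizes)


def rounds_to_coalesce(n, seed=0):
    """Number of rounds until a single cluster remains, starting from n singletons."""
    rng = random.Random(seed)
    sizes = [1] * n
    rounds = 0
    while len(sizes) > 1:
        sizes = size_based_round(sizes, rng)
        rounds += 1
    return rounds
```

Averaging `len(sizes) - len(size_based_round(sizes, rng))` over many trials gives an empirical estimate of the expected number of merges per round, which can be compared against the lower bounds in the claims; the sketch makes no attempt to reproduce the paper's constants.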

PROOF OF LEMMA 3.5. The proof readily follows from the combination of Claims 3.10 and 3.11. Indeed, these claims establish that whenever the event $A_t$ fails we have $\mathbb{E}[\kappa_{t+1} \mid \mathcal{F}_t, A_t^c] \le (1 - [\ldots])\kappa_t$. Therefore, by the concavity of the logarithm, Jensen's inequality implies that

$$\mathbb{E}[\log\kappa_{t+1} \mid \mathcal{F}_t, A_t^c] \le \log\mathbb{E}[\kappa_{t+1} \mid \mathcal{F}_t, A_t^c] \le \log\kappa_t + \log(1 - [\ldots]) < \log\kappa_t - [\ldots],$$

as required.

4. Optimal upper bound for size-based process.

We now prove the upper bound in Theorem 1.1 by building upon the ideas of the previous section. Recall that in the proof of Proposition 3.1 we defined the sequence

$$Z_t = \chi_t\kappa_t + M\log\kappa_t + \frac{t}{200}, \qquad \text{where } M = [\ldots],$$

established that it was a supermartingale and derived the required result from optional stopping. That approach was only enough to produce a bound on $\mathbb{E}[\tau_c]$, the expected completion time. For the stronger result on the typical value of $\tau_c$ we will analyze $(Z_t)$ more delicately. Namely, we estimate its increments in $L^2$ to qualify an application of an appropriate Bernstein-Kolmogorov large-deviation inequality for supermartingales due to Freedman [13].

An important element in our proof is the modification of the above given variable $Z_t$ into an overestimate $Y_t$ which allows far better control over the increments in $L^2$. This is defined as

$$Y_0 = Z_0 = \chi_0\kappa_0 + M\log\kappa_0 = 1 + M\log n,$$

$$Y_{t+1} = \begin{cases} Y_t + \bigl(\Delta_{t+1} \wedge \log^{2/3} n\bigr) + M\log\dfrac{\kappa_{t+1}}{\kappa_t} + \dfrac{1}{200}, & \text{if } \tau_c > t,\\[4pt] Y_t, & \text{if } \tau_c \le t, \end{cases} \tag{4.1}$$

where

$$\Delta_{t+1} = \chi_{t+1}\Bigl(\kappa_{t+1} \vee \Bigl(\kappa_t - \frac{1}{\chi_t}\Bigr)\Bigr) - \chi_t\kappa_t.$$

The purpose of the $(\kappa_t - \frac{1}{\chi_t})$ term is to limit the potential decrease from negative $\Delta_{t+1}$. In this section, we will need two-sided estimates (in addition to one-sided bounds such as those used in the previous section) due to the fact that we must control the $L^2$ increments.

It is clear that $Y_{t+1} - Y_t \ge Z_{t+1} - Z_t$ as long as $t < \tau_c$ and $\Delta_{t+1} \le \log^{2/3} n$. Therefore, setting

$$\tau_\Delta = \min\{t : \Delta_{t+1} > \log^{2/3} n\},$$

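it follows that

$$Y_t \ge Z_t \quad \text{for all } t \le \tau_c \wedge \tau_\Delta. \tag{4.2}$$

In what follows we will establish a large deviation estimate for $(Y_t)$, then use this overestimate for $Z_t$ to show that w.h.p. $\tau_c = O(\log n)$. We thus focus our attention on the sequence $(Y_t)$.

LEMMA 4.1. The sequence $(Y_t)$ is a supermartingale.

PROOF. Since by definition $Y_t = Y_{t\wedge\tau_c}$, it suffices to consider the times $t < \tau_c$. As we clearly have $(\kappa_{t+1} \vee (\kappa_t - \chi_t^{-1})) \le \kappa_t$ and Claim 3.8 established that $\mathbb{E}[\chi_{t+1} \mid \mathcal{F}_t] \le \chi_t + \frac{5}{\kappa_t}$, we can deduce that

$$\mathbb{E}[\Delta_{t+1} \mid \mathcal{F}_t] \le 5. \tag{4.3}$$

Combined with Lemma 3.5 as in the proof of Proposition 3.1, it then follows that $\mathbb{E}[Y_{t+1} - Y_t \mid \mathcal{F}_t, A_t^c] \le 0$.

We turn to consider $\mathbb{E}[Y_{t+1} - Y_t \mid \mathcal{F}_t, A_t]$. Since $\kappa_{t+1} \le \kappa_t$ holds for all $t$, it suffices to show that $\mathbb{E}[\Delta_{t+1} \mid \mathcal{F}_t, A_t] \le -\frac{1}{200}$. Indeed, as in the proof of Lemma 3.4, we write

$$\Delta_{t+1} \le \chi_{t+1}\Bigl(\kappa_t - \frac{1}{100\chi_t}\Bigr) + \chi_{t+1}\Bigl[\Bigl(\kappa_{t+1} \vee \Bigl(\kappa_t - \frac{1}{\chi_t}\Bigr)\Bigr) - \kappa_t + \frac{1}{100\chi_t}\Bigr]\mathbf{1}_{\{\kappa_{t+1} \ge \kappa_t - 1/(100\chi_t)\}} - \chi_t\kappa_t$$

$$\le \chi_{t+1}\Bigl(\kappa_t - \frac{1}{100\chi_t}\Bigr) + 2\chi_t\cdot\frac{1}{100\chi_t}\mathbf{1}_{\{\kappa_{t+1} \ge \kappa_t - 1/(100\chi_t)\}} - \chi_t\kappa_t,$$

which as stated before gives rise to

$$\mathbb{E}[\Delta_{t+1} \mid \mathcal{F}_t, A_t] < -\frac{1}{100} + \frac{1}{201} + e^{-100} < -\frac{1}{200},$$

and we conclude that $(Y_t)$ is indeed a supermartingale, as required.

LEMMA 4.2. The increments of the supermartingale $(Y_t)$ are uniformly bounded in $L^2$. Namely, for every $t$ we have $\mathbb{E}[(Y_{t+1} - Y_t)^2 \mid \mathcal{F}_t] < M'$ where $M' =$ […].

PROOF. First observe that

$$(Y_{t+1} - Y_t)^2 \le 3\Delta_{t+1}^2 + 3\Bigl(M\log\frac{\kappa_{t+1}}{\kappa_t}\Bigr)^2 + \frac{3}{200^2}. \tag{4.4}$$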

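Since $\frac12\kappa_t \le \kappa_{t+1} \le \kappa_t$, we have $-M\log 2 \le M\log\frac{\kappa_{t+1}}{\kappa_t} \le 0$, hence the last two expressions above sum to at most $3M^2$ (with room to spare), and it remains to bound $\mathbb{E}[\Delta_{t+1}^2 \mid \mathcal{F}_t] = O(1)$ for a suitably small implicit constant.

Observe that when $\Delta_{t+1} \ge 0$ we must have $\Delta_{t+1} \le \chi_{t+1}\kappa_t - \chi_t\kappa_t$ since $(\kappa_{t+1} \vee (\kappa_t - \chi_t^{-1})) \le \kappa_t$. Conversely, if $\Delta_{t+1} \le 0$ then necessarily

$$|\Delta_{t+1}| \le \chi_t\kappa_t - \chi_{t+1}\Bigl(\kappa_t - \frac{1}{\chi_t}\Bigr) \le 1,$$

with the last inequality due to the fact that $\kappa_t \ge 1/\chi_t$ and $\chi_{t+1} \ge \chi_t$. Combining the cases we deduce that, in particular, $|\Delta_{t+1}| \le \kappa_t(\chi_{t+1} - \chi_t) + 1$. By Claim 3.8 we have $\mathbb{E}[\chi_{t+1} - \chi_t \mid \mathcal{F}_t] \le 5/\kappa_t$, hence we get

$$\mathbb{E}[\Delta_{t+1}^2 \mid \mathcal{F}_t] \le \kappa_t^2\,\mathbb{E}[(\chi_{t+1} - \chi_t)^2 \mid \mathcal{F}_t] + 1 + 2\kappa_t(5/\kappa_t) = \kappa_t^2\,\mathbb{E}[(\chi_{t+1} - \chi_t)^2 \mid \mathcal{F}_t] + 11. \tag{4.5}$$

It remains to show that $\mathbb{E}[(\chi_{t+1} - \chi_t)^2 \mid \mathcal{F}_t] = O(1/\kappa_t^2)$. To do so, let $w_1,\ldots,w_{\kappa_t}$ be the cluster-sizes after the $t$th round and recall that by (3.12) and the arguments following it we have

$$\mathbb{E}[(\chi_{t+1} - \chi_t)^2 \mid \mathcal{F}_t] = \mathbb{E}\Bigl[\Bigl(\sum_{i=1}^{\kappa_t} 2w_iX_iI_i\Bigr)^2\Bigr],$$

where each $X_i$ is a nonnegative random variable satisfying $\mathbb{E}X_i \le 5/\kappa_t$ (marking the size of another cluster of no larger size that issued a request to $C_i$, or 0 if there was no such cluster) and each $I_i$ is a Bernoulli($\frac12$) variable independent of $X_i$ (indicating whether or not $C_i$ chose to accept requests). Since $\sum_i w_i = 1$, it follows from convexity that

$$\Bigl(\sum_{i=1}^{\kappa_t} 2w_iX_iI_i\Bigr)^2 \le \sum_{i=1}^{\kappa_t} w_i(2X_iI_i)^2,$$

hence, taking expectation while recalling that $I_i$ and $X_i$ are independent,

$$\mathbb{E}[(\chi_{t+1} - \chi_t)^2 \mid \mathcal{F}_t] \le 4\sum_{i=1}^{\kappa_t} w_i\,\mathbb{E}[X_i^2]\,\mathbb{P}(I_i = 1) = 2\sum_{i=1}^{\kappa_t} w_i\,\mathbb{E}X_i^2,$$

and it remains to bound $\mathbb{E}X_i^2$. Following the same argument that led to (3.13) now gives

$$\mathbb{E}X_i^2 \le \Bigl(1 - \bigl(1 - \tfrac{w_i}{2}\bigr)^{\lfloor\kappa_t/2\rfloor}\Bigr)\Bigl(\frac{2}{\kappa_t}\Bigr)^2 + \bigl(1 - \tfrac{w_i}{2}\bigr)^{\lfloor\kappa_t/2\rfloor} w_i^2.$$

As before, we now deduce that either $w_i \le 2/\kappa_t$, in which case clearly $\mathbb{E}X_i^2 \le 4/\kappa_t^2$, or we have

$$\mathbb{E}X_i^2 \le \bigl(1 - e^{-w_i\kappa_t/6}\bigr)\frac{4}{\kappa_t^2} + e^{-w_i\kappa_t/6} w_i^2 \le \frac{1}{\kappa_t^2}\bigl(4 + (w_i\kappa_t)^2 e^{-w_i\kappa_t/6}\bigr).$$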

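Since $x^2\exp(-x/6) < 20$ for all $x \ge 0$, it then follows that $\mathbb{E}X_i^2 < 24/\kappa_t^2$ (with room to spare). Either way we deduce that

$$\mathbb{E}[(\chi_{t+1} - \chi_t)^2 \mid \mathcal{F}_t] < 2\sum_i w_i\,(24/\kappa_t^2) = 48/\kappa_t^2,$$

and so, going back to (4.5),

$$\mathbb{E}[\Delta_{t+1}^2 \mid \mathcal{F}_t] < 48 + 11 < 60. \tag{4.6}$$

Using this bound in (4.4) we can conclude the proof as we have

$$\mathbb{E}[(Y_{t+1} - Y_t)^2 \mid \mathcal{F}_t] < 3\,\mathbb{E}[\Delta_{t+1}^2 \mid \mathcal{F}_t] + 3M^2 < M'.$$

By now we have established that $(Y_t)$ is a supermartingale which satisfies $Y_{t+1} - Y_t \le L$ for a value of $L = \log^{2/3} n$ and that, in addition, $\mathbb{E}[(Y_{t+1} - Y_t)^2 \mid \mathcal{F}_t] \le M'$. We are now in a position to apply the following inequality due to Freedman [13]; we note that this result was originally stated for martingales yet its proof, essentially unmodified, extends also to supermartingales.

THEOREM 4.3 ([13], Theorem 1.6). Let $(S_i)$ be a supermartingale with respect to a filter $(\mathcal{F}_i)$. Suppose $S_i - S_{i-1} \le L$ for all $i$, and write $V_t = \sum_{i=1}^{t} \mathbb{E}[(S_i - S_{i-1})^2 \mid \mathcal{F}_{i-1}]$. Then for any $s, v > 0$,

$$\mathbb{P}\bigl(\{S_t \ge S_0 + s,\ V_t \le v\} \text{ for some } t\bigr) \le \exp\Bigl(-\tfrac12 s^2/(v + Ls)\Bigr).$$

By the above theorem and a standard application of optional stopping, for any $s > 0$, integer $t$ and stopping time $\tau$ we have

$$\mathbb{P}(Y_{t\wedge\tau} \ge Y_0 + s) \le \exp\Bigl(-\tfrac12 s^2/(M't + Ls)\Bigr).$$

In particular, letting $t_0 = 500M\log n$ and plugging $s = \log^{3/4} n$ and $\tau = \tau_\Delta$ in the last inequality we deduce that

$$\mathbb{P}\bigl(Y_{t_0\wedge\tau_\Delta} \ge Y_0 + \log^{3/4} n\bigr) \le \exp\Bigl(-\bigl(\tfrac12 - o(1)\bigr)\log^{1/12} n\Bigr) = o(1).$$

Hence, recalling the value of $Y_0$ from (4.1) we have w.h.p.

$$Y_{t_0\wedge\tau_\Delta} \le 1 + M\log n + \log^{3/4} n \le 2M\log n, \tag{4.7}$$

where the last inequality holds for sufficiently large $n$.

In order to compare $t_0$ and $\tau_\Delta$, recall from (4.3) that $\mathbb{E}[\Delta_{t+1} \mid \mathcal{F}_t] \le 5$, whereas we established in (4.6) that $\mathbb{E}[\Delta_{t+1}^2 \mid \mathcal{F}_t] < 60$. By Chebyshev's inequality,

$$\mathbb{P}\bigl(\Delta_{t+1} \ge \log^{2/3} n \mid \mathcal{F}_t\bigr) = O\bigl(\mathbb{E}[\Delta_{t+1}^2 \mid \mathcal{F}_t]\,\log^{-4/3} n\bigr) = O(\log^{-4/3} n).$$

In particular, a union bound implies that $\mathbb{P}(\tau_\Delta \le t_0) = O(\log^{-1/3} n)$.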
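The two exponent computations used above can be unpacked as follows. This is a sketch of the arithmetic only, writing $M'$ for the constant of Lemma 4.2 that bounds $\mathbb{E}[(Y_{t+1}-Y_t)^2 \mid \mathcal{F}_t]$ and $\tau_\Delta$ for the first time $t$ at which $\Delta_{t+1} > \log^{2/3} n$; both labels are chosen here for readability.

```latex
\begin{align*}
s^2 &= \log^{3/2} n, \qquad
Ls = \log^{2/3} n \cdot \log^{3/4} n = \log^{17/12} n, \qquad
M' t_0 = 500\,M M' \log n = O(\log n), \\
\frac{s^2}{2\,(M' t_0 + Ls)}
  &= \frac{\log^{3/2} n}{2\,(1 + o(1))\,\log^{17/12} n}
   = \bigl(\tfrac12 - o(1)\bigr)\log^{1/12} n, \\
\mathbb{P}(\tau_\Delta \le t_0)
  &\le \sum_{t \le t_0} \mathbb{P}\bigl(\Delta_{t+1} \ge \log^{2/3} n\bigr)
   = O(\log n) \cdot O(\log^{-4/3} n)
   = O(\log^{-1/3} n).
\end{align*}
```

The choice $s = \log^{3/4} n$ is what makes both sides work: $s$ is small compared with $M \log n$, so it is absorbed in (4.7), while $s^2$ still dominates $Ls$ by a factor of $\log^{1/12} n$, which tends to infinity.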

Revisiting (4.7) this immediately implies that w.h.p. $Y_{t_0} \le 2M\log n$, and since $Y_{t_0\wedge\tau_\Delta\wedge\tau_c} \ge Z_{t_0\wedge\tau_\Delta\wedge\tau_c}$ [due to (4.2)], we further have that w.h.p.

$$Y_{t_0\wedge\tau_c} \ge Z_{t_0\wedge\tau_c} \ge \frac{t_0\wedge\tau_c}{200}.$$

Therefore, we must have $\tau_c < t_0$ w.h.p., otherwise the last two inequalities would contradict our choice of $t_0 = 500M\log n$. The proof is complete.

5. Super-logarithmic lower bound for the uniform process.

In this section we use the analytic approximation framework introduced by Schramm to prove the super-logarithmic lower bound stated in Theorem 1.1 for the coalescence time of the uniform process. Recall that a key element in this framework is the normalized Laplace transform of the cluster-size distribution, namely, $G_t(s) = (1/\kappa_t)F_t(\kappa_t s)$, where $F_t(s) = \sum_{i=1}^{\kappa_t} e^{-w_i s}$ (see Definition 1.2). The following proposition, whose proof entails most of the technical difficulties in our analysis of the uniform process, demonstrates the effect of $G_t(\frac12)$ and $G_t(1)$ on the coalescence rate.

PROPOSITION 5.1. Let $\varepsilon_t = 1 - G_t(\frac12)$ and $\zeta_t = G_t(1)$. There exists an absolute constant $C > 0$ such that, conditioned on $\mathcal{F}_t$, with probability at least $1 - C\kappa_t^{-100}$, we have

$$\kappa_{t+1} \ge (1 - \varepsilon_t/2)\kappa_t - \kappa_t^{2/3}, \tag{5.1}$$

$$\zeta_{t+1} \ge \zeta_t + \varepsilon_t^{13/\varepsilon_t} - 8\kappa_t^{-1/3}. \tag{5.2}$$

We postpone the proof of this proposition to Section 5.4 in favor of showing how the relations that it establishes between $\kappa_t$, $G_t(\frac12)$, $G_t(1)$ can be used to derive the desired lower bound on $\tau_c$. We claim that as long as $\kappa_t$, $G_t(\frac12)$, $G_t(1)$ satisfy equations (5.1), (5.2) and $t = O\bigl(\log n \cdot \frac{\log\log n}{\log\log\log n}\bigr)$, then $\kappa_t \ge n^{3/4}$; this deterministic statement is given by the following lemma.

LEMMA 5.2. Set $T = \frac{1}{75}\log n \cdot \frac{\log\log n}{\log\log\log n}$ for a sufficiently large $n$ and let $\kappa_0,\ldots,\kappa_T$ be a sequence of integers in $\{1,\ldots,n\}$ with $\kappa_0 = n$. Further, let $\varepsilon_t$ and $\zeta_t$ for $t = 0,\ldots,T$ be two sequences of reals in $[0,1]$ and suppose that for all $t < T$ the three sequences satisfy inequalities (5.1) and (5.2). Then $\kappa_t > n^{3/4}$ for all $t \le T$.
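To make the quantities $\varepsilon_t$ and $\zeta_t$ concrete, the following Python sketch evaluates the normalized Laplace transform for a given list of normalized cluster sizes. It is illustrative only: it assumes the definition $G(s) = \frac{1}{\kappa}\sum_i e^{-w_i\kappa s}$ recalled above, and the names `normalized_laplace` and `epsilon_zeta` are not from the paper.

```python
import math


def normalized_laplace(weights, s):
    """G(s) = (1/kappa) * sum_i exp(-w_i * kappa * s) for normalized sizes w_i (summing to 1)."""
    kappa = len(weights)
    return sum(math.exp(-w * kappa * s) for w in weights) / kappa


def epsilon_zeta(weights):
    """Return (epsilon, zeta) = (1 - G(1/2), G(1))."""
    eps = 1.0 - normalized_laplace(weights, 0.5)
    zeta = normalized_laplace(weights, 1.0)
    return eps, zeta


# Ten equal clusters versus one dominant cluster plus nine small ones.
eps_uniform, zeta_uniform = epsilon_zeta([0.1] * 10)
eps_skewed, zeta_skewed = epsilon_zeta([0.91] + [0.01] * 9)
print(eps_uniform, zeta_uniform, eps_skewed, zeta_skewed)
```

With equal clusters, $G(s) = e^{-s}$, so $\varepsilon = 1 - e^{-1/2}$ and $\zeta = e^{-1}$. When one oversized cluster holds most of the mass, $\varepsilon$ is smaller and $\zeta$ larger, which by (5.1) corresponds to a slower guaranteed decrease in the number of clusters.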
Observe that the desired lower bound on the coalescence time of the uniform process $\mathcal{U}$ is an immediate corollary of Proposition 5.1 and Lemma 5.2. Indeed, condition on the first $t$ rounds where $0 \le t < T = \frac{1}{75}\log n \cdot \frac{\log\log n}{\log\log\log n}$ and assume
