A Data adaptive and Dynamic Segmentation Index for Whole Matching on Time Series

Yang Wang, Peng Wang, Jian Pei, Wei Wang, Sheng Huang
School of Computer Science, Fudan University, Shanghai, China
School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
Information Management Team, IBM Research China, Shanghai, China
{844, pengwang5,

ABSTRACT

Similarity search on time series is an essential operation in many applications. In the state-of-the-art methods, such as the R-tree based methods, iSAX and iSAX 2.0, time series are by default divided into equi-length segments globally, that is, all time series are segmented in the same way. Those methods then focus on how to approximate or symbolize the segments and construct indexes. In this paper, we make an important observation: global segmentation of all time series may incur unnecessary cost in space and time for indexing time series. We develop DSTree, a data adaptive and dynamic segmentation index on time series. In addition to savings in space and time, our new index can provide tight upper and lower bounds on distances between time series. An extensive empirical study shows that our new index supports time series similarity search effectively and efficiently.

We sincerely thank Dr. Themis Palpanas for sending us the iSAX 2.0 code. We are deeply grateful to the anonymous reviewers for their insightful and constructive comments and suggestions that help to improve the quality of this paper. We tried our best to accommodate their suggestions in this camera-ready version. The work is supported in part by NSFC under grants 639, 676, 633, IBM-Fudan Joint Study program JSA6, an NSERC Discovery Grant and a BCFRST NRAS Endowment Research Team Program project. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were invited to present their results at The 39th International Conference on Very Large Data Bases, August 26th-30th 2013, Riva del Garda, Trento, Italy. Proceedings of the VLDB Endowment, Vol. 6, No. 10. Copyright 2013 VLDB Endowment 2150-8097/13/10... $10.00.

1. INTRODUCTION

Similarity search on time series is essential in many applications []. Given a set $TS$ of time series, a query time series $Q$, and a distance threshold $\epsilon$, a similarity search retrieves the time series $S \in TS$ such that $D(Q, S) \le \epsilon$, where $D(\cdot, \cdot)$ is a distance function. When the Euclidean distance is used and the time series in question are assumed to be of the same length, the problem is called whole matching [], which has been popularly used in various applications. The problem is challenging in practice, since the set of time series $TS$ to be searched may contain many time series and each time series may be long.

To tackle the whole matching problem, many index structures have been proposed [, 4, 5,, 6, 3], which will be briefly reviewed in Section 6. All those indexes are based on two fundamental principles.

Figure 1: Dynamic segmentation of time series (series S1, S2, S3, S4).

Principle 1: Dimensionality Reduction by Global Segmentation

A time series can be regarded as a point in a multidimensional space, one dimension representing a time instant. A fundamental challenge, however, is that time series are often long. A time series often contains readings at hundreds or even thousands of instants. It is highly ineffective to directly index time series using spatial indexes, such as an R-tree [7].
To tackle this problem, many existing methods apply dimensionality reduction techniques, such as Singular Value Decomposition (SVD) [8], Discrete Fourier Transform (DFT) [], Discrete Wavelet Transform (DWT) [4], Piecewise Linear Approximation (PLA) [4], Piecewise Aggregate Approximation (PAA) [], Adaptive Piecewise Constant Approximation (APCA) [] and Chebyshev Polynomials (CP) []. After dimensionality reduction, a multidimensional index, such as an R-tree [7], can be used as an index in the lower dimensional space.

Accordingly, in the state-of-the-art time series indexing methods, such as the R-tree based methods, iSAX [5], and iSAX 2.0 [6], all time series to be indexed are segmented in the same way. Thus, they are global segmentation approaches. Those methods focus on how to approximate or symbolize segments and construct indexes. The segmentation of time series is not closely integrated with index building. Does such a global segmentation method provide the best benefit to time series indexing?

EXAMPLE 1 (SEGMENTATION). Consider the 4 time series in Figure 1. Each time series has 8 time instants. To reduce the dimensionality, we can segment each time series into 4 segments, each segment consisting of 2 instants. If we notice that time series S1 and S2 have (relatively) stable values on the first 4 instants, we can segment S1 and S2 into 3 segments: the first segments consist of the first 4 instants, the second segments consist of the 5th and the 6th instants, and the last segments consist of the 7th and the 8th instants, as indicated by the dotted lines in the figure. In contrast, time series S3 and S4 have (relatively) stable values both on the first 4 instants and on the last 4 ones. Accordingly, we can segment them such that the first segments cover the first 4 instants and the second segments cover the last 4 instants. By dynamic segmentations adaptive to data, we can reduce dimensionality further, in this example, from 4 to 3 for S1 and S2, and to 2 for S3 and S4. At the same time, we can retain good approximation quality.

Example 1, though simple, clearly shows that local segmentation enables substantial opportunities for more effective indexes. If we can segment time series in an adaptive way, we may be able to achieve better dimensionality reduction and thus save more space and query answering time. Now, the challenge is how we can dynamically segment time series in a data adaptive manner, and retain good quality.

Principle 2: Using Lower Bounds in Search

Dimensionality reduction almost unavoidably comes with errors in data representation. An essential requirement in similarity search, however, is no false dismissals. The lower bounding property (also known as the contractive property) is an important desirable property for the dimensionality reduction representation methods of time series. A dimensionality reduction method is said to hold the lower bounding property if the method comes with a distance lower bound function $D_{LB}(\tilde{S}, \tilde{S}') \le D(S, S')$ for any time series $S$ and $S'$, where $\tilde{S}$ and $\tilde{S}'$ are the approximation representations of $S$ and $S'$, respectively, in the method. A method with the lower bounding property guarantees no false negatives in search. That is, when a time series $S$ is pruned using the lower bound function because $D_{LB}(\tilde{Q}, \tilde{S}) > \epsilon$, since $D(Q, S) \ge D_{LB}(\tilde{Q}, \tilde{S}) > \epsilon$, $S$ is definitely not an answer to the similarity search.

While lower bounding is well explored in the literature, to the best of our knowledge, no existing methods consider using upper bounds in similarity search systematically. If a method comes with a distance upper bound function $D_{UB}(\tilde{S}, \tilde{S}') \ge D(S, S')$ for any time series $S$ and $S'$, where $\tilde{S}$ and $\tilde{S}'$ are the approximation representations of $S$ and $S'$, respectively, in the method, then once $D_{UB}(\tilde{Q}, \tilde{S}) < \epsilon$, we immediately know that $S$ is an answer to the similarity search without computing the exact distance $D(Q, S)$.
Although some previous methods propose upper bounds for time series similarity computation [7], they only consider how to define and compute the upper bound of the distance between two time series. How to utilize the upper bound in the index for a large number of time series is far from trivial and has not been solved. Moreover, upper bounding can be used to answer interesting queries beyond similarity search. For example, consider the query "what is the distribution of the distance between Q and the time series in the database?" With both lower bounding and upper bounding, we may be able to give a bounded histogram quickly as the answer to the question based on the index only, without accessing the original data. An application example of the histogram obtained as such is to help to set a meaningful threshold in similarity search. Now, the challenge is how to develop an effective upper bounding mechanism in indexes for efficient similarity search.

In this paper, we study the whole matching problem [] where the Euclidean distance is used and time series have the same length. It is a fundamental time series processing problem tackled by numerous previous studies [, 4, 5,, 6, 3]. Please note that our work can be easily extended to subsequence matching [6] where query time series are allowed to have different lengths. We explore data-adaptive dynamic segmentation and upper bounding in time series indexes. We propose a new representation of time series that is an extension of the renowned Adaptive Piecewise Constant Approximation (APCA). It not only offers better representation accuracy, but also supports upper bound estimation, which greatly enriches the functionality of the index.
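The way lower and upper bounds are used under Principle 2 can be sketched as a filter in front of the exact distance computation. The block below is a minimal illustration only, not the index developed in this paper: `range_search`, `lower_bound` and `upper_bound` are hypothetical names, and the two bound functions are assumed to be supplied by the caller and to satisfy $D_{LB} \le D \le D_{UB}$.

```python
import math
from typing import Callable, List, Sequence, Tuple

Series = Sequence[float]
Bound = Callable[[Series, Series], float]


def euclidean(a: Series, b: Series) -> float:
    """Exact Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_search(query: Series, candidates: List[Series], eps: float,
                 lower_bound: Bound, upper_bound: Bound) -> Tuple[List[Series], int]:
    """Return the candidates within eps of the query and the number of
    exact distance computations that the bounds could not avoid."""
    answers: List[Series] = []
    exact_computations = 0
    for s in candidates:
        if lower_bound(query, s) > eps:
            continue                      # D >= D_LB > eps: safe to prune
        if upper_bound(query, s) < eps:
            answers.append(s)             # D <= D_UB < eps: accept without D
            continue
        exact_computations += 1           # bounds are inconclusive
        if euclidean(query, s) <= eps:
            answers.append(s)
    return answers, exact_computations
```

Any pair of valid bounds can be plugged in; the tighter they are, the fewer exact distance computations remain.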
Table 1: Some frequently used symbols.

Symbol                              Meaning
$S$, $S'$                           time series
$|S|$                               the length of time series $S$
$D(S, S')$                          the (Euclidean) distance between time series $S$ and $S'$
$D_{LB}(N, Q)$, $D_{UB}(N, Q)$      the lower and upper bound of distance between time series $Q$ and the time series in node $N$
$S = (S_1, \ldots, S_m)$            a time series is divided into $m$ segments
$r_j$                               the right end time instant of segment $j$
$\mu_j^S$                           the mean of segment $j$ in time series $S$
$\sigma_j^S$                        the standard deviation of segment $j$ in time series $S$
$SG$                                a segmentation of a time series
$C$                                 the number of time series indexed in the subtree rooted at a node
$Z$                                 the synopsis at a node
$\psi$                              leaf capacity of a DSTree
$N$                                 a node in a DSTree
$TS_N$                              the time series assigned to node $N$
$LB^\mu$, $LB^\sigma$, $UB^\mu$, $UB^\sigma$   the lower and upper bounds using mean or standard deviation, see Equations 11, 12, 13, and 14 for detail
H-split$_M$                         H-split using mean value
H-split$_{SD}$                      H-split using standard deviation
V-split$_L$                         V-split using the left subsegment
V-split$_R$                         V-split using the right subsegment

We develop DSTree, a data adaptive and dynamic segmentation index on time series. In addition to savings in space and time, our new index can provide tight upper and lower bounds on distances between time series. An extensive empirical study shows that our new index supports time series similarity search effectively and efficiently.

The rest of the paper is organized as follows. Section 2 introduces the new representation of time series. Section 3 develops the new index and the construction method. Section 4 applies the new index in similarity search. Section 5 reports the experiment results. Section 6 reviews the related work. Section 7 concludes the paper. Table 1 summarizes the symbols frequently used in this paper.

2. EXTENDING APCA REPRESENTATION

In this section, we extend the well-known Adaptive Piecewise Constant Approximation (APCA) of time series data. Our extension, called EAPCA, will be used to represent time series in our index. We also derive upper and lower bounds of distances among time series using the extended approximation.

2.1 APCA

A time series $S = (s_1, \ldots, s_n)$ is a sequence of values.
Without loss of generality, in this paper we assume that every time series has a value at every time instant $t = 1, 2, \ldots, n$. We denote by $|S| = n$ the length of the time series $S$, and by $S[i] = s_i$ ($1 \le i \le |S|$) the value of $S$ at time instant $t = i$. Given two time series $S$ and $S'$ such that $|S| = |S'|$, the (Euclidean) distance between $S$ and $S'$ is

$$D(S, S') = \sqrt{\sum_{i=1}^{|S|} (S[i] - S'[i])^2}.$$

The Euclidean distance is popularly used in time series analysis. Moreover, there is strong evidence that the Euclidean distance is superior in accuracy compared to other similarity measures [3, 8, ]. In the rest of the paper, we assume the Euclidean distance, and, when the distance between two time series is concerned, the time series have the same length.

In many applications, it is highly desirable to estimate the distance between two time series quickly. There are existing methods providing lower bounds by segmenting time series. Here, we review a popularly used method, APCA.

APCA divides a time series $S = (s_1, \ldots, s_n)$ into several disjoint segments, $S = (S_1, \ldots, S_m)$ ($m \le n$), where $S_j = (s_{r_{j-1}+1}, \ldots, s_{r_j})$ ($1 \le j \le m$, $r_0 = 0$, $r_1 < \cdots < r_m = n$). APCA approximates each segment $S_j$ by a pair $(\mu_j, r_j)$, where

$$\mu_j = \frac{\sum_{k=r_{j-1}+1}^{r_j} s_k}{r_j - r_{j-1}}$$

is the mean value of the segment. That is, $S$ can be approximated as $\tilde{S} = ((\mu_1, r_1), \ldots, (\mu_m, r_m))$.

For two time series $X$ and $Y$ such that $|X| = |Y|$, let $\tilde{X} = ((\mu_1^X, r_1), \ldots, (\mu_m^X, r_m))$ and $\tilde{Y} = ((\mu_1^Y, r_1), \ldots, (\mu_m^Y, r_m))$ be the APCA representations of $X$ and $Y$, respectively. The segmentations of the two time series are said to be aligned in $\tilde{X}$ and $\tilde{Y}$, since $\tilde{X}$ and $\tilde{Y}$ use the same $r_1, \ldots, r_m$ and $r_0 = 0$. Using the mean values, APCA can give a lower bound of the distance between two time series. Apparently, we have the following.

LEMMA 1 (APCA LOWER BOUND). Given two time series $X$ and $Y$ such that $|X| = |Y|$, let $\tilde{X} = ((\mu_1^X, r_1), \ldots, (\mu_m^X, r_m))$ and $\tilde{Y} = ((\mu_1^Y, r_1), \ldots, (\mu_m^Y, r_m))$ be two aligned APCA representations of $X$ and $Y$, respectively. Then,

$$D(X, Y) \ge \sqrt{\sum_{i=1}^{m} (r_i - r_{i-1})(\mu_i^X - \mu_i^Y)^2} \tag{1}$$

Equipped with only mean values, APCA cannot provide any upper bound on the distance between time series. Next, we show that by also using standard deviations we can derive an upper bound and a tighter lower bound on distances.

2.2 EAPCA and Upper/Lower Bounds Using Standard Deviations

We can extend APCA by including the standard deviation for every segment. Concretely, for a time series $S = (s_1, \ldots, s_n)$ and an APCA representation $\tilde{S} = ((\mu_1, r_1), \ldots, (\mu_m, r_m))$, we extend the approximation to the extended APCA representation (EAPCA for short), denoted by $\tilde{S} = ((\mu_1, \sigma_1, r_1), \ldots, (\mu_m, \sigma_m, r_m))$, where

$$\sigma_i = \sqrt{\frac{\sum_{j=r_{i-1}+1}^{r_i} s_j^2}{r_i - r_{i-1}} - \left(\frac{\sum_{j=r_{i-1}+1}^{r_i} s_j}{r_i - r_{i-1}}\right)^2}$$

is the standard deviation of the $i$-th segment ($1 \le i \le m$). We have the following results.

THEOREM 1 (BOUNDS). Given two time series $X$ and $Y$ such that $|X| = |Y|$, let $\tilde{X} = ((\mu_1^X, \sigma_1^X, r_1), \ldots, (\mu_m^X, \sigma_m^X, r_m))$ and $\tilde{Y} = ((\mu_1^Y, \sigma_1^Y, r_1), \ldots, (\mu_m^Y, \sigma_m^Y, r_m))$ be two aligned EAPCA representations of $X$ and $Y$, respectively.
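Then,

$$D(X, Y) \ge \sqrt{\sum_{i=1}^{m} (r_i - r_{i-1})\left[(\mu_i^X - \mu_i^Y)^2 + (\sigma_i^X - \sigma_i^Y)^2\right]} \tag{2}$$

and

$$D(X, Y) \le \sqrt{\sum_{i=1}^{m} (r_i - r_{i-1})\left[(\mu_i^X - \mu_i^Y)^2 + (\sigma_i^X + \sigma_i^Y)^2\right]} \tag{3}$$

The lower and upper bounds in Equations 2 and 3 are realizable.

PROOF.

$$D(X, Y)^2 = \sum_{i=1}^{m} \sum_{j=r_{i-1}+1}^{r_i} (x_j - y_j)^2 = \sum_{i=1}^{m} (r_i - r_{i-1}) \frac{\sum_{j=r_{i-1}+1}^{r_i} (x_j - y_j)^2}{r_i - r_{i-1}} = \sum_{i=1}^{m} (r_i - r_{i-1})\left[(\mu_i^X - \mu_i^Y)^2 + (\sigma_i^{X-Y})^2\right] \tag{4}$$

where $\mu_i^{X-Y} = \frac{\sum_{j=r_{i-1}+1}^{r_i} (x_j - y_j)}{r_i - r_{i-1}} = \mu_i^X - \mu_i^Y$ and

$$\sigma_i^{X-Y} = \sqrt{\frac{\sum_{j=r_{i-1}+1}^{r_i} (x_j - y_j)^2}{r_i - r_{i-1}} - \left(\frac{\sum_{j=r_{i-1}+1}^{r_i} (x_j - y_j)}{r_i - r_{i-1}}\right)^2}.$$

Due to the definition of standard deviation, we have

$$(\sigma_i^{X-Y})^2 = (\sigma_i^X)^2 + (\sigma_i^Y)^2 - 2\,Cov(X_i, Y_i) = (\sigma_i^X)^2 + (\sigma_i^Y)^2 - 2\rho(X_i, Y_i)\,\sigma_i^X \sigma_i^Y \tag{5}$$

where

$$Cov(X_i, Y_i) = \frac{\sum_{j=r_{i-1}+1}^{r_i} (x_j - \mu_i^X)(y_j - \mu_i^Y)}{r_i - r_{i-1}} \tag{6}$$

is the covariance between $X_i$ and $Y_i$, and

$$\rho(X_i, Y_i) = \frac{\sum_{j=r_{i-1}+1}^{r_i} (x_j - \mu_i^X)(y_j - \mu_i^Y)}{\sqrt{\sum_{j=r_{i-1}+1}^{r_i} (x_j - \mu_i^X)^2}\,\sqrt{\sum_{j=r_{i-1}+1}^{r_i} (y_j - \mu_i^Y)^2}} \tag{7}$$

is the correlation coefficient between segments $X_i$ and $Y_i$. Since $-1 \le \rho(X_i, Y_i) \le 1$, we have

$$(\sigma_i^X - \sigma_i^Y)^2 \le (\sigma_i^{X-Y})^2 \le (\sigma_i^X + \sigma_i^Y)^2 \tag{8}$$

Combining Equations 8 and 4, we have Equations 2 and 3.

Comparing Equations 1 and 2, the lower bound given by EAPCA uses the standard deviations to achieve a tighter bound. The bounds are realizable.

2.3 Bounding Distances to a Set of Time Series

Often, we need to estimate the distance between a time series and a set of time series. We can infer the lower and upper bounds of the distance based on Equations 2 and 3.

For a time series $X$ and a set of time series $Y_1, \ldots, Y_l$ ($|X| = |Y_1| = \cdots = |Y_l|$), let $\tilde{X} = ((\mu_1^X, \sigma_1^X, r_1), \ldots, (\mu_m^X, \sigma_m^X, r_m))$, $\tilde{Y}_1 = ((\mu_1^{Y_1}, \sigma_1^{Y_1}, r_1), \ldots, (\mu_m^{Y_1}, \sigma_m^{Y_1}, r_m))$, ..., $\tilde{Y}_l = ((\mu_1^{Y_l}, \sigma_1^{Y_l}, r_1), \ldots, (\mu_m^{Y_l}, \sigma_m^{Y_l}, r_m))$ be aligned EAPCA representations, respectively. Let the minimal and maximal mean values in the $i$-th segments of $Y_1, \ldots, Y_l$, respectively, be $\mu_i^{min} = \min_{1 \le j \le l}\{\mu_i^{Y_j}\}$ and $\mu_i^{max} = \max_{1 \le j \le l}\{\mu_i^{Y_j}\}$. Moreover, let the minimal and maximal standard deviation values in the $i$-th segments of $Y_1, \ldots, Y_l$, respectively, be $\sigma_i^{min} = \min_{1 \le j \le l}\{\sigma_i^{Y_j}\}$ and $\sigma_i^{max} = \max_{1 \le j \le l}\{\sigma_i^{Y_j}\}$. We have the following result.

THEOREM 2 (BOUNDS ON SET). For aligned EAPCA representations $\tilde{X}$ of a time series $X$ and $\tilde{Y}_1, \ldots, \tilde{Y}_l$ of a set of time series $Y_1, \ldots, Y_l$,

$$\min_{1 \le j \le l}\{D(X, Y_j)\} \ge \sqrt{\sum_{i=1}^{m} (r_i - r_{i-1})(LB_i^\mu + LB_i^\sigma)} \tag{9}$$

and

$$\max_{1 \le j \le l}\{D(X, Y_j)\} \le \sqrt{\sum_{i=1}^{m} (r_i - r_{i-1})(UB_i^\mu + UB_i^\sigma)} \tag{10}$$

where

$$LB_i^\mu = \begin{cases} (\mu_i^{min} - \mu_i^X)^2 & \text{if } \mu_i^X \le \mu_i^{min}; \\ 0 & \text{if } \mu_i^{min} < \mu_i^X \le \mu_i^{max}; \\ (\mu_i^{max} - \mu_i^X)^2 & \text{if } \mu_i^{max} < \mu_i^X; \end{cases} \tag{11}$$

$$LB_i^\sigma = \begin{cases} (\sigma_i^{min} - \sigma_i^X)^2 & \text{if } \sigma_i^X \le \sigma_i^{min}; \\ 0 & \text{if } \sigma_i^{min} < \sigma_i^X \le \sigma_i^{max}; \\ (\sigma_i^{max} - \sigma_i^X)^2 & \text{if } \sigma_i^{max} < \sigma_i^X; \end{cases} \tag{12}$$

$$UB_i^\mu = \begin{cases} (\mu_i^{max} - \mu_i^X)^2 & \text{if } \mu_i^X \le \frac{\mu_i^{min} + \mu_i^{max}}{2}; \\ (\mu_i^{min} - \mu_i^X)^2 & \text{if } \frac{\mu_i^{min} + \mu_i^{max}}{2} < \mu_i^X; \end{cases} \tag{13}$$

$$UB_i^\sigma = (\sigma_i^{max} + \sigma_i^X)^2 \tag{14}$$

PROOF. (Lower bound) From Equation 2, it is easy to see that for the $i$-th segment, the component $(\mu_i^X - \mu_i^Y)^2 + (\sigma_i^X - \sigma_i^Y)^2$ can be decomposed into two items: $(\mu_i^X - \mu_i^Y)^2$, which is only related to the mean value, and $(\sigma_i^X - \sigma_i^Y)^2$, which is related to the standard deviation. Since both of them are non-negative, we can obtain a lower bound from the lower bounds of these two items.

We compare $\mu_i^X$ and the range of mean values of the $Y_j$'s, $[\mu_i^{min}, \mu_i^{max}]$, and have 3 cases as follows.

Case 1: If $\mu_i^X$ is smaller than the minimal mean value $\mu_i^{min}$ of the $Y_j$'s, it is obvious that for any $Y_j$, $(\mu_i^X - \mu_i^{Y_j})^2 \ge (\mu_i^X - \mu_i^{min})^2$. Thus, $(\mu_i^X - \mu_i^{min})^2 \le (\mu_i^X - \mu_i^Y)^2$. We denote $LB_i^\mu = (\mu_i^X - \mu_i^{min})^2$.

Case 2: If $\mu_i^X$ falls within the range, the smallest possible value of $(\mu_i^X - \mu_i^Y)^2$ is 0. We set $LB_i^\mu = 0$.

Case 3: If $\mu_i^X > \mu_i^{max}$, then for any $Y_j$, $(\mu_i^X - \mu_i^{Y_j})^2 \ge (\mu_i^X - \mu_i^{max})^2$. We set $LB_i^\mu = (\mu_i^X - \mu_i^{max})^2$.

We can derive a lower bound $LB_i^\sigma$ similarly. Combining these items, we can obtain the lower bound in the theorem. The upper bound can be proved in a similar way.

Theorem 2 indicates that, for a set of time series $\{Y_1, \ldots, Y_l\}$, to compute the lower and upper bounds between the set and any other time series, we need to maintain only the minimum and maximum mean values and standard deviations of each segment over all $Y_j$'s: $\mu_i^{min}$, $\mu_i^{max}$, $\sigma_i^{min}$ and $\sigma_i^{max}$ ($1 \le i \le m$).

3. THE DSTREE INDEX

In this section, we develop our new dynamic splitting tree index (DSTree for short) on time series.

3.1 DSTree

One critical feature in our new DSTree is the segmentation information. In general, for a time series $S = (s_1, \ldots, s_n)$, a segmentation of $S$ divides $S$ into $m$ exclusive segments $S = (S_1, \ldots, S_m)$, where $S_1 = (s_1, \ldots, s_{r_1})$ and $S_i = (s_{r_{i-1}+1}, \ldots, s_{r_i})$ ($i > 1$, $r_m = n$).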
Apparently, to record a segmentation, we only need to record $m$, the number of segments, and $(r_1, \ldots, r_m)$, the right endpoints of the segments, where $r_1 < \cdots < r_m = n$.

Given a time series $S$, let $SG = (r_1, \ldots, r_m)$ and $SG' = (r'_1, \ldots, r'_{m'})$ be two segmentations. $SG'$ is called a one-segment refinement of $SG$ if $m' = m + 1$ and there exists a number $i$ ($0 \le i < m$) such that $r'_j = r_j$ for all $j \le i$, and $r'_{j+1} = r_j$ for all $j > i$.

Figure 2: DSTree.

EXAMPLE 2. Consider a time series $S$ of length $n$, and two segmentations $SG = (3, 8, n)$ and $SG' = (3, 5, 8, n)$. $SG$ divides $S$ into 3 segments, (1, 3), (4, 8), (9, $n$). $SG'$ divides $S$ into 4 segments, (1, 3), (4, 5), (6, 8), (9, $n$). $SG'$ is a one-segment refinement of $SG$ since it further divides the second segment in $SG$ into two smaller segments.

We call a segmentation $SG'$ a refinement of segmentation $SG$ if there exists a series of segmentations $SG_1, \ldots, SG_l$ ($l > 1$) such that $SG = SG_1$, $SG' = SG_l$, and $SG_{i+1}$ is a one-segment refinement of $SG_i$ for $1 \le i < l$.

As illustrated in Figure 2, a DSTree organizes the time series to be indexed into a hierarchy. There are two types of nodes: internal nodes and leaf nodes. Each node contains the following information.

1. The number $C$ of time series indexed in the subtree rooted at this node.
2. The segmentation $SG = (r_1, \ldots, r_m)$ of the time series indexed at this node, where $r_1 < \cdots < r_m = n$, and $r_i$ ($1 \le i \le m$) is the right endpoint of the $i$-th segment.
3. A synopsis $Z = (z_1, z_2, \ldots, z_m)$, where $z_i = (\mu_i^{min}, \mu_i^{max}, \sigma_i^{min}, \sigma_i^{max})$. The synopsis is used to compute the upper and lower bounds.
4. A leaf node links to a disk file that stores up to $\psi$ time series represented by the synopsis of this leaf node, where $\psi$ is the leaf capacity of the DSTree. An internal node has two pointers pointing to children nodes.
5. An internal node stores a splitting strategy $SP$, which will be discussed in detail in Section 3.3.

In Figure 2, a circle represents an internal node, and a rectangle represents a leaf node, where up to $\psi$ time series are stored. In the figure, the segmentation and the number of segments $m$ are shown for each node, too.
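As a rough illustration of how the per-node synopsis supports the set bounds of Theorem 2, the sketch below computes EAPCA values for a given segmentation, builds a synopsis $Z$ from a group of series, and evaluates the lower and upper bounds against a query. It is a minimal sketch under stated assumptions: the names `eapca`, `Synopsis`, `build_synopsis` and `node_bounds` are hypothetical, series are kept in memory, and a segmentation is a tuple of right endpoints $(r_1, \ldots, r_m)$ with $r_m$ equal to the series length.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def eapca(series: Sequence[float], seg: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (mean, standard deviation) for every segment of `series`."""
    out, left = [], 0
    for r in seg:
        part = series[left:r]
        n = len(part)
        mu = sum(part) / n
        var = sum(v * v for v in part) / n - mu * mu
        out.append((mu, math.sqrt(max(var, 0.0))))  # guard tiny negative rounding
        left = r
    return out


@dataclass
class Synopsis:
    """One entry z_i of a node synopsis Z."""
    mu_min: float
    mu_max: float
    sigma_min: float
    sigma_max: float


def build_synopsis(group: List[Sequence[float]], seg: Sequence[int]) -> List[Synopsis]:
    reps = [eapca(s, seg) for s in group]
    return [Synopsis(min(r[i][0] for r in reps), max(r[i][0] for r in reps),
                     min(r[i][1] for r in reps), max(r[i][1] for r in reps))
            for i in range(len(seg))]


def _lb(lo: float, hi: float, v: float) -> float:
    """Squared gap between v and the interval [lo, hi] (Equations 11 and 12)."""
    if v <= lo:
        return (lo - v) ** 2
    if v <= hi:
        return 0.0
    return (hi - v) ** 2


def node_bounds(query: Sequence[float], seg: Sequence[int],
                synopsis: List[Synopsis]) -> Tuple[float, float]:
    """Lower and upper bounds of Theorem 2 (Equations 9 and 10)."""
    lb2 = ub2 = 0.0
    left = 0
    for (mu_q, sd_q), z, r in zip(eapca(query, seg), synopsis, seg):
        width = r - left
        mid = (z.mu_min + z.mu_max) / 2
        ub_mu = (z.mu_max - mu_q) ** 2 if mu_q <= mid else (z.mu_min - mu_q) ** 2
        ub_sd = (z.sigma_max + sd_q) ** 2
        lb2 += width * (_lb(z.mu_min, z.mu_max, mu_q) + _lb(z.sigma_min, z.sigma_max, sd_q))
        ub2 += width * (ub_mu + ub_sd)
        left = r
    return math.sqrt(lb2), math.sqrt(ub2)
```

With a group containing a single series, the synopsis collapses to that series' own EAPCA values and the two bounds reduce to Equations 2 and 3.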
In a DSTree, for an internal node $N$ and its segmentation $SG_N$, the segmentation $SG_{N'}$ in any descendant node $N'$ of $N$ is either the same as $SG_N$ or a refinement of $SG_N$. Consequently, different nodes in a DSTree may have different segmentations. Different segmentations may divide time series into different numbers of segments, such as the segmentations in nodes $N_4$ and $N_5$ in Figure 2. Even if two segmentations have the same number of segments, they still can be different. For example, in Figure 2, the segmentations in nodes $N_4$ and $N_7$ both have 3 segments, but the segmentations are still different.
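The refinement relation between segmentations can be checked directly on the tuples of right endpoints. The following is a small sketch, assuming segmentations are given as strictly increasing tuples that end at the series length; the function names are hypothetical, and the length 12 used in the usage note is an arbitrary illustrative value.

```python
from typing import Sequence


def _is_valid(sg: Sequence[int]) -> bool:
    """Right endpoints must be positive and strictly increasing."""
    return len(sg) > 0 and sg[0] > 0 and all(a < b for a, b in zip(sg, sg[1:]))


def is_refinement(sg: Sequence[int], sg_fine: Sequence[int]) -> bool:
    """True if sg_fine keeps every endpoint of sg and adds at least one more."""
    return (_is_valid(sg) and _is_valid(sg_fine)
            and sg[-1] == sg_fine[-1]          # both cover the same length n
            and set(sg) < set(sg_fine))        # proper superset of endpoints


def is_one_segment_refinement(sg: Sequence[int], sg_fine: Sequence[int]) -> bool:
    """True if sg_fine splits exactly one segment of sg into two."""
    return len(sg_fine) == len(sg) + 1 and is_refinement(sg, sg_fine)
```

For instance, `is_one_segment_refinement((3, 8, 12), (3, 5, 8, 12))` holds, while `(3, 5, 6, 8, 12)` is a refinement of `(3, 8, 12)` but not a one-segment refinement.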

Algorithm 1 N.Insert(X): N is a node, X is a time series
    update Z in node N according to X;
    if N is a leaf node then
        if C < ψ then                      // N has space to hold X
            append X to the data file pointed to by N; C = C + 1;
        else                               // C == ψ, no space in N to hold X
            append X to the data file pointed to by N; C = C + 1;
            SP = BestSplit();
            create two children nodes for N;
            for each time series Y in N do
                N' = N.routeToChild(Y, SP); N'.Insert(Y);
            end for
        end if
    else
        N' = N.routeToChild(X, SP); N'.Insert(X);
    end if

Figure 3: Horizontal splitting. (a) Splitting using mean. (b) Splitting using standard deviation.

3.2 Construction

Given a set $TS$ of time series, each of length $n$, a DSTree is constructed in two steps as follows.

Step 1: Initialization. We initialize a DSTree that contains only the root node $N_R$. The segmentation is $SG = (n)$, that is, each time series is regarded as containing only one segment.

Step 2: Insertion. We insert the time series in $TS$ one by one into the DSTree.

The insertion step assigns every time series $X$ to a leaf node. Ideally, similar time series are allocated to the same leaf node or subtree, so that they can be examined together in similarity search using the same segmentation. In the interest of computational efficiency, we heuristically follow a path from the root node to assign a time series $X$ to a leaf node.

Specifically, for each time series $X \in TS$, we first visit the root node $N_R$. In the case that $N_R$ is a leaf node, we assign $X$ to $N_R$ if $N_R$ has space; otherwise, we split $N_R$ according to the splitting strategy $SP$ of $N_R$, which will be discussed later in Section 3.3. If $N_R$ is an internal node, we select the child node of $N_R$ that $X$ fits better, and recursively search until a leaf node is met.

The pseudocode of function Insert is shown in Algorithm 1. A critical step of this algorithm is the function BestSplit(), which selects the best splitting strategy whenever a node is split. We provide multiple types of splitting strategies. Whenever splitting a node, we call BestSplit() to find the best one, denoted as $SP$. Function routeToChild() uses $SP$ to determine which child node a time series belongs to.
These two functions will be discussed in the next section.

3.3 Node Splitting Strategies

At an internal node whose subtree indexes a subset of time series, there are multiple possible ways to partition the time series into smaller subsets and assign them to children nodes. We need to define a good measurement to assess the benefit of different strategies, and find a good splitting strategy. In this subsection, we first demonstrate the ideas behind various splitting strategies, and then present those strategies and a quality measure.

3.3.1 Ideas

We can split a set of time series in two ways: horizontal splitting (H-split for short) and vertical splitting (V-split for short).

In an H-split, the segmentation remains unchanged, but the set of time series is split into two disjoint sets. To split, the time series are assigned to different subsets according to a selected segment. Either the mean or the standard deviation of the selected segment can be used to make the assignments. Two examples are shown in Figure 3, in which only the $i$-th segment is shown. Figure 3(b) shows an example where the time series cannot be divided well into two subsets using the mean, but can be partitioned well using the standard deviation.

Figure 4: Vertical splitting. (a) Using H-split. (b) Using V-split.

V-split leads to a one-segment refinement of the current segmentation. We illustrate this process in Figure 4. The time series in Figure 4(a) cannot be split well using an H-split for the $i$-th segment, since the 4 time series have similar mean and standard deviation values. In a V-split, we first split the segment into two, and then cluster time series according to the mean of the left subsegment, as shown in Figure 4(b).

DSTree provides more possible ways to divide and conquer time series, and thus has the potential to achieve more similar time series in leaf nodes. All the state-of-the-art methods, such as the R-tree based methods and iSAX, only support horizontal splitting, and only the mean values can be used in splitting.
No segmentation refinement is allowed in those methods.

3.3.2 Splitting Strategies

A selection of splitting strategies happens only when a leaf node has no space to accommodate a newly assigned time series, and thus has to be split to host two children nodes. The global user-specified parameter $\psi$ defines the maximum number of time series that can be indexed by a leaf node.

Consider a leaf node $N$ that needs to hold a set $TS_N$ of $\psi + 1$ time series, where the segmentation is $SG$. We need to split $N$ into two nodes. Now we specify the splitting strategies H-split and V-split as follows.

H-split. Suppose the $i$-th segment is selected to be used in splitting. We will discuss the choice of segment when we discuss the quality measure in Section 3.3.3. In an H-split$_M$ (for H-split using mean values), suppose the range of the mean values in the $i$-th segment of $TS_N$ is $[\mu_i^{min}, \mu_i^{max}]$. We split $N$ and generate two children nodes $N_l$ and $N_r$ with the same segmentation $SG$ as in $N$. The range of mean values of the $i$-th segment in $N_l$ is $[\mu_i^{min}, \frac{\mu_i^{min} + \mu_i^{max}}{2})$, and that in $N_r$ is $[\frac{\mu_i^{min} + \mu_i^{max}}{2}, \mu_i^{max}]$. The time series in $N$ will be assigned to $N_l$ and $N_r$ according to their mean values.

Similarly, in an H-split$_{SD}$ (for H-split using standard deviation values), suppose the range of the standard deviation values in the $i$-th segment is $[\sigma_i^{min}, \sigma_i^{max}]$. The range of standard deviation in the $i$-th segment in $N_l$ is $[\sigma_i^{min}, \frac{\sigma_i^{min} + \sigma_i^{max}}{2})$, and that in $N_r$ is $[\frac{\sigma_i^{min} + \sigma_i^{max}}{2}, \sigma_i^{max}]$.

V-split. Suppose the $i$-th segment is selected to be used in splitting. Again, we will discuss the choice of segment when we discuss the quality measure in Section 3.3.3. We refine the segmentation $SG$ by splitting the $i$-th segment into two equal-length segments: $S_i^1 = [r_{i-1} + 1, \; r_{i-1} + \frac{r_i - r_{i-1}}{2}]$ and $S_i^2 = [r_{i-1} + \frac{r_i - r_{i-1}}{2} + 1, \; r_i]$. We use one of the two new subsegments and apply an H-split to partition the time series. We denote by V-split$_L$ and V-split$_R$ the cases where the left and the right subsegment is chosen, respectively. Consequently, two children nodes are created for node $N$. Clearly, V-split contains an H-split step.

A splitting strategy can be written as a tuple $SP$ = (sid, strategy, measure), where sid $\in [1, m]$ is the id of the segment that is selected in the splitting, $m$ is the number of segments in the current segmentation $SG$, strategy $\in$ {H-split, V-split$_L$, V-split$_R$} is the choice of H- or V-split (and the subsegment in the case of V-split), and measure $\in$ {M, SD} records whether the mean values or the standard deviation values are used in the H-split. For example, in Figure 2, $SP$ in node $N$ is (2, V-split$_L$, M), which means that the second segment is selected, a V-split is applied, and the left subsegment and the mean values are used in the H-split. $SP$ of node $N_3$ is (2, H-split, M), which means the second segment is selected and an H-split is applied using the mean values.

3.3.3 Splitting Strategy Quality Measure

When a node is split, the time series assigned to the node are then assigned to the two children nodes created in the splitting process. As just discussed, several different strategies can be used to make the assignment, including choosing H-split or V-split, the segment used and the measurement (mean or standard deviation). We need a quality measure to evaluate the benefit of various splitting strategies in order to choose a good one.

A brute-force method to evaluate the quality of a splitting strategy is that, for every possible strategy, we compute the similarity among the time series assigned to each child node. This brute-force method, however, is very costly. For each splitting strategy, the time complexity is $O(\psi^2)$.
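If there are $m$ segments, then the total cost to find the best strategy is $O(m\psi^2)$. In this subsection, we tackle the cost by using the upper and lower bounds of the time series in the children nodes to evaluate the split quality.

Given a node $N$ in a DSTree, let $Q$ be a query time series. The effectiveness of the upper and lower bounds in node $N$ with respect to $Q$ can be measured by the bound range, which is the difference between the upper bound and the lower bound of the distances between $Q$ and the set of time series indexed in $N$, that is,

$$R(Q) = \sum_{i=1}^{m} (r_i - r_{i-1})\left((UB_i^\mu + UB_i^\sigma) - (LB_i^\mu + LB_i^\sigma)\right), \tag{15}$$

where $UB_i^\mu$, $UB_i^\sigma$, $LB_i^\mu$, $LB_i^\sigma$ are defined in Equations 13, 14, 11, and 12, respectively.

From Equation 15, we have $R(Q) = \sum_{i=1}^{m} (r_i - r_{i-1})(R_i^\mu + R_i^\sigma)$, where $R_i^\mu = UB_i^\mu - LB_i^\mu$ and $R_i^\sigma = UB_i^\sigma - LB_i^\sigma$. For $R_i^\mu$, according to the relationship of $\mu_i^Q$ and $[\mu_i^{min}, \mu_i^{max}]$,

$$R_i^\mu = \begin{cases} (\mu_i^{max} - \mu_i^Q)^2 - (\mu_i^{min} - \mu_i^Q)^2, & \text{if } \mu_i^Q \le \mu_i^{min}; \\ (\mu_i^{max} - \mu_i^Q)^2, & \text{if } \mu_i^{min} < \mu_i^Q \le \frac{\mu_i^{min} + \mu_i^{max}}{2}; \\ (\mu_i^{min} - \mu_i^Q)^2, & \text{if } \frac{\mu_i^{min} + \mu_i^{max}}{2} < \mu_i^Q \le \mu_i^{max}; \\ (\mu_i^{min} - \mu_i^Q)^2 - (\mu_i^{max} - \mu_i^Q)^2, & \text{if } \mu_i^{max} < \mu_i^Q; \end{cases} \tag{16}$$

In the second and third cases, the range has the same upper bound $(\mu_i^{max} - \mu_i^{min})^2$. The more similar $\mu_i^{max}$ and $\mu_i^{min}$ are, the smaller the range is. In the first and fourth cases, it also holds that the more similar $\mu_i^{max}$ and $\mu_i^{min}$ are, the smaller the range is. Thus, we can use $(\mu_i^{max} - \mu_i^{min})^2$ to evaluate the range related to the mean value.

Similarly, for $R_i^\sigma$,

$$R_i^\sigma = \begin{cases} (\sigma_i^{max} + \sigma_i^Q)^2 - (\sigma_i^{min} - \sigma_i^Q)^2, & \text{if } \sigma_i^Q \le \sigma_i^{min}; \\ (\sigma_i^{max} + \sigma_i^Q)^2, & \text{if } \sigma_i^{min} < \sigma_i^Q \le \sigma_i^{max}; \\ (\sigma_i^{max} + \sigma_i^Q)^2 - (\sigma_i^{max} - \sigma_i^Q)^2, & \text{if } \sigma_i^{max} < \sigma_i^Q; \end{cases} \tag{17}$$

By simple transformation, we can see that, in both the first and second cases, the range is smaller than $(2\sigma_i^{max})^2$. Moreover, in the third case, the range equals $4\sigma_i^{max}\sigma_i^Q$. In all cases, it holds that the smaller $\sigma_i^{max}$ is, the smaller the range related to standard deviation is. Thus, we can use $(\sigma_i^{max})^2$ to evaluate the range related to standard deviation.

We combine the above two components and define our measurement of estimation quality as

$$QoS = \sum_{i=1}^{m} (r_i - r_{i-1})\left((\mu_i^{max} - \mu_i^{min})^2 + (\sigma_i^{max})^2\right) \tag{18}$$

The measurement $QoS$ does not depend on the query time series.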
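The quality measure can be computed from a node's segmentation and synopsis alone. The sketch below is an illustration under assumptions rather than a definitive implementation: it assumes that each segment's term is weighted by its length $r_i - r_{i-1}$, as in Equation 15, and it represents a synopsis entry as a plain tuple `(mu_min, mu_max, sigma_min, sigma_max)`; the function name `qos` is hypothetical.

```python
from typing import Sequence, Tuple

SynopsisEntry = Tuple[float, float, float, float]  # (mu_min, mu_max, sigma_min, sigma_max)


def qos(seg: Sequence[int], synopsis: Sequence[SynopsisEntry]) -> float:
    """Query-independent estimate of how loose a node's bounds are.

    `seg` holds the right endpoints (r_1, ..., r_m); smaller results
    indicate tighter bounds.
    """
    total = 0.0
    left = 0
    for r, (mu_min, mu_max, _sigma_min, sigma_max) in zip(seg, synopsis):
        total += (r - left) * ((mu_max - mu_min) ** 2 + sigma_max ** 2)
        left = r
    return total
```

For example, `qos((4, 8), [(1, 3, 0, 0.5), (0, 0, 0, 2)])` evaluates to `4 * (4 + 0.25) + 4 * (0 + 4) = 33.0`; narrowing the mean range of the first segment lowers the value.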
The smaller $QoS$ is, the more effective the bounds in a node are for similarity estimation.

3.3.4 Finding Splitting Strategies

Denote by $N$ the node to be split, and by $QoS_N$ its $QoS$ value. We split $N$ into two children nodes $N_l$ and $N_r$, and denote their $QoS$ values by $QoS_l$ and $QoS_r$, respectively. We define the splitting benefit as $B = QoS_N - \frac{QoS_l + QoS_r}{2}$. The larger $B$ is, the better the splitting strategy.

Now, we introduce function BestSplit. For each segment, we compute $B$ for all possible vertical and horizontal splitting strategies, and select the one with the maximum $B$ value as the best strategy.

After the index building, each internal node maintains its own splitting strategy, $SP$. Given the splitting strategy $SP$ of node $N$ and a query time series $Q$, the function routeToChild can correctly find the appropriate child node. The process is similar to that of reassigning time series when splitting occurs. We first transform $Q$ according to the segmentation of $N$. Then, we re-transform $Q$ according to the corresponding splitting strategy, check which child node it belongs to, and assign $Q$ to it.

3.4 Analysis of DSTree

In this section, we discuss some factors that are related to the performance of the DSTree index.

Adaptive segmentation. In all previous approaches, one has to specify the dimensionality of the time series representation, such as the number of coefficients in DFT and DWT, and the number of segments in iSAX and APCA. However, it is hard to determine the optimal parameters. DSTree avoids this difficulty by automatic segmentation splitting.

Data distribution. Ideally, the performance of an index is insensitive to the distribution of the time series to be indexed. Many existing methods assume or target some distributions in design. For example, the R-tree based approaches assume that all time series can be represented well using the same number of coefficients, which may not hold in many applications. If some time series are dominated by low-frequency data and the others are dominated by high-frequency data, a DFT-based index may have poor performance. DSTree does not assume any data distribution since different nodes have their own representations.
For this reason, time series with dramatically different characteristics can still be handled well by using different nodes.

Balance of DSTree. iSAX and DSTree both may generate imbalanced index trees. Our experimental results (Table ) show that DSTree is better than iSAX 2.0 in terms of balancing because of the multiple splitting strategies. Heuristically, we can improve the balance of DSTrees in two ways. First, we can shuffle the data set and build the index several (e.g., 3-5) times using different input orders of time series, and then pick the best one as the final index. Second, we can adjust the tree by a post-process where we move the extraordinarily deep subtrees toward the root node. Limited by space, we omit the details here.

The major search cost in DSTree and also some other time series indexes, such as iSAX, is to retrieve time series data from disk. Searching internal nodes in the index is relatively quick. Therefore, keeping similar time series in a leaf node can help to reduce the number of I/O operations needed in a similarity search. This is very different from lookup queries using a B+-tree or similar indexes, where the whole index is stored on disk.

Extension to subsequence matching. The subsequence matching problem is to find matching subsequences between two time series, which may have different lengths. The state-of-the-art approaches partition (long) time series into a set of equal-length subsequences based on overlapped windows, and then build the index for these subsequences for fast similarity search. The search results are assembled to compute matching subsequences. Since the time series after partitioning for similarity search are of the same length, DSTree can be used to support subsequence matching directly.

4. QUERY ANSWERING ALGORITHMS

A DSTree supports two types of queries. The first one is the traditional similarity search, which returns the time series nearest to the query time series. The second type is to estimate the distance distribution, which returns a histogram of distances between the query time series and all indexed time series.
Algorithm exactSearch(Q)
Input: A query time series Q
Output: The nearest time series TS with distance D_bsf
N_bsf = HeuristicSearch(Q);
(TS, D_bsf) = calcMinDist(N_bsf, Q);
Initialize distance priority queue pq;
pq.add(N_R, D_LB(N_R, Q));
while !pq.isEmpty() do
    (N_cur, LB_cur) = pq.popMin();
    if LB_cur > D_bsf then
        break;
    end if
    if N_cur is a leaf node then
        (X, Dist) = calcMinDist(N_cur, Q);
        if Dist < D_bsf then
            D_bsf = Dist; TS = X;
        end if
    else
        for all children nodes N of N_cur do
            if D_LB(N, Q) < D_bsf then
                pq.add(N, D_LB(N, Q));
            end if
        end for
    end if
end while
return TS, D_bsf;

Algorithm 3 Histogram(Q)
1: Input: A query time series Q
2: Output: A distance histogram Hist
3: Initialize a distance range count list L;
4: Initialize a node stack Stack;
5: Stack.push(Root, 0, +∞);
6: while !Stack.isEmpty() do
7:     (N, LB_p, UB_p) = Stack.Pop();
8:     LB = D_LB(N, Q);
9:     UB = D_UB(N, Q);
10:     LB = max(LB, LB_p);
11:     UB = min(UB, UB_p);
12:     if N is a leaf node then
13:         Count = N.C; L.add(LB, UB, Count);
14:     else
15:         for all child nodes N' of N do
16:             Stack.Push(N', LB, UB);
17:         end for
18:     end if
19: end while
20: Hist = BuildHistogram(L);
21: return Hist;

4.1 Similarity Search

Before introducing the exact similarity search, we first introduce a heuristic search method, which is more efficient and will be used in the exact search method later.

4.1.1 A Heuristic Method

Instead of finding the exact most similar time series by checking all possible nodes in a DSTree, a heuristic search only investigates one leaf node, and tries to find the most similar time series in this node. This method is based on the heuristic that similar time series are often indexed in the same node. Specifically, given a query $Q$, we start from the root node. If the root node is not a leaf node, then we find a child node of the root node that can hold $Q$ as if $Q$ were inserted into the index. This search process is conducted recursively until a leaf node $N$ is met. Then, we calculate the distance $D(S, Q)$ for every time series $S \in TS_N$, and return the time series of the shortest distance.
Please note that the heuristic method, as the name suggests, may not find the most similar time series in the whole data set.

4.1.2 The Exact Search

To speed up search, we combine the heuristic search method and the lower bounding distance function to prune the search space. The pseudo-code is given in Algorithm exactSearch. The algorithm begins with a best-so-far (BSF) answer returned by the heuristic search method. The intuition is that, by quickly obtaining a time series that is likely similar to the query time series, a large portion of the search space may be pruned effectively.

Once a BSF is obtained, a priority queue, denoted by pq, is created to examine nodes that may host time series that are potentially more similar to the query time series than the BSF answer. This priority queue is initialized to include only the root node. The algorithm then repeatedly extracts the node with the smallest lower bound distance from the priority queue until either the priority queue becomes empty or an early termination condition is met. The early termination occurs when the lower bound distance is greater than or equal to the distance of the BSF answer. When the condition is satisfied, the remaining nodes in the priority queue cannot contain the nearest neighbor and can be pruned.

To process a node from the priority queue, two cases may happen. (1) If the node is a leaf node, we fetch its time series from disk and compute the distance from the query to these time series, recording the minimum distance. If this distance is less than that of our BSF answer, we update the BSF answer. (2) If the node is an internal node, its children nodes are inserted into the priority queue provided their lower bound distances to the query time series are less than the distance of the BSF answer.
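The heuristic and exact searches described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the node layout, the mean-based routing rule, and all helper names are hypothetical, and the lower bound used here is a plain point-wise bounding-box distance that stands in for the paper's D_LB. Leaves are assumed to be non-empty, and the early termination follows the "greater than or equal" wording of the text.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Series = Tuple[float, ...]


def euclidean(a: Series, b: Series) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Node:
    lo: Series                       # point-wise minimum of all series below the node
    hi: Series                       # point-wise maximum of all series below the node
    series: List[Series] = field(default_factory=list)  # filled for leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    seg: Tuple[int, int] = (0, 0)    # hypothetical split rule: mean of q[seg[0]:seg[1]]
    threshold: float = 0.0

    def is_leaf(self) -> bool:
        return self.left is None

    def route(self, q: Series) -> "Node":
        s, e = self.seg
        return self.left if sum(q[s:e]) / (e - s) < self.threshold else self.right


def make_leaf(series: List[Series]) -> Node:
    cols = list(zip(*series))
    return Node(tuple(min(c) for c in cols), tuple(max(c) for c in cols), list(series))


def make_internal(left: Node, right: Node, seg: Tuple[int, int], threshold: float) -> Node:
    lo = tuple(map(min, left.lo, right.lo))
    hi = tuple(map(max, left.hi, right.hi))
    return Node(lo, hi, left=left, right=right, seg=seg, threshold=threshold)


def lower_bound(node: Node, q: Series) -> float:
    """Distance from q to the node's bounding box (stand-in for D_LB)."""
    total = 0.0
    for x, l, h in zip(q, node.lo, node.hi):
        d = l - x if x < l else (x - h if x > h else 0.0)
        total += d * d
    return math.sqrt(total)


def calc_min_dist(leaf: Node, q: Series) -> Tuple[Series, float]:
    best = min(leaf.series, key=lambda s: euclidean(s, q))
    return best, euclidean(best, q)


def heuristic_search(root: Node, q: Series) -> Node:
    """Descend to the single leaf that would hold q if it were inserted."""
    node = root
    while not node.is_leaf():
        node = node.route(q)
    return node


def exact_search(root: Node, q: Series) -> Tuple[Series, float]:
    ts, d_bsf = calc_min_dist(heuristic_search(root, q), q)   # best-so-far answer
    tie = 0                                   # tie-breaker so nodes are never compared
    pq = [(lower_bound(root, q), tie, root)]
    while pq:
        lb, _, node = heapq.heappop(pq)
        if lb >= d_bsf:                       # nothing left can beat the BSF answer
            break
        if node.is_leaf():
            x, dist = calc_min_dist(node, q)
            if dist < d_bsf:
                ts, d_bsf = x, dist
        else:
            for child in (node.left, node.right):
                child_lb = lower_bound(child, q)
                if child_lb < d_bsf:
                    tie += 1
                    heapq.heappush(pq, (child_lb, tie, child))
    return ts, d_bsf


# Tiny demo: the query is routed to the left leaf, but its true nearest
# neighbour lives in the right leaf, so only the exact search finds it.
left = make_leaf([(-5.0, -5.0), (-1.0, -1.0)])
right = make_leaf([(0.2, 0.2), (4.0, 4.0)])
root = make_internal(left, right, seg=(0, 2), threshold=0.0)
q = (-0.1, -0.1)
print(calc_min_dist(heuristic_search(root, q), q))
print(exact_search(root, q))
```

Starting the best-first traversal with the heuristic answer means that the left leaf in the demo is never re-read: its lower bound is not smaller than the best-so-far distance, so it is not pushed onto the queue.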

4.2 Distance Distribution Histogram

Algorithm 3 gives the pseudocode of computing an equi-width histogram of the distances from a query time series to all time series indexed by a DSTree. We collect all statistical information of the leaf nodes to form a list, denoted by $L$, in which each entry represents the number of time series falling in a certain distance range. The range can be estimated based on Theorem .

EXAMPLE 3. A list L = ([, ], ), ([5, 3], 5), ([4, 5], ) means that there are 3 leaf nodes: $N_1$, $N_2$ and $N_3$. $N_1$ includes time series, and their distance from $Q$ is between [, ]. The distance range and number of time series in $N_2$ and $N_3$ are ([5, 3], 5) and ([4, 5], ), respectively.

Since the entries of any two leaf nodes are disjoint, there is no redundant information in $L$. Thus, we can obtain a corresponding histogram quickly.

One issue is that in some cases, the lower (or upper) bound of a child node may be smaller (or larger) than that of its parent node. In other words, the bounds in the parent node may be tighter than those in the children nodes. Using the bounds at such children nodes causes less accurate estimation for the children nodes. We address it with Theorem 3, which is easy to show.

THEOREM 3. If the estimated range of the distance in a node is $[LB, UB]$, and that in its parent node is $[LB_p, UB_p]$, then $[\max(LB, LB_p), \min(UB, UB_p)]$ is a tighter and correct range of this node.

Using Theorem 3, whenever a node is traversed, we first compute the lower and upper bounds according to Theorem . Then we compare the bounds with those in its parent node, and use the tighter bounds instead (Lines 7-11 in Algorithm 3).

Furthermore, one may build a histogram more quickly by traversing some internal nodes instead of all leaf nodes. Specifically, we propose an approach here, called α-level ($0 < \alpha \le 1$), to compute a histogram. Denote by $H$ the height of a DSTree. For each path from the root to a leaf node, we select the $\alpha H$-th internal node instead of the corresponding leaf node to generate the list $L$. If the length of a path is shorter than $\alpha H$, we simply use the leaf node.
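The range collection of Algorithm 3, together with the tightening of Theorem 3 and the α-level cut-off, can be sketched in Python as below. This is a hedged illustration only: the `HNode` class and all function names are hypothetical, each node simply carries precomputed bounds that stand in for D_LB(N, Q) and D_UB(N, Q), the root is counted as level 1, and αH is rounded up, none of which is confirmed by the text. The last function turns the list into an equi-width histogram by assuming that distances are spread uniformly within each range, which is one straightforward option.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HNode:
    lb: float                       # stand-in for D_LB(N, Q)
    ub: float                       # stand-in for D_UB(N, Q)
    count: int = 0                  # number of series (used for leaves)
    children: List["HNode"] = field(default_factory=list)


def height(node: HNode) -> int:
    return 1 + max((height(c) for c in node.children), default=0)


def total_count(node: HNode) -> int:
    if not node.children:
        return node.count
    return sum(total_count(c) for c in node.children)


def collect_ranges(root: HNode, alpha: float = 1.0) -> List[Tuple[float, float, int]]:
    """Return (LB, UB, count) entries for the nodes on the alpha-level cut."""
    cut = math.ceil(alpha * height(root))        # assumed rounding of alpha * H
    ranges = []
    stack = [(root, 0.0, math.inf, 1)]           # (node, LB_p, UB_p, level)
    while stack:
        node, lb_p, ub_p, level = stack.pop()
        lb = max(node.lb, lb_p)                  # Theorem 3: never looser than parent
        ub = min(node.ub, ub_p)
        if not node.children or level >= cut:
            ranges.append((lb, ub, total_count(node)))
        else:
            for child in node.children:
                stack.append((child, lb, ub, level + 1))
    return ranges


def uniform_histogram(ranges, edges: List[float]) -> List[float]:
    """Spread each count uniformly over its range; `edges` should cover all ranges."""
    hist = [0.0] * (len(edges) - 1)
    for lb, ub, count in ranges:
        for i in range(len(hist)):
            lo, hi = edges[i], edges[i + 1]
            if ub > lb:
                overlap = max(0.0, min(ub, hi) - max(lb, lo))
                hist[i] += count * overlap / (ub - lb)
            elif lo <= lb < hi:                  # degenerate range: a single distance
                hist[i] += count
    return hist


# Made-up tree: leaf A has a looser upper bound than the root, and leaf B1 a
# looser lower bound than its parent B, so both get tightened.
leaf_a = HNode(1.0, 12.0, count=3)
leaf_b1 = HNode(1.0, 6.0, count=4)
leaf_b2 = HNode(5.0, 9.0, count=5)
node_b = HNode(2.0, 8.0, children=[leaf_b1, leaf_b2])
demo_root = HNode(0.0, 10.0, children=[leaf_a, node_b])

full = collect_ranges(demo_root, alpha=1.0)
coarse = collect_ranges(demo_root, alpha=0.5)
print(sorted(full), sorted(coarse))
print(uniform_histogram(full, [0.0, 5.0, 10.0]))
```

With a smaller α fewer nodes are visited, at the price of wider ranges per entry.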
In other words, the α-level method uses the nodes located in a certain cross section of the whole tree to generate $L$. Algorithm 3 can be extended to this case easily. The experimental results show that we can obtain good estimation with the /3-level.

Once we obtain the list $L$, we can compute a histogram based on it. There are multiple ways to compute a histogram. A straightforward way is to assume that the time series contained in a node are distributed uniformly.

EXAMPLE 4. Consider the list L in Example 3. We can estimate the number of time series within the range [5, ] as =. 3 5

In general, we can assume that the time series in a node follow some distribution, such as a Gaussian distribution. We can spend some extra space to maintain the parameters of the model, such as mean and variance, which allow more accurate and efficient estimation. Limited by space, we omit the details here.

5. EMPIRICAL EVALUATION

In this section, we report extensive experiments to verify the effectiveness of DSTree. We compare both PAA-index (using PAA as representation and R-tree as index) and iSAX 2.0 with DSTree in index efficiency, approximate search error rate and pruning power. We also showcase the lower bound tightness and accuracy of histogram estimation. All experiments were executed on a laptop computer with an Intel Core 5.5GHz CPU and 4GB main memory. All experimental results were averaged over 5 runs.

5.1 Data Sets and Default Setting

The time series in both synthetic and real data sets were normalized with Z-normalization.

5.1.1 Synthetic Data Sets

Each of our synthetic data sets is a combination of four types of time series as follows.

Random walk time series. The start point is picked randomly from range [−5, 5] and the step length is chosen randomly in range [, ].

One-segment Gaussian time series. The values in the whole time series are picked from a Gaussian distribution with mean value and standard deviation randomly selected in ranges [−5, 5] and [, ], respectively.

Multi-segment Gaussian time series. Such a time series is concatenated by multiple one-segment Gaussian time series. The number of segments is randomly set between 3 to .
Mixed sine time series. Each time series is a mixture of several sine waves whose period is randomly set in range [, ], amplitude is randomly set in range [, ], and mean value is randomly chosen in range [−5, 5].

To generate a time series, the synthetic data generator first randomly chooses a type, and then picks the corresponding parameters randomly to generate the time series. We generated four synthetic data sets of time series lengths 64, 128, 256 and 512, respectively. Each data set contains one million time series by default. We also use synthetic data sets of up to million time series in the scalability test.

5.1.2 Real Data Sets

We used a real data set collected in a bridge condition monitoring system. In this system, data was collected from about one thousand sensors of more than types, such as thermometers, accelerometers, strain gauges, displacement meters, and fatigue meters. The length of each time series is 256, and one million time series were collected. The total storage space is about 3GB.

5.1.3 Parameters

To verify the effectiveness of data-adaptive and dynamic segmentation versus global segmentation, we compared DSTree with PAA-index (implemented by ourselves) and iSAX 2.0 (source code provided by the authors). Both PAA-index and iSAX 2.0 use fixed, global segmentations. To test the performance extensively, we built PAA-index and iSAX 2.0 with segment sizes of 8,, 6 and respectively. The leaf capacity threshold, ψ, was set to . The FBL size for iSAX 2.0 was set to ,. The fill factor of R-tree in PAA-index was set to .

5.2 Index Size

We did not implement iSAX 2.0 by ourselves. Instead, we used the implementation provided by the authors. We realize that the implementation details, particularly the storage methods, in iSAX 2.0 and DSTree may be different. To avoid any confusion, we report the absolute index size for the methods we implemented but not for iSAX 2.0.

The first group of experiments compares the index space cost of DSTree, PAA-index and iSAX 2.0 with respect to the length of time series.
Specifically, we report three measurements, namely the number of nodes in the tree, the physical index size, and the average number of time series contained by a leaf node. The number of nodes includes both internal and leaf nodes. Considering the difference in data representations in the three approaches, we also compare the physical index size for DSTree and PAA. We use the average number of time series in leaf nodes to evaluate the balance of the index nodes. The results on the synthetic data sets are shown in Figure 5, and those on the real data set are in Figure 6. Four different segmentation sizes, 8,, 6 and, were tested for both PAA-index and iSAX 2.0. Label PAA-6 means PAA-index with 6 segments.

[Figure 5: Index size on the synthetic data sets. Panels: (a) Number of nodes, (b) Index size (MB), (c) Average #ts per leaf, (d) # segments/node; x-axis: Length of Time Series.]

[Figure 6: Index size on the real data set. Panels: (a) Number of nodes, (b) Index size (MB), (c) Average #ts per leaf, (d) # segments/node.]

Figure 5(a) shows that, in all the three approaches, the number of nodes is insensitive to the length of time series. However, the number of segments has different effects on iSAX 2.0 and PAA-index. In iSAX 2.0, the number of nodes increases exponentially as the number of segments increases; for example, iSAX-6 and iSAX- have many more nodes. PAA-index is insensitive to the number of segments. The number of nodes in DSTree is stable and far less than those in iSAX-6 and iSAX-. The number of nodes affects the search efficiency. If it is too large, the average number of time series per leaf node decreases and more I/O cost is needed to read data from disk.

Figure 5(b) compares the absolute index size; the smaller, the better. The size of an index is determined by two factors: the number of nodes and the unit space cost per node. For PAA-index, the space cost of each node increases almost linearly as the number of segments increases. Since DSTree needs to maintain both mean and standard deviation values, it has a larger unit space cost. However, benefiting from the dynamic splitting strategies, the average segment size of DSTree is small.
Figure 5(c) shows the average number of time series per leaf node. The smaller the number, the fewer time series in expectation can be retrieved from a leaf node. The number in DSTree is about 5. The number in PAA-index is the largest (about 6) due to the R-tree structure. In iSAX 2.0, this value decreases when the number of segments increases for two reasons. First, in iSAX 2.0, the root node has too many children nodes (for example, 6 for iSAX-6). Second, it uses fixed segmentation. In some cases, it is difficult to split a set of time series only based on the mean value.

Since DSTree uses a dynamic segmentation strategy, the segment size varies in different nodes. We report the average segment size with respect to the length of time series, that is, the ratio of the total number of segments in all nodes against the number of nodes. Figure 5(d) shows the results. The average segment size of DSTree increases moderately when the length of time series increases, which confirms the effectiveness of our splitting strategies. The average segment sizes of the other approaches are insensitive to the length of time series.

Figure 6 shows the results on the real data set. The trends are similar to those on the synthetic data sets. The number of nodes of DSTree is similar to those of iSAX- and smaller than those of iSAX-6 and iSAX-. The average number of segments per node of DSTree is . Although the time series in the real data set are more diverse, DSTree can still represent the time series with a small number of segments, which verifies the effectiveness of the dynamic splitting strategy in DSTree.

Both iSAX 2.0 and DSTree are binary trees. To examine the balance of those indexes, Table compares the average height of iSAX 2.0 and DSTree. We use the normalized standard deviation (that is, standard deviation divided by the average) of the tree height to measure the balance of the trees. We do not consider PAA-index in this comparison because R-tree, though balanced, has a much larger fan-out factor. The height of all indexes increases very moderately as the time series length increases. These indexes are all scalable with respect to long time series. iSAX 2.0 trees are substantially shorter than DSTree in average height, but clearly taller than DSTree in maximum height. The dynamic splitting strategy in DSTree can effectively avoid long branches. The small normalized standard deviation values in DSTree clearly show that DSTree has good balance.

Table 3 examines the effect of the leaf capacity threshold ψ. The number of nodes and index size of DSTree, iSAX-, and iSAX-8 decrease dramatically as ψ increases. iSAX-6 and iSAX- do not gain much from a larger leaf capacity threshold. One reason is that iSAX 2.0 with $m$ segments may have up to $2^m$ children nodes of the root node, though many such nodes may contain a very small number of time series.

5.3 Accuracy

We tested the effectiveness of the indexes in similarity search,

including both heuristic search and exact search. The accuracy of heuristic search is measured by the error rate $E = \frac{D' - D}{D}$, where $D$ and $D'$ are the distances between the query time series and the exact nearest neighbor and the heuristic search result, respectively. For exact search, we compare the pruning power, which is the ratio of the number of time series pruned against the total number of time series. For both heuristic and exact search, time series were used as the queries, half of them picked randomly from the data set, and the rest generated randomly.

[Table : Average height (Avg), normalized standard deviation (NSD), and maximum length (Max) of the indexes DSTree, iSAX-8, iSAX-, iSAX-6 and iSAX-, per data set. (S256 denotes the synthetic data set with the length of time series 256, and R256 denotes the real data set with length 256.)]

[Table 3: Number of nodes and index size (MB) versus leaf capacity threshold ψ, for DSTree, iSAX-8, iSAX-, iSAX-6 and iSAX-.]

Figures 7 and 8, respectively, show the results on the synthetic and real data sets. In Figure 7(a), although the error rate increases as the length of time series increases for all three methods, DSTree outperforms the others clearly. In Figure 8(a), the error rate decreases for PAA-index and iSAX 2.0 when the size of segment increases, since using more segments can represent the time series more accurately. With the same segment size, iSAX 2.0 outperforms PAA-index.

Interestingly, when the query time series is picked from the data sets, both iSAX 2.0 and DSTree correctly find the leaf node containing the right time series due to the disjoint space division property of these two approaches. PAA-index finds the wrong node in some of such cases, because of the intersection of MBRs in R-tree. When the query is generated randomly, DSTree is more accurate in finding the corresponding leaf nodes than iSAX 2.0 because of our data-adaptive splitting strategy.

Figures 7(b) and 8(b) show the pruning power of exact similarity search. The pruning power of DSTree is greater than 95% on all synthetic data sets and is 98% on the real data set, which is clearly better than those of the other two approaches. The pruning power of PAA-index increases dramatically as the segment size increases from 8 to . However, the marginal performance gain decreases as the segment size increases further. A reason is that R-tree performs poorly with high dimensionality. iSAX 2.0 has a similar trend.

The advantages of DSTree are from two factors. First, a tighter lower bound helps to prune more nodes. Second and more importantly, the proposed data adaptive splitting strategies can cluster similar time series better. Consequently, the heuristic search in DSTree is more accurate, which gives a good starting point in exact search. Moreover, fewer data files are visited since similar time series are clustered better into fewer nodes.

[Figure 7: Error rate and pruning power on the synthetic data sets. Panels: (a) Heuristic search error rate, (b) Exact search pruning power; x-axis: Length of Time Series.]

[Figure 8: Error rate and pruning power on the real data set. Panels: (a) Heuristic search error rate, (b) Exact search pruning power.]

5.4 Lower Bound Tightness

We tested the tightness of the proposed lower bound estimation approach. We measure the lower bound tightness by the ratio of the estimated lower bound distance against the minimum distance from a query to all time series indexed in a node. This ratio is between 0 and 1; the larger, the better. We collected this information during the processing of exact search. Figure 9 shows the results on both the synthetic and real data sets. The lower bound using both mean and standard deviation values is tighter than that using only mean values.

5.5 Histogram Computation

Figure 10 compares the exact distance histogram by a full scan of the data and the distance histogram estimated by the α-level method (Section 4.2).
For the latter, three cases are shown. Full level uses all leaf nodes to estimate. /3 level and /3 level, respectively, use internal nodes located at the /3-level and /3-level cross sections to compute the histogram. Although the full and /3


More information

Finance 402: Problem Set 1 Solutions

Finance 402: Problem Set 1 Solutions Fnance 402: Problem Set 1 Solutons Note: Where approprate, the fnal answer for each problem s gven n bold talcs for those not nterested n the dscusson of the soluton. 1. The annual coupon rate s 6%. A

More information

A DUAL EXTERIOR POINT SIMPLEX TYPE ALGORITHM FOR THE MINIMUM COST NETWORK FLOW PROBLEM

A DUAL EXTERIOR POINT SIMPLEX TYPE ALGORITHM FOR THE MINIMUM COST NETWORK FLOW PROBLEM Yugoslav Journal of Operatons Research Vol 19 (2009), Number 1, 157-170 DOI:10.2298/YUJOR0901157G A DUAL EXTERIOR POINT SIMPLEX TYPE ALGORITHM FOR THE MINIMUM COST NETWORK FLOW PROBLEM George GERANIS Konstantnos

More information

CHAPTER 9 FUNCTIONAL FORMS OF REGRESSION MODELS

CHAPTER 9 FUNCTIONAL FORMS OF REGRESSION MODELS CHAPTER 9 FUNCTIONAL FORMS OF REGRESSION MODELS QUESTIONS 9.1. (a) In a log-log model the dependent and all explanatory varables are n the logarthmc form. (b) In the log-ln model the dependent varable

More information

Appendix - Normally Distributed Admissible Choices are Optimal

Appendix - Normally Distributed Admissible Choices are Optimal Appendx - Normally Dstrbuted Admssble Choces are Optmal James N. Bodurtha, Jr. McDonough School of Busness Georgetown Unversty and Q Shen Stafford Partners Aprl 994 latest revson September 00 Abstract

More information

Solution of periodic review inventory model with general constrains

Solution of periodic review inventory model with general constrains Soluton of perodc revew nventory model wth general constrans Soluton of perodc revew nventory model wth general constrans Prof Dr J Benkő SZIU Gödöllő Summary Reasons for presence of nventory (stock of

More information

Scribe: Chris Berlind Date: Feb 1, 2010

Scribe: Chris Berlind Date: Feb 1, 2010 CS/CNS/EE 253: Advanced Topcs n Machne Learnng Topc: Dealng wth Partal Feedback #2 Lecturer: Danel Golovn Scrbe: Chrs Berlnd Date: Feb 1, 2010 8.1 Revew In the prevous lecture we began lookng at algorthms

More information

ISE High Income Index Methodology

ISE High Income Index Methodology ISE Hgh Income Index Methodology Index Descrpton The ISE Hgh Income Index s desgned to track the returns and ncome of the top 30 U.S lsted Closed-End Funds. Index Calculaton The ISE Hgh Income Index s

More information

Members not eligible for this option

Members not eligible for this option DC - Lump sum optons R6.1 Uncrystallsed funds penson lump sum An uncrystallsed funds penson lump sum, known as a UFPLS (also called a FLUMP), s a way of takng your penson pot wthout takng money from a

More information
