Optimal Black-Box Reductions Between Optimization Objectives


Optimal Black-Box Reductions Between Optimization Objectives

Zeyuan Allen-Zhu (Princeton University) and Elad Hazan (Princeton University)

arXiv:1603.05642v3 [math.OC], May 2016; first circulated in February 2016. (First appeared on arXiv on March 17, 2016. Corrected a few typos in this most recent version.)

Abstract

The diverse world of machine learning applications has given rise to a plethora of algorithms and optimization methods, finely tuned to the specific regression or classification task at hand. We reduce the complexity of algorithm design for machine learning by reductions: we develop reductions that take a method developed for one setting and apply it to the entire spectrum of smoothness and strong-convexity in applications. Furthermore, unlike existing results, our new reductions are optimal and more practical. We show how these new reductions give rise to new and faster running times on training linear classifiers for various families of loss functions, and conclude with experiments showing their successes also in practice.

1 Introduction

The basic machine learning problem of minimizing a regularizer plus a loss function comes in numerous different variations and names. Examples include Ridge Regression, Lasso, Support Vector Machine (SVM), Logistic Regression and many others. A multitude of optimization methods were introduced for these problems, but in most cases specialized to very particular problem settings. Such specializations appear necessary since objective functions for different classification and regularization tasks admit different convexity and smoothness parameters. We list below a few recent algorithms along with their applicable settings.

- Variance-reduction methods such as SAGA and SVRG [7, 12] intrinsically require the objective to be smooth, and do not work for non-smooth problems like SVM. This is because for loss functions such as hinge loss, no unbiased gradient estimator can achieve a variance that approaches zero.
- Dual methods such as SDCA or APCG [18, 28] intrinsically require the objective to be strongly convex (SC), and do not directly apply to non-SC problems. This is because for a non-SC objective such as Lasso, its dual is not even well-defined.
- Primal-dual methods such as SPDC [32] require the objective to be both smooth and SC.
- Many other algorithms are only analyzed for both smooth and SC objectives [5, 14, 15].

In this paper we investigate whether such specializations are inherent. Is it possible to take a convex optimization algorithm designed for one problem, and apply it to different classification or

regression settings in a black-box manner? Such a reduction should ideally take full and optimal advantage of the objective properties, namely strong-convexity and smoothness, for each setting.

Unfortunately, existing reductions are still very limited for at least two reasons. First, they incur at least a logarithmic factor log(1/ε) in the running time, so they lead only to suboptimal convergence rates. (Footnote 1: Recall that obtaining the optimal convergence rate is one of the main goals in operations research and machine learning. For instance, obtaining the optimal 1/ε rate for online learning was a major breakthrough since the log(1/ε)/ε rate was discovered [11, 13, 24].) Second, after applying existing reductions, algorithms become biased, so the objective value does not converge to the global minimum. These theoretical concerns also translate into running-time losses and parameter-tuning difficulties in practice.

In this paper, we develop new and optimal regularization and smoothing reductions that
- shave off a non-optimal log(1/ε) factor, and
- produce unbiased algorithms.

Besides such technical advantages, our new reductions also enable researchers to focus on designing algorithms for only one setting but infer optimal results more broadly. This is opposed to results such as [4, 23], where the authors develop ad hoc techniques to tweak specific algorithms, rather than all algorithms, and apply them to other settings without losing extra factors and without introducing bias. Our new reductions also enable researchers to prove lower bounds more broadly [30].

1.1 Formal Setting and Classical Approaches

Consider minimizing a composite objective function

    min_{x ∈ ℝ^d} { F(x) := f(x) + ψ(x) },    (1.1)

where f(x) is a differentiable convex function and ψ(x) is a relatively simple (but possibly non-differentiable) convex function, sometimes referred to as the proximal function. Our goal is to find a point x ∈ ℝ^d satisfying F(x) ≤ F(x*) + ε, where x* is a minimizer of F. In most classification and regression problems, f(x) can be written as f(x) = (1/n) Σ_{i=1}^n f_i(⟨x, a_i⟩), where each a_i ∈ ℝ^d is a feature vector. We refer to this as the finite-sum case of (1.1).

Classical Regularization Reduction. Given a non-SC F(x), one can define a new objective F′(x) := F(x) + (σ/2)‖x − x_0‖², in which σ is on the order of ε. In order to minimize F(x), the classical regularization reduction calls an oracle algorithm to minimize F′(x) instead, and this oracle only needs to work with SC functions.

Example. If F is L-smooth, one can apply accelerated gradient descent to minimize F′ and obtain an algorithm that converges in O(√(L/ε) · log(1/ε)) iterations in terms of minimizing the original F. This complexity has a suboptimal dependence on ε and shall be improved using our new regularization reduction.

Classical Smoothing Reduction (finite-sum case). (Footnote 2: The smoothing reduction is typically applied to the finite-sum form only. This is because, for a general high-dimensional function f(x), its smoothed variant f̂(x) may not be efficiently computable.) Given a non-smooth F(x) of a finite-sum form, one can define a smoothed variant

f̂_i(α) = E_{v∼[−1,1]}[f_i(α + εv)] for each f_i, and let F̂(x) = (1/n) Σ_{i=1}^n f̂_i(⟨a_i, x⟩) + ψ(x). (Footnote 3: More formally, one needs this variant to satisfy |f̂_i(α) − f_i(α)| ≤ ε for all α and to be smooth at the same time. This can be done in at least two classical ways if f_i(α) is Lipschitz continuous. One is to define f̂_i(α) = E_{v∼[−1,1]}[f_i(α + εv)] as an integral of f_i over the scaled unit interval, see for instance Chapter 2.3 of [10]; the other is to define f̂_i(α) = max_β { βα − f*_i(β) − (ε/2)β² } using the Fenchel dual f*_i(β) of f_i(α), see for instance [22].) In order to minimize F(x), the classical smoothing reduction calls an oracle algorithm to minimize F̂(x) instead, and this oracle only needs to work with smooth functions.

Example. If F(x) is σ-SC and one applies accelerated gradient descent to minimize F̂, this yields an algorithm that converges in O((1/√(σε)) · log(1/ε)) iterations for minimizing the original F(x). Again, the additional factor log(1/ε) can be removed using our new smoothing reduction.

Besides the non-optimality, applying the above two reductions gives only biased algorithms. One has to tune the regularization or smoothing parameter, and the algorithm only converges to the minimum of the regularized or smoothed problem, which can be away from the true minimizer of F(x) by a distance proportional to the parameter. This makes the reductions hard to use in practice.

1.2 Our New Results

To introduce our new reductions, we first define a property on the oracle algorithm.

Our Black-Box Oracle. Consider an algorithm A that minimizes (1.1) when the objective F is L-smooth and σ-SC. We say that A satisfies the homogeneous objective decrease (HOOD) property in time Time(L, σ) if, for every starting vector x_0, A produces an output x′ satisfying F(x′) − F(x*) ≤ (F(x_0) − F(x*))/4 in time Time(L, σ). In other words, A decreases the objective-value distance to the minimum by a constant factor in time Time(L, σ), regardless of how large or small F(x_0) − F(x*) is. We give a few example algorithms that satisfy HOOD:

- Gradient descent and accelerated gradient descent satisfy HOOD with Time(L, σ) = O(L/σ) · C and Time(L, σ) = O(√(L/σ)) · C respectively, where C is the time needed to compute a gradient ∇f(x) and perform a proximal gradient update [21]. Many subsequent works in this line of research also satisfy HOOD, including [3, 5, 14, 15].
- SVRG and SAGA [12, 31] solve the finite-sum form of (1.1) and satisfy HOOD with Time(L, σ) = O(n + L/σ) · C₁, where C₁ is the time needed to compute a stochastic gradient ∇f_i(x) and perform a proximal gradient update.
- Katyusha [1] solves the finite-sum form of (1.1) and satisfies HOOD with Time(L, σ) = O(n + √(nL/σ)) · C₁.

AdaptReg. For objectives F(x) that are non-SC and L-smooth, our AdaptReg reduction calls an oracle satisfying HOOD a logarithmic number of times, each time with a SC objective F(x) + (σ/2)‖x − x_0‖² for an exponentially decreasing value σ. In the end, AdaptReg produces an output x̂ satisfying F(x̂) − F(x*) ≤ ε with a total running time Σ_{t≥0} Time(L, 2^t ε). Since most algorithms have an inverse polynomial dependence on σ in Time(L, σ), when summing up Time(L, 2^t ε) over values t ≥ 0, we do not incur the additional factor log(1/ε), as opposed to the old reduction.

In addition, AdaptReg is an unbiased and anytime algorithm: F(x̂) converges to F(x*) as time goes on, without the necessity of changing parameters, so the algorithm can be interrupted at any time. We mention some theoretical applications of AdaptReg:

- Applying AdaptReg to SVRG, we obtain a running time O(n log(1/ε) + L/ε) · C₁ for minimizing finite-sum, non-SC, and smooth objectives (such as Lasso and Logistic Regression). This improves on known theoretical running times obtained by non-accelerated methods, including the O((n + L/ε) · log(1/ε)) · C₁ obtained through the old reduction, as well as the O((n + L)/ε) · C₁ obtained through direct methods such as SAGA [7] and SAG [25].
- Applying AdaptReg to Katyusha, we obtain a running time O(n log(1/ε) + √(nL/ε)) · C₁ for minimizing finite-sum, non-SC, and smooth objectives (such as Lasso and Logistic Regression). This is the first and only known stochastic method that converges with the optimal 1/√ε rate (as opposed to log(1/ε)/√ε) for this class of objectives [1].
- Applying AdaptReg to methods that do not originally work for non-SC objectives, such as [5, 14, 15], we improve their running times by a factor of log(1/ε) for working with non-SC objectives.

AdaptSmooth and JointAdaptRegSmooth. For objectives F(x) that are finite-sum, σ-SC, but non-smooth, our AdaptSmooth reduction calls an oracle satisfying HOOD a logarithmic number of times, each time with a smoothed variant F^(λ)(x) of F and an exponentially decreasing smoothing parameter λ. In the end, AdaptSmooth produces an output x̂ satisfying F(x̂) − F(x*) ≤ ε with a total running time Σ_{t≥0} Time(1/(2^t ε), σ). Since most algorithms have a polynomial dependence on L in Time(L, σ), when summing up Time(1/(2^t ε), σ) over values t ≥ 0, we do not incur an additional factor of log(1/ε), as opposed to the old reduction. AdaptSmooth is also an unbiased and anytime algorithm, for the same reason as AdaptReg.

In addition, AdaptReg and AdaptSmooth can effectively work together to solve the finite-sum, non-SC, and non-smooth case of (1.1); we call this reduction JointAdaptRegSmooth. We mention some theoretical applications of AdaptSmooth and JointAdaptRegSmooth:

- Applying AdaptSmooth to Katyusha, we obtain a running time O(n log(1/ε) + √(n/(σε))) · C₁ for minimizing finite-sum, SC, and non-smooth objectives (such as SVM). Therefore, Katyusha combined with AdaptSmooth is the first and only known stochastic method that converges with the optimal 1/√ε rate (as opposed to log(1/ε)/√ε) for this class of objectives [1].
- Applying JointAdaptRegSmooth to Katyusha, we obtain a running time O(n log(1/ε) + √n/ε) · C₁ for minimizing finite-sum, non-SC, and non-smooth objectives (such as ℓ₁-SVM). Therefore, Katyusha combined with JointAdaptRegSmooth is the first and only known stochastic method that converges with the optimal 1/ε rate (as opposed to log(1/ε)/ε) for this class of objectives [1].

Theory vs. Practice. In theory, not all algorithms solving (1.1) satisfy HOOD. Some machine learning algorithms, such as APCG [18], SPDC [32], AccSDCA [29] and SDCA [28], either do not satisfy HOOD or incur an additional log(L/σ) factor in their running times, so they cannot benefit from our new reductions in theory. For example, APCG solves the finite-sum form of (1.1) and produces an output x satisfying F(x) − F(x*) ≤ ε in time O((n + √(nL/σ)) · log(L/(σε))) · C₁. This running time does not have a logarithmic dependence on ε of the form log((F(x_0) − F(x*))/ε). In other words, APCG might in principle take a much longer running time in order to decrease the objective distance to the minimum from 1 to 1/2, as compared to the time needed to decrease it from 10⁻¹⁰ to 10⁻¹⁰/2. Fortunately, although without theoretical guarantee, these methods also benefit from our new reductions, and we include experiments in this paper to confirm such findings.
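To make the removal of the log(1/ε) factor concrete, here is a short calculation (ours, not from the paper), using accelerated gradient descent's Time(L, σ) = O(√(L/σ)) · C as the oracle cost; the same geometric-series argument applies to the other oracles above:

```latex
\sum_{t \ge 0} \mathrm{Time}(L,\, 2^t \varepsilon)
  \;=\; \sum_{t \ge 0} O\!\Big(\sqrt{L/(2^t \varepsilon)}\Big) \cdot C
  \;=\; O\!\Big(\sqrt{L/\varepsilon}\Big) \cdot C \cdot \sum_{t \ge 0} 2^{-t/2}
  \;=\; O\!\Big(\sqrt{L/\varepsilon}\Big) \cdot C .
```

In contrast, the classical reduction runs a single execution at the smallest weight σ ≈ ε and must decrease the objective by a 1/ε factor there, which costs O(√(L/ε) · log(1/ε)) · C.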

Related Works. Catalyst and APPA [9, 17] are reductions that turn non-accelerated methods into accelerated ones. They can be used as regularization reductions too; however, in such a case they become identical to the traditional regularization reduction, and continue to introduce bias and suffer from a log-factor loss in the running time. In fact, Catalyst and APPA fix the regularization parameter throughout the algorithm, but our AdaptReg decreases it exponentially. Therefore, their results cannot imply ours.

PRISMA [23] turns Nesterov's accelerated gradient descent into a method for non-smooth objectives without paying the log factor. However, PRISMA does not apply to all algorithms in a black-box manner, so it is not a reduction. Furthermore, PRISMA requires the algorithm to know the number of iterations in advance, which AdaptSmooth does not.

Roadmap. We include the description and analysis of AdaptReg in Section 3, but only include the description of AdaptSmooth in Section 4. We leave proofs, as well as the description and analysis of JointAdaptRegSmooth, to the appendix. We include experimental results in Section 6.

2 Preliminaries

In this paper we denote by ∇f(x) the full gradient of f if it is differentiable, or a subgradient if f is only Lipschitz continuous. Recall some classical definitions on strong convexity and smoothness.

Definition 2.1 (smoothness and strong convexity). For a convex function f : ℝⁿ → ℝ,
- f is σ-strongly convex if for all x, y ∈ ℝⁿ, it satisfies f(y) ≥ f(x) + ⟨∇f(x), y − x⟩ + (σ/2)‖x − y‖².
- f is L-smooth if for all x, y ∈ ℝⁿ, it satisfies ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖.

Characterization of SC and Smooth Regimes. In this paper we give numbers to the following categories of objectives F(x) in (1.1). Each of them corresponds to some well-known training problems in machine learning. (Letting (a_i, b_i) ∈ ℝ^d × ℝ be the i-th feature vector and label.)

Case 1: ψ(x) is σ-SC and f(x) is L-smooth. Examples:
- ridge regression: f(x) = (1/2n) Σ_{i=1}^n (⟨a_i, x⟩ − b_i)² and ψ(x) = (σ/2)‖x‖².
- elastic net: f(x) = (1/2n) Σ_{i=1}^n (⟨a_i, x⟩ − b_i)² and ψ(x) = (σ/2)‖x‖² + λ‖x‖₁.

Case 2: ψ(x) is non-SC and f(x) is L-smooth. Examples:
- Lasso: f(x) = (1/2n) Σ_{i=1}^n (⟨a_i, x⟩ − b_i)² and ψ(x) = λ‖x‖₁.
- logistic regression: f(x) = (1/n) Σ_{i=1}^n log(1 + exp(−b_i⟨a_i, x⟩)) and ψ(x) = λ‖x‖₁.

Case 3: ψ(x) is σ-SC and f(x) is non-smooth (but Lipschitz continuous). Example:
- SVM: f(x) = (1/n) Σ_{i=1}^n max{0, 1 − b_i⟨a_i, x⟩} and ψ(x) = (σ/2)‖x‖².

Case 4: ψ(x) is non-SC and f(x) is non-smooth (but Lipschitz continuous). Example:
- ℓ₁-SVM: f(x) = (1/n) Σ_{i=1}^n max{0, 1 − b_i⟨a_i, x⟩} and ψ(x) = λ‖x‖₁.

Definition 2.2 (HOOD property). We say an algorithm A(F, x_0) solving Case 1 of problem (1.1) satisfies the homogeneous objective decrease (HOOD) property with time Time(L, σ) if, for every starting point x_0, it produces an output x′ ← A(F, x_0) such that F(x′) − min_x F(x) ≤ (F(x_0) − min_x F(x))/4 in time Time(L, σ). (Footnote 4: Although our definition is only for deterministic algorithms, if the guarantee is probabilistic, i.e., E[F(x′)] − min_x F(x) ≤ (F(x_0) − min_x F(x))/4, all the results of this paper remain true.)
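For concreteness, here is a minimal sketch (Python/NumPy; the function names are ours, not the paper's) of the loss/regularizer pairs from the four cases above, written in the finite-sum form of (1.1):

```python
import numpy as np

# A is the n-by-d matrix whose rows are the feature vectors a_i; b holds the labels b_i.
# The four regimes are obtained by pairing a loss f with a regularizer psi.

def squared_loss(A, b, x):        # smooth; used in ridge, elastic net, Lasso
    return 0.5 * np.mean((A @ x - b) ** 2)

def logistic_loss(A, b, x):       # smooth; used in logistic regression
    return np.mean(np.logaddexp(0.0, -b * (A @ x)))

def hinge_loss(A, b, x):          # non-smooth but 1-Lipschitz; used in SVM
    return np.mean(np.maximum(0.0, 1.0 - b * (A @ x)))

def l2_reg(x, sigma):             # sigma-strongly convex regularizer
    return 0.5 * sigma * np.dot(x, x)

def l1_reg(x, lam):               # non-strongly-convex regularizer
    return lam * np.abs(x).sum()

# e.g. Case 2 (Lasso):  F = lambda x: squared_loss(A, b, x) + l1_reg(x, lam)
# e.g. Case 3 (SVM):    F = lambda x: hinge_loss(A, b, x) + l2_reg(x, sigma)
```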

Algorithm 1: The AdaptReg Reduction
Input: an objective F(·) in Case 2 (smooth and not necessarily strongly convex); x_0 a starting vector; σ_0 an initial regularization parameter; T the number of epochs; an algorithm A that solves Case 1 of problem (1.1).
Output: x̂_T.
1: x̂_0 ← x_0.
2: for t ← 0 to T − 1 do
3:   Define F^(σ_t)(x) := (σ_t/2)‖x − x_0‖² + F(x).
4:   x̂_{t+1} ← A(F^(σ_t), x̂_t).
5:   σ_{t+1} ← σ_t/2.
6: end for
7: return x̂_T.

In this paper, we denote by C the time needed for computing a full gradient ∇f(x) and performing a proximal gradient update of the form x′ ← arg min_{x′} { ½‖x′ − x‖² + η(⟨∇f(x), x′ − x⟩ + ψ(x′)) }. For the finite-sum case of problem (1.1), we denote by C₁ the time needed for computing a stochastic (sub-)gradient f′_i(⟨a_i, x⟩) and performing a proximal gradient update of the form x′ ← arg min_{x′} { ½‖x′ − x‖² + η(⟨f′_i(⟨a_i, x⟩)a_i, x′ − x⟩ + ψ(x′)) }. For finite-sum forms of (1.1), C is usually on the magnitude of n · C₁.

3 AdaptReg: Reduction from Case 2 to Case 1

We now focus on solving Case 2 of problem (1.1): that is, f(·) is L-smooth, but ψ(·) is not necessarily SC. We achieve so by reducing the problem to an algorithm A solving Case 1 that satisfies HOOD.

AdaptReg works as follows (see Algorithm 1). At the beginning of AdaptReg, we set x̂_0 to equal x_0, an arbitrary given starting vector. AdaptReg consists of T epochs. At each epoch t = 0, 1, ..., T − 1, we define a σ_t-strongly convex objective F^(σ_t)(x) := (σ_t/2)‖x − x_0‖² + F(x). Here, the parameter σ_{t+1} = σ_t/2 for each t, and σ_0 is an input parameter to AdaptReg that will be specified later. We run A on F^(σ_t)(x) with starting vector x̂_t in each epoch, and let the output be x̂_{t+1}. After all T epochs are finished, AdaptReg simply outputs x̂_T. We state our main theorem for AdaptReg below and prove it in Section 3.1.

Theorem 3.1 (AdaptReg). Suppose that in problem (1.1) f(·) is L-smooth. Let x_0 be a starting vector such that F(x_0) − F(x*) ≤ Δ and ‖x_0 − x*‖² ≤ Θ. Then, AdaptReg with σ_0 = Δ/Θ and T = log₂(Δ/ε) produces an output x̂_T satisfying F(x̂_T) − min_x F(x) ≤ O(ε) in a total running time of Σ_{t=0}^{T−1} Time(L, σ_t). (Footnote 5: If the HOOD property is only satisfied probabilistically, as per Footnote 4, our error guarantee becomes probabilistic, i.e., E[F(x̂_T)] − min_x F(x) ≤ O(ε). This is also true for the other reduction theorems of this paper.)

Remark 3.2. We compare the parameter-tuning effort needed for AdaptReg against the classical regularization reduction. In the classical reduction, there are two parameters: T, the number of iterations, which does not need tuning; and σ, which had better equal ε/Θ, an unknown quantity, so it requires tuning. In AdaptReg, we also need to tune only one parameter, namely σ_0. Our T need not be tuned because AdaptReg can be interrupted at any moment, and the x̂_t of the current epoch can be outputted. In our experiments later, we spent the same effort tuning σ in the classical reduction and σ_0 in AdaptReg. As can easily be seen from the plots, tuning σ_0 is much easier than tuning σ.
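The following is a minimal runnable sketch of Algorithm 1 (Python/NumPy; all names are ours). The Case-1 oracle is stubbed with a plain proximal-gradient loop for ψ = λ‖·‖₁, whose iteration count of order L/σ is a heuristic stand-in for any HOOD-satisfying algorithm such as SVRG or Katyusha:

```python
import numpy as np

def soft_threshold(v, t):
    # prox operator of t*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_gradient_oracle(grad_f, lam, L, sigma, x_start, x0):
    # Placeholder Case-1 oracle for F^(sigma)(x) = (sigma/2)||x - x0||^2 + f(x) + lam*||x||_1.
    # Plain proximal gradient descent needs O(L/sigma) steps to cut the objective
    # distance to the minimum by a constant factor (the HOOD property).
    x = x_start.copy()
    eta = 1.0 / (L + sigma)                        # step size for the smooth part
    for _ in range(int(np.ceil(2 * (L + sigma) / sigma))):
        g = grad_f(x) + sigma * (x - x0)           # gradient of the smooth part of F^(sigma)
        x = soft_threshold(x - eta * g, eta * lam)
    return x

def adapt_reg(grad_f, lam, L, x0, sigma0, T):
    # Algorithm 1: one oracle call per epoch, halving the regularization weight.
    x_hat, sigma = x0.copy(), sigma0
    for _ in range(T):
        x_hat = prox_gradient_oracle(grad_f, lam, L, sigma, x_hat, x0)
        sigma /= 2.0
    return x_hat
```

For instance, for Lasso one would pass grad_f as x ↦ Aᵀ(Ax − b)/n and take L to be the largest eigenvalue of AᵀA/n.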

Corollary 3.3.
- When AdaptReg is applied to SVRG, we solve the finite-sum case of Case 2 with running time Σ_{t=0}^{T−1} Time(L, σ_t) = Σ_{t=0}^{T−1} O(n + L·2^t/σ_0) · C₁ = O(n log(Δ/ε) + LΘ/ε) · C₁. This is faster than the O((n + LΘ/ε) · log(Δ/ε)) · C₁ obtained through the old reduction, and faster than the O((n + LΘ)/ε) · C₁ obtained by SAGA [7] and SAG [25].
- When AdaptReg is applied to Katyusha, we solve the finite-sum case of Case 2 with running time Σ_{t=0}^{T−1} Time(L, σ_t) = Σ_{t=0}^{T−1} O(n + √(nL·2^t/σ_0)) · C₁ = O(n log(Δ/ε) + √(nLΘ/ε)) · C₁. This is faster than the O((n + √(nL/ε)) · log(Δ/ε)) · C₁ obtained through the old reduction on Katyusha [1]. (Footnote 6: If the old reduction is applied on APCG, SPDC, or AccSDCA rather than Katyusha, then two log factors will be lost.)

3.1 Convergence Analysis for AdaptReg

For analysis purposes, we define x*_{t+1} to be the exact minimizer of F^(σ_t)(x). The HOOD property of A ensures that

    F^(σ_t)(x̂_{t+1}) − F^(σ_t)(x*_{t+1}) ≤ (1/4) · (F^(σ_t)(x̂_t) − F^(σ_t)(x*_{t+1})).    (3.1)

We denote by x* an arbitrary minimizer of F(x), and the following claim states a simple property about the minimizers of F^(σ_t)(x):

Claim 3.4. We have ‖x*_{t+1} − x*‖ ≤ ‖x_0 − x*‖ for each t.

Proof. By the strong convexity of F^(σ_t)(x) and the fact that x*_{t+1} is its exact minimizer, we have F^(σ_t)(x*_{t+1}) ≤ F^(σ_t)(x*) − (σ_t/2)‖x*_{t+1} − x*‖². Using the fact that F^(σ_t)(x*_{t+1}) ≥ F(x*_{t+1}), as well as the definition F^(σ_t)(x*) = (σ_t/2)‖x_0 − x*‖² + F(x*), we immediately have (σ_t/2)‖x_0 − x*‖² − (σ_t/2)‖x*_{t+1} − x*‖² ≥ F(x*_{t+1}) − F(x*) ≥ 0. ∎

Define D_t := F^(σ_t)(x̂_t) − F^(σ_t)(x*_{t+1}) to be the initial objective distance to the minimum on function F^(σ_t) before we call A in epoch t. At epoch 0, we have the upper bound D_0 = F^(σ_0)(x̂_0) − min_x F^(σ_0)(x) ≤ F(x_0) − F(x*). For each epoch t ≥ 1, we compute that

    D_t  =¹ F^(σ_{t−1})(x̂_t) − ((σ_{t−1} − σ_t)/2)‖x_0 − x̂_t‖² − F^(σ_{t−1})(x*_{t+1}) + ((σ_{t−1} − σ_t)/2)‖x_0 − x*_{t+1}‖²
         ≤² F^(σ_{t−1})(x̂_t) − ((σ_{t−1} − σ_t)/2)‖x_0 − x̂_t‖² − F^(σ_{t−1})(x*_t) − (σ_{t−1}/2)‖x*_t − x*_{t+1}‖² + ((σ_{t−1} − σ_t)/2)‖x_0 − x*_{t+1}‖²
         ≤  F^(σ_{t−1})(x̂_t) − F^(σ_{t−1})(x*_t) + ((σ_{t−1} − σ_t)/2)‖x_0 − x*_{t+1}‖²
         ≤³ F^(σ_{t−1})(x̂_t) − F^(σ_{t−1})(x*_t) + (σ_{t−1} − σ_t)(‖x_0 − x*‖² + ‖x*_{t+1} − x*‖²)
         ≤⁴ F^(σ_{t−1})(x̂_t) − F^(σ_{t−1})(x*_t) + 2(σ_{t−1} − σ_t)‖x_0 − x*‖²
         ≤⁵ D_{t−1}/4 + 2(σ_{t−1} − σ_t)‖x_0 − x*‖²
         =⁶ D_{t−1}/4 + 2σ_t‖x_0 − x*‖².

Above, ① follows from the definitions of F^(σ_t)(·) and F^(σ_{t−1})(·); ② follows from the strong convexity of F^(σ_{t−1})(·) as well as the fact that x*_t is its minimizer; ③ follows because for any two vectors a, b

it satisfies ‖a − b‖² ≤ 2‖a‖² + 2‖b‖²; ④ follows from Claim 3.4; ⑤ follows from the definition of D_{t−1} and (3.1); and ⑥ uses the choice σ_t = σ_{t−1}/2 for t ≥ 1.

Recursively applying the above inequality, we have

    D_T ≤ D_0/4^T + 2‖x_0 − x*‖²(σ_T + σ_{T−1}/4 + σ_{T−2}/4² + ···) ≤ (1/4^T)(F(x_0) − F(x*)) + 4σ_T‖x_0 − x*‖²,    (3.2)

where the second inequality uses our choice σ_t = σ_{t−1}/2. In sum, we obtain a vector x̂_T satisfying

    F(x̂_T) − F(x*) ≤¹ F^(σ_T)(x̂_T) − F^(σ_T)(x*) + (σ_T/2)‖x_0 − x*‖²
                   ≤² F^(σ_T)(x̂_T) − F^(σ_T)(x*_{T+1}) + (σ_T/2)‖x_0 − x*‖²
                   =³ D_T + (σ_T/2)‖x_0 − x*‖²
                   ≤⁴ (1/4^T)(F(x_0) − F(x*)) + 4.5σ_T‖x_0 − x*‖².    (3.3)

Above, ① uses the fact that F^(σ_T)(x) ≥ F(x) for every x; ② uses the definition of x*_{T+1} as the minimizer of F^(σ_T)(·); ③ uses the definition of D_T; and ④ uses (3.2). Finally, after appropriately choosing σ_0 and T, (3.3) directly implies Theorem 3.1.

4 AdaptSmooth: Reduction from Case 3 to Case 1

We now focus on solving the finite-sum form of Case 3 of problem (1.1). That is,

    min_x F(x) = (1/n) Σ_{i=1}^n f_i(⟨a_i, x⟩) + ψ(x),

where ψ(x) is σ-strongly convex and each f_i(·) may not be smooth (but is Lipschitz continuous). Without loss of generality, we assume ‖a_i‖ = 1 for each i ∈ [n], because otherwise one can scale f_i accordingly. We solve this problem by reducing it to an oracle A which solves the finite-sum form of Case 1 and satisfies HOOD. Recall the following definition using the Fenchel conjugate: (Footnote 7: For every explicitly given f_i(·), this Fenchel conjugate can be symbolically computed and fed into the algorithm. This pre-processing is needed for nearly all known algorithms in order for them to apply to non-smooth settings (such as SVRG, SAGA, SPDC, APCG, SDCA, etc.). SGD and its strongly convex variant PEGASOS are the only known methods which do not need this computation; however, they are not accelerated methods.)

Definition 4.1. For each function f_i : ℝ → ℝ, let f*_i(β) := max_α {αβ − f_i(α)} be its Fenchel conjugate. Then, we define the following smoothed variant of f_i, parameterized by λ > 0:

    f_i^(λ)(α) := max_β { βα − f*_i(β) − (λ/2)β² }.

Accordingly, we define

    F^(λ)(x) := (1/n) Σ_{i=1}^n f_i^(λ)(⟨a_i, x⟩) + ψ(x).

From the properties of the Fenchel conjugate (see for instance the textbook [26]), we know that f_i^(λ)(·) is a (1/λ)-smooth function, and therefore the objective F^(λ)(x) falls into the finite-sum form of Case 1 for problem (1.1) with smoothness parameter L = 1/λ.

Our AdaptSmooth works as follows (see Algorithm 2 in Appendix B). At the beginning of AdaptSmooth, we set x̂_0 to equal x_0, an arbitrary given starting vector. AdaptSmooth consists of

T epochs. At each epoch t = 0, 1, ..., T − 1, we define a (1/λ_t)-smooth objective F^(λ_t)(x) using Definition 4.1. Here, the parameter λ_{t+1} = λ_t/2 for each t, and λ_0 is an input parameter to AdaptSmooth that will be specified later. We run A on F^(λ_t)(x) with starting vector x̂_t in each epoch, and let the output be x̂_{t+1}. After all T epochs are finished, AdaptSmooth outputs x̂_T. (Alternatively, if one sets T to be infinity, AdaptSmooth can be interrupted at an arbitrary moment and output the x̂_t of the current epoch.) We state our main theorem for AdaptSmooth below and prove it in Appendix B.

Theorem 4.2. Suppose that in problem (1.1), ψ(·) is σ-strongly convex and each f_i(·) is G-Lipschitz continuous. Let x_0 be a starting vector such that F(x_0) − F(x*) ≤ Δ. Then, AdaptSmooth with λ_0 = Δ/G² and T = log₂(Δ/ε) produces an output x̂_T satisfying F(x̂_T) − min_x F(x) ≤ O(ε) in a total running time of Σ_{t=0}^{T−1} Time(2^t/λ_0, σ).

Remark 4.3. We emphasize that AdaptSmooth requires less parameter-tuning effort than the old reduction, for the same reason as in Remark 3.2. Also, AdaptSmooth, when applied to Katyusha, provides the fastest running time on solving the Case 3 finite-sum form of (1.1), similar to Corollary 3.3.

5 JointAdaptRegSmooth: From Case 4 to Case 1

We show in Appendix C that AdaptReg and AdaptSmooth can work together to reduce the finite-sum form of Case 4 to Case 1. We call this reduction JointAdaptRegSmooth, and it relies on a jointly exponentially decreasing sequence of (σ_t, λ_t), where σ_t is the weight of the convexity parameter that we add on top of F(x), and λ_t is the smoothing parameter that determines how we change each f_i(·). The analysis is analogous to a careful combination of the proofs for AdaptReg and AdaptSmooth.

6 Experiments

We perform experiments to confirm our theoretical speed-ups obtained for AdaptSmooth and AdaptReg. We work on minimizing Lasso and SVM objectives for the following three well-known datasets that can be found on the LibSVM website [8]: covtype, mnist, and rcv1. We defer some dataset and implementation details to Appendix A.

6.1 Experiments on AdaptReg

To test the performance of AdaptReg, consider the Lasso objective, which is non-SC but smooth. We apply AdaptReg to reduce it to Case 1 and apply either APCG [18], an accelerated method, or (Prox-)SDCA [27, 28], a non-accelerated method. Let us make a few remarks:
- APCG and SDCA are both indirect solvers for non-strongly-convex objectives, and therefore regularization is intrinsically required in order to run them for Lasso, or more generally for Case 2. (Footnote 8: Note that some other methods, such as SVRG, although only providing theoretical results for strongly convex and smooth objectives (Case 1), in practice work for Case 2 directly. Therefore, it is not needed to apply AdaptReg to such methods, at least in practice.)
- APCG and SDCA do not satisfy HOOD in theory. However, they still benefit from AdaptReg, as we shall see, demonstrating the practical value of AdaptReg.

A Practical Implementation. In principle, one can implement AdaptReg by setting the termination criteria of the oracle in the inner loop precisely as suggested by the theory, such as setting the number of iterations for SDCA to be exactly O(n + L/σ_t) in the t-th epoch.

[Figure 1 appears here: three panels, (a) covtype, (b) mnist, (c) rcv1.]

Figure 1: Comparing AdaptReg and the classical reduction on Lasso (with ℓ₁ regularizer weight λ). The y-axis is the objective distance to the minimum, and the x-axis is the number of passes of the dataset. The blue solid curves represent APCG under the old regularization reduction, and the red dashed curve represents APCG under AdaptReg. For other values of λ, or the results on SDCA, please refer to Figures 3 and 4 in the appendix.

However, in practice, it is more desirable to automatically terminate the oracle whenever the objective distance to the minimum has been sufficiently decreased. In all of our experiments, we simply compute the duality gap and terminate the oracle whenever the duality gap is below 1/4 times the last recorded duality gap of the previous epoch. For details, see Appendix A.

Experimental Results. For each dataset, we consider three different magnitudes of regularization weights for the ℓ₁ regularizer in the Lasso objective. This totals 9 analysis tasks for each algorithm. For each such task, we first implement the old reduction by adding an additional (σ/2)‖x‖² term to the Lasso objective and then applying APCG or SDCA. We consider values of σ in the set {10^{−k}, 3·10^{−k} : k ∈ ℤ} and show the most representative six of them in the plots (blue solid curves in Figure 3 and Figure 4). Naturally, for a larger value of σ, the old reduction converges faster, but to a point that is farther from the exact minimizer because of the bias. We implement AdaptReg, where we choose the initial parameter σ_0 also from the set {10^{−k}, 3·10^{−k} : k ∈ ℤ}, and present the best one in each of the 18 plots (red dashed curves in Figure 3 and Figure 4). Due to space limitations, we provide only 3 of the 18 plots, for medium-sized λ, in the main body of this paper (see Figure 1), and include Figures 3 and 4 only in the appendix. It is clear from our experiments that
- AdaptReg is more efficient than the old regularization reduction;
- AdaptReg requires no more parameter tuning than the classical reduction;
- AdaptReg is unbiased, so it simplifies the parameter selection procedure. (Footnote 9: It is easy to determine the best σ_0 in AdaptReg; in contrast, in the old reduction, if the desired error is somehow changed for the application, one has to select a different σ and restart the algorithm.)

6.2 Experiments on AdaptSmooth

To test the performance of AdaptSmooth, consider the SVM objective, which is non-smooth but SC. We apply AdaptSmooth to reduce it to Case 1 and apply SVRG [12]. We emphasize that SVRG is an indirect solver for non-smooth objectives, and therefore smoothing is intrinsically required in order to run SVRG for SVM, or more generally for Case 3. (Footnote 10: Note that some other methods, such as APCG or SDCA, although only providing theoretical guarantees for strongly convex and smooth objectives (Case 1), in practice work for Case 3 directly without smoothing (see for instance the discussion in [27]). Therefore, it is unnecessary to apply AdaptSmooth to such methods, at least in practice.)
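For the SVM experiments, the smoothing of Definition 4.1 has a simple closed form for the hinge loss. Below is a small sketch (Python; the naming is ours) of f^(λ) for f(α) = max{0, 1 − α}, whose Fenchel conjugate is f*(β) = β on β ∈ [−1, 0]; the resulting function is (1/λ)-smooth and lies within λG²/2 of the hinge loss (here G = 1), matching Lemma B.1:

```python
def smoothed_hinge(alpha, lam):
    # f^(lambda)(alpha) = max_{beta in [-1, 0]} { beta*alpha - beta - (lam/2)*beta^2 }
    if alpha >= 1.0:                   # maximizer beta* = 0
        return 0.0
    if alpha <= 1.0 - lam:             # maximizer beta* = -1
        return 1.0 - alpha - lam / 2.0
    return (1.0 - alpha) ** 2 / (2.0 * lam)   # interior maximizer beta* = (alpha - 1)/lam
```

The three branches agree at the boundaries (both middle expressions equal λ/2 at α = 1 − λ), so the function is continuously differentiable, as the theory requires.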

[Figure 2 appears here: three panels, (a) covtype, (b) mnist, (c) rcv1.]

Figure 2: Comparing AdaptSmooth and the classical reduction on SVM (with ℓ₂ regularizer weight σ). The y-axis is the objective distance to the minimum, and the x-axis is the number of passes of the dataset. The blue solid curves represent SVRG under the old smoothing reduction, and the red dashed curve represents SVRG under AdaptSmooth. For other values of σ, please refer to Figure 5 in the appendix.

A Practical Implementation. In principle, one can implement AdaptSmooth by setting the termination criteria of the oracle in the inner loop precisely as suggested by the theory, such as setting the number of iterations for SVRG to be exactly O(n + 1/(σλ_t)) in the t-th epoch. In practice, however, it is more desirable to automatically terminate the oracle whenever the objective distance to the minimum has been sufficiently decreased. In all of our experiments, we simply compute the Euclidean norm of the full gradient of the objective, and terminate the oracle whenever this norm is below 1/3 times the last recorded Euclidean norm of the previous epoch. For details, see Appendix A.

Experimental Results. For each dataset, we consider three different magnitudes of regularization weights for the ℓ₂ regularizer in the SVM objective. This totals 9 analysis tasks. For each such task, we first implement the old reduction by smoothing the hinge loss functions (using Definition 4.1) with parameter λ > 0 and then applying SVRG. We consider different values of λ in the set {10^{−k}, 3·10^{−k} : k ∈ ℤ} and show the most representative six of them in the plots (blue solid curves in Figure 5). Naturally, for a larger λ, the old reduction converges faster, but to a point that is farther from the exact minimizer due to its bias. We then implement AdaptSmooth, where we choose the initial smoothing parameter λ_0 also from the set {10^{−k}, 3·10^{−k} : k ∈ ℤ}, and present the best one in each of the 9 plots (red dashed curves in Figure 5). Due to space limitations, we provide only 3 of the 9 plots, for small-sized σ, in the main body of this paper (see Figure 2), and include Figure 5 only in the appendix. It is clear from our experiments that
- AdaptSmooth is more efficient than the old smoothing reduction, especially when the desired training error is small;
- AdaptSmooth requires no more parameter tuning than the classical reduction;
- AdaptSmooth is unbiased, so it simplifies parameter selection, for the same reason as in Footnote 9.

Acknowledgements

We thank Yang Yuan for very enlightening conversations, and Alon Gonen for catching a few typos in an earlier version of this paper. This paper is partially supported by a Microsoft Research Grant.

References

[1] Zeyuan Allen-Zhu. Katyusha: Accelerated Variance Reduction for Faster SGD. ArXiv e-prints, abs/1603.05953, March 2016.
[2] Zeyuan Allen-Zhu and Lorenzo Orecchia. Linear coupling: An ultimate unification of gradient and mirror descent. ArXiv e-prints, abs/1407.1537, July 2014.
[3] Zeyuan Allen-Zhu, Peter Richtárik, Zheng Qu, and Yang Yuan. Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling. In ICML, 2016.
[4] Zeyuan Allen-Zhu and Yang Yuan. Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives. In ICML, 2016.
[5] Sébastien Bubeck, Yin Tat Lee, and Mohit Singh. A geometric alternative to Nesterov's accelerated gradient descent. ArXiv e-prints, abs/1506.08187, June 2015.
[6] Antonin Chambolle and Thomas Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.
[7] Aaron Defazio, Francis Bach, and Simon Lacoste-Julien. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives. In NIPS, 2014.
[8] Rong-En Fan and Chih-Jen Lin. LIBSVM Data: Classification, Regression and Multi-label. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
[9] Roy Frostig, Rong Ge, Sham M. Kakade, and Aaron Sidford. Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization. In ICML, volume 37, 2015.
[10] Elad Hazan. DRAFT: Introduction to online convex optimization. Foundations and Trends in Machine Learning, XX(XX):1–168, 2015.
[11] Elad Hazan and Satyen Kale. Beyond the regret minimization barrier: Optimal algorithms for stochastic strongly-convex optimization. The Journal of Machine Learning Research, 15(1):2489–2512, 2014.
[12] Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems, NIPS 2013, pages 315–323, 2013.
[13] Simon Lacoste-Julien, Mark W. Schmidt, and Francis R. Bach. A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method. ArXiv e-prints, abs/1212.2002, 2012.
[14] Yin Tat Lee and Aaron Sidford. Efficient accelerated coordinate descent methods and faster algorithms for solving linear systems. In FOCS, pages 147–156. IEEE, 2013.
[15] Laurent Lessard, Benjamin Recht, and Andrew Packard. Analysis and design of optimization algorithms via integral quadratic constraints. CoRR, abs/1408.3595, 2014.
[16] Hongzhou Lin. Private communication, 2016.

[17] Hongzhou Lin, Julien Mairal, and Zaid Harchaoui. A Universal Catalyst for First-Order Optimization. In NIPS, 2015.
[18] Qihang Lin, Zhaosong Lu, and Lin Xiao. An Accelerated Proximal Coordinate Gradient Method and its Application to Regularized Empirical Risk Minimization. In NIPS, pages 3059–3067, 2014.
[19] Arkadi Nemirovski. Prox-Method with Rate of Convergence O(1/t) for Variational Inequalities with Lipschitz Continuous Monotone Operators and Smooth Convex-Concave Saddle Point Problems. SIAM Journal on Optimization, 15(1):229–251, January 2004.
[20] Yurii Nesterov. A method of solving a convex programming problem with convergence rate O(1/k²). In Doklady AN SSSR (translated as Soviet Mathematics Doklady), volume 269, pages 543–547, 1983.
[21] Yurii Nesterov. Introductory Lectures on Convex Programming, Volume I: A Basic Course. Kluwer Academic Publishers, 2004.
[22] Yurii Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–152, December 2005.
[23] Francesco Orabona, Andreas Argyriou, and Nathan Srebro. PRISMA: Proximal iterative smoothing algorithm. arXiv preprint arXiv:1206.2372, 2012.
[24] Alexander Rakhlin, Ohad Shamir, and Karthik Sridharan. Making gradient descent optimal for strongly convex stochastic optimization. In ICML, 2012.
[25] Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing finite sums with the stochastic average gradient. arXiv preprint arXiv:1309.2388, 2013. Preliminary version appeared in NIPS 2012.
[26] Shai Shalev-Shwartz. Online Learning and Online Convex Optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2012.
[27] Shai Shalev-Shwartz and Tong Zhang. Proximal Stochastic Dual Coordinate Ascent. arXiv preprint arXiv:1211.2717, pages 1–18, 2012.
[28] Shai Shalev-Shwartz and Tong Zhang. Stochastic dual coordinate ascent methods for regularized loss minimization. Journal of Machine Learning Research, 14:567–599, 2013.
[29] Shai Shalev-Shwartz and Tong Zhang. Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization. In ICML, pages 64–72, 2014.
[30] Blake Woodworth and Nati Srebro. Tight Complexity Bounds for Optimizing Composite Objectives. Working manuscript, 2016.
[31] Lin Xiao and Tong Zhang. A Proximal Stochastic Gradient Method with Progressive Variance Reduction. SIAM Journal on Optimization, 24(4):2057–2075, 2014.
[32] Yuchen Zhang and Lin Xiao. Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization. In ICML, 2015.

Appendix

A Experiment Details

The datasets we used in this paper are downloaded from the LibSVM website [8]:
- the covtype (binary.scale) dataset (581,012 samples and 54 features);
- the mnist (class 1) dataset (60,000 samples and 784 features);
- the rcv1 (train.binary) dataset (20,242 samples and 47,236 features).

To make comparison across datasets easier, we scale every vector by the average Euclidean norm of all the vectors in the dataset. In other words, we ensure that the data vectors have an average Euclidean norm of 1 (see the sketch after Figure 3). This step is for comparison only and is not necessary in practice.

We use the default step-length choice for APCG, which requires solving a quadratic univariate function per iteration; for SDCA, to avoid the issue of tuning step lengths, we use the steepest descent (i.e., automatic) choice, which is Option I for SDCA [27]; for SVRG, we use the default step length η = 1/L.

[Figure 3 appears here: nine panels, (a)–(c) covtype, (d)–(f) mnist, (g)–(i) rcv1, one per regularization weight λ.]

Figure 3: Performance comparison for Lasso with weight λ on the ℓ₁ regularizer. The y-axis represents the objective distance to the minimum, and the x-axis represents the number of passes of the dataset. The blue solid curves represent APCG under the old regularization reduction, and the red dashed curve represents APCG under AdaptReg.
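The normalization step described above can be stated in two lines; here is a sketch (Python/NumPy; the naming is ours) of scaling a data matrix so that its rows have average Euclidean norm 1:

```python
import numpy as np

def normalize_average_norm(A):
    # Scale the whole dataset by the average Euclidean norm of its rows,
    # so that the data vectors have an average Euclidean norm of 1.
    avg_norm = np.linalg.norm(A, axis=1).mean()
    return A / avg_norm
```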

[Figure 4 appears here: nine panels, (a)–(c) covtype, (d)–(f) mnist, (g)–(i) rcv1, one per regularization weight λ.]

Figure 4: Performance comparison for Lasso with weight λ on the ℓ₁ regularizer. The y-axis represents the objective distance to the minimum, and the x-axis represents the number of passes of the dataset. The blue solid curves represent SDCA under the old regularization reduction, and the red dashed curve represents SDCA under AdaptReg.

When applying our reductions, it is desirable to automatically terminate the oracle whenever the objective distance to the minimum has been sufficiently decreased, say, by a factor of 4. Unfortunately, the oracle usually does not know the exact minimizer and cannot compute the exact objective distance to the minimum (i.e., D_t). Instead, we use the following heuristics, which were also used by other reduction methods such as Catalyst [16].

- Since SDCA and APCG are primal-dual methods, in our experiments we compute instead the duality gap, which gives a reasonable approximation of D_t. More specifically, for both experiments, we compute the duality gap every n/3 iterations inside the implementation of APCG/SDCA, and terminate it whenever the duality gap is below 1/4 times the last recorded duality gap of the previous epoch. Although one can further tune this parameter 1/4 for better performance, to perform a fair comparison we simply set it to be identically 1/4 across all the datasets and analysis tasks.
- When applying SVRG to Lasso, we cannot compute the duality gap because the objective is not strongly convex. In our experiments, we compute instead the Euclidean norm of the full gradient of the objective (i.e., ‖∇F(x)‖), which gives a reasonable approximation of D_t. More specifically, we use the default setting of SVRG Option I.

[Figure 5 appears here: nine panels, (a)–(c) covtype, (d)–(f) mnist, (g)–(i) rcv1, one per regularization weight σ.]

Figure 5: Performance comparison for ℓ₂-SVM with weight σ on the ℓ₂ regularizer. The y-axis represents the objective distance to the minimum, and the x-axis represents the number of passes of the dataset. The blue solid curves represent SVRG under the old smoothing reduction, and the red dashed curve represents SVRG under AdaptSmooth.

That is, SVRG computes a gradient snapshot every n iterations. When a gradient snapshot is computed, we can also compute its Euclidean norm almost for free. If this norm is below 1/3 times the last norm-of-gradient of the previous epoch, we terminate SVRG for the current epoch. Note that one can further tune this parameter 1/3 for better performance; however, to perform a fair comparison in this paper, we simply set it to be identically 1/3 across all the datasets and analysis tasks.

B Convergence Analysis for AdaptSmooth

We first recall the following property, which bounds the difference between f_i^(λ) and f_i as a function of λ:

Lemma B.1. If each f_i(·) is G-Lipschitz continuous, it satisfies f_i(α) − λG²/2 ≤ f_i^(λ)(α) ≤ f_i(α).

Proof. Letting β* = arg max_β {βα − f*_i(β)}, we have β* ∈ [−G, G], because the domain of f*_i(·) equals the range of f′_i(·), which is a subset of [−G, G] due to the Lipschitz continuity of f_i(·). As

Algorithm 2: The AdaptSmooth Reduction
Input: an objective F(·) in the finite-sum form of Case 3 (strongly convex and not necessarily smooth); x_0 a starting vector; λ_0 an initial smoothing parameter; T the number of epochs; an algorithm A that solves the finite-sum form of Case 1 for problem (1.1).
Output: x̂_T.
1: x̂_0 ← x_0.
2: for t ← 0 to T − 1 do
3:   Define F^(λ_t)(x) := (1/n) Σ_{i=1}^n f_i^(λ_t)(⟨a_i, x⟩) + ψ(x), using Definition 4.1.
4:   x̂_{t+1} ← A(F^(λ_t), x̂_t).
5:   λ_{t+1} ← λ_t/2.
6: end for
7: return x̂_T.

a result, we have

    f_i(α) = max_β {βα − f*_i(β)} = β*α − f*_i(β*) − (λ/2)(β*)² + (λ/2)(β*)²
           ≤ max_β {βα − f*_i(β) − (λ/2)β²} + (λ/2)(β*)² = f_i^(λ)(α) + (λ/2)(β*)² ≤ f_i^(λ)(α) + λG²/2.

The other inequality is obvious. ∎

We also note that:

Fact B.2. For λ₁ ≥ λ₂ ≥ 0, we have f_i^(λ₁)(α) ≤ f_i^(λ₂)(α) for every α ∈ ℝ.

For analysis purposes only, we define x*_{t+1} to be the exact minimizer of F^(λ_t)(x). The HOOD property of the given oracle A ensures that

    F^(λ_t)(x̂_{t+1}) − F^(λ_t)(x*_{t+1}) ≤ (1/4) · (F^(λ_t)(x̂_t) − F^(λ_t)(x*_{t+1})).    (B.1)

We denote by x* the minimizer of F(x), and define D_t := F^(λ_t)(x̂_t) − F^(λ_t)(x*_{t+1}) to be the initial objective distance to the minimum on function F^(λ_t)(·) before we call A in epoch t. At epoch 0, we simply have the upper bound

    D_0 = F^(λ_0)(x̂_0) − F^(λ_0)(x*_1) ≤ F(x_0) − F(x*_1) + λ_0G²/2 ≤ F(x_0) − F(x*) + λ_0G²/2.

Above, the first inequality is by Lemma B.1 and Fact B.2, and the second inequality is because x* is the minimizer of F(·). Next, for each epoch t ≥ 1, we compute that

    D_t := F^(λ_t)(x̂_t) − F^(λ_t)(x*_{t+1}) ≤ F^(λ_{t−1})(x̂_t) + λ_{t−1}G²/2 − F^(λ_{t−1})(x*_{t+1})
         ≤ F^(λ_{t−1})(x̂_t) + λ_{t−1}G²/2 − F^(λ_{t−1})(x*_t) ≤ D_{t−1}/4 + λ_{t−1}G²/2.

Above, the first inequality is by Lemma B.1 and Fact B.2; the second inequality is because x*_t is the minimizer of F^(λ_{t−1})(·); and the last inequality uses (B.1).

Therefore, by telescoping the above inequality with the choice λ_t = λ_{t−1}/2, we have

    D_T ≤ (F(x_0) − F(x*))/4^T + (G²/2)·(λ_{T−1} + λ_{T−2}/4 + ···) + λ_0G²/(2·4^T)
        ≤ (F(x_0) − F(x*))/4^T + 2λ_TG².

In sum, we obtain a vector x̂_T satisfying

    F(x̂_T) − F(x*) ≤ F^(λ_T)(x̂_T) − F^(λ_T)(x*) + λ_TG²/2
                   ≤ F^(λ_T)(x̂_T) − F^(λ_T)(x*_{T+1}) + λ_TG²/2
                   = D_T + λ_TG²/2
                   ≤ (1/4^T)(F(x_0) − F(x*)) + 2.5λ_TG².    (B.2)

Finally, after appropriately choosing λ_0 and T, (B.2) directly implies Theorem 4.2.

C JointAdaptRegSmooth: Reduction from Case 4 to Case 1

In this section, we show that AdaptReg and AdaptSmooth can work together to solve the finite-sum form of Case 4. That is,

    min_x F(x) = (1/n) Σ_{i=1}^n f_i(⟨a_i, x⟩) + ψ(x),

where ψ(x) is not necessarily strongly convex and each f_i(·) may not be smooth (but is Lipschitz continuous). Without loss of generality, we assume ‖a_i‖ = 1 for each i ∈ [n]. We solve this problem by reducing it to an algorithm A solving the finite-sum form of Case 1 that satisfies HOOD.

Following the same definition of f_i^(λ)(·) in Definition 4.1, in this section we consider the following regularized, smoothed objective F^(λ,σ)(x):

Definition C.1. Given parameters λ, σ > 0, let

    F^(λ,σ)(x) := (1/n) Σ_{i=1}^n f_i^(λ)(⟨a_i, x⟩) + ψ(x) + (σ/2)‖x − x_0‖².

From this definition, we know that F^(λ,σ)(x) falls into the finite-sum form of Case 1 for problem (1.1), with L = 1/λ the smoothness parameter and σ the strong convexity parameter.

JointAdaptRegSmooth works as follows (see Algorithm 3). At the beginning of the reduction, we set x̂_0 to equal x_0, an arbitrary given starting vector. JointAdaptRegSmooth consists of T epochs. At each epoch t = 0, 1, ..., T − 1, we define a (1/λ_t)-smooth, σ_t-strongly convex objective F^(λ_t,σ_t)(x) using Definition C.1 above. Here, the parameters λ_{t+1} = λ_t/2 and σ_{t+1} = σ_t/2 for each t, and λ_0, σ_0 are two input parameters to JointAdaptRegSmooth that will be specified later. We run A on F^(λ_t,σ_t)(x) with starting vector x̂_t in each epoch, and let the output be x̂_{t+1}. After all T epochs are finished, JointAdaptRegSmooth simply outputs x̂_T. (Alternatively, if one sets T to be infinity, JointAdaptRegSmooth can be interrupted at an arbitrary moment and output the x̂_t of the current epoch.) We state our main theorem for JointAdaptRegSmooth below and prove it in Appendix D.

Theorem C.2. Suppose that in problem (1.1), each f_i(·) is G-Lipschitz continuous. Let x_0 be a starting vector such that F(x_0) − F(x*) ≤ Δ and ‖x_0 − x*‖² ≤ Θ. Then, JointAdaptRegSmooth with λ_0 = Δ/G², σ_0 = Δ/Θ and T = log₂(Δ/ε) produces an output x̂_T satisfying F(x̂_T) − min_x F(x) ≤ O(ε) in a total running time of Σ_{t=0}^{T−1} Time(2^t/λ_0, σ_t).
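A minimal sketch of this joint loop (Python; the naming is ours, and A stands for any HOOD-satisfying Case-1 oracle applied to the objective F^(λ_t,σ_t) of Definition C.1) simply combines the AdaptReg and AdaptSmooth updates:

```python
def joint_adapt_reg_smooth(A, x0, lambda0, sigma0, T):
    # Halve the smoothing parameter and the regularization weight jointly,
    # calling the Case-1 oracle once per epoch on F^(lambda_t, sigma_t).
    x_hat, lam, sigma = x0.copy(), lambda0, sigma0
    for t in range(T):
        x_hat = A(lam, sigma, x_hat)   # oracle sees a (1/lam)-smooth, sigma-SC objective
        lam, sigma = lam / 2.0, sigma / 2.0
    return x_hat
```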

Algorithm 3: The JointAdaptRegSmooth Reduction
Input: an objective F(·) in the finite-sum form of Case 4 (not necessarily strongly convex or smooth); x_0 a starting vector; λ_0, σ_0 initial smoothing and regularization parameters; T the number of epochs; an algorithm A that solves the finite-sum form of Case 1 for problem (1.1).
Output: x̂_T.
1: x̂_0 ← x_0.
2: for t ← 0 to T − 1 do
3:   Define F^(λ_t,σ_t)(x) := (1/n) Σ_{i=1}^n f_i^(λ_t)(⟨a_i, x⟩) + ψ(x) + (σ_t/2)‖x − x_0‖², using Definition C.1.
4:   x̂_{t+1} ← A(F^(λ_t,σ_t), x̂_t).
5:   σ_{t+1} ← σ_t/2, λ_{t+1} ← λ_t/2.
6: end for
7: return x̂_T.

Example C.3. When JointAdaptRegSmooth is applied to an accelerated gradient descent method such as [2, 5, 14, 15, 20], we solve the finite-sum form of Case 4 with a total running time Σ_{t=0}^{T−1} Time(2^t/λ_0, σ_t) = O(Time(1/λ_T, σ_T)) = O(G√Θ/ε) · C. This matches the best known running time of full-gradient first-order methods on solving Case 4, which is usually obtained via saddle-point based methods such as Chambolle-Pock [6] or the mirror-prox method of Nemirovski [19].

D Convergence Analysis for JointAdaptRegSmooth

For analysis purposes only, we define x*_{t+1} to be the exact minimizer of F^(λ_t,σ_t)(x). The HOOD property of the given oracle A ensures that

    F^(λ_t,σ_t)(x̂_{t+1}) − F^(λ_t,σ_t)(x*_{t+1}) ≤ (1/4) · (F^(λ_t,σ_t)(x̂_t) − F^(λ_t,σ_t)(x*_{t+1})).

We denote by x* an arbitrary minimizer of F(x). The following claim states a simple property about the minimizers of F^(λ_t,σ_t)(x), analogous to Claim 3.4:

Claim D.1. We have (σ_t/2)‖x*_{t+1} − x*‖² ≤ (σ_t/2)‖x_0 − x*‖² + λ_tG²/2 for each t.

Proof. By the strong convexity of F^(λ_t,σ_t)(x) and the fact that x*_{t+1} is its exact minimizer, we have F^(λ_t,σ_t)(x*_{t+1}) ≤ F^(λ_t,σ_t)(x*) − (σ_t/2)‖x*_{t+1} − x*‖². Using the fact that F^(λ_t,σ_t)(x*_{t+1}) ≥ F^(λ_t,0)(x*_{t+1}) ≥ F(x*_{t+1}) − λ_tG²/2 (where the second inequality follows from Lemma B.1), as well as F^(λ_t,σ_t)(x*) = F^(λ_t,0)(x*) + (σ_t/2)‖x_0 − x*‖² ≤ F(x*) + (σ_t/2)‖x_0 − x*‖² (where the second inequality again follows from Lemma B.1), we immediately have

    λ_tG²/2 + (σ_t/2)‖x_0 − x*‖² − (σ_t/2)‖x*_{t+1} − x*‖² ≥ F(x*_{t+1}) − F(x*) ≥ 0. ∎

Let D_t := F^(λ_t,σ_t)(x̂_t) − F^(λ_t,σ_t)(x*_{t+1}) be the initial objective distance to the minimum on function F^(λ_t,σ_t)(·) before we call A in epoch t. At epoch 0, we simply have the upper bound

    D_0 = F^(λ_0,σ_0)(x̂_0) − F^(λ_0,σ_0)(x*_1) ≤¹ F^(0,σ_0)(x_0) − F^(λ_0,0)(x*_1) ≤² F(x_0) − F(x*_1) + λ_0G²/2 ≤³ F(x_0) − F(x*) + λ_0G²/2.

Above, ① uses F^(λ_0,σ_0)(x_0) ≤ F^(0,σ_0)(x_0), which is a consequence of Fact B.2, together with F^(λ_0,σ_0)(x*_1) ≥ F^(λ_0,0)(x*_1); ② uses F^(0,σ_0)(x_0) = F(x_0) from the definition (since x̂_0 = x_0) and F(x*_1) ≤ F^(λ_0,0)(x*_1) + λ_0G²/2 from Lemma B.1; and ③ uses the minimality of x*. Next, for each epoch t ≥ 1, we compute that

    D_t := F^(λ_t,σ_t)(x̂_t) − F^(λ_t,σ_t)(x*_{t+1})
        ≤¹ F^(λ_t,σ_{t−1})(x̂_t) − F^(λ_t,σ_{t−1})(x*_{t+1}) + ((σ_{t−1} − σ_t)/2)‖x*_{t+1} − x_0‖²
        ≤² F^(λ_{t−1},σ_{t−1})(x̂_t) + λ_{t−1}G²/2 − F^(λ_{t−1},σ_{t−1})(x*_{t+1}) + (σ_t/2)‖x*_{t+1} − x_0‖²
        ≤³ F^(λ_{t−1},σ_{t−1})(x̂_t) + λ_{t−1}G²/2 − F^(λ_{t−1},σ_{t−1})(x*_{t+1}) + σ_t‖x*_{t+1} − x*‖² + σ_t‖x_0 − x*‖²
        ≤⁴ F^(λ_{t−1},σ_{t−1})(x̂_t) + λ_{t−1}G²/2 − F^(λ_{t−1},σ_{t−1})(x*_t) + 2σ_t‖x_0 − x*‖² + λ_tG²
        ≤⁵ D_{t−1}/4 + 2σ_t‖x_0 − x*‖² + 2λ_tG².

Above, ① follows from the definition; ② follows from Lemma B.1 and Fact B.2, as well as the choice σ_{t−1} = 2σ_t; ③ follows because for any two vectors a, b it satisfies ‖a − b‖² ≤ 2‖a‖² + 2‖b‖²; ④ follows from Claim D.1 and the minimality of x*_t; and ⑤ follows from the definition of D_{t−1}, from the HOOD property, and from the choice λ_{t−1} = 2λ_t.

By telescoping the above inequality, we have

    D_T ≤ (F(x_0) − F(x*))/4^T + 2G²(λ_T + λ_{T−1}/4 + ···) + 2‖x_0 − x*‖²(σ_T + σ_{T−1}/4 + ···)
        ≤ (1/4^T)(F(x_0) − F(x*)) + 4λ_TG² + 4σ_T‖x_0 − x*‖²,

where the second inequality uses our choices λ_t = λ_{t−1}/2 and σ_t = σ_{t−1}/2 again. In sum, we obtain a vector x̂_T satisfying

    F(x̂_T) − F(x*) ≤¹ F^(λ_T,0)(x̂_T) − F^(0,σ_T)(x*) + λ_TG²/2 + (σ_T/2)‖x_0 − x*‖²
                   ≤² F^(λ_T,σ_T)(x̂_T) − F^(λ_T,σ_T)(x*) + λ_TG²/2 + (σ_T/2)‖x_0 − x*‖²
                   ≤³ F^(λ_T,σ_T)(x̂_T) − F^(λ_T,σ_T)(x*_{T+1}) + λ_TG²/2 + (σ_T/2)‖x_0 − x*‖²
                   ≤⁴ (1/4^T)(F(x_0) − F(x*)) + 4.5λ_TG² + 4.5σ_T‖x_0 − x*‖².    (D.1)

Above, ① uses Lemma B.1 and the definitions; ② uses the monotonicity from Fact B.2; ③ uses the definition of x*_{T+1} as the minimizer of F^(λ_T,σ_T)(·); and ④ uses the definition of D_T and our derived upper bound. Finally, after appropriately choosing σ_0, λ_0 and T, (D.1) immediately implies Theorem C.2.
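As a sanity check of this final step (our own calculation, spelling out the "appropriately choosing" remark): with σ_0 = Δ/Θ, λ_0 = Δ/G² and T = log₂(Δ/ε), where Δ ≥ F(x_0) − F(x*) and Θ ≥ ‖x_0 − x*‖², each of the three terms in (D.1) is O(ε):

```latex
\frac{F(x_0)-F(x^*)}{4^T} \le \frac{\Delta}{(\Delta/\varepsilon)^2} = \frac{\varepsilon^2}{\Delta} \le \varepsilon,
\qquad
4.5\,\lambda_T G^2 = \frac{4.5\,\lambda_0 G^2}{2^T} = 4.5\,\varepsilon,
\qquad
4.5\,\sigma_T \|x_0-x^*\|^2 \le \frac{4.5\,\sigma_0 \Theta}{2^T} = 4.5\,\varepsilon,
```

so F(x̂_T) − F(x*) ≤ O(ε), as claimed in Theorem C.2 (assuming Δ ≥ ε; otherwise x_0 itself already satisfies the guarantee).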


More information

Random Variables. b 2.

Random Variables. b 2. Random Varables Generally the object of an nvestgators nterest s not necessarly the acton n the sample space but rather some functon of t. Techncally a real valued functon or mappng whose doman s the sample

More information

Fiera Capital s CIA Accounting Discount Rate Curve Implementation Note. Fiera Capital Corporation

Fiera Capital s CIA Accounting Discount Rate Curve Implementation Note. Fiera Capital Corporation Fera aptal s IA Accountng Dscount Rate urve Implementaton Note Fera aptal orporaton November 2016 Ths document s provded for your prvate use and for nformaton purposes only as of the date ndcated heren

More information

Fast Laplacian Solvers by Sparsification

Fast Laplacian Solvers by Sparsification Spectral Graph Theory Lecture 19 Fast Laplacan Solvers by Sparsfcaton Danel A. Spelman November 9, 2015 Dsclamer These notes are not necessarly an accurate representaton of what happened n class. The notes

More information

Evaluating Performance

Evaluating Performance 5 Chapter Evaluatng Performance In Ths Chapter Dollar-Weghted Rate of Return Tme-Weghted Rate of Return Income Rate of Return Prncpal Rate of Return Daly Returns MPT Statstcs 5- Measurng Rates of Return

More information

Linear Combinations of Random Variables and Sampling (100 points)

Linear Combinations of Random Variables and Sampling (100 points) Economcs 30330: Statstcs for Economcs Problem Set 6 Unversty of Notre Dame Instructor: Julo Garín Sprng 2012 Lnear Combnatons of Random Varables and Samplng 100 ponts 1. Four-part problem. Go get some

More information

OPERATIONS RESEARCH. Game Theory

OPERATIONS RESEARCH. Game Theory OPERATIONS RESEARCH Chapter 2 Game Theory Prof. Bbhas C. Gr Department of Mathematcs Jadavpur Unversty Kolkata, Inda Emal: bcgr.umath@gmal.com 1.0 Introducton Game theory was developed for decson makng

More information

Financial mathematics

Financial mathematics Fnancal mathematcs Jean-Luc Bouchot jean-luc.bouchot@drexel.edu February 19, 2013 Warnng Ths s a work n progress. I can not ensure t to be mstake free at the moment. It s also lackng some nformaton. But

More information

Note on Cubic Spline Valuation Methodology

Note on Cubic Spline Valuation Methodology Note on Cubc Splne Valuaton Methodology Regd. Offce: The Internatonal, 2 nd Floor THE CUBIC SPLINE METHODOLOGY A model for yeld curve takes traded yelds for avalable tenors as nput and generates the curve

More information

SIMPLE FIXED-POINT ITERATION

SIMPLE FIXED-POINT ITERATION SIMPLE FIXED-POINT ITERATION The fed-pont teraton method s an open root fndng method. The method starts wth the equaton f ( The equaton s then rearranged so that one s one the left hand sde of the equaton

More information

A Set of new Stochastic Trend Models

A Set of new Stochastic Trend Models A Set of new Stochastc Trend Models Johannes Schupp Longevty 13, Tape, 21 th -22 th September 2017 www.fa-ulm.de Introducton Uncertanty about the evoluton of mortalty Measure longevty rsk n penson or annuty

More information

Survey of Math Test #3 Practice Questions Page 1 of 5

Survey of Math Test #3 Practice Questions Page 1 of 5 Test #3 Practce Questons Page 1 of 5 You wll be able to use a calculator, and wll have to use one to answer some questons. Informaton Provded on Test: Smple Interest: Compound Interest: Deprecaton: A =

More information

A MODEL OF COMPETITION AMONG TELECOMMUNICATION SERVICE PROVIDERS BASED ON REPEATED GAME

A MODEL OF COMPETITION AMONG TELECOMMUNICATION SERVICE PROVIDERS BASED ON REPEATED GAME A MODEL OF COMPETITION AMONG TELECOMMUNICATION SERVICE PROVIDERS BASED ON REPEATED GAME Vesna Radonć Đogatovć, Valentna Radočć Unversty of Belgrade Faculty of Transport and Traffc Engneerng Belgrade, Serba

More information

Games and Decisions. Part I: Basic Theorems. Contents. 1 Introduction. Jane Yuxin Wang. 1 Introduction 1. 2 Two-player Games 2

Games and Decisions. Part I: Basic Theorems. Contents. 1 Introduction. Jane Yuxin Wang. 1 Introduction 1. 2 Two-player Games 2 Games and Decsons Part I: Basc Theorems Jane Yuxn Wang Contents 1 Introducton 1 2 Two-player Games 2 2.1 Zero-sum Games................................ 3 2.1.1 Pure Strateges.............................

More information

Data Mining Linear and Logistic Regression

Data Mining Linear and Logistic Regression 07/02/207 Data Mnng Lnear and Logstc Regresson Mchael L of 26 Regresson In statstcal modellng, regresson analyss s a statstcal process for estmatng the relatonshps among varables. Regresson models are

More information

Topics on the Border of Economics and Computation November 6, Lecture 2

Topics on the Border of Economics and Computation November 6, Lecture 2 Topcs on the Border of Economcs and Computaton November 6, 2005 Lecturer: Noam Nsan Lecture 2 Scrbe: Arel Procacca 1 Introducton Last week we dscussed the bascs of zero-sum games n strategc form. We characterzed

More information

Measures of Spread IQR and Deviation. For exam X, calculate the mean, median and mode. For exam Y, calculate the mean, median and mode.

Measures of Spread IQR and Deviation. For exam X, calculate the mean, median and mode. For exam Y, calculate the mean, median and mode. Part 4 Measures of Spread IQR and Devaton In Part we learned how the three measures of center offer dfferent ways of provdng us wth a sngle representatve value for a data set. However, consder the followng

More information

Elton, Gruber, Brown, and Goetzmann. Modern Portfolio Theory and Investment Analysis, 7th Edition. Solutions to Text Problems: Chapter 9

Elton, Gruber, Brown, and Goetzmann. Modern Portfolio Theory and Investment Analysis, 7th Edition. Solutions to Text Problems: Chapter 9 Elton, Gruber, Brown, and Goetzmann Modern Portfolo Theory and Investment Analyss, 7th Edton Solutons to Text Problems: Chapter 9 Chapter 9: Problem In the table below, gven that the rskless rate equals

More information

The Integration of the Israel Labour Force Survey with the National Insurance File

The Integration of the Israel Labour Force Survey with the National Insurance File The Integraton of the Israel Labour Force Survey wth the Natonal Insurance Fle Natale SHLOMO Central Bureau of Statstcs Kanfey Nesharm St. 66, corner of Bach Street, Jerusalem Natales@cbs.gov.l Abstact:

More information

EDC Introduction

EDC Introduction .0 Introducton EDC3 In the last set of notes (EDC), we saw how to use penalty factors n solvng the EDC problem wth losses. In ths set of notes, we want to address two closely related ssues. What are, exactly,

More information

Creating a zero coupon curve by bootstrapping with cubic splines.

Creating a zero coupon curve by bootstrapping with cubic splines. MMA 708 Analytcal Fnance II Creatng a zero coupon curve by bootstrappng wth cubc splnes. erg Gryshkevych Professor: Jan R. M. Röman 0.2.200 Dvson of Appled Mathematcs chool of Educaton, Culture and Communcaton

More information

Doubly Random Parallel Stochastic Algorithms for Large Scale Learning

Doubly Random Parallel Stochastic Algorithms for Large Scale Learning 000 00 00 003 004 005 006 007 008 009 00 0 0 03 04 05 06 07 08 09 00 0 0 03 04 05 06 07 08 09 030 03 03 033 034 035 036 037 038 039 040 04 04 043 044 045 046 047 048 049 050 05 05 053 Doubly Random Parallel

More information

Problem Set 6 Finance 1,

Problem Set 6 Finance 1, Carnege Mellon Unversty Graduate School of Industral Admnstraton Chrs Telmer Wnter 2006 Problem Set 6 Fnance, 47-720. (representatve agent constructon) Consder the followng two-perod, two-agent economy.

More information

Still Simpler Way of Introducing Interior-Point method for Linear Programming

Still Simpler Way of Introducing Interior-Point method for Linear Programming Stll Smpler Way of Introducng Interor-Pont method for Lnear Programmng Sanjeev Saxena Dept. of Computer Scence and Engneerng, Indan Insttute of Technology, Kanpur, INDIA-08 06 October 9, 05 Abstract Lnear

More information

Project Management Project Phases the S curve

Project Management Project Phases the S curve Project lfe cycle and resource usage Phases Project Management Project Phases the S curve Eng. Gorgo Locatell RATE OF RESOURCE ES Conceptual Defnton Realzaton Release TIME Cumulated resource usage and

More information

Jeffrey Ely. October 7, This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.

Jeffrey Ely. October 7, This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. October 7, 2012 Ths work s lcensed under the Creatve Commons Attrbuton-NonCommercal-ShareAlke 3.0 Lcense. Recap We saw last tme that any standard of socal welfare s problematc n a precse sense. If we want

More information

Economics 1410 Fall Section 7 Notes 1. Define the tax in a flexible way using T (z), where z is the income reported by the agent.

Economics 1410 Fall Section 7 Notes 1. Define the tax in a flexible way using T (z), where z is the income reported by the agent. Economcs 1410 Fall 2017 Harvard Unversty Yaan Al-Karableh Secton 7 Notes 1 I. The ncome taxaton problem Defne the tax n a flexble way usng T (), where s the ncome reported by the agent. Retenton functon:

More information

A Case Study for Optimal Dynamic Simulation Allocation in Ordinal Optimization 1

A Case Study for Optimal Dynamic Simulation Allocation in Ordinal Optimization 1 A Case Study for Optmal Dynamc Smulaton Allocaton n Ordnal Optmzaton Chun-Hung Chen, Dongha He, and Mchael Fu 4 Abstract Ordnal Optmzaton has emerged as an effcent technque for smulaton and optmzaton.

More information

Finance 402: Problem Set 1 Solutions

Finance 402: Problem Set 1 Solutions Fnance 402: Problem Set 1 Solutons Note: Where approprate, the fnal answer for each problem s gven n bold talcs for those not nterested n the dscusson of the soluton. 1. The annual coupon rate s 6%. A

More information

Interval Estimation for a Linear Function of. Variances of Nonnormal Distributions. that Utilize the Kurtosis

Interval Estimation for a Linear Function of. Variances of Nonnormal Distributions. that Utilize the Kurtosis Appled Mathematcal Scences, Vol. 7, 013, no. 99, 4909-4918 HIKARI Ltd, www.m-hkar.com http://dx.do.org/10.1988/ams.013.37366 Interval Estmaton for a Lnear Functon of Varances of Nonnormal Dstrbutons that

More information

Multifactor Term Structure Models

Multifactor Term Structure Models 1 Multfactor Term Structure Models A. Lmtatons of One-Factor Models 1. Returns on bonds of all maturtes are perfectly correlated. 2. Term structure (and prces of every other dervatves) are unquely determned

More information

Lecture Note 2 Time Value of Money

Lecture Note 2 Time Value of Money Seg250 Management Prncples for Engneerng Managers Lecture ote 2 Tme Value of Money Department of Systems Engneerng and Engneerng Management The Chnese Unversty of Hong Kong Interest: The Cost of Money

More information

Ch Rival Pure private goods (most retail goods) Non-Rival Impure public goods (internet service)

Ch Rival Pure private goods (most retail goods) Non-Rival Impure public goods (internet service) h 7 1 Publc Goods o Rval goods: a good s rval f ts consumpton by one person precludes ts consumpton by another o Excludable goods: a good s excludable f you can reasonably prevent a person from consumng

More information

UNIVERSITY OF NOTTINGHAM

UNIVERSITY OF NOTTINGHAM UNIVERSITY OF NOTTINGHAM SCHOOL OF ECONOMICS DISCUSSION PAPER 99/28 Welfare Analyss n a Cournot Game wth a Publc Good by Indraneel Dasgupta School of Economcs, Unversty of Nottngham, Nottngham NG7 2RD,

More information

>1 indicates country i has a comparative advantage in production of j; the greater the index, the stronger the advantage. RCA 1 ij

>1 indicates country i has a comparative advantage in production of j; the greater the index, the stronger the advantage. RCA 1 ij 69 APPENDIX 1 RCA Indces In the followng we present some maor RCA ndces reported n the lterature. For addtonal varants and other RCA ndces, Memedovc (1994) and Vollrath (1991) provde more thorough revews.

More information

TCOM501 Networking: Theory & Fundamentals Final Examination Professor Yannis A. Korilis April 26, 2002

TCOM501 Networking: Theory & Fundamentals Final Examination Professor Yannis A. Korilis April 26, 2002 TO5 Networng: Theory & undamentals nal xamnaton Professor Yanns. orls prl, Problem [ ponts]: onsder a rng networ wth nodes,,,. In ths networ, a customer that completes servce at node exts the networ wth

More information

OCR Statistics 1 Working with data. Section 2: Measures of location

OCR Statistics 1 Working with data. Section 2: Measures of location OCR Statstcs 1 Workng wth data Secton 2: Measures of locaton Notes and Examples These notes have sub-sectons on: The medan Estmatng the medan from grouped data The mean Estmatng the mean from grouped data

More information

Robust Boosting and its Relation to Bagging

Robust Boosting and its Relation to Bagging Robust Boostng and ts Relaton to Baggng Saharon Rosset IBM T.J. Watson Research Center P. O. Box 218 Yorktown Heghts, NY 10598 srosset@us.bm.com ABSTRACT Several authors have suggested vewng boostng as

More information

The convolution computation for Perfectly Matched Boundary Layer algorithm in finite differences

The convolution computation for Perfectly Matched Boundary Layer algorithm in finite differences The convoluton computaton for Perfectly Matched Boundary Layer algorthm n fnte dfferences Herman Jaramllo May 10, 2016 1 Introducton Ths s an exercse to help on the understandng on some mportant ssues

More information

Financial Risk Management in Portfolio Optimization with Lower Partial Moment

Financial Risk Management in Portfolio Optimization with Lower Partial Moment Amercan Journal of Busness and Socety Vol., o., 26, pp. 2-2 http://www.ascence.org/journal/ajbs Fnancal Rsk Management n Portfolo Optmzaton wth Lower Partal Moment Lam Weng Sew, 2, *, Lam Weng Hoe, 2 Department

More information

Analysis of Variance and Design of Experiments-II

Analysis of Variance and Design of Experiments-II Analyss of Varance and Desgn of Experments-II MODULE VI LECTURE - 4 SPLIT-PLOT AND STRIP-PLOT DESIGNS Dr. Shalabh Department of Mathematcs & Statstcs Indan Insttute of Technology Kanpur An example to motvate

More information

Minimizing the number of critical stages for the on-line steiner tree problem

Minimizing the number of critical stages for the on-line steiner tree problem Mnmzng the number of crtcal stages for the on-lne stener tree problem Ncolas Thbault, Chrstan Laforest IBISC, Unversté d Evry, Tour Evry 2, 523 place des terrasses, 91000 EVRY France Keywords: on-lne algorthm,

More information

Discrete Dynamic Shortest Path Problems in Transportation Applications

Discrete Dynamic Shortest Path Problems in Transportation Applications 17 Paper No. 98-115 TRANSPORTATION RESEARCH RECORD 1645 Dscrete Dynamc Shortest Path Problems n Transportaton Applcatons Complexty and Algorthms wth Optmal Run Tme ISMAIL CHABINI A soluton s provded for

More information

A New Uniform-based Resource Constrained Total Project Float Measure (U-RCTPF) Roni Levi. Research & Engineering, Haifa, Israel

A New Uniform-based Resource Constrained Total Project Float Measure (U-RCTPF) Roni Levi. Research & Engineering, Haifa, Israel Management Studes, August 2014, Vol. 2, No. 8, 533-540 do: 10.17265/2328-2185/2014.08.005 D DAVID PUBLISHING A New Unform-based Resource Constraned Total Project Float Measure (U-RCTPF) Ron Lev Research

More information

Taxation and Externalities. - Much recent discussion of policy towards externalities, e.g., global warming debate/kyoto

Taxation and Externalities. - Much recent discussion of policy towards externalities, e.g., global warming debate/kyoto Taxaton and Externaltes - Much recent dscusson of polcy towards externaltes, e.g., global warmng debate/kyoto - Increasng share of tax revenue from envronmental taxaton 6 percent n OECD - Envronmental

More information

Centre for International Capital Markets

Centre for International Capital Markets Centre for Internatonal Captal Markets Dscusson Papers ISSN 1749-3412 Valung Amercan Style Dervatves by Least Squares Methods Maro Cerrato No 2007-13 Valung Amercan Style Dervatves by Least Squares Methods

More information

occurrence of a larger storm than our culvert or bridge is barely capable of handling? (what is The main question is: What is the possibility of

occurrence of a larger storm than our culvert or bridge is barely capable of handling? (what is The main question is: What is the possibility of Module 8: Probablty and Statstcal Methods n Water Resources Engneerng Bob Ptt Unversty of Alabama Tuscaloosa, AL Flow data are avalable from numerous USGS operated flow recordng statons. Data s usually

More information

Теоретические основы и методология имитационного и комплексного моделирования

Теоретические основы и методология имитационного и комплексного моделирования MONTE-CARLO STATISTICAL MODELLING METHOD USING FOR INVESTIGA- TION OF ECONOMIC AND SOCIAL SYSTEMS Vladmrs Jansons, Vtaljs Jurenoks, Konstantns Ddenko (Latva). THE COMMO SCHEME OF USI G OF TRADITIO AL METHOD

More information

Simulation Budget Allocation for Further Enhancing the Efficiency of Ordinal Optimization

Simulation Budget Allocation for Further Enhancing the Efficiency of Ordinal Optimization Dscrete Event Dynamc Systems: Theory and Applcatons, 10, 51 70, 000. c 000 Kluwer Academc Publshers, Boston. Manufactured n The Netherlands. Smulaton Budget Allocaton for Further Enhancng the Effcency

More information

Numerical Analysis ECIV 3306 Chapter 6

Numerical Analysis ECIV 3306 Chapter 6 The Islamc Unversty o Gaza Faculty o Engneerng Cvl Engneerng Department Numercal Analyss ECIV 3306 Chapter 6 Open Methods & System o Non-lnear Eqs Assocate Pro. Mazen Abualtaye Cvl Engneerng Department,

More information

Teaching Note on Factor Model with a View --- A tutorial. This version: May 15, Prepared by Zhi Da *

Teaching Note on Factor Model with a View --- A tutorial. This version: May 15, Prepared by Zhi Da * Copyrght by Zh Da and Rav Jagannathan Teachng Note on For Model th a Ve --- A tutoral Ths verson: May 5, 2005 Prepared by Zh Da * Ths tutoral demonstrates ho to ncorporate economc ves n optmal asset allocaton

More information

3/3/2014. CDS M Phil Econometrics. Vijayamohanan Pillai N. Truncated standard normal distribution for a = 0.5, 0, and 0.5. CDS Mphil Econometrics

3/3/2014. CDS M Phil Econometrics. Vijayamohanan Pillai N. Truncated standard normal distribution for a = 0.5, 0, and 0.5. CDS Mphil Econometrics Lmted Dependent Varable Models: Tobt an Plla N 1 CDS Mphl Econometrcs Introducton Lmted Dependent Varable Models: Truncaton and Censorng Maddala, G. 1983. Lmted Dependent and Qualtatve Varables n Econometrcs.

More information

Equilibrium in Prediction Markets with Buyers and Sellers

Equilibrium in Prediction Markets with Buyers and Sellers Equlbrum n Predcton Markets wth Buyers and Sellers Shpra Agrawal Nmrod Megddo Benamn Armbruster Abstract Predcton markets wth buyers and sellers of contracts on multple outcomes are shown to have unque

More information

Optimising a general repair kit problem with a service constraint

Optimising a general repair kit problem with a service constraint Optmsng a general repar kt problem wth a servce constrant Marco Bjvank 1, Ger Koole Department of Mathematcs, VU Unversty Amsterdam, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands Irs F.A. Vs Department

More information

FORD MOTOR CREDIT COMPANY SUGGESTED ANSWERS. Richard M. Levich. New York University Stern School of Business. Revised, February 1999

FORD MOTOR CREDIT COMPANY SUGGESTED ANSWERS. Richard M. Levich. New York University Stern School of Business. Revised, February 1999 FORD MOTOR CREDIT COMPANY SUGGESTED ANSWERS by Rchard M. Levch New York Unversty Stern School of Busness Revsed, February 1999 1 SETTING UP THE PROBLEM The bond s beng sold to Swss nvestors for a prce

More information

Introduction to PGMs: Discrete Variables. Sargur Srihari

Introduction to PGMs: Discrete Variables. Sargur Srihari Introducton to : Dscrete Varables Sargur srhar@cedar.buffalo.edu Topcs. What are graphcal models (or ) 2. Use of Engneerng and AI 3. Drectonalty n graphs 4. Bayesan Networks 5. Generatve Models and Samplng

More information

Multiobjective De Novo Linear Programming *

Multiobjective De Novo Linear Programming * Acta Unv. Palack. Olomuc., Fac. rer. nat., Mathematca 50, 2 (2011) 29 36 Multobjectve De Novo Lnear Programmng * Petr FIALA Unversty of Economcs, W. Churchll Sq. 4, Prague 3, Czech Republc e-mal: pfala@vse.cz

More information

Lecture 7. We now use Brouwer s fixed point theorem to prove Nash s theorem.

Lecture 7. We now use Brouwer s fixed point theorem to prove Nash s theorem. Topcs on the Border of Economcs and Computaton December 11, 2005 Lecturer: Noam Nsan Lecture 7 Scrbe: Yoram Bachrach 1 Nash s Theorem We begn by provng Nash s Theorem about the exstance of a mxed strategy

More information

Raising Food Prices and Welfare Change: A Simple Calibration. Xiaohua Yu

Raising Food Prices and Welfare Change: A Simple Calibration. Xiaohua Yu Rasng Food Prces and Welfare Change: A Smple Calbraton Xaohua Yu Professor of Agrcultural Economcs Courant Research Centre Poverty, Equty and Growth Unversty of Göttngen CRC-PEG, Wlhelm-weber-Str. 2 3773

More information

Likelihood Fits. Craig Blocker Brandeis August 23, 2004

Likelihood Fits. Craig Blocker Brandeis August 23, 2004 Lkelhood Fts Crag Blocker Brandes August 23, 2004 Outlne I. What s the queston? II. Lkelhood Bascs III. Mathematcal Propertes IV. Uncertantes on Parameters V. Mscellaneous VI. Goodness of Ft VII. Comparson

More information

Numerical Optimisation Applied to Monte Carlo Algorithms for Finance. Phillip Luong

Numerical Optimisation Applied to Monte Carlo Algorithms for Finance. Phillip Luong Numercal Optmsaton Appled to Monte Carlo Algorthms for Fnance Phllp Luong Supervsed by Professor Hans De Sterck, Professor Gregore Loeper, and Dr Ivan Guo Monash Unversty Vacaton Research Scholarshps are

More information

2) In the medium-run/long-run, a decrease in the budget deficit will produce:

2) In the medium-run/long-run, a decrease in the budget deficit will produce: 4.02 Quz 2 Solutons Fall 2004 Multple-Choce Questons ) Consder the wage-settng and prce-settng equatons we studed n class. Suppose the markup, µ, equals 0.25, and F(u,z) = -u. What s the natural rate of

More information

Problems to be discussed at the 5 th seminar Suggested solutions

Problems to be discussed at the 5 th seminar Suggested solutions ECON4260 Behavoral Economcs Problems to be dscussed at the 5 th semnar Suggested solutons Problem 1 a) Consder an ultmatum game n whch the proposer gets, ntally, 100 NOK. Assume that both the proposer

More information

Facility Location Problem. Learning objectives. Antti Salonen Farzaneh Ahmadzadeh

Facility Location Problem. Learning objectives. Antti Salonen Farzaneh Ahmadzadeh Antt Salonen Farzaneh Ahmadzadeh 1 Faclty Locaton Problem The study of faclty locaton problems, also known as locaton analyss, s a branch of operatons research concerned wth the optmal placement of facltes

More information

International ejournals

International ejournals Avalable onlne at www.nternatonalejournals.com ISSN 0976 1411 Internatonal ejournals Internatonal ejournal of Mathematcs and Engneerng 7 (010) 86-95 MODELING AND PREDICTING URBAN MALE POPULATION OF BANGLADESH:

More information

Tree-based and GA tools for optimal sampling design

Tree-based and GA tools for optimal sampling design Tree-based and GA tools for optmal samplng desgn The R User Conference 2008 August 2-4, Technsche Unverstät Dortmund, Germany Marco Balln, Gulo Barcarol Isttuto Nazonale d Statstca (ISTAT) Defnton of the

More information