EE/AA 578 Univ. of Washington, Fall 2016 Homework 8

1. Multi-label SVM. The basic Support Vector Machine (SVM) described in the lecture (and textbook) is used for classification of data with two labels. In this problem we explore an extension of SVM that can be used for classification with more than two labels. Our data consists of pairs $(x_i, y_i) \in \mathbf{R}^n \times \{1,\ldots,K\}$, $i = 1,\ldots,m$, where $x_i$ is the feature vector and $y_i$ is the label of the $i$th data point. (So the labels can take the values $1,\ldots,K$.) Our classifier will use $K$ affine functions, $f_k(x) = a_k^T x + b_k$, $k = 1,\ldots,K$, which we also collect into an affine function from $\mathbf{R}^n$ into $\mathbf{R}^K$ as $f(x) = Ax + b$. (The rows of $A$ are $a_k^T$.) Given feature vector $x$, we guess the label $\hat y = \mathop{\rm argmax}_k f_k(x)$. We assume that exact ties never occur, or if they do, an arbitrary choice can be made. Note that if a multiple of $\mathbf{1}$ is added to $b$, the classifier does not change. Thus, without loss of generality, we can assume that $\mathbf{1}^T b = 0$.

To correctly classify the data examples, we need $f_{y_i}(x_i) > \max_{k \neq y_i} f_k(x_i)$ for all $i$. This is a set of homogeneous strict inequalities in $a_k$ and $b_k$, which are feasible if and only if the nonstrict inequalities $f_{y_i}(x_i) \geq 1 + \max_{k \neq y_i} f_k(x_i)$ are feasible. This motivates the loss function

$$L(A,b) = \sum_{i=1}^m \left(1 + \max_{k \neq y_i} f_k(x_i) - f_{y_i}(x_i)\right)_+,$$

where $(u)_+ = \max\{u, 0\}$. The multi-label SVM chooses $A$ and $b$ to minimize $L(A,b) + \mu \|A\|_F^2$, subject to $\mathbf{1}^T b = 0$, where $\mu > 0$ is a regularization parameter. (Several variations on this are possible, such as regularizing $b$ as well, or replacing the Frobenius norm squared with the sum of norms of the columns of $A$.)

(a) Show how to find $A$ and $b$ using convex optimization. Be sure to justify convexity of the objective and constraints in your formulation.

(b) Carry out multi-label SVM on the data given in multilabel_svm_data.m. Use the data given in x and y to fit the SVM model, for a range of values of $\mu$.
This data set includes an additional set of data, xtest and ytest, that you can use to test the SVM models. Train the classifier for 10 values of $\mu$ spaced uniformly on a log scale from $10^{-1}$ to $10^2$. Jointly plot the train set and test set classification error rates (i.e., the fraction of data examples in the train or test set for which $\hat y \neq y$) versus $\mu$.

2. Router placement in a computer lab. A system administrator for an academic research group is trying to determine the optimal placement of a set of wired routers in the lab. There are $N$ graduate students, who have desks at fixed locations in the lab, and $M$ undergraduates, who will sit and work wherever they are told to. Each of the $K$ routers will form a sub-network of the students' computers that are attached to it, i.e., there will be $K$ different (and potentially overlapping) networks in the lab. For his or her research, each student (both grad and undergrad) will need to be connected to one or more of these networks (via the appropriate router; no computer-to-computer connections). We assume the cost of an ethernet cable is proportional to the square of its length. Each connection between a router and a computer is made by draping an ethernet cable across the floor in a straight line between the router and the computer. The administrator knows where the graduate students sit (in 2D space) and which students need to be connected to which networks. The system administrator needs to simultaneously decide where to place the $K$ routers (in 2D space) and where to place the $M$ undergraduate students (in 2D space) so that the total cost of ethernet cable needed is minimized.

(a) Formulate this problem as a convex optimization problem and implement it using the numerical data provided in the file placement1.m. Make a plot showing the optimal lab layout, including the placement of the students and the routers. Hint: You might need to use the square_pos command in CVX.

(b) As another variation, consider the case where there are no undergrads and ethernet cables only run along the x and y axes in 2D space (that is, only horizontally or vertically) on the lab floor. The routers should be placed to minimize the cable needed. Formulate this problem as a convex optimization problem and implement it using the same numerical data as in part (a). Make a plot of the result.

(c) Now consider a more general placement problem that covers the previous examples as special cases. We consider a graph with $m$ nodes and assume coordinate vectors $x_j \in \mathbf{R}^p$, $j = 1,\ldots,m$, are associated with the nodes. We store the vectors $x_j$ as columns of the matrix $X \in \mathbf{R}^{p \times m}$. Some nodes are fixed with given coordinate vectors $x_j$, while the other nodes are free (and their coordinate vectors are the optimization variables). In addition, we are given subsets of nodes denoted by $S$.
We use $X_S$ to denote the submatrix of $X$ with columns associated with the nodes in subset $S$. This problem is concerned with different measures of size and notions of center for the subsets and for the graph. We define

$$f_S(X) = \inf_y \|X_S - y\mathbf{1}^T\| \qquad (1)$$

as the size of subset $S$, where $\|\cdot\|$ is any norm, $y \in \mathbf{R}^p$, and $\mathbf{1}$ is a vector of ones of length $|S|$. Show that the problem of minimizing $\sum_S f_S(X)$ is convex in the free node coordinates $x_j$. Finding the minimum wire cost in a sub-network in part (a) corresponds to minimizing $f_S(X)$ in equation (1) for which norm? Can you give a geometric interpretation of the optimal $y$'s you found in that part?

3. Gram matrices, Laplacians, and Markov chains. In many areas of information processing, data are given in a high-dimensional space, but the intrinsic complexity and dimension are typically much lower. Given a set of points $x_1,\ldots,x_n$ in a high-dimensional space $\mathbf{R}^d$ (denoted by $X = [x_1,\ldots,x_n] \in \mathbf{R}^{d \times n}$), we want to compute a low-dimensional representation $Y = [y_1,\ldots,y_n] \in \mathbf{R}^{r \times n}$, where $r \ll d$. Suppose that the points $y_i$ are centered at the origin, i.e., $\sum_{i=1}^n y_i = 0$. To represent the connection between $x_1,\ldots,x_n$, we construct an undirected graph $\mathcal{G} = (V,E)$ by connecting each $x_i$ to its $k$ nearest neighbors, where $k$ is an integer. Here $V$ denotes the set of vertices, and $\{i,j\} \in E$ means vertices $i$ and $j$ are connected by an edge. We want the low-dimensional representation to preserve the local distances between the high-dimensional data, $\|y_i - y_j\|_2^2 = d_{ij}$ for $\{i,j\} \in E$, where $d_{ij} = \|x_i - x_j\|_2^2$ is the squared distance between $x_i$ and $x_j$. At the same time, we want the low-dimensional representation to maximize the total variance $\sum_{i=1}^n \|y_i\|_2^2$ in order to place the points $y_i$ as far away from the origin as possible (this tends to lower the dimension by flattening the point cloud, but you don't need to worry about why and how). This problem can be formulated as follows:

$$\begin{array}{ll} \mbox{maximize} & \sum_{i=1}^n \|y_i\|_2^2 \\ \mbox{subject to} & \sum_{i=1}^n y_i = 0 \\ & \|y_i - y_j\|_2^2 = d_{ij}, \quad \{i,j\} \in E, \end{array} \qquad (2)$$

with variables $y_i$, $i = 1,\ldots,n$. Note that problem (2) is not convex. However, if we know the Gram matrix $G$ associated with $Y$, defined as $G = Y^T Y$ (so $G \succeq 0$ and $G_{ij} = y_i^T y_j$), the low-dimensional representation can be calculated by $y_i = [\sqrt{\lambda_1}(v_1)_i, \ldots, \sqrt{\lambda_r}(v_r)_i]^T$, $i = 1,\ldots,n$, where $v_1,\ldots,v_r$ are eigenvectors associated with the nonzero eigenvalues $\lambda_1,\ldots,\lambda_r$ of $G$. We can express problem (2) as a convex problem with the Gram matrix $G$ as the optimization variable:

$$\begin{array}{ll} \mbox{maximize} & \mathbf{Tr}\, G \\ \mbox{subject to} & \mathbf{1}^T G \mathbf{1} = 0 \\ & G = G^T \succeq 0 \\ & G_{ii} + G_{jj} - 2G_{ij} = d_{ij}, \quad \{i,j\} \in E. \end{array} \qquad (3)$$

From the optimal solution $G$, we can calculate the optimal low-dimensional representation $Y$. Finally, here is what you need to solve:

(a) Derive the dual of the convex problem (3). For convenience, the last set of equality constraints can be written as

$$G_{ii} + G_{jj} - 2G_{ij} = \mathbf{Tr}(G I^{\{i,j\}}) = d_{ij}, \quad \{i,j\} \in E,$$

where $I^{\{i,j\}} \in \mathbf{R}^{n \times n}$ has all zero entries except for $I^{\{i,j\}}_{ii} = I^{\{i,j\}}_{jj} = 1$ and $I^{\{i,j\}}_{ij} = I^{\{i,j\}}_{ji} = -1$.

(b) In the graph $\mathcal{G}$, we can assign weights $W_{ij} = W_{ji} \geq 0$ to each edge $\{i,j\} \in E$; then the weighted Laplacian $L \in \mathbf{S}^n_+$ of the graph $\mathcal{G}$ is defined as

$$L_{ij} = \begin{cases} -W_{ij} & i \neq j,\ \{i,j\} \in E \\ \sum_{k:\{i,k\} \in E} W_{ik} & i = j \\ 0 & \mbox{otherwise}. \end{cases}$$

Note that $L = \sum_{\{i,j\} \in E} W_{ij} I^{\{i,j\}}$. Let $\lambda_1,\ldots,\lambda_n$ be the eigenvalues of $L$, where $\lambda_1 \geq \cdots \geq \lambda_n$. Using the fact that $L \succeq 0$, show that

i. $\lambda_n(L) = 0$ and $\lambda_{n-1}(L)$ is concave in $W$,

ii. $\lambda_{n-1}(L) \geq 1$ if and only if $L \succeq I - (1/n)\mathbf{1}\mathbf{1}^T$.

(c) Now let's look at another problem. Consider a Markov chain on the same undirected graph structure $\mathcal{G} = (V,E)$. Each vertex $i \in V$ represents a state of this Markov chain and each edge $\{i,j\} \in E$ corresponds to an allowed transition. The transition rate between vertices $i$ and $j$ is given by $w_{ij} \geq 0$, and rates are limited by $\sum_{\{i,j\} \in E} C_{ij} w_{ij} \leq c$, where $C_{ij} > 0$ is a known cost on edge $\{i,j\} \in E$ and $c > 0$ is a known constant. Let $\pi(t) \in \mathbf{R}^n$ denote the distribution of the state at time $t$. Starting from the initial distribution $\pi(0)$, the Markov chain converges to an equilibrium, at which $d\pi(t)/dt = 0$. The evolution is given by

$$\frac{d\pi(t)}{dt} = -L\pi(t),$$

where $L$ is the weighted Laplacian defined in part 3b, with weights $W_{ij} = w_{ij}$. The uniform distribution $(1/n)\mathbf{1}$ is an equilibrium distribution. It is known that the rate of convergence to the uniform distribution is determined by $\lambda_{n-1}(L)$: the larger $\lambda_{n-1}(L)$, the faster the convergence. Finally, here's the problem: we want to find the optimal transition rates $w_{ij}$, $\{i,j\} \in E$, that give the fastest convergence for the Markov chain. Formulate this problem as a convex optimization problem.

(d) Now we are to find a surprising connection between these two seemingly different problems. Consider the dual problem you derived in part 3a, and add new constraints that the dual variables corresponding to the constraints $G_{ii} + G_{jj} - 2G_{ij} = d_{ij}$, $\{i,j\} \in E$, are nonnegative. Show that this problem is equivalent to the convex problem that you formulated in part 3c.

4. Worst-case probability of loss. Two investments are made, with random returns $R_1$ and $R_2$. The total return for the two investments is $R_1 + R_2$, and the probability of a loss (including breaking even, i.e., $R_1 + R_2 = 0$) is $p_{\rm loss} = \mathbf{Prob}(R_1 + R_2 \leq 0)$. The goal is to find the worst-case (i.e., maximum possible) value of $p_{\rm loss}$ consistent with the following information. Both $R_1$ and $R_2$ have Gaussian marginal distributions, with known means $\mu_1$ and $\mu_2$ and known standard deviations $\sigma_1$ and $\sigma_2$.
In addition, it is known that $R_1$ and $R_2$ are correlated with correlation coefficient $\rho$, i.e., $\mathbf{E}(R_1 - \mu_1)(R_2 - \mu_2) = \rho\sigma_1\sigma_2$. Your job is to find the worst-case $p_{\rm loss}$ over any joint distribution of $R_1$ and $R_2$ consistent with the given marginals and correlation coefficient.

We will consider the specific case with data $\mu_1 = 8$, $\mu_2 = 20$, $\sigma_1 = 6$, $\sigma_2 = 17.5$, $\rho = -0.25$. We can compare the results to the case when $R_1$ and $R_2$ are jointly Gaussian. In this case we have $R_1 + R_2 \sim \mathcal{N}(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2 + 2\rho\sigma_1\sigma_2)$, which for the data given above gives $p_{\rm loss} = 0.050$. Your job is to see how much larger $p_{\rm loss}$ can possibly be.

This is an infinite-dimensional optimization problem, since you must maximize $p_{\rm loss}$ over an infinite-dimensional set of joint distributions. To (approximately) solve it, we discretize the values that $R_1$ and $R_2$ can take on to $n = 100$ values $r_1,\ldots,r_n$, uniformly spaced from $r_1 = -30$ to $r_n = +70$. We use the discretized marginals $p^{(1)}$ and $p^{(2)}$ for $R_1$ and $R_2$, given by

$$p^{(k)}_i = \mathbf{Prob}(R_k = r_i) = \frac{\exp\left(-(r_i - \mu_k)^2/(2\sigma_k^2)\right)}{\sum_{j=1}^n \exp\left(-(r_j - \mu_k)^2/(2\sigma_k^2)\right)},$$

for $k = 1,2$, $i = 1,\ldots,n$. Formulate the (discretized) problem as a convex optimization problem, and solve it. Report the maximum value of $p_{\rm loss}$ you find. Plot the joint distribution that yields the maximum value of $p_{\rm loss}$ using the Matlab commands mesh and contour.

Remark. You might be surprised at both the maximum value of $p_{\rm loss}$ and the joint distribution that achieves it.

5. Optimal investment to fund an expense stream. An organization knows its operating expenses over the next $T$ periods, denoted $E_1,\ldots,E_T$. (Normally these are positive, but we can have negative $E_t$, which corresponds to income.) These expenses will be funded by a combination of investment income, from a mixture of bonds purchased at $t = 0$, and a cash account. The bonds generate investment income, denoted $I_1,\ldots,I_T$. The cash balance is denoted $B_0,\ldots,B_T$, where $B_0 \geq 0$ is the amount of the initial deposit into the cash account. We can have $B_t < 0$ for $t = 1,\ldots,T$, which represents borrowing.

After paying for the expenses using investment income and cash, in period $t$ we are left with $B_t - E_t + I_t$ in cash. If this amount is positive, it earns interest at the rate $r_+ > 0$; if it is negative, we must pay interest at rate $r_-$, where $r_- \geq r_+$. Thus the expenses, investment income, and cash balances are linked as follows:

$$B_{t+1} = \begin{cases} (1+r_+)(B_t - E_t + I_t) & B_t - E_t + I_t \geq 0 \\ (1+r_-)(B_t - E_t + I_t) & B_t - E_t + I_t < 0, \end{cases}$$

for $t = 1,\ldots,T-1$. We take $B_1 = (1+r_+)B_0$, and we require that $B_T - E_T + I_T = 0$, which means the final cash balance, plus income, exactly covers the final expense.

The initial investment will be a mixture of bonds, labeled $1,\ldots,n$. Bond $i$ has a price $P_i > 0$, a payment $C_i > 0$, and a maturity $M_i \in \{1,\ldots,T\}$.
Bond $i$ generates an income stream given by

$$a^{(i)}_t = \begin{cases} C_i & t < M_i \\ C_i + 1 & t = M_i \\ 0 & t > M_i, \end{cases}$$

for $t = 1,\ldots,T$. If $x_i$ is the number of units of bond $i$ purchased (at $t = 0$), the total investment cash flow is $I_t = x_1 a^{(1)}_t + \cdots + x_n a^{(n)}_t$, $t = 1,\ldots,T$. We will require $x_i \geq 0$. (The $x_i$ can be fractional; they do not need to be integers.) The total initial investment required to purchase the bonds and fund the initial cash balance at $t = 0$ is $x_1 P_1 + \cdots + x_n P_n + B_0$.
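
The bond cash flows above can be sketched in a few lines of Python. This is a minimal illustration only, not a solution to the exercise. It assumes the bonds are given as parallel lists of payments $C_i$, maturities $M_i$, and prices $P_i$, and the function names are hypothetical helpers that do not come from opt_funding_data.m. Periods are 1-based as in the text, so list index 0 holds period 1.

```python
def bond_income(C, M, T):
    """a_t for t = 1..T for one bond: C before maturity M, C + 1 at t = M, 0 after."""
    return [C if t < M else (C + 1.0 if t == M else 0.0) for t in range(1, T + 1)]


def investment_income(x, C, M, T):
    """I_t = x_1 a^(1)_t + ... + x_n a^(n)_t for t = 1..T (list index 0 is t = 1)."""
    I = [0.0] * T
    for xi, Ci, Mi in zip(x, C, M):
        for idx, a in enumerate(bond_income(Ci, Mi, T)):
            I[idx] += xi * a
    return I


def initial_investment(x, P, B0):
    """Total outlay at t = 0: x_1 P_1 + ... + x_n P_n + B_0."""
    return sum(xi * Pi for xi, Pi in zip(x, P)) + B0


# Hypothetical bond: payment 0.5 per period, maturity 2, horizon T = 4.
print(bond_income(0.5, 2, 4))  # [0.5, 1.5, 0.0, 0.0]
```

Because $I_t$ is linear in $x$, the income constraints stay linear when $x$ becomes an optimization variable. The extra $+1$ at maturity presumably represents repayment of a unit face value.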
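
Given $E_t$, $I_t$, and $B_0$, the piecewise balance recursion of problem 5 can be simulated directly. This is useful for plotting the cash balance in part (b) or for checking a solver's output. The sketch below is illustrative: the function name and example numbers are hypothetical, and list index 0 again holds period 1. It returns the balances $B_1,\ldots,B_T$ together with the final residual $B_T - E_T + I_T$, which a feasible plan must drive to zero.

```python
def propagate_balance(B0, E, I, r_plus, r_minus):
    """Simulate B_1..B_T for problem 5 and return (B, residual).

    E and I hold E_1..E_T and I_1..I_T. B[t - 1] is B_t, and
    residual = B_T - E_T + I_T (zero for a feasible funding plan).
    """
    T = len(E)
    B = [(1 + r_plus) * B0]          # B_1 = (1 + r_+) B_0
    for t in range(T - 1):           # periods 1..T-1
        u = B[t] - E[t] + I[t]       # cash left after paying this period's expense
        rate = r_plus if u >= 0 else r_minus   # lend at r_+, borrow at r_-
        B.append((1 + rate) * u)     # B_{t+1}
    residual = B[-1] - E[-1] + I[-1]
    return B, residual


# Hypothetical data: no initial cash, an expense of 10, then income of 22.
B, res = propagate_balance(0.0, [10.0, -22.0], [0.0, 0.0], 0.1, 0.2)
print(B, res)  # [0.0, -12.0] 10.0
```

In the example, the shortfall of 10 in period 1 is borrowed at $r_- = 0.2$, so $B_2 = -12$.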

(a) Explain how to choose $x$ and $B_0$ to minimize the total initial investment required to fund the expense stream. Hint: Show that the balance propagation equations can be written as

$$B_{t+1} = \min\{(1+r_+)(B_t - E_t + I_t),\ (1+r_-)(B_t - E_t + I_t)\}, \quad t = 1,\ldots,T-1.$$

Relax these constraints to convex ones (and show that the problem with relaxed constraints is equivalent to the original one).

(b) Solve the problem instance given in opt_funding_data.m. Give optimal values of $x$ and $B_0$. Give the optimal total initial investment, and compare it to the initial investment required if no bonds were purchased (which would mean that all the expenses were funded from the cash account). Plot the cash balance (versus period) with optimal bond investment, and with no bond investment.
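
The hint in problem 5(a) rests on a sign argument. Write $u = B_t - E_t + I_t$. If $u \geq 0$, then $(1+r_+)u \leq (1+r_-)u$ because $r_- \geq r_+$, so the minimum selects the lending rate. If $u < 0$, the inequality reverses and the minimum selects the borrowing rate. The Python check below compares both forms on a few values of $u$, using arbitrary illustrative rates, and shows that the identity fails if $r_- < r_+$. Because the minimum of two linear functions of $u$ is concave, a constraint of the form $B_{t+1} \leq \min\{\cdot,\cdot\}$ is convex. Showing that this relaxation is tight is part of the exercise.

```python
def next_balance_piecewise(u, r_plus, r_minus):
    """B_{t+1} as defined in the problem, with u = B_t - E_t + I_t."""
    return (1 + r_plus) * u if u >= 0 else (1 + r_minus) * u


def next_balance_min(u, r_plus, r_minus):
    """Hint form: B_{t+1} = min{(1 + r_+) u, (1 + r_-) u}."""
    return min((1 + r_plus) * u, (1 + r_minus) * u)


# Illustrative rates with r_minus >= r_plus: both forms agree for every u.
samples = [-3.0, -0.5, 0.0, 2.0, 10.0]
print(all(next_balance_piecewise(u, 0.01, 0.05) == next_balance_min(u, 0.01, 0.05)
          for u in samples))  # True

# If borrowing were cheaper than lending (r_minus < r_plus), the identity breaks.
print(next_balance_piecewise(-1.0, 0.05, 0.01) == next_balance_min(-1.0, 0.05, 0.01))  # False
```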
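
For problem 1, the prediction rule, the loss $L(A,b)$, and the error rate used in part (b) can be evaluated for fixed $A$ and $b$ with a short Python sketch. This is not a CVX formulation and does not fit the model. It assumes $A$ is stored as a list of $K$ rows $a_k$ and that labels run from 1 to $K$; the function names are hypothetical. The sketch also builds the grid of 10 values of $\mu$ from $10^{-1}$ to $10^2$. Note that each loss term applies $(\cdot)_+$ to a pointwise maximum of affine functions of $(A,b)$ minus an affine function, which suggests why the loss is convex.

```python
def scores(A, b, x):
    """f(x) = A x + b, with A stored as a list of K rows a_k."""
    return [sum(akj * xj for akj, xj in zip(ak, x)) + bk for ak, bk in zip(A, b)]


def predict(A, b, x):
    """y_hat = argmax_k f_k(x), returned as a label in 1..K (ties: lowest k)."""
    f = scores(A, b, x)
    return max(range(len(f)), key=f.__getitem__) + 1


def multilabel_hinge_loss(A, b, X, Y):
    """L(A, b) = sum_i (1 + max_{k != y_i} f_k(x_i) - f_{y_i}(x_i))_+ ."""
    total = 0.0
    for x, y in zip(X, Y):
        f = scores(A, b, x)
        best_other = max(fk for k, fk in enumerate(f, start=1) if k != y)
        total += max(1.0 + best_other - f[y - 1], 0.0)
    return total


def error_rate(A, b, X, Y):
    """Fraction of examples with y_hat != y."""
    return sum(predict(A, b, x) != y for x, y in zip(X, Y)) / len(Y)


# 10 values of mu, uniformly spaced on a log scale from 1e-1 to 1e2
mus = [10 ** (-1 + 3 * i / 9) for i in range(10)]
```

The regularized objective for a given $\mu$ would add $\mu \|A\|_F^2$, i.e. `mu * sum(v * v for row in A for v in row)`, to `multilabel_hinge_loss`.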
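
For problem 2(c), consider the squared Frobenius norm. Here $\|X_S - y\mathbf{1}^T\|_F^2 = \sum_{j \in S} \|x_j - y\|_2^2$, which has the same form as the squared-length cable cost of part (a). The infimum over $y$ is attained at the centroid $\bar x$ of the columns of $X_S$, because $\sum_j \|x_j - y\|^2 = \sum_j \|x_j - \bar x\|^2 + |S|\,\|y - \bar x\|^2$. The Python sketch below checks this decomposition on hypothetical 2D points. It illustrates the identity only; it is not a full placement solver.

```python
def sq_frobenius_cost(points, y):
    """||X_S - y 1^T||_F^2 = sum_j ||x_j - y||_2^2 for 2D points x_j."""
    return sum((px - y[0]) ** 2 + (py - y[1]) ** 2 for px, py in points)


def centroid(points):
    """Mean of the points, the minimizer of sq_frobenius_cost over y."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


# Hypothetical computer locations attached to one router.
pts = [(0.0, 0.0), (4.0, 0.0), (0.0, 2.0)]
c = centroid(pts)
y = (3.0, -1.0)
# Decomposition: cost(y) = cost(centroid) + |S| * ||y - centroid||^2
lhs = sq_frobenius_cost(pts, y)
rhs = sq_frobenius_cost(pts, c) + len(pts) * ((y[0] - c[0]) ** 2 + (y[1] - c[1]) ** 2)
print(abs(lhs - rhs) < 1e-9)  # True
```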
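
For problem 3, the matrices $I^{\{i,j\}}$ and the weighted Laplacian can be built explicitly. The Python sketch below uses 0-based indices and hypothetical function names, with edges given as a dict from $(i,j)$ to $W_{ij}$. It assembles $L = \sum_{\{i,j\} \in E} W_{ij} I^{\{i,j\}}$ and checks two facts used in the problem. First, every row of $L$ sums to zero, so $L\mathbf{1} = 0$; together with $L \succeq 0$, this gives $\lambda_n(L) = 0$. Second, $\mathbf{Tr}(G I^{\{i,j\}}) = G_{ii} + G_{jj} - 2G_{ij}$ for symmetric $G$.

```python
def edge_matrix(n, i, j):
    """I^{ij} (0-based): zeros except (i,i) = (j,j) = 1 and (i,j) = (j,i) = -1."""
    M = [[0.0] * n for _ in range(n)]
    M[i][i] = M[j][j] = 1.0
    M[i][j] = M[j][i] = -1.0
    return M


def laplacian(n, weights):
    """Weighted Laplacian L = sum over edges of W_ij * I^{ij}.

    weights maps each edge (i, j), i != j, to its weight W_ij >= 0.
    """
    L = [[0.0] * n for _ in range(n)]
    for (i, j), w in weights.items():
        M = edge_matrix(n, i, j)
        for r in range(n):
            for c in range(n):
                L[r][c] += w * M[r][c]
    return L


def trace_product(G, M):
    """Tr(G M) = sum over r, c of G[r][c] * M[c][r]."""
    n = len(G)
    return sum(G[r][c] * M[c][r] for r in range(n) for c in range(n))


# Hypothetical path graph 0 - 1 - 2 with weights 2.0 and 0.5.
L = laplacian(3, {(0, 1): 2.0, (1, 2): 0.5})
print([sum(row) for row in L])  # [0.0, 0.0, 0.0], i.e. L 1 = 0

G = [[4.0, 1.0, 0.0], [1.0, 3.0, 2.0], [0.0, 2.0, 5.0]]
print(trace_product(G, edge_matrix(3, 0, 1)))  # 5.0 = G00 + G11 - 2*G01
```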
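
For problem 4, the discretization and the jointly Gaussian baseline are easy to reproduce. The Python sketch below builds the grid $r_1 = -30, \ldots, r_{100} = 70$ and the discretized marginals $p^{(1)}$ and $p^{(2)}$. It then evaluates the Gaussian value of $p_{\rm loss}$ with the data above. It also shows that, for a discretized joint distribution $P_{ij} = \mathbf{Prob}(R_1 = r_i, R_2 = r_j)$, $p_{\rm loss}$ is a linear function of $P$. The function names are hypothetical, and the sketch does not solve the worst-case problem.

```python
import math


def discretized_marginal(r, mu, sigma):
    """p_i proportional to exp(-(r_i - mu)^2 / (2 sigma^2)), normalized to sum to 1."""
    w = [math.exp(-((ri - mu) ** 2) / (2 * sigma ** 2)) for ri in r]
    total = sum(w)
    return [wi / total for wi in w]


def gaussian_p_loss(mu1, mu2, s1, s2, rho):
    """Prob(R1 + R2 <= 0) when (R1, R2) are jointly Gaussian."""
    mean = mu1 + mu2
    std = math.sqrt(s1 ** 2 + s2 ** 2 + 2 * rho * s1 * s2)
    # Standard normal CDF evaluated at -mean/std, written with math.erf
    return 0.5 * (1 + math.erf(-mean / (std * math.sqrt(2))))


def p_loss_discrete(P, r):
    """Sum of P[i][j] over grid pairs with r_i + r_j <= 0 (linear in P)."""
    n = len(r)
    return sum(P[i][j] for i in range(n) for j in range(n) if r[i] + r[j] <= 0)


n = 100
r = [-30 + 100 * i / (n - 1) for i in range(n)]  # uniformly spaced, -30 to +70
p1 = discretized_marginal(r, 8, 6)
p2 = discretized_marginal(r, 20, 17.5)
print(round(gaussian_p_loss(8, 20, 6, 17.5, -0.25), 3))  # 0.05
```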