Machine Problem (due 11:59PM, April 26):
Goal: You are expected to apply duality theory and the KKT conditions to solve an
optimization problem in closed form; you will also apply the subgradient projection
method to iteratively solve an optimization problem that does not admit a closed-form
solution; and, in the case of a large-scale problem where you cannot load the entire
dataset into physical memory, you will learn to use the incremental subgradient method
to solve the problem with one data point in memory at a time.
Background (linear regression problem): Suppose we are given a set of n examples
in a p-dimensional feature space, X = [x_1, ..., x_n] in R^{p x n}, along with their
corresponding q-dimensional responses Y = [y_1, ..., y_n] in R^{q x n}. You want to
use a linear regression model to reveal the relation between x_i and y_i for
i = 1, ..., n. In other words, you need to find a transformation matrix W in R^{q x p}
that minimizes the least-squares error:
min_W F(W) = ||Y - WX||_F^2
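As a sanity check before any constraint is added, note that this unconstrained problem is ordinary least squares: setting the gradient to zero, grad F(W) = -2(Y - WX)X^T = 0, gives the closed-form solution W* = Y X^T (X X^T)^{-1} whenever X X^T is invertible. This W* is a useful baseline to compare against in Parts 2 and 3.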
To improve robustness, a proper regularizer on W can be used to constrain the model
in addition to the above objective function. The most common regularizers are the l1
and l2 norms. To make the problem more interesting, we focus on the l1 norm. Then
we have the following optimization problem:
min_W F(W) = ||Y - WX||_F^2
s.t. ||W||_1 <= c
where c is a preset parameter that controls how sparse the transformation matrix
will be.
Problems:
Part 1 (projection onto the constraint set). Use the dual problem (by forming the
Lagrangian function) and the KKT conditions to derive the projection of a point W_0
onto the convex constraint set ||W||_1 <= c; that is, solve the following
optimization problem:
min_W ||W - W_0||_F^2
s.t. ||W||_1 <= c
Note that this optimization problem can be solved in a closed form.
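For implementation purposes, here is a minimal NumPy sketch of what such a projection can look like, using a well-known sort-based soft-thresholding scheme. The function name and the choice to flatten the matrix are ours; treat this only as a way to numerically check the closed-form solution you derive yourself, not as a substitute for the derivation.

```python
import numpy as np

def project_l1_ball(W0, c):
    """Project matrix W0 onto {W : ||W||_1 <= c}.

    Sketch of a standard sort-based scheme: soft-threshold the entries
    by a value theta chosen so that the result has l1 norm exactly c.
    """
    w = W0.ravel()
    if np.abs(w).sum() <= c:            # already feasible: nothing to do
        return W0.copy()
    u = np.sort(np.abs(w))[::-1]        # sorted magnitudes, descending
    css = np.cumsum(u)
    # largest index rho with u[rho] > (css[rho] - c) / (rho + 1)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (css - c))[0][-1]
    theta = (css[rho] - c) / (rho + 1.0)
    w = np.sign(w) * np.maximum(np.abs(w) - theta, 0.0)
    return w.reshape(W0.shape)
```

A quick check of your own derivation: the output should satisfy ||W||_1 <= c (up to floating point), and the input should be returned unchanged whenever it is already feasible.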
Part 2 (subgradient method). Randomly generate a set of data examples X and their
target responses Y from a linear model with added noise. For example, pick a
ground-truth W and generate Y = WX + e, where e is Gaussian noise. Then apply the
subgradient projection method to solve the l1-constrained least-squares problem for
different values of c, using the projection derived in Part 1. Compare the obtained
W with the ground-truth W used to generate the data. In this part, try different
ways of choosing the stepsize for the subgradient projection method (e.g., constant
stepsize, varying stepsize) and compare the convergence; see the sketch below.
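Below is a minimal NumPy sketch of Part 2. It reuses the hypothetical project_l1_ball function from the Part 1 sketch, and every constant in it (dimensions, noise scale, c, stepsize rule, iteration count) is an illustrative assumption, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n = 20, 5, 200                       # assumed problem sizes

X = rng.standard_normal((p, n))            # random examples
W_true = rng.standard_normal((q, p))       # ground-truth W
Y = W_true @ X + 0.1 * rng.standard_normal((q, n))   # Y = WX + noise

c = 0.5 * np.abs(W_true).sum()             # constraint radius (illustrative)
W = np.zeros((q, p))
for k in range(1, 1001):
    G = -2.0 * (Y - W @ X) @ X.T           # gradient of ||Y - WX||_F^2
    alpha = 1e-3 / np.sqrt(k)              # one diminishing-stepsize choice
    W = project_l1_ball(W - alpha * G, c)  # projection from Part 1

print("||W - W_true||_F =", np.linalg.norm(W - W_true))
```

Rerunning with a constant alpha lets you compare the two stepsize rules, and recording F(W) at each iteration gives the convergence curves the report asks for.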
Part 3 (incremental subgradient method). When the number of examples n is very large,
it is computationally demanding to compute the subgradient of the full objective
function at each iteration to update W. In this case, we can decompose the objective
F(W) into a sum of terms, each depending on a single data point, and then apply the
incremental subgradient projection method to solve the problem. Another advantage of
this method is that, when the data are too large to fit in physical memory, you only
need to load a single data point into memory and compute the corresponding
subgradient. This makes the algorithm scalable to very large-scale optimization
problems; a sketch follows.
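Continuing the Part 2 sketch (it assumes X, Y, p, q, n, c, rng, and project_l1_ball are already defined as above), an incremental variant processes one example per update; the epoch count and stepsize are again illustrative choices.

```python
# Continuing from the Part 2 sketch: X, Y, p, q, n, c, rng,
# and project_l1_ball are assumed to be defined as above.
# F(W) = sum_i ||y_i - W x_i||^2, so the term for example i has
# gradient -2 (y_i - W x_i) x_i^T; update with one example at a time.
W = np.zeros((q, p))
k = 0
for epoch in range(50):
    for i in rng.permutation(n):           # shuffled pass over the data
        k += 1
        x_i = X[:, i:i + 1]                # p x 1 column (kept 2-D)
        y_i = Y[:, i:i + 1]                # q x 1 column
        g = -2.0 * (y_i - W @ x_i) @ x_i.T
        W = project_l1_ball(W - (1e-2 / np.sqrt(k)) * g, c)
```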
What to submit:
For Part 1, you are expected to submit your derivation of the solution in hard copy.
For Parts 2 and 3, you need to submit your source code as well as a brief report on
your results (e.g., curves showing the progress of the optimization algorithm over
iterations, and the accuracy you finally achieve with different values of c). Also
compare the convergence rate with the theoretical result we derived in class. Note
that you can implement the algorithm in whatever programming language you choose
(e.g., Matlab, Python, Java, C/C++, C#). Send your source code and report to
[email protected].
Timeline: You are given four weeks to submit your results. For Part 1, if you submit
your derivation within the first two weeks, you will get a bonus. After two weeks, I
will refer you to a paper where you can find the solution.
(It is OK if you dig the paper out, learn it, and re-derive the solution in your own
words within the first two weeks; you will still get the bonus. But do not ask me
which paper it is during the first two weeks. Note that finding proper references is
a very important skill in research. That's why we call it re-search.)