Conference on Learning Theory
Hristijan Jankulovski 61360413
Algorithms
April 4, 2017
Conference summary The Conference on Learning Theory (COLT) addresses theoretical aspects of machine learning and related topics. The conference strongly supports a broad
definition of learning theory, including, but not limited to: "Design and analysis of learning
algorithms", "Statistical and computational complexity of learning", "Optimization models
and algorithms for learning", "Unsupervised, semi-supervised, and active learning", "Online
learning", "Artificial neural networks, including deep learning", "Learning with large-scale
datasets", "Decision making under uncertainty", "Bayesian methods in learning", "High-dimensional and non-parametric statistical inference", "Planning and control, including reinforcement learning", "Learning with additional constraints: e.g. privacy, memory or communication budget", "Learning in other settings: e.g. social, economic, and game-theoretic" and
"Analysis and applications of learning theory in related fields: natural language processing,
neuroscience, bioinformatics, privacy and security, machine vision, information retrieval".
The proceedings contain the 63 papers accepted to and presented at the 29th Conference on
Learning Theory (COLT), held in New York, USA, on June 23-26, 2016. These papers were
selected by the program committee, with additional help from external expert reviewers,
from 199 submissions.
Paper reviews
Online Sparse Linear Regression This paper models the problem of prediction
with limited access to features, in its most natural and basic form, as an online sparse
linear regression problem. In this problem, a learner makes predictions for the labels of
examples arriving sequentially over a number of rounds. Each example has d features that
can potentially be accessed by the learner; however, in each round, the learner is restricted
to choosing an arbitrary subset of features of size at most k. The learner then acquires the
values of the chosen features and makes its prediction, at which point the true label
of the example is revealed to the learner. The learner suffers a loss for making an incorrect
prediction. The goal of the learner is to make predictions with total loss comparable to the
loss of the best sparse linear regressor. To measure the performance of the online learner, they
use the standard notion of regret, which is the difference between the total loss of the online
learner and the total loss of the best sparse linear regressor. Thus the goal is to minimize
regret with respect to the best sparse linear regressor, where prediction accuracy is measured
by square loss. For this problem, they give an inefficient algorithm that obtains regret
bounded by Õ(√T) after T prediction rounds. They complement this result by showing
that no algorithm running in polynomial time per iteration can achieve regret bounded by
O(T^(1−δ)) for any constant δ > 0 unless NP ⊆ BPP. This hardness result holds even if the
algorithm is allowed to access more features than the best sparse linear regressor up to a
logarithmic factor in the dimension.
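To make the interaction protocol concrete, the following Python sketch runs the online sparse linear regression game and accumulates the square loss. The callbacks choose_subset and predict are hypothetical placeholders for a learner's strategy; the paper's own (inefficient) algorithm is more involved and is not reproduced here.

```python
import numpy as np

def online_sparse_regression(examples, k, choose_subset, predict):
    """Run the online sparse linear regression protocol; return the total square loss.

    examples:      iterable of (x, y) pairs, x a length-d feature vector
    choose_subset: hypothetical learner callback mapping round t to <= k feature indices
    predict:       hypothetical learner callback mapping (indices, values) to a prediction
    """
    total_loss = 0.0
    for t, (x, y) in enumerate(examples):
        subset = choose_subset(t)            # learner queries at most k features
        assert len(subset) <= k
        observed = x[np.asarray(subset)]     # only these feature values are revealed
        y_hat = predict(subset, observed)    # prediction from partial information
        total_loss += (y_hat - y) ** 2       # square loss; true label y is now revealed
    return total_loss

# Illustrative run: always query the first k features and predict their mean.
rng = np.random.default_rng(0)
data = [(rng.standard_normal(10), rng.standard_normal()) for _ in range(100)]
loss = online_sparse_regression(data, k=2,
                                choose_subset=lambda t: [0, 1],
                                predict=lambda s, v: float(np.mean(v)))
print(loss)
```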
Online Isotonic Regression Similarly to the previous paper, this paper considers the
online version of the isotonic regression problem: sequential prediction within the
class of isotonic (non-decreasing) functions. The learner is given a set of T linearly ordered points;
over the course of T trials, the adversary picks a new (as yet unlabeled) point and the
learner predicts a label from [0, 1] for that point. Then the true label (also from [0, 1])
is revealed, and the learner suffers the squared error loss. After T rounds the learner is
evaluated by means of the regret, which is its total squared loss minus the loss of the best
isotonic function in hindsight. They survey several standard online learning algorithms and
show that none of them achieve the optimal regret exponent; in fact, most of them (including
Online Gradient Descent, Follow the Leader and Exponential Weights) incur linear regret.
They then prove that the Exponential Weights algorithm played over a covering net of
isotonic functions has a regret bounded by O(T^(1/3) log^(2/3)(T)) and present a matching Ω(T^(1/3))
lower bound on regret. They provide a computationally efficient version of this algorithm.
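As a rough illustration of the algorithmic template, here is a minimal Python sketch of Exponential Weights over a finite class of prediction functions, standing in for the paper's covering net of isotonic functions. The expert class, the learning rate eta = 0.5 (a standard choice for squared loss on [0, 1]), and the weighted-average prediction are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def exponential_weights(experts, points, labels, eta=0.5):
    """Exponential Weights over a finite set of functions, with squared loss.

    experts: list of candidate functions (here a stand-in for a covering net
    of isotonic functions); points and labels arrive one pair per round.
    """
    weights = np.ones(len(experts))
    total_loss = 0.0
    for x, y in zip(points, labels):
        preds = np.array([f(x) for f in experts])
        p = weights / weights.sum()                 # normalized weights
        y_hat = float(p @ preds)                    # weighted-average prediction
        total_loss += (y_hat - y) ** 2
        weights *= np.exp(-eta * (preds - y) ** 2)  # downweight experts by incurred loss
    return total_loss
```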
They also analyze the noise-free case, in which the revealed labels are isotonic, and show
that the bound can be improved to O(log T) or even to O(1).
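The regret comparator above is also easy to evaluate in hindsight: under squared loss, the best isotonic function on the labeled points is the isotonic least-squares fit, computable with the classical pool-adjacent-violators algorithm. The sketch below is a standard textbook routine, not code from the paper, and it assumes the labels are listed in the linear order of the points.

```python
import numpy as np

def isotonic_fit(y):
    """Pool Adjacent Violators: least-squares isotonic (non-decreasing) fit to y."""
    block_sum, block_cnt = [], []
    for v in y:
        s, c = float(v), 1
        # merge with the previous block while its mean exceeds the current one
        while block_cnt and block_sum[-1] / block_cnt[-1] > s / c:
            s += block_sum.pop()
            c += block_cnt.pop()
        block_sum.append(s)
        block_cnt.append(c)
    fit = []
    for s, c in zip(block_sum, block_cnt):
        fit.extend([s / c] * c)  # each merged block is fit by its mean
    return np.array(fit)

def regret(predictions, labels):
    """Learner's total squared loss minus the loss of the best isotonic function."""
    labels = np.asarray(labels, dtype=float)
    learner_loss = np.sum((np.asarray(predictions) - labels) ** 2)
    comparator_loss = np.sum((isotonic_fit(labels) - labels) ** 2)
    return float(learner_loss - comparator_loss)

# e.g. a trivial learner that always predicts 0.5:
print(regret([0.5, 0.5, 0.5, 0.5], [0.1, 0.7, 0.3, 0.9]))
```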
An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits Unlike the first two papers, this paper studies the multi-armed bandit problem,
the most basic example of a sequential decision problem with an
exploration-exploitation trade-off. Specifically, in a K-armed bandit problem, there are K
different options that can be chosen, or K different arms that can be pulled, on each trial;
each of these arms, when pulled, generates some reward (or loss). On each trial, the learner
selects one arm to pull, and observes the reward associated with only the pulled arm; the
rewards associated with the other arms remain hidden from the learner (it is in this sense
that the learner receives only partial information). The goal of the learner is to accumulate
as much reward as possible over a sequence of trials, e.g. compared to the best fixed arm
in hindsight. In practice, the expected regret is difficult to work with because it is
expensive to compute, so a simplified quantity, the pseudo-regret, is used instead; the
pseudo-regret is upper bounded by the expected regret. They present an algorithm that achieves
almost optimal pseudo-regret bounds against both adversarial and stochastic bandits: against
adversarial bandits the pseudo-regret is O(K√(n log n)), and against stochastic bandits it is
O(Σ_i (log n)/Δ_i), where n is the number of rounds and Δ_i is the gap between the mean
reward of the best arm and that of arm i.
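A minimal simulation makes the protocol and the pseudo-regret concrete. In the sketch below, the Bernoulli environment, the select_arm callback, and the epsilon-greedy learner are illustrative assumptions, not the paper's algorithm; the pseudo-regret is computed from realized pull counts as Σ_i Δ_i T_i, which estimates the expectation in the formal definition.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_bandit(means, n, select_arm):
    """Simulate a stochastic K-armed bandit for n rounds; return the pseudo-regret.

    means:      true Bernoulli mean reward of each arm (hidden from the learner)
    select_arm: hypothetical learner callback; it sees only its own past
                (arm, reward) pairs, i.e. partial information
    """
    history, pulls = [], np.zeros(len(means), dtype=int)
    for _ in range(n):
        arm = select_arm(history)
        reward = rng.binomial(1, means[arm])  # only the pulled arm's reward is observed
        history.append((arm, reward))
        pulls[arm] += 1
    gaps = means.max() - means                # Delta_i for each arm
    return float(gaps @ pulls)                # sum over arms of Delta_i * pulls_i

def eps_greedy(history, K=3, eps=0.1):
    """A naive epsilon-greedy learner, for illustration only."""
    counts, sums = np.zeros(K), np.zeros(K)
    for arm, r in history:
        counts[arm] += 1
        sums[arm] += r
    if counts.min() == 0:                     # pull every arm at least once
        return int(np.argmin(counts))
    if rng.random() < eps:                    # explore uniformly at random
        return int(rng.integers(K))
    return int(np.argmax(sums / counts))      # exploit the empirically best arm

print(run_bandit(np.array([0.5, 0.6, 0.4]), n=5000, select_arm=eps_greedy))
```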