Introduction to Online Learning

CSCI 699, Fall 2017

Haipeng Luo

When: TuTh 2:00-3:50
Where: SGM 601
Office Hours: By appointment
TA: Chen-Yu Wei (chenyu dot wei at usc dot edu)

Overview: This course focuses on the foundations of, and recent advances in, the theory of online learning/online convex optimization/sequential decision making, which plays a crucial role in machine learning and many real-life applications. The main theme of the course is to study algorithms whose goal is to minimize "regret" when facing a possibly adversarial environment, and to understand their theoretical guarantees. Special attention will be paid to more adaptive, efficient and practical algorithms. Some connections to game theory, boosting and other learning problems will also be covered.
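For concreteness, here is one common way to write the "regret" mentioned above, in the expert setting. The notation is an assumption for illustration and is not fixed by this page: there are N experts and T rounds, the learner plays a distribution p_t over experts, and then observes a loss vector \ell_t.

```latex
\[
  \mathrm{Reg}_T \;=\; \sum_{t=1}^{T} \langle p_t, \ell_t \rangle
  \;-\; \min_{i \in \{1,\dots,N\}} \sum_{t=1}^{T} \ell_t(i)
\]
```

An algorithm is usually called "no-regret" when \mathrm{Reg}_T grows sublinearly in T against any sequence of losses.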

Learning Objectives: At a high level, through this course you will gain a concrete idea of what online learning is about, what the state of the art is, and what the open problems are. Specifically, you will learn about classic algorithms such as exponential weights, online mirror descent, UCB and EXP3, about more recent advanced algorithms, and about general techniques for proving regret upper and lower bounds. The hope is that after this course you will think about machine learning in a more rigorous and principled way and be able to design provable and practical machine learning algorithms.
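As a taste of the first algorithm named above, the following is a minimal sketch of exponential weights (Hedge) for the expert problem. It is not taken from the course materials: the function names `hedge` and `regret` are hypothetical, losses are assumed to lie in [0, 1], and the learning rate `eta` is assumed to be fixed in advance.

```python
import math


def hedge(loss_rounds, eta):
    """Run exponential weights on a list of loss vectors (entries assumed in [0, 1]).

    Returns (learner's cumulative expected loss, cumulative loss of each expert).
    """
    n = len(loss_rounds[0])
    cum_loss = [0.0] * n
    learner_loss = 0.0
    for losses in loss_rounds:
        # Weight of expert i is proportional to exp(-eta * cumulative loss of i).
        # Subtracting the minimum only rescales the weights and avoids underflow.
        base = min(cum_loss)
        weights = [math.exp(-eta * (c - base)) for c in cum_loss]
        total = sum(weights)
        probs = [w / total for w in weights]
        # The learner suffers the expected loss under its current distribution.
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        # Only after playing does the learner see this round's losses.
        cum_loss = [c + l for c, l in zip(cum_loss, losses)]
    return learner_loss, cum_loss


def regret(learner_loss, cum_loss):
    """Regret against the best single expert in hindsight."""
    return learner_loss - min(cum_loss)
```

With T rounds, N experts and the tuning eta = sqrt(8 ln(N) / T), the standard analysis bounds the regret of this scheme by sqrt(T ln(N) / 2) for any loss sequence fixed in advance.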

  • 4 problem sets, each consisting of several theory questions on algorithm design and analysis. Collaboration is allowed but must be stated. Grades are based on correctness. Solutions must be written in LaTeX. 40% of course grade.
  • A final project. 50% of course grade.
  • Participation. Includes regular attendance and a 50-min presentation of a paper. 10% of course grade.

Late homework policy: You are given 4 late days for the problem sets (no late days for the final project), to be used in integer amounts and distributed as you see fit. Additional late days will each result in a deduction of 10% of the grade of the corresponding assignment.

Prerequisites: Familiarity with probability, convex analysis, calculus, and analysis of algorithms. A basic understanding of machine learning would be very helpful.

Readings: There is no official textbook for this course, but the following books/surveys are very helpful in general:


Date | Topics | Recommended Reading | Homework
08/22 | Introduction; online learning; statistical learning theory; online-to-batch conversion | Lecture notes; Chapters 1 and 9 of Hazan's survey; Chapter 1 of Bubeck's lecture notes |
08/24 | The expert problem and Hedge; lower bounds; Follow the Regularized Leader | Lecture notes; classic paper on the expert problem and Hedge; Chapter 3.7 of Cesa-Bianchi and Lugosi's book; Chapter 5.1-5.4 of Hazan's survey |
08/29 | Online Gradient Descent; Follow the Perturbed Leader; combinatorial problems | Lecture notes; Chapter 5.5 of Hazan's survey; Chapter 6 of Bubeck's survey |
08/31 | Adaptive regret bounds; "small-loss" bounds; quantile bounds | Lecture notes; Chapter 2.4 of Cesa-Bianchi and Lugosi's book (a different learning rate schedule for small-loss) |
09/05 | Second order bounds; Squint algorithm | Lecture notes; the Squint paper by Koolen and Van Erven |
09/07 | Variation bounds; Optimistic FTRL | Lecture notes; a different proof for variation bounds; proof of Optimistic FTRL is from this paper |
09/12 | Connection to game theory; minimax theorem; fast convergence via adaptivity | Lecture notes; see Chapter 7.2 of Cesa-Bianchi and Lugosi's book for a general minimax theorem (with similar proof) |
09/14 | Connection to boosting; margin theory; uniform margin bounds via adaptivity | Lecture notes; Schapire's slides: toy example of AdaBoost; resistance to overfitting; the margin "movie" |
09/19 | Non-stationary environments; interval regrets; sleeping experts | Lecture notes; see Sec 2 of this paper for a different efficient implementation |
09/21 | Switching/tracking regret; dynamic regret | Lecture notes | Homework 1 due
09/26 | Fixed-share algorithm | Lecture notes |
09/28 | Multi-armed Bandits (MAB); Exp3 algorithm; lower bounds | Lecture notes; see Chapter 6.4-6.6 of Cesa-Bianchi and Lugosi's book for general partial information problems |
10/03 | Optimal MAB algorithms; FTRL/OMD with Tsallis entropy; high probability bounds | Lecture notes; see Lemma 1 of this paper for the proof of the high probability lemma |
10/05 | Stochastic MAB; UCB algorithm; optimism in the face of uncertainty | Lecture notes; see Sec 2.3 of this survey for a lower bound on stochastic MAB |
10/10 | Stochastic linear bandits | Lecture notes; see Theorem 2 of this paper for the proof of the confidence ellipsoid |
10/12 | Adversarial linear bandits; Exp2 algorithm; combinatorial bandits | Lecture notes; see Sec 5 of this paper for more examples of combinatorial bandits | Homework 2 due
10/17 | FTRL for linear bandits | Lecture notes; see the original paper for efficient implementation and discussions on the online-shortest-path problem |
10/19 | Bandit Convex Optimization | Lecture notes; see this paper for an L2 ball sampling scheme with gradient descent |
10/24 | Contextual bandits; Exp4 algorithm; oracle-efficient algorithms | Lecture notes; see this paper for the impossibility of oracle-efficiency in general |
10/26 | Epsilon-Greedy; policy elimination | Lecture notes |
10/31 | Optimal and oracle-efficient algorithm: "minimonster" | Lecture notes; see the original paper for the very efficient implementation | Project proposal due
11/02 | Contextual bandits with adversarial loss; relaxation-based approach | Lecture notes; see this paper for an improved algorithm |
11/07 | Students' presentations | Universal Portfolios With and Without Transaction Costs (presented by Mehdi Jafarnia Jahromi); Logarithmic Regret Algorithms for Online Convex Optimization, Sec 1-3.2 (presented by Daoud Burghal) |
11/09 | Students' presentations | Optimal Strategies and Minimax Lower Bounds for Online Convex Games (presented by Guangyu Li); Adaptive Subgradient Methods for Online Learning and Stochastic Optimization (presented by Liyu Chen) | Homework 3 due
11/14 | No class | ML Seminar by Rob Schapire on contextual bandits (3:30pm, SAL 101) |
11/16 | Students' presentations | Online Optimization: Competing with Dynamic Comparators (presented by Jason Gregory); Projection-free Online Learning (presented by Zhiyun Lu) |
11/21 | Students' presentations | Regret Bounds for Sleeping Experts and Bandits (presented by He Jiang); Best Arm Identification in Multi-Armed Bandits (presented by Anastasia Voloshinov) |
11/23 | Thanksgiving | |
11/28 | Students' presentations | One Practical Algorithm for Both Stochastic and Adversarial Bandits (presented by Michael Conway); Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret (presented by Kien Nguyen) |
11/30 | Students' presentations | Online Learning with Switching Costs and Other Adaptive Adversaries (presented by Ke Zhang); Better Rates for Any Adversarial Deterministic MDP (presented by Karishma Sharma) | Homework 4 due