Date 
Topics 
Recommended Reading 
Homework 
08/22 
Introduction; online Learning; statistical learning theory; onlinetobatch conversion 
Lecture notes;
Chapter 1 and 9 of Hazan's survey; Chapter 1 of Bubeck's lecture notes;


08/24 
the expert problem and Hedge; Lower bounds; Follow the Regularized Leader 
Lecture notes;
classic paper on the expert problem and Hedge;
Chapter 3.7 of CesaBianchi and Lugosi's book;
Chapter 5.15.4 of Hazan's survey 

08/29 
Online Gradient Descent; Follow the Perturbed Leader; Combinatorial problems 
Lecture notes;
Chapter 5.5 of Hazan's survey; Chapter 6 of Bubeck's survey 

08/31 
Adaptive regret bounds; "smallloss" bounds; quantile bounds 
Lecture notes;
Chapter 2.4 of CesaBianchi and Lugosi's book (a different learning rate schedule for smallloss) 

09/05 
Second order bounds; Squint algorithm 
Lecture notes;
The Squint paper by Koolen and Van Erven 

09/07 
Variation bounds; Optimistic FTRL; 
Lecture notes;
A different proof for variation bounds;
proof of Optimistic FTRL is from this
paper 
Homework1 
09/12 
Connection to game theory; minimax theorem; fast convergence via adaptivity 
Lecture notes;
See Chapter 7.2 of CesaBianchi and Lugosi's book for a general minimax theorem (with similar proof) 

09/14 
Connection to boosting; AdaBoost; margin theory; uniform margin bounds via adaptivity 
Lecture notes;
Schapire's slides: toy example of AdaBoost;
resistance to overfitting;
the margin "movie"


09/19 
Nonstationary environments; interval regrets; sleeping experts 
Lecture notes;
See Sec 2 of this paper for a different efficient implementation.


09/21 
switching/tracking regret; dynamic regret 
Lecture notes;

Homework1 due 
09/26 
Fixedshare algorithm 
Lecture notes;


09/28 
Multiarmed Bandits (MAB); Exp3 algorithm; lower bounds 
Lecture notes;
See Chapter 6.46.6 of CesaBianchi and Lugosi's book
for general partial information problems

Homework2 
10/03 
Optimal MAB algorithms; FTRL/OMD with Tasllis entropy; high probability bounds 
Lecture notes;
See Lemma 1 of this paper
for the proof of the high probability lemma


10/05 
Stochastic MAB; Explorethenexploit; UCB algorithm; optimism in face of uncertainty 
Lecture notes;
See Sec 2.3 of this survey for a lower bound on stochastic MAB 

10/10 
Stochastic linear bandits; LinUCB 
Lecture notes;
See Theorem 2 of this
paper for the proof of confidence ellipsoid


10/12 
Adversarial Linear Bandit; Exp2 algorithm; Combinatorial bandits 
Lecture notes;
See Sec 5 of this paper for more examples of combinatorial bandits 
Homework2 due 
10/17 
FTRL for linear bandit; SCRiBLe 
Lecture notes;
See the original paper for efficient implementation
and discussions on the onlineshortestpath problem 

10/19 
Bandit Convex Optimization 
Lecture notes;
See this paper for an L2 ball sampling scheme with gradient descent

Homework3 
10/24 
Contextual bandit; Exp4 algorithm; Oracleefficient Algorithms 
Lecture notes;
See this paper for the impossibility of oracleefficiency in general


10/26 
EpsilonGreedy; policy elimination 
Lecture notes


10/31 
Optimal and oracleefficient algorithm: "minimonster" 
Lecture notes;
See the original paper for the very efficient implementation

Project proposal due 
11/02 
Contextual bandits with Adversarial Loss; Relaxationbased approach 
Lecture notes;
See this paper
for an improved algorithm


11/07 
Students' presentations 
Universal Portfolios With and Without Transaction Costs
presented by Mehdi Jafarnia Jahromi
Logarithmic Regret Algorithms for Online Convex Optimization (**Sec 13.2**)
presented by Daoud Burghal


11/09 
Students' presentations 
Optimal Strategies and Minimax Lower Bounds for Online Convex Games
presented by Guangyu Li
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
presented by Liyu Chen

Homework3 due 
11/14 
No class 
ML Seminar by Rob Schapire on contextual bandits (3:30pm, SAL 101) 

11/16 
Students' presentations 
Online Optimization : Competing with Dynamic Comparators
presented by Jason Gregory
Projectionfree Online Learning
presented by Zhiyun Lu

Homework4 
11/21 
Students' presentations 
Regret Bounds for Sleeping Experts and Bandits
presented by He Jiang
Best Arm Identification in MultiArmed Bandits
presented by Anastasia Voloshinov


11/23 
Thanksgiving 


11/28 
Students' presentations 
One Practical Algorithm for Both Stochastic and Adversarial Bandits
presented by Michael Conway
Online Bandit Learning against an Adaptive Adversary:
from Regret to Policy Regret presented by Kien Nguyen


11/30 
Students' presentations 
Online Learning with Switching Costs and Other Adaptive Adversaries
presented by Ke Zhang
Better Rates for Any Adversarial Deterministic MDP
presented by Karishma Sharma

Homework4 due 