CS 295: Optimal Control and Reinforcement Learning

Winter 2020

General information

Time: Tuesdays and Thursdays, 3:30–4:50pm
Location: Physical Sciences Classroom Building (PSCB) 240
Instructor: Prof. Roy Fox
- Office hours: Fridays 9–11am, DBH 4064
Announcements and forum: Piazza Class

Resources

Courses
- Sergey Levine (Berkeley)
- Pieter Abbeel (Berkeley)
- Dimitri Bertsekas (MIT; also available in book form; also see 2017 book)
- David Silver (UCL)
Books
RL libraries

Schedule

Week	Tuesday	Thursday
(1) Jan 6	Introduction	Imitation learning
(2) Jan 13	Optimal control	Stochastic optimal control
(3) Jan 20	Temporal-difference methods	Policy-gradient methods (Assignment 1 due)
(4) Jan 27	Actor–critic methods	Advanced model-free methods
(5) Feb 3	Planning (Assignment 2 due)	Model-based methods
(6) Feb 10	– cancelled –	Partial-observability methods
(7) Feb 17	Advanced partial-observability methods (Assignment 3 due)	Exploration
(8) Feb 24	Inverse RL	Control as inference
(9) Mar 2	Structured control	Multi-agent systems (guest lecturer: Stephen McAleer)
(10) Mar 9	Multi-task learning	– cancelled – (final assignment due)

Assignments

Assignment 1: due Thursday, January 23, 2020, 11pm.
Assignment 2: due Tuesday, February 4, 2020, 11pm.
Assignment 3: due Tuesday, February 18, 2020, 11pm.
Assignment 4: due Monday, March 23, 2020, 11pm.

Further reading

Imitation learning
- Behavior Cloning with Deep Learning (Bojarski et al., CVPR 2016)
- DAgger (Ross et al., AISTATS 2011)
- Goal-conditioned imitation learning (Ding et al., NeurIPS 2019)
- DART (Laskey et al., CoRL 2017)
- HVIL (Fox et al., Infer2Control @ NeurIPS 2018)
Temporal-difference methods
- DQN (Mnih et al., Nature 2015)
- Double Q-learning (van Hasselt, NeurIPS 2010)
- Double DQN (van Hasselt et al., AAAI 2016)
- Clipped double Q-learning (Fujimoto et al., ICML 2018)
- Dueling networks (Wang et al., ICML 2016)
- NAF (Gu et al., ICML 2016)
Policy-gradient methods
- REINFORCE (Williams, ML 1992)
- Policy-gradient theorem (Sutton et al., NeurIPS 2000)
- DPG (Silver et al., ICML 2014)
- DDPG (Lillicrap et al., ICLR 2016)
- Off-policy policy evaluation (Precup et al., ICML 2000)
- Off-policy policy gradient (Liu et al., UAI 2018)
- TRPO (Schulman et al., ICML 2015)
- PPO (Schulman et al., arXiv 2017)
Actor–critic methods
- A3C (Mnih et al., ICML 2016)
- GAE (Schulman et al., ICLR 2016)
Model-based methods
- MCTS (Kocsis and Szepesvári, ECML 2006)
- ILQR (Li and Todorov, ICINCO 2004)
- DDP (Mayne, IJC 1966)
- Dyna (Sutton, ACM SIGART Bulletin 1991)
- E³ (Kearns and Singh, ML 2002)
- R-max (Brafman and Tennenholtz, JMLR 2002)
- Local models (Levine and Abbeel, NeurIPS 2014)
Partial-observability methods
- PBVI (Pineau et al., IJCAI 2003)
- PSR (Littman and Sutton, NeurIPS 2002)
- Spectral learning of PSR (Boots et al., IJRR 2011)
Exploration
- UCB in MAB (Auer et al., ML 2002)
- Thompson sampling in MAB (Agrawal and Goyal, COLT 2012)
- Count-based exploration (Bellemare et al., NeurIPS 2016)
- Thompson sampling in RL (Gopalan and Mannor, COLT 2015)
Inverse reinforcement learning
- Feature matching (Abbeel and Ng, ICML 2004)
- MaxEnt IRL (Ziebart et al., AAAI 2008)
- GAIL (Ho and Ermon, NeurIPS 2016)
Control as inference
- LMDP (Todorov, NeurIPS 2007)
- Full-controllability duality (Todorov, CDC 2008)
- Information–value tradeoff (Rubin et al., Decision Making with Imperfect Decision Makers, 2012)
- VI duality (Levine, arXiv 2018)
- Soft Q-learning (Fox et al., UAI 2016)
- Deep SQL (Haarnoja et al., ICML 2017)
- SAC (Haarnoja et al., ICML 2018)
Structured control
- Options framework (Sutton et al., AI 1999)
- Option–critic method (Bacon et al., AAAI 2017)
- Skill trees (Konidaris et al., NeurIPS 2010)
- Spectral method (Mahadevan and Maggioni, JMLR 2007)
- Exact inference (Fox et al., arXiv 2017)
- PHP (Fox et al., ICLR 2018)
- FuN (Vezhnevets et al., ICML 2017)
