CS 277: Control and Reinforcement Learning
Winter 2021
Note: This course was previously offered as CS 295.
Course logistics
- When: Tuesdays and Thursdays at 5–6:20pm
- Lectures will be recorded and added to this playlist, which is accessible to uci.edu accounts.
- Where: Zoom
- Announcements and forum: Piazza
- Important course announcements will be made on the forum.
- Please post all course-related questions on the forum, publicly or privately (no emails, please).
- Assignments: Gradescope
- Published on this page biweekly.
- Instructor: Prof. Roy Fox
- Office hours: Calendly
- Enrolled students are welcome to:
- schedule 15-minute slots (more than once if needed), giving at least 4 hours' notice;
- attend individually or with friends.
Grading policy
- Assignments: 88%
- The best 4 of 5 assignments count for 22% each.
- No late submissions.
- Participation: 5%
- Bonus: 7%
Schedule
Assignments
- Assignment 1; due Friday, January 22, 2021 (Pacific Time). Solution
- Assignment 2; due Friday, January 29, 2021 (Pacific Time). Solution
- Assignment 3; due Tuesday, February 16, 2021 (Pacific Time). Solution
- Assignment 4; due Friday, February 26, 2021 (Pacific Time). Solution
- Assignment 5; due Friday, March 12, 2021 (Pacific Time).
Resources
Courses
- Sergey Levine (Berkeley)
- Pieter Abbeel (Berkeley)
- Dimitri Bertsekas (MIT; also available in book form; see also the 2017 book)
- David Silver (UCL)
Books
- Vincent François-Lavet et al., An Introduction to Deep Reinforcement Learning
- Richard Sutton & Andrew Barto, Reinforcement Learning: An Introduction (2nd edition)
- Csaba Szepesvári, Algorithms for Reinforcement Learning
RL libraries
More resources
Further reading
Imitation Learning
- Behavior Cloning with Deep Learning Bojarski et al., CVPR 2016
- DAgger Ross et al., AISTATS 2011
- Goal-conditioned imitation learning Ding et al., NeurIPS 2019
- DART Laskey et al., CoRL 2017
- HVIL Fox et al., Infer2Control @ NeurIPS 2018
Temporal-difference methods
- DQN Mnih et al., Nature 2015
- Double Q-learning van Hasselt, NeurIPS 2010
- Double DQN van Hasselt et al., AAAI 2016
- Clipped double Q-learning Fujimoto et al., ICML 2018
- Dueling networks Wang et al., ICML 2016
- NAF Gu et al., ICML 2016
Policy-gradient methods
- REINFORCE Williams, ML 1992
- Policy-Gradient Theorem Sutton et al., NeurIPS 2000
- DPG Silver et al., ICML 2014
- DDPG Lillicrap et al., ICLR 2016
- Off-policy policy evaluation Precup et al., ICML 2000
- Off-policy policy gradient Liu et al., UAI 2018
- TRPO Schulman et al., ICML 2015
- PPO Schulman et al., arXiv 2017
Actor–critic methods
Model-based methods
- MCTS Kocsis and Szepesvári, ECML 2006
- iLQR Li and Todorov, ICINCO 2004
- DDP Mayne, IJC 1966
- Dyna Sutton, ACM SIGART Bulletin, 1991
- E³ Kearns and Singh, ML 2002
- R-max Brafman and Tennenholtz, JMLR 2002
- Local models Levine and Abbeel, NeurIPS 2014
- PBVI Pineau et al., IJCAI 2003
Exploration
- UCB in MAB Auer et al., ML 2002
- Thompson sampling in MAB Agrawal and Goyal, COLT 2012
- Count-based exploration Bellemare et al., NeurIPS 2016
- Thompson sampling in RL Gopalan and Mannor, COLT 2015
Inverse Reinforcement Learning
- Feature matching Abbeel and Ng, ICML 2004
- MaxEnt IRL Ziebart et al., AAAI 2008
- GAIL Ho and Ermon, NeurIPS 2016
Control as Inference
- LMDP Todorov, NeurIPS 2007
- Full-controllability duality Todorov, CDC 2008
- Information–value tradeoff Rubin et al., Decision Making with Imperfect Decision Makers 2012
- VI duality Levine, arXiv 2018
- Soft Q-learning Fox et al., UAI 2016
- Deep SQL Haarnoja et al., ICML 2017
- SAC Haarnoja et al., ICML 2018
Structured Control
- Options framework Sutton et al., AI 1999
- Option–critic method Bacon et al., AAAI 2017
- Skill trees Konidaris et al., NeurIPS 2010
- Spectral method Mahadevan and Maggioni, JMLR 2007
- Exact inference Fox et al., arXiv 2017
- PHP Fox et al., ICLR 2018
- FuN Vezhnevets et al., ICML 2017
Multi-Task Learning
- Fine-tuning SQL Haarnoja et al., ICML 2017
- Learning with pre-trained perceptual features Levine et al., JMLR 2016
- Learning with model transfer Fu et al., IROS 2016
- Domain randomization Peng et al., ICRA 2018
- Domain adaptation Bousmalis et al., ICRA 2018
- Goal generation for curriculum learning Florensa et al., ICML 2018
- Multi-task policy distillation Teh et al., NeurIPS 2017
- Multi-task hierarchical imitation learning Fox et al., CASE 2019
Multi-Agent RL
- MADDPG Lowe et al., NeurIPS 2017
- NFSP Heinrich and Silver, DRL @ NeurIPS 2016
- Double Oracle McMahan et al., ICML 2003
- PSRO Lanctot et al., NeurIPS 2017