CS 277: Control and Reinforcement Learning
Winter 2024
Schedule
Note: the planned schedule is subject to change.
Course logistics
- When: Tuesdays and Thursdays, 2:00–3:20pm.
- Where: DBH 1200.
- Format:
- Lectures: there will be a lecture each class covering topics in control and reinforcement learning. Lectures will be recorded when possible and the videos linked on this page (access requires a uci.edu account). For some topics, there will also be videos from previous years. Attendance is optional but recommended (see reasons below).
- Quizzes: most weeks, there will be a quiz about that week’s topics, due by the following Monday. Quizzes consist of multiple-choice questions intended to encourage you to think more deeply about the topics, and are graded for completion rather than correctness: half the score is for submitting a complete quiz, and half for doing better than random guessing. Week 1’s quiz is about useful background concepts in math, algorithms, and machine learning.
- Exercises: there will be 5 exercises, due roughly every other week. Only the best 4 exercises will be averaged for the final grade, but a bonus will be given for scoring at least 50% on every exercise.
- Class discussions: we will discuss each quiz and exercise in a class following its deadline. There will also be recaps, deep dives, and freeform discussions.
- Ed Discussion:
- Please use the forum for course-related discussions.
- Important course announcements will be posted there as well (not on Canvas).
- Please use the forum (not email) to privately message course staff about course-related matters.
- Please note that the identity of anonymous posters is visible to the course staff.
- Gradescope:
- Quizzes and exercises will be posted on this page and submitted on Gradescope.
- We encourage you to submit PDF files, preferably written in LaTeX.
- Instructor: Prof. Roy Fox
- Office hours can be scheduled here.
- You are welcome to schedule 15-minute slots (more than one if needed) with at least 4 hours’ notice.
- You are welcome to attend in person (DBH 4064) or over Zoom, individually or with classmates.
- Please let me know if you cannot make any of the available slots.
- Teaching assistant: Armin Karamzade
- Office hours can be scheduled here.
Grading policy
- Exercises: 80% (+5% bonus)
- Best 4 exercises: 20% each.
- Score at least 50% on every exercise: 5% bonus.
- Late submission policy: 5 grace days total for all exercises.
- Quizzes: 16% (+2% bonus)
- 6 quizzes: 3% each (6 × 3% = 18%, covering the 16% base plus the 2% bonus).
- Deadlines on Mondays (end of day). No late submissions allowed.
- This grading policy may change if we add more quizzes. (A worked example of how the components combine into a final grade appears after this section.)
- Participation: 4%
- Class or forum participation: 2%.
- To get full points, occasionally participate in class discussions or office hours by asking thoughtful on-topic questions or sharing quiz answers.
- Alternatively, post on the forum at least a few on-topic (not administrative) questions, answers, thoughts, or useful links.
- Course evaluation: 2%.
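To make the arithmetic above concrete, here is a minimal sketch, in Python, of one way the components could combine into a final grade. It assumes the numbers listed above (best 4 of 5 exercises at 20% each, six quizzes at 3% each, 4% participation, and the two bonuses); the function `final_grade` and all scores in the example are hypothetical, and reading the 6 × 3% of quizzes as the 16% base plus the 2% bonus is an assumption, not an official formula.

```python
# Hypothetical sketch of the grading policy above; not an official calculator.

def final_grade(exercises, quizzes, participation):
    """Combine component scores (fractions in [0, 1]) into a grade in %."""
    best4 = sorted(exercises, reverse=True)[:4]    # only the best 4 of 5 exercises count
    exercise_pts = 20 * sum(best4)                 # up to 4 x 20% = 80%
    # 5% bonus for scoring at least 50% on every exercise, including the dropped one
    exercise_bonus = 5 if all(s >= 0.5 for s in exercises) else 0
    # Assumption: 6 quizzes x 3% = 18%, read as the 16% base plus the 2% bonus
    quiz_pts = 3 * sum(quizzes)
    participation_pts = 4 * participation          # class/forum 2% + course evaluation 2%
    return exercise_pts + exercise_bonus + quiz_pts + participation_pts

# Example with made-up scores: the lowest exercise (0.55) is dropped from the
# average, but still counts toward the all-exercises-at-least-50% bonus.
print(final_grade(exercises=[0.9, 0.8, 0.55, 1.0, 0.7],
                  quizzes=[1, 1, 0.5, 1, 1, 1],
                  participation=1.0))  # 93.5
```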
Compute Resources
Students enrolled in the course have a GPU quota on the HPC3 cluster.
RL Resources
Courses
- Sergey Levine (Berkeley)
- Pieter Abbeel (Berkeley)
- Dimitri Bertsekas (MIT; also available in book form; see also the 2017 book)
- David Silver (UCL)
Books
- Vincent François-Lavet et al., An Introduction to Deep Reinforcement Learning
- Richard Sutton & Andrew Barto, Reinforcement Learning: An Introduction (2nd edition)
- Csaba Szepesvári, Algorithms for Reinforcement Learning
RL libraries
- RLlib: industry-oriented library
- Spinning Up: RL introductory material and code
- Acme: research-oriented library
- MushroomRL: another research-oriented library
More resources
- Awesome Deep RL: miscellaneous great resources
Further reading
Imitation Learning
- Behavior Cloning with Deep Learning, Bojarski et al., CVPR 2016
- DAgger, Ross et al., AISTATS 2011
- Goal-conditioned imitation learning, Ding et al., NeurIPS 2019
- DART, Laskey et al., CoRL 2017
Temporal-difference methods
- Q-Learning, Watkins and Dayan, ML 1992
- Sampling-based Fitted Value Iteration, Munos and Szepesvári, JMLR 2008
- DQN, Mnih et al., Nature 2015
- Double Q-learning, van Hasselt, NeurIPS 2010
- Double DQN, van Hasselt et al., AAAI 2016
- Clipped double Q-learning, Fujimoto et al., ICML 2018
- Prioritized Experience Replay, Schaul et al., ICLR 2016
- Dueling networks, Wang et al., ICML 2016
- Rainbow DQN, Hessel et al., AAAI 2018
- NAF, Gu et al., ICML 2016
- R2D2, Kapturowski et al., ICLR 2019
Policy-gradient methods
- REINFORCE, Williams, ML 1992
- Policy-Gradient Theorem, Sutton et al., NeurIPS 2000
- DPG, Silver et al., ICML 2014
- DDPG, Lillicrap et al., ICLR 2016
- A3C, Mnih et al., ICML 2016
- GAE, Schulman et al., ICLR 2016
- Off-policy policy evaluation, Precup et al., ICML 2000
- Off-policy policy gradient, Liu et al., UAI 2018
- TRPO, Schulman et al., ICML 2015
- PPO, Schulman et al., arXiv 2017
Exploration
- UCB in MAB, Auer et al., ML 2002
- Thompson sampling in MAB, Agrawal and Goyal, COLT 2012
- Count-based exploration, Bellemare et al., NeurIPS 2016
- Thompson sampling in RL, Gopalan and Mannor, COLT 2015
Model-based methods
- MCTS, Kocsis and Szepesvári, ECML 2006
- iLQR, Li and Todorov, ICINCO 2004
- DDP, Mayne, IJC 1966
- Dyna, Sutton, ACM SIGART Bulletin 1991
- E3, Kearns and Singh, ML 2002
- R-max, Brafman and Tennenholtz, JMLR 2002
- Local models, Levine and Abbeel, NeurIPS 2014
Inverse Reinforcement Learning
- Feature matching, Abbeel and Ng, ICML 2004
- MaxEnt IRL, Ziebart et al., AAAI 2008
- GAIL, Ho and Ermon, NeurIPS 2016
Bounded Reinforcement Learning
- LMDP, Todorov, NeurIPS 2007
- Full-controllability duality, Todorov, CDC 2008
- Information–value tradeoff, Rubin et al., Decision Making with Imperfect Decision Makers 2012
- Variational inference duality, Levine, arXiv 2018
- Soft Q-learning, Fox et al., UAI 2016
- Deep SQL, Haarnoja et al., ICML 2017
- Soft Actor–Critic, Haarnoja et al., ICML 2018
Structured Control
- Options framework, Sutton et al., AI 1999
- Option–critic method, Bacon et al., AAAI 2017
- Skill trees, Konidaris et al., NeurIPS 2010
- Spectral method, Mahadevan and Maggioni, JMLR 2007
- FuN, Vezhnevets et al., ICML 2017
Offline RL
- The Bitter Lesson, Sutton, 2019
- Doubly robust reinforcement learning, Jiang and Li, ICML 2016
- GenDICE, Zhang et al., ICLR 2020
- Explicit policy constraining, Wu et al., arXiv 2019
- Implicit policy constraining (AWR), Peng et al., arXiv 2019
- IQL, Kostrikov et al., ICLR 2022
- CQL, Kumar et al., NeurIPS 2020
Multi-Task Learning
- Fine-tuning SQL, Haarnoja et al., ICML 2017
- Learning with pre-trained perceptual features, Levine et al., JMLR 2016
- Learning with model transfer, Fu et al., IROS 2016
- Domain randomization, Peng et al., ICRA 2018
- Domain adaptation, Bousmalis et al., ICRA 2018
- Goal generation for curriculum learning, Florensa et al., ICML 2018
- Multi-task policy distillation, Teh et al., NeurIPS 2017
- Multi-task hierarchical imitation learning, Fox et al., CASE 2019
Multi-Agent RL
- MADDPG, Lowe et al., NeurIPS 2017
- NFSP, Heinrich and Silver, DRL @ NeurIPS 2016
- Double Oracle, McMahan et al., ICML 2003
- PSRO, Lanctot et al., NeurIPS 2017
- XDO, McAleer et al., NeurIPS 2021