CS 277: Control and Reinforcement Learning
Winter 2022
Course logistics
- When: Tuesdays and Thursdays, 11:00am–12:20pm.
- Where: Zoom.
- Format:
- Lectures: for most topics, lecture videos will be uploaded to this page ahead of the scheduled class for that topic. Access requires a uci.edu account.
- Class discussions: on most Tuesdays and Thursdays, 11:00am–12:20pm, we will hold a virtual class discussion, including a recap of the topic, any questions or advanced aspects participants bring up, and solutions to any recently due assignment. Attendance is optional; the discussions will be recorded and uploaded to this page.
- Quizzes: every week, there will be a quiz about that week’s topics, due by the end of the week. Week 1’s quiz is about background concepts in math, algorithms, and machine learning.
- Assignments: there will be 5 assignments, due roughly every other week. Only the best 4 assignments will be averaged for the final grade, but a bonus will be given for scoring at least 50% on all assignments.
- Announcements and discussion forum:
- We will be using Ed Discussion for important course announcements and course-related discussions.
- Please post all course-related questions on the forum, either publicly or privately.
- Please note that the identity of anonymous posters is visible to the course staff.
- To help us keep things in order, please do not email course staff, except for personal matters unrelated to the course. We often reply with “Please use the forum for course-related matters.”
- Quizzes and assignments:
- Quizzes and assignments will also be uploaded to this page and submitted on Gradescope.
- We won’t be using the course’s Canvas page.
- Instructor: Prof. Roy Fox
- Office hours: Calendly
- Enrolled students are welcome to:
- Schedule 15-minute slots (more than once if needed);
- Give at least 4 hours’ notice;
- Attend individually or with classmates.
- Teaching assistant: Tiancheng Xu
Grading policy
- Assignments: 80% (+5% bonus; a worked example of the grade computation is sketched after this section)
- Best 4 assignments: 20% each.
- Score at least 50% on each of 5 assignments: 5% bonus.
- Late submission policy: 5 grace days total for all assignments.
- Quizzes: 16% (+2% bonus)
- 6 quizzes: 3% each (18% in total, i.e., 16% plus the 2% bonus).
- Deadline on Fridays (end of day).
- Late submission policy: up to 2 late submissions allowed, by Monday (end of day).
- Participation: 4%
- Forum participation: 2%.
- Post at least a few on-topic (non-administrative) questions, answers, thoughts, or useful links on the forum.
- Course evaluations: 2%.
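For concreteness, here is one way a final grade could come together under the policy above. This is an unofficial sketch; all scores in it are made up.

```python
# Unofficial sketch of the grading arithmetic above; all scores are hypothetical.
assignments = [85, 40, 92, 78, 88]    # percent scores on the 5 assignments
quizzes = [100, 90, 80, 100, 70, 90]  # percent scores on the 6 quizzes

# Best 4 assignments count 20% each; 5% bonus if all 5 score at least 50%.
best4 = sorted(assignments, reverse=True)[:4]
assignment_pts = sum(0.20 * s for s in best4)
assignment_bonus = 5.0 if all(s >= 50 for s in assignments) else 0.0  # 0 here: one score is 40

# 6 quizzes at 3% each total 18%, i.e., 16% plus the 2% bonus.
quiz_pts = sum(0.03 * s for s in quizzes)

participation_pts = 2.0 + 2.0  # forum participation + course evaluations

total = assignment_pts + assignment_bonus + quiz_pts + participation_pts
print(f"Final grade: {total:.1f} out of 100 (including any bonus)")
```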
Schedule
Note: the planned schedule is subject to change.
Compute Resources
RL Resources
Courses
- Sergey Levine (Berkeley)
- Pieter Abbeel (Berkeley)
- Dimitri Bertsekas (MIT; also available in book form; also see 2017 book)
- David Silver (UCL)
Books
- Vincent François-Lavet et al., An Introduction to Deep Reinforcement Learning
- Richard Sutton & Andrew Barto, Reinforcement Learning: An Introduction (2nd edition)
- Csaba Szepesvári, Algorithms for Reinforcement Learning
RL libraries
More resources
Further reading
Imitation Learning
- Behavior Cloning with Deep Learning, Bojarski et al., arXiv 2016 (sketched after this list)
- DAgger, Ross et al., AISTATS 2011
- Goal-conditioned imitation learning, Ding et al., NeurIPS 2019
- DART, Laskey et al., CoRL 2017
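As a minimal illustration of the behavior-cloning idea listed above: imitation reduces to supervised learning from expert states to expert actions. This is a hypothetical sketch (PyTorch, with placeholder dimensions and random stand-in data), not code from any of the cited papers.

```python
# Minimal behavior-cloning sketch; data and dimensions are placeholders.
import torch
import torch.nn as nn

# Stand-in dataset of expert (state, action) pairs.
states = torch.randn(1024, 8)    # 8-dimensional states
actions = torch.randn(1024, 2)   # 2-dimensional continuous expert actions

policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(policy(states), actions)  # regress onto expert actions
    loss.backward()
    opt.step()
```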
Temporal-difference methods
- Q-Learning, Watkins and Dayan, ML 1992 (sketched after this list)
- Sampling-based Fitted Value Iteration, Munos and Szepesvári, JMLR 2008
- DQN, Mnih et al., Nature 2015
- Double Q-learning, van Hasselt, NeurIPS 2010
- Double DQN, van Hasselt et al., AAAI 2016
- Clipped double Q-learning, Fujimoto et al., ICML 2018
- Prioritized Experience Replay, Schaul et al., ICLR 2016
- Dueling networks, Wang et al., ICML 2016
- Rainbow DQN, Hessel et al., AAAI 2018
- NAF, Gu et al., ICML 2016
- R2D2, Kapturowski et al., ICLR 2019
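To make the Q-learning entry above concrete, here is a tabular sketch of the update Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). The `env` object with its Gym-style `reset()`/`step()` interface is an assumed stand-in, not part of the cited paper.

```python
import numpy as np

# Tabular Q-learning sketch with epsilon-greedy exploration.
def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
            s2, r, done, _ = env.step(a)
            # temporal-difference update toward the bootstrapped target
            target = r + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```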
Policy-gradient methods
- REINFORCE, Williams, ML 1992 (sketched after this list)
- Policy-Gradient Theorem, Sutton et al., NeurIPS 2000
- DPG, Silver et al., ICML 2014
- DDPG, Lillicrap et al., ICLR 2016
- A3C, Mnih et al., ICML 2016
- GAE, Schulman et al., ICLR 2016
- Off-policy policy evaluation, Precup et al., ICML 2000
- Off-policy policy gradient, Liu et al., UAI 2018
- TRPO, Schulman et al., ICML 2015
- PPO, Schulman et al., arXiv 2017
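And a minimal sketch of REINFORCE, the first entry above: increase the log-probability of each action in proportion to the discounted return that followed it. The network sizes, learning rate, and Gym-style `env` are placeholder assumptions.

```python
import torch
import torch.nn as nn

# REINFORCE sketch: ascend E[sum_t log pi(a_t|s_t) * G_t] with Monte Carlo returns G_t.
# Dimensions (4 state features, 2 discrete actions) are placeholders.
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

def reinforce_episode(env, gamma=0.99):
    log_probs, rewards = [], []
    s, done = env.reset(), False
    while not done:
        dist = torch.distributions.Categorical(
            logits=policy(torch.as_tensor(s, dtype=torch.float32)))
        a = dist.sample()
        log_probs.append(dist.log_prob(a))
        s, r, done, _ = env.step(a.item())
        rewards.append(r)
    returns, G = [], 0.0
    for r in reversed(rewards):          # discounted returns-to-go
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    loss = -sum(lp * g for lp, g in zip(log_probs, returns))  # minimize negative objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```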
Exploration
- UCB in MAB, Auer et al., ML 2002 (sketched after this list)
- Thompson sampling in MAB, Agrawal and Goyal, COLT 2012
- Count-based exploration, Bellemare et al., NeurIPS 2016
- Thompson sampling in RL, Gopalan and Mannor, COLT 2015
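A minimal sketch of the UCB1 rule from the first entry above: play the arm maximizing its empirical mean plus the confidence bonus sqrt(2 ln t / n_i). The `pull` reward sampler and the Bernoulli example are assumptions for illustration.

```python
import math
import random

# UCB1 sketch for a multi-armed bandit: optimism in the face of uncertainty.
def ucb1(pull, n_arms, horizon=1000):
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            i = t - 1                    # play each arm once first
        else:
            i = max(range(n_arms),
                    key=lambda j: means[j] + math.sqrt(2 * math.log(t) / counts[j]))
        r = pull(i)
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]  # incremental mean update
    return means, counts

# Example usage with Bernoulli arms of hypothetical success probabilities:
probs = [0.2, 0.5, 0.7]
means, counts = ucb1(lambda i: float(random.random() < probs[i]), n_arms=3)
```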
Model-based methods
- MCTS, Kocsis and Szepesvári, ECML 2006
- iLQR, Li and Todorov, ICINCO 2004
- DDP, Mayne, IJC 1966
- Dyna, Sutton, ACM SIGART Bulletin 1991 (sketched after this list)
- E3, Kearns and Singh, ML 2002
- R-max, Brafman and Tennenholtz, JMLR 2002
- Local models, Levine and Abbeel, NeurIPS 2014
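A minimal Dyna-Q sketch for the Dyna entry above: interleave real temporal-difference updates with planning updates replayed from a learned tabular model. As before, the Gym-style `env` and all hyperparameters are placeholder assumptions.

```python
import numpy as np

# Dyna-Q sketch: direct RL updates plus simulated updates from a learned model.
def dyna_q(env, n_states, n_actions, episodes=200,
           alpha=0.1, gamma=0.99, epsilon=0.1, planning_steps=10):
    Q = np.zeros((n_states, n_actions))
    model = {}                    # (s, a) -> (r, s2, done), deterministic model
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
            s2, r, done, _ = env.step(a)
            target = r + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])   # direct RL update
            model[(s, a)] = (r, s2, done)           # record observed transition
            for _ in range(planning_steps):         # planning with simulated experience
                (ps, pa), (pr, ps2, pdone) = list(model.items())[rng.integers(len(model))]
                ptarget = pr + (0.0 if pdone else gamma * Q[ps2].max())
                Q[ps, pa] += alpha * (ptarget - Q[ps, pa])
            s = s2
    return Q
```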
Inverse Reinforcement Learning
- Feature matching, Abbeel and Ng, ICML 2004 (sketched after this list)
- MaxEnt IRL, Ziebart et al., AAAI 2008
- GAIL, Ho and Ermon, NeurIPS 2016
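A minimal sketch of the feature-matching idea behind the first two entries: with a linear reward r(s) = theta . f(s), the gradient of the (MaxEnt) objective is the gap between expert and current-policy feature expectations. How `mu_policy` is computed (e.g., by soft value iteration under the current reward) is elided here; both inputs are placeholders.

```python
import numpy as np

# Feature-matching gradient sketch for linear-reward IRL: move theta so that
# the learner's expected features approach the expert's.
def irl_gradient_step(theta, mu_expert, mu_policy, lr=0.1):
    grad = mu_expert - mu_policy   # gradient of the MaxEnt IRL objective
    return theta + lr * grad

theta = np.zeros(4)
theta = irl_gradient_step(theta,
                          mu_expert=np.array([1.0, 0.0, 2.0, 0.0]),
                          mu_policy=np.array([0.5, 0.5, 1.0, 1.0]))
```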
Bounded Reinforcement Learning
- LMDP, Todorov, NeurIPS 2007
- Full-controllability duality, Todorov, CDC 2008
- Information–value tradeoff, Rubin et al., Decision Making with Imperfect Decision Makers 2012
- Variational Inference duality, Levine, arXiv 2018
- Soft Q-learning, Fox et al., UAI 2016 (sketched after this list)
- Deep SQL, Haarnoja et al., ICML 2017
- Soft Actor–Critic, Haarnoja et al., ICML 2018
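A minimal sketch of the soft Bellman backup underlying the soft Q-learning entries above: the hard max over next actions is replaced by a temperature-smoothed log-sum-exp. Tabular and hypothetical; as tau goes to 0 this recovers the ordinary Q-learning target.

```python
import numpy as np

# Soft Bellman backup sketch: V(s') = tau * log sum_a' exp(Q(s',a') / tau).
def soft_q_update(Q, s, a, r, s2, done, alpha=0.1, gamma=0.99, tau=1.0):
    if done:
        v2 = 0.0
    else:
        m = Q[s2].max()                                    # shift for numerical stability
        v2 = m + tau * np.log(np.exp((Q[s2] - m) / tau).sum())
    Q[s, a] += alpha * (r + gamma * v2 - Q[s, a])
```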
Structured Control
- Options framework, Sutton et al., AI 1999
- Option–critic method, Bacon et al., AAAI 2017
- Skill trees, Konidaris et al., NeurIPS 2010
- Spectral method, Mahadevan and Maggioni, JMLR 2007
- FuN, Vezhnevets et al., ICML 2017
Multi-Task Learning
- Fine-tuning SQL, Haarnoja et al., ICML 2017
- Learning with pre-trained perceptual features, Levine et al., JMLR 2016
- Learning with model transfer, Fu et al., IROS 2016
- Domain randomization, Peng et al., ICRA 2018
- Domain adaptation, Bousmalis et al., ICRA 2018
- Goal generation for curriculum learning, Florensa et al., ICML 2018
- Multi-task policy distillation, Teh et al., NeurIPS 2017
- Multi-task hierarchical imitation learning, Fox et al., CASE 2019
Academic honesty
Don’t cheat. Academic honesty is a requirement for passing this class, and compromising the academic integrity of this course may result in a failing grade. The work you submit must be your own. Academic dishonesty includes, among other things, partially copying answers from other students or online resources, allowing other students to partially copy your answers, communicating information about exam answers to other students during an exam, or attempting to use disallowed notes or other aids during an exam. Doing any of these is a violation of the UCI Policy on Academic Honesty and the ICS Policy on Academic Honesty. It is your responsibility to read and understand these policies, in light of UCI’s definitions and examples of academic misconduct. Note that any instance of academic dishonesty will be reported to the Academic Integrity Administrative Office for disciplinary action and may result in failing the course.