CS 277: Control and Reinforcement Learning
Winter 2021
Note: This course was previously offered as CS 295.
Course logistics
- When: Tuesdays and Thursdays, 5:00–6:20pm
- Lectures will be recorded and added to this playlist with access for uci.edu accounts.
- Where: Zoom
- Announcements and forum: Piazza
- Important course announcements will be made on the forum.
- Please post all course-related questions on the forum, publicly or privately (no emails, please).
- Assignments: Gradescope
- Published on this page biweekly.
- Instructor: Prof. Roy Fox
- Office hours: Calendly
- Enrolled students are welcome to:
- schedule 15-minute slots (more than once if needed);
- give at least 4 hours' notice;
- attend individually or with friends.
Grading policy
- Assignments: 66%
- The best 3 of 4 assignments count for 22% each (a short worked example follows this list).
- No late submissions.
- Project: 29%
- Participation: 5%
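To make the weighting concrete, here is a minimal sketch of the grade computation; the function and the example scores are hypothetical, not an official grading script.

```python
# Hypothetical sketch of the grading formula above: best 3 of 4
# assignments at 22% each, project 29%, participation 5%.
def final_grade(assignments, project, participation):
    """All inputs are scores in [0, 1]; returns a percentage."""
    best_three = sorted(assignments, reverse=True)[:3]
    score = 0.22 * sum(best_three) + 0.29 * project + 0.05 * participation
    return 100 * score

# Example: full marks on 3 assignments, one skipped, strong project.
print(final_grade([1.0, 1.0, 1.0, 0.0], project=0.9, participation=1.0))  # ≈ 97.1
```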
Schedule
(subject to change)
Assignments
- Assignment 1; due Friday, January 22, 2021 (Pacific Time). Solution
- Assignment 2; due Friday, January 29, 2021 (Pacific Time). Solution
- Assignment 3; due Tuesday, February 16, 2021 (Pacific Time). Solution
- Assignment 4; due Friday, February 26, 2021 (Pacific Time).
Resources
Courses
- Sergey Levine (Berkeley)
- Pieter Abbeel (Berkeley)
- Dimitri Bertsekas (MIT; also available in book form; also see 2017 book)
- David Silver (UCL)
Books
- Vincent François-Lavet et al., An Introduction to Deep Reinforcement Learning
- Richard Sutton & Andrew Barto, Reinforcement Learning: An Introduction (2nd edition)
- Csaba Szepesvári, Algorithms for Reinforcement Learning
RL libraries
More resources
Further reading
Imitation Learning
- Behavior Cloning with Deep Learning Bojarski et al., CVPR 2016
- DAgger Ross et al., AISTATS 2011
- Goal-conditioned imitation learning Ding et al., NeurIPS 2019
- DART Laskey et al., CoRL 2017
- HVIL Fox et al., Infer2Control @ NeurIPS 2018
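For orientation: behavior cloning, the first item above, reduces imitation to supervised learning on expert state–action pairs. Below is a minimal sketch with a linear policy fit by least squares; the data is synthetic and the setup is purely illustrative, not any paper's method.

```python
import numpy as np

# Behavior cloning in a nutshell: regress actions on states from an
# expert dataset. Here the "expert" is a synthetic linear controller.
rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 4))               # expert-visited states
expert_W = rng.normal(size=(4, 2))                # unknown expert policy
actions = states @ expert_W + 0.01 * rng.normal(size=(1000, 2))

# Fit the cloned policy by ordinary least squares.
W_clone, *_ = np.linalg.lstsq(states, actions, rcond=None)

def policy(s):                                    # the cloned policy
    return s @ W_clone

print(np.abs(W_clone - expert_W).max())           # small: clone ≈ expert
```

DAgger and DART above address the distribution shift this naive regression suffers from once the cloned policy's own mistakes take it off the expert's state distribution.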
Temporal-difference methods
- DQN Mnih et al., Nature 2015
- Double Q-learning van Hasselt, NeurIPS 2010
- Double DQN van Hasselt et al., AAAI 2016
- Clipped double Q-learning Fujimoto et al., ICML 2018
- Dueling networks Wang et al., ICML 2016
- NAF Gu et al., ICML 2016
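The papers above all refine the same temporal-difference target that tabular Q-learning uses. Here is a minimal sketch on a toy 5-state chain; the environment and hyperparameters are made up just to have something runnable.

```python
import numpy as np

# Tabular Q-learning on a 5-state chain: action 1 moves right (reward 1
# on reaching the last state), action 0 moves left. Illustrates the TD
# target r + gamma * max_a' Q(s', a') shared by the DQN family above.
# Q-learning is off-policy, so a uniformly random behavior policy works.
n_states, n_actions, gamma, alpha = 5, 2, 0.9, 0.1
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

s = 0
for _ in range(5000):
    a = int(rng.integers(n_actions))              # random behavior policy
    s2, r = step(s, a)
    td_target = r + gamma * Q[s2].max()           # the TD(0) backup
    Q[s, a] += alpha * (td_target - Q[s, a])      # move Q toward the target
    s = 0 if s2 == n_states - 1 else s2           # reset at the goal

print(Q[:-1].argmax(axis=1))  # greedy action in non-goal states: all 1
```

DQN replaces the table with a neural network and stabilizes this same update with a replay buffer and a target network; the double, clipped, and dueling variants above refine how the max in the target is computed.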
Policy-gradient methods
- REINFORCE Williams, ML 1992
- Policy-Gradient Theorem Sutton et al., NeurIPS 2000
- DPG Silver et al., ICML 2014
- DDPG Lillicrap et al., ICLR 2016
- Off-policy policy evaluation Precup et al., ICML 2000
- Off-policy policy gradient Liu et al., UAI 2018
- TRPO Schulman et al., ICML 2015
- PPO Schulman et al., arXiv 2017
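As a minimal instance of the score-function estimator behind REINFORCE and its descendants above, here is a softmax policy trained on a toy 3-armed bandit; the rewards and learning rate are arbitrary illustration choices.

```python
import numpy as np

# REINFORCE on a 3-armed bandit: stochastic gradient ascent on E[R]
# using the score-function estimator grad log pi(a) * R.
rng = np.random.default_rng(0)
means = np.array([0.2, 0.5, 0.8])    # true (unknown) mean arm rewards
theta = np.zeros(3)                  # softmax policy parameters
lr = 0.1

for _ in range(2000):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()                   # softmax policy over the 3 arms
    a = rng.choice(3, p=pi)
    r = means[a] + 0.1 * rng.normal()
    grad_log_pi = -pi                # grad of log pi(a): onehot(a) - pi
    grad_log_pi[a] += 1.0
    theta += lr * r * grad_log_pi    # REINFORCE update (no baseline)

print(theta.argmax())  # typically 2: the policy concentrates on the best arm
```

Subtracting a baseline (e.g., a running average of r) reduces the variance of this estimator without biasing it, which is the first step toward the actor–critic methods below.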
Actor–critic methods
Model-based methods
- MCTS Kocsis and Szepesvári, ECML 2006
- ILQR Li and Todorov, ICINCO 2004
- DDP Mayne, IJC 1966
- Dyna Sutton, ACM SIGART Bulletin, 1991
- E3 Kearns and Singh, ML 2002
- R-max Brafman and Tennenholtz, JMLR 2002
- Local models Levine and Abbeel, NeurIPS 2014
- PBVI Pineau et al., IJCAI 2003
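Dyna from the list above interleaves real TD updates with planning updates replayed from a learned model. Here is a minimal Dyna-Q sketch on the same kind of toy chain as above (illustrative only, not Sutton's original code).

```python
import numpy as np

# Dyna-Q: after each real transition, record it in a deterministic
# model and replay k simulated transitions as extra (planning) updates.
n_states, n_actions, gamma, alpha, k = 5, 2, 0.9, 0.1, 10
Q = np.zeros((n_states, n_actions))
model = {}                                        # (s, a) -> (s', r)
rng = np.random.default_rng(0)

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

s = 0
for _ in range(1000):
    a = int(rng.integers(n_actions))
    s2, r = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])    # real update
    model[s, a] = (s2, r)                                     # learn model
    for _ in range(k):                                        # planning
        ps, pa = list(model)[rng.integers(len(model))]
        ps2, pr = model[ps, pa]
        Q[ps, pa] += alpha * (pr + gamma * Q[ps2].max() - Q[ps, pa])
    s = 0 if s2 == n_states - 1 else s2

print(Q[:-1].argmax(axis=1))  # converges with far fewer real steps: all 1
```

E3 and R-max above turn this kind of model-based loop into provably efficient exploration by distinguishing known from unknown state–action pairs.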
Academic honesty
Don’t cheat. Academic honesty is a requirement for passing this class, and any compromise of the academic integrity of this course will result in a failing grade. The work you submit must be your own. Academic dishonesty includes, among other things, copying answers from other students or online resources, allowing other students to copy your answers, communicating exam answers to other students during an exam, or attempting to use notes or other aids during an exam. Doing so violates the UCI Policy on Academic Honesty and the ICS Policy on Academic Honesty. It is your responsibility to read and understand these policies, in light of UCI’s definitions and examples of academic misconduct. Note that any instance of academic dishonesty will be reported to the Academic Integrity Administrative Office for disciplinary action and may result in a failing grade for the course.