CS 277: Control and Reinforcement Learning

Winter 2024

Schedule

(Week) | Dates             | Monday                         | Tuesday                                | Thursday
(1)    | Jan 9, 11         |                                | Intro                                  | IL
(2)    | Jan 15, 16, 18    | Quiz 1 (Wed)                   | TD Methods                             | DQN
(3)    | Jan 22, 23, 25    | Exercise 1; Quiz 2             | PG Methods                             | Advanced MFRL; Trust-Region Methods
(4)    | Jan 29, 30, Feb 1 | Quiz 3                         | Optimal Control                        | Stochastic Optimal Control
(5)    | Feb 5, 6, 8       | Exercise 2; Quiz 4             | Planning                               | MBRL
(6)    | Feb 12, 13, 15    | Quiz 5                         | Exploration                            | Partial Observability
(7)    | Feb 19, 20, 22    | Exercise 3 (Tue); Quiz 6 (Wed) | Guest Lecture: RLHF (Kolby Nottingham) | Inverse RL
(8)    | Feb 27, 29        |                                | Bounded RL                             | Structured Control
(9)    | Mar 4, 5, 7       | Quiz 7 (Wed)                   | Offline RL                             | Multi-Task RL
(10)   | Mar 11, 12, 14    | Exercise 4; Quiz 8 (Wed)       | Multi-Agent RL                         | Open Questions
(11)   | Mar 18            | Exercise 5                     |                                        |

Note: the planned schedule is subject to change.

Course logistics

  • When: Tuesdays and Thursdays, 2:00–3:20pm.
  • Where: DBH 1200.
  • Format:
    • Lectures: there will be a lecture each class covering topics in control and reinforcement learning. Lectures will be recorded when possible and the videos linked on this page (access requires a uci.edu account). For some topics, there will also be videos from previous years. Attendance is optional but recommended (see reasons below).
    • Quizzes: most weeks, there will be a quiz about that week’s topics, due by the following Monday. Quizzes consist of multiple-choice questions intended to encourage you to think more deeply about the topics, and are graded leniently rather than strictly for correctness: half the score for submitting a complete quiz, and half the score for doing better than random guessing. Week 1’s quiz is about useful background concepts in math, algorithms, and machine learning.
    • Exercises: there will be 5 exercises, due roughly every other week. Only the best 4 exercises will be averaged for the final grade, but a bonus will be given for scoring at least 50% on every exercise.
    • Class discussions: we will discuss each quiz and exercise in a class following its deadline. There will also be recaps, deep dives, and freeform discussions.
  • Ed Discussion:
    • Please use the forum for course-related discussions.
    • Important course announcements will be posted there as well (not on Canvas).
    • Please use the forum (not email) to privately message course staff about course-related matters.
    • Please note that the identity of anonymous posters is visible to the course staff.
  • Gradescope:
    • Quizzes and exercises will be posted on this page and submitted on Gradescope.
    • We encourage submitting PDF files, and particularly writing them in LaTeX.
  • Instructor: Prof. Roy Fox
    • Office hours can be scheduled here.
    • You are welcome to schedule 15-minute slots (more than once if needed) with at least 4 hours' notice.
    • You are welcome to attend in person (DBH 4064) or on Zoom, individually or with classmates.
    • Please let me know if you cannot make any of the available slots.
  • Teaching assistant: Armin Karamzade
    • Office hours can be scheduled here.

Grading policy

  • Exercises: 80% (+5% bonus)
    • Best 4 exercises: 20% each.
    • Score at least 50% on every exercise: 5% bonus.
    • Late submission policy: 5 grace days total for all exercises.
  • Quizzes: 16% (+2% bonus)
    • 6 quizzes: 3% each.
    • Deadlines on Mondays (end of day). No late submissions allowed.
    • This grading policy may change if we add more quizzes.
  • Participation: 4%
    • Class or forum participation: 2%.
      • To get full points, occasionally participate in class discussions or office hours by asking thoughtful on-topic questions or sharing quiz answers.
      • Alternatively, post on the forum at least a few on-topic (not administrative) questions, answers, thoughts, or useful links.
    • Course evaluation: 2%.
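
As a rough illustration of how the percentages above combine, here is a minimal Python sketch. It is not an official grade calculator. The function names, the representation of scores as fractions in [0, 1], the default of 4 answer choices, and the reading of "better than random guess" as accuracy strictly above 1/num_choices are all assumptions made for this example. It also assumes the caller passes only the quizzes that count toward the grade.

```python
def exercise_component(scores):
    """Exercise points out of 80 (+5 bonus); scores are fractions in [0, 1]."""
    best_four = sorted(scores, reverse=True)[:4]
    base = sum(20.0 * s for s in best_four)  # best 4 exercises, 20% each
    # Bonus requires at least 50% on every exercise, not only the best 4.
    bonus = 5.0 if all(s >= 0.5 for s in scores) else 0.0
    return base + bonus


def quiz_score(complete, num_correct, num_questions, num_choices=4):
    """Quiz score as a fraction in [0, 1]: half for completion, half for beating chance.

    Assumption: "better than random guess" means accuracy strictly above
    1 / num_choices, and the two halves are awarded independently.
    """
    beats_random = num_correct / num_questions > 1.0 / num_choices
    return 0.5 * complete + 0.5 * beats_random


def final_grade(exercise_scores, quiz_scores, participation, evaluation_done):
    """Total percentage; can exceed 100 because of the bonuses."""
    exercises = exercise_component(exercise_scores)
    quizzes = sum(3.0 * q for q in quiz_scores)  # 3% per counted quiz
    part = 2.0 * participation + (2.0 if evaluation_done else 0.0)
    return exercises + quizzes + part
```

For example, under these assumptions `exercise_component([1, 1, 1, 1, 0])` gives 80.0 with no bonus, because the missed exercise falls below 50%, while `exercise_component([1, 1, 1, 1, 0.5])` gives 85.0.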

Compute Resources

Students enrolled in the course have GPU quota on the HPC3 cluster.

RL Resources

Courses
Books
RL libraries
  • RLlib: industry-oriented library
  • Spinning Up: RL introductory material and code
  • Acme: research-oriented library
  • MushroomRL: another research-oriented library
More resources

Further reading

Imitation Learning
Temporal-difference methods
Policy-gradient methods
Exploration
Model-based methods
Inverse Reinforcement Learning
Bounded Reinforcement Learning
Structured Control
Offline RL
Multi-Task Learning
Multi-Agent RL