CS 277: Control and Reinforcement Learning

Winter 2024

Schedule

Week  Dates               Monday                                    Tuesday             Thursday
1     Jan 9, 11           -                                         Intro               IL
2     Jan 15, 16, 18      Quiz 1                                    TD Methods          DQL
3     Jan 22, 23, 25      Exercise 1; Quiz 2                        PG Methods          Advanced MFRL; Trust-Region Methods
4     Jan 29, 30, Feb 1   Quiz 3                                    Optimal Control     Stochastic Optimal Control
5     Feb 5, 6, 8         Exercise 2; Quiz 4                        Planning            MBRL
6     Feb 12, 13, 15      Quiz 5                                    Exploration         Partial Observability
7     Feb 19, 20, 22      Exercise 3 (Tuesday); Quiz 6 (Wednesday)  RLHF                Inverse RL
8     Feb 26, 27, 29      Quiz 7 (Wednesday)                        Bounded RL          Offline RL
9     Mar 5, 7            -                                         Structured Control  Multi-Task RL
10    Mar 11, 12, 14      Exercise 4; Quiz 8                        Multi-Agent RL      Open Questions
11    Mar 18              Exercise 5                                -                   -

Note: the planned schedule is subject to change.

Course logistics

  • When: Tuesdays and Thursdays, 2:00–3:20pm.
  • Where: DBH 1200.
  • Format:
    • Lectures: there will be a lecture each class covering topics in control and reinforcement learning. Lectures will be recorded when possible and the videos linked on this page (access requires a uci.edu account). For some topics, there will also be videos from previous years. Attendance is optional but recommended (see reasons below).
    • Quizzes: most weeks, there will be a quiz about that week’s topics, due by the following Monday. Quizzes consist of multiple-choice questions intended to encourage you to think more deeply about the topics, and are graded mostly for completion rather than correctness: half the score is for submitting a complete quiz, and the other half is for doing better than random guessing. Week 1’s quiz is about useful background concepts in math, algorithms, and machine learning.
    • Exercises: there will be 5 exercises, due roughly every other week. Only the best 4 exercises will be averaged for the final grade, but a bonus will be given for scoring at least 50% on every exercise.
    • Class discussions: we will discuss each quiz and exercise in a class following its deadline. There will also be recaps, deep dives, and freeform discussions.
  • Ed Discussion:
    • Please use the forum for course-related discussions.
    • Important course announcements will be posted there as well (not on Canvas).
    • Please use the forum (not email) to privately message course staff about course-related matters.
    • Please note that the identity of anonymous posters is visible to the course staff.
  • Gradescope:
    • Quizzes and exercises will be posted on this page and submitted on Gradescope.
    • We encourage submitting PDF files, and particularly writing them in LaTeX.
  • Instructor: Prof. Roy Fox
    • Office hours can be scheduled here.
    • You are welcome to schedule 15-minute slots (more than once if needed) with at least 4 hours' notice.
    • You are welcome to attend in person (DBH 4064) or by Zoom, individually or with classmates.
    • Please let me know if you cannot make any of the available slots.
  • Teaching assistant: Armin Karamzade
    • Office hours can be scheduled here.

Grading policy

  • Exercises: 80% (+5% bonus)
    • Best 4 exercises: 20% each.
    • Score at least 50% on every exercise: 5% bonus.
    • Late submission policy: 5 grace days total for all exercises.
  • Quizzes: 16% (+2% bonus)
    • 6 quizzes: 3% each (18% in total, which includes the 2% bonus).
    • Deadlines on Mondays (end of day). No late submissions allowed.
    • This grading policy may change if we add more quizzes.
  • Participation: 4%
    • Class or forum participation: 2%.
      • To get full points, occasionally participate in class discussions or office hours by asking thoughtful on-topic questions or sharing quiz answers.
      • Alternatively, post on the forum at least a few on-topic (not administrative) questions, answers, thoughts, or useful links.
    • Course evaluation: 2%.
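
As a rough illustration of how the components above combine, here is a minimal Python sketch. It is not an official grade calculator: the function names `quiz_score` and `course_grade` are hypothetical, every score is assumed to be a fraction in [0, 1], the two halves of a quiz score are assumed to be awarded independently, and the total is assumed not to be capped at 100%. Grace days and any change to the number of quizzes are not modeled.

```python
def quiz_score(submitted_complete: bool, beat_random: bool) -> float:
    """Score for one quiz as a fraction in [0, 1].

    Assumption: half for a complete submission, half for doing better
    than random guessing, awarded independently of each other.
    """
    return 0.5 * submitted_complete + 0.5 * beat_random


def course_grade(exercises, quizzes, participation, evaluation):
    """Combine component scores (fractions in [0, 1]) into a percentage.

    exercises:     5 exercise scores
    quizzes:       quiz scores (6 quizzes assumed, 3% each)
    participation: class or forum participation score
    evaluation:    course evaluation score
    """
    if len(exercises) != 5:
        raise ValueError("expected 5 exercise scores")

    # Best 4 of the 5 exercises count, 20% each.
    best_four = sorted(exercises, reverse=True)[:4]
    exercise_pts = 20.0 * sum(best_four)

    # 5% bonus for scoring at least 50% on every exercise,
    # including the one that is dropped from the average.
    bonus_pts = 5.0 if all(s >= 0.5 for s in exercises) else 0.0

    # Quizzes: 3% each.
    quiz_pts = 3.0 * sum(quizzes)

    # Participation: 2% for class or forum activity, 2% for the evaluation.
    participation_pts = 2.0 * participation + 2.0 * evaluation

    # Assumption: no cap, so the total can exceed 100.
    return exercise_pts + bonus_pts + quiz_pts + participation_pts
```

For example, `course_grade([1, 1, 1, 1, 0], [quiz_score(True, True)] * 6, 1, 1)` returns `102.0` under these assumptions: skipping one exercise costs only the 5% bonus, because the lowest exercise is dropped anyway.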

Compute Resources

Students enrolled in the course have GPU quota on the HPC3 cluster.

RL Resources

Courses
Books
RL libraries
  • RLlib: industry-oriented library
  • Spinning Up: RL introductory material and code
  • Acme: research-oriented library
  • MushroomRL: another research-oriented library
More resources

Further reading

Imitation Learning
Temporal-difference methods
Policy-gradient methods

Academic integrity

Don’t cheat. Academic dishonesty includes partially copying answers from other students or online resources, allowing other students to partially copy your answers, communicating information about exam answers to other students during an exam, or attempting to use disallowed notes or other aids during an exam. The biggest downside to such behavior is that it necessarily becomes part of who you are: a dishonest person. Beyond that, such behavior is easier to identify than you think, and the consequences can be severe, including failing this course. Trust me, it's not worth it.