CS 277: Control and Reinforcement Learning
Winter 2024
Schedule
Note: the planned schedule is subject to change.
Course logistics
- When: Tuesdays and Thursdays, 2:00–3:20pm.
- Where: DBH 1200.
- Format:
- Lectures: there will be a lecture each class covering topics in control and reinforcement learning. Lectures will be recorded when possible and the videos linked on this page (access requires a uci.edu account). For some topics, there will also be videos from previous years. Attendance is optional but recommended (see reasons below).
- Quizzes: most weeks, there will be a quiz about that week’s topics, due by the following Monday. Quizzes consist of multiple-choice questions intended to encourage you to think more deeply about the topics, and are graded for completion rather than correctness: half the score for submitting a complete quiz, and half for doing better than random guessing. Week 1’s quiz covers useful background concepts in math, algorithms, and machine learning.
- Exercises: there will be 5 exercises, due roughly every other week. Only the best 4 exercises will be averaged for the final grade, but a bonus will be given for scoring at least 50% on every exercise.
- Class discussions: we will discuss each quiz and exercise in a class following its deadline. There will also be recaps, deep dives, and freeform discussions.
- Ed Discussion:
- Please use the forum for course-related discussions.
- Important course announcements will be posted there as well (not on Canvas).
- Please use the forum (not email) to privately message course staff about course-related matters.
- Please note that the identity of anonymous posters is visible to the course staff.
- Gradescope:
- Quizzes and exercises will be posted on this page and submitted on Gradescope.
- We encourage you to submit PDF files, preferably written in LaTeX.
- Instructor: Prof. Roy Fox
- Office hours can be scheduled here.
- You are welcome to schedule 15-minute slots (more than once if needed) with at least 4 hours' notice.
- You are welcome to attend in person (DBH 4064) or over Zoom, individually or with classmates.
- Please let me know if you cannot make any of the available slots.
- Teaching assistant: Armin Karamzade
- Office hours can be scheduled here.
Grading policy
- Exercises: 80% (+5% bonus)
- Best 4 exercises: 20% each (see the worked example below).
- Score at least 50% on every exercise: 5% bonus.
- Late submission policy: 5 grace days total for all exercises.
- Quizzes: 16% (+2% bonus)
- 6 quizzes: 3% each.
- Deadlines on Mondays (end of day). No late submissions allowed.
- This grading policy may change if we add more quizzes.
- Participation: 4%
- Class or forum participation: 2%.
- To get full points, occasionally participate in class discussions or office hours by asking thoughtful on-topic questions or sharing quiz answers.
- Alternatively, post on the forum at least a few on-topic (not administrative) questions, answers, thoughts, or useful links.
- Course evaluation: 2%.
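For concreteness, here is a small sketch of how the components above combine into a final grade. The Python function and the example scores are hypothetical, for illustration only; the official grade is whatever Gradescope reports.

    # Hypothetical illustration of the grading arithmetic above (all inputs are fractions in [0, 1]).
    def final_grade(exercise_scores, quiz_scores, participation, course_eval):
        # Exercises: best 4 of 5 count, 20% each (80% total).
        best4 = sorted(exercise_scores, reverse=True)[:4]
        exercises = 20.0 * sum(best4)
        # Bonus: 5% for scoring at least 50% on every exercise.
        exercise_bonus = 5.0 if all(s >= 0.5 for s in exercise_scores) else 0.0
        # Quizzes: 6 quizzes at 3% each (counted as 16% + 2% bonus).
        quizzes = 3.0 * sum(quiz_scores)
        # Participation: 2% class/forum participation + 2% course evaluation.
        return exercises + exercise_bonus + quizzes + 2.0 * participation + 2.0 * course_eval

    # Example: five exercise scores, six quiz scores (half for completion, half for beating random guessing).
    print(final_grade(
        exercise_scores=[0.9, 0.8, 0.55, 1.0, 0.7],
        quiz_scores=[1.0, 1.0, 0.5, 1.0, 1.0, 1.0],
        participation=1.0,
        course_eval=1.0,
    ))  # 68 + 5 + 16.5 + 4 = 93.5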
Compute Resources
Students enrolled in the course have GPU quota on the HPC3 cluster.
RL Resources
Courses
- Sergey Levine (Berkeley)
- Pieter Abbeel (Berkeley)
- Dimitri Bertsekas (MIT; also available in book form; also see 2017 book)
- David Silver (UCL)
Books
- Vincent François-Lavet et al., An Introduction to Deep Reinforcement Learning
- Richard Sutton & Andrew Barto, Reinforcement Learning: An Introduction (2nd edition)
- Csaba Szepesvári, Algorithms for Reinforcement Learning
RL libraries
- RLlib: industry-oriented library
- Spinning Up: RL introductory material and code
- Acme: research-oriented library
- MushroomRL: another research-oriented library
More resources
- Awesome Deep RL: miscellaneous great resources
Further reading
Imitation Learning
- Behavior Cloning with Deep Learning, Bojarski et al., CVPR 2016
- DAgger, Ross et al., AISTATS 2011
- Goal-conditioned imitation learning, Ding et al., NeurIPS 2019
- DART, Laskey et al., CoRL 2017
Temporal-difference methods
- Q-Learning, Watkins and Dayan, ML 1992
- Sampling-based Fitted Value Iteration, Munos and Szepesvári, JMLR 2008
- DQN, Mnih et al., Nature 2015
- Double Q-learning, van Hasselt, NeurIPS 2010
- Double DQN, van Hasselt et al., AAAI 2016
- Clipped double Q-learning, Fujimoto et al., ICML 2018
- Prioritized Experience Replay, Schaul et al., ICLR 2016
- Dueling networks, Wang et al., ICML 2016
- Rainbow DQN, Hessel et al., AAAI 2018
- NAF, Gu et al., ICML 2016
- R2D2, Kapturowski et al., ICLR 2019
Policy-gradient methods
- REINFORCE, Williams, ML 1992
- Policy-Gradient Theorem, Sutton et al., NeurIPS 2000
- DPG, Silver et al., ICML 2014
- DDPG, Lillicrap et al., ICLR 2016
- A3C, Mnih et al., ICML 2016
- GAE, Schulman et al., ICLR 2015
- Off-policy policy evaluation, Precup et al., ICML 2000
- Off-policy policy gradient, Liu et al., UAI 2018
- TRPO, Schulman et al., ICML 2015
- PPO, Schulman et al., arXiv 2017
Exploration
- UCB in MAB, Auer et al., ML 2002
- Thompson sampling in MAB, Agrawal and Goyal, COLT 2012
- Count-based exploration, Bellemare et al., NeurIPS 2016
- Thompson sampling in RL, Gopalan and Mannor, COLT 2015
Model-based methods
- MCTS, Kocsis and Szepesvári, ECML 2006
- ILQR, Li and Todorov, ICINCO 2004
- DDP, Mayne, IJC 1966
- Dyna, Sutton, ACM SIGART Bulletin 1991
- E3, Kearns and Singh, ML 2002
- R-max, Brafman and Tennenholtz, JMLR 2002
- Local models, Levine and Abbeel, NeurIPS 2014
Inverse Reinforcement Learning
- Feature matching, Abbeel and Ng, ICML 2004
- MaxEnt IRL, Ziebart et al., AAAI 2008
- GAIL, Ho and Ermon, NeurIPS 2016
Bounded Reinforcement Learning
- LMDP, Todorov, NeurIPS 2007
- Full-controllability duality, Todorov, CDC 2008
- Information–value tradeoff, Rubin et al., Decision Making with Imperfect Decision Makers 2012
- Variational Inference duality, Levine, arXiv 2018
- Soft Q-learning, Fox et al., UAI 2016
- Deep SQL, Haarnoja et al., ICML 2017
- Soft Actor–Critic, Haarnoja et al., ICML 2018
Structured Control
- Options framework, Precup et al., AI 1999
- Option–critic method, Bacon et al., AAAI 2017
- Skill trees, Konidaris et al., NeurIPS 2010
- Spectral method, Mahadevan and Maggioni, JMLR 2007
- FuN, Vezhnevets et al., ICML 2017
Offline RL
- The Bitter Lesson, Sutton, 2019
- Doubly robust reinforcement learning, Jiang and Li, ICML 2016
- GenDICE, Zhang et al., ICLR 2020
- Explicit policy constraining, Wu et al., arXiv 2019
- Implicit policy constraining (AWR), Peng et al., arXiv 2019
- IQL, Kostrikov et al., ICLR 2022
- CQL, Kumar et al., NeurIPS 2020
Multi-Task Learning
- Fine-tuning SQL, Haarnoja et al., ICML 2017
- Learning with pre-trained perceptual features, Levine et al., JMLR 2016
- Learning with model transfer, Fu et al., IROS 2016
- Domain randomization, Peng et al., ICRA 2018
- Domain adaptation, Bousmalis et al., ICRA 2018
- Goal generation for curriculum learning, Florensa et al., ICML 2018
- Multi-task policy distillation, Teh et al., NeurIPS 2017
- Multi-task hierarchical imitation learning, Fox et al., CASE 2019
Multi-Agent RL
- MADDPG, Lowe et al., NeurIPS 2017
- NFSP, Heinrich and Silver, DRL @ NeurIPS 2016
- Double Oracle, McMahan et al., ICML 2003
- PSRO, Lanctot et al., NeurIPS 2017
- XDO, McAleer et al., NeurIPS 2021
Academic integrity
Don’t cheat. Academic dishonesty includes partially copying answers from other students or online resources, allowing other students to partially copy your answers, communicating information about exam answers to other students during an exam, or attempting to use disallowed notes or other aids during an exam. The biggest downside to such behavior is that it necessarily becomes part of who you are: a dishonest person. Such behavior is also easier to detect than you think, and the consequences can be severe, including failing this course. Trust me, it's not worth it.