CS 295: Optimal Control and Reinforcement Learning
Winter 2020
General information
- Time: Tuesdays and Thursdays, 3:30–4:50pm
- Location: Physical Sciences Classroom Building (PSCB) 240
- Instructor: Prof. Roy Fox
- Office hours: Fridays 9–11am, DBH 4064
- Announcements and forum: Piazza Class
Resources
- Courses
- Sergey Levine (Berkeley)
- Pieter Abbeel (Berkeley)
- Dimitri Bertsekas (MIT; also available in book form; also see 2017 book)
- David Silver (UCL)
- Books
- RL libraries
Schedule
Assignments
- Assignment 1; due Thursday, January 23, 2020, 11pm.
- Assignment 2; due Tuesday, February 4, 2020, 11pm.
- Assignment 3; due Tuesday, February 18, 2020, 11pm.
- Assignment 4; due Monday, March 23, 2020, 11pm.
Further reading
- Imitation learning
- Behavior Cloning with Deep Learning Bojarski et al., CVPR 2016
- DAgger Ross et al., AISTATS 2011
- Goal-conditioned imitation learning Ding et al., NeurIPS 2019
- DART Laskey et al., CoRL 2017
- HVIL Fox et al., Infer2Control @ NeurIPS 2018
- Temporal-difference methods
- DQN Mnih et al., Nature 2015
- Double Q-learning van Hasselt, NeurIPS 2010
- Double DQN van Hasselt et al., AAAI 2016
- Clipped double Q-learning Fujimoto et al., ICML 2018
- Dueling networks Wang et al., ICML 2016
- NAF Gu et al., ICML 2016
- Policy-gradient methods
- REINFORCE Williams, ML 1992
- Policy-gradient theorem Sutton et al., NeurIPS 2000
- DPG Silver et al., ICML 2014
- DDPG Lillicrap et al., ICLR 2016
- Off-policy policy evaluation Precup et al., ICML 2000
- Off-policy policy gradient Liu et al., UAI 2019
- TRPO Schulman et al., ICML 2015
- PPO Schulman et al., arXiv 2017
- Actor–critic methods
- Model-based methods
- Partial-observability methods
- PBVI Pineau et al., IJCAI 2003
- PSR Littman et al., NeurIPS 2002
- Spectral learning of PSR Boots et al., IJRR 2011
- Exploration
- UCB in MAB Auer et al., ML 2002
- Thompson sampling in MAB Agrawal and Goyal, COLT 2012
- Count-based exploration Bellemare et al., NeurIPS 2016
- Thompson sampling in RL Gopalan and Mannor, COLT 2015
- Inverse reinforcement learning
- Feature matching Abbeel and Ng, ICML 2004
- MaxEnt IRL Ziebart et al., AAAI 2008
- GAIL Ho and Ermon, NeurIPS 2016
- Control as inference
- LMDP Todorov, NeurIPS 2007
- Full-controllability duality Todorov, CDC 2008
- Information–value tradeoff Rubin et al., Decision Making with Imperfect Decision Makers 2012
- VI duality Levine, arXiv 2018
- Soft Q-learning Fox et al., UAI 2016
- Deep SQL Haarnoja et al., ICML 2017
- SAC Haarnoja et al., ICML 2018
- Structured control
- Options framework Sutton et al., AI 1999
- Option–critic method Bacon et al., AAAI 2017
- Skill trees Konidaris et al., NeurIPS 2010
- Spectral method Mahadevan and Maggioni, JMLR 2007
- Exact inference Fox et al., arXiv 2017
- PHP Fox et al., ICLR 2018
- FuN Vezhnevets et al., ICML 2017