CS980: Advanced ML: Reinforcement Learning
Please see the class homepage for schedule and other up-to-date information.
Where
Kingsbury N233
When
MW 3:40pm - 5:00pm
What
In this seminar, we will cover reinforcement learning, or how to make good decisions driven by data. The goal in reinforcement learning is to learn how to act while interacting with a dynamic and complex environment. Reinforcement learning methods can be applied in various domains, such as ecosystem management, website optimization, robotics, and healthcare.
Our focus will be on reinforcement learning methods that can learn from batch data sets without interacting with the environment. These algorithms have to learn how to act based on historical data alone. Batch methods are important when the cost of failure is high, such as in healthcare or agriculture, where trial and error is impractical.
Some of the topics that we will cover are:
- Markov decision processes: Policy iteration, Value iteration, Linear programming
- Policy evaluation (online and offline): TD, LSTD
- Batch policy improvement: Q-learning, LSPI
- Convex optimization: Linear programming
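To give a rough sense of the batch setting, here is a minimal sketch of tabular Q-learning run repeatedly over a fixed set of logged transitions, with no further interaction with the environment. The three-state chain, the reward values, the function name `batch_q_learning`, and the parameter choices are all hypothetical and chosen only for illustration; they are not taken from the course materials.

```python
import numpy as np


def batch_q_learning(transitions, n_states, n_actions,
                     gamma=0.9, alpha=0.5, sweeps=200):
    """Tabular Q-learning over a fixed batch of (s, a, r, s_next, done) tuples.

    The batch is replayed `sweeps` times; no new data is collected.
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(sweeps):
        for s, a, r, s_next, done in transitions:
            # Bootstrapped target: no future value after a terminal transition.
            target = r if done else r + gamma * q[s_next].max()
            q[s, a] += alpha * (target - q[s, a])
    return q


# Hypothetical logged data from a 3-state chain (state 2 is terminal).
# Action 0 stays in place, action 1 moves right; reaching state 2 pays 1.
transitions = [
    (0, 0, 0.0, 0, False),
    (0, 1, 0.0, 1, False),
    (1, 0, 0.0, 1, False),
    (1, 1, 1.0, 2, True),
]

q = batch_q_learning(transitions, n_states=3, n_actions=2)
policy = q.argmax(axis=1)  # greedy action for each state
print(q)
print(policy)
```

In this toy example the greedy policy learned from the batch moves right in both non-terminal states. With real historical data, many state-action pairs are typically missing or rarely observed, which is one of the difficulties that batch methods have to address.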
The focus of the class will be on depth rather than breadth. The main goal of the class is to deliver an interesting reinforcement learning group project. As much as possible, we will work on the same code base using a subset of C++, Python, and R. The class will require independent study of reading materials, in-class group problem solving, and paper discussions.
Textbooks
Follow the links for free online versions.
Our main textbook will be:
- Szepesvári, C. (2010). Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 4(1), 1–103.
Additional material can be found in:
- Puterman, M. L. (2005). Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons, Inc.
- Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. MIT Press.
Pre-requisites
Statistics, some linear algebra, and ideally familiarity with machine learning.