Lecture 19  Reward Model & Linear Dynamical System  Stanford CS229: Machine Learning (Autumn 2018)  Summary and Q&A
TL;DR
This analysis explores reinforcement learning and Markov Decision Processes (MDPs), highlighting two generalizations of MDPs, to state-action rewards and to finite-horizon MDPs, as well as the application of linear dynamical systems. It also discusses the grading system for CS229 and the effectiveness of LQR (the linear quadratic regulator) in solving such MDPs.
Questions & Answers
Q: What are the advantages of generalizing MDPs to state-action rewards and finite-horizon MDPs?
Generalizing MDPs to state-action rewards and finite-horizon MDPs makes it easier to model certain kinds of problems, such as problems where different actions incur different costs, or problems that must terminate within a fixed number of time steps.
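Both generalizations change the solution method slightly: with a finite horizon the optimal policy becomes time-dependent, and the value function is computed by dynamic programming backward from the final time step. A minimal sketch (function name, array shapes, and the toy MDP are my own assumptions, not from the lecture):

```python
import numpy as np

def finite_horizon_value_iteration(P, R, T):
    """Backward dynamic programming for a finite-horizon MDP.

    P: transition probabilities, shape (A, S, S), P[a, s, s2] = P(s2 | s, a)
    R: state-action rewards, shape (S, A)
    T: horizon (number of time steps)

    Returns optimal values V[t, s] and greedy time-dependent policies pi[t, s].
    """
    A, S, _ = P.shape
    V = np.zeros((T + 1, S))            # V[T] = 0: no reward after the horizon
    pi = np.zeros((T, S), dtype=int)
    for t in range(T - 1, -1, -1):      # work backward from the last step
        # Q[s, a] = R(s, a) + E[V_{t+1}(s')] under P(. | s, a)
        Q = R + np.einsum("ask,k->sa", P, V[t + 1])
        V[t] = Q.max(axis=1)
        pi[t] = Q.argmax(axis=1)
    return V, pi

# Toy 2-state MDP: action 0 stays put, action 1 swaps states;
# only state 0 under action 0 pays reward 1.
P = np.zeros((2, 2, 2))
P[0] = np.eye(2)                        # "stay"
P[1] = np.eye(2)[::-1]                  # "swap"
R = np.array([[1.0, 0.0], [0.0, 0.0]])
V, pi = finite_horizon_value_iteration(P, R, T=3)
```

Note that the policy is indexed by time: at the last step the agent in state 1 has no reason to move, while earlier it pays one step of zero reward to reach the rewarding state.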
Q: What is the significance of linear dynamical systems in solving MDPs?
When the dynamics are linear and the costs are quadratic, MDPs with continuous (and therefore infinite) state spaces can be solved exactly: the optimal value function is itself quadratic and can be computed in closed form by backward recursion. This eliminates the need for function approximation and allows computing the exact value function.
Q: What is the role of noise in MDPs, and why is it important to consider?
Noise is often added to the dynamics of an MDP to account for uncertainty and to reflect real-world conditions. While the exact details of the noise may matter less than one might expect, ensuring that some noise is included is crucial for learning robust policies.
Q: How does LQR differ from other reinforcement learning algorithms?
LQR differs from general reinforcement learning algorithms in that it applies specifically to MDPs with linear dynamics and quadratic cost functions. In that setting it computes the optimal policy exactly, in closed form, without any approximation.
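The closed-form computation is the backward Riccati recursion: the value function at every step stays exactly quadratic, and the optimal action is linear in the state. A minimal sketch, assuming dynamics s_{t+1} = A s_t + B a_t and per-step cost sᵀQs + aᵀRa (matrix names and the scalar example are illustrative, not taken from the lecture notes):

```python
import numpy as np

def lqr_backward(A, B, Q, R, T):
    """Finite-horizon LQR via the backward Riccati recursion.

    Assumed dynamics: s_{t+1} = A s_t + B a_t (+ zero-mean noise).
    Per-step cost: s^T Q s + a^T R a, with Q PSD and R PD.

    Returns gain matrices Ks[t] so the optimal action is a_t = Ks[t] @ s_t.
    The value function remains quadratic, V_t(s) = s^T Phi_t s (+ const);
    Gaussian noise only adds a constant and leaves the policy unchanged.
    """
    n = A.shape[0]
    Phi = np.zeros((n, n))        # Phi_T = 0: no terminal cost assumed
    Ks = []
    for _ in range(T):
        # Gain from minimizing the quadratic cost-to-go over the action
        K = -np.linalg.solve(R + B.T @ Phi @ B, B.T @ Phi @ A)
        # Riccati update for the quadratic value-function matrix
        Phi = Q + A.T @ Phi @ A + A.T @ Phi @ B @ K
        Ks.append(K)
    Ks.reverse()                  # Ks[t] is the gain applied at time t
    return Ks

# Scalar example: A = B = Q = R = 1. The recursion converges to the
# steady-state gain K = -(sqrt(5) - 1) / 2.
Ks = lqr_backward(np.eye(1), np.eye(1), np.eye(1), np.eye(1), T=50)
```

Because every step is a matrix solve rather than a sweep over states, the recursion runs in time polynomial in the state dimension, independent of the (continuous) size of the state space.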
Summary & Key Takeaways

The analysis covers the generalizations of MDPs to state-action rewards and finite-horizon MDPs, which make it easier to model certain types of problems.

Linear dynamical systems are discussed alongside fitted value iteration; both address MDPs with infinite or continuous state spaces, fitted value iteration approximately and LQR exactly.

The importance of properly modeling and adding noise to MDPs is emphasized, as is the convenience and effectiveness of LQR in solving MDPs with linear dynamics and quadratic costs.