Stanford CS229: Machine Learning | Summer 2019 | Lecture 15 - Reinforcement Learning - II | Summary and Q&A

TL;DR
This lecture covers reinforcement learning in continuous settings, including learning a model of the environment, discretization of the state space, and value function approximation.
Key Insights
- ❓ Learning models in reinforcement learning involves estimating the transition probabilities when they are not given.
- 👾 Discretization is a common approach to handle continuous state spaces, but it can limit generalization and result in a high number of discrete states.
- 💁 Value function approximation is used to represent value functions in continuous settings, using a parametric form such as linear regression over a feature mapping of the state.
- ❓ In value function approximation, the parameters are fit iteratively to minimize the error between the predicted values and the Bellman backup targets (see the sketch after this list).
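To make the last two insights concrete, here is a minimal Python sketch of fitted value iteration with a linear value function V(s) ≈ theta^T phi(s). The helper names (sample_states, actions, simulate, phi, R) are illustrative assumptions, not code from the lecture.

```python
import numpy as np

def fitted_value_iteration(sample_states, actions, simulate, phi, R,
                           gamma=0.95, n_iters=50, k=10):
    # Design matrix: one row of features phi(s) per sampled state.
    Phi = np.array([phi(s) for s in sample_states])
    theta = np.zeros(Phi.shape[1])
    for _ in range(n_iters):
        targets = []
        for s in sample_states:
            q_vals = []
            for a in actions:
                # Approximate E[V(s')] with k next states drawn from the simulator.
                next_vals = [phi(simulate(s, a)) @ theta for _ in range(k)]
                q_vals.append(R(s) + gamma * np.mean(next_vals))
            # Bellman-style target: y_i = R(s_i) + gamma * max_a E[V(s')]
            targets.append(max(q_vals))
        # Linear regression step: choose theta so that theta^T phi(s_i) ~= y_i.
        theta, *_ = np.linalg.lstsq(Phi, np.array(targets), rcond=None)
    return theta
```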
Transcript
Welcome back, everyone. So today we're going to start lecture number 15, and the topic for today is basically what's left of reinforcement learning for the purposes of this course; we're going to wrap up our reinforcement learning today. First we're going to discuss learned models, and then talk about extensions of the meth...
Questions & Answers
Q: What is the difference between a policy and a value function in reinforcement learning?
A policy is a mapping (a "rule book") from states to actions, while a value function gives the expected long-term accumulated reward obtained by following a policy, with future rewards weighted by the discount factor gamma.
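In standard MDP notation (consistent with, but not quoted from, the lecture), the value function of a policy pi is
V^pi(s) = E[ R(s_0) + gamma*R(s_1) + gamma^2*R(s_2) + ... | s_0 = s, actions chosen by pi ],
i.e. the expected discounted sum of rewards when starting in state s and following pi.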
Q: How is the optimal value function related to the optimal policy in reinforcement learning?
The optimal value function (V*) represents the best possible long-term value achievable from each state, obtained by following an optimal policy (pi*). The two are closely related: the optimal policy is the one that attains the maximum of the value function.
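In the same notation, V*(s) = max over pi of V^pi(s), and an optimal policy can be read off greedily as pi*(s) = argmax_a sum over s' of P_sa(s') * V*(s').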
Q: What is the difference between value iteration and policy iteration algorithms in reinforcement learning?
Value iteration is an iterative algorithm that updates the value function by repeatedly applying the Bellman backup operator. Policy iteration, on the other hand, alternates between policy evaluation (computing the value function for a given policy) and policy improvement (modifying the policy based on the value function).
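As a rough illustration of the Bellman backup used in value iteration, here is a minimal sketch for a finite MDP; the array layout (P as a list of per-action transition matrices, R as a reward vector) and the stopping tolerance are assumptions for illustration, not code from the lecture.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    # P: list of (S, S) arrays, one per action, with P[a][s, s'] = P_sa(s').
    # R: length-S vector of immediate rewards; gamma: discount factor.
    S = len(R)
    V = np.zeros(S)
    while True:
        # Bellman backup: V(s) <- R(s) + gamma * max_a sum_{s'} P_sa(s') V(s')
        Q = np.array([P[a] @ V for a in range(len(P))])  # shape (num_actions, S)
        V_new = R + gamma * Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```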
Q: How does discretization influence the analysis of reinforcement learning in continuous settings?
Discretization involves dividing the continuous state space into discrete parts for easier analysis. However, discretization can lead to the curse of dimensionality, where the number of discrete states increases exponentially with the number of continuous state components.
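A minimal sketch of grid discretization, assuming a state with two continuous components and a fixed number of bins per dimension (the bounds and bin count are hypothetical); with d components and k bins each, the grid has k^d cells, which is the exponential blow-up mentioned above.

```python
import numpy as np

def discretize(state, lows, highs, bins_per_dim):
    # Clip to the modeled range, then find the bin index along each dimension.
    state = np.clip(state, lows, highs)
    idx = ((state - lows) / (highs - lows) * bins_per_dim).astype(int)
    idx = np.minimum(idx, bins_per_dim - 1)
    # Flatten the per-dimension indices into a single discrete state id.
    return int(np.ravel_multi_index(tuple(idx), (bins_per_dim,) * len(state)))

# e.g. position in [-1, 1], velocity in [-2, 2], 10 bins each -> 100 discrete states
s_id = discretize(np.array([0.3, -0.5]), np.array([-1.0, -2.0]),
                  np.array([1.0, 2.0]), 10)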
Summary & Key Takeaways
- Reinforcement learning involves making sequential decisions over time, with an agent interacting with an environment.
- Markov Decision Processes (MDPs) are a formalism used to describe this setting, including states, actions, transition probabilities, immediate rewards, policies, and value functions.
- Learning models in reinforcement learning involves estimating the transition probabilities (P_sa) when they are not given, which can be done by running trials and forming maximum likelihood estimates (written out below this list).
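Concretely, the maximum likelihood estimate referred to above is
P_sa(s') = (# times taking action a in state s led to s') / (# times action a was taken in state s),
with a default distribution (e.g. uniform over states) used for state-action pairs that were never observed.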