Temporal-Difference Learning: Combining Dynamic Programming and Monte Carlo Methods for Reinforcement Learning



Milestones of RL: Q-Learning and Double Q-Learning

We continue our deep dive into Sutton's book "Reinforcement Learning: An Introduction" [1], and in this post introduce Temporal-Difference (TD) Learning, the subject of Chapter 6 of that work.

TD learning can be viewed as a combination of Dynamic Programming (DP) and Monte Carlo (MC) methods, which we introduced in the previous two posts, and it marks an important milestone in the field of Reinforcement Learning (RL) because it combines the strengths of both: like MC, TD learning does not need a model of the environment and learns from experience alone; like DP, it "bootstraps", meaning it builds each new estimate on top of previously established estimates.
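To make the bootstrapping idea concrete before we get to the details, here is a minimal sketch of the tabular TD(0) value update. The `env` interface (`reset()` and `step(action)` returning a next state, reward, and done flag) and the `policy` callable are illustrative assumptions for this sketch, not the interface used in the accompanying repository:

```python
from collections import defaultdict

def td0_prediction(env, policy, num_episodes=1000, alpha=0.1, gamma=0.99):
    """Estimate state values for a fixed policy with TD(0)."""
    V = defaultdict(float)  # state-value estimates, initialized to 0
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            # Assumed env interface: step(action) -> (next_state, reward, done).
            next_state, reward, done = env.step(policy(state))
            # Bootstrapped target: the observed reward plus the discounted
            # *current estimate* of the next state's value (0 if terminal).
            target = reward + (0.0 if done else gamma * V[next_state])
            # Move the estimate a small step toward the target.
            V[state] += alpha * (target - V[state])
            state = next_state
    return V
```

Note how, unlike MC, the update happens after every single step and uses the estimate `V[next_state]` rather than waiting for the full return of the episode.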


Here, we will introduce this family of methods both from a theoretical standpoint and through relevant practical algorithms, such as Q-learning, accompanied by Python code (a preview sketch follows below). As usual, all code can be found on GitHub.
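As a preview of the control methods discussed later in the post, here is a minimal sketch of tabular Q-learning under the same assumed `env` interface as above; `num_actions` and the epsilon-greedy action selection are illustrative choices, not taken from the original code:

```python
import random
from collections import defaultdict

def q_learning(env, num_actions, num_episodes=1000,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Learn action values off-policy with tabular Q-learning."""
    Q = defaultdict(lambda: [0.0] * num_actions)  # action-value table
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy behavior policy: explore with prob. epsilon.
            if random.random() < epsilon:
                action = random.randrange(num_actions)
            else:
                action = max(range(num_actions), key=lambda a: Q[state][a])
            next_state, reward, done = env.step(action)
            # Off-policy target: greedy value of the next state,
            # regardless of which action the behavior policy will take.
            target = reward + (0.0 if done else gamma * max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q
```

The key detail, which we will return to, is the `max` in the target: Q-learning evaluates the greedy policy while following an exploratory one.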

We begin with an introduction and motivation, and then turn to the prediction problem, similar to the previous posts. Then, we dive deeper into the theory and discuss which solution TD learning finds. Following that, we move to the control problem and present a…
