Temporal-Difference Learning: Combining Dynamic Programming and Monte Carlo Methods for Reinforcement Learning

by Oliver S | Oct 2024


Milestones of RL: Q-Learning and Double Q-Learning

We continue our deep dive into Sutton’s book “Reinforcement Learning: An Introduction” [1]; in this post we introduce Temporal-Difference (TD) Learning, the subject of Chapter 6 of that work.

TD learning can be viewed as a combination of Dynamic Programming (DP) and Monte Carlo (MC) methods, which we introduced in the previous two posts, and it marks an important milestone in the field of Reinforcement Learning (RL) by combining the strengths of both: like MC, TD learning needs no model of the environment and learns from experience alone; like DP, it “bootstraps”, meaning it updates estimates based on other previously established estimates.

Photo by Brooke Campbell on Unsplash
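To make this concrete before diving in, here is a minimal sketch of the tabular TD(0) prediction update in Python. It is not the implementation from this post’s GitHub repository; the environment interface (reset() returning a state, step(action) returning (next_state, reward, done)) is an assumed Gym-like convention for illustration only.

```python
from collections import defaultdict

# Minimal sketch of tabular TD(0) prediction. The environment interface
# (reset() -> state, step(action) -> (next_state, reward, done)) is an
# assumed, Gym-like convention for illustration only.
def td0_prediction(env, policy, num_episodes=1000, alpha=0.1, gamma=0.99):
    V = defaultdict(float)  # state-value estimates, initialized to 0
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy(state)  # policy maps a state to an action
            next_state, reward, done = env.step(action)
            # Bootstrap: the target uses the current estimate of the next
            # state, zeroed out at episode end (terminal states have value 0).
            td_target = reward + gamma * V[next_state] * (not done)
            # Move V[state] a step of size alpha toward the TD target.
            V[state] += alpha * (td_target - V[state])
            state = next_state
    return V
```

The line computing td_target is exactly where TD learning bootstraps: instead of waiting for the full return at the end of an episode, as MC does, it plugs in the current estimate V[next_state] after a single step.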

Here, we will introduce this family of methods from a theoretical standpoint while also showing relevant practical algorithms, such as Q-learning, accompanied by Python code. As usual, all code can be found on GitHub.
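As a preview of the control algorithms discussed later, the sketch below shows tabular Q-learning with epsilon-greedy exploration. The update rule Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') - Q(s, a)] is the standard one; env.actions and the reset()/step() signatures are assumptions made for this illustration, not the post’s actual code.

```python
import random
from collections import defaultdict

# Hedged sketch of tabular Q-learning with epsilon-greedy exploration.
# `env.actions` (the list of available actions) and the Gym-like
# reset()/step() signatures are assumptions made for this illustration.
def q_learning(env, num_episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = defaultdict(float)  # action-value estimates, keyed by (state, action)
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else be greedy.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Off-policy target: bootstrap on the greedy value of the
            # next state, regardless of which action is actually taken next.
            best_next = max(Q[(next_state, a)] for a in env.actions)
            td_target = reward + gamma * best_next * (not done)
            Q[(state, action)] += alpha * (td_target - Q[(state, action)])
            state = next_state
    return Q
```

Note the max over next-state action values in the target: this is what makes Q-learning off-policy, since it learns about the greedy policy while following an exploratory one.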

We begin with an introduction and motivation, and then turn to the prediction problem, similar to the previous posts. Next, we dive deeper into the theory and discuss which solution TD learning finds. Following that, we move to the control problem and present a…
