Temporal-Difference Learning: Combining Dynamic Programming and Monte Carlo Methods for Reinforcement Learning

We continue our deep dive into Sutton’s book “Reinforcement Learning: An Introduction” [1], and in this post introduce Temporal-Difference (TD) learning, which is covered in Chapter 6 of that work.
TD learning can be viewed as a combination of Dynamic Programming (DP) and Monte Carlo (MC) methods, which we introduced in the previous two posts, and marks an important milestone in Reinforcement Learning (RL) because it combines the strengths of both: like MC, TD learning needs no model of the environment and learns from experience alone; like DP, it “bootstraps”, i.e. builds new estimates on top of previously established ones.
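To make this contrast concrete, here is a minimal sketch of a tabular TD(0) value update; the function and parameter names (td0_update, V, alpha, gamma) are assumptions for illustration, not the article's actual implementation.

```python
# Minimal sketch of a tabular TD(0) value update (illustrative only; names and
# the environment interface are assumptions, not the article's code).
from collections import defaultdict

def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.99):
    # Bootstrapped target: the observed reward plus the discounted current
    # estimate of the next state's value (no model of the environment needed).
    td_target = reward + gamma * V[next_state]
    # Move the estimate a small step towards the target (MC-style learning
    # from sampled experience, DP-style reuse of existing estimates).
    V[state] += alpha * (td_target - V[state])

# Usage: initialize V = defaultdict(float) and call td0_update after every
# observed transition (state, reward, next_state).
```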
Here, we introduce this family of methods both from a theoretical standpoint and through relevant practical algorithms, such as Q-learning, accompanied by Python code. As usual, all code can be found on GitHub.
We begin with an introduction and motivation, and then turn to the prediction problem, as in the previous posts. Then, we dive deeper into the theory and discuss which solution TD learning finds. Following that, we move to the control problem and present a…