Model-free reinforcement learning part three: digging into Q-learning, SARSA, double Q-learning more practically
Data Science Blog
MARCH 26, 2023
That might sound odd if you have patiently read my articles so far because I was repeatedly explaining that RL and DP interactively update values and policies , which is called generalized policy iteration (GPI). And I also roughly explained that the convergence comes from a mathematical fact derived from stochastic approximation.
Let's personalize your content