Write short note on Temporal Difference Learning.
Solution:
Temporal-Difference (TD) Learning:
combines ideas from dynamic programming (DP) and Monte Carlo (MC) methods.
like DP methods, it bootstraps: it updates estimates based on other learned estimates.
like MC methods, it does not require a model of the environment; it learns directly from raw experience.
constitutes a basis for reinforcement learning.
Convergence to $\mathrm{V}^\pi$ is guaranteed (asymptotically as in MC …
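The points above can be illustrated with a minimal sketch of tabular TD(0) policy evaluation. The environment here is an assumption made only for illustration: a hypothetical 5-state random walk in which the policy moves left or right with equal probability, leaving the chain on the right gives reward 1, and leaving on the left gives reward 0. The names `td0_update` and `td0_evaluate`, the step size `alpha`, and the discount `gamma` are likewise illustrative choices and do not come from the answer itself.

```python
import random


def td0_update(v_s, reward, v_next, alpha, gamma):
    """One TD(0) step: move V(s) toward the bootstrapped target r + gamma * V(s')."""
    return v_s + alpha * (reward + gamma * v_next - v_s)


def td0_evaluate(n_states=5, episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Estimate V^pi for a random-walk chain (an assumed toy environment).

    States 0..n_states-1 are non-terminal. Stepping off the left end
    terminates with reward 0; stepping off the right end terminates
    with reward 1. The policy picks left/right uniformly at random.
    """
    rng = random.Random(seed)
    V = [0.0] * n_states
    for _ in range(episodes):
        s = n_states // 2  # every episode starts in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                reward, v_next, done = 0.0, 0.0, True   # left terminal
            elif s_next >= n_states:
                reward, v_next, done = 1.0, 0.0, True   # right terminal
            else:
                reward, v_next, done = 0.0, V[s_next], False
            # No model is used: only the sampled transition (s, reward, s_next).
            V[s] = td0_update(V[s], reward, v_next, alpha, gamma)
            if done:
                break
            s = s_next
    return V


if __name__ == "__main__":
    print(td0_evaluate())
```

For this assumed chain the true values are (i + 1) / 6 for state i, so the printed estimates should land near 1/6, 2/6, ..., 5/6. With a constant step size they keep fluctuating slightly around those values instead of settling exactly.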