Refresh Rate:
Discount Factor:
Forget what was learned:
# Value Iteration
Above you're seeing a Value Iteration algorithm solve a maze or what's sometimes called a "Grid World" problem. The color of each cell represents the payoff it provides. The explorer is rewarded for visiting green cells, and punished for red cells. The darker the color, the stronger the payoff. You'll also see arrows that indicate the explorer's "policy", which is the decision it makes in each cell. There are two sliders, one for the refresh rate (ie frame rate), and the other for changing the *Discount Factor*. The goal of this maze problem is to maximize the sum of rewards, ∑i=1ri, where ri is the reward at the ith step. That goal is somewhat ill-defined because the summation may go to infinity. To ensure that the summation converges, we include a discount factor, 0 <= γ < 1, so that the goal is to maximize the sum of discounted rewards, ∑i=1γiri. You could see by the ratio test that the summation converges (the ratio will be γ). Let's go through why the discount factor is important, intuitively. - If one explorer got 1 coin every step, and the other got 2, which is preferred? If it weren't for the discount factor, they'd both have a sum of rewards of ∞, meaning that, in a sense, they're kinda the same...weird. However, with a discount factor, their sums are finite. With a discount factor of γ=0.9, the first explorer's sum of *discounted rewards* would be 20, and the second explorer's would be 10. - The discount factor establishes a trade-off between short-term and long-term rewards. If you didn't discount rewards, you'd be indifferent between recieving a reward now or 50 years from now.