Bellman’s principle of optimality: An optimal policy (set of decisions) has the property that, whatever the initial state and initial decisions are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
Mathematically, this can be written as:
$f_N(x) = \max_{d_n \in \{x\}} \left[\, r(d_n) + f_{N-1}\{T(x, d_n)\} \,\right]$
where
$f_N(x)$ = the optimal return from an $N$-stage process when the initial state is $x$
$r(d_n)$ = the immediate return due to decision $d_n$
$T(x, d_n)$ = the transfer function, which gives the state resulting from decision $d_n$
$\{x\}$ = the set of admissible decisions in state $x$
This equation is also known as a dynamic programming equation. It represents a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming. It expresses the value of a decision problem at a certain point in time in terms of the payoff from some initial choices and the value of the remaining decision problem that results from those choices. In this way, it breaks a dynamic optimization problem into simpler subproblems.
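To make the recursion concrete, here is a minimal Python sketch on a hypothetical resource-allocation problem: at each of $N$ stages we may spend some of the $x$ remaining units; spending $d$ yields an immediate return $r(d)$ and leaves state $x - d$. The particular functions `r` and `T`, and the problem sizes, are illustrative assumptions, not part of the source.

```python
from functools import lru_cache

def r(d: int) -> int:
    """Immediate return from decision d (hypothetical diminishing returns)."""
    return 3 * d - d * d // 2

def T(x: int, d: int) -> int:
    """Transfer function: the state resulting from decision d in state x."""
    return x - d

@lru_cache(maxsize=None)
def f(N: int, x: int) -> int:
    """Optimal return from an N-stage process starting in state x."""
    if N == 0:
        return 0  # no stages remain, so no further return
    # Bellman recursion: try every admissible decision d in {0, ..., x},
    # add the optimal return of the resulting (N-1)-stage subproblem,
    # and keep the best.
    return max(r(d) + f(N - 1, T(x, d)) for d in range(x + 1))

print(f(3, 6))  # optimal total return for a 3-stage process with 6 units
```

The `lru_cache` memoization is what turns the recursion into dynamic programming: each subproblem $(N, x)$ is solved once and reused, exactly the decomposition into simpler subproblems described above.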