A controlled dynamic system has inputs that can steer the evolution of its state.
\[ x_{k+1} = f(x_k, u_k) \tag{1} \]
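As a concrete sketch, the loop below steps a system of this form forward under an arbitrary input sequence. The scalar linear dynamics \(f\) and the inputs are assumptions made for illustration, not anything specified above.

```python
# Minimal sketch of a controlled dynamic system x_{k+1} = f(x_k, u_k).
# The scalar linear dynamics and the input sequence are assumed.

def f(x, u):
    """One step of the dynamics: next state from current state and input."""
    return 0.9 * x + 0.5 * u

x = 1.0                     # initial state x_0
for u in [0.0, -0.2, 0.1]:  # an arbitrary input sequence u_0, u_1, u_2
    x = f(x, u)             # the input steers the evolution of the state

print(x)                    # state x_3
```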
The inputs to the dynamic system can be determined by a policy \(\pi\) that maps the state of the dynamic system to an input. Such a policy makes the controlled dynamic system behave like an autonomous dynamic system.
\[ x_{k+1} = f(x_k, \pi(x_k)) = \tilde{f}(x_k) \tag{2} \]
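In code, closing the loop looks like this: a minimal sketch, again assuming scalar linear dynamics and a simple proportional feedback policy.

```python
# Sketch of closing the loop: with a policy pi, the controlled system
# x_{k+1} = f(x_k, pi(x_k)) behaves like the autonomous system
# x_{k+1} = f_tilde(x_k). Dynamics and policy are assumed.

def f(x, u):
    return 0.9 * x + 0.5 * u

def pi(x):
    return -0.4 * x          # assumed proportional state feedback

def f_tilde(x):
    return f(x, pi(x))       # closed-loop (autonomous) dynamics

x = 1.0
for _ in range(5):
    x = f_tilde(x)           # evolves with no externally chosen inputs

print(x)
```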
A cost of operation for the dynamic system can be defined as
\[ J(x_0) = \sum_{k=0}^{\infty} \alpha^k \, c(x_k, u_k). \tag{3} \]
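Equation (3) can be approximated numerically by truncating the infinite sum at a finite horizon, which is reasonable when \(\alpha < 1\) and the stage cost stays bounded. Everything in the sketch below (dynamics, policy, quadratic cost, discount) is an assumption for illustration.

```python
# Sketch of the discounted cost in equation (3), truncated to a finite
# horizon so it can be computed directly. Dynamics f, policy pi, stage
# cost c, and discount alpha are all assumed.

def f(x, u):
    return 0.9 * x + 0.5 * u

def pi(x):
    return -0.4 * x

def c(x, u):
    return x**2 + u**2       # assumed quadratic stage cost

def J(x0, alpha=0.95, horizon=500):
    """Approximate J(x_0) = sum_k alpha^k c(x_k, u_k) under policy pi."""
    total, x = 0.0, x0
    for k in range(horizon):
        u = pi(x)
        total += alpha**k * c(x, u)
        x = f(x, u)
    return total

print(J(1.0))
```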
Given this cost, a value function that depends on the control policy and the initial state can be found using a variation of dynamic programming. This value function satisfies
\[ V_\pi(x) = c(x, \pi(x)) + \alpha \, V_\pi(f(x, \pi(x))). \tag{4} \]
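On a finite state space, equation (4) can be solved for \(V_\pi\) by repeatedly applying its right-hand side until the values converge (a contraction when \(\alpha < 1\)). The three-state system below is invented purely for the sketch.

```python
# Sketch of policy evaluation: iterate equation (4) to its fixed point
# on a small finite state space. The three-state system, policy, and
# stage cost are invented for the illustration.

states = [0, 1, 2]
alpha = 0.9

def f(x, u):
    return (x + u) % 3         # assumed deterministic transitions

def pi(x):
    return 1 if x == 0 else 2  # an arbitrary fixed policy

def c(x, u):
    return float(x == 2)       # assumed unit cost for being in state 2

V = {x: 0.0 for x in states}
for _ in range(200):           # repeated application converges since alpha < 1
    V = {x: c(x, pi(x)) + alpha * V[f(x, pi(x))] for x in states}

print(V)                       # approximate V_pi for each state
```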
One engineering challenge with a controlled dynamic system is optimizing its performance. Policy improvement provides some insight into how to incrementally improve a policy. The key idea in policy improvement is that if a change to the policy reduces the immediate cost plus the discounted future cost at some state, then that change yields a better policy. If
\[ c(x, u) + \alpha \, V_\pi(f(x, u)) \le V_\pi(x) \tag{5} \]
then taking \(u\) at \(x\) (and following \(\pi\) everywhere else) is an improvement on the policy \(\pi\): the resulting operating cost is no greater than before, and is strictly lower when the inequality is strict.
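A minimal sketch of the improvement step, reusing the assumed three-state system from the policy-evaluation sketch: evaluate the current policy with equation (4), then at each state pick the input minimizing the left-hand side of equation (5). The minimizing input always satisfies (5), since \(u = \pi(x)\) itself achieves equality.

```python
# Sketch of policy improvement per equation (5): evaluate the current
# policy, then at each state switch to an input u for which
# c(x, u) + alpha * V_pi(f(x, u)) <= V_pi(x). Same assumed system as above.

states = [0, 1, 2]
actions = [0, 1, 2]
alpha = 0.9

def f(x, u):
    return (x + u) % 3

def c(x, u):
    return float(x == 2)

pi = {x: 0 for x in states}  # an arbitrary starting policy: always u = 0

# Evaluate pi by iterating equation (4).
V = {x: 0.0 for x in states}
for _ in range(200):
    V = {x: c(x, pi[x]) + alpha * V[f(x, pi[x])] for x in states}

# Greedy improvement: the minimizing u satisfies equation (5) because
# u = pi(x) itself achieves equality, so the minimum can only be lower.
new_pi = {x: min(actions, key=lambda u: c(x, u) + alpha * V[f(x, u)])
          for x in states}

print(new_pi)                # state 2 now escapes to a zero-cost state
```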
Other key ideas:
- Markov Decision Processes (MDPs) are controlled dynamic systems.