A controlled dynamic system has inputs that can steer the evolution of the state of the system.
\(x_{k+1}=f\left(x_{k},u_{k}\right)\) | (1) |
The inputs to the dynamic system can be determined by a policy \(\pi\) that maps the state of the dynamic system to an input of the dynamic system. Under such a policy, the controlled dynamic system behaves like an autonomous dynamic system.
\(x_{k+1}=f\left(x_{k},\pi\left(x_{k}\right)\right)=\widetilde{f}\left(x_{k}\right)\) | (2) |
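To make equations (1) and (2) concrete, here is a minimal Python sketch; the scalar dynamics \(f(x,u)=x+u\) and the proportional policy \(\pi(x)=-0.5\,x\) are illustrative assumptions, not taken from any particular system.

```python
# Illustrative scalar dynamics f(x, u) = x + u, as in equation (1)
def f(x, u):
    return x + u

# An assumed proportional policy pi(x) = -0.5 * x
def pi(x):
    return -0.5 * x

# Closed-loop dynamics f~(x) = f(x, pi(x)), as in equation (2):
# with the policy substituted in, the system evolves autonomously
def f_tilde(x):
    return f(x, pi(x))

# Simulate the closed loop from an initial state; no external input is needed
x = 4.0
for k in range(5):
    print(f"x_{k} = {x}")
    x = f_tilde(x)
```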
Given a cost of operation for the dynamic system,
\(J\left(x_{0}\right)=\sum_{k=0}^{\infty}\left(\alpha^{k}\cdot c\left(x_{k},u_{k}\right)\right)\), | (3) |
where \(\alpha\in(0,1)\) is a discount factor, a value function that depends on the control policy and the state can be found using dynamic programming. The value function satisfies the recursion
\(V^{\pi}\left(x\right)=c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\) | (4) |
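Equations (3) and (4) can be checked against each other numerically. In the toy example above the closed loop is \(x_{k+1}=0.5\,x_k\), and with an assumed quadratic stage cost \(c(x,u)=x^{2}+u^{2}\) (an illustrative choice) the value function takes the form \(V^{\pi}(x)=p\,x^{2}\), so the recursion (4) reduces to the scalar fixed-point equation \(p=1.25+0.25\,\alpha\,p\). The sketch below approximates \(J(x_0)\) by truncating the sum in (3) and compares it with \(V^{\pi}(x_0)\) obtained by iterating the recursion; all specific numbers are artifacts of these assumptions.

```python
alpha = 0.9

def f(x, u):               # dynamics from equation (1), repeated for completeness
    return x + u

def pi(x):                 # the assumed policy pi(x) = -0.5 * x
    return -0.5 * x

def c(x, u):               # assumed quadratic stage cost (illustrative only)
    return x**2 + u**2

def J(x0, horizon=200):
    """Approximate the infinite sum in equation (3) by truncation;
    the neglected tail is negligible once alpha**horizon is small."""
    total, x = 0.0, x0
    for k in range(horizon):
        u = pi(x)
        total += alpha**k * c(x, u)
        x = f(x, u)
    return total

# Equation (4) as a fixed point: the closed loop is x_{k+1} = 0.5 x and
# c(x, pi(x)) = 1.25 x^2, so V^pi(x) = p * x^2 and (4) reduces to
#   p = 1.25 + 0.25 * alpha * p.
p = 0.0
for _ in range(100):
    p = 1.25 + 0.25 * alpha * p

print(J(4.0))              # truncated sum from (3), ~25.81
print(p * 4.0**2)          # V^pi(4) from the recursion (4); agrees
```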
One engineering challenge with a controlled dynamic system is optimizing its performance. Policy improvement provides insight into how to incrementally improve a policy. The key idea is that if a change to the policy lowers the sum of the immediate cost and the discounted future operating cost, then the change improves the policy. If
\(c\left(x,u\right)+\alpha\cdot V^{\pi}\left(f\left(x,u\right)\right)\leq V^{\pi}\left(x\right)\) | (5) |
then the choice of \(u\) at \(x\) is an improvement on the policy \(\pi\); if the inequality is strict, the new policy strictly reduces the operating cost.
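Inequality (5) suggests a direct numerical test: search over candidate inputs at a state \(x\) for one whose immediate cost plus discounted successor value beats \(V^{\pi}(x)\). A minimal sketch, continuing the same illustrative dynamics, cost, and value coefficient \(p\) as above:

```python
alpha = 0.9
p = 1.25 / (1 - 0.25 * alpha)      # V^pi coefficient derived above

def f(x, u):  return x + u         # same illustrative dynamics as before
def c(x, u):  return x**2 + u**2   # same illustrative stage cost
def V(x):     return p * x**2      # quadratic value function of pi

def Q(x, u):
    """Left-hand side of inequality (5): immediate cost plus
    discounted value of the successor state."""
    return c(x, u) + alpha * V(f(x, u))

x = 4.0
candidates = [i / 100 for i in range(-400, 401)]   # grid of candidate inputs
u_best = min(candidates, key=lambda u: Q(x, u))

print(f"current policy input: {-0.5 * x:.2f}")
print(f"improved input:       {u_best:.2f}")
print(f"Q(x, u_best) = {Q(x, u_best):.2f} <= V(x) = {V(x):.2f}")
```

Here the grid search finds an input near \(-0.59\,x\), improving on \(\pi(x)=-0.5\,x\); alternating policy evaluation and policy improvement in this way is policy iteration.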
Other key ideas:
- Markov Decision Problems (MDPs) are controlled dynamic systems with stochastic state transitions, as sketched below.
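That correspondence can be made concrete by letting the transition function take a random disturbance, \(x_{k+1}=f\left(x_k,u_k,w_k\right)\). A minimal sketch, assuming a hypothetical two-state, two-action MDP whose transition probabilities are invented purely for illustration:

```python
import random

# Hypothetical 2-state, 2-action MDP: P[x][u] lists the probabilities
# of moving to states 0 and 1 given the current state x and input u
P = {
    0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
    1: {0: [0.5, 0.5], 1: [0.1, 0.9]},
}

def step(x, u):
    """Sample x_{k+1} from P(. | x_k, u_k); the random draw plays the
    role of the disturbance w_k in x_{k+1} = f(x_k, u_k, w_k)."""
    return random.choices([0, 1], weights=P[x][u])[0]

x = 0
for k in range(5):
    u = 0 if x == 0 else 1     # an arbitrary illustrative policy
    x = step(x, u)
    print(k, x)
```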