## Sunday, July 31, 2011

### Notes on Policy Improvement and Controlled Dynamic Systems

A controlled dynamic system has inputs $$u_{k}$$ that can steer the evolution of its state $$x_{k}$$.

 $$x_{k+1}=f\left(x_{k},u_{k}\right)$$ (1)

The inputs to the dynamic system can be determined by a policy, $$\pi$$, that maps the state of the dynamic system to an input of the dynamic system. This policy makes the controlled dynamic system behave like an autonomous dynamic system.

 $$x_{k+1}=f\left(x_{k},\pi\left(x_{k}\right)\right)=\widetilde{f}\left(x_{k}\right)$$ (2)
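As a minimal sketch of how a policy closes the loop, the snippet below builds the autonomous map $$\widetilde{f}$$ from a controlled map and a policy, then simulates it. The scalar dynamics, the linear feedback gain, and the helper names `close_loop` and `simulate` are illustrative assumptions, not part of the original notes.

```python
def f(x, u):
    """Hypothetical scalar dynamics: x_{k+1} = 0.9 * x_k + u_k."""
    return 0.9 * x + u

def pi(x):
    """Hypothetical linear feedback policy mapping state to input."""
    return -0.4 * x

def close_loop(f, pi):
    """Return the autonomous map f_tilde(x) = f(x, pi(x))."""
    return lambda x: f(x, pi(x))

def simulate(f_tilde, x0, steps):
    """Iterate the autonomous system from x0 and return the visited states."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f_tilde(xs[-1]))
    return xs

f_tilde = close_loop(f, pi)
trajectory = simulate(f_tilde, 1.0, 5)
print(trajectory)
```

With these assumed choices the closed loop reduces to halving the state at each step, so once the policy is fixed the input no longer appears as a free variable.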

Given a discounted cost of operation for the dynamic system, with stage cost $$c$$ and discount factor $$\alpha$$ (assumed here to satisfy $$0<\alpha<1$$),

 $$J\left(x_{0}\right)=\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},u_{k}\right)$$ (3)

a value function, which depends on the control policy and the initial state, can be found using a variation of dynamic programming. Under the policy $$\pi$$, so that $$u_{k}=\pi\left(x_{k}\right)$$, this value function satisfies the recursion

 $$V^{\pi}\left(x\right)=c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)$$ (4)
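One way to make the value function concrete is to compute it on a small finite-state system. The sketch below assumes a one-step form of the recursion, $$V^{\pi}(x)=c(x,\pi(x))+\alpha V^{\pi}(f(x,\pi(x)))$$, and iterates it to a fixed point, then compares the result with a truncated version of the discounted sum (3). The state space, dynamics, stage cost, policy, and the value of `ALPHA` are all hypothetical choices made for illustration.

```python
ALPHA = 0.9          # discount factor (assumed value)
STATES = range(5)    # hypothetical finite state space {0, ..., 4}

def f(x, u):
    """Hypothetical dynamics: move by u, saturating at the ends of the state space."""
    return min(max(x + u, 0), 4)

def c(x, u):
    """Hypothetical stage cost: squared distance from 0 plus input effort."""
    return x**2 + abs(u)

def pi(x):
    """Hypothetical policy: step toward 0, then stay there."""
    return -1 if x > 0 else 0

def evaluate(policy, sweeps=200):
    """Iterate V(x) <- c(x, pi(x)) + ALPHA * V(f(x, pi(x))) for a fixed number of sweeps."""
    V = {x: 0.0 for x in STATES}
    for _ in range(sweeps):
        V = {x: c(x, policy(x)) + ALPHA * V[f(x, policy(x))] for x in STATES}
    return V

def rollout_cost(policy, x0, steps=200):
    """Truncated discounted sum of stage costs along the closed-loop trajectory."""
    total, x = 0.0, x0
    for k in range(steps):
        u = policy(x)
        total += ALPHA**k * c(x, u)
        x = f(x, u)
    return total

V = evaluate(pi)
J = {x: rollout_cost(pi, x) for x in STATES}
print(V)
print(J)
```

In this example the two computations agree to within floating-point error, which illustrates that the recursive value function and the discounted sum describe the same quantity for a fixed policy.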

One engineering challenge with a controlled dynamic system is optimizing its performance. Policy improvement provides some insight into how to incrementally improve a policy. The key idea in policy improvement is that if a change to the policy lowers the sum of the immediate cost and the discounted future cost, then this change improves the policy. If

 $$c\left(x,u\right)+\alpha\cdot V^{\pi}\left(f\left(x,u\right)\right)\leq V^{\pi}\left(x\right)$$ (5)

then choosing $$u$$ at $$x$$, while following $$\pi$$ everywhere else, is at least as good as the policy $$\pi$$; when the inequality is strict, it reduces the operating costs.
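The improvement test can be sketched on a small finite-state example. The code below evaluates a deliberately poor policy, then at each state picks the input that minimizes the immediate cost plus the discounted value of the next state under the old policy, taking the discount as a single factor per step. The state space, inputs, dynamics, stage cost, and the helper names `evaluate` and `improve` are hypothetical, and picking the minimizing input at every state at once is one common variant of the idea rather than the only one.

```python
ALPHA = 0.9            # discount factor (assumed value)
STATES = range(5)      # hypothetical finite state space {0, ..., 4}
INPUTS = (-1, 0, 1)    # hypothetical finite input set

def f(x, u):
    """Hypothetical dynamics: move by u, saturating at the ends of the state space."""
    return min(max(x + u, 0), 4)

def c(x, u):
    """Hypothetical stage cost: squared distance from 0 plus input effort."""
    return x**2 + abs(u)

def evaluate(policy, sweeps=500):
    """Approximate V^pi by iterating the one-step recursion for a tabular policy."""
    V = {x: 0.0 for x in STATES}
    for _ in range(sweeps):
        V = {x: c(x, policy[x]) + ALPHA * V[f(x, policy[x])] for x in STATES}
    return V

def improve(V):
    """At each state, pick the input minimizing c(x, u) + ALPHA * V(f(x, u))."""
    return {x: min(INPUTS, key=lambda u: c(x, u) + ALPHA * V[f(x, u)])
            for x in STATES}

old_policy = {x: 0 for x in STATES}   # poor starting policy: never move
V_old = evaluate(old_policy)
new_policy = improve(V_old)
V_new = evaluate(new_policy)

print(new_policy)
print(V_old)
print(V_new)
```

Here the improved policy steps toward 0 from every nonzero state, and its value is no larger than the old value at every state, which is the behavior the improvement condition predicts.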