Sunday, July 31, 2011

Notes on Policy Improvement and Controlled Dynamic Systems

See introductory background.

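A controlled dynamic system has inputs that can steer the evolution of the state of the system. With state \(x_{k}\) and input \(u_{k}\) at step \(k\), the state evolves as


\(x_{k+1}=f\left(x_{k},u_{k}\right)\).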




The inputs to the dynamic system can be determined by a policy \(\pi\) that maps the state of the dynamic system to an input of the dynamic system. Under this policy the input is no longer free, so the controlled dynamic system behaves like an autonomous dynamic system, \(x_{k+1}=f\left(x_{k},\pi\left(x_{k}\right)\right)\).
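As a minimal sketch of this idea, the Python snippet below closes the loop around a toy system. The dynamics `f`, the policy `pi`, and the helper `rollout` are hypothetical choices made only for illustration; they are not taken from any particular system in these notes.

```python
def f(x, u):
    """Hypothetical dynamics: the input shifts an integer state."""
    return x + u


def pi(x):
    """Hypothetical policy: step one unit toward the origin."""
    if x > 0:
        return -1
    if x < 0:
        return 1
    return 0


def closed_loop(x):
    """Autonomous map obtained by substituting u = pi(x) into f."""
    return f(x, pi(x))


def rollout(x0, steps):
    """Iterate the closed-loop map and return the visited states."""
    states = [x0]
    for _ in range(steps):
        states.append(closed_loop(states[-1]))
    return states


print(rollout(3, 5))  # [3, 2, 1, 0, 0, 0]
```

Once `pi` is fixed, `closed_loop` takes only the state as an argument, which is what makes the closed-loop system autonomous.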




Given a cost of operation for the dynamic system, discounted by a factor \(\alpha\),


\(J\left(x_{0}\right)=\sum_{k=0}^{\infty}\left(\alpha^{k}\cdot c\left(x_{k},u_{k}\right)\right)\),


a value function, which depends on the control policy and the initial state, can be found using a variation of dynamic programming. This value function satisfies the recursion


\(V^{\pi}\left(x\right)=c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\)
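The recursion for the value function can be evaluated numerically when the state space is finite. The sketch below assumes a hypothetical five-state system (`f`, `c`, `pi`, and the discount `ALPHA = 0.9` are all illustrative choices) and applies the recursion repeatedly until the values settle. It is one simple way to evaluate a policy, not necessarily the method intended in these notes.

```python
ALPHA = 0.9          # assumed discount factor
STATES = range(5)    # assumed finite state space 0..4


def f(x, u):
    """Hypothetical dynamics: shift the state, clipped to 0..4."""
    return min(max(x + u, 0), 4)


def c(x, u):
    """Hypothetical stage cost: penalize distance from 0 and effort."""
    return x * x + abs(u)


def pi(x):
    """Hypothetical policy: step toward 0, then stay there."""
    return -1 if x > 0 else 0


def evaluate_policy(policy, sweeps=200):
    """Iterate V(x) <- c(x, pi(x)) + ALPHA * V(f(x, pi(x)))."""
    V = {x: 0.0 for x in STATES}
    for _ in range(sweeps):
        V = {x: c(x, policy(x)) + ALPHA * V[f(x, policy(x))]
             for x in STATES}
    return V


V = evaluate_policy(pi)
print(V[0], V[1])  # 0.0 2.0
```

For this policy the state reaches 0 in at most four steps, so the iteration settles after a handful of sweeps; the larger default for `sweeps` is only a safety margin.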


One engineering challenge with a controlled dynamic system is optimizing its performance. Policy improvement provides some insight into how to incrementally improve a policy. The key idea in policy improvement is that if a change to the policy does not increase the sum of the immediate cost and the discounted future cost, then this change improves the policy. If


\(c\left(x,u\right)+\alpha\cdot V^{\pi}\left(f\left(x,u\right)\right)\leq V^{\pi}\left(x\right)\)


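then applying \(u\) at \(x\) and following \(\pi\) afterwards costs no more than following \(\pi\) from \(x\). The choice of \(u\) at \(x\) is therefore an improvement on the policy \(\pi\), and it reduces the operating costs whenever the inequality is strict.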
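The improvement test can be sketched in Python on a small finite problem. Everything below is an illustrative assumption: the five-state dynamics `f`, the stage cost `c`, the input set `INPUTS`, the discount `ALPHA = 0.9`, and the helper names. Policies are stored as dictionaries from state to input, and at each state the candidate input with the smallest one-step lookahead cost is adopted if it passes the test.

```python
ALPHA = 0.9            # assumed discount factor
STATES = range(5)      # assumed finite state space 0..4
INPUTS = (-1, 0, 1)    # assumed finite input set


def f(x, u):
    """Hypothetical dynamics: shift the state, clipped to 0..4."""
    return min(max(x + u, 0), 4)


def c(x, u):
    """Hypothetical stage cost: penalize distance from 0 and effort."""
    return x * x + abs(u)


def evaluate(policy, sweeps=500):
    """Iterate V(x) <- c(x, pi(x)) + ALPHA * V(f(x, pi(x)))."""
    V = {x: 0.0 for x in STATES}
    for _ in range(sweeps):
        V = {x: c(x, policy[x]) + ALPHA * V[f(x, policy[x])]
             for x in STATES}
    return V


def lookahead(x, u, V):
    """Cost of applying u at x once, then following the old policy."""
    return c(x, u) + ALPHA * V[f(x, u)]


def improve(policy):
    """Adopt, per state, the best input that passes the test."""
    V = evaluate(policy)
    new_policy = {}
    for x in STATES:
        best_u = min(INPUTS, key=lambda u: lookahead(x, u, V))
        # Improvement test: c(x,u) + ALPHA * V(f(x,u)) <= V(x)
        if lookahead(x, best_u, V) <= V[x]:
            new_policy[x] = best_u
        else:
            new_policy[x] = policy[x]
    return new_policy


stay = {x: 0 for x in STATES}   # poor starting policy: never move
better = improve(stay)
print(better)  # {0: 0, 1: -1, 2: -1, 3: -1, 4: -1}
```

Starting from the policy that never moves, one improvement pass switches every nonzero state to stepping toward 0, and re-evaluating the new policy gives a value no larger than the old one at every state.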

Other key ideas:

This work is licensed under a Creative Commons Attribution By license.
