Ex Numerus: Notes on Policy Improvement and Controlled Dynamic Systems

Sunday, July 31, 2011

Notes on Policy Improvement and Controlled Dynamic Systems

A controlled dynamic system has inputs that can steer the evolution of the state of the system. With state \(x_{k}\) and input \(u_{k}\) at step \(k\), the system evolves as

\(x_{k+1}=f\left(x_{k},u_{k}\right)\)

(1)

The inputs to the dynamic system can be determined by a policy, \(\pi\), that maps the state of the dynamic system to an input of the dynamic system. This policy makes the controlled dynamic system behave like an autonomous dynamic system.

\(x_{k+1}=f\left(x_{k},\pi\left(x_{k}\right)\right)=\widetilde{f}\left(x_{k}\right)\)

(2)
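As a minimal sketch of equation (2), the snippet below composes a dynamics function with a policy to obtain the autonomous map and then iterates it. The scalar dynamics `f`, the policy `pi`, and the helper names `closed_loop` and `simulate` are assumptions chosen for illustration; they do not come from any particular system.

```python
def f(x, u):
    # Hypothetical dynamics: the next state is the current state plus the input.
    return x + u


def pi(x):
    # Hypothetical policy: apply an input that halves the state.
    return -0.5 * x


def closed_loop(f, pi):
    # Substitute the policy into the dynamics, giving the autonomous map of (2).
    return lambda x: f(x, pi(x))


def simulate(f_tilde, x0, steps):
    # Iterate the autonomous system and collect the visited states.
    xs = [x0]
    for _ in range(steps):
        xs.append(f_tilde(xs[-1]))
    return xs


f_tilde = closed_loop(f, pi)
trajectory = simulate(f_tilde, 8.0, 3)
print(trajectory)
```

Once the policy is fixed, `simulate` never needs to see an input: the closed-loop map depends on the state alone.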

Given a discounted cost of operation for the dynamic system, with stage cost \(c\) and discount factor \(\alpha\),

\(J\left(x_{0}\right)=\sum_{k=0}^{\infty}\left(\alpha^{k}\cdot c\left(x_{k},u_{k}\right)\right)\),

(3)

a value function, which depends on the control policy and the initial state, can be found using a variation of dynamic programming. This value function satisfies the recursion

\(V^{\pi}\left(x\right)=c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\)

(4)
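One way to compute such a value function on a small finite state space is to apply the recursion repeatedly until it settles. The sketch below does this and cross-checks the result against a truncated version of the discounted sum in equation (3). The four-state system, the stage cost, the "always stay" policy, and the names `evaluate_policy` and `rollout_cost` are all hypothetical, and the stage cost is taken as a function of both state and input.

```python
def evaluate_policy(states, f, pi, c, alpha, sweeps=200):
    # Repeatedly apply V(x) <- c(x, pi(x)) + alpha * V(f(x, pi(x))).
    V = {x: 0.0 for x in states}
    for _ in range(sweeps):
        V = {x: c(x, pi(x)) + alpha * V[f(x, pi(x))] for x in states}
    return V


def rollout_cost(f, pi, c, x0, alpha, horizon=200):
    # Truncated discounted sum of stage costs along the closed-loop trajectory.
    total, x, weight = 0.0, x0, 1.0
    for _ in range(horizon):
        u = pi(x)
        total += weight * c(x, u)
        x = f(x, u)
        weight *= alpha
    return total


# Hypothetical toy system: four positions on a line, inputs shift the position.
states = range(4)
f = lambda x, u: min(max(x + u, 0), 3)   # clamp so the state stays in 0..3
c = lambda x, u: x + 0.5 * abs(u)        # cost grows with position and effort
pi = lambda x: 0                         # assumed policy: always stay put
alpha = 0.5                              # assumed discount factor

V = evaluate_policy(states, f, pi, c, alpha)
print(V)
print(rollout_cost(f, pi, c, 3, alpha))
```

For this particular policy the state never moves, so each step costs \(x\) and the value works out to \(x/(1-\alpha)\), which gives a hand check on both functions.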

One engineering challenge with a controlled dynamic system is optimizing its performance. Policy improvement provides some insight into how to incrementally improve a policy. The key idea in policy improvement is that if a change to the policy reduces the sum of the immediate cost and the discounted future cost, then this change improves the policy. If

\(c\left(x,u\right)+\alpha\cdot V^{\pi}\left(f\left(x,u\right)\right)\leq V^{\pi}\left(x\right)\)

(5)

then the choice of \(u\) at \(x\) is an improvement on the policy \(\pi\): it does not increase the operating costs, and it reduces them whenever the inequality is strict.
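The improvement test can be sketched on a small hypothetical example. The code below checks the inequality for a candidate input and, as one possible way to use it, picks at each state the input with the smallest one-step lookahead cost. The four-state system, the stage cost, and the names `lookahead`, `is_improvement`, and `improved_policy` are assumptions for illustration. The value function is that of an assumed "always stay" policy, for which each step costs \(x\), so \(V^{\pi}(x)=x/(1-\alpha)\).

```python
alpha = 0.5            # assumed discount factor
states = [0, 1, 2, 3]  # hypothetical positions on a line
inputs = [-1, 0, 1]    # hypothetical inputs: step left, stay, step right


def f(x, u):
    # Hypothetical dynamics: shift the position, clamped to 0..3.
    return min(max(x + u, 0), 3)


def c(x, u):
    # Hypothetical stage cost: position plus a penalty for moving.
    return x + 0.5 * abs(u)


# Value of the assumed "always stay" policy pi(x) = 0 on this toy system.
V_pi = {x: x / (1 - alpha) for x in states}


def lookahead(x, u):
    # Immediate cost of u at x plus the discounted value of following pi afterwards.
    return c(x, u) + alpha * V_pi[f(x, u)]


def is_improvement(x, u):
    # The improvement condition: the lookahead cost does not exceed V_pi(x).
    return lookahead(x, u) <= V_pi[x]


def improved_policy():
    # Greedy choice: at each state take the input with the lowest lookahead cost.
    return {x: min(inputs, key=lambda u: lookahead(x, u)) for x in states}


new_pi = improved_policy()
print(is_improvement(2, -1), is_improvement(2, 1))
print(new_pi)
```

Choosing the minimizing input is only one option; any input that passes `is_improvement` satisfies the condition, and the greedy choice simply takes the largest available one-step gain.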

Other key ideas:

Markov Decision Problems (MDPs) are controlled dynamic systems.

This work is licensed under a Creative Commons Attribution By license.
