
Sunday, July 31, 2011

Notes on Policy Improvement and Controlled Dynamic Systems

See introductory background.

A controlled dynamic system has inputs which can steer the evolution of the state of the system.

 

\[
x_{k+1} = f(x_k, u_k) \tag{1}
\]

The inputs to the dynamic system can be determined by a policy, \(\pi\), that maps the state of the dynamic system to an input of the dynamic system. Such a policy makes the controlled dynamic system behave like an autonomous dynamic system.

 

\[
x_{k+1} = f(x_k, \pi(x_k)) = \tilde{f}(x_k) \tag{2}
\]
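As a concrete illustration (not from the post), the closed-loop construction in (2) can be sketched for a scalar linear system with a hypothetical linear feedback policy; the constants `a`, `b`, and `K` are made up for the example.

```python
def f(x, u, a=0.9, b=1.0):
    """One step of the controlled dynamics x_{k+1} = f(x_k, u_k)."""
    return a * x + b * u

def pi(x, K=0.5):
    """A hypothetical feedback policy mapping the state to an input."""
    return -K * x

def f_tilde(x):
    """The closed-loop (autonomous) dynamics of equation (2)."""
    return f(x, pi(x))

# Simulate the closed-loop trajectory from x_0 = 1.0.
x = 1.0
trajectory = [x]
for _ in range(5):
    x = f_tilde(x)
    trajectory.append(x)
```

With these numbers the closed-loop map is \(\tilde{f}(x) = (a - bK)x = 0.4x\), so the state decays geometrically toward the origin.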

Given a cost of operation for the dynamic system,

 

\[
J(x_0) = \sum_{k=0}^{\infty} \alpha^k c(x_k, u_k), \tag{3}
\]

a value function which is a function of the control policy and the initial state can be found using a variation of dynamic programming. This value function is

 

\[
V^{\pi}(x) = c(x, \pi(x)) + \alpha V^{\pi}(f(x, \pi(x))) \tag{4}
\]
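For a finite state space, equation (4) can be solved by fixed-point iteration. The sketch below uses a made-up three-state example: `f_next` encodes the closed-loop dynamics \(f(x, \pi(x))\) and `cost` the per-step cost \(c(x, \pi(x))\); both are hypothetical.

```python
states = [0, 1, 2]
f_next = {0: 1, 1: 2, 2: 2}       # hypothetical closed-loop dynamics f(x, pi(x))
cost = {0: 2.0, 1: 1.0, 2: 0.0}   # hypothetical per-step cost c(x, pi(x))
alpha = 0.9                       # discount factor

def evaluate_policy(f_next, cost, alpha, tol=1e-10):
    """Iterate V(x) <- c(x) + alpha * V(f(x)) until it stops changing."""
    V = {x: 0.0 for x in f_next}
    while True:
        V_new = {x: cost[x] + alpha * V[f_next[x]] for x in f_next}
        if max(abs(V_new[x] - V[x]) for x in f_next) < tol:
            return V_new
        V = V_new

V = evaluate_policy(f_next, cost, alpha)
```

State 2 is absorbing with zero cost, so \(V^{\pi}(2) = 0\), and the other values follow from (4): \(V^{\pi}(1) = 1\) and \(V^{\pi}(0) = 2 + 0.9 \cdot 1 = 2.9\).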

One engineering challenge with a controlled dynamic system is optimizing its performance. Policy improvement provides some insight into how to incrementally improve a policy. The key idea in policy improvement is that if a change can be made in the policy that improves the immediate and future operational costs, then this change improves the policy. If

 

\[
c(x, u) + \alpha V^{\pi}(f(x, u)) < V^{\pi}(x) \tag{5}
\]

then the choice of u at x is an improvement on the policy π and will reduce the operating cost.
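The test in (5) can be sketched directly. The state space, input set, dynamics, costs, and value function below are hypothetical (the value function matches what policy evaluation would produce for an initial policy that always applies `u = 0`).

```python
alpha = 0.9
# Value function of the current policy (e.g. from policy evaluation).
V_pi = {0: 2.9, 1: 1.0, 2: 0.0}
# Controlled dynamics f(x, u) and cost c(x, u) over a small input set {0, 1}.
f = {(0, 0): 1, (0, 1): 2, (1, 0): 2, (1, 1): 0, (2, 0): 2, (2, 1): 2}
c = {(0, 0): 2.0, (0, 1): 1.5, (1, 0): 1.0, (1, 1): 2.0, (2, 0): 0.0, (2, 1): 1.0}

def improving_inputs(x, inputs=(0, 1)):
    """Inputs u at state x that satisfy inequality (5)."""
    return [u for u in inputs
            if c[(x, u)] + alpha * V_pi[f[(x, u)]] < V_pi[x]]
```

At state 0, input 1 gives \(1.5 + 0.9 \cdot 0 = 1.5 < 2.9\), so it improves the policy there; at state 1 no input does better than the current value of 1.0.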

Other key ideas:

This work is licensed under a Creative Commons Attribution By license.
