
Sunday, July 31, 2011

Notes on Policy Improvement and Controlled Dynamic Systems

See introductory background.

A controlled dynamic system has inputs which can steer the evolution of the state of the system.

 

\(x_{k+1}=f\left(x_{k},u_{k}\right)\)

(1)

The inputs to the dynamic system can be determined by a policy, \(\pi\), that maps the state of the system to an input. Under this policy, the controlled dynamic system behaves like an autonomous dynamic system.

 

\(x_{k+1}=f\left(x_{k},\pi\left(x_{k}\right)\right)=\widetilde{f}\left(x_{k}\right)\)

(2)
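As a minimal sketch of equation (2), the snippet below closes the loop around a scalar system. The dynamics `f`, the policy `pi`, and the helpers `close_loop` and `simulate` are hypothetical choices made only for illustration; they are not taken from any particular application.

```python
from typing import Callable, List


def close_loop(f: Callable[[float, float], float],
               pi: Callable[[float], float]) -> Callable[[float], float]:
    """Return the autonomous map f_tilde(x) = f(x, pi(x))."""
    def f_tilde(x: float) -> float:
        return f(x, pi(x))
    return f_tilde


def f(x: float, u: float) -> float:
    """Assumed controlled dynamics: x_{k+1} = x_k + u_k."""
    return x + u


def pi(x: float) -> float:
    """Assumed policy: push the state halfway toward zero."""
    return -0.5 * x


def simulate(step: Callable[[float], float], x0: float, n: int) -> List[float]:
    """Iterate an autonomous map n times starting from x0."""
    xs = [x0]
    for _ in range(n):
        xs.append(step(xs[-1]))
    return xs


f_tilde = close_loop(f, pi)
trajectory = simulate(f_tilde, 8.0, 3)
print(trajectory)  # [8.0, 4.0, 2.0, 1.0]
```

Once the policy is fixed, `simulate` needs no knowledge of the input at all, which is the sense in which the closed-loop system is autonomous.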

Given a cost of operation for the dynamic system,

 

\(J\left(x_{0}\right)=\sum_{k=0}^{\infty}\left(\alpha^{k}\cdot c\left(x_{k},u_{k}\right)\right)\),

(3)

a value function, which depends on the control policy and the initial state, can be found using a variation of dynamic programming. This value function satisfies

 

\(V^{\pi}\left(x\right)=c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\)

(4)
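One way to compute the value function of a fixed policy on a small finite state space is to apply the recursion repeatedly until it stops changing. The sketch below assumes a hypothetical five-state chain; `STATES`, `f`, `c`, `pi`, `ALPHA`, and `evaluate_policy` are illustrative names and values, and repeated substitution is only one of several ways to solve the recursion.

```python
ALPHA = 0.9          # assumed discount factor
STATES = range(5)    # assumed finite state space {0, 1, 2, 3, 4}


def f(x: int, u: int) -> int:
    """Assumed dynamics: move by u, clamped to the state space."""
    return min(max(x + u, 0), 4)


def c(x: int, u: int) -> float:
    """Assumed stage cost: distance from state 0 plus a control effort term."""
    return float(x) + 0.5 * abs(u)


def pi(x: int) -> int:
    """Assumed policy: step toward state 0 whenever possible."""
    return -1 if x > 0 else 0


def evaluate_policy(policy, tol: float = 1e-12, max_iter: int = 100_000):
    """Iterate V(x) <- c(x, pi(x)) + ALPHA * V(f(x, pi(x))) to a fixed point."""
    V = {x: 0.0 for x in STATES}
    for _ in range(max_iter):
        V_new = {x: c(x, policy(x)) + ALPHA * V[f(x, policy(x))]
                 for x in STATES}
        if max(abs(V_new[x] - V[x]) for x in STATES) < tol:
            return V_new
        V = V_new
    return V


V_pi = evaluate_policy(pi)
for x in STATES:
    print(x, round(V_pi[x], 4))
```

Because the discount factor is below one, each sweep shrinks the error by a factor of `ALPHA`, so the loop converges for any bounded stage cost.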

One engineering challenge with a controlled dynamic system is optimizing its performance. Policy improvement provides some insight into how to incrementally improve a policy. The key idea in policy improvement is that if a change to the policy lowers the sum of the immediate cost and the discounted future operating costs, then this change improves the policy. If

 

\(c\left(x,u\right)+\alpha\cdot V^{\pi}\left(f\left(x,u\right)\right)\leq V^{\pi}\left(x\right)\)

(5)

then choosing \(u\) at \(x\) is an improvement on the policy \(\pi\): it does not increase the operating cost, and it reduces the cost whenever the inequality is strict.
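The improvement test can be sketched on a small finite problem. Everything below is an assumption made for illustration: the five-state chain, the input set `INPUTS`, the cost `c`, the discount `ALPHA`, and the helper names `evaluate_policy`, `q_value`, and `improve_policy`. The sketch starts from a deliberately poor policy (never move), evaluates it, and then switches to any input that strictly passes the test.

```python
ALPHA = 0.9            # assumed discount factor
STATES = range(5)      # assumed finite state space
INPUTS = (-1, 0, 1)    # assumed admissible inputs


def f(x: int, u: int) -> int:
    """Assumed dynamics: move by u, clamped to the state space."""
    return min(max(x + u, 0), 4)


def c(x: int, u: int) -> float:
    """Assumed stage cost: distance from state 0 plus control effort."""
    return float(x) + 0.5 * abs(u)


def evaluate_policy(policy: dict, tol: float = 1e-12, max_iter: int = 100_000):
    """Fixed-point iteration on V(x) = c(x, pi(x)) + ALPHA * V(f(x, pi(x)))."""
    V = {x: 0.0 for x in STATES}
    for _ in range(max_iter):
        V_new = {x: c(x, policy[x]) + ALPHA * V[f(x, policy[x])]
                 for x in STATES}
        if max(abs(V_new[x] - V[x]) for x in STATES) < tol:
            return V_new
        V = V_new
    return V


def q_value(V: dict, x: int, u: int) -> float:
    """Cost of using u once at x, then following the evaluated policy."""
    return c(x, u) + ALPHA * V[f(x, u)]


def improve_policy(policy: dict, V: dict) -> dict:
    """Switch to the best input wherever it strictly beats V(x)."""
    new_policy = dict(policy)
    for x in STATES:
        best_u = min(INPUTS, key=lambda u: q_value(V, x, u))
        if q_value(V, x, best_u) < V[x]:
            new_policy[x] = best_u
    return new_policy


old_policy = {x: 0 for x in STATES}          # never move
V_old = evaluate_policy(old_policy)
new_policy = improve_policy(old_policy, V_old)
V_new = evaluate_policy(new_policy)
print(new_policy)
```

Picking the minimizing input at every state is a design choice; the inequality only requires that the chosen input be no worse than the current policy at that state.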

Other key ideas:

This work is licensed under a Creative Commons Attribution By license.

Notes on Dynamic Systems and the Value Function

 

Dynamic systems can be described by differential and difference equations. Without loss of generality, consider a dynamic system represented by a difference equation: \(x_{k+1}=f\left(x_{k}\right)\). The state of the system is represented by \(x\), and the function \(f\) maps one state to the next. One way to characterize a dynamic system is with an additive cost, \(J\), which summarizes the cost of operating the system from some initial state, \(x_{0}\). To ensure that the sum is finite, an exponential weighting factor, \(\alpha\), is introduced. This factor has a value between 0 and 1. Under some circumstances, \(\alpha\) can be equal to one; one special case is a system that is guaranteed to eventually enter a zero cost state. In general, however, \(\alpha\) needs to be less than one. The additive cost is a function of the initial condition:

 

\(J\left(x_{0}\right)=\sum_{k=0}^{\infty}\left(\alpha^{k}\cdot c\left(x_{k}\right)\right)\).

(1)
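A short derivation, under the assumption that the sum converges, shows how the additive cost relates to itself one step later. Splitting off the first term and factoring one power of \(\alpha\) out of the remainder gives

\(J\left(x_{0}\right)=c\left(x_{0}\right)+\sum_{k=1}^{\infty}\alpha^{k}\cdot c\left(x_{k}\right)=c\left(x_{0}\right)+\alpha\cdot\sum_{j=0}^{\infty}\alpha^{j}\cdot c\left(x_{j+1}\right)=c\left(x_{0}\right)+\alpha\cdot J\left(x_{1}\right)\),

where \(x_{1}=f\left(x_{0}\right)\). The last step uses the fact that the trajectory starting from \(x_{1}\) is the original trajectory shifted by one step. Note that the recursion carries a single factor of \(\alpha\), not \(\alpha^{k}\).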

The additive cost can be computed using the dynamic programming equation:

 

\(V\left(x\right)=c\left(x\right)+\alpha\cdot V\left(f\left(x\right)\right)\).

(2)

The function \(V\) is referred to as the value function.
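As a rough numerical check, the sketch below approximates the additive cost by truncating the infinite sum and confirms that the result satisfies the one-step recursion \(V\left(x\right)=c\left(x\right)+\alpha\cdot V\left(f\left(x\right)\right)\). The dynamics `f`, the stage cost `c`, the discount `ALPHA`, and the truncation `horizon` are all assumed for illustration.

```python
ALPHA = 0.9  # assumed discount factor


def f(x: float) -> float:
    """Assumed stable dynamics: the state halves every step."""
    return 0.5 * x


def c(x: float) -> float:
    """Assumed stage cost: squared distance from the origin."""
    return x * x


def additive_cost(x0: float, horizon: int = 200) -> float:
    """Approximate J(x0) by truncating the sum after `horizon` terms."""
    total, x, weight = 0.0, x0, 1.0
    for _ in range(horizon):
        total += weight * c(x)   # add alpha^k * c(x_k)
        x = f(x)                 # advance the state
        weight *= ALPHA          # advance the discount weight
    return total


x0 = 2.0
J0 = additive_cost(x0)

# For these particular choices the sum is geometric with ratio ALPHA * 0.25.
closed_form = x0 ** 2 / (1.0 - ALPHA * 0.25)

# Residual of the dynamic programming equation at x0.
residual = J0 - (c(x0) + ALPHA * additive_cost(f(x0)))
print(J0, closed_form, residual)
```

For this example the terms shrink by a factor of 0.225 per step, so 200 terms are far more than needed; a slower system or a discount closer to one would need a longer horizon.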


Wednesday, July 27, 2011

Organizational Incentives and Responsibilities: An Observation

In a large organization, properly aligning incentives and responsibilities is always a challenge. However, sometimes it all comes together like a grand plan. I saw a situation today that demonstrated this. A year ago, a particular facility was very undesirable: old paint, old carpets, intermittent wireless, worn furniture, etc. In a large organization, fixing mundane issues like this can be a royal challenge: forms, approvals, policy, and more. By chance (at least to a distant observer), the machinery of the organization moved people responsible for facilities into the area. One year later… new paint, new carpets, new furniture. The other groups in the area had their productivity and satisfaction improved. Responsibilities met incentives and action was taken.


Thursday, July 7, 2011

Where to find the Octave Grammar

Problem Statement:

Where is a grammar for Octave?

Discussion:

Octave implements an open source language which is very close to Matlab’s language. Octave is not an exact clone, but it is close enough to make conversion of Matlab scripts into Octave relatively easy. Since Octave's grammar is defined and available in open source form, it can be reused in other projects.

See:

Also answers:

  • Where can I find a Matlab grammar?