
Monday, August 1, 2011

Draft: Notes on the dynamic programming equations that solve cost models for dynamic systems

 

Deterministic Cost Models

Description

Cost Model

Dynamic Programming Equations

Restrictions

Finite Horizon Total Cost

\(J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right)\right)\)

\( V_{k}^{\pi}\left(x\right)=c_{k}\left(x,\pi\left(x\right)\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\),\(\forall k\in\left\{ 0,\cdots,K-1\right\}  \)

\(V_{K}^{\pi}\left(x\right)=c_{K}\left(x,\pi\left(x\right)\right)\)

\(0\leq\alpha<1\)

Infinite Horizon Total Cost

\(J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right)\right)\)

\(V^{\pi}\left(x\right)=c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\)

\(0\leq\alpha<1\)

Finite Horizon Shortest Path

\(J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right)\right)\)

\( V_{k}^{\pi}\left(x\right)=c_{k}\left(x,\pi\left(x\right)\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\),\(\forall k\in\left\{ 0,\cdots,K-1\right\} \)

\(V_{K}^{\pi}\left(x\right)=c_{K}\left(x,\pi\left(x\right)\right)\)

\(0\leq\alpha\leq1\)

\(\left\{ x\in\chi\mid c\left(x,\pi\left(x\right)\right)=0\right\} \neq\emptyset\)

Infinite Horizon Shortest Path

\(J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right)\right)\)

\(V^{\pi}\left(x\right)=c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\)

\(0\leq\alpha\leq1\)

\(\left\{ x\in\chi\mid c\left(x,\pi\left(x\right)\right)=0\right\} \neq\emptyset\)

Average Cost

\(J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\lim}\frac{1}{K}\sum_{k=0}^{K}c\left(x_{k},\pi\left(x_{k}\right)\right)\)

\(V^{\pi}\left(x\right)+\lambda=c\left(x,\pi\left(x\right)\right)+V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\)

\(\alpha=1\) (the average cost criterion is undiscounted)

\(V^{\pi}\left(x_{ref}\right)=0\) for some \(x_{ref}\in\chi\)
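As a concrete check on the infinite-horizon total-cost row above, the value function of a fixed policy can be computed by iterating \(V^{\pi}\left(x\right)\leftarrow c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\) until it converges. A minimal Python sketch, where the three-state chain, dynamics, and costs are invented for illustration:

```python
# Policy evaluation for the deterministic infinite-horizon discounted cost.
# The states, next-state map, and per-stage costs are hypothetical examples.
ALPHA = 0.9  # discount factor, 0 <= ALPHA < 1

states = [0, 1, 2]
f = {0: 1, 1: 2, 2: 2}        # x -> f(x, pi(x)) under the fixed policy
c = {0: 4.0, 1: 2.0, 2: 0.0}  # x -> c(x, pi(x))

# Iterate V(x) <- c(x) + ALPHA * V(f(x)) until the update is negligible.
V = {x: 0.0 for x in states}
for _ in range(10_000):
    V_new = {x: c[x] + ALPHA * V[f[x]] for x in states}
    if max(abs(V_new[x] - V[x]) for x in states) < 1e-12:
        V = V_new
        break
    V = V_new

print(V)  # state 2 is absorbing with zero cost, so V[2] = 0
```

Because \(0\leq\alpha<1\), the update is a contraction, so the iteration converges geometrically to the unique fixed point from any starting guess.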

 

Stochastic Cost Models

Description

Cost Model

Dynamic Programming Equations

Restrictions

Finite Horizon Total Cost

\(J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\)

\(V_{k}^{\pi}\left(x\right)=E^{W}\left[c_{k}\left(x,\pi\left(x\right),w\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\),\(\forall k\in\left\{ 0,\cdots,K-1\right\} \)

\(V_{K}^{\pi}\left(x\right)=E^{W}\left[c_{K}\left(x,\pi\left(x\right)\right)\right]\)

\(0\leq\alpha<1\)

Infinite Horizon Total Cost

\(J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\)

\(V^{\pi}\left(x\right)=E^{W}\left[c\left(x,\pi\left(x\right),w\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\)

\(0\leq\alpha<1\)

Finite Horizon Shortest Path

\(J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\)

\(V_{k}^{\pi}\left(x\right)=E^{W}\left[c_{k}\left(x,\pi\left(x\right),w\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\),\(\forall k\in\left\{ 0,\cdots,K-1\right\} \)

\(V_{K}^{\pi}\left(x\right)=E^{W}\left[c_{K}\left(x,\pi\left(x\right)\right)\right]\)

\(0\leq\alpha\leq1\)

\(\left\{ x\in\chi\mid c\left(x,\pi\left(x\right)\right)=0\right\} \neq\emptyset\)

Infinite Horizon Shortest Path

\(J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\)

\(V^{\pi}\left(x\right)=E^{W}\left[c\left(x,\pi\left(x\right),w\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\)

\(0\leq\alpha\leq1\)

\(\left\{ x\in\chi\mid c\left(x,\pi\left(x\right)\right)=0\right\} \neq\emptyset\)

Average Cost

\(J^{\pi}\left(x_{0}\right)=E^{W}\left[\underset{K\rightarrow\infty}{\lim}\frac{1}{K}\sum_{k=0}^{K}c\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\)

\(V^{\pi}\left(x\right)+\lambda=E^{W}\left[c\left(x,\pi\left(x\right),w\right)+V^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\)

\(\alpha=1\) (the average cost criterion is undiscounted)

\(V^{\pi}\left(x_{ref}\right)=0\) for some \(x_{ref}\in\chi\)
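The stochastic infinite-horizon total-cost row above differs from the deterministic case only in the expectation over the noise \(w\). When \(w\) takes finitely many values, the expectation is a weighted sum and the same fixed-point iteration applies. A minimal Python sketch on a hypothetical two-state system with a two-point noise distribution:

```python
# Policy evaluation for the stochastic infinite-horizon discounted cost:
# V(x) = E_w[ c(x, pi(x), w) + ALPHA * V(f(x, pi(x), w)) ].
# The two-state system, noise law, and costs are hypothetical examples.
ALPHA = 0.9

states = [0, 1]
noise = [(0, 0.7), (1, 0.3)]  # noise value w and its probability

def f(x, w):
    # Hypothetical dynamics: w = 0 stays put, w = 1 flips the state.
    return x if w == 0 else 1 - x

def c(x, w):
    # Hypothetical per-stage cost under the fixed policy.
    return 1.0 if x == 0 else 0.0

# Iterate the expected-value fixed-point equation to convergence.
V = {x: 0.0 for x in states}
for _ in range(10_000):
    V_new = {
        x: sum(p * (c(x, w) + ALPHA * V[f(x, w)]) for w, p in noise)
        for x in states
    }
    if max(abs(V_new[x] - V[x]) for x in states) < 1e-12:
        V = V_new
        break
    V = V_new

print(V)
```

The converged \(V\) satisfies the stochastic dynamic programming equation at every state, which can be verified by substituting it back into the right-hand side.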

 

Risk Aware/Averse Stochastic Cost Models

Description

Cost Model

Dynamic Programming Equations

Restrictions

Certainty Equivalence with exponential utility

\(J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\limsup}\frac{1}{K}\cdot\frac{1}{\gamma}\cdot\ln\left(E^{W}\left[\exp\left(\gamma\cdot\sum_{k=0}^{K-1}c\left(x_{k},\pi\left(x_{k}\right),w\right)\right)\right]\right)\)
Mean-Variance      
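The certainty-equivalence criterion in the table above can be estimated by simulation: average \(\exp\left(\gamma\cdot\text{total cost}\right)\) over sampled trajectories (the standard exponential-utility form, which scales the cost by \(\gamma\) inside the exponential), then apply \(\frac{1}{\gamma}\ln\left(\cdot\right)\) and normalize by the horizon. A rough Python sketch; the values of \(\gamma\), \(K\), the sample count, and the i.i.d. stage-cost law are hypothetical stand-ins:

```python
import math
import random

# Monte Carlo estimate of the exponential-utility certainty equivalent:
# (1/K) * (1/GAMMA) * ln E[ exp(GAMMA * total cost) ].
# GAMMA, K, the sample count, and the stage-cost law are hypothetical.
GAMMA = 0.5   # risk-aversion parameter
K = 50        # horizon length
N = 20_000    # number of simulated trajectories

random.seed(0)

def sample_total_cost():
    # Hypothetical i.i.d. stage costs standing in for c(x_k, pi(x_k), w).
    return sum(random.choice([0.0, 1.0]) for _ in range(K))

mean_exp = sum(math.exp(GAMMA * sample_total_cost()) for _ in range(N)) / N
ce = (1.0 / K) * (1.0 / GAMMA) * math.log(mean_exp)

# With GAMMA > 0 the certainty equivalent sits above the risk-neutral
# average stage cost of 0.5, reflecting aversion to cost variability.
print(ce)
```

Note that for larger \(\gamma\) or longer horizons the estimator's variance grows quickly, since a few high-cost trajectories dominate the exponential average.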

 

Cost Models That Don't Work or Have Issues

Description

Cost Model

Issues

Expected exponential disutility

\(J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\limsup}\frac{1}{K}\cdot E^{W}\left[\textrm{sgn}\left(\gamma\right)\cdot\exp\left(\gamma\cdot\sum_{k=0}^{K-1}c\left(x_{k},\pi\left(x_{k}\right),w\right)\right)\right]\)

Does not discriminate among policies
Different version of expected exponential disutility

\(J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\limsup}\frac{1}{\gamma}\cdot\log\left(E^{W}\left[\exp\left(\frac{\gamma}{K}\sum_{k=0}^{K-1}c\left(x_{k},\pi\left(x_{k}\right),w\right)\right)\right]\right)\)

Generally reduces to the average cost

 


This work is licensed under a Creative Commons Attribution By license.

Sunday, July 31, 2011

Notes on Policy Improvement and Controlled Dynamic Systems

See introductory background.

A controlled dynamic system has inputs which can steer the evolution of the state of the system.

 

\(x_{k+1}=f\left(x_{k},u_{k}\right)\)

(1)

The inputs to the dynamic system can be determined by a policy, \(\pi\), that maps the state of the dynamic system to an input. This policy makes the controlled dynamic system behave like an autonomous dynamic system.

 

\(x_{k+1}=f\left(x_{k},\pi\left(x_{k}\right)\right)=\widetilde{f}\left(x_{k}\right)\)

(2)

Given a cost of operation for the dynamic system,

 

\(J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right)\right)\),

(3)

a value function which is a function of the control policy and the initial state can be found using a variation of dynamic programming. This value function is

 

\(V^{\pi}\left(x\right)=c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\)

(4)

One engineering challenge with a controlled dynamic system is optimizing its performance. Policy improvement provides insight into how to improve a policy incrementally. The key idea is that if a change to the policy reduces the sum of the immediate cost and the discounted future operating cost, then that change improves the policy. If

 

\(c\left(x,u\right)+\alpha\cdot V^{\pi}\left(f\left(x,u\right)\right)\leq V^{\pi}\left(x\right)\)

(5)

then the choice of \(u\) at \(x\) is an improvement on the policy \(\pi\) and will reduce the operating costs.
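Condition (5) suggests a simple computational recipe: evaluate the current policy, then at each state choose the input that minimizes the one-step lookahead cost. A minimal Python sketch on a hypothetical three-state, two-input system (the dynamics and costs are invented for illustration):

```python
# One policy improvement step on a hypothetical controlled system.
# States, inputs, dynamics f, and costs c are invented for illustration.
ALPHA = 0.9

states = [0, 1, 2]
inputs = [0, 1]

def f(x, u):
    # Input 1 advances toward state 2 (absorbing); input 0 stays put.
    return min(x + 1, 2) if u == 1 else x

def c(x, u):
    # Per-stage cost: distance from state 2 plus a small effort cost.
    return (2 - x) + 0.5 * u

def evaluate(pi):
    """Solve V(x) = c(x, pi[x]) + ALPHA * V(f(x, pi[x])) by iteration."""
    V = {x: 0.0 for x in states}
    for _ in range(10_000):
        V_new = {x: c(x, pi[x]) + ALPHA * V[f(x, pi[x])] for x in states}
        if max(abs(V_new[x] - V[x]) for x in states) < 1e-12:
            return V_new
        V = V_new
    return V

pi = {x: 0 for x in states}  # initial policy: always apply input 0
V = evaluate(pi)

# Improvement step: at each state pick the input whose one-step
# lookahead cost c(x, u) + ALPHA * V(f(x, u)) is smallest.
improved = {
    x: min(inputs, key=lambda u, x=x: c(x, u) + ALPHA * V[f(x, u)])
    for x in states
}
print(improved)  # inputs that improve on the all-zeros policy
```

Alternating evaluation and improvement in this way is exactly policy iteration, which converges to an optimal policy on a finite system.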


This work is licensed under a Creative Commons Attribution By license.

Notes on Dynamic Systems and the Value Function

 

Dynamic systems can be described by differential and difference equations. Without loss of generality, consider a dynamic system represented by a difference equation: \(x_{k+1}=f\left(x_{k}\right)\). The state of the system is represented by \(x\), and the function \(f\) is the mapping from one state to the next. One way to characterize a dynamic system is with an additive cost, \(J\), which summarizes the cost of operating the system from some initial state, \(x_{0}\). To ensure that the sum is finite, an exponential weighting factor, \(\alpha\), is introduced. This factor takes a value between 0 and 1. Under some circumstances \(\alpha\) can equal one; one special case is where the system is guaranteed to eventually enter a zero-cost state. In general, however, it must be less than one. The additive cost is a function of the initial condition:

 

\(J\left(x_{0}\right)=\sum_{k=0}^{\infty}\left(\alpha^{k}\cdot c\left(x_{k}\right)\right)\).

(1)

The additive cost can be computed by solving the dynamic programming equation:

 

\(V\left(x\right)=c\left(x\right)+\alpha\cdot V\left(f\left(x\right)\right)\).

(2)

The function \(V\) is referred to as the value function.
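For a small system the fixed point of equation (2) can be found by iteration and checked against a closed form. A sketch assuming a hypothetical autonomous two-state cycle, where the value is a geometric series:

```python
# Value function for a hypothetical autonomous two-state cycle, computed
# by iterating equation (2) and checked against the geometric-series sum.
ALPHA = 0.5

f = {0: 1, 1: 0}        # the two states alternate
c = {0: 1.0, 1: 3.0}    # per-stage costs

V = {0: 0.0, 1: 0.0}
for _ in range(200):
    V = {x: c[x] + ALPHA * V[f[x]] for x in f}

# Closed form: V(0) = (c(0) + ALPHA * c(1)) / (1 - ALPHA**2)
expected = (c[0] + ALPHA * c[1]) / (1 - ALPHA ** 2)
print(V[0], expected)  # both equal 10/3
```

The closed form follows from unrolling the recursion around the two-step cycle and summing the resulting geometric series in \(\alpha^{2}\).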

This work is licensed under a Creative Commons Attribution By license.