Pages

Monday, August 1, 2011

Draft: Notes on dynamic programming equations which solve cost models for dynamic systems

 

Deterministic Cost Models

Description

Cost Model

Dynamic Programming Equations

Restrictions

Finite Horizon Total Cost

\(J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right)\right)\)

\( V_{k}^{\pi}\left(x\right)=c_{k}\left(x,\pi\left(x\right)\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\),\(\forall k\in\left\{ 0,\cdots,K-1\right\}  \)

\(V_{K}^{\pi}\left(x\right)=c_{K}\left(x,\pi\left(x\right)\right)\)

\(0\leq\alpha<1\)

Infinite Horizon Total Cost

\(J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right)\right)\)

\(V^{\pi}\left(x\right)=c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\)

\(0\leq\alpha<1\)

Finite Horizon Shortest Path

\(J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right)\right)\)

\( V_{k}^{\pi}\left(x\right)=c_{k}\left(x,\pi\left(x\right)\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\),\(\forall k\in\left\{ 0,\cdots,K-1\right\} \)

\(V_{K}^{\pi}\left(x\right)=c_{K}\left(x,\pi\left(x\right)\right)\)

\(0\leq\alpha\leq1\)

\(\left\{ x\in\chi|c\left(x,\pi\left(x\right)\right)=0\right\} \neq\left\{ \oslash\right\} \)

Infinite Horizon Shortest Path

\(J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right)\right)\)

\(V^{\pi}\left(x\right)=c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\)

\(0\leq\alpha\leq1\)

\(\left\{ x\in\chi|c\left(x,\pi\left(x\right)\right)=0\right\} \neq\left\{ \oslash\right\} \)

Average Cost

\(J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\lim}\frac{1}{K}\sum_{k=0}^{K}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right)\right)\)

\(V^{\pi}\left(x\right)+\lambda=c\left(x,\pi\left(x\right)\right)+V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\)

\(0\leq\alpha<1\)

\(V^{\pi}\left(x_{ref}\right)=0\) for some \(x_{ref}\in\chi\)

 

Stochastic Cost Models

Description

Cost Model

Dynamic Programming Equations

Restrictions

Finite Horizon Total Cost

\(J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\)

\(V_{k}^{\pi}\left(x\right)=E^{W}\left[c_{k}\left(x,\pi\left(x\right),w\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\)

\(V_{K}^{\pi}\left(x\right)=E^{W}\left[c_{K}\left(x,\pi\left(x\right)\right)\right]\)

\(0\leq\alpha<1\)

Infinite Horizon Total Cost

\(J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\)

\(V^{\pi}\left(x\right)=E^{W}\left[c\left(x,\pi\left(x\right),w\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\)

\(0\leq\alpha<1\)

Finite Horizon Shortest Path

\(J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\)

\(V_{k}^{\pi}\left(x\right)=E^{W}\left[c_{k}\left(x,\pi\left(x\right),w\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\)

\(V_{K}^{\pi}\left(x\right)=E^{W}\left[c_{K}\left(x,\pi\left(x\right)\right)\right]\)

\(0\leq\alpha\leq1\)

\(\left\{ x\in\chi|c\left(x,\pi\left(x\right)\right)=0\right\} \neq\left\{ \oslash\right\} \)

Infinite Horizon Shortest Path

\(J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\)

\(V^{\pi}\left(x\right)=E^{W}\left[c\left(x,\pi\left(x\right),w\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\)

\(0\leq\alpha\leq1\)

\(\left\{ x\in\chi|c\left(x,\pi\left(x\right)\right)=0\right\} \neq\left\{ \oslash\right\} \)

Average Cost

\(J^{\pi}\left(x_{0}\right)=E^{W}\left[\underset{K\rightarrow\infty}{\lim}\frac{1}{K}\sum_{k=0}^{K}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\)

\(V^{\pi}\left(x\right)+\lambda=E\left[c\left(x,\pi\left(x\right),w\right)+V^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\)

\(0\leq\alpha<1\)

\(V^{\pi}\left(x_{ref}\right)=0\) for some \(x_{ref}\in\chi\)

 

Risk Aware/Averse Stochastic Cost Models

Description

Cost Model

Dynamic Programming Equations

Restrictions

Certainty Equivalence with exponential utility \(J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\limsup}\frac{1}{K}\cdot\frac{1}{\gamma}\cdot\ln\left(E^{W}\left[\exp\left(\sum_{k=0}^{K-1}c\left(x,\pi\left(x\right),w\right)\right)\right]\right)\)    
Mean-Variance      
       
       
       

 

Cost Models That don’t work or have issues

Description

Cost Model

Issues

Expected exponential disutility \(J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\limsup}\frac{1}{K}\cdot E^{W}\left[\textrm{sgn}\left(\gamma\right)\cdot\exp\left(\gamma\cdot\sum_{k=0}^{K-1}c\left(x,\pi\left(x\right),w\right)\right)\right]\) Does not discriminate among policies
Different version of expected exponential disutility \(J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\limsup}\frac{1}{\gamma}\cdot\log\left(E^{W}\left[\exp\left(\gamma\cdot\frac{\gamma}{K}\sum_{k=0}^{K-1}c\left(x,\pi\left(x\right),w\right)\right)\right]\right)\) Generally reduces to cost average
     
     
     

 

References

This work is licensed under a Creative Commons Attribution By license.

No comments:

Post a Comment