Deterministic Cost Models
Description | Cost Model | Dynamic Programming Equations | Restrictions |
Finite Horizon Total Cost | \(J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right)\right)\) | \( V_{k}^{\pi}\left(x\right)=c_{k}\left(x,\pi\left(x\right)\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\),\(\forall k\in\left\{ 0,\cdots,K-1\right\} \) \(V_{K}^{\pi}\left(x\right)=c_{K}\left(x,\pi\left(x\right)\right)\) | \(0\leq\alpha<1\) |
Infinite Horizon Total Cost | \(J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right)\right)\) | \(V^{\pi}\left(x\right)=c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\) | \(0\leq\alpha<1\) |
Finite Horizon Shortest Path | \(J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right)\right)\) | \( V_{k}^{\pi}\left(x\right)=c_{k}\left(x,\pi\left(x\right)\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\),\(\forall k\in\left\{ 0,\cdots,K-1\right\} \) \(V_{K}^{\pi}\left(x\right)=c_{K}\left(x,\pi\left(x\right)\right)\) | \(0\leq\alpha\leq1\) \(\left\{ x\in\chi|c\left(x,\pi\left(x\right)\right)=0\right\} \neq\left\{ \oslash\right\} \) |
Infinite Horizon Shortest Path | \(J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right)\right)\) | \(V^{\pi}\left(x\right)=c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\) | \(0\leq\alpha\leq1\) \(\left\{ x\in\chi|c\left(x,\pi\left(x\right)\right)=0\right\} \neq\left\{ \oslash\right\} \) |
Average Cost | \(J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\lim}\frac{1}{K}\sum_{k=0}^{K}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right)\right)\) | \(V^{\pi}\left(x\right)+\lambda=c\left(x,\pi\left(x\right)\right)+V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\) | \(0\leq\alpha<1\) \(V^{\pi}\left(x_{ref}\right)=0\) for some \(x_{ref}\in\chi\) |
Stochastic Cost Models
Description | Cost Model | Dynamic Programming Equations | Restrictions |
Finite Horizon Total Cost | \(J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\) | \(V_{k}^{\pi}\left(x\right)=E^{W}\left[c_{k}\left(x,\pi\left(x\right),w\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\) \(V_{K}^{\pi}\left(x\right)=E^{W}\left[c_{K}\left(x,\pi\left(x\right)\right)\right]\) | \(0\leq\alpha<1\) |
Infinite Horizon Total Cost | \(J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\) | \(V^{\pi}\left(x\right)=E^{W}\left[c\left(x,\pi\left(x\right),w\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\) | \(0\leq\alpha<1\) |
Finite Horizon Shortest Path | \(J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\) | \(V_{k}^{\pi}\left(x\right)=E^{W}\left[c_{k}\left(x,\pi\left(x\right),w\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\) \(V_{K}^{\pi}\left(x\right)=E^{W}\left[c_{K}\left(x,\pi\left(x\right)\right)\right]\) | \(0\leq\alpha\leq1\) \(\left\{ x\in\chi|c\left(x,\pi\left(x\right)\right)=0\right\} \neq\left\{ \oslash\right\} \) |
Infinite Horizon Shortest Path | \(J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\) | \(V^{\pi}\left(x\right)=E^{W}\left[c\left(x,\pi\left(x\right),w\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\) | \(0\leq\alpha\leq1\) \(\left\{ x\in\chi|c\left(x,\pi\left(x\right)\right)=0\right\} \neq\left\{ \oslash\right\} \) |
Average Cost | \(J^{\pi}\left(x_{0}\right)=E^{W}\left[\underset{K\rightarrow\infty}{\lim}\frac{1}{K}\sum_{k=0}^{K}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right),w\right)\right]\) | \(V^{\pi}\left(x\right)+\lambda=E\left[c\left(x,\pi\left(x\right),w\right)+V^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]\) | \(0\leq\alpha<1\) \(V^{\pi}\left(x_{ref}\right)=0\) for some \(x_{ref}\in\chi\) |
Risk Aware/Averse Stochastic Cost Models
Description | Cost Model | Dynamic Programming Equations | Restrictions |
Certainty Equivalence with exponential utility | \(J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\limsup}\frac{1}{K}\cdot\frac{1}{\gamma}\cdot\ln\left(E^{W}\left[\exp\left(\sum_{k=0}^{K-1}c\left(x,\pi\left(x\right),w\right)\right)\right]\right)\) | ||
Mean-Variance | |||
Cost Models That don’t work or have issues
Description | Cost Model | Issues |
Expected exponential disutility | \(J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\limsup}\frac{1}{K}\cdot E^{W}\left[\textrm{sgn}\left(\gamma\right)\cdot\exp\left(\gamma\cdot\sum_{k=0}^{K-1}c\left(x,\pi\left(x\right),w\right)\right)\right]\) | Does not discriminate among policies |
Different version of expected exponential disutility | \(J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\limsup}\frac{1}{\gamma}\cdot\log\left(E^{W}\left[\exp\left(\gamma\cdot\frac{\gamma}{K}\sum_{k=0}^{K-1}c\left(x,\pi\left(x\right),w\right)\right)\right]\right)\) | Generally reduces to cost average |
References
- B. Deourny, D. Ernst, and L. Wehenkel, Risk-Aware Decision Making and Dynamic Programming
- J. Harney and P. Doshi, Risk-Sensitive Querying for Adaptive Service Compositions
- G. Avila-Godoy, Modularity Results and Risk Sensitive Controller Markov Chains: A Case Study
- R. Cavazos-Cadena and R. Montes-De-Oca, Optimal Stationary Policies in Risk Sensitive Dynamic Programs with Finite State Space and Non-negative Rewards, Application Mathematicae, 27, 2 (2000), pp 167-185.
- A. Brau and E. Fernandez-Gaucherand, Controlled Markov Chains with Risk-Sensitive Exponential Average Criteria, Proceedings of 36th Conference on Decision and Control, San Diego, Calif, Dec 1997.
- M. Koenig and J. Meissner, Risk Minimizing Strategies for Revenue Mangement Problems with Target Values.
- M. Koenig and J. Meissner, Value at Risk Optimal Policies for RM Problems
No comments:
Post a Comment