Deterministic Cost Models
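Description | Cost Model | Dynamic Programming Equations | Restrictions |
Finite Horizon Total Cost | $J^{\pi}(x_0) = \sum_{k=0}^{K} \alpha^k \cdot c_k(x_k, \pi(x_k))$ | $V_k^{\pi}(x) = c_k(x, \pi(x)) + \alpha \cdot V_{k+1}^{\pi}(f(x, \pi(x))), \ \forall k \in \{0, \cdots, K-1\}$; $V_K^{\pi}(x) = c_K(x, \pi(x))$ | $0 \le \alpha < 1$ |
Infinite Horizon Total Cost | $J^{\pi}(x_0) = \sum_{k=0}^{\infty} \alpha^k \cdot c(x_k, \pi(x_k))$ | $V^{\pi}(x) = c(x, \pi(x)) + \alpha \cdot V^{\pi}(f(x, \pi(x)))$ | $0 \le \alpha < 1$ |
Finite Horizon Shortest Path | $J^{\pi}(x_0) = \sum_{k=0}^{K} \alpha^k \cdot c_k(x_k, \pi(x_k))$ | $V_k^{\pi}(x) = c_k(x, \pi(x)) + \alpha \cdot V_{k+1}^{\pi}(f(x, \pi(x))), \ \forall k \in \{0, \cdots, K-1\}$; $V_K^{\pi}(x) = c_K(x, \pi(x))$ | $0 \le \alpha \le 1$; $\{x \in \chi \mid c(x, \pi(x)) = 0\} \neq \emptyset$ |
Infinite Horizon Shortest Path | $J^{\pi}(x_0) = \sum_{k=0}^{\infty} \alpha^k \cdot c(x_k, \pi(x_k))$ | $V^{\pi}(x) = c(x, \pi(x)) + \alpha \cdot V^{\pi}(f(x, \pi(x)))$ | $0 \le \alpha \le 1$; $\{x \in \chi \mid c(x, \pi(x)) = 0\} \neq \emptyset$ |
Average Cost | $J^{\pi}(x_0) = \lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K} \alpha^k \cdot c(x_k, \pi(x_k))$ | $V^{\pi}(x) + \lambda = c(x, \pi(x)) + V^{\pi}(f(x, \pi(x)))$ | $0 \le \alpha < 1$; $V^{\pi}(x_{ref}) = 0$ for some $x_{ref} \in \chi$ |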
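The finite horizon total cost row can be checked numerically. The following is a minimal sketch, assuming a small finite state space and a fixed policy; the function names `evaluate_finite_horizon` and `rollout_cost`, and the toy dynamics and cost, are illustrative assumptions and not part of the original tables. It runs the backward recursion for V_k and compares V_0(x0) with the cost J(x0) summed along the trajectory.

```python
def evaluate_finite_horizon(states, f, c, pi, K, alpha):
    """Backward recursion: V_K(x) = c_K(x, pi(x)),
    V_k(x) = c_k(x, pi(x)) + alpha * V_{k+1}(f(x, pi(x))) for k = K-1, ..., 0."""
    V = {x: c(K, x, pi(x)) for x in states}  # terminal values V_K
    for k in range(K - 1, -1, -1):
        V = {x: c(k, x, pi(x)) + alpha * V[f(x, pi(x))] for x in states}
    return V  # this is V_0


def rollout_cost(x0, f, c, pi, K, alpha):
    """Forward sum J(x0) = sum_{k=0}^{K} alpha^k * c_k(x_k, pi(x_k))."""
    x, total = x0, 0.0
    for k in range(K + 1):
        total += alpha**k * c(k, x, pi(x))
        x = f(x, pi(x))
    return total


# Hypothetical toy problem: states 0..3, the policy always steps toward 0,
# and the stage cost equals the current state.
states = [0, 1, 2, 3]
f = lambda x, u: max(x + u, 0)   # deterministic transition
pi = lambda x: -1                # fixed policy
c = lambda k, x, u: float(x)     # stage cost c_k(x, u)

V0 = evaluate_finite_horizon(states, f, c, pi, K=3, alpha=0.5)
J3 = rollout_cost(3, f, c, pi, K=3, alpha=0.5)
```

For this toy problem the two quantities agree for every start state, which is what the dynamic programming equations assert: V_0(x0) = J(x0).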
Stochastic Cost Models
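Description | Cost Model | Dynamic Programming Equations | Restrictions |
Finite Horizon Total Cost | $J^{\pi}(x_0) = \mathbb{E}_W\left[\sum_{k=0}^{K} \alpha^k \cdot c_k(x_k, \pi(x_k), w)\right]$ | $V_k^{\pi}(x) = \mathbb{E}_W\left[c_k(x, \pi(x), w) + \alpha \cdot V_{k+1}^{\pi}(f(x, \pi(x), w))\right]$; $V_K^{\pi}(x) = \mathbb{E}_W\left[c_K(x, \pi(x))\right]$ | $0 \le \alpha < 1$ |
Infinite Horizon Total Cost | $J^{\pi}(x_0) = \mathbb{E}_W\left[\sum_{k=0}^{\infty} \alpha^k \cdot c(x_k, \pi(x_k), w)\right]$ | $V^{\pi}(x) = \mathbb{E}_W\left[c(x, \pi(x), w) + \alpha \cdot V^{\pi}(f(x, \pi(x), w))\right]$ | $0 \le \alpha < 1$ |
Finite Horizon Shortest Path | $J^{\pi}(x_0) = \mathbb{E}_W\left[\sum_{k=0}^{K} \alpha^k \cdot c_k(x_k, \pi(x_k), w)\right]$ | $V_k^{\pi}(x) = \mathbb{E}_W\left[c_k(x, \pi(x), w) + \alpha \cdot V_{k+1}^{\pi}(f(x, \pi(x), w))\right]$; $V_K^{\pi}(x) = \mathbb{E}_W\left[c_K(x, \pi(x))\right]$ | $0 \le \alpha \le 1$; $\{x \in \chi \mid c(x, \pi(x)) = 0\} \neq \emptyset$ |
Infinite Horizon Shortest Path | $J^{\pi}(x_0) = \mathbb{E}_W\left[\sum_{k=0}^{\infty} \alpha^k \cdot c(x_k, \pi(x_k), w)\right]$ | $V^{\pi}(x) = \mathbb{E}_W\left[c(x, \pi(x), w) + \alpha \cdot V^{\pi}(f(x, \pi(x), w))\right]$ | $0 \le \alpha \le 1$; $\{x \in \chi \mid c(x, \pi(x)) = 0\} \neq \emptyset$ |
Average Cost | $J^{\pi}(x_0) = \mathbb{E}_W\left[\lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K} \alpha^k \cdot c(x_k, \pi(x_k), w)\right]$ | $V^{\pi}(x) + \lambda = \mathbb{E}_W\left[c(x, \pi(x), w) + V^{\pi}(f(x, \pi(x), w))\right]$ | $0 \le \alpha < 1$; $V^{\pi}(x_{ref}) = 0$ for some $x_{ref} \in \chi$ |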
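For the stochastic infinite horizon total cost row, the value of a fixed policy can be approximated by repeatedly applying the right-hand side of the dynamic programming equation until it stops changing. The sketch below assumes a finite state space and a finite noise distribution given as (w, probability) pairs; the function name `evaluate_policy` and the two-state toy problem are illustrative assumptions, not taken from the tables.

```python
def evaluate_policy(states, noise, f, c, pi, alpha, tol=1e-10, max_iter=10_000):
    """Fixed-point iteration on
    V(x) = E_W[ c(x, pi(x), w) + alpha * V(f(x, pi(x), w)) ].
    `noise` is a list of (w, probability) pairs."""
    V = {x: 0.0 for x in states}
    for _ in range(max_iter):
        V_new = {
            x: sum(p * (c(x, pi(x), w) + alpha * V[f(x, pi(x), w)])
                   for w, p in noise)
            for x in states
        }
        # Stop once successive iterates agree to within the tolerance.
        if max(abs(V_new[x] - V[x]) for x in states) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical toy problem: the next state equals the noise sample,
# and the stage cost equals the current state.
states = [0, 1]
noise = [(0, 0.5), (1, 0.5)]
f = lambda x, u, w: w
c = lambda x, u, w: float(x)
pi = lambda x: 0

V = evaluate_policy(states, noise, f, c, pi, alpha=0.5)
```

With 0 ≤ α < 1 the update is a contraction, which is why the iteration settles. For this toy problem the equations can also be solved by hand, giving V(0) = 0.5 and V(1) = 1.5.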
Risk Aware/Averse Stochastic Cost Models
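Description | Cost Model | Dynamic Programming Equations | Restrictions |
Certainty Equivalence with exponential utility | $J^{\pi}(x_0) = \limsup_{K \to \infty} \frac{1}{K} \cdot \frac{1}{\gamma} \cdot \ln\left(\mathbb{E}_W\left[\exp\left(\sum_{k=0}^{K-1} c(x, \pi(x), w)\right)\right]\right)$ | | |
Mean-Variance | | | |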
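To give a feel for the certainty equivalent with exponential utility, here is a minimal sketch that evaluates (1/γ)·ln E[exp(γ·C)] for a single discrete distribution of the accumulated cost C. It is not the limiting criterion in the table. The factor γ inside the exponential follows the exponential criteria listed further down in this post and is an assumption relative to the row above; the function name `certainty_equivalent` and the example distribution are also illustrative.

```python
import math


def certainty_equivalent(costs, probs, gamma):
    """(1/gamma) * ln E[exp(gamma * C)] for a discrete cost distribution.
    Uses a log-sum-exp shift so large |gamma * C| does not overflow."""
    m = max(gamma * c for c in costs)
    s = sum(p * math.exp(gamma * c - m) for c, p in zip(costs, probs))
    return (m + math.log(s)) / gamma


# Hypothetical example: cost is 0 or 2 with equal probability (mean 1).
costs, probs = [0.0, 2.0], [0.5, 0.5]
mean_cost = sum(p * c for c, p in zip(costs, probs))

ce_averse = certainty_equivalent(costs, probs, gamma=1.0)    # gamma > 0
ce_seeking = certainty_equivalent(costs, probs, gamma=-1.0)  # gamma < 0
ce_neutral = certainty_equivalent(costs, probs, gamma=1e-6)  # gamma near 0
```

For costs, a positive γ weights the bad outcomes more heavily, so the certainty equivalent sits above the mean; a negative γ puts it below the mean; and as γ approaches 0 it approaches the plain expected cost.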
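Cost Models That Don't Work or Have Issues
Description | Cost Model | Issues |
Expected exponential disutility | $J^{\pi}(x_0) = \limsup_{K \to \infty} \frac{1}{K} \cdot \mathbb{E}_W\left[\operatorname{sgn}(\gamma) \cdot \exp\left(\gamma \cdot \sum_{k=0}^{K-1} c(x, \pi(x), w)\right)\right]$ | Does not discriminate among policies |
Different version of expected exponential disutility | $J^{\pi}(x_0) = \limsup_{K \to \infty} \frac{1}{\gamma} \cdot \log\left(\mathbb{E}_W\left[\exp\left(\gamma \cdot \frac{\gamma}{K} \sum_{k=0}^{K-1} c(x, \pi(x), w)\right)\right]\right)$ | Generally reduces to the average cost |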
References
- B. Defourny, D. Ernst, and L. Wehenkel, Risk-Aware Decision Making and Dynamic Programming.
- J. Harney and P. Doshi, Risk-Sensitive Querying for Adaptive Service Compositions.
- G. Avila-Godoy, Modularity Results and Risk Sensitive Controller Markov Chains: A Case Study.
- R. Cavazos-Cadena and R. Montes-De-Oca, Optimal Stationary Policies in Risk Sensitive Dynamic Programs with Finite State Space and Non-negative Rewards, Applicationes Mathematicae, 27, 2 (2000), pp. 167-185.
- A. Brau and E. Fernandez-Gaucherand, Controlled Markov Chains with Risk-Sensitive Exponential Average Criteria, Proceedings of the 36th Conference on Decision and Control, San Diego, Calif., Dec. 1997.
- M. Koenig and J. Meissner, Risk Minimizing Strategies for Revenue Management Problems with Target Values.
- M. Koenig and J. Meissner, Value at Risk Optimal Policies for RM Problems.