Safe Haskell | None |
---|---|
Language | Haskell98 |
This module shows how to solve several example problems using this library.
A discounted problem
We consider the problem defined in Algorithms.MDP.Examples.Ex_3_1; this example comes from Bersekas p. 22.
We will solve this problem using regular value iteration. Having
constructed the MDP, we can do this using the valueIteration
function.
import Algorithms.MDP.Examples.Ex_3_1 import Algorithms.MDP.ValueIteration iterations :: [CF State Control Double] iterations = valueIteration mdp
The iterates returned contain estimates of the cost of being at each state. To see the costs of the state A over the first 10 iterations, we could do
estimates :: [Double] estimates = map (cost A) (take 10 iterations)
A discounted problem with error bounds
We consider the same example as above, but this time we use relative value iteration to compute error bounds on the costs. This will allow us to use fewer iterations to obtain an accurate cost estimate.
Since we have already defined the problem, we do this via the
relativeValueIteration
function.
import Algorithms.MDP.Examples.Ex_3_1 import Algorithms.MDP.ValueIteration iterations :: [CFBounds State Control Double] iterations = relativeValueIteration mdp
The iterates returned contain estimates of the cost of being at each state, along with associated error bounds. To see the costs of the state A over the first 10 iterations adjusted for the error bounds, we could do
estimate state (CFBounds cf lb ub) = (z + lb, z + ub) where z = cost state cf estimates :: [(Double, Double)] estimates = map (estimate A) (take 10 iterations)
Note that the lower- and upper-bounds returned in the first iteration are always +/-Infinity, and so it can be useful to consider only the tail of the iterations.
An average cost problem
We consider the problem defined in Algorithms.MDP.Examples.Ex_3_2; this example comes from Bersekas p. 210.
Here we are interested in computing the long-run average cost of an
undiscounted MDP. For this we use the
undiscountedRelativeValueIteration
function.
import Algorithms.MDP.Examples.Ex_3_2 import Algorithms.MDP.ValueIteration iterations :: [CFBounds State Control Double] iterations = undiscountedRelativeValueIteration mdp
We can compute cost estimates in the same fashion as above.
estimate state (CFBounds cf lb ub) = (lb, ub) estimates :: [(Double, Double)] estimates = map (estimate A) (take 10 iterations)
It is important to note that in this problem the cost function
returned in each CFBounds
object is not to be interpreted as a
vector of costs, but rather as a differential cost vector; however,
the estimates above retrain the same interpretation.
A continuous-time undiscounted problem
We now consider a family of problems described by Sennot p. 248.
Here we are interested in first converting a CTMDP to an MDP via uniformization, and then computing the long-run average cost of the optimal policy.
To begin, we construct one of the scenarios provided (each scenario is
just an instance of the problem with certain parameters). We then
convert the scenario to an MDP using the uniformize
function.
import Algorithms.MDP.Examples.MM1 import Algorithms.MDP.CTMDP import Algorithms.MDP.ValueIteration scenario :: CTMDP State Action Double scenario = mkInstance scenario1 mdp :: MDP State Action Double mdp = uniformize scenario
As above, we can use the undiscountedRelativeValueIteration
function to compute cost estimates.
iterations :: [CFBounds State Action Double] iterations = undiscountedRelativeValueIteration mdp estimate state (CFBounds _ lb ub) = (lb, ub) estimates :: [(Double, Double)] estimates = map (estimate A) (take 10 iterations)