Safe Haskell | None |
---|---|
Language | Haskell98 |
This module provides several flavors of the value iteration algorithm for solving MDPs.
- valueIteration :: (Ord t, Num t) => MDP a b t -> [CF a b t]
- relativeValueIteration :: (Read t, Ord t, Fractional t) => MDP a b t -> [CFBounds a b t]
- undiscountedRelativeValueIteration :: (Ord t, Fractional t, Read t) => MDP a b t -> [CFBounds a b t]
- valueIterate :: (Ord t, Num t) => MDP a b t -> CF a b t -> CF a b t
- relativeValueIterate :: (Ord t, Fractional t) => MDP a b t -> CFBounds a b t -> CFBounds a b t
- undiscountedRVI :: (Ord t, Fractional t) => MDP a b t -> Int -> CFBounds a b t -> CFBounds a b t
Value iteration algorithms
Compute an infinite sequence of estimates of cost functions converging to the true cost function.
This method should only be used on discounted MDPs (e.g. an MDP with a discount factor less than one).
relativeValueIteration Source #
:: (Read t, Ord t, Fractional t) | |
=> MDP a b t | The MDP to solve |
-> [CFBounds a b t] | A converging sequence of cost functions. |
An implementation of value iteration that computes monotonic error bounds.
The error bounds provided at each iteration are additive in each
state. That is, given a cost estimate c
for a given state and
lower and upper bounds lb
and ub
, the true cost is guaranteed
to be in the interval [c + lb, c + ub].
undiscountedRelativeValueIteration Source #
:: (Ord t, Fractional t, Read t) | |
=> MDP a b t | The MDP to solve |
-> [CFBounds a b t] | A converging sequence of cost functions |
Relative value iteration for undiscounted MDPs.
Helper functions for value iteration
:: (Ord t, Num t) | |
=> MDP a b t | The MDP to solve |
-> CF a b t | The current cost function estimate |
-> CF a b t | The next cost function estimate |
Computes the next estimate of the cost function.
relativeValueIterate :: (Ord t, Fractional t) => MDP a b t -> CFBounds a b t -> CFBounds a b t Source #
Computes the next estimate of the cost function and associated error bounds.
undiscountedRVI :: (Ord t, Fractional t) => MDP a b t -> Int -> CFBounds a b t -> CFBounds a b t Source #
Performs a single iterate of relative value iteration for the undiscounted problem.