Safe Haskell | None |
---|---|
Language | Haskell98 |
Algorithms.MDP.ValueIteration
Description
This module provides several flavors of the value iteration algorithm for solving MDPs.
- valueIteration :: (Ord t, Num t) => MDP a b t -> [CF a b t]
- relativeValueIteration :: (Read t, Ord t, Fractional t) => MDP a b t -> [CFBounds a b t]
- undiscountedRelativeValueIteration :: (Ord t, Fractional t, Read t) => MDP a b t -> [CFBounds a b t]
- valueIterate :: (Ord t, Num t) => MDP a b t -> CF a b t -> CF a b t
- relativeValueIterate :: (Ord t, Fractional t) => MDP a b t -> CFBounds a b t -> CFBounds a b t
- undiscountedRVI :: (Ord t, Fractional t) => MDP a b t -> Int -> CFBounds a b t -> CFBounds a b t
Value iteration algorithms
Arguments
:: (Ord t, Num t) | |
=> MDP a b t | The MDP to solve |
-> [CF a b t] | An converging sequence of cost functions |
Compute an infinite sequence of estimates of cost functions converging to the true cost function.
This method should only be used on discounted MDPs (e.g. an MDP with a discount factor less than one).
Arguments
:: (Read t, Ord t, Fractional t) | |
=> MDP a b t | The MDP to solve |
-> [CFBounds a b t] | A converging sequence of cost functions. |
An implementation of value iteration that computes monotonic error bounds.
The error bounds provided at each iteration are additive in each
state. That is, given a cost estimate c
for a given state and
lower and upper bounds lb
and ub
, the true cost is guaranteed
to be in the interval [c + lb, c + ub].
undiscountedRelativeValueIteration Source
Arguments
:: (Ord t, Fractional t, Read t) | |
=> MDP a b t | The MDP to solve |
-> [CFBounds a b t] | A converging sequence of cost functions |
Relative value iteration for undiscounted MDPs.
Helper functions for value iteration
Arguments
:: (Ord t, Num t) | |
=> MDP a b t | The MDP to solve |
-> CF a b t | The current cost function estimate |
-> CF a b t | The next cost function estimate |
Computes the next estimate of the cost function.
relativeValueIterate :: (Ord t, Fractional t) => MDP a b t -> CFBounds a b t -> CFBounds a b t Source
Computes the next estimate of the cost function and associated error bounds.
undiscountedRVI :: (Ord t, Fractional t) => MDP a b t -> Int -> CFBounds a b t -> CFBounds a b t Source
Performs a single iterate of relative value iteration for the undiscounted problem.