mdp-0.1.0.0: Tools for solving Markov Decision Processes.

Safe HaskellNone
LanguageHaskell98

Algorithms.MDP.ValueIteration

Contents

Description

This module provides several flavors of the value iteration algorithm for solving MDPs.

Synopsis

Value iteration algorithms

valueIteration Source

Arguments

:: (Ord t, Num t) 
=> MDP a b t

The MDP to solve

-> [CF a b t]

An converging sequence of cost functions

Compute an infinite sequence of estimates of cost functions converging to the true cost function.

This method should only be used on discounted MDPs (e.g. an MDP with a discount factor less than one).

relativeValueIteration Source

Arguments

:: (Read t, Ord t, Fractional t) 
=> MDP a b t

The MDP to solve

-> [CFBounds a b t]

A converging sequence of cost functions.

An implementation of value iteration that computes monotonic error bounds.

The error bounds provided at each iteration are additive in each state. That is, given a cost estimate c for a given state and lower and upper bounds lb and ub, the true cost is guaranteed to be in the interval [c + lb, c + ub].

undiscountedRelativeValueIteration Source

Arguments

:: (Ord t, Fractional t, Read t) 
=> MDP a b t

The MDP to solve

-> [CFBounds a b t]

A converging sequence of cost functions

Relative value iteration for undiscounted MDPs.

Helper functions for value iteration

valueIterate Source

Arguments

:: (Ord t, Num t) 
=> MDP a b t

The MDP to solve

-> CF a b t

The current cost function estimate

-> CF a b t

The next cost function estimate

Computes the next estimate of the cost function.

relativeValueIterate :: (Ord t, Fractional t) => MDP a b t -> CFBounds a b t -> CFBounds a b t Source

Computes the next estimate of the cost function and associated error bounds.

undiscountedRVI :: (Ord t, Fractional t) => MDP a b t -> Int -> CFBounds a b t -> CFBounds a b t Source

Performs a single iterate of relative value iteration for the undiscounted problem.