Orbital library

orbital.algorithm.template
Class RealTimeDynamicProgramming

java.lang.Object
  extended by orbital.algorithm.template.MarkovDecisionProcess
      extended by orbital.algorithm.template.MarkovDecisionProcess.DynamicProgramming
          extended by orbital.algorithm.template.RealTimeDynamicProgramming
All Implemented Interfaces:
java.io.Serializable, AlgorithmicTemplate, EvaluativeAlgorithm, HeuristicAlgorithm

public class RealTimeDynamicProgramming
extends MarkovDecisionProcess.DynamicProgramming
implements HeuristicAlgorithm

Real-Time Dynamic Programming (RTDP).

Real-Time Dynamic Programming is a variant of asynchronous Dynamic Programming performed concurrently with the control process. It uses concurrent value iteration.

If the heuristic function h is admissible, i.e. h ≤ h*, then the greedy policy (with respect to the current value estimate) will eventually become optimal after several cycles of repeated trials. If h is a well-informed estimate, very large problems can be solved.

Throughout its operation, RTDP applies a real-time dynamic programming variant of value iteration to the utility function U (also known as the state-value function V).

U(s) := min_{a∈A(s)} Q_U(s,a) = min_{a∈A(s)} ( c(s,a) + γ·∑_{t∈S} P_a(t|s)·U(t) )
depending upon a discount factor γ∈[0,1]. The formula is a dynamic programming update derived from the Bellman Optimality Equation. The key fact is that a necessary and sufficient condition for a policy π* to be optimal is that the expected costs U*(s) that result from starting in state s∈S and acting according to π* satisfy a form of the Bellman Optimality Equation:
U*(s) = min_{a∈A(s)} Q_{U*}(s,a) = min_{a∈A(s)} ( c(s,a) + γ·∑_{t∈S} P_a(t|s)·U*(t) )
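The update above can be sketched in plain Java. This is a minimal, self-contained illustration of the dynamic programming update on a tiny hypothetical two-state MDP (the class name, states, costs, and transition probabilities are all invented for the example; this is not the Orbital API):

```java
import java.util.Arrays;

/** Sketch of the update U(s) := min_a ( c(s,a) + gamma * sum_t P_a(t|s) * U(t) )
 *  on a tiny hypothetical two-state, two-action MDP. Not the Orbital API. */
public class BellmanUpdateSketch {
    static final double GAMMA = 0.9;                       // discount factor gamma in [0,1]
    // COST[s][a]: immediate cost c(s,a) of taking action a in state s (hypothetical numbers)
    static final double[][] COST = { {1.0, 2.0}, {0.5, 1.5} };
    // P[s][a][t]: transition probability P_a(t|s); each row sums to 1
    static final double[][][] P = {
        { {0.8, 0.2}, {0.1, 0.9} },
        { {0.5, 0.5}, {0.9, 0.1} }
    };

    /** One asynchronous dynamic programming update of U at state s. */
    static double update(double[] U, int s) {
        double best = Double.POSITIVE_INFINITY;
        for (int a = 0; a < COST[s].length; a++) {
            double q = COST[s][a];                         // Q_U(s,a) = c(s,a) + ...
            for (int t = 0; t < U.length; t++)
                q += GAMMA * P[s][a][t] * U[t];            // ... gamma * sum_t P_a(t|s) * U(t)
            best = Math.min(best, q);                      // min over a in A(s)
        }
        return best;
    }

    public static void main(String[] args) {
        double[] U = {0.0, 0.0};                           // U := h; h = 0 is admissible for costs >= 0
        for (int sweep = 0; sweep < 200; sweep++)          // repeated sweeps drive U toward the fixpoint U*
            for (int s = 0; s < U.length; s++)
                U[s] = update(U, s);
        System.out.println(Arrays.toString(U));
    }
}
```

At the fixpoint, U satisfies the Bellman Optimality Equation; RTDP performs the same update, but only at states actually visited during trials, interleaved with control.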

RTDP can also be considered a reinforcement learning technique, with the costs expressed as a negative reward R(t(s,a)) = -c(s,a) that depends on both the state s∈S and the action a∈A(s), and with the task being to minimize the costs U(s) instead of maximizing utilities. However, there are many possible ways of defining the costs in terms of the reward.

RTDP is the stochastic generalization of Learning Real-Time A* (LRTA*). For deterministic actions and discount factor γ=1, RTDP collapses to LRTA*.
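The collapse to LRTA* can be made concrete: with deterministic successors and γ=1, the update reduces to U(s) := min_a ( c(s,a) + U(succ(s,a)) ), applied along the trajectory of a greedy trial. The following is a hedged sketch on a tiny hypothetical three-state graph (class name, graph, and costs invented for illustration; not the Orbital API):

```java
/** With deterministic successors and gamma = 1 the RTDP update collapses to the
 *  LRTA*-style rule U(s) := min_a ( c(s,a) + U(succ(s,a)) ).
 *  Tiny hypothetical 3-state graph; state 2 is the (absorbing) goal. */
public class LrtaCollapseSketch {
    // SUCC[s][a]: deterministic successor of taking action a in state s
    static final int[][] SUCC = { {1, 2}, {2, 0}, {2, 2} };
    // COST[s][a]: edge cost c(s,a); the goal self-loop costs 0
    static final double[][] COST = { {1.0, 5.0}, {1.0, 1.0}, {0.0, 0.0} };

    /** One LRTA*-style update at state s (gamma = 1, deterministic transitions). */
    static double update(double[] U, int s) {
        double best = Double.POSITIVE_INFINITY;
        for (int a = 0; a < SUCC[s].length; a++)
            best = Math.min(best, COST[s][a] + U[SUCC[s][a]]);
        return best;
    }

    public static void main(String[] args) {
        double[] U = new double[3];                 // U := h; h = 0 is admissible
        int s = 0;
        while (s != 2) {                            // one greedy trial from state 0 to the goal
            U[s] = update(U, s);                    // update U at the visited state only
            int greedy = 0;                         // then act greedily w.r.t. the current U
            for (int a = 1; a < SUCC[s].length; a++)
                if (COST[s][a] + U[SUCC[s][a]] < COST[s][greedy] + U[SUCC[s][greedy]])
                    greedy = a;
            s = SUCC[s][greedy];
        }
        System.out.println(java.util.Arrays.toString(U));
    }
}
```

A single trial generally leaves U below the true cost-to-go; as in RTDP, repeated trials from the start state drive U toward U* on the relevant states.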

Author:
André Platzer
See Also:
Greedy, DynamicProgrammingProblem, "A. Barto, S. Bradtke, and S. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72:81-138, 1995.", "Bellman, R. E. (1957). Dynamic Programming. Princeton University Press, Princeton, New Jersey.", Serialized Form
Invariants:
getDiscount()∈[0,1]

Nested Class Summary
 
Nested classes/interfaces inherited from class orbital.algorithm.template.MarkovDecisionProcess
MarkovDecisionProcess.DynamicProgramming
 
Nested classes/interfaces inherited from interface orbital.algorithm.template.HeuristicAlgorithm
HeuristicAlgorithm.Configuration, HeuristicAlgorithm.PatternDatabaseHeuristic
 
Nested classes/interfaces inherited from interface orbital.algorithm.template.EvaluativeAlgorithm
EvaluativeAlgorithm.EvaluationComparator
 
Constructor Summary
RealTimeDynamicProgramming(Function heuristic)
           
 
Method Summary
 Function complexity()
          Measure for the asymptotic time complexity of the central solution operation in O-notation.
protected  Function plan()
          Run the planning.
 Function spaceComplexity()
          Measure for the asymptotic space complexity of the central solution operation in O-notation.
 
Methods inherited from class orbital.algorithm.template.MarkovDecisionProcess.DynamicProgramming
createMap, getActionValue, getDiscount, getEvaluation, getGreedyPolicy, getHeuristic, maximumExpectedUtility, setDiscount, setHeuristic
 
Methods inherited from class orbital.algorithm.template.MarkovDecisionProcess
getProblem, solve
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface orbital.algorithm.template.HeuristicAlgorithm
getEvaluation, getHeuristic, setHeuristic
 
Methods inherited from interface orbital.algorithm.template.AlgorithmicTemplate
solve
 

Constructor Detail

RealTimeDynamicProgramming

public RealTimeDynamicProgramming(Function heuristic)
Method Detail

plan

protected Function plan()
Description copied from class: MarkovDecisionProcess
Run the planning.

Specified by:
plan in class MarkovDecisionProcess
Returns:
the solution policy function S→A found, or null if no solution could be found.

complexity

public Function complexity()
Description copied from interface: AlgorithmicTemplate
Measure for the asymptotic time complexity of the central solution operation in O-notation.

Specified by:
complexity in interface AlgorithmicTemplate
Returns:
the function f for which the solve() method of this algorithm runs in O(f(n)) assuming the algorithmic problem hook to run in O(1).
See Also:
AlgorithmicTemplate.solve(AlgorithmicProblem)

spaceComplexity

public Function spaceComplexity()
Description copied from interface: AlgorithmicTemplate
Measure for the asymptotic space complexity of the central solution operation in O-notation.

Specified by:
spaceComplexity in interface AlgorithmicTemplate
Returns:
the function f for which the solve() method of this algorithm consumes memory with an amount in O(f(n)) assuming the algorithmic problem hook uses space in O(1).
See Also:
AlgorithmicTemplate.solve(AlgorithmicProblem)

1.3.0: 11 Apr 2009

Copyright © 1996-2009 André Platzer
All Rights Reserved.