Orbital library
java.lang.Object
  orbital.algorithm.template.MarkovDecisionProcess
    orbital.algorithm.template.MarkovDecisionProcess.DynamicProgramming
public abstract static class MarkovDecisionProcess.DynamicProgramming

Abstract base class for Markov decision processes solved by dynamic programming.

See Also:
    DynamicProgramming,
    "A. Barto, S. Bradtke, and S. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72:81-138, 1995.",
    "Bellman, R. E. (1957). Dynamic Programming. Princeton University Press, Princeton, New Jersey.",
    Serialized Form

Nested Class Summary
Nested classes/interfaces inherited from class orbital.algorithm.template.MarkovDecisionProcess:
    MarkovDecisionProcess.DynamicProgramming
Nested classes/interfaces inherited from interface orbital.algorithm.template.HeuristicAlgorithm:
    HeuristicAlgorithm.Configuration, HeuristicAlgorithm.PatternDatabaseHeuristic
Nested classes/interfaces inherited from interface orbital.algorithm.template.EvaluativeAlgorithm:
    EvaluativeAlgorithm.EvaluationComparator
Constructor Summary

MarkovDecisionProcess.DynamicProgramming(Function heuristic)

MarkovDecisionProcess.DynamicProgramming(Function heuristic, double gamma)
    Deprecated. Convenience constructor; prefer ValueFactory.valueOf(double).

MarkovDecisionProcess.DynamicProgramming(Function heuristic, Real gamma)
Method Summary

protected MutableFunction createMap()
    Create a mapping.
protected BinaryFunction getActionValue(Function U)
    Get the action-value cost function of an action and state.
Real getDiscount()
    Get the discount factor γ.
Function getEvaluation()
    f(s) = h(s).
protected Function getGreedyPolicy(BinaryFunction Q)
    Get a greedy policy with respect to an action-value cost function Q.
Function getHeuristic()
    Get the heuristic function used.
protected Pair maximumExpectedUtility(BinaryFunction Q, java.lang.Object state)
    Calculate the maximum expected utility (MEU) action.
void setDiscount(Real gamma)
    Set the discount factor γ.
void setHeuristic(Function heuristic)
    Set the heuristic function to use.
Methods inherited from class orbital.algorithm.template.MarkovDecisionProcess:
    getProblem, plan, solve
Methods inherited from class java.lang.Object:
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface orbital.algorithm.template.AlgorithmicTemplate:
    complexity, solve, spaceComplexity
Constructor Detail

public MarkovDecisionProcess.DynamicProgramming(Function heuristic, Real gamma)

Parameters:
    heuristic - the heuristic function to use.
    gamma - the discount factor γ, describing how much immediate results are preferred over future results.
See Also:
    setHeuristic(Function), setDiscount(Real)

public MarkovDecisionProcess.DynamicProgramming(Function heuristic, double gamma)

Deprecated. Convenience constructor; prefer ValueFactory.valueOf(double).

public MarkovDecisionProcess.DynamicProgramming(Function heuristic)
Method Detail

public void setDiscount(Real gamma)

Set the discount factor γ.

Parameters:
    gamma - the discount factor γ, describing how much immediate results are preferred over future results. The higher the factor, the more balanced the preference; the lower the factor, the stronger the preference for immediate results. For γ=0, only immediate costs are considered. For γ=1, the undiscounted case, additional assumptions are required to produce a well-defined decision problem and to ensure convergence.

public Real getDiscount()
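The effect of γ on the expected cost sum can be illustrated with a short standalone sketch (not part of the Orbital API; the method name `discountedSum` is hypothetical): the cost incurred at step t is weighted by γ^t.

```java
// Illustration of the discount factor γ: the cost c_t incurred at
// step t contributes γ^t · c_t to the expected cost sum.
public class DiscountDemo {
    // Discounted cost sum: Σ_t gamma^t * costs[t].
    static double discountedSum(double[] costs, double gamma) {
        double sum = 0.0, weight = 1.0; // weight = gamma^t
        for (double c : costs) {
            sum += weight * c;
            weight *= gamma;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] costs = {1.0, 1.0, 1.0};
        // γ=0: only the immediate cost counts.
        System.out.println(discountedSum(costs, 0.0)); // 1.0
        // γ=1 (undiscounted): the plain cost sum.
        System.out.println(discountedSum(costs, 1.0)); // 3.0
        // γ=0.5: future costs are weighted down geometrically.
        System.out.println(discountedSum(costs, 0.5)); // 1.75
    }
}
```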
public Function getHeuristic()

Get the heuristic function used.

Specified by:
    getHeuristic in interface HeuristicAlgorithm
public void setHeuristic(Function heuristic)

Set the heuristic function to use. Note that the new heuristic function will only apply to unknown future states for bootstrapping. States that have already been estimated with the old heuristic function will not be updated. Nevertheless, it is always safe to set the heuristic function immediately before a call to MarkovDecisionProcess.plan().

Specified by:
    setHeuristic in interface HeuristicAlgorithm
Parameters:
    heuristic - the heuristic cost function h:S→R estimating h*. h will be embedded in the evaluation function f.

protected MutableFunction createMap()

Create a mapping. Override to implement a lookup table other than hash maps, e.g. neural networks. However, beware of implicit function approximation and generalization techniques for U, which might disturb the convergence of RTDP.
public Function getEvaluation()

f(s) = h(s).

Specified by:
    getEvaluation in interface EvaluativeAlgorithm
    getEvaluation in interface HeuristicAlgorithm
protected Pair maximumExpectedUtility(BinaryFunction Q, java.lang.Object state)

Calculate the maximum expected utility (MEU) action.

Parameters:
    Q - the action-value cost function Q:S×A(s)→R evaluating the expected utility of actions in states.
    state - the state s∈S in which to take an action.
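A minimal standalone sketch of MEU action selection (not the Orbital implementation; `meu` and the use of plain maps are illustrative assumptions): since Q here is a cost function, the action of maximum expected utility is the one of minimum expected cost, returned together with its value as a pair.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class MeuDemo {
    // Pick the action a minimizing Q(s,a), assuming cost semantics,
    // and return the (action, value) pair. qOfState maps each
    // action applicable in the current state s to Q(s,a).
    static Map.Entry<String, Double> meu(Map<String, Double> qOfState) {
        Map.Entry<String, Double> best = null;
        for (Map.Entry<String, Double> e : qOfState.entrySet())
            if (best == null || e.getValue() < best.getValue())
                best = e;
        return new SimpleEntry<>(best.getKey(), best.getValue());
    }

    public static void main(String[] args) {
        Map<String, Double> q = Map.of("left", 2.5, "right", 1.0, "stay", 3.0);
        Map.Entry<String, Double> a = meu(q);
        System.out.println(a.getKey() + " " + a.getValue()); // right 1.0
    }
}
```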
protected BinaryFunction getActionValue(Function U)

Get the action-value cost function of an action and state.

Parameters:
    U - the evaluation function U:S→R mapping states to the expected cost sum.
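The action-value function derived from U can be sketched as a one-step Bellman backup (an assumed standard form, not Orbital's code; `actionValue` and the map-based model are hypothetical): Q(s,a) = c(s,a) + γ · Σ_{s'} P(s'|s,a) · U(s').

```java
import java.util.Map;

public class ActionValueDemo {
    // One-step lookahead: immediate cost plus the discounted
    // expectation of U over successor states.
    static double actionValue(double cost, double gamma,
                              Map<String, Double> transition, // s' -> P(s'|s,a)
                              Map<String, Double> u) {        // s' -> U(s')
        double expected = 0.0;
        for (Map.Entry<String, Double> e : transition.entrySet())
            expected += e.getValue() * u.get(e.getKey());
        return cost + gamma * expected;
    }

    public static void main(String[] args) {
        Map<String, Double> p = Map.of("s1", 0.8, "s2", 0.2);
        Map<String, Double> u = Map.of("s1", 10.0, "s2", 0.0);
        // Q = 1.0 + 0.9 * (0.8*10.0 + 0.2*0.0) = 8.2
        System.out.println(actionValue(1.0, 0.9, p, u)); // 8.2
    }
}
```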
protected Function getGreedyPolicy(BinaryFunction Q)

Get a greedy policy with respect to an action-value cost function Q.

Parameters:
    Q - an action-value cost function Q:S×A(s)→R.
See Also:
    Greedy, getActionValue(Function)
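A greedy policy with respect to a cost-valued Q can be sketched as π(s) = argmin_a Q(s,a). This standalone illustration (not the Orbital implementation; `greedyPolicy` and the nested-map representation of Q are assumptions) builds the policy as a state-to-action map.

```java
import java.util.HashMap;
import java.util.Map;

public class GreedyPolicyDemo {
    // π(s) = argmin_a Q(s,a), assuming Q has cost semantics.
    // q maps each state s to a map from actions a to Q(s,a).
    static Map<String, String> greedyPolicy(Map<String, Map<String, Double>> q) {
        Map<String, String> policy = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> s : q.entrySet()) {
            String bestA = null;
            double bestV = Double.POSITIVE_INFINITY;
            for (Map.Entry<String, Double> a : s.getValue().entrySet())
                if (a.getValue() < bestV) { bestV = a.getValue(); bestA = a.getKey(); }
            policy.put(s.getKey(), bestA);
        }
        return policy;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> q = Map.of(
            "s0", Map.of("a", 2.0, "b", 1.0),
            "s1", Map.of("a", 0.5, "b", 4.0));
        System.out.println(greedyPolicy(q)); // s0 -> b, s1 -> a
    }
}
```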
Orbital library 1.3.0: 11 Apr 2009