orbital.algorithm.template
Interface MarkovDecisionProblem

Type Parameters:
A - the type of transition actions.
S - the type of transition states.
M - the class representing transitions.

public interface MarkovDecisionProblem<A,S,M>
extends TransitionModel<A,S,M>
Hook interface for the MarkovDecisionProcess algorithm. Objects implementing this interface represent a Markov decision process model.
A Markov decision problem (MDP), the mathematical model behind a Markov decision process, is a special kind of sequential decision problem. It is a combinatorial optimization problem as long as the state space is discrete. It is characterized by the following components (made concrete in the sketch after the list):
transition model
P:S×A(s)→[0,1] with a transition relation; P(s′|s,a) is the probability of reaching the state s′∈S when taking the action a∈A(s) in the state s∈S. The transition relation satisfies the Markov property for states, i.e. the next state depends only on the current state and action, not on the earlier history:
P(sₜ₊₁ | sₜ,aₜ, sₜ₋₁,aₜ₋₁, …, s₀,a₀) = P(sₜ₊₁ | sₜ,aₜ)
action costs
c(s,a)>0 for taking the action a∈A(s) in the state s∈S.
If the immediate costs are bounded random numbers depending on state and action, then c(s,a) denotes the expected immediate cost instead. Also note that c(s,a)>0 is not a strict requirement: there are several criteria that ensure convergence even for c(s,a)∈[0,∞). However, those criteria impose even more restrictive constraints, such as finite state sets (see Barto et al., 1995). These state- and action-dependent costs ensure that the utility function is separable.
goal states
G⊆S. More often, this is implied by an accessible description of the goal states.
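To make these components concrete, here is a minimal, self-contained Java sketch of a toy two-state MDP (a machine that can be repaired). It deliberately does not implement the Orbital interfaces; the class, method, and action names are illustrative assumptions only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Toy MDP sketch (not the Orbital API; all names are illustrative):
 * a machine that is either BROKEN or WORKING and can be repaired.
 */
class ToyMDP {
    enum State { BROKEN, WORKING }

    /** A(s): the set of actions applicable in state s. */
    Set<String> actions(State s) {
        return s == State.BROKEN
            ? new HashSet<String>(Arrays.asList("repair", "wait"))
            : Collections.<String>emptySet();
    }

    /** P(s'|s,a): the probability of reaching s' when taking a in s. */
    double transition(State s, String a, State sp) {
        if (s == State.BROKEN && a.equals("repair"))
            return sp == State.WORKING ? 0.9 : 0.1;   // repair succeeds with probability 0.9
        return sp == s ? 1.0 : 0.0;                   // "wait" leaves the state unchanged
    }

    /** c(s,a) > 0: the immediate cost of taking a in s. */
    double cost(State s, String a) {
        return a.equals("repair") ? 5.0 : 1.0;
    }

    /** Goal test: membership in G = {WORKING}. */
    boolean isSolution(State s) {
        return s == State.WORKING;
    }
}
```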
A partially observable Markov decision problem (POMDP) is a kind of Markov decision problem with incomplete information; POMDPs extend MDPs. POMDPs do not rely on an accessible environment but work in an inaccessible environment, where the percepts may not suffice to determine the state, so we have incomplete information about the state. Utilitarian ideas have been around in artificial intelligence for a long time. For ideas of a non-observing agent see (Lem 1971).

For POMDPs, the Markov property does not hold for percepts, but only for states. In the absence of feedback, solving a POMDP amounts to a deterministic search in belief state space. In the presence of feedback, a POMDP is an MDP over belief space (Åström, 1965); a single belief state is then a probability distribution over physical states. In fact, solving a POMDP can be quite complex.
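Since a belief state is a probability distribution over physical states, the "MDP over belief space" view rests on the standard Bayes belief update: after taking action a and receiving percept o, b′(s′) ∝ P(o|s′)·Σ_s P(s′|s,a)·b(s). A minimal sketch follows, with illustrative names and plain arrays standing in for the transition and observation models (not part of the Orbital API):

```java
/** Standard POMDP belief update b'(s') ∝ P(o|s')·Σ_s P(s'|s,a)·b(s); illustrative sketch. */
class BeliefUpdate {
    /**
     * @param b current belief; b[s] is the probability of being in physical state s
     * @param T transition model for the action taken; T[s][sp] = P(sp|s,a)
     * @param O observation model for the percept received; O[sp] = P(o|sp)
     * @return the updated, renormalized belief b'
     */
    static double[] update(double[] b, double[][] T, double[] O) {
        int n = b.length;
        double[] bp = new double[n];
        double norm = 0.0;
        for (int sp = 0; sp < n; sp++) {
            double predicted = 0.0;               // prediction: Σ_s P(sp|s,a)·b(s)
            for (int s = 0; s < n; s++)
                predicted += T[s][sp] * b[s];
            bp[sp] = O[sp] * predicted;           // correction: weight by P(o|sp)
            norm += bp[sp];
        }
        for (int sp = 0; sp < n; sp++)
            bp[sp] /= norm;                       // renormalize to a probability distribution
        return bp;
    }
}
```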
See Also:
MarkovDecisionProcess, "Hector Geffner. Modelling and Problem Solving.", "D. P. Bertsekas. Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, Englewood Cliffs, NJ, 1989.", "S. Ross. Introduction to Stochastic Dynamic Programming. Academic Press, New York, 1983.", "A. Barto, S. Bradtke, and S. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72:81-138, 1995.", "Stanisław Lem. Doktor Diagoras. In: Sterntagebücher, suhrkamp p491, 1978 (original edition 1971)."

Nested Class Summary

| Modifier and Type | Class | Description |
|---|---|---|
| static class | MarkovDecisionProblem.DefaultTransition | Default implementation of transition options for Markov decision processes. |
| static interface | MarkovDecisionProblem.Transition | Represents a transition option during a Markov decision process. |
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| boolean | isSolution(java.lang.Object state) | Check whether the given state is a goal state (a valid solution to the problem). |
Methods inherited from interface orbital.algorithm.template.TransitionModel
actions, states, transition
Method Detail

isSolution

boolean isSolution(java.lang.Object state)

Check whether the given state is a goal state (a valid solution to the problem).

Optional variation: Search algorithms generally seek out a single solution. In order to find all solutions to a problem, simply let this method store solutions and return false instead, until enough solutions have been found (a sketch of this variation follows the parameter list). However, to ensure termination, expanding solution nodes should eventually result in an empty list.

Parameters:
state - the state s∈S to check for being a goal state.
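A sketch of the variation described above, assuming a hypothetical implementing class; isGoal, solutions, and wanted are illustrative names, not part of this interface:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the "find all solutions" variation of isSolution. */
class CollectingProblem {
    private final List<Object> solutions = new ArrayList<Object>();
    private final int wanted = 3;                 // illustrative: stop after three solutions

    public boolean isSolution(Object state) {
        if (isGoal(state)) {                      // isGoal: the problem's real goal test (assumed helper)
            solutions.add(state);                 // remember this solution ...
            return solutions.size() >= wanted;    // ... and keep searching until enough are found
        }
        return false;
    }

    private boolean isGoal(Object state) {
        return false;                             // placeholder goal test for this sketch
    }
}
```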
Orbital library 1.3.0: 11 Apr 2009