orbital.algorithm.template
Interface MarkovDecisionProblem

Type Parameters:
A - the type of transition actions.
S - the type of transition states.
M - the class representing transitions.

public interface MarkovDecisionProblem<A,S,M>
extends TransitionModel<A,S,M>
Hook interface for the MarkovDecisionProcess algorithm. Objects implementing this interface represent a Markov decision process model.
A Markov decision problem (MDP), the mathematical model behind a Markov decision process, is a special kind of sequential decision problem. It is a combinatorial optimization problem as long as the state space is discrete. It is characterized by the following components (made concrete in the sketch after the list):
transition model
P:S×A(s)→[0,1] with a transition relation; P(s′|s,a) is the probability of reaching the state s′∈S when taking the action a∈A(s) in the state s∈S. The transition relation satisfies the Markov property for states, i.e. the next state depends only on the current state and action, not on the earlier history:
P(sₜ₊₁ | sₜ,aₜ, sₜ₋₁,aₜ₋₁, …, s₀,a₀) = P(sₜ₊₁ | sₜ,aₜ)
action costs
c(s,a)>0 for taking the action a∈A(s) in the state s∈S.
If the immediate costs are bounded random numbers depending on state and action, then c(s,a) denotes the expected immediate cost instead. Also note that c(s,a)>0 is not a strict requirement: there are several criteria that ensure convergence even for c(s,a)∈[0,∞). However, those criteria impose even more restrictive constraints, such as finite state sets (see Barto et al., 1995). These state- and action-dependent costs ensure that the utility function is separable.
goal states
G⊆S. More often, this is implied by an accessible description of the goal states.
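To make these components concrete, here is a minimal, self-contained Java sketch of a toy two-state MDP (a machine that can be repaired). It deliberately does not implement the Orbital interfaces; the class, method, and action names are illustrative assumptions only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Toy MDP sketch (not the Orbital API; all names are illustrative):
 * a machine that is either BROKEN or WORKING and can be repaired.
 */
class ToyMDP {
    enum State { BROKEN, WORKING }

    /** A(s): the set of actions applicable in state s. */
    Set<String> actions(State s) {
        return s == State.BROKEN
            ? new HashSet<String>(Arrays.asList("repair", "wait"))
            : Collections.<String>emptySet();
    }

    /** P(s'|s,a): the probability of reaching s' when taking a in s. */
    double transition(State s, String a, State sp) {
        if (s == State.BROKEN && a.equals("repair"))
            return sp == State.WORKING ? 0.9 : 0.1;   // repair succeeds with probability 0.9
        return sp == s ? 1.0 : 0.0;                   // "wait" leaves the state unchanged
    }

    /** c(s,a) > 0: the immediate cost of taking a in s. */
    double cost(State s, String a) {
        return a.equals("repair") ? 5.0 : 1.0;
    }

    /** Goal test: membership in G = {WORKING}. */
    boolean isSolution(State s) {
        return s == State.WORKING;
    }
}
```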
A partially observable Markov decision problem (POMDP) is a kind of Markov decision problem with incomplete information; POMDPs extend MDPs. POMDPs do not rely on an accessible environment but work in an inaccessible environment, where the percepts may not suffice to determine the state, so we have incomplete information about the state. Utilitarian ideas have been around in artificial intelligence for a long time. For ideas of a non-observing agent see (Lem 1971).

For POMDPs, the Markov property does not hold for percepts, but only for states. In the absence of feedback, solving a POMDP amounts to a deterministic search in belief state space. In the presence of feedback, a POMDP is an MDP over belief space (Åström, 1965); a single belief state is then a probability distribution over physical states. In fact, solving a POMDP can be quite complex.
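Since a belief state is a probability distribution over physical states, the "MDP over belief space" view rests on the standard Bayes belief update: after taking action a and receiving percept o, b′(s′) ∝ P(o|s′)·Σ_s P(s′|s,a)·b(s). A minimal sketch follows, with illustrative names and plain arrays standing in for the transition and observation models (not part of the Orbital API):

```java
/** Standard POMDP belief update b'(s') ∝ P(o|s')·Σ_s P(s'|s,a)·b(s); illustrative sketch. */
class BeliefUpdate {
    /**
     * @param b current belief; b[s] is the probability of being in physical state s
     * @param T transition model for the action taken; T[s][sp] = P(sp|s,a)
     * @param O observation model for the percept received; O[sp] = P(o|sp)
     * @return the updated, renormalized belief b'
     */
    static double[] update(double[] b, double[][] T, double[] O) {
        int n = b.length;
        double[] bp = new double[n];
        double norm = 0.0;
        for (int sp = 0; sp < n; sp++) {
            double predicted = 0.0;               // prediction: Σ_s P(sp|s,a)·b(s)
            for (int s = 0; s < n; s++)
                predicted += T[s][sp] * b[s];
            bp[sp] = O[sp] * predicted;           // correction: weight by P(o|sp)
            norm += bp[sp];
        }
        for (int sp = 0; sp < n; sp++)
            bp[sp] /= norm;                       // renormalize to a probability distribution
        return bp;
    }
}
```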
See Also:
MarkovDecisionProcess, "Hector Geffner. Modelling and Problem Solving.", "D. P. Bertsekas. Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, Englewood Cliffs, NJ, 1989.", "S. Ross. Introduction to Stochastic Dynamic Programming. Academic Press, New York, 1983.", "A. Barto, S. Bradtke, and S. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72:81-138, 1995.", "Stanisław Lem. Doktor Diagoras. In: Sterntagebücher, suhrkamp p491, 1978 (original edition 1971)."

Nested Class Summary

| Modifier and Type | Class | Description |
|---|---|---|
| static class | MarkovDecisionProblem.DefaultTransition | Default implementation of transition options for Markov decision processes. |
| static interface | MarkovDecisionProblem.Transition | Represents a transition option during a Markov decision process. |
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| boolean | isSolution(java.lang.Object state) | Check whether the given state is a goal state (a valid solution to the problem). |
Methods inherited from interface orbital.algorithm.template.TransitionModel
actions, states, transition
Method Detail

isSolution

boolean isSolution(java.lang.Object state)

Check whether the given state is a goal state (a valid solution to the problem).

Optional variation: Search algorithms generally seek out a single solution. In order to find all solutions to a problem, simply let this method store solutions and return false instead, until enough solutions have been found (a sketch of this variation follows the parameter list). However, to ensure termination, expanding solution nodes should eventually result in an empty list.

Parameters:
state - the state s∈S to check for being a goal state.
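A sketch of the variation described above, assuming a hypothetical implementing class; isGoal, solutions, and wanted are illustrative names, not part of this interface:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the "find all solutions" variation of isSolution. */
class CollectingProblem {
    private final List<Object> solutions = new ArrayList<Object>();
    private final int wanted = 3;                 // illustrative: stop after three solutions

    public boolean isSolution(Object state) {
        if (isGoal(state)) {                      // isGoal: the problem's real goal test (assumed helper)
            solutions.add(state);                 // remember this solution ...
            return solutions.size() >= wanted;    // ... and keep searching until enough are found
        }
        return false;
    }

    private boolean isGoal(Object state) {
        return false;                             // placeholder goal test for this sketch
    }
}
```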
Orbital library 1.3.0: 11 Apr 2009