As we saw in the post Probabilistic Robotic Reading Note - Partially Observable Markov Decision Processes, the complexity of computing the exact value function is exponential. For any real-world problem, the state space and the belief space are usually large enough to make a naive POMDP implementation impractical.
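For reference, the source of this blow-up can be written down directly. If the exact value function at horizon $t-1$ is represented by a set $\Gamma_{t-1}$ of linear functions over the belief simplex, then one exact backup over the set of controls $U$ and the set of measurements $Z$ produces, before pruning,

$$|\Gamma_t| = |U| \cdot |\Gamma_{t-1}|^{|Z|},$$

so the number of linear pieces grows doubly exponentially with the planning horizon.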
Chapter 16 of the book describes three approximate POMDP techniques:
- QMDP
- AMDP
- MC-POMDP
In this post, we will just copy the algorithm boxes from the book and add some notes.
Some general notes:
- QMDP is much simpler than AMDP and MC-POMDP because it does not take measurement uncertainty into account: it effectively assumes the state becomes fully observable after the next control is executed.
- AMDP and MC-POMDP are similar in the sense that both of them learn the value function from simulation.
The effect of uncertainty is best illustrated by the following figure copied from the book. In a non-probabilistic world (images on the left), where the state is known and there is no uncertainty, the planner obviously prefers open space because the robot is less likely to hit anything and crossing an open space usually produces a shorter path. However, if the state cannot be known with certainty (images on the right), the robot will have a hard time localizing itself in an open space. It follows that landmarks, including obstructions, in the environment become more valuable because they provide more information.

QMDP Algorithm

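The idea behind QMDP is to solve the underlying fully observable MDP once, offline, and then at run time weight the resulting Q-values by the current belief. Below is a minimal sketch of that idea, assuming a discrete problem with a transition tensor `T[a, s, s']`, a reward matrix `R[s, a]`, and a belief vector over states; these names, shapes, and the discount/iteration settings are my own choices, not the book's notation.

```python
import numpy as np

def qmdp_q_values(T, R, gamma=0.95, n_iterations=200):
    """Value iteration on the fully observable MDP.

    T[a, s, s2] = p(s2 | s, a), R[s, a] = expected immediate reward.
    Returns Q with Q[s, a].
    """
    V = np.zeros(R.shape[0])
    for _ in range(n_iterations):
        # Q(s, a) = R(s, a) + gamma * sum_s2 p(s2 | s, a) * V(s2)
        Q = R + gamma * np.einsum('asn,n->sa', T, V)
        V = Q.max(axis=1)
    return Q

def qmdp_action(belief, Q):
    """QMDP action selection: maximize the belief-weighted Q-value."""
    return int(np.argmax(belief @ Q))

# Tiny usage example: 2 states, 2 actions.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],    # transitions under action 0
              [[0.5, 0.5], [0.5, 0.5]]])   # transitions under action 1
R = np.array([[1.0, 0.0],                  # rewards in state 0
              [0.0, 2.0]])                 # rewards in state 1
Q = qmdp_q_values(T, R)
print(qmdp_action(np.array([0.3, 0.7]), Q))
```

Once the offline value iteration is done, action selection is a single dot product per step, which is why QMDP scales so well; the price is that it never picks actions purely to gather information.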
AMDP Algorithm

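AMDP (the augmented MDP) compresses the belief into a low-dimensional statistic, typically the most likely state together with the entropy of the belief, and then learns a value function over this augmented state space from simulated belief trajectories. The snippet below only sketches the belief-to-augmented-state mapping; the normalization and the number of entropy bins are placeholder choices of mine, not values from the book.

```python
import numpy as np

def belief_entropy(b):
    """Shannon entropy of a discrete belief vector b, where b[i] = p(state i)."""
    p = b[b > 0.0]
    return float(-np.sum(p * np.log(p)))

def augmented_state(b, n_entropy_bins=8):
    """AMDP state: (most likely state, discretized belief entropy).

    The entropy is normalized by its maximum (the uniform belief) so the
    bin index stays within [0, n_entropy_bins - 1].
    """
    h_max = np.log(len(b))
    h = belief_entropy(b) / h_max if h_max > 0.0 else 0.0
    h_bin = min(int(h * n_entropy_bins), n_entropy_bins - 1)
    return int(np.argmax(b)), h_bin

# A peaked belief and a spread-out belief land in different augmented states
# even though their most likely state is the same.
print(augmented_state(np.array([0.97, 0.01, 0.01, 0.01])))  # low-entropy bin
print(augmented_state(np.array([0.40, 0.20, 0.20, 0.20])))  # high-entropy bin
```

The rest of AMDP then looks like an ordinary MDP: estimate transition and reward statistics over these augmented states from simulated runs, and run value iteration on them.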
MC-POMDP Algorithm

Note that in MC-POMDP a belief is a collection of sampled states (e.g. particles), and the ultimate goal of the algorithm is to assign scores to those beliefs (see the sketch after these notes), so we have
- K beliefs and K scores in the value function representation.
- Each belief consists of M particles.
- Technically, a particle is a snapshot of the map/environment.
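To make the particle representation concrete, here is a rough sketch of how MC-POMDP can be organized: beliefs are arrays of M particles, the value function is a list of K (reference belief, score) pairs, and scores are updated with Monte Carlo backups from simulated steps. The nearest-neighbour distance, the novelty threshold, the learning rate, and the `simulate_step` / `particle_filter_update` functions below are all placeholders of mine, not the book's definitions.

```python
import numpy as np

class ParticleValueFunction:
    """K reference beliefs (each an (M, state_dim) array of particles) with K scores.

    Lookup is a crude nearest neighbour on particle means; the book interpolates
    between nearby beliefs in a more principled way.
    """
    def __init__(self, novelty_threshold=0.5, learning_rate=0.1):
        self.beliefs = []                       # K stored particle sets
        self.values = []                        # K scores, one per stored belief
        self.novelty_threshold = novelty_threshold
        self.learning_rate = learning_rate

    def _distance(self, b1, b2):
        return float(np.linalg.norm(b1.mean(axis=0) - b2.mean(axis=0)))

    def value(self, belief):
        if not self.beliefs:
            return 0.0
        i = int(np.argmin([self._distance(belief, b) for b in self.beliefs]))
        return self.values[i]

    def update(self, belief, target):
        """Nudge the nearest stored score toward the Monte Carlo backup target,
        or store the belief as a new reference point if nothing is close."""
        if self.beliefs:
            dists = [self._distance(belief, b) for b in self.beliefs]
            i = int(np.argmin(dists))
            if dists[i] < self.novelty_threshold:
                self.values[i] += self.learning_rate * (target - self.values[i])
                return
        self.beliefs.append(belief.copy())
        self.values.append(target)

def mc_backup(belief, actions, simulate_step, particle_filter_update, V, gamma=0.95):
    """One Monte Carlo backup for a particle belief.

    For each action: sample a particle, simulate one step to get a reward and a
    measurement, update the belief with a particle filter, and look up the value
    of the successor belief.
    """
    backed_up = []
    for u in actions:
        x = belief[np.random.randint(len(belief))]
        reward, measurement = simulate_step(x, u)
        next_belief = particle_filter_update(belief, u, measurement)
        backed_up.append(reward + gamma * V.value(next_belief))
    return max(backed_up)
```

In the learning loop, one would repeatedly draw beliefs from simulated trajectories and call `V.update(belief, mc_backup(...))` until the scores stop changing.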