The objective of an exploration task is to gain as much information about the environment as possible. Because the state of the environment is unknown and the number of variables in an unknown environment is huge, solving the full POMDP is not practical. The techniques discussed in the exploration chapter are therefore greedy. More specifically, instead of computing the best control over a long horizon, the exploration algorithm looks only one step ahead and selects the control with the best immediate payoff.
The basic idea is simple: move the robot to a location where it can gain the most information about the environment. Every control action also has a cost, so the reward function in an exploration problem has two parts: the expected information gain and the cost of executing the control.
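One way to write such a two-part reward is sketched below. The notation is an assumption of this note: \(b\) is the current belief, \(I_b(u)\) is the expected information gain of control \(u\), \(c_b(u)\) is its expected cost, and \(\alpha\) is a weight that trades information against cost.

\[
r(b, u) = \alpha \, I_b(u) - c_b(u)
\]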
Note that the control \(u\) here is a high-level concept: it represents an action such as moving the robot to a target location, rather than a detailed command such as turning left or setting the velocity to 1 m/s.
Information gain can be characterized by the change in entropy. In a completely unknown state (e.g. a uniform distribution), the entropy reaches its maximum. As the robot explores and gains more information about the environment, the entropy decreases. Hence, the information gain of a control can be measured as the expected reduction in the entropy of the belief.
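The entropy argument above can be checked numerically. The sketch below is only an illustration: the four-state belief and the posterior values are made up, and it compares a single prior with a single posterior rather than taking the expectation over all possible measurements.

```python
import math


def entropy(belief):
    """Shannon entropy (in bits) of a discrete belief given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in belief if p > 0.0)


# Hypothetical belief over four candidate states (values are illustrative only).
prior = [0.25, 0.25, 0.25, 0.25]   # completely unknown: uniform, maximum entropy
posterior = [0.7, 0.1, 0.1, 0.1]   # after one informative measurement

# Information gain for this one measurement: the drop in entropy.
info_gain = entropy(prior) - entropy(posterior)
print(f"prior entropy     = {entropy(prior):.3f} bits")
print(f"posterior entropy = {entropy(posterior):.3f} bits")
print(f"information gain  = {info_gain:.3f} bits")
```

The uniform prior has the largest entropy any four-state belief can have (2 bits), and any belief that concentrates probability on fewer states has less.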
The greedy technique is simple: choose the (high-level) action with the largest reward.
Here is the greedy algorithm:
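A minimal sketch of one greedy step follows, assuming the reward is a weighted information gain minus a cost. The function name `greedy_exploration_step`, the weight `alpha`, and the example gains and costs are hypothetical; in a real system the expected information gain would be estimated from the belief and a sensor model.

```python
def greedy_exploration_step(candidates, expected_info_gain, cost, alpha=1.0):
    """Return the candidate control with the largest one-step reward.

    Assumed reward: alpha * expected_info_gain(u) - cost(u).
    """
    best_u, best_reward = None, float("-inf")
    for u in candidates:
        reward = alpha * expected_info_gain(u) - cost(u)
        if reward > best_reward:
            best_u, best_reward = u, reward
    return best_u, best_reward


# Hypothetical target locations with made-up information gains and travel costs.
gains = {"A": 2.0, "B": 3.5, "C": 3.0}
costs = {"A": 0.5, "B": 2.5, "C": 1.0}

best, reward = greedy_exploration_step(gains.keys(), gains.get, costs.get)
print(best, reward)
```

Location B offers the most information, but its travel cost is high, so the one-step reward favors C. After moving and taking a measurement, the belief is updated and the same step is repeated.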
The objective of active localization is to obtain more information about the robot's pose. In this problem, the map is known. The main idea of active localization is to solve the problem in the robot's local coordinate frame.
Here are the main steps in the active localization algorithm:
- Take the initial belief distribution and express it in the robot's local coordinate frame.
- For example, suppose the initial belief distribution indicates that the robot is most likely at one of six locations; we then have six maps, each constructed in the robot's local coordinate frame.
Note that in an active localization problem, a control action means moving the robot to a target location and taking a measurement.
Exploration for learning occupancy grid maps
The main idea is to move the robot to the boundary of the known map so that it can gain more information about the unknown part.
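The boundary of the known map can be found with a simple frontier search. The sketch below makes several assumptions for illustration: the occupancy grid has already been thresholded into three labels (`UNKNOWN`, `FREE`, `OCCUPIED`), a boundary cell is a free cell with at least one unknown 4-neighbor, and the function name `find_frontiers` is hypothetical.

```python
UNKNOWN, FREE, OCCUPIED = -1, 0, 1


def find_frontiers(grid):
    """Return (row, col) of every free cell that has at least one unknown 4-neighbor."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break  # one unknown neighbor is enough
    return frontiers


# Small made-up map: the right-hand column is partly unexplored.
grid = [
    [FREE, FREE,     UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE,     FREE],
]
print(find_frontiers(grid))
```

Each returned cell is a candidate target location; a greedy explorer would then score these candidates by expected information gain and travel cost.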