The objective of an exploration task is to gain information of the environment as much as possible. As the state of the environment is unknown and number of variables in a unknown environment is huge, applying the POMDP is not practicle. The techniques discussed in the exploration chapter are greedy techniques. More specifically, instead of calculating the best control for a long horizen, the exploration algorithm only looks one step ahead and selects the control that has the best immediate impact.

The basic idea is simple: we should move the robot to a location where it can gain most information about the environment. Of course, all control actions are associated with some cost and it follows that in an exploration problem, the reward function has two parts

Note that here the control \(u\) is a high level concept. It represents high level concept such as moving the robot to a target location instead of detailed control such as making the robot turn left or setting the velocity to 1m/s.

Information gain can be characterized by the change in entropy. In a completely unkonwn state (e.g. uniform distribution), the entropy reaches its maximum. After the exploration, as the robot gains more information of the environment, the entropy will decrease. Hence,

#### Greedy Technique

The greedy technique is simple, we just choose the (high level) action that has the largest reward:

Here is the greedy algorithm:

#### Active Localization

The objective of active localization is to obtain more information about the robot's pose. In this problem, the map is known. The main idea of active localization is to solve the problem in the local coordinate of the robot.

Here are the main steps in the active localization algorithm

- Use initial belief distribution and translate it into robot's local coordinate.
- For exmaple, suppose the initial belief distribution indicates that the robot is most likely at six locations, then we will have six maps constructed using the robots' local coordinate.
- For each local map, use value iteration algorithm to construct a cost map. This map represents the cost of moving the robot to a different cell. (For example, the cost may depend on the length of the shortest path and the probability of running into an obstruction.)
- Calculate information gain maps. We can apply the MC exploration algorithm presented in the previous section to build the information gain maps.
- Construct the reward map by combining the cost map and information gain map and superposing all the local maps.
- Select the optimal control action.

Note that in an active locazliation problem, a control action means moving the robot to a target location and make a measurement.

#### Exploration for learning occupancy grid map

The main idea is to move the robot to the boundary of the known map so that it can gain more inforamtion of the unknown part.

----- END -----

©2019 - 2021 all rights reserved

## Comments