In this post, we will talk about discrete planning algorithms. We first present different formulations of discrete planning problems and then develop value iteration algorithms for them.
Outline
- Problem Formulation
- General Forward Search
- Discrete Optimal Planning
- Value Iteration for K-Step Plan
- Backward Iteration
- Forward Iteration
- Optimal Plans of Unspecified Length
Problem Formulation
First, let's present the common components in almost every planning problem.
- State Space X: This is a collection of all possible states of the environment.
- Action Space U: This is a collection of all actions that the agent can take.
- Transition
- State Transition Function: \( f(x, u) \)
- State Transition Equation: \( x_{k+1} = f(x_k, u_k) \)
- Start State: The state the agent starts from. This specifies the initial condition of the problem.
- Goal States: The goal of the planning problem.
It is important to note that a planning problem is usually specified without explicitly representing the entire state transition graph. It is revealed incrementally in the planning process.
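As a concrete sketch, the components above can be written down for a tiny two-state example. Everything below (the states, action names, and transitions) is made up for illustration and is not from any particular problem:

```python
# State space X, action space U, and transition function f(x, u) for a toy problem.
X = {"a", "b", "goal"}                 # state space X
U = {"right", "stay"}                  # action space U

def f(x, u):                           # state transition function f(x, u)
    transitions = {
        ("a", "right"): "b",
        ("b", "right"): "goal",
    }
    # "stay" (or any undefined move) leaves the state unchanged
    return transitions.get((x, u), x)

x_start = "a"                          # start state
X_goal = {"goal"}                      # goal states

# Applying the state transition equation x_{k+1} = f(x_k, u_k) twice:
x1 = f(x_start, "right")
x2 = f(x1, "right")
print(x2 in X_goal)  # -> True
```

Note that `f` only stores the transitions it needs; the full transition graph is never built explicitly, matching the point above.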
The feasibility problem is to figure out whether there exists a plan that can move the agent from the start state to a goal state. This is essentially a graph search problem: states in a planning problem are nodes in a graph, and actions define the edges. To some extent, planning algorithms are search algorithms. A search algorithm must be systematic. There are two versions:
- strong version:
- The search algorithm must reach every state in the state space.
- The search algorithm must terminate.
- weak version: (when space is infinite)
- If a solution exists, the search algorithm must report it in finite time.
- If a solution does not exist, it is acceptable for the algorithm to search forever.
General Forward Search
There are many famous search algorithms. For example:
- BFS (Breadth-first search)
- DFS (Depth-first search)
- Dijkstra's algorithm
- A* algorithm
They all have a similar structure.
```python
def forwardSearch(x_start, targetStates):
    Q.add(x_start)            # Q is the frontier; its retrieval policy determines the algorithm
    visited.add(x_start)
    while not Q.isEmpty():
        x = Q.get()           # retrieve the next state to expand
        if x in targetStates:
            return SUCCESS
        for u in actionsInState(x):
            x_next = f(x, u)
            if x_next not in visited:
                visited.add(x_next)
                Q.add(x_next)
            else:
                processVisitedState(x_next)
    return FAILURE
```
Forward search algorithms differ in how Q.get() retrieves the next state: a FIFO queue yields BFS, a stack yields DFS, and a priority queue ordered by accumulated cost (optionally plus a heuristic) yields Dijkstra's algorithm or A*.
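The template can be made runnable by choosing a concrete frontier. Below is a minimal BFS instance using a FIFO queue; the 3x3 grid world, function names, and move set are all made-up illustrations, not part of the original formulation:

```python
from collections import deque

def forward_search(x_start, goal_states, actions_in_state, f):
    """General forward search; a FIFO frontier makes it breadth-first."""
    Q = deque([x_start])
    visited = {x_start}
    while Q:
        x = Q.popleft()                  # FIFO retrieval -> BFS
        if x in goal_states:
            return True
        for u in actions_in_state(x):
            x_next = f(x, u)
            if x_next not in visited:
                visited.add(x_next)
                Q.append(x_next)
    return False

# Made-up example: a 3x3 grid; states are (row, col), actions are the 4 moves.
def actions_in_state(x):
    return [(0, 1), (0, -1), (1, 0), (-1, 0)]

def f(x, u):
    r, c = x[0] + u[0], x[1] + u[1]
    return (r, c) if 0 <= r < 3 and 0 <= c < 3 else x   # off-grid moves do nothing

print(forward_search((0, 0), {(2, 2)}, actions_in_state, f))  # -> True
```

Swapping `deque.popleft()` for `list.pop()` (a stack) would turn the same template into DFS without touching the rest of the loop.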
Discrete Optimal Planning
Now we can present the discrete optimal planning problem. The formulation requires additional components:
- Stage Index: This encodes the "time" step in the planning solution.
- Cost function
- The cost may need to satisfy a monotonicity property.
- Navigation Function
- This is a function that will guide the agent to make decisions.
- The navigation function can be used to encode the planning solution.
- Examples:
- Gradient function in optimization problems.
- Heuristic algorithms
- Termination action
- If an agent takes a termination action at any state, it will enter the terminated state in the next stage.
- If an agent is in the terminated state, it can only take termination action.
- Therefore, once an agent takes a termination action, it will be locked in the terminated state and the system becomes stationary.
- K-Step Plan: A K-step plan consists of \(K\) actions \((u_1, u_2, \ldots, u_K)\).
- Cost function:
- Taking actions can induce cost.
- Note that we do not require \(l(x_k, u_k)\) to be positive in this formulation.
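Putting these components together, the total cost of a K-step plan is commonly written as follows (the exact symbols here are an assumption, following standard textbook notation, since the components above do not fix them):

\[ L(\pi_K) = \sum_{k=1}^{K} l(x_k, u_k) + l_F(x_F) \]

where \(l_F(x_F)\) is a final-stage term, typically \(0\) if \(x_F\) is a goal state and \(\infty\) otherwise.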
Value Iteration for K-Step Plan
We can perform value iteration in two directions. They are symmetric.
Backward Iteration
It's called backward iteration because \(G_k\) is derived from the values at the next stage, \(G_{k+1}\).
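Concretely, the backward recurrence is \(G^*_k(x_k) = \min_{u_k} \{\, l(x_k, u_k) + G^*_{k+1}(f(x_k, u_k)) \,\}\), starting from the final stage. The sketch below illustrates this on a made-up four-state line world; all names and the example costs are assumptions for illustration:

```python
import math

def backward_value_iteration(states, actions, f, l, l_final, K):
    """G[k][x] is the optimal cost-to-go from state x at stage k."""
    G = {K + 1: {x: l_final(x) for x in states}}      # base case: final-stage cost
    for k in range(K, 0, -1):                         # sweep from stage K back to 1
        G[k] = {x: min(l(x, u) + G[k + 1][f(x, u)] for u in actions(x))
                for x in states}
    return G

# Made-up example: states 0..3 on a line; "right" moves one step toward 3.
states = [0, 1, 2, 3]
actions = lambda x: ["right", "stay"]
f = lambda x, u: min(x + 1, 3) if u == "right" else x
l = lambda x, u: 0 if x == 3 else 1                   # pay 1 per stage spent off the goal
l_final = lambda x: 0 if x == 3 else math.inf         # the plan must end in state 3

G = backward_value_iteration(states, actions, f, l, l_final, K=3)
print(G[1][0])  # -> 3: three unit-cost steps take state 0 to state 3
```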
Forward Iteration
It's called forward iteration because the value at the current stage, \(C_k\), is derived from the values at the previous stage, \(C_{k-1}\).
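Symmetrically, the forward recurrence minimizes over the state-action pairs that lead into \(x_k\) (again, the notation here is an assumption following standard conventions):

\[ C^*_k(x_k) = \min_{(x_{k-1},\, u_{k-1}) \,:\, f(x_{k-1},\, u_{k-1}) = x_k} \left\{ C^*_{k-1}(x_{k-1}) + l(x_{k-1}, u_{k-1}) \right\} \]

Here \(C^*_k\) is the optimal cost-to-come, whereas \(G^*_k\) in the backward case is the optimal cost-to-go.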
Optimal Plans of Unspecified Length
We can generalize K-Step plan to an unspecified length plan. Here is the idea: suppose we are doing a backward iteration. When we reach stage 1, there is no particular reason that we cannot continue the process. If an optimal solution exists, at some point, it will be discovered by the algorithm.
The question is: how do we know the algorithm will terminate? In general, if there are no negative cycles in the graph, the algorithm should terminate. If there are negative cycles, the problem itself is malformed, because we could follow a negative cycle forever and reduce the cost without bound.
Assuming there are no negative cycles, the optimal solution is discovered by the algorithm after a certain number of iterations. At this point, the system should be stationary: further iterations do not improve the cost. When the system is stationary, we should have the following equation:
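Presumably this is the stationary form of the backward recurrence, in the \(G_k\) notation used above (the symbols here are an assumption):

\[ G^*(x) = \min_{u \in U(x)} \left\{ l(x, u) + G^*(f(x, u)) \right\} \]

That is, applying one more backward iteration to \(G^*\) returns \(G^*\) itself, so the stage index can be dropped.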
©2019 - 2022 all rights reserved