Planning Algorithms - Discrete Planning

Subscribe Send me a message home page tags


In this post, we will talk about discrete planning algorithms. We first present different formulations of discrete planning problems and then present the value iteration algorithms.

Outline

Problem Formulation

First, let's present the common components in almost every planning problem.

It is important to note that a planning problem is usually specified without explicitly representing the entire state transition graph. It is revealed incrementally in the planning process.

The feasibility problem is to figure out if there exists a plan that can move the agent from start states to the goal states. This is identical to a graph search problem. States in a planning problem are nodes in a graph. To some extent, planning algorithms are search algorithms. A search algorithm must be systematic. There are two versions:

General Forward Search

There are many famous search algorithms. For example:

They all have a similar structure.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
def forwardSearch:
    Q.add(x_start)
    visited.add(x_start)

    while not Q.isEmpty():
        x = G.get()
        if x in targetStates:
            return SUCCESS

        for u in actionsInState(x):
            x_next = f(x, u)

            if not x_next in visited:
                visited.add(x_next)
                Q.add(x_next)
            else:
                processVisitedState(x_next)

    return FAILURE

Forward search algorithms differ at line 6 when it retrieves the next state.

Discrete Optimal Planning

Now we can present the discrete optimal planning problem. The formulation requires additional components:

$$ \Pi_{K} = (u_1, u_2, ..., u_K) $$
$$ L(\pi_{k}) = \sum_{k=1}^{k} l(x_k, u_k) + l_F(x_F) $$
$$ \begin{eqnarray} l_F(x_F) = \begin{cases} 0 & \text{if} \; x_F \in X_G \\ \infty & \text{if} \; x_F \notin X_G \end{cases} \end{eqnarray} $$

Value Iteration for K-Step Plan

We can perform valute iteration in two directions. They are symmetric.

Backward Iteration

It's called backward iteration because \(G_k\) is derived from the values in a future stage \(G_{k+1}\).

$$ G_k^*(x_k) = \min_{u_k, u_{k+1}, ..., u_K} \left\{ \sum_{i=k}^{K} l(x_i, u_i) + l_F(x_F) \right\} $$
$$ G_k^*(x_k) = \min_{u_k} \left\{l(x_k, u_k) + G_{k+1}^*(x_{k+1}) \right\} $$

Forward Iteration

It's called forward iteration becuase value in current stage \(C_k\) is derived from values in previous stage \(C_{k-1}\)

$$ C_{k}^{*}(x_k) = \min_{u_1, ..., u_{k-1}} \left\{ l_I(x_1) + \sum_{i=1}^{k-1} l(x_i, u_i) \right\} $$
$$ C_k^*(x_k) = \min_{u^{-1} \in U^{-1}(x_k)} \left\{ C_{ k-1 }^*(x_{k-1}) + l(x_{k-1}, u_{k-1}) \right\} $$

Optimal Plans of Unspecified Length

We can generalize K-Step plan to an unspecified length plan. Here is the idea: suppose we are doing a backward iteration. When we reach stage 1, there is no particular reason that we cannot continue the process. If an optimal solution exists, at some point, it will be discovered by the algorithm.

The question is how do we know the algorithm can terminate? In general, if there is no negative cycles in the graph, the algorithm should terminate. If there are negative cycles, then the problem itself is malformed because we can always follow a negative cycle and reduce the cost.

Assuming there is no negative cycle and after a certain number of iterations, the optimal solution is discovered by the algorithm. At this point, the system should be stationary: Further iterations do not improve the cost. When the system is stationary, we should have the following equation:

$$ G^*(x) = \min_{u} \left\{ l(x,u) + G^*(f(x,u)) \right\} $$

----- END -----

Welcome to join reddit self-learning community.
Send me a message Subscribe to blog updates

Want some fun stuff?

/static/shopping_demo.png