We'd like to end up with a toasted and buttered slice of bread. So our goal is something like holding(slice) AND toasted(slice) AND buttered(slice). The individual conjuncts, e.g. toasted(slice), are the main subgoals.
To get toasted(slice) AND buttered(slice), we might decide we need to do this:
Heat(slice,toaster)
Spread-on(butter,slice)
But each of these actions requires some set-up:
Put-in(slice,toaster)
Heat(slice,toaster)
Take-out(slice,toaster)
Spread-on(butter,slice)
But, we can't do any of this without first getting the slice of bread and the butter. So before anything else we should do
Open(fridge)
Open(breadbag)
Take-out(slice,breadbag)
Take-out(butter,fridge)
Here's a more-or-less final plan:
Make-Open(breadbag) --> achieves open(breadbag)
Take-out(slice,breadbag) --> requires open(breadbag), achieves holding(slice)
Make-Open(fridge) --> achieves open(fridge)
Take-out(butter,fridge) --> requires open(fridge), achieves holding(butter)
Put-in(slice,toaster) --> requires holding(slice), achieves in(slice,toaster)
Heat(slice,toaster) --> requires in(slice,toaster), achieves toasted(slice)
Take-out(slice,toaster) --> requires in(slice,toaster), achieves holding(slice)
Put-on(slice,plate) --> requires holding(slice), achieves on(slice,plate)
Spread(butter,slice) --> requires on(slice,plate) and holding(butter), achieves buttered(slice)
Plan construction is yet another application of search, but this time with a more powerful representation of states and actions. In previous search problems (e.g. constraint-satisfaction puzzles), our world was represented by a small set of variables bound to concrete values such as numbers. E.g.
Q1 = 3 (The queen in column 1 is in row 3.)
In classical planning, features of the world are encoded using predicates that take objects as parameters:
open(fridge)
in(slice,toaster)
The state of the world is represented by a logical combination of predicates (e.g. AND, OR, NOT)
open(fridge) AND NOT open(breadbag)
In our previous search examples, each action set or reset the value of one variable, e.g. set Q1 to be 3. Actions now take objects as parameters, e.g.
Take-out(butter,fridge)
Each action has two sets of predicates:
- preconditions: predicates that must be true before the action can be performed
- effects: predicates that the action makes true or false
So the action take-out(butter,fridge) might have
- preconditions: open(fridge), in(butter,fridge)
- effects: holding(butter), NOT in(butter,fridge)
The preconditions and effects determine how actions can fit together into a coherent plan. They also encode some very limited real-world knowledge about the meaning of each action. This representation depends on a tacit assumption: if it's not mentioned in the effects, it didn't change.
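This style of action can be sketched in a few lines of Python. The atom strings and the particular encoding of take-out(butter,fridge) below are illustrative assumptions, not part of any standard library:

```python
from dataclasses import dataclass

# A world state is a frozenset of ground atoms, e.g. {"open(fridge)", ...}.
# Anything not in the set is treated as false, and anything not mentioned
# in an action's effects is assumed unchanged (the tacit assumption above).

@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset   # atoms that must hold before the action
    add_effects: frozenset     # atoms the action makes true
    del_effects: frozenset     # atoms the action makes false

def applicable(state, action):
    return action.preconditions <= state

def apply(state, action):
    assert applicable(state, action)
    return (state - action.del_effects) | action.add_effects

# Hypothetical encoding of take-out(butter,fridge):
take_out = Action(
    name="take-out(butter,fridge)",
    preconditions=frozenset({"open(fridge)", "in(butter,fridge)"}),
    add_effects=frozenset({"holding(butter)"}),
    del_effects=frozenset({"in(butter,fridge)"}),
)

state = frozenset({"open(fridge)", "in(butter,fridge)"})
new_state = apply(state, take_out)
# new_state contains holding(butter) and still open(fridge),
# but no longer in(butter,fridge)
```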
A classical planning world consists of
- a set of objects (e.g. slice, toaster, fridge)
- a set of predicates used to describe states of the world
- a set of actions, each with preconditions and effects
Our task is to plan a sequence of actions that gets us from a specified starting state to a state matching some goal conditions.
Classical planning gets its name from the fact that it goes back to the early days of AI. A simple version was popularized by the STRIPS planner, created by Richard Fikes and Nils Nilsson in 1971 at SRI. (STRIPS is short for Stanford Research Institute Problem Solver.)
These days, a planner would most likely be used to manage the high-level goals and constraints of a complex problem. For example, it might set the main waypoints for a mobile robot or a robot doing assembly/disassembly tasks. A neural net or specialized robotics code would then be used to fill in the details. Scenario planning assistants can help humans work through likely possibilities, e.g. for an emergency response.
A planner can also be used to generate stories. These may be freestanding stories (e.g. like a novel). More often, these systems also include human interaction. A planner might be paired with neural net technology to handle low-level parts of the task, e.g. interpreting natural language input and generating fluent output.
Neural nets can do a good job of filling in details, e.g. generating sentences given a sketch of what they should say. But they aren't (at least so far) good at keeping the high-level plot under control. E.g. see this article on automatically-generated Harry Potter stories.
A closely-related problem is story understanding. To understand stories written for a human audience, it helps to have models of what typically happens in them. The observation goes back to Roger Schank's work in the 1970's and 80's. E.g.
Understanding a story often requires filling in missing details that are obvious to humans. E.g. two people leave a store and the cashier runs out after them. Why? Well maybe to return a forgotten credit card. A human knows the payment script well enough to realize that it involves returning the card.
Finally, planners can help manage dialogs with a human. These can be for serious purposes such as booking an airplane flight. Or they can be for entertainment, e.g. a chatbot. Again, a neural net may handle the low-level parts of the process but have trouble managing high-level continuity and hard constraints (e.g. enough time to get through customs).
A classical planner would typically be used when the task (or key parts of it) is well specified and not too complex. (Complex parts of the task should be offloaded onto some other method.) Expert knowledge should be more readily available than training data. A strength of classical planners is that they can explain their decisions, which matters more in some applications than in others.
The most obvious approach to classical planning is "situation space planning". In this method
- each search state is a complete description of the world at some point in time
- each action maps one world state to another
- a plan is a path from the starting state to a state satisfying the goal
Suppose that we're using situation space planning to make a cake. Our starting state might look like
start: have(flour), not(have(eggs)), number-of(spoons,3), not(have(cakepan)), ...
What action should we take first? Perhaps we should go buy the eggs. Then the start of our plan might look like:
start: have(flour), not(have(eggs)), number-of(spoons,3), not(have(cakepan)), ...
next: have(flour), have(eggs), number-of(spoons,3), not(have(cakepan)), ...
Or we might buy the cakepan next, in which case the first two states would be:
start: have(flour), not(have(eggs)), number-of(spoons,3), not(have(cakepan)), ...
next: have(flour), not(have(eggs)), number-of(spoons,3), have(cakepan), ...
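The two branches above can be generated mechanically. This sketch (with made-up atoms and a deliberately tiny action set) shows why the branching factor equals the number of applicable actions:

```python
# Each applicable action yields one successor state, so the branching factor
# of situation space search equals the number of applicable actions.
# Atoms and action names here are illustrative.

def apply(state, add, delete):
    return (state - delete) | add

start = frozenset({"have(flour)", "number-of(spoons,3)"})

# (name, preconditions, add effects, delete effects)
actions = [
    ("buy(eggs)", frozenset(), frozenset({"have(eggs)"}), frozenset()),
    ("buy(cakepan)", frozenset(), frozenset({"have(cakepan)"}), frozenset()),
]

successors = {
    name: apply(start, add, delete)
    for name, pre, add, delete in actions
    if pre <= start
}
# Two applicable actions -> two successor states, differing only in which
# item was bought first.
```

In a realistic kitchen, dozens of actions would be applicable in every state, and the successor dictionary would grow accordingly.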
In any realistic planning problem, there are many actions that could be taken in each state. So the search space has a huge branching factor. Also, the representation is unnecessarily specific about the ordering of unrelated actions. Here's one variation on our toast example:
Open(fridge)
Open(breadbag)
Take-out(slice,breadbag)
Take-out(butter,fridge)
But this ordering works just as well.
Open(breadbag)
Take-out(slice,breadbag)
Open(fridge)
Take-out(butter,fridge)
Any planner will eventually need to pin down the order of all actions, so that it can execute the plan. However, it makes sense to make unimportant ordering decisions as late as possible.
It's tempting to try to tweak the details of situation space planning, hoping to find ways around its limitations. We could try working backwards from the desired goal state. Or we could try exploiting the hierarchical structure of many real-world problems by defining a sequence of high-level goals, each one of which can be expanded into smaller subgoals. These tricks work on some examples (see the appendix) but don't address the general limitations of the situation space approach.
The Sussman anomaly (Gerry Sussman, 1975) is a famous example of the hierarchical idea failing. The setting is a simple "blocks world" in which blocks can be put on an (infinite) table or stacked on top of one another. The robot can move only one block at a time. Suppose we have the following starting state and goal:
Start state (C is on A; A and B are on the table):

C
A   B

Goal (A on B, B on C):

A
B
C
The intended solution is to move C to the table, then put B on C, then put A on B.
Let's try to do this hierarchically. Our goal can be divided into two subgoals: on(A,B) and on(B,C). Let's first deal with one subgoal and then deal with the other.
Attempt #1: Suppose we try to deal with on(A,B) first. To do this, we clear the block A (so we can pick it up) and then put it on top of B.
C                              A
A   B   ===>   A   B   C  ===> B   C
Now we turn to goal on(B,C). But we can't put B onto C without unpiling A and B. So we're going to have to take the whole thing apart and we've wasted all our previous work putting A onto B.
Attempt #2: Suppose we try to deal with on(B,C) first. So we make the following move:
               B
C              C
A   B   ===>   A
Now we turn to goal on(A,B). But we can't put A onto B without unpiling all three blocks, thus undoing on(B,C). So, again, we have to take the whole tower apart and start from scratch.
So strictly hierarchical planning doesn't work. We need a more flexible approach: for example, first expand each subgoal into the small tasks required to achieve it, and only then order all of those tasks. This lets us "interleave" sub-tasks from multiple high-level goals.
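The interleaved solution can be checked with a minimal blocks-world simulator. This is a sketch; the clear/move encoding below is an assumption chosen for brevity:

```python
# A minimal blocks-world simulator. A state maps each block to what it
# sits on ("table" or another block). A block is clear if nothing is on it.

def clear(state, x):
    return all(support != x for support in state.values())

def move(state, block, dest):
    # Move `block` onto `dest` ("table" or a clear block).
    assert clear(state, block), f"{block} is not clear"
    if dest != "table":
        assert clear(state, dest), f"{dest} is not clear"
    new = dict(state)
    new[block] = dest
    return new

# Sussman start: C on A; A and B on the table.
state = {"A": "table", "B": "table", "C": "A"}

# The interleaved solution: clearing A serves on(A,B), but we achieve
# on(B,C) before finishing on(A,B).
for block, dest in [("C", "table"), ("B", "C"), ("A", "B")]:
    state = move(state, block, dest)

# Goal reached: on(A,B) and on(B,C)
assert state["A"] == "B" and state["B"] == "C"
```

Running either strictly hierarchical attempt through `move` instead triggers a "not clear" assertion, which is exactly the failure described above.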
So far, we've used a "situation space" approach to planning, in which our search space looks like:
- nodes: complete states of the world
- edges: actions mapping one world state to another
Searching in situation space has a massive branching factor. Patches such as backwards planning and hierarchical planning don't properly address this problem.
When looking at the edit distance problem near the start of the term, we built a much more efficient solution by replacing a search through complete matches with a search through partial matches. A similar idea is used in "plan space" planning (also called "partial order" planning). Our search space now looks like:
- nodes: partial plans
- edges: operations that refine a partial plan, e.g. adding an action or an ordering constraint
Specifically, in plan space planning, each node in the search graph contains a partial plan. A partial plan includes
- a set of actions
- ordering constraints between some pairs of actions
- links recording which action supplies each precondition
- preconditions that are still "open," i.e. not yet supplied by any action
This reformulation of the planning problem gives us a search problem with a more manageable set of options at each step.
For example, suppose that we're creating a plan to make tea. One state in our search, i.e. one partial plan, might look like this:
start --------------------
  |                      |
  V                      V
heat(water,kettle)   put-in(teabag,teapot)
  |                      |
  V                      |
finish <------------------
This partial plan has some of the actions in place but isn't complete.
The initial state in a partial order planner has only a start and a finish state, with start ordered before finish. It looks like this:
start
  |
  V
finish
Start and finish are special actions:
- start has no preconditions; its effects are the facts true in the initial state
- finish has the goal conditions as its preconditions and no effects
The preconditions and effects are invisible in this diagram. For our tea example, they might be
- start: no preconditions; effects might include have(teabag), have(kettle), have(teapot), cold(water)
- finish: precondition tasty(water); no effects
Each edge in the planning graph adds more actions and/or ordering constraints. Finish has a precondition tasty(water). We might try to satisfy this using the steep action:
start
  |
  V
steep(teabag,water,teapot)
  |
  V
finish
But steeping requires that the water be hot and everything put in the teapot. And then we need to also ensure that the teabag is out of the water at the end. So we add more actions and orderings until we reach a complete plan.
            start
           /     \
          V       V
put-in(water,kettle)   put-in(teabag,teapot)
          |                     |
          V                     |
heat(water,kettle)              |
          |                     |
          V                     |
put-in(water,teapot)            |
          |                     |
          V                     V
     steep(teabag,water,teapot)
          |
          V
   remove(teabag,teapot)
          |
          V
        finish
Notice that our final plan is only a partial order. We might need to add some extra ordering links to actually execute it. Also notice that our path from start to finish may require undoing some properties that we have made true. For example, the teabag is put into the teapot and then later removed.
Also notice that the open preconditions are driving our selection of new actions to consider for the evolving plan. There may be a very large set of possible actions in our planning world, but we only need to consider a small subset of them.
A key idea in partial order planning is to add actions and orderings of actions only when they are required to ensure that the plan will succeed. The output plan is typically a partial order, in which some pairs of actions are left unordered. When the plan is executed, these pairs can be scheduled in either order or done in parallel. In our example, our final plan leaves several choices for when to put the teabag into the teapot.
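Turning the final partial order into one executable sequence is just topological sorting. Here's a sketch using Python's standard graphlib, with the action names from the tea example:

```python
from graphlib import TopologicalSorter

# Ordering constraints from the tea plan.
# TopologicalSorter takes {node: set of predecessors}.
order = {
    "put-in(water,kettle)": {"start"},
    "put-in(teabag,teapot)": {"start"},
    "heat(water,kettle)": {"put-in(water,kettle)"},
    "put-in(water,teapot)": {"heat(water,kettle)"},
    "steep(teabag,water,teapot)": {"put-in(water,teapot)",
                                   "put-in(teabag,teapot)"},
    "remove(teabag,teapot)": {"steep(teabag,water,teapot)"},
    "finish": {"remove(teabag,teapot)"},
}

schedule = list(TopologicalSorter(order).static_order())
# `schedule` is one valid total order; put-in(teabag,teapot) may legally
# appear anywhere between start and the steep action.
```

Any topological order is a valid schedule, which is exactly the flexibility the partial order preserves until execution time.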
What could be wrong with a partial plan? There are two basic problems for the planner to detect and resolve:
- an open precondition: some action has a precondition that no earlier action supplies
- a threat: an action could "clobber" a precondition that another action depends on
For example, suppose that our partial plan looks as below.
        start
       /     \
      V       V
heat(water,kettle)   put-in(water,teapot)
      |                   |
      V                   V
  steep(teabag,water,teapot)
      |
      V
    finish
The action heat(water,kettle) has an open precondition, i.e. that the water needs to be in the kettle. We can fix this by adding an extra action:
        start
       /     \
      V       \
put-in(water,kettle)     \
      |                   \
      V                    V
heat(water,kettle)   put-in(water,teapot)
      |                   |
      V                   V
  steep(teabag,water,teapot)
      |
      V
    finish
Now we have an action in place that can make sure that the water is in the kettle before we try to heat it. However, as our plan stands right now, this is possible but not guaranteed. Another parallel action, putting the water into the teapot, could "clobber" that precondition if we were to do put-in(water,teapot) before heat(water,kettle). We can fix this threat by adding an ordering link between the two actions.
start
  |
  V
put-in(water,kettle)
  |
  V
heat(water,kettle)
  |
  V
put-in(water,teapot)
  |
  V
steep(teabag,water,teapot)
  |
  V
finish
In general, we have the following set of plan modification operations:
- add a new action whose effects supply an open precondition
- use an existing action in the plan to supply an open precondition
- add an ordering constraint between two actions, e.g. to resolve a threat
It's reasonable to expect that a partial plan has only a small set of immediately visible defects. Although our knowledge base may contain a large number of actions, it's reasonable to hope that only a small set would be relevant to filling in an open prerequisite. So our search graph has a much smaller fan-out (typical number of successors to each state) than the search graphs in situation space.
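The threat bookkeeping can be sketched as a small check over causal links. The link and ordering representations below are illustrative assumptions, not from any planning library:

```python
# A causal link (producer, atom, consumer) records that `producer` achieves
# `atom` for `consumer`. An action threatens the link if it deletes the atom
# and is not already ordered outside the producer..consumer interval.

def threats(causal_links, action_deletes, ordered_before):
    """ordered_before is a set of (x, y) pairs meaning x must precede y."""
    found = []
    for producer, atom, consumer in causal_links:
        for action, deletes in action_deletes.items():
            if action in (producer, consumer) or atom not in deletes:
                continue
            # Safe only if forced before the producer or after the consumer.
            if not ((action, producer) in ordered_before or
                    (consumer, action) in ordered_before):
                found.append((action, (producer, atom, consumer)))
    return found

links = [("put-in(water,kettle)", "in(water,kettle)", "heat(water,kettle)")]
deletes = {"put-in(water,teapot)": {"in(water,kettle)"}}

print(threats(links, deletes, ordered_before=set()))
# The unordered put-in(water,teapot) threatens the link; adding the
# ordering (heat(water,kettle), put-in(water,teapot)) resolves it.
```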
This style of representation makes our planner look smarter than it is. If we write "spread(butter,slice)", we've implied that the computer has some ability to recognize spreading butter on a slice of bread, or doing that action itself. That connection would have to be made by other parts of our AI system, e.g. the image recognition system and the low-level robotics system. This task is known as "grounding" our high-level representations. Planning systems frequently aren't grounded and, when they are, the grounding is a lot less general and robust than one might hope.
The representation of objects in classical planning leaves much to be desired. Extensions of this framework are beyond the scope of this class. But it's worth thinking about some of the issues. They are often important to applications (consider games as well as cooking). And they continue to be problems even when classical planning is combined with more modern technology.
First, we need ways to represent type hierarchies for objects, e.g. pie is a type of dessert and desserts are a type of food. For example, washing a tablecloth should use the same action as washing a napkin. Eating an apple is similar to, but not identical to, eating rice. So perhaps they should inherit some of their representation from a shared ancestor. Then we might define actions that can be used on any object from a broad range of types.
Some object properties should be called out as predicates. In a kitchen scenario, many types of object are movable (e.g. eggs, kettle) but some are not (e.g. the counters) and some (e.g. the stove) should be considered non-movable for most routine planning. However, it's not immediately obvious when we should create an explicit predicate (e.g. movable) vs. having rules that target only certain types of objects.
A related problem is how to allow multiple objects of the same type (e.g. two pots on the stove) in the same planning problem (e.g. one holds the potatoes and one holds the peas). When we allow this feature, then we can talk about counting objects of the same type (e.g. ten energy bars) and perhaps using a quantifier to refer to a group of similar objects "move all the dumplings into the steamer".
Simple planning examples use objects that have a natural shape, named by "count nouns" in English. Notice that the water in our tea-making example was essentially a single chunk of water being moved around, rather than a proper liquid. An important set of representational issues involves substances that are measured rather than handled individually. These include
- liquids, e.g. water, milk
- granular substances and powders, e.g. rice, flour, sugar
In English, "mass nouns" are used to refer to these sorts of substances. Substances can sometimes be formed into shapes, e.g. a ball of clay.
Similarly, simple planning models treat actions as inducing changes in state that are either instantaneous or could be approximated as instantaneous. However, many real-world actions extend for a significant period of time. Researchers modelling human language traditionally classify actions into three types
Some examples of the different types would be
Specialized "temporal planning" algorithms are required to these different types of actions and how they might interact in time.
A more subtle challenge is presented by properties that decay over time, e.g. milk left in the fridge eventually spoils, a cat is unlikely to remain in a room because it will try to escape. Not only must these processes be represented but they also give rise to actions that maintain properties which would otherwise decay (e.g. keep the cat out of the house).
Finally, our representations can be extended to allow "fluents," which are variables whose values are numbers. These could be used to measure properties such as temperature, also to measure the lengths of actions and the quantities of substances.
A final issue is that planning may need to work in less tightly managed environments. These might include:
- situations where the planner has incomplete knowledge of the state of the world
- actions whose outcomes are uncertain
- environments that change on their own, e.g. because other agents are acting
All of these require specialized planning methods. In particular, "contingent" planners try to cope with incomplete knowledge of the world and include sensing actions.
Some of these challenges may be best handled by a combination of classical planning and an adaptive technology such as reinforcement learning. But it's not always clear how to make the two work well together.
In general, classical planning can get very difficult. Consider the Towers of Hanoi puzzle. (Also see demo.) Solutions to this puzzle have length exponential in the number of disks. When modelling realistic situations, planning constraints and background knowledge (e.g. action pre- and post-conditions) can get very large. So it's essential to exploit all available domain constraints. For practical tasks (e.g. robot assembly, face recognition) folks often engineer the task/environment to make the AI problem doable. This often involves using models that approximate the world rather than describing it exactly, just as a chemist mixing two chemicals would use only high-level knowledge (e.g. mixing X and Y will cause an explosion) rather than drilling down into the underlying physics.
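The exponential growth is easy to verify: the standard recursive solution for n disks produces exactly 2^n - 1 moves.

```python
# Solution length for the n-disk Towers of Hanoi is 2**n - 1, so plans
# grow exponentially with problem size.

def hanoi(n, src, dst, via):
    """Return the move list (as (from_peg, to_peg) pairs) for n disks."""
    if n == 0:
        return []
    return (hanoi(n - 1, src, via, dst)   # park n-1 disks on the spare peg
            + [(src, dst)]                # move the largest disk
            + hanoi(n - 1, via, dst, src))  # re-stack the n-1 disks

moves = hanoi(4, "A", "C", "B")
assert len(moves) == 2**4 - 1   # 15 moves for 4 disks
```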
One approach to fast planning is to convert the planning problem to a logical satisfiability (SAT) problem, and then call a specialized SAT solver (Satplan, 1992). A faster algorithm is the Graphplan algorithm. The details are beyond the scope of this course; see the paper by Blum and Furst (1997) (available on the web). Graphplan uses a more complex planning graph with alternating layers of facts and actions. Edges link either facts to actions (preconditions) or actions to facts (effects).
Planning can also be made faster by using heuristic search, e.g. choosing the best partial plan to improve by estimating the distance from each partial plan to a complete plan. This is a place where neural nets can be used to help, because they can learn to produce heuristics from poorly understood sets of features.
Representations can be more succinct if our planner can spell out the logical consequences of a change in state and/or the relationships among predicates. For example, how are cold and hot related? If the teabag is in the pot, then it's not in my hand, because a (specific) teabag cannot be in two places at once. A "Truth Maintenance System" tries to support this sort of knowledge. Notice that more succinct representations of domain knowledge are more likely to be correct, because there are fewer places to make errors.
More sophisticated representations of actions and objects are likely to make it much harder to write efficient planning algorithms.
There is also recent work on learning STRIPS-style representations from observations (which may be noisy and/or incomplete). A recent example is the 2012 paper by Mourao et al (also on arXiv).
Finally, here's a nice recent tutorial by Sohrabi, Katz and Udrea at AAAI 2022.
We saw that situation space planning can get into trouble because there are too many possible actions at each step. We might try applying backwards or bidirectional search. For example:
start: typical student apartment
goal #1: make dinner
It's sensible to see what we can make with available ingredients. (Not ramen AGAIN!!!) But suppose we switch to
start: typical student apartment
goal #2: make a birthday cake
We probably don't have the ingredients. So we should start by checking the recipe, so we can plan what to go out and buy.
In planning problems, the goal is often very specific compared to possible successors of the starting state. So it makes sense to do backwards or bidirectional planning. However, we'll see a different approach (partial order planning) that makes a more significant difference.
Hierarchical planning is another useful idea that doesn't work as well as you might hope. The idea here is that planning problems often have a natural arrangement of tasks into main goals and subgoals. In a complex problem, there might be a multi-layer hierarchy of tasks.
For example, you might sketch out the overall plan for buying groceries:
- make a shopping list
- drive to the store
- buy the items
- drive home and put the groceries away
However, actually doing each of these tasks requires quite a bit more detail. E.g. buying the items might require
- get a cart
- go through the aisles picking up items
- pay at the checkout
And "Go through the aisles picking up items" might translate into
And obviously we could go into even more detail. At the lowest levels, e.g. moving the cart around the produce displays, we might wish to switch control over to a neural net or a reinforcement learning model. But some of these details (e.g. making sure we bought everything on the list) might be best handled symbolically. Hierarchical decomposition can help keep a complex planning task doable.
Over the years, the STRIPS language has been extended to include some of the above features. The result is PDDL, the "Planning Domain Definition Language" (1998 through 2014). Here's the definition of a simple world in PDDL (from Wikipedia)
(define (domain gripper-strips)
  (:predicates (room ?r) (ball ?b) (gripper ?g)
               (at-robby ?r) (at ?b ?r) (free ?g) (carry ?o ?g))
  (:action move
    :parameters (?from ?to)
    :precondition (and (room ?from) (room ?to) (at-robby ?from))
    :effect (and (at-robby ?to) (not (at-robby ?from))))
  (:action pick
    :parameters (?obj ?room ?gripper)
    :precondition (and (ball ?obj) (room ?room) (gripper ?gripper)
                       (at ?obj ?room) (at-robby ?room) (free ?gripper))
    :effect (and (carry ?obj ?gripper) (not (at ?obj ?room)) (not (free ?gripper))))
  (:action drop
    :parameters (?obj ?room ?gripper)
    :precondition (and (ball ?obj) (room ?room) (gripper ?gripper)
                       (carry ?obj ?gripper) (at-robby ?room))
    :effect (and (at ?obj ?room) (free ?gripper) (not (carry ?obj ?gripper)))))
And here is a planning problem for that world:
(define (problem strips-gripper2)
  (:domain gripper-strips)
  (:objects rooma roomb ball1 ball2 left right)
  (:init (room rooma) (room roomb) (ball ball1) (ball ball2)
         (gripper left) (gripper right) (at-robby rooma)
         (free left) (free right) (at ball1 rooma) (at ball2 rooma))
  (:goal (at ball1 roomb)))
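As a sanity check, this problem is small enough to solve by brute force. The sketch below hand-codes the gripper domain in Python (the PDDL above is not parsed; the static room/ball/gripper predicates become grounding loops) and runs breadth-first search over the situation space:

```python
from collections import deque

# States are frozensets of ground atoms; BFS over the situation space
# returns a shortest plan for the gripper problem.

ROOMS, BALLS, GRIPPERS = ["rooma", "roomb"], ["ball1", "ball2"], ["left", "right"]

def ground_actions():
    acts = []   # (name, preconditions, add effects, delete effects)
    for f in ROOMS:
        for t in ROOMS:
            if f != t:
                acts.append((f"move({f},{t})",
                             {f"at-robby({f})"}, {f"at-robby({t})"}, {f"at-robby({f})"}))
    for b in BALLS:
        for r in ROOMS:
            for g in GRIPPERS:
                acts.append((f"pick({b},{r},{g})",
                             {f"at({b},{r})", f"at-robby({r})", f"free({g})"},
                             {f"carry({b},{g})"},
                             {f"at({b},{r})", f"free({g})"}))
                acts.append((f"drop({b},{r},{g})",
                             {f"carry({b},{g})", f"at-robby({r})"},
                             {f"at({b},{r})", f"free({g})"},
                             {f"carry({b},{g})"}))
    return acts

def bfs(init, goal):
    actions = ground_actions()
    start = frozenset(init)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for name, pre, add, delete in actions:
            if pre <= state:
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None

init = {"at-robby(rooma)", "free(left)", "free(right)",
        "at(ball1,rooma)", "at(ball2,rooma)"}
plan = bfs(init, goal={"at(ball1,roomb)"})
print(plan)   # e.g. a 3-step plan: pick ball1, move to roomb, drop ball1
```

Brute-force search works here only because the state space is tiny; the exponential blow-up discussed above is why real planners need smarter methods.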