
A Framework for Planning in Continuous-time Stochastic Domains

(extended abstract)
Håkan L. S. Younes
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213, U.S.A.
lorens@cs.cmu.edu

Abstract

We propose a framework for policy generation in continuous-time stochastic domains with concurrent actions and events of uncertain duration. We make no assumptions regarding the complexity of the domain dynamics, and our planning algorithm can be used to generate policies for any discrete event system that can be simulated. We use the continuous stochastic logic (CSL) as a formalism for expressing temporally extended probabilistic goals and have developed a probabilistic anytime algorithm for verifying plans in our framework. We also present an efficient procedure for comparing two plans that can be used in a hill-climbing search for a goal-satisfying plan.

Introduction

The problem of planning under uncertainty has been addressed by researchers in operations research, artificial intelligence (AI), and control theory, among others. Numerous approaches have been proposed, but as Bresina et al. (2002) recently pointed out, current methods for planning under uncertainty cannot adequately handle planning problems with either concurrent actions and events or uncertainty in action durations and consumption of continuous resources (e.g. power). In this paper we focus on domains with concurrent events and actions with uncertain duration/delay. We present a framework for planning in such domains that falls into the Generate, Test and Debug (GTD) paradigm proposed by Simmons (1988).

We limit our attention to planning domains that can be modeled as discrete event systems. The state of such systems changes only at discrete time instants, which could be either at the occurrence of an event or the execution of an action. Many man-made dynamic systems (e.g. production or assembly lines, air traffic control systems, and robotic systems) can be realistically modeled as discrete event systems. We present a general algorithm for generating (partial) stationary policies for discrete event systems, only requiring that we can generate sample execution paths for the systems. The sample execution paths can be generated through discrete event simulation.

Drummond & Bresina (1990) recognize the need for maintenance and prevention goals in realistic planning problems, in addition to the traditional planning goals of achievement. We embrace their view, and adopt the continuous stochastic logic (CSL) (Aziz et al. 2000; Baier, Katoen, & Hermanns 1999) as a formalism for expressing temporally extended goals in continuous-time domains. Recent advances in probabilistic verification have resulted in efficient algorithms for verifying CSL properties of continuous-time stochastic systems (Baier, Katoen, & Hermanns 1999; Infante López, Hermanns, & Katoen 2001; Younes & Simmons 2002), but these results have not, to our best knowledge, been used to aid probabilistic plan generation. The work by Younes & Simmons is based on statistical sampling techniques, and therefore handles any model that can be simulated—in particular discrete event systems. In this paper, we present an anytime (Dean & Boddy 1988) version of that algorithm for a relevant subset of CSL. We also present an efficient sampling-based algorithm for comparing two plans that can be used in a hill-climbing search for a satisfactory plan. The simulation traces generated during plan verification are used to guide plan repair. We recognize the need for good search control in order to make the planning algorithm practical. Initial work on heuristics for guiding plan repair is described in the full-length version of this paper presented at the main conference.

Planning Framework

We now present a general framework for probabilistic planning in stochastic domains with concurrent actions and events. We will use a variation of the transportation domain developed by Blythe (1994) as an illustrative example. The objective of the example problem is to transport a package from CMU in Pittsburgh to Honeywell in Minneapolis, and with probability at least 0.9 have the package arrive within five hours without losing it on the way.

The package can be transported between the two cities by airplane, and between two locations within the same city by taxi. There is one taxi in each city. The Pittsburgh taxi is initially at CMU, while the Minneapolis taxi is at the airport. There is one airplane available, and it is initially at the Pittsburgh airport. Given these initial conditions, Figure 1 gives a sequence of actions that if executed can take the package from CMU to Honeywell.

The plan in Figure 1 may fail, however, due to the effects of exogenous events and uncertainty in the outcome of planned actions.
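The five-hour, probability-0.9 goal for a fixed plan can be checked naively by Monte Carlo simulation. The sketch below treats the plan as a sequence of steps with made-up duration distributions (the paper does not specify the domain's actual dynamics at this level of detail; all numbers here are illustrative assumptions) and estimates the probability that the package arrives within 300 minutes.

```python
import random

# Hypothetical duration distributions (minutes) per step type; these numbers
# are illustrative assumptions, not part of the paper's domain model.
STEP_DURATIONS = {
    "load-taxi": lambda rng: rng.uniform(1, 3),
    "drive": lambda rng: rng.gauss(25, 5),
    "unload-taxi": lambda rng: rng.uniform(1, 3),
    "load-airplane": lambda rng: rng.uniform(5, 15),
    "fly": lambda rng: rng.gauss(150, 20),
    "unload-airplane": lambda rng: rng.uniform(5, 15),
}

# Step types of the initial plan in Figure 1, in order.
PLAN = ["load-taxi", "drive", "unload-taxi", "load-airplane", "fly",
        "unload-airplane", "load-taxi", "drive", "unload-taxi"]

def estimate_success_probability(plan, deadline=300.0, n=10_000, seed=0):
    """Estimate the probability that `plan` finishes within `deadline` minutes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        total = sum(STEP_DURATIONS[step](rng) for step in plan)
        hits += total <= deadline
    return hits / n
```

The resulting estimate can be compared against the 0.9 threshold; the framework developed in this paper replaces such naive frequency estimation with sequential acceptance sampling, which never computes a probability estimate at all.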
The Minneapolis taxi may be used by other customers while the package is being transported to Minneapolis, meaning that the taxi may not be at the airport when the package arrives there. The package may be lost at an airport if it remains there for too long without being securely stored. The plane may already be full when we arrive at the airport unless we have made a reservation. Finally, there is uncertainty in the duration of actions (e.g. drive and fly) and the timing of exogenous events.

  load-taxi(pgh-taxi, CMU)
  drive(pgh-taxi, CMU, pgh-airport)
  unload-taxi(pgh-taxi, pgh-airport)
  load-airplane(plane, pgh-airport)
  fly(plane, pgh-airport, msp-airport)
  unload-airplane(plane, msp-airport)
  load-taxi(msp-taxi, msp-airport)
  drive(msp-taxi, msp-airport, Honeywell)
  unload-taxi(msp-taxi, Honeywell)

Figure 1: Initial plan for transporting a package from CMU to Honeywell.

Discrete Event Systems

The planning domain introduced above can be modeled as a discrete event system. A discrete event system, M, consists of a set of states S and a set of events E. At any point in time, the system occupies some state s ∈ S. The system remains in a state s until the occurrence of an event e ∈ E, at which point the system instantaneously transitions to a state s′ (possibly the same state as s). We divide the set of events into two disjoint sets Ea and Ee, E = Ea ∪ Ee, where Ea is the set of actions (or controllable events) and Ee is the set of exogenous events. We assume that Ea always contains a null-action ε, representing idleness. A policy, π, for a discrete event system is a mapping from situations to actions.

Returning to the example problem, we can represent the possibility of a taxi moving without us in it by two exogenous events: move-taxi and return-taxi. Examples of actions are given in the plan in Figure 1 (e.g. load-taxi and drive).

A discrete event system, M, controlled by a policy, π, is a stochastic process, denoted M[π]. The execution history of M[π] is captured by a sample execution path, which is a sequence

    σ = s0 −(t0,e0)→ s1 −(t1,e1)→ s2 −(t2,e2)→ ...

with si ∈ S, ei ∈ E, and ti > 0 being the time spent in state si before event ei triggered a transition to state si+1. We call ti the holding time for si.

Consider the example problem again. Say that the action load-taxi(pgh-taxi, CMU) triggers in the initial state after 1 minute. The holding time for the initial state is 1 in this case. The triggering of the load-taxi action takes us to a state where the package is in the Pittsburgh taxi. Say that in this state, the event move-taxi(msp-taxi) triggers after 2.6 minutes. We are now in a state where the Minneapolis taxi is moving, and the holding time for the previous state is 2.6. This sequence of states and triggering events represents a possible sample execution path for the transportation domain.

Sample execution paths come into play when verifying and repairing plans. For now, we make no assumptions about the underlying dynamical model of a discrete event system other than that we can generate sample execution paths for the system through discrete event simulation. In particular, we do not assume that the system is Markovian. We present a general algorithm for planning with discrete event systems using only the information contained in sample execution paths.

Problem Specification

A planning problem for a given discrete event system is an initial state s0 (or possibly a distribution p0(s) over states) and a goal condition φ. We propose CSL as a formalism for specifying goal conditions. CSL—inspired by CTL (Clarke, Emerson, & Sistla 1986) and its extensions to continuous-time systems (Alur, Courcoubetis, & Dill 1993)—adopts probabilistic path quantification from PCTL (Hansson & Jonsson 1994).

The syntax of CSL is defined as

    φ ::= a | ¬φ | φ ∧ φ | Pr≥θ(φ U≤t φ)

where θ ∈ [0, 1], t > 0, and a is an atomic proposition. The standard logic operators have the usual semantics. A probabilistic formula, Pr≥θ(φ1 U≤t φ2), holds in a state s iff the probability of the set of paths starting in s and satisfying φ1 U≤t φ2 is at least θ. The formula φ1 U≤t φ2 holds over σ iff φ2 becomes true in some state si along σ before more than t time units have passed and φ1 is true in all states prior to si along σ. In this paper we will concentrate on CSL formulas of the form φ = Pr≥θ(ρ) where ρ does not contain any probabilistic statements. While the verification algorithm presented by Younes & Simmons (2002) can handle nested probabilistic quantification and conjunctive probabilistic statements, it is not immediately obvious how to compare plans if we allow such constructs in goal conditions.

We can express the goal condition of the example problem as the CSL formula

    φ = Pr≥0.9(¬lost(pkg) U≤300 at(pkg, Honeywell)).

The problem is then to find a plan such that φ holds in the initial state s0. The solution is a policy mapping situations to actions. Next, we present techniques for generating and repairing stationary policies, π : S → Ea.
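To make the path-formula semantics concrete, here is a small sketch of checking the time-bounded until φ1 U≤t φ2 over one sample execution path. States are represented as sets of atomic propositions and holding times as in the definition above; this concrete representation is our own choice, since the paper keeps the model abstract.

```python
# A sample execution path: a list of (state, holding_time) pairs, where a
# state is a set of atomic propositions and holding_time is the time spent
# in that state (None for the last state of a finite prefix).

def check_until(path, phi1, phi2, t_bound):
    """Check phi1 U<=t phi2 over one sample path.

    True iff phi2 holds in some state entered no later than t_bound time
    units along the path, with phi1 holding in every earlier state.
    """
    elapsed = 0.0
    for state, holding_time in path:
        if elapsed > t_bound:
            return False          # more than t_bound time units have passed
        if phi2(state):
            return True           # phi2 reached in time
        if not phi1(state):
            return False          # phi1 violated before phi2 became true
        if holding_time is None:
            return False          # finite prefix ends without reaching phi2
        elapsed += holding_time
    return False

# Predicates for the example goal: (not lost(pkg)) U<=300 at(pkg, Honeywell)
not_lost = lambda s: "lost(pkg)" not in s
at_goal = lambda s: "at(pkg,Honeywell)" in s
```

Calling `check_until(path, not_lost, at_goal, 300)` then plays the role of verifying the path formula ρ over a single simulated path.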
Planning Algorithm

Algorithm 1 shows a generic procedure, FIND-PLAN, for probabilistic planning based on the GTD paradigm. The procedure takes a discrete event system M, an initial state s0, a CSL goal condition φ, and an initial plan π0. The initial plan can be generated by an efficient deterministic planner, ignoring any uncertainty, or it can be a null-plan mapping all states to the null-action ε. If the initial plan is given as a sequence of events (as in Figure 1), then we derive a stationary policy by simulating the execution of the event sequence and mapping s to action a whenever a is executed in s. In the latter case we have a pure transformational planner (cf. Simmons 1988). The planning algorithm is specified as an anytime algorithm that can be stopped at any time to return the currently best plan found.

Algorithm 1 Generic planning algorithm for probabilistic planning based on the GTD paradigm.

  FIND-PLAN(M, s0, φ, π0)
    if VERIFY-PLAN(M, s0, φ, π0) then
      return π0
    else
      π ⇐ π0
      loop  ⊳ return π on break
        repeat
          π′ ⇐ REPAIR-PLAN(π)
          if VERIFY-PLAN(M, s0, φ, π′) then
            return π′
          else
            π′ ⇐ BETTER-PLAN(π, π′)
        until π′ ≠ π
        π ⇐ π′

The procedure VERIFY-PLAN returns true iff φ is satisfied in s0 by the stochastic process M[π]. BETTER-PLAN returns the better of two plans. In the next two sections, we describe how to efficiently implement these two procedures using acceptance sampling. In the third section we show how the information gathered during plan verification can be used to guide plan repair.

Anytime Plan Verification

Younes & Simmons (2002) propose an algorithm for verifying probabilistic real-time properties using acceptance sampling. Their work shows how to verify CSL properties given error bounds α and β, where α is the maximum probability of incorrectly verifying a true property (false negative) and β is the maximum probability of incorrectly verifying a false property (false positive). We adopt this approach, but develop a true anytime algorithm for verification of CSL properties of the form φ = Pr≥θ(ρ) that can be stopped at any time to return a decision with a confidence level whether φ holds or not. The more time the algorithm is given, the higher the confidence in the decision will be.

Assume we are using the sequential probability ratio test (Wald 1945) to verify a probabilistic property φ = Pr≥θ(ρ) with an indifference region of width 2δ centered around θ. Typically we fix the error bounds α and β and, given n samples of which d are positive samples, compute the fraction

    f = p1^d (1 − p1)^(n−d) / (p0^d (1 − p0)^(n−d))

with p0 = θ + δ and p1 = θ − δ. We accept φ as true if f ≤ β/(1 − α), reject φ as false if f ≥ (1 − β)/α, and generate an additional sample otherwise.

For an anytime approach to verification, we instead want to derive error bounds α and β that can be guaranteed if the sequential probability ratio test were to be terminated after n samples and d positive samples have been seen. Given n samples and d positive samples, we would accept φ as true if we had chosen α = α0 and β = β0 such that

    f ≤ β0/(1 − α0)    (1)

holds. If, instead, we had chosen α = α1 and β = β1 satisfying the inequality

    f ≥ (1 − β1)/α1,    (2)

then we would reject φ as false at the current stage. What decision should be returned if the test procedure was terminated at this point, and what is the confidence in this decision in terms of error probability?

We will from here on assume that βi can be expressed as a fraction of αi: βi = γαi. We can think of γ as being the fraction β/α, with α and β being the target error bounds for the verification of φ if enough time is given the algorithm. We can accept φ as true with probability of error at most β0 if (1) is satisfied. We have

    f ≤ γα0/(1 − α0)  =⇒  1 − α0 ≤ γα0/f  =⇒  α0 ≥ 1/(1 + γ/f).

From this follows that if we had aimed for error bounds α = 1/(1 + γ/f) and β = γ/(1 + γ/f), then we would have accepted φ as true at this stage.

If (2) is satisfied, then we can reject φ as false with probability of error at most α1. In this case we have

    f ≥ (1 − γα1)/α1  =⇒  fα1 ≥ 1 − γα1  =⇒  α1 ≥ 1/(γ + f).

It now follows that if we had aimed for error bounds α = 1/(γ + f) and β = γ/(γ + f), then we would have rejected φ as false at this stage.

For the moment ignoring any decisions derived prior to the current sample, if we need to choose between accepting φ as true and rejecting φ as false at this stage, then we should choose the decision that can be made with the lowest error bounds (highest confidence). We choose to accept φ as true if

    1/(1 + γ/f) < 1/(γ + f),    (3)

we choose to reject φ as false if

    1/(1 + γ/f) > 1/(γ + f),    (4)

and we choose a decision with equal probability otherwise. Note that because we have assumed that βi = γαi, if αi < αj then also βi < βj, so it is sufficient to compare the αi's. If condition (3) holds, then the probability of error for the chosen decision is at most γ/(1 + γ/f), while if condition (4) holds the probability of error is at most 1/(γ + f). We denote the minimum of the αi's at the current stage, min(α0, α1), by α̌.
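The bookkeeping above reduces to a few lines of code. The following sketch (our own rendering, with γ and δ as in the text) processes a stream of Boolean sample outcomes, maintains the ratio f incrementally, and tracks the best decision and error bound seen so far:

```python
def anytime_verify(samples, theta, delta, gamma=1.0):
    """Anytime sketch of the sequential test for Pr>=theta(rho).

    `samples` is an iterable of Booleans (did rho hold on the sample path?).
    Returns (decision, error_bound): the decision with the lowest achievable
    error bound seen so far, using a0 = 1/(1+gamma/f) for accepting and
    a1 = 1/(gamma+f) for rejecting, as derived above.
    """
    p0, p1 = theta + delta, theta - delta
    f = 1.0
    best_alpha, decision = 0.5, None          # alpha(0) = 1/2, d(0) = either
    for positive in samples:
        # Incremental update of f = p1^d (1-p1)^(n-d) / (p0^d (1-p0)^(n-d)).
        f *= p1 / p0 if positive else (1 - p1) / (1 - p0)
        a0 = 1.0 / (1.0 + gamma / f)          # bound if we accept phi now
        a1 = 1.0 / (gamma + f)                # bound if we reject phi now
        # Ties ("either" in the paper) are broken towards rejection here.
        alpha, d = (a0, True) if a0 < a1 else (a1, False)
        # Both alpha and beta = gamma*alpha must stay below 1/2.
        if max(alpha, gamma * alpha) < 0.5 and alpha < best_alpha:
            best_alpha, decision = alpha, d
    return decision, best_alpha
```

With θ = 0.9 and a stream of mostly negative samples, the decision quickly becomes rejection with a rapidly shrinking error bound, matching the behavior reported for the initial plan in Figure 2.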
Algorithm 2 Procedure for anytime verification of φ = Pr≥θ(ρ) with an indifference region of width 2δ.

  VERIFY-PLAN(M, s, φ, π)
    α(0) ⇐ 1/2, d(0) ⇐ either, n ⇐ 0
    f ⇐ 1, p0 ⇐ θ + δ, p1 ⇐ θ − δ
    loop  ⊳ return d(n) on break
      generate sample execution path σ starting in s
      if M[π], σ ⊨ ρ then
        f ⇐ f · p1/p0
      else
        f ⇐ f · (1 − p1)/(1 − p0)
      α0 ⇐ 1/(1 + γ/f), α1 ⇐ 1/(γ + f)
      if α0 < α1 then
        d ⇐ true
      else if α1 < α0 then
        d ⇐ false
      else
        d ⇐ either
      α̌ ⇐ min(α0, α1)
      if max(α̌, γα̌) < 1/2 then
        if α̌ < α(n) then
          α(n+1) ⇐ α̌, d(n+1) ⇐ d
        else if α̌ = α(n) and d ≠ d(n) then
          α(n+1) ⇐ α(n), d(n+1) ⇐ either
        else
          α(n+1) ⇐ α(n), d(n+1) ⇐ d(n)
      else
        α(n+1) ⇐ α(n), d(n+1) ⇐ d(n)
      n ⇐ n + 1

Now, let us take into account decisions that could have been made prior to seeing the current sample. Let α(n) denote the lowest error bound achieved up to and including the nth sample. If α̌ < α(n), assuming the current stage is n + 1, then the decision chosen at the current stage without considering prior decisions is better than any decisions made at earlier stages, and should therefore be the decision returned if the algorithm is terminated at this point. Otherwise, a prior decision is better than the current, so we retain that decision as our choice. We set the lowest error bounds for stage n + 1 in agreement with the selected decision: α(n+1) = min(α̌, α(n)).

For the sequential probability ratio test to be well defined, it is required that the error bounds α and β are less than 1/2. It is possible that either α̌ or γα̌ violates this constraint if γ ≠ 1. If that is the case, we simply ignore the current decision and make a random decision. This finalizes the algorithm, summarized in pseudo-code as Algorithm 2, for anytime verification of probabilistic properties. The result of the algorithm, if terminated after n samples, is the decision d(n) with a probability of error at most β = γα(n) if d(n) = true and α = α(n) otherwise.

Figure 2 plots the error probability over time as the plan in Figure 1 is verified for the example problem using δ = 0.01 and γ = 1. The decision for all confidence levels, except a few in the very beginning, is that the plan does not satisfy the goal condition. The confidence in the decision increases rapidly initially, and exceeds 0.99 within 0.08 seconds after 199 samples have been generated.

Plan Comparison

We need to compare two plans in order to perform hill-climbing search for a satisfactory plan. In this section we show how to use a sequential test, developed by Wald (1945), to determine which of two plans is better.

Let M be a discrete event system, and let π and π′ be two plans. Given a planning problem with an initial state s0 and a CSL goal condition φ = Pr≥θ(ρ), let p be the probability that ρ is satisfied by sample execution paths of M[π] starting in s0 and let p′ be the probability that ρ is satisfied by paths of M[π′]. We then say that π is better than π′ for the given planning problem iff p > p′.

The Wald test is carried out by pairing samples for the two processes M[π] and M[π′]. We now consider samples of the form ⟨b, b′⟩, where b is the result of verifying ρ over a sample execution path for M[π] (similarly for b′ and π′). We count samples only when b ≠ b′. A sample ⟨true, false⟩ is counted as a positive sample because, in this case, π is performing better than π′, while a sample ⟨false, true⟩ counts as a negative sample. Let p̃ be the probability of observing a positive sample. It is easy to verify that if π and π′ are equally good, then p̃ = 1/2. We should prefer π if p̃ > 1/2. The situation is similar to when we are verifying a probabilistic property Pr≥θ(ρ), so we can use the sequential probability ratio test with probability threshold θ̃ = 1/2 to compare π and π′.

Algorithm 3 shows the procedure for doing the plan comparison. Note that this algorithm assumes that we reuse samples generated in verifying each plan (Algorithm 2).

Algorithm 3 Procedure returning the better of two plans.

  BETTER-PLAN(π, π′)
    n ⇐ min(|b|, |b′|)
    f ⇐ 1, p0 ⇐ 1/2 + δ, p1 ⇐ 1/2 − δ
    for all i ∈ [1, n] do
      if bi ∧ ¬b′i then
        f ⇐ f · p1/p0
      else if ¬bi ∧ b′i then
        f ⇐ f · (1 − p1)/(1 − p0)
    α0 ⇐ 1/(1 + 1/f), α1 ⇐ 1/(1 + f)
    if α0 ≤ α1 then
      return π  ⊳ confidence 1 − α0
    else
      return π′  ⊳ confidence 1 − α1
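The pairing logic of Algorithm 3 is easy to mirror in code. The sketch below (our own rendering; `delta=0.1` is an arbitrary indifference half-width chosen for illustration) takes the Boolean verification results for two plans, updates the ratio only on disagreeing pairs, and returns the preferred plan with its confidence:

```python
def better_plan(results, results_prime, delta=0.1):
    """Paired-sample comparison of two plans (a sketch of Algorithm 3).

    `results`/`results_prime` hold Booleans: did rho hold on the i-th sample
    path of M[pi] and M[pi'], respectively? Only pairs where the two plans
    disagree update the ratio; the test threshold is 1/2.
    """
    p0, p1 = 0.5 + delta, 0.5 - delta
    f = 1.0
    for b, b_prime in zip(results, results_prime):
        if b and not b_prime:            # positive sample: pi did better
            f *= p1 / p0
        elif b_prime and not b:          # negative sample: pi' did better
            f *= (1 - p1) / (1 - p0)
    a0 = 1.0 / (1.0 + 1.0 / f)           # error bound for preferring pi
    a1 = 1.0 / (1.0 + f)                 # error bound for preferring pi'
    if a0 <= a1:
        return "pi", 1.0 - a0
    return "pi_prime", 1.0 - a1
```

When the two plans' results are identical, no pair updates f, both bounds equal 1/2, and the returned confidence degenerates to 0.5, i.e. the test expresses no preference.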
Domain Independent Plan Repair

During plan verification, we generate a set of sample execution paths σ = {σ1, . . . , σn}. Given a goal condition φ = Pr≥θ(ρ), we verify the path formula ρ over each sample path σi. We denote the set of sample paths over which ρ holds σ+, and the set of paths over which ρ does not hold σ−. The sample paths in σ− provide information on how a plan can fail. We can use this information to guide plan repair without relying on specific domain knowledge.

In order to repair a plan for goal condition φ = Pr≥θ(ρ), we need to lower the probability of paths not satisfying ρ. A negative sample execution path

    σi− = si0 −(ti0,ei0)→ si1 −(ti1,ei1)→ ... −(ti,m−1,ei,m−1)→ sim

is evidence showing how a plan can fail to achieve the goal condition. We could, conceivably, improve a plan by modifying it so that it breaks the sequence of states and events along a negative sample path. A generic repair procedure could non-deterministically select a state occurring along some negative sample path, and for the selected state assign an alternative action. The sample paths help us focus on the relevant parts of the state space when considering a repair for a plan.

[Figure 2: plot of the probability of error (y-axis, 0 to 0.5) against time t in seconds (x-axis, 0 to 0.12).]

Figure 2: Probability of error over time, using δ = 0.01 and γ = 1, for the verification of the initial plan. The decision—except in the very beginning for some error bounds above 0.4—is that the goal condition is not satisfied.

Discussion

We have presented a framework for policy generation in continuous-time stochastic domains. Our planning algorithm makes practically no assumptions regarding the complexity of the domain dynamics, and we can use it to generate stationary policies for any discrete event system that we can generate sample execution paths for. While most previous approaches to probabilistic planning require time to be discretized, we work with time as a continuous quantity. To efficiently handle continuous time, we rely on sampling-based techniques. We use CSL as a formalism for specifying goal conditions, and we have presented an anytime algorithm based on sequential acceptance sampling for verifying whether a plan satisfies a given goal condition. Our approach to plan verification differs from previous simulation-based algorithms for probabilistic plan verification (Blythe 1994; Lesh, Martin, & Allen 1998) in that we avoid ever calculating any probability estimates. Instead we use efficient statistical hypothesis testing techniques specifically designed to determine whether the probability of some property holding is above a target threshold. We believe that the verification algorithm in itself is a significant contribution to the field of probabilistic planning.

Our planning algorithm utilizes information from the verification phase to guide plan repair. In particular, we are using negative sample execution paths to determine what can go wrong, and then try to modify the current plan so that the negative behavior is avoided. Our planning framework is not tied to any particular plan repair technique, however, and our work on plan repair presented in this paper is only preliminary. As with any planning, the key to focused search is good search control heuristics. Information from simulation traces helps us focus the repair effort on relevant parts of the state space.

References

Alur, R.; Courcoubetis, C.; and Dill, D. 1993. Model-checking in dense real-time. Information and Computation 104.

Aziz, A.; Sanwal, K.; Singhal, V.; and Brayton, R. 2000. Model-checking continuous-time Markov chains. ACM Transactions on Computational Logic 1.

Baier, C.; Katoen, J.-P.; and Hermanns, H. 1999. Approximate symbolic model checking of continuous-time Markov chains. In Proc. CONCUR'99.

Blythe, J. 1994. Planning with external events. In Proc. UAI'94.

Bresina, J.; Dearden, R.; Meuleau, N.; Ramakrishnan, S.; Smith, D. E.; and Washington, R. 2002. Planning under continuous time and resource uncertainty: A challenge for AI. In Proc. UAI'02.

Clarke, E. M.; Emerson, E. A.; and Sistla, A. P. 1986. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Transactions on Programming Languages and Systems 8.

Dean, T., and Boddy, M. S. 1988. An analysis of time-dependent planning. In Proc. AAAI'88.

Drummond, M., and Bresina, J. 1990. Anytime synthetic projection: Maximizing the probability of goal satisfaction. In Proc. AAAI'90.

Hansson, H., and Jonsson, B. 1994. A logic for reasoning about time and reliability. Formal Aspects of Computing 6.

Infante López, G. G.; Hermanns, H.; and Katoen, J.-P. 2001. Beyond memoryless distributions: Model checking semi-Markov chains. In Proc. PAPM-PROBMIV'01.

Lesh, N.; Martin, N.; and Allen, J. 1998. Improving big plans. In Proc. AAAI'98.

Simmons, R. G. 1988. A theory of debugging plans and interpretations. In Proc. AAAI'88.

Wald, A. 1945. Sequential tests of statistical hypotheses. Annals of Mathematical Statistics 16.

Younes, H. L. S., and Simmons, R. G. 2002. Probabilistic verification of discrete event systems using acceptance sampling. In Proc. CAV'02.
