
HPLAN-P

A Heuristic Search Planner for Planning with Temporally Extended Preferences


Luca Ceriani

HPLAN-P
Heuristic planning with TEGs, SPs and TEPs
Incremental search algorithm

Extended version of TLPLAN


TLPLAN DDL, with a PDDL-to-TLPlan translator

Awarded distinguished performance in the qualitative preference track (IPC5)

Heuristic Planning with Preferences


Distinguishing between successful plans of different quality
Qualitative vs Quantitative

Actively guide the search towards the achievement of preferences


Hence the need for heuristics designed specifically for planning with preferences

Outline
PDDL3 problem/domain
Planning problem with TEGs and TEPs

Preprocessing phase
Adapting existing heuristic search techniques to achieve SPs and solve the compiled problem
HPLAN-P algorithm: exploiting the adapted heuristics to incrementally find better plans

PDDL3 Overview
TEGs/TEPs
Simple Preferences (SPs)
Precondition Preferences (PPs)
Metric Function (M)

TEGs and TEPs


Temporal constraints

Simple Preferences (SPs)


Atemporal conditions over the final state of a plan

Precondition Preferences (PPs)

Preferences stated in action preconditions, evaluated each time the action is applied

Metric Function
Defines the (numeric) quality of a plan in terms of:
Preference violation weight/count
Preference internal/external quantification
(see the metric-evaluation sketch below)
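As a small, assumed illustration (the weights and preference names are hypothetical, not from the slides), such a metric can be read as a weighted sum of is-violated counters:

# Minimal sketch (assumed example, not HPLAN-P's code): evaluating a
# PDDL3-style metric that sums weighted is-violated counters.

# Hypothetical weights for three preferences p1..p3.
WEIGHTS = {"p1": 5, "p2": 2, "p3": 1}

def metric_value(is_violated):
    """M = sum_p w_p * (is-violated p); lower is better (minimize)."""
    return sum(w * is_violated.get(p, 0) for p, w in WEIGHTS.items())

# A plan that violates p2 twice and p3 once:
print(metric_value({"p1": 0, "p2": 2, "p3": 1}))  # -> 5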

Preprocessing PDDL3
The PDDL3 problem is compiled into a simpler planning problem containing only SPs
(an augmented planning domain/problem)

A new metric function M is defined that refers to the SPs

Preprocessing PPs
For each precondition preference (preference p φ):
an is-violated-p counter, initialized to 0
a conditional effect (when (not φ) (increase (is-violated-p) 1))

added in the context of the single action containing the preference (see the sketch below)
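A minimal sketch of this compilation step, assuming a plain text generator; the preference name and formula below are illustrative placeholders:

# Minimal sketch of the PP compilation described above: a precondition
# preference (preference p phi) becomes a numeric fluent (is-violated-p),
# initialized to 0, plus a conditional effect on the owning action.
# The preference name and formula below are illustrative placeholders.

def compile_precondition_preference(name: str, formula: str):
    fluent = f"(is-violated-{name})"
    init = f"(= {fluent} 0)"                                  # added to :init
    effect = f"(when (not {formula}) (increase {fluent} 1))"  # added to the action's :effect
    return init, effect

init, effect = compile_precondition_preference("p", "(clean truck1)")
print(init)    # (= (is-violated-p) 0)
print(effect)  # (when (not (clean truck1)) (increase (is-violated-p) 1))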

Preprocessing TEGs /TEPs


TEGs and TEPs are reduced to simple goals (SGs) and SPs
SPs become optional goal conditions

For each TEG or TEP φ, a new domain predicate P is introduced:
P is TRUE iff φ is satisfied by the plan

Preprocessing TEGs /TEPs: Steps


TEGs/TEPs are translated into f-FOLTL formulas f
(first-order LTL over finite plans: goals that would require infinite plans are not achievable)

Each f is compiled into an automaton A
Not a Büchi automaton (BA); transitions are labeled with first-order (PDDL) formulas; the states of A monitor the satisfaction of f

A is embedded into the planning domain
Only valid/preferred plans drive the automaton to an accepting state
An acceptance predicate holds iff the automaton is in an accepting state (see the monitoring sketch below)
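An illustrative sketch, not the paper's exact construction, of how automaton states can monitor "sometime" and "always" conditions over the finite sequence of states a plan visits; the facts and trajectory below are made up:

# Illustrative sketch (not the paper's exact construction): monitoring the
# formulas "sometime p" and "always p" over the finite sequence of states
# visited by a plan, the way the compiled automaton A does.
# Each monitor keeps a current automaton state; "accepting" plays the role
# of the acceptance predicate tested in the augmented goal.

def sometime(p):
    """Automaton for (sometime p): states 'waiting' -> 'done' (accepting)."""
    state = "waiting"
    def step(world):
        nonlocal state
        if state == "waiting" and p(world):
            state = "done"
        return state == "done"          # accepting?
    return step

def always(p):
    """Automaton for (always p): state 'ok' (accepting) -> 'violated' (sink)."""
    state = "ok"
    def step(world):
        nonlocal state
        if not p(world):
            state = "violated"
        return state == "ok"            # accepting?
    return step

# Plan states as sets of true facts (placeholders):
trajectory = [{"at-depot"}, {"at-depot", "loaded"}, {"delivered"}]
has_loaded = sometime(lambda w: "loaded" in w)
stays_safe = always(lambda w: "crashed" not in w)

for world in trajectory:
    acc1, acc2 = has_loaded(world), stays_safe(world)
print(acc1, acc2)   # True True -> both conditions satisfied by this plan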

Sometime


Always


Preprocessing Automaton
For each automaton A, two new (possibly parameterized) predicates: (state-A ?s ?x) and (accepting-A ?x)
Automaton state updates are expressed as conditional effects (CE), possibly quantified
Updates lag one step behind: each original action is augmented, plus a finish action

Start and finish actions are added for initialization and goal specification
The update CEs are mutually exclusive and exhaustive; different automata are updated in parallel

PNFA (Parameterized Nondeterministic Finite-state Automaton)


PNFA
Tracks all the different paths towards the accepting states simultaneously

PNFA
Automaton state updates via pseudo-actions
No need to augment the domain's actions

Belief-state reasoning over the automaton's possible current states
Exploits TLPLAN's pruning ability

Non-Compilable TEGs/TEPs
Constraints that require infinite plans

State trajectory constraints and linearization

Temporal Domain
CEs added at both the start and end points of each durative action; timed initial literals (TILs) treated as exogenous events
within, hold-after, hold-during

(always-within t φ ψ)
handled with a timed automaton and a clock-reset action

Heuristics Design
Actively guide the search
Priority to achieving the hard goals (HG)
Desirability vs. ease of achieving preferences

Heuristic for Planning with Preferences


Relaxed planning graph based heuristics
The graph is expanded until all goal and preference facts (the accepting predicates) appear in the relaxed state; automaton updates are applied via the pseudo-actions

Goal Distance Function G


How hard it is to reach the goal
non-admissible

Preference Distance Function P


How hard it is to reach the preference facts; unreachable preference facts do not affect P's value (see the relaxed-graph sketch below)
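A minimal sketch of the idea behind G and P under a STRIPS-like delete relaxation; this is not HPLAN-P's implementation, and the actions, goal and preference facts are placeholders:

# Minimal sketch, under a STRIPS-like delete relaxation, of the idea behind
# the goal-distance function G and preference-distance function P: expand a
# relaxed planning graph, record the depth at which each fact first appears,
# and sum the depths of goal facts (G) / reachable preference facts (P).

def relaxed_depths(state, actions, max_layers=50):
    """Depth (layer) at which each fact first becomes true, ignoring deletes."""
    depth = {f: 0 for f in state}
    facts = set(state)
    for layer in range(1, max_layers + 1):
        new = set()
        for pre, add in actions:
            if pre <= facts:
                new |= add - facts
        if not new:
            break
        for f in new:
            depth[f] = layer
        facts |= new
    return depth

def G(depth, goal_facts):
    # Non-admissible distance to the hard goal; unreachable goal -> infinity.
    return sum(depth.get(f, float("inf")) for f in goal_facts)

def P(depth, pref_facts):
    # Unreachable preference facts do not affect P's value.
    return sum(depth[f] for f in pref_facts if f in depth)

actions = [(frozenset({"a"}), frozenset({"b"})),
           (frozenset({"b"}), frozenset({"c"}))]
d = relaxed_depths({"a"}, actions)
print(G(d, {"c"}), P(d, {"c", "unreachable-pref"}))   # 2 2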

Optimistic Metric Function O


Estimates the metric value achievable by any plan extending the partial plan that reaches s; no RPG is built, M is evaluated directly in s assuming that:
no PPs will be violated in the future
unachievable preferences are treated as false
all preferences not yet violated will be achieved in the future

If M is non-increasing in the number of achieved preferences, O is a lower bound (for M) on the best plan extending s

Best Relaxed Metric Function B


Evaluates M in each relaxed world of the RPG and takes the minimum value as B
(M evaluated in the relaxed worlds does not increase)

Tighter estimate than O


A lower bound under the same condition as O; computationally more expensive (see the sketch below)
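A minimal sketch of O and B under the simplifying assumption that M depends only on which preferences end up satisfied; the weights and preference names are hypothetical:

# Minimal sketch of the two metric-based estimates described above, assuming
# a metric M that only depends on which preferences are (un)satisfied.

WEIGHTS = {"p1": 5, "p2": 2, "p3": 1}          # penalty if a preference is violated

def M(satisfied):
    """Metric to minimize: total weight of preferences not satisfied."""
    return sum(w for p, w in WEIGHTS.items() if p not in satisfied)

def O(satisfied_in_s, reachable):
    # Optimistic: no future PP violations; unreachable preferences stay false;
    # every preference not yet violated and still reachable is assumed achieved.
    return M(satisfied_in_s | (reachable & set(WEIGHTS)))

def B(relaxed_worlds):
    # Best relaxed metric: evaluate M in every world of the RPG, take the minimum.
    return min(M(w) for w in relaxed_worlds)

relaxed_worlds = [{"p1"}, {"p1", "p2"}]         # preferences true per RPG layer
print(O({"p1"}, reachable={"p2"}))              # 1  (p3 unreachable -> stays false)
print(B(relaxed_worlds))                        # 1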

Discounted Metric Function D(r)


Believes more in preferences that are easier to achieve
M's weights then have a higher impact on D (desirability vs. ease tradeoff)

r ∈ [0, 1]: discount factor

r → 0: preferences appearing deeper in the relaxed graph are heavily discounted

HPLAN-P
Forward search
Best First Search

Heuristic
Different from TLPLAN

Incremental (episodic)
Each episode ends as soon as a better plan is found

Optimal

Sequence of Planning Episodes


First episode: best-first search guided by G
The hard goals (HG) must be satisfied; other heuristics can conflict with HG

Later episodes restart the search using some combination of the heuristic functions

Any combination of heuristics
G always comes first

Prioritized sequences used to break ties

e.g. [G, D(0.3), O] or [G, D(0.1), D(0.2), P] (see the tie-breaking sketch below)

Relaxed states and computed heuristic values are cached
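One way to realize such prioritized tie-breaking, sketched here with hypothetical heuristic values, is to order the search frontier by a tuple of heuristic values:

# Minimal sketch of prioritized tie-breaking, e.g. the sequence [G, D(0.3), O]
# mentioned above: nodes are ordered by G first, ties broken by D(0.3),
# remaining ties by O. The heuristic callables here are stand-ins.
import heapq

def push(frontier, node, G, D, O):
    # Lexicographic priority: lower is better at every position.
    heapq.heappush(frontier, ((G(node), D(node, r=0.3), O(node)), node))

frontier = []
for n in ("s1", "s2", "s3"):
    push(frontier, n,
         G=lambda s: 0,                      # all satisfy the hard goals equally
         D=lambda s, r: {"s1": 4.0, "s2": 2.5, "s3": 2.5}[s],
         O=lambda s: {"s1": 9, "s2": 7, "s3": 3}[s])
print(heapq.heappop(frontier)[1])            # 's3': same G and D(0.3), better O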

Increase Plan Quality


Each subsequent episode yields a better plan, thanks to increasingly restrictive pruning
MetricBoundFN(s) estimates a lower bound on M for any plan extending s; either O or B can be used as MetricBoundFN(.)

States that violate the hard constraints (HC) are also pruned

HPLAN-P Algorithm

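The algorithm figure did not survive extraction; the following Python sketch shows the incremental, episodic structure described in these slides (repeated best-first episodes, a tightening bestMetric, and MetricBoundFN-based pruning). It is not the actual TLPLAN-based implementation, and all function names and signatures are assumptions supplied by the caller:

# Sketch of the incremental (episodic) search structure described in these
# slides; successor/goal/metric functions are placeholders the caller supplies.
import heapq, itertools

def hplan_p(init, successors, is_goal, M, heuristics, metric_bound_fn):
    """Each episode runs best-first search with one heuristic combination and
    ends as soon as a plan better than the incumbent is found; later episodes
    prune states whose MetricBoundFN already meets or exceeds bestMetric."""
    best_plan, best_metric = None, float("inf")
    counter = itertools.count()                      # tie-breaker for the heap
    for h in heuristics:                             # one planning episode per combination
        frontier = [(h(init), next(counter), init, [])]
        closed = set()
        while frontier:
            _, _, s, plan = heapq.heappop(frontier)
            if s in closed:
                continue
            closed.add(s)
            if metric_bound_fn(s) >= best_metric:    # sound pruning (O or B as bound)
                continue
            if is_goal(s) and M(s, plan) < best_metric:
                best_plan, best_metric = plan, M(s, plan)
                break                                # episode ends: better plan found
            for a, s2 in successors(s):
                if s2 not in closed:
                    heapq.heappush(frontier, (h(s2), next(counter), s2, plan + [a]))
    return best_plan, best_metric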

Sound Pruning
If MetricBoundFN(s) is a lower bound on M for any plan extending s, the pruning is sound: optimal plans are never pruned
1. MetricBoundFN(s) ≥ bestMetric
2. ⟹ s is pruned

3. MetricBoundFN(s) ≤ M(ss) for any plan ss extending s
4. ⟹ ss is never reached, but

5. M(ss) ≥ bestMetric
6. ⟹ the pruning is sound

Optimality
If HPLAN-P stops and sound pruning is used, the last plan returned is optimal. Proof sketch:
each planning episode returns a better plan; the planner stops only when the final episode has rejected all remaining plans; sound pruning never prunes optimal plans; hence no plan better than the last one returned exists

UserHeuristic(.) can even be non-admissible. k-optimality:

sound pruning + (total-time) ≤ k imposed as a hard constraint

Termination
HPLAN-P termination conditions:
bestMetric_initial (the metric of the first plan found) is finite
MetricBoundFN(s) stays below bestMetric_initial only for finitely many states
M cannot improve as the number of violated PPs increases

For any m such that m < bestMetric_initial and some plan achieves M = m:

the number of plans with M < m is finite, so only finitely many improving episodes are possible

References

Baier, J., Bacchus, F., and McIlraith, S., "A Heuristic Search Approach to Planning with Temporally Extended Preferences", Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07), pp. 1808-1815, January 2007, Hyderabad, India.

Jorge A. Baier and Sheila McIlraith, "Planning with First-Order Temporally Extended Goals Using Heuristic Search", Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06), pp. 788-795, July 2006, Boston, MA.

Alfonso E. Gerevini, Derek Long, Patrik Haslum, Alessandro Saetti, Yannis Dimopoulos, "Deterministic Planning in the Fifth International Planning Competition: PDDL3 and Experimental Evaluation of the Planners", Artificial Intelligence, vol. 173 (2009), pp. 619-668.
