Professional Documents
Culture Documents
Abstract A two-player nonlinear Stackelberg differen- feedback Stackelberg solutions [15]. Unlike closed-loop
tial game with player 1 and player 2 as leader and follower, Stackelberg solutions, feedback Stackelberg solutions
respectively, is considered. The feedback Stackelberg solu- can be found via dynamic programming, and these
tions to such games rely on the solution of two coupled
partial differential equations (PDEs) for which closed- are studied in this paper. Determining either Nash
form solutions cannot in general be found. A method or Stackelberg feedback equilibrium solutions for a
for constructing strategies satisfying partial differential in- given differential game relies on the solution of coupled
equalities in place of the PDEs is presented. It is shown that partial differential equations (PDEs), of the Hamilton-
these constitute approximate solutions to the differential Jacobi-Isaacs type. Closed-form solutions of these can-
game. The theory is illustrated by a numerical example.
not be found in general and it is often necessary to
I. I NTRODUCTION settle for approximate solutions. In [16][18] a method
Game theory is the study of multi-player decision for obtaining solutions approximating the Nash equi-
making [1][6]. Many problems involve decisions of librium solution of a class of nonzero-sum differential
some kind: these could be decisions between com- games has been introduced. In what follows, the devel-
peting players or determining the best trade-offs for oped method is extended to the case in which feedback
players with different goals. Thus, differential games Stackelberg solutions are sought. In particular a two-
have a large variety of applications in a wide range of player nonzero-sum differential game is considered.
areas. Typical areas are economics and management, The remainder of the paper is organised as follows.
defense, and robust control [2], [7][9]. Additionally, In Section II a formal definition of the problem and
differential games play a role in many other areas, of its solution is given. A method for obtaining ap-
including problems involving multi-agent systems and proximate solutions is presented in Section III, in which
biological systems [10], [11]. also the notion of the approximation is made precise.
One of the most common solution concepts associ- Simulations illustrating the method are then given in
ated with game theoretic problems is that of Nash equi- Section IV. Finally, directions for future work are given
librium solutions whereby it is assumed that all players along with concluding remarks in Section V.
are rational and announce their strategies simultane-
II. T HE T WO -P LAYER S TACKELBERG D IFFERENTIAL
ously [1], [12], [13]. A different solution concept was
G AME
introduced by Stackelberg in 1934 [14]. In Stackelberg
differential games some decision makers are able to act Consider the input-affine dynamical system1
prior to the other players which then react in a rational
x = f (x) + g1 (x)u1 + g2 (x)u2 , (1)
manner, i.e. the players act in a specific order [1]. Thus,
unlike Nash equilibria, Stackelberg equilibria introduce where x(t) Rn , with n > 0, is the state of the system,
a hierarchy between the players. Some simple examples f (x) : Rn Rn , g1 (x) Rnm and g2 (x) Rnm ,
of the different types of games are the game of rock- with m > 0, are smooth mappings and u1 (t) Rm and
paper-scissor, where all players act simultaneously u2 (t) Rm are the control signals of player 1 and 2,
(Nash), and tic-tac-toe or chess, where the players act respectively.
in a specific order (Stackelberg). In some cases the order Assumption 1: The origin of Rn is an equilibrium
in which the players act is irrelevant: the Nash and point of the vector field f (x), i.e. f (0) = 0.
Stackelberg equilibria are equivalent. This is, however,
not generally the case. A consequence of Assumption 1 is that we can write
Different solution concepts for Stackelberg differ- f (x) = F (x)x, for some (not unique) matrix-valued
ential games exist, namely open-loop, closed-loop and function F (x) : Rn Rnn .
T. Mylvaganam is with the Department of Electrical and Elec- Suppose the game is such that the players announce
tronic Engineering, Imperial College London, London SW7 2AZ, UK their strategies in a predefined order. In particular
(Email: thulasi.mylvaganam06@imperial.ac.uk). suppose that player 1 acts before player 2: in this case
A. Astolfi is with the Department of Electrical and Electronic
Engineering, Imperial College London, London SW7 2AZ, UK and player 1 is referred to as the leader, whereas player 2 is
with the Dipartimento di Ingegneria Civile e Ingegneria Informatica,
Universita di Roma Tor Vergata, Via del Politecnico, 1 00133 Roma, 1 All mappings and functions are assumed to be sufficiently
Italy (Email: a.astolfi@ic.ac.uk). smooth.
is maximised by the selection = 1 and the resulting Anticipating this behaviour, the leader should select its
bound on is given in the following statement, which strategy according to
is assumed to hold in the remainder of the paper. u1 = arg min H1 (x, u1 , u2 , 1 )
u1
Assumption 2: The parameter is such that || < 12 .
1
(g1 (x) g2 (x))> 1 g2 (x)> 2 .
Remark 1: The range of in Assumption 2 is more =
1 22
restrictive than that in [19], where || < 12 . (6)
Remark 2: For the cost functionals to remain Let g (x) = g1 (x) g2 (x) and consider the PDEs
bounded it is necessary that J1 + J2 0 for some
R. This is guaranteed provided and are such V1 1 V1 V2 >
f (x) + q1 (x) g2 (x)g2 (x)>
that 2 (1 + )2 0. The range of allowable is x 2 x x
maximised by the selection = 1 and the resulting 1 1
V1
V2
2
bound is that given in Assumption 2. g (x) g2 (x)
= 0,
2 1 22 x x
In what follows feedback Stackelberg equilibria for the
differential game are considered. >
V2 1 1 V2 > V2
Definition 1: A pair of feedback strategies (u1 , u2 ) is f (x) + q2 (x) + g2 (x)g2 (x)
x 2 1 22 x x
said to be admissible if it renders the zero equilibrium
2
1 1 2 V2 V1
of the closed-loop system (locally) asymptotically sta-
) g (x) (x)
2 g
ble.2 2 (1 22 )2 x x
(1
Problem 1: Consider the system (1) and the cost func- >
1 V2 > V1
tionals of the leader and the follower, namely (2) and g1 (x) g (x) = 0,
1 22 x x
(3), respectively. The 2-player Stackelberg differential (7)
game consists in determining a pair of admissible with solutions Vi such that Vi (0) = 0 for i = 1, 2, and
feedback strategies, (u1 , u2 ) satisfying such that V1 (x) + V2 (x) > 0 for all x 6= 0. Then, the
Stackelberg equilibrium solution of the game defined
J1 (x(0), u1 , u2 (u1 )) J1 (x(0), u1 , u2 (u1 )) ,
(4) by the dynamics (1) and the cost functionals (2) and (3),
J2 (x(0), u1 , u2 (u1 )) J2 (x(0), u1 (t), u2 (u1 )) , with player 1 as leader and player 2 as follower, is
for all u1 6= u1 such that (u1 , u2 (u1 ) is admissible, and given by
!
u2 (u1 ) 6= u2 (u1 ), where u1 is any strategy such that 1 > V1
>
> V2
>
1
P2 A + A> P2 + Q2 + 2P2 B2 B2> P2 III. C ONSTRUCTING A PPROXIMATE S OLUTIONS TO
1 22 THE S TACKELBERG D IFFERENTIAL G AMES
1
P1 B B1> P2 P2 B1 >B P 1 (1 2 )P2 B2
(1 22 )2
P1 B )((1 2 )B2> P2 > B P 1 = 0, In general it is not possible to obtain closed-form
(9) solutions to the coupled PDEs (7) making it necessary
where B = (B1 B2 ). Suppose P1 = P1> and to settle for approximate solutions. In this section a
P2 = P2> , such that P1 + P2 > 0, solve (9). Then, the method for constructing a solution to a modified dif-
Stackelberg equilibrium solution is ferential game that approximates the solution of the
Stackelberg game defined by (1), (2) and (3) is intro-
1
ulq > >
1 = B P1 B2 P2 x ,
duced. The approximate solution relies on the notion
1 2 2
of algebraic P Stackelberg solution which is defined in
ulq lq
2 (u1 ) = B2> P2 x ulq
1 . the following. Similar methods for obtaining -Nash
equilibrium strategies have been discussed in [16][18].
Note that the conditions W = 12 x> (P1 + P2 )x > 0, for
all x 6= 0, and W < 0 for all x 6= 0, are implied by (9). Definition 4: Consider the system (1), the cost func-
It follows that the pair (ulq lq
1 , u2 ) is admissible.
tionals (2) and (3) and the resulting Stackelberg differ-
ential game with player 1 as leader and player 2 as fol-
Remark 4: In [19] a stochastic linear-quadratic differ-
lower. Let 1 (x) : Rn Rnn and 2 (x) : Rn Rnn
ential game with B1 = B2 = I has been considered.
be matrix-valued functions satisfying i (x) = i (x)> >
The above results are consistent with [19].
0 for all x Rn \ {0} and i (0) = i , for i = 1, 2.
Remark 5: When = 0, the feedback Stackelberg The continuous matrix-valued functions P1 (x) Rnn
equilibrium strategies and the feedback Nash equilib- and P2 (x) Rnn , such that P1 (x) = P1 (x)> and
rium strategies coincide, i.e. in this case the order in P2 (x) = P2 (x)> , are said to constitute a X -algebraic P
which the players act is irrelevant to the corresponding Stackelberg solution of the PDEs (7) if P1 (x)+P2 (x) > 0
outcomes [19], [20]. for all x X , where X Rn is a neighbourhood of the
To conclude this section define the notions of - origin, and if the following conditions are satisfied.
admissible strategies and -Stackelberg equilibrium
solutions, similarly to what has been done in [18]. i) For all x X
Definition 2: The pair of strategies (u1 , u2 (u1 )) is said
to be -admissible for the non-cooperative Stackelberg
differential game if the zero equilibrium of the system P1 (x)F (x) + F (x)> P1 (x) + Q1 (x)
(1) in closed-loop with (u1 , u2 (u1 )) is (locally) asymp- 1
2
totically stable and3 (Acl + I) C , where Acl is the (x) (x) P (x)g (x) (10)
1 g 2 2
1 22
P
3 The spectrum of the matrix A is denoted by (A). 2P1 (x)g2 (x)g2 (x)> P2 (x) + 1 (x) = 0 ,
422
and Problem 2: Consider the system (1) and the cost func-
> tionals of the leader and the follower, i.e. (2) and (3).
P2 (x)F (x) + F (x) P2 (x) + Q2 (x)
Solving the approximate dynamic Stackelberg differential
1
2 game consists in determining dynamic feedback strate-
2
)P (x)g (x) P (x) (x)
(1 22 )2
(1 2 2 1 g
of the form (13) with (t) Rn , and non-
gies (u1 , u2 , )
negative functions c1 : Rn Rn R and c2 : Rn Rn
2
+ P 2 (x)g2 (x)g 2 (x)>P 2 (x) R, such that, for any x(0), (0) and for any admissible
1 + 22 with u1 6= 1 , and (1 , u2 , ),
(u1 , 2 , ), with u2 6= 2 ,
1
2
P2 (x)g1 (x)g (x)> P1 (x) J1 (x(0), 1 , 2 (1 )) J1 (x(0), u1 , 2 (u1 )) ,
1 + 2
1 >
J2 (x(0), 1 , 2 (1 )) J2 (x(0), 1 , u2 (1 )) ,
P 1 (x) g (x)g 1 (x) P 2 (x) + 2 (x) = 0 .
1 + 22
(11) with the modified cost functionals J1 and J2 given by
g2 (x)> >
+ P 2 (x)x + (R 2 2 (x, ))(x ) , 0.05
1 22 0.1
> 0 0
u2 (u1 ) = g2 (x)> P2 (x)x + (R2 2 (x, ))(x ) 1 1
2 2
u1 ,
3 3
(18) x1,0 x2,0
4 4
yields admissible dynamic feedback strategies that
solve Problem 2. Finally, there exists a neighbourhood
of the origin in which the strategies (18) constitute an Fig. 1. The value of C1 (x0 ) for various initial conditions.
-Stackelberg equilibrium solution of Problem 1 for all
> 0.
424
J1 (ud1 , u2 (ud1 )) J1 (ul1 , u2 (ulq
1 )
C1 (x0 ) = , if J1 (u1 , u2 (u1 )) 6= 0
|J1 (u1 , u2 (u1 ))|
(22)
J2 (ud1 , u2 (ud1 )) J2 (ul1 , u2 (ulq
1 )
C2 (x0 ) = , if J2 (u1 , u2 (u1 )) 6= 0
|J2 (u1 , u2 (u1 ))|
425