
Linear-Quadratic McKean-Vlasov Stochastic Differential Games


Enzo MILLER†    Huyên PHAM‡

December 4, 2018

arXiv:1812.00632v1 [math.PR] 3 Dec 2018

Abstract
We consider a multi-player stochastic differential game with linear McKean-Vlasov dynamics
and a quadratic cost functional depending on the mean and variance of the state and of the
control actions of the players, in open-loop form. Finite and infinite horizon problems, with
possibly random coefficients as well as common noise, are addressed. We propose a simple direct
approach, based on a weak martingale optimality principle together with a fixed-point argument
in the space of controls, for solving this game problem. The Nash equilibria are characterized in
terms of systems of Riccati ordinary differential equations and linear mean-field backward
stochastic differential equations; existence and uniqueness conditions are provided for such
systems. Finally, we illustrate our results on a toy example.

MSC Classification: 49N10, 49L20, 91A13.

Key words: Mean-field SDEs, stochastic differential game, linear-quadratic, open-loop controls,
Nash equilibria, weak martingale optimality principle.

1 Introduction
1.1 General introduction - Motivation
The study of large populations of interacting individuals (agents, computers, firms) is a central issue
in many fields of science, and finds numerous relevant applications in economics/finance (systemic
risk with strongly interconnected financial entities), sociology (regulation of crowd motion, herding
behavior, social networks), physics, biology, and electrical engineering (telecommunications). Rationality
in the behavior of the population is a natural requirement, especially in social sciences, and is
addressed by including individual decisions, where each individual optimizes some criterion, e.g. an
investor maximizes her/his wealth, a firm chooses how much output to produce (goods, electricity,
etc.) or how much advertising to post for a large population. The criterion and optimal decision of each individual
depend on the others and affect the whole group, and one is then typically looking for an equilibrium
among the population where the dynamics of the system evolves endogenously as a consequence of
the optimal choices made by each individual. When the number of indistinguishable agents in the
population tends to infinity, and by considering cooperation between the agents, we are reduced in
the asymptotic formulation to a McKean-Vlasov (McKV) control problem where the dynamics and

This work is supported by FiME (Finance for Energy Market Research Centre) and the “Finance et Développement
Durable - Approches Quantitatives” EDF - CACIB Chair.

† LPSM, University Paris Diderot, enzo.miller at polytechnique.edu

‡ LPSM, University Paris Diderot and CREST-ENSAE, pham at lpsm.paris

the cost functional depend upon the law of the stochastic process. This corresponds to a Pareto
optimum where a social planner/influencer decides the strategies of each individual. The theory
of McKV control problems, also called mean-field type control, has generated recent advances in the
literature, either by the maximum principle [5] or by the dynamic programming approach [14]; see also
the recent books [3] and [6], and the references therein. Linear-quadratic (LQ) models provide
an important class of solvable applications, studied in many papers; see, e.g., [15], [11], [10], [2].
In this paper, we consider multi-player stochastic differential games for McKean-Vlasov dynamics.
This setting is motivated by the competitive interaction of multiple populations, each with a
large number of indistinguishable agents. In this context, we look for a Nash equilibrium
among the classes of populations. Such a problem, sometimes referred to as a mean-field-type game,
allows one to incorporate competition and heterogeneity in the population, and is a natural extension of
McKean-Vlasov (or mean-field-type) control by including multiple decision makers. It finds natural
applications in engineering, power systems, social sciences and cybersecurity, and has attracted
recent attention in the literature; see, e.g., [1], [7], [8], [4]. We focus more specifically on the case
of linear McKean-Vlasov dynamics and a quadratic cost functional for each player (social planner).
The linear-quadratic McKean-Vlasov stochastic differential game has been studied in [9] for a one-dimensional
state process, restricting to closed-loop controls. Here, we consider both finite
and infinite horizon problems in a multi-dimensional framework, with random coefficients for the
affine terms of the McKean-Vlasov dynamics and random coefficients for the linear terms of the cost
functional. Moreover, the controls of each player are in open-loop form. Our main contribution is a
simple and direct approach based on the weak martingale optimality principle developed in [2]
for McKean-Vlasov control problems, which we extend to the stochastic differential game, together
with a fixed-point argument in the space of open-loop controls, for finding a Nash equilibrium.
The key point is to find a suitable ansatz for determining the fixed point corresponding to the
Nash equilibria, which we characterize explicitly in terms of systems of Riccati ordinary differential
equations and linear mean-field backward stochastic differential equations; existence and uniqueness
conditions are provided for such systems.
The rest of this paper is organized as follows. We conclude Section 1 by formulating the Nash
equilibrium problem in the linear-quadratic McKean-Vlasov finite horizon framework, and by giving
some notations and assumptions. Section 2 presents the verification lemma based on the weak
submartingale optimality principle for finding a Nash equilibrium, and details each step of the method
to compute a Nash equilibrium. We give some extensions in Section 3 to the case of infinite horizon
and common noise. Finally, we illustrate our results in Section 4 on a toy example.

1.2 Problem formulation


Let T > 0 be a finite given horizon. Let (Ω, F, F, P) be a fixed filtered probability space where
F = (Ft )t∈[0,T ] is the natural filtration of a real Brownian motion W = (Wt )t∈[0,T ] . In this section,
for simplicity, we deal with the case of a single real-valued Brownian motion, and the case of multiple
Brownian motions will be addressed later in Section 3. We consider a multi-player game with n
players, and define the set of admissible controls for each player i ∈ J1, nK as:
\[
\mathcal{A}^i = \Big\{ \alpha_i : \Omega \times [0,T] \to \mathbb{R}^{d_i} \ \text{s.t.}\ \alpha_i \text{ is } \mathbb{F}\text{-adapted and } \int_0^T e^{-\rho t}\, \mathbb{E}\big[|\alpha_{i,t}|^2\big]\, dt < \infty \Big\},
\]

where $\rho$ is a nonnegative constant discount factor. We denote $\mathcal{A} = \mathcal{A}^1 \times \dots \times \mathcal{A}^n$, and for any $\alpha = (\alpha_1,\dots,\alpha_n) \in \mathcal{A}$ and $i \in \llbracket 1,n\rrbracket$, we set $\alpha^{-i} = (\alpha_1,\dots,\alpha_{i-1},\alpha_{i+1},\dots,\alpha_n) \in \mathcal{A}^{-i} = \mathcal{A}^1 \times \dots \times \mathcal{A}^{i-1} \times \mathcal{A}^{i+1} \times \dots \times \mathcal{A}^n$.

Given a square integrable measurable random variable X0 and control α = (α1 , ..., αn ) ∈ A, we
consider the controlled linear mean-field stochastic differential equation in Rd :
\[
\begin{cases}
dX_t = b(t, X_t^\alpha, \mathbb{E}[X_t^\alpha], \alpha_t, \mathbb{E}[\alpha_t])\, dt + \sigma(t, X_t^\alpha, \mathbb{E}[X_t^\alpha], \alpha_t, \mathbb{E}[\alpha_t])\, dW_t, \quad 0 \le t \le T, \\
X_0^\alpha = X_0,
\end{cases}
\tag{1}
\]

where for $t \in [0,T]$, $x, \bar x \in \mathbb{R}^d$, $a_i, \bar a_i \in \mathbb{R}^{d_i}$:

\[
\begin{cases}
b(t,x,\bar x,a,\bar a) = \beta_t + b_{x,t}\, x + \tilde b_{x,t}\, \bar x + \sum_{i=1}^n \big( b_{i,t}\, a_i + \tilde b_{i,t}\, \bar a_i \big)
= \beta_t + b_{x,t}\, x + \tilde b_{x,t}\,\bar x + B_t\, a + \tilde B_t\, \bar a, \\[2pt]
\sigma(t,x,\bar x,a,\bar a) = \gamma_t + \sigma_{x,t}\, x + \tilde\sigma_{x,t}\,\bar x + \sum_{i=1}^n \big( \sigma_{i,t}\, a_i + \tilde\sigma_{i,t}\,\bar a_i \big)
= \gamma_t + \sigma_{x,t}\, x + \tilde\sigma_{x,t}\,\bar x + \Sigma_t\, a + \tilde\Sigma_t\, \bar a.
\end{cases}
\tag{2}
\]

Here all the coefficients are deterministic matrix-valued processes, except $\beta$ and $\gamma$, which are vector-valued $\mathbb{F}$-progressively measurable processes.
The goal of each player i ∈ J1, nK during the game is to minimize her cost functional over αi ∈
Ai , given the actions α−i of the other players:
\[
J^i(\alpha_i, \alpha^{-i}) = \mathbb{E}\Big[\int_0^T e^{-\rho t} f^i(t, X_t^\alpha, \mathbb{E}[X_t^\alpha], \alpha_t, \mathbb{E}[\alpha_t])\, dt + g^i(X_T^\alpha, \mathbb{E}[X_T^\alpha])\Big],
\qquad
V^i(\alpha^{-i}) = \inf_{\alpha_i \in \mathcal{A}^i} J^i(\alpha_i, \alpha^{-i}),
\]

where for each t ∈ [0, T ], x, x ∈ Rd , ai , ai ∈ Rdi , we have set the running cost and terminal cost for
each player:



\[
\begin{cases}
f^i(t,x,\bar x,a,\bar a) = (x-\bar x)^\intercal Q_t^i (x-\bar x) + \bar x^\intercal\big[Q_t^i + \tilde Q_t^i\big]\bar x \\[2pt]
\qquad + \sum_{k=1}^n \big[ a_k^\intercal I_{k,t}^i (x-\bar x) + \bar a_k^\intercal (I_{k,t}^i + \tilde I_{k,t}^i)\,\bar x \big] \\[2pt]
\qquad + \sum_{k=1}^n \big[ (a_k-\bar a_k)^\intercal N_{k,t}^i (a_k-\bar a_k) + \bar a_k^\intercal (N_{k,t}^i + \tilde N_{k,t}^i)\,\bar a_k \big] \\[2pt]
\qquad + \sum_{1\le k\ne l\le n} \big[ (a_k-\bar a_k)^\intercal G_{k,l,t}^i (a_l-\bar a_l) + \bar a_k^\intercal (G_{k,l,t}^i + \tilde G_{k,l,t}^i)\,\bar a_l \big] \\[2pt]
\qquad + 2\big[ L_{x,t}^{i\intercal}\, x + \sum_{k=1}^n L_{k,t}^{i\intercal}\, a_k \big], \\[4pt]
g^i(x,\bar x) = (x-\bar x)^\intercal P^i (x-\bar x) + \bar x^\intercal(P^i + \tilde P^i)\,\bar x + 2\, r^{i\intercal} x.
\end{cases}
\tag{3}
\]

Here all the coefficients are deterministic matrix-valued processes, except $L_x^i$, $L_k^i$, $r^i$, which are vector-valued $\mathbb{F}$-progressively measurable processes, and $^\intercal$ denotes the transpose of a vector or matrix.
We say that $\alpha^* = (\alpha_1^*, \dots, \alpha_n^*) \in \mathcal{A}$ is a Nash equilibrium if for any $i \in \llbracket 1,n\rrbracket$,
\[
J^i(\alpha^*) \le J^i(\alpha_i, \alpha^{*,-i}), \quad \forall \alpha_i \in \mathcal{A}^i, \qquad \text{i.e., } J^i(\alpha^*) = V^i(\alpha^{*,-i}).
\]

As is well known, the search for a Nash equilibrium can be formulated as a fixed-point problem as
follows: first, each player $i$ computes her best response given the controls of the other players,
$\alpha_i^\star = BR^i(\alpha^{-i})$, where $BR^i$ is the best response function defined (when it exists) as
\[
BR^i : \mathcal{A}^{-i} \to \mathcal{A}^i, \qquad \alpha^{-i} \mapsto \operatorname*{argmin}_{\alpha \in \mathcal{A}^i} J^i(\alpha, \alpha^{-i}).
\]
Then, in order to ensure that $(\alpha_1^\star,\dots,\alpha_n^\star)$ is a Nash equilibrium, we have to check that this candidate
verifies the fixed-point equation $(\alpha_1^\star,\dots,\alpha_n^\star) = BR(\alpha_1^\star,\dots,\alpha_n^\star)$, where $BR := (BR^1,\dots,BR^n)$.
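This fixed-point viewpoint can be illustrated on a hypothetical two-player static quadratic game (the costs below are illustrative assumptions, not the McKean-Vlasov model of this paper): with $J^i(a_i,a_j) = \tfrac12 a_i^2 + \gamma\, a_i a_j - a_i$, each best response is $BR^i(a_j) = 1 - \gamma a_j$, and Picard iteration of the joint best-response map converges to the unique Nash equilibrium $a^* = 1/(1+\gamma)$ whenever $|\gamma| < 1$:

```python
import numpy as np

# Hypothetical symmetric two-player game:
#   J^i(a_i, a_j) = 0.5*a_i**2 + gamma*a_i*a_j - a_i,
# whose best response is BR^i(a_j) = argmin_{a_i} J^i = 1 - gamma*a_j.
gamma = 0.5

def best_response(a_other):
    return 1.0 - gamma * a_other

# Picard iteration on (a1, a2) -> (BR^1(a2), BR^2(a1)); a contraction since |gamma| < 1.
a = np.array([0.0, 0.0])
for _ in range(100):
    a = np.array([best_response(a[1]), best_response(a[0])])

print(a)  # both entries close to 1/(1+gamma) = 2/3
```

The same iterate-the-best-responses logic underlies the fixed-point search of Section 2, except that there the "action" is an entire open-loop control process.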

The main goal of this paper is to state a general martingale optimality principle for the search of
Nash equilibria and to apply it to the linear-quadratic case. We first obtain best response functions
(i.e. the optimal control of each agent given the controls of the others) of the following form:
\[
\alpha_{i,t} = -\big(S_{i,t}^i\big)^{-1} U_{i,t}^i \big(X_t - \mathbb{E}[X_t]\big) - \big(S_{i,t}^i\big)^{-1}\big(\xi_{i,t}^i - \bar\xi_{i,t}^i\big) - \big(\hat S_{i,t}^i\big)^{-1}\big(V_{i,t}^i\, \mathbb{E}[X_t] + O_{i,t}^i\big),
\]
where the coefficients in the right-hand side, defined in (5) and (6), depend on the actions $\alpha^{-i}$ of the other
players. We then proceed to a fixed-point search for the best response functions in order to exhibit a
Nash equilibrium, which is described in Theorem 2.3.

1.3 Notations and Assumptions


Given a normed space $(K, |\cdot|)$, and for $T \in \mathbb{R}_+^*$, we set:
\begin{align*}
L^\infty([0,T], K) &= \Big\{ \phi : [0,T] \to K \ \text{s.t.}\ \phi \text{ is measurable and } \sup_{t \in [0,T]} |\phi_t| < \infty \Big\} \\
L^2([0,T], K) &= \Big\{ \phi : [0,T] \to K \ \text{s.t.}\ \phi \text{ is measurable and } \int_0^T e^{-\rho u} |\phi_u|^2\, du < \infty \Big\} \\
L^2_{\mathcal{F}_T}(K) &= \Big\{ \phi : \Omega \to K \ \text{s.t.}\ \phi \text{ is } \mathcal{F}_T\text{-measurable and } \mathbb{E}\big[|\phi|^2\big] < \infty \Big\} \\
S^2_{\mathbb{F}}(\Omega \times [0,T], K) &= \Big\{ \phi : \Omega \times [0,T] \to K \ \text{s.t.}\ \phi \text{ is } \mathbb{F}\text{-adapted and } \mathbb{E}\big[\sup_{t \in [0,T]} |\phi_t|^2\big] < \infty \Big\} \\
L^2_{\mathbb{F}}(\Omega \times [0,T], K) &= \Big\{ \phi : \Omega \times [0,T] \to K \ \text{s.t.}\ \phi \text{ is } \mathbb{F}\text{-adapted and } \int_0^T e^{-\rho u}\, \mathbb{E}\big[|\phi_u|^2\big]\, du < \infty \Big\}.
\end{align*}

Note that when we tackle the infinite horizon case, we set $T = \infty$. To lighten the notation,
we sometimes write $X = X^\alpha$ when there is no ambiguity. If $C$ and $\tilde C$ are coefficients
of our model, either in the dynamics or in a cost function, we write $\hat C = C + \tilde C$. Given a random
variable $Z$ with a first moment, we denote $\bar Z = \mathbb{E}[Z]$. For $M \in \mathbb{R}^{n\times n}$ and $X \in \mathbb{R}^n$, we write
$M \cdot X^{\otimes 2} = X^\intercal M X \in \mathbb{R}$. We denote by $\mathbb{S}^d$ the set of symmetric $d \times d$ matrices and by $\mathbb{S}^d_+$ the subset
of non-negative symmetric matrices.
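The notation $M\cdot X^{\otimes 2} = X^\intercal M X$ is simply the quadratic form associated with $M$; a two-line NumPy check with arbitrary illustrative data:

```python
import numpy as np

# M . X^{otimes 2} = X^T M X is the usual (scalar-valued) quadratic form.
rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))   # illustrative matrix
X = rng.standard_normal(3)        # illustrative vector

quad = X @ M @ X                  # computes M . X^{otimes 2}
print(float(quad))
```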
Let us now detail here the assumptions on the coefficients.
(H1) The coefficients in the dynamics (2) satisfy:
a) β, γ ∈ L2F (Ω × [0, T ], Rd )
b) bx , b̃x , σx , σ̃x ∈ L∞ ([0, T ], Rd×d ); bi , b̃i , σi , σ̃i ∈ L∞ ([0, T ], Rd×di )

(H2) The coefficients of the cost functional (3) satisfy:


a) Qi , Q̃i ∈ L∞ ([0, T ], Sd+ ), P i , P̃ i ∈ Sd , Nki , Ñki ∈ L∞ ([0, T ], Sd+k ), Iki , I˜ki ∈ L∞ ([0, T ], Rdk ×d )
b) Lix ∈ L2F (Ω × [0, T ], Rd ), Lik ∈ L2F (Ω × [0, T ], Rdk ), ri ∈ L2FT (Rd )
c) $\exists \delta > 0$, $\forall t \in [0,T]$: $\quad N_{i,t}^i \ge \delta I_{d_i}, \qquad P^i \ge 0, \qquad Q_t^i - I_{i,t}^{i\intercal}\big(N_{i,t}^i\big)^{-1} I_{i,t}^i \ge 0$;

d) $\exists \delta > 0$, $\forall t \in [0,T]$: $\quad \hat N_{i,t}^i \ge \delta I_{d_i}, \qquad \hat P^i \ge 0, \qquad \hat Q_t^i - \hat I_{i,t}^{i\intercal}\big(\hat N_{i,t}^i\big)^{-1} \hat I_{i,t}^i \ge 0$.

Under the above conditions, we easily derive standard estimates on the mean-field SDE:

- By (H1) there exists a unique strong solution to the mean-field SDE (1), which satisfies
\[
\mathbb{E}\Big[\sup_{t \in [0,T]} |X_t^\alpha|^2\Big] \le C_\alpha \big(1 + \mathbb{E}[|X_0|^2]\big) < \infty, \tag{4}
\]
where $C_\alpha$ is a constant depending on $\alpha$ only through $\int_0^T e^{-\rho t}\,\mathbb{E}[|\alpha_t|^2]\, dt$.

- By (H2) and (4), $J^i(\alpha) \in \mathbb{R}$ for each $\alpha \in \mathcal{A}$,
which means that the optimization problem is well defined for each player.
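For intuition, the following minimal particle sketch simulates a scalar, uncontrolled instance of the dynamics (1), with $\mathbb{E}[X_t]$ replaced by the empirical mean of $N$ interacting particles; all coefficient values are illustrative assumptions:

```python
import numpy as np

# Euler scheme for dX_t = (b_x*X_t + bt_x*E[X_t]) dt + gamma dW_t,
# approximating E[X_t] by the empirical mean of N particles.
rng = np.random.default_rng(0)
b_x, bt_x, gamma = -1.0, 0.5, 0.2      # illustrative constants
N, T, n_steps = 100_000, 1.0, 200
dt = T / n_steps

X = np.full(N, 1.0)                    # X_0 = 1
for _ in range(n_steps):
    mean = X.mean()                    # particle approximation of E[X_t]
    X = X + (b_x * X + bt_x * mean) * dt \
          + gamma * np.sqrt(dt) * rng.standard_normal(N)

# The exact mean solves d E[X_t]/dt = (b_x + bt_x) E[X_t], so E[X_T] = e^{-0.5 T}.
print(X.mean(), np.exp((b_x + bt_x) * T))
```

The empirical mean matches the closed-form mean $e^{(b_x+\tilde b_x)T}$ up to Euler and Monte-Carlo errors, illustrating the moment bound (4).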

2 A weak submartingale optimality principle to compute a Nash equilibrium

2.1 A verification lemma
We first present the lemma on which the method is based.
Lemma 2.1 (Weak submartingale optimality principle). Suppose there exists a couple
$(\alpha^\star, (\mathcal{W}^{\cdot,i})_{i \in \llbracket1,n\rrbracket})$, where $\alpha^\star \in \mathcal{A}$ and $\mathcal{W}^{\cdot,i} = \{\mathcal{W}_t^{\alpha,i},\ t \in [0,T],\ \alpha \in \mathcal{A}\}$ is a family of adapted
processes indexed by $\mathcal{A}$ for each $i \in \llbracket1,n\rrbracket$, such that:

(i) For every $\alpha \in \mathcal{A}$, $\mathbb{E}[\mathcal{W}_0^{\alpha,i}]$ is independent of the control $\alpha_i \in \mathcal{A}^i$;

(ii) For every $\alpha \in \mathcal{A}$, $\mathbb{E}[\mathcal{W}_T^{\alpha,i}] = \mathbb{E}\big[g^i(X_T^\alpha, \mathbb{P}_{X_T^\alpha})\big]$;

(iii) For every $\alpha \in \mathcal{A}$, the map $t \in [0,T] \mapsto \mathbb{E}[\mathcal{S}_t^{\alpha,i}]$, with
$\mathcal{S}_t^{\alpha,i} = e^{-\rho t}\mathcal{W}_t^{\alpha,i} + \int_0^t e^{-\rho u} f^i(u, X_u^\alpha, \mathbb{P}_{X_u^\alpha}, \alpha_u, \mathbb{P}_{\alpha_u})\, du$, is well defined and non-decreasing;

(iv) The map $t \mapsto \mathbb{E}[\mathcal{S}_t^{\alpha^\star,i}]$ is constant for every $t \in [0,T]$.

Then $\alpha^\star$ is a Nash equilibrium and $J^i(\alpha^\star) = \mathbb{E}[\mathcal{W}_0^{\alpha^\star,i}]$. Moreover, any other Nash equilibrium $\tilde\alpha$
such that $\mathbb{E}[\mathcal{W}_0^{\tilde\alpha,i}] = \mathbb{E}[\mathcal{W}_0^{\alpha^\star,i}]$ and $J^i(\tilde\alpha) = J^i(\alpha^\star)$ for any $i \in \llbracket1,n\rrbracket$ satisfies condition (iv).
Proof. Fix $i \in \llbracket1,n\rrbracket$ and $\alpha_i \in \mathcal{A}^i$. From (ii) we immediately have $J^i(\alpha) = \mathbb{E}[\mathcal{S}_T^{\alpha,i}]$ for any $\alpha \in \mathcal{A}$.
Using successively (iv), (i) and (iii), we then have:
\[
J^i(\alpha^\star) = \mathbb{E}\big[\mathcal{S}_T^{\alpha^\star,i}\big] = \mathbb{E}\big[\mathcal{S}_0^{\alpha^\star,i}\big] = \mathbb{E}\big[\mathcal{W}_0^{\alpha^\star,i}\big]
= \mathbb{E}\big[\mathcal{W}_0^{(\alpha_i,\alpha^{\star,-i}),i}\big] = \mathbb{E}\big[\mathcal{S}_0^{(\alpha_i,\alpha^{\star,-i}),i}\big]
\le \mathbb{E}\big[\mathcal{S}_T^{(\alpha_i,\alpha^{\star,-i}),i}\big] = J^i(\alpha_i, \alpha^{\star,-i}),
\]
which proves that $\alpha^\star$ is a Nash equilibrium with $J^i(\alpha^\star) = \mathbb{E}[\mathcal{W}_0^{\alpha^\star,i}]$. Finally, suppose that
$\tilde\alpha \in \mathcal{A}$ is another Nash equilibrium such that $\mathbb{E}[\mathcal{W}_0^{\tilde\alpha,i}] = \mathbb{E}[\mathcal{W}_0^{\alpha^\star,i}]$ and $J^i(\tilde\alpha) = J^i(\alpha^\star)$ for any
$i \in \llbracket1,n\rrbracket$. Then, for $i \in \llbracket1,n\rrbracket$:
\[
\mathbb{E}\big[\mathcal{S}_0^{\tilde\alpha,i}\big] = \mathbb{E}\big[\mathcal{W}_0^{\tilde\alpha,i}\big] = \mathbb{E}\big[\mathcal{W}_0^{\alpha^\star,i}\big] = J^i(\alpha^\star) = J^i(\tilde\alpha) = \mathbb{E}\big[\mathcal{S}_T^{\tilde\alpha,i}\big].
\]
Since $t \mapsto \mathbb{E}[\mathcal{S}_t^{\tilde\alpha,i}]$ is non-decreasing for every $i \in \llbracket1,n\rrbracket$, the map is actually constant
and (iv) is verified.

2.2 The method and the solution
Let us now apply the optimality principle in Lemma 2.1 in order to find a Nash equilibrium. In the
linear-quadratic case the laws of the state and the controls intervene only through their expectations.
Thus we will use a simplified optimality principle where P is simply replaced by E in conditions (ii)
and (iii) of Lemma 2.1. The general procedure is the following:

Step 1. We guess a candidate for $\mathcal{W}^{\alpha,i}$. To do so we suppose that $\mathcal{W}_t^{\alpha,i} = w_t^i(X_t^\alpha, \mathbb{E}[X_t^\alpha])$ for
some parametric adapted random field $\{w_t^i(x,\bar x),\ t \in [0,T],\ x,\bar x \in \mathbb{R}^d\}$ of the form
$w_t^i(x,\bar x) = K_t^i\cdot(x-\bar x)^{\otimes 2} + \Lambda_t^i\cdot\bar x^{\otimes 2} + 2Y_t^{i\intercal}x + R_t^i$.

Step 2. We set $\mathcal{S}_t^{\alpha,i} = e^{-\rho t}w_t^i(X_t^\alpha,\mathbb{E}[X_t^\alpha]) + \int_0^t e^{-\rho u}f^i(u, X_u^\alpha, \mathbb{E}[X_u^\alpha], \alpha_u, \mathbb{E}[\alpha_u])\,du$ for $i \in
\llbracket1,n\rrbracket$ and $\alpha \in \mathcal{A}$. We then compute $\frac{d}{dt}\mathbb{E}\big[\mathcal{S}_t^{\alpha,i}\big] = e^{-\rho t}\mathbb{E}\big[\mathcal{D}_t^{\alpha,i}\big]$ (with Itô's formula), where the
drift $\mathcal{D}^{\alpha,i}$ satisfies:
\[
\mathbb{E}\big[\mathcal{D}_t^{\alpha,i}\big] = \mathbb{E}\Big[-\rho\, w_t^i(X_t^\alpha, \mathbb{E}[X_t^\alpha]) + \tfrac{d}{dt}\,\mathbb{E}\big[w_t^i(X_t^\alpha, \mathbb{E}[X_t^\alpha])\big] + f^i(t, X_t^\alpha, \mathbb{E}[X_t^\alpha], \alpha_t, \mathbb{E}[\alpha_t])\Big].
\]

Step 3. We then constrain the coefficients of the random field so that the conditions of Lemma
2.1 are satisfied. This leads to a system of backward ordinary and stochastic differential
equations for the coefficients of wi .

Step 4. At time t, given the state and the controls of the other players, we seek the action αi
cancelling the drift. We thus obtain the best response function of each player.

Step 5. We compute the fixed point of the best response functions in order to find an open
loop Nash equilibrium t 7→ α?t .

Step 6. We check the validity of our computations.

2.2.1 Step 1: guess the random fields


The process $t \mapsto \mathbb{E}\big[w_t^i(X_t^\alpha, \mathbb{E}[X_t^\alpha])\big]$ is meant to equal $\mathbb{E}\big[g^i(X_T^\alpha, \mathbb{E}[X_T^\alpha])\big]$ at time $T$, where
$g^i(x,\bar x) = P^i\cdot(x-\bar x)^{\otimes 2} + (P^i + \tilde P^i)\cdot\bar x^{\otimes 2} + 2r^{i\intercal}x$, with $(P^i, \tilde P^i, r^i) \in (\mathbb{S}^d)^2 \times L^2_{\mathcal{F}_T}(\mathbb{R}^d)$. It is then natural
to search for a field $w^i$ of the form $w_t^i(x,\bar x) = K_t^i\cdot(x-\bar x)^{\otimes 2} + \Lambda_t^i\cdot\bar x^{\otimes 2} + 2Y_t^{i\intercal}x + R_t^i$, with the processes
$(K^i, \Lambda^i, Y^i, R^i)$ in $L^\infty([0,T],\mathbb{S}^d_+)^2 \times S^2_{\mathbb{F}}(\Omega\times[0,T],\mathbb{R}^d) \times L^\infty([0,T],\mathbb{R})$ and solution to:
\[
\begin{cases}
dK_t^i = \dot K_t^i\, dt, & K_T^i = P^i \\
d\Lambda_t^i = \dot\Lambda_t^i\, dt, & \Lambda_T^i = P^i + \tilde P^i \\
dY_t^i = \dot Y_t^i\, dt + Z_t^i\, dW_t, \quad 0 \le t \le T, & Y_T^i = r^i \\
dR_t^i = \dot R_t^i\, dt, & R_T^i = 0,
\end{cases}
\]
where $(\dot K^i, \dot\Lambda^i, \dot R^i)$ are deterministic processes valued in $\mathbb{S}^d\times\mathbb{S}^d\times\mathbb{R}$ and $(\dot Y^i, Z^i)$ are adapted processes
valued in $\mathbb{R}^d$.

2.2.2 Step 2: derive their drifts


For $i \in \llbracket1,n\rrbracket$, $t \in [0,T]$ and $\alpha \in \mathcal{A}$, we set
\[
\mathcal{S}_t^{\alpha,i} := e^{-\rho t}\, w_t^i(X_t, \mathbb{E}[X_t]) + \int_0^t e^{-\rho u} f^i(u, X_u^\alpha, \mathbb{E}[X_u^\alpha], \alpha_u, \mathbb{E}[\alpha_u])\, du
\]
and then compute the drift of the deterministic function $t \mapsto \mathbb{E}[\mathcal{S}_t^{\alpha,i}]$:
\[
\frac{d\,\mathbb{E}[\mathcal{S}_t^{\alpha,i}]}{dt} = e^{-\rho t}\,\mathbb{E}\big[\mathcal{D}_t^{\alpha,i}\big]
= e^{-\rho t}\,\mathbb{E}\Big[(X_t - \bar X_t)^\intercal\big[\dot K_t^i + \Phi_t^i\big](X_t - \bar X_t) + \bar X_t^\intercal\big(\dot\Lambda_t^i + \Psi_t^i\big)\bar X_t + 2\big[\dot Y_t^i + \Delta_t^i\big]^\intercal X_t + \dot R_t^i + \Gamma_t^i + \chi_t^i(\alpha_{i,t})\Big],
\]

where we have defined
\[
\chi_t^i(\alpha_{i,t}) := (\alpha_{i,t} - \bar\alpha_{i,t})^\intercal S_{i,t}^i\,(\alpha_{i,t} - \bar\alpha_{i,t}) + \bar\alpha_{i,t}^\intercal \hat S_{i,t}^i\,\bar\alpha_{i,t}
+ 2\big[U_{i,t}^i(X_t - \bar X_t) + V_{i,t}^i\,\bar X_t + O_{i,t}^i + \xi_{i,t}^i - \bar\xi_{i,t}^i\big]^\intercal \alpha_{i,t},
\]

with the following coefficients:
\[
\begin{cases}
\Phi_t^i = Q_t^i + \sigma_{x,t}^\intercal K_t^i \sigma_{x,t} + K_t^i b_{x,t} + b_{x,t}^\intercal K_t^i - \rho K_t^i \\[2pt]
\Psi_t^i = \hat Q_t^i + \hat\sigma_{x,t}^\intercal K_t^i \hat\sigma_{x,t} + \Lambda_t^i \hat b_{x,t} + \hat b_{x,t}^\intercal \Lambda_t^i - \rho \Lambda_t^i \\[2pt]
\Delta_t^i = L_{x,t}^i + b_{x,t}^\intercal Y_t^i + \tilde b_{x,t}^\intercal \bar Y_t^i + \sigma_{x,t}^\intercal Z_t^i + \tilde\sigma_{x,t}^\intercal \bar Z_t^i + \Lambda_t^i \bar\beta_t
+ \sigma_{x,t}^\intercal K_t^i \gamma_t + \tilde\sigma_{x,t}^\intercal K_t^i \bar\gamma_t + K_t^i(\beta_t - \bar\beta_t) - \rho Y_t^i \\[2pt]
\qquad\;\; + \sum_{k\ne i}\big[ U_{k,t}^{i\intercal}(\alpha_{k,t} - \bar\alpha_{k,t}) + V_{k,t}^{i\intercal}\bar\alpha_{k,t} \big] \\[2pt]
\Gamma_t^i = \gamma_t^\intercal K_t^i \gamma_t + 2\beta_t^\intercal Y_t^i + 2\gamma_t^\intercal Z_t^i - \rho R_t^i \\[2pt]
\qquad\;\; + \sum_{k\ne i}\big[ (\alpha_{k,t} - \bar\alpha_{k,t})^\intercal S_{k,t}^i(\alpha_{k,t} - \bar\alpha_{k,t}) + \bar\alpha_{k,t}^\intercal \hat S_{k,t}^i \bar\alpha_{k,t} + 2\big[O_{k,t}^i + \xi_{k,t}^i - \bar\xi_{k,t}^i\big]^\intercal \alpha_{k,t} \big],
\end{cases}
\tag{5}
\]
and
\[
\begin{cases}
S_{k,t}^i = N_{k,t}^i + \sigma_{k,t}^\intercal K_t^i \sigma_{k,t} \\[2pt]
\hat S_{k,t}^i = \hat N_{k,t}^i + \hat\sigma_{k,t}^\intercal K_t^i \hat\sigma_{k,t} \\[2pt]
U_{k,t}^i = I_{k,t}^i + \sigma_{k,t}^\intercal K_t^i \sigma_{x,t} + b_{k,t}^\intercal K_t^i \\[2pt]
V_{k,t}^i = \hat I_{k,t}^i + \hat\sigma_{k,t}^\intercal K_t^i \hat\sigma_{x,t} + \hat b_{k,t}^\intercal \Lambda_t^i \\[2pt]
O_{k,t}^i = \bar L_{k,t}^i + \hat b_{k,t}^\intercal \bar Y_t^i + \hat\sigma_{k,t}^\intercal \bar Z_t^i + \hat\sigma_{k,t}^\intercal K_t^i \bar\gamma_t
+ \tfrac12 \sum_{l\ne i}\big(\hat J_{i,l,t}^i + \hat J_{l,i,t}^i\big)^\intercal \bar\alpha_{l,t} \\[2pt]
J_{k,l,t}^i = G_{k,l,t}^i + \sigma_{k,t}^\intercal K_t^i \sigma_{l,t} \\[2pt]
\hat J_{k,l,t}^i = \hat G_{k,l,t}^i + \hat\sigma_{k,t}^\intercal K_t^i \hat\sigma_{l,t} \\[2pt]
\xi_{k,t}^i = L_{k,t}^i + b_{k,t}^\intercal Y_t^i + \sigma_{k,t}^\intercal Z_t^i + \sigma_{k,t}^\intercal K_t^i \gamma_t
+ \tfrac12 \sum_{l\ne i}\big(J_{i,l,t}^i + J_{l,i,t}^i\big)^\intercal \alpha_{l,t}.
\end{cases}
\tag{6}
\]

2.2.3 Step 3: constrain their coefficients


Now that we have computed the drift, we need to constrain the coefficients so that $\mathcal{S}^{\alpha,i}$ satisfies
the conditions of Lemma 2.1. Let us assume for the moment that $S_{i,t}^i$ and $\hat S_{i,t}^i$ are positive definite
matrices (this will be ensured by the positive definiteness of $K^i$). This implies that there exists an
invertible matrix $\theta_t^i$ such that $\theta_t^i S_{i,t}^i \theta_t^{i\intercal} = \hat S_{i,t}^i$ for all $t \in [0,T]$. We can now rewrite the drift as "a
square in $\alpha_i$" plus "other terms not depending on $\alpha_i$", since we can form the following square:
\[
\mathbb{E}\big[\chi_t^i(\alpha_{i,t})\big] = \mathbb{E}\big[(\alpha_{i,t} - \bar\alpha_{i,t} + \theta_t^{i\intercal}\bar\alpha_{i,t} - \eta_t^i)^\intercal S_{i,t}^i\,(\alpha_{i,t} - \bar\alpha_{i,t} + \theta_t^{i\intercal}\bar\alpha_{i,t} - \eta_t^i) - \zeta_t^i\big]
\]

with
\[
\begin{cases}
\eta_t^i = a_t^{i,0}(X_t, \bar X_t) + \theta_t^{i\intercal} a_t^{i,1}(\bar X_t) \\[2pt]
a_t^{i,0}(x,\bar x) = -\big(S_{i,t}^i\big)^{-1} U_{i,t}^i (x - \bar x) - \big(S_{i,t}^i\big)^{-1}\big(\xi_{i,t}^i - \bar\xi_{i,t}^i\big) \\[2pt]
a_t^{i,1}(\bar x) = -\big(\hat S_{i,t}^i\big)^{-1}\big(V_{i,t}^i\,\bar x + O_{i,t}^i\big) \\[2pt]
\zeta_t^i = (X_t - \bar X_t)^\intercal U_{i,t}^{i\intercal}\big(S_{i,t}^i\big)^{-1} U_{i,t}^i (X_t - \bar X_t) + \bar X_t^\intercal V_{i,t}^{i\intercal}\big(\hat S_{i,t}^i\big)^{-1} V_{i,t}^i\,\bar X_t \\[2pt]
\qquad + 2\big(U_{i,t}^{i\intercal}\big(S_{i,t}^i\big)^{-1}\big(\xi_{i,t}^i - \bar\xi_{i,t}^i\big) + V_{i,t}^{i\intercal}\big(\hat S_{i,t}^i\big)^{-1} O_{i,t}^i\big)^\intercal X_t \\[2pt]
\qquad + \big(\xi_{i,t}^i - \bar\xi_{i,t}^i\big)^\intercal \big(S_{i,t}^i\big)^{-1}\big(\xi_{i,t}^i - \bar\xi_{i,t}^i\big) + O_{i,t}^{i\intercal}\big(\hat S_{i,t}^i\big)^{-1} O_{i,t}^i,
\end{cases}
\]
we can then rewrite the drift in the following form:
\[
\mathbb{E}\big[\mathcal{D}_t^{\alpha,i}\big] = \mathbb{E}\Big[(X_t - \bar X_t)^\intercal\big[\dot K_t^i + \Phi_t^{i0}\big](X_t - \bar X_t) + \bar X_t^\intercal\big(\dot\Lambda_t^i + \Psi_t^{i0}\big)\bar X_t + 2\big[\dot Y_t^i + \Delta_t^{i0}\big]^\intercal X_t
+ \dot R_t^i + \Gamma_t^{i0} + (\alpha_{i,t} - \bar\alpha_{i,t} + \theta_t^{i\intercal}\bar\alpha_{i,t} - \eta_t^i)^\intercal S_{i,t}^i\,(\alpha_{i,t} - \bar\alpha_{i,t} + \theta_t^{i\intercal}\bar\alpha_{i,t} - \eta_t^i)\Big],
\]

where
\[
\begin{cases}
\Phi_t^{i0} = \Phi_t^i - U_{i,t}^{i\intercal}\big(S_{i,t}^i\big)^{-1} U_{i,t}^i \\[2pt]
\Psi_t^{i0} = \Psi_t^i - V_{i,t}^{i\intercal}\big(\hat S_{i,t}^i\big)^{-1} V_{i,t}^i \\[2pt]
\Delta_t^{i0} = \Delta_t^i - U_{i,t}^{i\intercal}\big(S_{i,t}^i\big)^{-1}\big(\xi_{i,t}^i - \bar\xi_{i,t}^i\big) - V_{i,t}^{i\intercal}\big(\hat S_{i,t}^i\big)^{-1} O_{i,t}^i \\[2pt]
\Gamma_t^{i0} = \Gamma_t^i - \big(\xi_{i,t}^i - \bar\xi_{i,t}^i\big)^\intercal\big(S_{i,t}^i\big)^{-1}\big(\xi_{i,t}^i - \bar\xi_{i,t}^i\big) - O_{i,t}^{i\intercal}\big(\hat S_{i,t}^i\big)^{-1} O_{i,t}^i.
\end{cases}
\tag{7}
\]

We can finally constrain the coefficients. By choosing the coefficients $K^i$, $\Lambda^i$, $Y^i$ and $R^i$ so that
only the square remains, the drift for each player $i \in \llbracket1,n\rrbracket$ can be rewritten as a square only (in the
next step we verify that such coefficients can indeed be chosen). More precisely, we set $K^i$, $\Lambda^i$, $Y^i$
and $R^i$ as the solution of
\[
\begin{cases}
dK_t^i = -\Phi_t^{i0}\, dt, & K_T^i = P^i \\
d\Lambda_t^i = -\Psi_t^{i0}\, dt, & \Lambda_T^i = P^i + \tilde P^i \\
dY_t^i = -\Delta_t^{i0}\, dt + Z_t^i\, dW_t, & Y_T^i = r^i \\
dR_t^i = -\Gamma_t^{i0}\, dt, & R_T^i = 0,
\end{cases}
\tag{8}
\]
and stress the fact that $Y^i, Z^i, R^i$ depend on $\alpha^{-i}$, which appears in the coefficients $\Delta^{i0}$ and $\Gamma^{i0}$.
With such coefficients the drift now takes the form
\[
\mathbb{E}\big[\mathcal{D}_t^{\alpha,i}\big]
= \mathbb{E}\big[(\alpha_{i,t} - \bar\alpha_{i,t} + \theta_t^{i\intercal}\bar\alpha_{i,t} - \eta_t^i)^\intercal S_{i,t}^i\,(\alpha_{i,t} - \bar\alpha_{i,t} + \theta_t^{i\intercal}\bar\alpha_{i,t} - \eta_t^i)\big]
= \mathbb{E}\big[\big(\alpha_{i,t} - \bar\alpha_{i,t} - a_t^{i,0} + \theta_t^{i\intercal}(\bar\alpha_{i,t} - a_t^{i,1})\big)^\intercal S_{i,t}^i\,\big(\alpha_{i,t} - \bar\alpha_{i,t} - a_t^{i,0} + \theta_t^{i\intercal}(\bar\alpha_{i,t} - a_t^{i,1})\big)\big],
\]
and thus satisfies the nonnegativity constraint $\mathbb{E}[\mathcal{D}_t^{\alpha,i}] \ge 0$ for all $t \in [0,T]$, $i \in \llbracket1,n\rrbracket$, and $\alpha \in \mathcal{A}$.
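The completion of the square used above is the standard quadratic-minimization identity; in its simplest form, for a symmetric positive definite matrix $S$ and a vector $u$ of matching dimension,

```latex
a^\intercal S a + 2 u^\intercal a
  \;=\; (a + S^{-1}u)^\intercal S \,(a + S^{-1}u) \;-\; u^\intercal S^{-1} u,
\qquad
\operatorname*{argmin}_a \big( a^\intercal S a + 2 u^\intercal a \big) \;=\; -S^{-1} u .
```

Applied separately to the $\alpha_{i,t}-\bar\alpha_{i,t}$ part and the $\bar\alpha_{i,t}$ part of $\chi_t^i$ (with $S = S_{i,t}^i$ and $\hat S_{i,t}^i$ respectively), this is what produces $a^{i,0}$, $a^{i,1}$ and the compensating term $\zeta_t^i$.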

2.2.4 Step 4: find the best response functions


Proposition 2.2. Assume that for all $i \in \llbracket1,n\rrbracket$, $(K^i, \Lambda^i, Y^i, Z^i, R^i)$ is a solution of (8) given $\alpha^{-i} \in \mathcal{A}^{-i}$. Then the set of processes
\[
\alpha_{i,t} = a_t^{i,0}(X_t, \mathbb{E}[X_t]) + a_t^{i,1}(\mathbb{E}[X_t])
= -\big(S_{i,t}^i\big)^{-1} U_{i,t}^i\big(X_t - \mathbb{E}[X_t]\big) - \big(S_{i,t}^i\big)^{-1}\big(\xi_{i,t}^i - \bar\xi_{i,t}^i\big) - \big(\hat S_{i,t}^i\big)^{-1}\big(V_{i,t}^i\,\mathbb{E}[X_t] + O_{i,t}^i\big)
\tag{9}
\]
(depending on $\alpha^{-i}$), where $X$ is the state process with the feedback controls $\alpha = (\alpha_1,\dots,\alpha_n)$, are
best-response functions, i.e., $J^i(\alpha_i, \alpha^{-i}) = V^i(\alpha^{-i})$ for all $i \in \llbracket1,n\rrbracket$. Moreover we have
\[
V^i(\alpha^{-i}) = \mathbb{E}\big[\mathcal{W}_0^{\alpha,i}\big]
= \mathbb{E}\big[K_0^i\cdot(X_0 - \bar X_0)^{\otimes 2} + \Lambda_0^i\cdot\bar X_0^{\otimes 2} + 2Y_0^{i\intercal}X_0 + R_0^i\big].
\]

Proof. We check that the assumptions of Lemma 2.1 are satisfied. Since $\mathcal{W}^{\alpha,i}$ is of the form
$\mathcal{W}_t^{\alpha,i} = w_t^i(X_t^\alpha, \mathbb{E}[X_t^\alpha])$, condition (i) is verified. Condition (ii) is satisfied thanks to the terminal
conditions imposed on the system (8). Since $(K^i,\Lambda^i,Y^i,Z^i,R^i)$ is solution to (8), the drift of
$t \mapsto \mathbb{E}[\mathcal{S}_t^{\alpha,i}]$ is nonnegative for all $i \in \llbracket1,n\rrbracket$ and all $\alpha \in \mathcal{A}$, which implies condition (iii). Finally, for
$\alpha \in \mathcal{A}$, we see that $\mathbb{E}[\mathcal{D}_t^{\alpha,i}] \equiv 0$ for $t \in [0,T]$ and $i \in \llbracket1,n\rrbracket$ if and only if
\[
\alpha_{i,t} - \bar\alpha_{i,t} - a_t^{i,0}(X_t^\alpha, \mathbb{E}[X_t^\alpha]) + \theta_t^{i\intercal}\big(\bar\alpha_{i,t} - a_t^{i,1}(\mathbb{E}[X_t^\alpha])\big) = 0 \quad \text{a.s., } t \in [0,T].
\]
Since $\theta_t^i$ is invertible and $a_t^{i,0}(X_t^\alpha, \mathbb{E}[X_t^\alpha])$ has zero mean, we get $\bar\alpha_{i,t} = a_t^{i,1}$ by taking the expectation in the above formula. Thus
$\mathbb{E}[\mathcal{D}_t^{\alpha,i}] \equiv 0$ for every $i \in \llbracket1,n\rrbracket$ and $t \in [0,T]$ if and only if $\alpha_{i,t} = \bar\alpha_{i,t} + a_t^{i,0} = a_t^{i,0} + a_t^{i,1}$ for every
$i \in \llbracket1,n\rrbracket$ and $t \in [0,T]$. For such controls of the players, condition (iv) is satisfied. We now
check that $\alpha_i \in \mathcal{A}^i$ for every $i \in \llbracket1,n\rrbracket$ (i.e. it satisfies the square integrability condition). Since
$X$ is solution to a linear McKean-Vlasov dynamics and satisfies the square integrability condition
$\mathbb{E}[\sup_{0\le t\le T}|X_t|^2] < \infty$, it implies that $\alpha_i \in L^2_{\mathbb{F}}(\Omega\times[0,T],\mathbb{R}^{d_i})$, since $S_i^i, U_i^i, \hat S_i^i, V_i^i$ are bounded and
$(O_i^i, \xi_i^i) \in L^2([0,T],\mathbb{R}^{d_i}) \times L^2_{\mathbb{F}}(\Omega\times[0,T],\mathbb{R}^{d_i})$. Therefore $\alpha_i \in \mathcal{A}^i$ for every $i \in \llbracket1,n\rrbracket$.

2.2.5 Step 5: search for a fixed point


We now look for semi-explicit expressions of the optimal controls of each player. The difficulty is
that the controls of the other players appear in the best response function of each player
through the vectors $(Y^1, Z^1), \dots, (Y^n, Z^n)$. To solve this fixed-point problem, we first rewrite (9)
and the backward equations satisfied by $(Y,Z) = ((Y^1,Z^1),\dots,(Y^n,Z^n))$ in the following way (we
omit the time dependence of the coefficients to lighten the notation):
\[
\begin{cases}
\alpha_t^\star - \bar\alpha_t^\star = S_x(X_t - \bar X_t) + S_y(Y_t - \bar Y_t) + S_z(Z_t - \bar Z_t) + H - \bar H \\[2pt]
\bar\alpha_t^\star = \hat S_x\,\bar X_t + \hat S_y\,\bar Y_t + \hat S_z\,\bar Z_t + \hat H \\[2pt]
dY_t = \big[ P_y(Y_t - \bar Y_t) + P_z(Z_t - \bar Z_t) + P_\alpha(\alpha_t - \bar\alpha_t) + F - \bar F
+ \hat P_y\,\bar Y_t + \hat P_z\,\bar Z_t + \hat P_\alpha\,\bar\alpha_t + \hat F \big]\, dt + Z_t\, dW_t,
\end{cases}
\tag{10}
\]

where we define
\[
\begin{cases}
S = \big((S_i^i)^{-1}\,1_{i=j}\big)_{i,j\in\llbracket1,n\rrbracket}, \qquad \hat S = \big((\hat S_i^i)^{-1}\,1_{i=j}\big)_{i,j\in\llbracket1,n\rrbracket} \\[2pt]
J = \big(\tfrac12(J_{ij}^i + J_{ji}^i)\,1_{i\ne j}\big)_{i,j\in\llbracket1,n\rrbracket}, \qquad \hat J = \big(\tfrac12(\hat J_{ij}^i + \hat J_{ji}^i)\,1_{i\ne j}\big)_{i,j\in\llbracket1,n\rrbracket} \\[2pt]
\mathcal{J} = -\big(I_d + S J\big)^{-1} S, \qquad \hat{\mathcal{J}} = -\big(I_d + \hat S \hat J\big)^{-1}\hat S \\[2pt]
S_x = \mathcal{J}\,\big(U_i^i\big)_{i\in\llbracket1,n\rrbracket}, \qquad \hat S_x = \hat{\mathcal{J}}\,\big(V_i^i\big)_{i\in\llbracket1,n\rrbracket} \\[2pt]
S_y = \mathcal{J}\,\big(1_{i=j}\, b_i^\intercal\big)_{i,j\in\llbracket1,n\rrbracket}, \qquad \hat S_y = \hat{\mathcal{J}}\,\big(1_{i=j}\,\hat b_i^\intercal\big)_{i,j\in\llbracket1,n\rrbracket} \\[2pt]
S_z = \mathcal{J}\,\big(\sigma_i^\intercal\big)_{i\in\llbracket1,n\rrbracket}, \qquad \hat S_z = \hat{\mathcal{J}}\,\big(\hat\sigma_i^\intercal\big)_{i\in\llbracket1,n\rrbracket} \\[2pt]
H = \mathcal{J}\,\big(L_i^i + \sigma_i^\intercal K^i \gamma\big)_{i\in\llbracket1,n\rrbracket}, \qquad \hat H = \hat{\mathcal{J}}\,\big(L_i^i + \hat\sigma_i^\intercal K^i \gamma\big)_{i\in\llbracket1,n\rrbracket} \\[2pt]
P_y = \big(1_{i=j}\,(U_i^i(S_i^i)^{-1} b_i^\intercal - b_x^\intercal + \rho)\big)_{i,j\in\llbracket1,n\rrbracket}, \qquad \hat P_y = \big(1_{i=j}\,(V_i^i(\hat S_i^i)^{-1}\hat b_i^\intercal - \hat b_x^\intercal + \rho)\big)_{i,j\in\llbracket1,n\rrbracket} \\[2pt]
P_z = \big(1_{i=j}\,(U_i^i(S_i^i)^{-1}\sigma_i^\intercal - \sigma_x^\intercal)\big)_{i,j\in\llbracket1,n\rrbracket}, \qquad \hat P_z = \big(1_{i=j}\,(V_i^i(\hat S_i^i)^{-1}\hat\sigma_i^\intercal - \hat\sigma_x^\intercal)\big)_{i,j\in\llbracket1,n\rrbracket} \\[2pt]
P_\alpha = -\big(1_{i\ne j}\,(U_j^{i\intercal} + U_i^i(S_i^i)^{-1}(J_{ij}^i + J_{ji}^i))\big)_{i,j\in\llbracket1,n\rrbracket}, \qquad \hat P_\alpha = -\big(1_{i\ne j}\,(V_j^{i\intercal} + V_i^i(\hat S_i^i)^{-1}(\hat J_{ij}^i + \hat J_{ji}^i))\big)_{i,j\in\llbracket1,n\rrbracket} \\[2pt]
F = \big(K^i\beta + \sigma_x^\intercal K^i\gamma\big)_{i\in\llbracket1,n\rrbracket}, \qquad \hat F = \big(U_i^i(S_i^i)^{-1}(L_i^i + \sigma_i^\intercal K^i\gamma) - L_x - \sigma_x^\intercal K^i\gamma - K^i\beta\big)_{i\in\llbracket1,n\rrbracket}.
\end{cases}
\tag{11}
\]

Now, the strategy is to propose an ansatz for $t \in [0,T] \mapsto Y_t$ in the form
\[
Y_t = \pi_t(X_t - \bar X_t) + \hat\pi_t\,\bar X_t + \eta_t \tag{12}
\]
where $(\pi, \hat\pi, \eta) \in L^\infty([0,T],\mathbb{R}^{nd\times d}) \times L^\infty([0,T],\mathbb{R}^{nd\times d}) \times S^2_{\mathbb{F}}(\Omega\times[0,T],\mathbb{R}^{nd})$ satisfy
\[
\begin{cases}
d\eta_t = \psi_t\, dt + \phi_t\, dW_t, & \eta_T = r = (r^i)_{i\in\llbracket1,n\rrbracket} \\
d\pi_t = \dot\pi_t\, dt, & \pi_T = 0 \\
d\hat\pi_t = \dot{\hat\pi}_t\, dt, & \hat\pi_T = 0.
\end{cases}
\]

By applying Itô's formula to the ansatz we then obtain
\[
\begin{aligned}
dY_t ={}& \dot\pi_t(X_t - \bar X_t)\, dt + \pi_t\, d(X_t - \bar X_t) + \dot{\hat\pi}_t\,\bar X_t\, dt + \hat\pi_t\, d\bar X_t + \psi_t\, dt + \phi_t\, dW_t \\
={}& \big[\dot\pi_t(X_t - \bar X_t) + \psi_t - \bar\psi_t + \pi_t\big(\beta - \bar\beta + b_x(X_t - \bar X_t) + B(\alpha_t - \bar\alpha_t)\big)\big]\, dt \\
&+ \big[\dot{\hat\pi}_t\,\bar X_t + \bar\psi_t + \hat\pi_t\big(\bar\beta + \hat b_x\,\bar X_t + \hat B\,\bar\alpha_t\big)\big]\, dt \\
&+ \big[\phi_t + \pi_t\big(\gamma + \sigma_x X_t + \tilde\sigma_x\,\bar X_t + \Sigma\,\alpha_t + \tilde\Sigma\,\bar\alpha_t\big)\big]\, dW_t.
\end{aligned}
\]
By comparing the two Itô decompositions of $Y$, we get
\[
\begin{cases}
P_y(Y_t - \bar Y_t) + P_z(Z_t - \bar Z_t) + P_\alpha(\alpha_t - \bar\alpha_t) + F - \bar F
= \dot\pi_t(X_t - \bar X_t) + \psi_t - \bar\psi_t + \pi_t\big(\beta - \bar\beta + b_x(X_t - \bar X_t) + B(\alpha_t - \bar\alpha_t)\big) \\[2pt]
\hat P_y\,\bar Y_t + \hat P_z\,\bar Z_t + \hat P_\alpha\,\bar\alpha_t + \hat F
= \dot{\hat\pi}_t\,\bar X_t + \bar\psi_t + \hat\pi_t\big(\bar\beta + \hat b_x\,\bar X_t + \hat B\,\bar\alpha_t\big) \\[2pt]
Z_t = \phi_t + \pi_t\big(\gamma + \sigma_x X_t + \tilde\sigma_x\,\bar X_t + \Sigma\,\alpha_t + \tilde\Sigma\,\bar\alpha_t\big).
\end{cases}
\tag{13}
\]

We now substitute $Y$ by its ansatz in the best response equation (10), and obtain the system
\[
\begin{cases}
(I_d - S_z\pi_t\Sigma)(\alpha_t^\star - \bar\alpha_t^\star) = (S_x + S_y\pi_t + S_z\pi_t\sigma_x)(X_t - \bar X_t)
+ H - \bar H + S_y(\eta_t - \bar\eta_t) + S_z\big(\phi_t - \bar\phi_t + \pi_t(\gamma - \bar\gamma)\big) \\[2pt]
(I_d - \hat S_z\pi_t\hat\Sigma)\,\bar\alpha_t^\star = (\hat S_x + \hat S_y\hat\pi_t + \hat S_z\pi_t\hat\sigma_x)\,\bar X_t + \hat H + \hat S_y\,\bar\eta_t + \hat S_z(\bar\phi_t + \pi_t\bar\gamma).
\end{cases}
\tag{14}
\]

To lighten the next computations we rewrite (14) as
\[
\begin{cases}
\alpha_t^\star - \bar\alpha_t^\star = A_x(X_t - \bar X_t) + R_t - \bar R_t \\[2pt]
\bar\alpha_t^\star = \hat A_x\,\bar X_t + \hat R_t,
\end{cases}
\qquad\text{where}\qquad
\begin{cases}
A_x := (I_d - S_z\pi_t\Sigma)^{-1}(S_x + S_y\pi_t + S_z\pi_t\sigma_x) \\[2pt]
\hat A_x := (I_d - \hat S_z\pi_t\hat\Sigma)^{-1}(\hat S_x + \hat S_y\hat\pi_t + \hat S_z\pi_t\hat\sigma_x) \\[2pt]
R_t := (I_d - S_z\pi_t\Sigma)^{-1}\big(H + S_y\eta_t + S_z(\phi_t + \pi_t\gamma)\big) \\[2pt]
\hat R_t := (I_d - \hat S_z\pi_t\hat\Sigma)^{-1}\big(\hat H + \hat S_y\bar\eta_t + \hat S_z(\bar\phi_t + \pi_t\bar\gamma)\big).
\end{cases}
\tag{15}
\]

By injecting (14) into (13) we have
\[
\begin{cases}
0 = \big[\dot\pi_t + \pi_t b_x - P_y\pi_t - P_z\pi_t(\sigma_x + \Sigma A_x) - (P_\alpha - \pi_t B)A_x\big](X_t - \bar X_t) \\[2pt]
\quad + \psi_t - \bar\psi_t + \pi_t(\beta - \bar\beta) - (P_\alpha - \pi_t B)(R_t - \bar R_t) - (F - \bar F) \\[2pt]
\quad - P_z\big(\phi_t - \bar\phi_t + \pi_t(\gamma - \bar\gamma + \Sigma(R_t - \bar R_t))\big) - P_y(\eta_t - \bar\eta_t) \\[4pt]
0 = \big[\dot{\hat\pi}_t + \hat\pi_t\hat b_x - \hat P_y\hat\pi_t - \hat P_z\pi_t(\hat\sigma_x + \hat\Sigma\hat A_x) - (\hat P_\alpha - \hat\pi_t\hat B)\hat A_x\big]\bar X_t \\[2pt]
\quad + \bar\psi_t + \hat\pi_t\bar\beta - (\hat P_\alpha - \hat\pi_t\hat B)\hat R_t - \hat F - \hat P_z\big(\bar\phi_t + \pi_t(\bar\gamma + \hat\Sigma\hat R_t)\big) - \hat P_y\bar\eta_t.
\end{cases}
\]

Thus we constrain the coefficients $(\pi, \hat\pi, \psi, \phi)$ of the ansatz of $Y$ to satisfy
\[
\begin{cases}
\dot\pi_t = -\pi_t b_x + P_y\pi_t + P_z\pi_t\sigma_x + (P_\alpha - \pi_t B + P_z\pi_t\Sigma)(I_d - S_z\pi_t\Sigma)^{-1}(S_x + S_y\pi_t + S_z\pi_t\sigma_x), & \pi_T = 0 \\[2pt]
\dot{\hat\pi}_t = -\hat\pi_t\hat b_x + \hat P_y\hat\pi_t + \hat P_z\pi_t\hat\sigma_x + (\hat P_\alpha - \hat\pi_t\hat B + \hat P_z\pi_t\hat\Sigma)(I_d - \hat S_z\pi_t\hat\Sigma)^{-1}(\hat S_x + \hat S_y\hat\pi_t + \hat S_z\pi_t\hat\sigma_x), & \hat\pi_T = 0 \\[2pt]
d\eta_t = \psi_t\, dt + \phi_t\, dW_t, & \eta_T = r,
\end{cases}
\tag{16}
\]
where
\[
\begin{cases}
\psi_t - \bar\psi_t = -\pi_t(\beta - \bar\beta) + (P_\alpha - \pi_t B)(R_t - \bar R_t) + (F - \bar F)
+ P_z\big(\phi_t - \bar\phi_t + \pi_t(\gamma - \bar\gamma + \Sigma(R_t - \bar R_t))\big) + P_y(\eta_t - \bar\eta_t) \\[2pt]
\bar\psi_t = -\hat\pi_t\bar\beta + (\hat P_\alpha - \hat\pi_t\hat B)\hat R_t + \hat F + \hat P_z\big(\bar\phi_t + \pi_t(\bar\gamma + \hat\Sigma\hat R_t)\big) + \hat P_y\bar\eta_t \\[2pt]
R_t := (I_d - S_z\pi_t\Sigma)^{-1}\big(H + S_y\eta_t + S_z(\phi_t + \pi_t\gamma)\big) \\[2pt]
\hat R_t := (I_d - \hat S_z\pi_t\hat\Sigma)^{-1}\big(\hat H + \hat S_y\bar\eta_t + \hat S_z(\bar\phi_t + \pi_t\bar\gamma)\big).
\end{cases}
\]

We now have a feedback form for $(Y,Z) = ((Y^1,Z^1),\dots,(Y^n,Z^n))$. We can inject it into the best
response functions $\alpha^\star$ to obtain the optimal controls in feedback form, and then inject the latter
into the state equation to obtain an explicit expression of $t \mapsto X_t^\star$.

2.2.6 Step 6: check the validity
Let us now check the existence and uniqueness of $t \mapsto (K_t^i, \Lambda_t^i, (Y_t^i, Z_t^i), R_t^i, \pi_t, \hat\pi_t, (\eta_t, \phi_t))$, where
$K^i \in L^\infty([0,T],\mathbb{S}_+^d)$, $\Lambda^i \in L^\infty([0,T],\mathbb{S}_+^d)$, $(Y^i, Z^i) \in S^2_{\mathbb{F}}(\Omega\times[0,T],\mathbb{R}^d)\times L^2_{\mathbb{F}}(\Omega\times[0,T],\mathbb{R}^d)$, $R^i \in
L^\infty([0,T],\mathbb{R})$, $\pi, \hat\pi \in L^\infty([0,T],\mathbb{R}^{nd\times d})$ and $(\eta,\phi) \in S^2_{\mathbb{F}}(\Omega\times[0,T],\mathbb{R}^{nd})\times L^2_{\mathbb{F}}(\Omega\times[0,T],\mathbb{R}^{nd})$, under
assumptions (H1)-(H2). We recall that $t \mapsto (K_t^i,\Lambda_t^i,(Y_t^i,Z_t^i),R_t^i)$ and $t\mapsto(\pi_t,\hat\pi_t,(\eta_t,\phi_t))$ are
solutions respectively to (8) and (16). Fix $i \in \llbracket1,n\rrbracket$.
(i) We first consider the coefficients $K^i$, which satisfy the Riccati equations:
\[
\begin{cases}
\dot K_t^i + Q_t^i + \sigma_{x,t}^\intercal K_t^i\sigma_{x,t} + K_t^i b_{x,t} + b_{x,t}^\intercal K_t^i - \rho K_t^i \\[2pt]
\quad - \big(I_{i,t}^i + \sigma_{i,t}^\intercal K_t^i\sigma_{x,t} + b_{i,t}^\intercal K_t^i\big)^\intercal\big(N_{i,t}^i + \sigma_{i,t}^\intercal K_t^i\sigma_{i,t}\big)^{-1}\big(I_{i,t}^i + \sigma_{i,t}^\intercal K_t^i\sigma_{x,t} + b_{i,t}^\intercal K_t^i\big) = 0 \\[2pt]
K_T^i = P^i.
\end{cases}
\]
By standard results in control theory (see [16, Ch. 6, Thm. 7.2]), under (H1) and (H2) there
exists a unique solution $K^i \in L^\infty([0,T],\mathbb{S}_+^d)$.
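As a sanity check, the scalar version of this Riccati equation (with $\rho = 0$, $d = d_i = 1$, and hypothetical constant coefficients) is easy to integrate backward from the terminal condition:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Scalar Riccati ODE of step (i), illustrative constants, rho = 0:
#   Kdot + Q + sx^2 K + 2 bx K - (I0 + si*sx*K + bi*K)^2 / (Nc + si^2 K) = 0,  K_T = P.
Q, P, Nc, I0 = 1.0, 1.0, 1.0, 0.0
bx, bi, sx, si = -0.5, 1.0, 0.3, 0.2
T = 1.0

def riccati_rhs(t, K):
    U = I0 + si * sx * K + bi * K
    return -(Q + sx**2 * K + 2 * bx * K - U**2 / (Nc + si**2 * K))

# solve_ivp accepts a decreasing time span, so we integrate from T down to 0.
sol = solve_ivp(riccati_rhs, (T, 0.0), [P], rtol=1e-8, atol=1e-10)
K0 = sol.y[0, -1]
print(K0)  # stays strictly positive, as guaranteed under (H1)-(H2)
```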
(ii) Given $K^i$, let us now consider the $\Lambda^i$'s. They also satisfy Riccati equations:
\[
\begin{cases}
\dot\Lambda_t^i + \hat Q_t^{iK} + \Lambda_t^i\hat b_{x,t} + \hat b_{x,t}^\intercal\Lambda_t^i - \rho\Lambda_t^i
- \big(\hat I_t^{iK} + \hat b_{i,t}^\intercal\Lambda_t^i\big)^\intercal\big(\hat N_t^{iK}\big)^{-1}\big(\hat I_t^{iK} + \hat b_{i,t}^\intercal\Lambda_t^i\big) = 0 \\[2pt]
\Lambda_T^i = \hat P^i,
\end{cases}
\]
where we define
\[
\hat Q_t^{iK} := \hat Q_t^i + \hat\sigma_{x,t}^\intercal K_t^i\hat\sigma_{x,t}, \qquad
\hat I_t^{iK} := \hat I_{i,t}^i + \hat\sigma_{i,t}^\intercal K_t^i\hat\sigma_{x,t}, \qquad
\hat N_t^{iK} := \hat N_{i,t}^i + \hat\sigma_{i,t}^\intercal K_t^i\hat\sigma_{i,t}.
\]
We use the same arguments as for $K^i$. The only missing ingredient to conclude the existence
and uniqueness of $\Lambda^i$ is the condition $\hat Q^{iK} - (\hat I^{iK})^\intercal(\hat N^{iK})^{-1}\hat I^{iK} \ge 0$. As in [2], some
algebraic calculations show that it is implied by the hypothesis $\hat Q^i - (\hat I^i)^\intercal(\hat N^i)^{-1}\hat I^i \ge 0$
made in (H2).
(iii) Given $(K^i, \Lambda^i)$, we now consider the equation for $(Y^i, Z^i)$, which is a linear mean-field BSDE
of the form
\[
\begin{cases}
dY_t^i = \big[\mathbf{V}_t^i + \mathbf{G}_t^i\big(Y_t^i - \mathbb{E}[Y_t^i]\big) + \hat{\mathbf{G}}_t^i\,\mathbb{E}[Y_t^i] + \mathbf{J}_t^i\big(Z_t^i - \mathbb{E}[Z_t^i]\big) + \hat{\mathbf{J}}_t^i\,\mathbb{E}[Z_t^i]\big]\, dt + Z_t^i\, dW_t \\[2pt]
Y_T^i = r^i,
\end{cases}
\tag{17}
\]
where the deterministic coefficients $\mathbf{G}^i, \hat{\mathbf{G}}^i, \mathbf{J}^i, \hat{\mathbf{J}}^i \in L^\infty([0,T],\mathbb{R}^{d\times d})$ and the stochastic process
$\mathbf{V}^i \in L^2_{\mathbb{F}}(\Omega\times[0,T],\mathbb{R}^d)$ are defined as:
\[
\begin{cases}
\mathbf{V}_t^i := -L_{x,t}^i - \Lambda_t^i\bar\beta_t - \sigma_{x,t}^\intercal K_t^i\gamma_t - \tilde\sigma_{x,t}^\intercal K_t^i\bar\gamma_t - K_t^i(\beta_t - \bar\beta_t)
- \sum_{k\ne i}\big[U_{k,t}^{i\intercal}(\alpha_{k,t} - \bar\alpha_{k,t}) + V_{k,t}^{i\intercal}\bar\alpha_{k,t}\big] \\[2pt]
\qquad + U_{i,t}^{i\intercal}\big(S_{i,t}^i\big)^{-1}\Big(L_{i,t}^i - \mathbb{E}[L_{i,t}^i] + \sigma_{i,t}^\intercal K_t^i\big(\gamma_t - \mathbb{E}[\gamma_t]\big)
+ \tfrac12\sum_{k\ne i}\big(J_{i,k,t}^i + J_{k,i,t}^i\big)\big(\alpha_{k,t} - \mathbb{E}[\alpha_{k,t}]\big)\Big) \\[2pt]
\qquad + V_{i,t}^{i\intercal}\big(\hat S_{i,t}^i\big)^{-1}\Big(\mathbb{E}[L_{i,t}^i] + \hat\sigma_{i,t}^\intercal K_t^i\,\mathbb{E}[\gamma_t]
+ \tfrac12\sum_{k\ne i}\big(\hat J_{i,k,t}^i + \hat J_{k,i,t}^i\big)\,\mathbb{E}[\alpha_{k,t}]\Big) \\[2pt]
\mathbf{G}_t^i := \rho I_d - b_{x,t}^\intercal + U_{i,t}^{i\intercal}\big(S_{i,t}^i\big)^{-1}b_{i,t}^\intercal \\[2pt]
\hat{\mathbf{G}}_t^i := \rho I_d - \hat b_{x,t}^\intercal + V_{i,t}^{i\intercal}\big(\hat S_{i,t}^i\big)^{-1}\hat b_{i,t}^\intercal \\[2pt]
\mathbf{J}_t^i := -\sigma_{x,t}^\intercal + U_{i,t}^{i\intercal}\big(S_{i,t}^i\big)^{-1}\sigma_{i,t}^\intercal \\[2pt]
\hat{\mathbf{J}}_t^i := -\hat\sigma_{x,t}^\intercal + V_{i,t}^{i\intercal}\big(\hat S_{i,t}^i\big)^{-1}\hat\sigma_{i,t}^\intercal.
\end{cases}
\tag{18}
\]
By standard results (see Thm. 2.1 in [12]) there exists a unique solution
$(Y^i, Z^i) \in S^2_{\mathbb{F}}(\Omega\times[0,T],\mathbb{R}^d)\times L^2_{\mathbb{F}}(\Omega\times[0,T],\mathbb{R}^d)$ to (17).

(iv) Given $(K^i,\Lambda^i,Y^i,Z^i)$, we consider the equation for $R^i$, which is a linear ODE whose solution is
given by
\[
R_t^i = \int_t^T e^{-\rho(s-t)}\, h_s^i\, ds,
\]
where $h^i$ is the deterministic function defined by
\[
h_t^i = \mathbb{E}\Big[\gamma_t^\intercal K_t^i\gamma_t + 2\beta_t^\intercal Y_t^i + 2\gamma_t^\intercal Z_t^i
+ \sum_{k\ne i}\Big((\alpha_{k,t} - \bar\alpha_{k,t})^\intercal S_{k,t}^i(\alpha_{k,t} - \bar\alpha_{k,t}) + \bar\alpha_{k,t}^\intercal\hat S_{k,t}^i\bar\alpha_{k,t}
+ 2\big[O_{k,t}^i + \xi_{k,t}^i - \bar\xi_{k,t}^i\big]^\intercal\alpha_{k,t}\Big)\Big]
- \mathbb{E}\Big[\big(\xi_{i,t}^i - \bar\xi_{i,t}^i\big)^\intercal\big(S_{i,t}^i\big)^{-1}\big(\xi_{i,t}^i - \bar\xi_{i,t}^i\big) + O_{i,t}^{i\intercal}\big(\hat S_{i,t}^i\big)^{-1}O_{i,t}^i\Big],
\tag{19}
\]
and the expressions of the coefficients are recalled in (6).
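A quick numerical check, with a hypothetical scalar integrand $h$, confirms that this integral formula solves the linear ODE $\dot R_t = \rho R_t - h_t$ with terminal condition $R_T = 0$:

```python
import numpy as np

# R_t = int_t^T exp(-rho (s - t)) h_s ds, checked against Rdot = rho R - h.
rho, T = 0.5, 2.0
h = lambda s: np.cos(s)                   # illustrative choice of h

def R(t, n=4001):
    s = np.linspace(t, T, n)
    w = np.exp(-rho * (s - t)) * h(s)
    return ((w[:-1] + w[1:]) / 2 * np.diff(s)).sum()   # trapezoid rule

t, eps = 1.0, 1e-5
Rdot = (R(t + eps) - R(t - eps)) / (2 * eps)           # central difference
print(Rdot, rho * R(t) - h(t))                         # the two values agree
```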

(v) The final step is to verify that the procedure to find a fixed point is valid; more precisely, we
need to ensure that $t \mapsto (\pi_t, \hat\pi_t, \eta_t)$ is well defined. It is difficult to ensure the well-posedness of
$t \mapsto (\pi_t, \hat\pi_t)$ for two reasons: first, $\pi$ and $\hat\pi$ satisfy Riccati equations but are not square
matrices; second, $\pi$ appears in the equation satisfied by $\hat\pi$. We are not aware of
any work addressing this kind of equation in a general setting.
If we suppose $t \mapsto (\pi_t, \hat\pi_t)$ well defined, then $t \mapsto (\eta_t, \phi_t)$ satisfies a linear mean-field BSDE of
the type
\[
\begin{cases}
d\eta_t = \big[\mathbf{V}_t + \mathbf{G}_t\big(\eta_t - \mathbb{E}[\eta_t]\big) + \hat{\mathbf{G}}_t\,\mathbb{E}[\eta_t] + \mathbf{J}_t\big(\phi_t - \mathbb{E}[\phi_t]\big) + \hat{\mathbf{J}}_t\,\mathbb{E}[\phi_t]\big]\, dt + \phi_t\, dW_t \\[2pt]
\eta_T = r,
\end{cases}
\tag{20}
\]
where the deterministic coefficients $\mathbf{G}, \hat{\mathbf{G}}, \mathbf{J}, \hat{\mathbf{J}} \in L^\infty([0,T],\mathbb{R}^{nd\times nd})$ and the stochastic process
$\mathbf{V} \in L^2_{\mathbb{F}}(\Omega\times[0,T],\mathbb{R}^{nd})$ are defined as:
\[
\begin{cases}
\mathbf{V}_t := -\pi_t(\beta - \bar\beta) + F - \bar F + P_z\pi_t(\gamma - \bar\gamma)
+ (P_\alpha - \pi_t B + P_z\pi_t\Sigma)(I_d - S_z\pi_t\Sigma)^{-1}\big(H - \bar H + S_z\pi_t(\gamma - \bar\gamma)\big) \\[2pt]
\qquad - \hat\pi_t\bar\beta + \hat F + \hat P_z\pi_t\bar\gamma
+ (\hat P_\alpha - \hat\pi_t\hat B + \hat P_z\pi_t\hat\Sigma)(I_d - \hat S_z\pi_t\hat\Sigma)^{-1}\big(\hat H + \hat S_z\pi_t\bar\gamma\big) \\[2pt]
\mathbf{G}_t := P_y + (P_\alpha - \pi_t B + P_z\pi_t\Sigma)(I_d - S_z\pi_t\Sigma)^{-1}S_y \\[2pt]
\hat{\mathbf{G}}_t := \hat P_y + (\hat P_\alpha - \hat\pi_t\hat B + \hat P_z\pi_t\hat\Sigma)(I_d - \hat S_z\pi_t\hat\Sigma)^{-1}\hat S_y \\[2pt]
\mathbf{J}_t := P_z + (P_\alpha - \pi_t B + P_z\pi_t\Sigma)(I_d - S_z\pi_t\Sigma)^{-1}S_z \\[2pt]
\hat{\mathbf{J}}_t := \hat P_z + (\hat P_\alpha - \hat\pi_t\hat B + \hat P_z\pi_t\hat\Sigma)(I_d - \hat S_z\pi_t\hat\Sigma)^{-1}\hat S_z.
\end{cases}
\]
Again, by standard results (see Thm. 2.1 in [12]) there exists a unique solution
$(\eta, \phi) \in S^2_{\mathbb{F}}(\Omega\times[0,T],\mathbb{R}^{nd})\times L^2_{\mathbb{F}}(\Omega\times[0,T],\mathbb{R}^{nd})$ to (20).
To sum up the arguments previously presented, our main result provides the following charac-
terization of the Nash equilibrium:
Theorem 2.3. Suppose assumptions (H1) and (H2) hold, and that the system associated with
the fixed point search (16) is well defined. Then α⋆ = (α⋆_1, ..., α⋆_n) defined by

α⋆_t − ᾱ⋆_t = S_{x,t}(X⋆_t − X̄⋆_t) + S_{y,t}(Y_t − Ȳ_t) + S_{z,t}(Z_t − Z̄_t) + H_t − H̄_t
ᾱ⋆_t = Ŝ_{x,t} X̄⋆_t + Ŝ_{y,t} Ȳ_t + Ŝ_{z,t} Z̄_t + Ĥ_t,

where S_x, S_y, S_z, H, Ŝ_x, Ŝ_y, Ŝ_z, Ĥ are defined in (11) and (Y, Z) ∈ S²_F(Ω × [0, T], R^{nd×d}) × L²_F(Ω × [0, T], R^{nd×d}) satisfy (12) and (16), is a Nash equilibrium.

3 Some extensions
3.1 The case of infinite horizon
Let us now tackle the infinite horizon case. The method is similar to the finite-horizon one, but some adaptations are needed for the well-posedness of (K^i, Λ^i, Y^i, R^i) and the admissibility of the controls.
We redefine the set of admissible controls for each player i ∈ J1, nK as:

A^i = { α : Ω × R+ → R^{d_i}  s.t.  α is F-adapted and ∫_0^∞ e^{−ρu} E[|α_u|²] du < ∞ },
while the controlled state defined on R+ now follows a dynamics of the form
(
dXt = b(t, Xtα , E [Xtα ] , αt , E [αt ])dt + σ(t, Xtα , E [Xtα ] , αt , E [αt ])dWt
(21)
X0α = X0

where for each t ∈ R+, x, x̄ ∈ R^d, a_i, ā_i ∈ R^{d_i}:

b(t, x, x̄, a, ā) = β_t + b_x x + b̃_x x̄ + Σ_{i=1}^n (b_i a_i + b̃_i ā_i) = β_t + b_x x + b̃_x x̄ + B a + B̃ ā
σ(t, x, x̄, a, ā) = γ_t + σ_x x + σ̃_x x̄ + Σ_{i=1}^n (σ_i a_i + σ̃_i ā_i) = γ_t + σ_x x + σ̃_x x̄ + Σ a + Σ̃ ā.        (22)

Notice that now only the coefficients β and γ are allowed to be stochastic processes. The other
linear coefficients are constant matrices.
The goal of each player i ∈ J1, nK during the game is still to minimize her cost functional with
respect to control αi over Ai , and given control α−i of the other players:
J^i(α_i, α^{−i}) = E[ ∫_0^∞ e^{−ρt} f^i(t, X^α_t, E[X^α_t], α_t, E[α_t]) dt ],            (23)

where for each t ∈ R+, x, x̄ ∈ R^d, a_i, ā_i ∈ R^{d_i} we have set the running cost of each player as:

f^i(t, x, x̄, a, ā) = (x − x̄)^⊺ Q^i (x − x̄) + x̄^⊺ [Q^i + Q̃^i] x̄
                  + Σ_{k=1}^n [ a^⊺_k I^i_k (x − x̄) + ā^⊺_k (I^i_k + Ĩ^i_k) x̄ ]
                  + Σ_{k=1}^n [ (a_k − ā_k)^⊺ N^i_k (a_k − ā_k) + ā^⊺_k (N^i_k + Ñ^i_k) ā_k ]
                  + Σ_{0≤k≠l≤n} [ (a_k − ā_k)^⊺ G^i_{k,l} (a_l − ā_l) + ā^⊺_k (G^i_{k,l} + G̃^i_{k,l}) ā_l ]
                  + 2[ L^{i⊺}_{x,t} x + Σ_{k=1}^n L^{i⊺}_{k,t} a_k ].

Note that the only coefficients that we allow to be time dependent are Lix and Lik for k ∈ J1, nK
which may be stochastic processes.

3.1.1 Assumptions
We detail below the new assumptions:
(H1’) The coefficients in (22) satisfy:
a) β, γ ∈ L²_F(Ω × R+, R^d)
b) b_x, b̃_x, σ_x, σ̃_x ∈ R^{d×d}; b_i, b̃_i, σ_i, σ̃_i ∈ R^{d×d_i}
(H2’) The coefficients of the cost functional (3) satisfy:
a) Q^i, Q̃^i ∈ S^d_+; N^i_k, Ñ^i_k ∈ S^{d_k}_+; I^i_k, Ĩ^i_k ∈ R^{d_k×d}
b) L^i_x ∈ L²_F(Ω × R⋆_+, R^d), L^i_k ∈ L²_F(Ω × R⋆_+, R^{d_k})
c) N^i_k > 0 and Q^i − I^{i⊺}_i (N^i_i)^{−1} I^i_i ≥ 0
d) N̂^i_k > 0 and Q̂^i − Î^{i⊺}_i (N̂^i_i)^{−1} Î^i_i ≥ 0
(H3’) ρ > 2(|b_x| + |b̃_x|) + 8(|σ_x|² + |σ̃_x|²)
As shown below, the new hypothesis (H3’) ensures the well-posedness of our problem. Notice
first that by (H1’) and classical results, there exists a unique strong solution X^α to the SDE (21).
Furthermore, by (H1’) and (H3’) we obtain, by arguments similar to those in [2], the following estimate:

∫_0^∞ e^{−ρu} E[|X^α_u|²] du ≤ C_α (1 + E[|X_0|²]) < ∞,        (24)

in which C_α is a constant depending on α = (α_1, ..., α_n) only through ∫_0^∞ e^{−ρu} E[|α_u|²] du. Finally, by
(H2’) and (24), the minimization problem (23) is well defined for each player.
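Estimate (24) can be checked numerically on a scalar special case. The sketch below uses illustrative coefficients only (no mean-field terms, so b̃_x = σ̃_x = 0 in (H3’)) and a simple admissible feedback control; the discounted second moment stays finite, as (24) predicts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo sanity check of estimate (24) on a scalar special case of (21):
# dX = (bx*X + b*alpha) dt + sx*X dW with the admissible feedback alpha = -X.
# All values are illustrative; rho is chosen to satisfy (H3').
bx, b, sx = 0.3, 1.0, 0.4
rho = 2 * abs(bx) + 8 * sx**2 + 0.5

def discounted_second_moment(T=20.0, n_steps=4000, n_paths=2000):
    dt = T / n_steps
    X = np.ones(n_paths)                # X_0 = 1
    integral = 0.0
    for k in range(n_steps):
        integral += np.exp(-rho * k * dt) * np.mean(X**2) * dt
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)
        X = X + (bx * X - b * X) * dt + sx * X * dW
    return integral

I = discounted_second_moment()
print(I)   # finite, in line with (24)
```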

3.1.2 A weak submartingale optimality principle on infinite horizon


We now give an easy adaptation of the weak submartingale optimality principle to the case of infinite
horizon.

Lemma 3.1 (Weak submartingale optimality principle). Suppose there exists a couple
(α⋆, (W^{·,i})_{i∈J1,nK}), where α⋆ ∈ A and W^{·,i} = {W^{α,i}_t, t ∈ R+, α ∈ A} is a family of adapted processes
indexed by A for each i ∈ J1, nK, such that:

(i) For every α ∈ A, E[W^{α,i}_0] is independent of the control α_i ∈ A^i;

(ii) For every α ∈ A, lim_{t→∞} e^{−ρt} E[W^{α,i}_t] = 0;

(iii) For every α ∈ A, the map t ∈ R⋆_+ 7→ E[S^{α,i}_t], with
S^{α,i}_t = e^{−ρt} W^{α,i}_t + ∫_0^t e^{−ρu} f^i(u, X^α_u, P_{X^α_u}, α_u, P_{α_u}) du, is well defined and non-decreasing;

(iv) The map t 7→ E[S^{α⋆,i}_t] is constant for every t ∈ R⋆_+.

Then α⋆ is a Nash equilibrium and J^i(α⋆) = E[W^{α⋆,i}_0]. Moreover, any other Nash equilibrium α̃
such that E[W^{α̃,i}_0] = E[W^{α⋆,i}_0] and J^i(α̃) = J^i(α⋆) for any i ∈ J1, nK satisfies the condition (iv).

Proof. The proof is exactly the same as in Lemma 2.1.

Let us now describe the steps to follow in order to apply Lemma 3.1. Since they are similar to
the ones in the finite-horizon case, we only report the main changes.
Steps 1-3
For each player i ∈ J1, nK we still search for a random field {w^i_t(x, x̄), t ∈ R+, x, x̄ ∈ R^d} of the
form w^i_t(x, x̄) = K^i_t.(x − x̄)^⊗2 + Λ^i_t.x̄^⊗2 + 2Y^{i⊺}_t x + R^i_t, for which the optimality principle in Lemma
3.1 now leads to the system:

dK^i_t = −Φ^{i0}_t dt
dΛ^i_t = −Ψ^{i0}_t dt
dY^i_t = −∆^{i0}_t dt + Z^i_t dW_t            (25)
dR^i_t = −Γ^{i0}_t dt,   t ≥ 0.

Notice that there are no terminal conditions anymore since we are in the infinite horizon case. The
coefficients Φ^{i0}_t, Ψ^{i0}_t, ∆^{i0}_t, Γ^{i0}_t are defined in (7). The fourth step is exactly the same as in the finite
horizon case.
horizon case.
Step 5
We now search for a fixed point of the best response functions. Let us define Y = (Y^1, ..., Y^n)
and propose an ansatz for Y in feedback form: Y_t = π(X_t − E[X_t]) + π̂ E[X_t] + η_t, where π, π̂ ∈
L∞(R+, R^{nd×d}) and η ∈ L²_F(Ω × R+, R^{nd}) satisfy:

0 = −π b_x + P_y π + P_z π σ_x + (P_α + P_z πΣ)(I_d − S_z πΣ)^{−1}(S_x + S_y π + S_z π σ_x)
    − πB(I_d − S_z πΣ)^{−1}(S_x + S_y π + S_z π σ_x)
0 = −π̂ b̂_x + P̂_y π̂ + P̂_z π σ̂_x + (P̂_α + P̂_z πΣ̂)(I_d − Ŝ_z πΣ̂)^{−1}(Ŝ_x + Ŝ_y π̂ + Ŝ_z π σ̂_x)
    − π̂ B̂(I_d − Ŝ_z πΣ̂)^{−1}(Ŝ_x + Ŝ_y π̂ + Ŝ_z π σ̂_x)
dη_t = ψ_t dt + φ_t dW_t

with:                                                                        (26)

ψ_t − ψ̄_t = −π(β_t − β̄_t) + (P_α − πB)(R_t − R̄_t) + (F_t − F̄_t)
           + P_z(φ_t − φ̄_t + π(γ_t − γ̄_t + Σ(R_t − R̄_t))) + P_y(η_t − η̄_t)
ψ̄_t = −π̂ β̄_t + (P̂_α − π̂ B̂)R̂_t + F̂_t + P̂_z(φ̄_t + π(γ̄_t + Σ̂ R̂_t)) + P̂_y η̄_t
R_t := (I_d − S_z πΣ)^{−1}(H_t + S_y η_t + S_z(φ_t + π γ_t))
R̂_t := (I_d − Ŝ_z πΣ̂)^{−1}(Ĥ_t + Ŝ_y η̄_t + Ŝ_z(φ̄_t + π γ̄_t)),

where the coefficients are defined in (11).


Step 6
We finally tackle the well-posedness of (25) and (26).
(i) We first consider the ODE for K^i. Since the map (t, k) 7→ Φ^{i0}_t(k) does not depend on time (all
the coefficients being constant), we search for a constant non-negative matrix K^i ∈ S^d satisfying
Φ^{i0}(K^i) = 0, more precisely a solution to:

Q^i + σ_x^⊺ K^i σ_x + K^i b_x + b_x^⊺ K^i − ρK^i
 − (I^i_k + σ_k^⊺ K^i σ_x + b_k^⊺ K^i)^⊺ (N^i_k + σ_k^⊺ K^i σ_k)^{−1} (I^i_k + σ_k^⊺ K^i σ_x + b_k^⊺ K^i) = 0.      (27)

As in [2], a limit argument shows that there exists K^i ∈ S^d_+ solution to (27). The
argument for Λ^i is the same as for K^i.
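In the scalar case d = d_i = 1, the stationary Riccati equation (27) can be solved numerically, for instance by bisection on its residual. The coefficients below are made-up stand-ins for illustration, not taken from a specific game:

```python
# Scalar (d = d_i = 1) illustration of the stationary Riccati equation (27),
# solved by bisection on its residual. All coefficients are made-up stand-ins.
Q, bx, sx, b1, s1, I1, N1, rho = 1.0, 0.2, 0.3, 1.0, 0.1, 0.0, 1.0, 1.5

def riccati_residual(K):
    """Left-hand side of (27) in the scalar case."""
    gain = I1 + s1 * K * sx + b1 * K
    return Q + sx**2 * K + 2 * bx * K - rho * K - gain**2 / (N1 + s1**2 * K)

lo, hi = 0.0, 10.0                      # bracket: the residual changes sign here
assert riccati_residual(lo) > 0 > riccati_residual(hi)
for _ in range(100):                    # bisection on the continuous residual
    mid = 0.5 * (lo + hi)
    if riccati_residual(mid) > 0:
        lo = mid
    else:
        hi = mid
K = 0.5 * (lo + hi)
print(K, riccati_residual(K))           # nonnegative root of (27), residual ≈ 0
```

Bisection only needs a sign change of the continuous residual, which mirrors the limit argument used for existence of K^i.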
(ii) Given (K^i, Λ^i), the equation for (Y^i, Z^i) is a linear mean-field BSDE on infinite horizon:

dY^i_t = [ V^i_t + G^i(Y^i_t − E[Y^i_t]) + Ĝ^i E[Y^i_t] + J^i(Z^i_t − E[Z^i_t]) + Ĵ^i E[Z^i_t] ] dt + Z^i_t dW_t,

where the coefficients are defined in (18). Notice that now G^i, Ĝ^i, J^i, Ĵ^i are all constant matrices.
To the best of our knowledge, there is no general result ensuring existence for such an
equation. We therefore add the following assumption:

(H4’) There exists a solution (Y^i, Z^i) ∈ S²_F(Ω × R+, R^d) × L²_F(Ω × R+, R^d).


(iii) Given (K^i, Λ^i, Y^i, Z^i), the equation for R^i is a linear ODE whose solution is:

    R^i_t = ∫_t^∞ e^{−ρ(s−t)} h^i_s ds,

where h^i is a deterministic function defined in (19).

(iv) We now study the well-posedness of the fixed point procedure. More precisely, we need to
ensure that (π, π̂, η), defined as a solution of the system (26), recalled below, is well defined.
Note that in the infinite horizon framework we search for constant π and π̂.

0 = −π b_x + P_y π + P_z π σ_x + (P_α + P_z πΣ)(I_d − S_z πΣ)^{−1}(S_x + S_y π + S_z π σ_x)
    − πB(I_d − S_z πΣ)^{−1}(S_x + S_y π + S_z π σ_x)
0 = −π̂ b̂_x + P̂_y π̂ + P̂_z π σ̂_x + (P̂_α + P̂_z πΣ̂)(I_d − Ŝ_z πΣ̂)^{−1}(Ŝ_x + Ŝ_y π̂ + Ŝ_z π σ̂_x)      (28)
    − π̂ B̂(I_d − Ŝ_z πΣ̂)^{−1}(Ŝ_x + Ŝ_y π̂ + Ŝ_z π σ̂_x)
dη_t = ψ_t dt + φ_t dW_t,   t ≥ 0.

Existence of (π, π̂) in whole generality is a difficult problem. Let us first rewrite the system
(28) as:

F((π, π̂), C) = 0,

where C = (b_x, σ_x, B, Σ, b̂_x, σ̂_x, B̂, Σ̂, S_x, S_y, S_z, Ŝ_x, Ŝ_y, Ŝ_z, P_y, P_z, P_α, P̂_y, P̂_z, P̂_α). Note that F
is continuously differentiable on its domain of definition. Thus, if (∂F/∂π, ∂F/∂π̂)(π, π̂, C) is invertible,
then, by the implicit function theorem, there exists an open set U containing C and a
continuously differentiable function g : U → (R^{nd×d})² such that F(g(C), C) = 0 for all admissible
coefficients C ∈ U, and the solution (π, π̂) = g(C) is unique. It means that if we find a
solution to (28) at which (∂F/∂π, ∂F/∂π̂) is invertible, then for small perturbations of the
coefficients we still have solutions for (π, π̂).
Let us now give sufficient conditions ensuring the existence of (π, π̂) in a simplified setting
where the state X is real-valued and all the players are symmetric, in the sense that the
coefficients associated with each player are equal (b_1 = ... = b_n, Q^1 = ... = Q^n, etc.). We
also suppose that the volatility is not controlled, i.e. σ_i = 0 for all i ∈ J1, nK. In this case
π, π̂ ∈ R^{n×1}, π_1 = ... = π_n, π̂_1 = ... = π̂_n, and the system (28) of coupled equations
reduces to two coupled quadratic equations:

π_1² [n b_1 S_{y,1}] + π_1 [ −b_x + P_{y,1} + P_{z,1}σ_x + Σ_{j≠1} P_{α,1,j} S_{y,1} − n b_1 S_{x,1} ] + Σ_{j≠1} P_{α,1,j} S_{x,j} = 0
π̂_1² [n b̂_1 Ŝ_{y,1}] + π̂_1 [ −b̂_x + P̂_{y,1} + Σ_{j≠1} P̂_{α,1,j} Ŝ_{y,1} − n b̂_1 Ŝ_{x,1} ] + Σ_{j≠1} P̂_{α,1,j} Ŝ_{x,j} + π_1 P̂_{z,1} σ̂_x = 0.

If we set:

a := n b_1 S_{y,1}
b := −b_x + P_{y,1} + P_{z,1}σ_x + Σ_{j≠1} P_{α,1,j} S_{y,1} − n b_1 S_{x,1}
c := Σ_{j≠1} P_{α,1,j} S_{x,j},

then a sufficient condition for π_1 to exist is simply b² − 4ac ≥ 0. Since a ≤ 0 and c ≥ 0, we
have a priori two possible values for π_1. We choose the positive one to ensure that α ∈ A. Then
if we set:

â := n b̂_1 Ŝ_{y,1}
b̂ := −b̂_x + P̂_{y,1} + Σ_{j≠1} P̂_{α,1,j} Ŝ_{y,1} − n b̂_1 Ŝ_{x,1}
ĉ(π_1) := Σ_{j≠1} P̂_{α,1,j} Ŝ_{x,j} + π_1 P̂_{z,1} σ̂_x,

a sufficient condition for π̂_1 to exist is b̂² − 4â ĉ(π_1) ≥ 0. To ensure that there is a positive
solution we also need ĉ(π_1) ≥ 0.
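Once the scalar coefficients a, b, c (and â, b̂, ĉ(π_1)) are formed, the two quadratics are solved explicitly. A small sketch with made-up values illustrating the discriminant check and the choice of the positive root:

```python
import math

# Solving the two quadratics above for (pi1, pi1_hat) with made-up scalar
# coefficients, illustrating the discriminant condition and the positive root.
def positive_root(a, b, c):
    """Larger real root of a x^2 + b x + c = 0 (requires b^2 - 4ac >= 0)."""
    disc = b * b - 4 * a * c
    assert disc >= 0, "sufficient condition b^2 - 4ac >= 0 fails"
    r1 = (-b + math.sqrt(disc)) / (2 * a)
    r2 = (-b - math.sqrt(disc)) / (2 * a)
    return max(r1, r2)

# Stand-ins: a = n b1 Sy,1 <= 0 and c = sum_j P_alpha,1,j Sx,j >= 0,
# so the discriminant is automatically nonnegative.
a, b, c = -2.0, 1.5, 0.5
pi1 = positive_root(a, b, c)

# The second quadratic's constant term depends on pi1 through c_hat(pi1).
a_h, b_h = -1.0, 2.0
c_h = 0.3 + pi1 * 0.2 * 0.5       # stand-in for c_hat(pi1), kept >= 0
pi1_hat = positive_root(a_h, b_h, c_h)
print(pi1, pi1_hat)               # both positive, as required for admissibility
```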

(v) Let us finally verify that α⋆ ∈ A. Consider the candidate optimal control of each player:

α⋆_t − ᾱ⋆_t = A_x(X⋆_t − X̄⋆_t) + R_t − R̄_t
ᾱ⋆_t = Â_x X̄⋆_t + R̂_t,

where the coefficients are defined in (15) and X⋆ is the optimally controlled state process.
Since R, R̂ ∈ L²_F(Ω × R+, R^{Σ_i d_i}) and the coefficients are constant in the infinite
horizon case, we need to verify that:

∫_0^∞ e^{−ρu} E[|X⋆_u − E[X⋆_u]|²] du < ∞   and   ∫_0^∞ e^{−ρu} |E[X⋆_u]|² du < ∞.

As we will see below, we will have to choose ρ large enough to ensure these conditions. From
the above expressions we see that X⋆ satisfies:

dX⋆_t = b⋆_t dt + σ⋆_t dW_t,   X⋆_0 = x_0,

with:

b⋆_t = β⋆_t + B⋆(X⋆_t − E[X⋆_t]) + B̂⋆ E[X⋆_t],   σ⋆_t = γ⋆_t + Σ⋆(X⋆_t − E[X⋆_t]) + Σ̂⋆ E[X⋆_t],

where we define:

B⋆ = b_x + B A_x,   B̂⋆ = b̂_x + B̂ Â_x,   Σ⋆ = σ_x + Σ A_x,   Σ̂⋆ = σ̂_x + Σ̂ Â_x,
β⋆_t = β_t + B(R_t − E[R_t]) + B̂ R̂_t,   γ⋆_t = γ_t + Σ(R_t − E[R_t]) + Σ̂ R̂_t.

By Itô's formula we have, writing X̄⋆_t = E[X⋆_t] and for any ε > 0:

d/dt ( e^{−ρt} |X̄⋆_t|² ) ≤ e^{−ρt} ( −ρ|X̄⋆_t|² + 2(b̄⋆_t)^⊺ X̄⋆_t )
                        ≤ e^{−ρt} ( |X̄⋆_t|²(−ρ + 2B̂⋆) + 2|X̄⋆_t||β̄⋆_t| )
                        ≤ e^{−ρt} ( |X̄⋆_t|²(−ρ + 2B̂⋆ + ε) + (1/ε)|β̄⋆_t|² ).

If we now set:

K = |E[X⋆_0]|² + ∫_0^∞ e^{−ρu} (1/ε)|β̄⋆_u|² du,   C = −ρ + 2B̂⋆ + ε,

then, by Gronwall's inequality, we obtain:

e^{−ρt} |E[X⋆_t]|² ≤ K e^{Ct}.

Therefore, in order to have ∫_0^∞ e^{−ρu} |E[X⋆_u]|² du < ∞, we shall impose that ρ > 2B̂⋆, so that
C < 0 for ε small enough. Finally, by Itô's formula we also have:
d/dt E[ e^{−ρt} |X⋆_t − X̄⋆_t|² ] ≤ e^{−ρt} E[ −ρ|X⋆_t − X̄⋆_t|² + 2(b⋆_t − b̄⋆_t)^⊺(X⋆_t − X̄⋆_t) + |σ⋆_t|² ]
  ≤ e^{−ρt} E[ |X⋆_t − X̄⋆_t|²(−ρ + 2B⋆ + ε) + (1/ε)|β⋆_t − β̄⋆_t|²
             + 4(|γ⋆_t|² + |Σ⋆|²|X⋆_t − E[X⋆_t]|² + |Σ̂⋆|²|E[X⋆_t]|²) ]
  ≤ e^{−ρt} E[ |X⋆_t − X̄⋆_t|²(−ρ + 2B⋆ + ε + 4|Σ⋆|²) + (1/ε)|β⋆_t − β̄⋆_t|² + 4(|γ⋆_t|² + |Σ̂⋆|²|E[X⋆_t]|²) ].

If we now set:

K = |X⋆_0 − X̄⋆_0|² + ∫_0^∞ e^{−ρu} E[ (1/ε)|β⋆_u − β̄⋆_u|² + 4(|γ⋆_u|² + |Σ̂⋆|²|E[X⋆_u]|²) ] du,
C = −ρ + 2B⋆ + 4|Σ⋆|² + ε,

then, by Gronwall's inequality, we obtain:

e^{−ρt} E[ |X⋆_t − X̄⋆_t|² ] ≤ K e^{Ct}.

This time, in order to ensure the convergence ∫_0^∞ e^{−ρu} E[|X⋆_u − X̄⋆_u|²] du < ∞, we add
the constraint ρ > 2B⋆ + 4|Σ⋆|². To conclude, in order to ensure that α⋆ ∈ A, we make the
following assumption:

(H5’) ρ > max[ 2B̂⋆, 2B⋆ + 4|Σ⋆|² ].
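The role of (H5') is easy to visualize numerically on the mean dynamics: the mean of the state may grow exponentially, yet the discounted squared mean remains integrable as soon as ρ > 2B̂⋆. The scalar coefficients below are illustrative only:

```python
import math

# Numerical illustration of (H5') for the mean dynamics: when rho > 2*B_hat the
# discounted squared mean e^{-rho t}|E[X*_t]|^2 is integrable even though
# E[X*_t] itself explodes. Scalar illustrative coefficients only.
B_hat, beta_bar, rho = 0.4, 1.0, 1.2    # rho > 2 * B_hat = 0.8

T, n = 60.0, 60000
dt = T / n
xbar, integral = 1.0, 0.0               # E[X*_0] = 1
for k in range(n):
    integral += math.exp(-rho * k * dt) * xbar**2 * dt
    xbar += (B_hat * xbar + beta_bar) * dt   # d/dt E[X*_t] = B_hat E[X*_t] + beta_bar
print(integral)    # finite despite the exponential growth of E[X*_t]
```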

3.2 The case of common noise


Let W and W 0 be two independent Brownian motions defined on the same probability space
(Ω, F, F, P) where F = {Ft }t∈[0,T ] is the filtration generated by the pair (W, W 0 ). Let F0 = {Ft0 }t∈[0,T ]
be the filtration generated by W 0 . For any X0 and α ∈ A as in Section 1, the controlled process
X α is defined by:

dX^α_t = b(t, X^α_t, E[X^α_t|W^0], α_t, E[α_t|W^0]) dt + σ(t, X^α_t, E[X^α_t|W^0], α_t, E[α_t|W^0]) dW_t
        + σ^0(t, X^α_t, E[X^α_t|W^0], α_t, E[α_t|W^0]) dW^0_t,
X^α_0 = X_0,

where for each t ∈ [0, T], x, x̄ ∈ R^d, a_i, ā_i ∈ R^{d_i}:

b(t, x, x̄, a, ā) = β_t + b_{x,t} x + b̃_{x,t} x̄ + Σ_{i=1}^n (b_{i,t} a_i + b̃_{i,t} ā_i) = β_t + b_{x,t} x + b̃_{x,t} x̄ + B_t a + B̃_t ā
σ(t, x, x̄, a, ā) = γ_t + σ_{x,t} x + σ̃_{x,t} x̄ + Σ_{i=1}^n (σ_{i,t} a_i + σ̃_{i,t} ā_i) = γ_t + σ_{x,t} x + σ̃_{x,t} x̄ + Σ_t a + Σ̃_t ā
σ^0(t, x, x̄, a, ā) = γ^0_t + σ^0_{x,t} x + σ̃^0_{x,t} x̄ + Σ_{i=1}^n (σ^0_{i,t} a_i + σ̃^0_{i,t} ā_i) = γ^0_t + σ^0_{x,t} x + σ̃^0_{x,t} x̄ + Σ^0_t a + Σ̃^0_t ā.

Since we will condition on W^0, we assume that the coefficients b_x, b̃_x, b_i, b̃_i, σ_x, σ̃_x, σ_i, σ̃_i, σ^0_x, σ̃^0_x, σ^0_i, σ̃^0_i
are essentially bounded F^0-adapted processes, whereas β, γ, γ^0 are square-integrable F^0-adapted
processes. The problem of each player i is to minimize over α_i ∈ A^i, given the controls α^{−i} of the
other players, a cost functional of the form

J^i(α_i, α^{−i}) = E[ ∫_0^T e^{−ρt} f^i(X^α_t, E[X^α_t|W^0_t], α_t, E[α_t|W^0_t]) dt + e^{−ρT} g^i(X^α_T, E[X^α_T|W^0_T]) ]

with f^i, g^i as in (3). We now suppose that Q^i, Q̃^i, I^i, Ĩ^i, N^i, Ñ^i are essentially bounded and F^0-
adapted, L^i_x, L^i_k are square-integrable F-adapted processes, P^i, P̃^i are essentially bounded F^0_T-
measurable random variables and r^i are square-integrable F_T-measurable random variables. Hypotheses
c) and d) of (H2) still hold. As in step 1 we guess a random field of the type W^{α,i}_t =
w^i_t(X^α_t, E[X^α_t|W^0_t]), where w^i is of the form w^i_t(x, x̄) = K^i_t.(x − x̄)^⊗2 + Λ^i_t.x̄^⊗2 + 2Y^{i⊺}_t x + R^i_t with
suitable coefficients K^i, Λ^i, Y^i, R^i. Given that the quadratic coefficients in f^i, g^i are F^0-adapted, we
guess that K^i, Λ^i are also F^0-adapted. Since the linear coefficients in f^i, g^i and the affine coefficients
in b, σ, σ^0 are F-adapted, we guess that Y^i is F-adapted as well. Thus for each player we look for
processes (K^i, Λ^i, Y^i, R^i) valued in S^d_+ × S^d_+ × R^d × R and of the form:
dK^i_t = K̇^i_t dt + Z^{K^i}_t dW^0_t,   K^i_T = P^i
dΛ^i_t = Λ̇^i_t dt + Z^{Λ^i}_t dW^0_t,   Λ^i_T = P^i + P̃^i
dY^i_t = Ẏ^i_t dt + Z^i_t dW_t + Z^{0,Y^i}_t dW^0_t,   Y^i_T = r^i
dR^i_t = Ṙ^i_t dt,   R^i_T = 0,

where K̇^i, Λ̇^i, Z^{K^i}, Z^{Λ^i} are F^0-adapted processes valued in S^d; Ẏ^i, Z^{Y^i}, Z^{0,Y^i} are F-adapted processes
valued in R^d, and R^i are continuous functions valued in R. In step 2 we now consider, for each player
i ∈ J1, nK, a family of processes of the form:

S^{α,i}_t = e^{−ρt} w^i_t(X^α_t, E[X^α_t|W^0_t]) + ∫_0^t e^{−ρu} f^i(u, X^α_u, E[X^α_u|W^0_u], α_u, E[α_u|W^0_u]) du.

By Itô's formula we then obtain for (5) and (6):

Φ^i_t = Q^i_t + σ^⊺_{x,t} K^i_t σ_{x,t} + σ^{0⊺}_{x,t} K^i_t σ^0_{x,t} + K^i_t b_{x,t} + b^⊺_{x,t} K^i_t + Z^{K^i}_t σ^0_{x,t} + σ^{0⊺}_{x,t} Z^{K^i}_t − ρK^i_t
Ψ^i_t = Q̂^i_t + σ̂^⊺_{x,t} K^i_t σ̂_{x,t} + σ̂^{0⊺}_{x,t} Λ^i_t σ̂^0_{x,t} + Λ^i_t b̂_{x,t} + b̂^⊺_{x,t} Λ^i_t + Z^{Λ^i}_t σ̂^0_{x,t} + σ̂^{0⊺}_{x,t} Z^{Λ^i}_t − ρΛ^i_t
∆^i_t = L^i_{x,t} + b^⊺_{x,t} Y^i_t + b̃^⊺_{x,t} Ȳ^i_t + σ^⊺_{x,t} Z^i_t + σ̃^⊺_{x,t} Z̄^i_t + σ^{0⊺}_{x,t} Z^{0,Y^i}_t + σ̃^{0⊺}_{x,t} Z̄^{0,Y^i}_t + Λ^i_t β̄_t
       + σ^⊺_{x,t} K^i_t γ_t + σ̃^⊺_{x,t} K^i_t γ̄_t + K^i_t(β_t − β̄_t) + Z^{K^i}_t(γ^0_t − γ̄^0_t) + Z^{Λ^i}_t γ̄^0_t − ρY^i_t
       + Σ_{k≠i} [ U^{i⊺}_{k,t}(α_{k,t} − ᾱ_{k,t}) + V^{i⊺}_{k,t} ᾱ_{k,t} ]
Γ^i_t = γ^⊺_t K^i_t γ_t + γ̄^{0⊺}_t Λ^i_t γ̄^0_t + (γ^0_t − γ̄^0_t)^⊺ K^i_t (γ^0_t − γ̄^0_t) + 2β^⊺_t Y^i_t + 2γ^⊺_t Z^i_t + 2γ^{0⊺}_t Z^{0,Y^i}_t
       + Σ_{k≠i} [ (α_{k,t} − ᾱ_{k,t})^⊺ S^i_{k,t}(α_{k,t} − ᾱ_{k,t}) + ᾱ^⊺_{k,t} Ŝ^i_{k,t} ᾱ_{k,t} + 2[O^i_{k,t} + ξ^i_{k,t} − ξ̄^i_{k,t}]^⊺ α_{k,t} ] − ρR^i_t,
with:

S^i_{k,t} = N^i_{k,t} + σ^⊺_{k,t} K^i_t σ_{k,t} + (σ^0_{k,t})^⊺ K^i_t σ^0_{k,t}
Ŝ^i_{k,t} = N̂^i_{k,t} + σ̂^⊺_{k,t} K^i_t σ̂_{k,t} + (σ̂^0_{k,t})^⊺ Λ^i_t σ̂^0_{k,t}
U^i_{k,t} = I^i_{k,t} + σ^⊺_{k,t} K^i_t σ_{x,t} + (σ^0_{k,t})^⊺ K^i_t σ^0_{x,t} + (σ^0_{k,t})^⊺ Z^{K^i}_t + b^⊺_{k,t} K^i_t
V^i_{k,t} = Î^i_{k,t} + σ̂^⊺_{k,t} K^i_t σ̂_{x,t} + (σ̂^0_{k,t})^⊺ Λ^i_t σ̂^0_{x,t} + (σ̂^0_{k,t})^⊺ Z^{Λ^i}_t + b̂^⊺_{k,t} Λ^i_t
O^i_{k,t} = L̄^i_{k,t} + b̂^⊺_{k,t} Ȳ^i_t + σ̂^⊺_{k,t} Z̄^i_t + σ̂^⊺_{k,t} K^i_t γ̄_t + (σ̂^0_{k,t})^⊺ Λ^i_t γ̄^0_t + (σ̂^0_{k,t})^⊺ Z̄^{0,Y^i}_t
           + (1/2) Σ_{l≠i} (Ĵ^i_{k,l,t} + Ĵ^i_{l,k,t}) ᾱ_{l,t}
J^i_{k,l,t} = G^i_{k,l,t} + σ^⊺_{k,t} K^i_t σ_{l,t} + (σ^0_{k,t})^⊺ K^i_t σ^0_{l,t}
Ĵ^i_{k,l,t} = Ĝ^i_{k,l,t} + σ̂^⊺_{k,t} K^i_t σ̂_{l,t} + (σ̂^0_{k,t})^⊺ Λ^i_t σ̂^0_{l,t}
ξ^i_{k,t} = L^i_{k,t} + b^⊺_{k,t} Y^i_t + σ^⊺_{k,t} Z^i_t + (σ^0_{k,t})^⊺ Z^{0,Y^i}_t + σ^⊺_{k,t} K^i_t γ_t + (σ^0_{k,t})^⊺ K^i_t γ^0_t
           + (1/2) Σ_{l≠i} (J^i_{k,l,t} + J^i_{l,k,t}) α_{l,t}.

Note that we now denote by Ū the conditional expectation with respect to W^0_t, i.e. Ū = E[U|W^0_t].

Then, at step 3, we constrain the coefficients (K^i, Λ^i, Y^i, R^i) to satisfy the following problem:

dK^i_t = −Φ^{i0}_t dt + Z^{K^i}_t dW^0_t,   K^i_T = P^i
dΛ^i_t = −Ψ^{i0}_t dt + Z^{Λ^i}_t dW^0_t,   Λ^i_T = P^i + P̃^i
dY^i_t = −∆^{i0}_t dt + Z^i_t dW_t + Z^{0,Y^i}_t dW^0_t,   Y^i_T = r^i        (29)
dR^i_t = −Γ^{i0}_t dt,   R^i_T = 0,

where Φ^{i0}, Ψ^{i0}, ∆^{i0}, Γ^{i0} are defined in (8). Thus we obtain the best response functions of the players:

α_{i,t} = −(S^i_{i,t})^{−1} U^i_{i,t}(X_t − E[X_t|W^0_t]) − (S^i_{i,t})^{−1}(ξ^i_{i,t} − ξ̄^i_{i,t}) − (Ŝ^i_{i,t})^{−1}(V^i_{i,t} E[X_t|W^0_t] + O^i_{i,t}).

We then proceed to step 5, i.e. the search for a fixed point in the space of controls. The
only difference at that point is in the ansatz for t 7→ Y_t. Since we consider the case of common
noise, we now search for an ansatz of the form Y_t = π_t(X_t − X̄_t) + π̂_t X̄_t + η_t, where (π, π̂, η) ∈
L∞_{F^0}(Ω × [0, T], R^{nd×d}) × L∞_{F^0}(Ω × [0, T], R^{nd×d}) × S²_F(Ω × [0, T], R^{nd}) satisfy:

dη_t = ψ_t dt + φ_t dW_t + φ^0_t dW^0_t,   η_T = r = (r^i)_{i∈J1,nK}
dπ_t = π̇_t dt + Z^{0,π}_t dW^0_t,   π_T = 0                                   (30)
dπ̂_t = π̂˙_t dt + Z^{0,π̂}_t dW^0_t,   π̂_T = 0.

The method to determine the coefficients is then similar. Existence and uniqueness of a solution
(K^i, Λ^i) to the backward stochastic Riccati equation in (29) is discussed in [13], section 3.2. The
existence of a solution (Y^i, Z^i, Z^{0,Y^i}) to the linear mean-field BSDE in (29) is obtained as in step
6 thanks to Thm. 2.1 in [12]. As in the previous section, the existence of a solution (π, π̂) ∈ L∞_{F^0}(Ω ×
[0, T], R^{nd×d}) (essentially bounded F^0-adapted processes) of (30) in the general case is a conjecture
and needs to be verified in each example. We are not aware of any work tackling the existence of
solutions in such a situation. Given (π, π̂), the existence of (η, φ, φ^0) solution to (30) is ensured as in
the previous section by Thm. 2.1 in [12].

3.3 The case of multiple Brownian motions
We quickly sketch an extension to the case where multiple Brownian motions drive the
state equation. The assumptions on the coefficients are the same as in the previous part; only the
length of the computations changes. Let us now consider the state dynamics:
dX_t = b(t, X^α_t, E[X^α_t], α_t, E[α_t]) dt + Σ_{ℓ=1}^κ σ^ℓ(t, X^α_t, E[X^α_t], α_t, E[α_t]) dW^ℓ_t

where:

α_t = (α_{1,t}, ..., α_{n,t})
b(t, x, x̄, a, ā) = β_t + b_{x,t} x + b̃_{x,t} x̄ + Σ_{i=1}^n (b_{i,t} a_i + b̃_{i,t} ā_i)                (31)
σ^ℓ(t, x, x̄, a, ā) = γ^ℓ_t + σ^ℓ_{x,t} x + σ̃^ℓ_{x,t} x̄ + Σ_{i=1}^n (σ^ℓ_{i,t} a_i + σ̃^ℓ_{i,t} ā_i).

We require the coefficients in (31) to satisfy an adaptation of (H1) where γ, σx , σ̃x , (σi , σ̃i )i∈J1,nK are
replaced by γ ` , σx` , σ̃x` , (σi` , σ̃i` )i∈J1,nK for ` ∈ {1, ..., κ}.
To take into account the multiple Brownian motions in step 1, we now search for random fields of
the form w^i_t(x, x̄) = K^i_t.(x − x̄)^⊗2 + Λ^i_t.x̄^⊗2 + 2Y^{i⊺}_t x + R^i_t, with the processes (K^i, Λ^i, Y^i, (Z^{i,ℓ})_{ℓ∈J1,κK}, R^i)
in (L∞([0, T], S^d_+))² × S²_F(Ω × [0, T], R^d) × (L²_F(Ω × [0, T], R^d))^κ × L∞([0, T], R) and solution to:

dK^i_t = K̇^i_t dt,   K^i_T = P^i
dΛ^i_t = Λ̇^i_t dt,   Λ^i_T = P^i + P̃^i
dY^i_t = Ẏ^i_t dt + Σ_ℓ Z^{i,ℓ}_t dW^ℓ_t,   Y^i_T = r^i
dR^i_t = Ṙ^i_t dt,   R^i_T = 0,

where (K̇^i, Λ̇^i, Ṙ^i) are deterministic processes valued in S^d_+ × S^d_+ × R and (Ẏ^i, Z^i) are adapted
processes valued in R^d.
The method then follows the same steps with generalized coefficients, and at step 2 we obtain
generalized coefficients for (5) and (6):

Φ^i_t = Q^i_t + K^i_t b_{x,t} + b^⊺_{x,t} K^i_t + Σ_ℓ σ^{ℓ⊺}_{x,t} K^i_t σ^ℓ_{x,t} − ρK^i_t
Ψ^i_t = Q̂^i_t + Λ^i_t b̂_{x,t} + b̂^⊺_{x,t} Λ^i_t + Σ_ℓ σ̂^{ℓ⊺}_{x,t} Λ^i_t σ̂^ℓ_{x,t} − ρΛ^i_t
∆^i_t = L^i_{x,t} + b^⊺_{x,t} Y^i_t + b̃^⊺_{x,t} Ȳ^i_t + Λ^i_t β̄_t + K^i_t(β_t − β̄_t)
       + Σ_ℓ [ σ^{ℓ⊺}_{x,t} K^i_t γ^ℓ_t + σ̃^{ℓ⊺}_{x,t} K^i_t γ̄^ℓ_t + σ^{ℓ⊺}_{x,t} Z^{i,ℓ}_t + σ̃^{ℓ⊺}_{x,t} Z̄^{i,ℓ}_t ] − ρY^i_t        (32)
       + Σ_{k≠i} [ U^{i⊺}_{k,t}(α_{k,t} − ᾱ_{k,t}) + V^{i⊺}_{k,t} ᾱ_{k,t} ]
Γ^i_t = 2β^⊺_t Y^i_t + Σ_ℓ [ γ^{ℓ⊺}_t K^i_t γ^ℓ_t + 2γ^{ℓ⊺}_t Z^{i,ℓ}_t ]
       + Σ_{k≠i} [ (α_{k,t} − ᾱ_{k,t})^⊺ S^i_{k,t}(α_{k,t} − ᾱ_{k,t}) + ᾱ^⊺_{k,t} Ŝ^i_{k,t} ᾱ_{k,t} + 2[O^i_{k,t} + ξ^i_{k,t} − ξ̄^i_{k,t}]^⊺ α_{k,t} ] − ρR^i_t,

S^i_{k,t} = N^i_{k,t} + Σ_ℓ σ^{ℓ⊺}_{k,t} K^i_t σ^ℓ_{k,t}
Ŝ^i_{k,t} = N̂^i_{k,t} + Σ_ℓ σ̂^{ℓ⊺}_{k,t} K^i_t σ̂^ℓ_{k,t}
U^i_{k,t} = I^i_{k,t} + b^⊺_{k,t} K^i_t + Σ_ℓ σ^{ℓ⊺}_{k,t} K^i_t σ^ℓ_{x,t}
V^i_{k,t} = Î^i_{k,t} + b̂^⊺_{k,t} Λ^i_t + Σ_ℓ σ̂^{ℓ⊺}_{k,t} K^i_t σ̂^ℓ_{x,t}
O^i_{k,t} = L̄^i_{k,t} + b̂^⊺_{k,t} Ȳ^i_t + Σ_ℓ [ σ̂^{ℓ⊺}_{k,t} Z̄^{i,ℓ}_t + σ̂^{ℓ⊺}_{k,t} K^i_t γ̄^ℓ_t ]                (33)
J^i_{k,l,t} = G^i_{k,l,t} + Σ_ℓ σ^{ℓ⊺}_{k,t} K^i_t σ^ℓ_{l,t}
Ĵ^i_{k,l,t} = Ĝ^i_{k,l,t} + Σ_ℓ σ̂^{ℓ⊺}_{k,t} K^i_t σ̂^ℓ_{l,t}
ξ^i_{k,t} = L^i_{k,t} + b^⊺_{k,t} Y^i_t + Σ_ℓ [ σ^{ℓ⊺}_{k,t} Z^{i,ℓ}_t + σ^{ℓ⊺}_{k,t} K^i_t γ^ℓ_t ].
From these extended formulas we can then constrain the coefficients as in step 3 and obtain (8)
with the generalized coefficients defined in (32) and (33). Step 4 is then straightforward
and we obtain the best response functions:

α_{i,t} = a^{i,0}_t(X_t, E[X_t]) + a^{i,1}_t(E[X_t])
        = −(S^i_{i,t})^{−1} U^i_{i,t}(X_t − E[X_t]) − (S^i_{i,t})^{−1}(ξ^i_{i,t} − ξ̄^i_{i,t}) − (Ŝ^i_{i,t})^{−1}(V^i_{i,t} E[X_t] + O^i_{i,t}).

From step 4 we can then continue to step 5, i.e. the fixed-point search. The only difference
at that point is in the ansatz for t 7→ Y_t. Since we consider the case of multiple Brownian
motions, we now search for an ansatz of the form Y_t = π_t(X_t − X̄_t) + π̂_t X̄_t + η_t, where (π, π̂, η) ∈
L∞([0, T], R^{nd×d}) × L∞([0, T], R^{nd×d}) × S²_F(Ω × [0, T], R^{nd}) satisfy:

dη_t = ψ_t dt + Σ_ℓ φ^ℓ_t dW^ℓ_t,   η_T = r = (r^i)_{i∈J1,nK}
dπ_t = π̇_t dt,   π_T = 0
dπ̂_t = π̂˙_t dt,   π̂_T = 0.

The method to determine the coefficients π, π̂, η is then similar, and the validity of the computations
(Step 6) can be checked exactly as in the case of a single Brownian motion.

4 Example
We now focus on a toy example to illustrate the previous results. Let us consider a two-player game
where the state dynamics is simply a Brownian motion that the two players can control. The goal of
each player is to drive the state near her own target t 7→ T^i_t, i = 1, 2, which is a stochastic
process. In order to add mean-field terms, we suppose that each player also tries to minimize the
variance of the state and the variance of her control:

dX_t = (b_1 α_{1,t} + b_2 α_{2,t}) dt + σ dW_t
J^i(α_1, α_2) = E[ ∫_0^∞ e^{−ρu} ( λ^i Var(X_u) + δ^i (X_u − T^i_u)² + θ^i Var(α_{i,u}) + ξ^i α²_{i,u} ) du ],   i = 1, 2,
where (λ^i, δ^i, θ^i, ξ^i) ∈ R⁴₊. In order to fit the framework described in the first section we rewrite the
cost functional as follows:

J^i(α_1, α_2) = E[ ∫_0^∞ e^{−ρu} ( (λ^i + δ^i)(X_u − X̄_u)² + δ^i X̄²_u + (θ^i + ξ^i)(α_{i,u} − ᾱ_{i,u})²
                 + ξ^i ᾱ²_{i,u} + 2X_u[−δ^i T^i_u] + δ^i (T^i_u)² ) du ].

Since the terms δ^i (T^i_u)² do not influence the optimal controls of the players, we work with the slightly
simplified cost functional:

J̃^i(α_1, α_2) = E[ ∫_0^∞ e^{−ρu} ( (λ^i + δ^i)(X_u − X̄_u)² + δ^i X̄²_u + (θ^i + ξ^i)(α_{i,u} − ᾱ_{i,u})² + ξ^i ᾱ²_{i,u} + 2X_u[−δ^i T^i_u] ) du ].
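The completion-of-squares rewriting above is a pointwise identity in u under the expectation; it can be verified numerically at a fixed u. The weights and distributions below are arbitrary illustrative choices, and T is frozen to a deterministic value for simplicity (in the example T^i is a stochastic process):

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check of the completion-of-squares rewriting of the running cost,
# i.e. lam*Var(X) + delta*E[(X-T)^2] + theta*Var(A) + xi*E[A^2]
#    = (lam+delta)*Var(X) + delta*E[X]^2 + (theta+xi)*Var(A) + xi*E[A]^2
#      - 2*delta*T*E[X] + delta*T^2.
lam, delta, theta, xi = 0.7, 1.3, 0.5, 0.9
X = rng.normal(2.0, 1.5, 10**6)          # samples of X_u
A = rng.normal(-1.0, 0.8, 10**6)         # samples of alpha_{i,u}
T = 3.0                                  # deterministic stand-in for T^i_u

lhs = (lam * X.var() + delta * np.mean((X - T)**2)
       + theta * A.var() + xi * np.mean(A**2))
rhs = ((lam + delta) * np.mean((X - X.mean())**2) + delta * X.mean()**2
       + (theta + xi) * np.mean((A - A.mean())**2) + xi * A.mean()**2
       + 2 * np.mean(X) * (-delta * T) + delta * T**2)
print(abs(lhs - rhs))   # ≈ 0: the two cost integrands coincide
```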

Following the method explained in the previous section, we use Theorem 2.3 in order to find a
Nash equilibrium. We obtain the feedback form of the open-loop controls and the dynamics of the
state:

α_i − ᾱ_i = −(P^i/b_i)[(K^i + π_i)(X_t − X̄_t) + η_{i,t} − η̄_{i,t}]
ᾱ_i = −(P̃^i/b_i)[(Λ^i + π̃_i)X̄_t + η̄_{i,t}]
X̄_t = X̄_0 e^{−ãt} + ∫_0^t e^{−ã(t−u)} γ̄_u du                                    (34)
X_t − X̄_t = (X_0 − X̄_0)e^{−at} + ∫_0^t e^{−a(t−u)} [ (γ_u − γ̄_u) du + σ dW_u ]

where K^i ∈ R+, Λ^i ∈ R+, a ∈ R, ã ∈ R, π ∈ R², π̃ ∈ R², η ∈ L²_F((0, ∞), R²), γ ∈ L²_F((0, ∞), R²) satisfy:

K^i = ( −ρ + √(ρ² + 4P^i(λ^i + δ^i)) ) / (2P^i)
Λ^i = ( −ρ + √(ρ² + 4P̃^i δ^i) ) / (2P̃^i)
a = Σ_{i=1}^2 P^i(K^i + π_i)
ã = Σ_{i=1}^2 P̃^i(Λ^i + π̃_i)
0 = P_y π − (πB − P_α)(S_x + S_y π)                                              (35)
0 = P̃_y π̃ − (π̃B − P̃_α)(S̃_x + S̃_y π̃)
η_t − η̄_t = −∫_t^∞ e^{[P_y − (πB − P_α)S_y](t−u)} E[H_u − H̄_u | F_t] du
η̄_t = −∫_t^∞ e^{[P̃_y − (π̃B − P̃_α)S̃_y](t−u)} H̄_u du
γ_t − γ̄_t = −Σ_{i=1}^2 P^i(η_{i,t} − η̄_{i,t})
γ̄_t = −Σ_{i=1}^2 P̃^i η̄_{i,t},

with the coefficients:

S_x = −(P¹K¹/b₁, P²K²/b₂),   S̃_x = −(P̃¹Λ¹/b₁, P̃²Λ²/b₂)
S_y = −(1_{i=j} P^i/b_i)_{i,j∈{1,2}},   S̃_y = −(1_{i=j} P̃^i/b_i)_{i,j∈{1,2}}
P_α = −(1_{i≠j} K^i b_j)_{i,j∈{1,2}},   P̃_α = −(1_{i≠j} Λ^i b_j)_{i,j∈{1,2}}
P_y = (1_{i=j}(P^i K^i + ρ))_{i,j∈{1,2}},   P̃_y = (1_{i=j}(P̃^i Λ^i + ρ))_{i,j∈{1,2}}
H = (δ¹T¹, δ²T²),   B = (b₁, b₂)
P^i = b_i²/(θ^i + ξ^i),   P̃^i = b_i²/ξ^i.

From (34) and (35) we can study and simulate the influence of the different parameters of the
cost functional of the first player. We notice that (λ¹, θ¹) only influence X − E[X] and the feedback
form of α¹ − ᾱ¹ (the zero-mean terms).

• If λ¹ → ∞ then π₁ ∼ λ¹ and K¹ → ∞, which implies that X_t − E[X_t] → 0 for all t ≥ 0. This
is expected, since the term λ¹ penalizes the variance of the state Var(X_t) in the cost functional
of the first player. See Figure 1.

• If δ¹ → ∞ then (π, π̃) → ∞, K¹ ∼ δ¹ and Λ¹ ∼ δ¹, which implies that X_t → T¹_t for all t ≥ 0.
This is also expected, since the term δ¹ penalizes the quadratic gap between the state X and
the target T¹. See Figure 2.

• If θ¹ → ∞ then P¹ → 0, P¹K¹ → 0, P¹π₁ → 0 and P¹η¹_t → 0 for every t ≥ 0. We then have
α_t − ᾱ_t → 0 for all t ≥ 0 and all the terms relative to the first player in X − E[X] disappear.
Given that θ¹ penalizes the variance of the control of the first player, this convergence is also
intuitive.

• If ξ¹ → ∞ then (P¹, P̃¹) → 0, which implies that (α_{1,t}, ᾱ_{1,t}) → 0 for all t ≥ 0, and all the terms
relative to the first player in X_t − E[X_t] and E[X_t] disappear. This means that
the first player becomes powerless.
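The equilibrium decomposition (34) is easy to simulate once the aggregate coefficients are known: the mean X̄ follows a linear ODE and the centered part is an Ornstein-Uhlenbeck process. The sketch below uses illustrative constants for a, ã and for γ − γ̄, γ̄ (frozen in time), not an actual solution of (35):

```python
import numpy as np

rng = np.random.default_rng(2)

# Euler simulation of the state decomposition (34):
#   d X̄_t = (-ã X̄_t + γ̄) dt                       (mean part)
#   d(X_t − X̄_t) = (-a (X_t − X̄_t) + (γ − γ̄)) dt + σ dW_t   (centered part)
# Constants below are illustrative stand-ins, not solutions of (35).
a, a_tilde, sigma = 1.5, 1.2, 1.0
gamma_centered, gamma_bar = 0.0, 2.0
T, n, n_paths = 10.0, 2000, 5000
dt = T / n

Xbar = 0.0                             # X̄_0 = 0, deterministic mean
Xc = np.zeros(n_paths)                 # centered part, X_0 − X̄_0 = 0
for _ in range(n):
    Xbar += (-a_tilde * Xbar + gamma_bar) * dt
    Xc += (-a * Xc + gamma_centered) * dt \
          + sigma * rng.normal(0.0, np.sqrt(dt), n_paths)

# Long-run behaviour: X̄_t → γ̄/ã and Var(X_t) → σ²/(2a).
print(Xbar, Xc.var())
```

This reproduces the qualitative behaviour discussed in the bullet points: larger mean-reversion rates a, ã (driven by the penalization parameters) pull the state variance and the mean gap down.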

Figure 1: Nash equilibrium for (a) λ^i = 0, (b) λ^i = 10, (c) λ^i = 100, (d) λ^i = 500, with
b_i = σ = δ^i = θ^i = ξ^i = 1, ρ = 3, T¹ = 0, T² = 10.

Figure 2: t 7→ E[X_t], with b_i = σ = δ^i = θ^i = ξ^i = 1, ρ = 3, T¹ = 0, T² = 10.

References
[1] A. Aurell and B. Djehiche. “Mean-field type modeling of nonlocal crowd aversion in pedestrian
crowd dynamics”. In: SIAM Journal on Control and Optimization 56.1 (2018), pp. 434–455.
[2] M. Basei and H. Pham. “A Weak Martingale Approach to Linear-Quadratic McKean-Vlasov
Stochastic Control Problems”. In: Journal of Optimization Theory and Applications, to appear
(2018).
[3] A. Bensoussan, J. Frehse, and P. Yam. Mean field games and mean field type control theory.
Springer Briefs in Mathematics, 2013.

[4] A. Bensoussan, T. Huang, and M. Laurière. “Mean field control and mean field game models
with several populations”. arXiv: 1810.00783.
[5] R. Carmona and F. Delarue. “Forward-backward stochastic differential equations and con-
trolled McKean-Vlasov dynamics”. In: Annals of Probability 43.5 (2015), pp. 2647–2700.
[6] R. Carmona and F. Delarue. Probabilistic Theory of Mean Field Games with Applications vol
I and II. Springer, 2018.
[7] A. Cosso and H. Pham. “Zero-sum stochastic differential games of generalized McKean-Vlasov
type”. In: Journal de Mathématiques Pures et Appliquées, to appear (2018).
[8] B. Djehiche, J. Barreiro-Gomez, and H. Tembine. “Electricity price dynamics in the smart
grid: A mean-field-type game perspective.” In: 23rd International Symposium on Mathematical
Theory of Networks and Systems (MTNS2018). 2018, pp. 631–636.
[9] T. Duncan and H. Tembine. “Linear-quadratic mean-field-type games: a direct method”. In:
Games 9.7 (2018).
[10] P.J. Graber. “Linear-Quadratic Mean-Field Type Control and Mean-Field Games with Com-
mon Noise, with Application to Production of an Exhaustible Resource”. In: Applied Mathe-
matics and Optimization 74.3 (2016), pp. 459–486.
[11] J. Huang, X. Li, and J. Yong. “Linear-Quadratic Optimal Control Problem for Mean-Field
Stochastic Differential Equations in Infinite Horizon”. In: Mathematical Control and Related
Fields 5.1 (2015), pp. 97–139.
[12] X. Li, J. Sun, and J. Xiong. “Linear Quadratic Optimal Control Problems for Mean-Field Back-
ward Stochastic Differential Equations”. In: Applied Mathematics & Optimization, to appear
(2017).
[13] H. Pham. “Linear quadratic optimal control of conditional McKean-Vlasov equation with
random coefficients and applications”. In: Probability, Uncertainty and Quantitative Risk 1:7
(2016).
[14] H. Pham and X. Wei. “Dynamic programming for optimal control of stochastic McKean-Vlasov
dynamics”. In: SIAM Journal on Control and Optimization 55.2 (2017), pp. 1069–1101.
[15] J. Yong. “Linear-Quadratic Optimal Control Problem for Mean-Field Stochastic Differential
Equations in Infinite Horizon”. In: SIAM Journal on Control and Optimization 51.4 (2013),
pp. 2809–2838.
[16] J. Yong and X.Y. Zhou. Stochastic controls: Hamiltonian Systems and HJB Equations. SMAP.
Springer, 1999.
