
Accepted Manuscript

Combining fast inertial dynamics for convex optimization with Tikhonov regularization

Hedy Attouch, Zaki Chbani, Hassan Riahi

PII: S0022-247X(16)30803-4
DOI: http://dx.doi.org/10.1016/j.jmaa.2016.12.017
Reference: YJMAA 20956

To appear in: Journal of Mathematical Analysis and Applications

Received date: 30 July 2016

Please cite this article in press as: H. Attouch et al., Combining fast inertial dynamics for convex optimization with Tikhonov
regularization, J. Math. Anal. Appl. (2017), http://dx.doi.org/10.1016/j.jmaa.2016.12.017

COMBINING FAST INERTIAL DYNAMICS FOR CONVEX OPTIMIZATION
WITH TIKHONOV REGULARIZATION.

Hedy Attouch
Institut Montpelliérain Alexander Grothendieck, UMR CNRS 5149, Université Montpellier, 34095 Montpellier cedex 5, France

Zaki Chbani
Cadi Ayyad University, Faculty of Sciences Semlalia, Mathematics, 40000 Marrakech, Morocco

Hassan Riahi
Cadi Ayyad University, Faculty of Sciences Semlalia, Mathematics, 40000 Marrakech, Morocco

Abstract
In a Hilbert space setting H, we study the convergence properties as t → +∞ of the trajectories of the second-order differential equation

(AVD)_{α,ε}   ẍ(t) + (α/t)ẋ(t) + ∇Φ(x(t)) + ε(t)x(t) = 0,

where ∇Φ is the gradient of a convex continuously differentiable function Φ : H → R, α is a positive parameter, and ε(t)x(t) is a Tikhonov regularization term, with ε(t) positive and lim_{t→∞} ε(t) = 0. In this damped inertial system, the damping coefficient α/t vanishes asymptotically, but not too quickly, a key property to obtain rapid convergence of the values. In the case ε(·) ≡ 0, this dynamic has been highlighted recently by Su, Boyd, and Candès as a continuous version of the Nesterov accelerated gradient method. Depending on the speed of convergence of ε(t) to zero, we analyze the convergence properties of the trajectories of (AVD)_{α,ε}. We obtain results ranging from the rapid convergence of Φ(x(t)) to min Φ when ε(t) decreases rapidly to zero, up to the strong convergence of the trajectories to the element of minimum norm of the set of minimizers of Φ, when ε(t) tends slowly to zero. When ε(t) = 1/t^r, the critical value of r separating the two above cases is r = 2.
Keywords: Convex optimization; hierarchical minimization; inertial dynamics; Nesterov accelerated gradient
method; Tikhonov approximation; vanishing viscosity.

1. Introduction

Throughout the paper, H is a real Hilbert space endowed with the scalar product ⟨·, ·⟩, with ‖x‖² = ⟨x, x⟩
for x ∈ H. Let Φ : H → R be a convex differentiable function. We consider the convex minimization problem

min {Φ(x) : x ∈ H} , (1)

whose solution set S = argmin Φ is supposed to be nonempty. We aim at finding by rapid methods the element of
minimum norm of the closed convex set S. To that end, we study the asymptotic behaviour (as t → +∞) of the
trajectories of the second-order differential equation
(AVD)_{α,ε}   ẍ(t) + (α/t)ẋ(t) + ∇Φ(x(t)) + ε(t)x(t) = 0, (2)

where α is a positive parameter, and ε(t)x(t) is a Tikhonov regularization term. Throughout the paper (unless
otherwise stated), we assume that

(H1 ) Φ : H → R is convex and differentiable, its gradient ∇Φ is Lipschitz continuous on bounded sets.

(H2) S := argmin Φ ≠ ∅.

(H3) ε : [t₀, +∞[ → R⁺ is a nonincreasing function, of class C¹, such that lim_{t→∞} ε(t) = 0.

Preprint submitted to Elsevier December 9, 2016


The system

(AVD)_α   ẍ(t) + (α/t)ẋ(t) + ∇Φ(x(t)) = 0, (3)

which corresponds to the case ε(·) ≡ 0, has been introduced by Su, Boyd and Candès in [51]. In the case α = 3,
it provides a continuous version of the Nesterov accelerated gradient method, see [40]-[41]-[42]-[43]. The study of
the long-term behavior of the trajectories satisfying (AVD)α has given important insight into Nesterov’s acceleration
method and its variants.
System (3) is a particular case of the general model

ẍ(t) + a(t)ẋ(t) + ∇Φ(x(t)) = 0, (4)

where a(·) is a positive damping parameter that vanishes asymptotically. A comprehensive study of (4) was made by
Cabot, Engler and Gadat in [28]. They showed that the key property providing the asymptotic minimization property
of the trajectories of (4) is ∫_{t₀}^{+∞} a(t) dt = +∞. This reflects the fact that a(t) does not vanish too quickly as t goes to
infinity. Of course, the latter property is satisfied by a(t) = α/t, which gives (AVD)_α. Note that (AVD)_α appears as a
limiting case of this slowly vanishing property, when considering the scaling a(t) = 1/t^r, 0 < r ≤ 1.
Let us review some of the main properties of (AVD)α :

• For α ≥ 3, its trajectories satisfy the fast minimization property Φ(x(t)) − min_H Φ = O(t⁻²), which is known
to be the best possible estimate (in the worst case). This property was first brought to the fore by Su, Boyd and
Candès [51], and further developed by Attouch, Chbani, Peypouquet and Redont [11].
• For α > 3, we actually have Φ(x(t)) − min_H Φ = o(t⁻²), which means that lim t²(Φ(x(t)) − min_H Φ) = 0,
see May [39], Jendoubi and May [37]. A corresponding algorithmic version has been developed by Attouch
and Peypouquet in [19]. In addition, the weak convergence of the trajectories holds in this case, making the
connection with the algorithmic results of Chambolle and Dossal [29]. This is a strong justification for taking the
parameter α > 3.

We use the terminology introduced in [11], where (AVD)_α stands for Asymptotic Vanishing Damping with parameter α. Linking the convergence of continuous dissipative systems and algorithms is an ongoing research topic; the reader may consult [33], [46], [47], [48]. Through the study of (AVD)_{α,ε}, we seek to combine the rapid optimization property of the system (AVD)_α with the strong convergence of the trajectories to the solution of minimum norm. This latter property is typically attached to the Tikhonov approximation. In doing so, we expect to open the way for new algorithmic developments combining the Nesterov accelerated gradient method with Tikhonov regularization.
An abundant literature has been devoted to the asymptotic hierarchical minimization property which results from
the introduction of a vanishing viscosity term (in our context, the Tikhonov approximation) in gradient-like dynamics.
For first-order gradient systems and subdifferential inclusions, see [6], [7], [12], [14], [20], [31], [35]. In parallel, there
is also a vast literature on convex minimization algorithms that combine different descent methods (gradient, prox,
forward–backward) with Tikhonov and more general penalty and regularization schemes. The historical evolution
can be traced back to Fiacco and McCormick [32], and the interpretation of interior point methods with the help
of a vanishing logarithmic barrier. Some more specific references for the coupling of Prox and Tikhonov can be
found in Cominetti [30]. The time discretization of the first-order gradient systems and subdifferential inclusions
involving multiscale (in time) features provides a natural link between the continuous and discrete dynamics. The
resulting algorithms combine proximal based methods (for example forward-backward algorithms), with the viscosity
of penalization methods, see [16], [17], [23], [27], [35].
A closely related dynamic to (AVD)_{α,ε} is the heavy ball with friction system with a Tikhonov regularization term

(HBF)_ε   ẍ(t) + γẋ(t) + ∇Φ(x(t)) + ε(t)x(t) = 0. (5)

By contrast with (AVD)_{α,ε}, in (HBF)_ε the damping coefficient γ is a fixed positive real number. The heavy ball with friction system (HBF), which corresponds to ε ≡ 0 in (HBF)_ε, is a dissipative dynamical system whose optimization properties have been studied in detail in several articles, see [1], [2], [3], [5], [10], [18], [33], [49]. Under the sole assumption of the convexity of Φ, and argmin Φ ≠ ∅, the weak convergence of each trajectory to an optimal solution was first obtained by Alvarez in [2]. In [13], in the slow parametrization case ∫_0^{+∞} ε(t) dt = +∞, it is proved that any solution x(·) of (HBF)_ε converges strongly to the minimum norm element of argmin Φ. Equation (5) with an integrable source term has been considered by Jendoubi and May in [36], who obtain quite similar convergence properties. A parallel study has been developed for PDE's, see [4] for damped hyperbolic equations with non-isolated equilibria, and [6] for semilinear PDE's. The (HBF)_ε system is a special case of the general dynamic model

ẍ(t) + γẋ(t) + ∇Φ(x(t)) + ε(t)∇Ψ(x(t)) = 0 (6)

which involves two potential functions Φ and Ψ acting on different time scales. When ε(·) tends to zero moderately slowly, it was recently shown in [15] that the trajectories of (6) converge asymptotically to equilibria that are solutions of the following hierarchical minimization problem: they minimize the potential Ψ on the set of minimizers of Φ. When H = H₁ × H₂ is a product space, defining for x = (x₁, x₂), Φ(x₁, x₂) = Φ₁(x₁) + Φ₂(x₂) and Ψ(x₁, x₂) = ‖A₁x₁ − A₂x₂‖², where the Aᵢ are linear operators, (6) provides (weakly) coupled inertial systems. Continuous and discrete-time versions of such systems have natural links with best response dynamics for potential games [14], domain decomposition for PDE's [9], optimal transport [8], and weakly coupled wave equations [34].
But, without additional assumptions on Φ, no fast convergence result has been obtained for (HBF)_ε. As an original aspect of our approach (and a source of difficulties), through the study of the (AVD)_{α,ε} system, we wish to handle the two vanishing parameters simultaneously: the damping parameter and the Tikhonov parameter. Ideally, we want to achieve both rapid convergence and convergence towards the minimum norm solution. As we shall see, this is a difficult task, since the two requirements are somewhat antagonistic.
In this paper, for simplicity, we limit our analysis to the case of a convex differentiable potential Φ, having finite
values. On the basis of the existence results in [10], a parallel analysis can be developed in the constrained case,
and more generally in the case of a non-smooth potential, with extended real values. This is because the Lyapunov
techniques that we use can be naturally extended to the case of a non-smooth potential (using generalized derivation
rules, and convex subgradient inequalities). Time discretization of the corresponding differential inclusion gives rise to
accelerated algorithms, like FISTA. But in this case, it is easier to study the algorithm directly. The dynamic system
will serve as a guideline only, to suggest Lyapunov functions, see [11] for an example of such an approach. That is
why we only consider the case of a smooth potential herein. This is a sufficient framework to give the keys to the
corresponding algorithmic study. As a perspective, the connection with the algorithms is discussed at the end of the
paper.
Let us come to the contents of the paper. Let us fix some t₀ > 0 as a starting time. Taking t₀ > 0 comes from the singularity of the damping coefficient a(t) = α/t at zero. Indeed, since we are only concerned with the asymptotic behaviour of the trajectories, the choice of the origin of time is unimportant. If one insists on starting from t₀ = 0, then all the results remain valid taking a(t) = α/(t+1).
Depending on the speed of convergence of ε(t) to zero, and on the value of the positive parameter α, we analyze the convergence properties of the trajectories of (AVD)_{α,ε}. We obtain results ranging from the rapid convergence of the values when ε(t) decreases rapidly to zero, up to the convergence to the element of minimum norm of argmin Φ when ε(t) tends slowly to zero. When ε(t) = 1/t^r, we show that the critical value of r separating the two above cases is r = 2.
This is described in the following result, that will follow as a consequence of the more general situations examined in
the two next sections.
Theorem 1.1. Let x(·) be a classical global solution of (AVD)_{α,ε}.

A. In the "fast vanishing case" ε(t) = 1/t^r, r > 2:

i) When α ≥ 3, Φ(x(t)) − min_H Φ ≤ C/t².

ii) When α > 3:

• Φ(x(t)) − min_H Φ = o(t⁻²);
• x(t) converges weakly to some element of argmin Φ.

B. In the "slow vanishing case" ε(t) = 1/t^r, 0 < r < 2, and α ≥ 3, the trajectory is minimizing, and

lim inf_{t→∞} ‖x(t) − x*‖ = 0,

where x* is the element of minimum norm of the closed convex nonempty set argmin Φ. In addition,

lim_{t→∞} ‖x(t) − x*‖ = 0

when either the trajectory {x(t) : t ≥ T} remains in the ball B(0, ‖x*‖), or in its complement, for T large enough.

C. In the "critical case" ε(t) = c/t², by taking α > 3 and c > (4/9)α(α − 3), all the conclusions of case B hold.

D. In the "very slow vanishing case" ε(t) = 1/(ln t)^γ, 0 < γ ≤ 1, we have ∫_{t₀}^{∞} (ε(τ)/τ) dτ = +∞, and the following ergodic convergence property is satisfied:

lim_{t→+∞} ( 1 / ∫_{t₀}^{t} (ε(τ)/τ) dτ ) ∫_{t₀}^{t} (ε(τ)/τ) ‖x(τ) − x*‖ dτ = 0. (7)

In particular,

lim inf_{t→+∞} ‖x(t) − x*‖ = 0. (8)

The paper is organized as follows. In section 2, we give some preliminary results concerning energy estimates, and general minimizing properties for (AVD)_{α,ε}. We also recall basic facts concerning the Tikhonov approximation. In section 3, we analyze the fast convergence properties of the trajectories of (AVD)_{α,ε} when ∫_{t₀}^{+∞} tε(t) dt < +∞, which reflects the fact that the Tikhonov parameter ε(t) tends "rapidly" to zero. In section 4, we show the strong convergence of the trajectories to the solution with minimum norm when lim t²ε(t) = +∞, which reflects the fact that the Tikhonov parameter ε(t) tends "slowly" to zero. Finally, we give numerical illustrations and a conclusion.
The appendix contains some technical lemmas.

2. Preliminary results and estimates.


2.1. Existence and uniqueness of orbits for the Cauchy problem
The existence of global solutions to (2) has been examined, for instance, in [28, Proposition 2.2] in the case of a general asymptotically vanishing damping coefficient, see also [13] in the case of a fixed damping parameter. It is based on the formulation of (2) as a first-order system; one then applies the Cauchy–Lipschitz theorem, and uses energy estimates to pass from a local to a global solution. In our setting, for any t₀ > 0, α > 0, and (x₀, v₀) ∈ H × H, there exists a unique global classical solution x : [t₀, +∞[ → H of (2), satisfying the initial conditions x(t₀) = x₀, ẋ(t₀) = v₀, under the sole assumption inf Φ > −∞.
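This first-order reformulation can be sketched numerically. The following Python snippet (our own illustration, not code from the paper) integrates (AVD)_{α,ε} with a classical Runge–Kutta scheme for a hypothetical one-dimensional test potential Φ(x) = x²/2; the parameter choices α = 3.5 and ε(t) = 1/t³ are assumptions made for the example.

```python
# Sketch only: integrate (AVD)_{alpha,eps} written as the first-order system
#   x' = v,   v' = -(alpha/t) v - grad Phi(x) - eps(t) x,
# for the toy potential Phi(x) = x^2 / 2 on the real line (min Phi = 0 at x = 0).

def grad_phi(x):
    return x                        # gradient of Phi(x) = x^2 / 2

def eps(t):
    return 1.0 / t**3               # a fast-vanishing Tikhonov parameter (r = 3 > 2)

def rhs(t, x, v, alpha=3.5):
    return v, -(alpha / t) * v - grad_phi(x) - eps(t) * x

def integrate(t0=1.0, t1=200.0, x0=2.0, v0=0.0, h=0.01, alpha=3.5):
    """Classical RK4 on [t0, t1]; returns the final state (x, v)."""
    t, x, v = t0, x0, v0
    while t < t1:
        k1x, k1v = rhs(t, x, v, alpha)
        k2x, k2v = rhs(t + h/2, x + h/2 * k1x, v + h/2 * k1v, alpha)
        k3x, k3v = rhs(t + h/2, x + h/2 * k2x, v + h/2 * k2v, alpha)
        k4x, k4v = rhs(t + h, x + h * k3x, v + h * k3v, alpha)
        x += h/6 * (k1x + 2*k2x + 2*k3x + k4x)
        v += h/6 * (k1v + 2*k2v + 2*k3v + k4v)
        t += h
    return x, v
```

On this toy problem, both x(t) and ẋ(t) decay toward zero along the integration, consistent with the dissipative character of the dynamic.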

2.2. Some energy estimates


In what follows, x(·) denotes a trajectory of (2). At different points, we shall use the global mechanical energy of the system, given by W : [t₀, +∞[ → R,

W(t) := (1/2)‖ẋ(t)‖² + Φ(x(t)) + (ε(t)/2)‖x(t)‖². (9)

After scalar multiplication of (2) by ẋ(t) we obtain

(d/dt)W(t) = −(α/t)‖ẋ(t)‖² + (1/2)ε̇(t)‖x(t)‖². (10)

By assumption ε̇(t) ≤ 0, from which we deduce the following dissipative property.
Lemma 2.1. Suppose that α > 0. Let W be defined by (9). For each t > t₀, we have

(d/dt)W(t) ≤ −(α/t)‖ẋ(t)‖².

i) Hence, W is nonincreasing, and W_∞ = lim_{t→+∞} W(t) exists in R.
ii) The velocity vector ẋ satisfies the following estimates:

sup_{t≥t₀} ‖ẋ(t)‖ < +∞, (11)

∫_{t₀}^{+∞} (1/t)‖ẋ(t)‖² dt ≤ (1/α)(W(t₀) − inf_H Φ) < +∞. (12)

Now, given z ∈ H, define h_z : [t₀, +∞[ → R by

h_z(t) = (1/2)‖x(t) − z‖². (13)

By the classical chain rule, we have

ḣ_z(t) = ⟨x(t) − z, ẋ(t)⟩ and ḧ_z(t) = ⟨x(t) − z, ẍ(t)⟩ + ‖ẋ(t)‖².

Combining these expressions, and using (2), we obtain

ḧ_z(t) + (α/t)ḣ_z(t) = ‖ẋ(t)‖² + ⟨x(t) − z, ẍ(t) + (α/t)ẋ(t)⟩ = ‖ẋ(t)‖² − ⟨x(t) − z, ∇Φ(x(t)) + ε(t)x(t)⟩. (14)

Using the convexity inequality

⟨x(t) − z, ∇Φ(x(t))⟩ ≥ Φ(x(t)) − Φ(z)

in (14), we deduce that

ḧ_z(t) + (α/t)ḣ_z(t) + Φ(x(t)) − Φ(z) ≤ ‖ẋ(t)‖² − ε(t)⟨x(t) − z, x(t)⟩. (15)
Reformulating this expression with the help of the energy function W (defined in (9)), we obtain the following
differential inequality, that will play a central role in the convergence analysis of the trajectories of (2).
Lemma 2.2. Take z ∈ H, and let W and h_z be defined by (9) and (13), respectively. Then

ḧ_z(t) + (α/t)ḣ_z(t) + W(t) − Φ(z) ≤ (3/2)‖ẋ(t)‖² + (1/2)ε(t)( 2⟨x(t), z⟩ − ‖x(t)‖² ). (16)

As a consequence,

ḧ_z(t) + (α/t)ḣ_z(t) + W(t) − Φ(z) ≤ (3/2)‖ẋ(t)‖² + (1/2)ε(t)‖z‖². (17)
2.3. Some classical facts concerning the Tikhonov approximation
For each ε > 0, we denote by x_ε the unique solution of the strongly convex minimization problem

x_ε = argmin_{x∈H} { Φ(x) + (ε/2)‖x‖² }.

Equivalently,

∇Φ(x_ε) + εx_ε = 0.

Let us recall that the Tikhonov approximation curve, ε ↦ x_ε, satisfies the well-known strong convergence property:

lim_{ε→0} x_ε = x*, (18)

where x* is the element of minimal norm of the closed convex nonempty set argmin Φ.
This result was first obtained by Tikhonov [52], Tikhonov and Arsenin [53] in the case of ill-posed least squares problems, and Browder [24] for monotone variational inequalities, then extended and revisited by many authors, see for example [7, Corollary 5.2], [21, Theorem 23.44], [50]. Moreover, by the monotonicity of ∇Φ, and ∇Φ(x*) = 0, ∇Φ(x_ε) = −εx_ε, we have

⟨x_ε − x*, −εx_ε⟩ ≥ 0,

which, after dividing by ε > 0, and by the Cauchy–Schwarz inequality, gives

‖x_ε‖ ≤ ‖x*‖ for all ε > 0. (19)

A remarkable geometrical property of the viscosity curve ε ↦ x_ε is that for all ε > 0

x_ε = proj_{{Φ ≤ Φ(x_ε)}} 0.

This implicit characterization of x_ε has been used in [54] in order to construct an example where the viscosity curve has infinite length. This explains some of the difficulties met when analyzing the convergence properties of trajectories which are intended to follow the viscosity curve asymptotically, see [12].
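To make the viscosity curve concrete, here is a small Python sketch (our own toy example with hypothetical names, not taken from the paper). We take Φ(x₁, x₂) = (1/2)(x₁ + x₂ − 2)², whose solution set argmin Φ is the whole line x₁ + x₂ = 2, with minimum-norm solution x* = (1, 1); for each ε > 0 we solve ∇Φ(x_ε) + εx_ε = 0 by plain gradient descent on the strongly convex function Φ + (ε/2)‖·‖².

```python
# Sketch (assumed toy problem): Phi(x1, x2) = 0.5*(x1 + x2 - 2)^2 has a whole
# line of minimizers; the minimum-norm one is x* = (1, 1).  For each eps > 0,
# x_eps is the unique minimizer of f_eps = Phi + (eps/2)|x|^2.

def grad_f(x, eps):
    s = x[0] + x[1] - 2.0                      # grad Phi(x) = (s, s)
    return (s + eps * x[0], s + eps * x[1])    # grad f_eps(x)

def tikhonov_point(eps, steps=20000, lr=0.2):
    """Gradient descent on the strongly convex f_eps; returns x_eps."""
    x = (0.0, 0.0)
    for _ in range(steps):
        g = grad_f(x, eps)
        x = (x[0] - lr * g[0], x[1] - lr * g[1])
    return x
```

As ε decreases, x_ε moves toward x* = (1, 1) while ‖x_ε‖ stays below ‖x*‖ = √2, in line with (18) and (19).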

2.4. A general minimization property


Under general assumptions, let us show the minimizing property of the trajectories, lim_{t→+∞} Φ(x(t)) = inf_H Φ. Here we examine neither the fast minimization property nor the identification of the limit; these are topics that will be examined in the next sections.
Proposition 2.1. Let Φ : H → R be a convex continuously differentiable function such that inf_H Φ > −∞ (the set argmin Φ is possibly empty). Suppose that α > 1. Let ε : [t₀, +∞[ → R⁺ be a C¹ decreasing function such that

∫_{t₀}^{+∞} (ε(t)/t) dt < +∞. (20)

Let x(·) be a classical global solution of (AVD)_{α,ε}. Then, the following minimizing property holds:

lim_{t→+∞} Φ(x(t)) = inf_H Φ, (21)

and

lim_{t→+∞} ‖ẋ(t)‖ = 0. (22)

Proof. Let z ∈ H be arbitrary. By Lemma 2.2, the function h : [t₀, +∞[ → R⁺ defined by h(t) = (1/2)‖x(t) − z‖² satisfies the differential inequality

ḧ(t) + (α/t)ḣ(t) + W(t) − Φ(z) ≤ g(t), (23)

where

g(t) := (3/2)‖ẋ(t)‖² + (ε(t)/2)‖z‖². (24)
Let us integrate the differential inequality (23). After multiplication of (23) by t^α we have

(d/dt)( t^α ḣ(t) ) + t^α (W(t) − Φ(z)) ≤ t^α g(t).

Integrating from t₀ to t we obtain

t^α ḣ(t) − t₀^α ḣ(t₀) + ∫_{t₀}^{t} s^α (W(s) − Φ(z)) ds ≤ ∫_{t₀}^{t} s^α g(s) ds.

Since the function W(·) is nonincreasing (see Lemma 2.1), we deduce that

t^α ḣ(t) − t₀^α ḣ(t₀) + (W(t) − Φ(z)) ∫_{t₀}^{t} s^α ds ≤ ∫_{t₀}^{t} s^α g(s) ds,

which, after division by t^α, gives

ḣ(t) + t^{−α} (W(t) − Φ(z)) ∫_{t₀}^{t} s^α ds ≤ t₀^α ḣ(t₀)/t^α + t^{−α} ∫_{t₀}^{t} s^α g(s) ds.

Integrating once more from t₀ to t we obtain

h(t) − h(t₀) + ∫_{t₀}^{t} s^{−α} (W(s) − Φ(z)) ( ∫_{t₀}^{s} τ^α dτ ) ds ≤ t₀^α ḣ(t₀) ∫_{t₀}^{t} s^{−α} ds + ∫_{t₀}^{t} s^{−α} ( ∫_{t₀}^{s} τ^α g(τ) dτ ) ds.

Using again that W(·) is nonincreasing, we deduce that

h(t) − h(t₀) + (W(t) − Φ(z)) ∫_{t₀}^{t} s^{−α} ( ∫_{t₀}^{s} τ^α dτ ) ds ≤ (1/(α−1)) t₀ |ḣ(t₀)| + ∫_{t₀}^{t} s^{−α} ( ∫_{t₀}^{s} τ^α g(τ) dτ ) ds.

Computing the first integral, and since h is nonnegative, we obtain

(W(t) − Φ(z)) (1/(α+1)) ( t²/2 − t₀²/2 + t₀^{α+1}/((α−1)t^{α−1}) − t₀²/(α−1) ) ≤ h(t₀) + (1/(α−1)) t₀ |ḣ(t₀)| + ∫_{t₀}^{t} s^{−α} ( ∫_{t₀}^{s} τ^α g(τ) dτ ) ds. (25)

Let us compute this last integral by Fubini's theorem:

∫_{t₀}^{t} s^{−α} ( ∫_{t₀}^{s} τ^α g(τ) dτ ) ds = ∫_{t₀}^{t} ( ∫_{τ}^{t} s^{−α} ds ) τ^α g(τ) dτ
 = (1/(α−1)) ∫_{t₀}^{t} ( 1/τ^{α−1} − 1/t^{α−1} ) τ^α g(τ) dτ
 ≤ (1/(α−1)) ∫_{t₀}^{t} τ g(τ) dτ.

Returning to (25), we deduce that

(W(t) − Φ(z)) (1/(α+1)) ( t²/2 − t₀²/2 + t₀^{α+1}/((α−1)t^{α−1}) − t₀²/(α−1) ) ≤ h(t₀) + (1/(α−1)) t₀ |ḣ(t₀)| + (1/(α−1)) ∫_{t₀}^{t} τ g(τ) dτ. (26)

By (12) we have ∫_{t₀}^{∞} (1/t)‖ẋ(t)‖² dt < +∞. By assumption (20) we have ∫_{t₀}^{+∞} (1/t)ε(t) dt < +∞. According to these properties and the definition (24) of g, we have

∫_{t₀}^{∞} (1/t) g(t) dt < +∞. (27)

Let us rewrite (26) as

(W(t) − Φ(z)) (1/(α+1)) ( t²/2 − t₀²/2 + t₀^{α+1}/((α−1)t^{α−1}) − t₀²/(α−1) ) ≤ h(t₀) + (1/(α−1)) t₀ |ḣ(t₀)| + (1/(α−1)) ∫_{t₀}^{t} τ² (1/τ) g(τ) dτ.

Dividing by t², and letting t → +∞, we obtain, thanks to Lemma 7.3,

lim sup_{t→+∞} W(t) ≤ Φ(z). (28)

Since W(t) ≥ Φ(x(t)), we deduce that

lim sup_{t→+∞} Φ(x(t)) ≤ Φ(z).

This being true for any z ∈ H, we obtain

lim sup_{t→+∞} Φ(x(t)) ≤ inf_H Φ.

The other inequality lim inf_{t→+∞} Φ(x(t)) ≥ inf_H Φ being trivially satisfied, we finally obtain

lim_{t→+∞} Φ(x(t)) = inf_H Φ.

Returning to (28), we also obtain

lim_{t→+∞} ‖ẋ(t)‖ = 0.

Remark 2.1. With t₀ > 0, the condition ∫_{t₀}^{+∞} (ε(t)/t) dt < +∞ is satisfied by ε(t) = 1/t^γ for any γ > 0, and by ε(t) = 1/(ln t)^γ for any γ > 1.

3. Fast vanishing case: ∫_{t₀}^{+∞} tε(t) dt < +∞

Under the hypothesis of rapid parametrization ∫_{t₀}^{+∞} tε(t) dt < +∞, and assuming α ≥ 3, we obtain a fast minimization property. In addition, for α > 3, we obtain the convergence of the trajectories to minimizers, and an even faster rate of convergence of the values. In this situation, the limit depends on the initial data. Asymptotically, the regularizing term is not large enough to induce a viscosity selection. Note that, despite the fact that it is not active asymptotically, the Tikhonov term makes the dynamic governed at all times t by a strongly monotone operator, which induces favorable numerical aspects.
The following fast minimization property, and the convergence of trajectories, are consistent with the results obtained in [11] for the perturbed version of (AVD)_α. The Tikhonov regularization term acts as a small perturbation which does not affect the convergence properties of (AVD)_α.

Theorem 3.1. Let Φ : H → R be a convex continuously differentiable function such that argmin Φ is nonempty. Let ε : [t₀, +∞[ → R⁺ be a C¹ nonincreasing function such that ∫_{t₀}^{+∞} tε(t) dt < +∞. Let x(·) be a classical global solution of (AVD)_{α,ε}.
i) Suppose α ≥ 3. Then the fast convergence of the values holds:

Φ(x(t)) − inf_H Φ = O(1/t²).

Moreover

∫_{t₀}^{+∞} tε(t)‖x(t)‖² dt < +∞.

ii) Suppose α > 3. Then x(t) converges weakly to an element of argmin Φ, as t → +∞. In addition,

Φ(x(t)) − inf_H Φ = o(1/t²), (29)

lim_{t→+∞} t‖ẋ(t)‖ = 0, (30)

∫_{t₀}^{+∞} t(Φ(x(t)) − inf_H Φ) dt < +∞, (31)

∫_{t₀}^{+∞} t‖ẋ(t)‖² dt < +∞. (32)

Proof. i) The proof is parallel to that of [11]. Fix z ∈ argmin Φ, and consider the energy function

E(t) = (2/(α−1)) t² [f_t(x(t)) − inf_H Φ] + (α−1) ‖ x(t) − z + (t/(α−1)) ẋ(t) ‖²,

where f_t : H → R is defined for any x ∈ H by

f_t(x) := Φ(x) + (ε(t)/2)‖x‖². (33)
Let us observe that

∇f_t(x(t)) = ∇Φ(x(t)) + ε(t)x(t) = −ẍ(t) − (α/t)ẋ(t).

By differentiating E(·), and using the above relation, we obtain

Ė(t) = (4t/(α−1)) [f_t(x(t)) − inf_H Φ] + (2/(α−1)) t² ⟨∇f_t(x(t)), ẋ(t)⟩ + (2/(α−1)) t² ε̇(t) ‖x(t)‖²/2
 + 2t ⟨ x(t) − z + (t/(α−1)) ẋ(t), ẍ(t) + (α/t) ẋ(t) ⟩
 = (4t/(α−1)) [f_t(x(t)) − inf_H Φ] + (2/(α−1)) t² ⟨∇f_t(x(t)), ẋ(t)⟩ + (2/(α−1)) t² ε̇(t) ‖x(t)‖²/2
 − 2t ⟨ x(t) − z + (t/(α−1)) ẋ(t), ∇f_t(x(t)) ⟩.

After simplification, we obtain

Ė(t) = (4t/(α−1)) [f_t(x(t)) − inf_H Φ] + (t²/(α−1)) ε̇(t) ‖x(t)‖² − 2t ⟨x(t) − z, ∇f_t(x(t))⟩. (34)

By the strong convexity of f_t,

f_t(z) − f_t(x(t)) ≥ ⟨∇f_t(x(t)), z − x(t)⟩ + (ε(t)/2)‖x(t) − z‖².

Equivalently,

⟨∇f_t(x(t)), x(t) − z⟩ ≥ f_t(x(t)) − inf_H Φ − (ε(t)/2)‖z‖² + (ε(t)/2)‖x(t) − z‖². (35)
Combining (34) with (35) we obtain

Ė(t) + (2 − 4/(α−1)) t [f_t(x(t)) − inf_H Φ] − (t²/(α−1)) ε̇(t) ‖x(t)‖² + tε(t) ‖x(t) − z‖² ≤ tε(t) ‖z‖².

From (33), we deduce that

Ė(t) + 2((α−3)/(α−1)) t [Φ(x(t)) − inf_H Φ] + [ (α−3)ε(t) − tε̇(t) ] ( t‖x(t)‖²/(α−1) ) + tε(t) ‖x(t) − z‖² ≤ tε(t) ‖z‖². (36)

Since ε(·) is a nonincreasing, nonnegative function, and α ≥ 3, we infer

Ė(t) ≤ tε(t)‖z‖².

By assumption, ∫_{t₀}^{+∞} tε(t) dt < +∞. As a consequence, the positive part [Ė]₊ of Ė belongs to L¹(t₀, +∞). Since E is bounded from below (it is nonnegative), we conclude that E(t) has a limit as t → +∞, and hence is bounded, which gives the claim Φ(x(t)) − inf_H Φ ≤ C/t². Now, integrating (36), we obtain

∫_{t₀}^{+∞} tε(t)‖x(t) − z‖² dt < +∞. (37)

Combining the inequality

tε(t)‖x(t)‖² ≤ 2tε(t)‖x(t) − z‖² + 2tε(t)‖z‖²

with (37) and ∫_{t₀}^{+∞} tε(t) dt < +∞, we conclude that

∫_{t₀}^{+∞} tε(t)‖x(t)‖² dt < +∞. (38)

ii) Let us now suppose α > 3. By integration of (36) we first obtain

∫_{t₀}^{+∞} t(Φ(x(t)) − inf_H Φ) dt < +∞. (39)
In order to obtain further energy estimates, let us take the scalar product of (2) with t²ẋ(t). We obtain

(t²/2)(d/dt)‖ẋ(t)‖² + αt‖ẋ(t)‖² + t²(d/dt)Φ(x(t)) + ε(t)(t²/2)(d/dt)‖x(t)‖² = 0.

After integration by parts on [t₀, t], we obtain

(t²/2)‖ẋ(t)‖² + (α−1) ∫_{t₀}^{t} s‖ẋ(s)‖² ds + t²(Φ(x(t)) − inf_H Φ) − 2 ∫_{t₀}^{t} s(Φ(x(s)) − inf_H Φ) ds
 + (t²/2)ε(t)‖x(t)‖² − ∫_{t₀}^{t} ( sε(s) + (s²/2)ε̇(s) ) ‖x(s)‖² ds ≤ C,

where C is independent of t, and depends only on the initial data. By using ε̇(·) ≤ 0, we deduce that

(t²/2)‖ẋ(t)‖² + (α−1) ∫_{t₀}^{t} s‖ẋ(s)‖² ds ≤ C + 2 ∫_{t₀}^{∞} s(Φ(x(s)) − inf_H Φ) ds + ∫_{t₀}^{∞} sε(s)‖x(s)‖² ds.

Using the previous estimates (38) and (39), and since α > 1, we obtain

∫_{t₀}^{+∞} t‖ẋ(t)‖² dt < +∞, (40)

sup_{t≥t₀} t‖ẋ(t)‖ < +∞. (41)

We now have all the ingredients to prove the weak convergence of the trajectories. From (17) in Lemma 2.2, we have

t ḧ_z(t) + α ḣ_z(t) ≤ g(t), (42)

with

g(t) = (3/2) t ‖ẋ(t)‖² + (1/2) tε(t) ‖z‖².

Using again the assumption ∫_{t₀}^{+∞} tε(t) dt < +∞, and the energy estimate (32) ∫_{t₀}^{+∞} t‖ẋ(t)‖² dt < +∞, it follows that g ∈ L¹(t₀, +∞). Lemma 7.2 (Appendix) now shows that the positive part [ḣ_z]₊ of ḣ_z belongs to L¹(t₀, +∞). Hence, for any z ∈ argmin Φ, lim_{t→+∞} h_z(t) exists. This is one of the two conditions of Opial's lemma [45], which is detailed in Lemma 7.1. The other condition is clearly satisfied: we know that Φ(x(t)) tends to the infimal value of Φ. By the lower semicontinuity of Φ for the weak topology, every sequential weak cluster point of x(·) is a minimizer of Φ.
Let us complete the proof by proving Φ(x(t)) − inf_H Φ = o(1/t²). We consider the classical energy function

E(t) := (1/2)‖ẋ(t)‖² + (Φ(x(t)) − inf_H Φ).

E is a nonnegative function. In view of the estimates (31)-(32), we have

∫_{t₀}^{+∞} tE(t) dt < +∞. (43)

Let us prove that lim t²E(t) exists. We have

(d/dt)(t²E(t)) = 2tE(t) + t²(d/dt)E(t)
 = 2t ( (1/2)‖ẋ(t)‖² + Φ(x(t)) − inf_H Φ ) + t² ⟨∇Φ(x(t)) + ẍ(t), ẋ(t)⟩
 = 2t ( (1/2)‖ẋ(t)‖² + Φ(x(t)) − inf_H Φ ) − t² ⟨ (α/t)ẋ(t) + ε(t)x(t), ẋ(t) ⟩
 = 2t (Φ(x(t)) − inf_H Φ) − (α−1) t ‖ẋ(t)‖² − t²ε(t) ⟨ẋ(t), x(t)⟩.

Since α > 1, and by the Cauchy–Schwarz inequality, it follows that

(d/dt)(t²E(t)) ≤ 2t (Φ(x(t)) − inf_H Φ) + tε(t) (t‖ẋ(t)‖) ‖x(t)‖.

From sup_{t≥t₀} t‖ẋ(t)‖ < +∞ (see (41)), and the boundedness of x(t) (recall that the trajectory converges weakly), we deduce that there exists a constant C such that

(d/dt)(t²E(t)) ≤ 2t (Φ(x(t)) − inf_H Φ) + Ctε(t).

Invoking again the estimate (31), and the assumption ∫_{t₀}^{+∞} tε(t) dt < +∞, we deduce that the positive part of (d/dt)(t²E(t)) is integrable on (t₀, +∞). Hence lim t²E(t) exists, which, in view of (43), implies

lim t²E(t) = 0.

Noticing that t²E(t) ≥ t²(Φ(x(t)) − inf_H Φ) ≥ 0, we deduce that Φ(x(t)) − inf_H Φ = o(1/t²). From t²E(t) ≥ (t²/2)‖ẋ(t)‖² ≥ 0, we obtain the remaining assertion (30), which completes the proof. □
Corollary 3.1. By taking ε(t) = 1/t^r, with r > 2, we have ∫_{t₀}^{+∞} tε(t) dt < +∞, and all the conditions of Theorem 3.1 are satisfied. As a consequence, we obtain the conclusions of part A of Theorem 1.1 (presented in the introduction).
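A quick numerical sanity check of the fast-vanishing regime can be sketched as follows (our own toy experiment with assumed parameters, not the paper's numerics): for Φ(x) = x²/2 on the real line, ε(t) = 1/t³ (so r = 3 > 2) and α = 3.5 > 3, the rescaled gap t²(Φ(x(t)) − min Φ) should shrink along the trajectory, in line with the o(1/t²) estimate (29).

```python
# Sketch only: RK4 integration of x'' + (alpha/t) x' + x + x/t^3 = 0
# (i.e. (AVD)_{alpha,eps} with grad Phi(x) = x and eps(t) = 1/t^3),
# recording the rescaled value gap t^2 * Phi(x(t)) early and late.

def rhs(t, x, v, alpha=3.5):
    return v, -(alpha / t) * v - x - x / t**3

def rescaled_gaps(t0=1.0, t1=200.0, x0=2.0, v0=0.0, h=0.01):
    """Return max of t^2*Phi(x(t)) over the windows [18, 20] and [180, 200]."""
    t, x, v = t0, x0, v0
    early = late = 0.0
    while t < t1:
        k1x, k1v = rhs(t, x, v)
        k2x, k2v = rhs(t + h/2, x + h/2 * k1x, v + h/2 * k1v)
        k3x, k3v = rhs(t + h/2, x + h/2 * k2x, v + h/2 * k2v)
        k4x, k4v = rhs(t + h, x + h * k3x, v + h * k3v)
        x += h/6 * (k1x + 2*k2x + 2*k3x + k4x)
        v += h/6 * (k1v + 2*k2v + 2*k3v + k4v)
        t += h
        gap = t**2 * 0.5 * x**2          # t^2 (Phi(x(t)) - min Phi)
        if 18.0 <= t <= 20.0:
            early = max(early, gap)
        elif t >= 180.0:
            late = max(late, gap)
    return early, late
```

On this example the late-time maximum of the rescaled gap is markedly smaller than the early one, which is the qualitative signature of the o(1/t²) decay.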

Remark 3.1. The above convergence result is not a consequence of the general perturbation theorem of [11], because we do not know a priori whether the trajectories remain bounded. This is only obtained ultimately as a consequence of the convergence of the trajectories for the weak topology. Note that, in view of applications to ill-posed problems, we do not want to make coerciveness or strong convexity assumptions on Φ, which makes the boundedness of the trajectories a nontrivial issue.

Remark 3.2. When the solution set argmin Φ is empty, the fast convergence properties obtained in Theorem 3.1 may fail to be satisfied, see [11, Example 2.12]. The minimizing properties in this case are examined in Section 2.4.

4. Strong convergence to the minimum norm solution

Recall that S = argmin Φ is a nonempty closed convex set. When ε(t) tends to zero in a moderate way (not too fast, and not too slow), we show that any orbit of (AVD)_{α,ε} converges strongly to the element of argmin Φ which has minimum norm.
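This selection effect can be illustrated numerically (our own toy sketch; the problem, names and parameters are assumptions, not the paper's code). With Φ(x₁, x₂) = (1/2)(x₁ + x₂ − 2)², argmin Φ is the line x₁ + x₂ = 2 and the minimum-norm solution is x* = (1, 1); taking the slowly vanishing parameter ε(t) = 1/t (r = 1 < 2) and α = 3, the trajectory drifts toward x* even when started off the symmetry axis.

```python
# Sketch only: RK4 integration of (AVD)_{alpha,eps} for
#   Phi(x1, x2) = 0.5*(x1 + x2 - 2)^2,  eps(t) = 1/t  (slow case, r = 1 < 2),
# whose minimum-norm minimizer is x* = (1, 1).

def axpy(a, c, b):
    """Componentwise a + c*b for 2-vectors."""
    return (a[0] + c * b[0], a[1] + c * b[1])

def rhs(t, x, v, alpha=3.0):
    s = x[0] + x[1] - 2.0                  # grad Phi(x) = (s, s)
    e = 1.0 / t                            # slowly vanishing Tikhonov parameter
    a = (-(alpha / t) * v[0] - s - e * x[0],
         -(alpha / t) * v[1] - s - e * x[1])
    return v, a

def simulate(t0=1.0, t1=300.0, h=0.01):
    t, x, v = t0, (3.0, 0.0), (0.0, 0.0)   # start off the symmetry axis x1 = x2
    while t < t1:
        k1x, k1v = rhs(t, x, v)
        k2x, k2v = rhs(t + h/2, axpy(x, h/2, k1x), axpy(v, h/2, k1v))
        k3x, k3v = rhs(t + h/2, axpy(x, h/2, k2x), axpy(v, h/2, k2v))
        k4x, k4v = rhs(t + h, axpy(x, h, k3x), axpy(v, h, k3v))
        x = axpy(x, h/6, (k1x[0] + 2*k2x[0] + 2*k3x[0] + k4x[0],
                          k1x[1] + 2*k2x[1] + 2*k3x[1] + k4x[1]))
        v = axpy(v, h/6, (k1v[0] + 2*k2v[0] + 2*k3v[0] + k4v[0],
                          k1v[1] + 2*k2v[1] + 2*k3v[1] + k4v[1]))
        t += h
    return x
```

Without the Tikhonov term the trajectory would settle on some point of the line x₁ + x₂ = 2 depending on the initial data; the slowly vanishing ε(t)x(t) term steers it toward the minimum-norm point instead.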

4.1. A strong convergence result


Theorem 4.1. Let Φ : H → R be a convex continuously differentiable function such that argmin Φ is nonempty. Suppose that α ≥ 3. Let ε : [t₀, +∞[ → R⁺ be a C¹ nonincreasing function satisfying the following assumptions i), ii), and iii):
i) lim_{t→+∞} ε(t) = 0;
ii) depending on α = 3 or α > 3, the function t ↦ t²ε(t) satisfies the following growth condition:
• lim_{t→+∞} t²ε(t) = +∞ in the case α = 3;
• t²ε(t) ≥ c > (4/9)α(α−3) for t large enough, and some positive constant c, in the case α > 3;
iii) ∫_{t₀}^{∞} (ε(s)/s) ds < +∞.
Let x(·) be a classical global solution of (AVD)_{α,ε}. Then, the trajectory is minimizing, and

lim inf_{t→+∞} ‖x(t) − x*‖ = 0,

where x* is the element of minimum norm of the closed convex nonempty set argmin Φ. Moreover,

lim_{t→+∞} ‖x(t) − x*‖ = 0,

when either the trajectory {x(t) : t ≥ T} remains in the ball B(0, ‖x*‖), or in its complement, for T large enough.

Proof. The proof uses in an essential way the strong convexity of the function

f_t(x) := Φ(x) + (ε(t)/2)‖x‖²,

introduced in (33). By the strong convexity of f_t,

f_t(x(t)) − f_t(x_{ε(t)}) ≥ ⟨∇f_t(x_{ε(t)}), x(t) − x_{ε(t)}⟩ + (ε(t)/2)‖x(t) − x_{ε(t)}‖².

Since f_t achieves its (unique) minimum at x_{ε(t)}, we have ∇f_t(x_{ε(t)}) = 0, and the above relation gives the strong minimum property

f_t(x(t)) − f_t(x_{ε(t)}) ≥ (ε(t)/2)‖x(t) − x_{ε(t)}‖².

On the other hand, by definition of f_t, and x* ∈ argmin Φ, we have f_t(x*) − f_t(x_{ε(t)}) ≤ (ε(t)/2)(‖x*‖² − ‖x_{ε(t)}‖²), from which we deduce that

f_t(x(t)) − f_t(x_{ε(t)}) = (f_t(x(t)) − f_t(x*)) + (f_t(x*) − f_t(x_{ε(t)}))
 ≤ f_t(x(t)) − f_t(x*) + (ε(t)/2)(‖x*‖² − ‖x_{ε(t)}‖²).

Combining the two above inequalities, we obtain

(ε(t)/2) ( ‖x(t) − x_{ε(t)}‖² + ‖x_{ε(t)}‖² − ‖x*‖² ) ≤ f_t(x(t)) − f_t(x*). (44)
Our strategy now consists in obtaining a majorization of f_t(x(t)) − f_t(x*). To that end, we introduce the rescaled energy function

E_λ^p(t) = t^p ( t² (f_t(x(t)) − f_t(x*)) + (1/2)‖λ(x(t) − x*) + tẋ(t)‖² ).

This rescaling technique is parallel to that developed in the strongly convex case in [11] and [51]. But, in our situation, the strong convexity property vanishes asymptotically. The parameters p, λ will be adjusted during the proof. Let us differentiate E_λ^p, and use the relation ẍ(t) = −(α/t)ẋ(t) − ∇f_t(x(t)). After simplification, we obtain

(d/dt)E_λ^p(t) = (p+2) t^{p+1} (f_t(x(t)) − f_t(x*)) + (t^{p+2}/2) ε̇(t) ( ‖x(t)‖² − ‖x*‖² )
 + t^p λ(p + λ + 1 − α) ⟨x(t) − x*, ẋ(t)⟩ + (t^{p−1}/2) pλ² ‖x(t) − x*‖²
 + t^{p+1} (p/2 + λ + 1 − α) ‖ẋ(t)‖² − t^{p+1} λ ⟨∇f_t(x(t)), x(t) − x*⟩.

By the strong convexity of f_t,

⟨∇f_t(x(t)), x(t) − x*⟩ ≥ f_t(x(t)) − f_t(x*) + (ε(t)/2)‖x(t) − x*‖².

Replacing in the above formulation of (d/dt)E_λ^p(t), we obtain

(d/dt)E_λ^p(t) ≤ t^{p+1} (p + 2 − λ)(f_t(x(t)) − f_t(x*)) + (t^{p+2}/2) ε̇(t) ( ‖x(t)‖² − ‖x*‖² )
 + t^p λ(p + λ + 1 − α) ⟨x(t) − x*, ẋ(t)⟩ + (t^{p−1}λ/2)(pλ − t²ε(t)) ‖x(t) − x*‖²
 + t^{p+1} (p/2 + λ + 1 − α) ‖ẋ(t)‖².

Let us choose p = (2/3)(α−3), and λ = 2α/3, so that p + 2 − λ = 0, and p/2 + λ + 1 − α = 0. We obtain

(d/dt)E_λ^p(t) + (t^{p+2}/2) ε̇(t) ( ‖x*‖² − ‖x(t)‖² ) ≤ (t^p λ(α−3)/3) ⟨x(t) − x*, ẋ(t)⟩ + (t^{p−1}λ/2)(pλ − t²ε(t)) ‖x(t) − x*‖². (45)

By assumption ii), for t large enough, say t ≥ t₁, we have t²ε(t) ≥ pλ = (4/9)α(α−3). As a consequence,

(d/dt)E_λ^p(t) + (t^{p+2}/2) ε̇(t) ( ‖x*‖² − ‖x(t)‖² ) ≤ (t^p λ(α−3)/3) ⟨x(t) − x*, ẋ(t)⟩ = (t^p λ(α−3)/6) (d/dt)‖x(t) − x*‖².
By integrating on [t₁, t] (use integration by parts on the right-hand side, and note that λ(α − 3)/6 = λp/4, since α − 3 = 3p/2),

E^p_λ(t) + (1/2) ∫_{t₁}^t s^{p+2} ε̇(s)(‖x∗‖² − ‖x(s)‖²) ds ≤ E^p_λ(t₁) + (λ(α − 3)/6) ∫_{t₁}^t s^p (d/ds)‖x(s) − x∗‖² ds
  = E^p_λ(t₁) + (λ(α − 3)/6)( t^p ‖x(t) − x∗‖² − t₁^p ‖x(t₁) − x∗‖² − p ∫_{t₁}^t s^{p−1} ‖x(s) − x∗‖² ds )
  ≤ E^p_λ(t₁) + (λp/4) t^p ‖x(t) − x∗‖² − (λp²/4) ∫_{t₁}^t s^{p−1} ‖x(s) − x∗‖² ds.

By definition of E^p_λ, we have E^p_λ(t) ≥ t^{p+2}(f_t(x(t)) − f_t(x∗)). From the above it follows that

t²(f_t(x(t)) − f_t(x∗)) + (1/(2t^p)) ∫_{t₁}^t s^{p+2} ε̇(s)(‖x∗‖² − ‖x(s)‖²) ds ≤ E^p_λ(t₁)/t^p + (λp/4)‖x(t) − x∗‖² − (λp²/(4t^p)) ∫_{t₁}^t s^{p−1} ‖x(s) − x∗‖² ds
  ≤ E^p_λ(t₁)/t^p + (λp/4)‖x(t) − x∗‖².

Combining with (44), we obtain

(t²ε(t)/2)( ‖x(t) − x_{ε(t)}‖² + ‖x_{ε(t)}‖² − ‖x∗‖² ) ≤ E^p_λ(t₁)/t^p + (λp/2)‖x(t) − x_{ε(t)}‖² + (λp/2)‖x_{ε(t)} − x∗‖²
  − (1/(2t^p)) ∫_{t₁}^t s^{p+2} ε̇(s)(‖x∗‖² − ‖x(s)‖²) ds.   (46)

To go further in this Lyapunov analysis, the main difficulty comes from the last term in the above expression, s^{p+2} ε̇(s)(‖x∗‖² − ‖x(s)‖²), whose sign may vary along the trajectory. As we shall see, the trajectory may involve oscillations, which makes it difficult to treat this term by direct energetic arguments. We follow the method developed by Attouch and Czarnecki in [13] for analyzing the asymptotic behavior of the heavy ball with friction dynamical system in the presence of a vanishing Tikhonov regularization term. The proof combines geometric and energetic arguments: depending on the sign of the term ‖x∗‖² − ‖x(s)‖², we analyze separately the following three situations. We denote by B(0, ‖x∗‖) the open ball centered at the origin with radius ‖x∗‖. Note that, by a continuity argument, we might as well argue with the corresponding closed ball.
Case (a) We assume that there exists some T ≥ t₁ such that, for every t ≥ T, ‖x∗‖² − ‖x(t)‖² ≤ 0. Equivalently, x(t) ∉ B(0, ‖x∗‖). Since ε̇ ≤ 0, the last term in (46) is less than or equal to zero. Hence,

(t²ε(t)/2)( ‖x(t) − x_{ε(t)}‖² + ‖x_{ε(t)}‖² − ‖x∗‖² ) ≤ E^p_λ(t₁)/t^p + (λp/2)‖x(t) − x_{ε(t)}‖² + (λp/2)‖x_{ε(t)} − x∗‖².   (47)
Dividing by t²ε(t)/2 and rearranging the terms of this inequality, it follows that

(1 − λp/(t²ε(t))) ‖x(t) − x_{ε(t)}‖² ≤ (‖x∗‖² − ‖x_{ε(t)}‖²) + 2E^p_λ(t₁)/(t^{p+2}ε(t)) + (λp/(t²ε(t)))‖x_{ε(t)} − x∗‖².   (48)

Since ε(t) tends to zero as t goes to infinity, we have the following strong convergence property (see Section 2.3 for basic facts concerning the Tikhonov approximation):

lim_{ε→0} x_ε = x∗.

In order to pass to the limit in (48), we use condition (ii). Let us show that

lim sup_{t→+∞} ‖x(t) − x_{ε(t)}‖² ≤ 0.   (49)
Let us successively examine the cases α = 3 and α > 3.


• If α = 3, we have p = 0, and (48) reduces to

  ‖x(t) − x_{ε(t)}‖² ≤ (‖x∗‖² − ‖x_{ε(t)}‖²) + 2E^p_λ(t₁)/(t²ε(t)).   (50)

  By (ii), we have lim_{t→+∞} t²ε(t) = +∞, which immediately gives (49).




• If α > 3, we have p > 0, and, since lim_{t→+∞} t²ε(t) = +∞, for t large enough we have t²ε(t) ≥ c for some fixed c > λp; hence 1 − λp/(t²ε(t)) ≥ 1 − λp/c > 0. From (48) we deduce that

  (1 − λp/c)‖x(t) − x_{ε(t)}‖² ≤ (‖x∗‖² − ‖x_{ε(t)}‖²) + (2E^p_λ(t₁)/c)(1/t^p) + (λp/c)‖x_{ε(t)} − x∗‖²,

  which immediately gives (49), since each term on the right-hand side tends to zero.


Using again (18), we finally obtain

lim_{t→∞} ‖x(t) − x∗‖ = 0,

which gives the claim in case (a).


Case (b) We assume that there exists some T ≥ t₁ such that, for every t ≥ T, ‖x∗‖² − ‖x(t)‖² > 0. Equivalently, x(t) ∈ B(0, ‖x∗‖). By assumption, ∫_{t₀}^∞ (ε(s)/s) ds < +∞, which by Proposition 2.1 implies

lim_{t→+∞} Φ(x(t)) = inf_H Φ.   (51)

Let x(t_n) ⇀ x̄ be a weakly converging sequence, with t_n → +∞. Since Φ is convex and continuous, it is lower semicontinuous for the weak topology, which gives

Φ(x̄) ≤ lim inf Φ(x(t_n)) = inf_H Φ.
Hence x̄ ∈ argmin Φ. On the other hand, from

‖x(t_n)‖ ≤ ‖x∗‖

and the lower semicontinuity of the norm for the weak topology, we deduce that

‖x̄‖ ≤ lim inf ‖x(t_n)‖ ≤ ‖x∗‖.

We have x̄ ∈ argmin Φ and ‖x̄‖ ≤ ‖x∗‖, which implies x̄ = x∗. As a consequence, the trajectory has a unique sequential weak cluster point, namely x∗, which implies that the whole trajectory converges weakly to x∗. In order to pass from weak to strong convergence, we use again the inequality

‖x(t)‖ ≤ ‖x∗‖,

which gives

lim sup_{t→+∞} ‖x(t)‖ ≤ ‖x∗‖.

Combined with the weak lower semicontinuity of the norm, this gives ‖x(t)‖ → ‖x∗‖; weak convergence together with convergence of the norms implies strong convergence in a Hilbert space. Hence,

lim_{t→∞} ‖x(t) − x∗‖ = 0,

which gives the claim in case (b).


Case (c) We suppose that for any T ≥ t₁ there exists t ≥ T such that x(t) ∈ B(0, ‖x∗‖), and there exists s ≥ T such that x(s) ∉ B(0, ‖x∗‖). Equivalently, for any T ≥ 0 the trajectory {x(t) : t ≥ T} neither remains in the ball B(0, ‖x∗‖) nor in its complement: it passes indefinitely from one domain to the other (which reflects an oscillatory behavior). Thus, there exists a sequence (t_n), with t_n → +∞, such that for all n ∈ N, ‖x(t_n)‖ = ‖x∗‖. By an argument similar to the one used in case (b), we obtain that the sequence (x(t_n)) converges strongly to x∗. Hence,

lim_{n→∞} ‖x(t_n) − x∗‖ = 0,

which gives the claim

lim inf_{t→∞} ‖x(t) − x∗‖ = 0.


Corollary 4.1. By taking ε(t) = 1/t^r, with 0 < r < 2, we have lim_{t→∞} t²ε(t) = +∞ and ∫_{t₀}^∞ (ε(t)/t) dt < +∞. Hence, all the conditions of Theorem 4.1 are satisfied. As a consequence, we obtain the conclusions of part B of Theorem 1.1 (presented in the introduction).

Corollary 4.2. Taking ε(t) = c/t², with α > 3 and c > (4/9)α(α − 3), all the conditions of Theorem 4.1 are satisfied, so its conclusions hold. This shows once again the usefulness of taking α > 3.

Remark 4.1. Regarding case (b), we can slightly weaken the condition by assuming only that the distance from x(t) to the ball B(0, ‖x∗‖) goes to zero (instead of assuming that the trajectory remains in the ball). The strong convergence to the minimum norm solution is still satisfied.

4.2. A strong ergodic convergence result


The following situation corresponds to a very slowly vanishing Tikhonov regularization coefficient ε(t). It is a limiting case of the situation examined in the previous paragraph. Note that we obtain a strong ergodic convergence result, but we do not know whether the trajectory is minimizing.

Theorem 4.2. Let Φ : H → R be a convex continuously differentiable function such that argmin Φ is nonempty. Suppose that α > 0. Let ε : [t₀, +∞[ → R⁺ be a C¹ nonincreasing function such that lim_{t→+∞} ε(t) = 0, and such that

∫_{t₀}^{+∞} (ε(t)/t) dt = +∞.   (52)

Let x(·) be a classical global solution of (3). Then the following ergodic convergence property is satisfied:

lim_{t→+∞} ( 1 / ∫_{t₀}^t (ε(τ)/τ) dτ ) ∫_{t₀}^t (ε(τ)/τ) ‖x(τ) − x∗‖² dτ = 0,   (53)

where x∗ is the element of minimum norm of argmin Φ. In particular, the following convergence property holds:

lim inf_{t→+∞} ‖x(t) − x∗‖ = 0.   (54)

Proof. Our proof is an adaptation to our situation of the argument developed by Cominetti, Peypouquet and Sorin in [31]. Let x∗ := proj_{argmin Φ} 0 be the unique element of minimum norm of the closed convex nonempty set argmin Φ, and set

h(t) = (1/2)‖x(t) − x∗‖².   (55)

By a computation similar to the one in Proposition 2.1,

ḧ(t) + (α/t)ḣ(t) = ‖ẋ(t)‖² + ⟨x(t) − x∗, ẍ(t) + (α/t)ẋ(t)⟩.   (56)
We use the function f_t previously introduced in (33),

x ↦ f_t(x) := Φ(x) + (ε(t)/2)‖x‖²,

and observe that

∇f_t(x(t)) = ∇Φ(x(t)) + ε(t)x(t) = −ẍ(t) − (α/t)ẋ(t).
By the strong convexity property of f_t and the above relation, we infer

f_t(x∗) ≥ f_t(x(t)) + ⟨∇f_t(x(t)), x∗ − x(t)⟩ + (ε(t)/2)‖x∗ − x(t)‖²
       ≥ f_t(x(t)) + ⟨−ẍ(t) − (α/t)ẋ(t), x∗ − x(t)⟩ + (ε(t)/2)‖x∗ − x(t)‖².

Equivalently,

⟨x(t) − x∗, ẍ(t) + (α/t)ẋ(t)⟩ + ε(t)h(t) ≤ f_t(x∗) − f_t(x(t)).   (57)
By definition of x_{ε(t)}, we have

f_t(x_{ε(t)}) = Φ(x_{ε(t)}) + (ε(t)/2)‖x_{ε(t)}‖²   (58)
            ≤ Φ(x(t)) + (ε(t)/2)‖x(t)‖² = f_t(x(t)).   (59)
Combining (57) and (59), we obtain

⟨x(t) − x∗, ẍ(t) + (α/t)ẋ(t)⟩ + ε(t)h(t) ≤ f_t(x∗) − f_t(x_{ε(t)}).   (60)

Since Φ(x∗) ≤ Φ(x_{ε(t)}), we have

f_t(x∗) − f_t(x_{ε(t)}) = Φ(x∗) + (ε(t)/2)‖x∗‖² − Φ(x_{ε(t)}) − (ε(t)/2)‖x_{ε(t)}‖²   (61)
                      ≤ (ε(t)/2)(‖x∗‖² − ‖x_{ε(t)}‖²).   (62)

Combining (60) and (61), we get

⟨x(t) − x∗, ẍ(t) + (α/t)ẋ(t)⟩ + ε(t)h(t) ≤ (ε(t)/2)(‖x∗‖² − ‖x_{ε(t)}‖²).   (63)

Returning to (56), we obtain

ḧ(t) + (α/t)ḣ(t) + ε(t)h(t) ≤ ‖ẋ(t)‖² + (ε(t)/2)(‖x∗‖² − ‖x_{ε(t)}‖²).   (64)
Equivalently,

ε(t)h(t) ≤ ‖ẋ(t)‖² + (ε(t)/2)(‖x∗‖² − ‖x_{ε(t)}‖²) − (1/t^α)(d/dt)(t^α ḣ(t)).   (65)

After dividing by t,

(ε(t)/t)h(t) ≤ (1/t)‖ẋ(t)‖² + (ε(t)/(2t))(‖x∗‖² − ‖x_{ε(t)}‖²) − (1/t^{α+1})(d/dt)(t^α ḣ(t)).   (66)

Set

δ(t) := (1/2)(‖x∗‖² − ‖x_{ε(t)}‖²),

which by (19) is nonnegative, and which, by the strong convergence property of the Tikhonov approximation, satisfies

lim_{t→+∞} δ(t) = 0.

Let us integrate (66) on [t₀, t]. From (12), it follows that, for some positive constant C,

∫_{t₀}^t (ε(τ)/τ)(h(τ) − δ(τ)) dτ ≤ C − ∫_{t₀}^t (1/τ^{α+1})(d/dτ)(τ^α ḣ(τ)) dτ.   (67)

Integrating this last integral by parts twice, we obtain (for some other constant C)

∫_{t₀}^t (1/τ^{α+1})(d/dτ)(τ^α ḣ(τ)) dτ = [ (1/s)ḣ(s) ]_{t₀}^{t} + (α + 1) ∫_{t₀}^t (1/s²)ḣ(s) ds
                                      = C + (1/t)ḣ(t) + ((α + 1)/t²)h(t) + 2(α + 1) ∫_{t₀}^t (1/s³)h(s) ds.

Since h is nonnegative, it follows that

∫_{t₀}^t (1/τ^{α+1})(d/dτ)(τ^α ḣ(τ)) dτ ≥ C + (1/t)ḣ(t).

Combining the above inequality with (67), we obtain the existence of some positive constant C such that, for all t ≥ t₀,

∫_{t₀}^t (ε(τ)/τ)(h(τ) − δ(τ)) dτ ≤ C + (1/t)|ḣ(t)|.   (68)

According to (11), we have sup_{s≥t₀} ‖ẋ(s)‖ < +∞. By definition of h(·) and elementary calculus, we deduce that

|ḣ(t)| ≤ ‖ẋ(t)‖ ‖x(t) − x∗‖ ≤ sup_{s≥t₀} ‖ẋ(s)‖ ( ‖x(t₀) − x∗‖ + (t − t₀) sup_{s≥t₀} ‖ẋ(s)‖ ).   (69)

From this, we obtain the existence of some (other) positive constant C such that, for all t ≥ t₀,

|ḣ(t)| ≤ C(1 + t),

which gives

sup_{t≥t₀} (1/t)|ḣ(t)| < +∞.

Returning to (68), we obtain the existence of some positive constant C such that, for all t ≥ t₀,

∫_{t₀}^t (ε(τ)/τ)(h(τ) − δ(τ)) dτ ≤ C.   (70)

Let us divide (70) by ∫_{t₀}^t (ε(τ)/τ) dτ, and let t → +∞. Since ∫_{t₀}^{+∞} (ε(t)/t) dt = +∞ and lim_{t→+∞} δ(t) = 0 (so that the weighted average of δ also tends to zero), we obtain the ergodic convergence result

lim_{t→+∞} ( 1 / ∫_{t₀}^t (ε(τ)/τ) dτ ) ∫_{t₀}^t (ε(τ)/τ) ‖x(τ) − x∗‖² dτ = 0.   (71)

In particular,

lim inf_{t→+∞} ‖x(t) − x∗‖ = 0.

Remark 4.2. The property ∫_{t₀}^{+∞} (ε(t)/t) dt = +∞ is satisfied by ε(t) = 1/(ln t)^γ, for 0 < γ ≤ 1, and therefore the conclusions of Theorem 4.2 are valid in this case.
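Indeed (assuming t₀ > 1, so that ln t₀ > 0), the substitution u = ln t, du = dt/t gives

```latex
\int_{t_0}^{+\infty} \frac{\varepsilon(t)}{t}\, dt
= \int_{t_0}^{+\infty} \frac{dt}{t\,(\ln t)^{\gamma}}
= \int_{\ln t_0}^{+\infty} \frac{du}{u^{\gamma}} = +\infty
\qquad (0 < \gamma \le 1).
```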

5. An illustrative example
Let us examine simple situations where we are able to compute explicitly the trajectories of (AVD)_{α,ε}, and hence analyze their asymptotic behavior. We use symbolic computation software to determine explicit solutions of (AVD)_{α,ε} in terms of classical functions and Bessel functions of the first and second kind. We used the WolframAlpha Computational Knowledge Engine, available at http://www.wolframalpha.com.
Let Φ : R → R be the function which is identically zero, i.e., Φ(x) = 0 for all x ∈ R. The (AVD)_α system (without Tikhonov regularization term) writes

(AVD)_α    ẍ(t) + (α/t)ẋ(t) = 0.   (72)

An elementary computation shows that, for any α > 1, each trajectory of (72) converges. Its limit is equal to x(t₀) + (t₀/(α − 1))ẋ(t₀), which depends on the Cauchy data.
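The elementary computation is the following: (72) is a first-order linear equation for ẋ, which integrates as

```latex
\dot{x}(t) = \Big(\frac{t_0}{t}\Big)^{\alpha}\dot{x}(t_0),
\qquad
x(t) = x(t_0) + \dot{x}(t_0)\, t_0^{\alpha}\int_{t_0}^{t}\frac{ds}{s^{\alpha}}
\ \longrightarrow\ x(t_0) + \frac{t_0}{\alpha-1}\,\dot{x}(t_0)
\quad \text{as } t \to +\infty,
```

the integral converging precisely because α > 1.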
Let us now examine the convergence properties of the trajectories of the corresponding (AVD)_{α,ε} system,

(AVD)_{α,ε}    ẍ(t) + (α/t)ẋ(t) + ε(t)x(t) = 0,   (73)

which includes a vanishing Tikhonov regularization term. Note that the set of minimizers of Φ is the whole real line, whose minimum norm element is precisely zero. Since the convergence of values is trivially satisfied in this case, the only relevant question is the convergence of the trajectories toward zero. In all the following examples we take as Cauchy data x(1) = 1 and ẋ(1) = 0.

We consider the case ε(t) = 1/t^r, which is analyzed in Theorem 1.1. We examine successively the "slow vanishing case" ε(t) = 1/t^r, r < 2, then the "very slow vanishing case", the "fast vanishing case" ε(t) = 1/t^r, r > 2, and finally the "critical case" ε(t) = 1/t².

5.1. Slow vanishing case: ε(t) = 1/t.

The system writes

ẍ(t) + (α/t)ẋ(t) + (1/t)x(t) = 0,   x(1) = 1, ẋ(1) = 0.   (74)

This system falls within the scope of the "slow vanishing case", for which we know that the trajectories converge to the solution with minimum norm (here the origin). The following table summarizes the results and shows the role played by the value of the coefficient α. The results are expressed in terms of J_γ and Y_γ, the Bessel functions of the first and second kind, respectively, with parameter γ. We use that |J_γ(t)| = O(t^{−1/2}) and |Y_γ(t)| = O(t^{−1/2}) (see [38, Section 5.11]).
• For α = 1 we get

  x(t) = ( J₁(2)Y₀(2√t) − Y₁(2)J₀(2√t) ) / ( J₁(2)Y₀(2) − J₀(2)Y₁(2) ),

  which gives |x(t)| = O(1/t^{1/4}).

• For α = 2 we get

  x(t) = [ (Y₀(2) − Y₁(2) − Y₂(2)) J₁(2√t) + (−J₀(2) + J₁(2) + J₂(2)) Y₁(2√t) ] / ( √t [ (J₂(2) − J₀(2))Y₁(2) + J₁(2)(Y₀(2) − Y₂(2)) ] ),

  which gives |x(t)| = O(1/t^{3/4}).

• For α = 3 we get

  x(t) = [ (Y₁(2) − 2Y₂(2) − Y₃(2)) J₂(2√t) + (−J₁(2) + 2J₂(2) + J₃(2)) Y₂(2√t) ] / ( t [ (J₃(2) − J₁(2))Y₂(2) + J₂(2)(Y₁(2) − Y₃(2)) ] ),

  which gives |x(t)| = O(1/t^{5/4}).

In accordance with the conclusions of Theorem 1.1, we observe that in each case the trajectory converges to zero, the solution with minimum norm. The following table summarizes the rate of convergence of the trajectories to zero.
α        1              2              3              4
|x(t)|   O(1/t^{1/4})   O(1/t^{3/4})   O(1/t^{5/4})   O(1/t^{7/4})

This suggests that these results obey a simple rule. Indeed, one can show that, for any α > 0,

|x(t)| = O(1/t^{(2α−1)/4}),

which is in accordance with the above table. The above formula also suggests that α = 1/2 is a critical value (for the above situation). Indeed, an approximate solution of

ẍ(t) + (1/(2t))ẋ(t) + (1/t)x(t) = 0,   x(1) = 1, ẋ(1) = 0,   (75)

is given by

x(t) = 0.909297 sin(2√t) − 0.416147 cos(2√t),

which clearly shows an oscillatory behavior, and fails to converge asymptotically.
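The closed-form rates above can also be cross-checked numerically. The following sketch (our illustration, assuming SciPy is available; the horizon and tolerances are arbitrary choices) integrates system (74) and verifies that the trajectory for α = 3 has become very small by t = 1000, in line with the O(t^{−5/4}) envelope, while the α = 1 trajectory decays much more slowly:

```python
import numpy as np
from scipy.integrate import solve_ivp

def avd(t, y, alpha):
    # y = (x, x'); system (74): x'' + (alpha/t) x' + (1/t) x = 0
    x, v = y
    return [v, -(alpha / t) * v - x / t]

def x_at(alpha, t_end=1000.0):
    # Cauchy data x(1) = 1, x'(1) = 0, as in the text
    sol = solve_ivp(avd, (1.0, t_end), [1.0, 0.0], args=(alpha,),
                    rtol=1e-9, atol=1e-12, dense_output=True)
    return sol.sol(t_end)[0]

x_alpha3 = x_at(3.0)  # envelope O(t^(-5/4)): very small at t = 1000
x_alpha1 = x_at(1.0)  # envelope O(t^(-1/4)): decays much more slowly
```

Larger values of α give a visibly faster decay of |x(t)|, consistent with the exponent (2α − 1)/4.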

5.2. Very slow vanishing case: ε(t) = 1/(1 + ln t).

The system writes

ẍ(t) + (α/t)ẋ(t) + (1/(1 + ln t))x(t) = 0,   x(1) = 1, ẋ(1) = 0.   (76)

This system falls within the scope of the "very slow vanishing case" ∫_{t₀}^{+∞} (ε(t)/t) dt = +∞. In contrast to the other cases examined in this section, we are not able to obtain an explicit form of the solution; we can only compute it by approximate numerical methods. The following table summarizes the results: it shows the convergence to the minimum norm solution, namely the zero element, and highlights the role played by the value of the coefficient α. In this situation, there is numerical evidence that taking α large improves the speed of convergence.

α          1         2         3           4
x(10)      0.319     0.038     0.04        −0.06
x(100)     −0.138    −0.008    0.001       6 × 10⁻⁴
x(1000)    0.048     0.002     6 × 10⁻⁵    −2.7 × 10⁻⁶

5.3. Fast vanishing case: ε(t) = 1/t³.

The system writes

ẍ(t) + (α/t)ẋ(t) + (1/t³)x(t) = 0,   x(1) = 1, ẋ(1) = 0.   (77)

This example falls within the scope of the "fast vanishing case" ∫_{t₀}^{+∞} tε(t) dt < +∞. For α > 3, we know from Theorem 3.1 that the solution trajectory of (77) converges to a minimizer of Φ, which here can be any real number. The following table summarizes the results in the case α = 4. It confirms that the limit exists, but is different from the minimum norm solution, i.e., there is no asymptotic effect of the Tikhonov regularizing term. The results were obtained using the following explicit form of the solution of (77):

x(t) = C t^{−3/2} [ (Y₂(2) + 3Y₃(2) − Y₄(2)) J₃(2√(1/t)) + (−J₂(2) − 3J₃(2) + J₄(2)) Y₃(2√(1/t)) ],   (78)

with C = ( (J₄(2) − J₂(2)) Y₃(2) + J₃(2)(Y₂(2) − Y₄(2)) )⁻¹.

t       10        100        1000      10000
x(t)    0.74257   0.709214   0.70602   0.705703

Note that these numerical results are in accordance with formula (78). Indeed, for x close to zero, J_α(x) ≈ (1/Γ(α+1))(x/2)^α and Y_α(x) ≈ −(Γ(α)/π)(2/x)^α, which gives

lim_{t→+∞} x(t) = −C (−J₂(2) − 3J₃(2) + J₄(2)) Γ(3)/π.
5.4. Critical case: ε(t) = 1/t².

The system writes

ẍ(t) + (α/t)ẋ(t) + (1/t²)x(t) = 0,   x(1) = 1, ẋ(1) = 0.   (79)

We are in the critical case, which is at the frontier of the two above situations. The following table summarizes the results and shows the role played by the value of the coefficient α.

α        x(t)                                                               |x(t)|
1        cos(ln t)                                                          not convergent
2        (1/√t)( cos((√3/2) ln t) + (√3/3) sin((√3/2) ln t) )               O(1/√t)
3        (ln t + 1)/t                                                       O((ln t)/t)
4        ((5 + 3√5)/10) t^{(−3+√5)/2} + ((5 − 3√5)/10) t^{(−3−√5)/2}        O(t^{−(3−√5)/2})
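These closed forms follow from the fact that, for ε(t) = 1/t², (79) is an Euler (equidimensional) equation: seeking solutions of the form x(t) = t^m yields the characteristic equation

```latex
m(m-1) + \alpha m + 1 = 0
\quad\Longleftrightarrow\quad
m^{2} + (\alpha - 1)m + 1 = 0,
\qquad
m = \frac{-(\alpha-1) \pm \sqrt{(\alpha-1)^{2} - 4}}{2}.
```

For α = 1 the roots ±i give pure oscillation (no convergence); for α = 2 the roots have real part −1/2; for α = 3 the double root m = −1 produces the factor ln t; and for α = 4 the real roots (−3 ± √5)/2 give the rates in the table.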

Comments:
1. When ε(t) = 1/t^r, the above results confirm that r = 2 is a critical exponent: convergence to the solution with minimum norm depends on whether r is less than or greater than 2. The results also show the important role played by the coefficient α. Recall that, for the (AVD)_α system, it is shown that for a strongly convex potential function Φ the rate of convergence to the unique minimizer can be made arbitrarily fast by taking α large, see [11]. In the cases above, we are close to this situation, which may explain why a similar phenomenon occurs. This is an interesting question for future research.
2. Another natural question concerns the comparison with the results of [13] on the Tikhonov regularization of the heavy ball with friction method (HBF). In this case, the damping coefficient is a fixed positive number, and the trajectories converge to the minimum norm solution under the sole assumption ∫_{t₀}^{+∞} ε(t) dt = +∞. Let us give an example where we compare the solutions of the two systems. A systematic study of this question is an interesting subject for further research.

(AVD)_{α,ε}    ẍ(t) + (3/t)ẋ(t) + (1/t)x(t) = 0,   x(1) = 1, ẋ(1) = 0,   (80)

and

(HBF)    ÿ(t) + 3ẏ(t) + (1/t)y(t) = 0,   y(1) = 1, ẏ(1) = 0.   (81)

t       10       20      50       100
x(t)    −0.098   0.018   −0.010   0.006
y(t)    0.455    0.358   0.263    0.208

In this example, we can see that (AVD)_{α,ε} outperforms the Tikhonov regularization of the heavy ball method. Indeed, a too-large damping coefficient makes the latter system similar to the steepest descent method, and hence makes it relatively slow. Specifically, the idea behind (AVD)_{α,ε} and the accelerated gradient method of Nesterov is to take a damping coefficient that is not too big, so as to enhance the inertial effect. This is a good strategy for values of t that are not too large. Combining with a restart method provides an effective numerical method, which would be interesting to study for the (AVD)_{α,ε} system.

6. Conclusion, perspective

a) Within the framework of convex optimization, we presented a second-order differential system (AVD)_{α,ε} whose asymptotic behavior combines two distinct effects:
1. The asymptotically vanishing viscosity coefficient α/t corresponds to a continuous version of the accelerated gradient method of Nesterov. It is associated with a rapid minimization property, Φ(x(t)) − min_H Φ ≤ C/t².
2. The Tikhonov regularization with asymptotically vanishing coefficient ε(t) gives an asymptotic hierarchical minimization. In our context, it tends to make the trajectories of (AVD)_{α,ε} converge strongly to the minimizer of minimum norm.

These two properties are important in optimization, which justifies our interest in combining them into a single dynamical system. However, our analysis shows that they are in some way antagonistic. We obtained the above properties by requiring the Tikhonov parametrization t ↦ ε(t) to verify the following asymptotic behavior:

• Property (1) holds true when t ↦ ε(t) satisfies ∫_{t₀}^{+∞} tε(t) dt < +∞, which reflects a "fast vanishing" property of ε(t) as t → +∞. As a model example, ε(t) = 1/t^r, with r > 2.
• Property (2) holds true when t ↦ ε(t) satisfies lim_{t→∞} t²ε(t) = +∞, which reflects a "slow vanishing" property of ε(t) as t → +∞. As a model example, ε(t) = 1/t^r, with r < 2.

It is an open question whether one can simultaneously obtain both of the above asymptotic properties in a single dynamical system: fast minimization and strong convergence toward the solution with minimum norm. In this regard, the case ε(t) = c/t² is particularly interesting, because it lies on the border between the two above situations. We know that the minimization property holds in this case (Proposition 2.1), and that for α > 3 and c > (4/9)α(α − 3), strong convergence toward the minimum norm solution is satisfied (Theorem 4.1 and Corollary 4.2). This shows once again the usefulness of taking α > 3. This case certainly requires further study.
Another puzzling question is to obtain the strong convergence of the whole trajectory, instead of the corresponding property with the lower limit. This phenomenon was analyzed in [13] in the case of the heavy ball system, which provides more damping and fewer oscillations. The situation in which we are not able to conclude corresponds to a trajectory that enters and leaves the ball B(0, ‖x∗‖) infinitely many times, with a highly oscillating behavior. It is likely that this behavior is very rare.
b) One may think of using another type of damped inertial dynamical system, for example involving a geometric damping as in [5], [37]. The numerical experiments suggest that, in accordance with [11], taking α large improves the rate of convergence to the solution with minimum norm. Restarting may also be an efficient strategy, see [44], [51].
c) Passing from a smooth potential to a nonsmooth potential Φ offers interesting perspectives (unilateral mechanics, PDEs, control, optimization). From the numerical optimization point of view, the discretization of the second-order differential inclusion

ẍ(t) + (α/t)ẋ(t) + ∇Φ(x(t)) + ∂Ψ(x(t)) + ε(t)x(t) ∋ 0,

where Φ (resp. Ψ) is a smooth (resp. nonsmooth) potential, naturally leads to a new class of forward-backward algorithms. By a device similar to that in [11], one obtains the inertial forward-backward algorithm with Tikhonov regularization

y_k = x_k + ((k − 1)/(k + α − 1))(x_k − x_{k−1});
x_{k+1} = prox_{sΨ}( y_k − s(∇Φ(y_k) + ε_k y_k) ).   (82)

Since the Lyapunov techniques developed in this paper can be naturally extended to the case of a nonsmooth potential, it is likely that a parallel analysis can be developed for this algorithm. Thus, the study of the above algorithmic version of (AVD)_{α,ε} is an interesting subject for further research.
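As an illustration of (82), here is a minimal sketch on a toy problem (this instance is our choice, not taken from the paper): Ψ ≡ 0, so prox_{sΨ} is the identity, Φ(x) = ½(x₁ + x₂ − 2)² on R², whose solution set is the line x₁ + x₂ = 2 with minimum norm element (1, 1), step size s = 0.1, and the slowly vanishing schedule ε_k = 1/k:

```python
import numpy as np

def grad_phi(x):
    # Phi(x) = 0.5*(x[0] + x[1] - 2)^2; argmin Phi is the line x1 + x2 = 2,
    # whose minimum norm element is (1, 1).
    r = x[0] + x[1] - 2.0
    return np.array([r, r])

def inertial_fb_tikhonov(x0, alpha=3.0, s=0.1, n_iter=50000):
    # Algorithm (82) with Psi = 0 (prox_{s Psi} = identity) and eps_k = 1/k,
    # a discrete analogue of the "slow vanishing" regime.
    x_prev = x0.copy()
    x = x0.copy()
    for k in range(1, n_iter + 1):
        eps_k = 1.0 / k
        y = x + (k - 1) / (k + alpha - 1) * (x - x_prev)
        x_prev, x = x, y - s * (grad_phi(y) + eps_k * y)
    return x

x_final = inertial_fb_tikhonov(np.array([3.0, 0.0]))
```

In this run the iterates approach (1, 1), the minimum norm minimizer, rather than some other point of the solution line; with a fast vanishing schedule such as ε_k = 1/k³, one would instead expect the limit to depend on the initial data, in the spirit of the continuous analysis.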
d) With a view to unifying the continuous (ODE) and discrete (algorithmic) aspects, one might consider a parametrization t ↦ ε(t) with less regularity than in this paper. The case of absolutely continuous or BV parametrizations would be interesting to study.

7. Appendix: Some auxiliary results

In this section, we present some auxiliary lemmas that are used in the paper. These results can be found in [11]; we reproduce them here for the convenience of the reader.
To establish the weak convergence of the solutions of (2), we use Opial's lemma [45], which we recall in its continuous form. This argument was first used in [25] to establish the convergence of nonlinear contraction semigroups.

Lemma 7.1. Let S be a nonempty subset of H, and let x : [0, +∞[ → H. Assume that
(i) for every z ∈ S, lim_{t→∞} ‖x(t) − z‖ exists;
(ii) every sequential weak cluster point of x(t), as t → ∞, belongs to S.
Then x(t) converges weakly as t → ∞ to a point in S.

The following allows us to establish the existence of a limit for a real-valued function, as t → +∞:
Lemma 7.2. Let δ > 0, and let w : [δ, +∞[ → R be a continuously differentiable function which is bounded from below. Assume that

tẅ(t) + αẇ(t) ≤ g(t),   (83)

for some α > 1, for almost every t > δ, and for some nonnegative function g ∈ L¹(δ, +∞). Then the positive part [ẇ]₊ of ẇ belongs to L¹(δ, +∞), and lim_{t→+∞} w(t) exists.

Proof. Multiply (83) by t^{α−1} to obtain

(d/dt)( t^α ẇ(t) ) ≤ t^{α−1} g(t).

By integration, we obtain

ẇ(t) ≤ δ^α |ẇ(δ)| / t^α + (1/t^α) ∫_δ^t s^{α−1} g(s) ds.

Hence,

[ẇ]₊(t) ≤ δ^α |ẇ(δ)| / t^α + (1/t^α) ∫_δ^t s^{α−1} g(s) ds,

and so,

∫_δ^∞ [ẇ]₊(t) dt ≤ δ^α |ẇ(δ)| / ((α − 1)δ^{α−1}) + ∫_δ^∞ (1/t^α) ( ∫_δ^t s^{α−1} g(s) ds ) dt.

Applying Fubini's theorem, we deduce that

∫_δ^∞ (1/t^α) ( ∫_δ^t s^{α−1} g(s) ds ) dt = ∫_δ^∞ ( ∫_s^∞ (1/t^α) dt ) s^{α−1} g(s) ds = (1/(α − 1)) ∫_δ^∞ g(s) ds.

As a consequence,

∫_δ^∞ [ẇ]₊(t) dt ≤ δ^α |ẇ(δ)| / ((α − 1)δ^{α−1}) + (1/(α − 1)) ∫_δ^∞ g(s) ds < +∞.

Finally, the function θ : [δ, +∞) → R, defined by

θ(t) = w(t) − ∫_δ^t [ẇ]₊(τ) dτ,

is nonincreasing and bounded from below. It follows that

lim_{t→+∞} w(t) = lim_{t→+∞} θ(t) + ∫_δ^{+∞} [ẇ]₊(τ) dτ

exists. □

The following is a continuous version of Kronecker's theorem for series.

Lemma 7.3. Take δ > 0, and let f ∈ L¹(δ, +∞) be nonnegative and continuous. Consider a nondecreasing function ψ : [δ, +∞[ → ]0, +∞[ such that lim_{t→+∞} ψ(t) = +∞. Then,

lim_{t→+∞} (1/ψ(t)) ∫_δ^t ψ(s) f(s) ds = 0.

Proof. Given ε > 0, fix t_ε sufficiently large so that

∫_{t_ε}^∞ f(s) ds ≤ ε.

Then, for t ≥ t_ε, split the integral ∫_δ^t ψ(s)f(s) ds into two parts to obtain

(1/ψ(t)) ∫_δ^t ψ(s)f(s) ds = (1/ψ(t)) ∫_δ^{t_ε} ψ(s)f(s) ds + (1/ψ(t)) ∫_{t_ε}^t ψ(s)f(s) ds ≤ (1/ψ(t)) ∫_δ^{t_ε} ψ(s)f(s) ds + ∫_{t_ε}^t f(s) ds.

Now let t → +∞ to deduce that

0 ≤ lim sup_{t→+∞} (1/ψ(t)) ∫_δ^t ψ(s)f(s) ds ≤ ε.

Since this is true for any ε > 0, the result follows. □
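A quick numerical sanity check of Lemma 7.3 (the instance f(s) = 1/s², ψ(s) = s is our choice; for it, (1/ψ(t))∫_δ^t ψ(s)f(s) ds = ln(t/δ)/t, which indeed tends to 0):

```python
import numpy as np

delta = 1.0
f = lambda s: 1.0 / s**2   # nonnegative, continuous, integrable on [delta, +inf)
psi = lambda s: s          # nondecreasing, psi(t) -> +infinity

def averaged(t, n=200_000):
    # (1/psi(t)) * integral_delta^t psi(s) f(s) ds, via the trapezoidal rule;
    # here the exact value is ln(t/delta)/t.
    s = np.linspace(delta, t, n)
    y = psi(s) * f(s)
    integral = 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(s))
    return integral / psi(t)

vals = [averaged(t) for t in (1e2, 1e4, 1e6)]  # should decrease toward 0
```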


Acknowledgments. With the support of ECOS grant C13E03. Effort sponsored by the Air Force Office of Scientific Research, Air Force Material Command, USAF, under grant number FA9550-14-1-0056. The second and third authors were supported by the National Center for Scientific and Technical Research (CNRST, Morocco) under grant URAC01.

References

[1] S. Adly, H. Attouch, A. Cabot, Finite time stabilization of nonlinear oscillators subject to dry friction, Nonsmooth Mechanics and
Analysis (edited by P. Alart, O. Maisonneuve and R.T. Rockafellar), Adv. in Math. and Mech., Kluwer (2006), pp. 289–304.
[2] F. Alvarez, On the minimizing property of a second-order dissipative system in Hilbert spaces, SIAM J. Control Optim. 38 (4) (2000)
1102–1119.
[3] F. Alvarez, H. Attouch, An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator
with damping, Set-Valued Analysis, 9 (1-2) (2001) 3–11.
[4] F. Alvarez, H. Attouch, Convergence and asymptotic stabilization for some damped hyperbolic equations with non-isolated
equilibria, ESAIM Control Optim. Calc. Var. 6 (2001) 539–552.
[5] F. Alvarez, H. Attouch, J. Bolte, P. Redont, A second-order gradient-like dissipative dynamical system with Hessian-driven
damping. Application to optimization and mechanics, J. Math. Pures Appl. 81 (8) (2002) 747–779.
[6] F. Alvarez, A. Cabot, Asymptotic selection of viscosity equilibria of semilinear evolution equations by the introduction of a slowly
vanishing term, Discrete Contin. Dyn. Syst. 15 (2006) 921–938.
[7] H. Attouch, Viscosity solutions of minimization problems, SIAM J. Optim. 6 (3) (1996) 769–806.
[8] H. Attouch, L.M. Briceño-Arias, P.L. Combettes, A parallel splitting method for coupled monotone inclusions, SIAM J. Control
Optim. 48 (5) (2010) 3246–3270.
[9] H. Attouch, L.M. Briceño-Arias, P.L. Combettes, A strongly convergent primal-dual method for nonoverlapping domain
decomposition, Numer. Math. 133 (3) (2016) 443–470, ISSN: 0029-599X (Print) 0945-3245 (Online).
[10] H. Attouch, A. Cabot, P. Redont, The dynamics of elastic shocks via epigraphical regularization of a differential inclusion, Adv.
Math. Sci. Appl. 12 (1) (2002) 273–306.
[11] H. Attouch, Z. Chbani, J. Peypouquet, P. Redont, Fast convergence of inertial dynamics and algorithms with asymptotic vanishing
damping, Math. Program. Ser B, published online 24 March 2016, arXiv:1507.01367v1 [math.OC], arXiv:1507.04782 [math.OC].
[12] H. Attouch, R. Cominetti, A dynamical approach to convex minimization coupling approximation with the steepest descent method,
J. Differential Equations, 128 (2) (1996) 519–540.
[13] H. Attouch, M.-O. Czarnecki, Asymptotic control and stabilization of nonlinear oscillators with non-isolated equilibria, J. Differ-
ential Equations 179 (2002) 278–310.
[14] H. Attouch, M.-O. Czarnecki, Asymptotic behavior of coupled dynamical systems with multiscale aspects, J. Differential Equations
248 (2010) 1315–1344.
[15] H. Attouch, M.-O. Czarnecki, Asymptotic behavior of gradient-like dynamical systems involving inertia and multiscale aspects, J.
Differential Equations, in press, http://dx.doi.org/10.1016/j.jde.2016.11.009, arXiv:1602.00232v1 [math.OC] 31 Jan 2016.
[16] H. Attouch, M.-O. Czarnecki, J. Peypouquet, Prox-penalization and splitting methods for constrained variational problems,
SIAM J. Optim. 21 (2011) 149–173.
[17] H. Attouch, M.-O. Czarnecki, J. Peypouquet, Coupling forward-backward with penalty schemes and parallel splitting for
constrained variational inequalities, SIAM J. Optim. 21 (2011) 1251–1274.
[18] H. Attouch, X. Goudou, P. Redont, The heavy ball with friction method. The continuous dynamical system, global exploration of
the local minima of a real-valued function by asymptotical analysis of a dissipative dynamical system, Commun. Contemp. Math. 2
(1) (2000) 1–34.
[19] H. Attouch, J. Peypouquet, The rate of convergence of Nesterov's accelerated forward-backward method is actually faster than 1/k², to appear in SIOPT, arXiv:1510.08740v2 [math.OC] 1 Nov 2015.
[20] J.-B. Baillon, R. Cominetti, A convergence result for non-autonomous subgradient evolution equations and its application to the
steepest descent exponential penalty trajectory in linear programming, J. Funct. Anal. 187 (2001) 263-273.
[21] H. Bauschke, P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert spaces , CMS Books in Mathematics,
Springer, (2011).
[22] A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci. 2 (1)
2009 183–202.
[23] R. I. Bot, E. R. Csetnek, Forward-Backward and Tseng’s type penalty schemes for monotone inclusion problems, Set-Valued Var.
Anal. 22 (2014) 313–331.
[24] F. E. Browder, Existence and approximation of solutions of nonlinear variational inequalities, Proc. Nat. Acad. Sci. U.S.A. 56 (1966)
1080–1086.
[25] R.E. Bruck, Asymptotic convergence of nonlinear contraction semigroups in Hilbert spaces, J. Funct. Anal. 18 (1975) 15–26.

[26] A. Cabot, Inertial gradient-like dynamical system controlled by a stabilizing term, J. Optim. Theory Appl. 120 (2004) 275–303.
[27] A. Cabot, Proximal point algorithm controlled by a slowly vanishing term: Applications to hierarchical minimization, SIAM J.
Optim. 15 (2) (2005) 555–572.
[28] A. Cabot, H. Engler, S. Gadat, On the long time behavior of second order differential equations with asymptotically small
dissipation, Trans. Amer. Math. Soc. 361 (2009) 5983–6017.
[29] A. Chambolle, Ch. Dossal, On the convergence of the iterates of Fista, HAL Id: hal-01060130 https://hal.inria.fr/hal-01060130v3
Submitted on 20 Oct 2014.
[30] R. Cominetti, Coupling the proximal point algorithm with approximation methods, J. Optim. Theory Appl. 95 (3) (1997) 581–600.
[31] R. Cominetti, J. Peypouquet, S. Sorin, Strong asymptotic convergence of evolution equations governed by maximal monotone
operators with Tikhonov regularization, J. Differential Equations, 245 (2008) 3753–3763.
[32] A. Fiacco, G. McCormick, Nonlinear programming: Sequential Unconstrained Minimization Techniques, John Wiley and Sons, New
York, (1968).
[33] A. Haraux, M.A. Jendoubi, The convergence problem for dissipative autonomous systems, Springer Briefs in Mathematics, (2015).
[34] A. Haraux, M.A. Jendoubi, A Liapunov function approach to the stabilization of second-order coupled systems, (2016) arXiv preprint
arXiv:1604.06547.
[35] S.A. Hirstoaga, Approximation et résolution de problèmes d’équilibre, de point fixe et d’inclusion monotone. PhD thesis, Université
Pierre et Marie Curie - Paris VI, 2006. HAL Id: tel-00137228 https://tel.archives-ouvertes.fr/tel-00137228.
[36] M.A. Jendoubi, R. May, On an asymptotically autonomous system with Tikhonov type regularizing term, Archiv der Mathematik
95 (4) (2010) 389–399.
[37] M.A. Jendoubi, R. May, Asymptotics for a second-order differential equation with nonautonomous damping and an integrable source
term, Appl. Anal. 94 (2) (2015) 435–443.
[38] N. N. Lebedev, Special functions and their applications. Revised edition, translated from the Russian and edited by Richard A.
Silverman. Unabridged and corrected republication. Dover Publications, Inc., New York, 1972. xii+308 pp.
[39] R. May, Asymptotic for a second order evolution equation with convex potential and vanishing damping term, arXiv:1509.05598.
[40] Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k2), Soviet Math. Dokl. 27 (1983)
372–376.
[41] Y. Nesterov, Introductory lectures on convex optimization: A basic course, volume 87 of Applied Optimization. Kluwer Academic
Publishers, Boston, MA, 2004.
[42] Y. Nesterov, Smooth minimization of non-smooth functions, Math. Program. 103 (1) (2005) 127–152.
[43] Y. Nesterov, Gradient methods for minimizing composite functions, Math. Program. 140 (1) (2013) 125–161.
[44] B. O’Donoghue, E Candès, Adaptive restart for accelerated gradient schemes, Found. Comput. Math. 15 (3) (2015) 715–732.
[45] Z. Opial, Weak convergence of the sequence of successive approximations for nonexpansive mappings, Bull. Amer. Math. Soc. 73
(1967) 591–597.
[46] N. Parikh, S. Boyd, Proximal algorithms, Found. Trends Optim. 1 (2013) 123–231.
[47] J. Peypouquet, Convex Optimization in Normed spaces, Springer Briefs in Optimization, 2015.
[48] J. Peypouquet, S. Sorin, Evolution equations for maximal monotone operators: asymptotic analysis in continuous and discrete time,
J. Convex Anal. 17 (3-4) (2010) 1113–1163.
[49] B. Polyak, Introduction to Optimization, New York, NY: Optimization Software - Inc, Publications Division, 1987.
[50] S. Reich, Strong convergence theorems for resolvents of accretive operators in Banach spaces, J. Math. Anal. Appl. 15 (1980) 287–292.
[51] W. Su, S. Boyd, E. J. Candès, A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method: Theory and Insights.
NIPS, December 2014.
[52] A. N. Tikhonov, Doklady Akademii Nauk SSSR 151 (1963) 501–504, (Translated in ”Solution of incorrectly formulated problems
and the regularization method”. Soviet Mathematics 4 (1963) 1035–1038).
[53] A. N. Tikhonov, V. Y. Arsenin, Solutions of Ill-Posed Problems, Winston, New York, 1977.
[54] D. Torralba, Convergence epigraphique et changements d’échelles en analyse variationnelle et optimisation, Phd thesis, Université
Montpellier 2, 1996.

