
Thesis for the Degree of Doctor of Philosophy

Some computational aspects of Markov processes

Alexey Lindo

Division of Mathematical Statistics


Department of Mathematical Sciences
Chalmers University of Technology
and University of Gothenburg
Göteborg, Sweden 2016

Some computational aspects of Markov processes

Alexey Lindo
ISBN 978-91-7597-326-5
© Alexey Lindo, 2016.
Doktorsavhandlingar vid Chalmers tekniska högskola
Ny serie nr 4007
ISSN 0346-718X
Department of Mathematical Sciences
Chalmers University of Technology
and University of Gothenburg
412 96 Göteborg
Sweden
Phone: +46 (0)31-772 10 00

Printed in Göteborg, Sweden, 2016

Some computational aspects of Markov processes


Alexey Lindo

Abstract
Markov processes are among the most widely used stochastic models both in theory
and applications. Therefore, we are especially interested in obtaining explicit results,
either analytically or numerically.
In the first paper of the thesis we consider a class of Markov processes called Galton-Watson branching processes. Linear-fractional Galton-Watson processes are among the
few cases where many characteristics of the process can be computed explicitly. We
extend the two-parameter linear-fractional family to a much richer four-parameter family of reproduction laws. The corresponding process also allows explicit computations,
with the possibility of an infinite mean, or even an infinite number of offspring.
The linear-fractional family can be generalized in another direction by introducing types for the individuals. Recently, Sagitov studied such processes with countably many types.
In our second paper, we study the linear-fractional processes with a general type space.
For this family of processes, we obtained transparent limit theorems for the subcritical,
critical and supercritical cases. Furthermore, for general linear-fractional processes we
proved an abstract Perron-Frobenius theorem, without any conditions on the σ-algebra of
the type space and without employing the Nummelin-Athreya-Ney regeneration technique.
In the third paper, we develop new non-parametric estimation methods for the
compound Poisson distribution. This estimation problem arises in the inference of Lévy processes recorded at equidistant time intervals. The key estimator is based on the series decomposition of functionals of a measure and relies on the steepest descent technique recently developed in the variational analysis of measures. We provide a computer implementation of the developed method together with a broad range
of simulation results.
The fourth paper of the thesis concerns limit theorems involving two functionals of a random matrix with independent and uniformly distributed elements drawn from a finite cyclic group of integers. Applying the Chen-Stein method, we derive bounds on the total variation distance between a Poisson distribution and the distributions of the functionals.

Keywords: Galton-Watson process, embeddability, Gumbel distribution, Markov chain


with a general state space, probability generating functional, infinitely divisible distribution, Lévy process, gradient method, Chen-Stein method, random matrix.

Preface
This thesis consists of the following papers.
(1) Sagitov, S., Lindo, A. (2016)
A special family of Galton-Watson processes with explosions.
In Branching Processes and Their Applications. Lect. Notes Stat.
Proc. (I.M. del Puerto et al., eds.) Springer (to appear).
(2) Lindo, A., Sagitov, S. (2015)
General linear-fractional branching process with discrete time.
Preprint.
(3) Lindo, A., Zuyev, S., Sagitov, S. (2015)
Nonparametric estimation of infinitely divisible distributions based
on variational analysis on measures.
Preprint.
(4) Lindo, A., Sagitov, S. (2015)
Asymptotic results for the number of Wagner's solutions to a generalised birthday problem.
Statistics and Probability Letters 107, 356–361.


Acknowledgements

First of all, I would like to thank my advisers Serik Sagitov and Sergey
Zuyev for guiding and supporting me over the years. I was always amazed and
inspired by the ability of Serik to discard all irrelevant information and get
straight to the point. I would like to thank Serik for the long discussions we
had that helped me to grasp all the details of my work. Serik's criticism was
always constructive and helped me to obtain a higher standard in my research.
I hope that one day I will become as good an advisor as Serik was to me.
I am grateful to Sergey for always being enthusiastic about our research
and for providing thought-provoking ideas.
I am indebted to Aila Särkkä for her ability to listen and help in the most
difficult situations.
I would like to offer my special thanks to Gerold Alsmeyer for inviting me
as a guest researcher to the University of Münster.
I am grateful to our faculty members and especially to Peter Jagers for
his insightful comments on Paper I. I would like to thank Patrik Albin, Peter
Hegarty, Olle Häggström, Torgny Lindvall, Holger Rootzén, Mats Rudemo and
Jeff Steif for the wonderful graduate courses. I also express my thanks to Petter Mostad for giving me an opportunity to lecture and directing me through
important elements of teaching.
I would like to thank the guests of our department Kostya Borovkov, Peter Guttorp, Ildar Ibragimov and Günter Last for their time and valuable research
suggestions.
I wish to thank the administrative staff, and especially Lotta Fernström,
who knows the answer to every question and is always there to help.
I would like to thank my wonderful friends in the department. You have all
assisted me in my studies and finishing this work would have been impossible
without you.
I would like to thank my coaches Anders Holme from Göteborg Sim and
Evgeny Zubkov from CSKA Moscow for keeping my mind in a healthy body.
My heartfelt gratitude goes to my parents Garislav and Irina. You knew that we would not see each other for a long time, but you encouraged and supported me along the way.
Anastasia, no ocean can separate me from my love for you.
Alexey Lindo
Göteborg, 2016

Contents

Abstract

Preface  iii

Acknowledgements  v

Part I. INTRODUCTION  1
1. Random walks, Poisson processes and Markov jumps  3
2. Markov branching processes  4
3. Linear birth and death process  5
4. Theta-branching process  6
5. Multi-type linear-fractional branching process  9
6. Estimation of the jump size distribution  10
7. Generalized birthday problem  12
8. Contributions of Alexey Lindo to the joint papers  13
Bibliography  15

Part II. PAPERS  19

Paper I. A special family of Galton-Watson processes with explosions  23
1. Introduction  23
2. Probability generating functions for theta-branching processes  25
3. Monotonicity properties  26
4. Basic properties of f ∈ G  28
5. Extinction and explosion times  31
6. The Q-process  33
7. Embedding into continuous time branching processes  35
Bibliography  38

Paper II. General linear-fractional branching processes with discrete time  41
1. Introduction  41
2. General linear-fractional distributions  43
3. Perron-Frobenius theorem  45
4. Embedded Crump-Mode-Jagers process  47
5. Positive recurrence over the type space  49
6. Basic limit theorems for the LF-processes  51
Bibliography  53

Paper III. Nonparametric estimation of infinitely divisible distributions based on variational analysis on measures  57
1. Introduction  57
2. Optimisation in the space of measures  62
3. Gradients of ChF loss function  66
4. Description of the CoF method  68
5. Algorithmic implementation of the steepest descent method  71
6. Simulation results for a discrete Lévy measure  74
7. Discussion  77
Bibliography  79

Paper IV. Asymptotic results for the number of Wagner's solutions to a generalised birthday problem  83
1. Introduction  83
2. Main results  85
3. Key recursion  87
4. Proof of Theorem 2.1  89
5. Proof of Theorem 2.2  90
Bibliography  91

Part I

INTRODUCTION

The general scheme of chain-dependent random variables appeared for the first time in the famous 1906 paper of A.A. Markov [63]. In his research, Markov was motivated by a purely theoretical question. His original intention was to show that independence is not a necessary condition for the weak law of large numbers. Roughly speaking, a Markov process is a process with the property that, given its exact current state, the future behaviour of the process is not affected by the past. The first application of chain dependence is also due to Markov. In [64], he statistically analyzed the text of a famous Russian novel in verse, Eugene Onegin, written by A.S. Pushkin. After studying a sequence of 20,000 letters, Markov estimated the probability p1 = 0.128 of a vowel following a vowel and p2 = 0.663 of a vowel following a consonant. We note that the calculations of these probabilities are quite tedious, and they were performed by hand. Markov's scientific work is characterized by great attention to concrete examples and their careful analysis. For a historical review of the life and mathematical studies of Markov we refer to [4], [87].
The application of Markov chains to the statistical analysis of texts led
Claude Shannon to the discovery of information entropy [89]. In his work
Shannon used the Markov model to describe the statistical regularities of a text.
Nowadays, entropy is the cornerstone of information theory and for a textbook
treatment of the subject we refer to [13]. Markov's ideas were used in Google's PageRank algorithm [75] and in effective data compression, e.g. the LZMA algorithm [92, Section 3.14]. The Markov chain approach was applied to computer performance evaluation [90] and gave rise to the concept of the hidden Markov model [95], [6]. Markov processes are widely used for modelling telecommunication and computer communication networks [35]. Applications of Markov chains and general Markov processes are not limited to computer science; they are used in physics [78], chemistry [97] and biology [27], [31], [42].
The rest of the introduction is structured as follows. In Section 1, we
present the key ideas of Markov processes. We describe Markov branching processes in continuous and discrete time settings in Section 2, and the linear birth
and death process in Section 3. In Section 4, we present the results from our
paper [82]. In Section 5, we review facts about multi-type branching processes
related to our studies in [59]. We describe the Lévy process and present some results from our paper [58] in Section 6. In Section 7 we give an overview of
various generalizations of the classical birthday problem and then present the
key points of our probabilistic analysis [57] of Wagners algorithm for solving
the generalized birthday problem.

1. Random walks, Poisson processes and Markov jumps


Random walks and Poisson processes constitute two basic building blocks of the theory of Markov processes. The term random walk was first used in 1905 by a pioneer of statistics, K. Pearson, in his letter to Nature [76]. The original construction was formulated in a two-dimensional state space, but it can be easily generalized to lower and higher dimensions. Let {X_i}_{i≥1} be a sequence of independent random variables that take only two values, −1 and 1. A simple random walk on the integers is the random sequence S_k = X_1 + ... + X_k. The value S_k depends on the past only through S_{k−1}, and therefore the random walk is a Markov chain. A more general discrete-time random walk is obtained by letting the independent increments X_i have an arbitrary probability distribution [93].
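None of the code below appears in the thesis; it is a minimal sketch of the construction just described, with an arbitrary seed and step count:

```python
import random

def simple_random_walk(n_steps, seed=0):
    """Path S_0, S_1, ..., S_n of a simple random walk: S_k = X_1 + ... + X_k
    with independent steps X_i uniform on {-1, +1}."""
    rng = random.Random(seed)
    s, path = 0, [0]
    for _ in range(n_steps):
        s += rng.choice((-1, 1))   # the next value depends on the past only via S_{k-1}
        path.append(s)
    return path

path = simple_random_walk(1000)
```

Replacing `rng.choice((-1, 1))` by draws from any other distribution gives the general discrete-time random walk with an arbitrary increment law.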
The mathematical model of the Poisson process was invented in 1903 by a
Swedish mathematician F. Lundberg in his PhD thesis [60], where he derived
a forward equation for the Poisson process. Around 1910, the Poisson process
was used in the works of Erlang [20], Campbell [9] and Bateman [5], in models of random telephone traffic, shot noise in vacuum tubes, and radioactive
decay, respectively. The name Poisson process appeared later. It was in common use at Stockholm University in the late 1930s [26] and first appeared in print in the article of W. Feller [21] and in the PhD dissertation of O. Lundberg [61] in 1940.
In 1929, de Finetti introduced the notions of independence and stationarity of increments for continuous time processes [15], [16], [17]. We say that {X(t)}_{t≥0} is a process with independent increments if the random variables

X(t_2) − X(t_1), X(t_3) − X(t_2), . . . , X(t_n) − X(t_{n−1})

are independent for any t_1 < t_2 < ... < t_n. The process {X(t)}_{t≥0} has stationary increments if the distribution of X(t + s) − X(t) depends only on the length s of the time interval [38, Section 1.3]. A Poisson process {N(t)}_{t≥0} with intensity λ is a continuous time process with independent and stationary increments. The size of an increment in any time interval of length t follows the Poisson distribution with mean λt.
The process {N (t)}t0 can be viewed as a counting process that represents
the number of events occurring up to time t. From the assumptions on the increments, it follows that the waiting time between two consecutive events has an exponential distribution. Moreover, the whole sequence of waiting times between arrivals is formed by independent and identically distributed exponential random variables.
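The construction by independent exponential waiting times translates directly into code; this sketch is ours, not the thesis', and the intensity 2.0 and horizon 10.0 are arbitrary choices:

```python
import random

def poisson_counts(rate, t, seed=0):
    """N(t) for a Poisson process of intensity `rate`, built by summing
    independent Exp(rate) waiting times until they exceed t."""
    rng = random.Random(seed)
    elapsed, n = 0.0, 0
    while True:
        elapsed += rng.expovariate(rate)   # memoryless inter-arrival time
        if elapsed > t:
            return n
        n += 1

# The increment over [0, t] should be Poisson with mean rate * t (here 20).
mean_count = sum(poisson_counts(2.0, 10.0, seed=s) for s in range(500)) / 500
```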
Combining the two models discussed, the general random walk and the Poisson process, we arrive at the compound Poisson process. The mathematical definition was first given by F. Lundberg [60] in the same PhD thesis in which the Poisson process was defined. Let {X_i}_{i≥1} be a sequence of independent and identically distributed random variables. Independently of this sequence, take a Poisson process {N(t)}_{t≥0} with intensity λ > 0. Then the corresponding compound Poisson process is X(t) := X_1 + ... + X_{N(t)}. Because the number of jumps performed by the random walk up to time t is governed by a Poisson process, the interarrival times are again exponential. Therefore, the compound Poisson process inherits the memoryless property of the Poisson process.
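The two building blocks combine directly in simulation. A hypothetical sketch (the jump law, rate and horizon below are arbitrary choices, not from the thesis):

```python
import random

def compound_poisson(rate, jump, t, seed=0):
    """X(t) = X_1 + ... + X_{N(t)}: run the exponential clock of a Poisson
    process with intensity `rate` and add an i.i.d. jump at every arrival."""
    rng = random.Random(seed)
    elapsed, total = 0.0, 0.0
    while True:
        elapsed += rng.expovariate(rate)   # exponential inter-arrival time
        if elapsed > t:
            return total
        total += jump(rng)                 # i.i.d. jump size X_i

# With standard normal jumps, E X(t) = 0 and Var X(t) = rate * t.
x = compound_poisson(1.5, lambda r: r.gauss(0.0, 1.0), 10.0)
```

With unit jumps (`lambda r: 1.0`) the construction reduces to the plain Poisson process, so E X(t) = rate * t.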
2. Markov branching processes
A more general class of Markov processes can be obtained from the compound Poisson process by keeping the interarrival times independent and exponentially distributed, but allowing the parameter of this distribution to depend on the current state of the process. By doing this, we lose the independence and stationarity of increments, but we preserve the time-homogeneity. The imposed extra dependence does not destroy the Markov property, as the future conditioned on the current value is independent of the past.
An important subclass of processes with state-dependent intensities is formed by the Markov branching processes [1, Chapter III]. A mutual connection between these processes and compound Poisson processes is stated in [1, Section III.11]. The Markov branching process {Z(t)}_{t≥0} is defined on the non-negative integers. Given that the process is in a state i ≥ 1, the next jump occurs after an exponential time with parameter λi for some λ > 0; thus the intensities of jumps depend linearly on the state of the process. The jump is of size k ≥ −1 with probability p_{k+1}, where {p_k}_{k≥0} is a probability distribution with generating function h(s) = Σ_{i≥0} p_i s^i. From the construction of {Z(t)}_{t≥0} it follows that the Markov branching process is completely determined by the parameter λ and the generating function h.


The transition probabilities P_{ij}(t) = P(Z(s + t) = j | Z(s) = i) of moving from state i ≥ 0 to state j ≥ 0 are characterized by the branching property, which in terms of generating functions takes the form

Σ_{j≥0} P_{ij}(t) s^j = ( Σ_{j≥0} P_{1j}(t) s^j )^i.

The term branching process was introduced by Kolmogorov and Dmitriev in [46], where branching processes were treated as Markov processes with a countable number of states.
The branching processes are of special importance in biology, where they
are widely used to model the population size fluctuations [27], [31], [42]. They
also appear naturally in epidemic models, see e.g. a recent paper [52]. A
Markov branching process describes the size Z(t) at time t of a population of
independently evolving particles. Each particle lives for an exponential time
with parameter λ and at the moment of death is replaced by a random number
of offspring according to a probability distribution with generating function h.
The exponential life length distribution means that particles are ageless.
For some populations of organisms this assumption is rather realistic, for instance if organisms are affected by predators or a harsh environment, so that they do not live long enough to experience any age effects [27, Chapter 3].
In practice, a continuous time Markov branching process is observed only at discrete times. For δ > 0, consider the δ-skeleton process {Z(kδ)}_{k≥0}. The discrete time Markov chain {Z(kδ)}_{k≥0} is a branching process, called an embedded Galton-Watson process. Let X_{i,n}, i ≥ 1, n ≥ 0, be a doubly infinite sequence of independent and identically distributed random variables taking values in the non-negative integers. The ordinary Galton-Watson process is Z_{n+1} = X_{1,n} + ... + X_{Z_n,n}, n ≥ 0, with Z_0 = 1. The Galton-Watson process has a more complicated structure than the continuous time Markov branching process, because not every Galton-Watson process can be obtained as a δ-skeleton process [1, III.12]. Important insights into the embeddability problem for Galton-Watson processes, together with an easily verifiable test of embeddability, are given in [36], [37]. A criterion of embeddability is proposed in [25].

3. Linear birth and death process


An important special case of the Markov branching processes is the linear birth and death process [38, Example 1, Section 4.4]. Similarly to a simple random walk, here the jump sizes are either 1 (birth) or −1 (death). Transitions are characterized by birth and death intensities b, d > 0. The transition i → i + 1 takes place at rate bi and i → i − 1 at rate di. Because the linear birth and death process is a Markov branching process [1, Section III.5], we use the same notation {Z(t)}_{t≥0}. The same interpretation of branching processes in terms of population sizes applies; in this case the particle life length is exponential with parameter λ = b + d, and the infinitesimal offspring generating function is h(s) = (d + b s²)/(b + d). Therefore, we have the following probabilistic description of {Z(t)}_{t≥0}. Let Z(0) = i be the initial state of the process; then, after waiting for an exponential time with parameter (b + d)i, a coin is tossed with probability b/(b + d) of heads and d/(b + d) of tails. If the coin lands heads, the state of the process is incremented by one unit, and it is decremented by one unit otherwise. Now the process is either in state i + 1 or i − 1, and we repeat the procedure.
For the linear birth and death process, its discrete skeleton Z_n := Z(nδ), δ > 0, is of special interest. The process {Z_n}_{n≥0} is called the linear-fractional process [1, Section III.5]. The name is suggested by the linear-fractional form of the offspring generating function

1/(1 − f(s)) = a/(1 − s) + c,

where a := e^{−δ(b−d)} and c := b(1 − e^{−δ(b−d)})/(b − d). For the linear-fractional process the n-th iteration f_n of the offspring generating function can be calculated explicitly:

1/(1 − f_n(s)) = a^n/(1 − s) + c(1 + a + ... + a^{n−1}),

and in this case the n-step transition probability can be obtained as a closed-form expression in terms of the parameters a and c. Knowing the iterations explicitly, we can find most of the limit distributions exactly. For example, we can find Yaglom limits and transition functions of the Q-process [82], describe the Martin boundary [74] and obtain results for the coalescent process [51]. Starting from the seminal paper of Kozlov [48], linear-fractional generating functions play a very special role in the theory of branching processes in a random environment. Knowing the closed-form expression for the iterations, we can often simply calculate the answer to a technical problem. For example, it was recently shown that under the quenched approach the critical branching process in a random environment dies out slowly [98], while under the annealed approach a sudden extinction of the population may occur [99].
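The closed-form iteration can be sanity-checked numerically: for any map f satisfying 1/(1 − f(s)) = a/(1 − s) + c, the n-fold composition must satisfy 1/(1 − f_n(s)) = a^n/(1 − s) + c(1 + a + ... + a^{n−1}). The concrete values of a and c below are illustrative, not a parametrisation from the thesis:

```python
import math

def lf_step(x, a, c):
    """One application of the linear-fractional map: 1/(1 - f(x)) = a/(1 - x) + c."""
    return 1.0 - 1.0 / (a / (1.0 - x) + c)

def lf_closed_form(s, a, c, n):
    """Closed-form n-th iterate: 1/(1 - f_n(s)) = a^n/(1 - s) + c*(1 + a + ... + a^(n-1))."""
    return 1.0 - 1.0 / (a**n / (1.0 - s) + c * sum(a**k for k in range(n)))

a, c, s, n = math.exp(-0.1), 0.19, 0.3, 7
x = s
for _ in range(n):            # brute-force composition f_n = f o ... o f
    x = lf_step(x, a, c)
gap = abs(x - lf_closed_form(s, a, c, n))
```

The identity is exact for any a and c, so the gap is pure floating-point noise.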
4. Theta-branching process
Iterations of linear-fractional generating functions provide important insights into the limiting behaviour of branching processes. Therefore, it is important to understand the mechanism behind the iterations of generating functions from this family. In the spirit of [86], we consider a functional equation approach to explicit iterations of probability generating functions. Let

V(s) := s/(1 − s),

where clearly V is a generating function and is easily invertible. The class of linear-fractional generating functions can be viewed as the collection of functions f satisfying the functional equation

(4.1)    V(f(s)) = a V(s) + V(p_0),

where a = 1/f′(1) and p_0 = f(0). In this case, the solutions form a two-parameter class of probability generating functions uniquely determined by the pair a ∈ (0, ∞) and p_0 ∈ [0, 1). It can be easily checked that

V(p_0) = (1 − a) V(q),
where q is the smallest non-negative root of the equation f(x) = x. Equation (4.1) is of a form known as the Abel equation [49].
The key idea is to find a generating function

V(s) = Σ_{k≥1} v_k s^k,   v_k ≥ 0,   V(s) < ∞,   s ∈ [0, 1),

such that, for a set of pairs (a, p_0), relation (4.1) defines a probability generating function f whose iterations f_n have the form

V(f_n(s)) = a^n V(s) + V(f_n(0)),   V(f_n(0)) = V(p_0)(a^n − 1)/(a − 1).

Given that the inverse V^{−1} has a closed-form expression, the iterations can be computed explicitly as

f_n(s) = V^{−1}( a^n V(s) + V(p_0)(1 − a^n)/(1 − a) ).
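For the linear-fractional case V(s) = s/(1 − s), where the inverse is V^{−1}(y) = y/(1 + y), this scheme is easy to verify numerically. A small sketch of ours, with arbitrary values of a and p_0:

```python
def V(x):
    """V(s) = s / (1 - s)."""
    return x / (1.0 - x)

def V_inv(y):
    """Closed-form inverse: V^{-1}(y) = y / (1 + y)."""
    return y / (1.0 + y)

def f(x, a, p0):
    """The map defined by the Abel-type equation V(f(s)) = a V(s) + V(p0)."""
    return V_inv(a * V(x) + V(p0))

a, p0, s, n = 0.5, 0.2, 0.4, 6
x = s
for _ in range(n):                 # n-fold composition of f
    x = f(x, a, p0)
closed = V_inv(a**n * V(s) + V(p0) * (1.0 - a**n) / (1.0 - a))
gap = abs(x - closed)
```

Since V(f(s)) = a V(s) + V(p_0) holds by construction, induction gives the geometric sum (1 − a^n)/(1 − a) in the closed form, and the gap is pure rounding error.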
In [82] we produced a new family of probability generating functions with explicit iterations. The corresponding functions V are presented in the table below.

V(s)                         Parameters
(A^θ − (A − s)^θ)/|θ|        a ∈ (0, 1), q ∈ [0, 1], θ ∈ (0, 1], A ≥ 1
((A − s)^θ − A^θ)/|θ|        a ∈ (0, 1), q ∈ [0, 1], θ ∈ [−1, 0), A ≥ 1
log A − log(A − s)           a ∈ (0, 1), q ∈ [0, 1], θ = 0, A ≥ 1

The presented family covers almost all known probability generating functions with explicit iterations, and we call it the theta-family. We use the name theta-process for the Galton-Watson process with offspring generating function belonging to the theta-family.
To our knowledge, the only other example is due to Harris [28, Section I.7.2], and it falls into the framework of the functional equations presented above. The Harris family of generating functions with explicit iterations

f_n(s) = (a^n s^{−k} − a^n + 1)^{−1/k},   a > 1,

corresponds to the function

V(s) = s^k/(1 − s^k).

The theta-family contains non-regular generating functions, f(1) < 1, related to improper probability distributions. We can interpret the quantity 1 − f(1) as the probability of producing a mutant individual. The theta-branching process has two absorbing states, 0 and ∞, with the absorption times

T_0 := inf{n : Z_n = 0},   T_∞ := inf{n : Z_n = ∞},
where T_0 is called the extinction time and T_∞ the mutation time.


Using explicit iterations of generating functions we can, for example, study the limit behaviour of T_∞. In particular, we obtained the following results in [82].
Theorem 1. Consider a theta-branching process with θ ∈ (−1, 0] and A ≥ 1. Let θ → 0 and A → 1 in such a way that

|θ| log(1/(A − 1)) → r,   r ∈ [0, ∞].

Then for any fixed a ∈ (0, 1), q ∈ [0, 1), and y ∈ (−∞, ∞),

lim P(T_∞ − log_{1/a}(1/β) ≤ y | T_∞ < ∞) = e^{−w a^y},

where

β = |θ| for r ∈ (0, ∞],   β = (log(1/(A − 1)))^{−1} for r = 0,

and

w = 1 for r ∈ {0} ∪ {∞},   w = 1 − e^{−r} for r ∈ (0, ∞).

The limit is a Gumbel distribution with mean (γ + log w)/log(1/a), where γ is the Euler-Mascheroni constant.

In relation to [83], we can say that we are performing an asymptotic analysis of T_∞ by letting the mutation probability tend to zero.
All of the generating functions f in the theta-family are embeddable [82, Section 7] into a continuous semigroup of probability generating functions

F_{t+u}(s) = F_t(F_u(s)),   t, u ∈ [0, ∞),

so that f_n(s) = F_n(s), n = 1, 2, . . .. In terms of branching processes, we can say that the discrete time theta-branching processes are embeddable into continuous time Markov branching processes [1, Section III.6]. The non-regularity of some of the generating functions in the theta-family relates to the explosiveness of the corresponding continuous time process [1, Section III.2].
Further research topics. The class of theta-branching processes is rather rich: it contains processes with infinite variance, infinite mean and even non-regular processes. However, it would be interesting to find a generating function with explicit iterations for which the corresponding branching process possesses some non-standard properties [70]. We believe that the presented functional equation approach may help to produce new results in this direction. We are also working with G. Alsmeyer on some financial applications of the theta-branching processes.

5. Multi-type linear-fractional branching process


In applications of branching processes, it is natural to attach a specific attribute or attributes to each particle in the population. This attribute is called the type. A type can characterize, for example, energy in physical applications [88, Chapter XI] or genotype in biological applications [27], [28, Section II.11].
The first mathematical results concerning multi-type branching processes were obtained by A.N. Kolmogorov and his PhD students in [46], mostly for continuous time processes, and in [47], mostly for discrete time processes. As in the works on single-type Galton-Watson processes, one of the main analytic tools is the apparatus of generating functions. For an extensive treatment of branching processes with a finite number of types we refer to [1, Chapter V], [28, Chapter II], [88, Chapters IV-VI].
In the multi-type process particles reproduce independently, but their offspring production may depend on the type. We focus our attention on the linear-fractional multi-type process. These processes, as their name suggests, have a linear-fractional reproduction law. The linear-fractional processes are the only known examples of multi-type processes with explicit iterations of the offspring generating functions. In the single-type setting, the linear-fractional distribution is determined by two parameters. The first parameter gives the survival probability. The second determines the expected number of offspring, whose distribution, conditioned on non-extinction, is geometric. Harris in [28, Section II.12.3] considered a two-type linear-fractional process with explicit iterations. His key idea was that the mother's type determines only the type of the first offspring particle, while the types of the other particles are determined independently of the mother's type. Interestingly, not all two-type linear-fractional processes are embeddable into continuous time Markov branching processes [34]. The example of Harris was generalized in [34], [77] to processes with a finite number of types. Further, in [81] the linear-fractional process with a countable number of types was introduced and studied. See [42, Chapter 7] for biological applications of branching processes with a countable number of types.
The multi-type processes mentioned so far in this section are particular examples of Markov chains with a countable number of states. If the type belongs to a general measurable space (E, 𝓔), then we obtain a general branching process. We refer to the monographs [28, Chapter III], [56], [71], [88, Chapter XII] and the articles [32], [33]. Consequently, general branching processes are particular examples of Markov chains on a general state space; comprehensive information can be found in [8], [65], [73], [79].

The theory of general branching processes is closely connected to the theory of point processes [40], [56] and to the theory of random walks. Identifying the type with a position in the space E, we can see the general branching process as a branching random walk. Consequently, the basic questions of the recurrence-transience dichotomy appear [10]. To obtain the answer, we can use the theory of Markov chains in general spaces. The fundamental renewal techniques were developed in [2], [72], but their theory involves some technical conditions. First, we need the σ-algebra 𝓔 to be countably generated; it was pointed out in [91] that without this assumption the construction of a general theory is almost impossible. Second, the classical condition of irreducibility must be properly adapted. We note that there exists an alternative approach to the proofs [69]. For the linear-fractional process, the recurrence and transience conditions are very explicit.
The other characteristic of the multi-type process is its asymptotic mean behaviour. In the case of a general process it is determined by an abstract version of the classical Perron-Frobenius theorem. For the linear-fractional process the asymptotic mean distribution over the type space can be calculated without employing facts from the general theory. A nice probabilistic interpretation of these results follows from the existence of an embedded CMJ (Crump-Mode-Jagers) process [31].
Further research topic. Many of the results for the general linear-fractional branching process can be obtained by rather simple calculations. Thus, it seems possible to describe explicitly the quasi-stationary distributions and the Martin boundary of the process. It is also interesting to investigate the relations between the linear-fractional process and the models studied in [50].
6. Estimation of the jump size distribution
The problem of constructing a continuous time analogue of the discrete time random walk appeared in the 1920s and was successfully solved in the works of de Finetti, Khintchine, Kolmogorov, and Lévy.
In [15], [16], and [17], de Finetti studied continuous time processes that inherit two main properties of the discrete time random walk, namely the independence and stationarity of increments. The discovered class of processes is now called Lévy processes. For a definition and a detailed exposition of the theory of Lévy processes we refer to [84]. Loosely speaking, a Lévy process {W_t}_{t≥0} is a continuous time stochastic process with independent and stationary increments that starts at 0. From the definition of the Lévy process, for every n > 0 we have

W_t = W_{t/n} + (W_{2t/n} − W_{t/n}) + ... + (W_t − W_{(n−1)t/n}),

where the summands are independent and identically distributed. This observation led de Finetti to another central notion of the theory, the notion of
10

infinite divisibility. The random variable W is infinitely divisible [94] if for


every n > 0 it can be represented as the sum
(6.1)

W := Wn,1 + Wn,2 + + Wn,n

of n independent and identically distributed random variables. Infinite divisibility is actually a distributional property. For example, Normal, Poisson,
negative binomial, Gamma, Gumbel and compound Poisson distributions are
all infinitely divisible. De Finettis wonderful insights were that increments of
the Lvy processes are necessarily infinitely divisible and characteristic functions can be used to prove it. Kolmogorov confirmed de Finettis conjecture for
processes with finite second moments in [44], [45]. In full generality, the result
was obtained in the works of Lvy [54], [55] and Khintchine [43]. In particular,
they showed that the characteristic function of an infinitely divisible random
variable is of the from Ee iW = e() with
Z
22
+ (e ix 1 ix 1I{|x|<} )(dx),
() = ia
2
where the real number a is a drift term, 2 0 is the diffusion coefficient and
measure is called the Lvy measure. The above representation is called the
canonical Lévy-Khintchine representation. Moreover, Lévy fully described the structure of continuous time processes with independent and stationary increments. He showed not only that the increments of a Lévy process are infinitely divisible, but that the converse is also true: for a given infinitely divisible distribution we can construct a Lévy process {W_t}_{t≥0} with W_1 having that distribution [84, Theorem 7.10]. This result can be viewed as an embeddability criterion for a random walk, namely the random walk can be embedded into a Lévy process if and only if its increments are infinitely divisible. For a brief historical exposition of the subject we refer to [62] and the Notes to Chapter 2 in [84].
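The defining property of infinite divisibility is easy to probe numerically. The following sketch (our own illustration, with arbitrarily chosen parameter values) checks it for the Poisson law via characteristic functions:

```python
import cmath

def poisson_cf(lam, t):
    # Characteristic function of a Poisson(lam) random variable.
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))

# Infinite divisibility in the sense of (6.1): for every n, a
# Poisson(lam) variable is the sum of n i.i.d. Poisson(lam/n)
# variables, hence phi_lam(t) = (phi_{lam/n}(t))**n.
lam, n, t = 2.0, 7, 0.6
lhs = poisson_cf(lam, t)
rhs = poisson_cf(lam / n, t) ** n
assert abs(lhs - rhs) < 1e-12
```

The same factorization of the characteristic function holds for every infinitely divisible law, which is exactly what the Lévy-Khintchine formula encodes.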
In practice, we cannot observe the trajectory of the Lévy process {W(t)}_{t≥0} continuously. In the inference of Lévy processes, there exist two main approaches to data collection, called low frequency and high frequency sampling. Imagine that we observe the embedded random walk {W(kΔ)}_{k=0}^∞ up to time T at sampling rate Δ > 0. In the low frequency approach, the data is collected by observing the process at the constant frequency Δ. In other words, observations are equidistant. The asymptotic results are obtained by letting the size of the observation window T go to infinity. Taking the high frequency approach, we keep the size of the observation window T fixed, but increase the rate at which measurements are collected. The importance of choosing the proper sampling rate is described in [19].
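As a toy illustration of low frequency sampling (our own sketch, not the data model of [19]; all parameter values are arbitrary), one can simulate equidistant increments of a compound Poisson process:

```python
import random

random.seed(7)

def poisson_count(mean):
    # Number of rate-1 arrivals before time `mean`.
    k, s = 0, random.expovariate(1.0)
    while s < mean:
        k += 1
        s += random.expovariate(1.0)
    return k

# One low-frequency observation: the increment of a compound
# Poisson process with jump rate lam and N(0,1) jump sizes over a
# window of length delta. Increments over disjoint windows are
# i.i.d., so equidistant sampling yields an i.i.d. sample from an
# infinitely divisible law.
def cp_increment(lam, delta):
    return sum(random.gauss(0.0, 1.0)
               for _ in range(poisson_count(lam * delta)))

sample = [cp_increment(2.0, 0.5) for _ in range(5000)]
# E[W_delta] = 0 and Var[W_delta] = lam * delta * E[xi^2] = 1.
mean = sum(sample) / len(sample)
var = sum(x * x for x in sample) / len(sample)
assert abs(mean) < 0.1 and abs(var - 1.0) < 0.15
```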
The embedded random walk {W(kΔ)}_{k=0}^∞ necessarily has infinitely divisible increments, and consequently the statistical inference problem reduces to the recovery of the Lévy triple (a, σ², ν). The first two elements of the triple, (a, σ²), describe the diffusive part of the Lévy process {W(t)}_{t≥0}, and the positive measure ν governs the pure-jump part of the process. The complexity of the problem comes from its non-parametric nature and the intrinsic properties of the space of measures. Note that the set of finite positive measures on the real line is closed in the space of signed measures only under conic combinations; moreover, it contains no inner points. Recently, a gradient descent method of constrained optimization in the space of measures was presented in [68].
In [58], we used the low-frequency approach, assuming that the Lévy process {W_t}_{t≥0} is observed at equidistant time points, and consequently we have a sample from an infinitely divisible distribution. We proposed two methods, which we call ChF (Characteristic Function Fitting) and CoV (Convolution Method).
The ChF method targets the general problem of Lévy triple estimation. It is based on fitting the infinitely divisible characteristic function to the empirical characteristic function [23] constructed from the observations.
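The empirical characteristic function at the heart of the ChF method is straightforward to compute. The sketch below (illustrative only, not the fitting procedure of [58]; a Poisson sample stands in for the observations) compares it with the true characteristic function:

```python
import cmath
import random

# Empirical characteristic function of a sample x_1, ..., x_n:
# phi_hat(t) = (1/n) * sum_k exp(i t x_k).
def empirical_cf(sample, t):
    return sum(cmath.exp(1j * t * x) for x in sample) / len(sample)

random.seed(1)

def poisson_rv(lam):
    # Count rate-1 exponential arrivals before time lam.
    k, s = 0, random.expovariate(1.0)
    while s < lam:
        k += 1
        s += random.expovariate(1.0)
    return k

sample = [poisson_rv(3.0) for _ in range(20000)]
t = 0.4
target = cmath.exp(3.0 * (cmath.exp(1j * t) - 1))  # true Poisson(3) CF
assert abs(empirical_cf(sample, t) - target) < 0.05
```

The ChF method then chooses the Lévy triple whose characteristic function is closest, in a suitable norm, to `phi_hat` over a grid of points t.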
The second method is intended specifically for compound Poisson processes. It is based on the connection between compound Poisson processes and associated Poisson point processes on the plane [39, Section 16.9], and on Taylor-like expansions developed for expectation functionals of Poisson point processes [53], [67]. We note that every purely discontinuous Lévy process can be approximated by a compound Poisson process to any degree of accuracy [11].
It is still an open question whether space and time are discrete or continuous [30]. Nevertheless, it is common to assume that the physical process makes discrete changes on a microscopic scale and to use diffusion approximations for the macroscopic behaviour, see for example [96]. In finance, we are certain that characteristics like prices are discontinuous and therefore should be modeled by jump processes [11]. Therefore, pure-jump processes form a very important subclass of Lévy processes.
Further research topics. The low frequency sampling approach also plays an important role in the statistical inference of continuous time Markov branching processes [41]. We believe that, by using the obtained results and a connection between branching processes and compound Poisson processes, new non-parametric statistical methods can be obtained. We are planning to investigate this possibility in the future. It is also interesting to investigate the statistical properties of the ChF and CoV estimators.
7. Generalized birthday problem
Richard von Mises introduced the birthday problem [66] in 1939. He asked how many people should gather before the probability of having two people born on the same day of the year exceeds 50 percent. The surprising answer is only 23. The total number of pairs of people that share the same birthday can be approximated by the Poisson distribution, see for example [3, Section 5.3]. A related cryptographic attack, called the birthday attack [85], is justified by this probabilistic observation. Mathematically, a pair x_1, x_2 is called a collision for a given function f if f(x_1) = f(x_2). The birthday attack uses some random process to find a collision in a cryptographic hash function.
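The classical numbers are easy to reproduce (a minimal illustration of the computation, not taken from [66] or [3]):

```python
import math

# Probability that among m people (365 equally likely birthdays)
# at least two share a birthday: exactly, and via the Poisson
# approximation for the number of matching pairs, whose mean is
# m*(m-1)/730.
def p_exact(m):
    p_no_match = 1.0
    for i in range(m):
        p_no_match *= (365 - i) / 365
    return 1 - p_no_match

def p_poisson(m):
    return 1 - math.exp(-m * (m - 1) / 730)

assert p_exact(22) < 0.5 < p_exact(23)      # 23 people suffice
assert abs(p_exact(23) - p_poisson(23)) < 0.01
```

The closeness of the two answers for m = 23 illustrates the quality of the Poisson approximation for the number of matching pairs.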
There are various generalizations of the birthday problem, with an arbitrary number of days in a year, with multiple matches, with multiple types of individuals, see e.g. [24] and references therein. One of the generalizations, particularly important for cryptography, was proposed by Wagner in [100]. In the same article, Wagner presented an effective method for finding some of the solutions of this generalized birthday problem. An important quantity of interest for any algorithm is its average time complexity [12]. In [57], we reformulated Wagner's generalization of the birthday problem in terms of random matrices over a finite cyclic group of integers. The number of solutions to the problem is then the value of a certain functional of the random matrix. It turns out that the Chen-Stein method can be used to bound the total variation distance between the distribution of the considered functionals and a Poisson distribution. From this and other results presented in our paper [57], the answer to the question of average time complexity can be derived. We want to emphasize that the effectiveness of the Chen-Stein method for this problem follows from the fact that Wagner's algorithm is built upon the full binary tree. It also connects our work to the research on other tree-based algorithms like binary search [18], [29] and quicksort [80].
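A toy version of the birthday attack (our own sketch; the 20-bit truncation of SHA-256 is a hypothetical stand-in for a real hash function) can be written as follows:

```python
import hashlib

# Birthday attack: hash distinct inputs until two of them give the
# same value. With a 20-bit output space, the birthday effect says
# roughly 2**10 trials suffice on average, rather than 2**20.
def h(x: int) -> int:
    digest = hashlib.sha256(str(x).encode()).digest()
    return int.from_bytes(digest[:3], "big") % (1 << 20)

seen = {}
x = 0
while True:
    v = h(x)
    if v in seen:
        collision = (seen[v], x)
        break
    seen[v] = x
    x += 1
assert h(collision[0]) == h(collision[1]) and collision[0] != collision[1]
```

Wagner's algorithm refines this brute-force search by combining partial collisions along a full binary tree of lists.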
Further research topic. Wagner's algorithm can be seen as a coalescent process on a binary tree. It is interesting to investigate the connection of our results presented in [57] with the recent works [7], [14].

8. Contributions of Alexey Lindo to the joint papers


The problem statement of Paper I and Paper II comes from Professor S. Sagitov. The current author together with Professor S. Sagitov participated in the
theoretical development of the ideas presented in the papers. All the necessary
calculations were performed by the current author. The current author participated in the writing of Paper I and Paper II under the close supervision of
Professor S. Sagitov.
The problem statement of Paper III comes from Professor S. Zuyev. The current author, under the supervision of Professor S. Zuyev, participated in the theoretical development and substantially contributed to the writing under the supervision of Professor S. Sagitov. The current author played a major role in obtaining the numerical and simulation results.
The initial formulation of the problem from Paper IV belongs to Professor V.V. Vatutin and Professor A.M. Zubkov. In particular, the current author obtained the total variation bound and is grateful to Professor S. Sagitov for streamlining the proofs. The current author together with Professor S. Sagitov participated in the writing. The current author also performed the numerical simulations that are partially presented in Paper IV.


Bibliography
[1] Athreya, K.B., Ney, P.E. (1972) Branching processes. Springer.
[2] Athreya, K.B., Ney, P.E. (1978) A new approach to the limit theory of recurrent Markov
chains. Trans. Amer. Math. Soc. 245, 493501.
[3] Barbour, A.D., Holst, L., Janson, S. (1992) Poisson approximation. Oxford Studies in Probability. Clarendon Press.
[4] Basharin, G.P., Langville, A.N., Naumov, V.A. (2004) The life and work of A.A. Markov.
Linear Algebra Appl. 386, 326.
[5] Bateman, H. (1910) Note on the probability distribution of α-particles. Phil. Mag. 20(6), 698-704.
[6] Baum, L.E., Petrie, T. (1966) Statistical inference for probabilistic functions of finite state
Markov chains. Ann. Math. Statist. 37(6), 15541563.
[7] Benjamini, I., Lima, Y. (2014) Annihilation and coalescence on binary trees. Stoch. Dyn. 14(3).
[8] Borovkov, A.A. (1998) Ergodicity and stability of stochastic processes. Wiley.
[9] Campbell, N.R. (1909) The study of discontinuous phenomena. Proc. Cam. Phil. Soc. 15, 117
136.
[10] Chung, K.L. (1960) Markov chains with stationary transition probabilities. Springer-Verlag.
[11] Cont, R., Tankov, P. (2003) Financial modeling with jump processes. Chapman and Hall/CRC.
[12] Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C. (2009) Introduction to Algorithms . 3rd
ed., MIT Press and McGraw-Hill.
[13] Cover, T.M., Thomas, J.A. (2006) Elements of information theory. 2nd ed., Wiley-Interscience.
[14] Debs, P., Haberkorn, T. (2015) Diseases transmission in a z-ary tree. arXiv:1507.06483.
[15] de Finetti, B. (1929) Sulle funzioni a incremento aleatorio. Rendiconti della Reale Accademia Nazionale dei Lincei. (Ser. VI) 10, 163-168.
[16] de Finetti, B. (1929) Sulla possibilità di valori eccezionali per una legge di incrementi aleatori. Rendiconti della Reale Accademia Nazionale dei Lincei. (Ser. VI) 10, 325-329.
[17] de Finetti, B. (1929) Integrazione delle funzioni a incremento aleatorio. Rendiconti della Reale Accademia Nazionale dei Lincei. (Ser. VI) 10, 548-553.
[18] Devroye, L. (2005) Applications of Stein's method in the analysis of random binary search trees. In Stein's Method and Applications. Institute for Mathematical Sciences Lecture Notes Series. World Scientific Press. 247-297.
[19] Duval, C. (2014) When is it no longer possible to estimate a compound Poisson process?
Electron. J. Statist. 8(1), 274301.
[20] Erlang, A.K. (1909) The theory of probabilities and telephone conversations. Nyt. Tidsskr.
Mat. 20, 3341.
[21] Feller, W. (1940) On the integro-differential equations of purely discontinuous Markoff processes. Trans. Amer. Math. Soc. 48, 488-515.
[22] Feller, W. (1968) An introduction to probability theory and its applications. vol. 1, John Wiley
& Sons.
[23] Feuerverger, A., Mureika, R.A. (1977) The empirical characteristic function and its applications. Ann. Statist. 5, 8897.
[24] Gorroochurn, P. (2012) Classic problems of probability. Wiley-Blackwell.
[25] Goryainov, V.V. (1993) Fractional iteration of probability generating functions and imbedding discrete branching processes in continuous processes. Russian Acad. Sci. Sb. Math. 79(1),
4761.
[26] Guttorp, P., Thorarinsdottir, T.L. (2012) What happened to discrete chaos, the Quenouille process, and the sharp Markov property? Some history of stochastic point processes. International Statistical Review 80(2), 253-268.


[27] Haccou, P., Jagers, P., Vatutin, V.V. (2005) Branching processes: variation, growth and extinction of populations. Cambridge University Press.
[28] Harris, T.E. (1963) The theory of branching processes. Springer, Berlin.
[29] Holmgren, C., Janson, S. (2015) Limit laws for functions of fringe trees for binary search
trees and random recursive trees. EJP 20, 151.
[30] Hossenfelder, S. (2013) Minimal length scale scenarios for quantum gravity. Living Rev.
Relativity 16.
[31] Jagers, P. (1975) Branching processes with biological applications. John Wiley and Sons.
[32] Jagers, P. (1989) General branching processes as Markov fields. Stoch. Proc. Appl. 32(2), 183
212.
[33] Jagers, P., Nerman, O. (1996) The asymptotic composition of supercritical, multi-type branching populations. Séminaire de probabilités de Strasbourg. 30, 40-54.
[34] Joffe, A., Letac, G. (2006) Multitype linear fractional branching process. J. Appl. Probab.
43(4), 10911106.
[35] Kaj, I. (2002) Stochastic modelling in broadband communication systems. Society for Industrial
and Applied Mathematics.
[36] Karlin, S., McGregor, J. (1968a) Embeddability of discrete time simple branching processes
into continuous time branching processes. TAMS 132, 115136.
[37] Karlin, S., McGregor, J. (1968b) Embedding iterates of analytic functions with two fixed
points into continuous groups. TAMS 132, 137145.
[38] Karlin, S., Taylor, H.M. (1975) A first course in stochastic processes. Academic Press.
[39] Karlin, S., Taylor, H.M. (1981) A second course in stochastic processes. Academic Press.
[40] Kerstan, J., Matthes, K., Mecke, J. (1978) Infinitely divisible point processes. Wiley.
[41] Keiding, N. (1975) Maximum likelihood estimation in the birth-and-death process. Ann. Statist. 3(2), 363-372.
[42] Kimmel, M., Axelrod, D.E. (2015) Branching processes in biology. 2ed., Springer.
[43] Khintchine, A.Ya. (1937) A new derivation of a formula by P. Lévy. Bulletin of the Moscow State University. 1, 1-5 (in Russian).
[44] Kolmogorov, A.N. (1932) Sulla forma generale di un processo stocastico omogeneo (Un problema di Bruno de Finetti). Rendiconti della Reale Accademia Nazionale dei Lincei. (Ser. VI) 15, 805-808.
[45] Kolmogorov, A.N. (1932) Ancora sulla forma generale di un processo stocastico omogeneo. Rendiconti della Reale Accademia Nazionale dei Lincei. (Ser. VI) 15, 866-869.
[46] Kolmogorov, A.N., Dmitriev, N.A. (1947) Branching random processes. Dokl. Akad. Nauk.
SSSR 56, 710.
[47] Kolmogorov, A.N., Sevastianov, B.A. (1947) Calculation of final probabilities of branching
random processes. Dokl. Akad. Nauk. SSSR 56, 783786.
[48] Kozlov, V.M. (1977) On the asymptotic behavior of the probability of non-extinction for
critical branching processes in a random environment. Theory Probab. Appl. 21(4), 791804.
[49] Kuczma, M., Choczewski, B., Ger, R. (1990) Iterative functional equations. Encyclopedia of
mathematics and its applications. Cambridge University Press.
[50] Lambert, A. (2010) The contour of splitting trees is a Lévy process. Ann. Probab. 38(1), 348-395.
[51] Lambert, A., Popovic, L. (2013) The coalescent point process of branching trees. Ann. Appl.
Probab. 23(1), 99144.
[52] Lambert, A., Trapman, P. (2013) Splitting trees stopped when the first clock rings and Vervaat's transformation. J. Appl. Probab. 50(1), 208-227.
[53] Last, G. (2014) Perturbation analysis of Poisson processes. Bernoulli 20(2), 486513.


[54] Lévy, P. (1934) Sur les intégrales dont les éléments sont des variables aléatoires indépendantes. Annali della Regia Scuola Normale di Pisa. (Ser. II) 3, 337-366.
[55] Lévy, P. (1935) Observation sur un précédent mémoire de l'auteur. Annali della Regia Scuola Normale di Pisa. (Ser. II) 4, 217-218.
[56] Liemant, A., Matthes, K., Wakolbinger, A. (1988) Equilibrium distributions of branching processes. Springer.
[57] Lindo, A., Sagitov, S. (2015) Asymptotic results for the number of Wagner's solutions to a generalised birthday problem. Statistics and Probability Letters 107, 356-361.
[58] Lindo, A., Zuyev, S., Sagitov, S. Nonparametric estimation of infinitely divisible distributions based on variation analysis on measures. arXiv:1510.04968.
[59] Lindo, A., Sagitov, S. General linear-fractional branching processes with discrete time.
arXiv:1510.06859.
[60] Lundberg, F. (1903) I. Approximerad framställning af sannolikhetsfunktionen. II. Återförsäkring af kollektivrisken (PhD Dissertation, Uppsala University, Uppsala, Sweden). Almqvist & Wiksell.
[61] Lundberg, O. (1940) On random processes and their application to sickness and accident statistics (PhD Dissertation, Stockholm University, Stockholm, Sweden). Almqvist & Wiksell.
[62] Mainardi, F., Rogosin, S. (2006) The origin of infinitely-divisible distributions: from de Finetti's problem to Lévy-Khintchine formula. Math. Methods Econ. Finance 1, 37-55.
[63] Markov, A.A. (1906) Extension of the law of large numbers to dependent quantities (in Russian). Izv. Fiz.-Matem. Obsch. Kazan. Univ. (2nd Ser.) 15, 135156.
[64] Markov, A.A. (1907) An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains (in Russian). Izv. Akad. Nauk (VI Ser.) 7, 153163.
[65] Meyn, S., Tweedie, R.L. (2009) Markov chains and stochastic stability. 2ed., Cambridge University Press.
[66] Mises, R. von. (1939) Über Aufteilungs- und Besetzungs-Wahrscheinlichkeiten. Revue de la Faculté des Sciences de l'Université d'Istanbul: Sciences naturelles 4, 145-163.
[67] Molchanov, I., Zuyev, S. (2000) Variational analysis of functionals of Poisson processes. Mathematics of Operations Research 25(3), 485-508.
[68] Molchanov, I., Zuyev, S. (2004) Optimisation in space of measures and optimal design.
ESAIM: Probability and Statistics 8, 1224.
[69] Nagaev, S.V. (2015) The spectral method and the central limit theorem for general Markov
chains. Doklady Mathematics. 91(1), 5659.
[70] Nagaev, S.V., Wachtel, V. (2007) The critical Galton-Watson process without further power
moments. J. Appl. Probab. 44(3), 753769.
[71] Nerman, O. (1984) The growth and composition of supercritical branching populations on general
type spaces. Dep Mathematics, Chalmers U. Tech. and Gothenburg U. 1984-4.
[72] Nummelin, E. (1978) A splitting technique for Harris recurrent Markov chains. Z. Wahrsch.
Verw. Gebiete 43(4), 309318.
[73] Nummelin, E. (1984) General irreducible Markov chains and non-negative operators. Cambridge
University Press.
[74] Overbeck, L. (1994) Martin boundaries of some branching processes. Ann. Inst. Henri Poincaré Probab. Stat. 30(2), 181-195.
[75] Page, L., Brin, S., Motwani, R., Winograd, T. (1998) The PageRank citation ranking: bringing
order to the web. Technical report. Stanford InfoLab.
[76] Pearson, K. (1905) The problem of the random walk. Nature 72, 294.
[77] Pollak, E. (1974) Survival and extinction times for some multitype branching processes.
Adv. Appl. Prob. 6, 446462.
[78] Reichl, L.E. (2009) A modern course in statistical physics. 3rd ed., Wiley-VCH.


[79] Revuz, D. (1984) Markov chains. 2ed., North-Holland.


[80] Rösler, U., Rüschendorf, L. (2001) The contraction method for recursive algorithms. Algorithmica 29(1), 3-33.
[81] Sagitov, S. (2013) Linear-fractional branching processes with countably many types. Stoch.
Proc. Appl. 123, 29402956.
[82] Sagitov, S., Lindo, A. (2016) A special family of Galton-Watson processes with explosions.
In Branching Processes and Their Applications. Lect. Notes Stat. Proc. (I.M. del Puerto et al
eds.) Springer (to appear).
[83] Sagitov, S., Serra, M.C. (2009) Multitype Bienaymé-Galton-Watson processes escaping extinction. Adv. in Appl. Probab. 41, 225-246.
[84] Sato, K. (2013) Lévy processes and infinitely divisible distributions. 2nd ed., Cambridge University Press.
[85] Schneier, B. (2015) Applied cryptography: protocols, algorithms and source code in C. John Wiley & Sons.
[86] Seneta, E. (1969) Functional equations and the Galton-Watson process. Adv. in Appl. Probab.
1(1), 142.
[87] Seneta, E. (1996) Markov and the birth of chain dependence theory. International Statistical
Review 64, 255263.
[88] Sevastianov, B.A. (1971) Branching processes. Nauka (in Russian).
[89] Shannon, C. (1948) A mathematical theory of communication. Bell System Technical Journal
27(3), 379423.
[90] Scherr, A.L. (1965) An analysis of time-shared computer systems. Ph.D. thesis, Massachusetts Institute of Technology.
[91] Shurenkov, V.M. (1989) Ergodic Markov processes. Nauka. (in Russian).
[92] Salomon, D. (2004) Data compression: the complete reference. 3rd ed., Springer.
[93] Spitzer, F. (1976) Principles of random walk. Springer-Verlag.
[94] Steutel, F.W., van Harn, K. (2004) Infinite divisibility of probability distributions on the real
line. Marcel Dekker.
[95] Stratonovich, R.L. (1960) Conditional Markov processes. Theor. Probab. Appl. 5(2), 156178.
[96] Uchaikin, V.V., Zolotarev, V.M. (1999) Chance and stability: stable distributions and their applications. Walter de Gruyter.
[97] Van Kampen N.G. (2007) Stochastic processes in physics and chemistry. 3rd ed., North Holland.
[98] Vatutin, V.V. Kyprianou, A.E. (2008) Branching processes in random environment die
slowly. Fifth Colloquium on Mathematics and Computer Science. Discrete Math. Theor. Comput.
Sci. Proc., AI, Assoc. Discrete Math. Theor. Comput. Sci., Nancy, 375395.
[99] Vatutin, V.V., Wachtel, V. (2010) Sudden extinction of a critical branching process in a
random environment. Theory Probab. Appl., 54(3), 466484.
[100] Wagner, D. (2002) A generalized birthday problem. Crypto 2002. Springer. 288303.


Part II

PAPERS

PAPER I

A special family of Galton-Watson processes with explosions

Serik Sagitov and Alexey Lindo

Branching Processes and Their Applications.
Lect. Notes Stat. Proc. (I.M. del Puerto et al., eds.) Springer, 2016.
(to appear)
Abstract. The linear-fractional Galton-Watson process is a well-known case when many characteristics of a branching process can be computed explicitly. In this paper we extend the two-parameter linear-fractional family to a much richer four-parameter family of reproduction laws. The corresponding Galton-Watson processes also allow explicit calculations, now with the possibility of an infinite mean, or even an infinite number of offspring. We study the properties of this special family of branching processes, and show, in particular, that in some explosive cases the time to explosion can be approximated by the Gumbel distribution.

1. Introduction
Consider a Galton-Watson process (Z_n)_{n≥0} with Z_0 = 1 and the offspring number distribution

p_k = P(Z_1 = k),  k ≥ 0.

The properties of this branching process are studied in terms of the probability generating function

f(s) = p_0 + p_1 s + p_2 s² + ...,

where it is usual to assume that f(1) = 1; however, in this paper we allow for f(1) < 1, so that a given particle may explode with probability p_∞ = 1 - f(1). The probability generating function f_n(s) = E(s^{Z_n}) of the size of the n-th generation is given by the n-fold iteration of f(s):

f_0(s) = s,  f_n(s) = f(f_{n-1}(s)),  n ≥ 1,

and therefore it is desirable to have a range of probability generating functions f whose iterations can be computed explicitly.
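For the linear-fractional case discussed next, the iterations are available in closed form (see Section 2). The following sketch (our own illustration, with arbitrary parameter values) compares direct iteration of the pgf with that closed form:

```python
# Linear-fractional pgf f(s) = p0 + (1 - p0) * p * s / (1 - (1 - p) * s).
# With a = p/(1 - p0) and c = (1 - p)/(1 - p0), the n-fold iterate
# satisfies 1/(1 - f_n(s)) = a**n/(1 - s) + c*(1 + a + ... + a**(n-1)).
p0, p = 0.2, 0.4
a, c = p / (1 - p0), (1 - p) / (1 - p0)

def f(s):
    return p0 + (1 - p0) * p * s / (1 - (1 - p) * s)

def f_iter(s, n):
    for _ in range(n):
        s = f(s)
    return s

def f_closed(s, n):
    geom = sum(a**k for k in range(n))
    return 1 - 1 / (a**n / (1 - s) + c * geom)

for n in (1, 5, 10):
    assert abs(f_iter(0.3, n) - f_closed(0.3, n)) < 1e-10
```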
The best known case of explicit calculations is the family of linear-fractional Galton-Watson processes with

f(s) = p_0 + (1 - p_0) ps/(1 - (1 - p)s),  s ∈ [0, (1 - p)^{-1}),

representing the family of modified geometric distributions

p_k = (1 - p_0)(1 - p)^{k-1} p,  k ≥ 1,

fully characterized by just two parameters: p_0 ∈ [0, 1) and p ∈ (0, 1]. In Section 2, for each θ ∈ [-1, 1] we introduce a family G_θ of functions with explicit iterations containing the linear-fractional family as a particular case. In Section 3 we demonstrate that all f ∈ G_θ are probability generating functions with f(1) ≤ 1. A Galton-Watson process with a reproduction law whose probability generating function belongs to G_θ will be called a theta-branching process.
The basic properties of the theta-branching processes are summarized in Section 4, where it is shown that this family is wide enough to include the cases of infinite variance, infinite mean, and even non-regular branching processes with explosive particles.
Recall that the basic classification of the Galton-Watson processes refers to the mean offspring number m = EZ_1. Let q ∈ [0, 1] be the smallest non-negative root of the equation f(x) = x and denote by

T_0 = inf{n : Z_n = 0}

the extinction time of the branching process. Then q = P(T_0 < ∞) gives the probability of ultimate extinction. For m ≤ 1 and p_1 < 1, the extinction probability is q = 1, while in the supercritical case m > 1, we have q < 1.
If f(1) < 1, then the Galton-Watson process is a Markov chain with two absorption states {0} and {∞}. In this case the branching process either goes extinct at time T_0 or explodes at the time

T_1 = inf{n : Z_n = ∞},

with

P(T_1 ≤ n) = 1 - f_n(1),  P(T_1 < ∞) = 1 - q,

where the latter equality is due to f_n(1) → q. In Section 5, using explicit formulas for f_n(s), we compute the distribution of the absorption time

T = T_0 ∧ T_1.

Note that in the regular case we have P(T_1 = ∞) = 1 and therefore T = T_0. Observe also that the case f(1) < 1 has other, biologically more relevant interpretations. For example, in the multitype setting, T_1 can be viewed as the time of the first mutation event, see [7].
Also in Section 5 we consider a situation when the explosion of a single particle has a small probability, so that T_1 takes large values in explosion scenarios. We show that in such a case the time to explosion can be asymptotically characterized with the help of a Gumbel distribution. In Section 6 we study the Q-processes for the theta-branching processes, extending the classical definition to the non-regular case. Our explicit calculations demonstrate that in the non-regular case the behavior of a branching process is more similar to that of the subcritical rather than the supercritical regular case. Using these results on the Q-processes we derive the conditional limits of the theta-branching processes conditioned on non-absorption.
A remarkable property of the linear-fractional Galton-Watson processes is that they can be embedded into the linear birth-death processes. In Section 7 we establish embeddability of theta-branching processes.
2. Probability generating functions for theta-branching processes
Using an alternative parametrization for the linear-fractional probability generating functions, we obtain

(2.1)  1/(1 - f(s)) = a/(1 - s) + c,  s ∈ [0, 1),

where

a = p/(1 - p_0),  c = (1 - p)/(1 - p_0).

This observation immediately implies that the n-fold iteration f_n of the linear-fractional f is also linear-fractional:

1/(1 - f_n(s)) = a^n/(1 - s) + c(1 + a + ... + a^{n-1}).

The key idea of this paper is to expand the family (2.1) by

(2.2)  (A - f(s))^{-θ} = a(A - s)^{-θ} + c,  s ∈ [0, A),

with the help of two extra parameters (A, θ) which are invariant under iterations.
Definition 2. Let θ ∈ (-1, 0) ∪ (0, 1]. We say that a probability generating function f belongs to the family G_θ if

f(s) = A - [a(A - s)^{-θ} + c]^{-1/θ},  0 ≤ s < A,

where one of the following three options holds:
(i) a ≥ 1, c > 0, θ ∈ (0, 1], A = 1;
(ii) a ∈ (0, 1), c = (1 - a)(1 - q)^{-θ}, q ∈ [0, 1), A = 1;
(iii) a ∈ (0, 1), c = (1 - a)(A - q)^{-θ}, q ∈ [0, 1], A > 1.
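The invariance under iteration can be checked numerically: if (A - f(s))^{-θ} = a(A - s)^{-θ} + c, then composing f with itself replaces a by a² and c by c(1 + a). The sketch below (our own check, with arbitrary parameter values of type (ii)) verifies this:

```python
# Closure of the family (2.2) under composition:
# (A - f(f(s)))**(-theta) = a**2 * (A - s)**(-theta) + c*(1 + a).
A, theta, a, q = 1.0, 0.5, 0.6, 0.3
c = (1 - a) * (1 - q) ** (-theta)   # option (ii) of Definition 2

def f(s):
    return A - (a * (A - s) ** (-theta) + c) ** (-1 / theta)

s = 0.4
lhs = (A - f(f(s))) ** (-theta)
rhs = a**2 * (A - s) ** (-theta) + c * (1 + a)
assert abs(lhs - rhs) < 1e-12
```

Iterating the same argument gives the general formula (A - f_n(s))^{-θ} = a^n(A - s)^{-θ} + c(1 + a + ... + a^{n-1}).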
Definition 2 can be extended to the case θ = 0 by the following continuity argument: for a ∈ (0, 1),

A - [a(A - s)^{-θ} + (1 - a)(A - q)^{-θ}]^{-1/θ} → A - (A - q)^{1-a}(A - s)^a,  θ → 0.

Definition 3. We say a probability generating function f belongs to
- the family G_0 if for some a ∈ (0, 1),

f(s) = A - (A - q)^{1-a}(A - s)^a,  0 ≤ s < A,

where either A = 1, q ∈ [0, 1), or A > 1, q ∈ [0, 1];
- the family G_{-1} if for some q ∈ [0, 1] and a ∈ (0, 1),

f(s) = as + (1 - a)q,  0 ≤ s < ∞.

Definition 4. A Galton-Watson process with a reproduction law whose probability generating function f ∈ G_θ, θ ∈ [-1, 1], will be called a theta-branching process.
It is straightforward to see, cf. Section 4, that each of the families G_θ is invariant under iterations: if f ∈ G_θ, then f_n ∈ G_θ for all n ≥ 1. The fact that the functional families in Definitions 2 and 3 indeed consist of probability generating functions with f(1) ≤ 1 is verified in Section 3.
Parts of the G_θ families were mentioned earlier in the literature as examples of probability generating functions with explicit iterations. Clearly, G_1 ∪ G_{-1} is the family of linear-fractional probability generating functions. Examples in [10] lead to the case A = 1 and θ ∈ [0, 1), which was later given among other examples in Ch. 1.8 of [8]. The case A = 1 and θ ∈ (0, 1) was later studied in [9]. A special pgf with θ = -1/2,

f(s) = 1 - (a(1 - s)^{1/2} + 1 - a)²,  a ∈ (0, 1),

can be found in [2] on page 112, as an example of a non-regular Galton-Watson process.
Notice that there is a version of linear-fractional Galton-Watson processes with countably many types of particles, see [5]. It is an open problem to expand the theta-branching processes with θ ∈ (-1, 1) to the multitype setting.
3. Monotonicity properties

It is straightforward to see that each f ∈ G_0 is a probability generating function with

f'(s) = (A - q)^{1-a} a(A - s)^{a-1},

f^{(n)}(s) = (A - q)^{1-a} a(1 - a)⋯(n - 1 - a)(A - s)^{a-n},  n ≥ 2,

and

p_0 = A - (A - q)^{1-a} A^a,
p_1 = (A - q)^{1-a} aA^{a-1},
p_n = p_{n-1} (n - a - 1)/(nA),  n ≥ 2.

Therefore, (p_n)_{n≥1} are monotonely decreasing with

p_n = aA^a (A - q)^{1-a} A^{-n} ∏_{k=2}^{n} (1 - (1 + a)/k),  n ≥ 2,

so that p_n ~ const · A^{-n} n^{-1-a} as n → ∞.
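The ratio form of the recursion makes the offspring probabilities for f ∈ G_0 easy to generate. The following sketch (illustrative only, with arbitrary parameter values) confirms the monotone decrease:

```python
# Offspring probabilities for f in G_0 (theta = 0), computed from
# p_1 and the ratio p_n / p_{n-1} = (n - a - 1) / (n * A) derived
# above, then checked to be positive and monotonely decreasing.
A, q, a = 1.5, 0.5, 0.7
p1 = (A - q) ** (1 - a) * a * A ** (a - 1)
probs = [p1]
for n in range(2, 50):
    probs.append(probs[-1] * (n - a - 1) / (n * A))
assert all(x > y > 0 for x, y in zip(probs, probs[1:]))
```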

Proposition 5. Let θ ∈ (-1, 0) ∪ (0, 1) and f ∈ G_θ. Then f is a probability generating function with f(1) ≤ 1 such that

p_0 = A - (aA^{-θ} + c)^{-1/θ},
p_1 = a(a + cA^θ)^{-1-1/θ},

and for n ≥ 2,

p_n = a(a + cA^θ)^{-1-1/θ} / (n! A^{n-1}) · Σ_{i=1}^{n-1} B_{i,n} (cA^θ/(a + cA^θ))^i,

where all B_{i,n} = B_{i,n}(θ) are non-negative and, for n > 2, satisfy the recursion

B_{i,n} = (n - 2 - iθ)B_{i,n-1} + (1 + iθ)B_{i-1,n-1},  i = 1, ..., n - 1,

with B_{0,n} = B_{n,n} = 0 for n ≥ 1, and B_{1,2} = 1 + θ.


Proof. In terms of

φ(s) := (A - f(s))/(A - s) = [a + c(A - s)^θ]^{-1/θ},

we have

φ'(s) = c(A - s)^{θ-1} φ(s)^{1+θ},
f'(s) = aφ(s)^{1+θ},
f''(s) = (1 + θ)ac(A - s)^{θ-1} φ(s)^{1+2θ},
f'''(s) = (1 + θ)(1 - θ)ac(A - s)^{θ-2} φ(s)^{1+2θ} + (1 + θ)(1 + 2θ)ac²(A - s)^{2θ-2} φ(s)^{1+3θ},

and more generally,

f^{(n)}(s) = Σ_{i=1}^{n-1} B_{i,n} a c^i (A - s)^{iθ-n+1} φ(s)^{1+(i+1)θ},  n ≥ 2,

where B_{i,n} are defined in the statement. To finish the proof it remains to apply the equality p_n = f^{(n)}(0)/n!.

In the linear-fractional case we have p_k ≥ p_{k+1} for all k ≥ 1. The next extension of this monotonicity property was first established in [9].
Corollary 6. Let θ ∈ (0, 1) and f ∈ G_θ with A = 1. Then p_k ≥ p_{k+1} for all k ≥ 1.
Proof. Put

g(s) = (s - 1)f(s) = -p_0 + Σ_{k=1}^∞ (p_{k-1} - p_k)s^k.

From

g(s) = s - 1 + (1 - s)²[a + c(1 - s)^θ]^{-1/θ},
g'(s) = 1 + c(1 - s)^{θ+1}[a + c(1 - s)^θ]^{-1-1/θ} - 2(1 - s)[a + c(1 - s)^θ]^{-1/θ}
      = c(1 - f(s))^{1+θ} + 2f(s) - 1,
g''(s) = (2 - c(1 + θ)(1 - f(s))^θ)f'(s),

we see that g''(s) ≥ 0, since

G(s) := 2 - c(1 + θ)(1 - f(s))^θ ≥ 2 - c(1 + θ)(1 - p_0)^θ = 2 - c(1 + θ)/(a + c) > 0.

Furthermore,

G'(s) = cθ(1 + θ)(1 - f(s))^{θ-1} f'(s)

is absolutely monotone (as a product of two absolutely monotone functions), implying that g''(s) is absolutely monotone, so that

k(k - 1)(p_{k-1} - p_k) ≥ 0,  k ≥ 2.

4. Basic properties of f ∈ G_θ

In this section we distinguish among nine cases inside the collection of families {G_θ}_{-1≤θ≤1} and summarize the following basic formulas: f_n(s), f(1), f'(1), f''(1). In all cases, except Case 1, we have a = f'(q). The following definition, cf. [3], explains an intimate relationship between the Cases 3-5 with A = 1 and the Cases 7-9 with A > 1.
Definition 7. Let A > 1 and a probability generating function f be such that f(A) ≤ A. We call

f̂(s) := f(sA)/A = Σ_{k=0}^∞ p_k A^{k-1} s^k

the dual generating function for f, and denote q̂ = qA^{-1}, so that f̂(q̂) = q̂. Clearly, f̂'(q̂) = f'(q).
Case 1: θ ∈ (0, 1], a ∈ (1, ∞),

f_n(s) = 1 - [a^n(1 - s)^{-θ} + (a^n - 1)d]^{-1/θ},  d ∈ (0, ∞).

The corresponding theta-branching process is subcritical with m = a^{-1/θ}. If θ ∈ (0, 1), then f''(1) = ∞, and for θ = 1 we have f''(1) = 2(a - 1)a^{-2}d.
Case 2: θ ∈ (0, 1], a = 1,

f_n(s) = 1 - [(1 - s)^{-θ} + nc]^{-1/θ},  c ∈ (0, ∞).

The corresponding theta-branching process is critical with either finite or infinite variance. If θ ∈ (0, 1), then f''(1) = ∞, and for θ = 1 we have f''(1) = 2c. This is the only critical case in the whole family of theta-branching processes.
Case 3: θ ∈ (0, 1], a ∈ (0, 1),

f_n(s) = 1 - [a^n(1 - s)^{-θ} + (1 - a^n)(1 - q)^{-θ}]^{-1/θ},  q ∈ [0, 1).

The corresponding theta-branching process is supercritical with m = a^{-1/θ}. If θ ∈ (0, 1), then f''(1) = ∞, and for θ = 1 we have f''(1) = 2a^{-2}(1 - a)(1 - q)^{-1}.
Case 4: θ = 0, a ∈ (0, 1),

f_n(s) = 1 - (1 - q)^{1-a^n}(1 - s)^{a^n},  q ∈ [0, 1).

The theta-branching process is regular supercritical with infinite mean.


Case 5: θ ∈ (-1, 0), a ∈ (0, 1),

f_n(s) = 1 - [a^n(1 - s)^{|θ|} + (1 - a^n)(1 - q)^{|θ|}]^{1/|θ|},  q ∈ [0, 1).

The theta-branching process is non-regular with a positive

1 - f(1) = (1 - a)^{1/|θ|}(1 - q)

and infinite f'(1).
Case 6: θ = -1, a ∈ (0, 1),

f_n(s) = a^n s + (1 - a^n)q,  q ∈ [0, 1].

If q = 1, then the theta-branching process becomes a pure death process with mean m = a and f''(1) = 0. If q < 1, then the theta-branching process is non-regular with a positive

1 - f(1) = (1 - a)(1 - q),

f'(1) = a and f''(1) = 0.
Case 7: θ ∈ (0, 1], a ∈ (0, 1), A > 1,

f_n(s) = A - [a^n(A - s)^{-θ} + (1 - a^n)(A - q)^{-θ}]^{-1/θ},  q ∈ [0, 1].

If q = 1, then the corresponding theta-branching process is subcritical with the offspring mean m = a and

f''(1) = (1 + θ)a(1 - a)(A - 1)^{-1}.

If q ∈ [0, 1), the theta-branching process is non-regular with a positive

1 - f(1) = (A - 1)([a + (1 - a)(A - q)^{-θ}(A - 1)^θ]^{-1/θ} - 1),

and

f'(1) = a[a + (1 - a)(A - q)^{-θ}(A - 1)^θ]^{-1/θ-1},
f''(1) = (1 + θ)a(1 - a)(A - q)^{-θ}(A - 1)^{θ-1}[a + (1 - a)(A - q)^{-θ}(A - 1)^θ]^{-1/θ-2}.

We have f(A) = A, and the dual generating function has the form of the Case 3:

f̂(s) = 1 - [a(1 - s)^{-θ} + (1 - a)(1 - q̂)^{-θ}]^{-1/θ}.

Case 8: $\theta = 0$, $a \in (0,1)$, $A > 1$,
\[ f_n(s) = A - (A-q)^{1-a^n}(A-s)^{a^n}, \qquad q \in [0,1]. \]
If $q = 1$, the theta-branching process is subcritical with the offspring mean $m = a$ and
\[ f''(1) = a(1-a)(A-1)^{-1}. \]
If $q \in [0,1)$, the theta-branching process is non-regular with a positive
\[ 1 - f(1) = (A-q)^{1-a}(A-1)^{a} - (A-1), \]
and
\[ f'(1) = a(A-q)^{1-a}(A-1)^{a-1}, \]
\[ f''(1) = a(1-a)(A-q)^{1-a}(A-1)^{a-2}. \]
We have $f(A) = A$, and the dual generating function belongs to the Case 4:
\[ \hat f(s) = 1 - (1-\hat q)^{1-a}(1-s)^{a}. \]
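The duality of Definition 7 can likewise be probed numerically; the following sketch (with ad hoc parameter values) checks that for the Case 8 map the dual generating function $f(sA)/A$ is exactly of the Case 4 form with $q$ replaced by $\hat q = q/A$.

```python
A, a, q = 2.0, 0.6, 0.3

def f8(s):
    # Case 8 one-step generating function
    return A - (A - q) ** (1.0 - a) * (A - s) ** a

def dual(s):
    # Definition 7: dual generating function f(sA)/A
    return f8(s * A) / A

q_hat = q / A  # dual fixed point

def f4(s):
    # Case 4 form with q replaced by q_hat
    return 1.0 - (1.0 - q_hat) ** (1.0 - a) * (1.0 - s) ** a

assert abs(f8(A) - A) < 1e-12
for s in (0.0, 0.2, 0.5, 0.9):
    assert abs(dual(s) - f4(s)) < 1e-12
```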

Case 9: $\theta \in (-1,0)$, $a \in (0,1)$, $A > 1$,
\[ f_n(s) = A - \big[a^n(A-s)^{|\theta|} + (1-a^n)(A-q)^{|\theta|}\big]^{1/|\theta|}, \qquad q \in [0,1]. \]
If $q = 1$, then the theta-branching process is subcritical with the offspring mean $m = a$ and
\[ f''(1) = (1-|\theta|)a(1-a)(A-1)^{-1}. \]
If $q \in [0,1)$, the theta-branching process is non-regular with a positive
\[ 1 - f(1) = \big[a(A-1)^{|\theta|} + (1-a)(A-q)^{|\theta|}\big]^{1/|\theta|} - (A-1), \]
and
\[ f'(1) = a\big[a + (1-a)(A-q)^{|\theta|}(A-1)^{-|\theta|}\big]^{1/|\theta|-1} \in (0,1), \]
\[ f''(1) = (1-|\theta|)a(1-a)(A-q)^{|\theta|}(A-1)^{-|\theta|-1}\big[a + (1-a)(A-q)^{|\theta|}(A-1)^{-|\theta|}\big]^{1/|\theta|-2}. \]
With
\[ f(A) = A - (1-a)^{1/|\theta|}(A-q) \in (q, A), \]
the dual generating function takes the form of the Case 5:
\[ \hat f(s) = 1 - \big[a(1-s)^{|\theta|} + (1-a)(1-\hat q)^{|\theta|}\big]^{1/|\theta|}. \]
5. Extinction and explosion times
Recall that $T = T_0 \wedge T_1$, and in the regular case $T = T_0$. In the non-regular case, when $f(1) < 1$, from
\[ P(n < T_0 < \infty) = q - f_n(0), \qquad P(n < T_1 < \infty) = f_n(1) - q, \]
we obtain
\[ P(n < T < \infty) = f_n(1) - f_n(0). \]
For our special family of branching processes we compute explicitly the distribution functions of the times $T_0$, $T_1$, $T$.
Cases 1-4. In these regular cases we are interested only in the extinction time:
\[ P(n < T_0 < \infty) = \begin{cases} a^{-n/\theta}\big[1 + d - da^{-n}\big]^{-1/\theta}, & \text{Case 1},\\ (1 + cn)^{-1/\theta}, & \text{Case 2},\\ (1-q)\Big(\big[1 - a^n(1 - (1-q)^{\theta})\big]^{-1/\theta} - 1\Big), & \text{Case 3},\\ (1-q)\big[(1-q)^{-a^n} - 1\big], & \text{Case 4}. \end{cases} \]
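These closed forms agree with direct iteration of $f$ at $s = 0$; a minimal numerical check for Case 3 (arbitrary parameter values):

```python
theta, a, q = 0.5, 0.6, 0.4

def f(s):
    # Case 3 one-step generating function; q is its fixed point
    return 1.0 - (a * (1.0 - s) ** (-theta)
                  + (1.0 - a) * (1.0 - q) ** (-theta)) ** (-1.0 / theta)

def tail(n):
    # closed form for P(n < T0 < infinity) in Case 3
    return (1.0 - q) * ((1.0 - a ** n * (1.0 - (1.0 - q) ** theta)) ** (-1.0 / theta) - 1.0)

assert abs(f(q) - q) < 1e-12  # q is the extinction probability
fn0 = 0.0
for n in range(1, 8):
    fn0 = f(fn0)  # f_n(0) = P(T0 <= n)
    assert abs((q - fn0) - tail(n)) < 1e-12
```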

Cases 5, 7, 9. In these cases
\[ P(n < T_0 < \infty) = (A-q)\Big(\big[1 - a^n(1 - (A-q)^{\theta}A^{-\theta})\big]^{-1/\theta} - 1\Big), \]
\[ P(n < T_1 < \infty) = (A-q)\Big(1 - \big[1 - a^n(1 - (A-q)^{\theta}(A-1)^{-\theta})\big]^{-1/\theta}\Big), \]
\[ P(n < T < \infty) = (A-q)\Big(\big[1 - a^n(1 - (A-q)^{\theta}A^{-\theta})\big]^{-1/\theta} - \big[1 - a^n(1 - (A-q)^{\theta}(A-1)^{-\theta})\big]^{-1/\theta}\Big). \]
Case 6. In this trivial case
\[ P(n < T_0 < \infty) = a^nq, \qquad P(n < T_1 < \infty) = a^n(1-q), \qquad P(n < T < \infty) = a^n, \]
and for $q \in (0,1)$,
\[ E(T_0\,|\,T_0 < \infty) = E(T_1\,|\,T_1 < \infty) = E(T) = \frac{1}{1-a}. \]
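The value $E(T) = 1/(1-a)$ is just the expectation of a random variable with the geometric tail $P(T > n) = a^n$; a trivial numerical cross-check (arbitrary $a$):

```python
a = 0.7
# E(T) = sum_{n>=0} P(T > n) = sum_{n>=0} a**n = 1/(1-a)
expected_T = sum(a ** n for n in range(200))  # truncated tail sum
assert abs(expected_T - 1.0 / (1.0 - a)) < 1e-12
```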



Case 8. In this case
\[ P(n < T_0 < \infty) = (A-q)\big[(A-q)^{-a^n}A^{a^n} - 1\big], \]
\[ P(n < T_1 < \infty) = (A-q)\big[1 - (A-q)^{-a^n}(A-1)^{a^n}\big], \]
\[ P(n < T < \infty) = (A-q)^{1-a^n}\big[A^{a^n} - (A-1)^{a^n}\big]. \]

Theorem 8. Consider a theta-branching process with $\theta \in (-1,0]$ and $A \ge 1$. Let $\theta \to 0$ and $A \to 1$ in such a way that
\[ |\theta|\log\frac{1}{A-1} \to r, \qquad r \in [0,\infty]. \]
Then for any fixed $a \in (0,1)$, $q \in [0,1)$, and $y \in (-\infty,\infty)$,
\[ \lim P(T_1 - \log_a\eta \le y\,|\,T_1 < \infty) = e^{-wa^y}, \]
where
\[ \eta = \begin{cases} |\theta|, & r \in (0,\infty],\\ \big(\log\frac{1}{A-1}\big)^{-1}, & r = 0, \end{cases} \qquad w = \begin{cases} 1, & r \in \{0\} \cup \{\infty\},\\ 1 - e^{-r}, & r \in (0,\infty). \end{cases} \]
The limit is a Gumbel distribution with mean $\frac{\gamma + \log w}{\log(1/a)}$, where $\gamma$ is the Euler-Mascheroni constant.

Proof. In view of
\[ P(T_1 \le n\,|\,T_1 < \infty) = \frac{A-q}{1-q}\Big[1 - a^n\big(1 - (A-1)^{|\theta|}(A-q)^{-|\theta|}\big)\Big]^{1/|\theta|} - \frac{A-1}{1-q}, \]
it suffices to verify that
\[ \Big[1 - \eta a^y\big(1 - (A-1)^{|\theta|}\big)\Big]^{1/|\theta|} \to e^{-wa^y}. \]
Indeed, if $r = \infty$, then $(A-1)^{|\theta|} \to 0$, and
\[ \big[1 - |\theta|a^y(1 - (A-1)^{|\theta|})\big]^{1/|\theta|} \to e^{-a^y}. \]
If $r \in (0,\infty)$, then $(A-1)^{|\theta|} \to e^{-r}$, and
\[ \big[1 - |\theta|a^y(1 - (A-1)^{|\theta|})\big]^{1/|\theta|} \to e^{-a^y(1-e^{-r})}. \]
Finally, if $r = 0$, then
\[ 1 - (A-1)^{|\theta|} \sim |\theta|/\eta, \]
and therefore
\[ \big[1 - \eta a^y(1 - (A-1)^{|\theta|})\big]^{1/|\theta|} \to e^{-a^y}. \]




Corollary 9. If $A = 1$ and $\theta \in (-1,0)$, then for any fixed $a \in (0,1)$ and $q \in [0,1)$,
\[ \lim_{\theta \to 0} P(T_1 - \log_a|\theta| \le y\,|\,T_1 < \infty) = e^{-a^y}, \qquad y \in (-\infty,\infty). \]
If $\theta = 0$ and $A = 1 + e^{-1/\epsilon}$, $\epsilon > 0$, then for any fixed $a \in (0,1)$ and $q \in [0,1)$,
\[ \lim_{\epsilon \to 0} P(T_1 - \log_a\epsilon \le y\,|\,T_1 < \infty) = e^{-a^y}, \qquad y \in (-\infty,\infty). \]
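The first limit is easy to probe numerically. In Case 5 with $A = 1$ the closed forms above give $P(T_1 \le n\,|\,T_1 < \infty) = (1-a^n)^{1/|\theta|}$, which at the (real-valued) level $n = \log_a|\theta| + y$ equals $(1 - |\theta|a^y)^{1/|\theta|}$; the sketch below (ad hoc values of $a$ and $y$) shows the convergence to $e^{-a^y}$ as $\theta \to 0$.

```python
import math

a, y = 0.5, 0.7

def cdf(theta):
    # (1 - |theta| * a**y)**(1/|theta|): the Case 5 (A = 1) distribution
    # function evaluated at the centred level log_a|theta| + y
    t = abs(theta)
    return (1.0 - t * a ** y) ** (1.0 / t)

limit = math.exp(-a ** y)
errors = [abs(cdf(th) - limit) for th in (-0.1, -0.01, -0.001)]
assert errors[0] > errors[1] > errors[2]  # error shrinks as theta -> 0
assert errors[2] < 1e-3
```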

6. The Q-process
As explained in Ch I.14, [1], for a regular Galton-Watson process with transition probabilities $P_n(i,j)$, one can define another Markov chain with transition probabilities
\[ Q_n(i,j) := \frac{jq^{j-i}P_n(i,j)}{i\gamma^n}, \qquad i \ge 1,\ j \ge 1, \]
where $\gamma = f'(q)$. The new chain is called the Q-process, and from
\[ \sum_{j\ge1} Q_n(i,j)s^j = s\Big(\frac{f_n(sq)}{q}\Big)^{i-1}\frac{f_n'(sq)}{f_n'(q)} \]
we see that the Q-process is a Galton-Watson process with the dual reproduction law $\frac{f(sq)}{q}$ and an eternal particle generating a random number $\nu$ of ordinary particles with $E(s^{\nu}) = \frac{f'(sq)}{f'(q)}$, see [3]. The Q-process in the regular case is interpreted in [1] as the original branching process "conditioned on not being extinct in the distant future and on being extinct in the even more distant future".
Exactly the same definition of the Q-process makes sense in the non-regular case, only now the last interpretation should be based on the absorption time $T$ rather than on the extinction time $T_0$. Indeed, writing $P_j(\cdot) = P(\cdot\,|\,Z_0 = j)$ we get for $j \ge 1$,
\[ P_j(T > n) = f_n(1)^j - f_n(0)^j, \]
and therefore,
\[ P_i(Z_1 = j_1, \ldots, Z_n = j_n\,|\,T > n+k) = P_i(Z_1 = j_1, \ldots, Z_n = j_n)\,\frac{f_k(1)^{j_n} - f_k(0)^{j_n}}{f_{n+k}(1)^i - f_{n+k}(0)^i}. \]
In the non-regular case, as $k \to \infty$ we have $f_k(0) \to q$ and $f_k(1) \to q$. Thus, repeating the key argument of Ch I.14, [1] for the derivation of the Q-process,
\[ P_i(Z_1 = j_1, \ldots, Z_n = j_n\,|\,T > n+k) \to P_i(Z_1 = j_1, \ldots, Z_n = j_n)\,\frac{j_nq^{j_n-i}}{i\gamma^n}, \]
and we arrive in the limit at a Markov chain with the transition probabilities $Q_n(i,j)$.


By Theorem 3 from Ch. I.11 in [1],
\[ \gamma^{-n}P_n(i,j) \to iq^{i-1}\pi_j, \qquad i, j \ge 1, \]
where $Q(s) = \sum_{j\ge1}\pi_js^j$ satisfies
\[ Q(f(s)) = \gamma Q(s), \qquad Q(q) = 0. \]
In the critical case, as well as in the subcritical case with $\sum_{k=2}^{\infty}p_kk\log k = \infty$, the solution is trivial: $Q(s) \equiv 0$. Otherwise, $Q(s)$ is uniquely defined by the above equation with an extra condition $Q'(q) = 1$, so that the Q-process has a stationary distribution given by
\[ Q_n(i,j) \to jq^{j-1}\pi_j, \]
with
\[ \sum_{j\ge1}jq^{j-1}\pi_js^j = sQ'(sq). \]

These facts concerning $Q(s)$ remain valid even in the non-regular case. It is easy to see from (2.2) that for our family with $\theta \ne 0$ and $A > q$, the generating function
\[ Q(s) = (A-s)^{-\theta} - (A-q)^{-\theta} \]
is determined by the parameters $(\theta, A)$ and is independent of $a = \gamma$. Similarly, for $\theta = 0$ we have
\[ Q(s) = \log\frac{A-s}{A-q}. \]
This leaves us with two cases when $A = q = 1$. In the critical Case 2 the answer is trivial: $Q(s) \equiv 0$. In the subcritical Case 1, we have $\gamma = a^{-1/\theta}$ and
\[ (1-f(s))^{-\theta} + d = a\big[(1-s)^{-\theta} + d\big], \]
which yields
\[ Q(s) = \big[(1-s)^{-\theta} + d\big]^{-1/\theta}. \]
From these calculations it follows, in particular, that for our family of branching processes, in all subcritical cases, the classical $x\log x$ moment condition holds:
\[ \sum_{k=2}^{\infty}p_kk\log k < \infty. \]
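A quick numerical check of the functional equation $Q(f(s)) = \gamma Q(s)$ with $\gamma = a$, here for Case 3 ($A = 1$) with arbitrary parameter values:

```python
theta, a, q = 0.5, 0.6, 0.4  # Case 3 parameters, A = 1

def f(s):
    return 1.0 - (a * (1.0 - s) ** (-theta)
                  + (1.0 - a) * (1.0 - q) ** (-theta)) ** (-1.0 / theta)

def Q(s):
    # the displayed generating function with A = 1
    return (1.0 - s) ** (-theta) - (1.0 - q) ** (-theta)

for s in (0.0, 0.2, 0.5, 0.8):
    assert abs(Q(f(s)) - a * Q(s)) < 1e-12
```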

Using these explicit formulas for $Q(s)$ we can easily find the conditional probability distributions
\[ \lim_{n\to\infty} P(Z_n = j\,|\,T > n) = b_j, \qquad j \ge 1. \]
For all cases, except the critical Case 2, we have
\[ \sum_{j\ge1}b_js^j = 1 - \frac{Q(sq)}{Q(0)}. \]

Turning to the Case 2, recall that for any critical Galton-Watson process there exists a limit probability distribution
\[ \lim_{n\to\infty} P(Z_n = j\,|\,T_0 = n+1) = w_j, \qquad j \ge 1, \]
such that
\[ \sum_{j\ge1}w_js^j = \lim_{n\to\infty}\frac{f_n(sp_0) - f_n(0)}{f_n(p_0) - f_n(0)}. \]
Since
\[ f_n(sp_0) = 1 - \Big[\big(1 - s(1 - [1+c]^{-1/\theta})\big)^{-\theta} + nc\Big]^{-1/\theta}, \]
we obtain
\[ \sum_{j\ge1}w_js^j = \frac{\big[1 - s(1 - [1+c]^{-1/\theta})\big]^{-\theta} - 1}{c}. \]

7. Embedding into continuous time branching processes
Recall that a Galton-Watson process with generating functions $f_n$ is called embeddable, if there is a semigroup of probability generating functions
(7.1)
\[ F_{t+u}(s) = F_t(F_u(s)), \qquad t \in [0,\infty),\ u \in [0,\infty), \]
such that $f_n(s) = F_n(s)$, $n = 1, 2, \ldots$. Although not every Galton-Watson process is embeddable, see Ch. III.6 in [1], in this section we demonstrate that all theta-branching processes are embeddable.
Behind each semigroup (7.1) there is a continuous time Markov branching process with particles having exponential life lengths with parameter, say, $\lambda$. Each particle at the moment of death is replaced by a random number of new particles having a probability generating function
\[ h(s) = h_0 + h_2s^2 + h_3s^3 + \ldots. \]
For such a continuous time branching process $(Z_t)_{t\in[0,\infty)}$ the probability generating function $F_t(s) = Es^{Z_t}$ satisfies
(7.2)
\[ \int_s^{F_t(s)}\frac{dx}{h(x) - x} = \lambda t \]
(see [6] for a recent account of continuous time Markov branching processes).
Our task for this section is for each $f \in G$ to find a pair $(h, \lambda)$ such that $f(s) = F_1(s)$. We will denote by $\mu = \sum_{k=2}^{\infty}kh_k$ the corresponding mean offspring number and by $q$ the minimal nonnegative root of the equation $h(s) = s$, which gives


the extinction probability of the continuous time branching process.
Cases 1-3. For a pair $\theta \in (0,1]$ and $\epsilon \in (0, 1 + \theta^{-1}]$, put
\[ h(s) = 1 - \epsilon(1-s) + \frac{\epsilon}{1+\theta}(1-s)^{1+\theta}. \]
Taking successive derivatives of $h$ it is easy to see that it is a probability generating function with $h'(0) = 0$. Next we show that, using this $h$ as the offspring probability generating function for the continuous time branching process, we can recover $f(s)$ for the theta-branching processes as $F_1(s)$ by choosing $\epsilon$ and $\lambda$ adapted to Cases 1-3.
Case 1. For a given pair $a \in (1,\infty)$ and $d \in (0,\infty)$, put
\[ \epsilon = \frac{(1+\theta)d}{(1+\theta)d + 1}, \qquad \lambda = \big[(1+\theta^{-1})d + \theta^{-1}\big]\ln a. \]
In this subcritical case, applying (7.2) we obtain for $s \in [0,1)$
\[ \lambda t = \int_s^{F_t(s)}\frac{dx}{(1-\epsilon)(1-x) + \frac{\epsilon}{1+\theta}(1-x)^{1+\theta}} = \int_s^{F_t(s)}\frac{d\log\frac{1}{1-x}}{1 - \epsilon + \frac{\epsilon}{1+\theta}e^{\theta\log(1-x)}}, \]
yielding the desired formula
\[ F_t(s) = 1 - \big[a^t(1-s)^{-\theta} + (a^t - 1)d\big]^{-1/\theta}. \]

Case 2. For a given $c \in (0,\infty)$, put $\epsilon = 1$ and $\lambda = (1+\theta^{-1})c$. Then by (7.2), we get
\[ F_t(s) = 1 - \big[(1-s)^{-\theta} + ct\big]^{-1/\theta}. \]
Case 3. If $\epsilon > 1$, then $q = 1 - \big(\frac{(\epsilon-1)(1+\theta)}{\epsilon}\big)^{1/\theta}$ and the proposed $h$ can be rewritten as
\[ h(s) = s + \frac{(1-s)^{1+\theta} - (1-q)^{\theta}(1-s)}{1 + \theta - (1-q)^{\theta}}. \]
For a given pair $a \in (0,1)$ and $q \in [0,1)$, choosing
\[ \lambda = \big[(1+\theta^{-1})(1-q)^{-\theta} - \theta^{-1}\big]\ln a^{-1} \]
and applying (7.2), we obtain
\[ F_t(s) = 1 - \big[a^t(1-s)^{-\theta} + (1-a^t)(1-q)^{-\theta}\big]^{-1/\theta}. \]
It is easy to see that $f(s) = F_1(s)$ covers the whole subfamily of $G$ corresponding to the Cases 1-3.
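The interpolation $F_t$ and the embedding are easy to confirm numerically; the following sketch (parameters chosen arbitrarily) checks the semigroup property $F_{t+u} = F_t \circ F_u$ and that composing $F_1$ twice matches $F_2$.

```python
theta, a, q = 0.5, 0.6, 0.4

def F(t, s):
    # continuous-time interpolation of the Case 3 iterates
    return 1.0 - (a ** t * (1.0 - s) ** (-theta)
                  + (1.0 - a ** t) * (1.0 - q) ** (-theta)) ** (-1.0 / theta)

s = 0.25
assert abs(F(0.7, F(0.5, s)) - F(1.2, s)) < 1e-12   # semigroup property
assert abs(F(1.0, F(1.0, s)) - F(2.0, s)) < 1e-12   # f(f(s)) = f_2(s)
```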

Notice that if $\theta = 1$, then $h(s) = 1 - \frac{\epsilon}{2} + \frac{\epsilon}{2}s^2$ generates the linear birth and death process with $h''(1) = \epsilon$. If $\theta \in (0,1)$, then $h''(1) = \infty$.


Case 4. Consider a supercritical reproduction law with infinite mean
\[ h(s) = s + (1-s)\frac{\ln(1-s) - \ln(1-q)}{1 - \ln(1-q)}. \]
For $h_0 = \frac{-\ln(1-q)}{1 - \ln(1-q)} \in [0,1)$ this can be rewritten as
\[ h(s) = h_0 + (1-h_0)\sum_{k=2}^{\infty}\frac{s^k}{k(k-1)}. \]
In this form with $h_0 = 0$, the generating function $h$ appeared in [4] as the reproduction law of an immortal branching process. Earlier in [8], this reproduction law was introduced as
\[ h(s) = 1 - (1-h_0)(1-s)(1 - \ln(1-s)). \]

To see that the theta-branching process in the Case 4 is embeddable into the Markov branching process with the above mentioned reproduction law, use the first representation of $h$ and apply (7.2). As a result we obtain for $s \ne q$,
\[ \frac{\lambda t}{1 - \ln(1-q)} = \int_s^{F_t(s)}\frac{dx}{(1-x)\big(\ln(1-x) - \ln(1-q)\big)} = \ln\big[\ln(1-s) - \ln(1-q)\big] - \ln\big[\ln(1-F_t(s)) - \ln(1-q)\big]. \]
Putting $\lambda = (1 - \ln(1-q))\ln a^{-1}$, we derive
\[ F_t(s) = 1 - (1-q)^{1-a^t}(1-s)^{a^t}. \]
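One can also verify numerically that this $F_t$ solves the backward equation $dF_t/dt = \lambda(h(F_t) - F_t)$ for the stated pair $(h, \lambda)$; the check below uses a central finite difference with ad hoc parameter values.

```python
import math

a, q = 0.6, 0.3
lam = (1.0 - math.log(1.0 - q)) * math.log(1.0 / a)

def h(s):
    # first representation of the Case 4 reproduction law
    return s + (1.0 - s) * (math.log(1.0 - s) - math.log(1.0 - q)) / (1.0 - math.log(1.0 - q))

def F(t, s):
    return 1.0 - (1.0 - q) ** (1.0 - a ** t) * (1.0 - s) ** (a ** t)

t, s, dt = 0.8, 0.2, 1e-6
dF_dt = (F(t + dt, s) - F(t - dt, s)) / (2.0 * dt)  # numerical time derivative
assert abs(dF_dt - lam * (h(F(t, s)) - F(t, s))) < 1e-6
```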

Cases 5, 7, 9. In these three cases the corresponding $h$ and $\lambda$ are given by an extension of the formulas for the Case 3:
\[ h(s) = s + \frac{(A-s)^{1+\theta} - (A-q)^{\theta}(A-s)}{(1+\theta)A^{\theta} - (A-q)^{\theta}}, \qquad \lambda = \big[(1+\theta^{-1})A^{\theta}(A-q)^{-\theta} - \theta^{-1}\big]\ln a^{-1}. \]
Turning to Definition 7 we see that this $h$ in the Case 7 is dual to the $h$ in the Case 3, and in the Case 9 it is dual to that of the Case 5.
Case 6. In this trivial case the corresponding continuous time branching process is a simple death-explosion process with $h(s) = q$ and $\lambda = \ln a^{-1}$.
Case 8. Similarly to the Case 4 we find that the pair
\[ h(s) = s + (A-s)\frac{\ln(A-s) - \ln(A-q)}{1 + \ln A - \ln(A-q)}, \qquad \lambda = \big(1 + \ln A - \ln(A-q)\big)\ln a^{-1} \]
leads to
\[ F_t(s) = A - (A-q)^{1-a^t}(A-s)^{a^t}. \]
Observe that this $h$ is dual to that of the Case 4.




Bibliography
[1] Athreya, K.B., Ney, P.E. (1972) Branching Processes. Springer, Berlin.
[2] Harris, T.E. (1963) The Theory of Branching Processes. Springer, Berlin.
[3] Klebaner, F., Rösler, U., Sagitov, S. (2007) Transformations of Galton-Watson processes and linear fractional reproduction. Adv. Appl. Prob. 39, 1036-1053.
[4] Lagerås, A.N., Martin-Löf, A. (2006) Genealogy for supercritical branching processes. J. Appl. Probab. 43(4), 1066-1076.
[5] Sagitov, S. (2013) Linear-fractional branching processes with countably many types. Stoch. Proc. Appl. 123, 2940-2956.
[6] Sagitov, S. (2015) Tail generating functions for Markov branching processes. arXiv:1511.05407.
[7] Sagitov, S., Serra, M.C. (2009) Multitype Bienaymé-Galton-Watson processes escaping extinction. Adv. Appl. Prob. 41, 225-246.
[8] Sevastianov, B.A. (1971) Branching Processes. Nauka, Moscow (in Russian).
[9] Tokarev, D. (2007) Galton-Watson Processes and Extinction in Population Systems. PhD thesis, Monash University, Melbourne, Australia.
[10] Zolotarev, V.M. (1957) More exact statements of several theorems in the theory of branching processes. Theory Prob. Appl. 2, 245-253.


PAPER II

General linear-fractional branching processes with discrete time


Alexey Lindo and Serik Sagitov

preprint.

PAPER II

General linear-fractional branching processes with discrete time

Alexey Lindo and Serik Sagitov

Abstract. We study a special class of Bienaymé-Galton-Watson processes with a general type space. The special feature of this class is an embedded linear-fractional Crump-Mode-Jagers process whose tree contour is described by an alternating random walk. For such discrete time branching processes with a general type space we establish transparent limit theorems for the subcritical, critical and supercritical cases. These results extend recent findings in [15] for the linear-fractional branching processes with countably many types.

1. Introduction
Multi-type branching processes with a general measurable space $(E, \mathcal{E})$ of possible types of individuals were addressed in monographs [10, 11, 16], see also paper [2]. Notably, in [7], [8], and [12] the authors develop a full-fledged theory for the general supercritical branching processes with age dependence. These results rely upon the generalisations of the Perron-Frobenius theorem for irreducible non-negative kernels [14] and Markov renewal theorems [1]. Therefore, a typical limit theorem for general branching processes involves technical conditions of irreducibility imposed on the reproduction law connecting the elements of the type space $E$.
This paper deals with the general Bienaymé-Galton-Watson processes, that is, branching processes of particles with non-overlapping generations. Denote by $Z_n(A)$ the number of $n$-th generation particles whose types belong to $A \in \mathcal{E}$. The particles are assumed to reproduce independently according to a reproduction law allocating offspring types by a random algorithm regulated by the parental type. A key characteristic of the multi-type reproduction law is the expectation kernel
(1.1)
\[ M(x, A) := E_xZ_1(A), \qquad x \in E, \quad A \in \mathcal{E}, \]

General linear-fractional branching processes


where the conditional expectation operator
\[ E_x[\,\cdot\,] := E[\,\cdot\,|\,Z_0 = \delta_x], \qquad \delta_x(A) := 1_{\{x\in A\}}, \]
is indexed by the type $x$ of the ancestral particle. Thus, the measure $M(x, dy)$ gives the mean offspring numbers to a mother of type $x$. The measure-valued Markov chain $\{Z_n(dx)\}_{n\ge0}$ has the mean value kernels
\[ M^n(x, A) = E_xZ_n(A), \qquad x \in E, \quad A \in \mathcal{E}, \]
computed as the powers of the kernel $M(x, dy)$:
\[ M^0(x, A) := \delta_x(A), \qquad M^n(x, A) := \int_EM^{n-1}(y, A)M(x, dy), \qquad n \ge 1. \]
Here and elsewhere the integrals are always taken over the type space $E$.
More specifically, we study what we call LF-processes: branching particle systems characterised by general linear-fractional distributions. At the expense of the restricted choice for the particle reproduction law, we are able to obtain explicit Perron-Frobenius asymptotic formulas using a straightforward argument, without directly referring to the general Markov chain theory. Our approach develops the ideas of [15] dealing with the countably infinite type space $E$.
An LF-process has a reproduction law parametrised by a triplet $(K, \pi, m)$ consisting of a sub-stochastic kernel $K(x, dy)$, a probability measure $\pi(dy)$, and a number $m \in (0,\infty)$. Given the ancestral particle type $x$, the total offspring number $Z_1(E)$ is assumed to follow a linear-fractional distribution
\[ E_xs^{Z_1(E)} = p_0(x) + (1 - p_0(x))\frac{s}{1 + m - ms}. \]
Here the probability of having no offspring $p_0(x) = P_x(Z_1(E) = 0)$ depends on the parental type, while the geometric number of offspring beyond the first one,
\[ E_x\big[s^{Z_1(E)-1}\,\big|\,Z_1(E) > 0\big] = \frac{1}{1 + m - ms}, \]
has mean $m$ independently of $x$. (Even though all the $Z_1(E) = k$ offspring of the ancestral particle are produced instantaneously, we assume that they are somehow labeled from 1 to $k$.)
The assignment of types to the $k = Z_1(E)$ offspring is done independently, using the conditional distribution $K(x, dy)/K(x, E)$ for the first one, and the distribution $\pi(dy)$ for the other $k-1$ offspring. These quite restrictive assumptions produce an important feature of the LF-process in that its kernel (1.1) has a particular structure
(1.2)
\[ M(x, A) = K(x, A) + K(x, E)m\pi(A), \]
where the first term, $K(x, A)$, is the contribution of the first offspring and the second term is the joint contribution of the other offspring.
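For a finite type space the kernels become matrices and (1.2) takes a matrix form; a toy two-type sketch (all numbers ad hoc, not from the paper):

```python
# Two-type example: K is a sub-stochastic matrix, pi a probability vector,
# and (1.2) reads M[x][y] = K[x][y] + (row sum of K at x) * m * pi[y].
K = [[0.3, 0.2],
     [0.1, 0.4]]
pi = [0.6, 0.4]
m = 1.5

M = [[K[x][y] + sum(K[x]) * m * pi[y] for y in range(2)] for x in range(2)]

# the mean total offspring number E_x Z_1(E) is the row sum of M,
# which equals K(x, E) * (1 + m)
for x in range(2):
    assert abs(sum(M[x]) - sum(K[x]) * (1.0 + m)) < 1e-12
```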


The framework of LF-processes has a reasonable level of generality: it is broad enough to contain a variety of interesting examples, yet restrictive enough to allow for transparent limit theorems shedding light onto the general theory of multi-type branching processes.
Our presentation in Section 2 starts with a more formal definition of general linear-fractional distributions. It is shown that for an arbitrary generation, the LF-process has a linear-fractional distribution, see Theorem 11. Then in Section 3 we obtain a transparent form, Theorem 13, of the Perron-Frobenius theorem for the powers of the kernel (1.2) using a generating function approach adapted from [15].
The more straightforward treatment of LF-processes developed here becomes possible due to the existence of an inherent linear-fractional Crump-Mode-Jagers process, described in Section 4. An alternative picture of the LF-process as a branching random walk over the state space $E$ provides new insight into our model. Namely, one can think in terms of individuals walking over $E$ obeying the Markov transition rules with kernel $K(x, dy)$. Each individual alive at the current moment produces a geometric number of offspring with mean $m$. All newborn individuals have independent starting positions allocated according to the common distribution $\pi(dy)$.
In Section 5 we demonstrate how the simple conditions and clear statements of the Perron-Frobenius Theorem 13 relate to less transparent general results of this kind summarised, for example, in [13] and [14].
Finally, Section 6 presents three basic limit theorems for the subcritical, critical, and supercritical LF-processes. The obtained asymptotic formulas are clearly expressed in terms of the defining triplet $(K, \pi, m)$.
2. General linear-fractional distributions
Definition 10. An integer-valued random measure $Z(dx)$ on $(E, \mathcal{E})$ is said to have a linear-fractional distribution, denoted $Z \sim \mathrm{lf}(\mu, \pi, m)$, where $\mu(dx)$ and $\pi(dx)$ are two non-random measures such that $\mu(E) \le 1$ and $\pi(E) = 1$, and $m$ is a positive number, if
\[ P(Z(E) = 0) = 1 - \mu(E), \qquad P(Z(E) = k\,|\,Z(E) > 0) = \frac{m^{k-1}}{(1+m)^k}, \quad k \ge 1, \]
and conditionally on $Z(E) = k$,
\[ Z(A) \stackrel{d}{=} X_1(A) + \ldots + X_k(A), \qquad A \in \mathcal{E}, \]
where $X_1, X_2, \ldots$ are independent random points on $E$ with
\[ P(X_1 \in A) = \mu(A)/\mu(E), \qquad P(X_i \in A) = \pi(A), \quad i \ge 2. \]


The distribution of $Z(dx)$ is compactly described by its generating functional $Eh^Z$, where for a given probe function $h : E \to (0,1]$, we denote
\[ h^Z := \exp\Big\{\int Z(dy)\ln h(y)\Big\}. \]
The generating functional is a natural extension of the probability generating function, in that with a simple probe function $h(x) = \sum_{i=1}^{k}s_i\delta_x(A_i)$ for a given arbitrary measurable partition $\{A_1, \ldots, A_k\}$ of $E$, we have
\[ Eh^Z = Es_1^{Z(A_1)}\cdots s_k^{Z(A_k)}. \]
For the distribution from Definition 10 the generating functional has the following linear-fractional form:
\[ Eh^Z = 1 - \mu(E) + \frac{\int h(y)\mu(dy)}{1 + m - m\int h(y)\pi(dy)}. \]
Theorem 11. Consider an LF-process $\{Z_n(dy)\}$ with
(2.1)
\[ E_xh^{Z_1} = 1 - K(x, E) + \frac{\int h(y)K(x, dy)}{1 + m - m\int h(y)\pi(dy)}. \]
Then for each $n \ge 1$,
(2.2)
\[ E_xh^{Z_n} = 1 - K_n(x, E) + \frac{\int h(y)K_n(x, dy)}{1 + m_n - m_n\int h(y)\pi_n(dy)}, \]
where the triplet $(K_n, \pi_n, m_n)$ is uniquely specified by the triplet $(K, \pi, m)$ via the relations
(2.3)
\[ m_n = m\sum_{k=0}^{n-1}\int M^k(x, E)\pi(dx), \]
(2.4)
\[ \pi_n(A) = m_n^{-1}m\sum_{k=0}^{n-1}\int M^k(x, A)\pi(dx), \]
(2.5)
\[ K_n(x, A) = M^n(x, A) - \frac{m_n}{1+m_n}M^n(x, E)\pi_n(A). \]

The proof of Theorem 11 is essentially the same as that of Theorem 3 in [15]. A sketch of the proof is postponed until the end of Section 4.
Corollary 12. The following transparent formula holds for the survival probability of the LF-process:
(2.6)
\[ P_x(Z_n(E) > 0) = (1 + m_n)^{-1}M^n(x, E). \]




Furthermore, conditionally on the ancestral type $x$ and the survival event $\{Z_n(E) > 0\}$, we have $Z_n \sim \mathrm{lf}(\nu_n, \pi_n, m_n)$, where $\nu_n(dy) = K_n(x, dy)/K_n(x, E)$.
Proof. From (2.2), we find
\[ P_x(Z_n(E) > 0) = K_n(x, E), \]
which together with (2.5) yields the stated formula for the survival probability. The second claim is another consequence of the formula (2.2):
(2.7)
\[ E_x\big[h^{Z_n}\,\big|\,Z_n(E) > 0\big] = K_n(x, E)^{-1}\frac{\int h(y)K_n(x, dy)}{1 + m_n - m_n\int h(y)\pi_n(dy)}. \]
3. Perron-Frobenius theorem
According to Theorem 11, the asymptotic behaviour of the LF-process is determined by the asymptotic behaviour of $M^n(x, A)$ as $n \to \infty$, which is the subject of this section. Here we analyse the growth rate of $M^n(x, A)$ in terms of the generating functions
\[ M^{(s)}(x, A) = \sum_{n=0}^{\infty}s^nM^n(x, A), \qquad K^{(s)}(x, A) = \sum_{n=0}^{\infty}s^nK^n(x, A), \]
where the kernel powers $K^n(x, A)$ are defined similarly to $M^n(x, A)$. A key tool in this analysis is the generating function
(3.1)
\[ f(s) = \sum_{n\ge1}d_ns^n, \qquad d_n = \int K^n(x, E)\pi(dx), \quad n \ge 1, \]
whose radius of convergence will be denoted by
\[ R_\infty := \inf\{s > 0 : f(s) = \infty\}. \]
Denote
\[ E_s = \{x \in E : K^{(s)}(x, E) < \infty\}, \qquad s \in (0,\infty). \]
From (3.1), we see that
(3.2)
\[ \int K^{(s)}(x, E)\pi(dx) = 1 + f(s), \]
which implies that $\pi(E_s) = 1$, provided $f(s) < \infty$.

Theorem 13. Suppose that $f(R_\infty-) \ge 1/m$, so that there is a unique $R \in (0,\infty)$ satisfying $mf(R) = 1$. Then
\[ u(x) = (1+m)\big(K^{(R)}(x, E) - 1\big)1_{\{x\in E_R\}}, \qquad \nu(A) = \frac{m}{1+m}\int K^{(R)}(y, A)\pi(dy), \]
are well-defined and satisfy
\[ \int u(x)\pi(dx) = \frac{1+m}{m}, \qquad \nu(E) = 1, \qquad \int u(x)\nu(dx) = mRf'(R). \]
Put $\rho = R^{-1}$. For $x \in E_R$, we have
\[ \int u(y)M(x, dy) = \rho u(x), \qquad \int M(y, A)\nu(dy) = \rho\nu(A). \]
Moreover, if $f'(R) < \infty$, then
\[ R^nM^n(x, A) \to \frac{u(x)\nu(A)}{mRf'(R)}, \qquad n \to \infty, \]
and if $f'(R) = \infty$, then $R^nM^n(x, A) \to 0$.
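For a finite type space the statements of Theorem 13 can be verified directly; the sketch below (two types, all numbers ad hoc) computes $R$ by bisection from $mf(R) = 1$ and checks the eigen-relation for $u$.

```python
m = 1.5
K = [[0.3, 0.2], [0.1, 0.4]]   # sub-stochastic kernel as a matrix
pi = [0.6, 0.4]

def mat_vec(A, v):
    return [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]

def f(s):
    # f(s) = sum_{n>=1} d_n s^n with d_n = pi K^n 1 (truncated series)
    total, v = 0.0, [1.0, 1.0]
    for n in range(1, 200):
        v = mat_vec(K, v)  # v = K^n 1
        total += (pi[0] * v[0] + pi[1] * v[1]) * s ** n
    return total

# bisection for m f(R) = 1 (the series converges on [0, 1.9] here)
lo, hi = 0.0, 1.9
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if m * f(mid) < 1.0 else (lo, mid)
R = 0.5 * (lo + hi)
rho = 1.0 / R

# u(x) = (1 + m)(K^{(R)}(x, E) - 1), with K^{(R)}(x, E) = sum_n R^n K^n 1
u, v = [0.0, 0.0], [1.0, 1.0]
for n in range(1, 200):
    v = mat_vec(K, v)
    u = [u[i] + R ** n * v[i] for i in range(2)]
u = [(1.0 + m) * ui for ui in u]

M = [[K[x][y] + sum(K[x]) * m * pi[y] for y in range(2)] for x in range(2)]
Mu = mat_vec(M, u)
assert all(abs(Mu[i] - rho * u[i]) < 1e-6 for i in range(2))
```

For this particular $K$ both row sums equal $0.5$, so $d_n = 0.5^n$ and one can check by hand that $R = 0.8$.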


Proof. All parts of this statement except the last one are checked by straightforward calculations. In particular, using (3.2), we obtain
\[ \int u(x)\pi(dx) = (1+m)f(R) = \frac{1+m}{m}, \]
and also
\[ \int u(x)\nu(dx) = m\sum_{n=1}^{\infty}R^n\iint K^n(x, E)K^{(R)}(y, dx)\pi(dy) = m\sum_{k=1}^{\infty}kd_kR^k = mRf'(R). \]
To prove the last part we show first that
(3.3)
\[ M^{(s)}(x, A) = K^{(s)}(x, A) + \big(K^{(s)}(x, E) - 1\big)m\int M^{(s)}(y, A)\pi(dy). \]

Indeed, using (1.2), we find
\[ M^n(x, A) = \int M^{n-1}(y, A)K(x, dy) + mK(x, E)\int M^{n-1}(y, A)\pi(dy). \]
Reiterating this relation we obtain
\[ M^n(x, A) = K^n(x, A) + m\sum_{i=0}^{n}K^i(x, E)\int M^{n-i}(y, A)\pi(dy) - m\int M^n(y, A)\pi(dy), \]
which leads to (3.3) as we go from the sequences to their generating functions.




By iterating (3.3) once we obtain
\[ M^{(s)}(x, A) = K^{(s)}(x, A) + \big(K^{(s)}(x, E) - 1\big)m\int K^{(s)}(y, A)\pi(dy) + \big(K^{(s)}(x, E) - 1\big)m^2f(s)\int M^{(s)}(y, A)\pi(dy). \]
Assuming $mf(s) < 1$ and reiterating we obtain
\[ M^{(s)}(x, A) = K^{(s)}(x, A) + \frac{m\big(K^{(s)}(x, E) - 1\big)}{1 - mf(s)}\int K^{(s)}(y, A)\pi(dy). \]
If we now apply Lemma 14 below with
\[ a(s) = mf(sR), \qquad b(s) = m\big(K^{(sR)}(x, E) - 1\big)\int K^{(sR)}(y, A)\pi(dy), \]
then using $a'(1) = mRf'(R)$, we derive
\[ R^n\big(M^n(x, A) - K^n(x, A)\big) \to \begin{cases} \dfrac{u(x)\nu(A)}{mRf'(R)}, & \text{if } f'(R) < \infty,\\[1mm] 0, & \text{if } f'(R) = \infty. \end{cases} \]
It remains to observe that $R^nK^n(x, A) \to 0$ as $n \to \infty$ for all $x \in E_R$ and $A \in \mathcal{E}$.

The next lemma is a Tauberian theorem taken from [5, Chapter XIII.10].
Lemma 14. Let $a(s) = \sum_{n=0}^{\infty}a_ns^n$ be a probability generating function and $b(s) = \sum_{n=0}^{\infty}b_ns^n$ a generating function for a non-negative sequence, so that $a(1) = 1$ while $b(1) \in (0,\infty)$. Then the non-negative sequence $(c_n)$ defined by $\sum_{n=0}^{\infty}c_ns^n = \frac{b(s)}{1-a(s)}$ is such that $c_n \to \frac{b(1)}{a'(1)}$ as $n \to \infty$.
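A minimal numerical illustration of Lemma 14 (coefficients chosen ad hoc): with $a(s) = (s + s^2)/2$ and $b(s) = 1$, the renewal-type recursion gives $c_n \to b(1)/a'(1) = 2/3$.

```python
# c_0 = b_0 and c_n = sum_{k>=1} a_k c_{n-k} + b_n for n >= 1; here
# b(s) = 1 (so b_n = 0 for n >= 1), a_1 = a_2 = 0.5, a(1) = 1, a'(1) = 1.5.
a_coeffs = [0.0, 0.5, 0.5]
c = [1.0]
for n in range(1, 60):
    cn = sum(a_coeffs[k] * c[n - k] for k in range(1, min(n, 2) + 1))
    c.append(cn)
assert abs(c[-1] - 2.0 / 3.0) < 1e-9
```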

4. Embedded Crump-Mode-Jagers process
A crucial feature of the LF-process is the existence of an embedded linear-fractional Crump-Mode-Jagers (CMJ) process described next, cf [15, Section 3]. All the key entities involved in the Perron-Frobenius Theorem 13 have a transparent probabilistic meaning in terms of this CMJ-process. Recall that a single-type CMJ-process models an asexual population with overlapping generations, see [9]. The CMJ-model is described in terms of individuals rather than particles, since a CMJ-population is set out in the real time framework, in contrast to the generation-wise setting for the Bienaymé-Galton-Watson process.
With a given triplet $(K, \pi, m)$, consider the LF-process stemming from a particle whose type is chosen by the distribution $\pi(dx)$. The key idea for embedding a CMJ-process is to define an ancestral individual whose life history represents the evolution of the first offspring lineage for the ancestral particle. By the first offspring lineage we mean the sequence of descendants consisting of the first child, first grandchild, and so on, until this lineage halts by a particle having no children. The thus defined ancestral individual's life length $L$ has the tail probabilities $P(L > n) = d_n$ with the tail generating function (3.1). In particular, we get
\[ P(L \ge 1) = 1, \qquad EL = 1 + f(1). \]
All direct offspring particles in this lineage are treated as the originators of the offspring individuals to the ancestral one. As a result, the ancestral individual produces random numbers of offspring at times $1, \ldots, L-1$. The corresponding litter sizes are mutually independent and have the same geometric distribution with mean $m$. The newborn individuals live independently according to the same life law as their mothers. The thus defined CMJ-process has population size at time $n$ coinciding with the generation size $Z_n(E)$ of the LF-process with parameters $(K, \pi, m)$ starting from a particle whose type has distribution $\pi(dx)$.
This single-type CMJ-process conceals the information on the types of the particles. To recover this information we introduce additional labelling of individuals using the types of the underlying particles. The evolution of a labeled individual over the type space $E$ can be described by a Markov chain whose state space $\hat E = E \cup \{\partial\}$ is the particle type space $E$ augmented with a graveyard state $\partial$. The transition probabilities of such a chain are given by a stochastic kernel $\hat K$ extending $K$ such that $\hat K(x, \{\partial\}) = 1 - K(x, E)$ for $x \in E$, and $\hat K(\partial, \{\partial\}) = 1$. In terms of this Markov chain, the life length $L$ is the time until absorption at the graveyard, given the initial state distribution $\pi(dx)$ supported by $E$.
Turning to Theorem 13, we see that the Perron-Frobenius root $\rho$ is defined by the equation $f(1/\rho) = 1/m$, which puts together two ingredients of the CMJ-individual reproduction law: its life-length tail generating function $f$ and the mean offspring number per unit of time $m$. Clearly, the larger is the life-length $L$ and the larger is $m$, the larger gets the population growth rate $\rho$. Note that in general, the growth rate is not given by the mean value $\mu = mf(1)$ for the number of offspring produced by a CMJ-individual during its whole life.
In the age-dependent setting the population growth rate is usually described by the so-called Malthusian parameter $\alpha$. Using (7) from [9] one can compute $\alpha$ from the equation
\[ \sum_{n=1}^{\infty}e^{-\alpha n}a_n = 1, \]
where $a_n$ stands for the mean number of offspring produced by the ancestral individual at time $n$. Since $a_n = md_n$, this equation takes the form $mf(e^{-\alpha}) = 1$, and we conclude that, provided the Malthusian parameter exists, we have
\[ \alpha = \ln\rho = -\ln R. \]
For a given $\alpha$, the mean age at childbearing $\beta$, see [6] and [9], corresponding to the average generation length, in our case is computed as
\[ \beta = m\sum_{n=1}^{\infty}nd_ne^{-\alpha n} = mRf'(R), \]
so that $\beta$ is either finite or infinite depending on whether $f'(R)$ is finite or not.


Proof of Theorem 11. The embedded CMJ-process generates random trees having tree contours of simple structure. As one goes around a CMJ-tree, one performs an alternating random walk, where an instantaneous upward jump, whose size is distributed as the life length $L$, is followed by a geometric number of downward unit-jumps until one hits the nearest branch turning the random walk upwards, see [15, Section 3] for details.
In terms of such a contour process around the tree truncated at the observation level $n$, the LF-process size at time $n$ is given by the number of the alternating random walk excursions at the level $n$. The number of such excursions once the level $n$ is reached is geometric and independent of the ancestral type. This implies the linear-fractional nature of the distribution of the random measure $Z_n(dx)$, see [15, Section 4] for details.
Having established the stated linear-fractional distribution property using the contour process argument, it remains to verify that the relations (2.3)-(2.5) indeed specify the triplet defining the $n$-th generation distribution. The key relation (2.4) relies on the spinal representation trick explained in Section 7.1 of [15]. The expression (2.3) for $m_n$ is a straightforward corollary of (2.4), while (2.5) is obtained from the analog of (1.2)
\[ M^n(x, A) = K_n(x, A) + K_n(x, E)m_n\pi_n(A), \]
according to which
\[ M^n(x, E) = K_n(x, E)(1 + m_n). \]

5. Positive recurrence over the type space
According to Theorem 13, if $f'(R) < \infty$ (equivalently, $\beta < \infty$), then we can distinguish among three familiar reproduction regimes for the branching process:
- subcritical LF-process: if $mf(1) < 1$, equivalently $\mu < 1$, $R > 1$, or $\alpha < 0$, then the expected generation size decreases exponentially as $n \to \infty$;
- critical LF-process: if $mf(1) = 1$, equivalently $\mu = 1$, $R = 1$, or $\alpha = 0$, then the expected generation size measure stabilises;
- supercritical LF-process: if $mf(1) > 1$, equivalently $\mu > 1$, $R < 1$, or $\alpha > 0$, then the expected generation size increases exponentially as $n \to \infty$.
Recall that $R$ is defined by $mf(R) = 1$ provided $f(R_\infty-) \ge 1/m$. We extend this definition by putting $R = R_\infty$ in the case $f(R_\infty-) < 1/m$. The next lemma gives another perspective on the meaning of the parameter $R$.
Lemma 15. The power series $M^{(s)}(x, A)$ have the same radius of convergence $R$ for all $A \in \mathcal{E}$ and $\pi$-almost every $x \in E$.
Proof. Integrating (3.3) with respect to the measure $\pi$ and using (3.2), we find
\[ \int M^{(s)}(x, A)\pi(dx) = \int K^{(s)}(x, A)\pi(dx) + mf(s)\int M^{(s)}(y, A)\pi(dy). \]
It follows that if $mf(s) < 1$, then
(5.1)
\[ \int M^{(s)}(x, A)\pi(dx) = \frac{\int K^{(s)}(x, A)\pi(dx)}{1 - mf(s)}, \]
implying $\int M^{(s)}(x, A)\pi(dx) < \infty$. On the other hand, if $mf(s) \ge 1$, then $\int M^{(s)}(x, A)\pi(dx) = \infty$ for all $A \in \mathcal{E}$. Turning to the definition of $R$, we conclude that the statement is true.

Using the terminology of the theory of general Markov chains and irreducible kernels [14, Ch 3.3], our Lemma 15 implies that $R$ is the convergence parameter of the kernel $M(x, dy)$. Furthermore, we see that if $f(R_\infty-) \ge 1/m$, then $M(x, dy)$ is $R$-recurrent, while if $f(R_\infty-) < 1/m$, then $M(x, dy)$ is $R$-transient.
The $R$-recurrent case is further split in two sub-cases. According to Theorem 13, the $R$-recurrent kernel $M(x, dy)$ is $R$-null recurrent if $f'(R) = \infty$, and $R$-positive recurrent if $f'(R) < \infty$, cf [14, Ch 5]. Observe also that the function $u(x)$ and measure $\nu(dy)$ introduced in Theorem 13 are an invariant function and an invariant measure for the kernel $M(x, dy)$.
The theory of irreducible kernels is built around the so-called minorisation condition. It turns out that the key relation (1.2) of our model automatically produces a relevant minorisation condition
\[ M(x, A) \ge mK(x, E)\pi(A). \]
In this context the pair $(K(x, E), \pi(dy))$ is called an atom for the kernel $M(x, dy)$. The existence of an atom allows one to construct an embedded renewal process [14, Ch 4] and carry over most of the results from the theory of countable matrices to the general state space. The approach of this paper allows, for the kernels satisfying (1.2), to circumvent the use of such general theory for obtaining the Perron-Frobenius type theorem.
In the next section we derive three limit theorems for the LF-processes in the $R$-positively recurrent case.
General linear-fractional branching processes


6. Basic limit theorems for the LF-processes

Combining Theorems 11 and 13, we establish three propositions stated under the following common assumption:

(6.1)   R ∈ (0, ∞),   x ∈ E_R,   f′(R) < ∞.

These propositions are basic asymptotic results for the general LF-processes, extending similar statements for the countably infinite E in [15, Section 6].
Proposition 16. Assume (6.1) and let ρ < 1. Then, as n → ∞,

(6.2)   P_x(Z_n(E) > 0) ∼ ((1 − m f(1)) / (1 + m)) ρ^n u(x).

Furthermore, conditionally on the ancestral type x and the survival event {Z_n(E) > 0}, the distribution of Z_n converges to lf(π̂, m̂), where

m̂ = m(1 + f(1)) / (1 − m f(1)),

π̂(dy) = (1 + f(1))^{−1} ∫ K^{(1)}(x, dy) ν(dx) = (m / (1 − m f(1))) ∫ (K^{(R)}(x, dy) − K^{(1)}(x, dy)) ν(dx).

Proof. From (2.3) and (5.1) we obtain

m_n → m(1 + f(1)) / (1 − m f(1)),

which together with (2.6) implies (6.2).

The statement on the convergence of the conditional distribution of Z_n follows from (2.7). Indeed, by (2.4), we have

(m / m_n) ∫ M^{(1)}(x, A) ν(dx) → ∫ K^{(1)}(x, A) ν(dx) / (1 + f(1)) = π̂(A).

On the other hand, using (2.5), we find

K_n(x, A) ∼ (m(1 + f(1)) π̂(A) / (1 + m)) ρ^n u(x),

yielding

K_n(x, A) / K_n(x, E) → π̂(A) = (m / (1 − m f(1))) ∫ (K^{(R)}(x, A) − K^{(1)}(x, A)) ν(dx). □


Proposition 17. Assume (6.1) and let ρ = 1. Then we have

P_x(Z_n(E) > 0) ∼ n^{−1}(1 + m)^{−1} u(x).

For any measurable probe function w with ∫ w(y) π(dy) ∈ (0, ∞), and for any x ≥ 0, we have

P_x( ∫ w(y) Z_n(dy) / ∫ w(y) π(dy) > nx | Z_n(E) > 0 ) → e^{−x/(1+m)}.

In other words, conditionally on non-extinction, n^{−1} Z_n(dy) weakly converges to X π(dy), where X is exponentially distributed with mean 1 + m.
Proof. Lemma 14 and relations (2.3), (5.1) imply that in the critical case

m_n ∼ n(1 + m).

Moreover, by (2.4) and (2.5) we get

(m / m_n) ∑_{k=0}^{n} ∫ M^k(x, A) ν(dx) → π(A),   K_n(x, E) = M^n(x, E) / (1 + m_n) ∼ u(x) / (n(1 + m)).

Thus, (2.6) gives the stated asymptotics for the survival probability, and the weak convergence follows from the next corollary of (2.7):

E_x[ e^{−n^{−1} ∫ w(y) Z_n(dy)} | Z_n(E) > 0 ] = (1 − ∫ (1 − e^{−w(y)/n}) π_n(dy)) / (1 + m_n ∫ (1 − e^{−w(y)/n}) π_n(dy)) → 1 / (1 + I_w),

where I_w = (1 + m) ∫ w(y) π(dy). □

Proposition 18. Assume (6.1), let ρ > 1, and put c = (ρ − 1)/(1 + m). Then

P_x(Z_n(E) > 0) → c u(x).

For any measurable function w : E → [0, ∞) with ∫ w(y) π(dy) ∈ (0, ∞), and for any x ≥ 0, we have

P_x( ρ^{−n} ∫ w(y) Z_n(dy) / ∫ w(y) π(dy) > x | Z_n(E) > 0 ) → e^{−xc}.

Proof. From (2.3), (3.2), and (5.1) we see that

∑_{n=1}^∞ m_n s^{n−1} = m(1 + f(s)) / ((1 − m f(s))(1 − s)).

Thus, Lemma 14 with c_n = R^n m_n, a(s) = m f(Rs), and b(s) = m(1 + f(sR)) / (1 − sR) entails

m_n ρ^{−n} → (1 + m)(ρ − 1)^{−1}.

This together with (2.6) gives the stated formula for the survival probability. The assertion on weak convergence is proved in a similar way as in the critical case above. □

Bibliography
[1] Alsmeyer, G. (1994) On the Markov renewal theorem. Stoch. Proc. Appl. 50, 37–56.
[2] Athreya, K. and Kang, H. (1998) Some limit theorems for positive recurrent Markov chains I and II. Adv. in Appl. Probab. 30, 693–722.
[3] Athreya, K. and Ney, P. (1972) Branching Processes, John Wiley & Sons, London–New York–Sydney.
[4] Curtiss, J. H. (1942) A note on the theory of moment generating functions. Ann. Math. Stat. 13, 430–433.
[5] Feller, W. (1959) An Introduction to Probability Theory and Its Applications, Vol. I, 2nd ed. John Wiley & Sons, London–New York–Sydney.
[6] Jagers, P. (1975) Branching Processes with Biological Applications, Wiley, New York.
[7] Jagers, P. (1989) General branching processes as Markov fields. Stoch. Proc. Appl. 32, 183–212.
[8] Jagers, P. and Nerman, O. (1996) The asymptotic composition of supercritical multi-type branching populations. Springer Lecture Notes in Mathematics 1626, 40–54.
[9] Jagers, P. and Sagitov, S. (2008) General branching processes in discrete time as random trees. Bernoulli 14, 949–962.
[10] Harris, T. E. (1963) The Theory of Branching Processes, Springer, Berlin.
[11] Liemant, A., Matthes, K. and Wakolbinger, A. (1988) Equilibrium Distributions of Branching Processes (Vol. 34), Akademie-Verlag.
[12] Nerman, O. (1984) The growth and composition of supercritical branching populations on general type spaces. Preprint 1984-4, Dep. Mathematics, Chalmers U. Tech. and Gothenburg U.
[13] Meyn, S. and Tweedie, R. L. (2010) Markov Chains and Stochastic Stability, 2nd ed., Cambridge University Press, London.
[14] Nummelin, E. (1984) General Irreducible Markov Chains and Non-negative Operators, Cambridge University Press, London.
[15] Sagitov, S. (2013) Linear-fractional branching processes with countably many types. Stoch. Proc. Appl. 123, 2940–2956.
[16] Sevastianov, B. A. (1971) Branching Processes, Nauka, Moscow (in Russian).


PAPER III

Nonparametric estimation of infinitely divisible distributions based on variational analysis on measures

Alexey Lindo, Sergei Zuyev and Serik Sagitov

preprint.

Abstract. The paper develops new methods of non-parametric estimation of a compound Poisson distribution. Such a problem arises, in particular, in the inference of a Lévy process recorded at equidistant time intervals. Our key estimator is based on a series decomposition of functionals of a measure and relies on the steepest descent technique recently developed in the variational analysis of measures. Simulation studies demonstrate the applicability domain of our methods and how they compare favourably with and complement the existing techniques. They are particularly suited for discrete compounding distributions, not necessarily concentrated on a grid nor on the positive or negative semi-axis. They also give good results for continuous distributions, provided an appropriate smoothing is used for the obtained atomic measure.

1. Introduction

The paper develops new methods of non-parametric estimation of the distribution of compound Poisson data. Such data naturally arise in the inference of a Lévy process, which is a stochastic process (W_t)_{t≥0} with W_0 = 0 and time homogeneous independent increments. Its characteristic function necessarily has the form E e^{iθW_t} = e^{tψ(θ)} with

(1.1)   ψ(θ) = iaθ − σ²θ²/2 + ∫ (e^{iθx} − 1 − iθx 1I_{|x|<ε}) ν(dx),

where ε > 0 is a fixed positive number, a ∈ R is a drift parameter, σ² ∈ [0, ∞) is the variance of the Brownian motion component, and the so-called Lévy measure ν satisfies

(1.2)   ν({0}) = 0,   ∫ min{1, x²} ν(dx) < ∞.

Here and below the integrals are taken over the whole R unless specified otherwise. In the special case with σ = 0 and ∫_{(−ε,ε)} |x| ν(dx) < ∞, we get a pure jump Lévy process characterised by

(1.3)   ψ(θ) = ∫ (e^{iθx} − 1) ν(dx),

or, equivalently,

ψ(θ) = iaθ + ∫ (e^{iθx} − 1 − iθx 1I_{|x|<ε}) ν(dx),   a = ∫_{(−ε,ε)} x ν(dx).

In an even more restrictive case with a finite total mass ∥ν∥ := ν(R), the Lévy process becomes a compound Poisson process, with the times of jumps forming a Poisson process with intensity ∥ν∥, and the jump sizes being independent random variables with the distribution ∥ν∥^{−1} ν(dx). Details can be found, for instance, in [22].
Suppose the Lévy process is observed at regularly spaced times, producing a random vector (W_0, W_h, W_{2h}, …, W_{nh}) for some time step h > 0. The consecutive increments X_i = W_{ih} − W_{(i−1)h} then form a vector (X_1, …, X_n) of independent random variables having a common infinitely divisible distribution with the characteristic function φ(θ) = e^{hψ(θ)}, and thus can be used to estimate the distributional triplet (a, σ, ν) of the process. Such an inference problem naturally arises in financial mathematics [8], queueing theory [1], insurance [17] and in many other situations where Lévy processes are used.
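To make the sampling scheme concrete, the increments of a pure compound Poisson process (the case a = 0, σ = 0 with a finite ∥ν∥) can be simulated in a few lines; the helper names and the textbook Poisson sampler below are ours and not part of the paper's implementation.

```python
import math
import random

def poisson_sample(lam, rng):
    # product-of-uniforms (Knuth) sampler; adequate for small lam
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def increments(h, atoms, n, rng):
    # n increments X_i = W_{ih} - W_{(i-1)h} of a compound Poisson process
    # whose Levy measure is the atomic measure given by (x, mass) pairs
    total = sum(mass for _, mass in atoms)
    xs = [x for x, _ in atoms]
    ws = [mass / total for _, mass in atoms]
    out = []
    for _ in range(n):
        jumps = poisson_sample(h * total, rng)   # number of jumps in one step
        out.append(sum(rng.choices(xs, ws)[0] for _ in range(jumps)))
    return out

# Poisson model fitted to the horse kick data below: nu = 0.61 * delta_1, h = 1
rng = random.Random(1)
sample = increments(1.0, [(1.0, 0.61)], 5000, rng)
```

With a single atom at 1, each increment is a plain Poisson count with mean h∥ν∥ = 0.61.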
By the Lévy-Itô representation theorem [22], every Lévy process is a superposition of a Brownian motion with drift and a square integrable pure jump martingale. The latter can be further decomposed into a pure jump martingale with the jumps not exceeding in absolute value a positive constant ε, and a compound Poisson process with jumps of size ε or above. In practice, only a finite increment sample (X_1, …, X_n) is available, so there is no way to distinguish between the small jumps and the Brownian continuous part. Therefore one usually chooses a threshold level ε > 0 and attributes all the small jumps to the Brownian component, while the large jumps are attributed to the compound Poisson process component (see, e.g., [2] for an account of the subtleties involved). Provided an estimation of the continuous and the small jump part is done, it remains to estimate the part of the Lévy measure outside of the interval (−ε, ε). Since this corresponds to the compound Poisson case, estimation of such a ν is usually called decompounding, which is the main object of study in this paper.
Previously developed methods include the discrete decompounding approach based on the inversion of Panjer recursions, as proposed in [5]. [7], [10] and [12] studied the continuous decompounding problem when the measure ν is assumed to have a density. They apply Fourier inversion in combination with kernel smoothing techniques for estimating an unknown density of the Lévy measure. In contrast, we do not distinguish between discrete and continuous ν, in that our algorithms, based on direct optimisation of functionals of a measure, work for both situations on a discretised phase space of ν. However, if one sees many small atoms appearing in the solution which fill a thin grid, this may indicate that the true measure is absolutely continuous and some kind of smoothing should yield its density.

Specifically, we propose a combination of two non-parametric methods for estimation of the Lévy measure which we call Characteristic Function Fitting (ChF) and Convolution Fitting (CoF). ChF deals with a general class of Lévy processes, while CoF more specifically targets the pure jump Lévy processes characterised by (1.3).
The most straightforward approach is to use moment fitting, see [14] and [9], or the empirical distribution function

F_n(x) = (1/n) ∑_{k=1}^n 1I{X_k ≤ x}

to infer about the triplet (a, σ, ν). Estimates can be obtained by maximising the likelihood ratio (see, e.g., [24]) or by minimising some measure of proximity between F(x) and F_n(x), where the dependence on (a, σ, ν) comes through F via the inversion formula of the characteristic function:

F(x) − F(x − 0) = lim_{y→∞} (1/(2y)) ∫_{−y}^{y} exp{hψ(θ) − iθx} dθ.

For the estimation, the characteristic function in the integral above is replaced by the empirical characteristic function:

φ̂_n(θ) = (1/n) ∑_{k=1}^n e^{iθX_k}.
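In code, the empirical characteristic function is immediate; the helper below (our naming, not the paper's mesop package) evaluates φ̂_n on a grid of θ values.

```python
import cmath
import math

def ecf(sample, thetas):
    # empirical characteristic function: (1/n) * sum_k exp(i * theta * X_k)
    n = len(sample)
    return [sum(cmath.exp(1j * t * x) for x in sample) / n for t in thetas]

data = [1.0, -1.0, 2.0, 0.5]
vals = ecf(data, [0.0, 0.7])
# by definition, the real/imaginary parts at theta are the averages of
# cos(theta * X_k) and sin(theta * X_k)
expected_re = sum(math.cos(0.7 * x) for x in data) / len(data)
expected_im = sum(math.sin(0.7 * x) for x in data) / len(data)
```

Its real and imaginary parts are exactly the averages of cosines and sines used by the ChF loss in Section 3.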

Algorithms based on the inversion of the empirical characteristic function and on the relation between its derivatives were proposed in [25]. For a comparison between different estimation methods, see the recent survey [23]. Note that inversion of the empirical characteristic function, in contrast to the inversion of its theoretical counterpart, generally leads to a complex valued measure which needs to be dealt with.

Instead, equipped with the new theoretical and numerical optimisation methods developed recently for functionals of measures (see [20] and the references therein), we use the empirical characteristic function directly: the ChF estimator for the compounding measure ν or, more generally, of the whole triplet
In contrast, the proposed CoF estimation method does not use the empirical characteristic function; it is based on Theorem 20 below, which represents the convolution

F^{∗2}(y) = ∫ F(y − u) dF(u)

as a functional of ν. It has an explicit form of an infinite Taylor series in direct products of ν, but truncating it to only the first k terms we build a loss function L^{(k)}_{CoF} by comparing two estimates of F^{∗2}: the one based on the truncated series and the other being the empirical convolution F̄_n^{∗2}. CoF is able to produce nearly optimal estimates ν̂_k when large values of k are taken, but this also drastically increases the computation time.

A practical combination of these methods recommended by this paper is to find ν̂_k using CoF with a low value of k, and then apply ChF with ν̂_k as the starting value. The estimate for such a two-step procedure will be denoted by ν̃_k in the sequel.

To give an early impression of our approach, let us demonstrate the performance of our methods on the famous data by Ladislaus Bortkiewicz, who collected the numbers of Prussian soldiers killed by a horse kick in 10 cavalry corps over a 20 year period [4]. The counts 0, 1, 2, 3, and 4 were observed 109, 65, 22, 3 and 1 times, with 0.6100 deaths per year and cavalry unit. The author argues that the data are Poisson distributed, which corresponds to the measure ν concentrated on the point {1} (only jumps of size 1), with the mass being the parameter of the Poisson distribution, which is then estimated by the sample mean 0.61. The left panel of Figure 2 presents the estimated Lévy measures for the cutoff values k = 1, 2, 3 when using the CoF method. For the values k = 1, 2, the result is a measure having many atoms. This is explained by the fact that the accuracy of the convolution approximation is not sufficient for this data, but k = 3 already results in a measure ν̂_3 essentially concentrated at {1}, thus supporting the Poisson model with parameter ∥ν̂_3∥ = 0.6098. In Section 6 we return to this example and explain why the choice of k = 3 is reasonable. We observed that the convergence of the ChF method depends critically on the choice of the initial measure, especially on its total mass. However, the right plot demonstrates that the proposed two-step (and faster) procedure of CoF followed by ChF results in the estimate ν̃_1 which is as good as ν̂_3.

The rest of the paper has the following structure. Section 2 introduces the theoretical basis of our approach, a constrained optimisation technique in the space of measures. In Section 3 we perform analytic calculations of the gradients of the loss functionals needed for the implementation of ChF. Section 4 develops the necessary ingredients for the CoF method and proves the main analytical result of the paper, Theorem 20. In Section 5 we give some details on the implementation of our algorithms in the R language. Section 6 contains a broad range of simulation results illustrating the performance of our algorithms. We conclude with Section 7, where we summarise our approach and give some practical recommendations.

Figure 2. The Bortkiewicz horse kick data. Left panel: comparison of the CoF estimates ν̂_k for k = 1, 2, 3 (∥ν̂_1∥ = 0.4589, ∥ν̂_2∥ = 0.5881, ∥ν̂_3∥ = 0.6099). Right panel: comparison of the estimate ν̂_3 by CoF with k = 3 and the estimate ν̃_1 obtained by CoF with k = 1 followed by ChF (∥ν̃_1∥ = 0.6109).
2. Optimisation in the space of measures

In this section we briefly present the main ingredients of the constrained optimisation of functionals of a measure. Further details can be found in [18] and [19].

In this paper we are dealing with measures defined on the Borel subsets of R. Recall that any signed measure μ can be represented in terms of its Jordan decomposition: μ = μ₊ − μ₋, where μ₊ and μ₋ are orthogonal non-negative measures. The total variation norm is then defined to be ∥μ∥ = μ₊(R) + μ₋(R). Denote by M and M⁺ the classes of signed, respectively, non-negative measures with a finite total variation. The set M then becomes a Banach space with the sum and the multiplication by real numbers defined set-wise: (μ₁ + μ₂)(B) := μ₁(B) + μ₂(B) and (tμ)(B) := t μ(B) for any Borel set B and any real t. The set M⁺ is a pointed cone in M, meaning that the zero measure is in M⁺ and that μ₁ + μ₂ ∈ M⁺ and tμ ∈ M⁺ as long as μ₁, μ₂, μ ∈ M⁺ and t ≥ 0.
A functional G : M ↦ R is called Fréchet or strongly differentiable at μ ∈ M if there exists a bounded linear operator (a differential) DG(μ)[·] : M ↦ R such that

(2.1)   G(μ + η) − G(μ) = DG(μ)[η] + o(∥η∥),   η ∈ M.

If for a given μ ∈ M there exists a bounded function ∇G(·; μ) : R → R such that

DG(μ)[η] = ∫ ∇G(x; μ) η(dx)   for all η ∈ M,

then such ∇G(x; μ) is called the gradient function for G at μ. Typically, and it is the case for the functionals of a measure we consider here, the gradient function does exist, so that the differentials do have an integral form.

For example, an integral of a bounded function, G(μ) = ∫ f(x) μ(dx), is already a bounded linear functional of μ, so that ∇G(x; μ) = f(x) for any μ. More generally, for a composition G(μ) = v(∫ f(x) μ(dx)), where v is a differentiable function, the gradient function can be obtained by the chain rule:

(2.2)   ∇G(x; μ) = v′(∫ f(y) μ(dy)) f(x).

The functional G for this example is strongly differentiable if both v′ and f are bounded.
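A quick numerical illustration of the chain rule (2.2): encoding finite atomic measures as lists of (atom, mass) pairs (a toy encoding of ours), the increment G(μ + η) − G(μ) should agree with ∫ ∇G(x; μ) η(dx) up to o(∥η∥).

```python
import math

def integrate(f, mu):
    # ∫ f dμ for an atomic measure mu = [(atom, mass), ...]
    return sum(mass * f(x) for x, mass in mu)

def G(mu):
    # G(mu) = v(∫ f dmu) with v(t) = t^2 and f = cos
    return integrate(math.cos, mu) ** 2

def grad_G(x, mu):
    # gradient function (2.2): v'(∫ f dmu) * f(x), with v'(t) = 2t
    return 2 * integrate(math.cos, mu) * math.cos(x)

mu = [(0.5, 1.0), (1.5, 0.3)]
eta = [(0.5, 0.01), (2.0, -0.02)]            # a small signed perturbation
linear = sum(mass * grad_G(x, mu) for x, mass in eta)
increment = G(mu + eta) - G(mu)              # list concatenation adds measures
```

For this quadratic v the mismatch between the increment and its linear approximation is exactly of second order in ∥η∥.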
The estimation methods we develop here are based on the minimisation of various loss functions over the class of possible Lévy measures with a finite mass. Specifically, we consider minimisation of a strongly differentiable functional

(2.3)   L(ν) → min   subject to ν ∈ M⁺, H(ν) ∈ C,

where the last constraint singles out the set of Lévy measures, i.e. the measures satisfying (1.2). This corresponds to taking C = {0} × R, a cone in R², and

(2.4)   H(ν) = ( ν({0}), ∫ min{1, x²} ν(dx) ).
Theorem 19. Suppose L : M → R is strongly differentiable at a positive finite measure ν satisfying (1.2) and possesses a gradient function ∇L(x; ν). If such a ν provides a local minimum of L over M⁺ ∩ H^{−1}(C), then

(2.5)   ∇L(x; ν) ≥ 0 for all x ∈ R \ {0},   and   ∇L(x; ν) = 0 ν-a.e.

Proof. First-order necessary criteria for constrained optimisation in a Banach space can be derived in terms of tangent cones. Let A be a subset of M and μ ∈ A. The tangent cone to A at μ is the following subset of M:

T_A(μ) = lim inf_{t↓0} t^{−1}(A − μ).

Recall that lim inf_n A_n for a family of subsets (A_n) in a normed space is the set of the limits of all converging sequences {a_n} such that a_n ∈ A_n for all n. Equivalently, T_A(μ) is the closure of the set of such η ∈ M for which there exists an ε = ε(η) > 0 such that μ + tη ∈ A for all 0 ≤ t ≤ ε.

By the definition of the tangent cone, if μ is a point of minimum of a strongly differentiable function G over a set A, then one must have

(2.6)   DG(μ)[η] ≥ 0   for all η ∈ T_A(μ).

Indeed, assume that there exists η ∈ T_A(μ) such that DG(μ)[η] := −δ < 0. Then there is a sequence of positive numbers t_n ↓ 0 and a sequence μ_n ∈ A such that η = lim_n t_n^{−1}(μ_n − μ), implying μ_n → μ because ∥μ − μ_n∥ = t_n(1 + o(1))∥η∥ → 0. Since any bounded linear operator is continuous, we also have

DG(μ)[η] = DG(μ)[lim_n t_n^{−1}(μ_n − μ)] = lim_n t_n^{−1} DG(μ)[μ_n − μ] = −δ.

Furthermore, by (2.1),

DG(μ)[μ_n − μ] = G(μ_n) − G(μ) + o(∥μ_n − μ∥),

thus

G(μ_n) − G(μ) = −δ t_n (1 + o(1)) < −δ t_n / 2

for all sufficiently small t_n. Thus in any ball around μ there is a μ_n ∈ A such that G(μ_n) < G(μ), so that μ is not a point of a local minimum of G over A.

The next step is to find a sufficiently rich class of measures belonging to the tangent cone to the set 𝓛 := M⁺ ∩ H^{−1}(C) of all possible Lévy measures. For this, notice that for any ν ∈ 𝓛 the Dirac measure δ_x belongs to T_𝓛(ν), since ν + tδ_x ∈ 𝓛 for any t ≥ 0 as soon as x ≠ 0. Similarly, given any Borel B ⊆ R, the negative measure η = −ν|_B, where ν|_B = ν(· ∩ B) is the restriction of ν onto B, is also in the tangent cone T_𝓛(ν), because for any 0 ≤ t ≤ 1 we have ν − t ν|_B ∈ 𝓛.

Since ∇G(x; ν) is a gradient function, the necessary condition (2.6) becomes

∫ ∇G(x; ν) η(dx) ≥ 0   for all η ∈ T_𝓛(ν),

and substituting η = δ_x above we immediately obtain the inequality in (2.5). Finally, taking η = −ν|_B yields

∫_B ∇G(x; ν) ν(dx) ≤ 0.

Since this is true for any Borel B, then ∇G(x; ν) ≤ 0 ν-almost everywhere which, combined with the previous inequality, gives the second identity in (2.5). □

A rich class of functionals of a measure is represented by the expectations of functionals of a Poisson process.
Let N be the space of locally finite counting measures φ on a Polish space X, which will be a subset of a Euclidean space in this paper. Let 𝔑 be the smallest σ-algebra which makes all the mappings φ ↦ φ(B) ∈ Z⁺ measurable for φ ∈ N and compact sets B. A Poisson point process Π with the intensity measure μ is a measurable mapping from some probability space into [N, 𝔑] such that for any finite family of disjoint compact sets B₁, …, B_k, the random variables Π(B₁), …, Π(B_k) are independent, with each Π(B_i) following the Poisson distribution with parameter μ(B_i). We use the notation Π ∼ PPP(μ). From the definition, EΠ(B) = μ(B); this is why the parameter measure of a Poisson process is indeed the intensity measure of this point process. To emphasise the dependence of the distribution on μ, we write the expectation as E_μ in the sequel.

Consider a measurable function G : N → R and define the difference operator

D_z G(φ) := G(φ + δ_z) − G(φ),   φ ∈ N.

For the iterations of the difference operator,

D_{z₁,…,z_n} G = D_{z_n}(D_{z₁,…,z_{n−1}} G),

and every tuple of points (z₁, …, z_n) ∈ X^n, it can be checked that

D_{z₁,…,z_n} G(φ) = ∑_{J ⊆ {1,2,…,n}} (−1)^{n−|J|} G(φ + ∑_{j∈J} δ_{z_j}),

where |J| stands for the cardinality of J. Define

T_μ G(z₁, …, z_n) := E_μ D_{z₁,…,z_n} G(Π).

Suppose that the functional G is such that there exists a constant c > 0 satisfying

|G(∑_{j=1}^n δ_{z_j})| ≤ c^n

for all n ≥ 1 and all (z₁, …, z_n). It was proved in [19, Theorem 2.1] that, in the case of finite measures μ and η, if the expectation E_{μ+η} G(Π) exists, then

(2.7)   E_{μ+η} G(Π) = E_μ G(Π) + ∑_{n=1}^∞ (1/n!) ∫_{X^n} T_μ G(z₁, …, z_n) η(dz₁) … η(dz_n).
Generalisations of this formula to infinite and signed measures η for square integrable functionals can be found in [16]. A finite order expansion formula can be obtained by representing the expectation above in the form E_{μ+η} G(Π) = E_η E_μ[G(Π + Π′) | Π′], where Π and Π′ are independent Poisson processes with the intensity measures μ and η, respectively, and then applying the moment expansion formula of [3, Theorem 3.1] to G(Π + Π′) viewed as a functional of Π′ with a given Π. This will give

(2.8)   E_{μ+η} G(Π) = E_μ G(Π) + ∑_{n=1}^k (1/n!) ∫_{X^n} T_μ G(z₁, …, z_n) η(dz₁) … η(dz_n) + (1/(k+1)!) ∫_{X^{k+1}} T_{μ+η} G(z₁, …, z_{k+1}) η(dz₁) … η(dz_{k+1}).
3. Gradients of the ChF loss function

The ChF method of estimating the compounding distribution ν or, more generally, the triplet (a, σ, ν) of the infinitely divisible distribution, is based on fitting the empirical characteristic function. The corresponding loss function L_{ChF} is given by (1.4). It is everywhere differentiable in the usual sense with respect to the parameters a, σ, and in the Fréchet sense with respect to the measure ν. Aiming at the steepest descent gradient method for obtaining its minimum, we compute in this section the gradients of L_{ChF} in terms of the following functions:

q₁(θ, x) := cos(θx) − 1,   Q₁(θ, ν) := ∫ q₁(θ, x) ν(dx);
q₂(θ, x) := sin(θx) − θx 1I_{|x|<ε},   Q₂(θ, a, ν) := aθ + ∫ q₂(θ, x) ν(dx).

Using this notation, the real and imaginary parts of an infinitely divisible characteristic function φ = φ₁ + iφ₂ can be written down as

φ₁(θ; a, σ, ν) = e^{h{Q₁(θ,ν) − σ²θ²/2}} cos{hQ₂(θ, a, ν)},
φ₂(θ; a, σ, ν) = e^{h{Q₁(θ,ν) − σ²θ²/2}} sin{hQ₂(θ, a, ν)}.

After noticing that φ̂_n = φ̂_{n,1} + i φ̂_{n,2}, with

φ̂_{n,1}(θ) = (1/n) ∑_{j=1}^n cos(θX_j),   φ̂_{n,2}(θ) = (1/n) ∑_{j=1}^n sin(θX_j),

the loss functional L_{ChF} can be written as

L_{ChF}(a, σ, ν) = ∫ {φ₁(θ; a, σ, ν) − φ̂_{n,1}(θ)}² ω(θ) dθ + ∫ {φ₂(θ; a, σ, ν) − φ̂_{n,2}(θ)}² ω(θ) dθ,

where ω is a weight function. From this representation, the following three sets of formulae are obtained in a straightforward way.
(1) The partial derivative of the loss functional with respect to a is equal to

∂L_{ChF}/∂a = 2 ∫ {φ₁(θ; a, σ, ν) − φ̂_{n,1}(θ)} (∂φ₁/∂a)(θ; a, σ, ν) ω(θ) dθ + 2 ∫ {φ₂(θ; a, σ, ν) − φ̂_{n,2}(θ)} (∂φ₂/∂a)(θ; a, σ, ν) ω(θ) dθ,

where

(∂φ₁/∂a)(θ; a, σ, ν) = −hθ e^{h{Q₁(θ,ν) − σ²θ²/2}} sin{hQ₂(θ, a, ν)},
(∂φ₂/∂a)(θ; a, σ, ν) = hθ e^{h{Q₁(θ,ν) − σ²θ²/2}} cos{hQ₂(θ, a, ν)}.

(2) The partial derivative of the loss functional with respect to σ is equal to

∂L_{ChF}/∂σ = 2 ∫ {φ₁(θ; a, σ, ν) − φ̂_{n,1}(θ)} (∂φ₁/∂σ)(θ; a, σ, ν) ω(θ) dθ + 2 ∫ {φ₂(θ; a, σ, ν) − φ̂_{n,2}(θ)} (∂φ₂/∂σ)(θ; a, σ, ν) ω(θ) dθ,

where

(∂φ₁/∂σ)(θ; a, σ, ν) = −hσθ² e^{h{Q₁(θ,ν) − σ²θ²/2}} cos{hQ₂(θ, a, ν)},
(∂φ₂/∂σ)(θ; a, σ, ν) = −hσθ² e^{h{Q₁(θ,ν) − σ²θ²/2}} sin{hQ₂(θ, a, ν)}.

(3) The expression for the gradient function corresponding to the Fréchet derivative with respect to the measure ν is obtained using the chain rule (2.2):

∇L_{ChF}(x; ν) = 2 ∫ {φ₁(θ; a, σ, ν) − φ̂_{n,1}(θ)} ∇φ₁(θ)(x; ν) ω(θ) dθ + 2 ∫ {φ₂(θ; a, σ, ν) − φ̂_{n,2}(θ)} ∇φ₂(θ)(x; ν) ω(θ) dθ,

where the gradients of φ_i(θ) := φ_i(θ; a, σ, ν), i = 1, 2, with respect to the measure ν are given by

∇φ₁(θ)(x; ν) = h e^{h{Q₁(θ,ν) − σ²θ²/2}} [ cos{hQ₂(θ, a, ν)} q₁(θ, x) − sin{hQ₂(θ, a, ν)} q₂(θ, x) ],
∇φ₂(θ)(x; ν) = h e^{h{Q₁(θ,ν) − σ²θ²/2}} [ sin{hQ₂(θ, a, ν)} q₁(θ, x) + cos{hQ₂(θ, a, ν)} q₂(θ, x) ].
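As a sanity check of the decomposition φ = φ₁ + iφ₂ in the pure jump case a = σ = 0, take the atomic measure ν = 0.61 δ₁ fitted to the horse kick data (with the truncation level below the atom, so that q₂(θ, x) = sin(θx)); the formulas above must then reproduce the compound Poisson characteristic function exp{h ∫ (e^{iθx} − 1) ν(dx)}. The helper names below are ours.

```python
import cmath
import math

def phi12(theta, h, atoms):
    # phi_1, phi_2 for a = sigma = 0 and a purely atomic Levy measure whose
    # atoms all exceed the truncation level, so q2(theta, x) = sin(theta * x)
    Q1 = sum(mass * (math.cos(theta * x) - 1) for x, mass in atoms)
    Q2 = sum(mass * math.sin(theta * x) for x, mass in atoms)
    r = math.exp(h * Q1)
    return r * math.cos(h * Q2), r * math.sin(h * Q2)

theta, h, atoms = 0.8, 1.0, [(1.0, 0.61)]
p1, p2 = phi12(theta, h, atoms)
direct = cmath.exp(h * 0.61 * (cmath.exp(1j * theta) - 1))
```

Since ψ(θ) = Q₁ + iQ₂ here, the agreement is exact rather than approximate.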
4. Description of the CoF method

As was alluded to in the Introduction, the CoF method uses a representation of the convolution as a function of the compounding measure. We now formulate the main theoretical result of the paper on which the CoF method is based.

Theorem 20. Let W_t be a pure jump Lévy process characterised by (1.3), and let F(y) = F_h(y) be the cumulative distribution function of W_h. Then one has

(4.1)   F^{∗2}(y) = F(y) + ∑_{n=1}^∞ (h^n / n!) ∫_{R^n} U_{x₁,…,x_n} F(y) ν(dx₁) … ν(dx_n),

where U_{x₁} F(y) = F(y − x₁) − F(y) and

(4.2)   U_{x₁,…,x_n} F(y) := U_{x_n}(U_{x₁,…,x_{n−1}} F(y)) = ∑_{J ⊆ {1,2,…,n}} (−1)^{n−|J|} F(y − ∑_{j∈J} x_j).

The sum above is taken over all the subsets J of {1, 2, …, n}, including the empty set.

Proof. To prove the theorem, we use a coupling of W_t with a Poisson process Π on R₊ × R driven by the intensity measure ℓ ⊗ ν, where ℓ is the Lebesgue measure on R₊. For each realisation φ = ∑_j δ_{z_j} with z_j = (t_j, x_j), denote by φ_t the restriction of φ onto [0, t] × R. Then the Lévy process can be represented as

W_t = ∑_{(t_j, x_j) ∈ φ_t} x_j = ∫_0^t ∫ x φ(ds × dx).

For a fixed arbitrary y ∈ R and a point configuration φ = ∑_j δ_{(t_j, x_j)}, consider the functional G_y defined by

G_y(φ) = 1I{ ∑_{(t_j, x_j) ∈ φ} x_j ≤ y },

and notice that for any z = (t, x),

(4.3)   G_y(φ + δ_z) = 1I{ ∑_{(t_j, x_j) ∈ φ} x_j ≤ y − x } = G_{y−x}(φ).

Clearly, the cumulative distribution function of W_h can be expressed as the expectation

F(y) = P( ∑_{(t_j, x_j) ∈ Π_h} x_j ≤ y ) = E G_y(Π_h).

Let μ = ℓ|_{[0,h]} ⊗ ν and η = ℓ|_{[h,2h]} ⊗ ν. Then

E_{μ+η} G_y(Π) = P{W_{2h} ≤ y} = P{W_h + W̃_h ≤ y} = F^{∗2}(y),

where W̃_h = W_{2h} − W_h. Observe also that, by iteration of (4.3),

T_μ G_y(z₁, …, z_n) = E_μ D_{z₁,…,z_n} G_y(Π) = ∑_{J ⊆ {1,2,…,n}} (−1)^{n−|J|} E_μ G_y(Π + ∑_{j∈J} δ_{z_j}) = ∑_{J ⊆ {1,2,…,n}} (−1)^{n−|J|} F(y − ∑_{j∈J} x_j) = U_{x₁,…,x_n} F(y).

It remains now to apply expansion (2.7) to complete the proof:

F^{∗2}(y) = E_{μ+η} G_y(Π) = F(y) + ∑_{n=1}^∞ (1/n!) ∫_{(R₊×R)^n} U_{x₁,…,x_n} F(y) η(dt₁ × dx₁) … η(dt_n × dx_n) = F(y) + ∑_{n=1}^∞ (h^n / n!) ∫_{R^n} U_{x₁,…,x_n} F(y) ν(dx₁) … ν(dx_n). □
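The two forms of the iterated difference operator in (4.2), the recursion and the alternating sum over subsets, are easy to cross-check numerically for any distribution function; the sketch below (helper names ours) uses the uniform law on [0, 1].

```python
from itertools import combinations

def U_subsets(F, xs, y):
    # right-hand side of (4.2): alternating sum over all subsets J of {1,...,n}
    n = len(xs)
    total = 0.0
    for r in range(n + 1):
        for J in combinations(range(n), r):
            total += (-1) ** (n - r) * F(y - sum(xs[j] for j in J))
    return total

def U_recursive(F, xs):
    # left-hand side of (4.2): iterate U_x F(y) = F(y - x) - F(y)
    g = F
    for x in xs:
        g = (lambda prev, shift: lambda y: prev(y - shift) - prev(y))(g, x)
    return g

F = lambda y: max(0.0, min(1.0, y))   # CDF of the uniform law on [0, 1]
xs = [0.3, 0.5, 0.2]
```

The subset form also makes the bound |U_{x₁,…,x_n}F(y)| ≤ 2^{n−1} used in Section 4 evident, since the 2^n summands all lie in [0, 1].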


The empirical convolution of a sample (X₁, …, X_n),

(4.4)   F̄_n^{∗2}(y) := (n choose 2)^{−1} ∑_{1 ≤ i < j ≤ n} 1I{X_i + X_j ≤ y},

is an unbiased and consistent estimator of F^{∗2}(y), see [15].
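In code, the U-statistic (4.4) reads (helper name ours):

```python
from itertools import combinations

def empirical_convolution(sample, y):
    # \bar F_n^{*2}(y): proportion of pairs i < j with X_i + X_j <= y
    pairs = list(combinations(sample, 2))
    return sum(1 for a, b in pairs if a + b <= y) / len(pairs)
```

For the sample [0, 1, 2] the pair sums are 1, 2, 3, so the estimator steps through 1/3, 2/3, 1 at those points.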


The CoF method looks for a finite measure ν that minimises the following loss function:

(4.5)   L^{(k)}_{CoF}(ν) = ∫ ( F_n(y) + ∑_{i=1}^k (h^i / i!) ∫_{R^i} U_{x₁,…,x_i} F_n(y) ν(dx₁) … ν(dx_i) − F̄_n^{∗2}(y) )² ω(y) dy.

The infinite sum in (4.1) is truncated to k terms in (4.5) for computational reasons. The error introduced by the truncation can be accurately estimated by bounding the remainder term in the finite expansion formula (2.8). Alternatively, turning to (4.1) and using 0 ≤ F(y) ≤ 1, we obtain that |U_{x₁,…,x_i} F(y)| ≤ 2^{i−1} for all i = 1, 2, …, yielding

| (h^{k+1} / (k+1)!) ∫_{R^{k+1}} U_{x₁,…,x_{k+1}} F(y) ν(dx₁) … ν(dx_{k+1}) | ≤ (1/2) ∑_{i=k+1}^∞ (2h∥ν∥)^i / i!.

Notice that the upper bound corresponds to a half of the distribution tail P{Z ≥ k + 1} of a Poisson random variable, say Z ∼ Po(2h∥ν∥). Thus, to have a good estimate with this method, one should either calibrate the time step h (if the data are coming from the discretisation of a Lévy process trajectory) or use a higher k to make the remainder term small enough. For instance, for the horse kick data considered in the Introduction, h = 1 and ∥ν∥ = 0.61. The resulting error bounds for k = 1, 2, 3 are 0.172, 0.062 and 0.017, respectively, which shows that k = 3 is a rather adequate cutoff for this data. Since h∥ν∥ is the mean number of jumps in the strip [0, h] × R, in practice one should aim to choose h so as to have only a few jumps with high probability. If, on the contrary, the number of jumps is high, their sum would, by the Central Limit theorem, be close to the limiting law which, in the case of a finite variance of jumps, is Normal and so depends on the first two moments only and not on the entire compounding distribution. Therefore an effective estimation of ν/∥ν∥ is impossible in this case; see [11] for a related discussion.
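The three error bounds quoted above amount to half of the upper tail of a Poisson law with parameter 2h∥ν∥ = 1.22; the snippet below reproduces them.

```python
import math

def half_poisson_tail(lam, k):
    # (1/2) * P{Z >= k + 1} for Z ~ Po(lam)
    head = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))
    return 0.5 * (1.0 - head)

lam = 2 * 1.0 * 0.61            # 2 h ||nu|| for the horse kick data
bounds = [half_poisson_tail(lam, k) for k in (1, 2, 3)]
```

Rounded, the three values agree with 0.172, 0.062 and 0.017 quoted in the text.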
As with the ChF method, the CoF algorithm relies on the steepest descent approach. The needed gradient function has the form

∇L^{(k)}_{CoF}(x; ν) = 2h ∫ ( F_n(y) + ∑_{i=1}^k (h^i / i!) ∫_{R^i} U_{x₁,…,x_i} F_n(y) ν(dx₁) … ν(dx_i) − F̄_n^{∗2}(y) ) Δ(y, x, ν) ω(y) dy,

where

Δ(y, x, ν) = ∑_{j=0}^{k−1} (h^j / j!) ∫_{R^j} (U_{x,x₁,…,x_j} F_n)(y) ν(dx₁) … ν(dx_j).

This formula follows from the chain rule (2.2) and the equality

∇( ∑_{j=1}^k (h^j / j!) ∫_{R^j} (U_{x₁,…,x_j} F_n)(y) ν(dx₁) … ν(dx_j) )(x; ν) = h ∑_{j=0}^{k−1} (h^j / j!) ∫_{R^j} (U_{x,x₁,…,x_j} F_n)(y) ν(dx₁) … ν(dx_j).

To justify the last identity, it suffices to see that for any integrable symmetric function u(x₁, …, x_j) of j ≥ 1 variables,

∇( ∫_{R^j} u(x₁, …, x_j) ν(dx₁) … ν(dx_j) )(x; ν) = j ∫_{R^{j−1}} u(x, x₁, …, x_{j−1}) ν(dx₁) … ν(dx_{j−1}),



which holds due to

∫_{R^j} u(x₁, …, x_j) (ν + η)(dx₁) … (ν + η)(dx_j) − ∫_{R^j} u(x₁, …, x_j) ν(dx₁) … ν(dx_j)
= ∑_{k=1}^j ∫_{R^j} u(x₁, …, x_j) ν(dx₁) … ν(dx_{k−1}) η(dx_k) ν(dx_{k+1}) … ν(dx_j) + o(∥η∥)
= j ∫_{R^j} u(x, x₁, …, x_{j−1}) η(dx) ν(dx₁) … ν(dx_{j−1}) + o(∥η∥).

5. Algorithmic implementation of the steepest descent method

In this section we describe the algorithm implementing the gradient descent method, which was used to obtain our simulation results presented in Section 6.

Recall that the principal optimisation problem has the form (2.3), where the functional L(ν) is minimised over ν ∈ M⁺ subject to the constraints on ν being a Lévy measure. For computational purposes the measure ν ∈ M⁺ is replaced by its discrete approximation, which has the form of a linear combination ν = ∑_{i=1}^l ν_i δ_{x_i} of Dirac measures on a finite regular grid x₁, …, x_l ∈ R, x_{i+1} = x_i + 2ε. Specifically, for a given measure ν, the atoms of its approximation are given by

(5.1)   ν₁ := ν((−∞, x₁ + ε)),   ν_i := ν([x_i − ε, x_i + ε)) for i = 2, …, l − 1,   ν_l := ν([x_l − ε, ∞)).

Clearly, the larger is l and the finer is the grid {x₁, …, x_l}, the better is the approximation, however, at a higher computational cost.

Respectively, the discretised version of the gradient function ∇L(x; ν) is the vector

g = (g₁, …, g_l),   g_i := ∇L(x_i; ν),   i = 1, …, l.
For example, the cost function L = L^{(1)}_{CoF} with ω(y) ≡ 1 has the gradient

∇L^{(1)}_{CoF}(x; ν) = 2h ∫ ( F_n(y) − F̄_n^{∗2}(y) + h ∫ (F_n(y − z) − F_n(y)) ν(dz) ) (F_n(y − x) − F_n(y)) dy.

The discretised gradient for this example is the vector g with the components

(5.2)   g_i = 2h ∫ ( F_n(y) − F̄_n^{∗2}(y) + h ∑_{j=1}^l (F_n(y − x_j) − F_n(y)) ν_j ) (F_n(y − x_i) − F_n(y)) dy,   i = 1, …, l.

Our main optimisation algorithm has the following structure:

Steepest descent algorithm
Input: initial vector ν
1: function GoSteep(ν)
2:   initialise the discretised gradient g ← (∇L(x₁; ν), …, ∇L(x_l; ν))
3:   while min_i g_i < −ε₂ or max_{i: ν_i > ε₁} g_i > ε₂ do
4:     choose a favourable step size δ depending on L and ν
5:     compute the new vector ν ← MakeStep(δ, ν, g)
6:     compute the gradient at the new ν: g ← (∇L(x₁; ν), …, ∇L(x_l; ν))
7:   end while
8:   return ν
9: end function

In the master algorithm description above, line 3 uses the necessary condition (2.5) as a test condition for the main cycle. In the computer realisations we usually want to discard the atoms of a negligible size: for this purpose we use a zero-value threshold parameter ε₁. We use another threshold parameter ε₂ to decide when the coordinates of the gradient vector are sufficiently small. For the examples considered in the next section, we typically used the following values: δ = 1, ε₁ = 10^{−2} and ε₂ = 10^{−6}. The key MakeStep subroutine, mentioned on line 5, is described below. It calculates the admissible steepest descent direction η of size ∥η∥ ≤ δ and returns the updated vector ν + η.
Algorithm for a steepest descent move
Input: maximal step size η, current variable value ν and current gradient value g
 1: function MakeStep(η, ν, g)
 2:     initialise the optimal step μ ← 0
 3:     initialise the running coordinate i ← 1
 4:     initialise the total mass available E ← η
 5:     while ((E > 0) and (i ≤ l)) do
 6:         if g_i > |g_l| then
 7:             μ_i ← max(−ν_i, −E)
 8:             E ← E + μ_i
 9:         else
10:             μ_l ← E
11:             E ← 0
12:         end if
13:         i ← i + 1
14:     end while
15:     return ν + μ
16: end function
The MakeStep subroutine looks for a vector μ which minimises the linear form Σ_{i=1}^{l} g_i μ_i appearing in the Taylor expansion

L(ν + μ) − L(ν) = Σ_{i=1}^{l} g_i μ_i + o(‖μ‖).

This minimisation is subject to the linear constraints

Σ_{i=1}^{l} |μ_i| ≤ η,    μ_i ≥ −ν_i,   i = 1, …, l.

The just described linear programming task has a straightforward solution given below.
For simplicity we assume that g_1 ≥ … ≥ g_l. Note that this ordering can always be achieved by a permutation of the components of the vector g and, respectively, of ν. Assume also that the total mass of ν is bigger than the step size η. Define two indices

i_g = max{i : g_i > |g_l|}   and   i_η = max{i : Σ_{j=1}^{i} ν_j < η},   η > 0.

If i_η ≤ i_g, then the coordinates of μ are given by

    μ_i := −ν_i                          for i ≤ i_η,
    μ_i := −(η − Σ_{j=1}^{i_η} ν_j)      for i = i_η + 1,
    μ_i := 0                             for i ≥ i_η + 2,

and if i_η > i_g, then

    μ_i := −ν_i                  for i ≤ i_g,
    μ_i := 0                     for i_g < i < l,
    μ_l := η − Σ_{j=1}^{i_g} ν_j.

In words: mass is removed from the atoms with the largest positive gradient components until either the step budget η is exhausted or all such atoms are emptied, and any remaining budget is added at the coordinate l, where the gradient is most negative.
The presented algorithm is realised in the statistical computation environment R (see [21]) in the form of a library mesop, which is freely downloadable from one of the authors' webpages.¹

¹ http://www.math.chalmers.se/~sergei/download.html


6. Simulation results for a discrete Lévy measure
To illustrate the performance of our estimation methods we generated samples of size n = 1000 for compound Poisson processes driven by different kinds of Lévy measures ν. For all the examples in this section, we implement three versions of the CoF with h = 1, k = 1, 2, 3 and w(y) ≡ 1. We also apply ChF using the estimate of CoF with k = 1. Observe that CoF with k = 1 can be made particularly fast, because here we have a non-negative least squares optimisation problem.
Poisson compounding distribution. Here we consider the simplest possible Lévy measure ν(dx) = δ_1(dx), which corresponds to a standard Poisson process with parameter 1. Since all the jumps are integer valued and non-negative, it is logical to take the non-negative integer grid for the possible atom positions of the discretised ν. This is the way we have done it for the horse kick data analysis. However, to test the robustness of our methods, we took the grid {0, 1/4, 2/4, 3/4, …}. As a result the estimated measures might place some mass on non-integer points, or even on negative values of x, to compensate for inaccurately fitted positive jumps. We have chosen to show on the graphs the discrepancies between the estimated and the true measure. An important indicator of the effectiveness of an estimation is the closeness of the total masses ‖ν̂‖ and ‖ν‖. For ν = δ_1, the probability of having more than 3 jumps is approximately 0.02, and therefore we expect that k = 3 would give an adequate estimate for these data. Indeed, the left panel of Figure 3 demonstrates that the CoF with k = 3 is quite effective in detecting the jumps of the Poisson process compared to k = 2, and especially to k = 1, which generates large discrepancies both in the atom sizes and in the total mass of the obtained measure. Observe also the presence of artefact small atoms at large x and even at some non-integer locations.

The right panel shows that a good alternative to the rather computationally demanding CoF method with k = 3 is the much faster combined CoF–ChF method, in which the measure ν̂_1 is used as the initial measure of the ChF algorithm. The resulting measure ν̃_1 is almost identical to ν̂_3, but also has a total mass closer to the target value 1. The total variation distances between the estimated measure and the theoretical one are 0.435, 0.084 and 0.053 for k = 1, 2, 3, respectively. For the combined method it is 0.043, which is the best approximation in total variation to the original measure.
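The 0.02 figure quoted above is just a Poisson tail probability, P(Po(1) > 3) ≈ 0.019, which can be checked in a line of code:

```python
from math import exp, factorial

# P(Po(1) > 3) = 1 - sum_{k=0}^{3} e^{-1} / k!
p_more_than_3 = 1 - sum(exp(-1) / factorial(k) for k in range(4))
print(round(p_more_than_3, 4))   # 0.019
```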
Figure 3. Simulation results for a Poisson Po(1) compounding distribution corresponding to ν concentrated at point 1, with total mass 1. Left panel: the differences between ν({x}) and their estimates ν̂_k({x}) obtained by CoF with k = 1, 2, 3; zero values of the differences are not plotted (total masses: ‖ν‖ = 1, ‖ν̂_1‖ = 0.6153, ‖ν̂_2‖ = 0.8605, ‖ν̂_3‖ = 0.927). Right panel: comparison of ν̂_3 with ν̃_1 obtained by ChF initiated at ν̂_1 (‖ν̂_3‖ = 0.927, ‖ν̃_1‖ = 0.9514). Notice the drastic change in the vertical axis scale as we go from the left to the right panel.

Compounding distribution with positive and negative jumps. Figure 4 presents the results for a compound Poisson process with jumps of sizes −1, 1 and 2 having respective probabilities 0.2, 0.2 and 0.6, so that ν = 0.2δ_{−1} + 0.2δ_1 + 0.6δ_2. The overall intensity of the jumps is again ‖ν‖ = 1. The presence of negative jumps cancelling positive jumps creates an additional difficulty for the estimation task. This phenomenon explains why the approximation obtained with k = 2 is worse than with k = 1 and k = 3: two jumps of sizes +1 and −1 sometimes cancel each other, which is indistinguishable from the no-jumps case. Moreover, −1 and 2 added together is the same as a single jump of size 1. The left panel confirms that going from k = 1 through k = 2 up to k = 3 improves the performance of CoF, although the computing time increases drastically. The corresponding total variation distances of ν̂_k to the theoretical distribution are 0.3669, 0.6268 and 0.1558. The combined method gives the distance 0.0975 and, according to the right plot, is again a clear winner in this case too. It is also much faster.
Figure 4. Simulation results for ν = 0.2δ_{−1} + 0.2δ_1 + 0.6δ_2. Left panel: the differences between ν({x}) and their estimates ν̂_k({x}) obtained by CoF with k = 1, 2, 3 (total masses: ‖ν‖ = 1, ‖ν̂_1‖ = 0.6122, ‖ν̂_2‖ = 0.9019, ‖ν̂_3‖ = 0.9041). Right panel: comparison of ν̂_3 with ν̃_1 (‖ν̂_3‖ = 0.9041, ‖ν̃_1‖ = 1.038).

Unbounded compounding distribution. In Figure 5 we present the simulation results for a discrete measure ν with the infinite support N = {1, 2, …}. For the computation, we limit the support range of the measures in question to the interval x ∈ [−2, 5]. As the left panel reveals, also in this case the CoF method with k = 3 gives a better approximation than k = 1 or k = 2 (the total variation distance to the theoretical distribution is 0.1150, compared to 0.3256 and 0.9235, respectively), and the combined, faster method gives an even better estimate, with d_TV(ν̃_1, ν) = 0.0386. Interestingly, k = 2 was the worst in terms of the total variation distance. We suspect that the pairing effect may be responsible: the jumps are better fitted with a single integer valued variable than with a sum of two. The algorithm may also have got stuck in a local minimum, producing small atoms at non-integer positions.

Figure 5. Simulation results for a shifted Poisson distribution ν({x}) = e^{−1}/(x − 1)! for x = 1, 2, …. Left panel: the differences between ν({x}) and their estimates ν̂_k({x}) obtained by CoF with k = 1, 2, 3 (total masses: ‖ν‖ = 1, ‖ν̂_1‖ = 0.6406, ‖ν̂_2‖ = 1.2047, ‖ν̂_3‖ = 0.9539). Right panel: comparison of ν̂_3 with ν̃_1 obtained by ChF initiated at ν̂_1 (‖ν̂_3‖ = 0.9539, ‖ν̃_1‖ = 1.0279).
Finally, we present simulation results for two cases of continuously distributed jumps. The continuous measures are replaced by their discretised versions given by (5.1). In the examples below the grid parameter is δ = 0.25.
Continuous non-negative compounding distribution. Figure 6 summarises our simulation results for the compound Poisson distribution with jump sizes following the exponential Exp(1) distribution. The left plot shows that also in this case the accuracy of approximation increases with k. Observe that the total variation distance d_TV(ν̂_3, ν̃) = 0.0985 is comparable with the discretisation error d_TV(ν, ν̃) = 0.075. A Gaussian kernel smoothed version of ν̂_3 is presented in the right plot of Figure 6. The visible discrepancy for small values of x is explained by the fact that there was not a sufficient number of really small jumps in the simulated sample to give the algorithm grounds to put more mass around 0. Interestingly, the combined algorithm produced a measure with a smaller (compared to ν̂_3) value of L_ChF, but a larger total variation distance from ν. Optimisation in the space of measures usually tends to produce atomic measures, since these are the boundary points of the typical constraint sets in M. Indeed, ν̃_1 has fewer atoms and still better approximates the empirical characteristic function of the sample. This shows that the case of optimisation in the class of absolutely continuous measures should be analysed differently, by characterising their tangent cones and deriving the corresponding steepest descent methods. Additional conditions on the density must also be imposed, such as Lipschitz-type conditions, to make the feasible set closed in the corresponding measure topology.
Gaussian compounding distribution. Figure 7 takes up the important example of compound Poisson processes with Gaussian jumps. Once again, the estimates ν̂_k improve as k increases, and the combined method gives an estimate similar to ν̂_3.
Figure 6. Simulation results for a compound Poisson process with jump intensity 1 and jump sizes having an exponential distribution with parameter 1. Left plot: the measures obtained by the various algorithms (the legend reports the total masses ‖ν̂_k‖ and ‖ν̃_1‖). Right plot: the theoretical exponential density and the smoothed version of the ν̃_1 measure.

Figure 7. Left plot: estimated compounding measure for a simulated sample with jump sizes having the normal distribution with mean 0.5 and variance 0.25 (the legend reports the total masses ‖ν̂_k‖ and ‖ν̃_1‖). Right plot: the theoretical Gaussian density and the smoothed version of the ν̂_3 measure.

7. Discussion
In this paper we proposed and analysed new algorithms, based on characteristic function fitting (ChF) and convoluted cumulative distribution function fitting (CoF), for non-parametric inference of the compounding measure of a pure-jump Lévy process. The algorithms build on the recently developed variational analysis of functionals of measures and the corresponding steepest descent methods for constrained optimisation on the cone of measures. CoF methods are capable of producing very accurate estimates, but at the expense of growing computational complexity. The ChF method critically depends on the initial approximation measure, due to the highly irregular behaviour of the objective function. We have shown that the problems of convergence of the ChF algorithms can often be effectively overcome by choosing the sample measure (discretised to the grid) as the initial approximation measure. However, a better alternative, as we demonstrated in the paper, is to use the measure obtained by the simplest (k = 1) CoF algorithm. This combined CoF–ChF algorithm is fast and in the majority of cases produces a measure which is closest in total variation to the measure under estimation, and thus it is our method of choice.
The practical experience we gained during various tests allows us to conclude that the suggested methods are especially well suited for the estimation of discrete jump size distributions. They work well even with jumps taking both positive and negative values not necessarily belonging to a regular lattice, demonstrating a clear advantage over the existing methods, see [5], [6]. Using our algorithms for continuous compounding distributions requires more trial and error in choosing the right discretisation grid and smoothing procedures to produce good results, which should then be compared with, or complemented by, the direct methods of density estimation, as in [12], [25].
Bibliography
[1] Asmussen, S. (2008) Applied Probability and Queues. Stochastic Modelling and Applied Probability, Springer, New York.
[2] Asmussen, S. and Rosiński, J. (2001) Approximations of small jumps of Lévy processes with a view towards simulation. J. Appl. Prob., 38, 482–493.
[3] Błaszczyszyn, B., Merzbach, E. and Schmidt, V. (1997) A note on expansion for functionals of spatial marked point processes. Statist. Probab. Lett., 36(3), 299–306.
[4] Bortkiewicz, L. (1898) Das Gesetz der kleinen Zahlen. Monatshefte für Mathematik und Physik, 9(1), A39–A41.
[5] Buchmann, B. and Grübel, R. (2003) Decompounding: an estimation problem for Poisson random sums. Ann. Statist., 31(4), 1054–1074.
[6] Buchmann, B. and Grübel, R. (2004) Decompounding Poisson random sums: recursively truncated estimates in the discrete case. Annals of the Institute of Statistical Mathematics, 56(4), 743–756.
[7] Comte, F., Duval, C. and Genon-Catalot, V. (2014) Nonparametric density estimation in compound Poisson processes using convolution power estimators. Metrika, 77(1), 163–183.
[8] Cont, R. and Tankov, P. (2003) Financial Modelling with Jump Processes. Chapman & Hall/CRC.
[9] Carrasco, M. and Florens, J.P. (2000) Generalization of GMM to a continuum of moment conditions. Econometric Theory, 16, 797–834.
[10] Duval, C. (2013) Density estimation for compound Poisson processes from discrete data. Stochastic Processes and their Applications, 123(11), 3963–3986.
[11] Duval, C. (2014) When is it no longer possible to estimate a compound Poisson process? Electron. J. Statist., 8(1), 274–301.
[12] van Es, B., Gugushvili, S. and Spreij, P. (2007) A kernel type nonparametric density estimator for decompounding. Bernoulli, 13(3), 672–694.
[13] Feuerverger, A. and McDunnough, P. (1981) On the efficiency of empirical characteristic function procedures. J. Royal Stat. Soc., Ser. B, 43, 20–27.
[14] Feuerverger, A. and McDunnough, P. (1981) On some Fourier methods for inference. J. American Stat. Assoc., 76, 379–387.
[15] Frees, E.W. (1986) Nonparametric renewal function estimation. Ann. Statist., 14(4), 1366–1378.
[16] Last, G. (2014) Perturbation analysis of Poisson processes. Bernoulli, 20(2), 486–513.
[17] Mikosch, T. (2009) Non-Life Insurance Mathematics: An Introduction with the Poisson Process. Universitext, Springer.
[18] Molchanov, I. and Zuyev, S. (2000) Tangent sets in the space of measures: with applications to variational analysis. Journal of Mathematical Analysis and Applications, 249(2), 539–552.
[19] Molchanov, I. and Zuyev, S. (2000) Variational analysis of functionals of Poisson processes. Mathematics of Operations Research, 25(3), 485–508.
[20] Molchanov, I. and Zuyev, S. (2002) Steepest descent algorithms in a space of measures. Statistics and Computing, 12, 115–123.
[21] R Core Team (2015) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
[22] Sato, K. (1999) Lévy Processes and Infinitely Divisible Distributions. Cambridge Studies in Advanced Mathematics, Cambridge University Press, 1st edition.
[23] Sueishi, N. and Nishiyama, Y. (2005) Estimation of Lévy processes in mathematical finance: a comparative study. In A. Zerger and R.M. Argent, editors, MODSIM 2005 International Congress on Modelling and Simulation, pages 953–959.
[24] Qin, J. and Lawless, J. (1994) Empirical likelihood and general estimating equations. Ann. Statist., 22, 300–325.
[25] Watteel, R.N. and Kulperger, R.J. (2003) Nonparametric estimation of the canonical measure for infinitely divisible distributions. J. Statist. Comp. Simul., 73(7), 525–542.

PAPER IV

Asymptotic results for the number of Wagner's solutions to a generalised birthday problem

Alexey Lindo and Serik Sagitov

Statistics and Probability Letters, 107, 356–361, 2015.


Abstract. We study two functionals of a random matrix A with independent elements uniformly distributed over the cyclic group of integers {0, 1, …, M−1} modulo M. One of them, V_0(A), with mean λ, gives the total number of solutions of a generalised birthday problem, and the other, W(A), with mean μ, gives the number of solutions detected by Wagner's tree based algorithm. We establish two limit theorems. Theorem 2.1 describes the asymptotic behaviour of the ratio μ/λ as M → ∞. Theorem 2.2 gives bounds for the total variation distance between the Poisson distribution and the distributions of V_0 and W.

1. Introduction
Let N, M and L be three natural numbers larger than or equal to 2. Assume that we have a random matrix

(1.1)   A = (a_{ij}),   1 ≤ i ≤ L,   1 ≤ j ≤ N,

with independent elements a_{ij} which are uniformly distributed on {0, 1, …, M−1}. Let J = {1, …, L}^N be the set of matrix positions, so that |J| = L^N. For each b ∈ {0, 1, …, M−1}, define V_b ≡ V_b(A) as the number of vectors i = (i_1, …, i_N) ∈ J with

a_{i_1,1} + … + a_{i_N,N} ≡ b,

where the sign ≡ means equality modulo M. Clearly, Σ_{b=0}^{M−1} V_b = L^N, so that by the assumption of uniform distribution,

(1.2)   λ := E(V_0) = L^N M^{−1}.

The problem of finding all V_0 zero-sum vectors

(1.3)   a_i = (a_{i_1,1}, …, a_{i_N,N}),   i = (i_1, …, i_N) ∈ J,
for a given matrix A can be viewed as a generalised birthday problem. It arises naturally in a variety of situations, including cryptography, see [7] and the references therein; ring linear codes [3]; and abstract algebra, where in the theory of modules it is related to the notion of annihilators, see e.g. [4]. This problem can be solved only by exhaustive search and is NP-hard [6]. Wagner [7] proposed a subexponential algorithm giving hope to quickly detect at least some of the solutions to these kinds of problems.
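For small parameters the exhaustive search is easy to carry out, which is useful for sanity checks. A sketch (our own helper name v0), counting the zero-sum vectors (1.3) by enumerating all L^N positions:

```python
import itertools

def v0(A, M):
    # Brute-force V_0(A): count the vectors i in J = {1, ..., L}^N
    # whose selected entries sum to 0 modulo M.
    L, N = len(A), len(A[0])
    return sum(
        sum(A[i[j]][j] for j in range(N)) % M == 0
        for i in itertools.product(range(L), repeat=N)
    )

# With L = N = 2 and M = 5, the vectors (1, 1) and (2, 2) are zero-sum.
print(v0([[0, 0], [1, 4]], 5))   # 2
```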
Assume that N = 2^n, n ≥ 1, and M = 2^m + 1, m ≥ n. It will be convenient to use the symmetric form

D_m := {−2^{m−1}, …, −1, 0, 1, …, 2^{m−1}}

of {0, 1, …, M−1} as the set of possible values for the a_{ij}. Wagner's algorithm has a binary tree structure, see Figure 1, starting from the N leaves at level n and moving toward the top of the tree at level 0. For a given vector x = (x_1, …, x_{2^n}) with x_j ∈ D_m the algorithm searches for the value

(1.4)   H_n(x) := x_1^{(n)} ∈ D_{m−n} ∪ {∆},

obtained recursively in a way explained next (the special state ∆ indicates that the algorithm is terminated and a solution is not found). Put x_j^{(0)} := x_j. For h = 1, …, n and j = 1, …, 2^{n−h}, let x_j^{(h)} = b if there exists a b ∈ D_{m−h} such that

x_{2j−1}^{(h−1)} + x_{2j}^{(h−1)} ≡ b,

and put x_j^{(h)} = ∆ otherwise. In particular, if x_k^{(h−1)} = ∆ for at least one of the two indices k ∈ {2j−1, 2j}, then x_j^{(h)} = ∆.

Figure 1. Wagner's algorithm: the leaves x_1, …, x_{2^n} are combined pairwise into x_1^{(1)}, …, x_{2^{n−1}}^{(1)}, and so on, up to the root x_1^{(n)}.
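A direct implementation of the recursion behind (1.4) is short. A sketch (our own function name wagner_value; we encode the terminal state ∆ as None):

```python
def wagner_value(x, m, M):
    # One level h merges neighbours; the sum must fall into D_{m-h},
    # i.e. have absolute value at most 2^(m-h-1), otherwise the branch dies.
    level, h = list(x), 0
    while len(level) > 1:
        h += 1
        nxt = []
        for a, b in zip(level[0::2], level[1::2]):
            if a is None or b is None:
                nxt.append(None)
                continue
            s = (a + b) % M
            if s > M // 2:
                s -= M          # symmetric representative in D_m
            nxt.append(s if abs(s) <= 2 ** (m - h - 1) else None)
        level = nxt
    return level[0]

# m = 3, M = 9, n = 2: (1, -1, 2, -2) is a Wagner solution, (2, 2, 1, 1) is not.
print(wagner_value((1, -1, 2, -2), 3, 9), wagner_value((2, 2, 1, 1), 3, 9))   # prints: 0 None
```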
A vector x will be called a Wagner solution to the generalised birthday problem if H_n(x) = 0. The total number W ≡ W(A) of Wagner solutions among the vectors (1.3) has mean

μ := E(W) = L^N p_{n,m},   where   p_{n,m} := P(H_n(a_i) = 0),   i ∈ J.

The proportion of Wagner solutions can be characterised by the ratio of the means

(1.5)   R_{n,m} := μ/λ = (2^m + 1) p_{n,m},

where λ, defined by (1.2), is the mean total number of solutions. Clearly, R_{n,m} is the conditional probability for a given zero-sum random vector to be a Wagner solution.

There is a growing number of papers studying the properties of various tree based algorithms, with some of them, in particular [5], suggesting further developments of Wagner's approach. The main results of this paper are stated in the next section. Theorem 2.1 gives an integral recursion for calculating the limit of the key ratio (1.5). Theorem 2.2 gives an upper bound for the total variation distance between the Poisson distribution Po(λ) and L(V_0), the distribution of V_0, as well as a bound for the total variation distance between Po(μ) and L(W). Recall that the total variation distance between the distributions of Z_+-valued random variables X and Y, where Z_+ = {0, 1, 2, …}, is given by

d_TV(L(X), L(Y)) = sup_{A ⊂ Z_+} |P(X ∈ A) − P(Y ∈ A)|.

Among related results concerning the speed of convergence for functionals of random matrices over finite algebraic structures we can only name the recent paper [2].
2. Main results
Define a sequence of polynomials {φ_n(x)}_{n≥1} by

(2.1)   φ_n(x) := ∫_0^x φ_{n−1}(u) φ_{n−1}(x−u) du + 2 ∫_x^{2^{−n}} φ_{n−1}(u) φ_{n−1}(u−x) du,

with φ_1(x) ≡ 1.

Theorem 2.1. For any fixed natural number n,

R_{n,m} → φ_n(0),   m → ∞,

where the limit is obtained from the integral recursion (2.1).
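The first two limits shown in Figure 2 can be obtained from (2.1) in closed form: φ_2(x) = x + 2(1/4 − x) = 1/2 − x on [0, 1/4], so φ_2(0) = 1/2, and φ_3(0) = 2 ∫_0^{1/8} (1/2 − u)^2 du = 37/768 ≈ 0.04818. A quick exact-arithmetic check:

```python
from fractions import Fraction

# phi_2(x) = 1/2 - x follows from (2.1) with phi_1 = 1;
# phi_3(0) = 2 * integral_0^{1/8} (1/2 - u)^2 du = (2/3) * ((1/2)^3 - (3/8)^3).
phi2_at_0 = Fraction(1, 2)
phi3_at_0 = Fraction(2, 3) * (Fraction(1, 2) ** 3 - Fraction(3, 8) ** 3)

print(phi2_at_0, phi3_at_0, float(phi3_at_0))   # 1/2 37/768 0.048177...
```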


To illustrate Theorem 2.1, take N = 16, L = 1000, and M = 10^{45}. Then the expected number of zero-sum vectors is λ = 1000. In practice, finding all the zero-sum vectors out of L^N = 10^{48} candidates is a time consuming task. In this example we have n = 4 and m is approximately 150. Judging from Figure 2, which illustrates the typical values of the proportion factor R_{n,m} using numerical computations based on the recursion (3.2) presented in the next section, out of a thousand solutions the Wagner algorithm will catch no more than one.

Figure 2. The ratios of the means (1.5) for n = 2, 3, 4 are plotted as functions of m. The limits predicted by Theorem 2.1 (0.50000, 0.04818 and 0.00023, respectively) are indicated by horizontal dotted lines.

Theorem 2.2. For a random matrix (1.1) consider the number V_0 of vectors (1.3) such that a_{i_1,1} + … + a_{i_N,N} ≡ 0. Then

d_TV(L(V_0), Po(λ)) ≤ 2(1 − e^{−λ}) M^{−1},

where λ = L^N M^{−1}. Furthermore, if N = 2^n and M = 2^m + 1, m > n, then with μ = L^N p_{n,m},

d_TV(L(W), Po(μ)) ≤ 4(1 − e^{−μ}) λ N L^{−1}.

According to Theorem 2.2, the Poisson approximation for L(V_0) works well when M ≫ 1. For L(W), a sufficient condition for the corresponding upper bound to be small is N L^{N−1} ≪ M.
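The first bound is easy to test numerically: the indicators summing to V_0 are mutually independent (see Section 5), so V_0 has exactly the Binomial(L^N, M^{−1}) distribution, and its total variation distance to Po(λ) can be computed directly. A sketch with small parameters (L = N = 2, M = 5):

```python
from math import comb, exp, factorial

def dtv_binomial_poisson(n, p):
    # d_TV = (1/2) * sum_k |Bin(n, p){k} - Po(np){k}|, including the
    # Poisson tail beyond n, where the binomial puts no mass.
    lam = n * p
    po = [exp(-lam) * lam**k / factorial(k) for k in range(n + 1)]
    bi = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    return (sum(abs(b - q) for b, q in zip(bi, po)) + 1 - sum(po)) / 2

L, N, M = 2, 2, 5
lam = L**N / M
bound = 2 * (1 - exp(-lam)) / M                     # the bound of Theorem 2.2
print(dtv_binomial_poisson(L**N, 1 / M) < bound)   # True
```

Here the actual distance (about 0.06) is comfortably below the bound (about 0.22), as the theorem guarantees.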
3. Key recursion
Consider the backward recursion

(3.1)   v_i(j) = Σ_{k=0}^{j} v_{i+1}(k) v_{i+1}(j−k) + 2 Σ_{k=j+1}^{2^i} v_{i+1}(k) v_{i+1}(k−j),

involving a system of vectors (v_i(0), …, v_i(2^{i−1})) for i ≥ 1. In particular, we have

v_i(0) = v_{i+1}^2(0) + 2 Σ_{k=1}^{2^i} v_{i+1}^2(k).

For 1 ≤ i ≤ m−1, denote by v_i^{(m)}(j) the unique solution of (3.1) determined by the following frontier condition:

v_{m−1}^{(m)}(0) = … = v_{m−1}^{(m)}(2^{m−2}) = (1 + 2^m)^{−1}.

By the forthcoming Corollary 21, we can write p_{n,m} = v_{m−n}^{(m)}(0), so that

(3.2)   R_{n,m} = (1 + 2^m) v_{m−n}^{(m)}(0),   n = 1, …, m−1.
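The backward recursion (3.1) with this frontier condition is straightforward to iterate numerically, and doing so reproduces the limits of Theorem 2.1 already for moderate m. A sketch (our own function name wagner_ratio):

```python
def wagner_ratio(n, m):
    # Frontier condition: v_{m-1}(j) = (1 + 2^m)^{-1} for j = 0, ..., 2^(m-2).
    M = 2**m + 1
    v = [1.0 / M] * (2**(m - 2) + 1)
    # Backward recursion (3.1): level i+1 has indices 0..2^i,
    # level i has indices 0..2^(i-1); stop at level m - n.
    for i in range(m - 2, m - n - 1, -1):
        v = [
            sum(v[k] * v[j - k] for k in range(j + 1))
            + 2 * sum(v[k] * v[k - j] for k in range(j + 1, 2**i + 1))
            for j in range(2**(i - 1) + 1)
        ]
    return M * v[0]              # R_{n,m} = (1 + 2^m) v_{m-n}(0), as in (3.2)

print(wagner_ratio(2, 10), wagner_ratio(3, 10))   # close to 0.5 and 0.04818
```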

Lemma 3.1. Let 1 ≤ n ≤ m−1 and let H_n(x) be defined by (1.4). Assuming that x is a random vector with independent components uniformly distributed over D_m, put

p_{i,m}(j) := P(H_i(x) ≡ j).

Then

p_{1,m}(−2^{m−2}) = … = p_{1,m}(2^{m−2}) = (2^m + 1)^{−1},

and for 2 ≤ i ≤ m−1 and 0 ≤ j ≤ 2^{m−i−1} we have p_{i,m}(−j) = p_{i,m}(j), with p_{i,m}(j) satisfying the recursion

p_{i,m}(j) = Σ_{k=0}^{j} p_{i−1,m}(k) p_{i−1,m}(j−k) + 2 Σ_{k=j+1}^{2^{m−i}} p_{i−1,m}(k) p_{i−1,m}(k−j).

Proof. There are exactly M = 2^m + 1 different ordered pairs of numbers from the set D_m that add modulo M up to a given j ∈ D_{m−1}. These pairs have the form:

for j = 0,
    (−2^{m−1} + k, 2^{m−1} − k),   k = 0, …, 2^m,

for j = 1, …, 2^{m−2},
    (−2^{m−1} + k, −2^{m−1} + j − k − 1),   k = 0, …, j−1,
    (−2^{m−1} + k, 2^{m−1} + j − k),   k = j, …, 2^m,

and for j = −2^{m−2}, …, −1,
    (2^{m−1} − k, 2^{m−1} + j + k + 1),   k = 0, …, |j|−1,
    (2^{m−1} − k, −2^{m−1} + j + k),   k = |j|, …, 2^m.

Since these pairs appear with equal probability M^{−2}, the first claim follows.

On the other hand, for a given j ∈ D_{m−i} with i ≥ 2, there are only 2^{m−i+1} + 1 − |j| different ordered pairs of numbers from the set D_{m−i+1} that add modulo M up to j. These pairs have the form

    (−2^{m−i} + k, 2^{m−i} + j − k),   k = j, …, 2^{m−i+1},   for j = 0, …, 2^{m−i−1},
    (2^{m−i} − k, −2^{m−i} + j + k),   k = |j|, …, 2^{m−i+1},   for j = −2^{m−i−1}, …, −1.

This yields, for j = 0, 1, …, 2^{m−i−1},

p_{i,m}(j) = Σ_{k=j}^{2^{m−i+1}} p_{i−1,m}(−2^{m−i} + k) p_{i−1,m}(2^{m−i} + j − k),

p_{i,m}(−j) = Σ_{k=j}^{2^{m−i+1}} p_{i−1,m}(2^{m−i} − k) p_{i−1,m}(−2^{m−i} + k − j).

The stated symmetry property p_{i,m}(−j) = p_{i,m}(j) now follows recursively from the assumption of uniform distribution. To finish the proof of the lemma, it remains to observe that after replacing k − 2^{m−i} by l in the relation for p_{i,m}(j) we get

p_{i,m}(j) = Σ_{l=j−2^{m−i}}^{2^{m−i}} p_{i−1,m}(l) p_{i−1,m}(j−l),

which in turn equals

Σ_{l=0}^{j} p_{i−1,m}(l) p_{i−1,m}(j−l) + Σ_{l=j+1}^{2^{m−i}} p_{i−1,m}(l) p_{i−1,m}(l−j) + Σ_{l=j−2^{m−i}}^{−1} p_{i−1,m}(l) p_{i−1,m}(j−l)
    = Σ_{k=0}^{j} p_{i−1,m}(k) p_{i−1,m}(j−k) + 2 Σ_{k=j+1}^{2^{m−i}} p_{i−1,m}(k) p_{i−1,m}(k−j).   □

Corollary 21. Comparison of the key recursion in Lemma 3.1 with the recursion (3.1) yields

p_{m−i,m}(j) = v_i^{(m)}(j).
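For small parameters the resulting identity p_{n,m} = v_{m−n}^{(m)}(0) can be verified by exhaustive enumeration. A sketch for m = 3 (M = 9) and n = 2, where both sides equal 5/81 (helper names are ours; None plays the role of the terminal state ∆):

```python
from fractions import Fraction
from itertools import product

M, m, n = 9, 3, 2

def h(x):
    # Wagner's recursion (1.4): each merge must land in the shrinking range D_{m-h}.
    level, lev = list(x), 0
    while len(level) > 1:
        lev += 1
        nxt = []
        for a, b in zip(level[0::2], level[1::2]):
            if a is None or b is None:
                nxt.append(None)
                continue
            s = (a + b) % M
            if s > M // 2:
                s -= M
            nxt.append(s if abs(s) <= 2 ** (m - lev - 1) else None)
        level = nxt
    return level[0]

# Left side: p_{2,3} by enumerating all 9^4 leaf vectors over D_3 = {-4, ..., 4}.
hits = sum(h(x) == 0 for x in product(range(-4, 5), repeat=2**n))
p_exact = Fraction(hits, 9**4)

# Right side: one step of (3.1) from the frontier v_2(j) = 1/9, j = 0..2^(m-2).
v2 = [Fraction(1, M)] * 3
v1_0 = v2[0] ** 2 + 2 * (v2[1] ** 2 + v2[2] ** 2)
assert p_exact == v1_0 == Fraction(5, 81)
```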


4. Proof of Theorem 2.1
Recall (3.2) and put

R_{n,m}(j) := 2^m v_{m−n}^{(m)}(j),   φ_{n,m}(x) := φ_n(x 2^{−m}).

We prove Theorem 2.1 by verifying the more general convergence result

(4.1)   ε_{n,m} := max_{0 ≤ j ≤ 2^{m−n−1}} |R_{n,m}(j) − φ_{n,m}(j)| → 0,   m → ∞.

To this end we use induction over n. The base case n = 1 is trivial. To prove the inductive step, observe first that by (3.1),

(4.2)   R_{n,m}(j) = 2^{−m} Σ_{k=0}^{j} R_{n−1,m}(k) R_{n−1,m}(j−k) + 2^{1−m} Σ_{k=j+1}^{2^{m−n}} R_{n−1,m}(k) R_{n−1,m}(k−j).

It is easy to see recursively that the constant

C_n := sup_{m>n} max_{0 ≤ j ≤ 2^{m−n−1}} R_{n,m}(j)

is finite. On the other hand, by (2.1),

φ_{n,m}(j) = 2^{−m} ∫_0^j φ_{n−1,m}(u) φ_{n−1,m}(j−u) du + 2^{1−m} ∫_j^{2^{m−n}} φ_{n−1,m}(u) φ_{n−1,m}(u−j) du,

so that

(4.3)   φ_{n,m}(j) = 2^{−m} Σ_{k=0}^{j} φ_{n−1,m}(k) φ_{n−1,m}(j−k) + 2^{1−m} Σ_{k=j+1}^{2^{m−n}} φ_{n−1,m}(k) φ_{n−1,m}(k−j) + η_{n,m}(j),

with an accordingly defined remainder term η_{n,m}(j). Uniform continuity of φ_{n−1}(x) yields the uniform convergence η_{n,m}(j) → 0 as m → ∞, and (4.1) follows from (4.2) and (4.3), since

ε_{n,m} ≤ 2 [ C_{n−1} + max_{0 ≤ x ≤ 2^{−n}} φ_{n−1}(x) ] ε_{n−1,m} + max_{0 ≤ j ≤ 2^{m−n}} |η_{n,m}(j)|.


5. Proof of Theorem 2.2
Observe that for a Z_+-valued random variable Z and λ > 0, we have

2 d_TV(L(Z), Po(λ)) = Σ_{k∈Z_+} | P(Z = k) − λ^k e^{−λ}/k! |.

The following result is a straightforward corollary of Theorem 1 from [1] and is a key tool for our proof here.

Lemma 5.1. Let Z = Σ_{i∈J} π_i be a sum of possibly dependent indicator random variables with E(Z) = λ. Suppose there is a family of subsets J_i ⊂ J such that for any i ∈ J and k ∉ J_i, the indicators π_i and π_k are independent. Then

Σ_{k∈Z_+} | P(Z = k) − λ^k e^{−λ}/k! | ≤ 4(1 − e^{−λ}) λ^{−1} [ Σ_{i∈J} Σ_{k∈J_i} E(π_i) E(π_k) + Σ_{i∈J} Σ_{k∈J_i\{i}} E(π_i π_k) ].

We start the proof of Theorem 2.2 by observing that V_0 = Σ_{i∈J} π_i, where the indicator random variables

π_i = 1{a_{i_1,1} + … + a_{i_N,N} ≡ 0},   i = (i_1, …, i_N),

are identically distributed with E(π_i) = M^{−1}, and mutually independent. Independence is due to the defining property of the matrix A. Indeed, if k ≠ i and (without loss of generality) 1, …, j are the coordinates where these two vectors differ, then

P(a_{k_1,1} + … + a_{k_N,N} ≡ a_{i_1,1} + … + a_{i_N,N} ≡ 0)
    = P(a_{k_1,1} + … + a_{k_j,j} ≡ a_{i_1,1} + … + a_{i_j,j} ≡ −a_{i_{j+1},j+1} − … − a_{i_N,N})
    = Σ_{b∈D_m} P(a_{k_1,1} + … + a_{k_j,j} ≡ b; a_{i_1,1} + … + a_{i_j,j} ≡ b; a_{i_{j+1},j+1} + … + a_{i_N,N} ≡ −b)
    = M^{−1} Σ_{b∈D_m} P(a_{i_1,1} + … + a_{i_j,j} ≡ b; a_{i_{j+1},j+1} + … + a_{i_N,N} ≡ −b) = M^{−2}.

Therefore, we can apply Lemma 5.1 with J_i = {i}, and the first statement of Theorem 2.2, concerning L(V_0), follows from E(V_0) = λ and

Σ_{i∈J} Σ_{k∈J_i} E(π_i) E(π_k) = L^N M^{−2} = λ M^{−1}.

To prove the second statement of Theorem 2.2, concerning L(W), we define J_i as the set of k ∈ J such that the vectors i and k share at least one component. Observe that

|J_i| = L^N − (L − 1)^N.

By the definition of W,

W = Σ_{i∈J} π_i,   π_i = 1{H_n(a_i) = 0},

so that E(π_i) = p_{n,m} and therefore

Σ_{i∈J} Σ_{k∈J_i} E(π_i) E(π_k) = L^N (L^N − (L−1)^N) p_{n,m}^2 ≤ N L^{−1} μ^2.

Since a Wagner solution is necessarily a zero-sum vector, we have for i ≠ k,

E(π_i π_k) = P(H_n(a_i) = 0; H_n(a_k) = 0) ≤ P(a_{k_1,1} + … + a_{k_N,N} ≡ 0; H_n(a_i) = 0).

Let l_1, …, l_j be the coordinates where the vectors i and k differ. Then it follows that

E(π_i π_k) ≤ Σ_{b∈D_m} P(a_{k_{l_1},l_1} + … + a_{k_{l_j},l_j} ≡ b; a_{i_{l_1},l_1} + … + a_{i_{l_j},l_j} ≡ b; H_n(a_i) = 0)
    = M^{−1} Σ_{b∈D_m} P(a_{i_{l_1},l_1} + … + a_{i_{l_j},l_j} ≡ b; H_n(a_i) = 0) = M^{−1} p_{n,m},

and we get

Σ_{i∈J} Σ_{k∈J_i\{i}} E(π_i π_k) ≤ L^N (L^N − (L−1)^N) p_{n,m} M^{−1} ≤ N L^{−1} λ μ.

The proof is finished by applying Lemma 5.1 once again.

Acknowledgements. The first author is grateful to Vladimir Vatutin and Andrey Zubkov for formulating an initial problem setting that eventually led to this research project.
Bibliography
[1] Arratia, R., Goldstein, L. and Gordon, L. (1989) Two moments suffice for Poisson approximations: the Chen–Stein method. Ann. Probab., 17(1), 9–25.
[2] Fulman, J. and Goldstein, L. (2015) Stein's method and the rank distribution of random matrices over finite fields. Ann. Probab., 43(3), 1274–1314.
[3] Greferath, M. (2009) An introduction to ring-linear coding theory. In M. Sala, S. Sakata, T. Mora, C. Traverso, and L. Perret, editors, Gröbner Bases, Coding, and Cryptography, Springer, Berlin Heidelberg, 219–238.
[4] Lang, S. (2002) Algebra. Graduate Texts in Mathematics, Springer, New York.
[5] Minder, L. and Sinclair, A. (2012) The extended k-tree algorithm. J. Cryptology, 25(2), 349–382.
[6] Schroeppel, R. and Shamir, A. (1981) A T = O(2^{n/2}), S = O(2^{n/4}) algorithm for certain NP-complete problems. SIAM J. Comput., 10(3), 456–464.
[7] Wagner, D. (2002) A generalized birthday problem. In CRYPTO 2002, Springer-Verlag, 288–303.
