
1 Introduction to Non-Life Insurance Mathematics

Lit.: Straub, ch. 1-3, EKM, 1.1-1.2


1.1 Basic models and concepts

initial reserves of company: $u$
reserves at time $t$: $R_t$
accumulated premium up to $t$: $P_t$
Assumption: $P_t = \beta t$ with annual premium $\beta$
claims at times $T_1, T_2, \ldots$ with claim amounts $X_1, X_2, \ldots$
$Y_1 = T_1$, $Y_j = T_j - T_{j-1}$, $j = 2, 3, \ldots$ interoccurrence times between claims
risk process: $R_t = u + \beta t - S(t)$
$S(t) = \sum_{j=1}^{N(t)} X_j$  total claim amount up to time $t$
$N(t) = \max\{k;\, T_k \le t\}$  number of claims up to time $t$

Renewal model:
a) $Y_1, Y_2, \ldots, X_1, X_2, \ldots$ independent positive random variables
b) $X_1, X_2, \ldots$ i.i.d. with $\mathcal{L}(X_j) = F$, finite mean $\mathbb{E}X_j = \mu$ and $\sigma^2 = \operatorname{var} X_j$
c) $Y_1, Y_2, \ldots$ i.i.d. with finite mean $\mathbb{E}Y_j = 1/\lambda$

Special cases: Cramer-Lundberg (CL) model if $\mathcal{L}(Y_j) = \mathrm{Exp}(\lambda)$;
Erlang's model if $\mathcal{L}(Y_j) = \mathrm{Exp}(\lambda)$ and $\mathcal{L}(X_j) = \mathrm{Exp}(1/\mu)$.
In the CL-model, the claim times $T_1, T_2, \ldots$ form a homogeneous Poisson process with intensity $\lambda$ and $\mathcal{L}(N(t)) = \mathrm{Poi}(\lambda t)$.

Lemma 1.1 If $Y_1, Y_2, \ldots$ are i.i.d. $\mathrm{Exp}(\lambda)$, then $T_n = \sum_{j=1}^n Y_j$ is Gamma-distributed ($\mathcal{L}(T_n) = \Gamma_n(\lambda)$), i.e. its density is
$$p_n(x) = \frac{\lambda^n}{(n-1)!}\, x^{n-1} e^{-\lambda x}, \qquad x \ge 0.$$
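The risk process is straightforward to simulate from these definitions. The following minimal sketch (Python with NumPy; all parameter values are illustrative assumptions, not taken from the text) draws exponential interoccurrence times, builds the claim times $T_j$ as in Lemma 1.1, and evaluates $R_t = u + \beta t - S(t)$ on a grid.

```python
import numpy as np

rng = np.random.default_rng(1)

lam, mu, beta, u0, horizon = 2.0, 1.0, 2.5, 10.0, 100.0   # illustrative values

# claim times: partial sums of Exp(lam) interoccurrence times (Lemma 1.1)
Y = rng.exponential(1.0 / lam, size=1000)
T = np.cumsum(Y)
T = T[T <= horizon]
X = rng.exponential(mu, size=T.size)          # claim sizes, here Exp with mean mu

def risk_process(t):
    """R_t = u0 + beta*t - S(t), with S(t) the total claim amount up to time t."""
    S_t = X[: np.searchsorted(T, t, side="right")].sum()
    return u0 + beta * t - S_t

grid = np.linspace(0.0, horizon, 501)
R = np.array([risk_process(t) for t in grid])
print("ruin within horizon:", bool((R < 0).any()))
print("E S(horizon) = lam*mu*horizon =", lam * mu * horizon)
```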
1.2 Moment-generating functions and the total claim process

$Z$ real random variable. If they exist:
$\mathbb{E}(Z^n)$  $n$-th moment of $Z$, $n \ge 1$
$\mathbb{E}(Z - \mathbb{E}Z)^n$  $n$-th central moment of $Z$, $n \ge 2$

Definition: a) moment-generating function of $Z$: $m_Z(\theta) = \mathbb{E}(e^{\theta Z})$
b) log-moment-generating function of $Z$: $\gamma_Z(\theta) = \log m_Z(\theta)$

Lemma 1.2 Let $m_Z(\theta)$ exist in a neighbourhood of $\theta = 0$. Then,
a) $m_Z^{(n)}(0) = \mathbb{E}Z^n$, $n \ge 1$
b) $\gamma_Z'(0) = \mathbb{E}Z$, $\gamma_Z''(0) = \operatorname{var} Z$, $\gamma_Z'''(0) = \mathbb{E}(Z - \mathbb{E}Z)^3$

Remark: If $Y, Z$ are independent, then $m_{Y+Z}(\theta) = m_Y(\theta)\, m_Z(\theta)$ and $\gamma_{Y+Z}(\theta) = \gamma_Y(\theta) + \gamma_Z(\theta)$.
If $X_1, X_2, \ldots$ are i.i.d. and $S_n = X_1 + \ldots + X_n$, then $\gamma_{S_n}(\theta) = n\, \gamma_{X_1}(\theta)$.

Lemma 1.3 Let $N \ge 0$ be a random integer, independent of the i.i.d. $X_1, X_2, \ldots$, and let $S_N = X_1 + \ldots + X_N$. Then, with $p_k = \mathrm{pr}(N = k)$, $k \ge 0$,
a) $\mathrm{pr}(S_N \le x) = \sum_{k=0}^{\infty} p_k\, \underbrace{F * \ldots * F}_{k\ \text{times}}(x)$
b) $m_{S_N}(\theta) = m_N(\gamma_{X_1}(\theta))$ and $\gamma_{S_N}(\theta) = \gamma_N(\gamma_{X_1}(\theta))$

For the total claim amount $S(t)$ of section 1.1:
$$\gamma_{S(t)}(\theta) = \gamma_{N(t)}(\gamma_{X_1}(\theta)).$$
Special case CL-model: $\gamma_{S(t)}(\theta) = \lambda t\,(m_{X_1}(\theta) - 1)$.

Corollary 1.1
For the renewal model, $\mathbb{E}S(t) = \mu\, \mathbb{E}N(t)$.
For the CL-model, $\mathbb{E}S(t) = \lambda \mu t$.

Definition: $\rho = \dfrac{\beta}{\lambda\mu} - 1$ safety loading; net profit condition: $\rho > 0$ ($\iff \mathbb{E}R_t - u > 0$ for all $t$).
1.3 Poisson processes

Definition 1.1 A stochastic process $N(t)$, $t \ge 0$, in continuous time is a Poisson process if
1) $N(0) = 0$ a.s.
2) for all $n \ge 1$, $0 = t_0 < t_1 < \ldots < t_n$, the increments $N(t_{i-1}, t_i] = N(t_i) - N(t_{i-1})$, $i = 1, \ldots, n$, are independent.
3) For some non-decreasing, right-continuous function $\mu: [0,\infty) \to [0,\infty)$ with $\mu(0) = 0$:
$$\mathcal{L}(N(s,t]) = \mathrm{Poi}(\mu(s,t]) \quad \text{for all } 0 \le s < t,$$
where $N(s,t] = N(t) - N(s)$, $\mu(s,t] = \mu(t) - \mu(s)$. $\mu(t)$ is called the mean function.
4) With probability 1, the path $N(t)$, $t \ge 0$, is right-continuous and has limits from the left everywhere (càdlàg).

Remark: a) $\mathcal{L}(N) = \mathrm{Poi}(\lambda) \implies \mathbb{E}N = \lambda = \operatorname{var} N$
b) $N(t)$ Poisson process $\implies \mu(t) = \mathbb{E}N(t)$ and
$$\mathrm{pr}(N(t_1) = k_1,\, N(t_2) = k_1 + k_2, \ldots, N(t_n) = k_1 + \ldots + k_n) = e^{-\mu(t_n)} \prod_{j=1}^{n} \frac{(\mu(t_j) - \mu(t_{j-1}))^{k_j}}{k_j!}$$
for all $n \ge 1$, $0 = t_0 < t_1 < \ldots < t_n$, $k_1, \ldots, k_n \in \mathbb{N}_0$.

Definition 1.2 a) A Poisson process is homogeneous with intensity $\lambda > 0$ if $\mu(t) = \lambda t$. If $\lambda = 1$, it is a standard homogeneous Poisson process (SHPP).
b) If, for a general Poisson process, $\mu$ is absolutely continuous with density $\lambda$, i.e.
$$\mu(s,t] = \int_s^t \lambda(v)\, dv, \qquad 0 \le s < t,$$
$\lambda(t)$ is called the intensity (function).

Definition 1.3 A stochastic process $X(t)$, $t \ge 0$, in continuous time is a Lévy process if
1) $X(0) = 0$ a.s.
2) for all $n \ge 1$, $0 = t_0 < t_1 < \ldots < t_n$, the increments $X(t_i) - X(t_{i-1})$, $i = 1, \ldots, n$, are independent
3) the increments are stationary, i.e. $\mathcal{L}(X(t) - X(s))$ depends only on $t - s$.
4) With probability 1, the sample path $X(t)$, $t \ge 0$, is càdlàg.

Examples: a) homogeneous Poisson process
b) Brownian motion (Wiener process)
c) total claim amount $S(t)$ of the Cramer-Lundberg model

Proposition 1.1 Let $N(t)$ be a Poisson process with mean function $\mu$, $N_0(t)$ a SHPP.
a) $N_0(\mu(t))$, $t \ge 0$, is a Poisson process with mean function $\mu$.
b) If $\mu$ is continuous, increasing and $\lim_{t\to\infty} \mu(t) = \infty$, then $N(\mu^{-1}(t))$, $t \ge 0$, is a SHPP.

Theorem 1.1 a) Let $Y_1, Y_2, \ldots$ be i.i.d. $\mathrm{Exp}(\lambda)$, $T_n = \sum_{j=1}^n Y_j$. Then, $N(t) = \#\{j;\, T_j \le t\}$ is a homogeneous Poisson process with intensity $\lambda$.
b) If $N(t)$ is a homogeneous Poisson process with intensity $\lambda$, there are $Y_1, Y_2, \ldots$ i.i.d. $\mathrm{Exp}(\lambda)$ such that $N(t) = \#\{j;\, T_j \le t\}$ with $T_n = \sum_{j=1}^n Y_j$.

Proposition 1.2 Let $N(t)$ be a Poisson process with mean function $\mu(t)$ and continuous intensity function $\lambda(t) > 0$ a.e. Then, for $Y_1, Y_2, \ldots, T_1, T_2, \ldots$ as in Theorem 1.1 b)
a) $(T_1, \ldots, T_n)$ has the joint density
$$f(t_1, \ldots, t_n) = e^{-\mu(t_n)} \prod_{j=1}^{n} \lambda(t_j) \quad \text{if } 0 < t_1 < \ldots < t_n,$$
and $f(t_1, \ldots, t_n) = 0$ else.
b) $(Y_1, \ldots, Y_n)$ has the joint density
$$g(y_1, \ldots, y_n) = e^{-\mu(y_1 + \ldots + y_n)} \prod_{j=1}^{n} \lambda(y_1 + \ldots + y_j) \quad \text{if } y_1, \ldots, y_n \ge 0,$$
and $g(y_1, \ldots, y_n) = 0$ else.
1.4 Cramer's inequality for ruin probabilities

ruin at time $t$ if the risk process satisfies $R_t < 0$ and $\lim_{s \uparrow t} R_s > 0$;
ruin occurs only at claim times $T_1, T_2, \ldots$, if at all.
ruin probability, finite horizon $t_0 < \infty$: $\psi_{t_0}(u) = \mathrm{pr}(S(t) - \beta t > u$ for some $t \le t_0)$
ruin probability, infinite horizon: $\psi(u) = \mathrm{pr}(S(t) - \beta t > u$ for some $t < \infty)$
Notation: $\delta(u) = 1 - \psi(u)$

Main renewal argument: If the first claim at $T_1$ does not cause ruin, then the risk process starts anew with new initial time $T_1$ and new initial reserves $u_1 = u - (X_1 - \beta T_1) = u + \beta T_1 - X_1 = R_{T_1}$.
Formally: $\mathcal{L}(R_t - u) = \mathcal{L}(R_{T_1 + t} - R_{T_1} \mid T_1, X_1)$ for all $t > 0$.

Theorem 1.2 Assume the Cramer-Lundberg model and the net profit condition $\rho > 0$. Then, $\delta(y)$ satisfies the integral equation
$$\delta(y) - \delta(0) = \frac{\lambda}{\beta} \int_0^y \delta(y - x)\, \bar F(x)\, dx$$
with boundary condition
$$\delta(0) = 1 - \psi(0) = \frac{\rho}{1 + \rho},$$
where $\bar F(x) = 1 - F(x)$ denotes the tail of $\mathcal{L}(X_j)$.

Remark: By a limit argument, for $\rho \downarrow 0$, i.e. $\beta \downarrow \lambda\mu$, we get
$$\psi(u) = 1 \quad \text{for all } u, \text{ if } \rho \le 0.$$

Corollary 1.2 For the Erlang model:
$$\psi(u) = \begin{cases} \dfrac{1}{1+\rho}\, \exp\Big(-\dfrac{1}{\mu}\Big(1 - \dfrac{1}{1+\rho}\Big) u\Big) & \text{if } \rho > 0 \\[2mm] 1 & \text{if } \rho \le 0 \end{cases}$$

Definition 1.4 Let $\bar F(x)$ denote the tail $1 - F(x)$ of $\mathcal{L}(X_j)$. The risk process $R_t$ satisfies the Cramer-Lundberg condition if there is a $\nu > 0$ with
$$\int_0^{\infty} e^{\nu x}\, \bar F(x)\, dx = \frac{\beta}{\lambda}.$$
$\nu$ is called the Lundberg exponent or adjustment coefficient.

Theorem 1.3 (Cramer's inequality): Let $R_t$ be a Cramer-Lundberg process satisfying the Cramer-Lundberg condition. Then:
$$\psi(u) \le e^{-\nu u}.$$
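For exponential claim sizes the adjustment coefficient is available in closed form, so Cramér's bound can be checked against the exact ruin probability of Corollary 1.2. The sketch below (illustrative parameters, not from the text) solves $\int_0^\infty e^{\nu x}\bar F(x)\,dx = \beta/\lambda$ numerically and uses $\nu = 1/\mu - \lambda/\beta$ as a cross-check.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.integrate import quad

lam, mu, beta = 1.0, 1.0, 1.5            # intensity, mean claim size, premium rate (illustrative)
rho = beta / (lam * mu) - 1.0             # safety loading, must be > 0

def lundberg_eq(nu):
    # integral of e^{nu x} * Fbar(x) for Exp(1/mu) claims, minus beta/lam
    val, _ = quad(lambda x: np.exp(nu * x) * np.exp(-x / mu), 0, np.inf)
    return val - beta / lam

nu = brentq(lundberg_eq, 1e-8, 1.0 / mu - 1e-8)   # root lies below the abscissa 1/mu
print("adjustment coefficient:", nu, " closed form:", 1.0 / mu - lam / beta)

for u in (0.0, 5.0, 10.0):
    psi_exact = np.exp(-nu * u) / (1.0 + rho)      # Corollary 1.2 (Erlang model)
    print(u, "exact psi(u):", psi_exact, " Lundberg bound:", np.exp(-nu * u))
```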
Idea of proof: Ruin occurs at most at the times $T_1, T_2, \ldots$
With $V_j := X_j - \beta Y_j$ the loss in the time interval $(T_{j-1}, T_j]$,
$$R_{T_k} = u - \sum_{j=1}^{k} V_j =: u - W_k,$$
and $R_{T_k} < 0$, i.e. ruin at $T_k$, iff $W_k > u$. Apply the following theorem.

Theorem 1.4 Let $V_1, V_2, \ldots$ be i.i.d. real random variables, $W_k = \sum_{j=1}^k V_j$ the corresponding random walk. Let there exist a $\nu > 0$ such that
$$\mathbb{E}e^{\nu V_1} = m_{V_1}(\nu) = 1.$$
Then, $\psi(u) = \mathrm{pr}(W_k > u \text{ for some } k \ge 1) \le e^{-\nu u}$.
Corollary 1.3 Under the assumptions of Theorem 1.3:
$$\bar F(x) = 1 - F(x) \le e^{-\nu x}\,\big(1 + \nu\mu(1+\rho)\big)$$
Remark: The tail $\bar F(x)$ of the claim amount distribution $\mathcal{L}(X_j)$ decreases exponentially with $x$, i.e. large values of $X_j$ are quite unlikely. Therefore, the Cramer-Lundberg condition is a small claim condition, i.e. the probability for extremely large individual claims is very small. This is not always a realistic assumption.
1.5 Premiums

For a risk process $R_t$ as above consider the annual premium $\beta$ in relation to the annual risk (i.e. the total claim amount in year $t+1$) $S(t+1) - S(t)$ which, for any $t$, is distributed as $S(1) =: Z$.

Premium calculation principle $H$: Given a nonnegative random variable $Z$, representing a risk, with distribution $\mathcal{L}(Z)$. $H$ is a mapping of $\mathcal{L}(Z)$ onto the premium $\Pi$ required for insuring the risk:
$$H: \mathcal{L}(Z) \mapsto \Pi \qquad (\text{Notation: } H(Z) = \Pi)$$
Examples:
a) pure risk premium: $\Pi = \mathbb{E}Z$ (certain ruin, even for large initial reserves)
b) loading proportional to $\mathbb{E}Z$: $\Pi = (1+\rho)\,\mathbb{E}Z$ for some $\rho > 0$
c) $\sigma$-loading: $\Pi = \mathbb{E}Z + \alpha\, \sigma(Z)$ for some $\alpha > 0$, $\sigma^2(Z) = \operatorname{var} Z$
d) $\sigma^2$-loading: $\Pi = \mathbb{E}Z + \alpha \operatorname{var} Z$ for some $\alpha > 0$
e) covariance loading: $\Pi_1 = \mathbb{E}Z_1 + \alpha\, \mathrm{cov}(Z_1, Z + Z_1)$ as a premium for including a new risk $Z_1$ into an existing portfolio of risks with total value $Z$.
These are heuristic, but practically important principles. In the rest of this section, we discuss two theoretically based principles.

A utility $U(x)$, $x > 0$, measures the value attributed to an amount $x$ of money. Two utilities $U_1, U_2$ are equivalent if $U_1(x) = U_2(x)\,a + b$, $a > 0$. Therefore, as standardization, we require $U(0) = 0$, $U'(0) = 1$.
Intuitively, $U'(x) \ge 0$, i.e. $U(x)$ increases. We usually require a bit more:
$$U \in C^2, \quad U'(x) > 0, \quad U''(x) \le 0 \quad \text{for all } x > 0.$$

Zero-utility principle: expected utility after 1 year $\ge$ zero utility, and $\Pi = H(Z)$ is defined by requiring equality:
$$\mathbb{E}U(\Pi - Z) = U(0) \quad (= 0 \text{ by standardization}).$$
Alternatively, taking initial reserves into account,
$$\mathbb{E}U(u + \Pi - Z) = U(u).$$
Special cases:
i) $U(x) = x$: zero-utility principle $\implies$ pure risk premium
ii) $U(x) = \frac{1}{a}(1 - e^{-ax})$ for some $a > 0$: exponential utility with risk aversion parameter $a$. Then,
zero-utility principle $\implies \Pi = \frac{1}{a} \log \mathbb{E}e^{aZ} = \frac{1}{a}\,\gamma_Z(a)$.
In particular, for $\mathcal{L}(Z) = \mathcal{N}(\mu, \sigma^2)$:
$$\Pi = \mu + \frac{a}{2}\,\sigma^2, \quad \text{i.e. } \sigma^2\text{-loading}.$$

Definition: A premium calculation principle $H$ is additive if for independent risks $Z_1, Z_2$:
$$H(Z_1 + Z_2) = H(Z_1) + H(Z_2).$$

Proposition 1.3 The zero-utility principle is additive iff $U(x) = x$ or
$U(x) = \frac{1}{a}(1 - e^{-ax})$ for some $a > 0$.

Expected value principle: Given $f: [0,\infty) \to [0,\infty)$ continuous and strictly increasing, $\Pi = H(Z)$ is defined by
$$f(\Pi) = \mathbb{E}f(Z) \quad \text{or} \quad \Pi = f^{-1}(\mathbb{E}f(Z)).$$
Special case: $f(x) = e^{ax}$ $\implies$ zero utility principle with exponential utility.

Definition: A premium calculation principle $H$ is iterative if for any random variable $Y$, representing information on the risk $Z$, and any risk $Z$
$$H(Z) = H(H(Z \mid Y))$$
(analogy to: $\mathbb{E}Z = \mathbb{E}(\mathbb{E}(Z \mid Y))$).

Proposition 1.4 $H$ is iterative iff $H$ is an expected value principle for some suitable $f$.

Remark: For $f(x) = e^{ax}$, $H$ is iterative and additive.

Let $\nu > 0$ be given by $\mathbb{E}e^{\nu(Z - \Pi)} = 1$. For initial reserves $u$, ruin occurs if $Z > \Pi + u$. As in proving Theorem 1.3 (but much simpler), a Cramér-type inequality follows:
$\mathrm{pr}(Z > \Pi + u) \le e^{-\nu u}$ for all $u \ge 0$. If $\Pi$ is calculated by the expected value principle with $f(x) = e^{ax}$ ($\iff$ zero utility principle with exponential utility), then $\nu = a$.
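The zero-utility premium with exponential utility is easy to evaluate numerically. A minimal sketch (the Gamma-distributed annual risk $Z$ and the value of $a$ are illustrative assumptions), comparing it with the pure risk premium and the $\sigma^2$-loading:

```python
import numpy as np

rng = np.random.default_rng(2)
a = 0.05                                          # risk aversion parameter (illustrative)
Z = rng.gamma(shape=4.0, scale=5.0, size=200_000) # hypothetical annual risk

pi_pure = Z.mean()                                 # pure risk premium E Z
pi_exp = np.log(np.mean(np.exp(a * Z))) / a        # zero-utility premium, exponential utility
pi_var = Z.mean() + 0.5 * a * Z.var()              # sigma^2-loading (exact for normal Z)

print(pi_pure, pi_exp, pi_var)                     # pi_exp > pi_pure whenever var Z > 0
```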
1.6 Experience rating and credibility theory

Portfolio of risks partitioned into $N$ tariffs or risk classes.
Available data from $T$ years:
$V_{tj}$  insured volume in tariff $j$ in year $t$;  $j = 1, \ldots, N$, $t = 1, \ldots, T$
$X_{tj}$  total claim amount in tariff $j$ in year $t$;  $j = 1, \ldots, N$, $t = 1, \ldots, T$
$\mu_{tj} = \dfrac{X_{tj}}{V_{tj}}$  relative costs in tariff $j$ in year $t$;  $j = 1, \ldots, N$, $t = 1, \ldots, T$

Notation: $V_{\cdot j} = \sum_{t=1}^{T} V_{tj}$, $V_{t\cdot} = \sum_{j=1}^{N} V_{tj}$, $V_{\cdot\cdot} = \sum_{t,j} V_{tj}$; $X_{\cdot j}, X_{t\cdot}, X_{\cdot\cdot}$ analogously.
$\mu_{\cdot j} = X_{\cdot j}/V_{\cdot j}$, $\mu_{t\cdot} = X_{t\cdot}/V_{t\cdot}$, $\mu_{\cdot\cdot} = X_{\cdot\cdot}/V_{\cdot\cdot}$.
The $\mu$'s are also called loss ratios.

Credibility problem: Estimate the expected loss ratio in each tariff:
$$\mu_j = \mathbb{E}(\mu_{\cdot j}) = \,?$$
Naive solutions: $\hat\mu_j = \mu_{\cdot j}$ (ignoring large variability in small classes)
$\hat\mu_j = \mu_{\cdot\cdot}$ (ignoring differences between classes)

Alternative approach: $\hat\mu_j = \alpha_j\, \mu_{\cdot j} + (1 - \alpha_j)\, \mu_0$ with
$$\alpha_j = \frac{w\, V_{\cdot j}}{v + w\, V_{\cdot j}}, \qquad w = \text{variance of the } X_{\cdot j} \text{ within the portfolio}, \quad v = \text{variance of the } X_{tj} \text{ in time},$$
$$\mu_0 = \sum_{i=1}^{N} \frac{\alpha_i}{\alpha}\, \mu_{\cdot i}, \qquad \alpha = \sum_{i=1}^{N} \alpha_i.$$
In general: $\mu_0 \ne \mu_{\cdot\cdot}$.

Assumption: Each risk class is characterized by a risk parameter $\theta_j$, $j = 1, \ldots, N$, such that:
$$\mathbb{E}[\mu_{tj} \mid \theta_j = \theta] = \mu(\theta), \qquad \operatorname{var}[\mu_{tj} \mid \theta_j = \theta] = \frac{1}{V_{tj}}\, \sigma^2(\theta).$$

Example: Assuming a Cramer-Lundberg model for each risk with identical distribution $F$ of the individual claim amounts for all risk classes and Poisson-intensity $\lambda_j$ of the claim occurrence times in risk class $j = 1, \ldots, N$, we choose $\theta_j = \lambda_j$, i.e. risk parameter = intensity of claims. As the individual claims have the same distribution and are insured in the same manner:
volume $V_{tj}$ = number of risks insured in class $j$ and year $t$.
Then, using the notation of section 1.1,
$$X_{tj} \,{=}_d\, \sum_{k=1}^{V_{tj}} S_k(1) \qquad (=_d \text{ stands for "is distributed as"})$$
where $S_1(1), S_2(1), \ldots$ are independent annual claim amounts distributed like
$$S_k(1) \,{=}_d\, \sum_{l=1}^{N(1)} X_l$$
with $N(1), X_1, X_2, \ldots$ independent, $\mathcal{L}(N(1)) = \mathrm{Poi}(\lambda_j)$, $\mathcal{L}(X_l) = F$. By Corollary 1.1 and using a similar argument for the variance:
$$\mathbb{E}S_k(1) = \mu\, \mathbb{E}N(1) = \mu \lambda_j, \qquad \operatorname{var} S_k(1) = \mu^2 \operatorname{var} N(1) + \sigma^2\, \mathbb{E}N(1) = (\mu^2 + \sigma^2)\,\lambda_j$$
with $\mu = \mathbb{E}X_l$, $\sigma^2 = \operatorname{var} X_l$. We get $\mathbb{E}[X_{tj} \mid \lambda_j] = V_{tj}\, \mu \lambda_j$ and $\operatorname{var}[X_{tj} \mid \lambda_j] = V_{tj}(\mu^2 + \sigma^2)\lambda_j$.
Therefore,
$$\mathbb{E}[\mu_{tj} \mid \lambda_j] = \frac{1}{V_{tj}}\, \mathbb{E}[X_{tj} \mid \lambda_j] = \mu\lambda_j \equiv \mu(\lambda_j)$$
$$\operatorname{var}[\mu_{tj} \mid \lambda_j] = \frac{1}{V_{tj}^2} \operatorname{var}[X_{tj} \mid \lambda_j] = \frac{1}{V_{tj}}(\mu^2 + \sigma^2)\lambda_j \equiv \frac{1}{V_{tj}}\, \sigma^2(\lambda_j).$$

Bayesian approach: The unknown risk parameters $\theta_1, \ldots, \theta_N$ are treated as i.i.d. random variables with a-priori distribution $G$ and distribution function $G(u) = \mathrm{pr}(\theta_j \le u)$, called the structure function of the portfolio. $\operatorname{var}(\mu(\theta_j))$ is a measure for the heterogeneity of the risks.

Assumption: The data $V_{tj}, \mu_{tj}$, $t = 1, \ldots, T$, $j = 1, \ldots, N$ are given, where $\mathbb{E}[\mu_{tj} \mid \theta_j] = \mu(\theta_j)$, $\operatorname{var}[\mu_{tj} \mid \theta_j] = \frac{1}{V_{tj}}\sigma^2(\theta_j)$ for all $t, j$, and $\mu_{tj}, \mu_{si}$ are conditionally independent given $\theta = (\theta_1, \ldots, \theta_N)$ if $t \ne s$ or $j \ne i$.

Notation: $m = \mathbb{E}\mu(\theta_j) = \int \mu(u)\, dG(u)$, $v = \mathbb{E}\sigma^2(\theta_j)$, $w = \operatorname{var}(\mu(\theta_j))$

A simple formulation of the credibility problem: For any given risk class $k$, estimate the average loss ratio $\mathbb{E}[\mu_{tk} \mid \theta_k] = \mu(\theta_k)$ by a linear, unbiased estimate $\hat\mu_k$:
$$\hat\mu_k = \sum_{t,j} a_{tj}\, \mu_{tj}, \qquad \mathbb{E}\hat\mu_k = \mathbb{E}\mu(\theta_k) \quad (\text{recall: } \theta_k \text{ random!})$$
such that
$$\mathbb{E}(\hat\mu_k - \mu(\theta_k))^2 = \min_{a_{tj}}! \qquad \text{(the } a_{tj} \text{ depend on } k\,!)$$
Using
$$\mathbb{E}(\mu_{tj}\, \mu_{si}) = \mathbb{E}\big(\mathbb{E}[\mu_{tj}\mu_{si} \mid \theta]\big) = m^2 + \delta_{ij}\, w + \delta_{ij}\delta_{st}\, \frac{v}{V_{tj}},$$
$$\mathbb{E}(\mu_{tj}\, \mu(\theta_k)) = m^2 + \delta_{jk}\, w, \qquad \mathbb{E}\mu^2(\theta_k) = m^2 + w,$$
this reduces to the following constrained minimization problem:
$$\min_{a_{tj}} \ \sum_{t,j,s,i} a_{tj} a_{si} \Big[ m^2 + \delta_{ij} w + \delta_{ij}\delta_{st}\, \frac{v}{V_{tj}} \Big] \;-\; 2 \sum_{t,j} a_{tj}\, [m^2 + \delta_{jk} w] \;+\; [m^2 + w]$$
under the constraint
$$\sum_{t,j} a_{tj} = 1$$
with solution:
$$a_{tj} = \frac{V_{tj}}{V_{\cdot j}} \Big[ \frac{\alpha_j}{\alpha}(1 - \alpha_k) + \delta_{jk}\, \alpha_k \Big]$$
and
$$\hat\mu_k = \sum_{t,j} a_{tj}\, \mu_{tj} = \underbrace{\alpha_k\, \mu_{\cdot k}}_{\text{individual experience}} \;+\; (1 - \alpha_k) \underbrace{\sum_{j=1}^{N} \frac{\alpha_j}{\alpha}\, \mu_{\cdot j}}_{=\, \mu_0,\ \text{overall experience}}.$$
The credibility factor $\alpha_j = w\, V_{\cdot j}/(w\, V_{\cdot j} + v)$ depends on the unknowns $w, v$.
Unbiased estimates are: $\hat v = V_{\cdot\cdot}\, s_t^2$, $\hat w = \dfrac{1}{Q}(s_g^2 - s_t^2)$ with
$$s_t^2 = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{T-1} \sum_{t=1}^{T} \frac{V_{tj}}{V_{\cdot\cdot}} (\mu_{tj} - \mu_{\cdot j})^2$$
$$s_g^2 = \frac{1}{TN - 1} \sum_{t,j} \frac{V_{tj}}{V_{\cdot\cdot}} (\mu_{tj} - \mu_{\cdot\cdot})^2$$
$$Q = \frac{1}{TN - 1} \sum_{j=1}^{N} \frac{V_{\cdot j}}{V_{\cdot\cdot}} \Big(1 - \frac{V_{\cdot j}}{V_{\cdot\cdot}}\Big)$$
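The credibility estimates $\hat\mu_k$ and the variance components above translate directly into array code. A minimal sketch (the toy data are purely illustrative; $V$ and $X$ are $T \times N$ arrays of volumes and total claim amounts):

```python
import numpy as np

def credibility(V, X):
    """Empirical credibility estimates from (T, N) arrays of volumes / claim amounts."""
    T, N = V.shape
    mu = X / V                                   # loss ratios mu_{tj}
    Vdotj, Vdd = V.sum(axis=0), V.sum()
    mudotj = X.sum(axis=0) / Vdotj               # mu_{.j}
    mudd = X.sum() / Vdd                         # mu_{..}

    # variance components (unbiased estimates of section 1.6)
    s2_t = np.mean(((V / Vdd) * (mu - mudotj) ** 2).sum(axis=0) / (T - 1))
    s2_g = ((V / Vdd) * (mu - mudd) ** 2).sum() / (T * N - 1)
    Q = ((Vdotj / Vdd) * (1 - Vdotj / Vdd)).sum() / (T * N - 1)
    v_hat = Vdd * s2_t
    w_hat = max((s2_g - s2_t) / Q, 0.0)          # negative estimates are usually truncated at 0

    alpha = w_hat * Vdotj / (v_hat + w_hat * Vdotj)    # credibility factors alpha_j
    mu0 = (alpha * mudotj).sum() / alpha.sum()          # overall experience mu_0
    return alpha * mudotj + (1 - alpha) * mu0           # credibility estimate per class

# toy example: 5 years, 3 tariff classes (numbers purely illustrative)
rng = np.random.default_rng(3)
V = rng.uniform(50, 150, size=(5, 3))
X = V * rng.gamma(shape=10.0, scale=0.01, size=(5, 3))
print(credibility(V, X))
```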
1.7 Cramer-Lundberg Theory for Large Claims

Notation: For claim sizes $X_j$ with tail distribution $\bar F(x) = \mathrm{pr}(X_j > x)$ and $\mu = \mathbb{E}X_j$, the integrated tail distribution is given by
$$F_I(x) = \frac{1}{\mu} \int_0^x \bar F(y)\, dy.$$
$F_I$ itself is a distribution function on $(0, \infty)$. In terms of $F_I$, the C-L-condition is: there is a $\nu > 0$ such that
$$\int_0^{\infty} e^{\nu x}\, dF_I(x) = \frac{1}{\mu} \int_0^{\infty} e^{\nu x}\, \bar F(x)\, dx \stackrel{!}{=} \frac{\beta}{\lambda\mu} = 1 + \rho$$
with net profit condition $\rho = \dfrac{\beta}{\lambda\mu} - 1 > 0$.

Definition: A distribution $G$ on $(0,\infty)$ belongs to the class $\mathcal{K}$ if, for all $\varepsilon > 0$,
$$\int_0^{\infty} e^{\varepsilon x}\, dG(x) = \infty.$$
If $F_I \in \mathcal{K}$, the corresponding claim size distribution cannot satisfy the C-L-condition.

Definition: a) A positive, measurable function $L$ on $(0,\infty)$ is slowly varying ($L \in \mathcal{R}_0$) if
$$\lim_{x\to\infty} \frac{L(tx)}{L(x)} = 1 \quad \text{for all } t > 0.$$
b) A positive, measurable function $H$ is regularly varying of index $\rho \in \mathbb{R}$ ($H \in \mathcal{R}_\rho$) if
$$\lim_{x\to\infty} \frac{H(tx)}{H(x)} = t^{\rho} \quad \text{for all } t > 0.$$

Remark: If $H \in \mathcal{R}_\rho$, then $L(x) := \dfrac{H(x)}{x^{\rho}} \in \mathcal{R}_0$, i.e. $H(x) = x^{\rho} L(x)$. If $\bar F(x) \in \mathcal{R}_{-\alpha}$ for some $\alpha > 0$, then $F_I \in \mathcal{K}$.

Lemma 1.4 Let $X, Y$ be independent r.v. with tail distributions $\bar F_1(x) = \frac{1}{x^{\alpha}} L_1(x)$, $\bar F_2(x) = \frac{1}{x^{\alpha}} L_2(x) \in \mathcal{R}_{-\alpha}$, for some $\alpha > 0$. Then, the tail distribution $\bar G(z) = \mathrm{pr}(X + Y > z)$ of $X + Y$ is in $\mathcal{R}_{-\alpha}$, too. More precisely:
$$\bar G(x) \sim \frac{1}{x^{\alpha}}\, (L_1(x) + L_2(x)) \quad \text{for } x \to \infty.$$

Corollary 1.4 Let $X_1, X_2, \ldots$ be i.i.d. with tail distribution $\bar F(x) = \frac{1}{x^{\alpha}} L_1(x) \in \mathcal{R}_{-\alpha}$ for some $\alpha > 0$. Then, the tail distribution of $S_n = X_1 + \ldots + X_n$ satisfies
$$\mathrm{pr}(S_n > x) = \overline{F^{*n}}(x) \sim n\, \bar F(x) \quad \text{for } x \to \infty, \text{ for all } n \ge 1.$$
($F^{*n} = F * \ldots * F = \mathcal{L}(S_n)$, the $n$-fold convolution)

Let $M_n = \max(X_1, \ldots, X_n)$. Then, $\mathrm{pr}(M_n \le x) = \mathrm{pr}(X_1 \le x, \ldots, X_n \le x) = F^n(x)$, and the tail distribution of $M_n$ satisfies:
$$1 - F^n(x) = \mathrm{pr}(M_n > x) = \bar F(x) \sum_{k=0}^{n-1} F^k(x) \sim \bar F(x)\, n$$
for $x \to \infty$ and all $n \ge 1$.

Corollary 1.5 Let $X_1, X_2, \ldots$ be i.i.d. with tail distribution $\bar F(x) \in \mathcal{R}_{-\alpha}$ for some $\alpha > 0$. Then, $\mathrm{pr}(S_n > x) \sim \mathrm{pr}(M_n > x)$ for $x \to \infty$, for all $n \ge 1$.

Let $\psi(u)$ be the ruin probability, $\delta(u) = 1 - \psi(u)$ as in section 1.4. Using the integrated tail distribution, Theorem 1.2 can be reformulated as
a) $\delta(y) - \delta(0) = \dfrac{1}{1+\rho} \displaystyle\int_0^y \delta(y-x)\, dF_I(x)$
b) $\delta(0) = \dfrac{\rho}{1+\rho}$

Corollary 1.6 Under the assumptions of Theorem 1.2:
$$\delta(y) = \frac{\rho}{1+\rho} \sum_{n=0}^{\infty} \frac{1}{(1+\rho)^n}\, F_I^{*n}(y), \qquad y \ge 0.$$

Interpretation: If $Z_1, Z_2, \ldots$ are i.i.d. with distribution $F_I$, independent of a random $N$ which is geometrically distributed with parameter $q = \frac{1}{1+\rho}$, then
$$\delta(y) = \mathrm{pr}\Big( \sum_{j=1}^{N} Z_j \le y \Big).$$
Corollary 1.6 is, therefore, called a geometric representation of the non-ruin probability.
Recall: $N$ is geometrically distributed if $\mathrm{pr}(N = n) = (1-q)\,q^n$, $n = 0, 1, 2, \ldots$

Analogously, we can derive a geometric representation of the ruin probability
$$\psi(y) = \mathrm{pr}\Big( \sum_{j=1}^{N} Z_j > y \Big) = \frac{\rho}{1+\rho} \sum_{n=0}^{\infty} \frac{1}{(1+\rho)^n}\, \overline{F_I^{*n}}(y).$$
Therefore, if $\bar F_I \in \mathcal{R}_{-\alpha}$ for some $\alpha > 0$, we have from Corollary 1.4:
$$\frac{\psi(u)}{\bar F_I(u)} \longrightarrow \frac{\rho}{1+\rho} \sum_{n=0}^{\infty} \frac{1}{(1+\rho)^n}\, n = \frac{1}{\rho},$$
provided that $\sum_{n=0}^{\infty}$ and $\lim_{u\to\infty}$ may be interchanged (which is the case - compare Theorem 1.4). As a consequence, we get that, for large initial reserves $u$, the ruin probability is essentially determined by the tail of the claim size distribution:
$$\psi(u) \underset{u\to\infty}{\sim} \frac{1}{\rho}\, \bar F_I(u) = \frac{1}{\rho\mu} \int_u^{\infty} \bar F(y)\, dy.$$

Definition: A distribution $G$ with support $(0,\infty)$ is subexponential if $\lim_{x\to\infty} \dfrac{\overline{G^{*n}}(x)}{\bar G(x)} = n$ for all $n \ge 1$. Notation: $G \in \mathcal{S}$.

Remark: If $\bar G \in \mathcal{R}_{-\alpha}$, $\alpha > 0$, then $G \in \mathcal{S}$ (by Corollary 1.4). Corollary 1.5 holds for $F \in \mathcal{S}$, too.

Lemma 1.5 $G \in \mathcal{S}$ iff $\limsup_{x\to\infty} \dfrac{\overline{G^{*2}}(x)}{\bar G(x)} \le 2$.

Lemma 1.6 a) If $G \in \mathcal{S}$, then $\lim_{x\to\infty} \dfrac{\bar G(x - y)}{\bar G(x)} = 1$ uniformly in $y \in C$, $C$ compact.
b) If the assertion of a) holds, then $\lim_{x\to\infty} e^{\varepsilon x}\, \bar G(x) = \infty$ for all $\varepsilon > 0$
(explanation of the term subexponential)
c) If $G \in \mathcal{S}$, then for any $\varepsilon > 0$ there is a $K > 0$ such that
$$\frac{\overline{G^{*n}}(x)}{\bar G(x)} \le K(1 + \varepsilon)^n \quad \text{for all } n \ge 1, \ x \ge 0.$$

Theorem 1.5 (Cramer-Lundberg Theorem for large claims): Let $R_t$ be a Cramer-Lundberg process with net profit condition $\rho = \frac{\beta}{\lambda\mu} - 1 > 0$ and $F_I \in \mathcal{S}$. Then, the ruin probability satisfies
$$\psi(u) \sim \frac{1}{\rho}\, \bar F_I(u) = \frac{1}{\rho\mu} \int_u^{\infty} \bar F(y)\, dy \quad \text{for } u \to \infty.$$
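The geometric representation gives a direct Monte Carlo scheme for $\psi(u)$: simulate the geometric compound sum of $F_I$-distributed summands. The sketch below (Pareto claims with tail $\bar F(x) = x^{-\alpha}$, $x \ge 1$, and all parameter values as illustrative assumptions) compares the simulation with the large-claim approximation $\psi(u) \approx \frac{1}{\rho}\bar F_I(u)$ of Theorem 1.5.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, lam, beta = 1.5, 1.0, 4.0          # Pareto tail index, claim intensity, premium rate
mu = alpha / (alpha - 1.0)                 # mean of the Pareto claims on (1, inf)
rho = beta / (lam * mu) - 1.0              # safety loading
q = 1.0 / (1.0 + rho)

def sample_FI(size):
    """Draw from the integrated tail distribution F_I for Fbar(x) = x^(-alpha), x >= 1."""
    U = rng.uniform(size=size)
    small = U <= 1.0 / mu                  # F_I(x) = x/mu for x <= 1
    out = np.empty(size)
    out[small] = mu * U[small]
    out[~small] = (1.0 - (alpha - 1.0) * (mu * U[~small] - 1.0)) ** (1.0 / (1.0 - alpha))
    return out

def psi_mc(u, reps=50_000):
    """Monte Carlo ruin probability via the geometric compound representation."""
    N = rng.geometric(1.0 - q, size=reps) - 1      # pr(N = n) = (1-q) q^n, n = 0, 1, 2, ...
    total = np.array([sample_FI(n).sum() if n > 0 else 0.0 for n in N])
    return np.mean(total > u)

def psi_approx(u):
    """Large-claim approximation (1/rho) * Fbar_I(u), u >= 1."""
    return u ** (1.0 - alpha) / (mu * (alpha - 1.0)) / rho

for u in (10.0, 50.0, 200.0):
    print(u, psi_mc(u), psi_approx(u))
```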
Consider now a general renewal model for which $N(t) = \#\{j;\, T_j \le t\}$, the number of claims up to time $t$, has a general distribution given by
$$\mathrm{pr}(N(t) = n) = p_t(n), \qquad n \ge 0.$$
Let $G_t(x) = \mathrm{pr}(S(t) \le x)$ be the distribution function of the total claim amount $S(t) = \sum_{j=1}^{N(t)} X_j$ up to time $t$, which, using independence, is
$$G_t(x) = \sum_{n=0}^{\infty} \mathrm{pr}(N(t) = n,\, S_n \le x) = \sum_{n=0}^{\infty} p_t(n)\, F^{*n}(x).$$
Let $R_t$ be a renewal model risk process with $F = \mathcal{L}(X_j) \in \mathcal{S}$, and, for fixed $t > 0$:
$$\sum_{n=0}^{\infty} p_t(n)\,(1 + \varepsilon)^n < \infty \quad \text{for some } \varepsilon > 0.$$
Then, $G_t \in \mathcal{S}$, and $\bar G_t(x) = \mathrm{pr}(S(t) > x) \sim \bar F(x)\, \mathbb{E}N(t)$ for $x \to \infty$.
In the Cramer-Lundberg case: $\mathcal{L}(N(t)) = \mathrm{Poi}(\lambda t)$ and $\bar G_t(x) \sim \lambda t\, \bar F(x)$.

Example 1.1
$$p_t(n) = \binom{\gamma + n - 1}{n} \Big( \frac{\beta}{\beta + t} \Big)^{\gamma} \Big( \frac{t}{\beta + t} \Big)^{n}, \qquad n \ge 0,$$
i.e. $N(t)$ is a negative binomial process. The above condition on $p_t(n)$ is satisfied and
$$\mathbb{E}N(t) = \frac{\gamma t}{\beta}$$
such that $\bar G_t(x) \sim \frac{\gamma t}{\beta}\, \bar F(x)$. Apart from the Cramer-Lundberg assumption of a homogeneous Poisson process, this is the most frequent model for the claim number distribution in insurance practice. It is appropriate in the case of overdispersion, i.e. where $\operatorname{var} N(t) > \mathbb{E}N(t)$. In the negative binomial case: $\operatorname{var} N(t) = \mathbb{E}N(t)\,(1 + \frac{t}{\beta})$. For the Poisson process we have in contrast $\mathbb{E}N(t) = \operatorname{var} N(t)$. One cause for overdispersion is intersubject variability where the intensity of claims varies between subjects, i.e. one observes a mixed Poisson process:

Definition 1.5 Let $\Lambda$ be a positive random variable, independent of a homogeneous Poisson process $N_1(t)$ with intensity 1. The process $N(t) = N_1(\Lambda t)$ is a mixed Poisson process.
If $\Lambda$ is Gamma-distributed, i.e. its density is $f_\Lambda(y) = \frac{\beta^{\gamma}}{\Gamma(\gamma)}\, y^{\gamma - 1} e^{-\beta y}$, $y > 0$, then $N(t)$ is a negative binomial process.
If $\mathrm{pr}(\Lambda = \lambda) = 1$, $N(t)$ is a homogeneous Poisson process with intensity $\lambda$.
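A negative binomial claim-number process is easily simulated as a mixed Poisson process by first drawing the Gamma intensity. The following sketch (Gamma parameters and horizon are illustrative assumptions) checks the overdispersion relation $\operatorname{var} N(t) = \mathbb{E}N(t)(1 + t/\beta)$ by simulation:

```python
import numpy as np

rng = np.random.default_rng(5)
gamma_, beta_, t = 3.0, 2.0, 5.0          # illustrative Gamma mixing parameters and horizon

Lam = rng.gamma(shape=gamma_, scale=1.0 / beta_, size=100_000)   # Lambda ~ Gamma(gamma, beta)
N_t = rng.poisson(Lam * t)                                        # N(t) = N_1(Lambda * t)

print("mean:", N_t.mean(), " theory:", gamma_ * t / beta_)
print("var :", N_t.var(), " theory:", gamma_ * t / beta_ * (1 + t / beta_))
```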
$F \in \mathcal{S}$ does in general not imply $F_I \in \mathcal{S}$ and vice versa. Therefore, conditions for $F_I \in \mathcal{S}$ are of interest. Some of them use the hazard rate $q(x) = f(x)/\bar F(x)$, known from survival analysis. If $\tau$ is a survival time with distribution $F$ and density $f$, then
$$q(x)\,dx = \frac{f(x)\,dx}{\bar F(x)} \approx \frac{1}{\bar F(x)} \int_x^{x + dx} f(t)\, dt = \mathrm{pr}(\tau \le x + dx \mid \tau > x).$$

Lemma 1.7 a) If $\limsup_{x\to\infty} x\, q(x) < \infty$, then $F_I \in \mathcal{S}$.
b) $F_I \in \mathcal{S}$ if $\lim_{x\to\infty} x\, q(x) = \infty$, $\lim_{x\to\infty} q(x) = 0$ and one of the following conditions holds:
(i) $\limsup_{x\to\infty} \dfrac{x\, q(x)}{-\log \bar F(x)} < 1$
(ii) $q \in \mathcal{R}_{-\delta}$ for some $0 < \delta < 1$

As a consequence, for the following list of heavy-tailed distributions we have $F \in \mathcal{S}$ and $F_I \in \mathcal{S}$, provided the first moment exists:
Pareto, Weibull (shape parameter $\tau < 1$), lognormal, Benktander type I and II, Burr, loggamma.
2 Fluctuations of sums and maxima

Throughout this chapter: $X_1, X_2, \ldots, X$ i.i.d. with distribution $\mathcal{L}(X_j) = F$,
$$S_n = X_1 + \ldots + X_n, \qquad \bar X_n = \frac{1}{n} S_n, \qquad M_n = \max(X_1, \ldots, X_n).$$

2.1 Limit behaviour of sums

Classical central limit theorem (CLT):
$$\operatorname{var} X_j = \sigma^2 < \infty, \ \mathbb{E}X_j = \mu \implies \frac{\sqrt{n}\,(\bar X_n - \mu)}{\sigma} = \frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1).$$

Definition: A distribution $G$ is stable if for i.i.d. $Z_0, Z_1, Z_2$ with law $G$ and for all $c_1, c_2 > 0$, there are $b > 0$ and $a$ such that
$$\mathcal{L}(c_1 Z_1 + c_2 Z_2) = \mathcal{L}(b Z_0 + a).$$
As a consequence, if $F$ is stable there are normalizing sequences $b_n > 0$, $a_n$, such that
$$\mathcal{L}\Big( \frac{S_n - a_n}{b_n} \Big) = \mathcal{L}(X_1), \qquad n \ge 1.$$

Theorem 2.1 If for some $b_n > 0$, $a_n$:
$$\frac{S_n - a_n}{b_n} \xrightarrow{d} Z$$
with nondegenerate $\mathcal{L}(Z) = G$, then $G$ is stable.

Theorem 2.2 (Characterization of stable laws): $G = \mathcal{L}(Z)$ is stable iff there are $\mu \in \mathbb{R}$, $c > 0$, $\alpha \in (0, 2]$, $\beta \in [-1, +1]$ such that the characteristic function of $Z$ is
$$\varphi_Z(u) = \mathbb{E}e^{iuZ} = \exp\big\{ i\mu u - c|u|^{\alpha}\, \big(1 - i\beta\, \mathrm{sgn}(u)\, z(u, \alpha)\big) \big\}$$
with $z(u, \alpha) = \tan\frac{\pi\alpha}{2}$ for $\alpha \ne 1$ and $= -\frac{2}{\pi}\log|u|$ for $\alpha = 1$.

If the location parameter $\mu$ and the skewness parameter $\beta$ are both 0, then $G$ is symmetric around 0, and we talk of a symmetric $\alpha$-stable (s$\alpha$s) distribution. In this case:
$$\varphi_Z(u) = e^{-c|u|^{\alpha}}.$$
If $\alpha = 2$, $\beta = 0$, then $G = \mathcal{N}(\mu, \sigma^2)$ with $\sigma^2 = 2c$. In general, $c$ is a dispersion parameter. $\alpha$ is called the characteristic exponent and determines the tail behaviour. For $\alpha = 1$, the s$\alpha$s-distribution is the Cauchy distribution.

Remark: If $G$ is $\alpha$-stable, $\mathcal{L}(Z) = G$, then, for $\alpha < 2$,
$$\mathbb{E}|Z|^{\delta} < \infty \iff \delta < \alpha.$$

Definition: The random variable $X$ with $\mathcal{L}(X) = F$ belongs to the domain of attraction $\mathrm{DA}(G_\alpha)$ if for suitable $b_n > 0$, $a_n$:
$$\frac{S_n - a_n}{b_n} \xrightarrow{d} G_\alpha.$$
$X \in \mathrm{DA}(\alpha)$ if $X \in \mathrm{DA}(G_\alpha)$ for some $\alpha$-stable law $G_\alpha$.

Theorem 2.3 a) $X \in \mathrm{DA}(2)$, i.e. the domain of attraction of the normal law, iff $V(x) = \int_{-x}^{x} y^2\, dF(y) = \mathbb{E}\big[X^2 \mathbf{1}_{[-x,x]}(X)\big]$ is a slowly varying function.
b) $X \in \mathrm{DA}(\alpha)$ for some $\alpha < 2$ iff there are a slowly varying function $L(x)$ and constants $c_1, c_2 \ge 0$, $c_1 + c_2 > 0$ such that
$$F(x) \sim \frac{c_1 L(x)}{|x|^{\alpha}} \ \text{ for } x \to -\infty \quad \text{and} \quad \bar F(x) \sim \frac{c_2 L(x)}{|x|^{\alpha}} \ \text{ for } x \to +\infty.$$
If $\mathbb{E}X^2 < \infty$, then $V(x) \to \mathbb{E}X^2$ and, therefore, $X \in \mathrm{DA}(2)$. Otherwise, $V(x)$ is slowly varying (i.e. $\in \mathcal{R}_0$) iff the following tail condition is satisfied:
$$\mathrm{pr}(|X| > z) = o\Big( \frac{1}{z^2}\, V(z) \Big).$$
Analogously, if $X \in \mathrm{DA}(\alpha)$ for some $\alpha < 2$:
$$\mathrm{pr}(|X| > z) = \frac{L(z)}{z^{\alpha}} \ \text{ for some } L \in \mathcal{R}_0 \quad \text{and} \quad \frac{z^2\, \mathrm{pr}(|X| > z)}{V(z)} \to \frac{2 - \alpha}{\alpha} \ \text{ for } z \to \infty.$$

Corollary 2.1 If $X \in \mathrm{DA}(\alpha)$, then:
$$\mathbb{E}|X|^{\delta} < \infty \ \text{ for } \delta < \alpha, \qquad \mathbb{E}|X|^{\delta} = \infty \ \text{ for } \delta > \alpha, \ \alpha < 2.$$

Proposition 2.1 a) Let $Q(z) = \mathrm{pr}(|X| > z) + \frac{1}{z^2}\, V(z)$.
Then, $b_n$ can be chosen as a solution of $Q(b_n) = \frac{1}{n}$, $n \ge 1$. In particular, $b_n \sim \sigma\sqrt{n}$ for $\sigma^2 = \operatorname{var} X < \infty$ and $\mathbb{E}X = 0$, and $b_n = n^{1/\alpha} L(n)$ for some $L(n) \in \mathcal{R}_0$ if $\alpha < 2$.
b) $a_n$ can be chosen as
$$a_n = \begin{cases} n\mu & \text{if } 1 < \alpha \le 2, \ \mu = \mathbb{E}X \\ 0 & \text{if } 0 < \alpha < 1 \\ 0 & \text{if } \alpha = 1 \text{ and } F \text{ symmetric} \end{cases}$$

Theorem 2.4 (General CLT for i.i.d. random variables): Let $X \in \mathrm{DA}(\alpha)$ for some $0 < \alpha \le 2$. Then,
a) If $\mathbb{E}X^2 < \infty$,
$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1).$$
b) If $\mathbb{E}X^2 = \infty$, then
$$\frac{S_n - a_n}{n^{1/\alpha} L(n)} \xrightarrow{d} G_\alpha$$
for an $\alpha$-stable law $G_\alpha$, where $L \in \mathcal{R}_0$ and $a_n$ as in Proposition 2.1.

Corollary 2.2 Theorem 2.4 b) holds for $L(n) \equiv c$ iff $\alpha < 2$ and the condition of Theorem 2.3 holds with $L(x) \equiv c$.

Let $0 \le T_1 \le T_2 \le \ldots$ be an increasing sequence of random variables,
$$N(t) = \max\{k;\, T_k \le t\}, \qquad S(t) = \sum_{j=1}^{N(t)} X_j.$$
Let $T_1, T_2, \ldots$ be independent of $X_1, X_2, \ldots$

Theorem 2.5 (Anscombe's CLT): Suppose $X \in \mathrm{DA}(\alpha)$ for some $0 < \alpha \le 2$ and
$$\frac{N(t)}{t} \xrightarrow{p} \lambda > 0.$$
Then, Theorem 2.4 holds with $N(t)$ replacing $n$ and $S(t)$ replacing $S_n$. Moreover,
$$\frac{S(t) - a_{N(t)}}{(\lambda t)^{1/\alpha}\, L(\lambda t)} \xrightarrow{d} G_\alpha,$$
i.e. $N(t)$ may be replaced by its asymptotic approximation $\lambda t$ in the denominator.
2.2 Limit behaviour of maxima

The distribution function of $M_n = \max(X_1, \ldots, X_n)$ is given by
$$\mathrm{pr}(M_n \le x) = \mathrm{pr}(X_1 \le x, \ldots, X_n \le x) = F^n(x).$$
Let $x_F = \sup\{x \in \mathbb{R};\, F(x) < 1\}$. Then, for $n \to \infty$,
$$F^n(x) \to 0 \ \text{ for } x < x_F, \qquad F^n(x) \to 1 \ \text{ for } x > x_F,$$
i.e. $M_n \xrightarrow{p} x_F$ (also: $M_n \xrightarrow{a.s.} x_F$ due to monotonicity). To get an interesting limit behaviour we have to standardize $M_n$.

Definition 2.1 The random variable $X$ belongs to the maximum domain of attraction $\mathrm{MDA}(H)$ of a nondegenerate law $H$ if for suitable $c_n > 0$, $d_n$:
$$\frac{M_n - d_n}{c_n} \xrightarrow{d} H,$$
i.e. $F^n(c_n x + d_n) \to H(x)$ for all points $x$ of continuity of $H(x)$.

Definition 2.2 Extreme value distributions with distribution functions
a) Frechet: $\Phi_\alpha(x) = \exp\{-x^{-\alpha}\}$, $x \ge 0$, for some $\alpha > 0$
b) Gumbel: $\Lambda(x) = \exp\{-e^{-x}\}$, $x \in \mathbb{R}$
c) Weibull: $\Psi_\alpha(x) = \exp\{-|x|^{\alpha}\}$, $x \le 0$, for some $\alpha > 0$
The Frechet distributions are supported on $[0, \infty)$, the Weibull distributions on $(-\infty, 0]$.

Definition 2.3 The generalized extreme value distribution (GEV) with shape parameter $\xi \in \mathbb{R}$ has the distribution function:
$$H_\xi(x) = \exp\{-(1 + \xi x)^{-1/\xi}\}, \quad 1 + \xi x > 0, \ \text{ for } \xi \ne 0; \qquad H_0(x) = \Lambda(x).$$
This notion just combines the three different distributions of the previous definition:
$$H_\xi\Big( \frac{x - 1}{\xi} \Big) = \Phi_{1/\xi}(x) \ \text{ for } \xi > 0, \qquad H_\xi\Big( -\frac{x + 1}{\xi} \Big) = \Psi_{-1/\xi}(x) \ \text{ for } \xi < 0.$$
The definition describes the standard forms. In general, we may apply shifts and scale transformations to get other GEV-laws: $H(x) = H_\xi(\frac{x - \mu}{\sigma})$ for some $\mu \in \mathbb{R}$, $\sigma > 0$. In the asymptotic theory this does not matter as the standardizing sequences can always be chosen such that the limit is in standard form ($\mu = 0$, $\sigma = 1$).

Theorem 2.6 (Fisher-Tippett): If there are $c_n > 0$, $d_n$ and a non-degenerate $H$ such that
$$\frac{M_n - d_n}{c_n} \xrightarrow{d} H,$$
then $H$ is a GEV distribution.

Lemma 2.1 (Convergence of types theorem): Let $U_1, U_2, \ldots, V, W$ be random variables, $b_n, \beta_n > 0$, $a_n, \alpha_n \in \mathbb{R}$. Suppose:
$$\frac{U_n - a_n}{b_n} \xrightarrow{d} V.$$
Then:
$$\frac{U_n - \alpha_n}{\beta_n} \xrightarrow{d} W \quad \text{iff} \quad \frac{b_n}{\beta_n} \to b \ge 0, \quad \frac{a_n - \alpha_n}{\beta_n} \to a \in \mathbb{R}.$$
In this case: $\mathcal{L}(W) = \mathcal{L}(bV + a)$.

Sketch of proof of Theorem 2.6: $t > 0$, $[\,\cdot\,]$ = integer part, $\frac{M_n - d_n}{c_n} \xrightarrow{d} H$. As $F^{[nt]}$ is the distribution function of $M_{[nt]}$,
$$F^{[nt]}(c_{[nt]} x + d_{[nt]}) \to H(x) \ \text{ for } [nt] \to \infty, \text{ i.e. } n \to \infty.$$
On the other hand,
$$F^{[nt]}(c_n x + d_n) = \big(F^n(c_n x + d_n)\big)^{[nt]/n} \to H^t(x) \ \text{ for } n \to \infty.$$
Therefore:
$$\frac{M_{[nt]} - d_{[nt]}}{c_{[nt]}} \xrightarrow{d} H, \qquad \frac{M_{[nt]} - d_n}{c_n} \xrightarrow{d} H^t.$$
By Lemma 2.1:
$$\frac{c_n}{c_{[nt]}} \to \gamma(t) \ge 0, \qquad \frac{d_n - d_{[nt]}}{c_{[nt]}} \to \delta(t),$$
and
$$H^t(x) = H(\gamma(t)\, x + \delta(t)), \qquad t > 0, \ x \in \mathbb{R}. \qquad (A)$$
Applying that argument to $t$, $s$ and $s\,t$ implies
$$\gamma(st) = \gamma(s)\, \gamma(t), \qquad \delta(st) = \gamma(t)\,\delta(s) + \delta(t). \qquad (B)$$
Solving the functional equations (A), (B) for $H(x)$, $\gamma(t)$, $\delta(t)$ implies $H \in \{\Lambda, \Phi_\alpha, \Psi_\alpha\}$, i.e. GEV.

Remark The GEV-laws coincide with the class of max-stable laws $F = \mathcal{L}(X)$ defined by the existence of $c_n > 0$, $d_n$, $n \ge 1$ such that
$$\mathcal{L}(M_n) = \mathcal{L}(c_n X + d_n), \qquad n \ge 1.$$

Definition 2.4 $F$ distribution function. The generalized inverse $F^{\leftarrow}$ of $F$:
$$F^{\leftarrow}(t) = \inf\{x \in \mathbb{R};\, F(x) \ge t\}, \qquad 0 < t < 1,$$
is called the quantile function. $F^{\leftarrow}(t)$ is the $t$-quantile.

Theorem 2.7 (Characterization of MDA of Frechet and Weibull-laws):
a) $F \in \mathrm{MDA}(\Phi_\alpha)$ for some $\alpha > 0$ iff $\bar F(x) = \frac{1}{x^{\alpha}} L(x)$ for some $L \in \mathcal{R}_0$. In this case, $x_F = \infty$ and
$$\frac{M_n}{c_n} \xrightarrow{d} \Phi_\alpha \quad \text{with } c_n = F^{\leftarrow}\Big(1 - \frac{1}{n}\Big).$$
b) $F \in \mathrm{MDA}(\Psi_\alpha)$ for some $\alpha > 0$ iff $x_F < \infty$ and $\bar F(x_F - \frac{1}{x}) = \frac{1}{x^{\alpha}} L(x)$ for some $L \in \mathcal{R}_0$. In this case,
$$\frac{M_n - x_F}{c_n} \xrightarrow{d} \Psi_\alpha \quad \text{with } c_n = x_F - F^{\leftarrow}\Big(1 - \frac{1}{n}\Big).$$

Proposition 2.2 Let $0 \le \tau \le \infty$, $u_n \in \mathbb{R}$. Then, for $n \to \infty$,
$$n \bar F(u_n) \to \tau \quad \text{iff} \quad \mathrm{pr}(M_n \le u_n) \to e^{-\tau}.$$

Corollary 2.3 $F \in \mathrm{MDA}(H)$ with normalizing sequences $c_n, d_n$ iff
$$n \bar F(c_n x + d_n) \to -\log H(x) \quad \text{for all } x \in \mathbb{R}, \ n \to \infty.$$

Remark Part b) of the Theorem follows rather immediately from part a) by exploiting the relation $\Psi_\alpha(-\frac{1}{x}) = \Phi_\alpha(x)$, $x > 0$. Mark also that $1 - \Phi_\alpha(x) \sim x^{-\alpha}$ for $x \to \infty$, $\alpha > 0$.

Theorem 2.8 (MDA of the Gumbel law): Let $x_F \le \infty$. $F \in \mathrm{MDA}(\Lambda)$ iff there exist $z < x_F$, measurable scaling functions $c(x) \to c > 0$, $g(x) \to 1$ for $x \uparrow x_F$ and an absolutely continuous function $e(x) > 0$ with $e'(x) \to 0$ for $x \uparrow x_F$ such that
$$\bar F(x) = c(x) \exp\Big\{ -\int_z^x \frac{g(y)}{e(y)}\, dy \Big\}, \qquad z < x < x_F.$$
In this case,
$$\frac{M_n - d_n}{c_n} \xrightarrow{d} \Lambda$$
with $d_n = F^{\leftarrow}(1 - \frac{1}{n})$ and $c_n = e(d_n)$. The function $e(x)$ may be chosen as the mean excess function
$$e(x) = \int_x^{x_F} \frac{\bar F(y)}{\bar F(x)}\, dy, \qquad x < x_F.$$

Lemma 2.2 $F$ satisfies the conditions of Theorem 2.8 if $F \in C^2$ on $(z, x_F)$ with density $f > 0$, $f' < 0$ on $(z, x_F)$, i.e. $F$ is strictly concave on $(z, x_F)$, and $\bar F(x) f'(x)/f^2(x) \to -1$ for $x \uparrow x_F$. Then, $e(x)$ may be chosen simply as $e(x) = \bar F(x)/f(x)$.

Examples:
a) The Pareto-, Burr- and $\alpha$-stable laws ($\alpha < 2$) all have Pareto-tails, i.e.
$$\bar F(x) \sim \frac{K}{x^{\alpha}} \ \text{ for } x \to \infty,$$
and, therefore, are in $\mathrm{MDA}(\Phi_\alpha)$. As $F^{\leftarrow}(1 - t) \approx (\frac{K}{t})^{1/\alpha}$, $c_n \approx (Kn)^{1/\alpha}$ and:
$$\frac{M_n}{(Kn)^{1/\alpha}} \xrightarrow{d} \Phi_\alpha \ \text{ for } n \to \infty.$$
b) For the uniform distribution $U(0,1)$, we have $x_F = 1$, $\bar F(x) = 1 - x$, $0 \le x \le 1$, and $\bar F(1 - \frac{1}{x}) = \frac{1}{x}$. Therefore, $U(0,1) \in \mathrm{MDA}(\Psi_1)$, $c_n = \frac{1}{n}$ and:
$$n(M_n - 1) \xrightarrow{d} \Psi_1 \ \text{ for } n \to \infty.$$
c) For $\mathrm{Exp}(\lambda)$, we have $\bar F(x) = e^{-\lambda x}$, satisfying the assumption of Theorem 2.8 with $c(x) = 1$, $g(x) = 1$, $z = 0$ and $e(x) = \frac{1}{\lambda}$. Therefore, $\mathrm{Exp}(\lambda) \in \mathrm{MDA}(\Lambda)$, $c_n = \frac{1}{\lambda}$, $d_n = \frac{1}{\lambda}\log n$ and:
$$\lambda M_n - \log n \xrightarrow{d} \Lambda \ \text{ for } n \to \infty.$$
d) Let $\Phi, \varphi$ denote the distribution function and density of $\mathcal{N}(0,1)$. We have $1 - \Phi(x) \sim \frac{1}{x}\,\varphi(x)$ for $x \to \infty$ (Mills' ratio), from which we get the conditions of Lemma 2.2. Therefore, $\mathcal{N}(0,1) \in \mathrm{MDA}(\Lambda)$, and some asymptotic approximations show
$$\sqrt{2\log n}\, (M_n - d_n) \xrightarrow{d} \Lambda \ \text{ for } n \to \infty$$
with
$$d_n = \sqrt{2\log n} - \frac{\log\log n + \log(4\pi)}{2\sqrt{2\log n}}.$$
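Example c) is easy to verify by simulation: for $\mathrm{Exp}(\lambda)$ data, $\lambda M_n - \log n$ should be approximately Gumbel-distributed. A minimal sketch (intensity, block size and number of blocks are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
lam, n, b = 2.0, 1000, 5000              # intensity, block size, number of blocks

M = rng.exponential(1.0 / lam, size=(b, n)).max(axis=1)
Z = lam * M - np.log(n)                   # standardized block maxima

# compare the empirical distribution with the Gumbel cdf exp(-e^{-x}) at a few points
for x in (-1.0, 0.0, 1.0, 2.0):
    print(x, np.mean(Z <= x), np.exp(-np.exp(-x)))
```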
Definition 2.5 Let $u < x_F$ be a given threshold.
a) $F_u(x) = \mathrm{pr}\{X - u \le x \mid X > u\} = (F(u + x) - F(u))/\bar F(u)$, $0 \le x < x_F - u$, is called the excess distribution function above the threshold $u$.
b) $e(u) = \mathbb{E}\{X - u \mid X > u\}$, $u < x_F$, is the mean excess function.

Remarks
i) Integration by parts implies (compare Theorem 2.8):
$$e(u) = \int_u^{x_F} \frac{\bar F(y)}{\bar F(u)}\, dy.$$
ii) If $\Delta_u$ is a random variable with distribution $F_u$, then $\mathbb{E}\Delta_u = e(u)$.

Proposition 2.3 Let $X$ be a positive random variable with density $f$.
a) The mean excess function characterizes $F$ uniquely:
$$\bar F(x) = \frac{e(0)}{e(x)} \exp\Big\{ -\int_0^x \frac{1}{e(u)}\, du \Big\}, \qquad x > 0.$$
b) If $\bar F(x) = \frac{1}{x^{\alpha}} L(x)$ for some $L \in \mathcal{R}_0$, $\alpha > 1$, then $e(u) \sim \dfrac{u}{\alpha - 1}$ for $u \to \infty$.

Definition 2.6 The generalized Pareto distribution (GPD) with parameters $\beta > 0$, $\xi$ has the distribution function
$$G_{\xi,\beta}(x) = 1 - \Big(1 + \frac{\xi x}{\beta}\Big)^{-1/\xi} \quad \text{for} \quad \begin{cases} x \ge 0 & \text{if } \xi > 0 \\ 0 \le x \le -\frac{\beta}{\xi} & \text{if } \xi < 0, \end{cases}$$
and
$$G_{0,\beta}(x) = 1 - e^{-x/\beta}, \qquad x \ge 0.$$
$G_\xi(x) \equiv G_{\xi,1}(x)$ are called standard GPDs.
For $\xi = 0$, the GPD is $\mathrm{Exp}(\frac{1}{\beta})$, for $\xi > 0$ it is a Pareto distribution reparametrized (in particular: $\alpha = \frac{1}{\xi}$). For $\xi < 0$, this type of law is called a Pareto distribution of type II.

Theorem 2.9 $F \in \mathrm{MDA}(H_\xi)$ for a GEV $H_\xi$ with shape parameter $\xi$ iff there exists a (measurable) function $\beta(u) > 0$ such that
$$\sup_{0 \le x < x_F - u} |F_u(x) - G_{\xi,\beta(u)}(x)| \to 0 \quad \text{for } u \uparrow x_F,$$
where $G_{\xi,\beta}$ denotes a GPD.

Corollary 2.4 If $F \in \mathrm{MDA}(H_\xi)$, then for an appropriate scaling function $\beta(u)$:
$$\mathrm{pr}\Big\{ \frac{X - u}{\beta(u)} > x \,\Big|\, X > u \Big\} \to \bar G_\xi(x) \quad \text{for } u \uparrow x_F.$$
(Approximation of the law of scaled excesses).

Examples: a) By the memorylessness of $\mathrm{Exp}(\lambda)$, we have $F_u(x) = F(x)$ for all $u > 0$, and, therefore, $F_u = G_{0,\beta}$ with $\beta(u) = \frac{1}{\lambda}$.
b) Stability of GPDs under truncation: If $F = G_{\xi,\beta}$, then for all $u > 0$
$$F_u(x) = G_{\xi,\beta + \xi u}(x) \quad \text{for} \quad \begin{cases} x \ge 0 & \text{if } \xi \ge 0 \\ 0 \le x < -\frac{\beta}{\xi} - u & \text{if } \xi < 0, \end{cases}$$
i.e. $\beta(u) = \beta + \xi u$ in this case.
2.3 Exploiting Fisher-Tippett in practice

To estimate characteristics of the tail behaviour, one may use for large $n$ that, by Theorem 2.6, $\mathcal{L}(M_n)$ is approximately GEV, i.e. there are a shape parameter $\xi$, a location parameter $\mu$ and a scale parameter $\sigma$ (the latter two depending on $n$) such that
$$\mathrm{pr}(M_n \le y) \approx H_\xi\Big( \frac{y - \mu}{\sigma} \Big).$$
To estimate $\theta = (\xi, \mu, \sigma)^T$, we need a whole sample of maxima $M_n^{(1)}, \ldots, M_n^{(b)}$. For that purpose, a very large sample of size $N = b \cdot n$ is partitioned into $b$ blocks of size $n$ each, and $M_n^{(k)}$ is chosen as the maximum of the observations in the $k$-th block, $k = 1, \ldots, b$.

Assumption: $M_n^{(1)}, \ldots, M_n^{(b)}$ are i.i.d. with distribution function $H_\xi(\frac{y - \mu}{\sigma})$. Let $h_\theta(y)$ denote the density of $H_\xi(\frac{y - \mu}{\sigma})$ with support
$$D_\theta = \{y;\, 1 + \xi\, \tfrac{y - \mu}{\sigma} > 0\}.$$
We get the ML-estimates $\hat\theta = (\hat\xi, \hat\mu, \hat\sigma)^T$ of $\theta$ by maximizing the log-likelihood (stressing the dependence of the support $D_\theta$ of $h_\theta$ on $\theta$):
$$\ell(\theta \mid M_n^{(1)}, \ldots, M_n^{(b)}) = \sum_{k=1}^{b} \log\big[ h_\theta(M_n^{(k)})\, \mathbf{1}_{D_\theta}(M_n^{(k)}) \big] = \max!$$
Though $D_\theta$ depends on $\theta$, contradicting one of the usual assumptions of the asymptotics for ML-estimates in standard situations, it can be shown that, for $\xi > -\frac{1}{2}$, the same asymptotic results as usual hold, e.g. asymptotic normality of $\hat\theta$ with rate $\frac{1}{\sqrt{b}}$ and asymptotic efficiency (compare also Proposition 3.1).
As usual, one may calculate asymptotic confidence regions for $\theta$, construct LR-tests for, e.g., the size of $\xi$ and so on. The same applies for functions of $\theta$ like:

Definition: Let $M_n = \max\{X_1, \ldots, X_n\}$, $X_1, \ldots, X_n$ i.i.d. The $n$-block return level $R_{n,\tau}$ (e.g., for daily data $n = 365$, $\tau = 10$, the 10-year return level) is the $(1 - \frac{1}{\tau})$-quantile of $M_n$, i.e.
$$\mathrm{pr}(M_n > R_{n,\tau}) = \frac{1}{\tau}.$$
We are talking of a 10-year (resp. a centennial) event if in year no. $k$ we have $M_n^{(k)} > R_{365,10}$ (resp. $> R_{365,100}$).

Let $F = \mathcal{L}(X_j)$ and, therefore, $\mathcal{L}(M_n) = F^n$ and $F^n(R_{n,\tau}) = 1 - \frac{1}{\tau}$. Therefore, $R_{n,\tau}$ is the $(1 - \frac{1}{\tau})^{1/n}$-quantile of $F$. In applications, $(1 - \frac{1}{\tau})^{1/n} \approx 1$ such that the empirical quantile of the $X_j$ is based on too few observations and is highly variable. For estimating $R_{n,\tau}$, one uses again Fisher-Tippett from which we have, using the notation $H_\theta(y) = H_\xi(\frac{y - \mu}{\sigma})$:
$$R_{n,\tau} \approx H_\theta^{-1}\Big(1 - \frac{1}{\tau}\Big) = \mu + \frac{\sigma}{\xi}\Big[ \Big( -\log\big(1 - \tfrac{1}{\tau}\big) \Big)^{-\xi} - 1 \Big].$$
Replacing $\xi, \mu, \sigma$ by their ML-estimates $\hat\xi, \hat\mu, \hat\sigma$, we get the ML-estimate $\hat R_{n,\tau}$ of $R_{n,\tau}$.

A critical point in using that type of extreme value statistics is the choice of the block length $n$ and the number of blocks $b$ given the total sample size $N$. If $b$ is too small, the sample size for the ML-procedure is small, and the variance of $\hat\theta$ or $\hat R_{n,\tau}$ is large. If $n$ is too small, the approximation of $\mathcal{L}(M_n)$ by a GEV is not a good one which causes a bias in estimates like $\hat R_{n,\tau}$ to appear. As $N = n \cdot b$, we have one of the common bias-variance dilemmas.

Example: The annual maxima of the water level of the River Nidd (England) are given from 1936-1970, i.e. $n = 365$, $b = 35$, $N = 12\,775$.
1) What is the probability that next year's maximum exceeds all previous annual maxima? We use the estimation procedure above and get
$$1 - H_{\hat\theta}\big( \max\{M_{365}^{(k)},\, k = 1, \ldots, 35\} \big) \approx 0.04$$
2) What is the 10-year return level $R_{365,10}$? From the data, we get $\hat R_{365,10} \approx 222$. The plot shows this level together with a confidence interval for $R_{365,10}$.
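The block-maxima ML fit and the return-level estimate can be reproduced with scipy. A sketch with simulated stand-in data (the River Nidd series is not reproduced in the text, so the daily values below are a purely illustrative substitute); note that scipy's genextreme uses the shape convention $c = -\xi$:

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(7)
n, b, tau = 365, 35, 10                        # block length, number of blocks, return period

# stand-in data: b blocks of heavy-tailed daily values (illustrative only)
daily = rng.pareto(3.0, size=(b, n)) * 50.0
block_max = daily.max(axis=1)

c, loc, scale = genextreme.fit(block_max)      # scipy shape c corresponds to -xi
xi = -c
print("xi, mu, sigma:", xi, loc, scale)

# n-block return level: the (1 - 1/tau)-quantile of the fitted GEV
R_hat = genextreme.ppf(1.0 - 1.0 / tau, c, loc=loc, scale=scale)
print("estimated return level:", R_hat)

# probability that next block's maximum exceeds all observed block maxima
print("exceedance prob:", genextreme.sf(block_max.max(), c, loc=loc, scale=scale))
```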
3 Statistics for Extremal Events

Literature: EKM, chapter 6

Throughout this chapter, $X, X_1, \ldots, X_N$ are i.i.d. random variables with $\mathcal{L}(X_j) = F$ and $x_F = \infty$ (unbounded support on the right-hand side).

Notation: $X_{(1)} \ge \ldots \ge X_{(N)}$ and $X^{(1)} \le \ldots \le X^{(N)}$ denote the order statistics, i.e. the ordered data, in descending and ascending order. Of course, $X_{(1)} = X^{(N)}$, $X_{(N)} = X^{(1)}$, etc.

Definition: Let $K_N(u) = \{j \le N;\, X_j > u\}$ and let $\hat F_N(x) = \frac{1}{N} \sum_{j=1}^{N} \mathbf{1}_{(-\infty, x]}(X_j)$ be the empirical distribution function, $N_u = \# K_N(u)$.
$$e_N(u) = \int_u^{\infty} \bar{\hat F}_N(y)\, dy \,\Big/\, \bar{\hat F}_N(u) = \frac{1}{N_u} \sum_{j \in K_N(u)} (X_j - u) = \frac{1}{N_u} \sum_{j=1}^{N} (X_j - u)_+$$
is called the empirical mean excess function.
$e_N(u)$ approximates the mean excess function $e(u)$ of section 2.2.

For exploratory data analysis, one frequently considers the following plots:
probability plot: $\big( F(X_{(k)}),\ \frac{N - k + 1}{N + 1} \big)$, $k = 1, \ldots, N$,
quantile plot: $\big( X_{(k)},\ F^{\leftarrow}\big( \frac{N - k + 1}{N + 1} \big) \big)$, $k = 1, \ldots, N$,
mean excess plot: $\big( X_{(k)},\ e_N(X_{(k)}) \big)$, $k = 1, \ldots, N$.

By the Glivenko-Cantelli theorem, the first two plots should be approximately linear, if the assumption $\mathcal{L}(X_j) = F$ really holds.
3.1 The POT-method (peaks-over-threshold)

The goal of this and the following sections is the derivation of tail estimates, i.e. estimates of $\bar F(x) = 1 - F(x)$ for large $x$, and of corresponding quantities like quantiles $F^{\leftarrow}(q)$ for $q \approx 1$.

Assumption: $F \in \mathrm{MDA}(H_\xi)$ for some GEV $H_\xi$, $\xi \ge 0$.

Let $K_N(u)$ and $N_u$ denote the set of indices and the number of indices for which $X_j$ exceeds the given threshold $u$ (as above).

Definition: The excesses above the threshold $u$ are the random variables $Y_l$, $l = 1, \ldots, N_u$, with $\{Y_1, \ldots, Y_{N_u}\} = \{X_j - u;\, j \in K_N(u)\}$.

The POT (peaks-over-threshold) method is based on considering the $Y_l$, $l \le N_u$, as the main information about the tail behaviour of the original data $X_j$, $j \le N$.

Remark: By definition, given $N_u$, the $Y_1, \ldots, Y_{N_u}$ are i.i.d. with $\mathcal{L}(Y_l) = F_u$, the excess distribution (compare section 2.2). Therefore, by Theorem 2.9, $F_u(y) \approx G_{\xi,\beta(u)}(y)$ for some GPD, provided $u$ is large enough.

By definition, $\bar F_u(y) = \mathrm{pr}(X - u > y \mid X > u) = \bar F(y + u)/\bar F(u)$, or:
$$\bar F(x) = \bar F(u)\, \bar F_u(x - u), \qquad u < x < \infty.$$
$u$ is large, therefore $F_u$ may be approximated by $G_{\xi,\beta}$ for appropriate $\xi, \beta$. $\bar F(u)$ is replaced by the empirical tail
$$1 - \hat F_N(u) = \frac{N_u}{N},$$
where $\hat F_N$ denotes the empirical distribution function.
For $u$ itself, this works, but not for $x \gg u$. The estimate $1 - \hat F_N(x)$ for $\bar F(x)$ depends for extreme $x$ only on very few observations and is too unreliable. Using the above identity for $\bar F(x)$ and replacing the two factors of the right-hand side by their approximations we get:

Definition: The POT-tail estimate $\hat{\bar F}(x)$ for $\bar F(x)$, $x$ large, is given by
$$\hat{\bar F}(x) = \frac{N_u}{N}\, \bar G_{\hat\xi, \hat\beta}(x - u) = \frac{N_u}{N} \Big( 1 + \frac{\hat\xi\, (x - u)}{\hat\beta} \Big)^{-1/\hat\xi}, \qquad u < x < \infty,$$
where $\hat\xi, \hat\beta$ are some appropriate estimates (e.g. ML-estimates) for $\xi, \beta$, based on the excesses $Y_1, \ldots, Y_{N_u}$.

We consider the ML-estimation of $\xi, \beta$ for a sample $Y_1, \ldots, Y_M$ of fixed, not random, size $M$ which are assumed to be i.i.d. with $\mathcal{L}(Y_j) = G_{\xi,\beta}$, $\xi > 0$. As the density of the generalized Pareto distribution is
$$g(y) = \frac{1}{\beta} \Big( 1 + \frac{\xi y}{\beta} \Big)^{-\frac{1}{\xi} - 1}, \qquad y \ge 0,$$
the log-likelihood function is, denoting $Y = (Y_1, \ldots, Y_M)^T$:
$$\ell(\xi, \beta \mid Y) = -M \log \beta - \Big( \frac{1}{\xi} + 1 \Big) \sum_{j=1}^{M} \log\Big( 1 + \frac{\xi}{\beta}\, Y_j \Big).$$
Maximizing it, we get the ML-estimates $\hat\xi, \hat\beta$.

Proposition 3.1 If $\xi > -\frac{1}{2}$, then for $M \to \infty$:
$$\sqrt{M}\, \Big( \hat\xi - \xi,\ \frac{\hat\beta}{\beta} - 1 \Big)^T \xrightarrow{d} \mathcal{N}_2(0, D)$$
with
$$D = (1 + \xi) \begin{pmatrix} 1 + \xi & -1 \\ -1 & 2 \end{pmatrix}.$$
Moreover, $\hat\xi, \hat\beta$ are asymptotically efficient.

In the POT-approach, $M = N_u$ is random. Then, $\hat\xi, \hat\beta$ are the conditional ML-estimates given $N_u$. The limit theory is known for that case; to avoid an asymptotic bias, $F$ has to satisfy some second-order conditions.

Definition: The POT-quantile estimate $\hat x_q$ for the $q$-quantile $x_q = F^{\leftarrow}(q)$ is given as the solution of $\hat{\bar F}(\hat x_q) = 1 - q$, i.e.
$$\hat x_q = u + \frac{\hat\beta}{\hat\xi} \Big[ \Big( \frac{N}{N_u}(1 - q) \Big)^{-\hat\xi} - 1 \Big].$$
To compare this estimate with the common empirical quantile, assume that $u$ is chosen such that there are exactly $k$ exceedances: $N_u = k > N(1 - q)$, i.e. $u = X_{(k+1)}$. Then, depending on the choice of $u$ resp. $k$, the POT-quantile estimate is:
$$\hat x_{q,k} = X_{(k+1)} + \frac{\hat\beta_k}{\hat\xi_k} \Big[ \Big( \frac{N}{k}(1 - q) \Big)^{-\hat\xi_k} - 1 \Big],$$
stressing the dependence of the ML-estimates for $\xi, \beta$ on $k$. The empirical quantile is
$$\hat x_q^{\,e} = X_{([N(1-q)]+1)},$$
which corresponds to $\hat x_{q,k}$ for the minimal choice $k = [N(1 - q)] + 1$ roughly.
Simulation studies show that the optimal choice $k_0$ for $k$ which minimizes $\mathrm{mse}(\hat x_{q,k}) = \mathbb{E}(\hat x_{q,k} - x_q)^2$ is much larger than $[N(1-q)] + 1$, i.e. the POT-estimate differs considerably from the very variable empirical quantile.

The quality of the POT-estimate essentially depends on the choice of the threshold $u$. Qualitatively, we have the following bias-variance dilemma:
- if $u$ is too large, there are too few exceedances $Y_l$, $l \le N_u$, and the variance of the estimates increases,
- if $u$ is too small, the approximation of the excess distribution $F_u$ by a GPD is not good, and a nonnegligible bias occurs.

Exploratory methods for choosing a useful threshold are based on results like:

Proposition 3.2 If $\mathcal{L}(Z) = G_{\xi,\beta}$ is a GPD, then the mean excess function is linear:
$$e(u) = \mathbb{E}[Z - u \mid Z > u] = \frac{\beta + \xi u}{1 - \xi}, \qquad u \ge 0, \ \text{ for } \xi < 1.$$
For the Pareto distribution in its usual representation, $\xi = \frac{1}{\alpha}$. The condition $\xi < 1$ is therefore equivalent to $\alpha > 1$, i.e. $\mathbb{E}|Z| < \infty$.

Threshold selection rule: For POT-estimates, select the threshold $u$ such that the empirical mean excess function $e_N(v)$ is approximately linear for $v \ge u$. For this purpose, the mean excess plot is considered, where it is often advisable to omit the highly variable right-most points $(X_{(k)}, e_N(X_{(k)}))$, $k \ll N$, which disturb the visual impression.
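The POT tail and quantile estimates follow directly from a GPD fit to the excesses; scipy's genpareto uses the same shape parameter $\xi$. A sketch with simulated heavy-tailed losses (the data, the tail index and the threshold choice as an upper empirical quantile are all illustrative assumptions):

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(8)
X = rng.pareto(2.0, size=10_000) + 1.0            # illustrative losses with tail index 2

u = np.quantile(X, 0.95)                           # threshold (exploratory choice)
Y = X[X > u] - u                                   # excesses over the threshold
Nu, N = Y.size, X.size

xi, _, beta = genpareto.fit(Y, floc=0.0)           # ML fit of G_{xi,beta} to the excesses

def tail_estimate(x):
    """POT tail estimate (N_u/N) * (1 + xi (x-u)/beta)^(-1/xi), x > u."""
    return Nu / N * (1.0 + xi * (x - u) / beta) ** (-1.0 / xi)

def quantile_estimate(q):
    """POT quantile estimate u + (beta/xi) * [((N/N_u)(1-q))^(-xi) - 1]."""
    return u + beta / xi * ((N / Nu * (1.0 - q)) ** (-xi) - 1.0)

print("xi, beta:", xi, beta)
print("tail at 2u:", tail_estimate(2 * u), " empirical:", np.mean(X > 2 * u))
print("0.999-quantile estimate:", quantile_estimate(0.999))
```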
3.2 Measures of Risk

Definition: Let $0 < q < 1$, and let $F = \mathcal{L}(X)$ be the distribution of claims or losses. Typically, $q = 0.95$ or $q = 0.99$.
a) The Value-at-Risk (VaR) is the $q$-quantile
$$\mathrm{VaR}_q(X) \equiv x_q = F^{\leftarrow}(q).$$
b) The Expected Shortfall is
$$\mathrm{ES}_q(X) \equiv S_q = \mathbb{E}[X \mid X > x_q].$$

Definition (Artzner, Delbaen, Eber, Heath): A coherent risk measure is a function $\rho$ on the space of real-valued random variables (corresponding to the losses) with
A1) $X \le Y$ a.s. $\implies \rho(X) \le \rho(Y)$ (monotonicity)
A2) $\rho(X + Y) \le \rho(X) + \rho(Y)$ (subadditivity)
A3) $\rho(\lambda X) = \lambda \rho(X)$ for $\lambda \ge 0$ (positive homogeneity)
A4) $\rho(X + a) = \rho(X) + a$ (translation equivariance)

ES is a coherent risk measure, VaR isn't. Consider, e.g., independent $X, Y$, each taking only the two values 0 and 100, with
$$\mathcal{L}(X) = \mathcal{L}(Y) = 0.97\, \delta_0 + 0.03\, \delta_{100}$$
and, therefore,
$$\mathcal{L}(X + Y) = 0.97^2\, \delta_0 + 2 \cdot 0.97 \cdot 0.03\, \delta_{100} + 0.03^2\, \delta_{200}.$$
For $q = 0.95$:
$$\mathrm{VaR}_q(X) = \mathrm{VaR}_q(Y) = 0 \quad \text{but} \quad \mathrm{VaR}_q(X + Y) = 100.$$

The expected shortfall is closely related to the mean excess function at $u = x_q$:
$$S_q = e(x_q) + x_q.$$

Proposition 3.3 a) If $F \in \mathrm{MDA}(H_\xi)$, $0 < \xi < 1$ (Frechet-case), then
$$\lim_{u\to\infty} \frac{1}{u}\, e(u) = \frac{\xi}{1 - \xi}.$$
b) If $F \in \mathrm{MDA}(H_0)$ (Gumbel-case), then
$$\lim_{u\to\infty} \frac{1}{u}\, e(u) = 0.$$

Consider the expected shortfall-to-quantile ratio $\dfrac{S_q}{x_q} = \dfrac{e(x_q)}{x_q} + 1$ for
a) $F = \mathcal{N}(\mu, \sigma^2) \in \mathrm{MDA}(H_0)$ $\implies \lim_{q \to 1} \dfrac{S_q}{x_q} = 1$.
b) $F = t_\nu \in \mathrm{MDA}(H_\xi)$ with $\xi = \dfrac{1}{\nu}$ $\implies \lim_{q\to 1} \dfrac{S_q}{x_q} = \dfrac{1}{1 - \xi} = \dfrac{\nu}{\nu - 1} > 1$.

$$\begin{array}{l|cccc} q & 0.95 & 0.99 & 0.995 & q \to 1 \\ \hline \mathcal{N}(0,1) & 1.25 & 1.15 & 1.12 & 1 \\ t_4 & 1.50 & 1.39 & 1.37 & 1.33 \\ t_2 & 2.11 & 2.02 & 2.01 & 2 \end{array}$$

Losses exceeding $\mathrm{VaR}_q$ exceed it by 15% on the average for $q = 0.99$, $\mathcal{N}(0,1)$, but by 102% on the average for $q = 0.99$, $t_2$.

The expected shortfall may be estimated by the POT-method. $F_u(x) \approx G_{\xi,\beta}(x)$ for a large enough threshold $u$ implies
$$e(v) \approx \frac{\beta + \xi(v - u)}{1 - \xi} \quad \text{for } v > u.$$
Therefore, for $x_q > u$, we have
$$\frac{S_q}{x_q} = 1 + \frac{e(x_q)}{x_q} \approx \frac{1}{1 - \xi} + \frac{\beta - \xi u}{x_q\, (1 - \xi)}.$$
The POT-estimate for $S_q$ is, then, with $\hat x_q$ denoting the POT-quantile estimate,
$$\hat S_{q,u} = \frac{\hat x_q}{1 - \hat\xi} + \frac{\hat\beta - \hat\xi u}{1 - \hat\xi}.$$
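The shortfall-to-quantile ratios in the table can be recomputed from the closed-form VaR and ES of the standard normal and Student t distributions; the formulas used below are the standard ones for these two families, not taken from the text.

```python
import numpy as np
from scipy.stats import norm, t

def ratio_normal(q):
    z = norm.ppf(q)
    es = norm.pdf(z) / (1 - q)                             # expected shortfall of N(0,1)
    return es / z

def ratio_t(q, nu):
    x = t.ppf(q, nu)
    es = t.pdf(x, nu) / (1 - q) * (nu + x**2) / (nu - 1)   # expected shortfall of t_nu
    return es / x

for q in (0.95, 0.99, 0.995):
    print(q, round(ratio_normal(q), 2), round(ratio_t(q, 4), 2), round(ratio_t(q, 2), 2))
```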
3.3 The Hill estimator

Recall: For $\xi > 0$, the GEV $H_\xi$ is a Frechet distribution $\Phi_\alpha$ with $\alpha = \frac{1}{\xi}$.

Assumption: $X_1, \ldots, X_n$ i.i.d. with $\mathcal{L}(X_j) = F \in \mathrm{MDA}(\Phi_\alpha)$ for some $\alpha > 0$.

Definition: Let $X_{(1)} \ge X_{(2)} \ge \ldots \ge X_{(n)}$ be the order statistics in descending order. The Hill estimator $\hat\alpha_H$ of the tail index $\alpha$ is, for appropriate $k = k(n)$, given by
$$\hat\alpha_H = \Big[ \frac{1}{k} \sum_{j=1}^{k} \big( \log X_{(j)} - \log X_{(k)} \big) \Big]^{-1}.$$
We motivate this form of an estimate by considering ML-estimates in a series of successively more complicated situations. Recall $F \in \mathrm{MDA}(\Phi_\alpha)$ iff $\bar F(x) = \frac{L(x)}{x^{\alpha}}$ for some slowly varying function $L$.

1) Assume $\bar F(x) = \frac{1}{x^{\alpha}}$, $x \ge 1$. Then, for $Y_j = \log X_j$ we have
$$\mathrm{pr}(Y_j > y) = \mathrm{pr}(X_j > e^y) = \bar F(e^y) = e^{-\alpha y}, \qquad y \ge 0,$$
i.e. $Y_1, \ldots, Y_n$ are i.i.d. $\mathrm{Exp}(\alpha)$. It is well-known that the ML-estimate of $\alpha = (\mathbb{E}Y)^{-1}$ is:
$$\hat\alpha = (\bar Y_n)^{-1} = \Big[ \frac{1}{n} \sum_{j=1}^{n} \log X_j \Big]^{-1} = \Big[ \frac{1}{n} \sum_{j=1}^{n} \log X_{(j)} \Big]^{-1}.$$
2) Assume $\bar F(x) = \frac{C}{x^{\alpha}}$, $x \ge u > 0$, with $C = u^{\alpha}$. Dividing the $X_j$ by $u$, we are back in case 1). Therefore, the ML-estimate of $\alpha$ is now
$$\hat\alpha = \Big[ \frac{1}{n} \sum_{j=1}^{n} \log \frac{X_j}{u} \Big]^{-1} = \Big[ \frac{1}{n} \sum_{j=1}^{n} \log X_{(j)} - \log u \Big]^{-1}.$$
3) For general $F \in \mathrm{MDA}(\Phi_\alpha)$, we have $\bar F(x) \approx \frac{C}{x^{\alpha}}$ for $x \ge u$ where $u$ is an appropriately large threshold. Let again $N_u = \#\{j \le N;\, X_j > u\}$. We condition on the event $N_u = k$, i.e. for $x \ge X_{(k)}$ we are approximately in case 2). We calculate the conditional maximum-likelihood (CML-)estimate for $\alpha$ given $N_u = k$. For that purpose we need:

Proposition 3.4 Let $X_1, \ldots, X_n$ be i.i.d. with $\mathcal{L}(X_j) = F$ and density $f$.
a) The joint density of the order statistics $X_{(1)} \ge \ldots \ge X_{(n)}$ is
$$f_{(n)}(x_1, \ldots, x_n) = \begin{cases} n! \prod_{j=1}^{n} f(x_j) & \text{for } x_1 > \ldots > x_n \\ 0 & \text{else} \end{cases}$$
b) The joint density of $X_{(1)}, \ldots, X_{(k)}$, $k \le n$, is
$$f_{(k)}(x_1, \ldots, x_k) = \begin{cases} \frac{n!}{(n-k)!}\, F^{n-k}(x_k) \prod_{j=1}^{k} f(x_j) & \text{for } x_1 > \ldots > x_k \\ 0 & \text{else} \end{cases}$$

Idea of proof: a) The density of $(X_1, \ldots, X_n)^T$ is $\prod_{j=1}^{n} f(x'_j)$, $x'_1, \ldots, x'_n \in \mathbb{R}$. Every possible value $(x_1, \ldots, x_n)^T$, $x_1 > \ldots > x_n$, of the vector of order statistics is the result of ordering one out of $n!$ possible vectors $(x'_1, \ldots, x'_n)$.
b) Integrate $f_{(n)}$ with respect to $x_{k+1}, \ldots, x_n$ successively, remembering $x_n < x_{n-1}$, $x_{n-1} < x_{n-2}$, etc.

If $\bar F(x) \approx \frac{C}{x^{\alpha}}$ for $x > u$, then the density of $F$ is $f(x) \approx \frac{\alpha C}{x^{\alpha + 1}}$ for $x > u$, and, moreover, we have $F^{n-k}(x) \approx (1 - \frac{C}{x^{\alpha}})^{n-k}$ for $x > u$. The conditional likelihood given $N_u = k$ is therefore
$$L_k(\alpha, C) \approx \frac{n!}{(n-k)!} \Big( 1 - \frac{C}{x_k^{\alpha}} \Big)^{n-k} (\alpha C)^k \prod_{j=1}^{k} \frac{1}{x_j^{\alpha + 1}}, \qquad u < x_k < \ldots < x_1.$$
Replacing the variables $x_k, \ldots, x_1$ by the observed $X_{(k)}, \ldots, X_{(1)}$ and maximizing w.r.t. $\alpha, C$, we get the CML-estimates:
$$\hat\alpha = \Big[ \frac{1}{k} \sum_{j=1}^{k} \big( \log X_{(j)} - \log X_{(k)} \big) \Big]^{-1} = \hat\alpha_H, \qquad \hat C = \frac{k}{n}\, (X_{(k)})^{\hat\alpha}.$$

Another approach, leading also to the Hill estimate, uses $\bar F(x) = \frac{L(x)}{x^{\alpha}}$ for some $L \in \mathcal{R}_0$ directly. By definition of $L$ as a slowly varying function:
$$\lim_{x\to\infty} \frac{\bar F(tx)}{\bar F(x)} = \frac{1}{t^{\alpha}} \quad \text{for all } t > 0.$$
Using partial integration and Karamata's theorem on the tail behaviour of integrals of regularly varying functions, this implies:
$$\frac{1}{\bar F(x)} \int_x^{\infty} (\log t - \log x)\, dF(t) \to \frac{1}{\alpha} \quad \text{for } x \to \infty.$$
We replace $F$ by the empirical distribution $F_n(t) = \frac{1}{n} \#\{j \le n;\, X_j \le t\}$ and $x$ by a large, data-dependent level, e.g. $x = X_{(k)}$ for $k = k(n)$. If, for $n \to \infty$, $k \to \infty$ and $\frac{k}{n} \to 0$, we still have $X_{(k)} \to \infty$.
$$\frac{1}{\hat\alpha} = \frac{1}{\bar F_n(X_{(k)})} \int_{X_{(k)}}^{\infty} \big( \log t - \log X_{(k)} \big)\, dF_n(t) = \frac{1}{k-1} \sum_{j=1}^{k-1} \big( \log X_{(j)} - \log X_{(k)} \big), \quad \text{i.e. } \hat\alpha \approx \hat\alpha_H.$$

Theorem 3.1 Let $X_1, X_2, \ldots$ be i.i.d. with $\mathcal{L}(X_j) = F \in \mathrm{MDA}(\Phi_\alpha)$, $\alpha > 0$.
a) weak consistency: $\hat\alpha_H \xrightarrow{p} \alpha$ for $n, k \to \infty$ such that $\frac{k}{n} \to 0$.
b) strong consistency: $\hat\alpha_H \xrightarrow{a.s.} \alpha$ for $n, k \to \infty$, $\frac{k}{n} \to 0$, $\frac{k}{\log\log n} \to \infty$.
c) asymptotic normality: $\sqrt{k}\,(\hat\alpha_H - \alpha) \xrightarrow{d} \mathcal{N}(0, \alpha^2)$ under further assumptions on $k, F$.

Choosing an appropriate value for $k$ is again a bias-variance-dilemma. If $k$ increases, then $\operatorname{var}(\hat\alpha_H)$ decreases but the bias increases, as in the POT-method. The following result describes a $k$ which achieves some kind of balance between bias and variance. However, the second-order assumptions on $F$ are not verifiable in practice.
Proposition 3.5 Let $F \in \mathrm{MDA}(\Phi_\alpha)$, $\alpha > 0$, and, moreover,
$$\lim_{x\to\infty} \frac{1}{a(x)} \Big( \frac{\bar F(tx)}{\bar F(x)} - \frac{1}{t^{\alpha}} \Big) = \begin{cases} \dfrac{1}{t^{\alpha}}\, \dfrac{t^{\rho} - 1}{\rho}, & t > 0, \ \rho < 0 \\[2mm] \dfrac{1}{t^{\alpha}} \log t, & t > 0, \ \rho = 0 \end{cases}$$
for some function $a(x)$ with $|a(x)| \in \mathcal{R}_\rho$ and $\mathrm{sgn}\, a(x) = \mathrm{const}$. Let
$$A(t) = \frac{1}{\alpha^2}\, a\Big( F^{\leftarrow}\big(1 - \tfrac{1}{t}\big) \Big), \qquad t > 0.$$
If $k \to \infty$, $\frac{k}{n} \to 0$ such that $\sqrt{k}\, A(\frac{n}{k}) \to \lambda \in \mathbb{R}$, then:
$$\sqrt{k}\, (\hat\alpha_H - \alpha) \xrightarrow{d} \mathcal{N}\big( b(\lambda, \rho),\ \alpha^2 \big),$$
where the asymptotic bias $b(\lambda, \rho)$ is proportional to $\lambda$ and vanishes for $\lambda = 0$.
Based on the Hill estimate for the tail parameter $\alpha$, we also get estimates for the tail of $F$ or for quantiles of $F$:
If $F \in \mathrm{MDA}(\Phi_\alpha)$ and, therefore, $\bar F(x) = \frac{L(x)}{x^{\alpha}}$ for some $L \in \mathcal{R}_0$, we have for $x \ge X_{(k)}$:
$$\frac{\bar F(x)}{\bar F(X_{(k)})} = \frac{L(x)}{L(X_{(k)})} \Big( \frac{X_{(k)}}{x} \Big)^{\alpha} \approx \Big( \frac{X_{(k)}}{x} \Big)^{\alpha},$$
as a slowly varying function is nearly constant in the tails. Using $\bar F_n(X_{(k)}) = \frac{k}{n} \approx \bar F(X_{(k)})$, where $F_n$ denotes the empirical distribution, we get as an estimate of $\bar F(x)$:
$$\hat{\bar F}_H(x) = \frac{k}{n} \Big( \frac{X_{(k)}}{x} \Big)^{\hat\alpha_H}$$
as the Hill tail estimate. Inverting this estimate, we get the Hill quantile estimate for $q \approx 1$:
$$\hat x_{q,H} = X_{(k)} \Big[ \frac{n}{k}(1 - q) \Big]^{-1/\hat\alpha_H} = X_{(k)} + X_{(k)} \Big[ \Big( \frac{n}{k}(1 - q) \Big)^{-\hat\xi_H} - 1 \Big]$$
with $\hat\xi_H = 1/\hat\alpha_H$, where the latter form stresses the similarities and the differences to the POT-quantile estimate.
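The Hill estimator and the derived quantile estimate are a few lines of array code. A sketch (the Pareto data with tail index 2 are an illustrative assumption; in practice one inspects the Hill plot of $\hat\alpha_H$ against $k$ before fixing $k$):

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.pareto(2.0, size=5000) + 1.0              # illustrative data with tail index alpha = 2

def hill(X, k):
    """Hill estimator of alpha based on the k largest order statistics."""
    Xs = np.sort(X)[::-1]                          # descending order statistics X_(1) >= ...
    return 1.0 / np.mean(np.log(Xs[:k]) - np.log(Xs[k - 1]))

def hill_quantile(X, k, q):
    """Hill quantile estimate X_(k) * ((n/k)(1-q))^(-1/alpha_hat)."""
    n = X.size
    Xs = np.sort(X)[::-1]
    return Xs[k - 1] * (n / k * (1.0 - q)) ** (-1.0 / hill(X, k))

for k in (50, 100, 200, 400):                      # crude Hill plot
    print(k, hill(X, k))
print("0.999-quantile estimate:", hill_quantile(X, 200, 0.999))
```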
3.4 Extreme value theory for time series

Let $Z_j$, $-\infty < j < \infty$, be a strictly stationary time series with $\mathcal{L}(Z_j) = F$, i.e.
$$\mathrm{pr}(Z_{j_1} \le x_1, \ldots, Z_{j_k} \le x_k) = \mathrm{pr}(Z_{j_1 + t} \le x_1, \ldots, Z_{j_k + t} \le x_k)$$
for all $k \ge 1$, $-\infty < j_1 < j_2 < \ldots < j_k < \infty$, $x_1, \ldots, x_k \in \mathbb{R}$, $t \in \mathbb{Z}$. Let $X_1, X_2, \ldots$ be i.i.d. with the same distribution $\mathcal{L}(X_j) = F$.
Let $M_n = \max\{Z_1, \ldots, Z_n\}$, $M_n^X = \max\{X_1, \ldots, X_n\}$. A simple fundamental relation for the previous chapters was, using the independence of the $X_j$,
$$\mathrm{pr}(M_n^X \le y) = (\mathrm{pr}(X_j \le y))^n = F^n(y).$$
In the dependent situation of time series, this argument no longer applies, and $\mathcal{L}(M_n)$ is not known in terms of $F$. However, often there is at least a similar approximation:
$$\mathrm{pr}(M_n \le y) \approx F^{n\theta}(y) = \big(F^n(y)\big)^{\theta} \quad \text{for large } n,$$
where $\theta \in [0, 1]$ is the so-called extremal index. For a precise definition, recall Proposition 2.2 for the i.i.d. case:
$$n\bar F(u_n) \to \tau \quad \text{iff} \quad \mathrm{pr}(M_n^X \le u_n) \to e^{-\tau}.$$

Definition: $\theta \in [0, 1]$ is called the extremal index of the time series $Z_j$, $-\infty < j < \infty$, if for some $\tau$, $u_n$
$$n\bar F(u_n) \to \tau \quad \text{and} \quad \mathrm{pr}(M_n \le u_n) \to e^{-\theta\tau}.$$
($\theta$, if it exists, does not depend on the special choice of $\tau$, $u_n$).

Remarks: a) This definition implies the approximation above as
$$\mathrm{pr}(M_n \le u_n) \approx e^{-\theta\tau} \approx e^{-\theta n \bar F(u_n)} = \big( e^{-\bar F(u_n)} \big)^{n\theta} \approx (1 - \bar F(u_n))^{n\theta} = F^{n\theta}(u_n).$$
b) Not every stationary time series has an extremal index. Consider, e.g., $Z_j = A\, X_j$, where the $X_j$ are i.i.d., $A > 0$ is a random variable independent of the $X_j$, and $\mathcal{L}(X_j) \in \mathrm{MDA}(\Phi_\alpha)$ for some $\alpha > 0$ with normalizing sequence $c_n > 0$. By Theorem 2.7,
$$\mathrm{pr}\Big( \frac{1}{c_n} M_n \le y \Big) = \mathrm{pr}\Big( \frac{1}{c_n} M_n^X \le \frac{y}{A} \Big) = \mathbb{E}\Big[ \mathrm{pr}\Big( \frac{1}{c_n} M_n^X \le \frac{y}{A} \,\Big|\, A \Big) \Big] \to \mathbb{E}\,\Phi_\alpha\Big( \frac{y}{A} \Big) = \mathbb{E}\exp\Big( -\frac{A^{\alpha}}{y^{\alpha}} \Big) \ \text{ for } y > 0,$$
whereas, again by Theorem 2.7, using $\bar F(x) = \frac{L(x)}{x^{\alpha}}$ and $\bar F(c_n) \approx \frac{1}{n}$,
$$n \bar F(c_n y) = n \bar F(c_n)\, \frac{\bar F(c_n y)}{\bar F(c_n)} \to 1 \cdot \frac{1}{y^{\alpha}}.$$
Therefore, with $\tau = \frac{1}{y^{\alpha}}$, $u_n = c_n y$, we have $n\bar F(u_n) \to \tau$, but $\mathrm{pr}(M_n \le u_n) \to \mathbb{E}\, e^{-\tau A^{\alpha}}$, which is not of the form $e^{-\theta\tau}$ for a fixed $\theta$.
c) For white noise, i.e. i.i.d. $Z_j$, we have trivially $\theta = 1$.
d) If $Z_j$ is a Gaussian ARMA($p, q$)-process, e.g. for $p = q = 1$:
$$Z_{t+1} = a Z_t + b\,\varepsilon_t + \varepsilon_{t+1}, \qquad \varepsilon_t \ \text{i.i.d. } \mathcal{N}(0, \sigma^2),$$
then $\theta = 1$.
e) If $Z_j$ is an ARCH(1)-process, i.e.
$$Z_t = \sigma_t\, \varepsilon_t, \qquad \varepsilon_t \ \text{i.i.d. } \mathcal{N}(0, 1), \qquad \sigma_t^2 = \beta + a Z_{t-1}^2,$$
then $\theta = \theta(a) < 1$. $\theta$ can be calculated approximatively, e.g. for $a = \frac{1}{2}$, we have $\theta \approx 0.835$.

There are two conditions $D(u_n)$, $D'(u_n)$ which guarantee $\theta = 1$ as in the i.i.d. case. Financial time series violate the second one as, due to the practically observed stylized fact of volatility clustering, extremal observations tend to be closer together than in the standard situation.

Definition For any sequence of thresholds $u_n$ as in the definition of the extremal index:
a) $D(u_n)$-condition: For $n, l \ge 1$ let
$$\alpha_{n,l} = \sup \Big| \mathrm{pr}\Big( \max_{i \in A \cup B} Z_i \le u_n \Big) - \mathrm{pr}\Big( \max_{i \in A} Z_i \le u_n \Big)\, \mathrm{pr}\Big( \max_{i \in B} Z_i \le u_n \Big) \Big|$$
where the supremum is taken over all $1 \le k \le n - l$, $A \subseteq \{1, \ldots, k\}$, $B \subseteq \{k + l, \ldots, n\}$.
Assume that for $n \to \infty$ and some sequence $l = l(n)$ with $\frac{l}{n} \to 0$, we have $\alpha_{n,l} \to 0$.
b) $D'(u_n)$-condition:
$$\lim_{k\to\infty} \limsup_{n\to\infty}\ n \sum_{j=2}^{[n/k]} \mathrm{pr}(Z_1 > u_n, Z_j > u_n) = 0.$$

$D(u_n)$ states the asymptotic independence of maxima taken over index sets $A$, $B$ which are far apart in time (at least by a lag $l \to \infty$). This is a rather weak form of the mixing conditions in time series analysis which guarantee that the presence is only weakly depending on the remote past. A consequence of the approximate independence of block maxima is:
$$\mathrm{pr}(M_n \le u_n) \approx \big( \mathrm{pr}(M_{[n/k]} \le u_n) \big)^k \qquad (*)$$
for a fixed (or slowly increasing) number $k$ of blocks of length $[n/k]$ each.
$D'(u_n)$ is an anti-clustering condition which states that the occurrence of two extrema (exceeding the threshold $u_n$) close together (separated by a time lag of at most $[n/k]$) has a very small probability.

Theorem 3.2 Let $Z_j$ be a stationary time series with extremal index $\theta > 0$, $X_1, X_2, \ldots$ i.i.d. with $\mathcal{L}(X_j) = \mathcal{L}(Z_j) = F$. Then, for a GEV $H_\xi$,
$$\mathrm{pr}\Big( \frac{M_n^X - d_n}{c_n} \le x \Big) \to H_\xi(x) \quad \text{iff} \quad \mathrm{pr}\Big( \frac{M_n - d_n}{c_n} \le x \Big) \to H_\xi^{\theta}(x)$$
for all $x$ in the support of $H_\xi$.

The maxima of the time series have asymptotically the same type of distribution as the i.i.d. data, as $H_\xi^{\theta}$ is itself a GEV with the same shape parameter as $H_\xi$, e.g. for $\xi > 0$:
$$H_\xi^{\theta}(x) = \exp\{ -\theta\, (1 + \xi x)^{-1/\xi} \} = H_\xi\Big( \frac{x - \mu}{\sigma} \Big), \qquad 1 + \xi x > 0,$$
with $\sigma = \theta^{\xi}$ and $\mu = (\theta^{\xi} - 1)/\xi$.

Based on this result, many techniques developed for extreme value statistics of i.i.d. data may be used for time series if appropriately adapted. One of the main problems is that the effective sample size is $\theta n$ instead of $n$, i.e. more data are needed. For the POT-method, e.g., the idea of approximating the excess distribution by a GPD still works. However, for estimating $\xi, \beta$ by ML, we have to make particular model assumptions to be able to write down the likelihood function since, in particular due to the clustering of extremes as a consequence of the violation of $D'(u_n)$, the excesses $Y_1, Y_2, \ldots$ are no longer independent. One approach tries instead to make them more independent by replacing clusters of exceedances by just one exceedance, e.g. by the maximum value in a cluster. The cluster size has to be chosen such that the number of exceedances is reduced by a factor of approximately $\theta$ (a sample of $n$ time series data corresponds to $\theta n$ independent observations with respect to its information about extremes). Then, the standard POT-method is applied to the reduced data set.

For applying such modifications of the methods of the previous chapters one needs the extremal index $\theta$. It may be estimated by various methods. We consider only one of them which may be easily explained without additional background information:

The blocks method: Partition $Z_1, \ldots, Z_N$ into $b$ blocks of size $n$ each ($N = bn$, $b, n$ large). Let $M_n^{(k)}$ be the maximum in the $k$-th block:
$$M_n^{(k)} = \max(Z_{(k-1)n + 1}, \ldots, Z_{kn}), \qquad k = 1, \ldots, b.$$
For a large threshold $u = u_N$:
$$N_u = \#\{j \le N;\, Z_j > u\}, \qquad B_u = \#\{k \le b;\, M_n^{(k)} > u\},$$
$$\hat\theta := \frac{1}{n}\, \frac{\log\big(1 - \frac{B_u}{b}\big)}{\log\big(1 - \frac{N_u}{N}\big)}.$$
This estimate of $\theta$ is justified by the following three arguments:
a) For large $N$, $\mathrm{pr}(M_N \le u) \approx F^{N\theta}(u)$ if $u = u_N$ is such that $N\bar F(u_N) \to \tau$. Solving for $\theta$ we get:
$$\theta \approx \frac{\log \mathrm{pr}(M_N \le u)}{N \log F(u)}.$$
b) Estimate $F$ by the empirical distribution $\hat F_N$: $F(u) = 1 - \mathrm{pr}(Z_j > u) \approx 1 - \frac{N_u}{N}$.
c) Use ($*$) above, recalling $N = bn$:
$$\mathrm{pr}(M_N \le u) \approx \mathrm{pr}(M_n \le u)^b \approx \Big[ \frac{1}{b} \sum_{k=1}^{b} \mathbf{1}_{(-\infty, u]}\big(M_n^{(k)}\big) \Big]^b = \Big( 1 - \frac{B_u}{b} \Big)^b.$$
Combining a)-c), we get:
$$\theta \approx \frac{b \log\big(1 - \frac{B_u}{b}\big)}{N \log\big(1 - \frac{N_u}{N}\big)} = \hat\theta.$$
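The blocks estimator needs only block maxima and exceedance counts. A sketch on a simulated series with clustered extremes (a moving-maximum toy model with i.i.d. unit-Fréchet noise, chosen purely as an illustration because its extremal index is known to be 1/2; the threshold is taken as an upper empirical quantile):

```python
import numpy as np

rng = np.random.default_rng(10)

# toy series with clustered extremes: Z_t = max(X_t, X_{t-1}) with unit-Frechet X_t
X = 1.0 / -np.log(rng.uniform(size=100_001))      # unit Frechet via inversion
Z = np.maximum(X[1:], X[:-1])

def blocks_estimator(Z, n_block, u):
    N = Z.size - Z.size % n_block
    Z = Z[:N]
    b = N // n_block
    block_max = Z.reshape(b, n_block).max(axis=1)
    Nu = np.sum(Z > u)                             # exceedances in the whole sample
    Bu = np.sum(block_max > u)                     # blocks containing an exceedance
    return np.log(1 - Bu / b) / (n_block * np.log(1 - Nu / N))

u = np.quantile(Z, 0.99)
print("theta_hat:", blocks_estimator(Z, n_block=100, u=u))   # should be near 0.5
```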