A Bayesian Analysis of Multiple Change Point Problems

A Bayesian analysis of multiple change point
problems in data sequence

Rosangela H. Loschi^1
Departamento de Estatística, Universidade Federal de Minas Gerais,
Belo Horizonte - MG, Brazil
email: loschi@est.ufmg.br
Pilar L. Iglesias and Reinaldo B. Arellano-Valle
Departamento de Estadística, Facultad de Matemáticas
Pontificia Universidad Católica de Chile, Santiago - Chile.
email: {pliz, reivalle}@mat.puc.cl
Frederico R. B. Cruz
Departamento de Estatística, Universidade Federal de Minas Gerais,
Belo Horizonte - MG, Brazil
email: fcruz@est.ufmg.br
Abstract
We apply the product partition model (PPM) to identify multiple change points
in normal means ($\mu$) and variances ($\sigma^2$), extending some previous works. We
establish a full predictivistic characterization for the prior distribution of $\mu$ and
$\sigma^2$, which yields an easier way to obtain the prior distribution of these parameters
by considering opinion on observable quantities only. We also propose a Gibbs
sampling scheme to estimate the posterior distributions of the number of change
points and of the instants when changes occurred. We apply the results to identify
multiple changes in the expected return and the volatility of a series of returns
in the Chilean stock market, providing a sensitivity analysis of the model under
different prior specifications. We conclude that the Chilean market exhibits
clusters of expected return and volatility and that the product estimates
are influenced by the prior specifications.
Keywords: Gibbs sampling, predictivism, product partition model, Student-t
distribution.
^1 Corresponding author. Departamento de Estatística, ICEx, Universidade Federal de Minas Gerais, Caixa
Postal 702, CEP: 31270-901 - Belo Horizonte, MG, Brasil. Fax: +55 33 3499 5924
1 Introduction
In this paper we consider a Bayesian analysis of the multiple change point problem using
the product partition model (PPM) proposed by Hartigan (1990). The PPM allows the
identification of multiple change points in the parameters as well as in the functional form of
the distribution function itself. In addition, the PPM introduces some flexibility into the
analysis of change point problems, since the number of change points is a random variable (as
opposed to the known number assumed in threshold models (Chen and Lee (1995), Geweke
and Terui (1993)) and in the model considered by Hawkins (2001), for example).
The single change point problem has been approached from the Bayesian point of view by
several authors. For example, Menzefricke (1981) considers the problem of making inferences
about a change point in the precision of normal data with unknown mean. A single change
point in the functional form of the distribution is explored by Hsu (1984), who considers the
class of exponential-power distributions (Box and Tiao, 1973) for treating the problem.
Both authors apply their methodologies to stock market prices. The Bayesian identification
of a single change point is also discussed by Smith (1975). The PPM proposed later by
Hartigan (1990) generalizes most of the situations described above. The PPM is applied by Barry
and Hartigan (1993) to identify multiple change points in the mean of normal random
variables with common variance. More recently, Crowley (1997) provides a new implementation of
Gibbs sampling to solve the problem of estimating normal means by using the PPM. The
identification of change points in normal means with common variance is also considered by
Chernoff and Zacks (1964) and Gardner (1969) using different Bayesian approaches. (More
about change point problems can be found in Carlstein, Müller and Siegmund (1994).)
The aim of this paper is to apply the PPM presented by Barry and Hartigan (1992) to
identify multiple change points in both the mean $\mu$ and the variance $\sigma^2$ of normal data
which are sequentially observed, extending some results from Barry and Hartigan (1993)
and Crowley (1997). We consider a conjugate prior distribution for the parameters $\mu$ and
$\sigma^2$, justifying this choice within a full predictivistic setting due to de Finetti (1937). In fact,
we propose a more tractable way to elicit the prior distribution of $\mu$ and $\sigma^2$ by considering
only opinions on observable quantities. We also use Yao's (1984) algorithm to compute the
posterior estimates, or product estimates, for these parameters. A Gibbs sampling scheme to
estimate the posterior distributions of the number of change points as well as the instants
when changes occurred is proposed. Although it uses the transformation suggested by Barry
and Hartigan (1993), the proposed method to estimate these posterior distributions was not
found in the literature. We also consider different prior specifications for the probability that
a change occurs at any instant and evaluate the sensitivity of the PPM to these
choices. To illustrate the method, the results are applied to identify multiple change
points in the mean and variance of a series of returns of the Chilean stock market. As a
consequence, it is reported that returns in the Chilean stock market are characterized by
changes in the expected (mean) return and in the volatility (measured here as variance).
The PPM introduced by Barry and Hartigan (1992) is briefly reviewed in Section 2. Later in
Section 2 we obtain the Student-t PPM for random variables which are normally distributed,
given the mean and variance (both unknown), providing the posterior estimates for these
parameters. A predictivistic characterization of the Student-t PPM, which explains the
choice of the prior distributions adopted in an alternative way, is provided as a by-product.
In Section 3, we introduce procedures based on Gibbs sampling schemes to compute the
posterior distributions for the random partition and for the number of change points, assuming
normal data. Finally, in Section 4 we apply the procedures obtained in Sections 2 and 3
to identify change points in the mean return as well as in the volatility of Endesa (Chilean
National Electric Company) returns. We also provide a sensitivity analysis of the PPM.
2 The Student-t PPM
In this section we apply the Product Partition Model (PPM) introduced by Barry and
Hartigan (1992) to identify change points in the mean and variance of normal data observed
through time. We consider a conjugate analysis and present a full predictivistic
characterization of the complete model (likelihood function and prior distribution). First, we present
the definition of the PPM and some preliminary results obtained from this model, as given by
Barry and Hartigan (1992, 1993).
2.1 The product partition model (PPM)
Let $X_1, \dots, X_n$ be a data sequence. Consider a random partition $\rho$ of the set $I = \{1, \dots, n\}$
and a random variable $B$ that represents the number of blocks in $\rho$. Consider that each
partition $\rho = \{i_0, i_1, \dots, i_b\}$, $0 = i_0 < i_1 < \cdots < i_b = n$, divides the sequence
$X_1, \dots, X_n$ into $B = b$, $b \in I$, contiguous subsequences, which will be denoted by
$X_{[i_{r-1} i_r]} = (X_{i_{r-1}+1}, \dots, X_{i_r})'$, $r = 1, \dots, b$. Let $c_{[ij]}$ be the prior cohesion
associated to the block $[ij] = \{i + 1, \dots, j\}$, $i, j \in I \cup \{0\}$, $j > i$, which represents the
degree of similarity among the observations in $X_{[ij]}$ (Hartigan, 1990).
Hence, it is said that the random quantity $(X_1, \dots, X_n; \rho)$ follows a PPM, denoted by
$(X_1, \dots, X_n; \rho) \sim \mathrm{PPM}$, if:
i) the prior distribution of $\rho$ is the following product distribution:
$$P(\rho = \{i_0, \dots, i_b\}) = \frac{\prod_{j=1}^{b} c_{[i_{j-1} i_j]}}{\sum_{\mathcal{C}} \prod_{j=1}^{b} c_{[i_{j-1} i_j]}}, \tag{2.1}$$
where $\mathcal{C}$ is the set of all possible partitions of the set $I$ into $b$ contiguous blocks with
end points $i_1, \dots, i_b$, satisfying the condition $0 = i_0 < i_1 < \cdots < i_b = n$, $b \in I$;
ii) conditionally on $\rho = \{i_0, \dots, i_b\}$, the sequence $X_1, \dots, X_n$ has the joint density given
by:
$$f(X_1, \dots, X_n \mid \rho = \{i_0, \dots, i_b\}) = \prod_{j=1}^{b} f_{[i_{j-1} i_j]}(X_{[i_{j-1} i_j]}), \tag{2.2}$$
where $f_{[ij]}(X_{[ij]})$ is the joint density of the random vector $X_{[ij]} = (X_{i+1}, \dots, X_j)'$.
Notice that the number of blocks $B$ in $\rho$ has a prior distribution given by:
$$P(B = b) \propto \sum_{\mathcal{C}_1} \prod_{j=1}^{b} c_{[i_{j-1} i_j]}, \quad b \in I, \tag{2.3}$$
where $\mathcal{C}_1$ is the set of all partitions of $I$ in $b$ contiguous blocks.
As shown in Barry and Hartigan (1992), the posterior distributions of $\rho$ and $B$ have the
same form as the prior distributions, where the posterior cohesion for the block $[ij]$ is given by
$c^*_{[ij]} = c_{[ij]} f_{[ij]}(X_{[ij]})$. That is, the PPM induces some kind of conjugacy.
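The conjugacy can be checked in one line from (2.1) and (2.2) by Bayes' theorem. The derivation below is a sketch that uses only quantities already defined, writing $c^*_{[ij]}$ for the posterior cohesion:

$$P(\rho = \{i_0, \dots, i_b\} \mid X_1, \dots, X_n) \;\propto\; f(X_1, \dots, X_n \mid \rho)\, P(\rho) \;\propto\; \prod_{j=1}^{b} f_{[i_{j-1} i_j]}(X_{[i_{j-1} i_j]})\, c_{[i_{j-1} i_j]} \;=\; \prod_{j=1}^{b} c^*_{[i_{j-1} i_j]}.$$

The posterior is therefore again a product distribution of the form (2.1), with each prior cohesion replaced by the corresponding posterior cohesion.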
In the parametric approach to the PPM, a sequence of unknown parameters $\theta_1, \dots, \theta_n$ is
considered such that, conditionally on $\theta_1, \dots, \theta_n$, the random variables $X_1, \dots, X_n$ have
conditional marginal densities $f_1(X_1 \mid \theta_1), \dots, f_n(X_n \mid \theta_n)$, respectively. In this case,
two observations $X_i$ and $X_j$, $i \neq j$, are considered to be in the same block if it
is believed that they are identically distributed. Thus, in this approach to the PPM, the
predictive distribution $f_{[ij]}(X_{[ij]})$, which appears in (2.2), can be obtained as follows:
$$f_{[ij]}(X_{[ij]}) = \int_{\Theta_{[ij]}} f_{[ij]}(X_{[ij]} \mid \theta)\, \pi_{[ij]}(\theta)\, d\theta, \tag{2.4}$$
where $\Theta_{[ij]}$ is the parameter space corresponding to the common parameter, say,
$\theta_{[ij]} = \theta_{i+1} = \dots = \theta_j$, which indexes the conditional density of $X_{[ij]}$.
The prior distribution of $\theta_1, \dots, \theta_n$ is constructed as follows. Given a partition
$\rho = \{i_0, \dots, i_b\}$, $b \in I$, we have that $\theta_i = \theta_{[i_{r-1} i_r]}$ for every $i_{r-1} < i \le i_r$,
$r = 1, \dots, b$, and that $\theta_{[i_0 i_1]}, \dots, \theta_{[i_{b-1} i_b]}$ are independent, with $\theta_{[ij]}$ having
(block) prior density $\pi_{[ij]}(\theta)$, $\theta \in \Theta_{[ij]}$.
Hence, the goal in the parametric PPM is to obtain the marginal posterior distributions of
the parameters $\rho$, $B$, and $\theta_k$, $k = 1, \dots, n$. Barry and Hartigan (1992) have shown that the
posterior distribution of $\theta_k$ is given by:
$$\pi(\theta_k \mid X_1, \dots, X_n) = \sum_{i=0}^{k-1} \sum_{j=k}^{n} r^*_{[ij]}\, \pi_{[ij]}(\theta_k \mid X_{[ij]}), \tag{2.5}$$
for $k = 1, \dots, n$, and the posterior expectation of $\theta_k$ is given by:
$$E(\theta_k \mid X_1, \dots, X_n) = \sum_{i=0}^{k-1} \sum_{j=k}^{n} r^*_{[ij]}\, E(\theta_k \mid X_{[ij]}), \tag{2.6}$$
for $k = 1, \dots, n$, where $r^*_{[ij]}$ denotes the posterior relevance for the block $[ij]$, that is:
$$r^*_{[ij]} = P([ij] \in \rho \mid X_1, \dots, X_n) = \frac{\lambda_{[0i]}\, c^*_{[ij]}\, \lambda_{[jn]}}{\lambda_{[0n]}}, \tag{2.7}$$
where $\lambda_{[ij]} = \sum \prod_{k=1}^{b} c^*_{[i_{k-1} i_k]}$, and the summation is over all partitions of
$\{i + 1, \dots, j\}$ in $b$ blocks with endpoints $i_0, i_1, \dots, i_b$, satisfying the condition
$i = i_0 < i_1 < \cdots < i_b = j$.
2.2 Product estimates for normal means and variances
Assume that $\theta_1 = (\mu_1, \sigma^2_1), \dots, \theta_n = (\mu_n, \sigma^2_n)$, such that
$X_k \mid \mu_k, \sigma^2_k \sim N(\mu_k, \sigma^2_k)$, $k = 1, \dots, n$, and they are independent. Denote by
$\theta_{[ij]} = (\mu_{[ij]}, \sigma^2_{[ij]})$ the common parameter related to the block $[ij]$. Thus, the
Student-t PPM can be specified by considering the following conditional
$(j - i)$-dimensional normal distribution for the observations in $X_{[ij]}$:
$$X_{[ij]} \mid \mu_{[ij]}, \sigma^2_{[ij]} \sim N_{j-i}(\mu_{[ij]} 1_{j-i},\; \sigma^2_{[ij]} I_{j-i}), \tag{2.8}$$
where $1_k$ and $I_k$ are the $k$-dimensional vector of ones and the $k \times k$ identity
matrix, respectively; as well as by assuming that $(\mu_{[ij]}, \sigma^2_{[ij]})$ has the normal-inverted-gamma
prior distribution denoted by $(\mu_{[ij]}, \sigma^2_{[ij]}) \sim NIG(m_{[ij]}, v_{[ij]}; a_{[ij]}/2, d_{[ij]}/2)$, that is,
$$\mu_{[ij]} \mid \sigma^2_{[ij]} \sim N(m_{[ij]}, v_{[ij]} \sigma^2_{[ij]}) \quad \text{and} \quad \sigma^2_{[ij]} \sim IG(a_{[ij]}/2, d_{[ij]}/2), \tag{2.9}$$
where $IG(a, b)$ is the inverted-gamma distribution with parameters $a$ and $b$. Under (2.8) and
(2.9), the conditional distribution of $\theta_{[ij]} = (\mu_{[ij]}, \sigma^2_{[ij]})$, given the observations in
$X_{[ij]}$, is the normal-inverted-gamma distribution given by
$$\mu_{[ij]} \mid X_{[ij]}, \sigma^2_{[ij]} \sim N(m^*_{[ij]}, v^*_{[ij]} \sigma^2_{[ij]}) \quad \text{and} \quad \sigma^2_{[ij]} \mid X_{[ij]} \sim IG(a^*_{[ij]}/2, d^*_{[ij]}/2), \tag{2.10}$$
where
$$\begin{aligned}
m^*_{[ij]} &= \frac{(j-i) v_{[ij]} \bar{X}_{[ij]}}{(j-i) v_{[ij]} + 1} + \frac{m_{[ij]}}{(j-i) v_{[ij]} + 1}, \\
v^*_{[ij]} &= \frac{v_{[ij]}}{(j-i) v_{[ij]} + 1}, \\
d^*_{[ij]} &= d_{[ij]} + j - i, \\
a^*_{[ij]} &= a_{[ij]} + q_{[ij]}(X_{[ij]}),
\end{aligned} \tag{2.11}$$
with
$$\bar{X}_{[ij]} = \frac{1}{j - i} \sum_{r=i+1}^{j} X_r, \qquad
q_{[ij]}(X_{[ij]}) = \sum_{r=i+1}^{j} (X_r - \bar{X}_{[ij]})^2 + \frac{(j - i)(\bar{X}_{[ij]} - m_{[ij]})^2}{(j - i) v_{[ij]} + 1}.$$
(See O'Hagan (1994) for details.) Therefore, we obtain from (2.10) and (2.6) that the product
estimates for $\mu_k$ and $\sigma^2_k$ are given by
$$E(\mu_k \mid X_1, \dots, X_n) = \sum_{i=0}^{k-1} \sum_{j=k}^{n} r^*_{[ij]}\, m^*_{[ij]} \quad (\text{if } d_{[ij]} > 1) \tag{2.12}$$
and
$$E(\sigma^2_k \mid X_1, \dots, X_n) = \sum_{i=0}^{k-1} \sum_{j=k}^{n} r^*_{[ij]}\, \frac{a^*_{[ij]}}{d^*_{[ij]} - 2} \quad (\text{if } d_{[ij]} > 2), \tag{2.13}$$
respectively, $k = 1, \dots, n$, where $m^*_{[ij]}$, $a^*_{[ij]}$ and $d^*_{[ij]}$ are defined as in (2.11).
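The block-level update in (2.11) is simple arithmetic, and a short sketch may make the roles of the four hyperparameters concrete. The function name `nig_block_update` and its argument layout are illustrative assumptions, not part of the paper; the formulas are those of (2.11).

```python
from typing import Sequence, Tuple


def nig_block_update(x: Sequence[float], m: float, v: float,
                     a: float, d: float) -> Tuple[float, float, float, float]:
    """Return (m*, v*, a*, d*) of equation (2.11) for the observations of one block.

    x holds the block observations X_{i+1}, ..., X_j; (m, v, a, d) are the prior
    hyperparameters of NIG(m, v; a/2, d/2). Names are hypothetical.
    """
    k = len(x)                       # block length j - i
    xbar = sum(x) / k                # block sample mean
    denom = k * v + 1.0
    m_star = (k * v * xbar + m) / denom
    v_star = v / denom
    d_star = d + k
    # q_[ij](X_[ij]): within-block sum of squares plus shrinkage term
    q = sum((xi - xbar) ** 2 for xi in x) + k * (xbar - m) ** 2 / denom
    a_star = a + q
    return m_star, v_star, a_star, d_star


# Example with made-up returns and the hyperparameters m=0, v=1, a=0.01, d=4
m_star, v_star, a_star, d_star = nig_block_update([0.05, -0.02, 0.10], 0.0, 1.0, 0.01, 4.0)
block_variance_estimate = a_star / (d_star - 2)   # block term of (2.13)
```

The product estimates (2.12) and (2.13) are then relevance-weighted averages of `m_star` and `a_star / (d_star - 2)` over all blocks containing instant $k$.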
Notice that the PPM induced by (2.8) and (2.9) implies that for each block $[ij]$, the
random vector $X_{[ij]}$ follows a $(j - i)$-dimensional Student-t distribution denoted by
$X_{[ij]} \sim t_{j-i}(\mathbf{m}_{[ij]}, V_{[ij]}; a_{[ij]}, d_{[ij]})$ with density function given by
$$f(X_{[ij]}) = c(d_{[ij]}, j - i)\, a_{[ij]}^{d_{[ij]}/2}\, |V_{[ij]}|^{-1/2} \left\{ a_{[ij]} + (X_{[ij]} - \mathbf{m}_{[ij]})' V_{[ij]}^{-1} (X_{[ij]} - \mathbf{m}_{[ij]}) \right\}^{-(d_{[ij]} + j - i)/2}, \tag{2.14}$$
where $c(d, k) = \Gamma\!\left[\frac{d+k}{2}\right] \left\{ \Gamma\!\left[\frac{d}{2}\right] \pi^{k/2} \right\}^{-1}$,
$\mathbf{m}_{[ij]} = m_{[ij]} 1_{j-i}$ and $V_{[ij]} = I_{j-i} + v_{[ij]} 1_{j-i} 1'_{j-i}$.
The distribution in (2.14) is named by Arellano-Valle and Bolfarine (1995) the Generalized
Student-t distribution, which reduces to the usual Student-t distribution with $d_{[ij]}$ degrees
of freedom and the same dispersion matrix when $a_{[ij]} = d_{[ij]}$. Notice that, under this
model, the elements within the same block are correlated and distributed according to a
distribution with heavier tails than the normal distribution. Moreover, for the block $[ij]$ it
follows that
$$E(X_j \mid X_{j-1}, \dots, X_{i+1}) = E(\mu_{[ij]} \mid X_{j-1}, \dots, X_{i+1}) = m^*_{[i(j-1)]}$$
and
$$\begin{aligned}
E(X_j^2 \mid X_{j-1}, \dots, X_{i+1}) &= E[(\sigma^2_{[ij]} + \mu^2_{[ij]}) \mid X_{j-1}, \dots, X_{i+1}] \\
&= \frac{(j - i) v_{[ij]} + 1}{(j - i - 1) v_{[ij]} + 1} \cdot \frac{a^*_{[i(j-1)]}}{d^*_{[i(j-1)]} - 2} + (m^*_{[i(j-1)]})^2,
\end{aligned}$$
where $m^*_{[i(j-1)]}$, $d^*_{[i(j-1)]}$ and $a^*_{[i(j-1)]}$ are defined as in (2.11).
2.3 Yao's algorithm
In order to compute the posterior relevances given in (2.7) we consider the following recursive
algorithm proposed by Yao (1984):
$$\begin{aligned}
\lambda_{[00]} &= 1, \\
\lambda_{[01]} &= c^*_{[01]}, \\
\lambda_{[0j]} &= c^*_{[0j]} + \sum_{t=1}^{j-1} \lambda_{[0t]}\, c^*_{[tj]}, \quad j = 2, \dots, n, \\
\lambda_{[(n-1)n]} &= c^*_{[(n-1)n]}, \\
\lambda_{[in]} &= c^*_{[in]} + \sum_{t=i+1}^{n-1} \lambda_{[tn]}\, c^*_{[it]}, \quad i = 1, \dots, n - 2, \\
\lambda_{[nn]} &= 1,
\end{aligned} \tag{2.15}$$
where $\lambda_{[ij]}$ is the summation presented in (2.7) and $c^*_{[ij]}$ is the posterior cohesion of the block
$[ij]$. A Gibbs sampling scheme to compute the posterior relevances can be found in Loschi
et al. (2003). See Barry and Hartigan (1993) for a Gibbs sampling scheme to compute the
product estimates directly.
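The recursion (2.15) and the relevance formula (2.7) can be sketched as follows. This is a minimal illustration, assuming the posterior cohesions are supplied as a table `c[i][j]` for $0 \le i < j \le n$; the names `yao_relevances` and `product_estimate` are hypothetical. It works with raw products, so for long series the cohesions would probably need rescaling or a log-domain variant to avoid underflow.

```python
from typing import Callable, List


def yao_relevances(c: List[List[float]]) -> List[List[float]]:
    """Posterior relevances r*[i][j] of (2.7) from posterior cohesions c[i][j].

    c is an (n+1) x (n+1) table; only entries with i < j are read.
    """
    n = len(c) - 1
    fwd = [0.0] * (n + 1)            # fwd[j] = lambda_[0j]
    bwd = [0.0] * (n + 1)            # bwd[i] = lambda_[in]
    fwd[0] = 1.0
    for j in range(1, n + 1):
        fwd[j] = c[0][j] + sum(fwd[t] * c[t][j] for t in range(1, j))
    bwd[n] = 1.0
    for i in range(n - 1, 0, -1):
        bwd[i] = c[i][n] + sum(c[i][t] * bwd[t] for t in range(i + 1, n))
    r = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            r[i][j] = fwd[i] * c[i][j] * bwd[j] / fwd[n]
    return r


def product_estimate(r: List[List[float]],
                     block_value: Callable[[int, int], float], k: int) -> float:
    """Relevance-weighted sum of (2.6): blocks [ij] with i < k <= j."""
    n = len(r) - 1
    return sum(r[i][j] * block_value(i, j)
               for i in range(k) for j in range(k, n + 1))
```

Passing a `block_value` that returns $m^*_{[ij]}$ or $a^*_{[ij]}/(d^*_{[ij]} - 2)$ gives the product estimates (2.12) and (2.13), respectively.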
2.4 A Predictivistic justification of the Student-t PPM
Eliciting prior distributions to solve real problems is sometimes not an easy task. In this
section we establish a full predictivistic characterization of the Student-t PPM presented in
Section 2.2, where the likelihood function as well as the prior distribution of $\mu$ and $\sigma^2$ are
consequences of judgements on observable quantities. As a by-product, this characterization
provides a tractable way to elicit the prior distribution of $(\mu, \sigma^2)$.
As shown in Section 2.2, the Student-t distribution is a location and scale mixture of the
normal distribution, where the mixing measure is the normal-inverted-gamma distribution.
Thus, it follows that the Student-t distribution can be obtained in two stages. Firstly, a
conditional normal distribution, given the location and scale parameters, is specified.
Secondly, we identify a normal-inverted-gamma distribution as the joint prior distribution for
the location and scale parameters. By adopting the predictivistic approach of de Finetti (1937),
the first stage is replaced by an assumption about observables (Iglesias (1993) and
Wechsler (1993)). For example, the assumption of invariance under some groups of orthogonal
transformations over infinite sequences of random quantities implies that the law of the sequence
of observables can be represented as a mixture of conditionally normally distributed and
independent quantities (see Kingman (1972), Smith (1981), Diaconis, Eaton and Lauritzen
(1992)). However, this type of condition does not permit the characterization of the mixing
measure. Additional conditions have to be assumed to obtain the mixing measure.
Arellano-Valle, Bolfarine and Iglesias (1994), following Diaconis and Ylvisaker (1979, 1985),
characterize a scale mixture of normal distributions by considering invariance under orthogonal
transformations and additional conditions which determine how to predict $X^2_{n+1}$. In the full
predictivistic approach considered by Arellano-Valle, Bolfarine and Iglesias (1994), the
mixing measure (prior distribution) obtained is the inverted-gamma distribution. These authors
also obtain a characterization for a location and scale mixture of normal distributions which
depends on non-observable quantities; that is, it is not a full predictivistic characterization
of the model. Proposition 2.1 below improves this partial result.
Consider $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$ and $S^2_n = \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$. We say that an infinite sequence
of random variables $X_1, X_2, \dots$ is O(1)-invariant if, for each $n \ge 2$ and real values $m$ and
$r$, the conditional distribution of $X_{[0n]}$, given $\bar{X}_n = m$ and $S^2_n = r^2$, is uniform on the
$n$-sphere centred at $m 1_n$ with radius $r$, that is, on the set
$S_n = \{(x_1, \dots, x_n) \in \mathbb{R}^n : \bar{x}_n = m,\ \sum_{i=1}^{n} (x_i - \bar{x}_n)^2 = r^2\}$.
Proposition 2.1 Let $X_1, X_2, \dots$ be an infinite sequence of O(1)-invariant random variables,
such that $P(X_1 = X_2) = 0$ and
$$\begin{aligned}
E(X_3^2 \mid X_1, X_2) &= e(X_1^2 + X_2^2) + w, \\
E(X_3 \mid X_1, X_2) &= e(X_1 + X_2) + u,
\end{aligned} \tag{2.16}$$
then $e \in (0, 1/2)$, $u \in \mathbb{R}$, $w > u^2/(1 - 2e)$ and, for each $n \ge 3$,
$$X_{[0n]} \sim t_n\!\left( \frac{u}{1 - 2e} 1_n,\; I_n + \frac{e}{1 - 2e} 1_n 1'_n;\; \frac{1}{e}\left( w - \frac{u^2}{1 - 2e} \right);\; \frac{1 + e}{e} \right). \tag{2.17}$$
The converse also holds.
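Proof: From Smith's (1981) theorem, there are random variables $\mu$ and $\sigma^2$, such that, for
every $n \ge 2$,
$$X_{[0n]} \mid \mu, \sigma^2 \sim N(\mu 1_n, \sigma^2 I_n),$$
where $\sigma^2 > 0$ with probability one. Consequently, considering $M = \sum_{i=1}^{2} X_i = 2\bar{X}$ and
$Q = \sum_{i=1}^{2} X_i^2 = S^2 + 2\bar{X}^2$ and denoting by $\theta = (\theta_1, \theta_2) = (\mu/\sigma^2, -1/2\sigma^2)$ the natural
parameter of the distribution of $(M, Q)$, given $(\mu, \sigma^2)$, we obtain the following conditional
density of $(M, Q)$ given $\theta$:
$$dP_\theta(M, Q) = \exp\{(\theta_1, \theta_2)(M, Q)^t - D(\theta)\}\, d\nu(M, Q),$$
where $d\nu(M, Q) = \frac{1}{2}(Q - M^2/2)^{-1/2}\, d\lambda$, $\lambda$ is the Lebesgue measure defined on $\mathbb{R}^2$ and
$$D(\theta) = -\theta_1^2/(2\theta_2) - \log(-\theta_2).$$
The vector of partial derivatives of $D(\theta)$ with respect to the natural parameters $\theta_1$ and $\theta_2$ is
given by
$$D'(\theta) = \left( -\frac{\theta_1}{\theta_2},\; \frac{\theta_1^2 - 2\theta_2}{2\theta_2^2} \right) = E\{(M, Q) \mid \theta\}.$$
Hence, by using properties of the conditional expectation and conditions (2.16), it follows
that
$$\begin{aligned}
E\{D'(\theta) \mid (M, Q)\} &= E\{E\{(M, Q) \mid \theta_1, \theta_2\} \mid (M, Q)\} \\
&= 2E\{E\{(X_3, X_3^2) \mid (\mu, \sigma^2)\} \mid X_1, X_2\} \\
&= 2e(X_2 + X_1;\; X_2^2 + X_1^2) + 2(u, w).
\end{aligned}$$
From Theorem 3 in Diaconis and Ylvisaker (1979) the following prior density for $(\mu, \sigma^2)$ is
obtained:
$$\pi(\mu, \sigma^2) = K \left( \frac{1}{\sigma^2} \right)^{\frac{1}{2e} + \frac{3}{2}} \exp\left\{ -\frac{1}{2e\sigma^2} \left( w - \frac{u^2}{1 - 2e} \right) \right\}
\left( \frac{1 - 2e}{e\sigma^2} \right)^{\frac{1}{2}} \exp\left\{ -\frac{1 - 2e}{2e\sigma^2} \left( \mu - \frac{u}{1 - 2e} \right)^2 \right\}. \tag{2.18}$$
Consequently, (2.17) is obtained (see O'Hagan (1994), p. 244). The converse is obtained by
using the properties of the Student-t distribution (see Arellano-Valle and Bolfarine (1995)).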
Proposition 2.1 improves some partial results from Arellano-Valle, Bolfarine and Iglesias
(1994) by providing a full predictivistic characterization to a location and scale mixture of
normal distributions. Extensions of this result to Student-t linear models can be found in
Loschi, Iglesias and Arellano-Valle (2003).
Corollary 2.1 Consider the assumptions established in Proposition 2.1. Then, the
parameters $\mu$ and $\sigma^2$ have the following normal-inverted-gamma distribution:
$$\mu \mid \sigma^2 \sim N\!\left( \frac{u}{1 - 2e},\; \frac{e\sigma^2}{1 - 2e} \right) \quad \text{and} \quad
\sigma^2 \sim IG\!\left( \frac{1}{2e}\left( w - \frac{u^2}{1 - 2e} \right),\; \frac{1 + e}{2e} \right). \tag{2.19}$$
Thus, under O(1)-invariance assumptions the representations in (2.16) are equivalent to the
specification in (2.19).
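To illustrate how (2.19) turns judgements about observables into hyperparameters, the sketch below maps the predictive coefficients $(e, u, w)$ of (2.16) to the parameters $(m, v, a, d)$ of the $NIG(m, v; a/2, d/2)$ notation of Section 2.2. The function name `nig_from_predictive` is a hypothetical helper; the mapping is read off (2.17) and (2.19).

```python
from typing import Tuple


def nig_from_predictive(e: float, u: float, w: float) -> Tuple[float, float, float, float]:
    """Map (e, u, w) of (2.16) to (m, v, a, d) with prior NIG(m, v; a/2, d/2).

    Enforces the constraints of Proposition 2.1: 0 < e < 1/2 and w > u^2 / (1 - 2e).
    """
    if not 0.0 < e < 0.5:
        raise ValueError("e must lie in (0, 1/2)")
    if not w > u * u / (1.0 - 2.0 * e):
        raise ValueError("w must exceed u^2 / (1 - 2e)")
    m = u / (1.0 - 2.0 * e)                    # prior location of mu
    v = e / (1.0 - 2.0 * e)                    # prior variance factor of mu given sigma^2
    a = (w - u * u / (1.0 - 2.0 * e)) / e      # IG first parameter is a / 2
    d = (1.0 + e) / e                          # IG second parameter is d / 2
    return m, v, a, d


# e = 1/3, u = 0, w = 0.01/3 gives m = 0, v = 1, a = 0.01, d = 4 (up to rounding)
m, v, a, d = nig_from_predictive(1.0 / 3.0, 0.0, 0.01 / 3.0)
```

With $e = 1/3$, $u = 0$ and $w = 0.01/3$ the mapping returns the $N(0, \sigma^2)$ and $IG(0.01/2, 4/2)$ specification used in Section 4.1; this correspondence is computed here from (2.19) and is not stated in the text.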
3 Posterior Distributions for $\rho$ and $B$
In this section, we provide the exact posterior distributions for $\rho$ and $B$ assuming the prior
cohesions suggested by Yao (1984) and propose a Gibbs sampling scheme to estimate these
posterior distributions.
3.1 Exact Posterior Distributions
Let $p$, $0 \le p \le 1$, be the probability that a change occurs at any instant in the sequence.
Then the prior cohesion for block $[ij]$ corresponds to the probability that a new change
takes place after $j - i$ instants, given that a change has taken place at instant $i$, that is,
$$c_{[ij]} = \begin{cases} p(1 - p)^{j-i-1} & \text{if } j < n, \\ (1 - p)^{j-i-1} & \text{if } j = n. \end{cases} \tag{3.1}$$
Notice that the prior cohesions given in (3.1) imply that the sequence of change points
forms a discrete renewal process, with occurrence times identically distributed according to a
geometric distribution. Choosing a high value for $p$ amounts to assuming a priori that
the data sequence consists of small blocks (or, equivalently, contains a large number of
change points). Assuming these cohesions, it follows from expression (2.1) that the prior
distribution of $\rho$ takes the form
$$P(\rho = \{i_0, i_1, \dots, i_b\}) = p^{b-1}(1 - p)^{n-b},$$
$b \in I$, which depends only on the number of observations $n$ and the number of blocks $b$ in the
partition, but does not depend on the positions where the change points occur. Moreover,
it follows that the prior distribution for the random variable $B$ is given by
$$P(B = b) = C^{n-1}_{b-1}\, p^{b-1}(1 - p)^{n-b}, \quad b \in I,$$
where $C^{n-1}_{b-1}$ is the number of distinct partitions of $I$ into $b$ contiguous blocks. Consequently,
we have that:
$$E(B) = (n - 1)p \quad \text{and} \tag{3.2}$$
$$V(B) = (n - 1)p(1 - p). \tag{3.3}$$
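The product form of the prior can be verified numerically for small $n$, and the moments above can be reproduced directly. Under the pmf of $B$ stated above, $B - 1$ (the number of change points) has a Binomial$(n - 1, p)$ distribution, whose mean and variance are $(n - 1)p$ and $(n - 1)p(1 - p)$. The sketch below uses hypothetical helper names; a partition is encoded by its end points `[i_0, ..., i_b]`.

```python
from math import comb
from typing import List, Sequence, Tuple


def partition_prior(ends: Sequence[int], p: float) -> float:
    """Product of the prior cohesions (3.1) over the blocks of a partition.

    ends = [i_0, ..., i_b] with i_0 = 0 and i_b = n.
    """
    n = ends[-1]
    prob = 1.0
    for i, j in zip(ends, ends[1:]):
        prob *= (1.0 - p) ** (j - i - 1) * (p if j < n else 1.0)
    return prob


def changepoint_count_prior(n: int, p: float) -> Tuple[List[float], float, float]:
    """pmf, mean and variance of the number of change points B - 1."""
    pmf = [comb(n - 1, k) * p ** k * (1.0 - p) ** (n - 1 - k) for k in range(n)]
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return pmf, mean, var


# n = 95 monthly returns and p = 0.01, as in the application of Section 4
pmf, mean, var = changepoint_count_prior(95, 0.01)
```

For $n = 95$ the computed means and variances agree with the prior columns reported in Table 2 for each value of $p$.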
From Section 2.1 we only need to find the posterior cohesion for each block to obtain the
posterior distribution of $\rho$ and $B$. Recalling that the posterior cohesion for the block $[ij]$ is
obtained by multiplying the corresponding prior cohesion by the predictive distribution of
$X_{[ij]}$, which is the Student-t distribution defined in (2.14), the following result is obtained:
$$c^*_{[ij]} = \begin{cases}
\dfrac{p(1 - p)^{j-i-1}\, c(d_{[ij]}, j - i)\, a_{[ij]}^{d_{[ij]}/2}\, (1 + (j - i) v_{[ij]})^{-1/2}}{\{a_{[ij]} + q_{[ij]}(X_{[ij]})\}^{(d_{[ij]} + j - i)/2}}, & \text{if } j < n, \\[2ex]
\dfrac{(1 - p)^{j-i-1}\, c(d_{[ij]}, j - i)\, a_{[ij]}^{d_{[ij]}/2}\, (1 + (j - i) v_{[ij]})^{-1/2}}{\{a_{[ij]} + q_{[ij]}(X_{[ij]})\}^{(d_{[ij]} + j - i)/2}}, & \text{if } j = n,
\end{cases}$$
where $c(d, k)$ and $q_{[ij]}(X_{[ij]})$ are defined as in (2.14) and (2.11), respectively.
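Because the posterior cohesion multiplies a power of $(1 - p)$ by a density value that can be very small for long blocks, it is probably safer to evaluate it on the log scale. The sketch below assumes $0 < p < 1$, common hyperparameters $(m, v, a, d)$ for all blocks, and a 0-based Python list `x` in which `x[i:j]` holds $X_{i+1}, \dots, X_j$; the function name is hypothetical.

```python
from math import lgamma, log, pi
from typing import Sequence


def log_posterior_cohesion(x: Sequence[float], i: int, j: int, p: float,
                           m: float, v: float, a: float, d: float) -> float:
    """log c*_[ij]: log prior cohesion (3.1) plus log Student-t predictive (2.14)."""
    n = len(x)
    k = j - i                                   # block length
    block = x[i:j]                              # observations X_{i+1}, ..., X_j
    xbar = sum(block) / k
    # q_[ij](X_[ij]) from (2.11)
    q = sum((t - xbar) ** 2 for t in block) + k * (xbar - m) ** 2 / (k * v + 1.0)
    # log c(d, k) = log Gamma((d+k)/2) - log Gamma(d/2) - (k/2) log(pi)
    log_c = lgamma((d + k) / 2.0) - lgamma(d / 2.0) - (k / 2.0) * log(pi)
    log_pred = (log_c + (d / 2.0) * log(a) - 0.5 * log(1.0 + k * v)
                - ((d + k) / 2.0) * log(a + q))
    # the factor p is present only when the block ends before n
    log_prior = (k - 1) * log(1.0 - p) + (log(p) if j < n else 0.0)
    return log_prior + log_pred
```

The determinant and quadratic form of (2.14) enter through $|V_{[ij]}| = 1 + (j - i)v_{[ij]}$ and $q_{[ij]}(X_{[ij]})$, so no matrix operations are needed.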
Notice that the exact calculation of the posterior distribution for $\rho$ and $B$ demands great
computational effort, in spite of the simplifications introduced by Yao's (1984) algorithm. In
the next section we propose a scheme for estimating the posterior distribution
of the random partition $\rho$ and of the random quantity $B$, which is based on the sample
generated by using the Gibbs sampling approach (see Gelfand and Smith (1990), Gamerman
(1997) for MCMC methods).
3.2 Gibbs Sampling Approach
Consider the auxiliary random quantity $U_i$ suggested by Barry and Hartigan (1993), which
reflects whether or not a change point has occurred at the time $i$, that is,
$$U_r = \begin{cases} 1 & \text{if } \theta_r = \theta_{r-1}, \\ 0 & \text{if } \theta_r \neq \theta_{r-1}, \end{cases}$$
$r = 2, \dots, n$ ($U_1 = 0$). Thus, the random quantity $\rho$ is perfectly identified by considering a
vector of these random quantities, namely, $(U_2, \dots, U_n)$, $n > 2$. Consequently, we can
estimate the posterior probability for each particular partition $\rho = \{i_0, i_1, \dots, i_b\}$ by computing
the proportion of samples of $(U_2, \dots, U_n)$ such that $U_r = 0$ for $r = i_k + 1$, $k = 1, \dots, b - 1$,
and $U_r = 1$ otherwise.
Similarly, it is possible to use the above procedure to estimate the posterior distribution of
$B$ by noticing that
$$B = 1 + \sum_{i=1}^{n-1} (1 - U_i).$$
The vector or partition $(U^k_2, \dots, U^k_n)$ at step $k$ is generated by using Gibbs sampling as
follows. Starting with an initial sample $(U^0_2, \dots, U^0_n)$ of the random vector $(U_2, \dots, U_n)$, at
step $k$, the $r$-th element $U^k_r$ is generated from the conditional distribution
$$U_r \mid U^k_2, \dots, U^k_{r-1}, U^{k-1}_{r+1}, \dots, U^{k-1}_n; X_1, \dots, X_n,$$
$r = 2, \dots, n$. To generate the vectors above, it is sufficient to consider the ratios given by
the following expressions:
$$R_r = \frac{P(U_r = 1 \mid A^k_r; X_1, \dots, X_n)}{P(U_r = 0 \mid A^k_r; X_1, \dots, X_n)},$$
$r = 2, \dots, n$, where $A^k_r = \{U^k_2 = u_2, \dots, U^k_{r-1} = u_{r-1}, U^{k-1}_{r+1} = u_{r+1}, \dots, U^{k-1}_n = u_n\}$. Hence,
considering a degenerate prior distribution for $p$, we have that
$$R_r = \frac{c^*_{[xy]}}{c^*_{[xr]}\, c^*_{[ry]}},$$
where $c^*_{[ij]}$ is the posterior cohesion for block $X_{[ij]}$,
$$x = \begin{cases} \max_i \{0 < i < r,\ U^k_i = 0\}, & \text{if } U^k_i = 0 \text{ for some } i \in \{2, \dots, r - 1\}, \\ 0, & \text{otherwise,} \end{cases}$$
and
$$y = \begin{cases} \min_i \{r < i < n,\ U^{k-1}_i = 0\}, & \text{if } U^{k-1}_i = 0 \text{ for some } i \in \{r + 1, \dots, n\}, \\ n, & \text{otherwise.} \end{cases}$$
Consequently, the criterion for choosing the vectors $(U^k_2, \dots, U^k_n)$ becomes
$$U^k_r = \begin{cases} 1, & \text{if } \dfrac{c^*_{[xy]}}{c^*_{[xr]}\, c^*_{[ry]}} \ge \dfrac{1 - u}{u}, \\ 0, & \text{otherwise,} \end{cases}$$
$r = 2, \dots, n$, where $u$ is a random number chosen from the uniform distribution $U(0, 1)$. This
completes the procedure to estimate the posterior distributions for the random partition $\rho$
and for the number of blocks $B$. (Loschi et al. (2003) extend the PPM presented in this
paper by considering a beta prior distribution for $p$. In that case, the choice of $p$ seems
less arbitrary, since the beta family is rich enough to describe the uncertainty about $p$ under
many practical circumstances. For example, a proper non-informative prior distribution for
$p$ can be specified by setting both beta parameters equal to 1. A comparison between the
results obtained here and those obtained by Loschi et al. (2003) can be found in Loschi and
Cruz (2002a), who conclude that the product estimates obtained by using a degenerate
prior distribution for $p$ or a beta prior distribution with modal value close to this fixed value of
$p$ are similar.)
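The sampling scheme can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' C++ implementation: the indicators are indexed $t = 1, \dots, n - 1$ with a zero at position $t$ marking a change immediately after observation $t$ (the reading under which the boundaries $x$ and $y$ and the formula for $B$ line up), the hyperparameters are common to all blocks, $0 < p < 1$, and the acceptance step draws $U_t = 1$ with probability $R/(1 + R)$, which has the same distribution as the criterion $R \ge (1 - u)/u$. All function names are hypothetical.

```python
import random
from math import exp, lgamma, log, pi
from typing import List, Sequence, Tuple


def log_cohesion(x: Sequence[float], i: int, j: int, p: float,
                 m: float, v: float, a: float, d: float) -> float:
    """log posterior cohesion of block [ij], i.e. observations x[i:j]."""
    n, k = len(x), j - i
    block = x[i:j]
    xbar = sum(block) / k
    q = sum((t - xbar) ** 2 for t in block) + k * (xbar - m) ** 2 / (k * v + 1.0)
    out = (k - 1) * log(1.0 - p) + (log(p) if j < n else 0.0)
    out += lgamma((d + k) / 2.0) - lgamma(d / 2.0) - (k / 2.0) * log(pi)
    out += (d / 2.0) * log(a) - 0.5 * log(1.0 + k * v) - ((d + k) / 2.0) * log(a + q)
    return out


def gibbs_partitions(x: Sequence[float], p: float, m: float, v: float, a: float,
                     d: float, n_iter: int, seed: int = 0) -> List[Tuple[int, ...]]:
    """Draw n_iter indicator vectors (u_1, ..., u_{n-1}); u_t = 0 marks a change after t."""
    rng = random.Random(seed)
    n = len(x)
    u = [0] * (n + 1)                 # start from a vector of zeros; u[0], u[n] unused
    samples = []
    for _ in range(n_iter):
        for t in range(1, n):
            # nearest block boundaries to the left and right of t
            lo = max((s for s in range(1, t) if u[s] == 0), default=0)
            hi = min((s for s in range(t + 1, n) if u[s] == 0), default=n)
            log_r = (log_cohesion(x, lo, hi, p, m, v, a, d)
                     - log_cohesion(x, lo, t, p, m, v, a, d)
                     - log_cohesion(x, t, hi, p, m, v, a, d))
            # P(u_t = 1 | rest) = R / (1 + R), computed without overflow
            if log_r >= 0:
                prob_one = 1.0 / (1.0 + exp(-log_r))
            else:
                prob_one = exp(log_r) / (1.0 + exp(log_r))
            u[t] = 1 if rng.random() < prob_one else 0
        samples.append(tuple(u[1:n]))
    return samples


def block_counts(samples: Sequence[Tuple[int, ...]]) -> List[int]:
    """Number of blocks B = 1 + number of zeros in each sampled vector."""
    return [1 + s.count(0) for s in samples]
```

The posterior distribution of $B$ is then estimated by the relative frequencies of `block_counts(samples)` after discarding a burn-in period, and the posterior probability of a partition by the relative frequency of its indicator vector.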
4 Applications: The Chilean Stock Market Behavior
The goal of this section is to present a sensitivity analysis for the PPM assuming
different degenerate prior distributions for $p$ and to identify multiple change points in the
mean (or expected return) and variance (volatility) of the returns of the Endesa stock series
(Figure 1) within the period from 1987 to 1994, using the methodology developed in the
previous sections. As usual in finance, a return series is defined by using the transformation
$X_t = (P_t - P_{t-1})/P_{t-1}$, where $P_t$ is the price in the month $t$. Defined in this way, the returns
within each block can be considered normally distributed, given the expected return and the
volatility (Correa, 1998).
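The return transformation is a one-liner; the sketch below uses a hypothetical function name and made-up prices purely for illustration.

```python
from typing import List, Sequence


def simple_returns(prices: Sequence[float]) -> List[float]:
    """X_t = (P_t - P_{t-1}) / P_{t-1} for consecutive prices."""
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]


# Three monthly prices yield two returns
returns = simple_returns([100.0, 110.0, 99.0])
```

A series of $n + 1$ prices yields $n$ returns.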
Figure 1: Returns of ENDESA. [x-axis: Year (1987 to 1995); y-axis: Returns]
4.1 Sensitivity analysis
We adopt the following normal-inverted-gamma prior specification to describe uncertainty
on the parameter $(\mu_{[ij]}, \sigma^2_{[ij]})$:
$$\mu_{[ij]} \mid \sigma^2_{[ij]} \sim N(0, \sigma^2_{[ij]}) \quad \text{and} \quad \sigma^2_{[ij]} \sim IG\!\left( \frac{0.01}{2}, \frac{4}{2} \right).$$
We also consider the prior cohesions given in (3.1). Since a small number of changes is
expected, we consider $p = 0.01$ and $0.1$ to evaluate the influence of these prior specifications on
the posterior estimates of $\mu$, $\sigma^2$, $B$ and $\rho$. We also consider very different prior specifications
for the chance of a change occurring, assuming $p = 0.5$ and $p = 0.9$. In these cases, a higher
number of change points is expected a priori.
In the Gibbs sampling scheme, we generate 5,000 samples of $(U_2, \dots, U_n)$, a vector of dimension 94,
starting from a vector of zeros. After convergence had been reached, we discarded the initial
1,000 iterations. A lag of 1 is selected since the correlation among vectors is low.
The algorithms used here were coded in C++. All tests were performed on a PC-like computer,
166 MHz, 32 MB RAM, running Windows 98, and using the freely available C++ compiler
DJGPP (http://www.delorie.com/djgpp).
Figures 2 and 3 show the posterior estimates of $\mu_k$ and $\sigma^2_k$, $k = 1, \dots, 95$, that is, for the
monthly mean returns and volatility, respectively. The product estimates of $\mu$ ($\sigma^2$) are
contrasted with the centered arithmetic moving average (variance) of order 10 for the means
(variances), respectively. It is noticeable that more instants are identified as change points
if higher values of $p$ are considered. We also notice that similar estimates are obtained for
close values of $p$. If $p = 0.1$ we observe that the estimates obtained using the PPM are very
similar to the naive estimates.
Figure 2: Posterior means of $\mu$. [Four panels, p = 0.01, 0.1, 0.5 and 0.9; x-axis: Year (1987 to 1995); y-axis: Mean Return; legend: Data (*), Mean R., M. Aver.]
Figure 3: Posterior means of $\sigma^2$. [Four panels, p = 0.01, 0.1, 0.5 and 0.9; x-axis: Year (1987 to 1995); y-axis: Volatility; legend: Volatility, Moving Variance]
Figure 4 presents the most probable partition for different values of $p$. In line with
the conclusions drawn from Figures 2 and 3, more instants are identified as change points
for higher values of $p$.
Figure 4: Posterior distribution of $\rho$. [Four panels, p = 0.01, 0.1, 0.5 and 0.9; x-axis: Year (1987 to 1995); y-axis: Returns; legend: Partition (*), Endesa Returns]
Table 1 presents the prior and posterior probabilities of the most probable partition. Notice
that the probability of the posterior most probable partition increases substantially
in the posterior evaluation.
Table 1: Prior and posterior probability of the most probable partition
p        Prior probability     Posterior probability
0.010    4.007 x 10^-7         0.3567
0.100    1.593 x 10^-16        0.0173
0.500    2.524 x 10^-29        0.0013
0.900    1.161 x 10^-13        0.0285
From Figure 5 we can notice that the posterior distribution of the number of blocks in the
partition, $B$ (or of the number of change points in the time series, $B - 1$), has only one mode
regardless of the value assumed for $p$. We can also notice that if $p$ is small the posterior
distribution of $B$ is centered at lower values (see Table 2 for the descriptive statistics of
the posterior distribution of $B$). It is also noticeable that for all values of $p$ the probability
of having one or more change points in the Endesa series is one.
Figure 5: Posterior distribution of B. [One panel per value of p; x-axis: No. Blocks (0 to 80); y-axis: probability]
Table 2: Descriptive statistics - prior and posterior distributions of B
         Prior Distribution      Posterior Distribution
p        Mean      Variance      Mean      Variance   Mode   Median   Q1   Q3
0.010    0.940     0.9306        5.093     2.158      4      4        4    6
0.100    9.400     8.4600        17.075    9.682      16     17       15   19
0.500    47.000    23.5000       50.521    23.436     50     50       47   54
0.900    84.600    8.4600        84.753    9.263      85     85       83   87
Notice from Table 2 that the location summaries (mean, mode, median) of the posterior
distribution of B, as well as the mean of the prior distribution of B, increase as p increases.
We also observe that the posterior variance is higher than the prior variance for p = 0.01, 0.1
and 0.9, whereas the opposite holds for p = 0.5. It is also noticeable that the posterior
variance increases with p for values of p up to 0.5. (For more on the influence of prior
specifications in the PPM, see Loschi and Cruz (2002a,b).)
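The prior columns of Table 2 can be checked with simple arithmetic. The sketch below assumes, as an inference from the tabulated values rather than something stated in this section, that the prior count of change points behaves like a Binomial(m, p) variable with m = 94 candidate change instants. The names `M_INSTANTS` and `prior_change_point_summary` are illustrative only.

```python
# Assumption: m = 94 candidate change instants, inferred from Table 2
# (prior mean 0.940 at p = 0.01 corresponds to 94 * 0.01).
M_INSTANTS = 94


def prior_change_point_summary(p, m=M_INSTANTS):
    """Return (mean, variance) of a Binomial(m, p) count of change points."""
    mean = m * p
    variance = m * p * (1.0 - p)
    return mean, variance


# Print the prior summaries for the four values of p used in the analysis.
for p in (0.01, 0.1, 0.5, 0.9):
    mean, variance = prior_change_point_summary(p)
    print(f"p={p:<5} prior mean={mean:.3f} prior variance={variance:.4f}")
```

Under this assumption the computed values agree with the prior mean and variance columns of Table 2 (for example 0.940 and 0.9306 at p = 0.01), and the symmetry of p(1 - p) explains why the prior variances at p = 0.1 and p = 0.9 coincide.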
4.2 A note on the model specification
We suppose that, conditionally on the average stock return and its total standard deviation,
any path followed by the returns within a block presenting the same average return and
total standard deviation is equally likely to occur, which is mathematically expressed
by the O(1)-invariance assumption amongst the returns. Hence, assuming extendibility
(that is, assuming that every subsequence $(X_{i+1}, \ldots, X_j)$ is part of an infinite
O(1)-invariant sequence), we have that the joint distribution of the Endesa returns in the same
block, $X_{[ij]}$, can be represented as a mixture of products of normal distributions
$N(\mu_{[ij]}, \sigma^2_{[ij]})$ (Smith, 1981), which agrees with the assumptions of Correa
(1998) about the Chilean market.
We also assume the conditions in (2.16), understanding that these conditions express the
considerations made by Mandelbrot (1963), as well as what Maeda (1996) suggests to be
reasonable for the Chilean market: large returns tend to be followed by large returns,
small returns tend to be followed by small returns, and changes in this behavior are
produced by unanticipated information.
These assumptions lead to a predictive distribution with heavy tails (Student-t distribution)
for the returns in the same block, which also reveals a correlation structure amongst the
returns. Since the Chilean stock market is an emerging market, it is more susceptible to the
political atmosphere and can therefore experience more changes than a developed market, so
the Student-t distribution is more appropriate to describe the behavior of its stock returns
(Duarte Jr. and Mendes (1997) and Mendes (2000)). (Notice that the normality assumption
adopted by Hsu (1984) (see also Hawkins (2001)) to describe the behavior of the Dow Jones
Industrial Average is stronger than the assumptions we make: we only state that the data are
conditionally normally distributed.)
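To illustrate how conditional normality can produce a Student-t predictive distribution, the sketch below mixes a normal likelihood over a conjugate normal-inverse-gamma prior and compares the result with the closed-form Student-t density. The prior form and the hyperparameters `m`, `kappa`, `alpha` and `beta` are assumptions made for illustration; the conditions in (2.16) are not reproduced in this section and may be parameterized differently.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def inv_gamma_pdf(s2, alpha, beta):
    """Density of an inverse-gamma(alpha, beta) variance at s2 (log scale for stability)."""
    log_pdf = (alpha * math.log(beta) - math.lgamma(alpha)
               - (alpha + 1.0) * math.log(s2) - beta / s2)
    return math.exp(log_pdf)


def student_t_pdf(x, df, loc, scale):
    """Density of a location-scale Student-t with df degrees of freedom."""
    z = (x - loc) / scale
    log_c = (math.lgamma((df + 1.0) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi) - math.log(scale))
    return math.exp(log_c - 0.5 * (df + 1.0) * math.log1p(z * z / df))


def predictive_t_params(m, kappa, alpha, beta):
    """Closed-form predictive for one observation under the assumed prior:
    mu | s2 ~ N(m, s2 / kappa), s2 ~ inverse-gamma(alpha, beta)."""
    return 2.0 * alpha, m, math.sqrt(beta * (1.0 + 1.0 / kappa) / alpha)


def mixture_pdf(x, m, kappa, alpha, beta, lo=-15.0, hi=15.0, steps=6000):
    """Numerically integrate N(x; m, s2 * (1 + 1/kappa)) against the
    inverse-gamma prior, using the trapezoid rule on u = log(s2)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        s2 = math.exp(lo + k * h)
        weight = 0.5 if k in (0, steps) else 1.0
        # The extra factor s2 is the Jacobian of the change of variable u = log(s2).
        total += weight * normal_pdf(x, m, s2 * (1.0 + 1.0 / kappa)) \
            * inv_gamma_pdf(s2, alpha, beta) * s2
    return total * h


# Illustrative hyperparameters (assumptions, not values from the paper).
m, kappa, alpha, beta = 0.1, 2.0, 3.0, 2.0
df, loc, scale = predictive_t_params(m, kappa, alpha, beta)
for x in (-1.0, 0.0, 0.5, 3.0):
    print(x, mixture_pdf(x, m, kappa, alpha, beta), student_t_pdf(x, df, loc, scale))
```

The two columns printed for each x should agree to numerical precision, which is the sense in which data that are only conditionally normal can have a heavy-tailed marginal distribution.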
The prior cohesions given in (3.1), used to analyze the Endesa series, imply that
the sequence of change points forms a discrete renewal process whose occurrence times are
identically distributed with a geometric distribution. This type of product partition
distribution represents reasonably well the situation described by Mandelbrot (1963)
(and later by Maeda (1996) for the Chilean stock market), who established that changes in
the behavior of a series of stock returns are a consequence of the receipt of information not
previously anticipated, so that past change points are noninformative about future
change points.
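The "noninformative past" property corresponds to the memoryless property of the geometric distribution. The sketch below is a minimal illustration, assuming that a change occurs independently at each instant with probability p, which is one reading of a geometric renewal process; the cohesions in (3.1) are not reproduced in this section, and all function names are hypothetical.

```python
import random


def geometric_pmf(t, p):
    """P(T = t), t = 1, 2, ...: the next change occurs exactly t instants ahead."""
    return p * (1.0 - p) ** (t - 1)


def geometric_survival(t, p):
    """P(T > t): no change occurs in the next t instants."""
    return (1.0 - p) ** t


def simulate_change_points(n, p, seed=0):
    """Draw change points among instants 1, ..., n-1, each with probability p."""
    rng = random.Random(seed)
    return [i for i in range(1, n) if rng.random() < p]


# Memoryless property: having waited s instants without a change does not
# alter the distribution of the remaining waiting time.
p, s, t = 0.1, 7, 5
conditional = geometric_survival(s + t, p) / geometric_survival(s, p)
unconditional = geometric_survival(t, p)
print(conditional, unconditional)

# One simulated set of change points for a series of length 95 (illustrative length).
print(simulate_change_points(95, p))
```

The two printed survival probabilities coincide up to floating-point rounding, so under this assumed process the time already elapsed since the last change carries no information about when the next one occurs.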
5 Conclusions
In this paper we have applied the PPM to identify multiple change points in normal means
and variances for data sequences, extending previous results from Barry and Hartigan (1993)
and Crowley (1997). We have proposed a Gibbs sampling scheme to estimate the posterior
distributions of the number of change points and of the instants when the changes
occurred. We have applied the method to identify change points in the mean return and
the volatility of the Endesa stock returns and provided a sensitivity analysis for the PPM.
The results indicate that the procedures proposed to compute the posterior estimates of B
and $\rho$ are quite effective, simple and easy to implement. We also conclude that the prior
specification of p strongly influences both the posterior distributions of the number of
change points and of the instants when changes occurred, and the product estimates of
the mean and the variance. Since p is crucial for the inferences, we can estimate p by assuming
a prior distribution for it. In this case the conjugacy of the PPM is lost and it is impossible
to use Yao's procedure (see Loschi and Cruz (2002a)). A procedure to obtain the posterior
distribution of p using Gibbs sampling can be found in Loschi and Cruz (2003).
We believe that some improvement would be obtained if Yao's procedure could be modified
so that a prior distribution for p could be included. An alternative algorithm is
considered in Quintana and Iglesias (2003) in connection with the non-parametric approach
to cluster analysis.
Some open questions remain. Can different prior specifications for the mean and variance
affect the product estimates? Would it be possible to find even simpler implementations for
the PPM? How well does the methodology fit in the presence of outliers? These and other
similar questions are interesting topics for future research in this area.
6 Acknowledgements
This research was supported in part by PRPq-UFMG, grant 4801-UFMG/RTR/FUNDO/PRPq/
RECEM DOUTORES/00; CAPES; FONDECYT, grants 8000004, 1971128 and 1990431;
and Fundacion Andes (Chile). The authors would like to thank Heleno Bolfarine and
Wilfredo Palma for their valuable comments and contributions to this paper.
References
Arellano-Valle, R. B. and H. Bolfarine. On some characterizations of the t-distribution.
Statistics & Probability Letters, 25:79-85, 1995.
Arellano-Valle, R. B., H. Bolfarine, and P. L. Iglesias. A predictivistic interpretation of the
multivariate t distribution. Test, 2(3):221-236, 1994.
Barry, D. and J. A. Hartigan. Product partition models for change point problems. The
Annals of Statistics, 20(1):260-279, 1992.
Barry, D. and J. A. Hartigan. A Bayesian analysis for change point problem. Journal of the
American Statistical Association, 88(421):309-319, 1993.
Box, G. E. P. and G. C. Tiao. Bayesian Inference in Statistical Analysis. Addison-Wesley,
New York, USA, 1973.
Chen, C. W. S. and J. C. Lee. Bayesian inference of threshold autoregressive models. Journal
of Time Series Analysis, 16(5):483-492, 1995.
Chernoff, H. and S. Zacks. Estimating the current mean of a normal distribution which is
subjected to changes in time. Annals of Mathematical Statistics, 35:999-1018, 1964.
Correa, L. Modelación Bayesiana de puntos de cambio en la volatilidad. Master's thesis,
Facultad de Matemáticas - Pontificia Universidad Católica de Chile, Chile, 1998. (in
Spanish).
Crowley, E. M. Product partition models for normal means. Journal of the American
Statistical Association, 92(437):192-198, 1997.
de Finetti, B. La prévision: ses lois logiques, ses sources subjectives. Annales de l'Institut
Henri Poincaré, 7:1-68, 1937.
Diaconis, P., M. L. Eaton, and S. L. Lauritzen. Finite de Finetti theorems in linear models
and multivariate analysis. Scandinavian Journal of Statistics, 19:298-315, 1992.
Diaconis, P. and D. Ylvisaker. Conjugate priors for exponential families. Annals of Statistics,
7:269-281, 1979.
Diaconis, P. and D. Ylvisaker. Quantifying prior opinion. In J. M. Bernardo, M. H. DeGroot,
D. V. Lindley, and A. F. M. Smith, editors, Bayesian Statistics 2, pages 133-156.
North-Holland, Elsevier Science, 1985.
Duarte Jr., A. M. and B. V. M. Mendes. Product partition models for normal means.
Emerging Markets Quarterly, 1(4):85-95, 1997.
Gamerman, D. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference.
Chapman and Hall, London, UK, 1997.
Gardner, L. A. On detecting change in the mean in the normal variates. Annals of
Mathematical Statistics, 40:116-126, 1969.
Gelfand, A. E. and A. F. M. Smith. Sampling-based approaches to calculating marginal
densities. Journal of the American Statistical Association, 85:398-409, 1990.
Geweke, J. and N. Terui. Bayesian threshold autoregressive models for nonlinear time series.
Journal of Time Series Analysis, 14(5):441-454, 1993.
Hartigan, J. A. Partition models. Communications in Statistics - Theory and Methods, 19(8):
2745-2756, 1990.
Hawkins, D. M. Fitting multiple change-point models to data. Computational Statistics &
Data Analysis, 1:323-341, 2001.
Hsu, D. A. A Bayesian robust detection of shift in the risk structure of stock market returns.
Journal of the American Statistical Association, 77(2):407-416, 1984.
Iglesias, P. L. Formas finitas do teorema de de Finetti: A visão preditivista da Inferência
Estatística em populações finitas. PhD thesis, Departamento de Estatística, Instituto
de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil, 1993. (in
Portuguese).
Kingman, J. F. C. On random sequences with spherical symmetry. Biometrika, 59:183-197,
1972.
Loschi, R. H. and F. R. B. Cruz. An analysis of the influence of some prior specifications in
the identification of change points via product partition model. Computational Statistics
& Data Analysis, 39:477-501, 2002.
Loschi, R. H. and F. R. B. Cruz. Applying the product partition model to the identification of
multiple change points. Advances in Complex Systems, 5(4):371-387, 2002.
Loschi, R. H. and F. R. B. Cruz. Extension to the Product Partition Model: Computing the
Probability of a Change. (Manuscript submitted for publication), 2003.
Loschi, R. H., F. R. B. Cruz, P. L. Iglesias, and R. B. Arellano-Valle. A Gibbs sampling scheme
to the product partition model: An application to change-point problems. Computers &
Operations Research, 30(3):463-482.
Loschi, R. H., P. L. Iglesias, and R. B. Arellano-Valle. Predictivistic characterization of
multivariate Student-t models. Journal of Multivariate Analysis, 85(1):10-23, 2003.
Maeda, M. A. Volatilidad estocástica en el mercado accionario chileno. Master's thesis,
Facultad de Ciencias Económicas y Administrativas - Universidad de Chile, Chile, 1996.
(in Spanish).
Mandelbrot, B. The variation of certain speculative prices. Journal of Business, 36:394-419,
1963.
Mendes, B. V. M. Computing robust risk measures in emerging equity markets using extreme
value theory. Emerging Markets Quarterly, pages 24-41, 2000.
Menzefricke, U. A Bayesian analysis of a change in the precision of a sequence of independent
normal random variables at an unknown time point. Applied Statistics, 30(2):141-146,
1981.
Carlstein, E., H. G. Müller, and D. Siegmund (eds.). Change-point problems. IMS Lecture
Notes - Monograph Series, 23, USA, 1994.
O'Hagan, A. Kendall's Advanced Theory of Statistics 2A, chapter Bayesian Inference. John
Wiley & Sons, New York, NY, 1994.
Quintana, F. A. and P. L. Iglesias. Nonparametric Bayesian clustering and product partition
models. Journal of the Royal Statistical Society B (to appear), 2003.
Smith, A. F. M. A Bayesian approach to inference about a change-point in a sequence of
random variables. Biometrika, 62(2):407-416, 1975.
Smith, A. F. M. On random sequences with centered spherical symmetry. Journal of the
Royal Statistical Society, B, 43:203-241, 1981.
Wechsler, S. Exchangeability and predictivism. Erkenntnis: International Journal of
Analytic Philosophy, 38:343-350, 1993.
Yao, Y. Estimation of a noisy discrete-time step function: Bayes and empirical Bayes
approaches. The Annals of Statistics, 12(4):1434-1447, 1984.
