Generalized Transport Mean Shift algorithm for ubiquitous intelligence
Khamron Sunat, Panida Padungweang and Sirapat Chiewchanwattana
SIMULATION 2012 88: 1202, originally published online 22 May 2012
DOI: 10.1177/0037549712445233
The online version of this article can be found at: http://sim.sagepub.com/content/88/10/1202
Published by:
http://www.sagepublications.com
$$\sum_{i=1}^{n} K\left(\left\| \frac{x - x_i}{\sigma} \right\|^2\right), \tag{1}$$
where $K(t)$ is a kernel function and $\sigma$ is a constant bandwidth such that $\sigma > 0$. A mode of the density is a position $x$ having zero gradient, $\nabla p(x) = 0$. The MS algorithm is an iterative procedure that seeks a mode of the density estimate by repeatedly shifting the position $x$ towards higher density, and is written as
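$$x^{(\tau+1)} = f\left(x^{(\tau)}\right), \tag{2}$$
with
$$f(x) = \sum_{i=1}^{n} \frac{K'\left(\left\| (x - x_i)/\sigma \right\|^2\right)}{\sum_{j=1}^{n} K'\left(\left\| (x - x_j)/\sigma \right\|^2\right)}\, x_i, \tag{3}$$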
where $K'(t) = dK/dt$ and $\tau$ is the iteration index. Using the Gaussian function $K(t) = e^{-t/2}$, (2) and (3) can be reduced¹² to
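$$x^{(\tau+1)} = \sum_{i=1}^{n} p\left(i \mid x^{(\tau)}\right) x_i \tag{4}$$
and
$$p\left(i \mid x^{(\tau)}\right) = \frac{\exp\left(-\frac{1}{2}\left\| (x^{(\tau)} - x_i)/\sigma \right\|^2\right)}{\sum_{j=1}^{n} \exp\left(-\frac{1}{2}\left\| (x^{(\tau)} - x_j)/\sigma \right\|^2\right)}. \tag{5}$$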
The algorithm terminates when the shift distance is equal to zero or does not exceed a tolerance threshold, as follows:
$$\left\| x^{(\tau)} - x^{(\tau-1)} \right\| \le \text{threshold}. \tag{6}$$
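The update in (4) and (5) with the stopping rule (6) can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation; the function names `mean_shift_step` and `seek_mode`, the default `threshold`, and the `max_iter` safety cap are assumptions introduced here.

```python
import math

def mean_shift_step(x, data, sigma):
    """One update of equation (4): the mean of the data weighted by p(i|x) from (5)."""
    weights = [
        math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, xi)) / sigma ** 2)
        for xi in data
    ]
    total = sum(weights)  # denominator of (5)
    return [
        sum(w * xi[d] for w, xi in zip(weights, data)) / total
        for d in range(len(x))
    ]

def seek_mode(x, data, sigma, threshold=1e-6, max_iter=1000):
    """Iterate equation (2) until the shift distance satisfies the stopping rule (6)."""
    for _ in range(max_iter):  # max_iter is a safety cap, not part of the paper
        x_next = mean_shift_step(x, data, sigma)
        shift = math.dist(x_next, x)
        x = x_next
        if shift <= threshold:
            break
    return x
```

Starting `seek_mode` from every data point and grouping the end positions gives the mode-based clustering described next. Note that the weights can underflow to zero if `x` lies very far from all data relative to `sigma`; the sketch does not guard against that.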
Clustering is performed by treating each mode of the kernel density estimate as a cluster; the data points converge to their corresponding modes. A data set with two clusters is plotted in Figure 1(a). Figure 1(b) shows the data points being shifted upward toward their modes. The solid black lines represent the trajectory of each data point.
The Agglo-MS¹⁰,¹¹ is an agglomerative MS clustering algorithm. It is built upon an iterative query set compression mechanism motivated by the quadratic bounding optimization characteristic of the MS algorithm. It performs well on segmentation of images and clustering of moderate-scale data sets. Since space is limited, the interested reader is directed to Yuan et al.¹⁰,¹¹
3. Generalized Transport Mean Shift algorithm
In general, many positions shift along the same trajectory on the way to their mode, as the example in Figure 1 shows. In Figure 2, the ith data is shifted to a position close to the original position of the kth data at iteration $\tau$. Moreover, the shift vector of the ith data is parallel to the trajectory vector of the kth data. The ith data can therefore be considered a trailer of the kth data, which is taken to be the transporter of the ith data. Hence, the shift of the ith data need not be computed in the next iteration.
Even though the jth data at iteration $\tau$ is also shifted to a position near the original position of the kth data, its mode differs from the mode of the kth data. One of the main ideas of this work is that the nearest point is assigned as the transporter only if its trajectory vector has the same direction as the shift vector of the trailer.
To obtain the solution, four matrices are introduced. The first matrix holds the trajectory vectors of all the data points. The second matrix stores the indexes of the transporters. The last two matrices are logical, indicating the convergence status and the present status of the data points. The details of each matrix are as follows.
Let $U \in \mathbb{R}^{m \times n}$. The ith column of $U$, denoted by $u_i$, is the unit trajectory vector of the ith data at the first iteration and can be computed as
Figure 1. (a) A plot of two data clusters. (b) The trajectories of the data points obtained by applying the Mean Shift algorithm to a two-dimensional data set. The third axis denotes the density of the data.
1204 Simulation: Transactions of the Society for Modeling and Simulation International 88(10)
$$u_i = \frac{x_i^{1} - x_i^{0}}{\left\| x_i^{1} - x_i^{0} \right\|}. \tag{7}$$
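As a small illustration of (7), the unit trajectory vector is the normalized first shift of a point. The function name `unit_trajectory` is hypothetical, and returning a zero vector when the point does not move is an assumption of this sketch, since (7) is undefined in that case.

```python
import math

def unit_trajectory(x0, x1):
    """Unit vector from the original position x0 to the first shifted position x1, as in (7)."""
    z = [b - a for a, b in zip(x0, x1)]
    norm = math.sqrt(sum(c * c for c in z))
    if norm == 0.0:
        # Assumption: a point that did not move has no direction.
        return [0.0] * len(z)
    return [c / norm for c in z]
```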
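Let $T \in \mathbb{R}^{1 \times n}$ be a transporter matrix, where the ith column of $T$, denoted by $t_i$, is the index of the transporter of the ith data, such that
$$t_i = \begin{cases} \arg\min_j \left\| x_i - x_j \right\|^2 & \text{if } \delta_{ij} \le \alpha \\ i & \text{otherwise,} \end{cases} \tag{8}$$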
where $\alpha$ is a constant threshold. $\delta_{ij}$ denotes the generalized angle between the trajectory vector ($u_j$) and the shift vector ($v_i$). $\delta$ is in the range [0, 1] and is defined by
$$\delta_{ij} = \frac{1}{2}\left(1 - \frac{v_i \cdot u_j}{|v_i|\,|u_j|}\right). \tag{9}$$
The vectors are parallel and have the same direction if $\delta = 0$. If $\delta = 1$, the vectors are still parallel but point in opposite directions, that is, they are 180 degrees apart.
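The generalized angle (9) and the transporter rule (8) can be sketched as follows. This is only an illustration under stated assumptions: the names `generalized_angle` and `choose_transporter` are hypothetical, both vectors are assumed to be nonzero, and the check that a point is not its own transporter is taken from step 8 of Algorithm 1 rather than from (8).

```python
import math

def generalized_angle(v, u):
    """Equation (9): 0 for the same direction, 0.5 for orthogonal, 1 for opposite."""
    dot = sum(a * b for a, b in zip(v, u))
    norms = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in u))
    return 0.5 * (1.0 - dot / norms)  # assumes both vectors are nonzero

def choose_transporter(i, shifted, v_i, data, U, alpha):
    """Equation (8): return the nearest point j if its trajectory U[j] agrees with v_i, else i."""
    j = min(range(len(data)), key=lambda k: math.dist(shifted, data[k]))
    if j != i and generalized_angle(v_i, U[j]) <= alpha:
        return j
    return i
```

With `alpha` below 0 the function always returns `i`, and with `alpha` above 1 it returns the nearest other point whenever one is nearest, which mirrors the two limiting cases of the threshold.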
The last two matrices are $C_{1 \times n}$, the matrix of the convergence status, and $A_{1 \times n}$, the matrix of the present status. $a_i$ is the ith column of $A$ and $c_i$ is the ith column of $C$, expressed as follows:
$$c_i = \begin{cases} 1 & \text{if the } i\text{th data converge} \\ 0 & \text{otherwise,} \end{cases} \tag{10}$$
$$a_i = \begin{cases} 1 & \text{if the } i\text{th data should be present to the next iteration} \\ 0 & \text{otherwise.} \end{cases} \tag{11}$$
The GTMS algorithm is given as pseudo-code in Algorithm 1. To reiterate, the trajectory vector is computed in step 2 and assigned in step 5. Steps 6, 9, and 10 are normally performed by the MS algorithm. If, however, a transporter of the ith data is found in step 8, then step 9, which computes the exponential of all the distance values from step 6 using (5), need not be performed. Furthermore, the shift position of the ith data is assigned to $t_i$. We expect that the more trailers are found, the fewer data points need to be computed in the next iteration.
Algorithm 1 (GTMS)
Initialization:
    C is initialized to false.
    A is initialized to true.
    Initialize the value of parameter α.
for each x_i ∈ X do
    1. Compute x_i^1
    2. Calculate the shift distance and the trajectory vector
           Set z = x_i^τ - x_i
           Set s = ||z||^2
           Set v_i = z / s^0.5
    3. Considering the convergence
           if s ≤ threshold then
               c_i = true, a_i = false
           end if
    4. Set the ith data to be a transporter itself, t_i = i
end for
5. Set τ = 1, U = V    // u_i is a unit trajectory vector
while there exists a_i = true do
    6. Calculate the distance (d_k) from x_i^τ to x_k ∈ X, 1 ≤ k ≤ n
    7. Find the jth data that is nearest to the ith data
           j = argmin_k d_k
    8. Investigate the transporter
           Compute δ = (1/2)(1 - v_i^τ · u_j / (|v_i^τ| |u_j|))
           if i ≠ j & t_i ≠ i & δ ≤ α then
               set t_i = j, a_i = false    // assign transporter and inactive trailer
           else
    9.         Compute x_i^(τ+1)
    10.        Follow steps 2 and 3
           end if
    Increase the number of iterations by setting τ = τ + 1.
end while
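The control flow of Algorithm 1 can be sketched in plain Python as below. This is a hedged illustration, not the authors' code. Three details are assumptions made here: the shift is measured between consecutive positions, as in (6); the convergence test uses the shift norm rather than its square; and, instead of the printed test on $t_i$, a candidate is rejected when its own transporter chain leads back to the point (the hypothetical helper `leads_to`), which keeps the links free of cycles. All function names and the `max_iter` cap are illustrative.

```python
import math

def ms_step(x, data, sigma):
    """Gaussian mean shift update of x, as in equations (4) and (5)."""
    w = [math.exp(-0.5 * (math.dist(x, xi) / sigma) ** 2) for xi in data]
    total = sum(w)
    return [sum(wi * xi[d] for wi, xi in zip(w, data)) / total
            for d in range(len(x))]

def angle(v, u):
    """Generalized angle of equation (9)."""
    nv, nu = math.hypot(*v), math.hypot(*u)
    if nv == 0.0 or nu == 0.0:
        return math.inf  # assumption: a zero-length vector never matches
    return 0.5 * (1.0 - sum(a * b for a, b in zip(v, u)) / (nv * nu))

def leads_to(t, j, i):
    """True if following transporter links from j eventually reaches i."""
    while t[j] != j:
        j = t[j]
        if j == i:
            return True
    return False

def gtms(data, sigma, alpha, threshold=1e-6, max_iter=500):
    n = len(data)
    pos = [ms_step(x, data, sigma) for x in data]                     # step 1
    v = [[p - q for p, q in zip(pos[i], data[i])] for i in range(n)]  # step 2
    active = [math.hypot(*vi) > threshold for vi in v]                # step 3
    t = list(range(n))                                                # step 4
    u = [list(vi) for vi in v]                                        # step 5: U = V
    for _ in range(max_iter):
        if not any(active):
            break
        for i in range(n):
            if not active[i]:
                continue
            # Steps 6 and 7: nearest original data point to the shifted position.
            j = min(range(n), key=lambda k: math.dist(pos[i], data[k]))
            # Step 8: accept j as transporter if directions agree within alpha.
            if j != i and not leads_to(t, j, i) and angle(v[i], u[j]) <= alpha:
                t[i], active[i] = j, False
            else:
                new = ms_step(pos[i], data, sigma)                    # step 9
                v[i] = [p - q for p, q in zip(new, pos[i])]           # step 10
                pos[i] = new
                active[i] = math.hypot(*v[i]) > threshold
    return pos, t
```

The function returns the last computed position of every point together with the transporter indexes; a point whose entry in `t` equals its own index was shifted to convergence by the ordinary MS update.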
According to step 8, the transporter of the ith data is found and assigned to $t_i$. This means that the ith data should move toward the same mode as its transporter. Hierarchical clustering is simultaneously performed in this step using matrix $T$. At the convergence step, the depth-first search algorithm¹³ is used for retrieving all the trailers of the remaining transporters. At this point, a cluster may contain several transporters, which are easily grouped by applying a distance threshold. An appropriate value of the threshold is a proportion of the bandwidth of the density estimator. In this paper we use half of the bandwidth as the threshold.
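The retrieval step described above can be sketched as a depth-first search over the transporter links, followed by merging the remaining transporters whose converged positions lie within half the bandwidth of each other. The function name `clusters_from_transporters`, the greedy order of merging, and the use of the first transporter's position as the cluster representative are assumptions of this sketch; it also assumes the links in `t` contain no cycles.

```python
import math

def clusters_from_transporters(t, pos, bandwidth):
    """Label every point using the transporter indexes t and the final positions pos."""
    n = len(t)
    children = [[] for _ in range(n)]
    roots = []
    for i, ti in enumerate(t):
        if ti == i:
            roots.append(i)         # a remaining transporter
        else:
            children[ti].append(i)  # i is a trailer of ti
    labels = [-1] * n
    modes = []                      # one representative position per cluster
    for r in roots:
        # Merge transporters that converged within half the bandwidth.
        for label, m in enumerate(modes):
            if math.dist(pos[r], m) <= bandwidth / 2:
                break
        else:
            label = len(modes)
            modes.append(pos[r])
        # Depth-first search over all trailers below this transporter.
        stack = [r]
        while stack:
            k = stack.pop()
            labels[k] = label
            stack.extend(children[k])
    return labels
```

Only the positions of the remaining transporters are read from `pos`; the positions stored for trailers are ignored.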
Although there is an additional parameter ($\alpha$) in our algorithm, this parameter makes the algorithm more flexible. It is used for choosing a suitable transporter-trailer relationship. Normally, $\alpha$ lies in [0, 1]. For $\alpha = 0$, the trailer is transported by the transporter only if the shift vector of the trailer and the trajectory vector of the transporter are parallel and point in the same direction. For $\alpha = 1$, any angle between the vectors in the range of 0 to 180 degrees is acceptable. Hence, the value of (9) spans the range [0, 1]. However, the parameter can be assigned out of this range. In the case of $\alpha > 1$, the data point nearest to the shift position of the ith data is always assigned to be the transporter of the ith data. In this case, the GTMS algorithm is the fastest. For $\alpha < 0$, no transporter is ever assigned. In this case, the GTMS algorithm performs as the standard MS algorithm. Therefore, the MS algorithm can be considered as a special case of the proposed GTMS algorithm.

Figure 2. Trajectory vector of $x_k$, which is assigned to be the transporter, and shift directions of $x_i$ and $x_j$ at iteration $\tau$.
The value of $\alpha$ can be adapted in each iteration. By the nature of a mode-seeking algorithm, data that belong to different modes are shifted farther from each other as the number of iterations increases. A technique for adjusting $\alpha$ is also provided in this paper. Let $\alpha_0$ be an initial value and […] be a maximum acceptable threshold at iteration $\tau$. The value of $\alpha$ can be assigned as
$$\alpha = \min\left([\ldots],\ \alpha_0 + ([\ldots])\, i\right), \tag{12}$$
where $i$ denotes the number of iterations. The value of $\alpha$ can linearly be increased from $\alpha_0$ to […] $\sum_{i=1}^{p} q_i$), where $q_p \le \dots \le q_2 \le q_1 = n$ and $q_i$ denotes the number of active data points at iteration $i$. The space complexity of the GTMS algorithm is $O(nm)$.
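The capped linear schedule for the threshold in (12) has the following shape. The symbols for the cap and the per-iteration increment are not legible in this copy, so `alpha_max` and `step` are hypothetical names standing in for them; only the structure, a minimum of a cap and a linear ramp starting at the initial value, is taken from the text.

```python
def adaptive_alpha(alpha_0, alpha_max, step, iteration):
    """Linear ramp from alpha_0, capped at alpha_max (names are assumptions)."""
    return min(alpha_max, alpha_0 + step * iteration)
```

Starting from a small initial value makes the early iterations strict about direction agreement, and the cap bounds how permissive the later iterations can become.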
4. Experimental results
The GTMS algorithm was tested on clustering and image-
segmentation problems. In general, density estimator-
based algorithms need a desired bandwidth; this is still an
open problem and is beyond the scope of this paper.
Consequently, we experimentally selected some suitable
bandwidths for comparison. The algorithms were tested with the same parameters and in the same environment. The following short names are used. MS1 denotes the standard MS algorithm, as in Yang et al.⁹ MS2 denotes the standard MS algorithm in which converged points are excluded from the next iteration. GTMS1 denotes the fastest GTMS algorithm, obtained by choosing $\alpha > 1$, and GTMS2 denotes a GTMS algorithm with $\alpha_0 = 0$,