
Proceedings of the 2015 IEEE Conference on Robotics and Biomimetics
Zhuhai, China, December 6-9, 2015

Motion Trajectory Recognition Using Local Temporal Self-Similarities*
Zhanpeng Shao, Y.F. Li, Senior Member, IEEE, and Yao Guo

Abstract: Motion trajectories provide a meaningful clue in motion characterization of humans, robots, and moving objects. This paper addresses motion trajectory recognition by exploring local self-similarities of motion trajectories over time. Such temporal self-similarities within a motion trajectory are observed by building a Self-Similarity Matrix (SSM) based on the sigmoid distances between all pairs of points along the motion trajectory. From an analysis of SSMs, we develop a self-similarity descriptor that captures the layout of local temporal similarities within a motion trajectory. Such descriptors exhibit noise stability and invariance to group transformations. Temporal pyramid ordering is used in the bag-of-features (BoF) approach to quantize a set of self-similarity descriptors as a histogram of visual words, forming a temporal pyramid representation that serves as input data for recognition. Our method for recognizing motion trajectories is validated on a sign language dataset. It shows similar or superior performance in comparison with other methods. In particular, a significant improvement in recognition efficiency and robustness to noise is achieved using our method.

I. INTRODUCTION

Visual motion recognition of humans and objects of interest has drawn much attention in different areas such as computer vision, machine learning, and robotics. In robotics especially, it is a critical task because it can be widely applied in human-robot interaction, vision-based manipulation, humanlike behavior imitation, etc. A motion trajectory, a set of positions of moving objects in 3-dimensional Euclidean space (3D), can provide a compact and rich clue for motion characterization, as shown in Fig. 1(a). In this paper, we are especially interested in recognizing motion trajectories, as motions are abstracted as motion trajectories. Methods for such tasks are usually based on rich and effective descriptions capturing local spatio-temporal patterns within a motion trajectory. Therefore, studying effective descriptions for motion trajectories is very important for motion recognition.

Various applications using motion trajectories have been proposed over the past years [1]-[3]. However, in most of the related works, raw data and some simple features of motion trajectories are used directly in those applications. Such raw data are quantities including trajectory positions, velocities, and angular changes, which are not flexible and robust in practical applications. A feasible alternative is to introduce particular shape and curve descriptors to describe motion trajectories in both space and time. Designing shape descriptors is an important topic in computer vision for shape (curve) matching [4], [5], classification [6], and retrieval [7]. Such descriptors include compactness [8], curvature scale space (CSS) [7], shape context [5], B-spline [9], moment invariants [10], integral invariants [11], and Fourier and wavelet descriptors [12], [13]. Most of these descriptors were initially designed for planar closed shapes and curves, and thus are not sufficient to represent temporal trajectories in the 3D case. Our previous work [14] develops some effective and robust invariant descriptors that can provide substantial advantages over the raw trajectory data and other sensitive invariants in trajectory matching and retrieval, but they suffer from low efficiency in recognition due to direct template matching.

Although these descriptors for motion trajectories vary significantly, they all share the same basic framework: geometrical properties and temporal dynamics of a shape or trajectory are extracted by investigating spatial interrelations, statistical distributions, and transformed spaces. They perform well when applied to matching and retrieval. Nevertheless, recognition tasks require richer descriptions whose statistical distribution for each motion class can be encoded effectively to fit a classification model, achieving good performance in both accuracy and efficiency. Therefore, we intend to discover rich spatio-temporal patterns in a short period of frames to build such descriptions. Shechtman et al. [15] proposed a local self-similarity descriptor to match the local similarities across images and videos, which captures the internal geometric layouts of local self-similarities within images and videos. Later, Junejo et al. [16] investigated the temporal self-similarities of a human action sequence over time to achieve stable view-independent action recognition. Similar work can also be found in [17]. All of these studies first build a spatial or temporal self-similarity matrix (SSM) by computing distances between extracted features of all frame pairs in a sequence. With the built SSMs, Shechtman et al. [15] transformed an SSM at each pixel directly into a binned log-polar representation that accounts for local non-rigid deformations. However, it is by nature a global descriptor for images, and does not take temporal information into account. In [16], local descriptors are extracted from an SSM by accumulating histograms of gradient orientations in local patches of the SSM. They are then quantized as histograms of visual words based on the bag of features (BoF) approach [19]. However, the SSMs in [16] are computed such that the elements of an SSM are Euclidean distances of all pairs of time frames. The Euclidean distance is a common metric but is sensitive to noise and outliers [21]. Moreover, the BoF there discards the temporal information in the sequence. Similar limitations can be observed in [17].

*The work was supported by a grant from the Research Grants Council of Hong Kong [Project No. CityU 118613] and NSFC [61273286].
Zhanpeng Shao, Y.F. Li, and Yao Guo are with the Department of Mechanical and Biomedical Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong (phone: 852-34426778; fax: 852-34420172; e-mail: perry.shao@my.cityu.edu.hk, meyfli@cityu.edu.hk, yaoguo4-c@my.cityu.edu.hk).

978-1-4673-9675-2/15/$31.00 ©2015 IEEE


Figure 1. An overview of our proposed framework for trajectory recognition using self-similarity descriptors. (a) Tracking a trajectory (draw x) from a hand. (b)
The tracked trajectory is shown in 3D space, and we then build (c) its temporal self-similarity matrix whose elements are differences between all pairwise points
within the motion trajectory. With the built SSM, we associate (d) a local self-similarity descriptor with each frame using a histogram of gradient orientations
extracted from a log-polar cell structure. From (a-d), these procedures can also produce a collection of self-similarity descriptors for training data and test data
respectively. With BoF, we quantize self-similarity descriptors into (e) histograms of visual words which are then measured using a (f) temporal pyramid
matching kernel. (f) An SVM classifier works with such temporal pyramid matching kernel, and is trained offline. Testing data are processed in the same flow
as input data for the trained SVM to predict their class labels.
This paper proposes a self-similarity descriptor which captures the internal spatio-temporal layouts of local temporal self-similarities within a motion trajectory. Self-similarity descriptors are extracted from the patches on the diagonal of a temporal SSM, which is computed from the differences of all pairwise points of the motion trajectory. Instead of using the Euclidean distance as in most of the related work, this research employs an alternative metric to compute such pairwise differences and then stores the results in an SSM. The alternative metric is a sigmoid distance, which is more robust to noise and outliers [21]. Moreover, the proposed sigmoid distance is a function of Euclidean distance, which is invariant to group transformations, including translation, rotation, and uniform scaling. Therefore, studying a rich descriptor for motion trajectories turns into forming a particular image descriptor of SSMs, as a motion trajectory is treated as an SSM image.

Self-similarity descriptors are extracted by scanning the local patches along the diagonal of an SSM, shown in Fig. 1(c). This differs from traditional image descriptors, which usually scan all patches of an image using a predefined kernel with a specified overlap, because in this research local temporal self-similarities can be observed from the patches centered at the diagonal elements of an SSM. To avoid dependency on absolute values of SSM images, a histogram of gradient orientations within each binned log-polar patch over time is accumulated as a local self-similarity descriptor, as shown in Fig. 1(c-d), where the log-polar patch structure tolerates small local deformations and is measured locally. Following the procedures shown in Fig. 1(a-d), a number of random samples of self-similarity descriptors are collected from training data and clustered to learn a visual vocabulary based on the BoF approach. We then quantize self-similarity descriptors from both the testing and training data to form histograms of visual words as input data for a classifier. Unlike the orderless BoF [16], to take temporal information into account, we adopt the spatial pyramid matching scheme (SPM) [19] to form a temporal pyramid histogram of visual words for each SSM, where the SSM is partitioned into sub-blocks at increasingly fine scales and the BoF histogram of local descriptors in each sub-block across different scales is computed, as shown in Fig. 5. All obtained BoF histograms are concatenated to form a single vector as the temporal pyramid representation for the SSM. Finally, we apply the pyramid matching kernel [19] together with a multiclass Support Vector Machine (SVM) classifier to achieve trajectory recognition efficiently. An overview of our approach is shown in Fig. 1.

The remainder of this paper is organized as follows: Section II gives a formal definition of the SSM for a motion trajectory based on a sigmoid distance and shows its invariance and robustness. Section III describes our method for trajectory recognition based on self-similarity descriptors extracted from SSMs. In Section IV, we present recognition experiments on a sign language dataset to show the results of our method compared with other descriptors and methods. Finally, we conclude this paper in Section V.

II. SELF-SIMILARITY MATRIX

In this section, we aim to build an SSM consisting of the pairwise distances of time frames within a motion trajectory to capture both the spatial and temporal patterns. Unlike most current work, we use a sigmoid distance, rather than the Euclidean distance, as the basic metric to measure self-similarities. That is, the distance $d_{ij}$ in an SSM is the sigmoid distance between a pair of corresponding points at frames $i$ and $j$ of a motion trajectory.
A motion trajectory records a sequence of position vectors of a moving object in 3-dimensional Euclidean space, and it is parameterized as $\Gamma(t) = \{x(t), y(t), z(t)\},\ t \in [1, N]$, in discrete $\{x, y, z, t\}$ space. The SSM for $\Gamma$ is a square symmetric matrix of size $N \times N$,

$$[d_{ij}]_{i,j=1,\ldots,N} = \begin{bmatrix} 0 & d_{12} & \cdots & d_{1N} \\ d_{21} & 0 & \cdots & d_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ d_{N1} & d_{N2} & \cdots & 0 \end{bmatrix}, \tag{1}$$

where $d_{ij}$ is the local distance between the points at two instants $i$ and $j$ of a motion trajectory, and is defined as the sigmoid function

$$d(i,j) = \tanh\left(\alpha \|\Gamma(i) - \Gamma(j)\|_2 + c\right), \tag{2}$$

where $\alpha$ is a positive constant that determines the steepness of the curve of the sigmoid function in (2), and $c$ is the bias. The defined sigmoid function $d(i,j)$ is a monotone increasing function of $\|\Gamma(i) - \Gamma(j)\|_2$, and we claim $d(i,j)$ is a metric since three conditions are satisfied: (1) $d(i,j) \ge 0$, and $i = j \Rightarrow d(i,i) = 0$ (non-negativity); (2) $d(i,j) = d(j,i)$ (symmetry); (3) $d(i,k) \le d(i,j) + d(j,k)$ (triangle inequality).

As we have claimed, the sigmoid distance is a metric that is more robust to noise and outliers than the Euclidean distance. The sigmoid distance function can map any Euclidean distance to the range $[0,1]$, as shown in Fig. 2. Given an appropriate value of $\alpha$, the sigmoid function maps a corresponding range of Euclidean distances between pairs of points of a trajectory into the range $[0,1)$ as the distance measure. In this way, the noisy points and outliers in motion trajectories will yield abnormal distances between themselves and their neighboring points in the SSM. The abnormal distances are beyond the range determined by $\alpha$, and thus are mapped to the upper bound (the value 1) of the sigmoid function. In contrast, the abnormal distances are mapped linearly when the Euclidean distance is used directly in computing SSMs. In other words, the normal distances are mapped to the approximately linear region of the sigmoid function, which gives a large weight to those normal points, while the abnormal distances are mapped to the non-linear region of the sigmoid function, which gives a small weight to those noisy points and outliers. Therefore, $\alpha$ is a key parameter that is obtained by training with a particular dataset, since the raw data of different motion trajectories have various scales and sampling rates.

Figure 2. The distance measure for the Euclidean metric and the sigmoid metric with different $\alpha$.
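The construction of the SSM from the sigmoid distance can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation (which the paper reports was written in MATLAB): the function name `sigmoid_ssm`, the argument names `alpha` (steepness constant) and `c` (bias), the toy helix, and the value of `alpha` are all chosen here for illustration.

```python
import numpy as np

def sigmoid_ssm(traj, alpha, c=0.0):
    """Self-similarity matrix of an (N, 3) trajectory.

    Entry (i, j) is tanh(alpha * ||traj[i] - traj[j]||_2 + c),
    i.e. the sigmoid distance of equation (2).
    """
    traj = np.asarray(traj, dtype=float)
    diff = traj[:, None, :] - traj[None, :, :]   # (N, N, 3) pairwise differences
    dist = np.linalg.norm(diff, axis=-1)         # pairwise Euclidean distances
    return np.tanh(alpha * dist + c)

# Toy data (hypothetical): a helix with a single gross outlier at frame 25.
t = np.linspace(0.0, 2.0 * np.pi, 50)
traj = np.stack([100 * np.cos(t), 100 * np.sin(t), 20 * t], axis=1)
traj[25] += 5000.0

ssm = sigmoid_ssm(traj, alpha=5e-4)   # alpha picked by hand for this toy scale
```

With these toy values the distances between ordinary points stay in the near-linear part of the tanh curve, while every distance involving the outlier saturates close to 1, which is the behaviour the text above describes.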

Figure 3. Examples of SSMs for the motion trajectory of an instance of the sign "all" from the ASL dataset [20]. (a) The original trajectory. (b) The transformed version. (c) The SSM of (a) using Euclidean distances. (d) The SSM of (a) using the sigmoid distances. (e) The SSM of (b) using the sigmoid distances.
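The match between panels (d) and (e) can be checked with a short worked derivation. The notation is chosen here for illustration: $\Gamma(i)$ is the trajectory point at frame $i$, $\alpha$ the steepness constant, $c$ the bias, $R$ a rotation matrix, $b$ a translation vector, and $s > 0$ a uniform scale factor.

$$\Gamma'(i) = R\,\Gamma(i) + b \;\Rightarrow\; \|\Gamma'(i) - \Gamma'(j)\|_2 = \|R(\Gamma(i) - \Gamma(j))\|_2 = \|\Gamma(i) - \Gamma(j)\|_2,$$

so every entry $\tanh(\alpha\|\Gamma(i) - \Gamma(j)\|_2 + c)$ of the SSM is unchanged under rotation and translation. For a uniform scaling $\Gamma'(i) = s\,\Gamma(i)$,

$$\|\Gamma'(i) - \Gamma'(j)\|_2 = s\,\|\Gamma(i) - \Gamma(j)\|_2,$$

so the entries coincide once the steepness is rescaled to $\alpha/s$, or equivalently once the trajectory is normalized in scale beforehand. The text does not state which of these is intended, so this reading is an assumption.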

Figure 4. Example of building a self-similarity descriptor. (a) A sign trajectory extracted from an instance of the sign "all" from the ASL dataset [20]. (b) An SSM is computed from the pairwise sigmoid distances at all frames. (d) A local self-similarity descriptor is computed by accumulating the histograms of gradient orientations with (d) a log-polar patch that is partitioned into cells with a set of parameters of the log-polar coordinate.
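The construction illustrated in Figure 4 can be sketched as follows, using the parameter values reported in the paper (radius 30, 8 angular and 4 radial bins with the innermost ring merged into one cell, 6 unsigned orientation bins). Several details are assumptions made here and are not stated in the text: the radial bin edges are spaced logarithmically as `r ** (k / n_r)`, votes are weighted by gradient magnitude, no normalization is applied, and all function and variable names are hypothetical.

```python
import numpy as np

def self_similarity_descriptor(ssm, i, r=30, n_theta=8, n_r=4, n_o=6):
    """Descriptor of the log-polar patch centered at diagonal element (i, i)."""
    n = ssm.shape[0]
    gy, gx = np.gradient(ssm)                      # derivatives along rows, columns
    mag = np.hypot(gx, gy)                         # gradient magnitude (assumed vote weight)
    ori = np.mod(np.arctan2(gy, gx), np.pi)        # unsigned orientation in [0, pi)
    ori_bin = np.minimum((ori / np.pi * n_o).astype(int), n_o - 1)

    # Pixel offsets covering the square that encloses the patch.
    offs = np.arange(-r, r + 1)
    dy, dx = np.meshgrid(offs, offs, indexing="ij")
    rho = np.hypot(dx, dy)
    theta = np.arctan2(dy, dx)

    # Log-spaced outer edges of the radial bins (assumed spacing).
    upper = r ** (np.arange(1, n_r + 1) / n_r)
    rbin = np.searchsorted(upper, rho, side="left")          # n_r means "outside patch"
    tbin = np.floor((theta + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta

    # Innermost ring is merged into cell 0; remaining rings keep their angular bins.
    cell = np.where(rbin == 0, 0, 1 + (rbin - 1) * n_theta + tbin)
    n_cells = 1 + (n_r - 1) * n_theta                        # 25 for the default values

    rows, cols = i + dy, i + dx
    valid = (rbin < n_r) & (rows >= 0) & (rows < n) & (cols >= 0) & (cols < n)

    # Cells that fall outside the SSM receive no votes and stay at zero.
    hist = np.zeros((n_cells, n_o))
    rv, cv = rows[valid], cols[valid]
    np.add.at(hist, (cell[valid], ori_bin[rv, cv]), mag[rv, cv])
    return hist.ravel()

# Toy usage on a random-walk trajectory (hypothetical data and steepness value).
rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(size=(80, 3)), axis=0)
dist = np.linalg.norm(traj[:, None] - traj[None, :], axis=-1)
ssm = np.tanh(0.1 * dist)
descriptors = np.stack([self_similarity_descriptor(ssm, i) for i in range(len(ssm))])
```

Each frame yields one 25 x 6 = 150-dimensional vector. Frames near the start or end of the trajectory have patches that extend past the SSM border, so some of their cells remain zero.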
We adjust the steepness parameter $\alpha$ so that the corresponding sigmoid function maps normal distances and abnormal distances to the ranges $[0,1)$ and 1, respectively.

Such SSMs are not only robust to noise and outliers but also achieve invariance to rotation, translation, and scaling in motion trajectory representation. The sigmoid distance in the SSM is a function of the Euclidean distance, and the Euclidean distance is intrinsically invariant with respect to those transformations in Euclidean space. Examples of SSMs computed from an example motion trajectory are shown in Fig. 3. To illustrate the invariance of SSMs, we transform the original trajectory in Fig. 3(a) to a new one shown in Fig. 3(b) by a series of actions: first rotating by 30° and 45° about the x and z axes, respectively, then translating by 200 mm and 500 mm along the x and y directions, respectively, and finally scaling by a factor of 0.5. Note the visual difference between the SSMs using Euclidean distances and the sigmoid distances in Fig. 3(c-d). Note also the similarity of the SSMs computed for the same trajectory despite the transformations applied to the motion trajectory.

III. SSM-BASED RECOGNITION

A. SSM-Based Description

With the built SSMs, most current work decomposes and transforms them into a reduced-dimension space, or extracts image-based features. However, these are global features that discard temporal information. We intend to discover local self-similarities by extracting a self-similarity descriptor in each local patch centered at elements along the diagonal of an SSM, as shown in Fig. 4(b). Self-similarity descriptors are obtained by accumulating histograms of gradient orientations [18] in local patches along the diagonal of the SSM with a log-polar cell structure that is defined by the parameters: $r$, the radius; $n_r \times n_\theta$, the number of bins along the radial and angular directions; and $n_o$, the number of gradient orientations at each cell. An example of building such self-similarity descriptors is shown in Fig. 4, where for the log-polar coordinate centered at frame $i$, the corresponding patch is partitioned into 25 cells ($n_\theta \times n_r = 8 \times 4$, i.e. 8 angular bins and 4 radial bins), as the center cells are combined into a single cell. A 6-bin unsigned ($n_o = 6$; gradient orientations are limited to $0 \sim \pi$) histogram of gradient orientations within each cell of the local patch at frame $i$ is computed as

$$h_i^k = [h_i^k(1), \ldots, h_i^k(n_o)], \tag{3}$$

where $k$ is the cell index, running from 1 to the number of cells $n_c$ within the patch. The histograms of all the cells in a patch are concatenated into a self-similarity descriptor at frame $i$, $H_i(\Gamma) = [h_i^1, \ldots, h_i^{n_c}]$. For histograms with cells falling outside an SSM, we set them to zero. Thus, a set of self-similarity descriptors is built as $H(\Gamma) = [H_i(\Gamma)]_{i=1:N}^{T}$ for a motion trajectory $\Gamma$.

B. SSM-Based Motion Trajectory Recognition

To recognize motion trajectories, a BoF approach is employed to encode the statistics of self-similarity descriptors by quantizing the descriptors into histograms of visual words. Following the classic BoF approach, a visual vocabulary is learned offline by k-means clustering of $K$ random local self-similarity descriptors from training data. By clustering, we obtain a predefined number of clusters, $D$, whose centers are the words of the visual vocabulary. In training and testing, a set of self-similarity descriptors $H(\Gamma)$ for a motion trajectory is quantized into a normalized histogram $\mathbf{z}$ of visual words. Unlike an orderless BoF, in our situation we need to take temporal information into account when building the histogram of visual words. We follow an extension of BoF in [19] to first partition an SSM into temporal sub-blocks from a fine to a coarse scale, and compute the histograms of local descriptors across different sub-blocks and over different temporal scales. Typically, $2^l$ sub-blocks, $l = 1, \ldots, L$, are used. An example of partitioning an SSM into temporal sub-blocks at three scales is shown in Fig. 5, where $\mathbf{z}^l(s)$ denotes the histogram from the $s$-th sub-block at the $l$-th scale. By concatenating the histograms from the various sub-blocks at different scales, an SSM's temporal pyramid representation $\mathbf{z} = [\mathbf{z}^0, \ldots, \mathbf{z}^L]$ is obtained. Such temporal pyramid representations are then input to a nonlinear SVM classifier with the pyramid matching kernel [19] defined as

$$K^L = \frac{1}{2^L} I^0 + \sum_{l=1}^{L} \frac{1}{2^{L-l+1}} I^l, \tag{4}$$

where the histogram intersection function $I^l$ between the histograms of self-similarity descriptors of a given pair of trajectories $x$ and $y$ is:

$$I^l(\mathbf{z}_x^l, \mathbf{z}_y^l) = \sum_{i=1}^{2^l} \min\left(\mathbf{z}_x^l(i), \mathbf{z}_y^l(i)\right). \tag{5}$$

Here $\mathbf{z}_x^l$ and $\mathbf{z}_y^l$ in (5) denote the histograms at scale $l$ for any two trajectories $x$ and $y$, respectively. A non-linear multiclass SVM works with this pyramid matching kernel to recognize motion trajectories represented by SSMs.

Figure 5. Example of constructing a three-scale pyramid. An SSM is subdivided into temporal sub-blocks at three temporal resolutions. Thereafter, at each scale of temporal resolution, the histograms that fall in each temporal sub-block are produced.

IV. EXPERIMENTS

In this section, the SSM-based trajectory recognition approach is evaluated using the task of sign language recognition. We chose the Australian Sign Language (ASL) dataset [20] to run several sign recognition benchmarks.

A. Dataset and Parameters Setting

As a sign can be abstracted as two hand trajectories, we demonstrate the effectiveness of our method by sign recognition on the ASL dataset. The ASL dataset consists of 2565 samples of ASL signs, where 27 examples of each of 95 sign classes are captured from a native signer with high-quality data, and each sample is performed by moving the right hand and left hand simultaneously in 3-dimensional space. In this test, as addressed in our previous work [14], we only employ the root trajectory, an average of the right and left hand trajectories, to represent a sign.

In computing SSMs, the steepness parameter $\alpha$ is set to 0.5e-3 according to the raw data of the ASL dataset, and $c = 0$ in the experiment. For extracting self-similarity descriptors, the parameters are set as: 8 × 4 angular-by-radial bins ($n_\theta \times n_r = 8 \times 4$), 6 gradient orientations ($n_o = 6$), and radius $r = 30$. Finally, we learn a visual vocabulary by clustering 20000 self-similarity descriptors from the training data into 2000 clusters that serve as the words of this vocabulary, i.e. $K = 20000$, $D = 2000$. Also, we subdivide each SSM at 3 scales in temporal pyramid matching, i.e. $L = 3$. A non-linear SVM with the pyramid matching kernel defined in (4) is trained using the LIBSVM package [22].

We compare our proposed method, abbreviated as SSM-sig-TPM, against different methods and descriptors for recognition: (1) our method when SSMs are computed by Euclidean distances (SSM-raw-TPM); (2) our method when using an orderless BoF approach (SSM-sig-BoF); (3) a left-right Hidden Markov Model (HMM) [2] with 5 mixtures of Gaussian outputs and 5 states using area integral invariants (AII) [14] as observations (HMM-AII); (4) an SVM classifier using an AII-based BoF approach, in which the descriptors that are clustered and quantized to vocabulary words are AII descriptors rather than the self-similarity descriptors of our method (SVM-AII-BoF); (5) the 1-NN classifier using Fourier descriptors (FD) with Euclidean distances (1-NN-FD).

The recognition experiments for all the methods are implemented using MATLAB on a common PC with a Core i5-2400 3.1 GHz CPU (32-bit) and 4 GB RAM.

B. Sign Language Recognition

Each time, 16 classes of samples are randomly picked from the ASL dataset to run a batch recognition test where 14 samples of each class are used for training and 13 samples are used for testing. We repeat this test 50 times for all methods. We report the average recognition performance as summarized in Table I and Table II on accuracy and efficiency, respectively. The recognition efficiency here is evaluated using the average time per query in batch recognition.

TABLE I
RECOGNITION ACCURACY USING OUR METHOD COMPARED WITH OTHER METHODS ON ASL DATASET

Methods         Accuracy
SSM-sig-TPM     84.83%
SSM-raw-TPM     78.20%
SSM-sig-BoF     79.82%
SVM-AII-BoF     40.68%
1-NN-FD         78.07%
HMM-AII         66.90%

TABLE II
RECOGNITION EFFICIENCY COMPARISON (UNIT: MILLISECONDS PER QUERY)

Methods         Time cost (ms)
SSM-sig-TPM     52
1-NN-FD         430
HMM-AII         151

Let us examine the experimental results in recognition accuracy and efficiency. Our recognition accuracy is 84.83%, which is higher than all the other methods. To verify the effect of the sigmoid distance, we run our method with Euclidean distances used to compute SSMs, and as expected the result drops from 84.83% to 78.20%. This confirms the advantage of using sigmoid distances over Euclidean distances as the basic metric for measuring self-similarities. Also, to show the discriminative power of the temporal pyramid BoF approach, our method is run with an orderless BoF approach, and the result drops from 84.83% to 79.82%. It is also interesting to compare performance when using different descriptors as visual words in the BoF approach. Thus, we train an SVM classifier on BoF histograms that are quantized using AII descriptors directly instead of self-similarity descriptors. In this case, the accuracy is 40.68%, which shows the richness of self-similarity descriptors compared with AII descriptors. In comparison with the methods based on template matching, we employ the 1-NN
classifier on FD descriptors with Euclidean distances. The accuracy of the 1-NN classifier using FD is 78.07%, but the 1-NN-FD method depends on exhaustive matching, which leads to a higher average time cost of 430 ms per query. Our method achieves a significant improvement in recognition efficiency with an average time cost of 52 ms, as recorded in Table II. Finally, a classic left-right HMM is employed to model the temporal dependency for each motion trajectory using AII descriptors as observations, and an average accuracy of 66.90% is obtained. Its time cost is 151 ms.

C. Noise Effects

To support our claim that SSMs computed by the sigmoid distance are more robust to noise and outliers, we set up the recognition experiments in the same configuration as before but add white Gaussian noise to the motion trajectories of signs from the test set of the ASL dataset. In this paper, the noise is measured with the normalized standard deviation $\sigma$, which is increased from 0 to 0.3. The recognition results are shown in Fig. 6, which plots how the recognition accuracy drops as the additive noise is increased when using SSM-sig-TPM and SSM-raw-TPM. As indicated in Fig. 6, the accuracy using SSM-raw-TPM drops more sharply in noisy trajectory recognition.

Figure 6. Noise effects on recognition accuracy when the added noise is increased from 0 to 0.3.

V. CONCLUSION

We propose a self-similarity descriptor to capture rich spatio-temporal patterns within motion trajectories, and use it to perform fast recognition tasks. With the sigmoid distance as the basic metric for computing SSMs, the descriptors have been demonstrated to be more discriminative and robust than those using Euclidean distances as basic units. Moreover, the temporal pyramid matching for BoF histograms of self-similarity descriptors yields a significant improvement in recognition accuracy. Compared with the other methods tested, our method achieves higher recognition accuracy and efficiency.

In our method, the BoF approach uses clustering to build a visual vocabulary for a training dataset, which yields a coarse reconstruction of self-similarity descriptors. It is believed that a less coarse reconstruction of self-similarity descriptors can be found by sparse coding, which is a refined approximation for building statistics of local descriptors; this will be our forthcoming research issue.

REFERENCES

[1] C. Rao, A. Yilmaz, and M. Shah, "View-invariant representation and recognition of actions," Int. J. Comput. Vis., vol. 50, no. 2, pp. 203–226, 2002.
[2] J. Beh, D. K. Han, R. Duraiswami, and H. Ko, "Hidden Markov Model on a unit hypersphere space for gesture trajectory recognition," Pattern Recognit. Lett., vol. 36, no. 1, pp. 144–153, 2014.
[3] M. Bennewitz, "Learning Motion Patterns of People for Compliant Robot Motion," Int. J. Rob. Res., vol. 24, no. 1, pp. 31–48, Jan. 2005.
[4] C. Xu, J. Liu, and X. Tang, "2D shape matching by contour flexibility," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 1, pp. 180–186, 2009.
[5] G. Mori, S. Belongie, and J. Malik, "Efficient shape matching using shape contexts," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 11, pp. 1832–1837, 2005.
[6] M. Devanne, H. Wannous, S. Berretti, P. Pala, M. Daoudi, and A. Del Bimbo, "3-D human action recognition by shape analysis of motion trajectories on Riemannian manifold," IEEE Trans. Cybern., vol. 45, no. 7, 2015.
[7] F. Bashir and A. Khokhar, "Curvature scale space based affine-invariant trajectory retrieval," in Proceedings of IEEE International Multitopic Conference, 2004, pp. 20–25.
[8] J. Xu, J. Faruque, C. F. Beaulieu, D. Rubin, and S. Napel, "A comprehensive descriptor of shape: Method and application to content-based retrieval of similar appearing lesions in medical images," J. Digit. Imaging, vol. 25, no. 1, pp. 121–128, 2012.
[9] A. Oikonomopoulos, M. Pantic, and I. Patras, "Sparse B-spline polynomial descriptors for human activity recognition," Image Vis. Comput., vol. 27, no. 12, pp. 1814–1825, 2009.
[10] J. Flusser, J. Kautsky, and F. Šroubek, "Implicit Moment Invariants," Int. J. Comput. Vis., vol. 86, no. 1, pp. 72–86, Jan. 2010.
[11] B. Hong and S. Soatto, "Shape Matching using Multiscale Integral Invariants," IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 1, pp. 1–10, 2014.
[12] S. Khalid, "Motion-based behaviour learning, profiling and classification in the presence of anomalies," Pattern Recognit., vol. 43, no. 1, pp. 173–186, 2010.
[13] E. Bala and A. E. Cetin, "Computationally efficient wavelet affine invariant functions for shape recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 8, pp. 1095–1099, 2004.
[14] Z. Shao and Y.F. Li, "Integral invariants for space motion trajectory matching and recognition," Pattern Recognit., vol. 48, no. 8, pp. 2418–2432, 2015.
[15] E. Shechtman and M. Irani, "Matching local self-similarities across images and videos," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[16] I. N. Junejo, E. Dexter, I. Laptev, and P. Pérez, "View-independent action recognition from temporal self-similarities," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 1, pp. 172–185, 2011.
[17] A.-R. Lee, H.-I. Suk, and S.-W. Lee, "View-invariant 3D action recognition using spatiotemporal self-similarities from depth camera," in Proceedings of International Conference on Pattern Recognition, 2014, pp. 501–505.
[18] N. Dalal and W. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp. 886–893.
[19] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2006, vol. 2, pp. 2169–2178.
[20] UCI KDD ASL Archive, "Australian sign language dataset," Available: http://kdd.ics.uci.edu/databases/auslan2/auslan.html.
[21] K. Wu and M. Yang, "Alternative c-means clustering algorithms," Pattern Recognit., vol. 35, no. 10, pp. 2267–2278, 2002.
[22] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Trans. Intell. Syst. Technol., vol. 2, pp. 27:1–27:27, 2011.
