CLASSIFIER
Megha D Bengalur
BVBCET, Hubli
Department of Electronics and Communication Engineering.
megha776@gmail.com
ABSTRACT
In this paper, we address the problem of human activity recognition, using a support vector machine (SVM) classifier to classify different types of activities. Activity recognition aims to recognize the actions and goals of one or more agents from a series of observations of the agents' actions and environmental conditions. As a compact representation of postures, we use 3D skeleton joints taken from a depth sensor (Microsoft Kinect), which provides adequate accuracy for real-time full-body tracking of humans. We create a complete human activity dataset, comprising RGBD images and motion-capture data for each activity, and use the skeleton information obtained from these videos to recognize the activities. We test our method on detecting and recognizing 13 different types of activities performed by 10 individuals from varied views, in indoor and outdoor environments, and achieve good performance. Our method detects activities well even when the person was not seen in the training set, and achieves an overall detection accuracy of 89%.
Index Terms: Support Vector Machine (SVM), skeletal joint features, RGBD image, Microsoft Kinect, ROC curves.
1. INTRODUCTION
In this paper, we use a supervised machine learning approach for human action recognition, with particular emphasis on feature selection, data modeling, and classifier structure. Recently, human activity analysis has become one of the most active research areas in computer vision, owing to promising applications in areas such as visual surveillance, human performance analysis, and human-computer interfaces [1][2][3]. We recognize activities performed by individuals using an RGBD (Microsoft Kinect) sensor. A human joint sequence is an effective representation of structured motion [4], so we utilize only a sequence of tracked human joints inferred from RGBD images as features. We generate a dataset to evaluate various features for the detection of activities via SVMs. We collect data for 13 different activities, including drinking, walking, reading, waving, writing, clapping, stretching, and dozing, from 10 participants. We evaluate several geometric relational body-pose features, including joint features and plane features, on our dataset for activity detection. Experimentally, we find that joint features perform better than the other feature choices on this dataset.
To date, research has mainly focused on learning and recognizing actions from video sequences taken by a single visible-light camera. Recently, the rapid development of depth sensors (e.g., Microsoft Kinect) has provided adequate accuracy for real-time full-body tracking at low cost. This enables us to once again explore the feasibility of skeleton-based features for activity recognition. The authors in [5] used a hierarchical maximum-entropy Markov model to recognize activities in unstructured environments; they infer the two-layered graph structure using a dynamic programming approach [6]. The authors in [7] classify activities with a generative model, a Hidden Markov Model (HMM), which models the random generation of the observable data. However, in real-world applications activities are seldom performed in structured environments, and different people perform activities at different rates. To overcome this problem, we use the PrimeSense skeleton tracking system (provided with Microsoft Kinect), which extracts only the skeleton of a person; hence we utilize only a sequence of tracked human joints inferred from RGBD images as features. It is interesting to evaluate body-pose features motivated by motion-capture data [8][9][10] using skeletons tracked from a single depth sensor, so it does not matter whether the environment is structured or unstructured. We use the Support Vector Machine (SVM) [11] to handle irrelevant actions in whole-sequence classification. We find that the SVM-based classifier has good classification accuracy for detecting an activity even when the person is not present in the training set.
The contributions of this paper are:
1. We propose a method for human activity recognition for 13 different types of activities captured through a Kinect camera, using an SVM classifier.
2. We achieve good results even when the person is not present in the training set, by using human skeletal joint features.
3. We use discriminative models to achieve a better accuracy rate for all activities.
The proposed method is reviewed in Section 2, which also gives a detailed description of the activity dataset and defines the geometric relational body-pose features for activity detection. Section 3 describes the SVM classifier, Section 4 presents the experimental results, and Section 5 concludes the paper.
2. PROPOSED METHODOLOGY
We use the idea of SVMs to solve the problem of activity recognition as whole-sequence classification. Figure 2 shows our proposed methodology, with a database containing a training set for learning and a test set for validation. We extract distinct features for each activity to build an SVM model.
Figure 3: Activities: a) Drinking, b) Dozing, c) Reading, d) Stretching, e) Writing, f) Clapping, g) Jumping, h) Waving, i) Running, j) Walking, k) Shaking hands, l) Hugging, m) Walking and Drinking.
of a person at a particular time T. It captures the distance between two joints in a single pose. We also record the vertical and horizontal positions of the hands over the last n frames. The plane features are computed with emphasis on the geometric relationship between a plane and a joint: for example, we can find how far a person's hand lies in front of the plane spanned by his hip, torso, and neck.
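The joint-distance and plane features described above can be sketched as follows. This is a minimal illustration with NumPy, not the paper's implementation; the choice of hip, torso, and neck as the plane-spanning joints follows the example in the text, and the signed-distance convention is our assumption.

```python
import numpy as np

def joint_distance(p, q):
    # Euclidean distance between two 3D joint positions in a single pose.
    return float(np.linalg.norm(np.asarray(p, float) - np.asarray(q, float)))

def plane_feature(joint, hip, torso, neck):
    # Signed distance from `joint` (e.g. a hand) to the plane spanned by
    # the hip, torso, and neck joints; the sign tells which side of the
    # plane the joint lies on.
    hip, torso, neck, joint = (np.asarray(x, float) for x in (hip, torso, neck, joint))
    normal = np.cross(torso - hip, neck - hip)
    normal /= np.linalg.norm(normal)
    return float(np.dot(joint - hip, normal))
```

Under this convention, `plane_feature(hand, hip, torso, neck)` is positive when the hand is in front of the body plane and negative when it is behind it.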
2.2.3. Motion and Velocity information
Motion information plays a key role in classifying different activities such as running and walking. We compute it by selecting n frames over a specified time, spaced as n/2, 2n/2, ..., 10n/2, where n is the number of frames to be chosen. Using this motion, we compute the joint rotation and joint angle in each of these frames, represented as half-space quaternions. The velocity feature captures the velocity of one joint along the direction between two other joints at time T. The velocity information plays an especially important role in detecting fast activities such as running.
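The velocity feature (the speed of one joint along the direction between two other joints) and the frame-sampling scheme can be sketched as below. The 30 fps frame interval is an assumption based on the Kinect's typical rate, not a value stated in the paper.

```python
import numpy as np

def velocity_feature(joint_now, joint_prev, a, b, dt=1.0 / 30):
    # Velocity of a joint between two consecutive sampled frames, projected
    # onto the unit direction from joint a to joint b; dt is the frame
    # interval (30 fps Kinect assumed).
    direction = np.asarray(b, float) - np.asarray(a, float)
    direction /= np.linalg.norm(direction)
    velocity = (np.asarray(joint_now, float) - np.asarray(joint_prev, float)) / dt
    return float(np.dot(velocity, direction))

def sampled_offsets(n):
    # Frame offsets spaced as n/2, 2n/2, ..., 10n/2, as described in the text.
    return [k * n // 2 for k in range(1, 11)]
```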
3. SVM CLASSIFIER
We use multi-class SVM classifiers to classify the dataset with more than one class of activities. Since SVMs [11][3][17] are inherently binary classifiers, the traditional way to do multi-class classification is the one-against-one or one-against-all method [18][19]; we use both methods in our approach. We use the NuSVC classifier [20][21][22] for the one-against-one approach and LinearSVC for one-against-all. In the one-against-one case, each pairwise classifier gives one vote to its winning class, and the test point is labeled with the class that receives the most votes. In the one-against-all case, we choose the class that classifies the test data with the largest margin. We have a set of 13 different activities with common features applied to all, so we create a feature space with a separate plane for every feature, each plane accommodating the corresponding feature of all the activities. Once the features are mapped into the feature space, we apply the SVM to each of these spaces so that the classifier distinguishes all the activities with separate hyperplanes.
d_TA =                                                  (1)
d_T = min(d_TA1, d_TA2, ..., d_TA10)                    (2)
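scikit-learn's NuSVC and LinearSVC realize the two multi-class strategies named above: NuSVC decomposes the problem one-against-one and votes, while LinearSVC fits one-against-all classifiers and picks the largest decision value. A minimal sketch on synthetic stand-in data follows; the real activity feature vectors are not reproduced here, and the dimensions and class means are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import NuSVC, LinearSVC

rng = np.random.default_rng(0)
# Synthetic stand-in for the activity feature vectors: 3 well-separated
# classes with 40 five-dimensional samples each.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(40, 5)) for c in range(3)])
y = np.repeat([0, 1, 2], 40)

# One-against-one: NuSVC trains a classifier for every pair of classes
# and labels a test point by majority vote.
ovo = NuSVC(nu=0.5, kernel="rbf").fit(X, y)

# One-against-all: LinearSVC trains one classifier per class and chooses
# the class with the largest margin.
ova = LinearSVC(dual=False).fit(X, y)

print(ovo.score(X, y), ova.score(X, y))
```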
Table 2. Performance analysis on whole-sequence classification, for a training set containing the test set (Set 1) and a training set not containing the test set (Set 2). One-against-one has better classification results than one-against-all SVM.

Classifier         Set 1      Set 2      Performance decrease
One-against-one    89.538%    80.230%    -9.308%
One-against-all    87%        78.234%    -8.766%
5. CONCLUSION
We have addressed the problem of human activity recognition using a supervised learning approach. We used the one-against-one and one-against-all multi-class SVM approaches to classify all the activities, with the proposed skeletal joint features extracted from depth videos with varied poses for interaction. Even though the results are not 100% accurate, our approach succeeds in recognizing three kinds of activities, namely a single person performing one activity, two-person interaction, and a single person performing two activities, which the papers in our literature survey fail to do. Thus, we propose the single person performing two activities as the novelty of our approach.
6. REFERENCES
[1] R. W. Poppe, "A survey on vision-based human action recognition."
[3] Matthew Scholten, "Testing of the support vector machine for binary-class classification."
[13] Young Min Kim, "Microsoft Kinect," Geometric Computing Group, 2012.
[14] Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Richard Moore, Alex Kipman, and Andrew Blake, "Real-time human pose recognition in parts from single depth images," Microsoft Research Cambridge and Xbox Incubation, 2011.
[15] J. Niebles, H. Wang, and L. Fei-Fei, "Unsupervised learning of human action categories using spatial-temporal words."
[22] Kuo-Ping Wu and Sheng-De Wang, "Choosing the kernel parameters for support vector machines by the inter-cluster distance in the feature space," National Taiwan University, Taipei, Taiwan, 2009.
[23] Bernhard Burgstaller and Friedrich Pillichshammer, "The average distance between two points," Linköping University, 2008.
[24] Yinyu Ye, "Semidefinite programming for Euclidean distance geometric optimization," Stanford University, 2009.
[25] Charles E. Metz, "Basic principles of ROC analysis."
Table 1. One-against-one (OAO) and one-against-all (OAA) SVM classifier results. In Test 1, 1/3 of the samples are used as training samples and the rest as testing samples. In Test 2, 2/3 of the samples are used as training samples. Test 3 is a cross-subject test: half of the subjects are used for training and the remaining subjects for testing.

Activity group                  Activity Label         Test 1 OAO  Test 1 OAA  Test 2 OAO  Test 2 OAA  Test 3 OAO  Test 3 OAA
Indoor Activities               Drinking               94%         89%         90%         90%         80%         78%
                                Dozing                 94%         82%         88%         86%         81%         89%
                                Reading                90%         87%         91%         85%         84%         83%
                                Stretching             82%         84%         86%         86%         82%         86%
                                Writing                90%         88%         88%         90%         82%         80%
                                Clapping               86%         80%         90%         84%         79%         94%
Outdoor Activities              Jumping                90%         84%         92%         84%         70%         85%
                                Waving                 82%         85%         92%         80%         78%         92%
                                Running                86%         87%         92%         75%         72%         80%
                                Walking                86%         94%         88%         80%         70%         86%
Two Person Interaction          Shaking hands          88%         94%         82%         76%         75%         81%
                                Hugging                90%         92%         86%         78%         75%         84%
One Person, Two Activities      Walking and Drinking   90%         84%         84%         80%         76%         76%