Application Specific Hand Gesture Recognition System
Ankit Mishra
ELECTRICAL ENGG. DEPT., GAUTAM BUDDHA UNIVERSITY
ankitmishra723@gmail.com
+91 9971553633
ABSTRACT
The current accuracy of the system is 89% at the particular intensity level at which the
images were captured, working in the MATLAB environment.
LIST OF FIGURES

Figure No.   Caption
1.1
2.1
2.2          Instrumented Glove equipped with Potentiometer and Optic Fiber
2.3          Tools for Gesture Recognition: Clustering & Classifying Algorithms
2.4
3.1          A System Architecture
3.2
3.3
3.4
3.5          Edge Neighborhood
3.6
3.7
3.8
3.9
3.10
3.11
LIST OF TABLES

Table No.   Particulars
2.1         Comparison between Contact and Vision Based Devices
4.1         Result
LIST OF ABBREVIATIONS

HCI   Human-Computer Interaction
LDA   Linear Discriminant Analysis
MMI   Man-Machine Interaction
DoF   Degree of Freedom
HOG   Histogram of Oriented Gradients
ISM
CONTENTS

CANDIDATE'S DECLARATION
ACKNOWLEDGEMENT
DEDICATION
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CONTENTS

Chapter 1  INTRODUCTION
  1.1 General
      1.1.1 Computer Vision
      1.1.2 Behaviour Understanding
      1.1.3 People Detection and Body Part Detection
      1.1.4 People and Body Part Tracking
      1.1.5 Posture Detection
      1.1.6 Proposed Algorithm
  1.2 Work Objective and Motivation
  1.3 Organization of Dissertation Work

Chapter 2  LITERATURE REVIEW
  2.1 Introduction
  2.2 Definition and Nature of Gesture

Chapter 3  GESTURE MODELLING
  3.1 Introduction

Chapter 4  RESULT AND CONCLUSION
  4.1 Result
  4.2 Conclusion

Bibliography
ACKNOWLEDGEMENT
CANDIDATE'S DECLARATION
I hereby declare that the work embodied in this dissertation report entitled "Application
Specific Gesture Recognition System", submitted in partial fulfilment of the
requirements for the award of the degree of the 5-Year Integrated Dual Degree
Programme B.Tech. (Electrical Engineering) + M.Tech. (Instrumentation &
Control), to the School of Engineering, Gautam Buddha University, Greater Noida, U.P.,
India, is an authentic record of my own work carried out from December 2014 to May
2015 under the guidance and supervision of Dr. Shabana Urooj, Assistant Professor,
Department of Electrical Engineering, School of Engineering, Gautam Buddha
University, Greater Noida, U.P., India, and the co-supervision of Prof. Omar Farooq,
Department of Electronics Engineering, Aligarh Muslim University, Aligarh, U.P., India.
To the best of my knowledge, the matter embodied in this dissertation report has not been
submitted for the award of any other degree or diploma.
Date:
Place: Greater Noida
(Ankit Mishra)
CERTIFICATE
This is to certify that the above statement made by the candidate is correct to the best of
my knowledge and belief.
(Dr. Shabana Urooj)
Department of Electrical Engineering
School of Engineering
Gautam Buddha University
(External Examiner)
/ /2015.
Dedicated to
CHAPTER 1
INTRODUCTION
1.1 General
Gesture was the first mode of communication for primitive cave dwellers, and nowadays
gesture recognition has become a prominent domain of research. Gestures are an important
form of human interaction and communication: hands are usually used to interact with
things (pick up, move) and our body gesticulates to communicate with others (no, yes,
stop). Thus, a wide range of gesture recognition applications has emerged, thanks to a
certain level of maturity reached by sub-fields of machine intelligence (machine learning,
cognitive vision, multi-modal monitoring). For example, humans can interact with machines
through gesture recognition devices such as the Wii Remote [1], the CyberGlove [2] and
multi-touch screens [3]. Nonetheless, contact-device-based methods are intrusive and
require the user's cooperation to use the device correctly. Vision-based methods aim to
overcome these limits and allow gestures to be recognized remotely, with little or no user
cooperation (e.g. body markers, clothing conditions). Since it is preferable to avoid these
constraints, vision-based methods have to overcome several challenges such as illumination
changes and low-contrast and/or noisy video. Nevertheless, methods based on cameras tend
to be brittle and less precise than those based on contact devices.
The main challenge of vision-based gesture recognition is to cope with the large variety
of gestures. Recognizing gestures involves handling a considerable number of degrees of
freedom (DoF), huge variability of the 2D appearance depending on the camera viewpoint
(even for the same gesture), different silhouette scales (i.e. spatial resolution) and
many resolutions in the temporal dimension (i.e. variability of the gesture speed).
The main concepts related to the topic of gesture recognition from video sequences are:
computer vision, behaviour understanding, people and body part detection, people and
body part tracking, and posture detection.
feature points, motion of body parts) or global (e.g. the whole-body motion signature).
The main goal is to extract people's motion features in order to analyze them for gesture
recognition. Once the movement of the body or its parts is detected, computations are
made to identify the type of motion; this is known as the motion analysis step. This
analysis may then be used by different middle-level algorithms: an object tracker (when
we deal with object motion) and gesture recognition (when we deal with object and body
part motion).
reliant on the convenience of the user. Considering only the frames of primary
importance, i.e. a frame with a good gesture is kept as required by the system, and
every other gesture is discarded.
1.3 Organization of Dissertation Work
In this section, we overview the remaining contents of this dissertation, which is
structured into four main chapters. The next chapter presents the state of the art of
human gesture recognition. The proposed method is overviewed and detailed in Chapter 3.
The fourth and last chapter consists of results and a conclusion, where a review of the
contributions and an overview of perspectives are presented.
Chapter 2 recalls the previous work on gesture recognition by presenting an up-to-date
state of the art, with a brief presentation of the types of gesture and the technologies
currently available for recognizing these gestures.
Chapter 3 presents a broader view of how the work has been approached. First there is a
brief discussion of the types of acquisition device available, and then the pre-processing
stages are briefly explained.
Chapter 4 presents the results and conclusion of this project work.
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
Human gesture recognition consists of identifying and interpreting human gestures
automatically using a set of sensors (e.g. cameras, gloves). Here we present a review of
the state of the art in human gesture recognition, covering gesture representations,
recognition techniques and applications. Before we proceed with the literature on gesture
recognition, it is important to understand the definition and the nature of gesture as
seen by the literature.
non-verbal categories:
1. Body language: facial expressions, postures, eye gaze (e.g. amount of gaze, frequency)
Gestures can be categorized with respect to different criteria. For instance, [15]
distinguishes five types of gestures:
1. Emblems: an emblem (or quotable gesture, or emblematic gesture) is a gesture which
can be directly translated into short verbal communication, such as a goodbye wave used
in place of words. These gestures are very culture-specific.
2. Illustrators: an illustrator is a gesture that depicts what the communicator is saying
verbally (e.g. emphasizing a key point in a speech, or miming a throwing action when
pronouncing the words "he threw"). These gestures are inherent to the communicator's
thoughts and speech. Also called gesticulations, they can be classified into five
subcategories, as proposed by [16]:
Beats: rhythmic and often repetitive flicks (short and quick) of the hand or the
fingers.
Deictic gestures: pointing gestures, which can be concrete (pointing to a real
location, object or person) or abstract (pointing to an abstract location or period of
time).
Inertial: these devices sense motion directly through acceleration or rotation. Two
types of device are available: accelerometers (e.g. the Wii Remote) and gyroscopes
(e.g. the IGS-190). [1] propose to recognize gestures with a Wii controller, independently
of the target system, using Hidden Markov Models (HMM); the user can learn personalized
gestures for multimodal intuitive media browsing. [17] and [18] propose to detect falls.
Figure 2.2: Instrumented glove equipped with potentiometer and optic fiber [44]
Infrared cameras: typically used for night vision, infrared cameras provide thermal
information.
Stereo cameras: stereovision directly delivers 3D information about the world by
nature.
Body markers: some vision systems require body markers to be placed in order to
detect human body motion. There are two types of marker: (1) passive markers, such
as reflective markers which shine when strobe light hits them, and (2) active markers,
which flash LED lights (in sequence). In such a system, each camera, lighting the scene
with strobe lights or normal lights, delivers 2D frames with the marker positions from
its view. A pre-processing step is then in charge of interpreting the views and
positions into 3D space.
except for magnetic sensors (metal obstacles) and ultrasonic sensors (mechanical
obstacles). Concerning health issues, we note that some contact devices can raise
problems: allergy to the mechanical sensor material, and cancer risk for magnetic devices.
Table 2.1: Comparison between contact-based and vision-based devices

Criterion               Contact devices   Vision devices
User cooperation        Yes               No
User intrusive          Yes               No
Precise                 Yes/No            No/Yes
Flexible to configure   Yes               No
Flexible to use         No                Yes
Occlusion problem       No (Yes)          Yes
Health issues           Yes (No)          No
Here we review only the three most common ones, on which current research is focused:
(1) particle filtering and the condensation algorithm, (2) learning algorithms for
statistical modelling, and (3) automata-based approaches (such as Finite State Machines
(FSM)).
prediction [24] and learning to learn (i.e. learning one's own inductive bias based on
previous experience) [25][26]. The choice of the learning algorithm depends mainly on
the chosen gesture representation. For example, [27] propose to recognize static hand
gestures by learning the Fourier descriptors of the contour lines of a segmentation
image obtained by the mean shift algorithm [28]. The classification is done by a support
vector machine combined with the minimum enclosing ball (MEB) criterion.
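To make this representation-plus-classifier pipeline concrete, here is a minimal MATLAB sketch of contour Fourier descriptors followed by an SVM, in the spirit of [27]; the dummy mask, the descriptor length and the classifier call are illustrative assumptions, not the cited authors' exact method.

% Sketch: contour Fourier descriptors + SVM, in the spirit of [27].
BW = false(100); BW(20:80, 35:65) = true;   % dummy hand silhouette (assumption)
B  = bwboundaries(BW);                      % trace the object boundary
b  = B{1};                                  % N-by-2 list of [row col] points
z  = b(:,2) + 1i*b(:,1);                    % contour as complex numbers
Z  = fft(z);                                % Fourier descriptors
Z(1) = 0;                                   % drop DC term: translation invariance
Z  = Z / abs(Z(2));                         % normalize: scale invariance
fd = abs(Z(2:11)).';                        % magnitudes: rotation invariance
% Given a matrix X (one descriptor row per training image) and labels y:
% mdl  = fitcecoc(X, y);                    % multi-class SVM (one-vs-one)
% pred = predict(mdl, fd);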
Automata-based Approaches
Along with learning algorithms, automata-based methods are the most common approaches
in the literature. For instance, FSMs, HMMs and PNF (i.e. Past-Now-Future) networks are
kinds of automata with a set of states and a set of transitions. The states represent
static gestures (i.e. postures) and the transitions represent allowed changes, with
temporal and/or probabilistic constraints. A dynamic gesture is then considered as a path
between an initial state and a final state. [29] proposed an approach for gesture
recognition using an HMM-based threshold model. [30] presented a method for recognizing
human gestures using the PCA-HOG global descriptor; the recognition is done by maximum
likelihood estimation using the HMM classifier proposed by [31]. [32] detected human
actions using PNF propagation of temporal constraints. The main limitation of the
approaches based on automata is that the gesture model must be modified whenever a new
gesture needs to be recognized. Moreover, the computational complexity of such approaches
is generally high, since it is proportional to the number of gestures to be recognized,
which is not the case for methods based on other tools.
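To make the state/transition idea concrete, here is a minimal MATLAB sketch of an FSM that accepts a "close hand" gesture as a path from an initial to a final posture state; the states and the transition table are invented for illustration, not taken from the cited models.

% Minimal FSM sketch: a dynamic gesture is accepted when per-frame
% posture labels trace a path from the initial to the final state.
states = {'open', 'half_closed', 'fist'};          % static postures (assumed)
T = [1 1 0;                                        % allowed transitions:
     0 1 1;                                        % open -> half_closed -> fist
     0 0 1];                                       % (self-loops allowed)
postureSeq = [1 1 2 2 3 3];                        % per-frame posture labels
valid = postureSeq(1) == 1;                        % must start at initial state
for k = 2:numel(postureSeq)
    valid = valid && T(postureSeq(k-1), postureSeq(k)) == 1;
end
isGesture = valid && postureSeq(end) == 3;         % must end at final state
fprintf('Gesture "close hand" recognized: %d\n', isGesture);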
Figure 2.3: Tools for Gesture Recognition: Clustering & Classifying Algorithms [43]
3D skeleton models: these are the most common 3D models due to their simplicity
and higher adaptability. The skeleton contains only the information about the
articulations and their 3D degrees of freedom (DoF).
2. Appearance-Based Methods
Concerning appearance-based methods, two main sub-categories exist: (1) 2D
static-model-based methods and (2) motion-based methods. Each sub-category
contains several variants. For instance, the most used 2D models are:
Color-based models: methods with this kind of model generally use body markers
to track the motion of the body or the body part. For example, [34] propose a
method for hand gesture recognition using multi-scale colour features,
hierarchical models and particle filtering.
Silhouette geometry based models: such models may include several geometric
properties of the silhouette, such as perimeter, convexity, surface, compactness,
bounding box/ellipse, elongation, rectangularity, centroid and orientation (a small
extraction sketch is given after this list). [35] used the geometric properties of
the bounding box of the hand skin to recognize hand gestures.
Deformable template (gabarit) based models: these are generally based on deformable
active contours (i.e. snakes, parametrized with motion) and their variants [36].
[37] used snakes for the analysis of gestures and actions in technical talks for
video indexing.
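As a minimal sketch of extracting the silhouette geometry features listed above, the following MATLAB fragment uses regionprops from the Image Processing Toolbox; the dummy mask is an illustrative assumption.

% Sketch: silhouette geometry features via regionprops.
BW = false(100); BW(20:80, 35:65) = true;        % dummy silhouette (assumption)
s  = regionprops(BW, 'Perimeter', 'Area', 'BoundingBox', ...
                 'Centroid', 'Orientation', 'Eccentricity');
elongation  = s(1).BoundingBox(4) / s(1).BoundingBox(3);  % box height/width
compactness = 4*pi*s(1).Area / s(1).Perimeter^2;          % 1 for a perfect disc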
CHAPTER 3
GESTURE MODELLING
3.1 Introduction
The Association for Computing Machinery defines human-computer interaction as "a
discipline concerned with the design, evaluation and implementation of interactive
computing systems for human use and with the study of major phenomena surrounding
them."
Gesture recognition is an important, yet difficult task. It is important because it is a
versatile and intuitive way to develop new, more natural and more human-centered forms
of human-machine interaction. Moreover, it is difficult because it involves the solution of
many challenging subtasks such as robust identification of hands and other body parts,
motion modeling, tracking, pattern recognition and classification.
A human hand is an articulated object with 27 bones and 5 fingers. Each of these fingers
consists of three joints. Human hand joints can be classified as flexion, twist, directive
or spherical, depending upon the type of movement or the possible rotation axes. In total,
the human hand has approximately 27 degrees of freedom, so a large number of gestures can
be generated [38][39], as tabulated in Figure 2.1; the different approaches that can be
used for these different kinds of gesture are tabulated in Figure 2.3.
Limited resolution: a finite number of scan lines (about 625) and frame rate (30 to
60 frames/sec).
Distortion: nonlinear video output with respect to light intensity.
By far the most popular two-dimensional imaging device is the charge-coupled device
(CCD) camera:
it is a single IC device;
it consists of an array of photosensitive cells;
each cell produces an electric current dependent on the incident light falling on it;
it delivers a video signal output.
Laser Ranging Systems: Laser ranging works on the principle that the surface of
the object reflects laser light back towards a receiver which then measures the
time (or phase difference) between transmission and reception in order to
calculate the depth. Most work at long distances and therefore have inadequate
depth resolution.
Methods based on shape from shading employ photometric stereo techniques to produce
depth measurements. Using a single camera, two or more images are taken of an object in
a fixed position but under different lighting conditions. By studying the changes in
brightness over a surface and employing constraints in the orientation of surfaces, certain
depth information may be calculated.
Stereoscopy is a technique for measuring range by triangulation to selected locations in
a scene imaged by two cameras. As noted already, the primary computational problem of
stereoscopy is to find the correspondence of various points in the two images.
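Once the correspondence problem is solved, depth follows directly by triangulation. As a concrete illustration for the standard rectified two-camera setup (given here as background, not as part of this project's derivation):

    Z = f * B / d,    with disparity d = xL - xR,

where f is the focal length, B the baseline between the camera centres, and xL, xR the horizontal image coordinates of the same scene point in the left and right images; nearby points produce large disparities and distant points small ones.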
Data gloves are devices for precise data input with high accuracy and high speed. They
can provide accurate data on joint angles, rotation, location, etc., for application in
different virtual reality environments. These gloves are commercially available in the
market and have already been discussed in Chapter 2.
Coloured markers attached to the human skin are also used as an input technique; hand
localization is then done by colour localization.
Low-cost web cameras, as preferred in this project work, can also be used as input
devices.
Segmentation is the process of dividing the input image into regions separated by
boundaries [42]. The segmentation process depends on the type of gesture: if it is a
dynamic gesture, the hand needs to be located and tracked [42]; if it is a static gesture
(posture), the input image only has to be segmented.
To locate the hand, generally a bounding box is used, specified depending on the skin
colour [43]; for tracking the hand there are two main approaches: either the video is
divided into frames and each frame is processed alone, or tracking information such as
shape and skin colour is used with tools like the Kalman filter [42], as sketched below.
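A minimal sketch of such a tracker, assuming a constant-velocity motion model; the noise levels and the measurement sequence are illustrative assumptions, not values from this project.

% Constant-velocity Kalman filter for the hand centroid, state [px py vx vy].
dt = 1/30;                                   % assumed frame period (30 fps)
A = [1 0 dt 0; 0 1 0 dt; 0 0 1 0; 0 0 0 1];  % state transition model
H = [1 0 0 0; 0 1 0 0];                      % we measure position only
Q = 0.01 * eye(4);                           % process noise (tuning assumption)
R = 4 * eye(2);                              % measurement noise (pixels^2)
x = [120; 80; 0; 0];                         % initial state guess (pixels)
P = eye(4);                                  % initial state covariance
measurements = {[121; 82], [125; 85], [130; 88]};  % centroids per frame (dummy)
for k = 1:numel(measurements)
    x = A * x;   P = A * P * A' + Q;         % predict
    z = measurements{k};
    K = (P * H') / (H * P * H' + R);         % Kalman gain
    x = x + K * (z - H * x);                 % correct with the measurement
    P = (eye(4) - K * H) * P;
end
disp(x(1:2)');                               % filtered hand position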
In [42], the hand is segmented using skin colour; this is the easiest possible way, as
skin colour is invariant to translation and rotation changes. The Gaussian model is a
parametric technique, while the histogram-based technique is non-parametric.
The drawback of skin segmentation is that it is affected by changes in illumination
conditions.
In [44], segmentation is done using infrared cameras and the range information generated
by a time-of-flight (ToF) camera; these can detect different skin colours but are
affected by changes in temperature.
Data gloves and coloured markers can also be used for segmentation, as they provide
exact information about the orientation and position of the palm and fingers.
The colour space used in a specific application plays an essential role in the success of
the segmentation process. However, colour spaces are sensitive to lighting changes; for
this reason, researchers tend to use the chrominance components only and neglect the
luminance components, as in the RG and HS colour spaces. The factors that obstruct the
segmentation process are: complex backgrounds, illumination changes and low video
quality.
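A minimal MATLAB sketch of chrominance-only skin segmentation, here in the YCbCr space; the Cb/Cr thresholds are commonly quoted literature values and the file name is a placeholder, given as assumptions rather than the settings used in this project.

% Skin segmentation on chrominance only (luminance Y is ignored).
RGB   = imread('hand.jpg');               % hypothetical input image
ycbcr = rgb2ycbcr(RGB);
Cb = ycbcr(:,:,2);  Cr = ycbcr(:,:,3);
skin = Cb >= 77 & Cb <= 127 & Cr >= 133 & Cr <= 173;  % assumed thresholds
skin = bwareaopen(skin, 200);             % remove small speckles
imshow(skin);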
An image in MATLAB can be considered as a 3D matrix in which two dimensions index the
pixel position and the third dimension is dedicated to the colour plane.
The planes of each colour can be accessed separately using indexing of the form
IImage(Row, Column, Plane):
>> IImage(:,:,1)   % red plane
>> IImage(:,:,3)   % blue plane
Images consume a large amount of memory space. To save memory, the images are first
resized by down-sampling. The method preferred for this process is bicubic interpolation,
because in this method the output pixel value is a weighted average of the pixels in the
adjacent 4-by-4 neighborhood. The following MATLAB command uses bicubic interpolation by
default:
>> IImage = imresize(IImage, [256 256]);
After the resizing of the image, the Region of Interest (ROI) is selected. In computer
vision and optical character recognition (OCR), the ROI describes the borders of the
object under consideration and is the subset of an image which contains the information
desired. The following command was used:
>> roipoly(IImage(:,:,1), uint8(c), uint8(r));
roipoly creates an interactive tool for selecting a polygon when called with an image
alone, the mouse being used to identify the region by selecting the vertices of the
polygon; alternatively, as above, the vertex vectors c and r can be passed in directly.
The function returns a binary image as output, which can be used as a mask for mask
filtering if required. The functions are discussed in detail in the MathWorks
documentation [42]. A combined sketch of these preprocessing steps follows.
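The following MATLAB fragment puts the preprocessing steps above together; the file name and the polygon vertices are placeholders for illustration, not values from this project.

% Combined preprocessing sketch: read, resize, select ROI, mask.
IImage = imread('gesture01.jpg');              % hypothetical input image
IImage = imresize(IImage, [256 256]);          % bicubic by default
c = [50 200 200 50];  r = [40 40 220 220];     % assumed polygon vertices
mask = roipoly(IImage(:,:,1), c, r);           % binary ROI mask
red  = IImage(:,:,1);
red(~mask) = 0;                                % mask filtering on the red plane
imshow(red);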
Edge detection matters for several reasons: edges form the outline of an object and also
indicate the boundary between overlapping objects. These points, when joined with a line,
form the edges of the image (J. Canny, 1986 [40]). Edge detection is a very important
mathematical tool for feature detection and feature extraction, as discontinuities in
image brightness correspond to different aspects of an image, such as discontinuities in
depth, discontinuities in surface orientation, changes in material properties and
variations in scene lighting.
Sobel Operator: The Sobel edge detector computes the gradient by using the discrete
differences between the rows and columns of a 3×3 neighborhood. The Sobel operator is
based on convolving the image with a small, separable, integer-valued filter. Below, the
Sobel edge detection masks used to compute the gradient in the x (vertical) and y
(horizontal) directions are given.
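In MATLAB notation, the standard Sobel masks are the following pair (MATLAB's edge(I,'sobel') applies them internally):

% Standard 3x3 Sobel masks.
Gx = [-1 0 1; -2 0 2; -1 0 1];    % gradient in the x direction
Gy = [-1 -2 -1; 0 0 0; 1 2 1];    % gradient in the y direction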
Prewitt Operator: The Prewitt edge detection masks are among the oldest and
best-understood methods of detecting edges in images. The Prewitt edge detector uses the
following masks to approximate digitally the first derivatives Gx and Gy, computing the
gradient in the x (vertical) and y (horizontal) directions.
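The standard Prewitt masks, in the same convention as above:

% Standard 3x3 Prewitt masks.
Gx = [-1 0 1; -1 0 1; -1 0 1];    % gradient in the x direction
Gy = [-1 -1 -1; 0 0 0; 1 1 1];    % gradient in the y direction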
Laplacian of Gaussian (LoG): This detector finds edges by looking for zero crossings
after filtering f(x, y) with a Laplacian of Gaussian filter. In this method, Gaussian
filtering is combined with the Laplacian to break down the image where the intensity
varies, so as to detect the edges effectively. It finds the correct place of the edges
while testing a wider area around each pixel. A standard 5×5 Laplacian of Gaussian edge
detection mask is given below.
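One commonly used 5×5 LoG approximation is the following (the exact mask shown in the original figure may differ):

% A common 5x5 Laplacian of Gaussian mask (entries sum to zero).
LoG = [ 0  0 -1  0  0;
        0 -1 -2 -1  0;
       -1 -2 16 -2 -1;
        0 -1 -2 -1  0;
        0  0 -1  0  0];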
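The perimeter feature itself is obtained by counting edge pixels. A minimal sketch, assuming the binary edge image is stored in a variable E:

>> perimeter = sum(E(:));   % total number of 1-valued (edge) pixels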
The above MATLAB command sums up all the ones in the 2D binary matrix of the image,
giving us the approximated perimeter of the hand.
CHAPTER 4
RESULT AND CONCLUSION
4.1 RESULT
The processed images are classified into different classes of gestures, and the total
perimeter of each gesture is saved in a database under its respective class. In the
succeeding step, a query image is requested from the user and its features are matched
against the image features saved in the database, as sketched below.
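A minimal MATLAB sketch of this matching step; the database values, the file name and the nearest-perimeter decision rule are illustrative assumptions, not the exact values or thresholds used in this work.

% Match a query image's perimeter against per-class stored perimeters.
dbPerimeter = [830 1120 1460 1790 2050];       % assumed mean perimeter per class
queryGray   = rgb2gray(imread('query.jpg'));   % hypothetical query image
E = edge(queryGray, 'sobel');                  % binary edge map
queryPerimeter = sum(E(:));                    % perimeter feature of the query
[~, gestureClass] = min(abs(dbPerimeter - queryPerimeter));
fprintf('Query recognized as gesture value %d\n', gestureClass);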
The following are the results obtained from this work.
The following images show the successful recognition of a query image:
Table 4.1: Result

GESTURE   INPUT IMAGE   SUCCESSFUL RECOGNITION   RATE OF SUCCESSFUL
VALUE     CASES         CASES                    RECOGNITION (%)
1         20            20                       100
2         20            20                       100
3         20            20                       100
4         20            16                       80
5         20            13                       65
From the above table it is observed that the accuracy of the system is 89%, i.e. there
are 11 cases where there is a mismatch between the input image and the class to which it
is assigned.
4.2 CONCLUSION
From the above discussion we can conclude that the technique applied for recognizing the
gestures works well for the chosen application, without using any artificial intelligence
algorithms for the recognition.
For 20 images of each gesture, the 1st gesture value had 100% successful recognition;
the 2nd and 3rd gesture values also had 100% recognition; the 4th gesture value, however,
had 4 mismatches with the 3rd gesture value and was therefore recognized with 80%
accuracy; similarly, the 5th gesture value had 7 mismatches with the 4th gesture value,
giving an accuracy of 65%.
To increase the efficiency of the recognition algorithm used above, the orientations of
the gesture images should be similar to each other, with only minute differences. Another
important point that should be taken care of is that the algorithm is
application-specific, i.e. the threshold value for classifying the gesture needs to be
revised for different applications.
BIBLIOGRAPHY
[1]
Schlömer, T., Poppinga, B., Henze, N. & Boll, S. (2008), "Gesture recognition with a Wii
controller", in TEI '08: Proceedings of the 2nd International Conference on Tangible and
Embedded Interaction, ACM, New York, NY, USA, pp. 11–14.
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
Pei, M. (1984), The Story of Language, Plume, Rep. Rev. edition. ISBN-13: 978-0452008700.
[12]
[13]
[14]
0385055499.
[15]
[16]
McNeill, D. (1992), Hand and Mind: What Gestures Reveal about Thought,
University of Chicago Press. ISBN: 9780226561325.
[17]
Noury, N., Barralon, P., Virone, G., Boissy, P., Hamel, M. & Rumeau, P. (2003), "A
smart sensor based on rules and its evaluation in daily routines", in Engineering in
Medicine and Biology Society, 2003. Proceedings of the 25th Annual International
Conference of the IEEE, Vol. 4, pp. 3286–3289.
[18]
Bourke, A., O'Brien, J. & Lyons, G. (2007), "Evaluation of a threshold-based tri-axial
accelerometer fall detection algorithm", Gait & Posture 26(2), 194–199.
[19]
[20]
[21]
Black, M. J. & Jepson, A. D. (1998), "A probabilistic framework for matching temporal
trajectories: Condensation-based recognition of gestures and expressions", in
European Conference on Computer Vision (ECCV).
[22]
Lee, Y. W. (2008), "Application of the particle filter for simple gesture recognition",
in International Conference on Intelligent Computing, Vol. 5227 of Lecture Notes in
Computer Science, Springer Berlin / Heidelberg, pp. 534–540.
[23]
[24]
Li, F. & Wechsler, H. (2005), "Open set face recognition using transduction",
IEEE Trans. Pattern Analysis and Machine Intelligence 27(11), 1686–1697.
[25]
[26]
[27]
Ren, Y. & Zhang, F. (2009), "Hand gesture recognition based on MEB-SVM", in Embedded
Software and Systems, Second International Conference on, IEEE Computer Society,
Los Alamitos, CA, USA, pp. 344–349.
[28]
Cheng, Y. (1995), "Mean shift, mode seeking, and clustering", IEEE Trans. Pattern
Analysis and Machine Intelligence 17(8), 790–799.
[29]
Lee, H.-K. & Kim, J. H. (1999), "An HMM-based threshold model approach for gesture
recognition", IEEE Trans. Pattern Analysis and Machine Intelligence 21(10), 961–973.
[30]
Lu, W.-L. & Little, J. J. (2006a), "Simultaneous tracking and action recognition using
the PCA-HOG descriptor", in Computer and Robot Vision, 2006. The 3rd Canadian
Conference on, Quebec, Canada, pp. 6–13.
[31]
Yamato, J., Ohya, J. & Ishii, K. (1992), "Recognizing human action in time-sequential
images using hidden Markov model", in International Conference on Computer Vision and
Pattern Recognition, pp. 379–385.
[32]
[33]
[34]
[35]
[36]
[37]
Ju, S. X., Black, M. J., Minneman, S. & Kimber, D. (1997), "Analysis of gesture and
action in technical talks for video indexing", Technical report, American Association
for Artificial Intelligence. AAAI Technical Report SS-97-03.
[38]
Motion Capture and Analysis, Computer Vision and Image Understanding.
[39]
Bhuyan, M. K., Neog, D. R. & Kar, M. K. (2012), "Fingertip detection for hand pose
recognition", Int. J. on Computer Sc. and Engg. 4(3) (March 2012), 501–511.
[40]
[41]
[42]
[43]
Elmezain, M., Al-Hamadi, A., Appenrodt, J. & Michaelis, B. (2008), "A Hidden Markov
Model-based isolated and meaningful hand gesture recognition", World Academy of
Science, Engineering and Technology 41.
[44]
[45]
Gonzalez, R. C. & Woods, R. E., Digital Image Processing (3rd ed.), Pearson Education,
Upper Saddle River, New Jersey, pp. 165–168.
[46]
Shapiro, L. & Stockman, G., Computer Vision, Chs. 5, 7, 10, Prentice-Hall, Upper Saddle
River, New Jersey, pp. 157–158, 215–216, 299–300.
[47]
[48]
[49]
[50]
Marr, D. & Hildreth, E. (1980), "Theory of edge detection", Proc. R. Soc. Lond. B 207,
187–217; Rosenfeld, A. & Thurston, M. (1971), "Edge and curve detection for visual
scene analysis", IEEE Trans. Comput. C-20, 562–569.
[51]
Davis, L. S. (1975), "A survey of edge detection techniques", Computer Graphics and
Image Processing 4(3), 248–260.