
Application Specific Hand Gesture Recognition System

Ankit Mishra
ELECTRICAL ENGG. DEPT., GAUTAM BUDDHA UNIVERSITY
ANKITMISHRA723@GMAIL.COM
+919971553633

ABSTRACT

A hand gesture recognition system can be used as an interface between a human and a computer through hand gestures. This work presents a technique for a human-computer interface based on hand gesture recognition; the aim is to recognize gestures, and more specifically static hand gestures, performed by an individual in a video sequence. Many techniques have already been proposed in the literature for gesture recognition in specific environments (e.g. a laboratory) using the cooperation of several sensors (e.g. a camera network, an individual equipped with markers).
In this dissertation, a gesture recognition method based on edge detection as the feature extraction stage is proposed. The proposed algorithm works on the basic observation that as the number of fingers used in a gesture increases, the hand area also increases, and hence this can be used as an important criterion for recognition systems.
The results of simulation clearly indicate that the recognition performance is good only for a particular application and needs to be modified if the application area changes.
The current accuracy of the system is 89% at the particular intensity level at which the images were taken, in the MATLAB environment.

LIST OF FIGURES

Figure No.  Caption
1.1   Proposed Algorithm for the Project
2.1   Taxonomy of Gesture Categories
2.2   Instrumented Glove equipped with Potentiometer and Optic Fiber
2.3   Tools for Gesture Recognition: Clustering & Classifying Algorithms
2.4   Different Representation of Gestures
3.1   System Architecture
3.2   (a) Original Image; (b) Converted Greyscale Image
3.3   Edge Detection Illustration
3.4   Edge Detection using Pre-defined Operators
3.5   Edge Neighborhood
3.6   Sobel Operator Neighborhood
3.7   Prewitt Operator Neighborhood
3.8   Laplacian of Gaussian 5x5 Mask
3.9   Gradient Generating Approach
3.10  Convolution Mask used in the Project
3.11  Final Image after Convolution with the Mask

LIST OF TABLES

Table No.  Particulars
2.1   Comparison between Contact Device and Vision Device
4.1   Results

LIST OF ABBREVIATIONS

HCI   Human-Computer Interaction
LDA   Linear Discriminant Analysis
MMI   Man-Machine Interaction
DoF   Degree of Freedom
HOG   Histogram of Oriented Gradient
ISM   Implicit Shape Model

CONTENTS

CANDIDATE'S DECLARATION
ACKNOWLEDGEMENT                                           ii
DEDICATION                                                iii
ABSTRACT                                                  iv
LIST OF FIGURES                                           v
LIST OF TABLES                                            vii
LIST OF ABBREVIATIONS                                     viii
CONTENTS                                                  ix
Chapter 1 INTRODUCTION                                    1
  1.1 General                                             2
  1.1.1 Computer Vision                                   3
  1.1.2 Behaviour Understanding                           3
  1.1.3 People Detection and Body Part Detection          4
  1.1.4 People and Body Part Tracking                     4
  1.1.5 Posture Detection                                 5
  1.1.6 Proposed Algorithm                                6
  1.2 Work Objective and Motivation                       7
  1.3 Organization of Dissertation Work                   8
Chapter 2 LITERATURE REVIEW                               9
  2.1 Introduction                                        10
  2.2 Definition and Nature of Gesture                    10
  2.3 Technology Available for Recognition                13
  2.4 Advantage and Disadvantage of both Technologies     16
  2.5 Tools for Gesture Recognition                       17
  2.6 Gesture Representation                              18
Chapter 3 GESTURE MODELLING                               23
  3.1 Introduction                                        24
  3.2 Acquisition of Data                                 24
  3.3 Gesture Modelling                                   27
  3.4 Image Preprocessing                                 28
  3.5 Edge Detection                                      30
  3.5.1 Generating Gradient Images using Predefined Filters   32
  3.5.2 Approach for Gradient                             34
  3.5.3 Edge Operator used in the Project                 35
  3.6 Feature Extraction                                  36
Chapter 4 RESULT, CONCLUSION and FUTURE SCOPE             38
  4.1 Result                                              39
  4.2 Conclusion                                          41
Bibliography                                              42

ACKNOWLEDGEMENT

I would like to express my gratitude and thanks to my supervisor, Dr.


Shabana Urooj, Assistant Professor, School of Engineering, Gautam
Buddha University, for her valuable guidance, and constant support
throughout my project work which provided the much needed continuity for
its completion.
Next, I want to express my Respects to Prof. Omar Farooq, Department of
Electronics Engineering, Aligarh Muslim University, for his constructive
suggestions, healthy criticism, devoted attitude and expert opinions in the
area. His valuable guidance and impeccable support led me to complete the
work present in this dissertation. I will be always indebted to him for his
support.
I would also like to thank my classmates Abhishek Pandey, Ishita Bahal,
Jagrati Arya and especially my friends Anand Bharadwaj, Ayush Mittal,
Ankit Gupta, Vaibhav Kaushik, Mohit Attri, Uttkarsh Yadav,
Sumit Tarish Pratap and Vikrant Rana for all the thoughtful and mind-stimulating
late night discussions we had, which prompted us to think
beyond the obvious. I've enjoyed their companionship so much during my
stay at GBU, Greater Noida.
Ankit Mishra

CANDIDATE'S DECLARATION

I hereby declare that the work embodied in this dissertation report entitled "Application
Specific Gesture Recognition System", submitted in partial fulfilment of the
requirements for the award of the degree of the 5-Year Integrated Dual Degree
Programme B.Tech. (Electrical Engineering) + M.Tech. (Instrumentation &
Control), to the School of Engineering, Gautam Buddha University, Greater Noida, U.P.,
India, is an authentic record of my own work carried out from December 2014 to May
2015 under the guidance and supervision of Dr. Shabana Urooj, Assistant Professor,
Department of Electrical Engineering, School of Engineering, Gautam Buddha
University, Greater Noida, U.P., India, and the co-supervision of Prof. Omar Farooq,
Department of Electronics Engineering, Aligarh Muslim University, Aligarh, U.P., India.
To the best of my knowledge the matter embodied in this dissertation report has not been
submitted for the award of any other degree or diploma.
Date:
Place: Greater Noida

(Ankit Mishra)

CERTIFICATE
This is to certify that the above statement made by the candidate is correct to the best of
my knowledge and belief.

(Dr. Shabana Urooj)
Department of Electrical Engineering
School of Engineering
Gautam Buddha University

(Prof. Omar Farooq)


Department of Electronics Engineering
Aligarh Muslim University

The Viva-Voce Examination of Mr. Ankit Mishra has been conducted/held on ___/___/2015.

(External Examiner)

Dedicated to

My Parents for their


Immense Moral Support

To whom it may concern!

Keep reaching for that Rainbow!!



CHAPTER - 1
INTRODUCTION
1.1 General
Gesture was the first mode of communication for the primitive cave men and nowadays
gesture recognition has been a prominent domain of research. Gestures are an important
form of human interaction and communication: hands are usually used to interact with
things (pick up, move) and our body gesticulates to communicate with others (no, yes,
stop). Thus, a wide range of gesture recognition applications has emerged, thanks to a
certain level of maturity reached by sub-fields of machine intelligence (machine learning,
cognitive vision, multi-modal monitoring). For example, humans can interact with machines
through gesture recognition devices such as the Wii-mote [1], CyberGlove [2] and
multi-touch screens [3]. Nonetheless, contact-device based methods are intrusive
and require user cooperation to use the device correctly. Therefore, vision-based
methods propose to overcome these limits and allow the recognition of gestures remotely,
with little or no user cooperation (e.g. body markers, clothing conditions). Since it is
preferable to avoid these constraints, vision-based methods have to overcome several
challenges such as illumination changes and low-contrast and/or noisy videos.
Nevertheless, methods based on cameras tend to be brittle and less precise than the ones
based on contact devices.
The main challenge of vision-based gesture recognition is to cope with the large variety
of gestures. Recognizing gestures involves handling a considerable number of degrees of
freedom (DoF), huge variability of the 2D appearance depending on the camera view
point (even for the same gesture), different silhouette scales (i.e. spatial resolution) and
many resolutions for the temporal dimension (i.e. variability of the gesture speed).


The main concepts related to the topic of gesture recognition from video sequence are:
computer vision, behaviour understanding, people and body part detection, people and
body part tracking, and posture detection.

1.1.1 Computer Vision


Computer Vision, also called Machine Vision (when focusing on industrial applications),
is the broader research field of gesture recognition. On the frontiers of artificial
intelligence, Machine Learning (Cognitive Vision) and Image/Signal Processing, it aims
at developing artificial systems that analyze and understand video streams (i.e. sequences
of images) or static images; such systems are not, generally, intended to emulate human vision.
Computer vision is considered as the crossroads of several research fields: Mathematics
(Geometry, Statistical Analysis, and Optimization problems), Physics (Optics), Imaging
(smart cameras), Robotics (robot vision), and Neurobiology (biological vision).
Biologically inspired methods (e.g. attentional vision) are also part of computer vision.
Algorithms in computer vision are generally categorized into three levels: (1) low-level
vision algorithms (i.e. image processing related directly to pixels without deep analysis),
(2) middle-level vision algorithms (i.e. pattern matching, object detection and tracking)
and (3) high-level vision algorithms (i.e. interpretation and semantics extraction from
images).

1.1.2 Behavior Understanding


Behaviour Understanding is the sub-field of computer vision that is interested in
detecting events and activities of human beings. Activities and events can be primitive or
complex. This includes both detecting events, such as an intrusion into a safety zone (e.g.
forbidden access), and possibly reacting to them (e.g. firing an alarm, calling the police). Gesture
recognition is the branch of behaviour understanding that focuses on human body
motion in order to recognize a gesture as a meaningful behaviour for later analysis.
Algorithms for behaviour understanding belong to high-level vision.


1.1.3 People Detection and Body Part Detection


People detection and body part detection are concerned with the detection of people and/or
their body parts. These middle-level algorithms often require low-level algorithms like
background updating and foreground segmentation. The main challenge of people
detection is to cope with different styles of clothing, various types of posture and
occlusions (partial/total, with static objects or with other people). The three main
categories of people detectors are: (1) Holistic Detectors, (2) Part-based Detectors and (3)
Hybrid Detectors using both global and local cues. In holistic detection, a global search in
the whole frame is performed; people are detected when the features, considered around a
local window, meet some criteria. Global features can be used, such as edge templates [4],
or local features like the Histogram of Oriented Gradients (HoG) [5]. Concerning part-based
methods, people are considered as collections of parts. Part hypotheses are
generated by learning local features such as edgelet features [6] and orientation features [7].
Then the part hypotheses are joined to form the best assembly of people hypotheses. Note
that the task of part detection (e.g. face, arm, legs) is challenging and difficult. As for
hybrid detectors, they combine both local descriptors and their distribution within a
global descriptor. For instance, [9] introduce the Implicit Shape Model (ISM), where the
global descriptor is the silhouette composed of edges matching a learned model. In
the training process, a codebook of local appearance is learned. During the detection
process, extracted local features are matched against the codebook entries, and each
match casts a vote for the pedestrian hypotheses, which are refined to obtain the final
detection results.

1.1.4 People and Body Part Tracking


People and body part tracking consists of matching the people or their parts over a lapse of
time (i.e. several frames). This is a wide domain of research in computer vision and is of
paramount importance in gesture recognition. The motion can be local (e.g. motion of
feature points, motion of body parts) or global (e.g. the whole body motion signature).
The main goal is to extract people's motion features in order to analyze them for gesture
recognition. Once the movement of the body or its parts is detected, computations are
made to identify the type of motion; this is known as the motion analysis step. This
analysis may then be used by different middle-level algorithms: object trackers (when we
deal with object motion) and gesture recognition (when we deal with object and body part
motion).

1.1.5 Posture Detection


Posture detection can be viewed as a sub-field of gesture recognition, since a posture is a
static gesture. In practice, posture recognition usually lies at the crossroads between
people detection and gesture recognition. Sometimes we are only interested in the posture
at a given time, which can be obtained by a people detector [11]. In other cases, posture
detection can be considered as a first step for gesture recognition, for instance
by associating postures with states of a Finite State Machine (FSM) [10]. The challenges of
posture recognition are essentially the same as those of gesture recognition, except that the
temporal aspect is not taken into account. As with the equivalent trade-off in gesture recognition, an
adequate balance between accuracy, precision and processing time is usually difficult to
find.

1.1.6 Proposed Algorithm


Figure 1.1: Flowchart of the Project Approach


The camera is made to monitor and record the hand movements continually. The
system is initially in standby mode, and a particular trigger is passed to it to initialize the
process; this indication should be some suitably chosen, previously specified gesture. On
activation, the camera performs image acquisition for subsequent processing.
Before moving forward with the processing of the image, we perform movement detection
using the frames of the recorded gesture sequence, to discard intermediate and
inappropriate frames between two successive, legitimate gestures. In this way only those
frames in which the hand is static for at least a certain amount of time are used for image
processing, and all others with motion or blurring are discarded. This time interval is entirely
dependent on the convenience of the user. Only the frames of primary importance, i.e.
frames containing a clean gesture, are kept as required by the system, and
every other frame is discarded.
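A minimal MATLAB sketch of this frame-selection idea is given below; the video file name, the frame-difference threshold and the minimum number of still frames are illustrative assumptions, not values taken from the actual system.

% Sketch: keep only frames in which the hand stays static for a while.
v = VideoReader('gesture.avi');     % assumed recorded gesture sequence
prev = [];                          % previous grayscale frame
stillCount = 0;                     % consecutive "no motion" frames seen
keptFrames = {};                    % frames selected for further processing
while hasFrame(v)
    f = rgb2gray(readFrame(v));
    if ~isempty(prev)
        d = mean(abs(double(f(:)) - double(prev(:))));  % mean frame difference
        if d < 2.0                  % assumed "hand is static" threshold
            stillCount = stillCount + 1;
        else
            stillCount = 0;         % motion or blur: discard and reset
        end
        if stillCount == 10         % assumed minimum still duration (frames)
            keptFrames{end+1} = f;  % keep one representative frame
        end
    end
    prev = f;
end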

1.2 Work Objective and Motivation


Gesture recognition from video sequences is one of the most important challenges in
computer vision and behaviour understanding, since it offers the machine the ability to
identify, recognize and interpret human gestures in order to control some devices, to
interact with some human-machine interfaces (HMI) or to monitor some human activities.
Generally defined as any meaningful body motion, gestures play a central role in
everyday communication and often convey emotional information about the gesticulating
person. During the last decades, researchers have been interested in automatically
recognizing human gestures for several applications.
This dissertation project aims at building a simple application-based gesture recognition
system, which does not include the complexity of artificial intelligence algorithms yet
gives an accuracy of 89%. Here the environment consists of a physical world in which a
sensor (here a camera) interfaces with the software for the processing of images.
The main challenge of vision-based gesture recognition is to cope with the large variety
of gestures, which can be generated due to change in a number of parameters (e.g. Light
Intensity, Frame sequence speed etc.). Recognizing gestures involves handling a
considerable number of degrees of freedom (DoF), huge variability of the 2D appearance
depending on the camera view point, different silhouette scales and many resolutions for
the temporal dimension.

1.3 Organization of the Dissertation work


Chapter 1 gives an overview of this dissertation, which is structured into four main
chapters. The next chapter presents the state of the art of human gesture recognition. The
proposed method is overviewed and detailed in Chapter 3, and the last chapter presents the
results and a conclusion, with a review of the contributions and an overview of perspectives.
Chapter 2 recalls the previous work on gesture recognition by presenting an up-to-date
state of the art, with a brief presentation of the types of gesture and the technologies
currently available for recognizing them.
Chapter 3 presents a broader view of how the work has been approached. First there is a
brief discussion of the types of acquisition devices available, and then the pre-processing
stages are briefly explained.
Chapter 4 presents the results and conclusion of this project work.


CHAPTER - 2
LITERATURE REVIEW

2.1 Introduction
Human gesture recognition consists of automatically identifying and interpreting human
gestures using a set of sensors (e.g. cameras, gloves). Here we present a literature review
of the state of the art in human gesture recognition, which includes gesture representations,
recognition techniques and applications. Before we proceed with the literature on gesture
recognition, it is important to understand the definition and the nature of gesture as seen
in the literature.

2.2 Definition and Nature of Gesture


Generally speaking, we can define a gesture as a body movement. A gesture is a non-vocal
communication, used instead of or in combination with verbal communication,
intended to express meaning. Gestures constitute a major and important means of human
communication. Indeed, [11] enumerates seven hundred thousand non-verbal
communication signals, which include fifty thousand two hundred facial expressions [12]
and five thousand hand gestures [13]. However, the significance of a gesture strongly
differs from one culture to another: there is no invariable or universal meaning for a gesture,
i.e. the semantic interpretation of a gesture depends strictly on the given
culture. In addition, a gesture can depend on an individual's state: for example, hand
gestures are synchronous and co-expressive with speech, glance and facial expressions,
which reflect the individual's mood. According to [14], when two people engage in a
discussion, thirty-five per cent of their communication is verbal and sixty-five per cent is
non-verbal. Non-verbal communication can be classified into seven different
categories.
1. Body language: facial expressions, postures, eye gaze (e.g. amount of gaze, frequency
   of glances, visual contact, patterns of fixation, blink rate), gestures, attitude.
2. Appearance: cloth, personal effects (e.g. jewelry, sunglasses).
3. Voice: pitch, tone, intonation, loudness, flow and pause, silence, laughter.
4. Space and distance: proxemics and proxemic behaviour categories.
5. Colors: cold or hot colors, color interpretation.
6. Chronemics (relation to time): punctuality, willingness to wait, speed of speech,
   willingness to listen, monochronic time schedule, polychronic time schedule.
7. Haptics: touching as non-verbal communication depends on the context of the
   situation, the relationship between the communicators and the manner of touch. Touching
   is a particular type of gesture: handshakes, holding hands, kissing (cheek, lips, and
   hand), high fives, licking, scratching.

Gestures can be categorized with respect to different criteria. For instance, [15]
distinguishes five types of gestures:
1. Emblems: an emblem (or quotable gesture or emblematic gesture) is a gesture which
   can be directly translated into short verbal communication, such as a goodbye wave used
   in place of words. These gestures are very culture-specific.
2. Illustrators: an illustrator is a gesture that depicts what the communicator is saying
   verbally (e.g. emphasizing a key point in the speech, or miming a throwing action when
   pronouncing the words "he threw"). These gestures are inherent to the
   communicator's thoughts and speech. Also called gesticulations, they can be
   classified into five subcategories as proposed by [16]:
   - Beats: rhythmic and often repetitive flicks (short and quick) of the hand or the
     fingers.
   - Deictic gestures: pointing gestures, which can be concrete (pointing to a real
     location, object or person) or abstract (pointing to an abstract location or a period of
     time).
   - Iconic gestures: hand movements that depict a figural representation or an action
     (e.g. a hand moving upward with wiggling fingers to depict tree climbing).
   - Metaphoric gestures: gestures depicting abstractions.
   - Cohesive gestures: gestures that are thematically related but temporally separated,
     generally due to an interruption of the current communicator by another one.
3. Affect displays: an affect display is a gesture that conveys emotion or the
   communicator's intentions (e.g. that the communicator is embarrassed). This type of
   gesture is less dependent on the culture.
4. Regulators: a regulator is a gesture that controls interaction (e.g. controlling turn-taking in
   conversation).
5. Adaptors: an adaptor is a gesture that enables the release of body tension (e.g. head
   shaking, quickly moving one's leg). These gestures are not used intentionally during
   communication or interaction: they were at one point used for personal convenience
   and have since turned into a habit.

A gesture can also be conscious (intentional) or non-conscious (reflex, adaptors). In
addition, a gesture can be dynamic or static. In the latter case, the gesture becomes a
posture. Finally, we can classify gestures according to the body parts involved in the
gesture: (1) hand gestures, (2) head/face gestures and (3) body gestures. In this work we
focus on static hand gestures.


Figure 2.1: A Taxonomy of Gesture Categories [43]

2.3 Technology available for Recognition


There are two main kinds of devices: (1) contact-based devices and (2) vision-based
devices. Hereafter we discuss the two kinds of devices.
1. Contact based Devices: Contact-based devices are various: accelerometers, multi-touch
screens and instrumented gloves are, for instance, the main technologies used. Some
devices, like the Apple iPhone, include several detectors: a multi-touch screen and an
accelerometer, for instance. Other devices use only one detector, such as the accelerometers of
the Nintendo Wii-mote. We can categorize these devices into five
categories:

- Mechanical: for instance, Immersion proposes the CyberGlove II, which
  is a wireless instrumented glove for hand gesture recognition, and Animazoo
  proposes a body suit called IGS-190 to capture body gestures. This kind
  of device is usually used in association with other types of device. For
  instance, [2] introduce a method for trajectory modelling in gesture
  recognition with CyberGloves and magnetic trackers. Similarly, the IGS-190
  body suit is coupled with eighteen inertial devices (gyroscopes) which enable
  motion detection.
- Inertial: these devices measure accelerations and rotations in order to detect motion.
  Two types of device are available: accelerometers (e.g. the Wii-mote) and gyroscopes
  (e.g. the IGS-190). [1] propose to recognize gestures with a Wii controller independently
  from the target system using Hidden Markov Models (HMM); the user can learn
  personalized gestures for multimodal intuitive media browsing. [17] and [18] propose to
  detect falls among normal gestures using accelerometers.
- Haptics: multi-touch screens are becoming more and more common in our lives (e.g.
  tablet PCs, the Apple iPhone). [3] propose to recognize multi-touch gestural
  interactions using HMM.
- Magnetics: these devices measure the variation of an artificial magnetic field
  for motion detection. Unlike inertial devices, magnetic devices have some
  health issues due to the artificial electro-magnetism.
- Ultrasonic: motion trackers of this category are composed of three kinds of
  device: (1) sonic emitters that send out the ultrasound, (2) sonic discs that
  reflect the ultrasound (worn by the person) and (3) multiple sensors that time
  the return pulse. The position is computed according to the time of
  propagation/reflection and the speed of sound, and the orientation is then
  triangulated. These devices are not precise and have low resolution, but they
  are useful for environments that lack light or contain magnetic obstacles or
  noise.

Figure 2.2: Instrumented Glove equipped with potentiometer and Optic Fiber[44]


2. Vision Based Technology: Vision-based gesture recognition systems rely on one or
several cameras in order to analyze and interpret the motion from the captured video
sequences. Similarly to contact devices, vision-based devices are various. For
instance, we can distinguish the following sensors:
- Infrared cameras: typically used for night vision, infrared cameras generally give
  a brittle view of the human silhouette.
- Traditional monocular cameras: the most common cameras due to their
  low cost. Specific variants can be used, such as fish-eye cameras for wide-angle
  vision and time-of-flight cameras for depth (distance from the camera)
  information.
- Stereo cameras: stereovision directly delivers 3D world information by
  embedding the triangulation process.
- PTZ cameras: Pan-Tilt-Zoom cameras enable the vision system to focus on
  particular details in the captured scene in order to identify their nature more
  precisely.
- Body markers: some vision systems require body markers to be placed in order to
  detect the human body motion. There are two types of marker: (1) passive,
  such as reflective markers shining when strobes hit them, and (2) active, such as
  markers flashing LED lights (in sequence). In such systems, each camera,
  lighting with strobe lights or normal lights, delivers 2D frames with marker
  positions from its view. Eventually, a pre-processing step is in charge of
  interpreting the views and positions into 3D space.

2.4 Advantage and Disadvantage of both Technologies


Both of the enabling technologies have their pros and cons. For instance, contact devices
require user cooperation and can be uncomfortable to wear for a long time, but they are
precise. Vision-based devices do not require user cooperation, but they are more difficult
to configure and suffer from the occlusion problem; on the other hand, contact devices are
more precise, except the ultrasonic ones. Also, contact devices generally do not have occlusion
problems, except the magnetic sensors (metal obstacles) and ultrasonic sensors (mechanical
obstacles). Concerning health issues, we notice that some contact devices can raise some
problems: allergy to the mechanical sensor material, cancer risk for magnetic devices.
Table 2.1: Comparison between Contact and Vision based devices

Criterion               Contact devices    Vision devices
User cooperation        Yes                No
User intrusive          Yes                No
Precise                 Yes/No             No/Yes
Flexible to configure   Yes                No
Flexible to use         No                 Yes
Occlusion problem       No (Yes)           Yes
Health issues           Yes (No)           No

2.5 Tools for Gesture Recognition


The problem of gesture recognition can generally be divided into two sub-problems: (1) the
gesture representation problem (cf. next section) and (2) the decision/inference problem.
Independently of the device used and the gesture representation, several tools for
decision/inference can be applied to gesture recognition. To detect static gestures (i.e.
postures), a general classifier or a template matcher can be used. However, dynamic
gestures have a temporal aspect and require tools that handle this dimension, like Hidden
Markov Models (HMM), unless the temporal dimension is modelled through the gesture
representation (e.g. a motion-based model).


Here we review only the three most common ones, on which current research is focused:
(1) particle filtering and the condensation algorithm, (2) learning algorithms for statistical
modelling and (3) automata-based approaches (such as Finite State Machines (FSM)).

Particle Filtering and Condensation Algorithm for Gesture Recognition


The goal of particle filtering, also called Sequential Monte Carlo method (SMC), is a
probabilistic inference of the object motion given a sequence of measurements.
Introduced by [19], condensation (i.e. Conditional Density Propagation) is an
improvement of particle filtering for visual tracking which has been extended to
gesture recognition [20] and [21]. The main idea behind condensation is to estimate
the future probability density by sampling from the current density and weighting the
samples by some measure of their likelihood. Recently, [22] extended the latter
method to two-hand motion models. The author describes the state of a particle at
a given time by four parameters: the integer index of the predictive model, the current
position in the model, a scaling factor of amplitude and a time-dimension scale factor.
The three latter parameters are duplicated to take into account the motion of each
hand. The recognition of gesture is done through three filtering stages: initialization,
prediction and updating. A motion model, consisting of the average horizontal and
vertical projections of the object velocities, is associated with the filtering process in
order to recognize gestures.

Learning Algorithms for Gesture Statistical Modelling


Learning algorithms are essentially used for feature extraction based methods. There
are two main variants of learning algorithms: (1) linear learner and (2) non-linear
learner. The former is suited for linearly separable data and the latter for the other
cases. Another way to categorize learning algorithms is to consider their outcome.
Thus, we distinguish supervised learning (i.e. matching samples to labels),
unsupervised learning (i.e. only sample clusters without labels), semi-supervised
learning (i.e. a mix of labelled and unlabelled data), reinforcement learning (i.e. learning
policies given observations [23]), transduction (i.e. supervised learning with
prediction [24]) and learning to learn (i.e. learning its own inductive bias based on
previous experience [25], [26]). The choice of the learning algorithm depends mainly
on the chosen gesture representation. For example, [27] propose to recognize static
hand gestures by learning the contour-line Fourier descriptors of a segmentation
image obtained by the mean shift algorithm [28]. The classification is done by a support
vector machine combined with the minimum enclosing ball (MEB) criterion.

Automata-based Approaches
Along with learning algorithms, automata-based methods are the most common approaches
in the literature. For instance, FSMs, HMMs and PNF (i.e. Past-Now-Future) networks are
sorts of automata with a set of states and a set of transitions. The states represent
static gestures (i.e. postures) and the transitions represent allowed changes with temporal
and/or probabilistic constraints. A dynamic gesture is then considered as a path
between an initial state and a final state. [29] proposed an approach for gesture
recognition using an HMM-based threshold. [30] presented a method for recognizing
human gestures using a PCA-HOG global descriptor; the recognition is done by
maximum likelihood estimation using the HMM classifier proposed by [31]. [32] detected
human actions using PNF propagation of temporal constraints. The main limitation of the
approaches based on automata is that the gesture model must be modified when a new
gesture needs to be recognized. Moreover, the computational complexity of such approaches is
generally huge since it is proportional to the number of gestures to be recognized,
which is not the case for methods based on other tools.


Figure 2.3: Tools for Gesture Recognition: Clustering & Classifying Algorithms [43]

2.6 Gesture Representation


Several gesture representations and models have been proposed to abstract and model
human body parts motion. We distinguish two main categories of method: (1) 3D model
based methods and (2) appearance based methods. Moreover, we can split the proposed
models in two kinds according to the spatial and temporal aspects of gestures: (1) posture
automaton models in which the spatial and the temporal aspects are modelled separately
and (2) motion models in which there is a unique spatial-temporal model.


Figure 2.4: Different Representation of Gestures[43]


1. 3D Model based Methods: A 3D model defines the
3D spatial description of the human body parts. The temporal aspect is generally
handled by an automaton which divides the gesture time into three phases [16]:
(1) the preparation or pre-stroke phase, (2) the nucleus or stroke phase and (3) the
retraction or post-stroke phase. Each phase can be represented as one or several
transitions between the spatial states of the 3D human model. The main advantage
of 3D model based methods is to recognize gestures by synthesis: during the
recognition process, one or more cameras look at the real target, compute the
parameters of the model that matches the real target spatially and then follow the
latter's motion (i.e. update the model parameters and check whether they
match a transition in the temporal model). Thus, the gesture recognition is generally
precise (especially the start and end time of the gesture). However, these methods
tend to be computationally expensive unless implemented directly in dedicated
hardware. Some methods (e.g. [33]) combine silhouette extraction with 3D model
projection fitting by finding the target's self-orientation. Generally, three kinds of
model are usually used:


- Textured kinematic/volumetric models: these models contain very high detail of
  the human body: skeleton and skin surface information.
- 3D geometric models: these models are less precise than the former in terms of
  skin information but still contain essential skeleton information.
- 3D skeleton models: these are the most common 3D models due to their simplicity
  and higher adaptability; the skeleton contains only the information about the
  articulations and their 3D degrees of freedom (DoF).

2. Appearance-Based Methods: Concerning appearance-based methods, two main
sub-categories exist: (1) 2D static model based methods and (2) motion-based methods.
Each sub-category contains several variants. For instance, the most used 2D models are:
- Color-based models: methods with this kind of model generally use body markers
  to track the motion of the body or the body part. For example, [34] propose a
  method for hand gesture recognition using multi-scale colour features,
  hierarchical models and particle filtering.
- Silhouette geometry based models: such models may include several geometric
  properties of the silhouette, such as perimeter, convexity, surface, compactness,
  bounding box/ellipse, elongation, rectangularity, centroid and orientation. [35]
  used the geometric properties of the bounding box of the hand skin to recognize
  hand gestures.
- Deformable gabarit based models: these are generally based on deformable active
  contours (i.e. snakes parametrized with motion, and their variants [36]). [37] used
  snakes for the analysis of gestures and actions in technical talks for video
  indexing.



CHAPTER - 3
GESTURE MODELLING

3.1 Introduction
The Association for Computing Machinery defines human-computer interaction as "a
discipline concerned with the design, evaluation and implementation of interactive
computing systems for human use and with the study of major phenomena surrounding
them."
Gesture recognition is an important, yet difficult task. It is important because it is a
versatile and intuitive way to develop new, more natural and more human-centered forms
of human-machine interaction. Moreover, it is difficult because it involves the solution of
many challenging subtasks such as robust identification of hands and other body parts,
motion modeling, tracking, pattern recognition and classification.
A human hand is an articulated object with 27 bones and 5 fingers, and each of these fingers
consists of three joints. Human hand joints can be classified as flexion, twist, directive or
spherical depending upon the type of movement or the possible rotation axes. In total, the human
hand has approximately 27 degrees of freedom. As a result, a large number of gestures
can be generated [38][39]; the categories of gesture are tabulated in Figure 2.1, and the
different approaches that can be used for these different kinds of gesture are tabulated in Figure 2.3.

3.2 Acquisition of Data


The first stage of any vision system is the image acquisition stage. Only after the image
has been satisfactorily obtained can the different approaches be successfully applied.
However, if the image has not been acquired satisfactorily then the intended tasks may
not be achievable, even with the aid of some form of image enhancement.

Figure 3.1: System Architecture


There are a number of input devices for data acquisition.
A first choice for a two-dimensional image input device may be a television camera, whose
output is a video signal:
- The image is focused onto a photoconductive target.
- The target is scanned line by line horizontally by an electron beam.
- An electric current is produced as the beam passes over the target.
- The current is proportional to the intensity of light at each point.
- The current is tapped to give a video signal.

However, this form of device has several disadvantages:
- Limited resolution: a finite number of scan lines (about 625) and a frame rate of 30 to
  60 frames/sec.
- Distortion: nonlinear video output with respect to light intensity.

By far the most popular two-dimensional imaging device is the charge-coupled device
(CCD) camera:
- A single IC device.
- Consists of an array of photosensitive cells.
- Each cell produces an electric current dependent on the incident light falling on it.
- Video signal output.
- Less geometric distortion.
- More linear video output.

For a three-dimensional image, more complex acquisition devices are used:
- Laser ranging systems: laser ranging works on the principle that the surface of
  the object reflects laser light back towards a receiver, which then measures the
  time (or phase difference) between transmission and reception in order to
  calculate the depth. Most work at long distances and therefore have inadequate
  depth resolution.

Methods based on shape from shading employ photometric stereo techniques to produce
depth measurements. Using a single camera, two or more images are taken of an object in
a fixed position but under different lighting conditions. By studying the changes in
brightness over a surface and employing constraints on the orientation of surfaces, certain
depth information may be calculated.
Stereoscopy is a technique for measuring range by triangulation to selected locations in a
scene imaged by two cameras; the primary computational problem of stereoscopy
is to find the correspondence of various points in the two images.
Data gloves are devices for precise data input with high accuracy and high speed. They
can provide accurate data on joint angles, rotation, location, etc. for application in different
virtual reality environments. These gloves are commercially available in the market and
have already been discussed in Chapter 2.
Coloured markers attached to the human skin are also used as an input technique, and hand
localization is then done by colour localization.
Low-cost web cameras, as preferred in this project work, can also be used as input
devices.

3.3 Gesture Modelling


After acquiring the data from the input device, the next step involves the modelling of the hand,
which further includes the steps described below.

Segmentation is the process of dividing the input image into regions separated by
boundaries [42]. The segmentation process depends on the type of gesture: if it is a
dynamic gesture, the hand gesture needs to be located and tracked [42]; if it is a static
gesture (posture), only the input image has to be segmented.
To locate the hand, a bounding box is generally used, specified depending on the skin
colour [43]; for tracking the hand there are two main approaches:
either the video is divided into frames and each frame is processed alone, or
some tracking information such as shape and skin colour is used with tools like the Kalman Filter
[42].
In [42], the hand is segmented using the skin colour; it is the easiest possible way, as skin
colour is invariant to translation and rotation changes. The Gaussian model is the parametric
technique, and the histogram-based technique is the non-parametric one.
A drawback of skin segmentation is that it is affected by changes in the illumination conditions.
In [44] segmentation is done using infrared cameras and the range information generated by a
time-of-flight (ToF) camera; these can detect different skin colours but are affected by changes
in temperature.
Data gloves and coloured markers can also be used for segmentation, as they provide
exact information about the orientation and position of the palm and fingers.
The colour space used in a specific application plays an essential role in the success of the
segmentation process; however, colour spaces are sensitive to lighting changes. For this
reason, researchers tend to use chrominance components only and neglect the luminance
component, as in the RG and HS colour spaces. The factors that obstruct the
segmentation process are: complex background, illumination changes and low video
quality.
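As a rough illustration of the chrominance-based segmentation discussed above, the sketch below thresholds the Cb and Cr components of a colour image; the image name and the threshold ranges are assumed values that would need tuning for a particular camera and lighting condition.

% Sketch: skin-colour segmentation in the YCbCr space (luminance ignored).
rgbImg = imread('hand.jpg');                 % assumed input image
ycc    = rgb2ycbcr(rgbImg);
Cb     = ycc(:,:,2);
Cr     = ycc(:,:,3);
mask   = (Cb >= 77 & Cb <= 127) & (Cr >= 133 & Cr <= 173);  % assumed skin ranges
mask   = bwareaopen(mask, 500);              % remove small noisy blobs
mask   = imfill(mask, 'holes');              % fill holes inside the hand region
imshow(mask);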

3.4 Image Pre-Processing


Image pre-processing is essential to enhance the image quality for better results. A
coloured image consists of three planes of colours (Red, Green, Blue); it can be
considered as a 3D matrix in which two dimensions index the pixel position and the third
dimension is dedicated to the colour plane.
The planes of each colour can be accessed separately using MATLAB indexing of the form
IImage(row, column, plane):
>> IImage(:,:,1)   % red plane
>> IImage(:,:,2)   % green plane
>> IImage(:,:,3)   % blue plane

Images consume a large amount of memory space. To save memory, the images are
first resized by down-sampling. The method preferred for this process is bicubic
interpolation, because in this method each output pixel value is a weighted average
of the pixels in the nearest 4-by-4 neighborhood. The following MATLAB command uses
bicubic interpolation by default.
>> IImage = imresize(IImage, [256 256]);
After resizing the image, the Region of Interest (ROI) is selected. In
computer vision and optical character recognition (OCR), the ROI describes the borders
of the object under consideration and is a subset of the image containing the desired
information. The following command was used.
>> roipoly(IImage(:,:,1), uint8(c), uint8(r));
roipoly creates an interactive tool for selecting a polygon: the mouse is used to identify
the region by selecting the vertices of the polygon. The function returns a binary image as
its output, which can be used as a mask for mask filtering if required. These functions are
discussed in detail in the MathWorks documentation [42].
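Putting the individual commands above together, a minimal pre-processing sketch could look as follows; the file name is a placeholder, and the ROI is selected interactively rather than with pre-computed vertex vectors c and r.

% Sketch: read an image, resize it, access its colour planes and pick an ROI.
IImage = imread('gesture1.jpg');        % assumed input file
IImage = imresize(IImage, [256 256]);   % bicubic interpolation by default
R = IImage(:,:,1);                      % red plane
G = IImage(:,:,2);                      % green plane
B = IImage(:,:,3);                      % blue plane
Grey = rgb2gray(IImage);                % greyscale version for later edge detection
mask = roipoly(Grey);                   % interactive polygon ROI; returns a binary mask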


Figure 3.2: (a) Original Image; (b) Greyscale Image

3.5 Edge Detection


Edge detection is a very important field in image processing and image segmentation.
Edges in digital images are areas with strong intensity contrasts, and a jump in intensity
from one pixel to the next can create a major variation in the picture quality. For those
reasons edges form the outline of an object and also indicate the boundary between
overlapping objects. These points, when joined with a line, form the edges of the image
(J. Canny, 1986 [40]). Edge detection is a very important mathematical tool for feature
detection and feature extraction, as discontinuities in image brightness correspond to
different aspects of the image such as discontinuities in depth, discontinuities in surface
orientation, changes in material properties and variations in scene lighting.

Figure 3.3: Edge Detection Illustration


In the above figure, consider a subset of the pixel array with its intensity values. Observing
the intensity of the pixels, we can say that there must be an edge existing
between the 4th and the 5th pixels, as the difference between the intensity
values of the 4th and 5th pixels is large. Another benefit of edge detecting an image is that
it shrinks the amount of data and filters out unwanted information while simultaneously
preserving the important structural properties of the image [41].
The gradient [45][46][47] of the image is one of the fundamental building blocks in
image processing, and it is the first-order derivative of choice in image processing.
Mathematically, the gradient of a two-variable function (here the image
intensity function) at each image point is a 2D vector [47][48] with components given
by the derivatives in the horizontal and vertical directions. Several edge detector operators
[49][51] exist for generating gradient images, such as Sobel, Prewitt, Laplacian and
Laplacian of Gaussian (LoG). These edge detectors work better under different conditions
[50][51].


Figure 3.4: Edge detection using predefined operators


With the help of first- and second-order derivatives, such intensity discontinuities are
detected. The first-order derivative of choice in image processing is the gradient. The
gradient of a 2-D function f(x, y) is the vector of its partial derivatives gx and gy in the
horizontal and vertical directions, and its magnitude can be approximated as

    ∇f ≈ |gx| + |gy|

This approximation still behaves like a derivative: it is zero in areas of constant
intensity, and its value is related to the degree of intensity change in areas of variable
intensity. It is common practice to refer to the magnitude of the gradient or its
approximations simply as the gradient.
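For example, if the pixels immediately to the left and right of a point have intensities 20 and 180 while the pixels above and below both have intensity 100, then gx ≈ 180 - 20 = 160 and gy ≈ 100 - 100 = 0, so ∇f ≈ |160| + |0| = 160, indicating a strong (here vertical) edge at that point.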

3.5.1 Generating Gradient Images using Pre Defined Filters


An image gradient is a directional change in the intensity or color in an image. Image
gradients may be used to extract information from images. An example of a small image
neighborhood is shown below.

Figure 3.5: Edge Mask Neighborhood

Sobel Operator: The Sobel edge detector computes the gradient by using the
discrete differences between the rows and columns of a 3x3 neighborhood. The Sobel
operator is based on convolving the image with a small, separable, integer-valued
filter. Below, a Sobel edge detection mask is given which is used to
compute the gradient in the x (vertical) and y (horizontal) directions.

Figure 3.6: Sobel Operator neighborhood


The magnitude of the gradient is then calculated from the two directional responses, typically as |G| = sqrt(Gx^2 + Gy^2).
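A minimal MATLAB sketch of this computation with the built-in Sobel kernel is given below; the input image name is an assumption.

% Sketch: Sobel gradient components and gradient magnitude.
I  = im2double(rgb2gray(imread('hand.jpg')));  % assumed input image
hy = fspecial('sobel');      % kernel approximating the vertical derivative (Gy)
hx = hy';                    % its transpose approximates the horizontal derivative (Gx)
Gx = imfilter(I, hx, 'replicate');
Gy = imfilter(I, hy, 'replicate');
Gmag = sqrt(Gx.^2 + Gy.^2);  % gradient magnitude |G|
imshow(Gmag, []);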


Prewitt Operator: Prewitt operator edge detection masks are among the oldest and best
understood methods of detecting edges in images. The Prewitt edge detector uses the
following masks to digitally approximate the first derivatives Gx and Gy; the Prewitt mask
shown below is used to compute the gradient in the x (vertical) and y (horizontal)
directions.

Figure 3.7: Prewitt Operator neighborhood


The working of these 3x3 masks is quite simple: the mask is slid over an area of the
image, it changes that pixel's value and shifts one pixel to the right, and it continues until
the end of the row is reached; it then starts again from the beginning of the next row. These
3x3 masks cannot manipulate the first and last rows and columns,
because the mask would move outside the image boundary if placed over a pixel in the
first or last row or column [7].

Laplacian of Gaussian (LoG): This detector finds edges by looking for zero
crossings after filtering f(x, y) with a Laplacian of Gaussian filter. In this method,
Gaussian filtering is combined with the Laplacian to break down the image where
the intensity varies, in order to detect the edges effectively. It finds the correct place of
edges by testing a wider area around each pixel. Below, a standard 5x5 Laplacian
of Gaussian edge detection mask is given.


Figure 3.8: Laplacian of Gaussian 5x5 Mask

3.5.2 Approach for Gradient


The flowchart of the approach for generating gradient images is given below. At the very
beginning a coloured image is chosen, processed here using MATLAB (v2012b).
The image is converted into grayscale in the immediate next step. A grayscale image is
mainly a combination of two colours, black and white; it carries the intensity information,
where black has the lowest (weakest) intensity and white the highest (strongest)
intensity. Variations of these intensity levels form the edges of the object or objects. In the final
step, different edge detection operators are applied to detect the object boundaries and
gradients.
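The sketch below illustrates this flow with MATLAB's built-in edge detectors; the image file name is an assumed placeholder and the operator thresholds are left at their defaults.

% Sketch: grayscale conversion followed by different predefined edge operators.
rgbImg  = imread('hand.jpg');        % assumed colour input image
greyImg = rgb2gray(rgbImg);          % keep intensity information only
eSobel   = edge(greyImg, 'sobel');
ePrewitt = edge(greyImg, 'prewitt');
eLoG     = edge(greyImg, 'log');     % Laplacian of Gaussian
subplot(2,2,1); imshow(greyImg);  title('Grayscale input');
subplot(2,2,2); imshow(eSobel);   title('Sobel');
subplot(2,2,3); imshow(ePrewitt); title('Prewitt');
subplot(2,2,4); imshow(eLoG);     title('LoG');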


Figure 3.9: Gradient Generating Approach

3.5.3 Edge Operator used in the Project


In this system design a 3X3 convolution mask was used to approximate the edges in our
Region of Interest. A MATLAB command fspecial( ) is used to create a two dimensional
filter h of an specified type and returns a correlational kernel H.
>>H = fspecial(gaussian);

Figure 3.10: Convolution Mask Used in the Project



Figure 3.11: Final Image after Convolution with Mask
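Putting the pieces of this stage together, a hedged sketch is shown below: a Gaussian kernel from fspecial() is used to smooth the image, after which a 3x3 convolution mask approximates the edges. The mask values, image name and binarisation threshold are illustrative assumptions and not the exact mask of Figure 3.10.

% Sketch: Gaussian smoothing followed by a 3x3 edge convolution mask.
I = im2double(rgb2gray(imread('hand.jpg')));   % assumed input image
H = fspecial('gaussian');                      % default 3x3 Gaussian kernel
Ismooth = imfilter(I, H, 'replicate');         % suppress noise before edge detection
mask = [0 1 0; 1 -4 1; 0 1 0];                 % assumed Laplacian-style 3x3 mask
E  = conv2(Ismooth, mask, 'same');             % convolve the ROI with the mask
BW = abs(E) > 0.05;                            % assumed threshold to binarise the edges
imshow(BW);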

3.6 Feature Extraction


Feature vectors of a segmented image can be extracted in different ways according to
the application, and there are various methods that can be used for feature extraction. [8]
used the aspect ratio of the bounding box as a feature vector. [9] used the self-growing
and self-organized neural gas (SGONG) algorithm to capture the shape of the hand, from
which the palm centre, palm region and hand shape were obtained. [10]
calculated the centre of gravity (COG) of the segmented hand, and the distance of the farthest
point of the fingers from the COG was used to estimate the number of fingers.
In this project a new and simple algorithm is proposed for the image database created.
The algorithm uses the fact that the perimeter of the hand increases as the number of fingers
used for making a gesture increases, i.e. the perimeter of gesture 1 is smaller than that
of gesture 2, and so on up to gesture 5.
Therefore, from the binary images obtained above, we calculated the perimeter by adding
all the 1s in the matrix of the binary image (i.e. the edge values), and then the images were
classified into the classes of the different gestures.
>> totalsum(i-2)=sum(sum(BW2(:,:,i)));

The above MATLAB command sums up all the ones in the 2D matrix of the image, giving
us the approximated perimeter of the hand.
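As an illustration of how this perimeter feature could drive the final classification, the sketch below compares a query image's perimeter against stored class averages; the class perimeter values, file name and variable names are assumptions made for this example, not figures from the actual database.

% Sketch: classify a query image by its approximate perimeter.
classPerims = [850 1020 1190 1350 1500];               % assumed averages for gestures 1..5
BW2   = edge(rgb2gray(imread('query.jpg')), 'sobel');  % assumed query image
perim = sum(BW2(:));                                   % count the edge pixels (the 1s)
[~, gesture] = min(abs(classPerims - perim));          % nearest stored perimeter
fprintf('Recognized gesture: %d\n', gesture);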


CHAPTER - 4
RESULT & CONCLUSION

4.1 RESULT
The processed images are classified into different gesture classes, and the total
perimeter of each gesture is saved in a database under its respective class. In the next
step a query image is requested from the user and its features are matched
against the image features saved in the database.
Following are the results obtained from this work.
The following images show the successful recognition of a query image:


The following shows the unsuccessful cases of Recognition:

Table 4.1: Results obtained

Gesture Value   No. of Input Images   Successful Cases   Rate of Recognition (%)
1               20                    20                 100
2               20                    20                 100
3               20                    20                 100
4               20                    16                 80
5               20                    13                 65

From the above table it is observed that the overall accuracy of the system is 89%, i.e. there are
11 cases out of 100 in which there is a mismatch between the input image and the class to which it is
assigned.


4.2 CONCLUSION
From the above discussions we can conclude that the technique applied for recognition
of the gestures works well for the chosen application, without using any artificial intelligence
algorithms for recognition.
For 20 images of each gesture, the 1st gesture value had 100% successful recognition;
the 2nd and 3rd gesture values also had 100% recognition; the 4th gesture value,
however, had 4 mismatches with the 3rd gesture value and was therefore recognized with 80%
accuracy; similarly, the 5th gesture value had 7 mismatches with the 4th gesture value, giving
an accuracy of 65%.
To increase the efficiency of the recognition algorithm used above, the orientation
of the gesture images should be similar to each other, with only minute differences. Another
important point to be taken care of is that the algorithm is application specific,
i.e. the threshold values for classifying the gestures need to be revised for different
applications.


BIBLIOGRAPHY

[1] Schlomer, T., Poppinga, B., Henze, N. & Boll, S. (2008), Gesture recognition with a Wii controller, in TEI '08: Proceedings of the 2nd International Conference on Tangible and Embedded Interaction, ACM, New York, NY, USA, pp. 11-14.

[2] Kevin, N. Y. Y., Ranganath, S. & Ghosh, D. (2004), Trajectory modeling in gesture recognition using cybergloves and magnetic trackers, in TENCON 2004, IEEE Region 10 Conference, Vol. A, pp. 571-574.

[3] Webel, S., Keil, J. & Zoellner, M. (2008), Multi-touch gestural interaction in X3D using hidden Markov models, in VRST '08: Proceedings of the 2008 ACM Symposium on Virtual Reality Software and Technology, ACM, New York, NY, USA, pp. 263-264.

[4] Papageorgiou, C. & Poggio, T. (2000), A trainable system for object detection, International Journal of Computer Vision 38(1), 15-33.

[5] Dalal, N. & Triggs, B. (2005), Histograms of oriented gradients for human detection, in International Conference on Computer Vision and Pattern Recognition, Vol. 1, IEEE Computer Society Press, San Diego, CA, USA, pp. 886-893.

[6] Wu, B. & Nevatia, R. (2005), Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors, in ICCV '05: Proceedings of the Tenth IEEE International Conference on Computer Vision, Vol. 1, IEEE Computer Society, Washington, DC, USA, pp. 90-97.

[7] Mikolajczyk, K., Schmid, C. & Zisserman, A. (2004), Human detection based on a probabilistic assembly of robust part detectors, in Computer Vision - ECCV 2004, Vol. 3021, Springer Berlin / Heidelberg, pp. 69-82.

[8] Leibe, B., Seemann, E. & Schiele, B. (2005), Pedestrian detection in crowded scenes, in International Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 878-885.

[9] Boulay, B. (2007), Human posture recognition for behaviour understanding, PhD thesis, Universite de Nice-Sophia Antipolis.

[10] Zuniga, M. (2008), Incremental Learning of Events in Video using Reliable Information, PhD thesis, Universite de Nice-Sophia Antipolis.

[11] Pei, M. (1984), The Story of Language, Plume; Rep Rev Edition. ISBN-13: 978-0452008700.

[12] Birdwhistell, R. L. (1963), The kinesic level in investigations of the emotions, in Knapp, P. H. (ed.), Expression of the Emotions in Man II, pp. 123-139.

[13] Krout, M. H. (1935), Autistic gestures: An experimental study in symbolic movement, Psychological Monographs 46, 119-120.

[14] Hall, E. T. (1973), The Silent Language, Anchor Books. ISBN-13: 978-0385055499.

[15] Ottenheimer, H. J. (2005), The Anthropology of Language: An Introduction to Linguistic Anthropology, Wadsworth Publishing. ISBN-13: 978-0534594367.

[16] McNeill, D. (1992), Hand and Mind: What Gestures Reveal about Thought, University of Chicago Press. ISBN: 9780226561325.

[17] Noury, N., Barralon, P., Virone, G., Boissy, P., Hamel, M. & Rumeau, P. (2003), A smart sensor based on rules and its evaluation in daily routines, in Engineering in Medicine and Biology Society, 2003: Proceedings of the 25th Annual International Conference of the IEEE, Vol. 4, pp. 3286-3289.

[18] Bourke, A., O'Brien, J. & Lyons, G. (2007), Evaluation of a threshold-based tri-axial accelerometer fall detection algorithm, Gait & Posture 26(2), 194-199.

[19] Isard, M. & Blake, A. (1996), Contour tracking by stochastic propagation of conditional density, in European Conference on Computer Vision, Vol. 1064 of Lecture Notes in Computer Science, Springer, pp. 343-356.

[20] Isard, M. & Blake, A. (1998), A mixed-state condensation tracker with automatic model-switching, in International Conference on Computer Vision, Narosa Publishing House, pp. 107-112.

[21] Black, M. J. & Jepson, A. D. (1998), A probabilistic framework for matching temporal trajectories: Condensation-based recognition of gestures and expressions, in European Conference on Computer Vision, Vol. 1406 of Lecture Notes in Computer Science, Springer, pp. 909-924.

[22] Lee, Y. W. (2008), Application of the particle filter for simple gesture recognition, in International Conference on Intelligent Computing, Vol. 5227 of Lecture Notes in Computer Science, Springer Berlin / Heidelberg, pp. 534-540.

[23] Darrell, T. & Pentland, A. (1996), Active gesture recognition using partially observable Markov decision processes, in International Conference on Pattern Recognition, Vol. 3, IEEE Computer Society Press, Washington, DC, USA, pp. 984-989.

[24] Li, F. & Wechsler, H. (2005), Open set face recognition using transduction, IEEE Trans. Pattern Analysis and Machine Intelligence 27(11), 1686-1697.

[25] Thrun, S. & Pratt, L. (1998), Learning to Learn, Kluwer Academic Publishers, Norwell, Massachusetts, USA.

[26] Baxter, J. (2000), A model of inductive bias learning, Journal of Artificial Intelligence Research 12, 149-198.

[27] Ren, Y. & Zhang, F. (2009), Hand gesture recognition based on MEB-SVM, in Embedded Software and Systems, Second International Conference on, IEEE Computer Society, Los Alamitos, CA, USA, pp. 344-349.

[28] Cheng, Y. (1995), Mean shift, mode seeking, and clustering, IEEE Trans. Pattern Analysis and Machine Intelligence 17(8), 790-799.

[29] Lee, H.-K. & Kim, J. H. (1999), An HMM-based threshold model approach for gesture recognition, IEEE Trans. Pattern Analysis and Machine Intelligence 21(10), 961-973.

[30] Lu, W.-L. & Little, J. J. (2006), Simultaneous tracking and action recognition using the PCA-HOG descriptor, in Computer and Robot Vision, 2006: The 3rd Canadian Conference on, Quebec, Canada, pp. 6-13.

[31] Yamato, J., Ohya, J. & Ishii, K. (1992), Recognizing human action in time-sequential images using hidden Markov model, in International Conference on Computer Vision and Pattern Recognition, pp. 379-385.

[32] Pinhanez, C. & Bobick, A. (1997), Human action detection using PNF propagation of temporal constraints, in International Conference on Computer Vision and Pattern Recognition, pp. 898-904.

[33] Boulay, B. (2007), Human posture recognition for behaviour understanding, PhD thesis, Universite de Nice-Sophia Antipolis.

[34] Bretzner, L., Laptev, I. & Lindeberg, T. (2002), Hand gesture recognition using multi-scale colour features, hierarchical models and particle filtering, in Automatic Face and Gesture Recognition, 2002: Proceedings, Fifth IEEE International Conference on, pp. 405-410.

[35] Birdal, A. & Hassanpour, R. (2008), Region based hand gesture recognition, in 16th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, pp. 1-7.

[36] Leitner, F. & Cinquin, P. (1993), From splines and snakes to snake splines, in Selected Papers from the Workshop on Geometric Reasoning for Perception and Action, Springer-Verlag, London, UK, pp. 264-281.

[37] Ju, S. X., Black, M. J., Minneman, S. & Kimber, D. (1997), Analysis of gesture and action in technical talks for video indexing, Technical report, American Association for Artificial Intelligence, AAAI Technical Report SS-97-03.

[38] Moeslund, T. B., Hilton, A. & Kruger, V. (2006), A Survey of Advances in Vision-Based Human Motion Capture and Analysis, Computer Vision and Image Understanding 104(2), 90-126.

[39] Bhuyan, M. K., Neog, D. R. & Kar, M. K. (2012), Fingertip Detection for Hand Pose Recognition, Int. J. on Computer Sc. and Engg. 4(3), 501-511.

[40] Canny, J. (1986), A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 6, pp. 679-698.

[41] Lionnie, R., Timotius, I. K. & Setyawan, I. (2011), An Analysis of Edge Detection as a Feature Extractor in Hand Gesture Recognition System based on Nearest Neighbour, in International Conference on Electrical Engineering and Informatics.
[42] Ibraheem, N., Hasan, M., Khan, R. & Mishra, P. (2012), Comparative study of skin color based segmentation techniques, Aligarh Muslim University, A.M.U., Aligarh, India.

[43] Mahmoud, E., Ayoub, A., Jorg, A. & Bernd, M. (2008), Hidden Markov Model-Based Isolated and Meaningful Hand Gesture Recognition, World Academy of Science, Engineering and Technology 41.

[44] Xingyan Li (2003), Gesture Recognition Based on Fuzzy C-Means Clustering Algorithm, Department of Computer Science, The University of Tennessee, Knoxville.

[45] Gonzalez, R. & Woods, R., Digital Image Processing (3rd ed.), Pearson Education, Upper Saddle River, New Jersey, pp. 165-168.

[46] Shapiro, L. & Stockman, G., "5, 7, 10", Computer Vision, Prentice-Hall, Upper Saddle River, New Jersey, pp. 157-158, 215-216, 299-300.

[47] Dubrovin, B. A., Fomenko, A. T. & Novikov, S. P. (1991), Modern Geometry - Methods and Applications: Part I: The Geometry of Surfaces, Transformation Groups, and Fields (Graduate Texts in Mathematics), 2nd ed., Springer, pp. 14-17.

[48] Haralick, R. K. (1982), "Zero-crossings of second directional derivative operator", SPIE Proc. on Robot Vision.

[49] Canny, J. F. (1983), "A variational approach to edge detection", submitted to AAAI Conference, Washington, D.C., September 1983.

[50] Marr, D. & Hildreth, E. (1980), "Theory of edge detection", Proc. R. Soc. Lond. B 207, 187-217; Rosenfeld, A. & Thurston, M. (1981), "Edge and curve detection for visual scene analysis", IEEE Trans. Comput. C-20, 562-569.

[51] Davis, L. S. (1975), "A survey of edge detection techniques", Computer Graphics and Image Processing 4(3), 248-260.

