
Gesture-based Control for Robotic Arm Edge

Abinaya Devarajan¹, Dr. M. Sundarambal²
Department of Electrical and Electronics Engineering
Coimbatore Institute of Technology, Coimbatore-641014.
Email: abinayaadevarajan@gmail.com¹, msundarambal@cit.edu.in²
Contact no: +919790465205


Abstract: Vision-based techniques provide a natural way of controlling robots. This paper aims at developing dynamic gesture-based interfaces for controlling the OWI-535 Robotic Arm Edge. The system recognizes the hand movement and the hand gesture made and sends the corresponding command to the robot. The gesture interface is built using Java and the Processing libraries, which perform the hand detection accurately. The proposed system does not require training, and it is a serverless system that does not need any stored images for gesture analysis. The Processing IDE is used for the dynamic image processing required for real-time hand detection, and the JMyron Processing library is used for the camera connectivity.

Keywords: gesture, interpolation, dilation, Gaussian model, structuring
elements

1. INTRODUCTION

In the past, robots were controlled by human operators using hand-held controllers such as sensors or by means of electronic devices such as remote controls. These devices limit the speed and naturalness of interaction. To overcome the limitations of such electromechanical devices, vision-based techniques are introduced. The vision-based method does not require wearing any contact devices on the hand; instead, video cameras are used for capturing the gestures. Gesture recognition involves the interpretation of the given gestures into text form or a machine-understandable form with utmost accuracy. There are two different types of gestures, namely static gestures and dynamic gestures. A static gesture recognition system deals with gesture recognition based on stored images or frames. Dynamic gesture recognition is based on the movement and the change of the hand posture. There is yet another gesture recognition method, 3D gesture recognition, which is complicated and is not suitable for real-time applications.
Hand gesture recognition generally involves various stages such as video acquisition, background subtraction, hand detection, hand tracking, hand tracing, feature extraction and gesture recognition. The gesture recognition used in this paper is skin-color based together with convex-hull fingertip identification, which gives greater accuracy and is faster.

2. RELATED WORK

Many researchers in the human-computer interaction and robotics fields have tried to control mouse movement using video devices; however, all of them used different schemes for hand detection and tracking. Asanterbai Malima et al. [11] developed an algorithm for finger counting which is not very accurate, and its blob identification is not reliable. G. Dudek et al. [8] developed a visually controlled interface for gesture-based robot programming which takes into account a sequence of gestures. Mark Fiala et al. [5] developed a fiducial marker system which performs marker recognition after the system has undergone a stipulated amount of training. The simplest approaches to gesture detection are described by Prateem et al. [3], whose work provides a way to analyze the different gesture-based techniques available and to design an efficient gesture-based system for controlling robots. Fiducial markers have been used extensively for robot control and navigation, but they require intensive training of the image recognition system to recognize the markers.

3. SYSTEM OVERVIEW


Fig 1: System Overview

The real-time hand gesture recognition system works as follows. Image acquisition is done by the JMyron Processing library module. The frame rate is 30 frames/sec, and the color modes supported are RGB, CMY and YIQ. As a next step, motion detection is done using the difference and threshold filters. Edge detection is the next step, done using the Canny edge detector. Tracking is done by grouping the 2D skin-tone pixels. Skin-color based segmentation is done to extract the hand from the background. After the hand is segmented from the complex background, gesture recognition is done, which is the fingertip detection. The fingertip detection is mainly done using the threshold pixel method. Then commands are sent, according to the gesture made, to the OWI-535 robot. The command packet has to match the OWI-535 robot's standard packet format for controlling the robot along its five degrees of freedom, more specifically called joints.
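
To make the data flow concrete, the stages can be summarized as the following Java-style interface; the method names are placeholders for the stages described above, not actual library calls:

// The stages of the recognition pipeline, in the order they are applied
// to each camera frame. Names are illustrative, not an actual API.
public interface GesturePipeline {
    int[] motionDetect(int[] frame);        // difference + threshold filters
    int[] detectEdges(int[] frame);         // Canny edge detector
    int[] segmentSkin(int[] frame);         // skin-color segmentation
    int[] denoise(int[] binaryMask);        // erosion + dilation
    int[] trackHand(int[] binaryMask);      // group 2D skin-tone pixels into a blob
    int   countFingertips(int[] handBlob);  // convex-hull fingertip detection
    void  sendCommand(int gesture);         // 3-byte command to the OWI-535 arm
}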
4. HAND GESTURE RECOGNITION

4.1 Image Resize

First, to recognize a hand gesture, the input image has to be resized in order to map the camera coordinates to the screen coordinates. There are two ways to map from the source image to the destination image. The first way is to compute the ratio of the screen resolution to the camera resolution. To determine the x and y coordinates on the screen of a given camera pixel, we use the following equation:

x = x' · (cx / wc),   y = y' · (cy / hc)

where (x', y') is the camera position, (wc, hc) is the camera resolution, (cx, cy) is the current screen resolution, and (x, y) is the corresponding screen position of the camera position. The second way is to use an interpolation function; the JMyron library uses the interpolation function CameraResize(). This function maps a source image to a destination image as smoothly as possible. To accomplish this, the function uses interpolation, a method to add or subtract pixels to an image as necessary to expand or shrink its proportions while introducing minimum distortion. Using this function, we can easily map the position of each input image pixel to a screen position. In this paper the first method is used because it is efficient and saves time.
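
For illustration, the ratio-based mapping could be implemented as follows; the camera and screen resolutions used here are illustrative values, not those of the actual setup:

// Maps a camera pixel (xc, yc) to screen coordinates using the
// resolution ratio described above.
public class CoordinateMapper {
    static final int CAM_W = 640, CAM_H = 480;        // camera resolution (assumed)
    static final int SCREEN_W = 1280, SCREEN_H = 960;  // screen resolution (assumed)

    static int[] toScreen(int xc, int yc) {
        int x = xc * SCREEN_W / CAM_W;   // x' * (cx / wc)
        int y = yc * SCREEN_H / CAM_H;   // y' * (cy / hc)
        return new int[] { x, y };
    }
}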

4.2 Segmentation

The hand area has to be separated from a complex background. It is difficult to detect skin color in natural environments because of the variety of illuminations and skin tones, so the color range should be carefully chosen. To get better results, we converted from the RGB color space to the YCbCr color space, since YCbCr is less sensitive to illumination variation. The conversion equations are as follows:

Y  = 0.299 R + 0.587 G + 0.114 B
Cb = 128 - 0.1687 R - 0.3313 G + 0.5000 B
Cr = 128 + 0.5000 R - 0.4187 G - 0.0813 B

where Y is the luminance, Cb is the blue-difference chroma value and Cr is the red-difference chroma value. From this information, we detect skin color by selecting a particular color range from the Cb and Cr values.
In this paper, we choose Y, Cb and Cr values of 0 to 255, 77 to 127, and 133 to 173, respectively, as the skin color region. (It should be noted that these values were chosen for the convenience of the investigator.) Then we loop over all the image pixels, setting pixels within the skin color range to 0 and all others to 255. Hence, we obtain a binary image of the hand.
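
For illustration, this thresholding step might look like the following minimal sketch, assuming the frame is available as an array of packed RGB pixels; the conversion coefficients are the standard 8-bit RGB to YCbCr transform given above:

public class SkinThreshold {
    // pixels: packed 0xRRGGBB values of one frame.
    // Returns a mask where 0 marks skin-colored pixels and 255 marks
    // everything else, following the convention described above.
    static int[] skinMask(int[] pixels) {
        int[] mask = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            int r = (pixels[i] >> 16) & 0xFF;
            int g = (pixels[i] >> 8) & 0xFF;
            int b = pixels[i] & 0xFF;
            // Chroma components only; Y is not tested because the whole
            // 0-255 luminance range is accepted.
            double cb = 128 - 0.1687 * r - 0.3313 * g + 0.5 * b;
            double cr = 128 + 0.5 * r - 0.4187 * g - 0.0813 * b;
            boolean skin = cb >= 77 && cb <= 127 && cr >= 133 && cr <= 173;
            mask[i] = skin ? 0 : 255;
        }
        return mask;
    }
}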
The hand can also be segmented from the background using yet another method, which is elaborated as follows. The hand must be localized in the image and segmented from the background before recognition. This alternative method, suitable for training and accurate segmentation, is a probabilistic method for skin-color segmentation. It is based on a probabilistic model of the distribution of skin-color pixels. First, it is necessary to model the skin color of the user's hand. The user places part of his hand in a learning square as shown in Fig. 1. The pixels contained in this area are used for learning the model. Next, the selected pixels are transformed from the RGB space to the HSL space to obtain the chroma information: hue and saturation.
Two problems encountered in this step have been solved in a pre-processing phase. The first is that human skin hue values are very close to red, that is, their value is very near 0 or 2π radians, so it is difficult to learn the distribution because the angular nature of hue can produce samples at both limits. To avoid this inconvenience, the hue values are rotated by π radians. The second problem in using the HSL space arises when the saturation values are close to 0, because then the hue is unstable and can cause false detections. This is avoided by discarding saturation values near 0. Once the pre-processing phase has finished, the hue and saturation values of the selected pixels are used to infer the model, that is, x = (x_1, ..., x_n), where n is the number of samples and each sample is x_i = (h_i, s_i). As a result of testing and comparing several statistical models, such as mixtures of discrete histograms, the best results have been obtained with a Gaussian model. The values of the parameters of the Gaussian model (mean x̄ and covariance matrix Σ) are computed from the sample using standard maximum likelihood methods [11]. Once they are found, the probability that a new pixel x = (h, s) is skin can be calculated as

p(x) = 1 / (2π |Σ|^(1/2)) · exp( -(1/2) (x - x̄)^T Σ^(-1) (x - x̄) )     (1)

Finally, we obtain the blob representation of the hand by applying a connected components algorithm to the probability image, which groups neighbouring skin pixels into the same blob.
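
For illustration, equation (1) can be evaluated directly once the sample mean and covariance are known; the following minimal sketch handles the 2D (hue, saturation) case, with the mean and covariance passed in as the maximum-likelihood estimates described above:

public class SkinModel {
    // Evaluates the 2D Gaussian skin-color model of equation (1) for a pixel
    // with chroma values (h, s). mean = {meanH, meanS}; cov is the 2x2
    // covariance matrix, both estimated from the learning-square sample.
    static double skinProbability(double h, double s, double[] mean, double[][] cov) {
        double det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0];
        // Inverse of the 2x2 covariance matrix.
        double i00 =  cov[1][1] / det, i01 = -cov[0][1] / det;
        double i10 = -cov[1][0] / det, i11 =  cov[0][0] / det;
        double dh = h - mean[0], ds = s - mean[1];
        // Mahalanobis term (x - mean)^T * inverse(Sigma) * (x - mean).
        double m = dh * (i00 * dh + i01 * ds) + ds * (i10 * dh + i11 * ds);
        return Math.exp(-0.5 * m) / (2 * Math.PI * Math.sqrt(det));
    }
}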

4.3 Tracking

USB cameras are known for the low quality images
they produce. This fact can cause errors in the hand
segmentation process. In order to make the application
robust to these segmentation errors we add a tracking
algorithm. This algorithm tries to maintain and propagate
the hand state over time. We represent the hand state at time t as a vector

s(t) = (p(t), w(t), θ(t)),

where p = (p_x, p_y) is the hand position in the 2D image, w = (w, h) is the size of the hand in pixels, and θ is the hand's angle in the 2D image plane. First, from the hand state at time t we build a hypothesis of the hand state,

h = (p(t+1), w(t), θ(t)),

for time t+1 by applying a simple autoregressive process to the position component:

p(t+1) - p(t) = p(t) - p(t-1).     (2)
Equation (2) expresses a dynamical model of constant
velocity. Next, if we assume that at time t, M blobs have been detected, B = {b_1, ..., b_j, ..., b_M}, where blob b_j corresponds to a set of connected skin pixels, the tracking process has to establish the relation between the hand hypothesis and the observations b_j over time. In order to cope with this problem, we define an approximation to the distance from an image pixel x = (x, y) to the hypothesis h. First, we normalize the image pixel coordinates,

n = R_θ (x - p(t+1)),     (3)

where R_θ is a standard 2D rotation matrix about the origin, θ is the rotation angle, and n = (n_x, n_y) are the normalized pixel coordinates. Then we can find the crossing point c = (c_x, c_y) between the hand hypothesis ellipse and the normalized image pixel as follows,

c_x = w · cos(α)
c_y = h · sin(α)     (4)

where α is the angle between the normalized image pixel and the hand hypothesis. Finally, the distance from an image pixel to the hand hypothesis is

d(x, h) = ||n|| - ||c||.     (5)
This distance can be seen as an approximation of the distance from a point in the 2D space to a normalized ellipse (normalized meaning centered at the origin and not rotated). From the distance definition of (5) it follows that its value is less than or equal to 0 if x is inside the hypothesis h, and greater than 0 if it is outside. Therefore, considering the hand hypothesis h and a point x belonging to a blob b, if the distance is less than or equal to 0, we conclude that the blob b supports the existence of the hypothesis h and it is selected to represent the new hand state. This tracking process can also detect the presence or absence of the hand in the image [12].
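
A minimal sketch of the test built from equations (3)-(5) is given below; the Hypothesis fields mirror the state components defined above, and the rotation sign convention is an assumption of this sketch:

// Hand hypothesis: predicted position (px, py), size (w, h) and angle theta.
class Hypothesis {
    double px, py, w, h, theta;
}

class HypothesisTest {
    // Distance of equation (5): a value <= 0 means the pixel (x, y) lies inside
    // the normalized hypothesis ellipse and therefore supports the hypothesis.
    static double distance(double x, double y, Hypothesis hyp) {
        // Equation (3): translate and rotate the pixel into the hypothesis frame
        // (rotating by -theta undoes the hand's rotation).
        double dx = x - hyp.px, dy = y - hyp.py;
        double c = Math.cos(-hyp.theta), s = Math.sin(-hyp.theta);
        double nx = c * dx - s * dy;
        double ny = s * dx + c * dy;
        // Equation (4): crossing point of the ellipse along the pixel's direction.
        double alpha = Math.atan2(ny, nx);
        double cx = hyp.w * Math.cos(alpha);
        double cy = hyp.h * Math.sin(alpha);
        // Equation (5): difference of the two norms.
        return Math.hypot(nx, ny) - Math.hypot(cx, cy);
    }
}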

4.4 Denoising

When the skin color method is used and the binarisation of the image is done, noise components get added. So the dilation and erosion operations are used to remove the unwanted components present in the image. Another method used is the background inversion process: the background is inverted and the pixels other than the saved background are taken when the blob detection is made. Erosion trims down the image area where the hand is not present, and dilation expands the area of the image pixels which are not eroded. Mathematically, erosion is given by

A ⊖ B = { x | (B)_x ∩ A^c = ∅ }

where A denotes the input image and B denotes the structuring element. The structuring element is moved over the image using a sliding window and exact matches are marked. Figure 3 shows a graphical representation of the algorithm. Dilation is defined by

A ⊕ B = { x | (B̂)_x ∩ A ≠ ∅ } = { x | [(B̂)_x ∩ A] ⊆ A }

where A denotes the input image, B denotes the structuring element and B̂ is the reflection of B. The same structuring element is moved over the image, and if the center pixel matches, the whole neighbourhood around that pixel is marked.
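
A minimal sketch of binary erosion and dilation with a square structuring element is given below; the convention that foreground (hand) pixels are 255, and the use of a square neighbourhood of a given radius, are illustrative assumptions:

public class Morphology {
    // Erosion: a pixel stays foreground (255) only if every pixel under the
    // structuring element is foreground.
    static int[][] erode(int[][] img, int radius) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                boolean keep = true;
                for (int dy = -radius; dy <= radius && keep; dy++)
                    for (int dx = -radius; dx <= radius && keep; dx++) {
                        int yy = y + dy, xx = x + dx;
                        if (yy < 0 || yy >= h || xx < 0 || xx >= w || img[yy][xx] != 255)
                            keep = false;
                    }
                out[y][x] = keep ? 255 : 0;
            }
        return out;
    }

    // Dilation: a pixel becomes foreground if any pixel under the structuring
    // element is foreground. Eroding and then dilating (opening) removes
    // small noise blobs from the segmented hand mask.
    static int[][] dilate(int[][] img, int radius) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                boolean hit = false;
                for (int dy = -radius; dy <= radius && !hit; dy++)
                    for (int dx = -radius; dx <= radius && !hit; dx++) {
                        int yy = y + dy, xx = x + dx;
                        if (yy >= 0 && yy < h && xx >= 0 && xx < w && img[yy][xx] == 255)
                            hit = true;
                    }
                out[y][x] = hit ? 255 : 0;
            }
        return out;
    }
}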

4.5 Finding the center point of the palm

After segmenting the hand region from the background, the center of the hand can be calculated with the following equations:

x_c = (1/k) Σ_{i=1}^{k} x_i ,   y_c = (1/k) Σ_{i=1}^{k} y_i

where x_i and y_i are the x and y coordinates of the i-th pixel in the hand region, and k denotes the number of pixels in the region. After we locate the center of the hand, we compute the radius of the palm region to get the hand size. To obtain the size of the hand, we draw a circle of increasing radius from the center coordinate until the circle meets the first black pixel; the algorithm then returns the current radius value. This algorithm assumes that when the growing circle meets the first black pixel, the length from the center is the radius of the palm. Thus, the image segmentation is the most significant part, because if some black pixels are produced by shadows or illumination near the center, then the circle will stop earlier than the real background and the estimated hand region becomes smaller than the real hand.
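
For illustration, the center and radius computation described above could be sketched as follows; the 255-foreground mask convention and the angular sampling step are assumptions of this sketch:

public class PalmLocator {
    // Computes the palm center as the mean of the hand-pixel coordinates,
    // then grows a circle from the center until it touches a non-hand pixel;
    // that radius approximates the palm size.
    static int[] palmCenterAndRadius(int[][] mask) {
        int h = mask.length, w = mask[0].length;
        long sumX = 0, sumY = 0, k = 0;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (mask[y][x] == 255) { sumX += x; sumY += y; k++; }
        if (k == 0) return null;                    // no hand detected
        int cx = (int) (sumX / k), cy = (int) (sumY / k);
        int radius = 1;
        outer:
        while (true) {
            // Sample points on the circle of the current radius.
            for (double a = 0; a < 2 * Math.PI; a += 0.05) {
                int x = cx + (int) Math.round(radius * Math.cos(a));
                int y = cy + (int) Math.round(radius * Math.sin(a));
                if (x < 0 || x >= w || y < 0 || y >= h || mask[y][x] != 255)
                    break outer;                    // first black pixel reached
            }
            radius++;
        }
        return new int[] { cx, cy, radius };
    }
}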

4.6 Finding the finger tip

To recognize whether a finger is inside the palm area or not, we used a convex hull algorithm. The convex hull algorithm solves the problem of finding the smallest convex polygon that encloses all the points; using this property, the fingertips of the hand can be detected, and the algorithm can also be used to recognize whether a finger is folded or not. To recognize these states, we scale the hand radius value by a factor of two and check the distance between the center and each pixel in the convex hull set: if the distance is longer than this scaled radius, then the finger is spread. In addition, if two or more interesting points exist in the result, then we regard the vertex farthest from the center as the index finger, and a hand gesture is detected when the number of resulting vertices is two or more.

The result of the convex hull algorithm is a set of vertices that includes all the hull vertices, so sometimes a vertex is placed very near other vertices. This case occurs at the corner of a fingertip. To solve this problem, we delete a vertex whose distance from the next vertex is less than 10 pixels. For accuracy, a fingertip mass value and a threshold value are set and compared with the detected tip value; the detected fingertip mass is calculated by selecting the fingertip blob and computing its mass using the inbuilt blob weight function of the Blobscanner library used in the Processing IDE. The Blobscanner library also has inbuilt functions for finding the pixels which are not in the hand blob.
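
A minimal sketch of the fingertip test described above is given below: given the convex hull vertices (computed by any standard hull routine), vertices are kept if they lie beyond the palm-radius threshold and are not within 10 pixels of the previously kept vertex. The factor of two follows the description above; the use of java.awt.Point is an assumption of this sketch:

import java.awt.Point;
import java.util.ArrayList;
import java.util.List;

public class FingertipFilter {
    // Filters convex-hull vertices down to fingertip candidates.
    // hull: hull vertices in order around the hand blob;
    // (cx, cy): palm center; radius: palm radius from the previous step.
    static List<Point> findFingertips(List<Point> hull, int cx, int cy, int radius) {
        List<Point> tips = new ArrayList<>();
        for (Point p : hull) {
            // A vertex counts as a spread finger when it lies well outside the palm.
            if (p.distance(cx, cy) > 2.0 * radius) {
                // Skip vertices closer than 10 px to the previously kept tip:
                // they belong to the same fingertip corner.
                if (tips.isEmpty() || tips.get(tips.size() - 1).distance(p) >= 10) {
                    tips.add(p);
                }
            }
        }
        return tips;   // two or more tips are treated as a detected gesture
    }
}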

4.7 Displaying the detected results

The hand pixels are isolated from the background pixels and image subtraction is performed. The binarized image is displayed in a window of 320x240 resolution, which contains the segmented hand without the background and noise components; it contains only the hand. The hand pixels are detected by finding the blobs which correspond exactly to hand objects, and the result is displayed by drawing the hand blob pixels in white.

5. OWI-535 ROBOT

The OWI-535 Robotic Arm Edge is a cheap, commercially available robot, as it is sold as a kit. The robot has 5 degrees of freedom (Fig 1) which can be controlled through the software that comes along with the kit. The OWI has an ST-1135 controller which controls the motors and the other features of the robot. The robot can be configured by sending commands serially. It has 5 DOF and an LED attached to the gripper.

5.1 Finding the USB packet format

The USB packet format can be found using the SnoopyPro and LibUSB software. The software is installed on two different systems to check the exactness of detection. The first system is used for the OWI software, USBDeview and SnoopyPro. The second machine is used only for USB programming with libusb-win32 and LibusbJava. The OWI USB driver should also be installed, and a libusb-win32 driver generated for it using the libusb inf-wizard.exe tool; the resulting INF file is installed on the system. The hardware details of the device can be viewed using the USBDeview software. The USB packet discovery is then done using SnoopyPro. Protocol discovery usually means working out the format of the "out" packets sent from the PC to the device. However, SnoopyPro displays two types of "out" packet -- "out down" and "out up". These appear in pairs and should be considered as a single "out" packet, which is hinted at by the way that the sequence numbers of each "out down" and "out up" pair are the same. "out down" is a request sent from the USB driver to the OS to have data delivered to the device; "out up" is a message sent back from the OS to the driver to inform it that the data has been delivered.



Fig 1: Robot arm OWI-535 Degrees of Freedom

Fig 2: "out down" and "out up" USB communication


5.2 USB packet format for OWI-535


Fig 3: USB packet format
The SnoopyPro software returns the USB packet format, which is the control format for sending commands over USB to the robot. USB data is sent in little-endian format, so the correct way of reading the double-byte value, index, and size fields is to reverse their bytes. In other words, for the example in Fig 3, value is 0x0100, index is 0x0003, and size is 0x0003. From the header fields, the data required to move the robot along its joints is 3 bytes. These 3 bytes control the robot's wrist, shoulder, elbow, base and gripper operations.
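
For reference, the little-endian decoding described above could be done as follows in Java; the raw example bytes in this sketch are constructed to reproduce the values quoted above (value 0x0100, index 0x0003, size 0x0003) and are not copied from the captured trace:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PacketFields {
    public static void main(String[] args) {
        // Raw bytes of the value, index and size fields as they appear on the
        // wire (low byte first).
        byte[] raw = { 0x00, 0x01, 0x03, 0x00, 0x03, 0x00 };
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        int value = buf.getShort() & 0xFFFF;   // 0x0100
        int index = buf.getShort() & 0xFFFF;   // 0x0003
        int size  = buf.getShort() & 0xFFFF;   // 3 -> three command bytes follow
        System.out.printf("value=0x%04X index=0x%04X size=%d%n", value, index, size);
    }
}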

5.3 Control Commands

The control commands sent to the robot control the operation of the gripper, wrist, elbow, shoulder and base. The 3-byte commands for the OWI-535 specific operations are listed in the tables below; a sketch of how such a command might be assembled follows the tables.

Table 1: Byte 1 value and commands

Byte Value Arm Commands
00 Stop
01 Gripper Close
02 Gripper Open
04 Wrist Backward
08 Wrist Forward
10 Elbow Backward
20 Elbow forward
40 Shoulder Backwards
80 Shoulder forward

Table 2: Byte 2 Value Commands

Byte Value Arm Commands
00 Stop base
01 Base turn right(clockwise)
02 Base turn Left (counter Clockwise)

Table 3: Byte 3 value commands

Byte Value Arm Commands
00 Light Off
01 Light On
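
To illustrate how the three tables combine, the following minimal sketch assembles a 3-byte command; sendCommand() is a placeholder for whatever USB transfer routine is used (for example via LibusbJava), not an actual API call:

public class OwiCommand {
    // Byte 1: arm motors (Table 1), Byte 2: base (Table 2), Byte 3: LED (Table 3).
    static byte[] build(int arm, int base, int light) {
        return new byte[] { (byte) arm, (byte) base, (byte) light };
    }

    public static void main(String[] args) {
        // Example: elbow forward (0x20), base turning right (0x01), LED on (0x01).
        byte[] cmd = build(0x20, 0x01, 0x01);
        System.out.printf("command: %02X %02X %02X%n",
                cmd[0] & 0xFF, cmd[1] & 0xFF, cmd[2] & 0xFF);
        // sendCommand(cmd);             // placeholder for the USB control transfer
        // sendCommand(build(0, 0, 0));  // all-stop
    }
}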

5.4 Controlling using angular positions

Timed rotations aren't that useful for programming a robot, since it is much more common to want to express a rotation in terms of an absolute angle (e.g. rotate the elbow to the +45 degrees position) or as an offset from the current orientation (e.g. rotate the wrist by -10 degrees).

This functionality is best supported by a Joint class, which is used to create four Joint instances at run time (for the base, shoulder, elbow, and wrist). The class has several features (a minimal sketch is given after the list):

1. each joint instance stores its current orientation;
2. each joint converts angles into timed rotations;
3. each joint knows the limits of its rotation, in both directions.
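
Below is a minimal sketch of such a Joint class under stated assumptions: the degrees-per-second rate and the joint limits are illustrative values, and the timed motor command itself is only printed here, standing in for the routine that would issue it:

// One controllable joint of the arm (base, shoulder, elbow or wrist).
public class Joint {
    private final String name;
    private final double minAngle, maxAngle;   // rotation limits in degrees (assumed)
    private final double degreesPerSecond;     // assumed calibration value
    private double currentAngle = 0;           // feature 1: stored orientation

    public Joint(String name, double minAngle, double maxAngle, double degPerSec) {
        this.name = name;
        this.minAngle = minAngle;
        this.maxAngle = maxAngle;
        this.degreesPerSecond = degPerSec;
    }

    // Features 2 and 3: convert an angle offset into a timed rotation,
    // clamped to the joint's limits.
    public void rotateBy(double offsetDegrees) {
        double target = Math.max(minAngle, Math.min(maxAngle, currentAngle + offsetDegrees));
        double delta = target - currentAngle;
        long millis = (long) (Math.abs(delta) / degreesPerSecond * 1000);
        boolean forward = delta >= 0;
        // Placeholder for the timed motor command (Tables 1-3 byte values).
        System.out.printf("%s: rotate %s for %d ms%n",
                name, forward ? "forward" : "backward", millis);
        currentAngle = target;
    }

    public void rotateTo(double absoluteDegrees) {
        rotateBy(absoluteDegrees - currentAngle);
    }
}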

6. SYSTEM PERFORMANCE

In this section, the results of the system performance and the accuracy of the algorithm, found and analysed under various lighting conditions and with complex backgrounds, are listed. The application has been tested on a Pentium IV running at 1.8 GHz and on an Intel Core i3 at 2.25 GHz. The images have been captured using a Logitech Messenger webcam with a USB connection. The camera provides 640x480 images at a capture and processing rate of 30 frames per second. The average accuracy of the detected results is found to be 85%, as shown in Fig 5.



Fig 5: Accuracy results under normal lighting conditions

7. RESULTS AND CONCLUSIONS



Fig. 6: Gesture detection output for Gesture (5)

Fig. 7: Gesture detection output for Gesture (3)

Fig. 8: Hand center detection

Fig. 9: Denoised output of segmented hand

In this paper, methods for detecting the hand in a real-time environment are implemented using the blob detection method and the skin color segmentation method. This provides a way of controlling robots by converting human gestures into a machine-interpretable format. The gesture recognition results are discussed in this section. Fig 6 and Fig 7 show the gesture detection output for hand gestures 5 and 3. The center point of the hand is detected using the hand center point calculation formula, and the bounding box formed over the segmented hand image is shown in Fig 8.
REFERENCES

[1] Raheja J.L., Shyam R., Kumar U., Prasad P.B., "Real-Time Robotic Hand Control using Hand Gesture", 2nd International Conference on Machine Learning and Computing, 9-11 Feb 2010, Bangalore, India, pp. 12-16.
[2] John Canny, "A computational approach to edge detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6):679-698, Nov. 1986.
[3] Prateem Chakraborty, Prashant Sarawgi, Ankit Mehrotra, Gaurav Agarwal, Ratika Pradhan, "Hand Gesture Recognition: A Comparative Study", Proceedings of the International MultiConference of Engineers and Computer Scientists 2008, Vol. II, IMECS 2008, 19-21 March 2008, Hong Kong.
[4] Gregory Dudek, Junaed Sattar and Anqi Xu, "A Visual Language for Robot Control and Programming: A Human-Interface Study", IEEE International Conference on Robotics and Automation, 10-14 April 2007, pp. 2507-2513, Rome, Italy.
[5] Mark Fiala, "ARTag, a fiducial marker system using digital techniques", in CVPR '05: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2, pp. 590-596, Washington, DC, USA, 2005, IEEE Computer Society.
[6] Leichter, I., Lindenbaum, M., Rivlin, E., "A General Framework for Combining Visual Trackers - The Black Boxes Approach", International Journal of Computer Vision, 67(3), pp. 343-363, 2006.
[7] Kuhnert, K., Stommel, M., "Fusion of stereo-camera and PMD-camera data for real-time suited precise 3D environment reconstruction", in IEEE/RSJ Conference on Intelligent Robots and Systems, pp. 4780-4785, 2006.
[8] G. Dudek, M. Jenkin, C. Prahacs, A. Hogue, J. Sattar, P. Giguère, A. German, H. Liu, S. Saunderson, A. Ripsman, S. Simhon, L. A. Torres-Mendez, E. Milios, P. Zhang, and I. Rekleitis, "A visually guided swimming robot", in IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, Alberta, Canada, August 2005.
[9] K. Derpanis, R. Wildes, and J. Tsotsos, "Hand gesture recognition within a linguistics-based framework", in European Conference on Computer Vision (ECCV), 2004, pp. 282-296.
[10] Gary Bradski, Adrian Kaehler, Learning OpenCV, O'Reilly Media, 2008.
[11] C.M. Bishop, Neural Networks for Pattern Recognition, Clarendon Press, 1995.
[12] J. Varona, J.M. Buades, F.J. Perales, "Hands and face tracking for VR applications", Computers & Graphics, 29(2), 2000.
[13] http://www.Processing.org.
[14] www.handvu.org.
[15] http://www.mathworks.com/handdetection.
[16] http://www.nui.org
[17] http://www.Emgucv_net.com/hand/libcodes
[18] www.sourceforge.net
