Conveyor Belt Based Pick and Sort Industrial Robotics Application

Gilbreth: A Conveyor-Belt Based Pick-and-Sort Industrial
Robotics Application
A Thesis
Presented to
the Faculty of the School of Engineering and Applied Science
University of Virginia
In Partial Fulfillment
of the requirements for the Degree
Master of Science (Computer Engineering)
by
Yizhe Zhang
December 2018

c 2018 Yizhe Zhang
Approvals
This thesis is submitted in partial fulfillment of the requirements for the degree of
Master of Science
Computer Engineering
Yizhe Zhang
Approved:
Malathi Veeraraghavan (Advisor) Joanne Bechta Dugan (Chair)
Jack W. Davidson
Accepted by the School of Engineering and Applied Science:
Craig H. Benson, Dean, School of Engineering and Applied Science
October 2018
Abstract
There is growing interest in creating agile industrial robotics applications for autonomous
operations on small-volumes of mixed parts to complement traditional industrial robotics
that handle large-volume, single-part operations. Cloud robotics, which leverages cloud
computing, cloud storage and high-speed networks (between factory floors and data centers),
is seen as a technological approach to help build such agile industrial robotics applications.
This thesis describes an agile industrial robotics application, named Gilbreth, for picking
up objects of different types from a moving conveyor belt and sorting the object into bins
according to type. The Gilbreth implementation leveraged a number of Robot Operating
System (ROS) and ROS-Industrial (ROS-I) packages. Gazebo, a robotics simulation package,
is used to simulate a factory environment that consists of a moving conveyor belt, a break
beam sensor, a 3D Kinect camera, a UR10 industrial robot arm mounted on a linear actuator
with a vacuum gripper, and different types of object such as pulleys, disks, gears and piston
rods.
Experimental studies were undertaken to measure the CPU usage and processing time
of different ROS nodes. These experiments found that object recognition time and robot
execution time were similar in magnitude, and that motion planning sometimes yielded
incorrect trajectories. Therefore, improvements were made to reduce object recognition
time, using a Convolutional Neural Network (CNN) method and with a new pipeline, and
to the motion-planning pipeline. Evaluation of the object recognition improved pipeline
demonstrates that it outperforms the original Correspondence Grouping (CG) method by
reducing execution time even while achieving the same success rate. Experiments were
conducted to evaluate the pick-and-sort success rate of the Gilbreth application after the
d
Abstract e
improved pipelines were incorporated. Specifically, we found that objects should be spaced
at least 14 sec apart from each other on the conveyor belt. Multiple robot workcells are
required to handle conveyor belts with faster arriving objects.
Finally, we undertook an experiment to evaluate the scalability of the CNN algorithm.
Our conclusion is while CNN-based object-recognition saves processing time within the
run-time operation of the Gilbreth application when compared the CG algorithm, the cost
of CNN-based object-recognition is that it requires significant compute cycles for training.
Given that this training can be done offline, the extensive resources of cloud computing can
be leveraged.
Acknowledgements
I would like to take the opportunity to show my gratitude to many people. This thesis
would have not been possible without their support. I would like to thank my advisor
Professor Malathi Veeraraghavan for her expert advice and support in this project. She
not only taught me the knowledge in network area, but also the way to work and conduct
research. Her attitude towards the work is more than respectable for me and has always
been a guidance towards my life.
I would like to thank my collaborators from the University of Texas at Dallas (UTD) and
Southwest Research Institute (SwRI): Lianjun Li (UTD), Professor Andrea Fumagalli (UTD),
Michael Ripperger (SwRI) and Jorge Nicho (SwRI). This work can not be accomplished
without their dedicated work and help.
I would like to thank my committee members, Professor Joanne Bechta Dugan and
Professor Jack W. Davidson for taking time to review my thesis and provide great suggestions.
I would like to thank my parents and my friends for supporting throughout my graduation
years. Thanks to my colleagues in High-speed Networks research group for their friendship
in the lab.
This work is supported by NSF grant CNS-1531065, and CNS-1624676.
f
Contents
Abstract d
Acknowledgements f
Contents g
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . j
1 Introduction 1
1.1 Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Key contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Thesis organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Background and Related Work 4

2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1 Robotic Operating System and Gazebo . . . . . . . . . . . . . . . . 4
2.1.2 MoveIt! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.3 Convolution Neural Network object recognition . . . . . . . . . . . . 9
2.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Gilbreth Application 14
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Gilbreth sorting application . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3 Gilbreth software architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.4 Gilbreth software prototype . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.5 Gilbreth software prototype evaluation . . . . . . . . . . . . . . . . . . . . . 26
3.6 Gilbreth message description and experiments . . . . . . . . . . . . . . . . . 29
3.6.1 Gilbreth message description . . . . . . . . . . . . . . . . . . . . . . 30
3.6.2 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4 Gilbreth Prototype Improvements and Evaluation 39

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Object recognition improvements . . . . . . . . . . . . . . . . . . . . . . . . 40
4.2.1 Experiments on fine-tuned object recognition process . . . . . . . . . 40
4.2.2 Improved object recognition pipeline characterization . . . . . . . . 43
4.3 Motion planner improvements . . . . . . . . . . . . . . . . . . . . . . . . . . 44
g
Contents h
4.4 Experiment on application performance . . . . . . . . . . . . . . . . . . . . 46

4.5 VoxNet-based object-recognition pipeline . . . . . . . . . . . . . . . . . . . 52
4.5.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.5.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5 Conclusions and Future Work 58

5.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Bibliography 62
List of Tables
3.1 Object segmentation ROS node configuration parameters . . . . . . . . . . 19

3.2 Values of experiment parameters . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 ROS-node classification based on CPU usage . . . . . . . . . . . . . . . . . 28
3.4 Message size of triggered ROS message . . . . . . . . . . . . . . . . . . . . . 36
3.5 Message size of periodic ROS message . . . . . . . . . . . . . . . . . . . . . 37
4.1 Experiment-1 parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.2 Experiment-1 results on the improved object-recognition process . . . . . . 42
4.3 Experiment-2 parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.4 Improved motion planning time threshold . . . . . . . . . . . . . . . . . . . 46
4.5 Values of experimental parameters for evaluating application performance . 48
4.6 Inter-object spawning period parameters . . . . . . . . . . . . . . . . . . . . 48
4.7 Experiment Set1 parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.8 Experiment Set2 parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
i
List of Figures
2.1 An example ROS message . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 An example ROS service description . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Move group architecture [1] . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Planning scene pipeline [1] . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5 VoxNet architecture [2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.1 Gilbreth setup: sensors, objects, conveyor belt and UR10 robot arm . . . . 15
3.2 Gilbreth software: Per-object workflow starts at Object-Arrival Detection . 17
3.3 Object segmentation and object recognition . . . . . . . . . . . . . . . . . . 23
3.4 Gilbreth object picking policies . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.5 Motion planner pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.6 CPU usage and processing time for one ROS node . . . . . . . . . . . . . . 28
3.7 Compare object recognition time with physical robot arm movement time . 29
3.8 Gilbreth message communication flow; ROS topics: blue; ROS services: red 30
3.9 Depth images of 5 types of objects . . . . . . . . . . . . . . . . . . . . . . . 32
3.10 Depth images of 5 types of objects after object segmentation . . . . . . . . 32
3.11 Gilbreth ROS messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.1 Experiment-1 object-recognition processing times with the improved pipeline 42

4.2 Experiment-2 results on the improved object-recognition pipeline . . . . . . 43
4.3 Improved object recognition pipeline . . . . . . . . . . . . . . . . . . . . . . 44
4.4 Improved motion planning pipeline . . . . . . . . . . . . . . . . . . . . . . . 45
4.5 Inter-object spawning time distribution . . . . . . . . . . . . . . . . . . . . 49
4.6 Pick-and-sort success rates for different inter-object spawning periods . . . 49
4.7 Picking evaluation of the Gilbreth application . . . . . . . . . . . . . . . . . . 51
4.8 VoxNet based object-recognition pipeline . . . . . . . . . . . . . . . . . . . 53
4.9 Experiment Set1 results: 5 object types . . . . . . . . . . . . . . . . . . . . 55
4.10 Experiment Set2 results: 13 object types . . . . . . . . . . . . . . . . . . . . 56
j
List of Abbreviations
APC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Amazon Picking Challenge

API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Application Programming Interface
CAD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Computer-Aided Design
CG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Correspondence Grouping
CNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Convolutional Neural Network
DOF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Degree of Freedom
FIFO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . First-in-first-out
GENI . . . . . . . . . . . . . . . . . . . . . . . . . . . . Global Environment for Network Innovations
GPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Graphics Processing Unit
GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Graphical User Interface
M2C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Machine-to-Cloud
M2M . . . . . . . . . . . . . . . . . . . . . . . . . . . . Machine-to-Machine
MTU . . . . . . . . . . . . . . . . . . . . . . . . . . . . Maximum Transmission Unit
NIST . . . . . . . . . . . . . . . . . . . . . . . . . . . . National Institute of Standard and Technology
OGRE . . . . . . . . . . . . . . . . . . . . . . . . . . . Object-Oriented Graphics Rendering Engine
OMPL . . . . . . . . . . . . . . . . . . . . . . . . . . . Open Motion Planning Library
PCB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Printed Circuit Board
PCD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Point Cloud Data
RGBD . . . . . . . . . . . . . . . . . . . . . . . . . . . Red Green Blue Depth
ROS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Robot Operating System
ROS-I . . . . . . . . . . . . . . . . . . . . . . . . . . . . ROS-Industrial
RRTs . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rapid-exploring Random Trees
SRDF . . . . . . . . . . . . . . . . . . . . . . . . . . . . Semantic Robot Description Format
SSST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Second-shortest-sorting-time
SST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shortest-sorting-time
SwRI . . . . . . . . . . . . . . . . . . . . . . . . . . . . Southwest Research Institute
UDP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . User Datagram Protocol
URDF . . . . . . . . . . . . . . . . . . . . . . . . . . . Unified Robot Description Format
UTD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The University of Texas at Dallas
WAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wide Area Network
XMLRPC . . . . . . . . . . . . . . . . . . . . . . . Extensible Markup Language Remote Procedure
Call
Chapter 1
Introduction
Traditional industrial object-handling robots are dedicated to performing high-volume
production processes, such as retrieving the same part repeatedly from a known position
and orientation, and performing some action on the part. This approach is ideal for large-
scale manufacturing operations. However, these methods are unsuitable for small-scale and
customized manufacturing. National Institute of Standard and Technology (NIST) proposed
a two-year competition to test the applicability of agile industrial robotic systems [3] for
small-scale customized manufacturing. Such agile systems need to be more autonomous and
could be more productive, when compared to traditional high-volume production processes,
and hence could benefit small-scale manufacturers. An agile robotic system would need to
be data-driven and to execute collective-learning processes that take advantage of cloud
infrastructures and high-speed networks.
1.1 Objective
The objective of this work was to develop an agile industrial robotic system that simulates
a real day-to-day factory manufacturing process. Specifically, our goal was to develop
an application that leverages cloud computing, cloud storage, and high-speed network
technologies.
1
Chapter 1 Introduction 2
1.2 Motivation
Industrial robotics have not evolved much to execute new tasks over the last 20 years [4].
Thus, the demand of agile and versatile autonomous robotic systems is increasing. We were
motivated to address this demand by designing a flexible industrial robotics application.
The Amazon Picking Challenge (APC) [5] focused on pick-and-stow operations wherein a
robot recognizes target items, picks items from shelves and places them in shipping boxes.
The APC challenge combined state-of-the-art object recognition, grasp and path planning
algorithms, and targets a warehouse packing process. In contrast, we focused on a more
typical manufacturing process: picking objects from a moving factory conveyor belt, and
sorting the objects by placing different object types in different bins.
Industrial random bin-picking methods have been proposed in the past [6]. The state-of-
the-art bin-picking algorithms have focused on the use of sensing and grasping technologies
applied to picking a single object-type randomly positioned in a container. Some of the
major industrial robot manufacturers, such as ABB, Adept, Fanuc, Motoman and Kuka,
offer some form of random bin picking, either directly or through integrator partners. To the
best of our knowledge, implementations from the robot manufacturers are currently limited
to operating with only one object type with easily distinguishable features, and the objects
are randomly placed in a bin or traveling on a conveyor belt. The limitation to a single,
easily distinguishable object type has prevented widespread adoption of the technology.
This challenge provided us the motivation for creating Gilbreth: an application in
which objects of different types randomly arriving on a moving conveyor belt are first
automatically identified, then picked up by an industrial UR10 robot arm, and finally
sorted into predisposed bins based on the object type. This application combines a number
of tasks that include object detection, object segmentation and recognition, picking pose
alignment and robot poses planning, robot motion planning to generate trajectories, and
robot execution through robot controllers.
The architecture of the system is designed for distributed execution across high-speed
networks, and for leveraging computing cloud infrastructures.

1.3 Key contributions 3
1.3 Key contributions
Our key contributions are as follows: (i) Gilbreth: a new mixed-parts sorting application in
support of industrial manufacturing; (ii) open-source software for the Gilbreth application,
which could be useful in other future conveyor-belt based applications; (iii) performance
characterization on two distinct test-beds to identify the ROS nodes that require paral-
lelization of their execution in order to achieve high pick-and-sort throughput; (iv) new
improved pipelines for object recognition and motion planning, and (v) an implementation
of a Convolutional Neural Network (CNN) object recognition process.
1.4 Thesis organization
Besides this Introduction, this thesis has four chapters. Chapter 2 provides the reader
requisite background information, and reviews related work. Chapter 3 describes the
Gilbreth application, software architecture, message description and prototype of Gilbreth,
a conveyor-belt based pick-and-sort application. Experimental results are presented and
described in this section to evaluate the CPU usage and processing times for various ROS
nodes. Chapter 4 describes improved pipelines for Gilbreth object recognition and motion
plan computation. Further studies were conducted to evaluate the performance of the
improved pipelines, both independently, and after integration into the Gilbreth application.
In addition, the CNN object recognition implementation was evaluated, and found to provide
improved object recognition performance with faster execution times than the original object
recognition method. Chapter 5 lists the conclusions drawn from this project, and proposes
future-work items.
Chapter 2
Background and Related Work
Section 2.1 provides background information relevant for our Gilbreth application, and
Section 2.2 reviews other related work.
2.1 Background
Section 2.1.1 provides an overview of Robotic Operating System (ROS) [7] and Gazebo, a
robotic simulation software package. MoveIt! [1], a state-of-the-art software package used
for motion planning is described in Section 2.1.2. Section 2.1.3 describes the CNN object
recognition algorithm implemented in VoxNet [2], which is used in an improved version of
our Gilbreth application.
2.1.1 Robotic Operating System and Gazebo
ROS is a flexible collaborative framework for robotic software development, and consists of
a large collection of tools, libraries and templates. Generally speaking, creating a robust
and general-purpose robotic application from scratch is challenging. The ROS community
provides a platform where experts all over the world can collaborate together. Using ROS as
the programming platform, research institutes, laboratories and individuals can contribute
their algorithms - also known as ROS packages - to the community and build software
rapidly by leveraging existing modules.
4
2.1 Background 5
Our Gilbreth application leverages many existing ROS and ROS-Industrial (ROS-I)1
packages. The latter are designed specifically for industrial robotic applications. Supported
by the robotics industry and numerous research institutes, ROS-I takes advantage of the
ROS software, but extends its capabilities to industrial manufacturing [4]. ROS-I seeks to
advance agile industrial robotics to accomplish a wide variety of automation tasks. ROS-I
contains a large number of sensor plugins, robot controller packages (ROS defines a standard
format to describe robots, based on which a number of contributors have developed a wide
range of libraries for programming specific robot types), and planning algorithms. Our
Gilbreth application was developed in a short duration by leveraging these ROS and ROS-I
packages.
In addition to the functional modules, ROS offers abundant toolsets for debugging,
plotting and visualization. A process performing computation is called a ROS node. ROS
middleware provides four capabilities for inter-process communication between nodes2 :
Message passing via publishing and subscribing systems
Message recording and playback
Remote procedure calls (RPC) in request-response systems
Distributed parameter server.
A simplified message description language, which helps ROS to automatically generate
source code for different languages, is used to define ROS messages3 . A ROS message is a list
of data-field descriptions as well as constant definitions. Fig. 2.1 illustrates a typical ROS
message data field description; constant definitions are not used in our Gilbreth application.
The field type listed in the left column is a built-in or self-defined description. The field
name, delimited by a space following the field type, describes the name of the data structure.
The description of a data field is not required, but can be included after a comment sign
(#) as shown in Fig. 2.1. The Gilbreth application leverages many built-in ROS messages,
but also defines several new messages.

1
https://rosindustrial.org/
2
http://www.ros.org/core-components/
3
http://wiki.ros.org/msg
Chapter 2 Background and Related Work 6
Figure 2.1: An example ROS message
Figure 2.2: An example ROS service description
ROS nodes exchange messages through ROS topics4 . A ROS node that generates and
publishes information to a specific ROS topic is called a publisher. Any ROS node that
is interested in the published data of a ROS topic can subscribe to the topic and obtain
information. These ROS nodes are called subscribers. Anonymous publishing and subscribing
semantics decompose the connection between message production and consumption. In
this mode of communications, a publishing ROS node does not know the identities of the
subscribers to its topics. Multiple publishers and subscribers can communicate through a
single topic. ROS messages use both TCP/IP and UDP/IP protocols. For reliability, we
used TCP/IP-based ROS topics in Gilbreth for inter-node communications.
Remote procedure calls (RPC) that provide request and reply interactions are called
ROS services. Some inter-node communications in our Gilbreth application use ROS services.
The RPC interactions in ROS are defined by a pair of ROS messages5 . A provider ROS
node registers the service under a namespace (a directory of names), and a client calls the
service by sending a request message and waiting for a reply. Fig. 2.2 illustrates a ROS
service data structure in the Gilbreth application. The client sends a conveyor belt velocity
to the service and waits for the service to reply. The reply indicates whether or not the
velocity was configured successfully.
The distributed parameter server is a shared dictionary to store parameters. The
parameters stored in the server are static and non-binary, and can be retrieved globally by
all ROS nodes. The parameter server, which runs inside the ROS Master, provides naming
4
http://wiki.ros.org/Topics
5
http://wiki.ros.org/Services
2.1 Background 7
and registration services to the rest of the nodes. The parameter server is implemented using
Extensible Markup Language Remote Procedure Calls (XMLRPC), and can be accessed
via network Application Programming Interfaces (APIs) globally by all ROS nodes. The
Gilbreth application uses the distributed parameter server to store and modify configuration
data.
Gazebo, a robotics simulator, was used to simulate and test the factory environment,
which consists of a Universal Robot UR10 robot6 and multiple sensors. Open Dynamic
Engine (ODE)7 is used in Gazebo for dynamics simulation. High-quality rendering is
supported by the Object-Oriented Graphics Rendering Engine (OGRE)8 . A wide range of
sensors and robotic models are included in the Gazebo library. Various command line tools
are available to users, and customized plugins for robot, sensor and environment control can
be programmed in C++ and integrated into Gazebo.
2.1.2 MoveIt!
MoveIt! [1] is a ROS package that implements state-of-the-art motion planning, manipulation
and control algorithms. The UR10 robot is supported by the MoveIt! package, and thus
we leverage this useful platform to implement our motion planning pipeline. Fig. 2.3 shows
the system architecture for the primary node move group. It combines controllers, sensors,
libraries and all the other components together to provide a set of ROS actions (red lines),
services (blue lines) and topics (green lines).
Normal users may connect to the actions and services through user interfaces written in
C++ and python. Rviz, a ROS Graphical User Interface (GUI) plugin, also has access to
the move group action library. ROS parameter server provides two types of information: (i)
Unified Robot Description Format (URDF)9 file, and Semantic Robot Description Format
(SRDF)10 file for robot simulation, and (ii) parameters for move group configuration, such as
joint limits and kinematics information. Topics and actions are used by the move group node
to communicate with the robot. Joint state information is published on the /joint state
6
https://github.com/ros-industrial/universal robot
7
https://www.ode.org/
8
https://www.ogre3d.org
9
http://wiki.ros.org/urdf
10
http://wiki.ros.org/srdf
Figure 2.3: Move group architecture [1]
ROS topic and transform tree data is published on the /tf topic. The robot state publisher
publishes robot transform tree information. The state of the robot and its surrounding
environment is defined as a planning scene in a scene monitor. The scene is maintained
and updated by the monitor. The motion planning algorithm retrieves the information in
the planning scene to compute the trajectory. Fig. 2.4 illustrates how move group interacts
with the planning scene.
Motion planning plugins allows the user to take advantage of multiple open-source
planning libraries, the Open Motion Planning Library (OMPL) [8] for example. Our
application uses the Rapid-exploring Random Trees Connect (RRT-Connect) [9] approach
for motion planning. RRT-Connect was first purposed by Kuffner and LaValle in 2000.
Although it was first designed to move human arms, which is a 7-DOF (Degrees of Freedom)
motion planning problem, RRT-Connect has been successfully applied to other path-planning
problems. Our Gilbreth application uses a 7-DOF robot arm (specifically UR10) to conduct
pick-and-sort actions, and therefore motion planning for the UR10 is similar to a human-arm
motion-planning problem.
Benchmarks to test motion-planning algorithms also offer good reasons to use RRT-
Connect as our primary motion planner. Moll et al. [10] purposed a benchmark infrastructure
for motion planning algorithms, and stated that bidirectional planners (such as RRT-Connect)
tended to be faster than unidirectional planners (such as RRT). A tutorial in International

2.1 Background 9
Figure 2.4: Planning scene pipeline [1]
Conference on Robotics and Automation (ICRA) 2013 also showed that RRT-Connect has a
high solving-rate success with a relatively small computation time [11]. The result provides
solid evidence that RRT-Connect is a good approach; furthermore these experiments were
conducted with the MoveIt! ROS package.
2.1.3 Convolution Neural Network object recognition
Deep learning with Convolutional Neural Networks (CNN) is widely regarded as a powerful
tool for object detection, classification and recognition [12] [13]. Work of Krizhevsky et
al. [14] showed great success in using CNNs in the ImageNet Large Scale Visual Recognition
Challenge. Razavian [15] tested an existing CNN recognition framework, OverFeat [16], on
various datasets and different tasks, and concluded that deep learning with CNNs can be
considered as primary candidates in visual recognition tasks.
The above-mentioned papers studied two-dimensional (2D) object recognition and classi-
fication, while in real-world robotic applications, robust and accurate three-dimensional (3D)
object recognition is more crucial. RGB-D cameras are often used in robotic applications.
Schwarz et al. [17] proposed an algorithm to conduct object recognition and pose estimation
on RGB-D data with pre-trained CNN features. However, they did not use the original
RGB-D data captured by cameras; instead, they used the depth information to re-render
the RGB 2D image for computation. On the other hand, VoxNet [2], proposed by Maturana
and Scherer, resolved the problem by integrating a volumetric Occupancy Grid [18] [19] rep-
Figure 2.5: VoxNet architecture [2]
resentation with a supervised 3D CNN algorithm. The VoxNet architecture is implemented
in our Gilbreth application.
Fig. 2.5 illustrates the VoxNet architecture. Two major tasks of the system are: (i) use
volumetric grid to represent the spatial geometry of the input object, and (ii) predict a
class label directly from the grid using 3D CNN. The four types of layers in VoxNet CNN
architecture works as follows:
The input layer converts point cloud data into occupancy grids. Occupancy grids
have two advantages for 3D CNN recognition: (i) these grids have simple and efficient
data structures, and (ii) these grids allow for efficient space estimation from range
measurements. To better utilize GPUs, an occupancy grid is represented by a small-
volume dense array (32 × 32 × 32) in VoxNet.
The convolutional layers C(f, d, s) convolve the input with f , the number of the output
feature maps, learned filters of shape d × d × d × f 0 and generates f feature maps,
where the spacial dimension of the input volume is d, and the number of the input
2.2 Related Work 11
feature maps is f 0 . The spatial stride parameter s controls how the filter convolves
around the input volume.
The pooling layer P (m) executes a downsampling process by replacing each m ×
m × m block with the maximum number of voxels inside the block, where m is the
downsampling factor.
The Fully Connected Layers F C(n) have n output neurons, each of which is a linear
combination of all the outputs from the previous layers.
VoxNet uses C(32, 5, 2) - C(32, 3, 1) - P (2) - F C(128) - F C(K) as their model, where K is
the number of classes. In the Gilbreth application, we implemented this VoxNet architecture.
2.2 Related Work
Prior work on conveyor-belt based agile robotics focused on two aspects: (i) object perception
and pick pose estimation, and (ii) multirobot coordination. A. Cowley et al. [20] presented
an application that uses a Willow Garage PR2 robot in a conveyor belt pick-and-place
application. However, PR2 is not a typical industrial robot. Y. Huang et al. [21] presented
a study of multi-robot coordination part-dispatching rules in factory conveyor belt product
lines. Bozma et al. [22] formulated a problem for multirobot coordination in a pick-and-place
application for objects on a conveyor belt. The focus of both papers was on the developments
of methods for optimal application performance. Neither focused on software implementation
of the corresponding application, which is our contribution in this paper.
As mentioned before, agile robotics, a relatively new research area with commercial
value, can benefit from cloud computing. “Cloud robotics” was introduced by Google
researcher James Kuffner in 2010 [23] as an approach to make robots “lighter, cheaper
and smarter.” A major constraint of cloud robotics is the possibility of losing network
connections, which may cause operational failures of robots. However, according to Lorencik
and Sincak [24], when compared to the cost of embedded intelligence on-board robots, the
cost of ensuring reliable network connections is lower. Hu et al. [25] proposed a cloud
robotics system architecture that combines an ad-hoc cloud and an infrastructure cloud.
The adhoc cloud uses machine-to-machine (M2M) communications, while the infrastructure
cloud is supported by machine-to-cloud (M2C) communications. In the adhoc cloud, robots
communicate directly with each other and compute collaboratively. In operations that require
M2C communications, infrastructure clouds provide the required storage and computing
resources. According to Hu et al., the M2C level infrastructure cloud computing has two
main benefits: elastic access to compute resources, and large storage that allows for learning
from “the history of all cloud enabled robots.” To conclude, rich computing resources and
big data are the two major reasons why cloud robotics is gaining popularity.
Many applications have been developed for cloud robotic systems. Rastkar et al. [26]
demonstrated a M2C cloud robotics example with a Parallax BoeBot, a robot that has
very limited on-board resources. All processing-intensive tasks, such as image processing,
were handled by cloud computers. The Parallax BoeBot11 is a tiny car that has ultrasonic
sensors, servos and a Basic-Stamp chip on board. The Basic-Stamp chip is a small Printed
Circuit Board (PCB) that contains only the elements essential to controlling the BoeBot.
Instead of using ROS, Rastkar et al. used Microsoft Windows as the operating system, and
the BoeBot was connected to a Windows server via a wireless hub. This work showed how a
limited-resource robot can benefit from cloud robotics, and how cloud robotics can help keep
the size of robots small while operating at high efficiencies. V. Kumar and N. Michael [27]
stated in their work that the biggest challenge of creating small and completely autonomous
Unmanned Aerial Vehicle (UAVs) stems from size, weight and power constraints. UAV
coordination is a typical M2M cloud robotics application, as the latter helps keep UAVs
small, energy-efficient and intelligent at the same time.
B. Kehoe et al. [28] stated that there are four potential benefits for robot and automation
systems from cloud computing: (i)Big Data, (ii) Cloud Computing, (iii) Collective Robot
Learning, and (iv) Human Computation. Big data helps robots take advantage of machine
learning and deep learning. In their prior work [29], a cloud-based grasping application was
demonstrated by combining online Google object recognition engine and cloud storage with
offline grasp analysis and Computer-Aided Design (CAD) model analysis. Such a system
serves as a good example of how big data and cloud storage can help build artificial intelligent
11
https://www.parallax.com/product/boe-bot-robot
2.2 Related Work 13
(AI) robotic systems. Kiva system [30], an Amazon warehouse system that uses mobile
robots to bring inventory to warehouse workers, is a good candidate to test collective robotic
learning. Hundreds of robots in the Kiva system connect to each other, as well as with a
local central coordinating server, through wireless communications. R. Rahimi et al. [31]
demonstrated how an industrial robotics surface blending application, Godel, could leverage
cloud computing. This work studied the effects of wide-area network communications on
the application.
Adding to this volume of work on applications for cloud robotics, this work offers a new
conveyor-belt-based industrial robotics application that combines object recognition and
motion planning.
Chapter 3
Gilbreth Application
3.1 Introduction
This chapter describes a conveyor-belt based pick-and-sort industrial robotic application,
Gilbreth. Section 3.2 describes the sorting process and the environment set up in a robot
simulation software, Gazebo. Section 3.3 describes the software architecture. Section 3.4
describes the ROS packages and the method used to implemented the application. Section
3.6 describes the ROS message communicating between each ROS node and the network
architecture.
3.2 Gilbreth sorting application
The application works as follows. A conveyor belt in a factory floor moves at a constant
speed. Mixed parts (objects) arrive at random on the conveyor belt. As illustrated in
Fig. 3.1, five types of object are simulated. The object pose (position on the belt and
orientation) is not predefined. In other words, different types of objects, in any pose, can
arrive on the belt, into the work space of the robot arm, at any instant in time.
The UR10 robot arm (which has 6 degrees of freedom) is used in this application. Since
the robot arm is mounted on a linear actuator that runs parallel to the conveyor belt, our
motion planning algorithms have to design joint trajectories for 7 degrees of freedom. A
vacuum gripper is connect to the end of the robot arm as our end effector. The robot arm is
14
3.2 Gilbreth sorting application 15
Figure 3.1: Gilbreth setup: sensors, objects, conveyor belt and UR10 robot arm
instructed to pick up an object arriving on the conveyor belt, move to the position of the
bin corresponding to the object type, and place the object in that bin. Thus the objects
arriving on the conveyor belt are sorted, by type, into separate bins. For example, all piston
rods are placed in one bin and all gears are placed in another bin.
Since each arriving object can be of a different type, sensors are used to capture point
cloud images of the objects, and object recognition algorithm is used to identify the object
type.
The application uses two sensors: (i) Break beam sensor and (ii) Kinect sensor. A break
beam sensor is placed such that the laser beam crosses the conveyor belt at a low height
above the belt. When an object arrives on the conveyor belt, the break beam sensor is
triggered. This signal starts a chain of computation, which is described in the next section.
A Kinect sensor is placed above the conveyor belt in order to capture point cloud images of
the arriving objects. The Kinect sensor is mounted slightly before the break beam. When
an arriving object triggers the break beam, the Kinect sensor can capture the whole point
cloud data of the object and sent the data for image processing and object identification.
For each identified object, five end effector poses: (i) Waiting pose, (ii) Pick-approach
pose, (iii) Pick pose, (iv) Pick-retreat pose and (v) Place pose, are calculated to complete
Chapter 3 Gilbreth Application 16
one picking assignment. A motion planning algorithm computes valid trajectories between
various poses of the end effector, also known as tool poses, in order to move the robot arm
on the linear actuator and execute its end effector to pick and place the object.
In summary, the tasks of point cloud image capture, image transfer, object segmentation,
object recognition, computation of robot arm end effector poses, robot arm linear actuator
positions, and motion planning to generate joint trajectories, need to be completed in
real-time. The faster the velocity of the conveyor belt, the shorter the duration for all this
computation, which makes this Gilbreth application a good candidate for high-performance
computing and high-speed communications.
3.3 Gilbreth software architecture
The Gilbreth software architecture, illustrated in Fig. 3.2a, consists of multiple ROS nodes,
which are described below. Each ROS node defines various ROS messages and communicates
between different ROS topics. ROS messages and ROS topics are described in section 3.6.
Gilbreth manager The main purpose of this ROS node is to monitor the status of an
object as it is handled by the various ROS nodes of the workflow. The main data exchanges
occur directly between ROS nodes, while only status updates are sent to the Gilbreth
manager. Each ROS node checks for interrupts from the manager in case there are problems
with the actions of the previous ROS node in the workflow. This complexity is captured in
the per-object Finite State Machine (FSM) that is maintained by the Gilbreth manager, as
illustrated in Fig. 3.2c.
An alternative design is to have all inter-ROS node data transfers pass through the
Gilbreth manager, as illustrated in Fig. 3.2b. This design would be simpler as it avoids the
need for interrupts. However, large size data, for example point cloud image, will transported
multiple times between the data center and the factory. This design would be less suitable
than our current design for a distributed execution in which the lower set of ROS nodes
shown in Fig. 3.2a are run on hosts at remote cloud-computing data centers.
3.3 Gilbreth software architecture 17
(a) Gilbreth software architecture
(b) Software architecture with main workflow manager
(c) Per-object finite state machine
Figure 3.2: Gilbreth software: Per-object workflow starts at Object-Arrival Detection

Object-Arrival Detection This ROS node receives the break beam disruption signal
from the sensor, which occurs every time a new object cross the break beam, i.e., enters
the workspace of a robot arm. Upon receiving the disruption signal, this ROS node sends a
signal to the Kinect sensor and records the time stamp, also known as the detectionTime,
to identify objects until a proper object identifier is assigned to the object, which can only
happen after completing the object recognition phase.
Kinect Publisher This node receives the signal from the Object-Arrival Detection node
and sends out messages containing the point cloud data. The Kinect camera, mounted
above the conveyor belt, continuously captures images of the conveyor belt at 15 Hz. In real
word, the camera sends this data to the host to which it is connected via a USB port. In
our simulation, the captured image is sent to a ROS topic to be subscribed by other ROS
nodes. To avoid unnecessary data communications and computation, the Kinect Publisher
collects the point cloud data and publishes the data only when it receives a signal sent
from Object-Arrival Detection ROS node. The detectionTime parameter is passed along
with the point cloud data. In addition to publishing the PointCloud ROS topic for future
computation, the Kinect Publisher sends a message with detectionTime to a ROS topic,
subscribed by the Gilbreth manager (the point cloud data is not sent here since this data
could be large and is not required by the Gilbreth manager).
Object Segmentation Upon receiving point cloud data from the Kinect Publisher, the
Object Segmentation ROS node executes object segmentation algorithms. It first removes
background information from the collected point cloud data by filtering out the unnecessary
data out of a certain area. The desired area is specified in a configuration file, which is shown
in Table 3.1. The Object Segmentation ROS node next down samples the filtered point cloud
data by a down sample parameter. Image size is largely reduced by this process. After the
down sampling, we use euclidean cluster extraction to extract point cloud blobs. If multiple
blobs are detected, the Object Segmentation node will queue these data blobs, and send the
blobs in succession, along with the detectionTime parameter, to the Object Recognition
ROS node in a ROS topic. The Object Segmentation node also sends the detectionTime
3.3 Gilbreth software architecture 19
with blob identifiers to a ROS topic which is subscribed by the Gilbreth manager (the point
cloud data is not sent here since this data could be large and is not required by the Gilbreth
manager). Fig. 3.10 shows the point cloud image after object segmentation of five object
type, illustrating that the background scene information is removed after filtering.
ROS node Parameter Value

Object Segmentation camera roi x:[-0.225, 0.192], y:[-0.287, 0.284], z:[0.0,
0.7162]
Object Segmentation down sample 0.01
Table 3.1: Object segmentation ROS node configuration parameters
Object Recognition Correspondance grouping [32] is used in the Object Recognition
node to compare each received object blob with the point cloud data stored for known objects
in its database to find the closest match. If the object-recognition algorithm identifies the
type of the arriving object with reasonable accuracy, it uses iterative closet point algorithm,
also known as ICP, to compute and align the objectPickPose, which is a valid pickup pose
for grasping the object. This computation requires the object-arrival pose (position and
orientation on the conveyor belt), and the stored value of the “pickPoint” for grasping the
object if it had arrived in the same orientation as the object for which the stored PCD data
was created.
If no match is found, the Object Recognition ROS node will not publish any ROS
message, the object is just allowed to pass through the robot workspace with no action
executed by the robot. In practice, such misses should not occur if the database is populated
well; however, such error-tolerant procedures are required to avoid adverse consequences.
The Object Recognition ROS node assigns a unique objectIdentifier to each recog-
nized object. The ROS node then publishes the detectionTime parameter, objectIdentifier,
objectType, and the objectPickPose, for each recognized object, to the ObjectDetection
ROS topic, to which the Robot Poses Planner ROS node subscribes. At the same time, the
Object Recognition ROS node sends the detectionTime parameter, and object type and
identifier via a ROS topic to the Gilbreth manager. This allows the latter to track the state
of the object as it passes through various ROS nodes.

Robot Poses Planner This ROS node has a database indicating the bin location cor-
responding to each object type and a fixed base position (which is also the object pickup
position). Each object type has its unique lift and drop offset since the size of object various.
Using the objectType in the received message on the ObjectDetection ROS topic, this
Robot Poses Planner ROS node consults the database to find the dropoff position and the
pick-and-place offset. The node then combines the fixed base position, dropoff position,
and the objectPickPose information in the received message to compute five poses for the
robot arm end effector: pick approach pose, pick pose, pick retreat pose, place pose and
home pose. The computed robot poses, along with detectionTime and objectIdentifier,
are sent on a ROS topic to the Robot Motion Planner ROS node. In addition, a message is
sent to the Gilbreth manager via a ROS topic to update the latter about the status of the
object in the handling process.
Robot Motion Planner This ROS node receives the tool poses and computes valid joint
trajectories to move the robot arm and end effector between the various robot poses. It
also computes the right time instant at which the vacuum gripper attached to the robot
arm should be enabled or disabled by combining the object detectionTime and the offset
stored in the database. RRTConnect algorithm is used to compute the trajectory. After the
computation, a serious of joint trajectories is queued for robot to execute.
Robot Execution This ROS node interacts with the Robot Controller to execute the
planned motion. Robot trajectories are queued and executed one-by-one to enable the robot
arm to pick up the target object, move to the dropoff position, place the object in the bin
located at the dropoff position, and finally move back to the home position.
3.4 Gilbreth software prototype
The Gilbreth application software is available at: https://github.com/swri-robotics/gilbreth.
Our starting point was to download and install ROS, specifically the Kinetic version.
Since we do not have access to a real factory floor with a conveyor belt and a UR10 robot
3.4 Gilbreth software prototype 21
arm, we used Gazebo, a dynamic multi-robot simulator for 3D enviroments1 , to develop the
Gilbreth application.
In addition to ROS and Gazebo, we installed the following packages:
ARIAC2 : The ARIAC osrf gear package includes a world file that is composed of
many models, such as a work cell model (which simulates the factory floor), sensor
models, and robot models, with corresponding plugins for Gazebo. Of these models
and plugins, for our Gilbreth application, we downloaded the source files (C code) for
the conveyor belt, laser break beam sensor (called ProximityRay plugin), and vacuum
gripper end effector, and corresponding model files, which specify input parameters
required by the plugins in Simulation Description Format (SDF)3 .
Gazebo ROS4 : Gazebo ROS and related packages provide wrappers for Gazebo to
support interfacing between external ROS nodes and Gazebo. One issue is that ROS
uses Universal Robotic Description Format (URDF), an XML file format to describe
all elements of a robot, while Gazebo uses SDF. However tutorials are provided to
enable the use of URDF files in Gazebo5 .
MoveIT!6 : This package offers state-of-art robot manipulation, motion planning and
control software. We used this package in our Gilbreth application for motion planning,
with collision detection, of the UR10 robot arm.
Universal Robot7 : This ROS-Industrial (ROS-I) package contains the models, configu-
rations and control systems required for Universal Robots. In addition to the robot
drivers (written in Python), kinematics software (written in C++), configuration files to
interface with the MoveIT! package, YAML files and XML based launch files required
to simulate a UR10 robot arm in Gazebo, are included. This package supported
integration with the MoveIT! package.

1
http://gazebosim.org/; https://bitbucket.org/osrf/gazebo
2
https://bitbucket.org/osrf/ariac/src
3
https://bitbucket.org/osrf/sdformat
4
http://wiki.ros.org/gazebo ros pkgs
5
http://gazebosim.org/tutorials/?tut=ros urdf
6
http://moveit.ros.org/
7
https://github.com/ros-industrial/universal robot
Point Cloud Library (PCL) [33] includes several packages that were used in our
object segmentation and object recognition ROS nodes. For example, we used the
Hough3DGrouping package [32], which implements a 3D correspondence grouping
algorithm to recognize objects in a scene.
Gilbreth environment (world) in Gazebo Our first task was to create a factory envi-
ronment in Gazebo for our Gilbreth application. Fig. 3.1 shows the simulated environment.
The break beam sensor can be seen as a blue line across the conveyor belt. The 3D camera,
Kinect sensor, is the gray box shown mounted above the belt. Relative to the Kinect sensor
position, the break beam is positioned such that the object is within the range of the camera
when the beam is crossed by the object, thus a full image of the object is captured. The
linear actuator on which the UR10 robot arm moves can be seen in white parallel to the
belt. The vacuum gripper (end effector) is shown at the end of the robot arm. The bins into
which different types of objects are dropped by the robot arm are mounted along the linear
actuator.
Five objects, each of a different type, are shown arriving on the conveyor belt in Fig. 3.1.
We implemented an external ROS node called Conveyor-Spawner to spawn objects of
different types on to the belt in the Gilbreth world in Gazebo. Object type, inter-object
spacing, and object orientation are selected at random. The Conveyor-Spawner node also
recycle the object dropped into the bin and reached to the end of the conveyor belt.
Object-Arrival Detection and Kinect Publisher We implemented new software for
both these ROS nodes. Both nodes subscribe to sensor topics as described in Section 3.6.
Since these sensors are simulated inside Gazebo, these two ROS nodes receive messages from
Gazebo.
Object Segmentation and Object Recognition These ROS nodes use the Correspon-
dence Grouping (CG) algorithm; more specifically, we used the Hough voting [32] algorithm
implemented in Point Cloud Library (PCL). Fig. 3.3 illustrates the various steps. Models
are object Point Cloud Data (PCD) files that are saved on disk. These files provide 3D
descriptions of each object type. The Models pipeline is performed only once for each object
Figure 3.3: Object segmentation and object recognition
type to generate features and frames. This pipeline includes normal computation, key points
extraction, key points descriptor computation, and key points reference frame computation.
The Scene block represents 3D images captured by the Kinect sensor. Any single image,
in addition to the object of interest, also includes background information, such as the
conveyor belt and warehouse floor, as these items fall with the range of the depth camera.
The segmentation process removes this background information to isolate the image of the
objects on the belt. The rest of the Scene pipeline (i.e., blocks after the Segmentation node)
is the same as the Models pipeline; the only difference is that the Scene pipeline is executed
for each captured scene, while the Models pipeline is executed only once at the start to
calculate necessary data for feature matching and correspondence grouping.
Once the key points descriptors are computed for a captured object PCD, a group
of one-to-one correspondences are checked between stored information in Models and the
captured information in Scene. Next, correspondence grouping is performed to exclude
wrong point-to-point correspondences due to nuisance factors like noise. The object type
in Models with the highest number of correspondences is reported as the type of object in
Scene.
Once object type is identified, Iterative Closet Point (ICP) is used to estimate the pose
(position and orientation) of the object in the Scene. For each type of object, one picking
pose is pre-defined and saved in database. The correspondence grouping procedure also
output an estimated object pose, however it has has low-precision accuracy and therefore
it is not suitable for pick pose estimation. ICP is used to minimize the distance between
Model PCD and Scene PCD and thus producing the an estimated pose for the object. The
estimated pose is used to transfer the pre-defined picking pose.
Robot Poses Planner We considered two pick-up policies for our Gilbreth application:
static-position pickup policy and dynamic-position pickup policy. Fig. 3.4a illustrates the
static-position pickup policy. With this policy, the robot arm always picks up objects
from the belt at the same position. After the robot arm picks up an object, moves to the
appropriate bin (as determined by object type) and places the object in that bin, the arm
returns to its static pickup position and waits for the next object.
(a) Gilbreth static-position pickup policy
(b) Gilbreth dynamic-position pickup policy
Figure 3.4: Gilbreth object picking policies
Fig. 3.4b presents a more complex dynamic-position pickup policy in which the robot
“chases” the next available object after placing the previous object in its appropriate bin.
This policy is only feasible if the linear actuator can move faster than the conveyor belt.
Assume that the conveyor-belt velocity is vbelt , and the linear-actuator velocity is vLA . Thus
the object moves at speed vbelt , and the robot arm moves at speed, vLA . At time tdrop , when
the robot arm drops an object into its bin, let its position be xrobot . Let the position of
the next candidate object O, for pickup at this time tdrop be xO . This ROS node computes
whether pickup of object O is feasible, and if so, it determines the pickup position, xpickup .
xO vLA − xrobot vbelt

xpickup = (3.1)
vLA − vbelt
If xpickup is within the maximum range of the robot arm, then the pickup is feasible; if not,
the robot arm does not chase object O, and instead allows the next robot arm along the
belt to pick up object O.
While the dynamic-position pickup policy is more complex, it is expected to result in
a higher object sorting throughput when compared to the static-position pickup policy.
Our current implementation is for the static-position pickup policy. However, we plan to
implement the dynamic-position pickup policy in a later version of Gilbreth.
We used configuration files to set parameters and limitations relating to the robot arm.
Parameters include per-object-type bin positions, the height that the robot arm should be
lifted after it picks up an object, and the time duration required for transition between
different robot poses. These parameters were tested and tuned to achieve better performance.
Robot Motion Planner The MoveIt! ROS package is used in this ROS node. MoveIt!
software integrates planners from Open Motion Planning Library (OMPL) [8]. Of these
planners, we chose to use the Rapidly-exploring Random Trees (RRT) Connect planner.
RRT Connect planner has high trajectory solving success rate and low computation time8 .
Fig. 3.5 illustrates the pipeline of motion planning process. When the Robot Motion Planner
receives the tool poses, a chain of computation began. It use current joint states of the robot
and target tool poses to compute the robot trajectories. If the robot trajectory computation
failed, the motion planner re-compute the trajectory if it does not reach the max re-planning
iteration parameter. If the robot trajectory computation succeed, the motion planner will
8
http://moveit.ros.org/assets/pdfs/2013/icra2013tutorial/ICRA13 Benchmark.pdf
Figure 3.5: Motion planner pipeline
monitor the execution status of the trajectory. If the trajectories executed successfully, the
ROS node will continue next planning, otherwise, failure status will be published.
3.5 Gilbreth software prototype evaluation
Experimental setup A single host was used to execute all components of the Gilbreth
application. This host has a single 4-core (8 threads) CPU (Intel Core i7-4710MQ@2.50GHz)
and 12 GB RAM. The OS used is Ubuntu 16.04.2 LTS with kernel version 4.10.0-37-generic.
To measure per-ROS-node CPU usage, we wrote a Python script that calls functions in
psutil9 , which is a Python library. These calls are executed in periodic intervals, where
the period is specified by the input frequency parameter. The results are collected in a csv
file and analyzed. In addition, to collect processing times for high-CPU-usage ROS nodes
such as the Object Recognition and Motion Planner nodes, the source code of these nodes
were modified.
Experiments execution To initiate all the ROS nodes of the Gilbreth application,
multiple launch files are used. The environment launch file instantiates the Gilbreth
environment in Gazebo and RViz. After the environment is created, we load parameters and
launch Kinect Publisher, Object Segmentation, Object Recognition, Robot Poses Planner,
and Robot Motion Planner nodes.

9
https://github.com/giampaolo/psutil
3.5 Gilbreth software prototype evaluation 27
Table 3.2 lists the parameter values. The inter-object spawning period is a uniformly
distributed random variable between 7 and 8 sec. The CPU-usage monitoring period is used
by our data-collection Python script. An object yaw variance of 180◦ implies that an object
can arrive in any orientation relative to the belt movement direction. The four object types
shown in Fig. 3.1 are used in this experiment. For each run of the experiment, 500 objects
were spawned, and measurements were collected for the CPU usage and processing times for
each object. A Gazebo real-time factor of 0.4 means that 0.4 sec in Gazebo requires 1 sec of
wall time to simulate. These low values of the real-time factor illustrate a need to parallelize
Gazebo.
Two ROS services are initiated to start spawning random objects on to the moving
conveyor belt. The application execution automatically starts when these two services
are initiated. When all the ROS nodes are running, we launch our Python script to start
collecting data.
Parameter Symbol Value

Inter-object spawning period tspawn [7, 8]sec
CPU-usage monitoring period tcpu 0.3 sec
Object yaw variance ∆deg 180◦
Object type Nobj 4
Number of objects/experiment Nexp 500
Gazebo real time factor R 0.4-0.6
Table 3.2: Values of experiment parameters
Experimental results From the collected data, ROS nodes can be classified into three
groups: (i) high-CPU usage, (ii) low-CPU usage, and (iii) burst-CPU usage. The high-CPU
usage ROS nodes always consume more than 100% CPU (the maximum value is 800% since
our server has 8 threads). The low-CPU usage ROS nodes always consume less than 50%
CPU. Finally, the ROS nodes in the burst-CPU usage group require high CPU resources
only in small bursts.
Table 3.3 shows the classification of all major ROS nodes in Gilbreth. Gazebo requires a
sustained level of approximately 200% CPU cycles. CPU usage is below 50% for the Kinect
Publisher, Robot Motion Planner and Robot Poses Planner. The Robot Motion Planner
creates a Move Group ROS node to compute the robot joint trajectories. The Move Group
node has a burst CPU usage during its computation.
CPU Usage Node Type

High Gazebo
Low Kinect Publisher, Robot Poses Planner, Robot Motion
Planner
Burst Object Recognition, Move Group, Object Segmentation
Table 3.3: ROS-node classification based on CPU usage
(a) CPU usage of selected ROS nodes

(b) Robot motion planner processing time
Figure 3.6: CPU usage and processing time for one ROS node
Fig. 3.6a shows the CPU usage for Move Group (blue), Object Segmentation (green) and
Object Recognition (red) nodes for an experimental run of 30 seconds in which four disk
objects were processed. Of these three ROS nodes, the Object Recognition node consumes
the most CPU cycles. The peaks occur when the code leverages parallel processing for
feature extraction.
Next, we executed runs in which all 500 objects were of the same type in order to
generate the boxplots shown in Figs. 3.6b and 3.7a. Fig. 3.6b shows that motion planning
for a single-robot arm setup is not computationally intensive. However, object recognition
time is significant as seen in Fig. 3.7a. The object recognition times for the four objects are
directly related to the size of the objects, with the pulley being the largest and the piston
rod being the smallest.
Fig. 3.7b shows the total time for the robot arm to execute its motions between the
four poses. These results were obtained with 1000 executions of the Robot Motion Planner
3.6 Gilbreth message description and experiments 29
(a) Object recognition processing time (b) Robot execution time
Figure 3.7: Compare object recognition time with physical robot arm movement time
for each object type. For each type of object, the four poses are the same. However, the
output shows that the trajectories computed by the motion planner is not relatively stable.
The amount of the outliers is very high. We propose an improvement in the next chapter.
Another interesting point to note is that the recognition time for the larger objects has a
median value that is close to, or even larger, than the time taken to execute the robot arm
movement. The implication is that in a multi-robot system, we will require multiple servers
or faster servers to run the object recognition ROS nodes.
3.6 Gilbreth message description and experiments
The ROS framework uses a publish-subscribe method for communications. Publishers
broadcast their messages on ROS topics. Subscribers sign-up for topics a priori based on
their requirements, and thus receive the corresponding published ROS messages. While ROS
topics are used for unidirectional communication, ROS services are used for bidirectional
communication. While each ROS topic is associated with a single message type, ROS services
have two message types (request and reply). Section 3.6.1 describes the Gilbreth message
flow, and Section 3.6.2 describes the experiments we conducted to quantify the sizes of the
Gilbreth ROS messages.

3.6.1 Gilbreth message description
The Gilbreth application is designed for distributed architectures, and hence has numerous
ROS nodes. Fig. 3.8 shows the Gilbreth ROS nodes and ROS topics flow chart characterizing
the communication patterns. Only those ROS topics that transfer data values necessary for
the main pick-and-sort functionality are described below.
Figure 3.8: Gilbreth message communication flow; ROS topics: blue; ROS services: red
Gazebo Gilbreth Module This ROS node simulates the factory environment illus-
trated in Fig. 3.1. Fig. 3.8 shows the five major components in the Gilbreth module of
the Gazebo simulator: Kinect Camera, Break Beam, Vacuum Gripper, UR10 Robot, and
Bins-and-Invisible Wall. The Kinect-camera component monitors the conveyor belt
and publishes captured depth images to the ROS topic /depth camera/depth/point. The
message type for this ROS topic is described in the ROS package sensor msgs/PointCloud2.
The state of the break-beam sensor, illustrated in Fig. 3.11a, is described in ROS package
gilbreth gazebo/Proximity and published to ROS topic /break beam sensor change.
The min range and max range data values indicate the detection range of the break
beam. The Header describes the time when the ROS message is published. The boolean
value object detected changes to True when an object enters the detection range of
the break-beam sensor. Gazebo publishes this ROS message whenever the boolean value
object detected changes state, which occurs both when an object enters the range and
when the object leaves the range of the break-beam sensor.
The vacuum-gripper component publishes its status to ROS topic /gripper/state. The
vacuum gripper state message is described in ROS package gilbreth gazebo/VacuumGripper
State. The message, shown in Fig. 3.11b, characterizes whether or not the vacuum gripper
is enabled, and if enabled, whether or not there is an object attached to the gripper. In
other words, this message indicates whether or not the vacuum gripper has successfully
grasped an object.
The UR10-robot component publishes its messages to ROS topic /joint states. The
associated message type is described in ROS package sensor msgs/JointState. The
message carries data on the states of all the robot joints. The state of each joint, shown in
Fig. 3.11c, is specified by the name, position, velocity and effort of the joint, and the time
when the joint state was recorded. The UR10 robot also communicates with the move group
ROS node via two ROS services: /robot rail controller/ and /robot arm controller.
The messages sent in these services include data on the: (i) status of the robot controllers,
(ii) feedback on the move instructions, and (iii) success or failure of the move operations.
The Bins-and-Invisible-Wall component is built to recycle objects. Five bins are placed
inside the robot’s workspace, and an invisible wall is mounted at the end of the conveyor
belt to catch missed objects. This component publishes the pose and the type for both
captured and missed objects to ROS topic /disposed models10 . The associated message
type, described in ROS package gazebo msgs/ModelStates and illustrated in Fig. 3.11e,
provides information for the conveyor spawner ROS node to spawn new objects within the
Gazebo Gilbreth environment.
Kinect Publisher This ROS node was implemented in a new ROS package called
gilbreth perception. It subscribes to the ROS topics /depth camera/depth/point and
/break beam sensor change, as illustrated in Fig. 3.8. The Point Cloud Data (PCD) cor-
responding to a single image is sent on ROS topic /kinect points, which is described in
10
The term ”model” is synomymous with ”object.”
(a) Piston rod (b) Gasket (c) Pulley (d) Disk (e) Gear
Figure 3.9: Depth images of 5 types of objects
(a) Piston rod (b) Gasket (c) Pulley (d) Disk (e) Gear
Figure 3.10: Depth images of 5 types of objects after object segmentation
package sensor msgs/PointCloud2. The associated message is sent only when the message
received on the /break beam sensor change topic indicates that one-or-more objects en-
tered the detection range of the break-beam sensor. In addition to the PCD, a time stamp
indicating when the break-beam sensor triggered is included in the message.
Robot State Publisher This new ROS node, obtained from its eponymous ROS package,
was integrated into Gilbreth this year. The node reads the robot description parameters from
the ROS parameter server, described in section 2.1.2. The parameters are loaded at the start
of the application execution. This ROS node subscribes to the ROS topic /joint states, as
illustrated in Fig. 3.8. After computing the forward kinematics for the robot, this ROS node
publishes the results via a transform tree, described in ROS package tf2 msgs/TFMessage,
to ROS topic /tf. The transform tree, part of which is illustrated in Fig. 3.11d, is used to
transform points and vectors between two coordinates.
Conveyor Spawner We implemented this new ROS node to spawn objects, whose type
and pose are selected randomly, onto the conveyor belt. The software for this ROS node
is included in the gilbreth gazebo package. This ROS node subscribes to the ROS topic
/disposed models, whose message is illustrated in Fig. 3.11e. This message provides
(a) Break beam sensor change ROS message
(b) Gripper state ROS message
(e) Disposed model ROS message

(c) Joint state ROS message
(f) Tool pose ROS message
(g) Object detection ROS message
(d) Transform tree ROS message
Figure 3.11: Gilbreth ROS messages

information about the objects disposed into the bins or stopped by the invisible wall. For
each disposed model, this ROS node spawns a new object. The node then publishes the
spawned-object type and the object-spawning time stamp in a message to a ROS topic (not
shown in Fig. 3.8).
Object Segmentation This ROS node subscribes to the ROS topic /kinect points.
The segmentation process removes the background information in each received depth image,
and publishes the filtered PCD, described in sensor msgs/PointCloud2, to ROS topic
/segmentation result. Fig. 3.10 shows the depth images after the object-segmentation
processing for each of the five object types shown in Fig. 3.9.
Object Recognition This ROS node subscribes to ROS topics /segmentation result
and /tf. The transform tree data is used to compute the object pick pose from camera coor-
dinates to world coordinates. If the object is recognized, the Object Recognition ROS node
publishes the object detection message, described in ROS package gilbreth msgs/ObjectDe-
tection, to ROS topic /recognition result. The message, shown in Fig. 3.11g, provides
the detection time, the type and the pick point pose of the object.
Tool Planner This ROS node subscribes to the ROS topic /recognition result and pub-
lishes the tool poses message, described in ROS package gilbreth msgs/TargetToolPoses,
to ROS topic /target tool poses. The ROS message, depicted in Fig. 3.11f, shows one of
the four tool poses for an object. The Header field describes the execution deadline and the
pose field describes the position and orientation of the UR10 robot gripper for the particular
object.
Robot Execution This ROS node subscribes to ROS topics /target tool poses and
/gripper/state. The ROS node calls the functions in MoveIt! library to compute robot
trajectories, and the MoveIt! library automatically generates the Move group wrapper
ROS node.
Move Group Wrapper This ROS node is the python wrapper of the C++ move group
library. The Move Group Wrapper ROS node creates the Move Group ROS node and
interacts with the latter via ROS services /execute trajectory/*. The associated message
types are described in ROS package actionlib msgs. The Move Group Wrapper ROS node
monitors the status, and the results of the robot trajectory execution.
Move Group This C++ ROS node Move Group was obtained from ROS package MoveIt!.
This node computes the trajectories between different tool poses for the robot and its
vacuum gripper to move into position, pick up an object from the conveyor belt, and
place the object in a bin based on its type. This node also controls the robot simulated
in the Gazebo Gilbreth Module. The computed trajectories, described in ROS package
trajectory msgs/JointTrajectory, are published to ROS topic /robot arm controller/
follow joint trajectory/ and /robot rail controller/follow joint trajectory/.
The Gazebo Gilbreth Module and the Move group ROS nodes interact through ROS
services /robot arm controller/command and /robot rail controller/command, moni-
toring the status of the robot controllers, the feedback on the move instructions, and the
success or failure of the move operations.
3.6.2 Experiments
We planned and executed experiments to quantify the sizes of the Gilbreth ROS messages.
Experimental setup The experimental setup was as follows. A single host was used
to execute all components of the Gilbreth application. This host has a single 4-core (8
threads) CPU (Intel Core i7-4710MQ@2.50GHz) and 12 GB RAM. The OS used is Ubuntu
16.04.2 LTS with kernel version 4.10.0-37-generic. To measure the sizes of the Gilbreth ROS
messages, we wrote a python script, msg monitor, that subscribes to the desired ROS topics,
and measured the memory usage of each ROS message. We leveraged Pympler11 , a Python
library, specifically, the asizeof module, which measures the memory consumption of each
ROS message.
11
https://github.com/pympler/pympler
Experimental execution and results After launching the Gilbreth application, we
launched the python script msg monitor to subscribe to different ROS topics, and measure
the sizes of ROS messages. The inter-object spawn period was varied randomly in the range
6-8 seconds. Gilbreth ROS messages can be classified into two categories based on their
publishing patterns: (i) triggered messages, and (ii) periodic messages. Table 3.4 shows
the triggered ROS messages. The bandwidth consumed by each triggered message is not
quantified because it largely depends on the inter-object spawning period, and the successful
completion of the previous ROS nodes in the per-object processing timeline. Table 3.5
describes the periodic ROS messages, i.e., those that are published continuously with a
certain frequency. The bandwidth measurement was recorded using a ROS built-in command
line tool rostopic bw. The sizes of both types of ROS messages were recorded in csv files
for later analysis.
The majority of the ROS messages have constant message sizes. For example, the point
cloud image of an object, published by the Kinect Publisher ROS node, is consistently
9.8 MB. The size of the point cloud data is determined by the resolution of the Kinect
Camera. Meanwhile, the Object Segmentation ROS node publishes messages of varying
size. The message size depends upon the PCD being processed, which in turn depends on
the object type and background information. Hundred objects of each type were spawned in
the experiments to compute the average message size per object type.
ROS topic Message type Message Size

break beam change gilbreth gazebo/ 48 B
Proximity
kinect points sensor msgs/ 9.8 MB
PointCloud2
segmentation result sensor msgs/ gear:13.2 KB, disk:18.2 KB, pulley:14.5
PointCloud2 KB, piston rod:5.3 KB
recognition result gilbreth msgs/ 1.3 KB
ObjectDetection
target tool poses gilbreth msgs/ 3.9 KB
TargetToolPoses
disposed models gazebo msgs/ 129 B
ModelStates
Table 3.4: Message size of triggered ROS message

3.7 Conclusions 37
ROS topic Message type Message Size Bandwidth Frequency

gripper/ state gilbreth gazebo/ 2B 321.32 B/s 160 Hz
VacuumGripper-
State
joint state sensor msgs/ 0.38 KB 19.13 KB/s 50 Hz
JointState
tf tf2 msgs/ TFMes- 0.80 KB 24.35 KB/s 30 Hz
sage
Table 3.5: Message size of periodic ROS message
The experimental results showed that the largest triggered ROS message was the object
PCD published by the Kinect Publisher ROS node. The Gazebo-simulated Kinect camera
publishes point cloud data at a frequency of 15 Hz. By implementing the Kinect Publisher
ROS node, which only transmits the PCD upon receiving an indication from the break-beam
sensor of a state change, we significantly decreased the network bandwidth consumed by the
Amongst periodic ROS messages, the tf topic messages consumed the highest bandwidth.
However, the total bandwidth used by the Gilbreth application was still very low.
3.7 Conclusions
This chapter demonstrated the feasibility of implementing a pick-and-sort application in
which computer vision is combined with sophisticated motion planning to enable an industrial
robot arm to position itself and its end effector to successfully pick up objects from a moving
conveyor belt, move to the correct bin (based on object type) and place the object in the
bin. Details are provided on how we combined software from various ROS and ROS-I
packages for rapid application development. Finally, experiments were conducted to evaluate
the prototype. The Gazebo simulator of the factory environment, the object-recognition,
object-segmentation, and Move Group ROS nodes required the most amount of computing
cycles. We found that the robot execution time for picking up, moving and placing an object
in its right bin, takes approximately the same amount of time as the processing time required
for object recognition. This finding shows a need to improve the object-recognition pipeline.
Based on our Gilbreth ROS messaging experiments, it appears that the Gilbreth applica-
tion has a small network bandwidth requirement, and therefore, deployment of Gilbreth ROS
nodes across a Wide Area Network (WAN) may be feasible. However, on closer analysis,
we observe that only some ROS nodes can be located remotely on a cloud-computing data
center that is connected by a WAN to the factory floor.
We identified two types of ROS messages, triggered and periodic. The triggered ROS mes-
sages are mostly of fixed size. The Object Setmentation, Object Recognition and Tool
Planner ROS nodes can be deployed on the cloud to take advantage of high computational
resources. In contrast, the Kinect Publisher ROS node subscribes to a high-bandwidth
ROS topic from the Gazebo Kinect Camera plugin. Therefore, this ROS node should be
deployed near or on the factory floor to avoid unncessary WAN communications. If the
inter-object spawning period is 10 seconds, and the Gazebo Kinect Camera monitors the
conveyor belt and publishes its PCDs at 15 Hz, a total of 150 PCDs are generated for one
object. Among these 150 PCDs, only one PCD is required for object segmentation and
object recognition. If the Kinect Publisher ROS node is run on a cloud computer, 99% of
PCDs sent to the cloud would be unnecessary data.
For the ROS nodes that subscribe to periodic ROS messages, latency is our dominant
concern. Most of the periodic message sizes are much smaller than the Ethernet Maximum
Transmission Unit (MTU) packet size of 1500 bytes. Thus, most periodic ROS messages can
be sent within a single packet. The implication of this observation is that most messages will
require a single TCP round-trip in the Slow Start phase on an open TCP connection. However,
most of the periodic messages have a high frequency; for example, the gripper/state ROS
message has a 160 Hz frequency. Since the Robot Execution, Move Group and Move Group
Wrapper ROS nodes subscribe to these periodic messages, locating these ROS nodes across
the WAN in a cloud computing data center will entail needing more network bandwidth.
Chapter 4
Gilbreth Prototype Improvements

and Evaluation
4.1 Introduction
Improvements were made in the object-recognition and in the motion-planning pipelines.
Fine tunings on object-recognition parameters and improvements in the object-recognition
pipeline were implemented by our collaborators at UTD. We characterized and evaluated
the improvements in isolation. Section 4.2 describes the implemented improvements and our
evaluation.
Fig. 3.6b shows that the motion-planner output is not robust, i.e., although the starting
pose and the ending pose are the same for each trajectory computation, the robot-execution
times had many outliers. To address this problem, we propose an improved motion planning
pipeline, which is described in Section 4.3.
We carried out an experimental evaluation of the overall Gilbreth pick-and-sort process.
The pick-and-sort success rate as a function of the mean inter-object spawning period was
measured and plotted. Section 4.4 provides details of our experiments and presents our
findings. The current implementation, even with these improvements, achieved only 71.3%
pick-and-sort success rate at the best setting that we measured. An analysis of the reasons
for this relatively low success rate showed that approximately 10% of the failure rate was due
39
Chapter 4 Gilbreth Prototype Improvements and Evaluation 40
to excess load (i.e., the robot arm could not keep up with the object arrival rate), and the
remaining failures were occurred in the motion-planning and grasping processes. Methods
for improving these algorithms are described in Chapter 5.
Section 4.5 describes our evaluation of a Convolution Neural Network (CNN) based object-
recognition algorithm called VoxNet, which was implemented by our UTD collaborators. This
solution offers significant improvement in object-recognition times over the CG algorithm
in the original prototype, but this CNN algorithm requires GPU computational resources
for long durations (on the order of hours) for model training. Cloud computing is a good
solution for executing model training processes.
4.2 Object recognition improvements
The processing time of the Correspondence Grouping (CG) algorithm depends upon the
product x×y×z, where x, y, and z represent length, width, and depth of the input object point
cloud data, respectively. To reduce object-recognition time, our UTD collaborators tuned
the object recognition parameters and improved the code by adding a PCD-downsampling
phase at the start, and then slowly increasing resolution until the object could be recognized
and a pick pose could be computed on the object data for the robot-arm end effector to
grasp.
At UVA, we ran two experiments to evaluate the fine-tuned object recognition ROS node,
as described in Section 4.2.1, and we characterized the improved pipeline in Section 4.2.2.
4.2.1 Experiments on fine-tuned object recognition process
Two experiments were conducted to evaluate the fine-tuned object-recognition pipeline.
Experiment 1 was setup on UVA local machine. The experiment evaluate the fine-tuned
object recognition process by computing the average recognition processing time, and
compared it with the processing time of the original parameters. In Experiment 2, we setup
a network environment on the Global Environment for Network Innovations (GENI) [34],
and evaluated if the object-recognition process can benefit from the Cloud technology.
4.2 Object recognition improvements 41
Experiment 1: UVA-local experimental setup and execution A single host was
used to execute all components of the Gilbreth application. This host has a single 4-core
(8-thread) CPU (Intel Core i7-4710MQ@2.50GHz) and 12 GB RAM. The OS used was
Ubuntu 16.04.2 LTS with kernel version 4.10.0-37-generic.
To measure the processing time of each object type, the object recognition ROS node
was modified to publish the processing time to a ROS topic /recognition duration. We
implemented a Python program to record the spawning time of the object, and the spawned
object type from the Conveyor Spawner ROS node in one csv file, and to record the object
recognition processing time, and the recognized object type from the Object Recognition
ROS node in a second csv files.
Table 4.1 lists the parameters used in the experiments. Four types of objects: disk,
pulley, gear and piston rod, were used in this experiment. For each object type, 500 objects
were spawned. The inter-object spawning period was selected randomly from the range 6 to
8 seconds. The Gazebo real time factor was 1.0, indicating a real-time simulation.
Two launch files, as described in Section 3.5, were used to launch the Gilbreth Application.
The python script was launched to record data after the Gilbreth application was launched.

Inter-object spawning period tspawn [6, 8] sec
Object type Nobj 4
Gazebo real time factor R 1.0
Table 4.1: Experiment-1 parameters
Experiment 1 (UVA-local experimental setup and execution) results Fig. 4.1
illustrates box plots of the object-processing time for four different object types (across
the 500 measurements obtained for each type). The object-recognition success rate was
computed by comparing the spawned and recognized object types across the two csv files.
Table 4.2 shows that the recognition success rate was 100.0% for all object types. We
compared these new results with the recognition times reported in Section 3.5 (before the
improvements). The processing time for pulley, disk and piston rod decreased by 62.69%,
46.38% and 50.35% respectively. Comparison of gear model is not provided as we used
larger gear models in the experiment. The experiment showed that the improved object-
recognition pipeline can significantly decrease computational load with negligible impact on
object-recognition success rate for Gilbreth objects.
Figure 4.1: Experiment-1 object-recognition processing times with the improved pipeline
Object Processing Time (before, af- Reduction Success

Type ter) improvements Rate
Pulley 7.8954, 2.9485 62.69% 100%
Disk 6.1979, 3.3235 46.38% 100%
Piston rod 2.0884, 1.0369 50.35% 100%
Table 4.2: Experiment-1 results on the improved object-recognition process
Experiment 2: GENI experimental setup and execution A two-server testbed was
used to test the performance of the improved object recognition ROS node. We took
advantage of GENI to build the network environment. Both servers have the same hardware
configurations: single 1-core (1-thread) CPU (Intel Xeon E5-2459@2.10GHz) and 860MB
memory. The OS used is Ubuntu 16.04.1 LTS with kernel version 4.4.0-75-generic.
The parameters we configured in both testbeds are described in Table 4.3. The experiment
has two steps. First, we launched the entire Gilbreth application on one server and recorded
the processing time collected for object recognition for each type. Second, we modified
the ROS launch file to launch Gazebo, the factory environment, on server A and all the
remaining ROS nodes on server B, as illustrated in Fig. 4.2a. After the application was
4.2 Object recognition improvements 43
(a) Experiment-2 setup in GENI testbed

(b) Comparison of performance on one GENI vs.
on two GENI servers
Figure 4.2: Experiment-2 results on the improved object-recognition pipeline
successfully launched on both servers, a python script was executed to record the processing
time of object recognition on server B. The average processing time for the object recognition
ROS node is calculated after these experiments have concluded.

Inter-object spawning period tspawn [6, 8]sec
Object type Nobj 5
Gazebo real time factor R 1.0
Table 4.3: Experiment-2 parameters
Experiment 2 (GENI experimental setup and execution) results Fig. 4.2b shows
that the reduction in object-recognition processing times when using two servers instead
of a single server. The single-server processing times observed here are longer than the
UVA-local machine processing times due to hardware limitations of the GENI servers. The
disk object type had the longest processing time among the five object types tested.
In conclusion, we propose using a multi-server setup to handle all the ROS nodes of the
4.2.2 Improved object recognition pipeline characterization
We created Fig. 4.3 to characterize the improved pipeline, proposed by our UTD collaborators.
The downsampling process initially decreases the resolution significantly to reduce the size
of the original PCD file. Some local geometry features could be lost during this process.
If the object is successfully recognized, the computation proceeds to determine the pose
for the object pick point (i.e., the point at which the object will be grasped by the robot-
arm end effector) based on the object’s current orientation and pose. However, if the
CG algorithm fails to recognize the object in the downsampled PCD, the downsampling
parameter is reduced and a new PCD image with more local-geometry features is computed.
The downsampling and recognition procedure is repeatedly executed until the object is
either recognized or the computation reaches a maximum iteration number, at which point
failure is declared.
Figure 4.3: Improved object recognition pipeline
4.3 Motion planner improvements
We developed a new pipeline for the motion planning process, which is illustrated in Fig. 4.4.
Our motivation for making this improvement came from observations seen in Fig. 3.6b and
3.7b.
Fig. 3.6b shows that most trajectory computations can be completed within 0.5 seconds,
but Fig. 3.7b shows many outliers. For all objects of a given type, the starting and ending
poses for the robot to pick up an object and place the object in a bin are the same, and
yet we observe many outliers in the motion execution times seen in Fig. 3.7b. We observed
that the main motion-planner algorithm, RRT-Connect, which is part of the Move-It! ROS
package, sometimes generates unusually long trajectories. In these long trajectories, we

4.3 Motion planner improvements 45
Figure 4.4: Improved motion planning pipeline
observed that the robot arm does not move to the target pose directly, but instead wanders,
and even rotates for a while, before reaching the desired pose. A further study of the
relationship between the trajectory-planning computation time and the length of the output
trajectory showed that the longer the computation time, the longer the trajectory.
To avoid unreasonably long trajectories, we developed a new pipeline for the robot
motion planning process. We first set the maximum planning time for the motion planner to
0.5 seconds. If the trajectory cannot be computed within 0.5 seconds, the motion planning
process is considered to have failed. This choice of 0.5 seconds comes from our observation
that in most cases, if motion planning takes longer than 0.5 seconds, the resultant trajectory
is too long.
We placed another constraint on the time threshold for each phase of robot movement.
In general, a successful pick-and-sort process has four phases between five robot-tool poses:
home pose, pick-approach pose, pick pose, pick-retreat pose, and place pose. We measured
the average robot execution time for each these four phases, and configured corresponding
time thresholds as listed in Table 4.4. When motion planning is completed successfully, we
compare the expected trajectory-execution duration with the corresponding time threshold.
If the expected trajectory execution duration is higher than the corresponding time threshold,
the trajectory computation is considered a failure. Then, the time threshold is increased,
and a new trajectory is recomputed. If the motion planner ROS node cannot find a valid
trajectory within 10 iterations, the planning is considered to have failed.

Trajectory Phase Time Threshold

home pose to pick approach pose 3 seconds
pick approach pose to pick pose 2 seconds
pick pose to pick retreat pose 2 seconds
pick retreat pose to place pose 5 seconds
place pose to home pose 5 seconds
Table 4.4: Improved motion planning time threshold
4.4 Experiment on application performance
The Gilbreth application with the improved object recognition pipeline, and an improved
motion planning pipeline was executed, and its object pick-and-sort performance was
evaluated.
Experimental setup A single host was used to execute all components of the Gilbreth
application. This host had a single 4-core (8-thread) CPU (Intel Core i7-4710MQ@2.50GHz)
and 12 GB RAM. The OS used was Ubuntu 16.04.4 LTS with kernel version 4.13.0-45-generic.
To measure the pick-and-sort performance of the Gilbreth Application, we modified
the code of each ROS node and programmed a Gilbreth Monitor ROS node to monitor
the status of each object. The modified Conveyor Spawner ROS node publishes the type,
spawning time, and sequential number of the spawned object to a ROS topic. The Gilbreth
Monitor ROS node subscribes to this ROS topic, and saves all collected information in a
csv file. This csv file contains the essential information about the objects spawned on to the
conveyor belt.
When an object passes the break-beam sensor, a message is received on the break beam
sensor change ROS topic by both the Kinect Publisher node and the Gilbreth Monitor
node. This message carries an object-entry timestamp. The Kinect Publisher node assigns
a unique identifier (id) to the object (which is used throughout the processing by all ROS
nodes), and sends this id along with PCD data and the object-entry timestamp on the
kinect points topic. The Gilbreth Monitor node subscribes to this topic, and is thus
able to associate the object-entry timestamp received on the break beam sensor change
ROS topic with the message received on the kinect points topic.
4.4 Experiment on application performance 47
The Gilbreth Monitor ROS node subscribes to ROS topics published by the Object
Segmentation, Object Recognition, and Tool Planner ROS nodes. For each processed
object, each of these nodes sends a message that carries a record of the time instant at which
the node completed its computation, and whether or not the node successfully completed
its processing actions on the object. The Gilbreth Monitor ROS node saves all this
information in a second csv file.
Collection of information about the processing of an object by the Robot Execution
ROS node is done using a different method. Instead of sending information to the Gilbreth
Monitor ROS node, the Robot Execution node records object-processing information itself
in its own csv file, due to implementation challenges. Specifically, the Robot Execution
node records information about three potential types of failure that could occur in the robot
execution process: (i) motion planning failure, (ii) assignment failure, and (iii) grasping
failure.
In summary, a combination of three csv files, two collected by the Gilbreth Monitor
ROS node and one collected by the Robot Execution ROS node, hold all the necessary
data to determine the overall application pick-and-sort performance.
Experimental Execution The process launch file was modified to automatically launch
the Gilbreth Monitor ROS node while launching all other ROS nodes. The Gilbreth
environment launch file and the process launch file were executed to launch the complete
application.
Table 4.5 lists the configured experimental parameters. Five different object types were
randomly spawned in the application. We evaluated the pick-and-sort success rate for six
different inter-object spawning periods, ranging from 4 seconds to 14 seconds.
To determine the number of objects needed for the pick-and-sort evaluation experiments,
we first ran the whole application with each of the six inter-object spawning periods until
200 objects had been processed. Our assumption was that, for each spawning period, the
pick-and-sort success rate would vary initially, e.g., the rate could be 0 or 100% after the
first object, 0, 50 or 100 % after the second object, etc. We expected the rate to settle into
a steady-state value after a certain number of objects had been processed. The number of
objects needed to reach this steady state could depend on the inter-object spawning period.
After we determined the number of objects that needed to be spawned for each setting of
the inter-object spawning period, we ran the experiment 15 times per inter-object spawning
period. The average pick-and-sort success rate and multiple failure rates were computed
from the data in the recorded csv files.

Parameters Value
Spawning period [4, 6, 8, 10, 12, 14] sec
Time Variance 1 sec
Object yaw variance 180◦
Object lateral variance 0.2 m
Object type 5
Number of object spawned 200
Number of experiments per spawn period 15
Conveyor belt velocity 0.15 m/s
Table 4.5: Values of experimental parameters for evaluating application performance
Experimental Results Since the inter-object spawning period was random, with a
specified mean value, e.g., 7, with a range, e.g., (6, 8), we were interested in understanding
the distribution of this period. Therefore, we plotted the observed inter-object spawning
periods in a histogram. Specifically, we used the settings shown in Table 4.6, and observed
the histogram shown in Fig. 4.5. The shape appears to be a truncated normal distribution.
Parameters Value
Spawning period 7 sec
Time Variance 1 sec
Number of object spawned 700
Table 4.6: Inter-object spawning period parameters
Fig. 4.6 shows the results of our experiment to determine the minimum number of objects
required for each inter-object spawning period at which the pick-and-sort success rate reaches
a steady-state value. Our finding is that the convergence to steady state occurs after roughly
150 objects are spawned for most of the inter-object spawning periods considered. Therefore,
we decided to spawn 200 objects for each experiment. For each inter-object spawning period,
15 experiments were conducted, and the results were recorded and analyzed.
Figure 4.5: Inter-object spawning time distribution
Figure 4.6: Pick-and-sort success rates for different inter-object spawning periods
Fig. 4.7 presents our overall pick-and-sort performance results. The green bars show the
overall success rate, which increases with the mean inter-object spawning period. At its
largest setting of 14 sec, the success rate is 71.3%. In addition, this graph provides rates for
five different types of failure: (i) recognition failure, (ii) tool pose failure, (iii) excess-load
failure, (iii) robot-execution failure, and (iv) grasping failure.
The average object-recognition failure rate for an inter-object arrival time longer than
6 seconds was lower than 0.2%. When the inter-object spawning period was 4 seconds,
the object-recognition failure rate was significantly higher at 6.95%. The likely cause of
this failure is that the break-beam sensor trigger was missed. When the mean inter-object
spawning period was 4 seconds, the range was (3, 5) seconds. The velocity of the conveyor
belt was 0.15 meters per second, which means the minimum inter-object distance was 0.45
meters. If two big objects were spawned successively on to the conveyor belt within a very
short time interval (e.g., 3 seconds), there could be no space between the objects, and
therefore the break beam sensor would only be triggered once. As a result, the sensor would
fail to detect the second object, which in turn would cause the Kinect Publisher ROS
node to publish inaccurate PCD. Without accurate PCD, there will be an object-recognition
failure.
The next failure type, the tool pose failure, was 0.0% for all inter-object spawning
periods.
The excess-load failure occurs when the robot arm is unable to reach the pick pose in
time, which happens when the object arrival rate is larger than the robot service rate. The
latter is in inverse proportion to the time taken for the robot to pick-and-sort an object. In
other words, if the robot is still processing an object when a second object passes under its
arm position, the robot will be unable to pick up the second object. For mean inter-object
spawning periods smaller than 14 seconds, this failure type is the highest contributor to
overall failure. A robot arm cannot move too fast while maintaining a robust and smooth
trajectory. This imposes a minimum rate at which the robot arm can execute all the
trajectories provided to it by the pick-and-sort application. We define the minimum value of
the robot service time as Trobot , and we define the upper limit of the robot service rate as
Srobot .The object arrival rate is defined as Aobject .
1
Srobot = (4.1)
Trobot
If the object arrival rate Aobject is larger than the maximum robot service rate Srobot , the
robot cannot serve all objects in time. For a pre-defined object arrival rate, decreasing the
service time Trobot increases the service rate Srobot , which will in turn reduce the excess-load
failure rate, and correspondingly increase the pick-and-sort success rate.
The average duration between the time when the object triggers the break-beam sensor
and the time when the robot arm returns to the home pose was 26 seconds. The average
distance between the break-beam sensor location and the position on the belt where the
Figure 4.7: Picking evaluation of the Gilbreth application
object was being picked up was 2.0 meters. Hence the duration that an object is carried on
the conveyor belt before pickup can be estimated:
Lconveyorbelt
tobj conveyor = ≈ 13seconds (4.2)
Vconveyorbelt
Subtracting this conveyor-belt movement time (13 sec) from the total average duration
between the break-beam sensor trigger and the return to home pose for the robot arm (26
sec), we see that on average, only 13 seconds were available to the robot arm for completing
its service for each object. Therefore, when the inter-object spawning period is lower than
13 seconds, this excess-load failure rate increases significantly.
A motion-planning failure occurs when the motion planner, described in Section 4.3,
can not find a valid trajectory. We observed several poses at which the motion-planning
algorithm appears to stall, and is unable to find a valid trajectory for a long duration. Our
observation is that when the motion planner gets stuck in this state, it takes at least 6
object-processing times to recover. If the motion-planning algorithm or its pipeline can be

improved further, these unnecessary motion-plan failures can be avoided to increase the
pick-and-sort success rate.
The gripper grasping failure happens when the robot successfully reaches the pick pose
and enables the gripper, but fails to grasp the object. In other words, the object does not
attach to the gripper. We postulatee that either: (i)the pick pose is not perfectly aligned, or
(ii)the robot-arm end effector may be slightly offset from the desired pose. A more precise
grasping process can decrease this failure rate.
4.5 VoxNet-based object-recognition pipeline
As described in Section 3.4, the Correspondence Grouping (CG) algorithm was used for
object recognition. The object-recognition processing time increases linearly with the
number of object types for the CG algorithm. For users with access to GPU resources,
an alternative 3D object recognition algorithm based on Convolutional Neural Networks
(CNNs), called VoxNet, as described in Section 2.1.3, is a better option. Our experimental
evaluation shows that the object-recognition processing time with VoxNet is significantly
smaller than the CG-algorithm-based object recognition. Because the recognition processing
time of CNN algorithms is smaller compared to the CG algorithm. Section 4.5.1 describes
our implementation of a ROS-compatible-version of the VoxNet software for 3D object
recognition. Section 4.5.2 describes the experimental evaluation we conducted on VoxNet-
based object-recognition ROS node.
4.5.1 Implementation
Voxnet [2], described in Section 2.1.3, is implemented in our application. The CNN object
recognition pipeline, illustrated in Fig. 4.8, works as follows:
The object Segmentation ROS node remains unchanged from the CG pipeline. It
removes the background scene and publishes the point cloud data after the segmentation
process.
A voxelizer1 ROS node converts PCD into voxel data needed by the CNN Voxnet
1
https://github.com/dbworth/pcl binvox
4.5 VoxNet-based object-recognition pipeline 53
Figure 4.8: VoxNet based object-recognition pipeline
algorithm, the voxel data is a 32 × 32 × 32 matrix with 0 or 1 as element. The
voxelizer ROS node, written in C++, combines the voxel data and the PCD it receives
into one ROS message and publishes it to a ROS topic.
The CNN ROS node, written in Python, reads voxel data to predict the object type,
packages the predicted object type and the PCD it received into one ROS message,
and publishes the message to a ROS topic. The CNN architecture is described in
Section 2.1.3.
The Alignment Computation ROS node subscribes to the object type and the PCD
as inputs and uses the ICP algorithm to calculate the object pick pose.
4.5.2 Evaluation
These experiments were designed to specifically test the scalability of the VoxNet CNN
algorithm, rather than evaluate the whole VoxNet based object-recognition pipeline.
Experimental Setup A single host was used in the experiment. This host had a single
4-core (8-thread) CPU (Intel Core i7-4710MQ@2.50GHz), 12 GB RAM and one Nvidia
GeForce GTX 850M graphic card with 2GB GPU memory. The OS used was Ubuntu 16.04.5
LTS with kernel version 4.13.0-29-generic. The CUDA compiler version used was 8.0.61. A
command-line toolset pcl binvox2 was used to convert the PCD to binvox data for CNN
computation. The VoxNet CNN implementation and its dependent packages, Theano3 ,
Lasagne4 , scikit-learn5 and libgpuarray6 were downloaded, installed and configured.

2
https://github.com/dbworth/pcl binvox
3
http://deeplearning.net/software/theano/
4
https://lasagne.readthedocs.io/en/latest/index.html
5
http://scikit-learn.org/stable/
6
http://deeplearning.net/software/libgpuarray/index.html
Experimental Execution We ran two sets of experiments: Set1 with 5 object types,
and Set2 with 13 object types. In each set, we ran multiple experiments by varying the
number of PCDs used per object type for CNN model training. We then measured the
training time required for each setting of the number of PCDs per object-type. We also
measured the precision score achieved by the VoxNet CNN algorithm when executed on the
test PCD files.
For the experiments, we required a dataset of PCD files. We generated this dataset
using the Gilbreth application. Specifically, for each object type, 300 objects were randomly
spawned on to the conveyor belt. The Gazebo-simulated Kinect Camera captured PCDs
of these 300 objects for each object type. These PCDs were then processed by the Object
Segmentation ROS node. The output filtered PCDs were saved to disk. The PCD files for
each object type were assigned unique sequential identifiers (e.g., ID 1, ID 2, up to ID 300).
Tables 4.7 and 4.8 show the parameters used in the two experiment sets. Since the
highest number of training PCDs per object type in both experiment sets was 200, the first
200 PCDs for each object type was set aside for training, and the test data PCDs were drawn
from the last 100 PCDs (i.e., PCDs with IDs 201 to 300). When the number of training
PCD files per object type was less than 200, not all the PCDs were used. For example, when
this number was 25, then only PCDs with IDs from 1 to 25 for each object type were used
for CNN training, and PCDs with IDs from 26 to 200 remained unused in that experiment.
Parameter Value
Number of object types 5
Number of PCDs per object type in train- 25, 50, 100, 150, 200
ing data set
Number of PCDs per object type in test- 100
ing data set
Table 4.7: Experiment Set1 parameters
Experimental Results Fig. 4.9 shows the Set1 experimental results. The time required
to train the CNN model increases linearly with the number of training PCDs per object type.
Before we ran these experiments, we hypothesized that in order to have a high recognition
success rate, we would require 100 PCD files per object type. However, the experimental
4.5 VoxNet-based object-recognition pipeline 55
Parameter Value
Number of object types 13
Number of PCDs per object type in train- 20, 25, 30, 35, 50, 100, 150, 200
ing data set
Number of PCDs per object type in test- 100
ing data set
Table 4.8: Experiment Set2 parameters
Figure 4.9: Experiment Set1 results: 5 object types
results show that with 5 object types, a set of 25 PCDs per object type was sufficient
to achieve 99.8% recognition success rate, and with 50 PCDs per object, the success rate
reached 100%.
Fig. 4.10 shows that when the number of object types was increased, the number of
training PCDs per object type required to ensure high recognition-success rates correspond-
ingly increased. For example, when the number of object types was increased from 5 to
13, with 25 PCDs per object type, the recognition-success rate fell from 99.8% to 98%. To
achieve 99.8% with 13 object types, we found that somewhere between 150 to 200 PCDs
per object type were required. The training time correspondingly increases significantly.
For example, with 5 object types, training time with 25 PCDs per object type (with which
99.8% recognition-success rate was achieved) was 0.2 hours (12 mins). In comparison, with
13 object types, training time with 150 PCDs and 200 PCDs was 3.27 and 3.96 hours,
respectively. In other words, to achieve the same 99.8% recognition-success rate with 13
object types, the time required for training was more than 3 hours.
Our UTD collaborator compared the object-recognition times with CG vs. with VoxNet.
Figure 4.10: Experiment Set2 results: 13 object types
The average time across five object types was 2.428 sec with CG but only 0.231 sec with
VoxNet.
Our conclusion is while CNN-based object-recognition saves processing time within the
run-time operation of the Gilbreth application when compared the CG algorithm, the cost
of CNN-based object-recognition is that it requires significant compute cycles for training.
Given that this training can be done offline, the extensive resources of cloud computing can
be leveraged.
4.6 Conclusions
The main conclusions drawn from the work presented in this chapter are as follows. First, we
found that the improvements made to the object-recognition pipeline allowed for a significant
decrease (reaching as high as 62%) in processing time for some objects with negligible impact
on object-recognition success rate for some objects. Second, when the CG algorithm used in
object recognition was replaced by a Convolution Neural Network based algorithm called
VoxNet, the object recognition time was reduced even further. But the VoxNet solution has
a cost, which is high training time, on the order of hours. Third, there were multiple points
in the Gilbreth processing when failures can occur. Overall success rate of pick-and-sort
operation was only 71.3%, which was achieved with the maximum value of mean inter-object
spawning period that we tested, i.e., 14 s. Even though this time is longer than the average
time required for the robot arm to service an object, which was 13 s, there was a 9.95%
4.6 Conclusions 57
excess-load failure rate. The remainder, approximately 19%, was due to motion failure and
grasping failures Therefore, we conclude that each component of the Gilbreth code has room
for improvement to decrease failure rate.

Chapter 5
Conclusions and Future Work
5.1 Conclusions
In this thesis, we presented a flexible-manufacturing pick-and-sort industrial robotics ap-
plication that we named Gilbreth. The factory environment for which this application is
developed has a conveyor belt on which industrial parts are moved to the workcell of a UR10
robot arm (with seven degrees of freedom) for sorting. Typically industrial robots handle
high volumes of a low-mix set of objects; in contrast, Gilbreth enables robots to handle
low-volume high-mix sets of industrial parts.
To support Gilbreth, two sensors are used: a break-beam sensor to sense an object
arriving on the conveyor belt, and a 3D Kinect camera to capture RGB-D images of the
objects. Object recognition is used to identify the object type, and pose of the arriving
object. Motion planning algorithms are used to generate trajectories to move the robot arm
between five poses: home, pick approach, pick, pick retreat, and place. The robot arm waits
at the home pose, picks up the object using a vacuum gripper (end effector) that is attached
to the end of the robot arm, moves on a linear actuator rail to the bin associated with the
identifed object type, places the object in the bin, and returns to home base.
Not having access to a real factory environment, we used Gazebo, a robotics simulation
package. We then implemented a Gilbreth module within Gazebo to simulate our particular
factory environment, along with a set of ROS/ROS-I nodes to perform object segmentation,
object recognition, motion planning, and robot execution, among other functions. The ROS
58
5.1 Conclusions 59
and ROS-I frameworks have created a reusable base of software components. This allowed us
to develop the Gilbreth integrated application faster than would have been possible without
these frameworks. Also, importantly, ROS/ROS-I allows for distributed implementations of
applications, which in turn makes it easy to leverage cloud computing.
Experiments were then conducted to evaluate Gilbreth, and measurements were obtained.
With the original prototoype, object recognition and robot execution consumed the most
amount of time. Since robot execution is limited by the mechanical constraints of how fast
the robot arm joints can be moved, we improved the object recognition code in multiple
ways: (i) changing configuration parameters; (ii) creating a pipeline in which downsampling
was used to decrease processing time but without compromising object recognition time;
and (iii) replacing the Correspondence Grouping algorithm with a new Convolution Neural
Network (CNN) based algorithm called VoxNet (this requires GPU machines). We also
improved the motion-planning pipeline.
Our first conclusion from our evaluation effort is that cloud computing can be leveraged
in many ways by an application such as Gilbreth. Instead of requiring every factory to install
its own datacenter, high-speed networks can be used to move point cloud data from factory
floors where sensors such as 3D Kinect cameras are mounted to cloud data centers to run
object-recognition, motion planning and some of the other ROS-I nodes. In an experimental
evaluation of the object-recognition pipeline, we found a reduction of processing times by
using a 2-server testbed instead of a single server, demonstrating the value of our distributed
ROS-I implementation. When the CG algorithm used in object recognition was replaced by
the CNN VoxNet algorithm, the object recognition time was reduced significantly. But the
VoxNet solution has a cost, which is high training time, that was measured to be on the
order of hours. Given that this training can be done offline, the extensive resources of cloud
computing can be leveraged.
Our second conclusion is that given mechanical constraints, robot-arm joints can only
be moved at a certain rate. Using cloud computing resources, the processing times of even
complex operations such as object recognition and motion planning can be reduced to small
fractions of the robot trajectory-execution times. Therefore, to improve overall productivity,
e.g., the rate at which objects are picked up and sorted (which can be done by reducing
Chapter 5 Conclusions and Future Work 60
inter-object arrival times and/or increasing the speed of the conveyor belt), a multi-robot
platform is required.
Our third conclusion is that the overall pick-and-sort success rate can be improved with
additional enhancements to the vacuum-gripper grasping process and the motion-planning
process.
5.2 Future work
Future work items include the following: (i) designing a more flexible pick pose estimation
for finer adjustments in the grasping process, (ii) developing a more efficient motion planning
algorithm, (iii) designing and implementing a multi-robot pick-and-sort application, and (iv)
further testing of the VoxNet object recognition process.
Lack of flexibility is a weakness in the current pick-pose alignment process, which leads
to lowered precision in the grasping process. In our Gilbreth application, we define one pick
pose for each object type and save the pick pose in a data set. For each identified object, we
use the ICP algorithm to match the saved pick pose with the pose of the arriving object,
and compute the output pick pose. For more agile robotics applications, autonomous pick
pose estimation is better than our current approach. Makhal et al. proposed a real-time
algorithm to compute the grasping pose for unknown-object by super-quadric representations
based on point cloud data [35]. This algorithm could be a good approach because: (i) it is
suitable for single-view objects, (ii) it is a real-time approach, and (iii) it has a good success
rate for simple-geometry objects. Fine control in the grasping pipeline is required in order
to reach a pick pose with high accuracy. A majority of the grasping failures were caused
by inaccurate robot-arm control, which resulted in the actual ending pose being slightly
deviated from the target pose. The deviation causes the vacuum gripper to miss making
contact with the object. A fine-control adjustment in the pipeline can solve this problem
and increase the grasping success rate of Gilbreth.
To decrease the motion-planning failure rate, improvements are required for more efficient
motion planning. In our implementation, when the robot arm reach certain poses, the
motion-planning algorithm has a higher probability of generating successive incorrect (weird)

5.2 Future work 61
trajectories, which results in a sequence of motion-planning failures. Manually configuring
the planning algorithm to avoid certain poses is one choice; however, this approach will likely
not work in a scaled-up factory environment. State-of-the-art motion planning algorithms
should be studied to determine a better approach.
To increase pick-and-sort rates in a factory setting, a multi-robot system is required.
Multi-robot systems require more computational resources and network bandwidth, which
makes such an application a good candidate for an industrial cloud robotics study. First-in-
first-out (FIFO) and shortest-sorting-time (SST) methods have been studied for multi-robot
coordination. Bozma et al. [22] proposed a multi-robot coordination algorithm for a conveyor-
belt pick-and-place task based on non-cooperative game theory. Yu et al. [36] improved
the SST rule for multi-robot coordination and presented a second-shortest-sorting-time
(SSST) rule, which showed optimal pick-and-sort results in conveyor-belt settings. They
also pointed out that in a two-robot environment, arranging two robot arms on different
sides of the conveyor belt leads to a higher pick-and-sort rate than mounting two robot arms
side-by-side.
Finally, our CNN based VoxNet models need further testing, and an online training and
testing pipeline needs to be implemented. Experiments showed that as the number of object
types increases, the number of point cloud images per object type required to train the CNN
model increases. If we place the data storage and run training computations on the cloud,
we need to design a pipeline to automatically adjust the size of the training data sets and
update the object-recognition parameters.

Bibliography
[1] Ioan A. Sucan and Sachin Chitta. “MoveIt!”. [Online]Available:https://moveit.

ros.org.
[2] D. Maturana and S. Scherer. VoxNet: A 3d convolutional neural network for real-time
object recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS), pages 922–928, Sept 2015.
[3] NIST. Agile robotics for industrial automation competition, September

2016. https://www.nist.gov/el/intelligent-systems-division-73500/
agile-robotics-industrial-automation-competition.
[4] ROS-I. The challenge: Transitioning robotics R&D to the factory floor, May 2017.
[Online]Available:https://rosindustrial.org/the-challenge/.
[5] Robohub. Amazon picking challenge, August 2017. [Online]Available:https:

//robohub.org/tag/amazon-picking-challenge/.
[6] D. Buchholz, S. Winkelbach, and F. M. Wahl. RANSAM for industrial bin-picking.

In ISR 2010 (41st International Symposium on Robotics) and ROBOTIK 2010 (6th
German Conference on Robotics), pages 1–6, June 2010.
[7] Morgan Quigley, Ken Conley, Brian P. Gerkey, Josh Faust, Tully Foote, Jeremy
Leibs, Rob Wheeler, and Andrew Y. Ng. ROS: an open-source robot operating system.
In ICRA Workshop on Open Source Software, 2009.
[8] Ioan A. Şucan, Mark Moll, and Lydia E. Kavraki. The Open Motion Planning Library.
IEEE Robotics & Automation Magazine, 19(4):72–82, December 2012. http://ompl.
kavrakilab.org.
[9] J. J. Kuffner and S. M. LaValle. Rrt-connect: An efficient approach to single-

query path planning. In Proceedings 2000 ICRA. Millennium Conference. IEEE
International Conference on Robotics and Automation. Symposia Proceedings (Cat.
No.00CH37065), volume 2, pages 995–1001 vol.2, April 2000.
[10] M. Moll, I. A. Sucan, and L. E. Kavraki. Benchmarking motion planning algorithms:

An extensible infrastructure for analysis and visualization. IEEE Robotics Automation
Magazine, 22(3):96–102, Sept 2015.
[11] MoveIt! Benchmarking: Modern tools for motion planning, 2013. [Online]
Available:http://moveit.ros.org/assets/pdfs/2013/icra2013tutorial/
ICRA13_Benchmark.pdf.
62
Bibliography 63
[12] Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng,
and Trevor Darrell. Decaf: A deep convolutional activation feature for generic visual
recognition. In International conference on machine learning, pages 647–655, 2014.
[13] Maxime Oquab, Leon Bottou, Ivan Laptev, and Josef Sivic. Learning and transferring
mid-level image representations using convolutional neural networks. In Proceedings
of the IEEE conference on computer vision and pattern recognition, pages 1717–1724,
2014.
[14] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with
deep convolutional neural networks. In Advances in neural information processing
systems, pages 1097–1105, 2012.
[15] Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. CNN
features off-the-shelf: an astounding baseline for recognition. CoRR, abs/1403.6382,
2014.
[16] Pierre Sermanet, David Eigen, Xiang Zhang, Michaël Mathieu, Rob Fergus, and
Yann LeCun. Overfeat: Integrated recognition, localization and detection using
convolutional networks. CoRR, abs/1312.6229, 2013.
[17] M. Schwarz, H. Schulz, and S. Behnke. Rgb-d object recognition and pose estimation
based on pre-trained convolutional neural network features. In 2015 IEEE Inter-
national Conference on Robotics and Automation (ICRA), pages 1329–1335, May
2015.
[18] Sebastian Thrun. Learning occupancy grid maps with forward sensor models. Au-
tonomous robots, 15(2):111–127, 2003.
[19] Hans Moravec and A. E. Elfes. High resolution maps from wide angle sonar. In
Proceedings of the 1985 IEEE International Conference on Robotics and Automation,
pages 116 – 121, March 1985.
[20] A. Cowley, B. Cohen, W. Marshall, C. J. Taylor, and M. Likhachev. Perception

and motion planning for pick-and-place of dynamic objects. In 2013 IEEE/RSJ
International Conference on Intelligent Robots and Systems, pages 816–823, Nov
2013.
[21] Yanjiang Huang, Ryosuke Chiba, Tamio Arai, Tsuyoshi Ueyama, and Jun Ota. Robust
multi-robot coordination in pick-and-place tasks based on part-dispatching rules.
Robotics and Autonomous Systems, 64(Supplement C):70 – 83, 2015.
[22] H. Il Bozma and M.E. Kalalolu. Multirobot coordination in pick-and-place tasks on a

moving conveyor. Robotics and Computer-Integrated Manufacturing, 28(4):530 – 538,
2012.
[23] E. Guizzo. Robots with their heads in the clouds. IEEE Spectrum, 48(3):16–18, March
2011.
[24] D. Lorencik and P. Sincak. Cloud robotics: Current trends and possible use as a
service. In 2013 IEEE 11th International Symposium on Applied Machine Intelligence
and Informatics (SAMI), pages 85–88, Jan 2013.
Bibliography 64
[25] G. Hu, W. P. Tay, and Y. Wen. Cloud robotics: architecture, challenges and applica-
tions. IEEE Network, 26(3):21–28, May 2012.
[26] Siavash Rastkar, Diego Quintero, Diego Bolivar, and Sabri Tosunoglu. Empowering
robots via cloud robotics: Image processing and decision making boebots. In Pro-
ceedings of the Florida Conference on Recent Advances in Robotics, Boca Raton, FL,
USA, pages 10–11, 2012.
[27] Vijay Kumar and Nathan Michael. Opportunities and challenges with autonomous
micro aerial vehicles. The International Journal of Robotics Research, 31(11):1279–
1291, 2012.
[28] B. Kehoe, S. Patil, P. Abbeel, and K. Goldberg. A survey of research on cloud

robotics and automation. IEEE Transactions on Automation Science and Engineering,
12(2):398–409, April 2015.
[29] B. Kehoe, A. Matsukawa, S. Candido, J. Kuffner, and K. Goldberg. Cloud-based

robot grasping with the google object recognition engine. In 2013 IEEE International
Conference on Robotics and Automation, pages 4263–4270, May 2013.
[30] Raffaello D’Andrea. Guest editorial: A revolution in the warehouse: A retrospective

on kiva systems and the grand challenges ahead. IEEE Transactions on Automation
Science and Engineering, 9(4):638–639, 2012.
[31] R. Rahimi, C. Shao, M. Veeraraghavan, A. Fumagalli, J. Nicho, J. Meyer, S. Ed-

wards, C. Flannigan, and P. Evans. An industrial robotics application with cloud
computing and high-speed networking. In 2017 First IEEE International Conference
on Robotic Computing (IRC), pages 44–51, April 2017.
[32] F. Tombari and L. Di Stefano. Object Recognition in 3D Scenes with Occlusions and
Clutter by Hough Voting. In 2010 Fourth Pacific-Rim Symposium on Image and Video
Technology, pages 349–355, Nov 2010.
[33] Radu Bogdan Rusu and Steve Cousins. 3D is here: Point Cloud Library (PCL).
In IEEE International Conference on Robotics and Automation (ICRA), Shanghai,
China, May 9-13 2011.
[34] Mark Berman, Jeffrey S. Chase, Lawrence Landweber, Akihiro Nakao, Max Ott,
Dipankar Raychaudhuri, Robert Ricci, and Ivan Seskar. Geni: A federated testbed for
innovative network experiments. Computer Networks, 61:5 – 23, 2014. Special issue on
Future Internet Testbeds Part I.
[35] A. Makhal, F. Thomas, and A. P. Gracia. Grasping unknown objects in clutter by

superquadric representation. In 2018 Second IEEE International Conference on
Robotic Computing (IRC), pages 292–299, Jan 2018.
[36] C. Yu, X. Liu, F. Qiao, and F. Xie. Multi-robot coordination for high-speedpick-and-
place tasks. In 2017 IEEE International Conference on Robotics and Biomimetics
(ROBIO), pages 1743–1750, Dec 2017.

Conveyor Belt Based Pick and Sort Industrial Robotics Application

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Conveyor Belt Based Pick and Sort Industrial Robotics Application

Uploaded by

Copyright:

Available Formats

Gilbreth: A Conveyor-Belt Based Pick-and-Sort Industrial

the Faculty of the School of Engineering and Applied Science

of the requirements for the Degree

Master of Science (Computer Engineering)

Malathi Veeraraghavan (Advisor) Joanne Bechta Dugan (Chair)

Accepted by the School of Engineering and Applied Science:

Craig H. Benson, Dean, School of Engineering and Applied Science

operations on small-volumes of mixed parts to complement traditional industrial robotics

according to type. The Gilbreth implementation leveraged a number of Robot Operating

incorrect trajectories. Therefore, improvements were made to reduce object recognition

to the motion-planning pipeline. Evaluation of the object recognition improved pipeline

demonstrates that it outperforms the original Correspondence Grouping (CG) method by

required to handle conveyor belts with faster arriving objects.

Finally, we undertook an experiment to evaluate the scalability of the CNN algorithm.

of CNN-based object-recognition is that it requires significant compute cycles for training.

been a guidance towards my life.

without their dedicated work and help.

This work is supported by NSF grant CNS-1531065, and CNS-1624676.

2 Background and Related Work 4

4 Gilbreth Prototype Improvements and Evaluation 39

4.4 Experiment on application performance . . . . . . . . . . . . . . . . . . . . 46

5 Conclusions and Future Work 58

3.1 Object segmentation ROS node configuration parameters . . . . . . . . . . 19

4.1 Experiment-1 parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

2.1 An example ROS message . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

4.1 Experiment-1 object-recognition processing times with the improved pipeline 42

APC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Amazon Picking Challenge

Traditional industrial object-handling robots are dedicated to performing high-volume

customized manufacturing. National Institute of Standard and Technology (NIST) proposed

could be more productive, when compared to traditional high-volume production processes,

be data-driven and to execute collective-learning processes that take advantage of cloud

infrastructures and high-speed networks.

motivated to address this demand by designing a flexible industrial robotics application.

algorithms, and targets a warehouse packing process. In contrast, we focused on a more

sorting the objects by placing different object types in different bins.

applied to picking a single object-type randomly positioned in a container. Some of the

This challenge provided us the motivation for creating Gilbreth: an application in

robot execution through robot controllers.

networks, and for leveraging computing cloud infrastructures.

1.3 Key contributions

of a Convolutional Neural Network (CNN) object recognition process.

1.4 Thesis organization

Gilbreth application, software architecture, message description and prototype of Gilbreth,

a conveyor-belt based pick-and-sort application. Experimental results are presented and

Background and Related Work

Section 2.2 reviews other related work.

recognition algorithm implemented in VoxNet [2], which is used in an improved version of

our Gilbreth application.

2.1.1 Robotic Operating System and Gazebo

rapidly by leveraging existing modules.

middleware provides four capabilities for inter-process communication between nodes2 :

Message passing via publishing and subscribing systems

Message recording and playback

Remote procedure calls (RPC) in request-response systems

Distributed parameter server.

A simplified message description language, which helps ROS to automatically generate

but also defines several new messages.

Figure 2.1: An example ROS message

Figure 2.2: An example ROS service description

semantics decompose the connection between message production and consumption. In

used TCP/IP-based ROS topics in Gilbreth for inter-node communications.

velocity was configured successfully.

The distributed parameter server is a shared dictionary to store parameters. The