
Gilbreth: A Conveyor-Belt Based Pick-and-Sort Industrial

Robotics Application

A Thesis

Presented to

the Faculty of the School of Engineering and Applied Science

University of Virginia

In Partial Fulfillment

of the requirements for the Degree

Master of Science (Computer Engineering)

by

Yizhe Zhang

December 2018

© 2018 Yizhe Zhang
Approvals

This thesis is submitted in partial fulfillment of the requirements for the degree of

Master of Science

Computer Engineering

Yizhe Zhang

Approved:

Malathi Veeraraghavan (Advisor) Joanne Bechta Dugan (Chair)

Jack W. Davidson

Accepted by the School of Engineering and Applied Science:

Craig H. Benson, Dean, School of Engineering and Applied Science

October 2018
Abstract

There is growing interest in creating agile industrial robotics applications for autonomous

operations on small volumes of mixed parts to complement traditional industrial robotics

that handle large-volume, single-part operations. Cloud robotics, which leverages cloud

computing, cloud storage and high-speed networks (between factory floors and data centers),

is seen as a technological approach to help build such agile industrial robotics applications.

This thesis describes an agile industrial robotics application, named Gilbreth, for picking

up objects of different types from a moving conveyor belt and sorting the objects into bins

according to type. The Gilbreth implementation leveraged a number of Robot Operating

System (ROS) and ROS-Industrial (ROS-I) packages. Gazebo, a robotics simulation package,

is used to simulate a factory environment that consists of a moving conveyor belt, a break

beam sensor, a 3D Kinect camera, a UR10 industrial robot arm mounted on a linear actuator

with a vacuum gripper, and different types of objects, such as pulleys, disks, gears and piston

rods.

Experimental studies were undertaken to measure the CPU usage and processing time

of different ROS nodes. These experiments found that object recognition time and robot

execution time were similar in magnitude, and that motion planning sometimes yielded

incorrect trajectories. Therefore, improvements were made to reduce object recognition

time, using a Convolutional Neural Network (CNN) method and a new pipeline, and to the motion-planning pipeline. Evaluation of the improved object-recognition pipeline demonstrates that it outperforms the original Correspondence Grouping (CG) method by

reducing execution time even while achieving the same success rate. Experiments were

conducted to evaluate the pick-and-sort success rate of the Gilbreth application after the


improved pipelines were incorporated. Specifically, we found that objects should be spaced

at least 14 seconds apart on the conveyor belt. Multiple robot workcells are

required to handle conveyor belts with faster arriving objects.

Finally, we undertook an experiment to evaluate the scalability of the CNN algorithm.

Our conclusion is that, while CNN-based object recognition saves processing time within the run-time operation of the Gilbreth application when compared to the CG algorithm, the cost

of CNN-based object-recognition is that it requires significant compute cycles for training.

Given that this training can be done offline, the extensive resources of cloud computing can

be leveraged.
Acknowledgements

I would like to take this opportunity to show my gratitude to many people. This thesis would not have been possible without their support. I would like to thank my advisor, Professor Malathi Veeraraghavan, for her expert advice and support in this project. She not only taught me knowledge in the networking area, but also how to work and conduct research. Her attitude towards her work is deeply respectable to me and has always been a guide in my life.

I would like to thank my collaborators from the University of Texas at Dallas (UTD) and

Southwest Research Institute (SwRI): Lianjun Li (UTD), Professor Andrea Fumagalli (UTD),

Michael Ripperger (SwRI) and Jorge Nicho (SwRI). This work could not have been accomplished without their dedicated work and help.

I would like to thank my committee members, Professor Joanne Bechta Dugan and

Professor Jack W. Davidson for taking time to review my thesis and provide great suggestions.

I would like to thank my parents and my friends for supporting me throughout my graduate years. Thanks to my colleagues in the High-Speed Networks research group for their friendship in the lab.

This work was supported by NSF grants CNS-1531065 and CNS-1624676.

Contents

Abstract d

Acknowledgements f

Contents g
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . j

1 Introduction 1
1.1 Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Key contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Thesis organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Background and Related Work 4


2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1 Robotic Operating System and Gazebo . . . . . . . . . . . . . . . . 4
2.1.2 MoveIt! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.3 Convolutional Neural Network object recognition . . . . . . . . . . . . 9
2.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3 Gilbreth Application 14
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Gilbreth sorting application . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3 Gilbreth software architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.4 Gilbreth software prototype . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.5 Gilbreth software prototype evaluation . . . . . . . . . . . . . . . . . . . . . 26
3.6 Gilbreth message description and experiments . . . . . . . . . . . . . . . . . 29
3.6.1 Gilbreth message description . . . . . . . . . . . . . . . . . . . . . . 30
3.6.2 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4 Gilbreth Prototype Improvements and Evaluation 39


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Object recognition improvements . . . . . . . . . . . . . . . . . . . . . . . . 40
4.2.1 Experiments on fine-tuned object recognition process . . . . . . . . . 40
4.2.2 Improved object recognition pipeline characterization . . . . . . . . 43
4.3 Motion planner improvements . . . . . . . . . . . . . . . . . . . . . . . . . . 44


4.4 Experiment on application performance . . . . . . . . . . . . . . . . . . . . 46


4.5 VoxNet-based object-recognition pipeline . . . . . . . . . . . . . . . . . . . 52
4.5.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.5.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5 Conclusions and Future Work 58


5.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

Bibliography 62
List of Tables

3.1 Object segmentation ROS node configuration parameters . . . . . . . . . . 19


3.2 Values of experiment parameters . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 ROS-node classification based on CPU usage . . . . . . . . . . . . . . . . . 28
3.4 Message size of triggered ROS message . . . . . . . . . . . . . . . . . . . . . 36
3.5 Message size of periodic ROS message . . . . . . . . . . . . . . . . . . . . . 37

4.1 Experiment-1 parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41


4.2 Experiment-1 results on the improved object-recognition process . . . . . . 42
4.3 Experiment-2 parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.4 Improved motion planning time threshold . . . . . . . . . . . . . . . . . . . 46
4.5 Values of experimental parameters for evaluating application performance . 48
4.6 Inter-object spawning period parameters . . . . . . . . . . . . . . . . . . . . 48
4.7 Experiment Set1 parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.8 Experiment Set2 parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

List of Figures

2.1 An example ROS message . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6


2.2 An example ROS service description . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Move group architecture [1] . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Planning scene pipeline [1] . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5 VoxNet architecture [2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.1 Gilbreth setup: sensors, objects, conveyor belt and UR10 robot arm . . . . 15
3.2 Gilbreth software: Per-object workflow starts at Object-Arrival Detection . 17
3.3 Object segmentation and object recognition . . . . . . . . . . . . . . . . . . 23
3.4 Gilbreth object picking policies . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.5 Motion planner pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.6 CPU usage and processing time for one ROS node . . . . . . . . . . . . . . 28
3.7 Compare object recognition time with physical robot arm movement time . 29
3.8 Gilbreth message communication flow; ROS topics: blue; ROS services: red 30
3.9 Depth images of 5 types of objects . . . . . . . . . . . . . . . . . . . . . . . 32
3.10 Depth images of 5 types of objects after object segmentation . . . . . . . . 32
3.11 Gilbreth ROS messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.1 Experiment-1 object-recognition processing times with the improved pipeline 42


4.2 Experiment-2 results on the improved object-recognition pipeline . . . . . . 43
4.3 Improved object recognition pipeline . . . . . . . . . . . . . . . . . . . . . . 44
4.4 Improved motion planning pipeline . . . . . . . . . . . . . . . . . . . . . . . 45
4.5 Inter-object spawning time distribution . . . . . . . . . . . . . . . . . . . . 49
4.6 Pick-and-sort success rates for different inter-object spawning periods . . . 49
4.7 Picking evaluation of the Gilbreth application . . . . . . . . . . . . . . . . . . 51
4.8 VoxNet based object-recognition pipeline . . . . . . . . . . . . . . . . . . . 53
4.9 Experiment Set1 results: 5 object types . . . . . . . . . . . . . . . . . . . . 55
4.10 Experiment Set2 results: 13 object types . . . . . . . . . . . . . . . . . . . . 56

List of Abbreviations

APC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Amazon Picking Challenge


API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Application Programming Interface
CAD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Computer-Aided Design
CG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Correspondence Grouping
CNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Convolutional Neural Network
DOF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Degree of Freedom
FIFO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . First-in-first-out
GENI . . . . . . . . . . . . . . . . . . . . . . . . . . . . Global Environment for Network Innovations
GPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Graphics Processing Unit
GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Graphical User Interface
M2C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Machine-to-Cloud
M2M . . . . . . . . . . . . . . . . . . . . . . . . . . . . Machine-to-Machine
MTU . . . . . . . . . . . . . . . . . . . . . . . . . . . . Maximum Transmission Unit
NIST . . . . . . . . . . . . . . . . . . . . . . . . National Institute of Standards and Technology
OGRE . . . . . . . . . . . . . . . . . . . . . . . . . . . Object-Oriented Graphics Rendering Engine
OMPL . . . . . . . . . . . . . . . . . . . . . . . . . . . Open Motion Planning Library
PCB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Printed Circuit Board
PCD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Point Cloud Data
RGBD . . . . . . . . . . . . . . . . . . . . . . . . . . . Red Green Blue Depth
ROS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Robot Operating System
ROS-I . . . . . . . . . . . . . . . . . . . . . . . . . . . . ROS-Industrial
RRTs . . . . . . . . . . . . . . . . . . . . . . . . Rapidly-exploring Random Trees
SRDF . . . . . . . . . . . . . . . . . . . . . . . . . . . . Semantic Robot Description Format
SSST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Second-shortest-sorting-time
SST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shortest-sorting-time
SwRI . . . . . . . . . . . . . . . . . . . . . . . . . . . . Southwest Research Institute
UDP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . User Datagram Protocol
URDF . . . . . . . . . . . . . . . . . . . . . . . . . . . Unified Robot Description Format
UTD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The University of Texas at Dallas
WAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wide Area Network
XMLRPC . . . . . . . . . . . . . . . . . . . . . Extensible Markup Language Remote Procedure Call
Chapter 1

Introduction

Traditional industrial object-handling robots are dedicated to performing high-volume

production processes, such as retrieving the same part repeatedly from a known position

and orientation, and performing some action on the part. This approach is ideal for large-

scale manufacturing operations. However, these methods are unsuitable for small-scale and

customized manufacturing. The National Institute of Standards and Technology (NIST) proposed

a two-year competition to test the applicability of agile industrial robotic systems [3] for

small-scale customized manufacturing. Such agile systems need to be more autonomous and

could be more productive, when compared to traditional high-volume production processes,

and hence could benefit small-scale manufacturers. An agile robotic system would need to

be data-driven and to execute collective-learning processes that take advantage of cloud

infrastructures and high-speed networks.

1.1 Objective

The objective of this work was to develop an agile industrial robotic system that simulates

a real day-to-day factory manufacturing process. Specifically, our goal was to develop

an application that leverages cloud computing, cloud storage, and high-speed network

technologies.


1.2 Motivation

Industrial robotics has not evolved much in its ability to execute new tasks over the last 20 years [4]. Thus, the demand for agile and versatile autonomous robotic systems is increasing. We were

motivated to address this demand by designing a flexible industrial robotics application.

The Amazon Picking Challenge (APC) [5] focused on pick-and-stow operations wherein a

robot recognizes target items, picks items from shelves and places them in shipping boxes.

The APC challenge combined state-of-the-art object recognition, grasp and path planning

algorithms, and targeted a warehouse packing process. In contrast, we focused on a more

typical manufacturing process: picking objects from a moving factory conveyor belt, and

sorting the objects by placing different object types in different bins.

Industrial random bin-picking methods have been proposed in the past [6]. The state-of-

the-art bin-picking algorithms have focused on the use of sensing and grasping technologies

applied to picking a single object-type randomly positioned in a container. Some of the

major industrial robot manufacturers, such as ABB, Adept, Fanuc, Motoman and Kuka,

offer some form of random bin picking, either directly or through integrator partners. To the

best of our knowledge, implementations from the robot manufacturers are currently limited

to operating with only one object type with easily distinguishable features, and the objects

are randomly placed in a bin or traveling on a conveyor belt. The limitation to a single,

easily distinguishable object type has prevented widespread adoption of the technology.

This challenge provided us the motivation for creating Gilbreth: an application in

which objects of different types randomly arriving on a moving conveyor belt are first

automatically identified, then picked up by an industrial UR10 robot arm, and finally

sorted into predisposed bins based on the object type. This application combines a number

of tasks that include object detection, object segmentation and recognition, picking pose

alignment and robot poses planning, robot motion planning to generate trajectories, and

robot execution through robot controllers.

The architecture of the system is designed for distributed execution across high-speed

networks, and for leveraging computing cloud infrastructures.



1.3 Key contributions

Our key contributions are as follows: (i) Gilbreth: a new mixed-parts sorting application in

support of industrial manufacturing; (ii) open-source software for the Gilbreth application,

which could be useful in other future conveyor-belt based applications; (iii) performance

characterization on two distinct test-beds to identify the ROS nodes that require parallelization of their execution in order to achieve high pick-and-sort throughput; (iv) new

improved pipelines for object recognition and motion planning, and (v) an implementation

of a Convolutional Neural Network (CNN) object recognition process.

1.4 Thesis organization

Besides this Introduction, this thesis has four chapters. Chapter 2 provides the reader

requisite background information, and reviews related work. Chapter 3 describes the

Gilbreth application, software architecture, message description and prototype of Gilbreth,

a conveyor-belt based pick-and-sort application. Experimental results are presented and

described in this section to evaluate the CPU usage and processing times for various ROS

nodes. Chapter 4 describes improved pipelines for Gilbreth object recognition and motion

plan computation. Further studies were conducted to evaluate the performance of the

improved pipelines, both independently, and after integration into the Gilbreth application.

In addition, the CNN object recognition implementation was evaluated, and found to provide

improved object recognition performance with faster execution times than the original object

recognition method. Chapter 5 lists the conclusions drawn from this project, and proposes

future-work items.
Chapter 2

Background and Related Work

Section 2.1 provides background information relevant for our Gilbreth application, and

Section 2.2 reviews other related work.

2.1 Background

Section 2.1.1 provides an overview of the Robot Operating System (ROS) [7] and Gazebo, a

robotic simulation software package. MoveIt! [1], a state-of-the-art software package used

for motion planning, is described in Section 2.1.2. Section 2.1.3 describes the CNN object

recognition algorithm implemented in VoxNet [2], which is used in an improved version of

our Gilbreth application.

2.1.1 Robotic Operating System and Gazebo

ROS is a flexible collaborative framework for robotic software development, and consists of

a large collection of tools, libraries and templates. Generally speaking, creating a robust

and general-purpose robotic application from scratch is challenging. The ROS community

provides a platform where experts all over the world can collaborate together. Using ROS as

the programming platform, research institutes, laboratories and individuals can contribute

their algorithms - also known as ROS packages - to the community and build software

rapidly by leveraging existing modules.


Our Gilbreth application leverages many existing ROS and ROS-Industrial (ROS-I)1

packages. The latter are designed specifically for industrial robotic applications. Supported

by the robotics industry and numerous research institutes, ROS-I takes advantage of the

ROS software, but extends its capabilities to industrial manufacturing [4]. ROS-I seeks to

advance agile industrial robotics to accomplish a wide variety of automation tasks. ROS-I

contains a large number of sensor plugins, robot controller packages (ROS defines a standard

format to describe robots, based on which a number of contributors have developed a wide

range of libraries for programming specific robot types), and planning algorithms. Our

Gilbreth application was developed in a short duration by leveraging these ROS and ROS-I

packages.

In addition to the functional modules, ROS offers abundant toolsets for debugging,

plotting and visualization. A process performing computation is called a ROS node. ROS

middleware provides four capabilities for inter-process communication between nodes2 :

Message passing via publishing and subscribing systems

Message recording and playback

Remote procedure calls (RPC) in request-response systems

Distributed parameter server.

A simplified message description language, which helps ROS to automatically generate

source code for different languages, is used to define ROS messages3 . A ROS message is a list

of data-field descriptions as well as constant definitions. Fig. 2.1 illustrates a typical ROS

message data field description; constant definitions are not used in our Gilbreth application.

The field type, listed in the left column, is either a built-in or a user-defined type. The field name, which follows the field type and is delimited by a space, gives the name of the data field.

The description of a data field is not required, but can be included after a comment sign

(#) as shown in Fig. 2.1. The Gilbreth application leverages many built-in ROS messages,

but also defines several new messages.


1. https://rosindustrial.org/
2. http://www.ros.org/core-components/
3. http://wiki.ros.org/msg

Figure 2.1: An example ROS message
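In this style, a message definition lists one field per line; a small illustrative example (the field names and types here are hypothetical and are not the actual Gilbreth message definitions of Section 3.6) might be:

```
# Hypothetical message definition in the format described above
time               detection_time   # time stamp recorded when the break beam is triggered
string             object_type      # recognized object type, e.g. "gear"
geometry_msgs/Pose pick_pose        # estimated pick pose for the end effector
```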

Figure 2.2: An example ROS service description

ROS nodes exchange messages through ROS topics4 . A ROS node that generates and

publishes information to a specific ROS topic is called a publisher. Any ROS node that

is interested in the published data of a ROS topic can subscribe to the topic and obtain

information. These ROS nodes are called subscribers. Anonymous publishing and subscribing

semantics decompose the connection between message production and consumption. In

this mode of communications, a publishing ROS node does not know the identities of the

subscribers to its topics. Multiple publishers and subscribers can communicate through a

single topic. ROS messages use both TCP/IP and UDP/IP protocols. For reliability, we

used TCP/IP-based ROS topics in Gilbreth for inter-node communications.
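As an illustration of this publish/subscribe model, the following minimal rospy sketch publishes to and subscribes from a single topic; the topic name, message type, and publishing rate are assumptions for illustration and do not correspond to an actual Gilbreth topic.

```python
#!/usr/bin/env python
# Minimal publish/subscribe sketch; topic name and message type are illustrative.
import rospy
from std_msgs.msg import Float64

def on_belt_speed(msg):
    # Called for every message received on the subscribed topic.
    rospy.loginfo("belt speed: %.2f m/s", msg.data)

if __name__ == "__main__":
    rospy.init_node("belt_speed_example")
    rospy.Subscriber("/gilbreth/belt_speed", Float64, on_belt_speed)
    pub = rospy.Publisher("/gilbreth/belt_speed", Float64, queue_size=10)
    rate = rospy.Rate(1)  # publish once per second
    while not rospy.is_shutdown():
        pub.publish(Float64(data=0.3))  # hypothetical belt speed in m/s
        rate.sleep()
```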

Remote procedure calls (RPC) that provide request and reply interactions are called

ROS services. Some inter-node communications in our Gilbreth application use ROS services.

The RPC interactions in ROS are defined by a pair of ROS messages5 . A provider ROS

node registers the service under a namespace (a directory of names), and a client calls the

service by sending a request message and waiting for a reply. Fig. 2.2 illustrates a ROS

service data structure in the Gilbreth application. The client sends a conveyor belt velocity

to the service and waits for the service to reply. The reply indicates whether or not the

velocity was configured successfully.
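A service of this kind is defined by a request/response pair separated by three dashes, as in the following sketch; the field names are assumptions and may differ from the actual Gilbreth service definition:

```
# Request: the desired conveyor belt velocity (m/s)
float64 velocity
---
# Response: whether the velocity was configured successfully
bool success
```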

The distributed parameter server is a shared dictionary to store parameters. The

parameters stored in the server are static and non-binary, and can be retrieved globally by

all ROS nodes. The parameter server runs inside the ROS Master, which provides naming
4. http://wiki.ros.org/Topics
5. http://wiki.ros.org/Services

and registration services to the rest of the nodes. The parameter server is implemented using

Extensible Markup Language Remote Procedure Calls (XMLRPC), and can be accessed

via network Application Programming Interfaces (APIs) globally by all ROS nodes. The

Gilbreth application uses the distributed parameter server to store and modify configuration

data.
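For example, a ROS node can read and write such configuration data through the rospy parameter API; the parameter name and default value below are illustrative:

```python
import rospy

rospy.init_node("param_example")
# Read a configuration value, falling back to a default if it is not set.
belt_velocity = rospy.get_param("/gilbreth/conveyor/velocity", 0.3)  # hypothetical name
# Any node can later update the same parameter on the server.
rospy.set_param("/gilbreth/conveyor/velocity", belt_velocity)
```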

Gazebo, a robotics simulator, was used to simulate and test the factory environment,

which consists of a Universal Robots UR10 robot6 and multiple sensors. The Open Dynamics

Engine (ODE)7 is used in Gazebo for dynamics simulation. High-quality rendering is

supported by the Object-Oriented Graphics Rendering Engine (OGRE)8 . A wide range of

sensors and robotic models are included in the Gazebo library. Various command line tools

are available to users, and customized plugins for robot, sensor and environment control can

be programmed in C++ and integrated into Gazebo.

2.1.2 MoveIt!

MoveIt! [1] is a ROS package that implements state-of-the-art motion planning, manipulation

and control algorithms. The UR10 robot is supported by the MoveIt! package, and thus

we leverage this useful platform to implement our motion planning pipeline. Fig. 2.3 shows

the system architecture for the primary node move group. It combines controllers, sensors,

libraries and all the other components together to provide a set of ROS actions (red lines),

services (blue lines) and topics (green lines).

Normal users may connect to the actions and services through user interfaces written in

C++ and Python. Rviz, a ROS Graphical User Interface (GUI) plugin, also has access to the move group action library. The ROS parameter server provides two types of information: (i)

Unified Robot Description Format (URDF)9 file, and Semantic Robot Description Format

(SRDF)10 file for robot simulation, and (ii) parameters for move group configuration, such as

joint limits and kinematics information. Topics and actions are used by the move group node

to communicate with the robot. Joint state information is published on the /joint_states
6. https://github.com/ros-industrial/universal_robot
7. https://www.ode.org/
8. https://www.ogre3d.org
9. http://wiki.ros.org/urdf
10. http://wiki.ros.org/srdf

Figure 2.3: Move group architecture [1]

ROS topic and transform tree data is published on the /tf topic. The robot state publisher

publishes robot transform tree information. The state of the robot and its surrounding

environment is defined as a planning scene in a scene monitor. The scene is maintained

and updated by the monitor. The motion planning algorithm retrieves the information in

the planning scene to compute the trajectory. Fig. 2.4 illustrates how move group interacts

with the planning scene.

Motion planning plugins allow the user to take advantage of multiple open-source planning libraries, for example the Open Motion Planning Library (OMPL) [8]. Our

application uses the Rapidly-exploring Random Trees Connect (RRT-Connect) [9] approach for motion planning. RRT-Connect was first proposed by Kuffner and LaValle in 2000.

Although it was first designed to move human arms, which is a 7-DOF (Degrees of Freedom)

motion planning problem, RRT-Connect has been successfully applied to other path-planning

problems. Our Gilbreth application uses a 7-DOF robot arm (specifically UR10) to conduct

pick-and-sort actions, and therefore motion planning for the UR10 is similar to a human-arm

motion-planning problem.

Benchmarks to test motion-planning algorithms also offer good reasons to use RRT-

Connect as our primary motion planner. Moll et al. [10] proposed a benchmark infrastructure

for motion planning algorithms, and stated that bidirectional planners (such as RRT-Connect)

tended to be faster than unidirectional planners (such as RRT). A tutorial at the International



Figure 2.4: Planning scene pipeline [1]

Conference on Robotics and Automation (ICRA) 2013 also showed that RRT-Connect has a high success rate with relatively small computation time [11]. The result provides

solid evidence that RRT-Connect is a good approach; furthermore these experiments were

conducted with the MoveIt! ROS package.

2.1.3 Convolutional Neural Network object recognition

Deep learning with Convolutional Neural Networks (CNN) is widely regarded as a powerful

tool for object detection, classification and recognition [12] [13]. Work of Krizhevsky et

al. [14] showed great success in using CNNs in the ImageNet Large Scale Visual Recognition

Challenge. Razavian [15] tested an existing CNN recognition framework, OverFeat [16], on

various datasets and different tasks, and concluded that deep learning with CNNs can be

considered as primary candidates in visual recognition tasks.

The above-mentioned papers studied two-dimensional (2D) object recognition and classification, while in real-world robotic applications, robust and accurate three-dimensional (3D)

object recognition is more crucial. RGB-D cameras are often used in robotic applications.

Schwarz et al. [17] proposed an algorithm to conduct object recognition and pose estimation

on RGB-D data with pre-trained CNN features. However, they did not use the original

RGB-D data captured by cameras; instead, they used the depth information to re-render

the RGB 2D image for computation. On the other hand, VoxNet [2], proposed by Maturana

and Scherer, resolved the problem by integrating a volumetric Occupancy Grid [18] [19] representation with a supervised 3D CNN algorithm. The VoxNet architecture is implemented in our Gilbreth application.

Figure 2.5: VoxNet architecture [2]

Fig. 2.5 illustrates the VoxNet architecture. The two major tasks of the system are: (i) use a volumetric grid to represent the spatial geometry of the input object, and (ii) predict a class label directly from the grid using a 3D CNN. The four types of layers in the VoxNet CNN architecture work as follows:

The input layer converts point cloud data into occupancy grids. Occupancy grids

have two advantages for 3D CNN recognition: (i) these grids have simple and efficient

data structures, and (ii) these grids allow for efficient space estimation from range

measurements. To better utilize GPUs, an occupancy grid is represented by a small-

volume dense array (32 × 32 × 32) in VoxNet.

The convolutional layers C(f, d, s) convolve the input with f learned filters of shape d × d × d × f′, where d is the spatial dimension of the filters and f′ is the number of input feature maps, and generate f output feature maps. The spatial stride parameter s controls how the filters move across the input volume.

The pooling layers P(m) downsample the input by replacing each m × m × m block with the maximum voxel value inside the block, where m is the downsampling factor.

The fully connected layers FC(n) have n output neurons, each of which is a linear combination of all the outputs from the previous layer.

VoxNet uses C(32, 5, 2) - C(32, 3, 1) - P(2) - FC(128) - FC(K) as its model, where K is the number of classes. In the Gilbreth application, we implemented this VoxNet architecture.
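This layer stack can be written down compactly in code. The following PyTorch sketch is not the implementation used in Gilbreth; it only illustrates the C(32, 5, 2) - C(32, 3, 1) - P(2) - FC(128) - FC(K) structure, with the activation functions and voxelization bounds chosen as assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

def voxelize(points, grid=32, bound=0.16):
    """Map an (N, 3) point cloud (meters) into a binary occupancy grid.
    The workspace half-extent `bound` is an assumed value."""
    occ = np.zeros((grid, grid, grid), dtype=np.float32)
    idx = np.floor((points + bound) / (2.0 * bound) * grid).astype(int)
    idx = idx[(idx >= 0).all(axis=1) & (idx < grid).all(axis=1)]
    occ[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return occ

class VoxNet(nn.Module):
    """C(32, 5, 2) - C(32, 3, 1) - P(2) - FC(128) - FC(K), as described above."""
    def __init__(self, num_classes):
        super(VoxNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=5, stride=2),   # C(32, 5, 2): 32^3 -> 14^3
            nn.LeakyReLU(0.1),
            nn.Conv3d(32, 32, kernel_size=3, stride=1),  # C(32, 3, 1): 14^3 -> 12^3
            nn.LeakyReLU(0.1),
            nn.MaxPool3d(2),                             # P(2): 12^3 -> 6^3
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 6 * 6 * 6, 128),              # FC(128)
            nn.ReLU(),
            nn.Linear(128, num_classes),                 # FC(K)
        )

    def forward(self, x):  # x has shape (batch, 1, 32, 32, 32)
        return self.classifier(self.features(x))

# Example: classify one voxelized object among K = 5 object types.
net = VoxNet(num_classes=5)
cloud = np.random.rand(2000, 3).astype(np.float32) * 0.2 - 0.1  # stand-in point cloud
grid = torch.from_numpy(voxelize(cloud)).unsqueeze(0).unsqueeze(0)
logits = net(grid)  # unnormalized class scores
```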

2.2 Related Work

Prior work on conveyor-belt based agile robotics focused on two aspects: (i) object perception

and pick pose estimation, and (ii) multirobot coordination. A. Cowley et al. [20] presented

an application that uses a Willow Garage PR2 robot in a conveyor belt pick-and-place

application. However, PR2 is not a typical industrial robot. Y. Huang et al. [21] presented

a study of part-dispatching rules for multi-robot coordination in factory conveyor-belt production lines. Bozma et al. [22] formulated a problem for multi-robot coordination in a pick-and-place

application for objects on a conveyor belt. The focus of both papers was on the development of methods for optimal application performance. Neither focused on the software implementation of the corresponding application, which is our contribution in this thesis.

As mentioned before, agile robotics, a relatively new research area with commercial

value, can benefit from cloud computing. “Cloud robotics” was introduced by Google

researcher James Kuffner in 2010 [23] as an approach to make robots “lighter, cheaper

and smarter.” A major constraint of cloud robotics is the possibility of losing network

connections, which may cause operational failures of robots. However, according to Lorencik

and Sincak [24], when compared to the cost of embedded intelligence on-board robots, the

cost of ensuring reliable network connections is lower. Hu et al. [25] proposed a cloud

robotics system architecture that combines an ad-hoc cloud and an infrastructure cloud.

The adhoc cloud uses machine-to-machine (M2M) communications, while the infrastructure

cloud is supported by machine-to-cloud (M2C) communications. In the adhoc cloud, robots

communicate directly with each other and compute collaboratively. In operations that require

M2C communications, infrastructure clouds provide the required storage and computing

resources. According to Hu et al., the M2C level infrastructure cloud computing has two

main benefits: elastic access to compute resources, and large storage that allows for learning

from “the history of all cloud enabled robots.” To conclude, rich computing resources and

big data are the two major reasons why cloud robotics is gaining popularity.

Many applications have been developed for cloud robotic systems. Rastkar et al. [26]

demonstrated a M2C cloud robotics example with a Parallax BoeBot, a robot that has

very limited on-board resources. All processing-intensive tasks, such as image processing,

were handled by cloud computers. The Parallax BoeBot11 is a tiny car that has ultrasonic

sensors, servos and a Basic-Stamp chip on board. The Basic-Stamp chip is a small Printed

Circuit Board (PCB) that contains only the elements essential to controlling the BoeBot.

Instead of using ROS, Rastkar et al. used Microsoft Windows as the operating system, and

the BoeBot was connected to a Windows server via a wireless hub. This work showed how a

limited-resource robot can benefit from cloud robotics, and how cloud robotics can help keep

the size of robots small while operating at high efficiencies. V. Kumar and N. Michael [27]

stated in their work that the biggest challenge of creating small and completely autonomous

Unmanned Aerial Vehicles (UAVs) stems from size, weight and power constraints. UAV

coordination is a typical M2M cloud robotics application, as the latter helps keep UAVs

small, energy-efficient and intelligent at the same time.

B. Kehoe et al. [28] stated that there are four potential benefits for robot and automation

systems from cloud computing: (i) Big Data, (ii) Cloud Computing, (iii) Collective Robot

Learning, and (iv) Human Computation. Big data helps robots take advantage of machine

learning and deep learning. In their prior work [29], a cloud-based grasping application was

demonstrated by combining online Google object recognition engine and cloud storage with

offline grasp analysis and Computer-Aided Design (CAD) model analysis. Such a system

serves as a good example of how big data and cloud storage can help build artificially intelligent
11. https://www.parallax.com/product/boe-bot-robot

(AI) robotic systems. Kiva system [30], an Amazon warehouse system that uses mobile

robots to bring inventory to warehouse workers, is a good candidate to test collective robotic

learning. Hundreds of robots in the Kiva system connect to each other, as well as with a

local central coordinating server, through wireless communications. R. Rahimi et al. [31]

demonstrated how an industrial robotics surface blending application, Godel, could leverage

cloud computing. This work studied the effects of wide-area network communications on

the application.

Adding to this volume of work on applications for cloud robotics, this work offers a new

conveyor-belt-based industrial robotics application that combines object recognition and

motion planning.
Chapter 3

Gilbreth Application

3.1 Introduction

This chapter describes a conveyor-belt based pick-and-sort industrial robotic application,

Gilbreth. Section 3.2 describes the sorting process and the environment set up in a robot

simulation software, Gazebo. Section 3.3 describes the software architecture. Section 3.4

describes the ROS packages and the methods used to implement the application. Section 3.5 evaluates the software prototype. Section 3.6 describes the ROS messages communicated between the ROS nodes and the network architecture.

3.2 Gilbreth sorting application

The application works as follows. A conveyor belt on a factory floor moves at a constant speed. Mixed parts (objects) arrive at random on the conveyor belt. As illustrated in Fig. 3.1, five types of objects are simulated. The object pose (position on the belt and

orientation) is not predefined. In other words, different types of objects, in any pose, can

arrive on the belt, into the work space of the robot arm, at any instant in time.

The UR10 robot arm (which has 6 degrees of freedom) is used in this application. Since

the robot arm is mounted on a linear actuator that runs parallel to the conveyor belt, our

motion planning algorithms have to design joint trajectories for 7 degrees of freedom. A

vacuum gripper is connected to the end of the robot arm as the end effector. The robot arm is


Figure 3.1: Gilbreth setup: sensors, objects, conveyor belt and UR10 robot arm

instructed to pick up an object arriving on the conveyor belt, move to the position of the

bin corresponding to the object type, and place the object in that bin. Thus the objects

arriving on the conveyor belt are sorted, by type, into separate bins. For example, all piston

rods are placed in one bin and all gears are placed in another bin.

Since each arriving object can be of a different type, sensors are used to capture point

cloud images of the objects, and an object-recognition algorithm is used to identify the object

type.

The application uses two sensors: (i) Break beam sensor and (ii) Kinect sensor. A break

beam sensor is placed such that the laser beam crosses the conveyor belt at a low height

above the belt. When an object arrives on the conveyor belt, the break beam sensor is

triggered. This signal starts a chain of computation, which is described in the next section.

A Kinect sensor is placed above the conveyor belt in order to capture point cloud images of

the arriving objects. The Kinect sensor is mounted slightly before the break beam. When

an arriving object triggers the break beam, the Kinect sensor can capture the whole point

cloud data of the object and send the data for image processing and object identification.

For each identified object, five end effector poses: (i) Waiting pose, (ii) Pick-approach

pose, (iii) Pick pose, (iv) Pick-retreat pose and (v) Place pose, are calculated to complete

one picking assignment. A motion planning algorithm computes valid trajectories between

various poses of the end effector, also known as tool poses, in order to move the robot arm

on the linear actuator and execute its end effector to pick and place the object.

In summary, the tasks of point cloud image capture, image transfer, object segmentation,

object recognition, computation of robot arm end effector poses, robot arm linear actuator

positions, and motion planning to generate joint trajectories, need to be completed in

real time. The faster the velocity of the conveyor belt, the less time is available for all this

computation, which makes this Gilbreth application a good candidate for high-performance

computing and high-speed communications.

3.3 Gilbreth software architecture

The Gilbreth software architecture, illustrated in Fig. 3.2a, consists of multiple ROS nodes,

which are described below. Each ROS node defines various ROS messages and communicates

between different ROS topics. ROS messages and ROS topics are described in section 3.6.

Gilbreth manager The main purpose of this ROS node is to monitor the status of an

object as it is handled by the various ROS nodes of the workflow. The main data exchanges

occur directly between ROS nodes, while only status updates are sent to the Gilbreth

manager. Each ROS node checks for interrupts from the manager in case there are problems

with the actions of the previous ROS node in the workflow. This complexity is captured in

the per-object Finite State Machine (FSM) that is maintained by the Gilbreth manager, as

illustrated in Fig. 3.2c.

An alternative design is to have all inter-ROS node data transfers pass through the

Gilbreth manager, as illustrated in Fig. 3.2b. This design would be simpler as it avoids the

need for interrupts. However, large data items, for example point cloud images, would be transported multiple times between the data center and the factory. This design would be less suitable

than our current design for a distributed execution in which the lower set of ROS nodes

shown in Fig. 3.2a are run on hosts at remote cloud-computing data centers.

(a) Gilbreth software architecture

(b) Software architecture with main workflow manager

(c) Per-object finite state machine

Figure 3.2: Gilbreth software: Per-object workflow starts at Object-Arrival Detection



Object-Arrival Detection This ROS node receives the break beam disruption signal

from the sensor, which occurs every time a new object crosses the break beam, i.e., enters the workspace of a robot arm. Upon receiving the disruption signal, this ROS node sends a signal to the Kinect sensor and records the time stamp, referred to as the detectionTime, which is used to identify the object until a proper object identifier is assigned to it; this can only happen after the object recognition phase completes.

Kinect Publisher This node receives the signal from the Object-Arrival Detection node

and sends out messages containing the point cloud data. The Kinect camera, mounted

above the conveyor belt, continuously captures images of the conveyor belt at 15 Hz. In the real world, the camera would send this data to the host to which it is connected via a USB port. In

our simulation, the captured image is sent to a ROS topic to be subscribed by other ROS

nodes. To avoid unnecessary data communications and computation, the Kinect Publisher

collects the point cloud data and publishes the data only when it receives a signal sent

from Object-Arrival Detection ROS node. The detectionTime parameter is passed along

with the point cloud data. In addition to publishing the PointCloud ROS topic for future

computation, the Kinect Publisher sends a message with detectionTime to a ROS topic,

subscribed by the Gilbreth manager (the point cloud data is not sent here since this data

could be large and is not required by the Gilbreth manager).

Object Segmentation Upon receiving point cloud data from the Kinect Publisher, the

Object Segmentation ROS node executes object segmentation algorithms. It first removes

background information from the collected point cloud data by filtering out data that lies outside a specified area. The desired area is specified in a configuration file, as shown in Table 3.1. The Object Segmentation ROS node then downsamples the filtered point cloud data using a down_sample parameter, which greatly reduces the image size. After the downsampling, Euclidean cluster extraction is used to extract point cloud blobs. If multiple

blobs are detected, the Object Segmentation node will queue these data blobs, and send the

blobs in succession, along with the detectionTime parameter, to the Object Recognition

ROS node in a ROS topic. The Object Segmentation node also sends the detectionTime

with blob identifiers to a ROS topic which is subscribed by the Gilbreth manager (the point

cloud data is not sent here since this data could be large and is not required by the Gilbreth

manager). Fig. 3.10 shows the point cloud images of the five object types after object segmentation, illustrating that the background scene information is removed by the filtering; a sketch of this filtering and clustering sequence is given after Table 3.1.

ROS node               Parameter     Value
Object Segmentation    camera_roi    x: [-0.225, 0.192], y: [-0.287, 0.284], z: [0.0, 0.7162]
Object Segmentation    down_sample   0.01

Table 3.1: Object segmentation ROS node configuration parameters
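A rough Python sketch of this sequence is shown below. It is not the Gilbreth implementation (which uses PCL in C++); it uses the Open3D library, with DBSCAN clustering standing in for PCL's Euclidean cluster extraction, the ROI bounds and leaf size taken from Table 3.1, and the clustering thresholds chosen as assumptions.

```python
import numpy as np
import open3d as o3d

def segment_scene(pcd):
    """Crop to the region of interest, downsample, and split into object blobs.
    Requires a recent Open3D release."""
    roi = o3d.geometry.AxisAlignedBoundingBox(
        min_bound=(-0.225, -0.287, 0.0), max_bound=(0.192, 0.284, 0.7162))
    cropped = pcd.crop(roi)                            # remove belt/floor background
    down = cropped.voxel_down_sample(voxel_size=0.01)  # down_sample = 0.01
    labels = np.array(down.cluster_dbscan(eps=0.02, min_points=20))
    return [down.select_by_index(np.where(labels == k)[0])
            for k in range(labels.max() + 1)]          # one blob per candidate object
```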

Object Recognition Correspondence grouping [32] is used in the Object Recognition

node to compare each received object blob with the point cloud data stored for known objects

in its database to find the closest match. If the object-recognition algorithm identifies the

type of the arriving object with reasonable accuracy, it uses the iterative closest point (ICP) algorithm to compute and align the objectPickPose, which is a valid pickup pose

for grasping the object. This computation requires the object-arrival pose (position and

orientation on the conveyor belt), and the stored value of the “pickPoint” for grasping the

object if it had arrived in the same orientation as the object for which the stored PCD data

was created.

If no match is found, the Object Recognition ROS node does not publish any ROS message; the object is simply allowed to pass through the robot workspace with no action

executed by the robot. In practice, such misses should not occur if the database is populated

well; however, such error-tolerant procedures are required to avoid adverse consequences.

The Object Recognition ROS node assigns a unique objectIdentifier to each recog-

nized object. The ROS node then publishes the detectionTime parameter, objectIdentifier,

objectType, and the objectPickPose, for each recognized object, to the ObjectDetection

ROS topic, to which the Robot Poses Planner ROS node subscribes. At the same time, the

Object Recognition ROS node sends the detectionTime parameter, and object type and

identifier via a ROS topic to the Gilbreth manager. This allows the latter to track the state

of the object as it passes through various ROS nodes.



Robot Poses Planner This ROS node has a database indicating the bin location cor-

responding to each object type and a fixed base position (which is also the object pickup

position). Each object type has its own lift and drop offsets since object sizes vary.

Using the objectType in the received message on the ObjectDetection ROS topic, this

Robot Poses Planner ROS node consults the database to find the dropoff position and the

pick-and-place offset. The node then combines the fixed base position, dropoff position,

and the objectPickPose information in the received message to compute five poses for the

robot arm end effector: pick approach pose, pick pose, pick retreat pose, place pose and

home pose. The computed robot poses, along with detectionTime and objectIdentifier,

are sent on a ROS topic to the Robot Motion Planner ROS node. In addition, a message is

sent to the Gilbreth manager via a ROS topic to update the latter about the status of the

object in the handling process.

Robot Motion Planner This ROS node receives the tool poses and computes valid joint

trajectories to move the robot arm and end effector between the various robot poses. It

also computes the right time instant at which the vacuum gripper attached to the robot

arm should be enabled or disabled by combining the object detectionTime and the offset

stored in the database. The RRT-Connect algorithm is used to compute the trajectory. After the computation, a series of joint trajectories is queued for the robot to execute.

Robot Execution This ROS node interacts with the Robot Controller to execute the

planned motion. Robot trajectories are queued and executed one-by-one to enable the robot

arm to pick up the target object, move to the dropoff position, place the object in the bin

located at the dropoff position, and finally move back to the home position.

3.4 Gilbreth software prototype

The Gilbreth application software is available at: https://github.com/swri-robotics/gilbreth.

Our starting point was to download and install ROS, specifically the Kinetic version.

Since we do not have access to a real factory floor with a conveyor belt and a UR10 robot

arm, we used Gazebo, a dynamic multi-robot simulator for 3D environments1, to develop the

Gilbreth application.

In addition to ROS and Gazebo, we installed the following packages:

ARIAC2 : The ARIAC osrf_gear package includes a world file that is composed of

many models, such as a work cell model (which simulates the factory floor), sensor

models, and robot models, with corresponding plugins for Gazebo. Of these models

and plugins, for our Gilbreth application, we downloaded the source files (C code) for

the conveyor belt, laser break beam sensor (called ProximityRay plugin), and vacuum

gripper end effector, and corresponding model files, which specify input parameters

required by the plugins in Simulation Description Format (SDF)3 .

Gazebo ROS4 : Gazebo ROS and related packages provide wrappers for Gazebo to

support interfacing between external ROS nodes and Gazebo. One issue is that ROS

uses the Unified Robot Description Format (URDF), an XML file format to describe all elements of a robot, while Gazebo uses SDF. However, tutorials are provided to

enable the use of URDF files in Gazebo5 .

MoveIt!6 : This package offers state-of-the-art robot manipulation, motion planning and

control software. We used this package in our Gilbreth application for motion planning,

with collision detection, of the UR10 robot arm.

Universal Robot7 : This ROS-Industrial (ROS-I) package contains the models, configu-

rations and control systems required for Universal Robots arms. It includes the robot drivers (written in Python), kinematics software (written in C++), configuration files to interface with the MoveIt! package, and the YAML and XML-based launch files required to simulate a UR10 robot arm in Gazebo. This package supported integration with the MoveIt! package.


1. http://gazebosim.org/; https://bitbucket.org/osrf/gazebo
2. https://bitbucket.org/osrf/ariac/src
3. https://bitbucket.org/osrf/sdformat
4. http://wiki.ros.org/gazebo_ros_pkgs
5. http://gazebosim.org/tutorials/?tut=ros_urdf
6. http://moveit.ros.org/
7. https://github.com/ros-industrial/universal_robot

Point Cloud Library (PCL) [33] includes several packages that were used in our

object segmentation and object recognition ROS nodes. For example, we used the

Hough3DGrouping package [32], which implements a 3D correspondence grouping

algorithm to recognize objects in a scene.

Gilbreth environment (world) in Gazebo Our first task was to create a factory envi-

ronment in Gazebo for our Gilbreth application. Fig. 3.1 shows the simulated environment.

The break beam sensor can be seen as a blue line across the conveyor belt. The 3D camera,

Kinect sensor, is the gray box shown mounted above the belt. Relative to the Kinect sensor

position, the break beam is positioned such that the object is within the range of the camera

when the beam is crossed by the object, thus a full image of the object is captured. The

linear actuator on which the UR10 robot arm moves can be seen in white parallel to the

belt. The vacuum gripper (end effector) is shown at the end of the robot arm. The bins into

which different types of objects are dropped by the robot arm are mounted along the linear

actuator.

Five objects, each of a different type, are shown arriving on the conveyor belt in Fig. 3.1.

We implemented an external ROS node called Conveyor-Spawner to spawn objects of

different types onto the belt in the Gilbreth world in Gazebo. Object type, inter-object spacing, and object orientation are selected at random. The Conveyor-Spawner node also recycles objects that have been dropped into the bins or have reached the end of the conveyor belt.
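A simplified sketch of such a spawner is given below, using the /gazebo/spawn_sdf_model service provided by gazebo_ros. The model names, file path, spawn pose, and the 7-8 s spacing are illustrative placeholders; the actual Conveyor-Spawner node reads its configuration from parameter files and also handles object recycling.

```python
#!/usr/bin/env python
# Sketch of spawning randomly chosen, randomly oriented parts onto the belt.
import random
import rospy
from gazebo_msgs.srv import SpawnModel
from geometry_msgs.msg import Pose, Quaternion
from tf.transformations import quaternion_from_euler

OBJECT_TYPES = ["pulley", "disk", "gear", "piston_rod"]  # illustrative model names

def spawn_random_object(index):
    rospy.wait_for_service("/gazebo/spawn_sdf_model")
    spawn = rospy.ServiceProxy("/gazebo/spawn_sdf_model", SpawnModel)
    obj = random.choice(OBJECT_TYPES)                 # random object type
    pose = Pose()
    pose.position.x, pose.position.y, pose.position.z = 1.2, -4.0, 0.95
    yaw = random.uniform(0.0, 3.14159)                # random orientation about z
    pose.orientation = Quaternion(*quaternion_from_euler(0.0, 0.0, yaw))
    with open("/path/to/models/%s.sdf" % obj) as f:   # placeholder model path
        spawn(model_name="%s_%d" % (obj, index), model_xml=f.read(),
              robot_namespace="", initial_pose=pose, reference_frame="world")

if __name__ == "__main__":
    rospy.init_node("conveyor_spawner_sketch")
    count = 0
    while not rospy.is_shutdown():
        spawn_random_object(count)
        count += 1
        rospy.sleep(random.uniform(7.0, 8.0))         # random inter-object spacing
```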

Object-Arrival Detection and Kinect Publisher We implemented new software for

both these ROS nodes. Both nodes subscribe to sensor topics as described in Section 3.6.

Since these sensors are simulated inside Gazebo, these two ROS nodes receive messages from

Gazebo.

Object Segmentation and Object Recognition These ROS nodes use the Correspon-

dence Grouping (CG) algorithm; more specifically, we used the Hough voting [32] algorithm

implemented in Point Cloud Library (PCL). Fig. 3.3 illustrates the various steps. Models

are object Point Cloud Data (PCD) files that are saved on disk. These files provide 3D

descriptions of each object type. The Models pipeline is performed only once for each object

Figure 3.3: Object segmentation and object recognition

type to generate features and frames. This pipeline includes normal computation, key points

extraction, key points descriptor computation, and key points reference frame computation.

The Scene block represents 3D images captured by the Kinect sensor. Any single image,

in addition to the object of interest, also includes background information, such as the

conveyor belt and warehouse floor, as these items fall within the range of the depth camera.

The segmentation process removes this background information to isolate the image of the

objects on the belt. The rest of the Scene pipeline (i.e., blocks after the Segmentation node)

is the same as the Models pipeline; the only difference is that the Scene pipeline is executed

for each captured scene, while the Models pipeline is executed only once at the start to

calculate necessary data for feature matching and correspondence grouping.

Once the key points descriptors are computed for a captured object PCD, a group

of one-to-one correspondences are checked between stored information in Models and the

captured information in Scene. Next, correspondence grouping is performed to exclude

wrong point-to-point correspondences due to nuisance factors like noise. The object type

in Models with the highest number of correspondences is reported as the type of object in

Scene.

Once the object type is identified, Iterative Closest Point (ICP) is used to estimate the pose (position and orientation) of the object in the Scene. For each type of object, one picking pose is pre-defined and saved in the database. The correspondence grouping procedure also outputs an estimated object pose; however, this estimate has low precision and is therefore not suitable for pick-pose estimation. ICP minimizes the distance between the Model PCD and the Scene PCD, producing a refined pose estimate for the object. The estimated pose is then used to transform the pre-defined picking pose into the scene.
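This pose-refinement step can be sketched as follows. This is not the Gilbreth code (which uses PCL's ICP in C++); it uses Open3D's ICP registration, and the correspondence threshold, variable names, and the 4x4-matrix representation of the stored picking pose are assumptions.

```python
import numpy as np
import open3d as o3d

def estimate_pick_pose(scene_blob, model_pcd, model_pick_pose, init_guess):
    """Refine the object pose with ICP and transfer the stored picking pose.
    `init_guess` is the coarse 4x4 transform from correspondence grouping and
    `model_pick_pose` the 4x4 picking pose stored for the model."""
    result = o3d.pipelines.registration.registration_icp(
        model_pcd, scene_blob, max_correspondence_distance=0.01, init=init_guess,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    model_to_scene = result.transformation            # refined object pose in the scene
    # Map the pick pose defined on the model into the scene frame.
    return np.dot(model_to_scene, model_pick_pose)
```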

Robot Poses Planner We considered two pick-up policies for our Gilbreth application:

static-position pickup policy and dynamic-position pickup policy. Fig. 3.4a illustrates the

static-position pickup policy. With this policy, the robot arm always picks up objects

from the belt at the same position. After the robot arm picks up an object, moves to the

appropriate bin (as determined by object type) and places the object in that bin, the arm

returns to its static pickup position and waits for the next object.

(a) Gilbreth static-position pickup policy

(b) Gilbreth dynamic-position pickup policy

Figure 3.4: Gilbreth object picking policies

Fig. 3.4b presents a more complex dynamic-position pickup policy in which the robot

“chases” the next available object after placing the previous object in its appropriate bin.

This policy is only feasible if the linear actuator can move faster than the conveyor belt.

Assume that the conveyor-belt velocity is vbelt and the linear-actuator velocity is vLA. Thus the object moves at speed vbelt, and the robot arm moves at speed vLA. At time tdrop, when the robot arm drops an object into its bin, let its position be xrobot. Let the position of the next candidate object O for pickup at this time tdrop be xO. This ROS node computes whether pickup of object O is feasible, and if so, it determines the pickup position, xpickup:

x_{pickup} = \frac{x_O \, v_{LA} - x_{robot} \, v_{belt}}{v_{LA} - v_{belt}} \qquad (3.1)

If xpickup is within the maximum range of the robot arm, then the pickup is feasible; if not,

the robot arm does not chase object O, and instead allows the next robot arm along the

belt to pick up object O.
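
A minimal sketch of this feasibility check, implementing Eq. (3.1) directly, is given below. The variable names are illustrative, and interpreting the "maximum range of the robot arm" as a reachable window around the robot's current rail position is an assumption.

def plan_dynamic_pickup(x_object, x_robot, v_belt, v_la, max_reach):
    """Return the pickup position for object O, or None if the chase is infeasible.

    All positions are measured along the belt axis; both velocities point downstream.
    """
    if v_la <= v_belt:
        return None                  # the robot cannot catch up with the belt
    # Eq. (3.1): meeting point of the object (speed v_belt) and the robot arm (speed v_la).
    x_pickup = (x_object * v_la - x_robot * v_belt) / (v_la - v_belt)
    # Feasible only if the meeting point lies within the arm's reachable range (assumption).
    return x_pickup if abs(x_pickup - x_robot) <= max_reach else None

# Example: object 1.2 m downstream of the robot, belt at 0.15 m/s, rail at 0.5 m/s, 2 m reach.
print(plan_dynamic_pickup(x_object=1.2, x_robot=0.0, v_belt=0.15, v_la=0.5, max_reach=2.0))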

While the dynamic-position pickup policy is more complex, it is expected to result in

a higher object sorting throughput when compared to the static-position pickup policy.

Our current implementation is for the static-position pickup policy. However, we plan to

implement the dynamic-position pickup policy in a later version of Gilbreth.

We used configuration files to set parameters and limitations relating to the robot arm.

Parameters include per-object-type bin positions, the height to which the robot arm should be lifted after it picks up an object, and the time duration required for transitions between different robot poses. These parameters were tested and tuned to achieve better performance.

Robot Motion Planner The MoveIt! ROS package is used in this ROS node. MoveIt! integrates planners from the Open Motion Planning Library (OMPL) [8]. Of these planners, we chose the Rapidly-exploring Random Trees (RRT) Connect planner, which has a high trajectory-solving success rate and low computation time8.

8 http://moveit.ros.org/assets/pdfs/2013/icra2013tutorial/ICRA13 Benchmark.pdf

Figure 3.5: Motion planner pipeline

Fig. 3.5 illustrates the pipeline of the motion-planning process. When the Robot Motion Planner receives the tool poses, a chain of computations begins. The planner uses the current joint states of the robot and the target tool poses to compute the robot trajectories. If the trajectory computation fails, the motion planner re-computes the trajectory, provided it has not reached the maximum re-planning iteration parameter. If the trajectory computation succeeds, the motion planner monitors the execution status of the trajectory. If the trajectory executes successfully, the ROS node continues with the next planning request; otherwise, a failure status is published.
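
The retry logic of this pipeline can be sketched with the moveit_commander Python interface as shown below. This is a simplified stand-in for the actual node: the planning-group name and iteration limit are assumptions, and the sketch targets the Kinetic-era API in which plan() returns a single RobotTrajectory.

import sys
import rospy
import moveit_commander

moveit_commander.roscpp_initialize(sys.argv)
rospy.init_node("motion_planner_sketch")

group = moveit_commander.MoveGroupCommander("manipulator")   # assumed planning-group name
group.set_planner_id("RRTConnectkConfigDefault")

def plan_and_execute(target_pose, max_replan_iterations=5):
    """Re-plan up to max_replan_iterations times, then execute and report status."""
    group.set_pose_target(target_pose)
    for attempt in range(max_replan_iterations):
        plan = group.plan()                          # Kinetic API: returns a RobotTrajectory
        if plan.joint_trajectory.points:             # empty trajectory => planning failed
            break
        rospy.logwarn("Planning attempt %d failed, re-planning", attempt + 1)
    else:
        return False                                 # report failure upstream
    return group.execute(plan, wait=True)            # monitor execution status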

3.5 Gilbreth software prototype evaluation

Experimental setup A single host was used to execute all components of the Gilbreth

application. This host has a single 4-core (8 threads) CPU (Intel Core i7-4710MQ@2.50GHz)

and 12 GB RAM. The OS used is Ubuntu 16.04.2 LTS with kernel version 4.10.0-37-generic.

To measure per-ROS-node CPU usage, we wrote a Python script that calls functions in psutil9, which is a Python library. These calls are executed at periodic intervals, where the period is specified by the input frequency parameter. The results are collected in a csv file and analyzed. In addition, to collect processing times for high-CPU-usage ROS nodes such as the Object Recognition and Motion Planner nodes, the source code of these nodes was modified.
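
A minimal sketch of such a CPU-monitoring script is shown below. The node keywords, sampling period, and output file name are illustrative; the script simply locates processes whose command lines match the keywords and samples their CPU usage with psutil.

import csv
import time
import psutil

MONITOR_PERIOD = 0.3          # sec, matching t_cpu in Table 3.2
NODE_KEYWORDS = ["gazebo", "object_recognition", "move_group"]   # assumed process names

def find_ros_processes(keywords):
    """Return psutil.Process handles whose command line mentions one of the keywords."""
    procs = []
    for proc in psutil.process_iter(attrs=["pid", "cmdline"]):
        cmd = " ".join(proc.info["cmdline"] or [])
        if any(k in cmd for k in keywords):
            procs.append(proc)
    return procs

with open("cpu_usage.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerow(["time", "pid", "name", "cpu_percent"])
    procs = find_ros_processes(NODE_KEYWORDS)
    while True:
        for proc in procs:
            # cpu_percent() reports usage since the previous call; >100% means multiple threads.
            writer.writerow([time.time(), proc.pid, proc.name(), proc.cpu_percent()])
        time.sleep(MONITOR_PERIOD)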

Experiments execution To initiate all the ROS nodes of the Gilbreth application,

multiple launch files are used. The environment launch file instantiates the Gilbreth

environment in Gazebo and RViz. After the environment is created, we load parameters and

launch Kinect Publisher, Object Segmentation, Object Recognition, Robot Poses Planner,

and Robot Motion Planner nodes.


9
https://github.com/giampaolo/psutil

Table 3.2 lists the parameter values. The inter-object spawning period is a uniformly

distributed random variable between 7 and 8 sec. The CPU-usage monitoring period is used

by our data-collection Python script. An object yaw variance of 180◦ implies that an object

can arrive in any orientation relative to the belt movement direction. The four object types

shown in Fig. 3.1 are used in this experiment. For each run of the experiment, 500 objects

were spawned, and measurements were collected for the CPU usage and processing times for

each object. A Gazebo real-time factor of 0.4 means that 0.4 sec in Gazebo requires 1 sec of

wall time to simulate. These low values of the real-time factor illustrate a need to parallelize

Gazebo.

Two ROS services are initiated to start spawning random objects on to the moving

conveyor belt. The application execution automatically starts when these two services

are initiated. When all the ROS nodes are running, we launch our Python script to start

collecting data.

Parameter Symbol Value


Inter-object spawning period tspawn [7, 8]sec
CPU-usage monitoring period tcpu 0.3 sec
Object yaw variance ∆deg 180◦
Object type Nobj 4
Number of objects/experiment Nexp 500
Gazebo real time factor R 0.4-0.6

Table 3.2: Values of experiment parameters

Experimental results From the collected data, ROS nodes can be classified into three

groups: (i) high-CPU usage, (ii) low-CPU usage, and (iii) burst-CPU usage. The high-CPU

usage ROS nodes always consume more than 100% CPU (the maximum value is 800% since

our server has 8 threads). The low-CPU usage ROS nodes always consume less than 50%

CPU. Finally, the ROS nodes in the burst-CPU usage group require high CPU resources

only in small bursts.

Table 3.3 shows the classification of all major ROS nodes in Gilbreth. Gazebo requires a

sustained level of approximately 200% CPU cycles. CPU usage is below 50% for the Kinect

Publisher, Robot Motion Planner and Robot Poses Planner. The Robot Motion Planner

creates a Move Group ROS node to compute the robot joint trajectories. The Move Group

node has a burst CPU usage during its computation.

CPU Usage | Node Type
High | Gazebo
Low | Kinect Publisher, Robot Poses Planner, Robot Motion Planner
Burst | Object Recognition, Move Group, Object Segmentation

Table 3.3: ROS-node classification based on CPU usage

(a) CPU usage of selected ROS nodes


(b) Robot motion planner processing time

Figure 3.6: CPU usage and processing time for one ROS node

Fig. 3.6a shows the CPU usage for Move Group (blue), Object Segmentation (green) and

Object Recognition (red) nodes for an experimental run of 30 seconds in which four disk

objects were processed. Of these three ROS nodes, the Object Recognition node consumes

the most CPU cycles. The peaks occur when the code leverages parallel processing for

feature extraction.

Next, we executed runs in which all 500 objects were of the same type in order to

generate the boxplots shown in Figs. 3.6b and 3.7a. Fig. 3.6b shows that motion planning

for a single-robot arm setup is not computationally intensive. However, object recognition

time is significant as seen in Fig. 3.7a. The object recognition times for the four objects are

directly related to the size of the objects, with the pulley being the largest and the piston

rod being the smallest.

Fig. 3.7b shows the total time for the robot arm to execute its motions between the

four poses. These results were obtained with 1000 executions of the Robot Motion Planner

(a) Object recognition processing time (b) Robot execution time

Figure 3.7: Compare object recognition time with physical robot arm movement time

for each object type. For each type of object, the four poses are the same. However, the output shows that the trajectories computed by the motion planner are not stable; the number of outliers is very high. We propose an improvement in the next chapter.

Another interesting point to note is that the recognition time for the larger objects has a

median value that is close to, or even larger than, the time taken to execute the robot arm

movement. The implication is that in a multi-robot system, we will require multiple servers

or faster servers to run the object recognition ROS nodes.

3.6 Gilbreth message description and experiments

The ROS framework uses a publish-subscribe method for communications. Publishers

broadcast their messages on ROS topics. Subscribers sign up for topics a priori based on

their requirements, and thus receive the corresponding published ROS messages. While ROS

topics are used for unidirectional communication, ROS services are used for bidirectional

communication. While each ROS topic is associated with a single message type, ROS services

have two message types (request and reply). Section 3.6.1 describes the Gilbreth message

flow, and Section 3.6.2 describes the experiments we conducted to quantify the sizes of the

Gilbreth ROS messages.



3.6.1 Gilbreth message description

The Gilbreth application is designed for distributed architectures, and hence has numerous

ROS nodes. Fig. 3.8 shows the Gilbreth ROS nodes and ROS topics flow chart characterizing

the communication patterns. Only those ROS topics that transfer data values necessary for

the main pick-and-sort functionality are described below.

Figure 3.8: Gilbreth message communication flow; ROS topics: blue; ROS services: red

Gazebo Gilbreth Module This ROS node simulates the factory environment illus-

trated in Fig. 3.1. Fig. 3.8 shows the five major components in the Gilbreth module of

the Gazebo simulator: Kinect Camera, Break Beam, Vacuum Gripper, UR10 Robot, and

Bins-and-Invisible Wall. The Kinect-camera component monitors the conveyor belt

and publishes captured depth images to the ROS topic /depth camera/depth/point. The

message type for this ROS topic is described in the ROS package sensor msgs/PointCloud2.

The state of the break-beam sensor, illustrated in Fig. 3.11a, is described in ROS package

gilbreth gazebo/Proximity and published to ROS topic /break beam sensor change.

The min range and max range data values indicate the detection range of the break

beam. The Header describes the time when the ROS message is published. The boolean

value object detected changes to True when an object enters the detection range of

the break-beam sensor. Gazebo publishes this ROS message whenever the boolean value

object detected changes state, which occurs both when an object enters the range and

when the object leaves the range of the break-beam sensor.

The vacuum-gripper component publishes its status to ROS topic /gripper/state. The

vacuum gripper state message is described in ROS package gilbreth gazebo/VacuumGripper

State. The message, shown in Fig. 3.11b, characterizes whether or not the vacuum gripper

is enabled, and if enabled, whether or not there is an object attached to the gripper. In

other words, this message indicates whether or not the vacuum gripper has successfully

grasped an object.

The UR10-robot component publishes its messages to ROS topic /joint states. The

associated message type is described in ROS package sensor msgs/JointState. The

message carries data on the states of all the robot joints. The state of each joint, shown in

Fig. 3.11c, is specified by the name, position, velocity and effort of the joint, and the time

when the joint state was recorded. The UR10 robot also communicates with the move group

ROS node via two ROS services: /robot rail controller/ and /robot arm controller.

The messages sent in these services include data on the: (i) status of the robot controllers,

(ii) feedback on the move instructions, and (iii) success or failure of the move operations.

The Bins-and-Invisible-Wall component is built to recycle objects. Five bins are placed

inside the robot’s workspace, and an invisible wall is mounted at the end of the conveyor

belt to catch missed objects. This component publishes the pose and the type for both

captured and missed objects to ROS topic /disposed models10 . The associated message

type, described in ROS package gazebo msgs/ModelStates and illustrated in Fig. 3.11e,

provides information for the conveyor spawner ROS node to spawn new objects within the

Gazebo Gilbreth environment.

Kinect Publisher This ROS node was implemented in a new ROS package called

gilbreth perception. It subscribes to the ROS topics /depth camera/depth/point and

/break beam sensor change, as illustrated in Fig. 3.8. The Point Cloud Data (PCD) cor-

responding to a single image is sent on ROS topic /kinect points, which is described in
10
The term "model" is synonymous with "object."

(a) Piston rod (b) Gasket (c) Pulley (d) Disk (e) Gear

Figure 3.9: Depth images of 5 types of objects

(a) Piston rod (b) Gasket (c) Pulley (d) Disk (e) Gear

Figure 3.10: Depth images of 5 types of objects after object segmentation

package sensor msgs/PointCloud2. The associated message is sent only when the message

received on the /break beam sensor change topic indicates that one-or-more objects en-

tered the detection range of the break-beam sensor. In addition to the PCD, a time stamp

indicating when the break-beam sensor triggered is included in the message.
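
A condensed sketch of this triggering logic is shown below. It caches the most recent depth image and republishes it only when the break-beam message reports a newly detected object. The underscored topic, package, and field names are written as they would appear in ROS and are assumptions to that extent, as are the queue sizes.

import rospy
from sensor_msgs.msg import PointCloud2
from gilbreth_gazebo.msg import Proximity          # assumed message module name

class KinectPublisher(object):
    def __init__(self):
        self.latest_cloud = None
        self.pub = rospy.Publisher("/kinect_points", PointCloud2, queue_size=1)
        rospy.Subscriber("/depth_camera/depth/point", PointCloud2, self.cloud_cb)
        rospy.Subscriber("/break_beam_sensor_change", Proximity, self.beam_cb)

    def cloud_cb(self, msg):
        # Keep only the most recent depth image from the 15 Hz Kinect stream.
        self.latest_cloud = msg

    def beam_cb(self, msg):
        # Forward one PCD per object, tagged with the break-beam trigger time (simplification).
        if msg.object_detected and self.latest_cloud is not None:
            cloud = self.latest_cloud
            cloud.header.stamp = msg.header.stamp
            self.pub.publish(cloud)

if __name__ == "__main__":
    rospy.init_node("kinect_publisher_sketch")
    KinectPublisher()
    rospy.spin()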

Robot State Publisher This new ROS node, obtained from its eponymous ROS package,

was integrated into Gilbreth this year. The node reads the robot description parameters from

the ROS parameter server, described in section 2.1.2. The parameters are loaded at the start

of the application execution. This ROS node subscribes to the ROS topic /joint states, as

illustrated in Fig. 3.8. After computing the forward kinematics for the robot, this ROS node

publishes the results via a transform tree, described in ROS package tf2 msgs/TFMessage,

to ROS topic /tf. The transform tree, part of which is illustrated in Fig. 3.11d, is used to

transform points and vectors between two coordinate frames.

Conveyor Spawner We implemented this new ROS node to spawn objects, whose type

and pose are selected randomly, onto the conveyor belt. The software for this ROS node

is included in the gilbreth gazebo package. This ROS node subscribes to the ROS topic

/disposed models, whose message is illustrated in Fig. 3.11e. This message provides

(a) Break beam sensor change ROS message
(b) Gripper state ROS message
(c) Joint state ROS message
(d) Transform tree ROS message
(e) Disposed model ROS message
(f) Tool pose ROS message
(g) Object detection ROS message

Figure 3.11: Gilbreth ROS messages



information about the objects disposed into the bins or stopped by the invisible wall. For

each disposed model, this ROS node spawns a new object. The node then publishes the

spawned-object type and the object-spawning time stamp in a message to a ROS topic (not

shown in Fig. 3.8).

Object Segmentation This ROS node subscribes to the ROS topic /kinect points.

The segmentation process removes the background information in each received depth image,

and publishes the filtered PCD, described in sensor msgs/PointCloud2, to ROS topic

/segmentation result. Fig. 3.10 shows the depth images after the object-segmentation

processing for each of the five object types shown in Fig. 3.9.
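
One simple way to realize such background removal, shown purely for illustration, is an axis-aligned crop around the belt region; the bounds below are placeholders, and the actual node's filtering method may differ.

import numpy as np

# Assumed workspace bounds (in the camera frame) that enclose the belt surface but
# exclude the floor and far background; the numbers are placeholders.
BELT_BOUNDS = {"x": (-0.5, 0.5), "y": (-0.4, 0.4), "z": (0.6, 1.1)}

def segment_object(points):
    """Remove background points from an (N, 3) array of XYZ points.

    A stand-in for the background removal that the Gilbreth node performs on
    sensor_msgs/PointCloud2 data before republishing the filtered PCD.
    """
    mask = np.ones(len(points), dtype=bool)
    for axis, (lo, hi) in enumerate([BELT_BOUNDS["x"], BELT_BOUNDS["y"], BELT_BOUNDS["z"]]):
        mask &= (points[:, axis] >= lo) & (points[:, axis] <= hi)
    return points[mask]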

Object Recognition This ROS node subscribes to ROS topics /segmentation result

and /tf. The transform tree data is used to transform the object pick pose from camera coordinates to world coordinates. If the object is recognized, the Object Recognition ROS node

publishes the object detection message, described in ROS package gilbreth msgs/ObjectDe-

tection, to ROS topic /recognition result. The message, shown in Fig. 3.11g, provides

the detection time, the type and the pick point pose of the object.

Tool Planner This ROS node subscribes to the ROS topic /recognition result and pub-

lishes the tool poses message, described in ROS package gilbreth msgs/TargetToolPoses,

to ROS topic /target tool poses. The ROS message, depicted in Fig. 3.11f, shows one of

the four tool poses for an object. The Header field describes the execution deadline and the

pose field describes the position and orientation of the UR10 robot gripper for the particular

object.

Robot Execution This ROS node subscribes to ROS topics /target tool poses and

/gripper/state. The ROS node calls functions in the MoveIt! library to compute robot trajectories, and the MoveIt! library automatically generates the Move Group Wrapper ROS node.

Move Group Wrapper This ROS node is the Python wrapper of the C++ move group

library. The Move Group Wrapper ROS node creates the Move Group ROS node and

interacts with the latter via ROS services /execute trajectory/*. The associated message

types are described in ROS package actionlib msgs. The Move Group Wrapper ROS node

monitors the status and the results of the robot trajectory execution.

Move Group This C++ ROS node Move Group was obtained from ROS package MoveIt!.

This node computes the trajectories between different tool poses for the robot and its

vacuum gripper to move into position, pick up an object from the conveyor belt, and

place the object in a bin based on its type. This node also controls the robot simulated

in the Gazebo Gilbreth Module. The computed trajectories, described in ROS package

trajectory msgs/JointTrajectory, are published to the ROS topics /robot arm controller/

follow joint trajectory/ and /robot rail controller/follow joint trajectory/.

The Gazebo Gilbreth Module and the Move group ROS nodes interact through ROS

services /robot arm controller/command and /robot rail controller/command, moni-

toring the status of the robot controllers, the feedback on the move instructions, and the

success or failure of the move operations.

3.6.2 Experiments

We planned and executed experiments to quantify the sizes of the Gilbreth ROS messages.

Experimental setup The experimental setup was as follows. A single host was used

to execute all components of the Gilbreth application. This host has a single 4-core (8

threads) CPU (Intel Core i7-4710MQ@2.50GHz) and 12 GB RAM. The OS used is Ubuntu

16.04.2 LTS with kernel version 4.10.0-37-generic. To measure the sizes of the Gilbreth ROS messages, we wrote a Python script, msg monitor, that subscribes to the desired ROS topics and measures the memory usage of each ROS message. We leveraged Pympler11, a Python library, specifically its asizeof module, which measures the memory consumption of each ROS message.
11
https://github.com/pympler/pympler
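
A sketch of such a message-size monitor follows; the two subscribed topics and the output file name are illustrative, and the real msg monitor script subscribed to all topics of interest.

import csv
import rospy
from pympler import asizeof
from sensor_msgs.msg import PointCloud2, JointState

OUTPUT = csv.writer(open("msg_sizes.csv", "w"))
OUTPUT.writerow(["topic", "bytes"])

def make_callback(topic):
    def callback(msg):
        # asizeof reports the total memory footprint of the deserialized Python message object.
        OUTPUT.writerow([topic, asizeof.asizeof(msg)])
    return callback

rospy.init_node("msg_monitor_sketch")
# Two representative topics; additional subscribers can be added in the same way.
rospy.Subscriber("/kinect_points", PointCloud2, make_callback("/kinect_points"))
rospy.Subscriber("/joint_states", JointState, make_callback("/joint_states"))
rospy.spin()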

Experimental execution and results After launching the Gilbreth application, we

launched the python script msg monitor to subscribe to different ROS topics, and measure

the sizes of ROS messages. The inter-object spawn period was varied randomly in the range

6-8 seconds. Gilbreth ROS messages can be classified into two categories based on their

publishing patterns: (i) triggered messages, and (ii) periodic messages. Table 3.4 shows

the triggered ROS messages. The bandwidth consumed by each triggered message is not

quantified because it largely depends on the inter-object spawning period, and the successful

completion of the previous ROS nodes in the per-object processing timeline. Table 3.5

describes the periodic ROS messages, i.e., those that are published continuously with a

certain frequency. The bandwidth measurement was recorded using a ROS built-in command

line tool rostopic bw. The sizes of both types of ROS messages were recorded in csv files

for later analysis.

The majority of the ROS messages have constant message sizes. For example, the point

cloud image of an object, published by the Kinect Publisher ROS node, is consistently

9.8 MB. The size of the point cloud data is determined by the resolution of the Kinect

Camera. Meanwhile, the Object Segmentation ROS node publishes messages of varying

size. The message size depends upon the PCD being processed, which in turn depends on

the object type and background information. One hundred objects of each type were spawned in

the experiments to compute the average message size per object type.

ROS topic | Message type | Message Size
break beam change | gilbreth gazebo/Proximity | 48 B
kinect points | sensor msgs/PointCloud2 | 9.8 MB
segmentation result | sensor msgs/PointCloud2 | gear: 13.2 KB, disk: 18.2 KB, pulley: 14.5 KB, piston rod: 5.3 KB
recognition result | gilbreth msgs/ObjectDetection | 1.3 KB
target tool poses | gilbreth msgs/TargetToolPoses | 3.9 KB
disposed models | gazebo msgs/ModelStates | 129 B

Table 3.4: Message size of triggered ROS message



ROS topic | Message type | Message Size | Bandwidth | Frequency
gripper/state | gilbreth gazebo/VacuumGripperState | 2 B | 321.32 B/s | 160 Hz
joint state | sensor msgs/JointState | 0.38 KB | 19.13 KB/s | 50 Hz
tf | tf2 msgs/TFMessage | 0.80 KB | 24.35 KB/s | 30 Hz

Table 3.5: Message size of periodic ROS message

The experimental results showed that the largest triggered ROS message was the object

PCD published by the Kinect Publisher ROS node. The Gazebo-simulated Kinect camera

publishes point cloud data at a frequency of 15 Hz. By implementing the Kinect Publisher

ROS node, which only transmits the PCD upon receiving an indication from the break-beam

sensor of a state change, we significantly decreased the network bandwidth consumed by the

Gilbreth application.

Amongst periodic ROS messages, the tf topic messages consumed the highest bandwidth.

However, the total bandwidth used by the Gilbreth application was still very low.

3.7 Conclusions

This chapter demonstrated the feasibility of implementing a pick-and-sort application in

which computer vision is combined with sophisticated motion planning to enable an industrial

robot arm to position itself and its end effector to successfully pick up objects from a moving

conveyor belt, move to the correct bin (based on object type) and place the object in the

bin. Details are provided on how we combined software from various ROS and ROS-I

packages for rapid application development. Finally, experiments were conducted to evaluate

the prototype. The Gazebo simulator of the factory environment, the object-recognition,

object-segmentation, and Move Group ROS nodes required the most computing

cycles. We found that the robot execution time for picking up, moving and placing an object

in its right bin, takes approximately the same amount of time as the processing time required

for object recognition. This finding shows a need to improve the object-recognition pipeline.

Based on our Gilbreth ROS messaging experiments, it appears that the Gilbreth applica-

tion has a small network bandwidth requirement, and therefore, deployment of Gilbreth ROS

nodes across a Wide Area Network (WAN) may be feasible. However, on closer analysis,

we observe that only some ROS nodes can be located remotely on a cloud-computing data

center that is connected by a WAN to the factory floor.

We identified two types of ROS messages, triggered and periodic. The triggered ROS mes-

sages are mostly of fixed size. The Object Segmentation, Object Recognition and Tool

Planner ROS nodes can be deployed on the cloud to take advantage of high computational

resources. In contrast, the Kinect Publisher ROS node subscribes to a high-bandwidth

ROS topic from the Gazebo Kinect Camera plugin. Therefore, this ROS node should be

deployed near or on the factory floor to avoid unnecessary WAN communications. If the

inter-object spawning period is 10 seconds, and the Gazebo Kinect Camera monitors the

conveyor belt and publishes its PCDs at 15 Hz, a total of 150 PCDs are generated for one

object. Among these 150 PCDs, only one PCD is required for object segmentation and

object recognition. If the Kinect Publisher ROS node is run on a cloud computer, 99% of

PCDs sent to the cloud would be unnecessary data.

For the ROS nodes that subscribe to periodic ROS messages, latency is our dominant

concern. Most of the periodic message sizes are much smaller than the Ethernet Maximum

Transmission Unit (MTU) packet size of 1500 bytes. Thus, most periodic ROS messages can

be sent within a single packet. The implication of this observation is that most messages will

require a single TCP round-trip in the Slow Start phase on an open TCP connection. However,

most of the periodic messages have a high frequency; for example, the gripper/state ROS

message has a 160 Hz frequency. Since the Robot Execution, Move Group and Move Group

Wrapper ROS nodes subscribe to these periodic messages, locating these ROS nodes across

the WAN in a cloud-computing data center would require more network bandwidth.
Chapter 4

Gilbreth Prototype Improvements and Evaluation

4.1 Introduction

Improvements were made in the object-recognition and in the motion-planning pipelines.

Fine-tuning of the object-recognition parameters and improvements to the object-recognition pipeline were implemented by our collaborators at UTD. We characterized and evaluated the

the improvements in isolation. Section 4.2 describes the implemented improvements and our

evaluation.

Fig. 3.6b shows that the motion-planner output is not robust, i.e., although the starting

pose and the ending pose are the same for each trajectory computation, the robot-execution

times had many outliers. To address this problem, we propose an improved motion planning

pipeline, which is described in Section 4.3.

We carried out an experimental evaluation of the overall Gilbreth pick-and-sort process.

The pick-and-sort success rate as a function of the mean inter-object spawning period was

measured and plotted. Section 4.4 provides details of our experiments and presents our

findings. The current implementation, even with these improvements, achieved only a 71.3%

pick-and-sort success rate at the best setting that we measured. An analysis of the reasons

for this relatively low success rate showed that approximately 10% of the failure rate was due to excess load (i.e., the robot arm could not keep up with the object arrival rate), and the remaining failures occurred in the motion-planning and grasping processes. Methods

for improving these algorithms are described in Chapter 5.

Section 4.5 describes our evaluation of a Convolutional Neural Network (CNN) based object-

recognition algorithm called VoxNet, which was implemented by our UTD collaborators. This

solution offers significant improvement in object-recognition times over the CG algorithm

in the original prototype, but this CNN algorithm requires GPU computational resources

for long durations (on the order of hours) for model training. Cloud computing is a good

solution for executing model training processes.

4.2 Object recognition improvements

The processing time of the Correspondence Grouping (CG) algorithm depends upon the

product x×y×z, where x, y, and z represent length, width, and depth of the input object point

cloud data, respectively. To reduce object-recognition time, our UTD collaborators tuned

the object recognition parameters and improved the code by adding a PCD-downsampling

phase at the start, and then slowly increasing resolution until the object could be recognized

and a pick pose could be computed on the object data for the robot-arm end effector to

grasp.

At UVA, we ran two experiments to evaluate the fine-tuned object recognition ROS node,

as described in Section 4.2.1, and we characterized the improved pipeline in Section 4.2.2.

4.2.1 Experiments on fine-tuned object recognition process

Two experiments were conducted to evaluate the fine-tuned object-recognition pipeline.

Experiment 1 was set up on a UVA local machine. The experiment evaluated the fine-tuned object-recognition process by computing the average recognition processing time and comparing it with the processing time obtained with the original parameters. In Experiment 2, we set up a network environment on the Global Environment for Network Innovations (GENI) [34], and evaluated whether the object-recognition process can benefit from cloud technology.

Experiment 1: UVA-local experimental setup and execution A single host was

used to execute all components of the Gilbreth application. This host has a single 4-core

(8-thread) CPU (Intel Core i7-4710MQ@2.50GHz) and 12 GB RAM. The OS used was

Ubuntu 16.04.2 LTS with kernel version 4.10.0-37-generic.

To measure the processing time of each object type, the object recognition ROS node

was modified to publish the processing time to a ROS topic /recognition duration. We

implemented a Python program to record the spawning time of the object, and the spawned

object type from the Conveyor Spawner ROS node in one csv file, and to record the object

recognition processing time, and the recognized object type from the Object Recognition

ROS node in a second csv file.

Table 4.1 lists the parameters used in the experiments. Four types of objects: disk,

pulley, gear and piston rod, were used in this experiment. For each object type, 500 objects

were spawned. The inter-object spawning period was selected randomly from the range 6 to

8 seconds. The Gazebo real time factor was 1.0, indicating a real-time simulation.

Two launch files, as described in Section 3.5, were used to launch the Gilbreth Application.

The python script was launched to record data after the Gilbreth application was launched.

Parameter Symbol Value


Inter-object spawning period tspawn [6, 8] sec
Object yaw variance ∆deg 180◦
Object type Nobj 4
Number of objects/experiment Nexp 500
Gazebo real time factor R 1.0

Table 4.1: Experiment-1 parameters

Experiment 1 (UVA-local experimental setup and execution) results Fig. 4.1

illustrates box plots of the object-processing time for four different object types (across

the 500 measurements obtained for each type). The object-recognition success rate was

computed by comparing the spawned and recognized object types across the two csv files.

Table 4.2 shows that the recognition success rate was 100.0% for all object types. We

compared these new results with the recognition times reported in Section 3.5 (before the

improvements). The processing time for pulley, disk and piston rod decreased by 62.69%,

46.38% and 50.35%, respectively. A comparison for the gear model is not provided because we used larger gear models in this experiment. The experiment showed that the improved object-

recognition pipeline can significantly decrease computational load with negligible impact on

object-recognition success rate for Gilbreth objects.

Figure 4.1: Experiment-1 object-recognition processing times with the improved pipeline

Object Type | Processing Time (before, after improvements) | Reduction | Success Rate
Pulley | 7.8954, 2.9485 | 62.69% | 100%
Disk | 6.1979, 3.3235 | 46.38% | 100%
Piston rod | 2.0884, 1.0369 | 50.35% | 100%

Table 4.2: Experiment-1 results on the improved object-recognition process

Experiment 2: GENI experimental setup and execution A two-server testbed was

used to test the performance of the improved object recognition ROS node. We took

advantage of GENI to build the network environment. Both servers have the same hardware

configurations: single 1-core (1-thread) CPU (Intel Xeon E5-2459@2.10GHz) and 860MB

memory. The OS used is Ubuntu 16.04.1 LTS with kernel version 4.4.0-75-generic.

The parameters configured in both setups are described in Table 4.3. The experiment

has two steps. First, we launched the entire Gilbreth application on one server and recorded

the processing time collected for object recognition for each type. Second, we modified

the ROS launch file to launch Gazebo, the factory environment, on server A and all the

remaining ROS nodes on server B, as illustrated in Fig. 4.2a. After the application was

(a) Experiment-2 setup in GENI testbed


(b) Comparison of performance on one GENI server vs. on two GENI servers

Figure 4.2: Experiment-2 results on the improved object-recognition pipeline

successfully launched on both servers, a python script was executed to record the processing

time of object recognition on server B. The average processing time for the object recognition ROS node was calculated after the experiments concluded.

Parameter Symbol Value


Inter-object spawning period tspawn [6, 8]sec
Object yaw variance ∆deg 180◦
Object type Nobj 5
Number of objects/experiment Nexp 100
Gazebo real time factor R 1.0

Table 4.3: Experiment-2 parameters

Experiment 2 (GENI experimental setup and execution) results Fig. 4.2b shows

the reduction in object-recognition processing times when using two servers instead

of a single server. The single-server processing times observed here are longer than the

UVA-local machine processing times due to hardware limitations of the GENI servers. The

disk object type had the longest processing time among the five object types tested.

In conclusion, we propose using a multi-server setup to handle all the ROS nodes of the

Gilbreth application.

4.2.2 Improved object recognition pipeline characterization

We created Fig. 4.3 to characterize the improved pipeline, proposed by our UTD collaborators.

The downsampling process initially decreases the resolution significantly to reduce the size

of the original PCD file. Some local geometry features could be lost during this process.

If the object is successfully recognized, the computation proceeds to determine the pose

for the object pick point (i.e., the point at which the object will be grasped by the robot-

arm end effector) based on the object’s current orientation and pose. However, if the

CG algorithm fails to recognize the object in the downsampled PCD, the downsampling

parameter is reduced and a new PCD image with more local-geometry features is computed.

The downsampling and recognition procedure is repeatedly executed until the object is

either recognized or the computation reaches a maximum iteration number, at which point

failure is declared.

Figure 4.3: Improved object recognition pipeline
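
The control flow of Fig. 4.3 can be summarized by the sketch below; recognize_object() and the voxel leaf-size values are hypothetical stand-ins for the CG-based recognition call and its downsampling parameter.

def recognize_with_adaptive_downsampling(scene_pcd, recognize_object,
                                         initial_leaf_size=0.02,
                                         shrink_factor=0.5,
                                         max_iterations=5):
    """Start with a coarse voxel grid and refine it until recognition succeeds.

    recognize_object(pcd, leaf_size) is assumed to return (object_type, pick_pose)
    or None when the CG algorithm cannot recognize the downsampled cloud.
    """
    leaf_size = initial_leaf_size
    for _ in range(max_iterations):
        result = recognize_object(scene_pcd, leaf_size)
        if result is not None:
            return result                   # object type and pick pose found
        # Recognition failed: reduce the leaf size to retain more local geometry.
        leaf_size *= shrink_factor
    return None                             # declare failure after max_iterations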

4.3 Motion planner improvements

We developed a new pipeline for the motion planning process, which is illustrated in Fig. 4.4.

Our motivation for making this improvement came from observations seen in Fig. 3.6b and

3.7b.

Fig. 3.6b shows that most trajectory computations can be completed within 0.5 seconds,

but Fig. 3.7b shows many outliers. For all objects of a given type, the starting and ending

poses for the robot to pick up an object and place the object in a bin are the same, and

yet we observe many outliers in the motion execution times seen in Fig. 3.7b. We observed

that the main motion-planner algorithm, RRT-Connect, which is part of the MoveIt! ROS

package, sometimes generates unusually long trajectories. In these long trajectories, we



Figure 4.4: Improved motion planning pipeline

observed that the robot arm does not move to the target pose directly, but instead wanders,

and even rotates for a while, before reaching the desired pose. A further study of the

relationship between the trajectory-planning computation time and the length of the output

trajectory showed that the longer the computation time, the longer the trajectory.

To avoid unreasonably long trajectories, we developed a new pipeline for the robot

motion planning process. We first set the maximum planning time for the motion planner to

0.5 seconds. If the trajectory cannot be computed within 0.5 seconds, the motion planning

process is considered to have failed. This choice of 0.5 seconds comes from our observation

that in most cases, if motion planning takes longer than 0.5 seconds, the resultant trajectory

is too long.

We placed another constraint on the time threshold for each phase of robot movement.

In general, a successful pick-and-sort process has four phases between five robot-tool poses:

home pose, pick-approach pose, pick pose, pick-retreat pose, and place pose. We measured

the average robot execution time for each of these four phases, and configured corresponding

time thresholds as listed in Table 4.4. When motion planning is completed successfully, we

compare the expected trajectory-execution duration with the corresponding time threshold.

If the expected trajectory execution duration is higher than the corresponding time threshold,

the trajectory computation is considered a failure. Then, the time threshold is increased,

and a new trajectory is recomputed. If the motion planner ROS node cannot find a valid

trajectory within 10 iterations, the planning is considered to have failed.



Trajectory Phase Time Threshold


home pose to pick approach pose 3 seconds
pick approach pose to pick pose 2 seconds
pick pose to pick retreat pose 2 seconds
pick retreat pose to place pose 5 seconds
place pose to home pose 5 seconds

Table 4.4: Improved motion planning time threshold
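
A sketch of this threshold check is given below. The thresholds correspond to Table 4.4, while plan_trajectory() and the amount by which the threshold is increased on each retry are hypothetical, since the thesis does not specify the increment.

# Per-phase execution-time thresholds (seconds), taken from Table 4.4.
PHASE_THRESHOLDS = {
    "home_to_pick_approach": 3.0,
    "pick_approach_to_pick": 2.0,
    "pick_to_pick_retreat": 2.0,
    "pick_retreat_to_place": 5.0,
    "place_to_home": 5.0,
}

def plan_phase(plan_trajectory, phase, max_iterations=10, threshold_increase=1.0):
    """Plan one phase, rejecting trajectories whose expected duration exceeds the threshold.

    plan_trajectory(phase) is assumed to return a trajectory message whose expected
    duration is trajectory.joint_trajectory.points[-1].time_from_start, or None on failure.
    """
    threshold = PHASE_THRESHOLDS[phase]
    for _ in range(max_iterations):
        trajectory = plan_trajectory(phase)        # 0.5 s planning-time limit set elsewhere
        if trajectory is None:
            continue                               # planning itself failed; try again
        duration = trajectory.joint_trajectory.points[-1].time_from_start.to_sec()
        if duration <= threshold:
            return trajectory                      # acceptable trajectory found
        threshold += threshold_increase            # relax the threshold and re-plan
    return None                                    # planning declared to have failed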

4.4 Experiment on application performance

The Gilbreth application with the improved object recognition pipeline, and an improved

motion planning pipeline was executed, and its object pick-and-sort performance was

evaluated.

Experimental setup A single host was used to execute all components of the Gilbreth

application. This host had a single 4-core (8-thread) CPU (Intel Core i7-4710MQ@2.50GHz)

and 12 GB RAM. The OS used was Ubuntu 16.04.4 LTS with kernel version 4.13.0-45-generic.

To measure the pick-and-sort performance of the Gilbreth Application, we modified

the code of each ROS node and programmed a Gilbreth Monitor ROS node to monitor

the status of each object. The modified Conveyor Spawner ROS node publishes the type,

spawning time, and sequential number of the spawned object to a ROS topic. The Gilbreth

Monitor ROS node subscribes to this ROS topic, and saves all collected information in a

csv file. This csv file contains the essential information about the objects spawned on to the

conveyor belt.

When an object passes the break-beam sensor, a message is received on the break beam

sensor change ROS topic by both the Kinect Publisher node and the Gilbreth Monitor

node. This message carries an object-entry timestamp. The Kinect Publisher node assigns

a unique identifier (id) to the object (which is used throughout the processing by all ROS

nodes), and sends this id along with PCD data and the object-entry timestamp on the

kinect points topic. The Gilbreth Monitor node subscribes to this topic, and is thus

able to associate the object-entry timestamp received on the break beam sensor change

ROS topic with the message received on the kinect points topic.

The Gilbreth Monitor ROS node subscribes to ROS topics published by the Object

Segmentation, Object Recognition, and Tool Planner ROS nodes. For each processed

object, each of these nodes sends a message that carries a record of the time instant at which

the node completed its computation, and whether or not the node successfully completed

its processing actions on the object. The Gilbreth Monitor ROS node saves all this

information in a second csv file.

Collection of information about the processing of an object by the Robot Execution

ROS node is done using a different method. Instead of sending information to the Gilbreth

Monitor ROS node, the Robot Execution node records object-processing information itself

in its own csv file, due to implementation challenges. Specifically, the Robot Execution

node records information about three potential types of failure that could occur in the robot

execution process: (i) motion planning failure, (ii) assignment failure, and (iii) grasping

failure.

In summary, a combination of three csv files, two collected by the Gilbreth Monitor

ROS node and one collected by the Robot Execution ROS node, hold all the necessary

data to determine the overall application pick-and-sort performance.

Experimental Execution The process launch file was modified to automatically launch

the Gilbreth Monitor ROS node while launching all other ROS nodes. The Gilbreth

environment launch file and the process launch file were executed to launch the complete

application.

Table 4.5 lists the configured experimental parameters. Five different object types were

randomly spawned in the application. We evaluated the pick-and-sort success rate for six

different inter-object spawning periods, ranging from 4 seconds to 14 seconds.

To determine the number of objects needed for the pick-and-sort evaluation experiments,

we first ran the whole application with each of the six inter-object spawning periods until

200 objects had been processed. Our assumption was that, for each spawning period, the

pick-and-sort success rate would vary initially, e.g., the rate could be 0 or 100% after the

first object, 0, 50, or 100% after the second object, etc. We expected the rate to settle into

a steady-state value after a certain number of objects had been processed. The number of

objects needed to reach this steady state could depend on the inter-object spawning period.

After we determined the number of objects that needed to be spawned for each setting of

the inter-object spawning period, we ran the experiment 15 times per inter-object spawning

period. The average pick-and-sort success rate and multiple failure rates were computed

from the data in the recorded csv files.


Parameters Value
Spawning period [4, 6, 8, 10, 12, 14] sec
Time Variance 1 sec
Object yaw variance 180◦
Object lateral variance 0.2 m
Object type 5
Number of object spawned 200
Number of experiments per spawn period 15
Conveyor belt velocity 0.15 m/s

Table 4.5: Values of experimental parameters for evaluating application performance

Experimental Results Since the inter-object spawning period was random, with a

specified mean value (e.g., 7 sec) and a range (e.g., (6, 8) sec), we were interested in understanding

the distribution of this period. Therefore, we plotted the observed inter-object spawning

periods in a histogram. Specifically, we used the settings shown in Table 4.6, and observed

the histogram shown in Fig. 4.5. The shape appears to be a truncated normal distribution.

Parameters Value
Spawning period 7 sec
Time Variance 1 sec
Number of object spawned 700

Table 4.6: Inter-object spawning period parameters

Fig. 4.6 shows the results of our experiment to determine the minimum number of objects

required for each inter-object spawning period at which the pick-and-sort success rate reaches

a steady-state value. Our finding is that the convergence to steady state occurs after roughly

150 objects are spawned for most of the inter-object spawning periods considered. Therefore,

we decided to spawn 200 objects for each experiment. For each inter-object spawning period,

15 experiments were conducted, and the results were recorded and analyzed.

Figure 4.5: Inter-object spawning time distribution

Figure 4.6: Pick-and-sort success rates for different inter-object spawning periods

Fig. 4.7 presents our overall pick-and-sort performance results. The green bars show the

overall success rate, which increases with the mean inter-object spawning period. At its

largest setting of 14 sec, the success rate is 71.3%. In addition, this graph provides rates for

five different types of failure: (i) recognition failure, (ii) tool pose failure, (iii) excess-load

failure, (iv) robot-execution failure, and (v) grasping failure.

The average object-recognition failure rate for an inter-object arrival time longer than

6 seconds was lower than 0.2%. When the inter-object spawning period was 4 seconds,

the object-recognition failure rate was significantly higher at 6.95%. The likely cause of

this failure is that the break-beam sensor trigger was missed. When the mean inter-object

spawning period was 4 seconds, the range was (3, 5) seconds. The velocity of the conveyor

belt was 0.15 meters per second, which means the minimum inter-object distance was 0.45

meters. If two big objects were spawned successively on to the conveyor belt within a very

short time interval (e.g., 3 seconds), there could be no space between the objects, and

therefore the break beam sensor would only be triggered once. As a result, the sensor would

fail to detect the second object, which in turn would cause the Kinect Publisher ROS

node to publish inaccurate PCD. Without accurate PCD, there will be an object-recognition

failure.

The next failure type, the tool pose failure, was 0.0% for all inter-object spawning

periods.

The excess-load failure occurs when the robot arm is unable to reach the pick pose in

time, which happens when the object arrival rate is larger than the robot service rate. The

latter is inversely proportional to the time taken for the robot to pick and sort an object. In

other words, if the robot is still processing an object when a second object passes under its

arm position, the robot will be unable to pick up the second object. For mean inter-object

spawning periods smaller than 14 seconds, this failure type is the highest contributor to

overall failure. A robot arm cannot move too fast while maintaining a robust and smooth

trajectory. This imposes a minimum rate at which the robot arm can execute all the

trajectories provided to it by the pick-and-sort application. We define the minimum value of

the robot service time as Trobot , and we define the upper limit of the robot service rate as

Srobot .The object arrival rate is defined as Aobject .

1
Srobot = (4.1)
Trobot

If the object arrival rate Aobject is larger than the maximum robot service rate Srobot , the

robot cannot serve all objects in time. For a pre-defined object arrival rate, decreasing the

service time Trobot increases the service rate Srobot , which will in turn reduce the excess-load

failure rate, and correspondingly increase the pick-and-sort success rate.

The average duration between the time when the object triggers the break-beam sensor

and the time when the robot arm returns to the home pose was 26 seconds. The average

distance between the break-beam sensor location and the position on the belt where the object was being picked up was 2.0 meters. Hence the duration that an object is carried on the conveyor belt before pickup can be estimated as

t_{obj,conveyor} = \frac{L_{conveyor\,belt}}{V_{conveyor\,belt}} \approx 13 \text{ seconds} \qquad (4.2)

Figure 4.7: Picking evaluation of the Gilbreth application

Subtracting this conveyor-belt movement time (13 sec) from the total average duration

between the break-beam sensor trigger and the return to home pose for the robot arm (26

sec), we see that on average, only 13 seconds were available to the robot arm for completing

its service for each object. Therefore, when the inter-object spawning period is lower than

13 seconds, this excess-load failure rate increases significantly.

A motion-planning failure occurs when the motion planner, described in Section 4.3,

cannot find a valid trajectory. We observed several poses at which the motion-planning

algorithm appears to stall, and is unable to find a valid trajectory for a long duration. Our

observation is that when the motion planner gets stuck in this state, it takes at least 6

object-processing times to recover. If the motion-planning algorithm or its pipeline can be



improved further, these unnecessary motion-plan failures can be avoided to increase the

pick-and-sort success rate.

The gripper grasping failure happens when the robot successfully reaches the pick pose

and enables the gripper, but fails to grasp the object. In other words, the object does not

attach to the gripper. We postulate that either: (i) the pick pose is not perfectly aligned, or (ii) the robot-arm end effector is slightly offset from the desired pose. A more precise

grasping process can decrease this failure rate.

4.5 VoxNet-based object-recognition pipeline

As described in Section 3.4, the Correspondence Grouping (CG) algorithm was used for

object recognition. The object-recognition processing time increases linearly with the

number of object types for the CG algorithm. For users with access to GPU resources,

an alternative 3D object recognition algorithm based on Convolutional Neural Networks

(CNNs), called VoxNet, as described in Section 2.1.3, is a better option. Our experimental

evaluation shows that the object-recognition processing time with VoxNet is significantly

smaller than that of the CG-algorithm-based object recognition. Section 4.5.1 describes

our implementation of a ROS-compatible-version of the VoxNet software for 3D object

recognition. Section 4.5.2 describes the experimental evaluation we conducted on VoxNet-

based object-recognition ROS node.

4.5.1 Implementation

VoxNet [2], described in Section 2.1.3, is implemented in our application. The CNN object

recognition pipeline, illustrated in Fig. 4.8, works as follows:

The Object Segmentation ROS node remains unchanged from the CG pipeline. It

removes the background scene and publishes the point cloud data after the segmentation

process.

A voxelizer1 ROS node converts the PCD into the voxel data needed by the VoxNet CNN algorithm; the voxel data is a 32 × 32 × 32 matrix with 0 or 1 as each element.

1 https://github.com/dbworth/pcl binvox

Figure 4.8: VoxNet based object-recognition pipeline

The voxelizer ROS node, written in C++, combines the voxel data and the PCD it receives

into one ROS message and publishes it to a ROS topic.

The CNN ROS node, written in Python, reads voxel data to predict the object type,

packages the predicted object type and the PCD it received into one ROS message,

and publishes the message to a ROS topic. The CNN architecture is described in

Section 2.1.3.

The Alignment Computation ROS node takes the recognized object type and the PCD as inputs and uses the ICP algorithm to calculate the object pick pose.
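
A NumPy sketch of the voxelization step is shown below. The real voxelizer node relies on the pcl binvox toolset in C++, so this is only an illustration of producing the 32 × 32 × 32 binary occupancy grid consumed by VoxNet; the per-axis scaling is a simplification.

import numpy as np

def voxelize(points, grid_size=32):
    """Convert an (N, 3) array of XYZ points into a binary occupancy grid.

    The cloud is scaled to fit the grid, then each occupied cell is set to 1,
    mirroring the 32 x 32 x 32 {0, 1} matrix fed to the VoxNet CNN.
    """
    mins = points.min(axis=0)
    extent = np.maximum(points.max(axis=0) - mins, 1e-6)
    # Map each point into integer voxel indices in [0, grid_size - 1].
    indices = np.floor((points - mins) / extent * (grid_size - 1)).astype(int)
    grid = np.zeros((grid_size, grid_size, grid_size), dtype=np.uint8)
    grid[indices[:, 0], indices[:, 1], indices[:, 2]] = 1
    return grid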

4.5.2 Evaluation

These experiments were designed to specifically test the scalability of the VoxNet CNN

algorithm, rather than evaluate the whole VoxNet based object-recognition pipeline.

Experimental Setup A single host was used in the experiment. This host had a single

4-core (8-thread) CPU (Intel Core i7-4710MQ@2.50GHz), 12 GB RAM and one Nvidia

GeForce GTX 850M graphic card with 2GB GPU memory. The OS used was Ubuntu 16.04.5

LTS with kernel version 4.13.0-29-generic. The CUDA compiler version used was 8.0.61. A

command-line toolset pcl binvox2 was used to convert the PCD to binvox data for CNN

computation. The VoxNet CNN implementation and its dependent packages, Theano3 ,

Lasagne4 , scikit-learn5 and libgpuarray6 were downloaded, installed and configured.


2
https://github.com/dbworth/pcl binvox
3
http://deeplearning.net/software/theano/
4
https://lasagne.readthedocs.io/en/latest/index.html
5
http://scikit-learn.org/stable/
6
http://deeplearning.net/software/libgpuarray/index.html

Experimental Execution We ran two sets of experiments: Set1 with 5 object types,

and Set2 with 13 object types. In each set, we ran multiple experiments by varying the

number of PCDs used per object type for CNN model training. We then measured the

training time required for each setting of the number of PCDs per object-type. We also

measured the precision score achieved by the VoxNet CNN algorithm when executed on the

test PCD files.

For the experiments, we required a dataset of PCD files. We generated this dataset

using the Gilbreth application. Specifically, for each object type, 300 objects were randomly

spawned on to the conveyor belt. The Gazebo-simulated Kinect Camera captured PCDs

of these 300 objects for each object type. These PCDs were then processed by the Object

Segmentation ROS node. The output filtered PCDs were saved to disk. The PCD files for

each object type were assigned unique sequential identifiers (e.g., ID 1, ID 2, up to ID 300).

Tables 4.7 and 4.8 show the parameters used in the two experiment sets. Since the

highest number of training PCDs per object type in both experiment sets was 200, the first

200 PCDs for each object type were set aside for training, and the test data PCDs were drawn

from the last 100 PCDs (i.e., PCDs with IDs 201 to 300). When the number of training

PCD files per object type was less than 200, not all the PCDs were used. For example, when

this number was 25, then only PCDs with IDs from 1 to 25 for each object type were used

for CNN training, and PCDs with IDs from 26 to 200 remained unused in that experiment.

Parameter | Value
Number of object types | 5
Number of PCDs per object type in training data set | 25, 50, 100, 150, 200
Number of PCDs per object type in testing data set | 100

Table 4.7: Experiment Set1 parameters

Experimental Results Fig. 4.9 shows the Set1 experimental results. The time required

to train the CNN model increases linearly with the number of training PCDs per object type.

Before we ran these experiments, we hypothesized that in order to have a high recognition

success rate, we would require 100 PCD files per object type. However, the experimental

Parameter | Value
Number of object types | 13
Number of PCDs per object type in training data set | 20, 25, 30, 35, 50, 100, 150, 200
Number of PCDs per object type in testing data set | 100

Table 4.8: Experiment Set2 parameters

Figure 4.9: Experiment Set1 results: 5 object types

results show that with 5 object types, a set of 25 PCDs per object type was sufficient

to achieve 99.8% recognition success rate, and with 50 PCDs per object, the success rate

reached 100%.

Fig. 4.10 shows that when the number of object types was increased, the number of

training PCDs per object type required to ensure high recognition-success rates correspond-

ingly increased. For example, when the number of object types was increased from 5 to

13, with 25 PCDs per object type, the recognition-success rate fell from 99.8% to 98%. To

achieve 99.8% with 13 object types, we found that somewhere between 150 and 200 PCDs

per object type were required. The training time correspondingly increases significantly.

For example, with 5 object types, training time with 25 PCDs per object type (with which

99.8% recognition-success rate was achieved) was 0.2 hours (12 mins). In comparison, with

13 object types, training time with 150 PCDs and 200 PCDs was 3.27 and 3.96 hours,

respectively. In other words, to achieve the same 99.8% recognition-success rate with 13

object types, the time required for training was more than 3 hours.

Our UTD collaborator compared the object-recognition times with CG vs. with VoxNet.

Figure 4.10: Experiment Set2 results: 13 object types

The average time across five object types was 2.428 sec with CG but only 0.231 sec with

VoxNet.

Our conclusion is that while CNN-based object recognition saves processing time within the run-time operation of the Gilbreth application when compared with the CG algorithm, the cost

of CNN-based object-recognition is that it requires significant compute cycles for training.

Given that this training can be done offline, the extensive resources of cloud computing can

be leveraged.

4.6 Conclusions

The main conclusions drawn from the work presented in this chapter are as follows. First, we

found that the improvements made to the object-recognition pipeline allowed for a significant

decrease (reaching as high as 62%) in processing time for some objects, with negligible impact on the object-recognition success rate. Second, when the CG algorithm used in

object recognition was replaced by a Convolutional Neural Network (CNN) based algorithm called

VoxNet, the object recognition time was reduced even further. But the VoxNet solution has

a cost, which is high training time, on the order of hours. Third, there were multiple points

in the Gilbreth processing at which failures can occur. The overall success rate of the pick-and-sort operation was only 71.3%, which was achieved with the maximum value of the mean inter-object spawning period that we tested, i.e., 14 s. Even though this time is longer than the average time required for the robot arm to service an object, which was 13 s, there was a 9.95% excess-load failure rate. The remainder, approximately 19%, was due to motion-planning and grasping failures. Therefore, we conclude that each component of the Gilbreth code has room for improvement to decrease the failure rate.


Chapter 5

Conclusions and Future Work

5.1 Conclusions

In this thesis, we presented a flexible-manufacturing pick-and-sort industrial robotics ap-

plication that we named Gilbreth. The factory environment for which this application is

developed has a conveyor belt on which industrial parts are moved to the workcell of a UR10

robot arm (with seven degrees of freedom) for sorting. Typically, industrial robots handle

high volumes of a low-mix set of objects; in contrast, Gilbreth enables robots to handle

low-volume high-mix sets of industrial parts.

To support Gilbreth, two sensors are used: a break-beam sensor to sense an object

arriving on the conveyor belt, and a 3D Kinect camera to capture RGB-D images of the

objects. Object recognition is used to identify the type and pose of the arriving
object. Motion-planning algorithms are used to generate trajectories to move the robot arm

between five poses: home, pick approach, pick, pick retreat, and place. The robot arm waits

at the home pose, picks up the object using a vacuum gripper (end effector) that is attached

to the end of the robot arm, moves on a linear actuator rail to the bin associated with the

identified object type, places the object in the bin, and returns to home base.
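The cycle just described amounts to a fixed sequence of poses per object. The following minimal Python sketch only illustrates that sequence: the move_to() and set_gripper() callables, the simplified (x, y, z) pose representation, and the 0.10 m approach offset are hypothetical placeholders, not the actual Gilbreth ROS interfaces.

# Minimal sketch of one pick-and-sort cycle (hypothetical helpers; the real
# implementation uses ROS/ROS-I nodes and MoveIt-generated trajectories).
def sort_object(obj_type, pick_pose, bin_poses, move_to, set_gripper):
    x, y, z = pick_pose
    approach = (x, y, z + 0.10)        # pick-approach pose above the part
    place = bin_poses[obj_type]        # bin selected from the identified type

    move_to("home")                    # robot waits at the home pose
    move_to(approach)                  # pick approach
    move_to(pick_pose)                 # pick: vacuum gripper contacts the part
    set_gripper(True)                  # vacuum on
    move_to(approach)                  # pick retreat
    move_to(place)                     # travel along the linear rail to the bin
    set_gripper(False)                 # vacuum off: place the part in the bin
    move_to("home")                    # return home for the next object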

Not having access to a real factory environment, we used Gazebo, a robotics simulation

package. We then implemented a Gilbreth module within Gazebo to simulate our particular

factory environment, along with a set of ROS/ROS-I nodes to perform object segmentation,

object recognition, motion planning, and robot execution, among other functions. The ROS


and ROS-I frameworks have created a reusable base of software components. This allowed us

to develop the Gilbreth integrated application faster than would have been possible without

these frameworks. Also, importantly, ROS/ROS-I allows for distributed implementations of

applications, which in turn makes it easy to leverage cloud computing.
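As a simplified illustration of what such a distributed deployment involves, a ROS 1 node started on a second server only needs to be pointed at the ROS master running on the first server. The hostnames and addresses below are placeholders; in practice these environment variables are usually set in the shell rather than in Python, and they appear here only to keep the example self-contained.

import os

# Point this node (running on server 2) at the ROS master on server 1;
# 11311 is the default ROS master port.
os.environ["ROS_MASTER_URI"] = "http://server1.example.org:11311"
os.environ["ROS_IP"] = "10.0.0.2"   # address at which server 1 can reach this host

import rospy
rospy.init_node("object_recognition")   # this node now joins the remote ROS graph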

Experiments were then conducted to evaluate Gilbreth, and measurements were obtained.

With the original prototype, object recognition and robot execution consumed the most

amount of time. Since robot execution is limited by the mechanical constraints of how fast

the robot arm joints can be moved, we improved the object recognition code in multiple

ways: (i) changing configuration parameters; (ii) creating a pipeline in which downsampling

was used to decrease processing time without compromising the object-recognition success rate;
and (iii) replacing the Correspondence Grouping algorithm with a new Convolutional Neural

Network (CNN) based algorithm called VoxNet (this requires GPU machines). We also

improved the motion-planning pipeline.

Our first conclusion from our evaluation effort is that cloud computing can be leveraged

in many ways by an application such as Gilbreth. Instead of requiring every factory to install

its own datacenter, high-speed networks can be used to move point cloud data from factory

floors where sensors such as 3D Kinect cameras are mounted to cloud data centers to run

object-recognition, motion planning and some of the other ROS-I nodes. In an experimental

evaluation of the object-recognition pipeline, we found a reduction of processing times by

using a 2-server testbed instead of a single server, demonstrating the value of our distributed

ROS-I implementation. When the CG algorithm used in object recognition was replaced by

the CNN-based VoxNet algorithm, the object-recognition time was reduced significantly. But the
VoxNet solution has a cost: a high training time, measured to be on the

order of hours. Given that this training can be done offline, the extensive resources of cloud

computing can be leveraged.

Our second conclusion is that given mechanical constraints, robot-arm joints can only

be moved at a certain rate. Using cloud computing resources, the processing times of even

complex operations such as object recognition and motion planning can be reduced to small

fractions of the robot trajectory-execution times. Therefore, to improve overall productivity,

e.g., the rate at which objects are picked up and sorted (which can be done by reducing

inter-object arrival times and/or increasing the speed of the conveyor belt), a multi-robot

platform is required.

Our third conclusion is that the overall pick-and-sort success rate can be improved with

additional enhancements to the vacuum-gripper grasping process and the motion-planning

process.

5.2 Future work

Future work items include the following: (i) designing a more flexible pick pose estimation

for finer adjustments in the grasping process, (ii) developing a more efficient motion planning

algorithm, (iii) designing and implementing a multi-robot pick-and-sort application, and (iv)

further testing of the VoxNet object recognition process.

Lack of flexibility is a weakness in the current pick-pose alignment process, which leads

to lowered precision in the grasping process. In our Gilbreth application, we define one pick

pose for each object type and save the pick pose in a data set. For each identified object, we

use the ICP algorithm to match the saved pick pose with the pose of the arriving object,

and compute the output pick pose. For more agile robotics applications, autonomous pick

pose estimation is better than our current approach. Makhal et al. proposed a real-time

algorithm to compute grasping poses for unknown objects using superquadric representations

based on point cloud data [35]. This algorithm could be a good approach because: (i) it is

suitable for single-view objects, (ii) it is a real-time approach, and (iii) it has a good success

rate for simple-geometry objects. Fine control in the grasping pipeline is required in order

to reach a pick pose with high accuracy. A majority of the grasping failures were caused

by inaccurate robot-arm control, which resulted in the actual ending pose deviating slightly
from the target pose. This deviation causes the vacuum gripper to miss making

contact with the object. A fine-control adjustment in the pipeline can solve this problem

and increase the grasping success rate of Gilbreth.
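To make the current approach concrete, the sketch below composes a stored pick pose (defined in the object's own frame) with the estimated pose of the arriving object to obtain the pick pose in the world frame. The 4x4 homogeneous-transform representation and the numeric values are illustrative assumptions, not taken from the Gilbreth code.

import numpy as np

def pick_pose_in_world(T_world_object, T_object_pick):
    # T_world_object: pose of the recognized object in the world frame
    #                 (output of recognition/ICP), as a 4x4 transform.
    # T_object_pick:  pick pose stored once per object type, defined in the
    #                 object's own frame, as a 4x4 transform.
    return T_world_object @ T_object_pick

# Illustrative values only: object 0.6 m along the belt at table height,
# pick point 5 cm above the object origin.
T_world_object = np.eye(4)
T_world_object[:3, 3] = [0.6, 0.0, 0.95]
T_object_pick = np.eye(4)
T_object_pick[:3, 3] = [0.0, 0.0, 0.05]
print(pick_pose_in_world(T_world_object, T_object_pick))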

To decrease the motion-planning failure rate, improvements are required for more efficient

motion planning. In our implementation, when the robot arm reaches certain poses, the
motion-planning algorithm has a higher probability of generating successive incorrect

trajectories, which results in a sequence of motion-planning failures. Manually configuring

the planning algorithm to avoid certain poses is one choice; however, this approach will likely

not work in a scaled-up factory environment. State-of-the-art motion planning algorithms

should be studied to determine a better approach.
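One lightweight mitigation, sketched below, is to retry planning a bounded number of times and to perturb the goal position slightly on repeated failures. This is not the Gilbreth implementation: plan_fn stands in for the MoveIt/OMPL planning call, the tuple pose representation is simplified, and the retry limits are arbitrary.

import random

def plan_with_retries(plan_fn, target_pose, max_attempts=5, perturb=0.01):
    # plan_fn returns a trajectory, or None when planning fails.
    for attempt in range(max_attempts):
        pose = target_pose if attempt == 0 else jitter(target_pose, perturb)
        trajectory = plan_fn(pose)
        if trajectory is not None:
            return trajectory
    return None   # reported upstream as a motion-planning failure

def jitter(pose, scale):
    # Perturb only the (x, y, z) goal position to escape goals that repeatedly
    # produce bad plans; any orientation components are left unchanged.
    x, y, z = pose[:3]
    return (x + random.uniform(-scale, scale),
            y + random.uniform(-scale, scale),
            z + random.uniform(-scale, scale)) + tuple(pose[3:])

Such perturbation would only be appropriate for intermediate goals (e.g., approach or retreat poses), since the pick pose itself must be reached accurately.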

To increase pick-and-sort rates in a factory setting, a multi-robot system is required.

Multi-robot systems require more computational resources and network bandwidth, which

makes such an application a good candidate for an industrial cloud robotics study. First-in-

first-out (FIFO) and shortest-sorting-time (SST) methods have been studied for multi-robot

coordination. Bozma et al. [22] proposed a multi-robot coordination algorithm for a conveyor-

belt pick-and-place task based on non-cooperative game theory. Yu et al. [36] improved

the SST rule for multi-robot coordination and presented a second-shortest-sorting-time

(SSST) rule, which showed optimal pick-and-sort results in conveyor-belt settings. They

also pointed out that in a two-robot environment, arranging two robot arms on different

sides of the conveyor belt leads to a higher pick-and-sort rate than mounting two robot arms

side-by-side.
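To make the part-dispatching rules concrete, the sketch below contrasts a FIFO assignment with an SST-style assignment; the robot model and the sort_time() cost estimate are placeholders rather than the formulations in the cited papers.

# Illustrative dispatching sketches for a multi-robot pick-and-sort cell.
def dispatch_fifo(parts, robots):
    # FIFO: each arriving part goes to the robot that becomes free first.
    plan = []
    for part in parts:                        # parts ordered by arrival time
        robot = min(robots, key=lambda r: r["free_at"])
        robot["free_at"] += robot["service_time"]
        plan.append((part, robot["name"]))
    return plan

def dispatch_sst(parts, robots, sort_time):
    # SST: each part goes to the robot with the shortest estimated sorting
    # time for that particular part.
    plan = []
    for part in parts:
        robot = min(robots, key=lambda r: r["free_at"] + sort_time(r, part))
        robot["free_at"] += sort_time(robot, part)
        plan.append((part, robot["name"]))
    return plan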

Finally, our CNN-based VoxNet models need further testing, and an online training and

testing pipeline needs to be implemented. Experiments showed that as the number of object

types increases, the number of point cloud images per object type required to train the CNN

model increases. If we place the data storage and run training computations on the cloud,

we need to design a pipeline to automatically adjust the size of the training data sets and

update the object-recognition parameters.
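One possible shape for such a pipeline is sketched below: the number of training PCDs per object type is grown until a held-out accuracy target is met, after which the updated recognition parameters can be deployed. The train_model(), evaluate(), and pcd_store interfaces are placeholders, and the 99.8% target and candidate set sizes are simply taken from the experiments above.

def auto_train(object_types, pcd_store, train_model, evaluate,
               target_accuracy=0.998, sizes=(25, 50, 100, 150, 200)):
    # Grow the training set until the held-out recognition accuracy target
    # is reached, then return the trained model and the set size used.
    for pcds_per_type in sizes:
        train_set = {t: pcd_store.sample(t, pcds_per_type) for t in object_types}
        model = train_model(train_set)
        accuracy = evaluate(model)             # evaluated on held-out testing PCDs
        if accuracy >= target_accuracy:
            return model, pcds_per_type
    return model, sizes[-1]                    # best effort with the largest size tried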


Bibliography

[1] Ioan A. Sucan and Sachin Chitta. “MoveIt!”. [Online]. Available: https://moveit.ros.org.

[2] D. Maturana and S. Scherer. VoxNet: A 3D convolutional neural network for real-time
object recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS), pages 922–928, Sept 2015.

[3] NIST. Agile robotics for industrial automation competition, September 2016. [Online]. Available: https://www.nist.gov/el/intelligent-systems-division-73500/agile-robotics-industrial-automation-competition.

[4] ROS-I. The challenge: Transitioning robotics R&D to the factory floor, May 2017. [Online]. Available: https://rosindustrial.org/the-challenge/.

[5] Robohub. Amazon picking challenge, August 2017. [Online]. Available: https://robohub.org/tag/amazon-picking-challenge/.

[6] D. Buchholz, S. Winkelbach, and F. M. Wahl. RANSAM for industrial bin-picking.


In ISR 2010 (41st International Symposium on Robotics) and ROBOTIK 2010 (6th
German Conference on Robotics), pages 1–6, June 2010.

[7] Morgan Quigley, Ken Conley, Brian P. Gerkey, Josh Faust, Tully Foote, Jeremy
Leibs, Rob Wheeler, and Andrew Y. Ng. ROS: an open-source robot operating system.
In ICRA Workshop on Open Source Software, 2009.

[8] Ioan A. Şucan, Mark Moll, and Lydia E. Kavraki. The Open Motion Planning Library.
IEEE Robotics & Automation Magazine, 19(4):72–82, December 2012. http://ompl.kavrakilab.org.

[9] J. J. Kuffner and S. M. LaValle. RRT-Connect: An efficient approach to single-


query path planning. In Proceedings 2000 ICRA. Millennium Conference. IEEE
International Conference on Robotics and Automation. Symposia Proceedings (Cat.
No.00CH37065), volume 2, pages 995–1001 vol.2, April 2000.

[10] M. Moll, I. A. Sucan, and L. E. Kavraki. Benchmarking motion planning algorithms:


An extensible infrastructure for analysis and visualization. IEEE Robotics Automation
Magazine, 22(3):96–102, Sept 2015.

[11] MoveIt! Benchmarking: Modern tools for motion planning, 2013. [Online]. Available: http://moveit.ros.org/assets/pdfs/2013/icra2013tutorial/ICRA13_Benchmark.pdf.


[12] Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng,
and Trevor Darrell. DeCAF: A deep convolutional activation feature for generic visual
recognition. In International conference on machine learning, pages 647–655, 2014.

[13] Maxime Oquab, Leon Bottou, Ivan Laptev, and Josef Sivic. Learning and transferring
mid-level image representations using convolutional neural networks. In Proceedings
of the IEEE conference on computer vision and pattern recognition, pages 1717–1724,
2014.

[14] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with
deep convolutional neural networks. In Advances in neural information processing
systems, pages 1097–1105, 2012.

[15] Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. CNN
features off-the-shelf: an astounding baseline for recognition. CoRR, abs/1403.6382,
2014.

[16] Pierre Sermanet, David Eigen, Xiang Zhang, Michaël Mathieu, Rob Fergus, and
Yann LeCun. Overfeat: Integrated recognition, localization and detection using
convolutional networks. CoRR, abs/1312.6229, 2013.

[17] M. Schwarz, H. Schulz, and S. Behnke. RGB-D object recognition and pose estimation
based on pre-trained convolutional neural network features. In 2015 IEEE Inter-
national Conference on Robotics and Automation (ICRA), pages 1329–1335, May
2015.

[18] Sebastian Thrun. Learning occupancy grid maps with forward sensor models. Au-
tonomous robots, 15(2):111–127, 2003.

[19] Hans Moravec and A. E. Elfes. High resolution maps from wide angle sonar. In
Proceedings of the 1985 IEEE International Conference on Robotics and Automation,
pages 116 – 121, March 1985.

[20] A. Cowley, B. Cohen, W. Marshall, C. J. Taylor, and M. Likhachev. Perception


and motion planning for pick-and-place of dynamic objects. In 2013 IEEE/RSJ
International Conference on Intelligent Robots and Systems, pages 816–823, Nov
2013.

[21] Yanjiang Huang, Ryosuke Chiba, Tamio Arai, Tsuyoshi Ueyama, and Jun Ota. Robust
multi-robot coordination in pick-and-place tasks based on part-dispatching rules.
Robotics and Autonomous Systems, 64(Supplement C):70 – 83, 2015.

[22] H. Işıl Bozma and M. E. Kalalıoğlu. Multirobot coordination in pick-and-place tasks on a


moving conveyor. Robotics and Computer-Integrated Manufacturing, 28(4):530 – 538,
2012.

[23] E. Guizzo. Robots with their heads in the clouds. IEEE Spectrum, 48(3):16–18, March
2011.

[24] D. Lorencik and P. Sincak. Cloud robotics: Current trends and possible use as a
service. In 2013 IEEE 11th International Symposium on Applied Machine Intelligence
and Informatics (SAMI), pages 85–88, Jan 2013.

[25] G. Hu, W. P. Tay, and Y. Wen. Cloud robotics: architecture, challenges and applica-
tions. IEEE Network, 26(3):21–28, May 2012.

[26] Siavash Rastkar, Diego Quintero, Diego Bolivar, and Sabri Tosunoglu. Empowering
robots via cloud robotics: Image processing and decision making boebots. In Pro-
ceedings of the Florida Conference on Recent Advances in Robotics, Boca Raton, FL,
USA, pages 10–11, 2012.

[27] Vijay Kumar and Nathan Michael. Opportunities and challenges with autonomous
micro aerial vehicles. The International Journal of Robotics Research, 31(11):1279–
1291, 2012.

[28] B. Kehoe, S. Patil, P. Abbeel, and K. Goldberg. A survey of research on cloud


robotics and automation. IEEE Transactions on Automation Science and Engineering,
12(2):398–409, April 2015.

[29] B. Kehoe, A. Matsukawa, S. Candido, J. Kuffner, and K. Goldberg. Cloud-based


robot grasping with the google object recognition engine. In 2013 IEEE International
Conference on Robotics and Automation, pages 4263–4270, May 2013.

[30] Raffaello D’Andrea. Guest editorial: A revolution in the warehouse: A retrospective


on Kiva Systems and the grand challenges ahead. IEEE Transactions on Automation
Science and Engineering, 9(4):638–639, 2012.

[31] R. Rahimi, C. Shao, M. Veeraraghavan, A. Fumagalli, J. Nicho, J. Meyer, S. Ed-


wards, C. Flannigan, and P. Evans. An industrial robotics application with cloud
computing and high-speed networking. In 2017 First IEEE International Conference
on Robotic Computing (IRC), pages 44–51, April 2017.

[32] F. Tombari and L. Di Stefano. Object Recognition in 3D Scenes with Occlusions and
Clutter by Hough Voting. In 2010 Fourth Pacific-Rim Symposium on Image and Video
Technology, pages 349–355, Nov 2010.

[33] Radu Bogdan Rusu and Steve Cousins. 3D is here: Point Cloud Library (PCL).
In IEEE International Conference on Robotics and Automation (ICRA), Shanghai,
China, May 9-13 2011.

[34] Mark Berman, Jeffrey S. Chase, Lawrence Landweber, Akihiro Nakao, Max Ott,
Dipankar Raychaudhuri, Robert Ricci, and Ivan Seskar. Geni: A federated testbed for
innovative network experiments. Computer Networks, 61:5 – 23, 2014. Special issue on
Future Internet Testbeds Part I.

[35] A. Makhal, F. Thomas, and A. P. Gracia. Grasping unknown objects in clutter by


superquadric representation. In 2018 Second IEEE International Conference on
Robotic Computing (IRC), pages 292–299, Jan 2018.

[36] C. Yu, X. Liu, F. Qiao, and F. Xie. Multi-robot coordination for high-speed pick-and-
place tasks. In 2017 IEEE International Conference on Robotics and Biomimetics
(ROBIO), pages 1743–1750, Dec 2017.
